Seeking to reduce the warm-up time for a server going live after scaling to zero. In HuggingFace, the server is set to scale to zero when not in use and takes 5-6 minutes to warm up and go live. This is because a 10 GB model has to be loaded from HuggingFace onto the server at each start-up. Seeking to pre-load the model onto the server so it doesn't need to load every single time, which currently means a 5-6 minute delay for the customer.
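To make the request concrete, here is a minimal sketch of the kind of pre-loading being asked for. It assumes the server has some storage that survives a scale-to-zero restart (a persistent volume, or a directory baked into the container image); whether that is available in the current setup is not confirmed. The names `ensure_model_cached`, `hub_download`, and the `.complete` marker file are hypothetical and only for illustration.

```python
from pathlib import Path
from typing import Callable, Tuple


def ensure_model_cached(
    repo_id: str,
    cache_root: Path,
    download: Callable[[str, Path], None],
) -> Tuple[Path, bool]:
    """Return (local_dir, downloaded).

    The download step runs only when the marker file is missing, so a
    restart that still sees the same cache_root skips the 10 GB transfer.
    """
    local_dir = cache_root / repo_id.replace("/", "--")
    marker = local_dir / ".complete"  # hypothetical marker, written last
    if marker.exists():
        return local_dir, False
    local_dir.mkdir(parents=True, exist_ok=True)
    download(repo_id, local_dir)
    marker.touch()  # only reached if the download did not raise
    return local_dir, True


def hub_download(repo_id: str, local_dir: Path) -> None:
    """One possible downloader, assuming the huggingface_hub package is installed."""
    from huggingface_hub import snapshot_download

    snapshot_download(repo_id=repo_id, local_dir=str(local_dir))
```

At start-up the server would call `ensure_model_cached(repo_id, cache_root, hub_download)` and load the model from the returned directory. This only removes the download portion of the warm-up; time spent reading the weights into memory would remain.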
Posted On: July 28, 2024 23:03 UTC
Category: Back-End Development
Skills: Python, Amazon Web Services
Country: United States
