TIL: Dynamically Installing Python Wheels and spaCy Models at Runtime
A common practice when serving models is to download model weights at runtime to keep Docker image sizes small. Recently, this proved more complex than anticipated with spaCy, whose model weights are distributed as Python wheels intended to be installed with pip.
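Part of what makes a runtime install tractable is that a wheel is just a zip archive: the importable package directory sits at the top level next to a `.dist-info` metadata directory. The sketch below builds a tiny stand-in wheel in memory and lists its members with the standard-library `zipfile` module. The package name `demo_model` and its contents are hypothetical, not a real spaCy model, and a real wheel would also carry a `RECORD` file.

```python
import io
import zipfile


def build_fake_wheel() -> bytes:
    """Build a minimal, hypothetical wheel-shaped zip archive in memory."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as wheel:
        # Importable package: this is what would land in site-packages/demo_model/
        wheel.writestr("demo_model/__init__.py", "def load():\n    return 'weights'\n")
        # Metadata read by tools such as importlib.metadata
        wheel.writestr(
            "demo_model-1.0.0.dist-info/METADATA",
            "Metadata-Version: 2.1\nName: demo_model\nVersion: 1.0.0\n",
        )
        wheel.writestr(
            "demo_model-1.0.0.dist-info/WHEEL",
            "Wheel-Version: 1.0\nRoot-Is-Purelib: true\nTag: py3-none-any\n",
        )
    return buffer.getvalue()


def list_wheel_members(wheel_bytes: bytes) -> list[str]:
    """Return the member names of a wheel, exactly as any unzip tool would see them."""
    with zipfile.ZipFile(io.BytesIO(wheel_bytes)) as wheel:
        return wheel.namelist()


# Prints the package file first, then the two .dist-info files
for name in list_wheel_members(build_fake_wheel()):
    print(name)
```

Because nothing here is pip- or uv-specific, "installing" a pure-Python wheel boils down to unzipping it into a directory that is on `sys.path`.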
I wanted to avoid having uv add these wheels as direct dependencies of my project, as this could significantly increase image size and potentially slow down CI/CD builds.
Before trying to solve a problem like this, we should measure its actual impact. A teammate did exactly that, which is advice I still struggle to follow myself! The image did grow, but the build-time concern was fully mitigated by image caching in our CI/CD pipeline.
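Measuring the image-size side of that trade-off can be as simple as summing the file sizes under an installed package. A minimal sketch, assuming the model is installed as a regular package directory; the helper name `package_size_mb` is hypothetical, and the standard-library `json` package is used only as an always-available stand-in for a model such as `en_core_web_sm`.

```python
import importlib.util
from pathlib import Path


def package_size_mb(package_name: str) -> float:
    """Return the on-disk size in MB of an installed package directory."""
    spec = importlib.util.find_spec(package_name)
    # Plain modules (not packages) have no submodule_search_locations
    if spec is None or not spec.submodule_search_locations:
        raise ValueError(f"{package_name!r} is not an installed package")
    total_bytes = 0
    for location in spec.submodule_search_locations:
        for path in Path(location).rglob("*"):
            if path.is_file():
                total_bytes += path.stat().st_size
    return total_bytes / (1024 * 1024)


# "json" ships with Python, so it works as a stand-in here;
# swap in "en_core_web_sm" once the model is installed
print(f"json: {package_size_mb('json'):.2f} MB")
```

The figure includes any `__pycache__` directories, which is what actually ends up in the image layer.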
The best code is no code, so simply keeping the model in the image was without doubt the best approach for this particular problem. Still, I was curious how to install Python wheels at runtime in a way that is independent of pip or uv. Here’s the code that solved this niche problem:
import importlib
import site
import tempfile
import urllib.request
import zipfile
from pathlib import Path

import spacy


def install_spacy_model(model_url: str) -> None:
    """Installs a spaCy model by downloading and extracting its wheel from a URL."""
    last_reported_percent = -10

    def progress_hook(block_num: int, block_size: int, total_size: int) -> None:
        """Progress callback for urllib.request.urlretrieve."""
        nonlocal last_reported_percent
        downloaded = block_num * block_size
        if total_size > 0:
            percent = min(100, (downloaded * 100) // total_size)
            # Report progress every 10%
            rounded_percent = (percent // 10) * 10
            if rounded_percent > last_reported_percent:
                # The last block can overshoot, so cap at the total size
                mb_downloaded = min(downloaded, total_size) / (1024 * 1024)
                mb_total = total_size / (1024 * 1024)
                print(
                    f"Progress: {rounded_percent}% ({mb_downloaded:.1f}/{mb_total:.1f} MB)"
                )
                last_reported_percent = rounded_percent

    # Reserve a temporary file name, then close the handle before downloading
    # (writing to a file that is still open fails on Windows)
    with tempfile.NamedTemporaryFile(suffix=".whl", delete=False) as temp_file:
        wheel_path = Path(temp_file.name)
    try:
        print(f"Downloading model from {model_url}")
        urllib.request.urlretrieve(  # nosec
            model_url, wheel_path, reporthook=progress_hook
        )
        # A wheel is a zip archive: extract it straight into site-packages
        site_packages = site.getsitepackages()[0]
        with zipfile.ZipFile(wheel_path, "r") as wheel_zip:
            wheel_zip.extractall(site_packages)
        # Make sure the import system notices the newly created package
        importlib.invalidate_caches()
        print("Successfully installed model")
    except Exception as e:
        print(f"Failed to install model: {e}")
        raise
    finally:
        # Remove the downloaded wheel whether or not the install succeeded
        wheel_path.unlink(missing_ok=True)

model_name = "en_core_web_sm"  # Example model name
model_url = "https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.7.0/en_core_web_sm-3.7.0-py3-none-any.whl"  # Example URL

try:
    model = spacy.load(model_name)
except OSError:
    print(f"Model '{model_name}' not found. Installing...")
    install_spacy_model(model_url)
    model = spacy.load(model_name)
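Unzipping into site-packages only holds up under two assumptions: the wheel must be pure Python (tagged `none`/`any`, as the `py3-none-any` spaCy model wheel above is), and `site.getsitepackages()[0]` must be writable, which is not a given when a container runs as a non-root user against a system interpreter. A minimal sketch of two pre-flight checks follows. The helper names and the fallback directory are hypothetical, and it still skips what a real installer does (resolving dependencies, writing `RECORD`, handling `.data` directories).

```python
import os
import site
import sysconfig
import tempfile
from pathlib import Path


def is_pure_python_wheel(filename: str) -> bool:
    """Return True for wheels whose tags are py*-none-any.

    Wheel names follow {name}-{version}(-{build})?-{python}-{abi}-{platform}.whl,
    so the last three dash-separated fields are the compatibility tags.
    """
    if not filename.endswith(".whl"):
        return False
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):
        return False
    python_tag, abi_tag, platform_tag = parts[-3:]
    return python_tag.startswith("py") and abi_tag == "none" and platform_tag == "any"


def resolve_install_dir(fallback: Path) -> Path:
    """Prefer the interpreter's purelib directory, else a writable fallback."""
    purelib = Path(sysconfig.get_paths()["purelib"])
    if purelib.is_dir() and os.access(purelib, os.W_OK):
        return purelib
    fallback.mkdir(parents=True, exist_ok=True)
    # Appends the directory to sys.path for this process (and processes .pth files)
    site.addsitedir(str(fallback))
    return fallback


print(is_pure_python_wheel("en_core_web_sm-3.7.0-py3-none-any.whl"))  # True
print(is_pure_python_wheel(
    "numpy-1.26.4-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl"
))  # False
print(resolve_install_dir(Path(tempfile.gettempdir()) / "spacy-models"))
```

`sysconfig.get_paths()["purelib"]` is the interpreter's configured location for pure-Python packages, so it is a more deliberate choice than the first entry of `site.getsitepackages()`. Because `site.addsitedir` only affects the running process, call `resolve_install_dir` before the first `spacy.load` so that a model extracted into the fallback on a previous run is found again.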