Deploying Machine Learning Models to Production
Taking ML models from development to production requires careful planning and robust infrastructure.
Model Serving Architecture
REST API Approach
Simple and widely supported:
```python
from fastapi import FastAPI
from pydantic import BaseModel
import joblib

app = FastAPI()
model = joblib.load('model.pkl')

class PredictionRequest(BaseModel):
    features: list[float]

@app.post("/predict")
async def predict(request: PredictionRequest):
    prediction = model.predict([request.features])
    return {"prediction": prediction.tolist()}
```
gRPC for High Performance
```python
import grpc
from concurrent import futures
import joblib

import prediction_pb2
import prediction_pb2_grpc

# Load the model once at startup, not per request
model = joblib.load('model.pkl')

class PredictionService(prediction_pb2_grpc.PredictionServicer):
    def Predict(self, request, context):
        result = model.predict([list(request.features)])
        return prediction_pb2.PredictionResponse(prediction=result[0])

def serve():
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))
    prediction_pb2_grpc.add_PredictionServicer_to_server(PredictionService(), server)
    server.add_insecure_port('[::]:50051')
    server.start()
    server.wait_for_termination()
```
Model Versioning
MLflow for Model Registry
```python
import mlflow
import mlflow.sklearn

# Register model
mlflow.sklearn.log_model(
    model,
    "model",
    registered_model_name="sales_predictor"
)

# Load specific version
model = mlflow.pyfunc.load_model(
    model_uri="models:/sales_predictor/Production"
)
```
Containerization
Docker for ML Services
```dockerfile
FROM python:3.11-slim

WORKDIR /app

# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy model and code
COPY model.pkl .
COPY app.py .

# Expose port
EXPOSE 8000

# Run application
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]
```
Scalability
Horizontal Scaling
Deploy multiple instances behind a load balancer:
- Use Kubernetes for orchestration
- Implement health checks
- Configure auto-scaling policies
- Monitor resource usage
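As a sketch, the checklist above could be wired together in Kubernetes with a Deployment plus a HorizontalPodAutoscaler. The image name `ml-service:1.0` and the `/health` readiness endpoint are assumptions (the FastAPI example earlier would need such an endpoint added); adjust to your own service.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ml-service
spec:
  replicas: 3
  selector:
    matchLabels:
      app: ml-service
  template:
    metadata:
      labels:
        app: ml-service
    spec:
      containers:
        - name: ml-service
          image: ml-service:1.0      # hypothetical image name
          ports:
            - containerPort: 8000
          readinessProbe:            # health check before receiving traffic
            httpGet:
              path: /health          # assumed endpoint on the service
              port: 8000
          resources:
            requests:
              cpu: "500m"
            limits:
              cpu: "1"
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ml-service
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ml-service
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70    # scale out above 70% CPU
```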
Batch Prediction
For non-real-time scenarios:
```python
import joblib
import pandas as pd
from prefect import flow, task

# Load the model once so every flow run reuses it
model = joblib.load('model.pkl')

@task
def load_data(path: str) -> pd.DataFrame:
    return pd.read_csv(path)

@task
def make_predictions(data: pd.DataFrame, model):
    return model.predict(data)

@task
def save_predictions(predictions, output_path: str):
    pd.DataFrame(predictions).to_csv(output_path, index=False)

@flow
def batch_prediction_pipeline(input_path: str, output_path: str):
    data = load_data(input_path)
    predictions = make_predictions(data, model)
    save_predictions(predictions, output_path)
```
Model Monitoring
Performance Metrics
Track key metrics:
- Prediction latency
- Throughput (requests/second)
- Error rate
- Model accuracy/precision/recall
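A minimal in-process tracker for the latency and error-rate metrics above might look like the sketch below. Production services typically export these to Prometheus or a similar system instead; this just shows what is being measured.

```python
from collections import deque

class ServingMetrics:
    """Rolling window of prediction latencies plus request/error counters."""

    def __init__(self, window: int = 1000):
        self.latencies = deque(maxlen=window)  # keep only recent requests
        self.requests = 0
        self.errors = 0

    def record(self, latency_s: float, ok: bool = True) -> None:
        """Record one request's latency (seconds) and whether it succeeded."""
        self.latencies.append(latency_s)
        self.requests += 1
        if not ok:
            self.errors += 1

    def p95_latency(self) -> float:
        """95th-percentile latency over the rolling window."""
        if not self.latencies:
            return 0.0
        ordered = sorted(self.latencies)
        return ordered[int(0.95 * (len(ordered) - 1))]

    def error_rate(self) -> float:
        return self.errors / self.requests if self.requests else 0.0
```

Wrap the model call with `time.perf_counter()` and feed the elapsed time into `record()`; an alerting job can then poll `p95_latency()` and `error_rate()`.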
Data Drift Detection
Monitor input distribution changes:
```python
from evidently import ColumnMapping
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset

# Default mapping; customize to match your schema
column_mapping = ColumnMapping()

report = Report(metrics=[
    DataDriftPreset(),
])
report.run(
    reference_data=train_data,
    current_data=production_data,
    column_mapping=column_mapping
)
```
Concept Drift
Monitor model performance degradation:
- Track prediction accuracy over time
- Set up alerts for significant drops
- Implement A/B testing for new models
- Automate retraining pipelines
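The first two points — tracking accuracy over time and alerting on significant drops — can be sketched as a small rolling monitor (illustrative; the baseline and tolerance values are assumptions you would tune per model):

```python
from collections import deque

class AccuracyMonitor:
    """Alert when rolling accuracy falls below baseline minus a tolerance."""

    def __init__(self, baseline: float, tolerance: float = 0.05, window: int = 500):
        self.baseline = baseline        # accuracy measured at deployment time
        self.tolerance = tolerance      # allowed degradation before alerting
        self.outcomes = deque(maxlen=window)

    def record(self, prediction, label) -> None:
        """Record whether a prediction matched its (delayed) ground-truth label."""
        self.outcomes.append(prediction == label)

    def rolling_accuracy(self) -> float:
        if not self.outcomes:
            return 1.0
        return sum(self.outcomes) / len(self.outcomes)

    def should_alert(self) -> bool:
        return self.rolling_accuracy() < self.baseline - self.tolerance
```

In practice ground-truth labels arrive with a delay, so this check usually runs in a scheduled job that joins predictions with late-arriving outcomes.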
Feature Store
Centralized Feature Management
```python
from feast import FeatureStore

store = FeatureStore(repo_path=".")

# Get online features for prediction
features = store.get_online_features(
    features=[
        "user_features:age",
        "user_features:location",
        "product_features:category",
    ],
    entity_rows=[{"user_id": 123, "product_id": 456}],
).to_dict()
```
CI/CD for ML
Automated Model Pipeline
```yaml
# .github/workflows/ml-pipeline.yml
name: ML Pipeline

on:
  push:
    branches: [main]

jobs:
  train-and-deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Train Model
        run: python train.py
      - name: Evaluate Model
        run: python evaluate.py
      - name: Deploy if Better
        run: python deploy.py
```
Security Considerations
Input Validation
- Validate all input data
- Sanitize features
- Set rate limits
- Implement authentication
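The validation point can be made concrete with a small guard that rejects malformed payloads before they reach the model. `FEATURE_COUNT = 8` is a hypothetical value standing in for your model's actual input width:

```python
import math

FEATURE_COUNT = 8  # hypothetical: must match the trained model's input width

def validate_features(features) -> list[float]:
    """Reject payloads with the wrong shape or non-finite values."""
    if not isinstance(features, list) or len(features) != FEATURE_COUNT:
        raise ValueError(f"expected a list of {FEATURE_COUNT} features")
    cleaned = []
    for i, value in enumerate(features):
        # bool is a subclass of int, so exclude it explicitly
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError(f"feature {i} is not numeric")
        value = float(value)
        if math.isnan(value) or math.isinf(value):
            raise ValueError(f"feature {i} is NaN or infinite")
        cleaned.append(value)
    return cleaned
```

Calling this at the top of the `/predict` handler turns garbage input into a clean 4xx response instead of an opaque model error.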
Model Protection
- Encrypt model files
- Use model serving frameworks
- Implement access controls
- Monitor for adversarial attacks
Best Practices
- Separate Training and Serving Code: Keep concerns isolated
- Version Everything: Models, data, and code
- Monitor Continuously: Track performance and data quality
- Automate Testing: Unit tests, integration tests, model validation
- Implement Rollback: Quick recovery from bad deployments
- Document Thoroughly: Model cards, API docs, runbooks
- Plan for Failure: Graceful degradation, fallback models
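The "plan for failure" point can be sketched as a wrapper that degrades to a simpler fallback model when the primary fails; the function and model names here are illustrative:

```python
def predict_with_fallback(primary, fallback, features):
    """Serve from the primary model; degrade gracefully to the fallback on error."""
    try:
        return primary.predict(features), "primary"
    except Exception:
        # In a real service, log the failure and emit a metric here.
        return fallback.predict(features), "fallback"
```

The fallback is typically something cheap and robust — a previous model version, or even a constant baseline — so a broken deployment degrades accuracy rather than availability.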
Deployment Strategies
Shadow Deployment
Run the new model alongside the current one and compare the results:
- Zero risk to production
- Real-world performance data
- Confidence in new model
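A shadow deployment can be sketched as a wrapper where only the current model's output reaches the user and the shadow's output is logged for offline comparison (names are illustrative):

```python
def shadow_predict(current_model, shadow_model, features, comparison_log):
    """Serve the current model; record the shadow model's output for analysis."""
    served = current_model.predict(features)
    try:
        shadowed = shadow_model.predict(features)
        comparison_log.append(
            {"served": served, "shadow": shadowed, "match": served == shadowed}
        )
    except Exception:
        # A failing shadow model must never affect the user-facing response.
        comparison_log.append({"served": served, "shadow": None, "match": False})
    return served
```

In production the comparison log would go to a warehouse or metrics store, and the shadow call would usually run asynchronously so it adds no latency.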
Canary Deployment
Gradually route traffic to the new model:
- 5% → 25% → 50% → 100%
- Monitor metrics at each stage
- Quick rollback if issues
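One common way to implement the traffic split is deterministic hashing on a stable identifier, so each user consistently sees the same model and the fraction can be raised by changing a single number. A sketch (the function names are illustrative):

```python
import hashlib

def in_canary(user_id: str, canary_fraction: float) -> bool:
    """Deterministically assign a stable fraction of users to the canary model."""
    digest = hashlib.sha256(user_id.encode()).digest()
    bucket = int.from_bytes(digest[:2], "big") / 0xFFFF  # value in [0, 1]
    return bucket < canary_fraction

def routed_predict(user_id, features, stable_model, canary_model,
                   canary_fraction=0.05):
    """Serve from the canary for the configured fraction of users."""
    model = canary_model if in_canary(user_id, canary_fraction) else stable_model
    return model.predict(features)
```

Because the assignment is a pure function of the user ID, rolling forward (5% → 25% → …) or back only requires updating `canary_fraction`.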
Blue-Green Deployment
Maintain two identical environments:
- Instant switchover
- Easy rollback
- Zero downtime
Conclusion
Successful ML deployment requires treating models as first-class software artifacts with proper versioning, monitoring, and operational practices.