Modal
Serverless platform for running Python (especially GPU) at scale.
Publisher review
Modal is a serverless compute platform optimized for running Python workloads at scale, particularly those requiring GPU acceleration. Unlike traditional infrastructure (where you provision and pay for machines) or managed services (where you ship code to a SaaS), Modal uses a "write Python, decorate it, deploy" model: you define a modal.App, add @app.function decorators to your functions, specify your container environment and GPU type in code, and Modal handles provisioning, scaling, and billing. The platform is built for machine learning engineers and AI teams who want to ship inference endpoints, batch processing jobs, and fine-tuning runs without DevOps overhead.
It pools capacity across multiple cloud providers to optimize both GPU availability and cost, and it scales to zero, so you pay nothing for idle time. Since its $87 million Series B in September 2025, Modal has gained traction with companies like Substack, Ramp, and Suno. The platform supports the full NVIDIA GPU range (T4 through H200s), billed per second at rates that work out to roughly $0.59 per hour for a T4 ($0.000164/sec) and $3.95 per hour for an H100 ($0.001097/sec).
The free tier includes $30/month in credits, making it accessible to startups and researchers. The trade-offs are significant. Modal-specific code creates vendor lock-in: migrating away requires rewriting function signatures, not just changing API endpoints.
The per-second pricing model works well for bursty, variable-demand workloads but becomes expensive for sustained compute. A 3x multiplier applies for guaranteed execution, plus regional multipliers of up to 2.5x. The advertised H100 rate of $0.001097/sec already works out to about $3.95/hour, so with the 3x multiplier a production H100 costs roughly $11.85/hour before any regional multiplier. And Modal ships with zero pre-built solutions: no model marketplace, no ready-made inference endpoints.
You must build everything yourself—model loading, GPU memory management, HTTP serving, error handling—which takes hours rather than minutes compared to platforms like Replicate. Modal works best for teams with the engineering capacity to build and optimize their own inference infrastructure, who value cost efficiency over speed-to-market, and who run variable-demand workloads.
How it works
- Serverless GPU pool: Thousands of GPUs (T4–H200) available on demand; per-second billing with automatic scale-to-zero when unused.
- Python decorator infrastructure-as-code: @app.function decorators specify GPU, memory, and container image; no YAML or cloud-specific SDK setup required.
- Elastic batch job runner: Distribute map-reduce and parallel data pipelines across auto-scaling containers for large-scale data processing.
- Persistent volumes: Shared, distributed storage (1 TiB/month free) for model weights, datasets, and large artifacts.
- Code execution sandboxes: Run untrusted Python code (e.g., from LLM agents) in isolated containers with controlled resource limits.
- Web endpoint deployment: Convert Python functions into HTTPS endpoints with automatic scaling, auth, and request logging.
- Integrated observability: Real-time function logs, cold-start metrics, GPU utilization, and execution traces in a unified web console.
Strengths and trade-offs
Strengths
- Python-native development model with zero YAML—decorate functions with GPU specs, deploy instantly without cloud account setup.
- Per-second GPU billing with scale-to-zero: idle time costs nothing, unlike reserved instances that bill around the clock, which gives fine-grained cost control for bursty workloads.
- Established credibility with real customers—Substack, Ramp, and Suno in production; $87M Series B (Sept 2025).
Trade-offs
- Vendor lock-in through Modal-specific decorators—migration to another platform requires rewriting function signatures, not just environment changes.
- Hidden cost multipliers for production: a 3x multiplier for guaranteed execution plus 1.25–2.5x regional multipliers; the ~$3.95/hr H100 base rate becomes roughly $11.85/hr with the 3x multiplier alone.
- Build burden with zero pre-built solutions—no model marketplace or ready-made endpoints; developers must write model loading, GPU memory management, HTTP serving, and error handling from scratch.
Pricing context
Modal operates on a freemium model. The Starter plan is free and includes $30/month in compute credits, limited to 3 workspace seats and 10 concurrent GPUs (suitable for learning). The Team plan costs $250/month and adds $100 monthly credits with unlimited seats and deployments.
Enterprise requires custom quotes. Compute billing is granular: $0.0000131 per CPU core per second, $0.00000222 per GiB memory per second, and GPUs from $0.000164/sec (T4, ~$0.59/hr) to $0.001736/sec (B200, ~$6.25/hr). Critically, production workloads incur a 3x multiplier for guaranteed execution, and EU/latency-optimized regions add 1.25x–2.5x multipliers.
For sustained workloads (>16 hours/day), per-second pricing becomes uncompetitive against reserved or bare-metal cloud instances. Early-stage startups and researchers can apply for grants up to $10,000 in free credits.
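To make the multiplier arithmetic concrete, here is a minimal sketch that converts the per-second GPU rates quoted in this review into effective hourly rates. The H100 rate ($0.001097/sec) is the one listed in the Sources section. The function name is illustrative, and the sketch assumes the 3x guaranteed-execution multiplier and the regional multiplier stack multiplicatively, which matches the 3.75x combined figure cited in the Sources section but should be checked against Modal's current pricing page.

```python
SECONDS_PER_HOUR = 3600

# Per-second GPU rates (USD) as quoted in this review.
GPU_RATE_PER_SEC = {
    "T4": 0.000164,
    "H100": 0.001097,
    "B200": 0.001736,
}


def hourly_rate(gpu: str, guaranteed: bool = False,
                regional_multiplier: float = 1.0) -> float:
    """Return the effective USD/hour for one GPU.

    Assumption of this sketch: the 3x guaranteed-execution multiplier
    and the regional multiplier are applied multiplicatively.
    """
    rate = GPU_RATE_PER_SEC[gpu] * SECONDS_PER_HOUR
    if guaranteed:
        rate *= 3.0
    return rate * regional_multiplier


print(f"H100 base:               ${hourly_rate('H100'):.2f}/hr")
print(f"H100 guaranteed:         ${hourly_rate('H100', guaranteed=True):.2f}/hr")
print(f"H100 guaranteed + 1.25x: "
      f"${hourly_rate('H100', guaranteed=True, regional_multiplier=1.25):.2f}/hr")
```

Under these assumptions the H100 comes out at about $3.95/hr at the base rate, $11.85/hr with guaranteed execution, and $14.81/hr once a 1.25x regional multiplier is added.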
Getting started with Modal
- Create a Modal account and workspace: Sign up for Modal and create a workspace. The Starter plan is free and includes $30 monthly in compute credits, enough to learn and experiment. You'll receive an API token for authentication.
- Install Modal CLI and authenticate: Install the Modal Python SDK with pip (pip install modal) and authenticate with your API token from the Modal dashboard. Use the CLI (modal run) to try functions out before deploying them to Modal's cloud infrastructure.
- Decorate Python functions with GPU specs: Add @app.function decorators to Python functions, specifying GPU type (for example T4, A100, or H100), memory allocation, and container image. Modal uses these declarations to automatically provision the right hardware when you deploy.
- Deploy decorated function to Modal: Deploy your decorated function to Modal using the CLI (modal deploy). Your code becomes an HTTPS endpoint (for web requests) or a batch job (for data processing), with automatic scaling and pay-per-second billing.
- Monitor execution and schedule jobs: Check real-time logs, GPU utilization, and cold-start metrics in the Modal console dashboard. For batch jobs, schedule recurring execution using cron expressions or trigger them programmatically from your application code.
Frequently Asked Questions
What is Modal and how does it work?
Modal is a serverless compute platform for Python workloads requiring GPU acceleration. You create a modal.App and decorate Python functions with @app.function, specifying GPU type and container requirements directly in code. Modal automatically provisions infrastructure, manages scaling, and bills you per second. No YAML or cloud account setup is required.
How much does Modal cost?
Modal's Starter plan is free with $30/month in compute credits. The Team plan costs $250/month and includes $100 in monthly credits. GPU billing ranges from $0.000164/sec for a T4 (~$0.59/hour) to $0.001736/sec for a B200 (~$6.25/hour). Production workloads incur a 3x multiplier for guaranteed execution, and regional preferences add 1.25–2.5x multipliers.
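For a sense of what a single call costs at these rates, here is a small sketch that adds up GPU, CPU, and memory charges for one invocation. It uses the CPU and memory rates from the Pricing context section and assumes the three charges are simply summed per second; the workload size (30 seconds, 2 cores, 8 GiB) is a made-up example.

```python
# Per-second rates (USD) quoted in this review.
CPU_PER_CORE_SEC = 0.0000131
MEM_PER_GIB_SEC = 0.00000222
T4_PER_SEC = 0.000164


def invocation_cost(seconds: float, gpu_rate_per_sec: float,
                    cpu_cores: float, memory_gib: float) -> float:
    """Estimate the base cost of one invocation in USD.

    Assumption of this sketch: GPU, CPU, and memory are billed
    additively for every second the container runs, with no
    multipliers applied.
    """
    per_second = (gpu_rate_per_sec
                  + cpu_cores * CPU_PER_CORE_SEC
                  + memory_gib * MEM_PER_GIB_SEC)
    return seconds * per_second


# Hypothetical workload: a 30-second T4 job with 2 cores and 8 GiB of memory.
cost = invocation_cost(30, T4_PER_SEC, cpu_cores=2, memory_gib=8)
print(f"Estimated cost per call: ${cost:.5f}")
```

At these rates the hypothetical call costs a little over half a cent, which is why the model suits short, bursty jobs.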
What's the difference between Modal's advertised and actual GPU costs?
Modal advertises per-second rates, but production workloads trigger a 3x multiplier for guaranteed execution plus 1.25–2.5x regional multipliers. The H100 base rate of $0.001097/sec is about $3.95/hour; with the 3x multiplier it rises to roughly $11.85/hour, and a regional multiplier pushes it higher still. For sustained compute over 16 hours daily, reserved cloud instances are more economical.
What are Modal's main limitations?
Modal creates vendor lock-in through its SDK-specific decorators (@app.function); migration requires rewriting code, not just changing API endpoints. The platform has zero pre-built solutions: no model marketplace or ready-made endpoints. Developers must build model loading, GPU memory management, HTTP serving, and error handling from scratch.
Is Modal suitable for sustained workloads?
No. Modal's per-second billing excels for bursty, variable-demand workloads but becomes expensive for sustained compute. Production multipliers and regional fees compound the cost. For workloads running over 16 hours daily, bare-metal instances or reserved capacity on traditional cloud providers offer significantly better value.
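The 16-hour threshold can be reasoned about with a simple break-even calculation. The sketch below compares a serverless GPU billed only while in use against a reserved instance billed around the clock. The reserved price of $2.63/hr is a hypothetical figure chosen for illustration, not a quote from any provider.

```python
def break_even_hours_per_day(serverless_hourly: float,
                             reserved_hourly: float) -> float:
    """Daily usage (hours) above which a reserved instance is cheaper.

    The reserved instance costs 24 * reserved_hourly per day regardless
    of use; the serverless GPU costs hours_used * serverless_hourly.
    The result is capped at 24, meaning serverless is never more
    expensive at that price ratio.
    """
    return min(24.0, 24.0 * reserved_hourly / serverless_hourly)


# Modal H100 base rate (~$3.95/hr) vs. a hypothetical $2.63/hr reserved H100.
hours = break_even_hours_per_day(3.95, 2.63)
print(f"Reserved wins above about {hours:.1f} hours/day")
```

With these inputs the break-even lands at about 16 hours a day. If the 3x guaranteed-execution multiplier applies, the serverless hourly rate triples and the break-even point drops to roughly a third of that.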
Can startups and researchers access Modal affordably?
Yes. Modal's free Starter tier includes $30/month in compute credits, suitable for learning. Startups and researchers can apply for grants up to $10,000 in free credits. The free tier is limited to 3 workspace seats and 10 concurrent GPUs, supporting initial experimentation and prototyping.
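As a rough guide to how far the credits go, the sketch below converts a credit balance into GPU hours at the base per-second rates quoted in this review. It ignores CPU and memory charges and any multipliers, so real usage would cover somewhat fewer hours.

```python
SECONDS_PER_HOUR = 3600


def gpu_hours_from_credits(credits_usd: float, gpu_rate_per_sec: float) -> float:
    """GPU hours that a credit balance buys at a given per-second rate.

    Simplification: only the GPU charge is counted.
    """
    return credits_usd / (gpu_rate_per_sec * SECONDS_PER_HOUR)


t4_hours = gpu_hours_from_credits(30, 0.000164)      # Starter credits on a T4
h100_hours = gpu_hours_from_credits(30, 0.001097)    # Starter credits on an H100
grant_hours = gpu_hours_from_credits(10_000, 0.001097)  # $10,000 grant on an H100

print(f"$30 on T4:       {t4_hours:.1f} h")
print(f"$30 on H100:     {h100_hours:.1f} h")
print(f"$10,000 on H100: {grant_hours:.0f} h")
```

Under these simplifications the monthly $30 covers about 51 hours on a T4 or under 8 hours on an H100.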
How Modal compares
Direct head-to-head against 3 competitors. Picked by 7wData.
Modal
- Pricing
- Modal operates on a freemium model. The Starter plan is free and includes $30/month in compute credits, limited to 3 workspace seats and 10 concurrent GPUs (suitable for learning). The Team plan costs $250/month and adds $100 monthly credits with unlimited seats and deployments. Enterprise requires custom quotes. Compute billing is granular: $0.0000131 per CPU core per second, $0.00000222 per GiB memory per second, and GPUs from $0.000164/sec (T4, ~$0.59/hr) to $0.001736/sec (B200, ~$6.25/hr). Critically, production workloads incur a 3x multiplier for guaranteed execution, and EU/latency-optimized regions add 1.25x–2.5x multipliers. For sustained workloads (>16 hours/day), per-second pricing becomes uncompetitive against reserved or bare-metal cloud instances. Early-stage startups and researchers can apply for grants up to $10,000 in free credits.
- Target
- Modal is a serverless compute platform optimized for running Python workloads at scale, particularly those requiring GPU acceleration.
- Deployment
- cloud
- Strength
- Python-native development model with zero YAML—decorate functions with GPU specs, deploy instantly without cloud account setup.
- Watch for
- Vendor lock-in through Modal-specific decorators—migration to another platform requires rewriting function signatures, not just environment changes.
Replicate
- Pricing
- H100 $5.49/hr, A100-80GB $5.04/hr, T4 ~$0.59/hr. Private deployments bill idle time; public models bill active processing only.
- Target
- Teams wanting pre-deployed model APIs without infrastructure code; speed-to-ship prioritized over cost optimization.
- Deployment
- SaaS, hosted serverless
- Strength
- Model marketplace with thousands of community-contributed models deployable via a single API call, no setup.
- Watch for
- H100 at $5.49/hr is roughly 40% higher than Modal's ~$3.95/hr base H100 rate.
RunPod
- Pricing
- H100 serverless ~$3.25/hr, A100 80GB ~$2.17/hr, T4 ~$0.40/hr; usage-based, no monthly minimum.
- Target
- ML engineers wanting lower GPU rates with a choice of persistent pods or serverless per-second billing.
- Deployment
- SaaS, cloud GPU marketplace
- Strength
- Persistent pods for sustained training alongside serverless for bursty inference, both on one platform.
- Watch for
- No metrics dashboard, no distributed tracing, SSH-only debugging. Teams shipping to production cite the monitoring gap as a dealbreaker.
Anyscale
- Pricing
- Cloud GPU pass-through (AWS/GCP/Azure) plus 50-100% platform markup. No public per-GPU rate. Custom/Contact sales.
- Target
- ML platform teams running large-scale distributed workloads across multi-cloud, already invested in Ray.
- Deployment
- SaaS, managed on AWS/GCP/Azure
- Strength
- Only fully managed Ray platform: handles distributed training, serving, and data processing natively without Ray cluster ops.
- Watch for
- Platform markup adds 50-100% over bare-metal GPU rates; an 8x H100 cluster costs roughly $2,800/month more than self-managed.
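The H100 figures in this comparison can be put side by side with a few lines of arithmetic. The sketch below uses the hourly rates quoted above, with Modal's taken as the ~$3.95/hr equivalent of its $0.001097/sec base rate. Anyscale is left out because no public per-GPU rate is listed. These are base rates only; multipliers, idle billing, and platform markups are not modeled.

```python
# H100 hourly rates (USD) as quoted in this comparison.
H100_HOURLY = {
    "Modal": 3.95,       # base rate, before any multipliers
    "Replicate": 5.49,
    "RunPod": 3.25,      # serverless rate
}


def percent_vs(baseline: float, other: float) -> float:
    """Percentage by which `other` is above (+) or below (-) `baseline`."""
    return (other - baseline) / baseline * 100


differences = {
    name: percent_vs(H100_HOURLY["Modal"], rate)
    for name, rate in H100_HOURLY.items()
    if name != "Modal"
}

for name, diff in differences.items():
    print(f"{name}: {diff:+.0f}% vs Modal base rate")
```

On these numbers Replicate's H100 is about 39% above Modal's base rate and RunPod's serverless H100 about 18% below it.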
Sources
Reporting on this tool draws on these publicly available sources.
- modal.com — Product overview, use cases (inference, batch, training, sandboxes), target audience
- modal.com — Official pricing tiers (Starter free with $30 credits, Team $250/month), compute rates (CPU $0.0000131/core/sec, H100 $0.001097/sec), multipliers (3x production, 1.25–2.5x regional)
- modal.com — Technical documentation, setup process, language support (Python primary, JavaScript/TypeScript/Go invocation), architecture
- wavespeed.ai — Independent review of strengths (cold starts, cost efficiency, established customers) and weaknesses (zero pre-built solutions, build burden, developer experience)
- blaxel.ai — Pricing analysis with real multipliers (3.75x effective cost with production + regional), when Modal works (bursty inference) vs. when it doesn't (sustained training)
- www.spheron.network — Vendor lock-in concerns, cold-start unpredictability for large models, cost divergence between advertised and production rates