Modal

Serverless platform for running Python (especially GPU) at scale.

Reviewed by 7wData

Publisher review

Modal is a serverless compute platform optimized for running Python workloads at scale, particularly those requiring GPU acceleration. Unlike traditional infrastructure (where you provision and pay for machines) or managed services (where you ship code to a SaaS), Modal uses a "write Python, decorate it, deploy" model: you create a modal.App object, add @app.function decorators to your code, specify your container environment and GPU type in code, and Modal handles provisioning, scaling, and billing. The platform is built for machine learning engineers and AI teams who want to ship inference endpoints, batch processing jobs, and fine-tuning runs without DevOps overhead.

It pools capacity across multiple cloud providers to optimize both GPU availability and cost, and scales to zero, so you pay nothing for idle time. Since its $87 million Series B in September 2025, Modal has gained traction with companies like Substack, Ramp, and Suno. The platform supports the full NVIDIA GPU range (T4 through H200s), with per-second billing from about $0.000164 per second for a T4 (~$0.59/hour) to $0.001097 per second for an H100 (~$3.95/hour).

The free tier includes $30/month in credits, making it accessible to startups and researchers.

The trade-offs are significant. Modal-specific code creates vendor lock-in: migrating away requires rewriting function signatures, not just changing API endpoints. The per-second pricing model works well for bursty, variable-demand workloads but becomes expensive for sustained compute. A 3x multiplier applies for guaranteed execution, plus regional multipliers up to 2.5x. The H100 list rate of $0.001097 per second is already about $3.95/hour, so production H100s with those multipliers applied cost well above the advertised base rate.

Modal also ships with zero pre-built solutions: no model marketplace, no ready-made inference endpoints. You must build everything yourself (model loading, GPU memory management, HTTP serving, error handling), which takes hours where platforms like Replicate take minutes. Modal works best for teams with the engineering capacity to build and optimize their own inference infrastructure, who value cost efficiency over speed-to-market, and who run variable-demand workloads.
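To see how the multipliers compound, here is a minimal Python sketch. It assumes the guaranteed-execution and regional multipliers stack multiplicatively on the per-second list rate, which is consistent with the 3.75x effective figure (3 x 1.25) cited in this review's sources but is not confirmed as Modal's exact billing formula. The function name and parameters are illustrative.

```python
# H100 list rate in USD per second, as cited in this review's sources.
H100_PER_SECOND = 0.001097


def hourly_rate(per_second: float, *, guaranteed: bool = False, regional: float = 1.0) -> float:
    """Convert a per-second rate to an hourly rate with multipliers applied.

    Assumption: the 3x guaranteed-execution multiplier and the regional
    multiplier (1.25 to 2.5) are multiplied together.
    """
    multiplier = (3.0 if guaranteed else 1.0) * regional
    return per_second * 3600 * multiplier


base = hourly_rate(H100_PER_SECOND)
production = hourly_rate(H100_PER_SECOND, guaranteed=True, regional=1.25)
print(f"base: ${base:.2f}/hr, production: ${production:.2f}/hr")
# prints: base: $3.95/hr, production: $14.81/hr
```

Under these assumptions, the $3.95/hour figure is the starting point rather than the production cost.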

How it works

  1. Serverless GPU pool

    Thousands of GPUs (T4–H200) available on-demand; per-second billing with automatic scale-to-zero when unused.

  2. Python decorator infrastructure-as-code

    @app.function decorators (on a modal.App object) specify GPU, memory, and container image; no YAML or cloud-specific SDK setup required.

  3. Elastic batch job runner

    Distribute map-reduce and parallel data pipelines across auto-scaling containers for large-scale data processing.

  4. Persistent volumes

    Shared, distributed storage (1 TiB/month free) for model weights, datasets, and large artifacts.

  5. Code execution sandboxes

    Run untrusted Python code (e.g., from LLM agents) in isolated containers with controlled resource limits.

  6. Web endpoint deployment

    Convert Python functions into HTTPS endpoints with automatic scaling, auth, and request logging.

  7. Integrated observability

    Real-time function logs, cold-start metrics, GPU utilization, and execution traces in a unified web console.

Strengths and trade-offs

Strengths

  • Python-native development model with zero YAML—decorate functions with GPU specs, deploy instantly without cloud account setup.
  • Per-second GPU billing with scale-to-zero—80% cheaper idle time than reserved instances; fine-grained cost control for bursty workloads.
  • Established credibility with real customers—Substack, Ramp, and Suno in production; $87M Series B (Sept 2025).

Trade-offs

  • Vendor lock-in through Modal-specific decorators—migration to another platform requires rewriting function signatures, not just environment changes.
  • Hidden cost multipliers for production: a 3x multiplier for guaranteed execution plus 1.25–2.5x regional multipliers. The H100 list rate of $0.001097/sec is already about $3.95/hr, so production costs with multipliers applied run well above the advertised base rate.
  • Build burden with zero pre-built solutions—no model marketplace or ready-made endpoints; developers must write model loading, GPU memory management, HTTP serving, and error handling from scratch.

Pricing context

Modal operates on a freemium model. The Starter plan is free and includes $30/month in compute credits, limited to 3 workspace seats and 10 concurrent GPUs (suitable for learning). The Team plan costs $250/month and adds $100 monthly credits with unlimited seats and deployments.

Enterprise requires custom quotes. Compute billing is granular: $0.0000131 per CPU core per second, $0.00000222 per GiB memory per second, and GPUs from $0.000164/sec (T4, ~$0.59/hr) to $0.001736/sec (B200, ~$6.25/hr). Critically, production workloads incur a 3x multiplier for guaranteed execution, and EU/latency-optimized regions add 1.25x–2.5x multipliers.
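The granular rates above can be combined into a rough per-job estimate. The sketch below assumes CPU, memory, and GPU charges are simply added for the duration of the job and ignores the production and regional multipliers. The function name and the example job shape are illustrative, not taken from Modal's documentation.

```python
# Per-second list rates quoted in this review (USD).
CPU_CORE_PER_SECOND = 0.0000131
MEMORY_GIB_PER_SECOND = 0.00000222
GPU_PER_SECOND = {"T4": 0.000164, "H100": 0.001097, "B200": 0.001736}


def job_cost(seconds, cpu_cores, memory_gib, gpu=None, gpu_count=1):
    """Estimate the base cost of one job, before any multipliers.

    Assumption: CPU, memory, and GPU are billed additively per second.
    """
    cost = seconds * (cpu_cores * CPU_CORE_PER_SECOND + memory_gib * MEMORY_GIB_PER_SECOND)
    if gpu is not None:
        cost += seconds * gpu_count * GPU_PER_SECOND[gpu]
    return cost


# A 10-minute job on one T4 with 4 CPU cores and 16 GiB of memory.
print(f"${job_cost(600, 4, 16, gpu='T4'):.4f}")  # prints: $0.1512
```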

For sustained workloads (>16 hours/day), per-second pricing becomes uncompetitive against reserved or bare-metal cloud instances. Early-stage startups and researchers can apply for grants up to $10,000 in free credits.
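The break-even point between per-second billing and an always-on reserved machine can be estimated with simple arithmetic. The sketch below assumes the reserved machine is billed for all 24 hours regardless of use. The prices in the example are placeholders, not quotes from any provider.

```python
def break_even_hours_per_day(on_demand_hourly: float, reserved_hourly: float) -> float:
    """Daily usage above which an always-on reserved machine is cheaper.

    On-demand cost per day is on_demand_hourly * hours_used.
    Reserved cost per day is reserved_hourly * 24.
    The result is capped at 24 because a day has no more hours than that.
    """
    if on_demand_hourly <= 0:
        raise ValueError("on_demand_hourly must be positive")
    return min(24.0, 24.0 * reserved_hourly / on_demand_hourly)


# Placeholder prices: on-demand at $3.00/hr versus reserved at $2.00/hr.
print(break_even_hours_per_day(3.0, 2.0))  # prints: 16.0
```

A 16-hour threshold corresponds to a reserved price of about two-thirds of the effective on-demand rate. A larger gap between the two moves the break-even point earlier in the day.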

Getting started with Modal

  1. Create a Modal account and workspace

    Sign up for Modal and create a workspace. The Starter plan is free and includes $30 monthly in compute credits, enough to learn and experiment. You'll receive an API token for authentication.

  2. Install Modal CLI and authenticate

    Install the Modal Python SDK with pip (pip install modal) and authenticate with your API token from the Modal dashboard. Use the CLI (modal run) to try functions from your machine before deploying them; the decorated functions themselves still execute on Modal's cloud infrastructure.

  3. Decorate Python functions with GPU specs

    Create a modal.App object and add @app.function decorators to Python functions, specifying GPU type (for example T4, A100, or H100), memory allocation, and container image. Modal uses these declarations to automatically provision the right hardware when you deploy.

  4. Deploy decorated function to Modal

    Deploy your decorated function to Modal using the CLI. Your code becomes an HTTPS endpoint (for web requests) or a batch job (for data processing), with automatic scaling and pay-per-second billing.

  5. Monitor execution and schedule jobs

    Check real-time logs, GPU utilization, and cold-start metrics in the Modal console dashboard. For batch jobs, schedule recurring execution using cron expressions or trigger them programmatically from your application code.
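The steps above can be sketched as a single file. This is a minimal illustration, assuming a recent version of the Modal Python SDK in which functions are registered with @app.function on a modal.App object. The app name, package list, function bodies, and cron expression are placeholders, and parameter names may differ between SDK versions.

```python
import modal

# Hypothetical app name used to identify the deployment.
app = modal.App("getting-started-sketch")

# Container image declared in code; the package list is illustrative.
image = modal.Image.debian_slim().pip_install("numpy")


@app.function(gpu="T4", image=image, timeout=600)
def embed(text: str) -> int:
    # Placeholder body standing in for real GPU work.
    return len(text)


@app.function(schedule=modal.Cron("0 6 * * *"))
def nightly_job() -> None:
    # Runs on the cron schedule once the app is deployed.
    print("running scheduled batch")


@app.local_entrypoint()
def main() -> None:
    # .remote() executes the function on Modal's infrastructure,
    # not on the local machine.
    print(embed.remote("hello"))
```

Assuming the file is saved as app.py, running modal run app.py would execute main in a temporary app, and modal deploy app.py would create a persistent deployment that includes the scheduled function.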

Frequently Asked Questions

What is Modal and how does it work?

Modal is a serverless compute platform for Python workloads requiring GPU acceleration. You decorate Python functions with @app.function (on a modal.App object), specifying GPU type and container requirements directly in code. Modal automatically provisions infrastructure, manages scaling, and bills you per second. No YAML or cloud account setup is required.

How much does Modal cost?

Modal's Starter plan is free with $30/month in compute credits. The Team plan costs $250/month and includes $100 in monthly credits. GPU billing starts at $0.000164/sec for a T4 (~$0.59/hour) and goes up to $0.001736/sec for a B200 (~$6.25/hour). Production workloads incur a 3x multiplier for guaranteed execution, and regional preferences add 1.25–2.5x multipliers.

What's the difference between Modal's advertised and actual GPU costs?

Modal advertises per-second rates, but production workloads trigger a 3x multiplier for guaranteed execution plus 1.25–2.5x regional multipliers. The H100 list rate of $0.001097/sec is already about $3.95/hour, so production costs with multipliers applied are significantly higher than the base rate. For sustained compute over 16 hours daily, reserved cloud instances are more economical.

What are Modal's main limitations?

Modal creates vendor lock-in through its @app.function decorators; migration requires rewriting code, not just API changes. The platform has zero pre-built solutions: no model marketplace or ready-made endpoints. Developers must build model loading, GPU memory management, HTTP serving, and error handling from scratch.

Is Modal suitable for sustained workloads?

No. Modal's per-second billing excels for bursty, variable-demand workloads but becomes expensive for sustained compute. Production multipliers and regional fees compound the cost. For workloads running over 16 hours daily, bare-metal instances or reserved capacity on traditional cloud providers offer significantly better value.

Can startups and researchers access Modal affordably?

Yes. Modal's free Starter tier includes $30/month in compute credits, suitable for learning. Startups and researchers can apply for grants up to $10,000 in free credits. The free tier is limited to 3 workspace seats and 10 concurrent GPUs, supporting initial experimentation and prototyping.

Integrations

Python, Hugging Face, PyTorch

How Modal compares

Direct head-to-head against 3 competitors. Picked by 7wData.

This tool

Modal

Pricing
Free Starter plan with $30/month in credits (3 seats, 10 concurrent GPUs); Team plan $250/month with $100 monthly credits; Enterprise by custom quote. GPUs from ~$0.59/hr (T4) to ~$6.25/hr (B200), billed per second, before the 3x guaranteed-execution multiplier and 1.25x–2.5x regional multipliers.
Target
Machine learning engineers and AI teams who want to ship inference endpoints, batch processing jobs, and fine-tuning runs without DevOps overhead.
Deployment
cloud
Strength
Python-native development model with zero YAML—decorate functions with GPU specs, deploy instantly without cloud account setup.
Watch for
Vendor lock-in through Modal-specific decorators—migration to another platform requires rewriting function signatures, not just environment changes.

Replicate

Pricing
H100 $5.49/hr, A100-80GB $5.04/hr, T4 ~$0.59/hr. Private deployments bill idle time; public models bill active processing only.
Target
Teams wanting pre-deployed model APIs without infrastructure code; speed-to-ship prioritized over cost optimization.
Deployment
SaaS, hosted serverless
Strength
Model marketplace with thousands of community-contributed models deployable via a single API call, no setup.
Watch for
H100 at $5.49/hr is roughly 40% above Modal's ~$3.95/hr H100 list rate, before Modal's production and regional multipliers are applied.

RunPod

Pricing
H100 serverless ~$3.25/hr, A100 80GB ~$2.17/hr, T4 ~$0.40/hr; usage-based, no monthly minimum.
Target
ML engineers wanting lower GPU rates with a choice of persistent pods or serverless per-second billing.
Deployment
SaaS, cloud GPU marketplace
Strength
Persistent pods for sustained training alongside serverless for bursty inference, both on one platform.
Watch for
No metrics dashboard, no distributed tracing, SSH-only debugging. Teams shipping to production cite the monitoring gap as a dealbreaker.

Anyscale

Pricing
Cloud GPU pass-through (AWS/GCP/Azure) plus 50-100% platform markup. No public per-GPU rate. Custom/Contact sales.
Target
ML platform teams running large-scale distributed workloads across multi-cloud, already invested in Ray.
Deployment
SaaS, managed on AWS/GCP/Azure
Strength
Only fully managed Ray platform: handles distributed training, serving, and data processing natively without Ray cluster ops.
Watch for
Platform markup adds 50-100% over bare-metal GPU rates; an 8x H100 cluster costs roughly $2,800/month more than self-managed.
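The H100 figures quoted in this comparison can be put side by side with a few lines of Python. The sketch uses only the rates stated in this review: Modal's hourly rate is derived from its $0.001097 per-second list rate, and no production or regional multipliers are applied. Anyscale is omitted because no public per-GPU rate is listed.

```python
# H100 hourly rates in USD, taken from the figures quoted in this review.
H100_HOURLY = {
    "Modal": 0.001097 * 3600,  # per-second list rate converted to hourly
    "Replicate": 5.49,
    "RunPod": 3.25,
}


def percent_vs(baseline: float, other: float) -> float:
    """Percentage by which `other` is above (+) or below (-) `baseline`."""
    return (other / baseline - 1.0) * 100.0


for name, rate in H100_HOURLY.items():
    delta = percent_vs(H100_HOURLY["Modal"], rate)
    print(f"{name:<10} ${rate:.2f}/hr  {delta:+.1f}% vs Modal list rate")
```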

Sources

Reporting on this tool draws on these publicly available sources.

  1. modal.com — Product overview, use cases (inference, batch, training, sandboxes), target audience
  2. modal.com — Official pricing tiers (Starter free with $30 credits, Team $250/month), compute rates (CPU $0.0000131/core/sec, H100 $0.001097/sec), multipliers (3x production, 1.25–2.5x regional)
  3. modal.com — Technical documentation, setup process, language support (Python primary, JavaScript/TypeScript/Go invocation), architecture
  4. wavespeed.ai — Independent review of strengths (cold starts, cost efficiency, established customers) and weaknesses (zero pre-built solutions, build burden, developer experience)
  5. blaxel.ai — Pricing analysis with real multipliers (3.75x effective cost with production + regional), when Modal works (bursty inference) vs. when it doesn't (sustained training)
  6. www.spheron.network — Vendor lock-in concerns, cold-start unpredictability for large models, cost divergence between advertised and production rates