Ray

Open-source unified framework for scaling Python and AI workloads.

Reviewed by 7wData

Publisher review

Ray is an open-source distributed computing framework designed to scale Python and machine learning workloads from a laptop to thousands of GPUs. Originally built at UC Berkeley's RISELab and backed by Anyscale, Ray provides a Python-native runtime that executes tasks with microsecond latency and can handle millions of tasks per second, which makes it 10-30× faster than Apache Spark for AI/ML workloads by the figures cited in this review. The framework consists of Ray Core (primitives: tasks, actors, objects) and five specialized libraries: Ray Train for distributed model training and fine-tuning, Ray Tune for hyperparameter optimization, Ray Serve for model deployment and online inference, Ray Data for multi-modal data processing, and Ray RLlib for reinforcement learning workflows.

Ray excels at heterogeneous compute allocation, independently scaling CPUs and GPUs as workloads demand. It integrates natively with PyTorch, TensorFlow, Hugging Face, and other ML libraries, allowing them to work together in parallelized environments. OpenAI uses Ray to coordinate training of large language models, and the PyTorch Foundation now hosts the project.

For organizations moving from batch-heavy Spark pipelines to real-time AI inference, Ray typically delivers 30× cost reductions on GPU workloads. However, Ray trades maturity and institutional support for performance: its ecosystem remains smaller than Spark's, structured data processing is not its focus, and deployment without proper authentication exposes clusters to cryptomining attacks (200,000+ Ray servers exposed online as of 2025).

How it works

  1. Ray Core

    Distributed runtime with task, actor, and object primitives that transparently parallelizes Python code across clusters with microsecond task latency.

  2. Ray Train

    Distributed training library supporting multi-node model training, fine-tuning, and fault tolerance for PyTorch, TensorFlow, and Hugging Face models.

  3. Ray Serve

    Production inference framework that deploys models with independent scaling, enabling online prediction APIs and batch inference with GPU acceleration.

  4. Ray Tune

    Hyperparameter optimization engine that supports population-based training, early stopping, and asynchronous scheduling across GPU clusters.

  5. Ray Data

    Framework-agnostic data loading and transformation supporting images, videos, audio, and structured data across training, tuning, and inference pipelines.

  6. Heterogeneous compute scheduling

    Automatically allocates CPU and GPU resources per task, allowing GPUs and CPUs to scale independently within the same cluster.

  7. Decentralized fault tolerance

    No single point of failure; uses object store replication and lineage recovery to handle worker failures transparently.

Strengths and trade-offs

Strengths

  • Microsecond task latency and millions of tasks per second—10-30× faster than Spark for AI/ML workloads and parallel Python execution.
  • Python-first design with native support for PyTorch, TensorFlow, Hugging Face, and other ML libraries working together in a single scalable environment.
  • Heterogeneous compute allocation: GPUs and CPUs scale independently, enabling efficient use of mixed hardware and cost-effective inference.

Trade-offs

  • Critical security vulnerability: unauthenticated API access allows remote code execution; 200,000+ exposed Ray servers online as of 2025, actively exploited for cryptomining.
  • Smaller ecosystem and community compared to Spark; fewer third-party integrations, commercial support, and institutional knowledge in enterprises.
  • Not optimized for large-scale distributed data processing; lacks SQL abstractions and ETL-centric features that make Spark the standard for data engineering.

Pricing context

Ray Core is free and open-source. Anyscale, the managed hosting platform, uses pay-as-you-go pricing with no monthly minimums. Compute costs depend entirely on instance type: CPU-only instances cost $0.0135/hour, NVIDIA T4 GPUs cost $0.57/hour, and high-end NVIDIA H100 GPUs cost $9.29/hour (H200 at $10.68/hour).

New accounts receive $100 in starter credits. Anyscale also offers BYOC (Bring Your Own Cloud) for deployment in any cloud or on-premises with enterprise support and volume discounts for committed usage. Total cost varies dramatically by GPU tier: at the listed rates, an H100 instance costs roughly 690× as much as a CPU-only instance for the same hour ($9.29 / $0.0135 ≈ 688).
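
The ratios between tiers follow directly from the listed hourly rates. The short sketch below works through that arithmetic using only the prices quoted in this section; the tier keys and function names are hypothetical labels chosen for the example, and actual Anyscale pricing may differ from these figures.

```python
# Hourly rates in USD, copied from the pricing text in this section.
HOURLY_RATES_USD = {
    "cpu_only": 0.0135,
    "t4": 0.57,
    "h100": 9.29,
    "h200": 10.68,
}
STARTER_CREDITS_USD = 100.0


def ratio_to_cpu(tier):
    """How many times the CPU-only hourly rate a tier costs."""
    return HOURLY_RATES_USD[tier] / HOURLY_RATES_USD["cpu_only"]


def hours_covered_by_credits(tier, credits=STARTER_CREDITS_USD):
    """Instance-hours the starter credits pay for at a tier's rate."""
    return credits / HOURLY_RATES_USD[tier]


if __name__ == "__main__":
    for tier in HOURLY_RATES_USD:
        print(
            f"{tier}: {ratio_to_cpu(tier):.0f}x the CPU-only rate, "
            f"{hours_covered_by_credits(tier):.1f} hours on starter credits"
        )
```

At these rates an H100 hour costs about 688 times a CPU-only hour, and the $100 credit covers a little under 11 H100 hours.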

Getting started with Ray

  1. Install Ray via pip

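    Run pip install ray to download Ray Core from PyPI. Ray is free and open-source. Installation takes under five minutes on a supported Python version (recent Ray releases require Python 3.9 or later; check the installation docs for the current range). No sign-up or credentials are required for local development and testing.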

  2. Initialize a cluster on your laptop

    Import Ray and call ray.init() in your Python script to start a local cluster. This spawns worker processes and readies the runtime. To limit what Ray uses, pass explicit counts such as ray.init(num_cpus=4, num_gpus=1). For single-machine evaluation, omit the arguments and Ray auto-detects your hardware.

  3. Define tasks with the remote decorator

    Decorate Python functions with @ray.remote to convert them into distributed tasks. Specify resource demands in the decorator, for example @ray.remote(num_cpus=2, num_gpus=1). Call these functions with .remote() instead of invoking them directly; each call returns immediately with an object reference (a future) rather than the result. Ray queues the tasks for parallel execution across cluster nodes and handles scheduling for you.

  4. Run distributed tasks and collect results

    Submit tasks to the cluster and fetch results with ray.get(), which blocks until the referenced results are ready. Measure wall-clock time against serial execution to confirm the speedup. Start with a small dataset or synthetic workload to verify that Ray distributes compute correctly. Monitor the Ray dashboard (localhost:8265) to see worker utilization; the dashboard is included when Ray is installed with pip install "ray[default]".

  5. Deploy to Anyscale for production

    Create a free Anyscale account and deploy your Ray code to managed clusters. Configure GPU allocation (T4, A100, H100 available), autoscaling policies, and job queues. Set up monitoring and cost tracking per instance type. Use BYOC for on-premises deployments with enterprise support.

Frequently Asked Questions

What is Ray distributed computing?

Ray is an open-source distributed computing framework that scales Python and machine learning workloads from laptops to thousands of GPUs. Originally built at UC Berkeley's RISELab and backed by Anyscale, it executes tasks with microsecond latency and handles millions of tasks per second, which this review puts at 10-30× faster than Apache Spark for AI workloads.

How does Ray compare to Spark?

Ray delivers 10-30× faster execution for AI/ML workloads with microsecond task latency and independent GPU/CPU scaling. However, Spark remains superior for large-scale data processing and ETL-centric tasks. Ray trades ecosystem maturity for performance: it lacks SQL abstractions and has fewer third-party integrations than Spark's established enterprise infrastructure.

What are Ray's main libraries?

Ray includes five specialized libraries: Ray Train for distributed model training and fine-tuning, Ray Tune for hyperparameter optimization, Ray Serve for model deployment and online inference, Ray Data for multi-modal data processing, and Ray RLlib for reinforcement learning. These libraries integrate natively with PyTorch, TensorFlow, and Hugging Face.

How much does Ray cost?

Ray Core is free and open-source. Anyscale's managed platform uses pay-as-you-go pricing with no monthly minimums. CPU-only instances cost $0.0135/hour, NVIDIA T4 GPUs cost $0.57/hour, and H100 GPUs cost $9.29/hour. New accounts receive $100 in starter credits. At these rates, hourly costs differ by roughly 690× between the CPU-only and H100 tiers.

What security risks does Ray have?

Ray has a critical vulnerability: unauthenticated API access allows remote code execution. As of 2025, over 200,000 Ray servers were exposed online, actively exploited for cryptomining attacks. Deployments without proper authentication are highly vulnerable. Organizations must enable security controls before production use and keep systems patched against known exploits.
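
One quick, partial check for this kind of exposure is to test whether the dashboard port (8265, per the getting-started steps) accepts TCP connections from outside the machine. The sketch below uses only the Python standard library; the function name port_accepts_connections and the example host are hypothetical. It tests reachability only and does not prove whether the API behind the port is authenticated.

```python
import socket


def port_accepts_connections(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, or unreachable: treat as not exposed.
        return False


if __name__ == "__main__":
    # Replace with the head node's externally visible address and run
    # this from a machine outside the cluster's network (assumption).
    host = "127.0.0.1"
    if port_accepts_connections(host, 8265):
        print(f"{host}:8265 is reachable; check who else can reach it.")
    else:
        print(f"{host}:8265 is not reachable from here.")
```

If the port is reachable from an untrusted network, restrict it with firewall rules or network policy before relying on any other control.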

Who uses Ray and why?

OpenAI uses Ray to coordinate large language model training. Organizations moving from batch-heavy Spark pipelines to real-time AI inference typically see 30× cost reductions on GPU workloads. Ray excels at heterogeneous compute allocation, independently scaling CPUs and GPUs within mixed-hardware clusters for efficient, cost-effective inference operations.

Integrations

PyTorch, TensorFlow, Hugging Face, MLflow

How Ray compares

Direct head-to-head against 3 competitors. Picked by 7wData.

This tool

Ray

Pricing
Ray Core is free and open-source. Anyscale managed hosting is pay-as-you-go with no monthly minimums: CPU-only $0.0135/hour, NVIDIA T4 $0.57/hour, H100 $9.29/hour, H200 $10.68/hour. New accounts receive $100 in starter credits; BYOC and volume discounts for committed usage are available. At the listed rates an H100 hour costs roughly 690× a CPU-only hour.
Target
Teams scaling Python and machine learning workloads from a laptop to thousands of GPUs.
Deployment
Open-source and self-hosted, plus managed cloud or BYOC via Anyscale.
Strength
Microsecond task latency and millions of tasks per second—10-30× faster than Spark for AI/ML workloads and parallel Python execution.
Watch for
Critical security vulnerability: unauthenticated API access allows remote code execution; 200,000+ exposed Ray servers online as of 2025, actively exploited for cryptomining.

Modal

Pricing
Free tier: $30/month compute credits. Team: $250/month. Enterprise: custom pricing.
Target
AI-native startups and Python ML engineers running spiky, GPU-intensive inference or fine-tuning workloads.
Deployment
Serverless SaaS, Python-decorator-based, zero container management.
Strength
Per-second GPU billing with sub-4-second cold starts, covering inference, fine-tuning, and multi-node clusters without Kubernetes.
Watch for
Non-preemptible plus US regional surcharges stack to 3.75x advertised base rates on production workloads.

Dask

Pricing
Open-source, free. Managed layer Coiled: free 500 CPU-hours/month, then $0.05/CPU-hour.
Target
Data scientists scaling existing Pandas and NumPy pipelines without rewriting code, typically 1-10 TB datasets.
Deployment
Open-source library, local multi-core or self-managed cluster, plus managed cloud via Coiled.
Strength
Near-identical Pandas and NumPy API lets existing Python analysts parallelize data-prep pipelines with minimal code changes.
Watch for
Distributed scheduler has no high-availability failover: if it crashes, all in-flight tasks are lost and the cluster resets.

Apache Spark

Pricing
Open-source, free. Databricks DBUs from $0.15 to $0.65+/DBU plus separate cloud VM costs.
Target
Data engineers at large enterprises running ETL, batch processing, and SQL analytics at petabyte scale.
Deployment
Open-source, on-prem or any cloud, plus managed services via Databricks, AWS EMR, and Google Dataproc.
Strength
Single engine covering SQL analytics, structured streaming, and batch ETL, with Delta Lake ACID lakehouse integration via Databricks.
Watch for
JVM serialization and Py4J boundary overhead significantly slow PySpark workloads compared to Ray's shared-memory object store for tensor passing.

Sources

Reporting on this tool draws on these publicly available sources.

  1. www.ray.io — Ray's core positioning as distributed AI compute engine, key features, and use cases
  2. docs.ray.io — Ray architecture, five native libraries (Data, Train, Tune, Serve, RLlib), and distributed computing capabilities
  3. www.anyscale.com — Anyscale managed Ray pricing: hourly rates by GPU type (T4 $0.57/hr, A100 $4.96/hr, H100 $9.29/hr), BYOC option, and $100 starter credits
  4. domino.ai — Ray vs. Spark trade-offs: Ray's microsecond latency and actor-based asynchronous execution versus Spark's maturity and ETL focus
  5. medium.com — Ray architecture (decentralized metadata, microsecond latency), Python-first design, and positioning for AI/ML versus Spark's JVM orientation
  6. thehackernews.com — 200,000+ exposed Ray servers online, unpatched RCE vulnerability, ShadowRay 2.0 cryptomining exploitation