Alluxio

Alluxio is a data acceleration layer that sits between compute and storage to speed up AI and analytics workloads.

Reviewed by 7wData

Profile

A caching layer that accelerates data access for AI and analytics workloads by placing frequently-used data in fast memory between compute and cloud storage.

Alluxio began in 2013 as Tachyon, a research project from UC Berkeley's AMPLab, and has grown into an enterprise platform serving nine of the world's top ten internet companies. Haoyuan Li, the company's founder and CEO, originally built the system to solve data-sharing bottlenecks in Apache Spark. The project is open source under the Apache License and was rebranded as Alluxio in 2016.

Alluxio works primarily as a caching layer: instead of replacing storage, it adds memory-based acceleration on top of S3, HDFS, and other backends, so compute frameworks such as PyTorch, Spark, and Ray can read data at near-memory speed. Notable customers include Fireworks AI, Baidu, Tencent, Myntra, Shopee, Comcast, and Expedia. In May 2025, the company released Enterprise AI 3.6, which added sub-millisecond latency for Parquet queries, model distribution acceleration, and faster checkpoint writing through an ASYNC mode that delivers up to 9 GB/s throughput.

Alluxio has raised $100 million in total funding. The largest round, a $50 million Series C in November 2021, was led by Seven Seas Partners and included participation from Andreessen Horowitz, Volcanics Ventures, and angel investors such as Sujal Patel. The company reported 50% customer growth in H1 2025, with adoption spanning the tech, finance, e-commerce, and media sectors. Based in the San Francisco Bay Area with approximately 97–99 employees, Alluxio remains private and continues to refine its enterprise AI offering as demand for fast data access from GPU workloads intensifies.

Who buys this

  • Large-scale AI/ML teams training foundation models and running inference at hyperscalers
  • Data platforms and analytics teams on Spark, Ray, or Databricks needing faster S3/HDFS access
  • E-commerce and media companies optimizing recommendation and personalization models
  • Financial services firms running real-time analytics on massive datasets

Publicly disclosed clients

  • Fireworks AI
  • Baidu
  • Tencent
  • Meta
  • Uber
  • Salesforce
  • Dyna Robotics
  • Myntra
  • Shopee
  • Comcast
  • Expedia Group
  • Samsung
  • Geely
  • Lenovo

Strengths and what to watch

Strengths

  • Proven adoption at scale: nine of the top ten internet companies, 50% customer growth in H1 2025, and sub-millisecond latency reported for AI data access.
  • Open-source foundation rooted in UC Berkeley AMPLab research on data sharing in Apache Spark, with an active contributor base and ecosystem integration across PyTorch, TensorFlow, Ray, and Spark.
  • Recent product momentum: the Enterprise AI 3.6 and 3.7 releases in 2025 added checkpoint optimization (up to 9 GB/s async writes), multi-tenancy, and a management console.

Watch for

  • Customer concentration: heavy reliance on the top ten internet companies and hyperscalers; shifts in AI infrastructure funding could affect adoption.
  • Competitive pressure: DuckDB, Trino, and cloud-native analytics platforms (Databricks, Snowflake) are building caching and acceleration features in-house, eroding Alluxio's standalone positioning.
  • Limited public revenue disclosure: no ARR or revenue figures have been disclosed since 2022; private status and no new funding since the November 2021 Series C warrant scrutiny of the path to profitability.

Key Information

Industry: Storage
Founded: 2013
Headquarters: San Francisco Bay Area

Frequently Asked Questions

What is Alluxio?

Alluxio is a data acceleration layer that sits between compute frameworks and cloud storage to speed up AI and analytics workloads. Founded in 2013 as Tachyon at UC Berkeley, it uses intelligent memory-based caching on top of S3 and HDFS, allowing near-memory-speed data access.

How does Alluxio improve data access?

Alluxio accelerates data access by placing frequently-used data in fast memory between compute and storage backends. Instead of replacing storage, it adds an intelligent caching layer on top of S3, HDFS, and other systems, enabling compute frameworks like Spark and PyTorch to read data at near-memory speeds.
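The caching pattern described in this answer can be illustrated with a minimal read-through cache. This is a sketch under stated assumptions, not Alluxio's API or implementation: `ReadThroughCache`, `read_from_store`, and the in-memory `slow_store` dictionary are hypothetical names standing in for a cache tier and a backend such as S3 or HDFS, and least-recently-used eviction is assumed purely for illustration.

```python
from collections import OrderedDict


class ReadThroughCache:
    """Toy read-through cache with LRU eviction (illustrative only)."""

    def __init__(self, backend_read, capacity):
        self._backend_read = backend_read  # hypothetical callable: key -> bytes
        self._capacity = capacity          # maximum number of cached objects
        self._cache = OrderedDict()        # insertion order doubles as recency order
        self.hits = 0
        self.misses = 0

    def read(self, key):
        if key in self._cache:
            # Hit: serve from memory and mark the entry as most recently used.
            self._cache.move_to_end(key)
            self.hits += 1
            return self._cache[key]
        # Miss: fetch from the slower backend, then keep a copy in memory.
        self.misses += 1
        data = self._backend_read(key)
        self._cache[key] = data
        if len(self._cache) > self._capacity:
            # Evict the least recently used entry to stay within capacity.
            self._cache.popitem(last=False)
        return data


# Stand-in for a slow object store; real backends would be S3, HDFS, etc.
slow_store = {"train/part-0": b"aaa", "train/part-1": b"bbb", "train/part-2": b"ccc"}
backend_calls = []


def read_from_store(key):
    backend_calls.append(key)  # record every trip to the backend
    return slow_store[key]


cache = ReadThroughCache(read_from_store, capacity=2)
for _ in range(3):
    cache.read("train/part-0")
```

In this toy run the backend is contacted once and the two repeat reads are served from memory. The backing store is left untouched, which mirrors the "adds a layer rather than replacing storage" idea.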

When should you use Alluxio?

Use Alluxio if you're running large-scale AI training, inference, or analytics on Spark, Ray, or PyTorch with data stored in S3 or HDFS. It's ideal for teams needing faster data access without replacing storage, especially in e-commerce, finance, media, and hyperscaler environments handling GPU-intensive workloads.

Which companies use Alluxio?

Alluxio serves nine of the world's top ten internet companies. Notable customers include Fireworks AI, Baidu, Tencent, Meta, Uber, Salesforce, Myntra, Shopee, Comcast, Expedia, Samsung, Geely, and Lenovo. The company reported 50% customer growth in the first half of 2025 across tech, finance, e-commerce, and media sectors.

What features does Alluxio Enterprise AI 3.6 include?

Alluxio Enterprise AI 3.6, released in May 2025, features sub-millisecond latency for Parquet queries, model distribution acceleration, and checkpoint writing through an ASYNC mode that delivers up to 9 GB/s throughput. It also adds multi-tenancy support and an improved management console for enterprise deployments.
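The general idea behind asynchronous checkpoint writing is write-behind: the writer returns as soon as data is buffered, and a background worker persists it to durable storage. The sketch below shows that pattern only. It is not Alluxio's ASYNC mode or API, and `AsyncCheckpointWriter`, `persist_to_store`, and `durable_store` are hypothetical names introduced for illustration.

```python
import queue
import threading


class AsyncCheckpointWriter:
    """Toy write-behind buffer (illustrative only; error handling omitted)."""

    def __init__(self, persist):
        self._persist = persist          # hypothetical callable: (name, data) -> None
        self._pending = queue.Queue()    # checkpoints waiting to be persisted
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def write(self, name, data):
        # Returns as soon as the checkpoint is queued; persistence happens later.
        self._pending.put((name, data))

    def _drain(self):
        while True:
            item = self._pending.get()
            if item is None:             # sentinel: stop the worker
                self._pending.task_done()
                break
            name, data = item
            self._persist(name, data)    # slow write to the backing store
            self._pending.task_done()

    def flush(self):
        # Block until everything queued so far has been persisted.
        self._pending.join()

    def close(self):
        self._pending.put(None)
        self._worker.join()


durable_store = {}  # stand-in for S3, HDFS, or another backend


def persist_to_store(name, data):
    durable_store[name] = data


writer = AsyncCheckpointWriter(persist_to_store)
writer.write("ckpt-0001", b"weights-step-1")
writer.write("ckpt-0002", b"weights-step-2")
writer.flush()
writer.close()
```

The trade-off in any write-behind design is durability: data that is queued but not yet flushed is lost if the process dies, so a production system would need replication or an explicit flush before a checkpoint is treated as safe.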

Is Alluxio open source?

Yes. Alluxio is open source under the Apache License and free to use. The project originated at UC Berkeley's AMPLab as Tachyon, was rebranded as Alluxio in 2016, and maintains an active contributor base. Alluxio also offers a commercial Enterprise version with additional features, support, and management tools for production deployments.

How Alluxio compares

Direct head-to-head against 3 competitors. Picked by 7wData.

This company

Alluxio

Positioning: A caching layer that accelerates data access for AI and analytics workloads by placing frequently-used data in fast memory between compute and cloud storage.
Customer segments: Large-scale AI/ML teams training foundation models and running inference at hyperscalers.
Strengths: Proven adoption at scale: nine of the top ten internet companies, 50% customer growth in H1 2025, and sub-millisecond latency reported for AI data access.
Watch for: Customer concentration: heavy reliance on the top ten internet companies and hyperscalers; shifts in AI infrastructure funding could affect adoption.
Recent moves: Alluxio Closes Strong Q2 with 50% Customer Growth and Sub-Millisecond Latency for AI Data

WEKA

Positioning: AI data platform replacing legacy storage tiers for GPU-heavy training and inference, now marketing NeuralMesh architecture.
Customer segments: Fortune 50 enterprises, AI cloud providers, and government research labs running large-scale GPU workloads.
Strengths: Multi-protocol file system (NFS, S3, POSIX) with NeuralMesh (GA March 2026) delivering sub-millisecond latency to GPU clusters.
Watch for: Proprietary file system creates infrastructure lock-in; switching away requires full storage re-architecture and substantial migration cost.
Recent moves: NeuralMesh AI Data Platform reached general availability March 16, 2026, targeting NVIDIA-aligned AI Factory deployments.

VAST Data

Positioning: All-flash, software-defined platform sold as an AI operating system unifying storage, database, and compute.
Customer segments: Hyperscalers, AI cloud providers, and enterprise AI teams running training clusters and high-throughput inference at scale.
Strengths: Disaggregated, scale-out all-flash architecture eliminates storage tiers, keeping GPUs fed without a separate caching layer.
Watch for: All-flash cost per TB substantially exceeds tiered storage plus caching for workloads carrying large cold-data volumes.
Recent moves: Series F closed at $30 billion valuation, March 2026, led by Drive Capital with NVIDIA participation.

MinIO

Positioning: S3-compatible, software-defined object storage targeting AI and analytics at exabyte scale as a high-throughput storage layer.
Customer segments: Platform engineering and DevOps teams at Fortune 500 companies running self-managed AI data pipelines on-premises or hybrid.
Strengths: S3 API compatibility with GPUDirect-like RDMA throughput in AIStor, enabling high-performance AI pipelines without proprietary client libraries.
Watch for: In June 2025 the admin UI was stripped from the Community Edition, triggering documented bait-and-switch backlash and enterprise trust concerns.
Recent moves: AIStor launched November 2024 as an enterprise-only paid tier, formally ending active development of the open-source Community Edition.

Sources

  1. www.alluxio.io — Company overview, products, customer list (Fireworks AI, Meta, Salesforce, Uber, Dyna Robotics)
  2. www.alluxio.io — Founding history, leadership team (Haoyuan Li CEO, Amelia Wong co-founder, Bin Fan VP Tech), headquarters, investor list
  3. www.alluxio.io — Named customer list: Fireworks AI, Myntra, Baidu, Tencent, Bilibili, Comcast, Expedia, Lenovo, Shopee, Samsung, Geely, Unisound, Blackout Power Trading, Dyna Robotics, RedNote
  4. www.alluxio.io — Recent announcements: Alluxio Enterprise AI 3.6 (May 2025), March 2025 vLLM partnership, Q2 2025 results, 2025 Intellyx Award
  5. www.globenewswire.com — Q2 2025: 50% customer growth, sub-millisecond latency achievement, MLPerf benchmark leadership
  6. www.storagenewsletter.com — Alluxio Enterprise AI 3.6 features: model distribution, checkpoint optimization (9GB/s async writes), management console, multi-tenancy
  7. en.wikipedia.org — Founding date 2013 (as Tachyon at UC Berkeley), rebranded as Alluxio 2016, Apache license, major customers (Baidu, Huawei, Tencent, Wells Fargo)
  8. www.crunchbase.com — Series C: $50M, November 16, 2021, led by Seven Seas Partners
  9. solutionsreview.com — Series B extension: $15.5M, April 2020, led by Volcanics Ventures and a16z
  10. www.datamation.com — Total funding $100M disclosed, Series C $50M 2021 context
  11. tracxn.com — Employees: ~97–99 headcount as of Feb–Mar 2026, headquarters San Francisco Bay Area
  12. www.glassdoor.com — Employee reviews: low-trust environment noted, limited review sample (6 reviews)