Alluxio
Alluxio is a data acceleration layer that sits between compute and storage to speed up AI and analytics workloads.
Profile
A caching layer that accelerates data access for AI and analytics workloads by placing frequently-used data in fast memory between compute and cloud storage.
Alluxio began in 2013 as Tachyon, a research project at UC Berkeley's AMPLab, and has grown into an enterprise platform serving nine of the world's top ten internet companies. Haoyuan Li, the company's founder and CEO, originally built the system to solve data-sharing bottlenecks in Apache Spark. The project was rebranded as Alluxio in 2016 and is open source under the Apache License.
The product works primarily as a caching layer: rather than replacing storage, Alluxio adds memory-based acceleration on top of S3, HDFS, and other backends, so compute frameworks such as PyTorch, Spark, and Ray can access data at near-memory speed. Notable customers include Fireworks AI, Baidu, Tencent, Myntra, Shopee, Comcast, and Expedia. In May 2025, the company released Enterprise AI 3.6 with sub-millisecond latency for Parquet queries, model distribution acceleration, and faster checkpoint writing through an ASYNC mode that delivers up to 9 GB/s of throughput.
Alluxio has raised $100 million in total funding. Its largest round, a $50 million Series C in November 2021, was led by Seven Seas Partners with participation from Andreessen Horowitz, Volcanics Ventures, and angel investors including Sujal Patel. The company reported 50% customer growth in H1 2025, with adoption spanning tech, finance, e-commerce, and media. Based in the San Francisco Bay Area with approximately 97–99 employees, Alluxio remains private and continues to refine its enterprise AI offering as demand for fast data access from GPU workloads intensifies.
Who buys this
- Large-scale AI/ML teams training foundation models and running inference at hyperscalers
- Data platforms and analytics teams on Spark, Ray, or Databricks needing faster S3/HDFS access
- E-commerce and media companies optimizing recommendation and personalization models
- Financial services firms running real-time analytics on massive datasets
Publicly disclosed clients
- Fireworks AI
- Baidu
- Tencent
- Meta
- Uber
- Salesforce
- Dyna Robotics
- Myntra
- Shopee
- Comcast
- Expedia Group
- Samsung
- Geely
- Lenovo
Strengths and what to watch
Strengths
- Proven adoption at scale: nine of the top ten internet companies as customers, 50% customer growth in H1 2025, and sub-millisecond latency reported for AI data access.
- Open-source foundation that grew out of UC Berkeley AMPLab research on data sharing in Apache Spark, with an active contributor base and ecosystem integration across PyTorch, TensorFlow, Ray, and Spark.
- Recent product momentum: the Enterprise AI 3.6 and 3.7 releases in 2025 added checkpoint optimization (9 GB/s async writes), multi-tenancy, and a management console.
Watch for
- Customer concentration: Heavy reliance on top ten internet companies and hyperscalers; market shifts in AI infrastructure funding could affect adoption.
- Competitive pressure: DuckDB, Trino, and cloud-native analytics (Databricks, Snowflake) are expanding caching and acceleration features in-house, eroding the standalone positioning.
- Limited financial disclosure: no ARR or revenue figures disclosed since 2022, and no new funding announced since the November 2021 Series C. As a private company, its path to profitability is hard to assess from outside.
Key Information
- Industry: Storage
- Founded: 2013
- Headquarters: San Francisco Bay Area
Frequently Asked Questions
What is Alluxio?
Alluxio is a data acceleration layer that sits between compute frameworks and cloud storage to speed up AI and analytics workloads. Founded in 2013 as Tachyon at UC Berkeley, it uses intelligent memory-based caching on top of S3 and HDFS, allowing near-memory-speed data access.
How does Alluxio improve data access?
Alluxio accelerates data access by placing frequently-used data in fast memory between compute and storage backends. Instead of replacing storage, it adds an intelligent caching layer on top of S3, HDFS, and other systems, enabling compute frameworks like Spark and PyTorch to read data at near-memory speeds.
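The caching idea described above can be illustrated with a toy read-through cache. This is a minimal sketch of the general technique, not Alluxio's code or API: the class name `ReadThroughCache`, the function `slow_backend_read`, and the least-recently-used eviction policy are all assumptions chosen for illustration.

```python
from collections import OrderedDict


class ReadThroughCache:
    """Toy read-through cache with LRU eviction (illustrative only)."""

    def __init__(self, backend_read, capacity):
        self._backend_read = backend_read  # callable: key -> bytes (the slow store)
        self._capacity = capacity          # maximum number of cached objects
        self._entries = OrderedDict()      # key -> bytes, least recently used first
        self.hits = 0
        self.misses = 0

    def read(self, key):
        if key in self._entries:
            self.hits += 1
            self._entries.move_to_end(key)      # mark as most recently used
            return self._entries[key]
        self.misses += 1
        data = self._backend_read(key)          # fall through to the backing store
        self._entries[key] = data
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)   # evict the least recently used entry
        return data


# Hypothetical stand-in for an object store such as S3 or HDFS.
backend_calls = []


def slow_backend_read(key):
    backend_calls.append(key)
    return f"contents of {key}".encode()


cache = ReadThroughCache(slow_backend_read, capacity=2)
for key in ["a.parquet", "b.parquet", "a.parquet", "c.parquet", "b.parquet"]:
    cache.read(key)

print(cache.hits, cache.misses, backend_calls)
```

Only the repeated read of `a.parquet` is served from the cache here. The final read of `b.parquet` goes back to the backend because `b.parquet` was evicted when `c.parquet` arrived, which shows why cache capacity relative to the working set matters.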
When should you use Alluxio?
Use Alluxio if you're running large-scale AI training, inference, or analytics on Spark, Ray, or PyTorch with data stored in S3 or HDFS. It's ideal for teams needing faster data access without replacing storage, especially in e-commerce, finance, media, and hyperscaler environments handling GPU-intensive workloads.
Which companies use Alluxio?
Alluxio serves nine of the world's top ten internet companies. Notable customers include Fireworks AI, Baidu, Tencent, Meta, Uber, Salesforce, Myntra, Shopee, Comcast, Expedia, Samsung, Geely, and Lenovo. The company reported 50% customer growth in the first half of 2025 across tech, finance, e-commerce, and media sectors.
What features does Alluxio Enterprise AI 3.6 include?
Alluxio Enterprise AI 3.6, released in May 2025, features sub-millisecond latency for Parquet queries, model distribution acceleration, and checkpoint writing through an ASYNC mode delivering up to 9 GB/s of throughput. It also adds multi-tenancy support and an improved management console for enterprise deployments.
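For readers unfamiliar with asynchronous checkpoint writing, the general pattern is write-behind: the writer returns as soon as data is staged in fast storage, and a background worker persists it to the slower store. The sketch below shows that generic pattern only. It is not how Alluxio's ASYNC mode is implemented, and `AsyncCheckpointWriter`, `persist_to_store`, and `durable_store` are hypothetical names.

```python
import queue
import threading


class AsyncCheckpointWriter:
    """Toy write-behind buffer: write() returns once data is queued in memory;
    a background thread persists it. Error handling is omitted for brevity."""

    def __init__(self, persist):
        self._persist = persist            # callable: (name, data) -> None
        self._pending = queue.Queue()
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def write(self, name, data):
        self._pending.put((name, data))    # returns immediately to the caller

    def _drain(self):
        while True:
            item = self._pending.get()
            if item is None:               # sentinel: stop the worker
                self._pending.task_done()
                break
            name, data = item
            self._persist(name, data)      # slow write to the durable store
            self._pending.task_done()

    def flush(self):
        self._pending.join()               # wait until every queued item is persisted

    def close(self):
        self._pending.put(None)
        self._worker.join()


# Hypothetical stand-in for durable storage such as S3.
durable_store = {}


def persist_to_store(name, data):
    durable_store[name] = data


writer = AsyncCheckpointWriter(persist_to_store)
for step in range(3):
    writer.write(f"ckpt-{step}.bin", bytes([step]) * 4)
writer.flush()
writer.close()

print(sorted(durable_store))
```

The trade-off in any write-behind design is that data queued but not yet persisted can be lost if the process fails before `flush()` completes.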
Is Alluxio open source?
Yes. Alluxio is open source under the Apache License. The project originated as Tachyon at UC Berkeley's AMPLab, was rebranded as Alluxio in 2016, and maintains an active contributor base. Alluxio also offers a commercial Enterprise version with additional features, support, and management tools for production deployments.
How Alluxio compares
Direct head-to-head against 3 competitors. Picked by 7wData.
Alluxio
- Positioning: A caching layer that accelerates data access for AI and analytics workloads by placing frequently-used data in fast memory between compute and cloud storage.
- Customer segments: Large-scale AI/ML teams training foundation models and running inference at hyperscalers
- Strengths: Proven adoption at scale: nine of the top ten internet companies as customers, 50% customer growth in H1 2025, and sub-millisecond latency reported for AI data access.
- Watch for: Customer concentration: Heavy reliance on top ten internet companies and hyperscalers; market shifts in AI infrastructure funding could affect adoption.
- Recent moves: Alluxio Closes Strong Q2 with 50% Customer Growth and Sub-Millisecond Latency for AI Data
WEKA
- Positioning: AI data platform replacing legacy storage tiers for GPU-heavy training and inference, now marketing NeuralMesh architecture.
- Customer segments: Fortune 50 enterprises, AI cloud providers, and government research labs running large-scale GPU workloads.
- Strengths: Multi-protocol file system (NFS, S3, POSIX) with NeuralMesh (GA March 2026) delivering sub-millisecond latency to GPU clusters.
- Watch for: Proprietary file system creates infrastructure lock-in; switching away requires full storage re-architecture and substantial migration cost.
- Recent moves: NeuralMesh AI Data Platform reached general availability March 16, 2026, targeting NVIDIA-aligned AI Factory deployments.
VAST Data
- Positioning: All-flash, software-defined platform sold as an AI operating system unifying storage, database, and compute.
- Customer segments: Hyperscalers, AI cloud providers, and enterprise AI teams running training clusters and high-throughput inference at scale.
- Strengths: Disaggregated, scale-out all-flash architecture eliminates storage tiers, keeping GPUs fed without a separate caching layer.
- Watch for: All-flash cost per TB substantially exceeds tiered storage plus caching for workloads carrying large cold-data volumes.
- Recent moves: Series F closed at $30 billion valuation, March 2026, led by Drive Capital with NVIDIA participation.
MinIO
- Positioning: S3-compatible, software-defined object storage targeting AI and analytics at exabyte scale as a high-throughput storage layer.
- Customer segments: Platform engineering and DevOps teams at Fortune 500 companies running self-managed AI data pipelines on-premises or hybrid.
- Strengths: S3 API compatibility with GPUDirect-like RDMA throughput in AIStor, enabling high-performance AI pipelines without proprietary client libraries.
- Watch for: June 2025: admin UI stripped from Community Edition, triggering documented bait-and-switch backlash and enterprise trust concerns.
- Recent moves: AIStor launched November 2024 as enterprise-only paid tier, formally ending active development of open-source Community Edition.
Sources
- www.alluxio.io — Company overview, products, customer list (Fireworks AI, Meta, Salesforce, Uber, Dyna Robotics)
- www.alluxio.io — Founding history, leadership team (Haoyuan Li CEO, Amelia Wong co-founder, Bin Fan VP Tech), headquarters, investor list
- www.alluxio.io — Named customer list: Fireworks AI, Myntra, Baidu, Tencent, Bilibili, Comcast, Expedia, Lenovo, Shopee, Samsung, Geely, Unisound, Blackout Power Trading, Dyna Robotics, RedNote
- www.alluxio.io — Recent announcements: Alluxio Enterprise AI 3.6 (May 2025), March 2025 vLLM partnership, Q2 2025 results, 2025 Intellyx Award
- www.globenewswire.com — Q2 2025: 50% customer growth, sub-millisecond latency achievement, MLPerf benchmark leadership
- www.storagenewsletter.com — Alluxio Enterprise AI 3.6 features: model distribution, checkpoint optimization (9GB/s async writes), management console, multi-tenancy
- en.wikipedia.org — Founding date 2013 (as Tachyon at UC Berkeley), rebranded as Alluxio 2016, Apache license, major customers (Baidu, Huawei, Tencent, Wells Fargo)
- www.crunchbase.com — Series C: $50M, November 16, 2021, led by Seven Seas Partners
- solutionsreview.com — Series B extension: $15.5M, April 2020, led by Volcanics Ventures and a16z
- www.datamation.com — Total funding $100M disclosed, Series C $50M 2021 context
- tracxn.com — Employees: ~97–99 headcount as of Feb–Mar 2026, headquarters San Francisco Bay Area
- www.glassdoor.com — Employee reviews: low-trust environment noted, limited review sample (6 reviews)