Startup Dremio emerges from stealth, launches memory-based BI query engine

When the open source Apache Arrow project was launched early last year, I covered it with great interest. The project's active contributors hailed from 13 other open source projects as wide-ranging as Cassandra, Impala, Pandas, Spark and Hadoop itself. All of these projects have occasion to place data in memory in a column-oriented fashion, and they've all done it their own way. The Arrow project is all about creating a standard that the other projects can share, so that they can also share data between themselves, without having to convert its in-memory representation.
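To make the column-oriented idea concrete, here is a minimal sketch in plain Python. It is not Arrow's actual format or API; the field names and values are invented for illustration. It only shows that once two consumers agree on a contiguous column layout, one can read the other's buffer without copying or converting it.

```python
from array import array

# Row-oriented: each record keeps all of its fields together.
# (Hypothetical sample data, not from the article.)
rows = [
    ("east", 120.0),
    ("west", 75.5),
    ("east", 60.0),
]

# Column-oriented: each field is stored as one contiguous sequence.
regions = [region for region, _ in rows]
sales = array("d", (amount for _, amount in rows))

# A second consumer can read the same buffer through a view, with no copy
# and no conversion, as long as both sides agree on the layout
# (here: contiguous 8-byte floats).
shared = memoryview(sales)

total = sum(shared)

# A write by the producer is visible through the view, because both
# names refer to the same underlying memory.
sales[0] = 130.0
```

An agreed layout is what removes the conversion step: without it, each system would have to rebuild the column in its own representation before using it.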

In addition to the many companies, like Hortonworks, Cisco and LinkedIn, that lent personnel to this project, a new startup called Dremio was the major force behind it. Though the company has been in stealth mode until today, its support of, and focus on, Arrow was explicit. Two of Dremio's founders, Tomer Shiran (Dremio's CEO) and Jacques Nadeau (Dremio's CTO and Project Management Committee Chair of Arrow), both hailed from MapR (where Shiran was VP of product) and, significantly, from the Apache Drill project as well.

Drill acts as a single SQL engine that, in turn, can query and join data from among several other systems. Drill can certainly make use of an in-memory columnar data standard. But while Dremio was still in stealth, it wasn't immediately obvious what Drill's strong intersection with Arrow might be. That made it hard to guess what Dremio was up to.

Introducing Dremio, the product

But with Dremio emerging from stealth today, the association is clearer, because today the company is launching a namesake product that also acts as a single SQL engine that can query and join data from among several other systems, and it accelerates those queries using Apache Arrow.

Let's back off the comparison with Drill though, and understand Dremio in its own right. It all stems from Dremio's credo that BI today involves too many layers. Source systems, via ETL processes, feed into data warehouses, which may then feed into OLAP cubes. BI tools themselves may add another layer, building their own in-memory models in order to accelerate query performance. Dremio thinks that's a huge mess.

Data lingua franca

Dremio disintermediates things by providing a direct bridge between BI tools and the source systems they're querying. The BI tools connect to Dremio as if it were a primary data source, and query it via SQL. Dremio then delegates the query work to the true back-end systems through push-down queries that it issues. Dremio can connect to relational databases (both commercial and open source), NoSQL stores, Hadoop, cloud blob stores and Elasticsearch, among others.
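The push-down pattern described above can be sketched roughly as follows. This is an illustration of the general technique, not Dremio's implementation: two in-memory SQLite databases stand in for separate back-end systems, and the table names, data and the `sales_by_customer` helper are all hypothetical.

```python
import sqlite3


def make_source(ddl, insert_sql, rows):
    """Create an in-memory SQLite database standing in for one back-end system."""
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    conn.executemany(insert_sql, rows)
    return conn


# Two independent "source systems" (hypothetical schemas and data).
orders_db = make_source(
    "CREATE TABLE orders (customer_id INTEGER, amount REAL, year INTEGER)",
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 100.0, 2016), (2, 40.0, 2017), (1, 60.0, 2017), (3, 25.0, 2017)],
)
customers_db = make_source(
    "CREATE TABLE customers (id INTEGER, name TEXT)",
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "Acme"), (2, "Globex"), (3, "Initech")],
)


def sales_by_customer(year):
    """Answer one logical query by pushing work down to each source."""
    # Push the filter and aggregation down to the orders source,
    # so only one summary row per customer comes back.
    totals = orders_db.execute(
        "SELECT customer_id, SUM(amount) FROM orders "
        "WHERE year = ? GROUP BY customer_id",
        (year,),
    ).fetchall()
    if not totals:
        return []

    # Push a key lookup down to the customers source.
    ids = [cid for cid, _ in totals]
    placeholders = ",".join("?" * len(ids))
    names = dict(
        customers_db.execute(
            f"SELECT id, name FROM customers WHERE id IN ({placeholders})", ids
        ).fetchall()
    )

    # Join the two small result sets in the coordinating layer.
    return sorted((names[cid], total) for cid, total in totals)


result = sales_by_customer(2017)
```

The point of the design is that each source does its own filtering and aggregation, so the coordinating layer only ever handles the reduced result sets rather than the raw tables.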

In an interview last week, Shiran and Nadeau told me that Dremio does not materialize its own data store in between the BI tool and the physical back-end databases, and yet it makes queries against that back-end data -- even when it's true Big Data -- perform like queries against "small data" that a BI tool might have in its own local model.
