Learn from Google’s Data Engineers: Dimensional Data Modeling is Dead

Learn from Google’s Data Engineers: Dimensional Data Modeling is Dead

My first day at Google was in 2019, and one of my biggest surprises was that I didn’t find any dimensional data marts. I was used to joining facts to dimensions and could rattle off normal forms and preach best practices of data modeling. I considered myself an expert on slowly-changing dimensions and how to apply them. Dimensional data modeling, popularized by Ralph Kimball with his 1996 book, is a method of organizing data within a data warehouse. While many benefits are preached, I believe that it exists for 3 primary reasons; optimizing compute, organizing data by topic, and optimizing storage. These foundational purposes driving the advent of dimensional modeling have evolved over time. Let’s take this moment in history to revisit why dimensional modeling exists, and how its roots meet our needs today.

In early days of computing, storage cost a premium; as much as $90,000 in 1985. At this rate, data needed to be organized in such a way that minimized the number of times the same value would be stored. To do this, pointers were used in the form of database keys, to point a unique identifier to many records of the same value. Database normalization came about to describe how de-duplicated and storage-optimized a database had become. Instead of storing a long descriptive value every time a dimension existed in the table, we could store this description just once, and connect it to the record it is meant to describe. With the fastest computer putting out 1.9 gigaflops of computational power in 1985 for a whopping $32M, compute-optimization was equally as important. For reference, some of the fastest computers today put out over 400 petaflops of compute, or 20,000 times more flops. In the early days of databases, reducing the number of operations and the average complexity of each operation could save companies millions. For one example, instead of needing to analyze long complex string values to relate records, we could assign a single integer to unique instances, and relate these integers instead of strings. In order to accomplish these optimizations, data was organized into topic-oriented data models. One common topic-oriented model was the star schema.

The benefit of this model was that the center of the star; the fact table, contained values that were physically indexed and easily retrievable. The more costly values to retrieve were stored on dimension tables and could be selectively retrieved, saving on processing cost. If new dimensions were to arise relating to the fact, new dimension tables would have to be put in place, key relationships would need to be enforced, and the process of normalization compliance would be maintained. In successful dimensional modeling, the source data tables were torn apart, distributed among many tables, and if done correctly, could be re-assembled back to the source table if necessary.

Database normalization is showing its age. The cost of 1gb of Google Cloud storage per month is just 2 cents. The returns for breaking a long and/or wide table down into a star or snowflake schema are poor.

Share it:
Share it:

[Social9_Share class=”s9-widget-wrapper”]

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

You Might Be Interested In

7 Ways To Improve Your Business Using Business Analytics

5 Nov, 2016

Tracking a company’s internal business performance is a very important step towards identifying the critical areas within an organization that …

Read more

Big data vs. the Internet of Things: how the projects differ

2 Aug, 2016

What do you call big data on steroids? The Internet of Things While big data projects may seem a complicated …

Read more

5 tips for building your innovation ecosystem

23 May, 2022

In their quest for IT innovation, CIOs need to look beyond the typical “most innovative companies” lists. Innovation today is …

Read more

Recent Jobs

IT Engineer

Washington D.C., DC, USA

1 May, 2024

Read More

Data Engineer

Washington D.C., DC, USA

1 May, 2024

Read More

Applications Developer

Washington D.C., DC, USA

1 May, 2024

Read More

D365 Business Analyst

South Bend, IN, USA

22 Apr, 2024

Read More

Do You Want to Share Your Story?

Bring your insights on Data, Visualization, Innovation or Business Agility to our community. Let them learn from your experience.

Get the 3 STEPS

To Drive Analytics Adoption
And manage change

3-steps-to-drive-analytics-adoption

Get Access to Event Discounts

Switch your 7wData account from Subscriber to Event Discount Member by clicking the button below and get access to event discounts. Learn & Grow together with us in a more profitable way!

Get Access to Event Discounts

Create a 7wData account and get access to event discounts. Learn & Grow together with us in a more profitable way!

Don't miss Out!

Stay in touch and receive in depth articles, guides, news & commentary of all things data.