4 Data Lake Solution Patterns for Big Data Use Cases

4 Data Lake Solution Patterns for Big Data Use Cases

When I took wood shop back in eighth grade, my shop teacher taught us to create a design for our project before we started building it. The way we captured the design was in what was called a working drawing. In those days it was neatly hand sketched showing shapes and dimensions from different perspectives and it provided enough information to cut and assemble the wood project.

The big data solutions we work with today are much more complex and built with layers of technology and collections of services, but we still need something like working drawings to see how the pieces fit together.

Solution patterns (sometimes called architecture patterns) are a form of working drawing that help us see the components of a system and where they integrate but without some of the detail that can keep us from seeing the forest for the trees. That detail is still important, but it can be captured in other architecture diagrams.

In this blog I want to introduce some solution patterns for data lakes. (If you want to learn more about what data lakes are, read "What Is a Data Lake?") Data lakes have many uses and play a key role in providing solutions to many different business problems.

The solution patterns described here show some of the different ways data lakes are used in combination with other technologies to address some of the most common big data use cases. I’m going to focus on cloud-based solutions using Oracle’s platform (PaaS) cloud services.

These are the patterns:

Let’s start with the data science Lab use case. We call it a lab because it’s a place for discovery and experimentation using the tools of data science. data science Labs are important for working with new data, for working with existing data in new ways, and for combining data from different sources that are in different formats. The lab is the place to try out machine learning and determine the value in data.

Before describing the pattern, let me provide a few tips on how to interpret the diagrams. Each blue box represents an Oracle cloud service. A smaller box attached under a larger box represents a required supporting service that is usually transparent to the user. Arrows show the direction of data flow but don’t necessarily indicate how the data flow is initiated.

The data science lab contains a data lake and a data visualization platform. The data lake is a combination of object storage plus the Apache Spark™ execution engine and related tools contained in Oracle Big Data Cloud. Oracle Analytics Cloud provides data visualization and other valuable capabilities like data flows for data preparation and blending relational data with data in the data lake. It also uses an instance of the Oracle Database Cloud Service to manage metadata.

The data lake object store can be populated by the data scientist using an Open Stack Swift client or the Oracle Software Appliance. If automated bulk upload of data is required, Oracle has data integration capabilities for any need that is described in other solution patterns. The object storage used by the lab could be dedicated to the lab or it can be shared with other services, depending on your data governance practices.

Data warehouses are an important tool for enterprises to manage their most important business data as a source for business intelligence. Data warehouses, being built on relational databases, are highly structured. Data therefore must often be transformed into the desired structure before it is loaded into the data warehouse.

This transformation processing in some cases can become a significant load on the data warehouse driving up the cost of operation. Depending on the level of transformation needed, offloading that transformation processing to other platforms can both reduce the operational costs and free up data warehouse resources to focus on its primary role of serving data.

Share it:
Share it:

[Social9_Share class=”s9-widget-wrapper”]

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

You Might Be Interested In

How to Use Business Intelligence in Sales Acceleration

25 Mar, 2017

It’s easy to list off all the reasons that sales acceleration boosts profitability for businesses. What can be forgotten, however, …

Read more

IT leaders get creative to fill data science gaps

3 Aug, 2022

For the past few years, IT leaders at a US financial services company have been struggling to hire data scientists …

Read more

How Blockchain Could Contribute to Ending Poverty in All Its Forms

1 Mar, 2022

Technological advancements have reduced global poverty significantly in the past 100 years. Many people have been able to leave poverty …

Read more

Recent Jobs

IT Engineer

Washington D.C., DC, USA

1 May, 2024

Read More

Data Engineer

Washington D.C., DC, USA

1 May, 2024

Read More

Applications Developer

Washington D.C., DC, USA

1 May, 2024

Read More

D365 Business Analyst

South Bend, IN, USA

22 Apr, 2024

Read More

Do You Want to Share Your Story?

Bring your insights on Data, Visualization, Innovation or Business Agility to our community. Let them learn from your experience.

Get the 3 STEPS

To Drive Analytics Adoption
And manage change

3-steps-to-drive-analytics-adoption

Get Access to Event Discounts

Switch your 7wData account from Subscriber to Event Discount Member by clicking the button below and get access to event discounts. Learn & Grow together with us in a more profitable way!

Get Access to Event Discounts

Create a 7wData account and get access to event discounts. Learn & Grow together with us in a more profitable way!

Don't miss Out!

Stay in touch and receive in depth articles, guides, news & commentary of all things data.