4 Data Lake Solution Patterns for Big Data Use Cases
- by 7wData
When I took wood shop back in eighth grade, my shop teacher taught us to create a design for our project before we started building it. We captured the design in what was called a working drawing. In those days it was neatly sketched by hand, showing shapes and dimensions from different perspectives, and it provided enough information to cut and assemble the project.
The big data solutions we work with today are much more complex and built with layers of technology and collections of services, but we still need something like working drawings to see how the pieces fit together.
Solution patterns (sometimes called architecture patterns) are a form of working drawing that helps us see the components of a system and where they integrate, without the detail that can keep us from seeing the forest for the trees. That detail is still important, but it can be captured in other architecture diagrams.
In this blog I want to introduce some solution patterns for data lakes. (If you want to learn more about what data lakes are, read "What Is a Data Lake?") Data lakes have many uses and play a key role in providing solutions to many different business problems.
The solution patterns described here show some of the different ways data lakes are used in combination with other technologies to address some of the most common big data use cases. I’m going to focus on cloud-based solutions using Oracle’s platform-as-a-service (PaaS) cloud services.
These are the patterns:
Let’s start with the Data Science Lab use case. We call it a lab because it’s a place for discovery and experimentation using the tools of data science. Data Science Labs are important for working with new data, for working with existing data in new ways, and for combining data from different sources in different formats. The lab is the place to try out machine learning and determine the value in data.
Before describing the pattern, let me provide a few tips on how to interpret the diagrams. Each blue box represents an Oracle cloud service. A smaller box attached under a larger box represents a required supporting service that is usually transparent to the user. Arrows show the direction of data flow but don’t necessarily indicate how the data flow is initiated.
The data science lab contains a data lake and a data visualization platform. The data lake is a combination of object storage plus the Apache Spark™ execution engine and related tools contained in Oracle Big Data Cloud. Oracle Analytics Cloud provides data visualization and other valuable capabilities like data flows for data preparation and blending relational data with data in the data lake. It also uses an instance of the Oracle Database Cloud Service to manage metadata.
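To make the idea of blending concrete, here is a minimal Python sketch that joins rows from a relational table with a raw file of the kind that might sit in the data lake. It is not how Oracle Analytics Cloud data flows are defined; sqlite3 stands in for the relational source, an in-memory CSV string stands in for a data lake object, and the table, column, and function names (`customers`, `page_views`, `blend`) are hypothetical.

```python
import csv
import io
import sqlite3


def load_customers() -> sqlite3.Connection:
    """Hypothetical relational source: an in-memory sqlite3 customer table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, region TEXT)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [(1, "EMEA"), (2, "APAC"), (3, "AMER")],
    )
    return conn


# Hypothetical raw clickstream file as it might land in the data lake.
LAKE_CSV = """customer_id,page_views
1,12
2,5
1,3
4,9
"""


def blend(conn: sqlite3.Connection, lake_csv: str) -> dict:
    """Total page views per region by joining lake rows to the relational table."""
    # Build a lookup from the relational side: customer_id -> region.
    regions = dict(conn.execute("SELECT customer_id, region FROM customers"))
    totals = {}
    for row in csv.DictReader(io.StringIO(lake_csv)):
        # Lake rows with no matching customer are kept under "UNKNOWN".
        region = regions.get(int(row["customer_id"]), "UNKNOWN")
        totals[region] = totals.get(region, 0) + int(row["page_views"])
    return totals


if __name__ == "__main__":
    print(blend(load_customers(), LAKE_CSV))
```

Keeping unmatched lake rows under an explicit bucket, rather than silently dropping them, is one way to surface new data that the relational side doesn't yet know about, which is often exactly what a lab is looking for.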
The data lake object store can be populated by the data scientist using an OpenStack Swift client or the Oracle Software Appliance. If automated bulk upload of data is required, Oracle offers data integration capabilities for that need, which are described in other solution patterns. The object storage used by the lab can be dedicated to the lab or shared with other services, depending on your data governance practices.
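For scripted uploads, the OpenStack Swift object storage API addresses each object as `/v1/{account}/{container}/{object}` and authenticates with an `X-Auth-Token` header. The sketch below only builds such a PUT request with Python's standard library and does not send it. The endpoint, account, container, object name, and token are hypothetical placeholders; the exact URL format and authentication flow for a given Oracle object storage service should be checked in its documentation.

```python
import urllib.parse
import urllib.request


def build_swift_put(endpoint: str, account: str, container: str,
                    object_name: str, data: bytes, token: str) -> urllib.request.Request:
    """Build (but do not send) a Swift API request that uploads one object."""
    # Swift addresses objects as /v1/{account}/{container}/{object}.
    path = "/".join(urllib.parse.quote(part, safe="") for part in ("v1", account, container))
    # Keep "/" unescaped in the object name so pseudo-folders such as
    # "clickstream/2018-01-01.csv" are preserved.
    obj = urllib.parse.quote(object_name)
    url = f"{endpoint.rstrip('/')}/{path}/{obj}"
    return urllib.request.Request(
        url,
        data=data,
        method="PUT",
        headers={
            "X-Auth-Token": token,  # token obtained from the service's auth step
            "Content-Type": "application/octet-stream",
        },
    )


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    req = build_swift_put(
        "https://storage.example.com", "AUTH_demo", "lab-raw",
        "clickstream/2018-01-01.csv", b"customer_id,page_views\n", "TOKEN",
    )
    print(req.get_method(), req.full_url)
    # urllib.request.urlopen(req) would actually send the upload.
```

In practice a dedicated client such as python-swiftclient is usually a better choice, since it also handles authentication and retries.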
Data warehouses are an important tool for enterprises to manage their most important business data as a source for business intelligence. Because data warehouses are built on relational databases, they are highly structured, so data must often be transformed into the desired structure before it is loaded into the data warehouse.
In some cases this transformation processing becomes a significant load on the data warehouse, driving up the cost of operation. Depending on the level of transformation needed, offloading that processing to other platforms can both reduce operating costs and free up data warehouse resources to focus on its primary role of serving data.
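The sketch below illustrates the division of labor behind offloading: parsing, validating, and reshaping raw records happens outside the warehouse, and the warehouse receives only clean, structured rows. In the pattern, the transform step would run on a platform such as Spark in Oracle Big Data Cloud; here it is plain Python, sqlite3 stands in for the data warehouse, and the record layout and names (`orders`, `transform`, `load`) are hypothetical.

```python
import json
import sqlite3
from datetime import datetime

# Hypothetical raw events as they might sit in the data lake (JSON lines).
RAW_EVENTS = [
    '{"order": "A-1", "amount": "19.99", "ts": "2018-03-01T10:15:00"}',
    '{"order": "A-2", "amount": "5.00", "ts": "2018-03-01T11:00:00"}',
    'not json',
    '{"order": "A-3", "ts": "2018-03-02T09:30:00"}',
]


def transform(lines):
    """Parse, validate, and reshape raw records into warehouse-ready tuples.

    This is the work that would run on the offload platform instead of
    inside the warehouse.
    """
    rows = []
    for line in lines:
        try:
            rec = json.loads(line)
            rows.append((
                rec["order"],
                round(float(rec["amount"]) * 100),  # store amount as integer cents
                datetime.fromisoformat(rec["ts"]).date().isoformat(),  # keep date only
            ))
        except (ValueError, KeyError):
            # JSONDecodeError is a ValueError; a missing field raises KeyError.
            continue  # skip malformed records instead of loading them
    return rows


def load(conn, rows):
    """Bulk-insert already-transformed rows; the warehouse only stores them."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id TEXT, amount_cents INTEGER, order_date TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load(conn, transform(RAW_EVENTS))
    print(conn.execute("SELECT * FROM orders").fetchall())
```

Skipping malformed records is only one choice; a production pipeline might instead route them to a quarantine location in the data lake for later inspection.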