5 top data challenges that are changing the face of data centers
- by 7wData
Data is clearly not what it used to be! Organizations of all types are finding new uses for data as part of their digital transformations. Examples of data becoming key to competitive advantage abound in every industry, from jet engines to grocery stores. I call this “new data” because it is very different from the financial and ERP data that we are most familiar with. That old data was mostly transactional and privately captured from internal sources, and it drove the client/server revolution.
New data is both transactional and unstructured, publicly available and privately collected, and its value is derived from the ability to aggregate and analyze it. Loosely speaking, we can divide this new data into two categories: big data – large aggregated data sets used for batch analytics – and fast data – data collected from many sources that is used to drive immediate decision making. The big data–fast data paradigm is driving a completely new architecture for data centers (both public and private).
Over the next series of blogs, I will cover each of the top five data challenges presented by new data center architectures:
New data is captured at the source. The volume of data collected at the source will be several orders of magnitude higher than what we are familiar with today. For example, an autonomous car will generate up to 4 terabytes of data per day. Scale that to millions, or even billions, of cars, and we must prepare for a new data onslaught.
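To make that scale concrete, here is a rough back-of-the-envelope sketch. It takes the 4 terabytes per car per day figure quoted above as an upper bound; the fleet sizes, the function names and the use of decimal units (1 PB = 1,000 TB) are assumptions made purely for illustration.

```python
# Back-of-the-envelope estimate of daily data volume for a fleet of cars.
# The 4 TB/day figure is the upper bound quoted in the article; the fleet
# sizes and decimal unit steps (1 PB = 1000 TB) are illustrative assumptions.

TB_PER_CAR_PER_DAY = 4


def fleet_volume_tb_per_day(cars, tb_per_car=TB_PER_CAR_PER_DAY):
    """Return the total terabytes generated per day by `cars` vehicles."""
    return cars * tb_per_car


def format_volume(tb):
    """Express a terabyte count in the largest convenient decimal unit."""
    units = ["TB", "PB", "EB", "ZB"]
    value = float(tb)
    for unit in units:
        # Stop at the first unit where the value is below 1000,
        # or at the largest unit we know about.
        if value < 1000 or unit == units[-1]:
            return f"{value:g} {unit}"
        value /= 1000


if __name__ == "__main__":
    for cars in (1_000_000, 1_000_000_000):
        total = fleet_volume_tb_per_day(cars)
        print(f"{cars:,} cars -> up to {format_volume(total)} per day")
```

Under these assumptions, a million cars would produce up to 4 exabytes per day and a billion cars up to 4 zettabytes per day.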
Clearly, we cannot capture all of that data at the source and then transmit it over today’s networks to centralized locations for processing and storage. This is driving the development of completely new data centers, with different environments for different types of data. These are characterized by a new “edge computing” environment that is optimized for capturing, storing and partially analyzing large amounts of data before it is transmitted to a separate core data center environment.
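One way to picture “partially analyzing” data before transmission is an edge node that reduces a batch of raw readings to a compact summary and sends only the summary to the core. The sketch below is a minimal illustration of that idea, not a description of any particular product; the `Summary` type and the `summarize_at_edge` function are hypothetical names.

```python
# Minimal sketch of edge-side reduction: raw readings stay at the edge,
# and only a small summary record is sent on to the core data center.
# `Summary` and `summarize_at_edge` are hypothetical, illustrative names.

from dataclasses import dataclass
from statistics import mean


@dataclass
class Summary:
    count: int
    minimum: float
    maximum: float
    average: float


def summarize_at_edge(readings):
    """Reduce a batch of raw sensor readings to one summary record."""
    if not readings:
        raise ValueError("no readings to summarize")
    return Summary(
        count=len(readings),
        minimum=min(readings),
        maximum=max(readings),
        average=mean(readings),
    )


if __name__ == "__main__":
    raw = [20.5, 21.0, 19.5, 22.0]  # made-up sample readings
    print(summarize_at_edge(raw))   # this record is what would be transmitted
```

The trade-off is the usual one: the summary is far smaller than the raw batch, but any detail not captured in it is no longer available at the core unless the raw data is also retained at the edge.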
The new edge computing environments are going to drive fundamental changes in all aspects of computing infrastructure: in processors, from CPUs to GPUs and even MPUs (mini-processing units); in storage, toward low-power, small-scale flash; and in networking, toward Internet of Things (IoT) networks and protocols that don’t require what will become precious IP addressing.
Let’s consider a different example of data capture. In the bioinformatics space, data is exploding at the source. In mammography, the systems that capture images are moving from two-dimensional to three-dimensional images. A 2-D image requires about 20 MB of storage, while a 3-D image requires as much as 3 GB, a 150x increase in the capacity needed to store these images. Unfortunately, most of the digital storage systems in place to store 2-D images are simply not capable of cost-effectively storing 3-D images. They need to be replaced by big data repositories in order for that data to thrive.
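The 150x figure follows directly from the two image sizes, and the same arithmetic shows what it means for a whole archive. In the sketch below the per-image sizes come from the text; the archive of 100,000 images and the decimal units (1 GB = 1,000 MB) are assumptions chosen only to illustrate the calculation.

```python
# Worked arithmetic for the 2-D to 3-D mammography storage increase.
# Image sizes are from the article; the archive size and decimal units
# (1 GB = 1000 MB, 1 TB = 1,000,000 MB) are illustrative assumptions.

MB_PER_GB = 1000
MB_PER_TB = 1_000_000

SIZE_2D_MB = 20            # about 20 MB per 2-D image
SIZE_3D_MB = 3 * MB_PER_GB  # as much as 3 GB per 3-D image


def growth_factor(old_mb, new_mb):
    """How many times larger the new image is than the old one."""
    return new_mb / old_mb


def archive_size_tb(images, mb_per_image):
    """Total archive size in terabytes for a given number of images."""
    return images * mb_per_image / MB_PER_TB


if __name__ == "__main__":
    images = 100_000  # hypothetical archive size
    print(f"growth factor: {growth_factor(SIZE_2D_MB, SIZE_3D_MB):g}x")
    print(f"2-D archive: {archive_size_tb(images, SIZE_2D_MB):g} TB")
    print(f"3-D archive: {archive_size_tb(images, SIZE_3D_MB):g} TB")
```

With these assumed numbers, an archive that fit in 2 TB as 2-D images would need up to 300 TB as 3-D images.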
In addition, the type of processing that organizations are hoping to perform on these images is machine learning-based, and far more compute-intensive than any type of image processing in the past. Most importantly, for machine learning to be effective, researchers must assemble a large number of images. Assembling these images means moving or sharing them across organizations, which requires the data to be captured at the source, kept in an accessible form (not on tape), aggregated into large repositories of images, and then made available for large-scale machine learning analytics.
Images may be stored in their raw form, but metadata is often added at the source. In addition, some processing may be done at the source to maximize “signal-to-noise” ratios.