Pivotal Greenplum: Innovation in Data Management for Analytics
- by 7wData
Enterprises that want to mine new data sources and types—like text, geospatial, graph, and machine-generated data—are confronted with a growing number of proprietary and open-source data management systems that address an ever-expanding number of use cases.
Choice is usually a good thing, but in this case, it has downsides. Users are wary of being locked in by proprietary systems, and teams become exhausted from constantly having to find the right system for the newest use case. The proliferation of data management software leads to an environment that is underutilized and poorly optimized. While many enterprises benefit from cloud processing for ease of self-service, public cloud service providers are rapidly becoming another source of lock-in. What’s needed is a data management system that is based on the contributions of a large community, not the directives of a single vendor, and that can be deployed wherever the business needs it, not in just one environment.
As users search for an alternative, there’s been a noticeable resurgence of interest in Postgres as the go-to solution for managing data in both operational and analytical contexts. It’s now the fourth most popular data management system, according to DBEngines.com.
Why the renewed interest in Postgres?
Postgres preserves the benefits of solid relational theory, like the performance of mature optimizers for efficient, fast querying—something not achieved simply by grafting SQL on a distributed key-value store. That said, there are useful features of modern databases that users would love to have in their analytics projects. Near the top of many wish lists are:
Is there a way to help users find an open-source escape from proprietary software while avoiding the treadmill of niche data management systems?
For 15 years, Pivotal has developed Greenplum, the best massively parallel processing version of Postgres for BI, analytics, and machine learning at scale. We’re now combining that experience with our market-leading experience in platforms, tools, and methodologies for application transformation, to make Pivotal Greenplum a first-class citizen in a modern application setting.
In the latest version of Greenplum, we’ve made sizable technology investments in the areas of transaction processing and support for data streams. We’re also announcing Greenplum for Kubernetes, for deploying Greenplum with this increasingly popular container orchestration system. Finally, we have made important contributions to Apache MADlib, significantly expanding the analytical capabilities of Greenplum.
Greenplum increasingly blurs the line between transactional and analytical databases, which have traditionally been separate and distinct. With improved transaction processing capability and support for streaming ingest, Greenplum can address workloads across a spectrum of operational and analytic contexts, from business intelligence to deep learning.
Greenplum combines fast analytic reads with higher performance for low-latency writes. For some workloads, this translates to up to a 50x performance improvement over Greenplum 5. Because of this, users can consolidate a diverse array of applications in one environment (for example, point queries, data science exploration, fast event processing, and long-running reporting queries), all with greater scale and concurrency.
When these performance improvements are combined with our new Confluent-certified Kafka connector, Greenplum is also better positioned to address a variety of sensor-driven workloads characteristic of IoT applications.
Greenplum is also smarter about how it processes data. With replicated tables, a full copy of each dimension table is stored on every segment. Facts can then be joined with dimensions locally, which reduces the need to move data across the cluster and improves speed.
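To make the idea concrete, here is a minimal Python sketch of why a replicated dimension avoids data movement. It is a toy simulation, not Greenplum code: the four-segment cluster, the modulo "hash", the sample rows, and every function name are assumptions made for illustration. In Greenplum 6 itself, replication is requested with a DISTRIBUTED REPLICATED clause on CREATE TABLE, which this sketch does not use.

```python
from collections import defaultdict

NUM_SEGMENTS = 4  # hypothetical cluster size, chosen for illustration


def distribute(rows, key_index, num_segments=NUM_SEGMENTS):
    """Spread rows across segments by a distribution key (modulo stands in for hashing)."""
    segments = defaultdict(list)
    for row in rows:
        segments[row[key_index] % num_segments].append(row)
    return segments


def replicate(rows, num_segments=NUM_SEGMENTS):
    """Place a full copy of a small table on every segment."""
    return {seg: list(rows) for seg in range(num_segments)}


def join(fact_segments, dim_segments, fk_index):
    """Inner-join facts to dimensions segment by segment.

    Returns the joined rows plus a count of lookups that could not be
    answered from the segment's own copy of the dimension table.
    """
    joined, remote_lookups = [], 0
    for seg, facts in fact_segments.items():
        local_dims = {d[0]: d for d in dim_segments.get(seg, [])}
        for fact in facts:
            dim = local_dims.get(fact[fk_index])
            if dim is None:
                # The dimension row lives on another segment: this models
                # data that would have to travel across the cluster.
                remote_lookups += 1
                dim = next(
                    (d for rows in dim_segments.values() for d in rows
                     if d[0] == fact[fk_index]),
                    None,
                )
            if dim is not None:
                joined.append(fact + dim[1:])
    return joined, remote_lookups


# Sample data: facts are (sale_id, product_id, amount); dimensions are (product_id, name).
facts = [(1, 10, 5.0), (2, 11, 7.5), (3, 10, 2.0), (4, 12, 9.0), (5, 11, 1.0)]
dims = [(10, "widget"), (11, "gadget"), (12, "gizmo")]

fact_segments = distribute(facts, key_index=0)

# Dimension hash-distributed by its own key: most lookups cross segments.
joined_dist, remote_dist = join(fact_segments, distribute(dims, key_index=0), fk_index=1)

# Dimension replicated to every segment: every lookup is local.
joined_repl, remote_repl = join(fact_segments, replicate(dims), fk_index=1)

print("distributed dimension, remote lookups:", remote_dist)
print("replicated dimension, remote lookups:", remote_repl)
```

Both joins return the same rows. The difference is that the replicated layout answers every lookup from the local segment, at the cost of storing the dimension once per segment, which is why the approach suits small dimension tables rather than large fact tables.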