When To Build vs. Buy Data Pipelines

Deciding whether to build or buy new software is a challenge every engineer has to deal with. In the world of data engineering, building data pipelines in house used to be a common choice because it only required a few scripts to pipe your data into your data warehouse or data lake. But this is changing rapidly.
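To make "a few scripts" concrete, here is a minimal sketch of what such an in-house pipeline might look like. It is an illustration only: the `orders` table, the sample CSV, and the function names are hypothetical, and an in-memory SQLite database stands in for a real data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical source export; in practice this would come from a file or an API.
SOURCE_CSV = """order_id,customer,amount
1,acme,19.99
2,globex,5.00
3,acme,42.50
"""


def extract(raw: str) -> list[dict]:
    """Parse the raw CSV export into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(raw)))


def transform(rows: list[dict]) -> list[tuple]:
    """Cast each field to the type the target table expects."""
    return [(int(r["order_id"]), r["customer"], float(r["amount"])) for r in rows]


def load(conn: sqlite3.Connection, rows: list[tuple]) -> int:
    """Create the target table if needed, upsert the rows, return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    # INSERT OR REPLACE keeps the load idempotent when the script is re-run.
    conn.executemany("INSERT OR REPLACE INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]


def run_pipeline(conn: sqlite3.Connection) -> int:
    """Run extract, transform, and load in sequence."""
    return load(conn, transform(extract(SOURCE_CSV)))


# SQLite in memory is an assumption standing in for the warehouse.
conn = sqlite3.connect(":memory:")
loaded = run_pipeline(conn)
```

A script like this is easy to write for one stable source. The effort grows once scheduling, retries, monitoring, and many changing sources enter the picture.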

As data engineers, we now have to handle dozens of constantly changing data sources, and with the rise of real-time use cases, latency matters more than ever. There are many approaches we can take in this new world to develop data infrastructure. If we choose to build our own data pipelines, we end up with data integration systems that are hand-crafted by multiple engineers over a long period of time, each adding their own special spin to the code base. In the end, most of these data pipeline systems look very similar to a framework that already exists, like Airflow.

This is because, at the end of the day, most pipeline systems require several key components:

As engineers, we do have a tendency to approach most of our problems as build vs. buy. However, we don’t always weigh the opportunity costs, and sometimes building is not the best option. It’s dependent upon overall company goals and where our company is in its analytical maturity cycle. In this article, we will discuss the build vs. buy decision when it comes to event streaming and ETL/ELT pipelines to help your team make the right choice for its next data infrastructure component.

Building data infrastructure is a long process, and maintaining it is time-consuming. Even small requests can become arduous to take on. This is amplified if your company works with dozens of data sources, requiring you to maintain all the connectors as the underlying APIs and sources change.

In addition, we are often bombarded with ad hoc requests from other teams while maintaining the current code base. You know the feeling: it’s like death by 1,000 cuts. It keeps your urgent and important quadrant full of redundant and uninspiring work, and it keeps you from other, more strategically important priorities.

The key takeaway here is that constant maintenance and ad hoc requests can significantly slow down real business impact and introduce scalability challenges, so buying solutions or using managed services can be a good choice for many teams.

There are always trade-offs between build and buy. Let’s start by talking about some benefits of buying solutions.

Quick turnaround - Bought solutions often meet the majority of a company's use cases quickly. After the sales cycle, the only time required is implementation, so your team can start rolling out the new tooling as soon as it is purchased. Often, you’ll have a head start because you’ve already tested the tool via a free offering or trial.

Less maintenance - Maintenance cost is an open secret. All solutions, built or bought, have maintenance costs. The difference is who pays this cost. When you buy a solution, the solution provider shoulders the burden of maintenance and technical debt, distributing these costs over its whole customer base. This frees your team to spend time working on ways to add value rather than running on the hamster wheel of maintenance.

Don’t need to keep up with APIs (in the case of connectors) - Keeping up with connector changes is a big (and really annoying) time suck as a data engineer. This is somewhat connected to maintenance. However, rebuilding connectors is such a staple of many data engineers' work that it deserves its own point. Many tools provide connectors out of the box, shifting the burden of keeping up with connectors from the company to the solution provider.
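As a rough illustration of why connector upkeep is a recurring cost, the sketch below shows a hand-built connector that parses a source payload and fails loudly when an upstream field is renamed. Everything here is hypothetical: the `Customer` record, the field names, and the `SchemaDriftError` exception are invented for the example and do not describe any particular API or tool.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    id: int
    email: str


class SchemaDriftError(Exception):
    """Raised when a source payload no longer matches what the connector expects."""


# Field names the (hypothetical) connector was originally written against.
EXPECTED_FIELDS = {"id", "email"}


def parse_customer(payload: dict) -> Customer:
    """Map one source payload to a Customer, rejecting payloads with missing fields."""
    missing = EXPECTED_FIELDS - payload.keys()
    if missing:
        raise SchemaDriftError(f"source payload is missing fields: {sorted(missing)}")
    return Customer(id=int(payload["id"]), email=payload["email"])


# Payload shape the connector was built for.
v1_payload = {"id": "7", "email": "a@example.com"}
# Same record after an assumed upstream change renamed the email field.
v2_payload = {"id": "7", "email_address": "a@example.com"}

customer = parse_customer(v1_payload)

try:
    parse_customer(v2_payload)
    drift_detected = False
except SchemaDriftError as exc:
    drift_detected = True
    drift_message = str(exc)
```

Every source needs its own version of this mapping, and each upstream rename means someone has to notice the failure and patch the connector. With a bought tool, that work typically falls to the provider.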

New features don’t need to be built by you - Buying a solution removes the need for your company to keep improving the tool itself. Instead, optimization and new feature development are in the hands of the solution provider.
