Data Management 2021 • By Yves Mulkers

5 Things Every Data Engineer Needs To Know About Data Observability

4 min read

Automation, Big Data, Collaboration

Curated from montecarlodata.com →

As a new or aspiring data engineer, there are someessential technologies and frameworksyou should know. How to build a data pipeline? Check. How to clean, transform, and model your data? Check. How to prevent broken data workflows before you get that frantic call from your CEO about her missing data? Maybe not.

By leveraging best practices from our friends in software engineering and developer operations (DevOps), we can think more strategically about tackling the “good pipelines, bad data” problem. For many, this approach incorporates observability, too.

Jesse Anderson, managing director of Big Data Institute and author of Data Engineering Teams: Creating Successful Big Data Teams and Products, andBarr Moses, co-founder and CEO of Monte Carlo, share everything you need to know to get started with this next layer of the data stack.

Data engineering is often called the “plumbing of data science” — usually, referring to the way data engineers make sure all the pipelines and workflows are functioning properly, with the right data flowing in the right directions to the right stakeholders. But most data engineers I talk to also relate to plumbers in one very specific way: you only call them when something goes wrong.

The late-night email from your VP — I need the latest numbers for my board presentation tomorrow morning, and my Looker dashboard is broken.

The early-morning phone call from a data scientist — the data set they’re consuming for a model isn’t working right anymore.

The mid-meeting Slack from a marketing lead — my campaign ROI is out of whack this month. I think something’s wrong with the attribution data.

The message that never comes — the data in this report is perfect. Keep up the good work!

OK, hopefully your company does recognize and appreciate a job consistently well done, but the truth remains: too many data engineers spend too much time fighting fires, troubleshooting issues, and trying to patch burst pipelines.

One way to get out of the vicious late-night-email cycle? Data observability.

Data observability is a new layer in the modern data tech stack, providing data teams with visibility, automation, and alerting into broken data (i.e., data drift, duplicate values, broken dashboards… you get the idea). Frequently, observability leads to faster resolution when issues occur, and can even help prevent downtime from impacting data consumers in the first place.

Beyond its obvious benefit — healthier data! — data observability can also build trust and foster a data-driven culture across your entire organization. When observability tooling and frameworks are made available to data consumers as well as engineers and data scientists, they can more fully understand where data is coming from and how it’s being used, as well as get real-time insight into the status of known issues. This added transparency leads to better communication, more efficient collaboration, and more trust in data.

And with data observability tooling in place, engineers can reclaim valuable time that was previously spent fire-fighting and responding to data emergencies. For example, the data engineering team at Blinkist found that automated monitoring saved up to 20 hours per engineer each week. Those valuable hours can now be spent on innovation and problem-solving–not wrangling data gone wrong.

All this talk of observability, downtime, monitoring, and alerting likely sounds familiar to anyone with experience in software engineering. That’s because the parallels are intentional: the concept of data observability was inspired by DevOps, following the principles and best practices that software engineers developed over the last 20 years to prevent application downtime.

Just like in DevOps, data observability leverages a blanket of diligence for data, flipping the script from ad-hoc troubleshooting to proactive automation of monitoring, alerting, and triaging.

Yves Mulkers

Yves Mulkers is the founder of 7wData and a widely followed voice in the data and AI community. He curates the 7wData and AI Beat newsletters, reaching hundreds of thousands of data and AI professionals, and writes on data strategy, analytics, AI, and the evolving data ecosystem.

Get the AI & data signal, daily.

Continue Reading

Yves Mulkers

Related Articles

The New Era of Real Estate with Location Intelligence

The Eight Attributes of Bottom-Up Innovation Leaders

Data lineage: Making artificial intelligence smarter