Your Data Literacy Depends on Understanding the Types of Data and How They’re Captured
- by 7wData
The data-related concepts non-technical people need to understand fall into five buckets: (i) data generation, collection and storage, (ii) what data looks and feels like to data scientists and analysts, (iii) statistics intuition and common statistical pitfalls, (iv) model building, machine learning and AI, and (v) the ethics of data, big and small. The first two are easily overlooked. The capture of data depends on the use case. Data scientists mostly encounter data in one of three forms: (i) tabular data (that is, data in a table, like a spreadsheet), (ii) image data or (iii) unstructured data, such as natural language text or html code, which makes up the majority of the world’s data.
The ability to understand and communicate about data is an increasingly important skill for the 21st-century citizen, for three reasons. First, data science and AI are affecting many industries globally, from healthcare and government to agriculture and finance. Second, much of the news is reported through the lenses of data and predictive models. And third, so much of our personal data is being used to define how we interact with the world.
When so much data is informing decisions across so many industries, you need to have a basic understanding of the data ecosystem in order to be part of the conversation. On top of this, the industry that youwork in will more likely than not see the impact of data analytics. Even if you yourself don’t work directly with data, having this form of literacy will allow you to ask the right questions and be part of the conversation at work.
To take just one striking example, imagine if there had been a discussion around how to interpret probabilistic models in the run up to the 2016 U.S. presidential election. FiveThirtyEight, the data journalism publication, . As Allen Downey, Professor of Computer Science at Olin College, points out, fewer people would have been shocked by the result had they been reminded that, Trump winning, according to FiveThirtyEight’s model, was a bit more likely than flipping two coins and getting two heads – hardly something that’s impossible to imagine.
The data-related concepts non-technical people need to understand fall into five buckets: (i) data generation, collection and storage, (ii) what data looks and feels like to data scientists and analysts, (iii) statistics intuition and common statistical pitfalls, (iv) model building, machine learning and AI, and (v) the ethics of data, big and small.
The first four buckets roughly correspond to key steps in the data science hierarchy of needs, as recently proposed by Monica Rogati. Although it has not yet been formally incorporated into data science workflows, I have added data ethics as the fifth key concept because ethics needs to be part of any conversation about data. So many people’s lives, after all, are increasingly affected by the data they produce and the algorithms that use them. This article will focus the first two; I’ll leave the other three for a future article.
Every time you engage with the Internet, whether via web browser or mobile app, your activity is detected and most often stored. To get a feel for some of what your basic web browser can detect, check out , a project that opens a window into the extent of passive data collection online. If you are more adventurous, you can install , which “collect[s] the same information you provide to Facebook, while still respecting your privacy.”
The collection of data isn’t relegated to merely the world of laptop, smartphone and tablet interactions but the far wider Internet of Things (IoT), a catch-all for traditionally dumb objects, such as radios and lights, that can be smartified by connecting them to the Internet, along with any other data-collecting devices, such as fitness trackers, Amazon Echo and self-driving cars.
All the collected data is stored in what we colloquially refer to as “the cloud” and it’s important to clarify what’s meant by this term.
[Social9_Share class=”s9-widget-wrapper”]
Upcoming Events
From Text to Value: Pairing Text Analytics and Generative AI
21 May 2024
5 PM CET – 6 PM CET
Read More