Your Data Literacy Depends on Understanding the Types of Data and How They’re Captured

The data-related concepts non-technical people need to understand fall into five buckets: (i) data generation, collection and storage, (ii) what data looks and feels like to data scientists and analysts, (iii) statistics intuition and common statistical pitfalls, (iv) model building, machine learning and AI, and (v) the ethics of data, big and small. The first two are easily overlooked. How data is captured depends on the use case, and data scientists mostly encounter it in one of three forms: (i) tabular data (that is, data in a table, like a spreadsheet), (ii) image data or (iii) unstructured data, such as natural language text or HTML code. Unstructured data makes up the majority of the world's data.
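To make the three forms concrete, here is a minimal Python sketch with made-up values: a small table stored as a list of rows, a tiny grayscale image stored as a grid of pixel intensities, and a sentence of free text. All names and numbers are hypothetical and chosen only to show the shape of each kind of data.

```python
# A minimal sketch of the three forms of data. All values are made up.

# (i) Tabular data: rows with the same named columns, like a spreadsheet.
table = [
    {"name": "Ada", "age": 36, "city": "London"},
    {"name": "Grace", "age": 45, "city": "New York"},
]

# (ii) Image data: a grid of pixel intensities
# (a tiny 3x3 grayscale image, 0 = black, 255 = white).
image = [
    [0, 128, 255],
    [64, 192, 32],
    [255, 0, 128],
]

# (iii) Unstructured data: free text with no fixed rows or columns.
review = "Great phone, but the battery barely lasts a day."


def describe(table, image, text):
    """Summarize the shape of each kind of data."""
    n_rows = len(table)
    n_cols = len(table[0]) if table else 0
    height = len(image)
    width = len(image[0]) if image else 0
    n_words = len(text.split())  # text has no columns, only a sequence of words
    return {
        "table": (n_rows, n_cols),
        "image": (height, width),
        "text_words": n_words,
    }


print(describe(table, image, review))
```

The table and the image both have a fixed grid shape, while the text only has a length; that difference is one reason unstructured data usually needs extra processing before it can be analyzed.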

The ability to understand and communicate about data is an increasingly important skill for the 21st-century citizen, for three reasons. First, data science and AI are affecting many industries globally, from healthcare and government to agriculture and finance. Second, much of the news is reported through the lenses of data and predictive models. And third, so much of our personal data is being used to define how we interact with the world.

When so much data is informing decisions across so many industries, you need a basic understanding of the data ecosystem in order to be part of the conversation. On top of this, the industry you work in is more likely than not to feel the impact of data analytics. Even if you don't work directly with data yourself, this form of literacy will allow you to ask the right questions and take part in discussions at work.

To take just one striking example, imagine if there had been more discussion of how to interpret probabilistic models in the run-up to the 2016 U.S. presidential election. FiveThirtyEight, the data journalism publication, published a probabilistic forecast that gave Trump a real, if minority, chance of winning. As Allen Downey, Professor of Computer Science at Olin College, points out, fewer people would have been shocked by the result had they been reminded that, according to FiveThirtyEight's model, Trump winning was a bit more likely than flipping two coins and getting two heads – hardly something that's impossible to imagine.
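Downey's coin-flip comparison rests on simple arithmetic: two independent fair flips both land heads with probability 1/2 × 1/2 = 1/4, or 25%. The Python sketch below is only an illustration of that arithmetic, not FiveThirtyEight's model; it computes the probability exactly and then checks it by simulating many pairs of flips (the trial count and seed are arbitrary choices).

```python
import random


def two_heads_exact():
    """Exact probability that two fair, independent coin flips both land heads."""
    p_heads = 0.5
    return p_heads * p_heads  # 1/2 * 1/2 = 1/4


def two_heads_simulated(trials=100_000, seed=0):
    """Estimate the same probability by flipping two virtual coins many times."""
    rng = random.Random(seed)  # seeded so the result is reproducible
    hits = 0
    for _ in range(trials):
        first = rng.random() < 0.5   # True means heads
        second = rng.random() < 0.5
        if first and second:
            hits += 1
    return hits / trials


print(f"exact:     {two_heads_exact():.3f}")
print(f"simulated: {two_heads_simulated():.3f}")
```

Try different seeds and the simulated share stays close to one in four: an outcome with that probability is unlikely on any single try, but it happens regularly.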

The data-related concepts non-technical people need to understand fall into five buckets: (i) data generation, collection and storage, (ii) what data looks and feels like to data scientists and analysts, (iii) statistics intuition and common statistical pitfalls, (iv) model building, machine learning and AI, and (v) the ethics of data, big and small.

The first four buckets roughly correspond to key steps in the data science hierarchy of needs, as recently proposed by Monica Rogati. Although it has not yet been formally incorporated into data science workflows, I have added data ethics as the fifth key concept because ethics needs to be part of any conversation about data. So many people's lives, after all, are increasingly affected by the data they produce and the algorithms that use them. This article will focus on the first two; I'll leave the other three for a future article.

Every time you engage with the Internet, whether via web browser or mobile app, your activity is detected and most often stored. To get a feel for some of what your basic web browser can detect, check out , a project that opens a window into the extent of passive data collection online. If you are more adventurous, you can install , which “collect[s] the same information you provide to Facebook, while still respecting your privacy.”
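Some of this passive collection requires no special tracking at all: a browser volunteers information with every page request. The Python sketch below parses a made-up HTTP request (the URL, headers and values are hypothetical) to show fields a website can log without the user typing anything, such as browser and operating system (User-Agent), preferred language (Accept-Language) and the page the visitor came from (Referer).

```python
# Hypothetical raw HTTP request, similar in form to what a browser sends when loading a page.
RAW_REQUEST = (
    "GET /article HTTP/1.1\r\n"
    "Host: example.com\r\n"
    "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64)\r\n"
    "Accept-Language: en-US,en;q=0.9\r\n"
    "Referer: https://www.example.org/search?q=data+literacy\r\n"
    "\r\n"
)


def passive_fields(raw):
    """Split a raw request into its request line and a dict of headers."""
    lines = raw.split("\r\n")
    request_line = lines[0]
    headers = {}
    for line in lines[1:]:
        if not line:  # a blank line marks the end of the headers
            break
        name, _, value = line.partition(":")  # split at the first colon only
        headers[name.strip().lower()] = value.strip()
    return request_line, headers


request_line, headers = passive_fields(RAW_REQUEST)
for field in ("user-agent", "accept-language", "referer"):
    print(f"{field}: {headers.get(field)}")
```

None of these fields had to be entered by the visitor; the browser sent them automatically, and a server is free to store them alongside the time of the visit.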

The collection of data isn't confined to laptop, smartphone and tablet interactions. It extends to the far wider Internet of Things (IoT), a catch-all for traditionally dumb objects, such as radios and lights, that can be made smart by connecting them to the Internet, along with other data-collecting devices, such as fitness trackers, Amazon Echo and self-driving cars.

All of this collected data is stored in what we colloquially call “the cloud,” and it's important to clarify what this term means.
