How to Think Like a Data Scientist in 12 Steps

How to Think Like a Data Scientist in 12 Steps

At the moment, data scientists are getting a lot of attention, and as a result, books about data science are proliferating. While searching for good books about the space, it seems to me that the majority of them focus more on the tools and techniques rather than the nuanced problem-solving nature of the data science process. That is until I encountered Brian Godsey’s “ Think Like a Data Scientist ” — which attempts to lead aspiring data scientists through the process as a path with many forks and potentially unknown destinations. It discusses what tools might be the most useful, and why, but the main objective is to navigate the path — the data science process — intelligently, efficiently, and successfully, to arrive at practical solutions to real-life data-centric problems.

Lifecycle of a data science project
In the book, Brian proposes that a data science project consists of 3 phases:

The 1st phase is preparation — time and effort spent gathering information at the beginning of a project can spare big headaches later.

The 2nd phase is building the product, from planning through execution, using what you learned during the preparation phase and all the tools that statistics and software can provide.

The 3rd and final phase is finishing — delivering the product, getting feedback, making revisions, supporting the product, and wrapping up the project.

As you can see from the image, these 3 phases encompass 12 different tasks. I’d like to use this post to summarize these 12 steps as I believe any aspiring data scientists can benefit from being familiar with them.

Phase I — Preparing
The process of data science begins with preparation. You need to establish what you know, what you have, what you can get, where you are, and where you would like to be. This last one is of utmost importance; a project in data science needs to have a purpose and corresponding goals. Only when you have well-defined goals can you begin to survey the available resources and all the possibilities for moving toward those goals.
1 — Setting Goals
In a data science project, as in many other fields, the main goals should be set at the beginning of the project. All the work you do after setting goals is making use of data, statistics, and programming to move toward and achieve those goals.

First off, every project in data science has a customer. Sometimes the customer is someone who pays you or your business to do the project — for example, a client or contracting agency. In academia, the customer might be a laboratory scientist who has asked you to analyze their data. Sometimes the customer is you, your boss, or another colleague. No matter who the customer might be, they have some expectations about what they might receive from you, the data scientist who has been given the project.
In order to understand such expectations, you need to ask good questions about their data. Asking questions that lead to informative answers and subsequently improved results is an important and nuanced challenge that deserves much more discussion than it typically receives. Good questions are concrete in their assumptions, and good answers are measurable success without too much cost. Getting an answer from a project in data science usually looks something like the formula, or recipe, below.
Although sometimes one of the ingredients — good question, relevant data, or insightful analysis — is simpler to obtain than the others, all three are crucial to getting a useful answer. The product of any old question, data, and analysis isn’t always an answer, much less a useful one. It’s worth repeating that you always need to be deliberate and thoughtful in every step of a project, and the elements of this formula are not exceptions. For example, if you have a good question but irrelevant data, an answer will be difficult to find.

Share it:
Share it:

[Social9_Share class=”s9-widget-wrapper”]

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

You Might Be Interested In

Mariana Uses Artificial Intelligence to Build Personas and Find Target Audiences

24 Mar, 2016

Mariana Uses Artificial Intelligence to Build Personas and Find Target Audiences Can computers understand an individual human’s personality (and then, …

Read more

Amazon Echo will bring artificial intelligence into our lives much sooner than expected

3 Oct, 2016

What’s all the fuss about the voice-activated home speaker that Amazon is due to release in the UK and Germany …

Read more

How the Internet of Things is transforming manufacturing

8 Sep, 2016

The Industrial Revolution marked a major shift in how products were manufactured. This period marked an era of unprecedented growth, …

Read more

Recent Jobs

Senior Cloud Engineer (AWS, Snowflake)

Remote (United States (Nationwide))

9 May, 2024

Read More

IT Engineer

Washington D.C., DC, USA

1 May, 2024

Read More

Data Engineer

Washington D.C., DC, USA

1 May, 2024

Read More

Applications Developer

Washington D.C., DC, USA

1 May, 2024

Read More

Do You Want to Share Your Story?

Bring your insights on Data, Visualization, Innovation or Business Agility to our community. Let them learn from your experience.

Get the 3 STEPS

To Drive Analytics Adoption
And manage change

3-steps-to-drive-analytics-adoption

Get Access to Event Discounts

Switch your 7wData account from Subscriber to Event Discount Member by clicking the button below and get access to event discounts. Learn & Grow together with us in a more profitable way!

Get Access to Event Discounts

Create a 7wData account and get access to event discounts. Learn & Grow together with us in a more profitable way!

Don't miss Out!

Stay in touch and receive in depth articles, guides, news & commentary of all things data.