How to Think Like a Data Scientist in 12 Steps
- by 7wData
At the moment, data scientists are getting a lot of attention, and as a result, books about data science are proliferating. While searching for good books about the field, I found that most of them focus on tools and techniques rather than on the nuanced, problem-solving nature of the data science process. That is, until I encountered Brian Godsey’s “Think Like a Data Scientist”, which attempts to lead aspiring data scientists through the process as a path with many forks and potentially unknown destinations. It discusses which tools might be most useful, and why, but its main objective is to navigate that path (the data science process) intelligently, efficiently, and successfully, to arrive at practical solutions to real-life, data-centric problems.
Lifecycle of a data science project
In the book, Brian proposes that a data science project consists of 3 phases:
The 1st phase is preparation — time and effort spent gathering information at the beginning of a project can spare big headaches later.
The 2nd phase is building the product, from planning through execution, using what you learned during the preparation phase and all the tools that statistics and software can provide.
The 3rd and final phase is finishing — delivering the product, getting feedback, making revisions, supporting the product, and wrapping up the project.
As you can see from the image, these 3 phases encompass 12 different tasks. I’d like to use this post to summarize these 12 steps, as I believe any aspiring data scientist can benefit from being familiar with them.
Phase I — Preparing
The process of data science begins with preparation. You need to establish what you know, what you have, what you can get, where you are, and where you would like to be. This last one is of utmost importance; a project in data science needs to have a purpose and corresponding goals. Only when you have well-defined goals can you begin to survey the available resources and all the possibilities for moving toward those goals.
1 — Setting Goals
In a data science project, as in many other fields, the main goals should be set at the beginning of the project. All the work you do after setting goals uses data, statistics, and programming to move toward and achieve those goals.
First off, every project in data science has a customer. Sometimes the customer is someone who pays you or your business to do the project — for example, a client or contracting agency. In academia, the customer might be a laboratory scientist who has asked you to analyze their data. Sometimes the customer is you, your boss, or another colleague. No matter who the customer might be, they have some expectations about what they might receive from you, the data scientist who has been given the project.
To understand those expectations, you need to ask good questions about their data. Asking questions that lead to informative answers, and subsequently to improved results, is an important and nuanced challenge that deserves much more discussion than it typically receives. Good questions are concrete in their assumptions, and good answers deliver measurable success without too much cost. Getting an answer from a data science project usually follows a formula, or recipe, like this one:
good question + relevant data + insightful analysis = answer
Although sometimes one of the ingredients — good question, relevant data, or insightful analysis — is simpler to obtain than the others, all three are crucial to getting a useful answer. The product of any old question, data, and analysis isn’t always an answer, much less a useful one. It’s worth repeating that you always need to be deliberate and thoughtful in every step of a project, and the elements of this formula are not exceptions. For example, if you have a good question but irrelevant data, an answer will be difficult to find.