What is Feature Engineering and Why Does It Need To Be Automated?

by 7wData
April 5, 2020

Artificial intelligence is becoming more ubiquitous and more necessary these days. From fraud prevention and real-time anomaly detection to customer churn prediction, enterprise customers are finding new applications of machine learning (ML) every day. What lies under the hood of ML? How does this technology make predictions, and which secret ingredient makes the AI magic work?

In the data science community, the focus is typically on algorithm selection and model training. Those are indeed important, but the most critical piece of the AI/ML workflow is not how we select or tune algorithms but what we feed into them: feature engineering.

Feature engineering is the holy grail of data science and the most critical step in determining the quality of AI/ML outcomes. Irrespective of the algorithm used, feature engineering drives model performance, governs the ability of machine learning to generate meaningful insights, and, ultimately, determines its ability to solve business problems.

Feature engineering is the process of applying domain knowledge to extract analytical representations from raw data, making it ready for machine learning. It is the first step in developing a machine learning model for prediction.

Feature engineering involves the application of business knowledge, mathematics, and statistics to transform data into a format that can be directly consumed by machine learning models. It starts from many tables spread across disparate databases that are then joined, aggregated, and combined into a single flat table using statistical transformations and/or relational operations.
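As a rough illustration of this join-and-aggregate step, the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The tables (`customers`, `orders`, `tickets`), their columns, and the sample rows are hypothetical, chosen only to show several one-to-many tables being collapsed into one flat row per customer.

```python
import sqlite3

# Hypothetical schema: one customer table plus two one-to-many event tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, plan TEXT);
CREATE TABLE orders    (customer_id INTEGER, amount REAL);
CREATE TABLE tickets   (customer_id INTEGER, opened_on TEXT);

INSERT INTO customers VALUES (1, 'basic'), (2, 'premium'), (3, 'basic');
INSERT INTO orders    VALUES (1, 20.0), (1, 35.0), (2, 120.0);
INSERT INTO tickets   VALUES (1, '2020-03-02'), (1, '2020-03-09'), (3, '2020-03-15');
""")

# Aggregate each event table separately *before* joining, so that joining two
# one-to-many tables does not multiply rows (fan-out) and inflate the counts.
# LEFT JOIN keeps customers with no orders or tickets; COALESCE turns NULL into 0.
FLAT_QUERY = """
SELECT c.customer_id,
       c.plan,
       COALESCE(o.n_orders, 0)    AS n_orders,
       COALESCE(o.total_spend, 0) AS total_spend,
       COALESCE(t.n_tickets, 0)   AS n_tickets
FROM customers AS c
LEFT JOIN (SELECT customer_id, COUNT(*) AS n_orders, SUM(amount) AS total_spend
           FROM orders GROUP BY customer_id) AS o
       ON o.customer_id = c.customer_id
LEFT JOIN (SELECT customer_id, COUNT(*) AS n_tickets
           FROM tickets GROUP BY customer_id) AS t
       ON t.customer_id = c.customer_id
ORDER BY c.customer_id
"""

flat_rows = conn.execute(FLAT_QUERY).fetchall()
for row in flat_rows:
    print(row)
# (1, 'basic', 2, 55.0, 2)
# (2, 'premium', 1, 120.0, 0)
# (3, 'basic', 0, 0, 1)
```

Aggregating in subqueries first is a deliberate choice: joining `orders` and `tickets` directly onto `customers` would pair every order with every ticket for the same customer and overcount both.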

For example, predicting which customers are likely to churn in a given quarter means identifying the customers with the highest probability of no longer doing business with the company. How do you go about making such a prediction? We predict churn by looking at its underlying causes. The process is based on analyzing customer behavior and then forming hypotheses. For example, customer A contacted customer support five times in the last month, suggesting that customer A has complaints and is likely to churn. In another scenario, customer A’s product usage might have dropped by 30% over the previous two months, again suggesting that customer A has a high probability of churning. Looking at historical behavior, extracting hypothesized patterns, and testing those hypotheses: that is the process of feature engineering.
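The two hypotheses in this example can each be turned into a numeric feature. The sketch below is a minimal, stdlib-only version; the record layout (a list of support-contact dates, a dict of daily usage), the 30-day window length, and the function names are all hypothetical choices for illustration.

```python
from datetime import date, timedelta


def support_contacts_last_30_days(contact_dates, as_of):
    """Hypothesis 1: many recent support contacts signal dissatisfaction."""
    window_start = as_of - timedelta(days=30)
    return sum(1 for d in contact_dates if window_start < d <= as_of)


def usage_change_pct(daily_usage, as_of, window_days=30):
    """Hypothesis 2: a drop in product usage signals churn risk.

    Compares total usage in the most recent window with the window just
    before it and returns the percentage change (negative means a drop).
    """
    recent_start = as_of - timedelta(days=window_days)
    prior_start = recent_start - timedelta(days=window_days)
    recent = sum(v for d, v in daily_usage.items() if recent_start < d <= as_of)
    prior = sum(v for d, v in daily_usage.items() if prior_start < d <= recent_start)
    if prior == 0:
        return None  # undefined: no baseline usage to compare against
    return 100.0 * (recent - prior) / prior


as_of = date(2020, 4, 1)
contacts = [date(2020, 3, d) for d in (3, 9, 15, 22, 28)]
usage = {date(2020, 2, 15): 100, date(2020, 3, 15): 70}  # minutes of use per day

print(support_contacts_last_30_days(contacts, as_of))  # 5
print(usage_change_pct(usage, as_of))                   # -30.0
```

Returning `None` when there is no prior usage keeps "no baseline" distinct from "no change"; how to impute that case before modeling is a separate decision.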

Feature engineering is about extracting business hypotheses from historical data. A business problem that involves predicting a yes/no outcome, such as whether a customer will churn, is a classification problem.

There are several ML algorithms you can use here, including classical logistic regression, decision trees, support vector machines, boosting, and neural networks. All of these algorithms require a single flat matrix as input, but raw business data is stored in disparate tables (e.g., transactional, temporal, and geo-locational data) with complex relationships.
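Because these algorithms expect one flat numeric matrix, per-customer features eventually have to be laid out with a fixed column order, and categorical values need a numeric encoding. Here is a minimal sketch in plain Python, assuming hypothetical per-customer feature dictionaries and a hypothetical `plan` categorical field that is one-hot encoded.

```python
def to_flat_matrix(records, numeric_cols, categorical_col):
    """Turn per-entity feature dicts into (column_names, matrix).

    Numeric columns are copied as floats; the categorical column is one-hot
    encoded into one 0/1 indicator column per distinct value.
    """
    # Sort the categories so the column order is deterministic across runs.
    categories = sorted({r[categorical_col] for r in records})
    columns = list(numeric_cols) + [f"{categorical_col}={c}" for c in categories]
    matrix = []
    for r in records:
        row = [float(r[col]) for col in numeric_cols]
        row += [1.0 if r[categorical_col] == c else 0.0 for c in categories]
        matrix.append(row)
    return columns, matrix


records = [
    {"n_tickets": 5, "usage_change_pct": -30.0, "plan": "basic"},
    {"n_tickets": 0, "usage_change_pct": 4.0, "plan": "premium"},
]
columns, X = to_flat_matrix(records, ["n_tickets", "usage_change_pct"], "plan")
print(columns)  # ['n_tickets', 'usage_change_pct', 'plan=basic', 'plan=premium']
print(X)        # [[5.0, -30.0, 1.0, 0.0], [0.0, 4.0, 0.0, 1.0]]
```

In practice the category list would be learned once on training data and reused when scoring new customers, so that the matrix always has the same columns in the same order.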

For example, we may first join two tables and then perform temporal aggregation on the joined table to extract patterns in user behavior over time. Practical feature engineering (FE) is far more complicated than simple transformation exercises such as one-hot encoding, which transforms categorical values into binary indicators so that ML algorithms can use them. To implement FE, we write hundreds or even thousands of SQL-like queries and perform extensive data manipulation as well as a multitude of statistical transformations.
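The join-then-aggregate pattern can be sketched with Python's built-in `sqlite3` module: usage events are joined to a customer table, and the joined rows are bucketed by calendar month to produce a per-customer activity time series. The table names, columns, and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers    (customer_id INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE usage_events (customer_id INTEGER, event_date TEXT, minutes REAL);

INSERT INTO customers VALUES (1, 'EU'), (2, 'US');
INSERT INTO usage_events VALUES
  (1, '2020-01-05', 60), (1, '2020-01-20', 40),
  (1, '2020-02-11', 50), (1, '2020-03-03', 20),
  (2, '2020-01-07', 30), (2, '2020-03-15', 90);
""")

# Step 1: join the events to customer attributes.
# Step 2: bucket the joined rows by calendar month and aggregate per customer.
MONTHLY_QUERY = """
SELECT c.customer_id,
       c.region,
       strftime('%Y-%m', e.event_date) AS month,
       SUM(e.minutes)                  AS total_minutes,
       COUNT(*)                        AS n_sessions
FROM usage_events AS e
JOIN customers AS c ON c.customer_id = e.customer_id
GROUP BY c.customer_id, c.region, strftime('%Y-%m', e.event_date)
ORDER BY c.customer_id, month
"""

monthly = conn.execute(MONTHLY_QUERY).fetchall()
for row in monthly:
    print(row)
```

Note that a month with no activity (customer 2 in February) produces no row at all; a real pipeline would typically fill such gaps with zeros before computing trends such as month-over-month change.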

In the machine learning context, if we know the historical pattern, we can form a hypothesis. Based on that hypothesis, we can predict the likely outcome, such as which customers are likely to churn in a given time period. FE is all about finding the optimal combination of hypotheses.

Feature engineering is critical because if we provide the wrong hypotheses as input, ML cannot make accurate predictions. The quality of each hypothesis we provide is vital to the success of an ML model.
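One simple way to sanity-check a single hypothesis before handing it to a model is to compare historical churn rates for customers who match it against those who do not. The sketch below does this on a small, made-up history; the field names and the "five or more support contacts" threshold are hypothetical, and a univariate check like this is only a rough first filter, not a replacement for validating the full model on held-out data.

```python
def churn_rate_by_flag(customers, flag):
    """Observed churn rate for customers where flag(c) is True vs False."""
    groups = {True: [], False: []}
    for c in customers:
        groups[bool(flag(c))].append(c["churned"])
    # sum() over booleans counts the churned customers in each group.
    return {k: (sum(v) / len(v) if v else None) for k, v in groups.items()}


# Hypothetical labelled history: recent support contacts and the churn outcome.
history = [
    {"support_contacts_30d": 6, "churned": True},
    {"support_contacts_30d": 5, "churned": True},
    {"support_contacts_30d": 7, "churned": False},
    {"support_contacts_30d": 1, "churned": False},
    {"support_contacts_30d": 0, "churned": False},
    {"support_contacts_30d": 2, "churned": True},
]

rates = churn_rate_by_flag(history, lambda c: c["support_contacts_30d"] >= 5)
print(rates)  # churn rate with the flag vs without it
```

If the two rates are close, the hypothesis carries little signal on its own; a clear gap suggests it is worth keeping as a candidate feature.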
