Data Preparation and Data Wrangling Best Practices – Part 1

by 7wData
April 11, 2018

Rekha Sree is a Customer Success Architect who uses her expertise in data integration, data warehousing, and big data to help drive customer success at Talend. Prior to joining Talend, Rekha worked at Target Corporation India Pvt Ltd for more than a decade, building their enterprise and analytical data warehouse.

Talend Data Preparation Cloud is a self-service application that enables information workers to cut hours out of their workday by simplifying and speeding up the time-consuming process of preparing data for analysis or other data-driven tasks. If you are brand new to data preparation, take some time to go through my earlier post, "An Introduction to Data Preparation", to get the basics and learn how a self-service data preparation tool can come in handy. In this post, I want to highlight some best practices that I've come across as I've worked with Talend Data Preparation. So, without further delay, let's jump into the topic.

A “best practice” for naming conventions really depends on the person or organization. However, following a consistent naming convention every time makes it significantly easier for subsequent users of the data to understand what the system is doing and how to fix or extend the source code for new business needs. In my experience, the best practice is primarily to follow the naming standards agreed upon for folders. Here are a few suggestions to consider when coming up with naming conventions:

Typically, preparations and datasets are tied to a specific project. Hence the naming conventions for preparations and datasets could be set either globally at the organization level or at the project level. You should do your best to ensure that the naming conventions are strictly followed. Here are a few tips from my own experience:
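One way to keep a convention "strictly followed" is to check names automatically. The sketch below is only an illustration: the pattern (a short uppercase project code, then PREP or DS, then a lowercase description) and the function names are assumptions made for this example, not a Talend standard or API. Swap in whatever convention your organization or project has agreed on.

```python
import re

# Hypothetical convention, assumed for illustration only:
#   <PROJECT>_<PREP|DS>_<lowercase_description>
# e.g. "FIN_PREP_customer_cleanup" or "FIN_DS_transactions_2018"
NAME_PATTERN = re.compile(r"[A-Z]{2,5}_(?:PREP|DS)_[a-z0-9]+(?:_[a-z0-9]+)*")


def check_name(name: str) -> bool:
    """Return True if the name follows the assumed convention."""
    return NAME_PATTERN.fullmatch(name) is not None


def find_violations(names):
    """Return the names that do not follow the assumed convention."""
    return [name for name in names if not check_name(name)]


if __name__ == "__main__":
    candidates = ["FIN_PREP_customer_cleanup", "my prep 2", "FIN_DS_transactions_2018"]
    print(find_violations(candidates))
```

A check like this could run as part of a periodic review of preparation and dataset names, so that deviations are caught early rather than after many items have been created.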

Now, let's talk about context variables. Context variables are user-defined variables provided by Talend whose values can be changed at runtime. Supplying the values at runtime allows the same job to be executed in different ways with different parameters. Context variables should also follow standard naming conventions. Here are a couple more suggestions around context variables:
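To make the idea concrete, here is a minimal sketch in plain Python of how runtime values can change what a job does. It does not use Talend's own API: the variable names, the ctx_ prefix, and the functions resolve_context and run_job are all hypothetical and exist only for this illustration.

```python
# Default values, as they might be defined at design time.
# The "ctx_" prefix is an assumed naming convention for this sketch.
DEFAULT_CONTEXT = {
    "ctx_env": "dev",
    "ctx_input_dir": "/data/dev/in",
    "ctx_batch_size": "500",
}


def resolve_context(defaults, overrides):
    """Merge runtime overrides into the defaults, rejecting unknown names."""
    unknown = set(overrides) - set(defaults)
    if unknown:
        raise KeyError(f"Unknown context variables: {sorted(unknown)}")
    # Build a new dict so the defaults themselves are never modified.
    return {**defaults, **overrides}


def run_job(context):
    """Stand-in for a job: describes what it would do with this context."""
    return (
        f"[{context['ctx_env']}] reading {context['ctx_input_dir']} "
        f"in batches of {context['ctx_batch_size']}"
    )


if __name__ == "__main__":
    print(run_job(resolve_context(DEFAULT_CONTEXT, {})))
    print(run_job(resolve_context(
        DEFAULT_CONTEXT,
        {"ctx_env": "prod", "ctx_input_dir": "/data/prod/in"},
    )))
```

The same job logic runs against development or production simply by passing different values, and rejecting unknown names helps catch misspelled variables, which is one practical reason to keep their naming consistent.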

Folder structures are used to group items of similar categories or behavior. Because the right structure depends entirely on individual needs, I recommend defining it in the project's initial phases. For example, a bank might divide its folders by business unit. Good candidates for organizing folders include business modules, data sources, rules applied, or intake areas.
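As a rough sketch of the idea, the snippet below groups preparations into folders named after a business unit and a data source. The unit names, source names, and the two-level layout are assumptions chosen for illustration; they are not taken from an actual bank or from Talend.

```python
from collections import defaultdict


def folder_for(unit: str, source: str) -> str:
    """Build a folder path from a business unit and a data source."""
    return f"{unit}/{source}"


def group_by_folder(items):
    """Group (name, unit, source) tuples by their folder path."""
    tree = defaultdict(list)
    for name, unit, source in items:
        tree[folder_for(unit, source)].append(name)
    return dict(tree)


if __name__ == "__main__":
    # Hypothetical preparations for a bank, tagged by unit and source.
    preparations = [
        ("customer_cleanup", "retail_banking", "crm"),
        ("loan_dedupe", "loans", "core_banking"),
        ("address_fix", "retail_banking", "crm"),
    ]
    for folder, names in sorted(group_by_folder(preparations).items()):
        print(folder, names)
```

Deciding on the grouping keys up front (unit and source here) is what makes the layout predictable as more preparations and datasets are added.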

There’s a saying that I quite like that goes, “It’s not about having a lot of data, it’s about having the right data”. Data selection is about finding the data that’s needed right now, but it should also make it easier to find data later when similar needs arise.
