Data Preparation and Data Wrangling Best Practices – Part 1

Data Preparation and Data Wrangling Best Practices – Part 1

Rekha Sree is a customer success Architect, using her expertise in Data Integration, Data Warehouse and Big Data to help drive customer success at Talend. Prior to joining Talend, Rekha worked at Target Corporation India Pvt Ltd for more than a decade using her vast knowledge in building their enterprise and analytical data warehouse.

Talend Data Preparation Cloud is a self-service application that enables information workers to cut hours out of their workday by simplifying and expediting the time-consuming process of preparing data for analysis or other data-driven tasks. If you are brand new to data preparation, take some time to go through my earlier blog An Introduction to Data Preparation to get the basics and learn a little bit about how it can come in handy as a self-service data preparation tool. In this blog, I want to highlight some best practices that I’ve come across as I've worked with Talend Data Preparation. So, without further delay lets jump into the topic.

A “best practice” for naming conventions really depends on the person or organization. However, following some sort of naming convention structure each and every time makes it significantly easier for subsequent users of the data to understand what the system is doing and how to fix or extend the source code for new business needs. In my experience, the best practice is primarily to follow the naming standards agreed upon the folders. Here are a few suggestions to consider when coming up with naming conventions:

Typically, preparations and datasets are tied to a specific project. Hence the naming conventions for preparations and datasets could be set either globally at the organization level or at the project level. You should do your best to ensure that the naming conventions are strictly followed. Here are a few tips from my own experience:

Now, let's talk about context variables. Context variables are user-defined variables provided by Talend whose value can be changed at runtime. Providing the values of the context variables at runtime allows jobs to be executed in different ways with different parameters. Context variables should also follow standard naming conventions.  Here are a couple more suggestions around context variables:

Folder structures are used to group items of similar categories or behavior. As this is completely related to individual needs, I recommend having folder structures defined in the project’s initial phases. The screenshot below shows an example of a folder structure that might be used in a bank. Here the folders are divided by the unit of the module. Some recommendations for folder structures are things like business modules, data sources, rules applied or intake areas. 

There’s a saying that I quite like that goes, “It’s not about having a lot of data, it’s about having the right data”. Data selection is about finding the data that’s needed right now, but it should also make it easier to find data later when similar needs arise.

Share it:
Share it:

[Social9_Share class=”s9-widget-wrapper”]

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

You Might Be Interested In

The Latest Analytics Trends in Retail, Marketing, and Insurance

6 Jan, 2018

It’s no secret that analytics has become a crucial aspect helping to facilitate most companies’ operations. Consumers today create endless …

Read more

Why your Cloud Strategy Should Include Multiple Vendors

28 Oct, 2017

You must be aware of the benefits that cloud computing offers to your organization. Cloud computing gives you the power …

Read more

5 Big Data Sources for Improving Data Quality and Business Analytics

5 Nov, 2017

Information is power – especially in the world of e-commerce. You can make a much larger impact and save money …

Read more

Recent Jobs

Senior Cloud Engineer (AWS, Snowflake)

Remote (United States (Nationwide))

9 May, 2024

Read More

IT Engineer

Washington D.C., DC, USA

1 May, 2024

Read More

Data Engineer

Washington D.C., DC, USA

1 May, 2024

Read More

Applications Developer

Washington D.C., DC, USA

1 May, 2024

Read More

Do You Want to Share Your Story?

Bring your insights on Data, Visualization, Innovation or Business Agility to our community. Let them learn from your experience.

Get the 3 STEPS

To Drive Analytics Adoption
And manage change

3-steps-to-drive-analytics-adoption

Get Access to Event Discounts

Switch your 7wData account from Subscriber to Event Discount Member by clicking the button below and get access to event discounts. Learn & Grow together with us in a more profitable way!

Get Access to Event Discounts

Create a 7wData account and get access to event discounts. Learn & Grow together with us in a more profitable way!

Don't miss Out!

Stay in touch and receive in depth articles, guides, news & commentary of all things data.