Data Analysis using SQL

Data Analysis using SQL

SQL is a key cog in a data science professional’s armory. I’m speaking from experience – you simply cannot expect to carve out a successful career in either analytics or data science if you haven’t yet picked up SQL.

And why is SQL so important?

As we move into a new decade, the rate at which we are producing and consuming data is skyrocketing by the day. To make smart decisions based on data, organizations around the world are hiring data professionals like business analysts and data scientists to mine and unearth insights from the vast treasure trove of data.

And one of the most important tools required for this is – you guessed it – SQL!

Structured Query Language (SQL) has been around for decades. It is a programming language used for managing the data held in relational databases. SQL is used all around the world by a majority of big companies. A data analyst can use SQL to access, read, manipulate, and analyze the data stored in a database and generate useful insights to drive an informed decision-making process.

In this article, I will be discussing 8 SQL techniques/queries that will make you ready for any advanced data analysis problems. Do keep in mind that this article assumes a very basic knowledge of SQL.

I would suggest checking out the below courses if you’re new to SQL and/or business analytics:

What is the best way to learn data analysis? By performing it side by side on a dataset! For this purpose, I have created a dummy dataset of a retail store. The customer data table is represented by ConsumerDetails.

Our dataset consists of the following columns:

Note:- I will be using MySQL 5.7 for going forward in the article. You can download it from here – My SQL 5.7 Downloads.

We will begin our analysis with the simplest query, i.e, counting the number of rows in our table. We will do this by using the function – COUNT().

Great! Now we know the number of rows in our table which is 10. It may seem to be funny using this function on a small test dataset but it can help a lot when your rows run into the millions!

A lot of times, our data table is filled with duplicate values. To attain the unique value, we use the DISTINCT function.

In our dataset, how can we find the unique industries that customers belong to?

You guessed it right. We can do this by using the DISTINCT function.

You can even count the number of unique rows by using the count along with distinct. You can refer to the below query:

Aggregation functions are the base of any kind of data analysis. They provide us with an overview of the dataset. Some of the functions we will be discussing are – SUM(), AVG(), and STDDEV().

We use the SUM() function to calculate the sum of the numerical column in a table.

Let’s find out the sum of the amount spent by each of the customers:

In the above example, sum_all is the variable in which the value of the sum is stored. The sum of the amount of money spent by consumers is Rs. 12,560.

To calculate the average of the numeric columns, we use the AVG() function. Let’s find the average expenditure by the consumers for our retail store:

The average amount spent by customers in the retail store is Rs. 1256.

If you have looked at the dataset and then the average value of expenditure by the consumers, you’ll have noticed there’s something missing. The average does not quite provide the complete picture so let’s find another important metric – Standard Deviation. The function is STDDEV().

The standard deviation comes out to be 829.7 which means there is a high disparity between the expenditures of consumers!

The next type of analysis is to identify the extreme values which will help you understand the data better.

The maximum numeric value can be identified by using the MAX() function. Let’s see how to apply it:

The maximum amount of money spent by the consumer in the retail store is Rs. 3000.

Share it:
Share it:

[Social9_Share class=”s9-widget-wrapper”]

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

You Might Be Interested In

Data Science is Changing and Data Scientists will Need to Change Too – Here’s Why and How

16 Jun, 2019

Summary:  Deep changes are underway in how data science is practiced and successfully deployed to solve business problems and create …

Read more

Crowdsourcing big-data analysis

2 Nov, 2017

In the analysis of big data sets, the first step is usually the identification of “features” — data points with …

Read more

Is big data a sound investment?

6 Mar, 2017

Big data is quoted a hype in the market. Companies have adopted it and have started to see huge benefits. …

Read more

Recent Jobs

Senior Cloud Engineer (AWS, Snowflake)

Remote (United States (Nationwide))

9 May, 2024

Read More

IT Engineer

Washington D.C., DC, USA

1 May, 2024

Read More

Data Engineer

Washington D.C., DC, USA

1 May, 2024

Read More

Applications Developer

Washington D.C., DC, USA

1 May, 2024

Read More

Do You Want to Share Your Story?

Bring your insights on Data, Visualization, Innovation or Business Agility to our community. Let them learn from your experience.

Get the 3 STEPS

To Drive Analytics Adoption
And manage change

3-steps-to-drive-analytics-adoption

Get Access to Event Discounts

Switch your 7wData account from Subscriber to Event Discount Member by clicking the button below and get access to event discounts. Learn & Grow together with us in a more profitable way!

Get Access to Event Discounts

Create a 7wData account and get access to event discounts. Learn & Grow together with us in a more profitable way!

Don't miss Out!

Stay in touch and receive in depth articles, guides, news & commentary of all things data.