Real-Time Big Data Processing with Spark and MemSQL

Real-Time Big Data Processing with Spark and MemSQL

Learn how you can maximize big data in the cloud with Apache Hadoop. Download this eBook now. Brought to you in partnership with Hortonworks.

I got an opportunity to work extensively with big data and analytics in Myntra, an e-commerce store based in India. Data-driven intelligence is one of the core values at Myntra, so crunching and processing data and reporting meaningful insights for the company is of utmost importance.

Every day, millions of users visit Myntra via the app or website, generating billions of clickstream events. It's very important for the data platform team to scale to such a huge number of incoming events, ingest them in real-time with minimal or no loss, and process the unstructured or semi-structured data to generate insights.

We use a varied set of technologies and in-house products to achieve the above, including Go, Kafka, Secor, Spark, Scala, Java, S3, Presto, and Redshift.

As more and more business decisions tend to be based on data and insights, batch and offline reporting from data were simply not enough. We required real-time user behavior analysis, real-time traffic, real-time notification performance, and more to be available with minimal latency. We needed to ingest as well as filter and process data in real-time and also persist it in a write-fast performant data store to do dashboarding and reporting.

Meterial is a pipeline that does exactly this and even more with a feedback loop for other teams to take action from the data in real-time.

Our event collectors written in Golang sit behind Amazon ELB to receive events from the app or website. They add a timestamp to the incoming clickstream events and push them into Kafka. From Kafka, a Meterial-ingestion layer based on Apache Spark streaming ingests somewhere around four million events per minute, filters and transforms the incoming events based on a configuration file, and persists them to a MemSQL row-store every minute.

Share it:
Share it:

[Social9_Share class=”s9-widget-wrapper”]

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

You Might Be Interested In

9 Natural Language Processing Trends in 2023

15 Mar, 2023

Natural language processing (NLP) is a subset of AI which finds growing importance due to the increasing amount of unstructured …

Read more

Is object storage good for databases?

19 Oct, 2022

If you’d asked an IT professional whether object storage is any good for databases, the answer over most of the past decade …

Read more

The Reason So Many Analytics Efforts Fall Short

13 Sep, 2016

Given the role analytics has played in reshaping industries and rewarding innovative adopters over the last two decades, it is …

Read more

Recent Jobs

Senior Cloud Engineer (AWS, Snowflake)

Remote (United States (Nationwide))

9 May, 2024

Read More

IT Engineer

Washington D.C., DC, USA

1 May, 2024

Read More

Data Engineer

Washington D.C., DC, USA

1 May, 2024

Read More

Applications Developer

Washington D.C., DC, USA

1 May, 2024

Read More

Do You Want to Share Your Story?

Bring your insights on Data, Visualization, Innovation or Business Agility to our community. Let them learn from your experience.

Get the 3 STEPS

To Drive Analytics Adoption
And manage change

3-steps-to-drive-analytics-adoption

Get Access to Event Discounts

Switch your 7wData account from Subscriber to Event Discount Member by clicking the button below and get access to event discounts. Learn & Grow together with us in a more profitable way!

Get Access to Event Discounts

Create a 7wData account and get access to event discounts. Learn & Grow together with us in a more profitable way!

Don't miss Out!

Stay in touch and receive in depth articles, guides, news & commentary of all things data.