Industry Watch: Microservices and scaling out Big Data

Industry Watch: Microservices and scaling out Big Data

This month’s column is a transcript of a fascinating conversation I had with MapR executives Jim Scott, the director of enterprise strategy and architecture at Hadoop solution provider MapR, and Jack Norris, senior VP of data and applications, on the subject of microservices and scaling Big Data.

SD Times: So, we know that scaling data can be a hassle. What is the impact of microservices on this issue?

Jack Norris: There are some complementary technologies that really are game-changing in terms of how to take advantage of [microservices]. The underlying data layer is an incredible enabler of microservices. If you’re doing microservices that are ephemeral and don’t require a lot of stateful data, then I think it’s pretty well understood and people can be quite successful with it. But the data issues drive a lot of complexity for the developers and for the administrators, and that’s an area that Jim has championed for quite a while, and his experience as an architect and a developer allowed him to grasp this and see it early on.

Jim Scott: There are two different ways to look at it when you look at the more ephemeral services. If you were to take just kind of a general front-end service that’s handling the primary load of a consumer-facing application, it’s probably not going to be doing a lot of work. It’s probably going to be handing off the workload to other services that are sitting behind it. Those services sitting behind it are the ones that are more likely to fall into this model. So, if you were to imagine companies building websites like Amazon, where it consists of 100-plus different service calls to a bunch of different back-end services, there’s the need to compile all the different information to bring back and build a user experience.

When you start looking at those services, being able to have a linearly scalable back-end data flow is pretty important. As you scale out your services, which are going to be doing some of the work, they need to figure out who the user is, what information is relevant to them, they then need to give that information back to the front end to render a front end for the user. The compilation of those different data sets is pretty important. Being able to scale out that tier that is intelligent, where it’s clearly doing some level of computational work, is one thing. But in the same vein, without the data that it depends on, it can’t really do anything.

So, as you scale that service up, you will see how much work each instance of that microservice can perform. You know your scaling factors, and then you know based off of how many different services you have what your workloads are on your back-end data platform, and so when you exceed the total capabilities, you just add another server to that cluster. The same goes for whether it’s a streaming capability, a database capability or a file system capability.

Those microservices, when you imagine for just a moment when you start deploying microservices, if you are the software engineer, you need to have visibility into your services. And that is to say, how fast are they performing? Are there bottlenecks? Are there certain types of requests that are coming in that are causing errors? So, when you look at performance and application monitoring, you must be able to emit data from these instances of microservices so that you can troubleshoot. In the old troubleshooting model, we typically did that by doing complete isolation for different servers, and then each server had its own logs, and you could just trace it that way. Trouble is, that doesn’t scale very well from a cost perspective.

The great thing is, if you imagine how it was done last year, or five years ago, or 10 years ago, however far back you want to go, they were the equivalent of multipurpose applications, monolithic if you will, and those applications had a long life cycle to be able to get updates into them. And the scaling factors for them were all or nothing.

 

Share it:
Share it:

[Social9_Share class=”s9-widget-wrapper”]

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

You Might Be Interested In

Why Social Media Analytics Cannot Rely On Machine Learning

30 Jan, 2017

Social media has been the main driver of public conversation for the past decade now, with Facebook having been founded …

Read more

Towards a Data-driven Organization: A Roadmap for Analytics

1 Jan, 2018

Building a Data-driven Organization requires identifying and prioritizing the opportunities where advanced analytics can make a material difference to the …

Read more

The 3 Reasons Why Companies Should Use Data Intensive Computing

2 Mar, 2017

Researchers have estimated that 25 years ago, around 100GB of data was generated every day. By 1997, we were generating …

Read more

Recent Jobs

IT Engineer

Washington D.C., DC, USA

1 May, 2024

Read More

Data Engineer

Washington D.C., DC, USA

1 May, 2024

Read More

Applications Developer

Washington D.C., DC, USA

1 May, 2024

Read More

D365 Business Analyst

South Bend, IN, USA

22 Apr, 2024

Read More

Do You Want to Share Your Story?

Bring your insights on Data, Visualization, Innovation or Business Agility to our community. Let them learn from your experience.

Get the 3 STEPS

To Drive Analytics Adoption
And manage change

3-steps-to-drive-analytics-adoption

Get Access to Event Discounts

Switch your 7wData account from Subscriber to Event Discount Member by clicking the button below and get access to event discounts. Learn & Grow together with us in a more profitable way!

Get Access to Event Discounts

Create a 7wData account and get access to event discounts. Learn & Grow together with us in a more profitable way!

Don't miss Out!

Stay in touch and receive in depth articles, guides, news & commentary of all things data.