Top 11 Tools For Distributed Machine Learning
- by 7wData
There are two fundamentally different and complementary ways of accelerating machine learning workloads:
1. By vertical scaling or scaling-up, where one adds more resources to a single node
2. By horizontal scaling or scaling-out, where one adds more nodes to the system
When it comes to the degree of distribution within a machine learning ecosystem, systems are classified as either centralised or distributed. Centralised systems employ a strictly hierarchical approach, with specific roles assigned to specific nodes. A distributed system, in contrast, consists of a network of independent nodes in which no specific roles are assigned to particular nodes.
A centralised solution is not the right choice when data is inherently distributed or too big to store on a single machine. For instance, think of astronomical data that is too large to move and centralise.
In a recent work, researchers at Delft University of Technology in the Netherlands described in detail the current state-of-the-art distributed ML models and how they affect computation latency and other attributes.
The advantages of using distributed ML models are plentiful, and a full treatment is beyond the scope of this article. Here we list popular toolkits and techniques that enable distributed machine learning:
MapReduce is a framework for processing data, developed by Google to process data in a distributed setting. First, all data is split into tuples during the map phase; in the subsequent reduce phase, these tuples are grouped to generate a single output value per key. MapReduce and Hadoop rely heavily on a distributed file system in every phase of execution.
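To make the two phases concrete, here is a minimal, single-process sketch of the MapReduce flow for a word count. A real framework such as Hadoop runs the map and reduce phases on many nodes and shuffles tuples over the network; the function names below are illustrative, not part of any framework's API.

```python
# Toy MapReduce word count: map emits (key, value) tuples,
# shuffle groups them by key, reduce produces one value per key.
from collections import defaultdict

def map_phase(document):
    # Map: emit one (word, 1) tuple per word in the document.
    return [(word, 1) for word in document.split()]

def shuffle(tuples):
    # Shuffle: group all values by key, as the framework would
    # do between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in tuples:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: collapse each key's values into a single output value.
    return {key: sum(values) for key, values in groups.items()}

documents = ["spark keeps data in memory", "hadoop relies on disk"]
tuples = [t for doc in documents for t in map_phase(doc)]
print(reduce_phase(shuffle(tuples)))
```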
Transformations in linear algebra, as they occur in many machine learning algorithms, are typically highly iterative in nature, and the map and reduce operations are not ideal for such iterative tasks. Apache Spark was developed to resolve this.
The key difference is that MapReduce tasks must write all (intermediate) data to disk between phases, whereas Spark can keep all the data in memory, which saves expensive reads from disk.
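The same word count expressed in PySpark looks as follows. This is a minimal sketch assuming a local pyspark installation; the input lines and application name are illustrative.

```python
from pyspark import SparkContext

sc = SparkContext("local[*]", "wordcount")

lines = sc.parallelize(["spark keeps data in memory",
                        "hadoop relies on disk"])

counts = (lines.flatMap(lambda line: line.split())   # map: emit words
               .map(lambda word: (word, 1))          # map: emit tuples
               .reduceByKey(lambda a, b: a + b))     # reduce per key

# cache() keeps the RDD in memory, so iterative algorithms can reuse
# it without re-reading from disk -- the key difference from MapReduce.
counts.cache()
print(counts.collect())
sc.stop()
```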
AllReduce uses common high-performance computing technology to iteratively train models with stochastic gradient descent on separate mini-batches of the training data. Baidu claims a linear speedup when applying this technique to train deep learning networks.
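The following toy, single-process sketch shows the idea behind data-parallel SGD with an all-reduce step: each "worker" computes a gradient on its own mini-batch, the gradients are summed across workers (the all-reduce), and every replica applies the same averaged update. Real systems (e.g. Baidu's ring all-reduce) perform the sum over the network; the names and the linear-regression objective here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
true_w = np.array([2.0, -3.0])   # ground-truth weights (toy problem)
w = np.zeros(2)                  # model parameters, replicated on each worker
lr, n_workers = 0.1, 4

def local_gradient(w, batch_size=32):
    # Each worker draws its own mini-batch and computes the
    # squared-error gradient on it.
    X = rng.normal(size=(batch_size, 2))
    y = X @ true_w
    return 2 * X.T @ (X @ w - y) / batch_size

for step in range(100):
    grads = [local_gradient(w) for _ in range(n_workers)]
    # All-reduce: sum the gradients across workers, then average.
    avg_grad = np.sum(grads, axis=0) / n_workers
    w -= lr * avg_grad           # identical update on every replica

print(w)  # converges toward [2.0, -3.0]
```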