It’s time to solve deep learning’s productivity problem
- by 7wData
Deep learning is fueling breakthroughs in everything from consumer mobile apps to image recognition. Yet training and running deep learning-based AI models poses many challenges. One of the most difficult roadblocks is the time it takes to train the models.
The need to crunch lots of data and the computational complexity of building deep learning-based AI models also slow progress in accuracy and limit the practicality of deploying deep learning at scale. It's the training times, often measured in days and sometimes weeks, that hold back implementation.
To create highly accurate deep learning models faster, we need to cut training time from days and weeks down to hours, minutes, or even seconds.
To understand the problem deep learning researchers are trying to solve, consider the fable of the Blind Men and the Elephant. Each blind man feels a different part of the elephant, but only one part, such as the side or the tusk. Then they argue about what the entire elephant looks like based on their own limited experience.
If you gave the blind men some time, they could share enough information to piece together a pretty accurate picture of an elephant. It’s the same with graphics processing units (GPUs), which are used with CPUs to accelerate deep learning, analytics, and computing.
If you have slow compute chips in a system, you can keep them synced on their learning progress fairly easily.
But, as GPUs become smarter and faster, they crunch through their learning very quickly, and they need a better means of communicating or they get out of sync. Then they spend too much time waiting for each other’s results. So, you can get no speedup — and potentially even degraded performance — from using more, faster-learning GPUs.
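This lock-step behavior can be illustrated with a minimal sketch of synchronous data-parallel training. Each simulated worker (a stand-in for one GPU; all names and the toy least-squares problem here are illustrative, not from the article) computes a gradient on its own data shard, and no worker can update until the gradients from every worker have been averaged:

```python
import numpy as np

def local_gradient(weights, data_shard):
    # Toy gradient of a least-squares loss on this worker's shard.
    X, y = data_shard
    return X.T @ (X @ weights - y) / len(y)

def synchronous_step(weights, shards, lr=0.1):
    # Every worker computes a gradient on its own shard...
    grads = [local_gradient(weights, shard) for shard in shards]
    # ...then all workers must wait for the global average before updating,
    # so the step is gated by the slowest worker's communication.
    mean_grad = np.mean(grads, axis=0)
    return weights - lr * mean_grad

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
shards = []
for _ in range(4):  # four simulated workers
    X = rng.normal(size=(32, 2))
    shards.append((X, X @ true_w))

w = np.zeros(2)
for _ in range(200):
    w = synchronous_step(w, shards)
# w converges toward true_w
```

The averaging line is where real systems spend their waiting time: the faster each worker computes its gradient, the larger a fraction of each step that synchronization becomes.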
To train models faster, data scientists and researchers need to distribute deep learning across a large number of servers. However, most popular deep learning frameworks scale across the GPUs (or learners) within a single server, but not across many servers with GPUs.
The challenge is that orchestrating and optimizing a deep learning problem across many servers is difficult: the faster the GPUs run, the faster they learn, and the more often they must share what they have learned with every other GPU, at a rate that conventional software cannot sustain.
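One widely used pattern for letting every GPU share its gradients without a central bottleneck is a ring all-reduce, in which each of N workers passes chunks of its gradient around a ring so that per-worker traffic stays roughly constant as N grows. The sketch below simulates the two phases (reduce-scatter, then all-gather) on plain arrays; it illustrates the communication pattern only and is not any particular library's implementation:

```python
import numpy as np

def ring_allreduce(grads):
    """Simulate a ring all-reduce: every worker ends with the sum of all gradients."""
    n = len(grads)
    # Each worker splits its gradient into n chunks.
    chunks = [np.array_split(g.astype(float), n) for g in grads]

    # Phase 1, reduce-scatter: over n-1 steps, worker i sends chunk (i - s) % n
    # to its ring neighbor, which adds the payload into its copy of that chunk.
    for s in range(n - 1):
        sends = [((i + 1) % n, (i - s) % n, chunks[i][(i - s) % n].copy())
                 for i in range(n)]
        for dst, c, payload in sends:
            chunks[dst][c] = chunks[dst][c] + payload

    # After phase 1, worker i holds the fully summed chunk (i + 1) % n.
    # Phase 2, all-gather: the finished chunks circulate around the ring until
    # every worker holds every summed chunk.
    for s in range(n - 1):
        sends = [((i + 1) % n, (i + 1 - s) % n, chunks[i][(i + 1 - s) % n].copy())
                 for i in range(n)]
        for dst, c, payload in sends:
            chunks[dst][c] = payload

    return [np.concatenate(c) for c in chunks]

rng = np.random.default_rng(1)
workers = [rng.normal(size=12) for _ in range(4)]  # four simulated GPUs
reduced = ring_allreduce(workers)
expected = np.sum(workers, axis=0)
# every entry of `reduced` now equals `expected`
```

Because each worker only ever talks to its ring neighbor, adding more workers does not multiply the traffic any single link must carry, which is why variants of this pattern underpin multi-server gradient exchange.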