10 Best Machine Learning Algorithms

Curated from unite.ai →

Though we’re living through a time of extraordinary innovation in GPU-accelerated machine learning, the latest research papers frequently (and prominently) feature algorithms that are decades old, in certain cases as much as 70 years old.

Some might contend that many of these older methods fall into the camp of ‘statistical analysis’ rather than machine learning, and prefer to date the advent of the sector back only as far as 1957, with the invention of the Perceptron.

Given the extent to which these older algorithms support and are enmeshed in the latest trends and headline-grabbing developments in machine learning, it’s a contestable stance. So let’s take a look at some of the ‘classic’ building blocks underpinning the latest innovations, as well as some newer entries that are making an early bid for the AI hall of fame.

In 2017 Google Research led a research collaboration culminating in the paper Attention Is All You Need. The work outlined a novel architecture that promoted attention mechanisms from ‘piping’ in encoder/decoder and recurrent network models to a central transformational technology in their own right.

The approach was dubbed Transformer, and has since become a revolutionary methodology in Natural Language Processing (NLP), powering, amongst many other examples, the autoregressive language model and AI poster-child GPT-3.

Transformers elegantly solved the problem of sequence transduction, also called ‘transformation’, which is concerned with the processing of input sequences into output sequences. A transformer also receives and manages data in a continuous manner, rather than in sequential batches, allowing a ‘persistence of memory’ that RNN architectures are not designed to provide. For a more detailed overview of transformers, take a look at our reference article.

In contrast to the Recurrent Neural Networks (RNNs) that had begun to dominate ML research in the CUDA era, Transformer architecture could also be easily parallelized, opening the way to productively address a far larger corpus of data than RNNs could handle.
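To make the parallelization point concrete, here is a minimal sketch of scaled dot-product attention, the core operation described in Attention Is All You Need. It is a simplified illustration, not the paper's full architecture: it assumes plain Python lists for vectors, omits the learned projection matrices, multiple heads and positional encodings, and the three-token input is hypothetical.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of vectors."""
    scale = math.sqrt(len(keys[0]))
    outputs = []
    # Each query is processed independently of the others, so this loop
    # could run in parallel. An RNN cannot do that, because each step
    # depends on the hidden state produced by the previous step.
    for q in queries:
        weights = softmax([dot(q, k) / scale for k in keys])
        # Output = weighted average of all value vectors.
        mixed = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(mixed)
    return outputs


# Hypothetical 3-token sequence with 2-dimensional embeddings.
# In self-attention, queries, keys and values all come from the same tokens.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
contextual = attention(tokens, tokens, tokens)
```

Every output vector is a weighted blend of all the input vectors, which is how each position gets direct access to every other position in the sequence.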

Transformers captured the public imagination in 2020 with the release of OpenAI’s GPT-3, which boasted a then-record-breaking 175 billion parameters. This apparently staggering achievement was eventually overshadowed by later projects, such as the 2021 release of Microsoft and NVIDIA’s Megatron-Turing NLG 530B, which (as the name suggests) features 530 billion parameters.

Transformer architecture has also crossed over from NLP to computer vision, powering a new generation of image synthesis frameworks such as OpenAI’s CLIP and DALL-E, which use text-to-image domain mapping to finish incomplete images and synthesize novel images from trained domains, among a growing number of related applications.

Though transformers have gained extraordinary media coverage through the release and adoption of GPT-3, the Generative Adversarial Network (GAN) has become a recognizable brand in its own right, and may eventually join deepfake as a verb.

First proposed and primarily used for image synthesis, a Generative Adversarial Network is composed of a Generator and a Discriminator. The Generator cycles through thousands of images in a dataset, iteratively attempting to reconstruct them. For each attempt, the Discriminator grades the Generator’s work, and sends the Generator back to do better, but without any insight into the way that the previous reconstruction erred.

This forces the Generator to explore a multiplicity of avenues, instead of following the potential blind alleys that would have resulted if the Discriminator had told it where it was going wrong (see #8 below). By the time the training is over, the Generator has a detailed and comprehensive map of relationships between points in the dataset.

By analogy, this is the difference between learning a single humdrum commute to central London, or painstakingly acquiring .

The result is a high-level collection of features in the latent space of the trained model.
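The Generator/Discriminator loop described above can be sketched with a deliberately tiny, one-dimensional example. Everything here is an assumption made for illustration: the ‘dataset’ is numbers drawn from a normal distribution centred on 4, the Generator is a linear function of random noise, the Discriminator is a single logistic unit, and the gradients are derived by hand for the standard GAN losses. Note that the Generator never sees a real sample; the only feedback it receives is the Discriminator's score, passed back as a gradient.

```python
import math
import random


def sigmoid(s):
    """Numerically stable logistic function."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def discriminator_grads(w, c, x_real, x_fake):
    """Gradients of -[log D(x_real) + log(1 - D(x_fake))] w.r.t. w and c,
    where D(x) = sigmoid(w * x + c)."""
    d_real = sigmoid(w * x_real + c)
    d_fake = sigmoid(w * x_fake + c)
    grad_w = (d_real - 1.0) * x_real + d_fake * x_fake
    grad_c = (d_real - 1.0) + d_fake
    return grad_w, grad_c


def generator_grads(a, b, w, c, z):
    """Gradients of -log D(G(z)) w.r.t. a and b, where G(z) = a * z + b."""
    x_fake = a * z + b
    d_fake = sigmoid(w * x_fake + c)
    common = (d_fake - 1.0) * w  # feedback flows through the Discriminator
    return common * z, common


def train_toy_gan(steps=2000, lr=0.05, seed=0):
    """Alternate one Discriminator update and one Generator update per step."""
    rng = random.Random(seed)
    a, b = 1.0, 0.0  # Generator parameters (hypothetical starting values)
    w, c = 0.0, 0.0  # Discriminator parameters
    for _ in range(steps):
        z = rng.gauss(0.0, 1.0)        # noise fed to the Generator
        x_real = rng.gauss(4.0, 0.5)   # assumed 'real' data distribution
        x_fake = a * z + b             # the Generator's attempt

        # The Discriminator learns to score real high and fake low.
        grad_w, grad_c = discriminator_grads(w, c, x_real, x_fake)
        w -= lr * grad_w
        c -= lr * grad_c

        # The Generator adjusts itself to raise the Discriminator's score.
        grad_a, grad_b = generator_grads(a, b, w, c, z)
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b, w, c
```

Calling `train_toy_gan()` returns the four learned parameters. In a setup this small the two players can oscillate rather than settle, so the sketch is meant to show the shape of the adversarial loop, not to serve as a reliable training recipe.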

Yves Mulkers

Yves Mulkers is the founder of 7wData and a widely followed voice in the data and AI community. He curates the 7wData and AI Beat newsletters, reaching hundreds of thousands of data and AI professionals, and writes on data strategy, analytics, AI, and the evolving data ecosystem.