What is boosting in machine learning?
- by 7wData
We train machine learning models to predict values such as the weather, stock prices, the class of an image, or the sentiment of a social media post. Often, however, these models fail to meet the performance levels we expect of them.
There are several ways to improve the accuracy of machine learning models. One popular method is “boosting,” an ensemble learning technique that combines several ML models that perform poorly on their own into a single, stronger model.
Before we get into boosting, it is worth reviewing the concepts of “weak” and “strong” learners. Weak learners are ML models that perform poorly, sometimes only slightly better than random guessing. There are several reasons a model can end up a weak learner: for example, there might not be enough training data, or the model may not be complex enough.
In contrast, a strong learner makes mostly correct predictions with high confidence (the desired accuracy and confidence may vary depending on the application). Our goal in machine learning is to create strong learners.
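To make the distinction concrete, a one-level decision tree (a “decision stump”) is a classic example of a weak learner. The minimal sketch below measures how far such a model falls short; the synthetic dataset and train/test split are illustrative assumptions, not part of the original article.

```python
# Illustrative weak learner: a depth-1 decision tree ("decision stump").
# The synthetic dataset and train/test split are assumptions for this demo.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

stump = DecisionTreeClassifier(max_depth=1).fit(X_train, y_train)
print("stump accuracy:", stump.score(X_test, y_test))  # better than chance, but far from strong
```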
Boosting is closely related to “bagging,” another ensemble method. Bagging (short for “bootstrap aggregating”) trains several weak learners on different bootstrap samples drawn from the training data (bootstrap samples are random samples taken with replacement). As a result, each model learns different patterns. After training, when the ensemble is presented with a new input, it runs the input through all the weak learners and uses majority voting to make the final prediction. In a classification problem, the bagging model chooses the class that receives the most votes from the weak learners, as sketched below.
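Here is a minimal from-scratch sketch of that procedure. The choice of decision stumps as weak learners and the assumption of binary 0/1 labels are illustrative; only the bootstrap-sampling and majority-voting logic follows the description above.

```python
# Minimal bagging sketch: each weak learner is fit on a bootstrap sample
# (drawn WITH replacement); predictions are combined by majority vote.
# Binary labels {0, 1} and stump learners are illustrative assumptions.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, n_learners=25, seed=0):
    rng = np.random.default_rng(seed)
    learners = []
    for _ in range(n_learners):
        idx = rng.integers(0, len(X), size=len(X))  # bootstrap sample indices
        learners.append(DecisionTreeClassifier(max_depth=1).fit(X[idx], y[idx]))
    return learners

def bagging_predict(learners, X):
    votes = np.stack([m.predict(X) for m in learners])  # shape: (n_learners, n_samples)
    return (votes.mean(axis=0) > 0.5).astype(int)       # majority vote per sample
```

In practice, scikit-learn’s BaggingClassifier packages the same idea behind the usual fit/predict interface.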
Boosting is similar to bagging, but with one key difference: it trains a sequence of weak learners, each of which tries to correct the mistakes of its predecessors. There are several popular boosting techniques.
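AdaBoost is one of the best-known boosting techniques. As a minimal sketch of how such an ensemble is used in practice, the example below trains scikit-learn’s AdaBoostClassifier; the synthetic dataset and hyperparameters are illustrative assumptions.

```python
# AdaBoost is one popular boosting technique; this sketch shows the standard
# scikit-learn interface. Dataset and hyperparameters are illustrative.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

X, y = make_classification(n_samples=1000, random_state=0)
booster = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X, y)
print("training accuracy:", booster.score(X, y))
```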
Like bagging, boosting trains a series of weak learners on samples drawn from the training dataset. However, unlike bagging, the classic boosting procedure draws its samples “without replacement.” This means that the same example can’t be drawn twice from the training dataset when gathering a sample.
The weak learners are trained sequentially. First, the boosting algorithm draws a subset of training examples from the training dataset and trains a weak learner on them. The ML model will correctly classify some examples and misclassify others.
The algorithm then draws a second set of samples (without replacement) to train the second ML model. But this time, it also adds 50 percent of the examples that were misclassified by the first weak learner.
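To make these first two rounds concrete, here is a rough sketch of the procedure just described. The draw-without-replacement step and the addition of 50 percent of the misclassified examples follow the text above; the subset size, stump learners, and synthetic dataset are assumptions made for illustration.

```python
# Rough sketch of the first two boosting rounds described above.
# Subset size, stump learners, and the synthetic dataset are illustrative
# assumptions; only the sampling logic follows the text.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=0)
rng = np.random.default_rng(0)
subset_size = len(X) // 4  # assumed subset size

# Round 1: draw a subset WITHOUT replacement and train the first weak learner.
idx1 = rng.choice(len(X), size=subset_size, replace=False)
learner1 = DecisionTreeClassifier(max_depth=1).fit(X[idx1], y[idx1])

# Collect the examples the first learner misclassifies.
wrong = np.flatnonzero(learner1.predict(X) != y)

# Round 2: a fresh draw without replacement, plus 50 percent of the
# examples misclassified by the first learner.
idx2 = rng.choice(len(X), size=subset_size, replace=False)
half_wrong = rng.choice(wrong, size=len(wrong) // 2, replace=False)
idx2 = np.concatenate([idx2, half_wrong])
learner2 = DecisionTreeClassifier(max_depth=1).fit(X[idx2], y[idx2])
```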