A full season of boosting

Boosting is one of the most popular tabular machine learning techniques. Effectively, it has become the default not only for machine learning challenges but also for industry use cases. That is why we felt it appropriate to invest in a small season of whiteboard videos that highlight how these techniques work and what useful properties they have.

All episodes

The first episode in this series serves as an introduction. It explains the big idea behind boosting and uses plenty of visuals to support the motivation.
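The big idea from that first episode can be sketched in a few lines of code. This is a toy illustration of boosting for regression, not a production implementation: each new "weak" model (here, a one-split decision stump) is fit to the residuals of the ensemble so far, and its prediction is added with a damped learning rate.

```python
import numpy as np

def fit_stump(x, residual):
    """Fit a one-split stump: pick the threshold that minimizes squared error."""
    best = None
    for t in np.unique(x):
        left, right = residual[x <= t], residual[x > t]
        if len(left) == 0 or len(right) == 0:
            continue
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, left.mean(), right.mean())
    _, t, left_val, right_val = best
    return lambda x: np.where(x <= t, left_val, right_val)

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 200)
y = np.sin(x) + rng.normal(0, 0.1, 200)

learning_rate, pred = 0.5, np.zeros_like(y)
for _ in range(50):
    stump = fit_stump(x, y - pred)    # each stump targets the current residuals
    pred += learning_rate * stump(x)  # add a damped correction to the ensemble

print(np.mean((y - pred) ** 2))  # training error shrinks as stumps accumulate
```

No single stump can model a sine wave, but the sum of many residual-fitting stumps can; that is the core boosting insight the episode builds on.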

The second part of the series goes a bit deeper into how the underlying tree models work.

The next video expands on the idea of boosting by introducing a technique that involves histograms. It turns out that you can massively speed up training if you are clever about which candidate split values you consider when building a tree. This is where histograms can really help out.
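The intuition behind the histogram trick can be shown with a small sketch (the numbers below are illustrative, not taken from any particular library). Instead of evaluating a candidate split at every unique feature value, the feature is bucketed into a fixed number of bins, so only the bin boundaries need to be considered:

```python
import numpy as np

rng = np.random.default_rng(1)
feature = rng.normal(size=100_000)

# Exhaustive approach: one candidate split per unique feature value.
exhaustive_candidates = np.unique(feature)

# Histogram approach: a fixed budget of 256 bins gives 255 candidate
# boundaries, no matter how many rows the dataset has.
n_bins = 256
bin_edges = np.quantile(feature, np.linspace(0, 1, n_bins + 1)[1:-1])

print(len(exhaustive_candidates))  # roughly one per row for continuous data
print(len(bin_edges))              # fixed at 255, independent of dataset size
```

Going from ~100,000 split evaluations per feature to 255 is where the speedup comes from, at the cost of a slightly coarser search.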

Boosted models aren't just great for predictive performance, though. If you have domain knowledge, you can also steer the algorithm in the right direction by passing monotonic constraints to it.


The playlist also has a video that explores the hyperparameters of the boosting algorithms. Boosting techniques can still overfit on the training set, so it pays off to search for hyperparameter values that prevent this.

Finally, the season ends by explaining that boosting won't solve every problem out there. The technique is great, but not every problem is a supervised learning problem. To name one example: there are times when you may need to consider semi-supervised systems instead.


We hope that you enjoyed our season of boosting content. We are eager to explore more tools, so do let us know if you feel there are topics missing!