What is the difference between xgboost and gradient boosting?

XGBoost is an enhanced implementation of gradient boosting and generally gives better performance.

Can anyone explain how they differ mathematically?

👍︎ 5
💬︎
📅︎ Dec 11 2021
🚨︎ report
[D]: Random Forest vs Gradient Boosting out of distribution

Hello everyone,

I'm working on a classification task with data from a certain company covering the years 2017 to 2020. I train different models (Random Forest, XGBoost, LightGBM, CatBoost, Explainable Boosting Machines) on one year of data at a time, from 2017 to 2019, and evaluate on 2020. I see a curious behavior, and I would like to understand whether it is normal in the literature or specific to this particular dataset.

In particular, when training on 2019 data, all the boosting algorithms obtain better performance than Random Forest (0.78-0.79 AUC vs 0.76). This changes dramatically when I train on 2017 or 2018 data and evaluate on 2020. That data is slightly out of distribution: there is certainly label shift, the data are quite different, and the learned models' feature importances/PDPs differ considerably between years. Yet Random Forest still generalizes decently (on 2020 data, AUC of 0.704 when trained on 2017 and 0.706 when trained on 2018), while the boosting algorithms perform worse on average, with a big difference for LightGBM between the two training years (trained on 2017: XGBoost 0.567, LightGBM 0.565, CatBoost 0.639, EBM 0.521; trained on 2018: XGBoost 0.661, LightGBM 0.734 (??), CatBoost 0.639, EBM 0.685).

Granted, I have not performed extensive hyperparameter tuning or further testing, and this might be a very particular case that depends on the data and hyperparameters. Still, I was wondering:

Is there any literature (I cannot find any) on the out-of-distribution robustness of Random Forest vs. boosting algorithms that might explain this behavior?

Intuitively, it might make sense that the variance reduction obtained by bagging would help even out of distribution, since some of the individual learners might still have learnt something relevant, but I am not sure that is enough of an explanation.

PS: As a sanity check I also tried logistic regression and Gaussian NB, which show the same consistent decrease in performance (from ~0.7 to 0.45-0.6).
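
For reference, a minimal sketch of the per-year evaluation described above, assuming a single DataFrame `df` with a `year` column, a binary `target` column, and numeric feature columns (all names here are hypothetical, not the original setup):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from xgboost import XGBClassifier

def auc_by_train_year(df, feature_cols, train_years=(2017, 2018, 2019), test_year=2020):
    """Train on one year at a time, always evaluate on the held-out test year."""
    test = df[df["year"] == test_year]
    results = {}
    for year in train_years:
        train = df[df["year"] == year]
        for name, model in {
            "random_forest": RandomForestClassifier(n_estimators=500, random_state=0),
            "xgboost": XGBClassifier(n_estimators=500, learning_rate=0.05),
        }.items():
            model.fit(train[feature_cols], train["target"])
            proba = model.predict_proba(test[feature_cols])[:, 1]
            results[(year, name)] = roc_auc_score(test["target"], proba)
    return results
```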

👍︎ 29
💬︎
👤︎ u/sirpopiel
📅︎ Jun 18 2021
🚨︎ report
Decision Trees, Random Forests & Gradient Boosting in R idownloadcoupon.com/coupo…
👍︎ 2
💬︎
📅︎ Sep 20 2021
🚨︎ report
Is there any tutorial with basic math to help me understand gradient boosting?

I'm planning to take a course in statistics that involves gradient boosting. I'm unfamiliar with the formulas as written, but I get a sense of what they mean when reading the text. Because my math is weak, I'm looking to boost it a bit to catch up with the math level of the course. I thought a tutorial (for dummies) that breaks this down could help me identify what I'm missing. I'd appreciate any suggestion or advice.

👍︎ 2
💬︎
👤︎ u/Naj_md
📅︎ Aug 07 2021
🚨︎ report
Prediction performance of linear models and gradient boosting machine on complex phenotypes in outbred mice biorxiv.org/content/10.11…
👍︎ 2
💬︎
👤︎ u/sburgess86
📅︎ Aug 04 2021
🚨︎ report
Are Gradient Boosting models robust to skewness in features?

I was experimenting with my datasets and saw that I sometimes get higher event coverage and sometimes lower when I use a log transformation to remove the skewness in my data.

I have set a threshold for data transformation (skew > 50). Some features have low skew (< 50) in the training data but high skew in the out-of-time (OOT) dataset. If I ignore the skew in these additional features, I get slightly better results than when I transform them. Is that because the model was trained without removing their skewness?

TL;DR: I'm unsure whether to use the log transformation at all to remove skew while training GBMs. If so, should I transform all the features with high skew in OOT, or only those with high skew in training?
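
One way to keep this consistent (an illustrative sketch, not from the post; the threshold of 50 follows the post, but `train_df`/`oot_df` and the helper names are hypothetical) is to decide which columns to transform using the training data only, and then apply exactly the same transformation to the OOT data:

```python
import numpy as np

def pick_log_columns(train_df, skew_threshold=50.0):
    """Choose columns to log-transform based on training-data skew only."""
    skews = train_df.skew(numeric_only=True)
    return skews[skews > skew_threshold].index.tolist()

def apply_log(df, cols):
    out = df.copy()
    for c in cols:
        out[c] = np.log1p(out[c])  # assumes non-negative feature values
    return out

# usage (assuming train_df and oot_df already exist):
# log_cols = pick_log_columns(train_df)
# train_t, oot_t = apply_log(train_df, log_cols), apply_log(oot_df, log_cols)
```

The point of the sketch is only that the decision (which columns, which transform) is fit on the training data and reused unchanged on OOT, so the model never sees the same feature on two different scales.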

👍︎ 7
💬︎
👤︎ u/itsallkk
📅︎ May 26 2021
🚨︎ report
Extracting rules from Gradient Boosting in Spark ML

Hello,

I need to extract the decision rules and a tree view from a GBT model in Spark ML. I couldn't find any tutorial about this (I checked Stack Overflow and the PySpark API docs).

Any help is really appreciated.

Thank you.
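
In case it helps: as far as I know, the fitted `GBTClassificationModel` in `pyspark.ml` exposes `toDebugString` (a text dump of every tree's split rules) and `trees`/`treeWeights` for per-tree inspection. A rough sketch, assuming a `train_df` with a `features` vector column and a `label` column (those names are assumptions):

```python
from pyspark.ml.classification import GBTClassifier

gbt = GBTClassifier(featuresCol="features", labelCol="label", maxIter=20)
model = gbt.fit(train_df)

# Full text dump of the ensemble: split conditions and leaf predictions for every tree
print(model.toDebugString)

# Or walk the trees individually, together with their boosting weights
for i, (tree, weight) in enumerate(zip(model.trees, model.treeWeights)):
    print(f"--- tree {i} (weight={weight}) ---")
    print(tree.toDebugString)
```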

👍︎ 3
💬︎
👤︎ u/ezgiu
📅︎ Jun 01 2021
🚨︎ report
I wrote a very brief article on Decision Trees, Random Forests, and Gradient Boosting. Please give it a look if you're interested. rob-sneiderman.medium.com…
👍︎ 157
💬︎
👤︎ u/RobStats
📅︎ Nov 17 2020
🚨︎ report
[D] How does one express a decision tree as an analytic function one can compute gradients for? (gradient boosting)

This has really confused me. Most online explanations of Gradient Boosting say that one chooses a model, be it a linear function or a tree etc., does inference, then computes the difference between the predictions and the true values, and then fits the data to the derivative of the loss function with respect to the function chosen earlier. I understand what a derivative with respect to a function means, but what does it mean to fit my data to that derivative?

Also, how does one express a decision tree (weak learner) analytically?

Thanks
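
For what it's worth, the usual answer is that you never differentiate through the tree at all. A tree is just a piecewise-constant function f(x) = Σ_j c_j · 1[x ∈ R_j], and gradient boosting differentiates the loss with respect to the current predictions F(x_i), one number per training point. "Fitting the data to that derivative" means using those per-point negative gradients (the pseudo-residuals) as the regression targets for the next tree. A minimal sketch with squared-error loss (illustrative only):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

F = np.full_like(y, y.mean())   # F_0: initial constant model
learning_rate, trees = 0.1, []

for _ in range(50):
    # For L(y, F) = (y - F)^2 / 2, the negative gradient -dL/dF is simply (y - F)
    pseudo_residuals = y - F
    tree = DecisionTreeRegressor(max_depth=2).fit(X, pseudo_residuals)
    F += learning_rate * tree.predict(X)   # gradient descent step in function space
    trees.append(tree)
```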

👍︎ 5
💬︎
👤︎ u/axyz1995
📅︎ Feb 09 2021
🚨︎ report
[N] IBM, UMich & ShanghaiTech Papers Focus on Statistical Inference and Gradient-Boosting

A team from the University of Michigan, the MIT-IBM Watson AI Lab and ShanghaiTech University has published two papers on individual fairness for ML models, introducing a scale-free, interpretable, and statistically principled approach for assessing individual fairness, and a method for enforcing individual fairness in gradient boosting that is suitable for non-smooth ML models.

Here is a quick read: Improving ML Fairness: IBM, UMich & ShanghaiTech Papers Focus on Statistical Inference and Gradient-Boosting

The papers Statistical Inference for Individual Fairness and Individually Fair Gradient Boosting are on arXiv.

👍︎ 7
💬︎
👤︎ u/Yuqing7
📅︎ Apr 06 2021
🚨︎ report
[P] ICML2020, NGBoost: Natural Gradient Boosting for Probabilistic Predictions

Hi everyone,

NGBoost: https://stanfordmlgroup.github.io/projects/ngboost/

Many of us are interested in models that output predictive uncertainty (typically in the form of probabilistic predictions). Many of us are also fans of Gradient Boosting for its ability to produce high accuracy models, especially over tabular input data. So far there hasn't been a simple way to get both at the same time.

NGBoost brings probabilistic prediction capability to gradient boosting in a generic and modular way. The predictive distribution Y|X can be any parameterized probability distribution with differentiable parameters (including, importantly, multi-parameter distributions such as Normal, Weibull, etc.). It supports multiple scoring rules (MLE, CRPS) via generalized natural gradients. It is also possible to do survival prediction with NGBoost (my personal use case for the project).

The key conceptual difference between GBM and NGBoost is that GBM performs functional gradient descent (in the space of functions), whereas NGBoost performs natural gradient descent on a statistical manifold (whose Riemannian metric is implied by the choice of scoring rule, such as MLE or CRPS).

We hope you find the project useful for some of your projects!
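
From memory of the project's README, basic usage looks roughly like the sketch below; treat the exact names (`NGBRegressor`, `Dist`, `pred_dist`, `.params`) as assumptions to verify against the link above.

```python
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from ngboost import NGBRegressor
from ngboost.distns import Normal

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each prediction is a full Normal distribution, not just a point estimate
ngb = NGBRegressor(Dist=Normal, n_estimators=500).fit(X_train, y_train)

point_preds = ngb.predict(X_test)    # distribution means
pred_dists = ngb.pred_dist(X_test)   # full predictive distributions
print(pred_dists.params["loc"][:5], pred_dists.params["scale"][:5])
```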

👍︎ 42
💬︎
👤︎ u/avati
📅︎ Jun 17 2020
🚨︎ report
[P] SpaceOpt: Hyperparameter optimization algorithm via gradient boosting regression.

tl;dr

  • I am looking for people willing to try my new approach to hyperparameter optimization and report in the comments a comparison with whatever they use currently.

https://github.com/ar-nowaczynski/spaceopt


Why another hyperparameter optimization algorithm?

- I'm kind of disappointed with what we have so far: there is no clear winner in the field, and once in a while I read a post about another comeback of random search (for example: https://www.reddit.com/r/MachineLearning/comments/cycw35/r_random_search_outperforms_stateoftheart_nas/). Optuna and hyperopt are great frameworks, but they lack smart algorithms inside. I couldn't find anything that would satisfy my needs, so one year ago I decided to create a simple, understandable algorithm for myself, and now I'm publishing it for everyone.

What is the main idea behind SpaceOpt?

In short: integrating more human knowledge into the process, gradient boosting regression for exploitation, random sampling for exploration.
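
To make that concrete, here is an illustrative sketch of the exploit/explore loop (hypothetical names, plain scikit-learn; this is not the spaceopt API, see the repository above for that): fit a gradient boosting regressor on the (hyperparameters, score) pairs evaluated so far, rank a batch of random candidates from the discretized space by predicted score, and occasionally fall back to a purely random pick.

```python
import random
from sklearn.ensemble import GradientBoostingRegressor

search_space = {
    "learning_rate": [0.01, 0.03, 0.1, 0.3],
    "max_depth": [3, 5, 7, 9],
    "n_estimators": [100, 300, 1000],
}

def sample(space):
    return {k: random.choice(v) for k, v in space.items()}

def suggest(evaluated, space, n_candidates=256, explore_prob=0.3):
    """evaluated: list of (params_dict, score) pairs tried so far."""
    if not evaluated or random.random() < explore_prob:
        return sample(space)                        # exploration
    keys = sorted(space)
    X = [[params[k] for k in keys] for params, _ in evaluated]
    y = [score for _, score in evaluated]
    surrogate = GradientBoostingRegressor().fit(X, y)  # exploitation model
    candidates = [sample(space) for _ in range(n_candidates)]
    preds = surrogate.predict([[c[k] for k in keys] for c in candidates])
    return candidates[int(preds.argmax())]
```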

How do we integrate more human knowledge into the process? And why do this at all?

- In my opinion, a smart algorithm should accept any human input and decide on its own what to do about it, otherwise it's not smart. While doing hyperparameter optimization I often want to evaluate my own guesses about the parameters and I expect the algorithm to leverage that. No library I've tried supported this. Maybe I couldn't find it, maybe I didn't try hard enough. But believe it or not, lack of this simple feature made me so frustrated that I decided to create my own tool. And the first must-have was: the algorithm must take into account user-defined points.

- The next thing was continuous space. I never liked uniform and log spaces. I wanted something that works with lists of ordered values. Why? Because then I, or the algorithm, can build specific knowledge about specific values from that list. For example, if you define the space for learning_rate as a list of values, then it's easy to compute statistics for each specific value (count, objective mean/std). You wouldn't be able to do that when values are sampled from all over the space. Discretizing a continuous space by hand-picking some values from it is also a great way to incorporate more human knowledge into the optimization process (assuming you know which values to pick), making the job easier for the algorithm. For example, you'll probably never specify two values that are extremely close to each other, because you have a gut fee

... keep reading on reddit ➡

👍︎ 104
💬︎
📅︎ Jan 28 2020
🚨︎ report
[D] Gradient Boosting for Time Series

Is this the right idea for using gradient boosting (e.g. XGBoost) with time series? Suppose you have weekly earnings for a company, e.g. 2010, January, 1st week: $100,000.00.

You turn the above data into four numeric variables: year, month, week of the month, and earnings.

Then you fit a tree-based regression model:

earnings ~ f(month, year, week of the month)

Mathematically, does this approach make sense?
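
Roughly, yes: the trees can only split on the calendar features you give them, so the model learns a piecewise-constant function of (year, month, week) and will interpolate within seen combinations but not extrapolate a trend into unseen years. A minimal sketch with made-up data (column names and values are hypothetical):

```python
import pandas as pd
from xgboost import XGBRegressor

df = pd.DataFrame({
    "date": pd.date_range("2010-01-04", periods=200, freq="W-MON"),
    "earnings": range(100_000, 100_000 + 200 * 500, 500),  # toy weekly earnings
})
df["year"] = df["date"].dt.year
df["month"] = df["date"].dt.month
df["week_of_month"] = (df["date"].dt.day - 1) // 7 + 1

features = ["year", "month", "week_of_month"]
train, test = df.iloc[:160], df.iloc[160:]   # keep the split time-ordered

model = XGBRegressor(n_estimators=300, max_depth=3, learning_rate=0.05)
model.fit(train[features], train["earnings"])
preds = model.predict(test[features])
```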

👍︎ 2
💬︎
📅︎ Dec 25 2020
🚨︎ report
Explain Like I'm 5: Gradient Boosting reddit.com/gallery/jjthcw
👍︎ 2
💬︎
👤︎ u/robofied
📅︎ Oct 28 2020
🚨︎ report
