Submitted by MichelMED10 t3_y9yuza in MachineLearning

Hello,

While doing some tests, I saw that XGBoost performs much better than a multilayer NN classifier for classification.

So I thought that first training a CNN/Transformer backbone with a "normal" classifier head for any classification/regression task, then freezing the backbone and training an XGBoost model for classification, would be a good idea.

But none of the recent papers do that; they all tend to use a linear/multilayer NN classifier.

Does anyone know why?

Thanks !

7

Comments


patrickSwayzeNU t1_it8irde wrote

This post will likely get deleted. You should post in r/learnmachinelearning

XGBoost tends to be best on tabular data - you didn’t mention what domain you’re working in.

Creating entity embeddings from NNs and passing them to other downstream classifiers is definitely a thing

7

m98789 t1_it8rax1 wrote

This was a popular approach early on: use a DNN essentially as a feature extractor, then feed those features separately to a sophisticated classifier such as an SVM. I.e., separate the process into two distinct steps.

Generally speaking, this approach fell out of favor when it became evident that “end to end” learning performed better. That is, you don’t just learn a feature extractor but also the classifier, together.

As the E2E approach gained favor, folks did try more sophisticated designs for the last layers to simulate various kinds of classical classifiers. Ultimately, it was found that a simple approach for the final layers was just as performant.
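A minimal sketch of the two-step approach described above: a small NN trained first, then its hidden layer used as a frozen feature extractor for a separate SVM. This uses scikit-learn toys purely for illustration; the dataset and layer sizes are arbitrary assumptions, not anyone's actual setup.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# Synthetic stand-in data.
X, y = make_classification(n_samples=600, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Step 1: train the NN (a one-hidden-layer MLP here) end to end.
mlp = MLPClassifier(hidden_layer_sizes=(32,), activation="relu",
                    max_iter=500, random_state=0).fit(X_tr, y_tr)

# Step 2: treat the hidden layer as a frozen feature extractor...
def features(X):
    # ReLU activations of the trained hidden layer.
    return np.maximum(0, X @ mlp.coefs_[0] + mlp.intercepts_[0])

# ...and train a separate classical classifier (an SVM) on its output.
svm = SVC().fit(features(X_tr), y_tr)
acc = svm.score(features(X_te), y_te)
print(f"SVM on frozen NN features: {acc:.3f}")
```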

7

MichelMED10 OP t1_it9d28r wrote

Yes, but the idea is that we still train the model end to end. Then, if we feed the extracted features to an XGBoost, in the worst case the XGBoost will perform as well as the original classifier, and we will still have trained our model end to end before freezing the encoder. So theoretically it seems better
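A quick sketch of that comparison: after "end-to-end" training, freeze the learned features and fit both the original-style linear head and a boosted-tree head on them. scikit-learn's GradientBoostingClassifier stands in for XGBoost here, and all names and sizes are illustrative, not from the thread.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=600, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# "End-to-end" model whose hidden layer serves as the backbone.
net = MLPClassifier(hidden_layer_sizes=(32,), activation="relu",
                    max_iter=500, random_state=0).fit(X_tr, y_tr)

def frozen(X):
    # Activations of the now-frozen hidden layer.
    return np.maximum(0, X @ net.coefs_[0] + net.intercepts_[0])

# Same frozen features, two different heads.
lin_acc = LogisticRegression(max_iter=1000).fit(frozen(X_tr), y_tr) \
    .score(frozen(X_te), y_te)
tree_acc = GradientBoostingClassifier(random_state=0) \
    .fit(frozen(X_tr), y_tr).score(frozen(X_te), y_te)
print(f"linear head: {lin_acc:.3f}  tree head: {tree_acc:.3f}")
```

Whether the tree head actually matches or beats the linear one varies by dataset, which is the point of dispute below.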

−5

SlowFourierT198 t1_itbv7y2 wrote

XGBoost is not strictly better than NNs at classification. I can guarantee that, as I worked on a classification dataset where a NN performed significantly better than XGBoost. While I am pretty sure your statement holds for small datasets, it will not hold for large datasets with complicated features

1

dluther93 t1_it982ce wrote

I've done this before for multi-modal classification tasks.
Train a CNN end-to-end, take the layer before last as a dense vector of embeddings.
Use that dense vector as a feature set alongside my tabular data in an XGBoost or CatBoost model. Boom

Easy to do on a local machine, cumbersome to try and reliably deploy this model though.
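The steps above can be sketched roughly as follows. scikit-learn's GradientBoostingClassifier stands in for XGBoost/CatBoost, a small MLP stands in for the CNN, and the "image vs. tabular" column split is a made-up toy, not the commenter's actual pipeline.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Toy stand-in data: the first 10 columns play the role of image inputs,
# the last 10 the role of tabular features.
X, y = make_classification(n_samples=600, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
img_tr, tab_tr = X_tr[:, :10], X_tr[:, 10:]
img_te, tab_te = X_te[:, :10], X_te[:, 10:]

# Train the "backbone" end to end on the image-like part only.
backbone = MLPClassifier(hidden_layer_sizes=(16,), activation="relu",
                         max_iter=500, random_state=0).fit(img_tr, y_tr)

def embed(img):
    # Penultimate-layer activations: the "layer before last".
    return np.maximum(0, img @ backbone.coefs_[0] + backbone.intercepts_[0])

# Concatenate embeddings with tabular features and fit boosted trees.
gbt = GradientBoostingClassifier(random_state=0)
gbt.fit(np.hstack([embed(img_tr), tab_tr]), y_tr)
acc = gbt.score(np.hstack([embed(img_te), tab_te]), y_te)
print(f"boosted trees on embeddings + tabular: {acc:.3f}")
```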

3

Bonsanto t1_ita8okg wrote

Do you have any example/implementation at hand?

1

dluther93 t1_itbglnd wrote

Nothing I’m able to pass off publicly, unfortunately. Just build a CNN, then concat the outputs into your original dataset :)

1

abstract000 t1_itc0l85 wrote

Did you really get a significant improvement? I tried this and it performed poorly, but maybe it was just the dataset. BTW, did you test that in a Kaggle competition?

1

dluther93 t1_itc1ldm wrote

It was significant to us. Our base case is the XGBoost model with tabular data only. We were looking for ways to augment our tabular performance, not improve imaging performance. It was a method of feature engineering for the problem.

1

abstract000 t1_itc3cw1 wrote

OK, I will try this next time I work on tabular data.

1

Kitchen-Ad-5566 t1_itanl4e wrote

You don’t see it in papers because it’s an already well-known trick to try, not so interesting for a publication, and it might work well for a certain problem/dataset and might not for another one. You can probably see similar things in application-oriented publications, where they would be trying to find the optimal results for a certain problem/dataset.

2