Submitted by dhruvnigam93 t3_zux783 in MachineLearning

Been an industry data scientist for 6 years in fintech and gaming.
In fintech, there was a strong need for interpretability and robustness, and I was not working with a lot of data (~500k observations to train models). Consequently, I got into the habit of building tree-based models by default, specifically xgboost, and used explainability techniques such as SHAP to explain them.
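To be concrete, my default stack looked roughly like this (toy data and settings, purely to illustrate the pattern):

```python
import xgboost as xgb
import shap
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a modest tabular dataset
X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Gradient-boosted trees as the default model
model = xgb.XGBClassifier(n_estimators=200, max_depth=4, learning_rate=0.1)
model.fit(X_train, y_train)

# SHAP gives per-row, per-feature attributions for each prediction
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)
shap.summary_plot(shap_values, X_test)
```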

Since moving to online gaming, the scrutiny is lower and the scale is far greater. I now have the freedom to use deep learning. I need to be able to demonstrate effectiveness through experiments, but beyond that I do not need explainability at a granular level. Advantages I see with using deep learning:

  1. Custom loss functions - basically any differentiable loss function can be trained on. This is a huge advantage when the business goal is not aligned with the out-of-the-box loss functions
  2. Learning embeddings - the ability to condense features into dense, latent representations that can be reused for any number of downstream use cases
  3. Multiple outputs per model - by tweaking the architecture, one model can predict several targets at once (a toy sketch covering all three points follows this list)
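A toy PyTorch sketch of what I mean by all three points (names, shapes, and the loss itself are made up for illustration):

```python
import torch
import torch.nn as nn

# Toy multi-task net: one categorical feature -> embedding, shared trunk,
# two heads (e.g. churn probability and spend), custom asymmetric loss.
class GamingNet(nn.Module):
    def __init__(self, n_games=1000, emb_dim=16, n_numeric=10):
        super().__init__()
        self.game_emb = nn.Embedding(n_games, emb_dim)   # learned embeddings (point 2)
        self.trunk = nn.Sequential(
            nn.Linear(emb_dim + n_numeric, 64), nn.ReLU(),
            nn.Linear(64, 32), nn.ReLU(),
        )
        self.churn_head = nn.Linear(32, 1)               # multiple outputs (point 3)
        self.spend_head = nn.Linear(32, 1)

    def forward(self, game_id, numeric):
        h = self.trunk(torch.cat([self.game_emb(game_id), numeric], dim=1))
        return torch.sigmoid(self.churn_head(h)), self.spend_head(h)

def business_loss(churn_pred, spend_pred, churn_true, spend_true):
    # Custom differentiable loss (point 1): penalise missed churners 5x more,
    # and weight spend error by the player's actual spend.
    bce = nn.functional.binary_cross_entropy(churn_pred, churn_true, reduction="none")
    churn_term = torch.where(churn_true > 0.5, 5.0 * bce, bce).mean()
    spend_term = ((spend_pred - spend_true) ** 2 * (1.0 + spend_true.abs())).mean()
    return churn_term + spend_term

# One toy training step on random data
model = GamingNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
game_id = torch.randint(0, 1000, (256,))
numeric = torch.randn(256, 10)
churn_true = torch.randint(0, 2, (256, 1)).float()
spend_true = torch.rand(256, 1) * 100
churn_pred, spend_pred = model(game_id, numeric)
loss = business_loss(churn_pred, spend_pred, churn_true, spend_true)
opt.zero_grad(); loss.backward(); opt.step()
```

The embedding table and shared trunk give you reusable representations, the two heads give you multiple outputs from one model, and the loss can encode whatever asymmetry the business actually cares about.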

Seeing all this, deep learning seems to offer a lot of advantages, even if the raw performance ends up similar to tree-based methods. What do you guys think?

31

Comments


Naive-Progress4549 t1_j1mniyu wrote

I think you need a comprehensive benchmark; you might find your deep learning model fails miserably even in a simple scenario. So I would recommend double-checking the requirements: if your business does not particularly care about the occasional bad prediction, then it should be fine; otherwise I would look for some more deterministic models.

13

blablanonymous t1_j1msyd5 wrote

  1. Can't you also create custom loss functions for XGBoost? I've never used it myself, but it seems about as easy as doing it for an ANN (see the sketch after this list)

  2. Is it always trivial to get meaningful embeddings? Does taking the last hidden layer of an ANN guarantee that the representation will be useful in many different contexts? I think it might need more work than you expect. I'm actually looking for a write-up about what conditions need to be met for a hidden layer to provide meaningful embeddings. I think using a triplet loss intuitively favors that, but I'm not sure in general.

  3. XGBoost allows for this too, doesn't it? The scikit-learn API at least lets you create MultiOutput models very easily. Granted, it can be silly to have multiple models under the hood, but whatever works.
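On point 1, from the docs it looks roughly like this: the native API takes a callable that returns the gradient and hessian of your loss with respect to the raw prediction (untested sketch, toy loss and data):

```python
import numpy as np
import xgboost as xgb

# Toy asymmetric squared error that penalises under-prediction 3x more
# (the weighting is made up, just to show the shape of a custom objective).
def asymmetric_squared_error(preds, dtrain):
    y = dtrain.get_label()
    residual = preds - y
    weight = np.where(residual < 0, 3.0, 1.0)   # under-predictions cost more
    grad = 2.0 * weight * residual               # first derivative of the loss
    hess = 2.0 * weight                          # second derivative of the loss
    return grad, hess

X = np.random.randn(1000, 10)
y = X[:, 0] * 2.0 + np.random.randn(1000) * 0.1
dtrain = xgb.DMatrix(X, label=y)
booster = xgb.train({"max_depth": 3, "eta": 0.1}, dtrain,
                    num_boost_round=100, obj=asymmetric_squared_error)
```

The catch is that the loss needs a usable second derivative (or a constant approximation of it), which is where the "any differentiable loss" claim for neural nets is genuinely broader.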

Sorry, I'm playing devil's advocate here, but the vibe I'm getting from your post is that you're excited to finally get to play with DNNs. Which I can relate to. But don't get lost in that intellectual excitement: at the end of the day, people want you to solve a business problem. The faster you can get to a good solution, the better.

In the end it's all about trade-offs. People who employ you just want the best value for their money.

44

jbreezeai t1_j1n20mh wrote

One consistent piece of feedback I have gotten from financial services customers is the need for transparency and explainability. My understanding is that these two factors are the reason why the adoption of DL is low there, especially when you have tons of audit and regulatory requirements.

4

rshah4 t1_j1nbfkn wrote

I am with you. While I generally favor trees for tabular data, deep learning does have some advantages, as you mentioned. I haven't heard many success stories from industry about moving away from trees to deep learning, outside of Sean Taylor talking about using deep learning at Lyft. My guess is that the extra complexity of deep learning is only worth it in a small set of use cases.

Deep learning is probably also useful in multimodal use cases. If people are using deep learning for tabular data because of these advantages, I would love to hear about it.

2

zenonu t1_j1o62bm wrote

Don't get attached to any particular method. Train them all and combine them (ensemble learning) to get the best validation loss you can.
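scikit-learn's stacking gets you most of the way as a starting point (toy sketch; swap in xgboost or your own nets as base estimators):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Stack a tree model and a small neural net; a meta-learner weighs them.
X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

ensemble = StackingClassifier(
    estimators=[
        ("trees", GradientBoostingClassifier()),
        ("mlp", MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500)),
    ],
    final_estimator=LogisticRegression(),
    cv=5,
)
ensemble.fit(X_train, y_train)
print("validation log loss:", log_loss(y_val, ensemble.predict_proba(X_val)))
```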

1

yunguta t1_j1rm1lp wrote

I agree with you; another benefit I might add is scalability to very large data.

To give an example, the team I work on processes point cloud data, which easily runs to millions or billions of points for a single dataset. Random forest is popular for per-point classification of the point cloud, but for large real-world datasets (think large geographic extents) you need distributed computing, whereas with a simple MLP you can train the model in batches on a GPU. Multi-GPU is the natural next step for scaling, and inference is blazing fast too.
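Roughly the training pattern I mean (toy shapes and sizes, not our actual pipeline):

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# The full point cloud never has to fit on the device at once: stream it in
# batches, and the same loop scales from CPU to single or multiple GPUs.
device = "cuda" if torch.cuda.is_available() else "cpu"

X = torch.randn(1_000_000, 16)              # stand-in for per-point features
y = torch.randint(0, 5, (1_000_000,))       # stand-in for per-point class labels
loader = DataLoader(TensorDataset(X, y), batch_size=8192, shuffle=True)

mlp = nn.Sequential(nn.Linear(16, 128), nn.ReLU(),
                    nn.Linear(128, 128), nn.ReLU(),
                    nn.Linear(128, 5)).to(device)
opt = torch.optim.Adam(mlp.parameters(), lr=1e-3)

for xb, yb in loader:                        # one pass over the data
    xb, yb = xb.to(device), yb.to(device)
    loss = nn.functional.cross_entropy(mlp(xb), yb)
    opt.zero_grad(); loss.backward(); opt.step()
```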

I personally see DL models as a flexible and modular way to build models, with the option of improving a model through deeper networks, different activation functions, and network modules. If you need to go simple, just use fewer layers :-)

As others have mentioned, use the tool that fits the problem. But a neural network does have the advantages you mentioned and should also be considered.

2

acardosoj t1_j1sibr6 wrote

For me, the only time deep learning was worth all the work was when I had a large unlabelled dataset and did pre-training on it before the main task.
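The rough shape of that setup, with autoencoder-style pre-training as one example (sizes and names are illustrative):

```python
import torch
import torch.nn as nn

# Learn a representation from unlabelled rows with a reconstruction objective,
# then fine-tune the encoder on the much smaller labelled set for the main task.
encoder = nn.Sequential(nn.Linear(50, 128), nn.ReLU(), nn.Linear(128, 32))
decoder = nn.Sequential(nn.Linear(32, 128), nn.ReLU(), nn.Linear(128, 50))

X_unlabelled = torch.randn(100_000, 50)
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)
for _ in range(10):                                   # pre-training epochs
    recon = decoder(encoder(X_unlabelled))
    loss = nn.functional.mse_loss(recon, X_unlabelled)
    opt.zero_grad(); loss.backward(); opt.step()

# Fine-tune: reuse the pre-trained encoder, add a task head, train on labels
X_labelled, y = torch.randn(5_000, 50), torch.randint(0, 2, (5_000,)).float()
head = nn.Linear(32, 1)
opt = torch.optim.Adam(list(encoder.parameters()) + list(head.parameters()), lr=1e-4)
for _ in range(10):                                   # fine-tuning epochs
    logits = head(encoder(X_labelled)).squeeze(1)
    loss = nn.functional.binary_cross_entropy_with_logits(logits, y)
    opt.zero_grad(); loss.backward(); opt.step()
```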

1

Agreeable-Ad-7110 t1_j25sh94 wrote

Sorry, I'm not following: what's the problem here? I have done this, but the model inference only had to run once every morning, so it was a little different. Is there something else I'm missing with the maintenance of the ensemble?

1