Comments


curiousshortguy t1_j0gegtd wrote

I think there's some interest in the community in learning optimal decision trees, as well as in robust learning methods under different kinds of adversarial influence. They're less open problems and more areas of potential improvement, though.

2

Featureless_Bug t1_j0guhqs wrote

This is a joke and not a paper, tbh. "Therefore, for continuous activations, the neural network equivalent tree immediately becomes infinite width even for a single filter" - the person who wrote this has no idea what infinity actually means, or that a decision tree with infinite width is by definition not a decision tree anymore. And they try to sell it as something that would increase the explainability of neural networks, just wow. Is there a way to request removal of a "paper" from arXiv?

1

BrisklyBrusque t1_j0hx440 wrote

Yes, lots. For example, in 2019 a paper introduced a new split rule for categorical variables that reduces computational complexity.

https://peerj.com/articles/6339/

A lot of researchers are also exploring adjacent tree ensembles such as extremely randomized trees (2006) and Bayesian additive regression trees (2008). The former is very similar to random forests. There is a strong possibility other tree ensembles have yet to be discovered!
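
For a rough feel of how close extremely randomized trees are to random forests in practice, here is a quick sketch using scikit-learn's implementations (illustrative only; scikit-learn is my choice here, not something from the original papers):

```python
# A minimal sketch comparing random forests with extremely randomized trees,
# using scikit-learn's implementations (my illustration).
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# Random forest: bootstrap samples plus the best split within a random feature subset.
rf = RandomForestClassifier(n_estimators=200, random_state=0)

# Extremely randomized trees: split thresholds are drawn at random per candidate
# feature, trading a little bias for lower variance and faster training.
et = ExtraTreesClassifier(n_estimators=200, random_state=0)

for name, model in [("random forest", rf), ("extra trees", et)]:
    score = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: mean CV accuracy = {score:.3f}")
```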

If you’re a fan of computer science / optimized code, there is a great deal of research on making tree models faster. The ranger library in R was introduced as an improvement on the randomForest package. There is also interest in making random forests scale up to millions of variables, to handle genetics data.

Hummingbird is a Microsoft project that seeks to refactor common machine learning methods using tensor algebra, so those methods can take advantage of GPUs. I don’t know if they got around to random forests yet.
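
For what it's worth, my understanding of Hummingbird's conversion pattern looks roughly like this; treat the exact calls (and whether random forests are covered) as assumptions and check the project's docs:

```python
# A hedged sketch of the Hummingbird workflow as I understand it; the convert()
# call and random forest support are assumptions, not verified against the docs.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from hummingbird.ml import convert  # assumed import path

X = np.random.rand(10000, 20).astype(np.float32)
y = np.random.randint(2, size=10000)

skl_model = RandomForestClassifier(n_estimators=100).fit(X, y)

# Compile the fitted forest into tensor operations (PyTorch backend here)
# so that prediction can run on a GPU.
hb_model = convert(skl_model, "pytorch")
hb_model.to("cuda")            # assumes a CUDA device is available
preds = hb_model.predict(X)
```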

Random forests also raise a lot of questions about the relationship between ensemble diversity and ensemble accuracy, and much about that relationship is still mysterious.

10

AdFew4357 t1_j0i4mo2 wrote

BART (Bayesian additive regression trees). Also, ensemble learning in general is a useful approach for advancing modeling in other areas with different kinds of data. For example, the area I'm reading about now, time series classification, has a ton of literature on models that use ensemble learners under the hood. Check out models like Arsenal, ProximityForest, or other ensemble-based methods for time series classification.

1

chaosmosis t1_j0i51ka wrote

> Random forests raise a lot of questions about the relationship between ensemble diversity and ensemble accuracy, about which there are many mysteries.

By way of Jensen's inequality, there's a generalization of the bias-variance decomposition of mean-squared error that holds for all convex loss functions; see the paper "Generalized Negative Correlation Learning" that came out in 2021. From there, you can view linear averaging of model outputs as a special case of the method of control variates, where the models' diversity matters insofar as it's harnessed to reduce error due to variance. I think control variates give us a unified theoretical framework for investigating ensembles. They've got all sorts of fun generalizations, like nonlinear control variates, that are as yet completely unexplored in the machine learning literature.
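
To make the control-variate idea concrete, here's a toy sketch (my own illustration, not from the paper): you reduce the variance of an estimate by exploiting a correlated quantity whose expectation is known.

```python
# Classic Monte Carlo control variate: estimate E[exp(X)] for X ~ U(0, 1)
# (true value e - 1) using X itself, whose mean 0.5 is known, as the control.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 10_000)

f = np.exp(x)   # quantity whose mean we want
g = x           # control variate with known mean 0.5

sigma = np.cov(f, g)
c = sigma[0, 1] / sigma[1, 1]   # optimal coefficient c* = Cov(f, g) / Var(g)

plain = f.mean()
controlled = (f - c * (g - 0.5)).mean()
print("plain estimate     :", plain)
print("control-variate est:", controlled)

# Ensemble analogy: linearly averaging two models' outputs is the special case
# where the "control" is the gap between their predictions and the averaging
# weight plays the role of c.
```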

In other words, you should diversify ensembles in exactly the same way you should diversify a portfolio of financial investments according to optimal portfolio theory. See also Philip Tetlock's work on his "extremizing algorithm" for an application of similar ideas to human forecasting competitions.
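
To make the portfolio analogy concrete, here's a toy sketch of weighting ensemble members by the inverse of their error covariance, the same minimum-variance weighting used for financial portfolios (again my illustration, not something from Tetlock's work):

```python
# Minimum-variance combination of three correlated models, portfolio-style:
# weights are proportional to (error covariance)^{-1} * 1, normalized to sum to 1.
import numpy as np

rng = np.random.default_rng(1)
n, m = 5000, 3
truth = rng.normal(size=n)

# Simulate three models whose errors are correlated to different degrees.
errors = rng.multivariate_normal(
    mean=np.zeros(m),
    cov=[[1.0, 0.6, 0.2],
         [0.6, 1.0, 0.1],
         [0.2, 0.1, 0.5]],
    size=n,
)
preds = truth[:, None] + errors

sigma = np.cov(errors, rowvar=False)
w = np.linalg.solve(sigma, np.ones(m))
w /= w.sum()                        # minimum-variance weights

equal_avg = preds.mean(axis=1)
weighted = preds @ w
print("equal-weight MSE :", np.mean((equal_avg - truth) ** 2))
print("min-variance MSE :", np.mean((weighted - truth) ** 2))
```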

The main outstanding question with respect to ensembles, to my mind, is not how to make the most use of a collection of models, but when and whether to invest computational effort into running multiple models in parallel and optimizing the relationships between their errors rather than into training a bigger model.

6

chaosmosis t1_j0ib3ja wrote

No problem at all. I'm leaving ML research for at least the next couple of years, and I want my best ideas to get adopted by others. I figured out all of the above in a three-month summer internship in 2020, and nobody there cared because it couldn't immediately be used to blow things up more effectively, which was incredibly disappointing.

As far as I can tell, nobody but me and one footnote in an obscure economics paper (whose citation I've forgotten) has ever noted that ensembles and financial portfolios deal with the same problem if you cast both in terms of control variates. In theory, bridging between the two by way of control variates should allow for stealing lots and lots of ideas from the finance literature for ML papers. I'd really like to see someone make something of the connection someday.

5

chaosmosis t1_j0icgvf wrote

As an example, imagine that Bob and Susan are estimating the height of a dinosaur and Bob makes errors that are exaggerated versions of Susan's, so if Susan underestimates its height by ten feet then Bob underestimates it by twenty, or if Susan overestimates its height by thirty feet then Bob overestimates it by forty. You can "artificially construct" a new prediction to average with Susan's predictions by taking the difference between her prediction and Bob's, flipping its sign, and adding it to her prediction. Then you conduct traditional linear averaging on the constructed prediction with Susan's prediction.
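
Here's that example in numbers (the specific true height and the equal weighting are just illustrative):

```python
# Numeric sketch of the Bob/Susan construction: the "flipped" prediction extends
# the Bob -> Susan line past Susan (flipped = susan + (susan - bob)), and the
# final prediction is an ordinary average of the flipped prediction with Susan's.
true_height = 100.0

scenarios = {
    "both underestimate": (true_height - 10, true_height - 20),  # (Susan, Bob)
    "both overestimate":  (true_height + 30, true_height + 40),
}

for name, (susan, bob) in scenarios.items():
    flipped = susan + (susan - bob)     # sign-flipped difference added to Susan
    combined = (susan + flipped) / 2.0  # traditional linear averaging
    print(f"{name}: Susan error = {susan - true_height:+.1f}, "
          f"combined error = {combined - true_height:+.1f}")
```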

Visually, you can think about it as if normal averaging draws a straight line between two different models' individual outputs in R^n, then chooses some point between them, while control variates extend that line further in both directions and allow you to choose a point that's more extreme.

It's a little more complicated with more predictors and when issuing predictions in higher dimensions, but not by much. Intuitively, you have to avoid "overcounting" certain relationships when you're trying to build a flipped predictor. This is why the financial portfolio framework is helpful; that field is already used to thinking about correlations between lots of different investments.

The tl;dr version is, you want models with errors that balance each other out.

5

zimonitrome t1_j136ic1 wrote

IIRC there is also some very young research into parallelizing the training of binary trees on GPUs using CUDA. It could be a breakthrough, since people claim ANNs and random forests resemble one another.

1