Comments


Bulky_Highlight_3352 t1_jbyxv8s wrote

I've tried Aleph's playground and mostly saw it generate complete garbage. Not sure how they will solve any of ChatGPT's problems.

11

Icy-Curve2747 t1_jbyyux1 wrote

Very interesting, thanks for sharing. I can’t wait for explainable AI to catch up with the rest of ML

1

pyepyepie t1_jbz57hd wrote

To be fair, the paper looks interesting. The news title is garbage, but that's not the fault of the authors, who did a pretty cool job. Anyway, it seems like a nice application of a very well-known idea, which is cool.

By the way, is measuring the influence of a perturbation on the loss a common idea? I am mostly aware of using it to see how the regression value or class probabilities change, with the perturbation applied to the inputs, not the params (edit: incorrect, they do apply the perturbation to the inputs).

edit: "We follow the results of the studies [Koh and Liang, 2017; Bis et al., 2021] to approximate the perturbation effect directly through the model's parameters when executing Leaving-One-Out experiments on the input. The influence function estimating the perturbation of an input z is then derived as:" - it seems I misunderstood it due to their notation. It's a pretty standard method.
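For readers unfamiliar with the method: the standard influence function from Koh and Liang (2017), which estimates how the optimal parameters move when a training point z is upweighted, is

```latex
\mathcal{I}_{\text{up,params}}(z) \;=\; -\,H_{\hat\theta}^{-1}\,\nabla_\theta L(z, \hat\theta)
```

where \(\hat\theta\) are the trained parameters, \(L\) is the loss, and \(H_{\hat\theta}\) is the Hessian of the empirical loss at \(\hat\theta\). This approximates leave-one-out retraining without actually retraining the model.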

1

fastglow t1_jbz854s wrote

This post does not specify the problem with ChatGPT as it purports to, nor does it solve anything.

3

pyepyepie t1_jbz9363 wrote

The TLDR of XAI is that you can "see" (or think you see) how features influence the decisions of your models. For example, if you have the sentence "buy this pill to get skinny!!!!!" and you try to classify whether it's spam, the "!!!" might be marked as very spammy. You often find this by masking the "!!!" and seeing that the message is now maybe not classified as spam (often you look at the output distribution). Of course, there are many more sophisticated methods, and there is a lot of impressive work, but that's the TLDR.
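The masking idea above can be sketched in a few lines. This is a toy illustration, not a real model: `spam_score` below is a hypothetical stand-in classifier, and the importance of each token is simply the drop in score when that token is masked out.

```python
def spam_score(tokens):
    """Hypothetical spam classifier: returns a score in [0, 1], higher = spammier."""
    spammy = {"buy", "pill", "skinny", "!!!"}
    hits = sum(1 for t in tokens if t in spammy)
    return min(1.0, hits / 4)

def token_importance(tokens):
    """Occlusion-style importance: score drop when a token is masked out."""
    base = spam_score(tokens)
    return {t: base - spam_score([u for u in tokens if u != t])
            for t in tokens}

sentence = ["buy", "this", "pill", "to", "get", "skinny", "!!!"]
importance = token_importance(sentence)
# Spammy tokens like "!!!" get positive importance; neutral words like
# "this" leave the score unchanged and get zero.
```

Real perturbation-based explainers (LIME, occlusion maps, etc.) are more careful about how they perturb and how they aggregate, but the core loop is the same: perturb an input, re-score, attribute the difference.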

There are many explainability methods; it's a very hot topic. This might be yet another paper, or not. The title makes no sense at all, though: there are a gazillion explainability methods for transformers. I am sorry, I did not read all of the paper, so I should probably not talk too much. It just looks very similar to things I've already seen.

Generally speaking, you should start using XAI if you do ML. If you do NLP, look into the proven methods first, e.g. SHAP and LIME. If you work with trees, look into TreeSHAP. If you work with vision, look into what I shared here. Sorry if my preceding comments were inaccurate, but I hope I still provide some value here :).

2