
currentscurrents t1_jbz1hbw wrote

TL;DR: they suppress one token at a time and map how that affects the cross-entropy loss. Tokens that have a big impact must have been important for the output. It reminds me of older techniques for image explainability.
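A minimal sketch of that leave-one-out idea (the bag-of-words "model", its weights, and the tokens are all made up for illustration, not taken from the paper):

```python
import math

# Toy bag-of-words "model": fixed, hypothetical token weights for a spam logit.
WEIGHTS = {"buy": 1.2, "pill": 0.8, "!!!": 2.5}

def spam_prob(tokens):
    logit = sum(WEIGHTS.get(t, 0.0) for t in tokens)
    return 1.0 / (1.0 + math.exp(-logit))  # sigmoid

def cross_entropy(p, label):
    # label = 1 means "spam"
    return -math.log(p) if label == 1 else -math.log(1.0 - p)

def token_importance(tokens, label):
    """Suppress each token in turn and measure the change in loss."""
    base_loss = cross_entropy(spam_prob(tokens), label)
    scores = {}
    for i, tok in enumerate(tokens):
        reduced = tokens[:i] + tokens[i + 1:]
        scores[tok] = cross_entropy(spam_prob(reduced), label) - base_loss
    return scores

scores = token_importance(["buy", "pill", "!!!"], label=1)
# "!!!" carries the largest weight here, so removing it raises the loss most,
# i.e. it gets the highest importance score.
```

A real model would of course replace the token with a mask/baseline rather than just deleting it, and the paper approximates this effect through the parameters instead of rerunning the model per token, but the attribution logic is the same.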

Paper link: https://arxiv.org/abs/2301.08110

4

pyepyepie t1_jbz57hd wrote

To be fair, the paper looks interesting. The news title is garbage, but that's not the fault of the authors, who did a pretty cool job. Anyway, it seems like a nice application of a very well-known idea, which is cool.

By the way, is measuring a perturbation's influence on the loss a common idea? I am mostly aware of using perturbations to see how the regression value or class probabilities change - and with the perturbation done on the inputs, not the params (edit: incorrect, they do the perturbation on the inputs).

edit: "We follow the results of the studies [Koh and Liang, 2017; Bis et al., 2021] to approximate the perturbation effect directly through the model's parameters when executing Leaving-One-Out experiments on the input. The influence function estimating the perturbation of an input z is then derived as:" - seems like I misunderstood it due to their notation. Seems like a pretty regular method.

1

ShowerVagina t1_jbz680l wrote

Can you explain this like I'm 5?

1

pyepyepie t1_jbz766k wrote

Correct me if I am wrong - I have not read the whole paper yet - they mask tokens out and see how that changes the loss, plus some trick that I had no energy to look into. It's not going to change the world. It's similar to this one: https://christophm.github.io/interpretable-ml-book/pixel-attribution.html

1

ShowerVagina t1_jbz7ts9 wrote

So how would this affect real world usage?

1

pyepyepie t1_jbz9363 wrote

The TLDR of XAI is that you can "see" (or think you see) how features influence the decisions of your models. For example, if you have a sentence "buy this pill to get skinny!!!!!" and you try to classify whether it's spam, the "!!!" might be marked as very spammy. You often find this by masking the "!!!" and seeing that the message is then maybe no longer classified as spam (often you look at the output distribution). Of course, there are many more sophisticated methods and a lot of impressive work, but that's the TLDR.
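Here's that spam example as a toy sketch (the classifier, its weights, and the threshold are invented for illustration - a real setup would use a trained model):

```python
import math

# Hypothetical per-token weights; unknown tokens get a small negative default.
WEIGHTS = {"buy": 0.4, "skinny": 0.3, "!!!!!": 1.5}

def is_spam(tokens, threshold=0.5):
    logit = sum(WEIGHTS.get(t, -0.2) for t in tokens)
    prob = 1.0 / (1.0 + math.exp(-logit))  # sigmoid
    return prob >= threshold, prob

tokens = ["buy", "this", "pill", "to", "get", "skinny", "!!!!!"]

spam_full, p_full = is_spam(tokens)                              # with "!!!!!"
spam_masked, p_masked = is_spam([t for t in tokens if t != "!!!!!"])  # masked

# With "!!!!!" present the message crosses the spam threshold; with it
# masked out the probability drops below it - so we attribute a lot of
# the spam decision to that feature.
```

This is exactly the masking intuition: flip (or drop) one feature, watch the output distribution move, and attribute importance accordingly.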

There are many explainability methods; it's a very hot topic. It might be yet another paper, or not. The title makes no sense at all - there are a gazillion explainability methods for transformers. I am sorry, I did not read all of the paper, so I should probably not talk too much. It just looks very similar to things I have already seen.

Generally speaking, you should start using XAI if you do ML. If you do NLP, look into the proven methods first, e.g. SHAP and LIME. If you work with trees, look into TreeSHAP. If you work with vision, look into what I shared here. Sorry if my preceding comments were inaccurate, but I hope I still provide some value here :).

2