
currentscurrents t1_jbz1hbw wrote

TL;DR: they suppress one token at a time and map how that affects the cross-entropy loss. Tokens that have a big impact must have been important for the output. It reminds me of older techniques for image explainability.
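A minimal sketch of that leave-one-out idea (the bag-of-words "model", its weights, and the tokens are all made up for illustration, not taken from the paper):

```python
import math

# Toy bag-of-words "model": fixed, hypothetical token weights for a spam logit.
WEIGHTS = {"buy": 1.2, "pill": 0.8, "!!!": 2.5}

def spam_prob(tokens):
    logit = sum(WEIGHTS.get(t, 0.0) for t in tokens)
    return 1.0 / (1.0 + math.exp(-logit))  # sigmoid

def cross_entropy(p, label):
    # label = 1 means "spam"
    return -math.log(p) if label == 1 else -math.log(1.0 - p)

def token_importance(tokens, label):
    """Suppress each token in turn and measure the change in loss."""
    base_loss = cross_entropy(spam_prob(tokens), label)
    scores = {}
    for i, tok in enumerate(tokens):
        reduced = tokens[:i] + tokens[i + 1:]
        scores[tok] = cross_entropy(spam_prob(reduced), label) - base_loss
    return scores

scores = token_importance(["buy", "pill", "!!!"], label=1)
# "!!!" carries the largest weight here, so removing it raises the loss most,
# i.e. it gets the highest importance score.
```

A real model would of course replace the token with a mask/baseline rather than just deleting it, and the paper approximates this effect through the parameters instead of rerunning the model per token, but the attribution logic is the same.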

Paper link: https://arxiv.org/abs/2301.08110

4

pyepyepie t1_jbz57hd wrote

To be fair, the paper looks interesting. The news title is garbage, but that's not the fault of the authors, who did a pretty cool job. Anyway, it seems like a nice application of a very well-known idea, which is cool.

By the way, is measuring a perturbation's influence on the loss a common idea? I am mostly aware of using perturbations to see how the regression value or class probabilities change - and with the perturbation done on the inputs, not the params (edit: incorrect, they do the perturbation on the inputs).

edit: "We follow the results of the studies [Koh and Liang, 2017; Bis et al., 2021] to approximate the perturbation effect directly through the model's parameters when executing Leaving-One-Out experiments on the input. The influence function estimating the perturbation of an input z is then derived as:" - seems like I misunderstood it due to their notation. Seems like a pretty regular method.

1

ShowerVagina t1_jbz680l wrote

Can you explain this like I'm 5?

1

pyepyepie t1_jbz766k wrote

Correct me if I am wrong - I have not read the whole paper yet - they mask tokens out and see how that changes the loss, plus some trick that I had no energy to look into. It's not going to change the world. It's similar to this one: https://christophm.github.io/interpretable-ml-book/pixel-attribution.html

1

ShowerVagina t1_jbz7ts9 wrote

So how would this affect real world usage?

1

pyepyepie t1_jbz9363 wrote

The TLDR of XAI is that you can "see" (or think you see) how features influence the decisions of your models. For example, if you have a sentence "buy this pill to get skinny!!!!!" and you try to classify whether it's spam, the "!!!" might be marked as very spammy. You often find this by masking the "!!!" and seeing that the message is then maybe no longer classified as spam (often you look at the output distribution). Of course, there are many more sophisticated methods and a lot of impressive work, but that's the TLDR.
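Here's that spam example as a toy sketch (the classifier, its weights, and the threshold are invented for illustration - a real setup would use a trained model):

```python
import math

# Hypothetical per-token weights; unknown tokens get a small negative default.
WEIGHTS = {"buy": 0.4, "skinny": 0.3, "!!!!!": 1.5}

def is_spam(tokens, threshold=0.5):
    logit = sum(WEIGHTS.get(t, -0.2) for t in tokens)
    prob = 1.0 / (1.0 + math.exp(-logit))  # sigmoid
    return prob >= threshold, prob

tokens = ["buy", "this", "pill", "to", "get", "skinny", "!!!!!"]

spam_full, p_full = is_spam(tokens)                              # with "!!!!!"
spam_masked, p_masked = is_spam([t for t in tokens if t != "!!!!!"])  # masked

# With "!!!!!" present the message crosses the spam threshold; with it
# masked out the probability drops below it - so we attribute a lot of
# the spam decision to that feature.
```

This is exactly the masking intuition: flip (or drop) one feature, watch the output distribution move, and attribute importance accordingly.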

There are many explainability methods; it's a very hot topic. It might be yet another paper, or not. The title makes no sense at all - there are a gazillion explainability methods for transformers. I am sorry, I did not read all of the paper, so I should probably not talk too much. It just looks very similar to things I have already seen.

Generally speaking, you should start using XAI if you do ML. If you do NLP, look into the proven methods first, e.g. SHAP and LIME. If you work with trees, look into TreeSHAP. If you work with vision, look into what I shared here. Sorry if my preceding comments were inaccurate, but I hope I still provide some value here :).

2