matus_pikuliak OP t1_je0am88 wrote
Reply to comment by rshah4 in [P] ChatGPT Survey: Performance on NLP datasets by matus_pikuliak
Only some of the papers used few-shot prompting. It was usually beneficial, and in some cases it helped beat the SOTA.

Yeah, OpenAI definitely does not care about these benchmarks, but I think they are still useful for seeing how capable the models are. I find it hard to imagine the models being used in some applications if they cannot reliably handle even the simple tasks these benchmarks evaluate.