matus_pikuliak OP t1_je0am88 wrote
Reply to comment by rshah4 in [P] ChatGPT Survey: Performance on NLP datasets by matus_pikuliak
Only some of the papers used few-shot prompting. It was usually beneficial, and in some cases it helped beat the SOTA.

Yeah, OpenAI definitely does not care about these benchmarks, but I think they are still useful for seeing how capable the models are. I find it hard to imagine the models being used in some applications if they cannot reliably handle even the simple tasks these benchmarks evaluate.