
Franck_Dernoncourt t1_j9v5wwh wrote

Why SOTA? Did they compare against GPT 3.5? The only comparison against GPT 3.5 I found in the LLaMA paper was:

> Despite the simplicity of the instruction finetuning approach used here, we reach 68.9% on MMLU. LLaMA-I (65B) outperforms on MMLU existing instruction finetuned models of moderate sizes, but are still far from the state-of-the-art, that is 77.4 for GPT code-davinci-002 on MMLU (numbers taken from Iyer et al. (2022)).
