kawin_e t1_jdxz4bh wrote

The Stanford Human Preferences dataset (SHP): https://huggingface.co/datasets/stanfordnlp/SHP

It contains pairwise preferences over responses to posts, so each example is a tuple (post, response_A, response_B) plus a label for which response was preferred. You can certainly turn it into an instruction dataset by keeping only the preferred responses that meet a certain cut-off. I'm currently aware of one academic/industry group that is already doing this.
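
Here's a rough sketch of what that filtering could look like. The field names (`post_id`, `history`, `human_ref_A`, `human_ref_B`, `labels`, `score_A`, `score_B`, `score_ratio`) are my reading of the dataset card, so double-check them there. The function name `to_instruction_pairs`, the two thresholds, and the sample rows are made up for illustration. I don't know what cut-off the group I mentioned actually uses.

```python
def to_instruction_pairs(rows, min_score=10, min_ratio=2.0):
    """Turn pairwise preference rows into (instruction, output) examples.

    Keeps only the preferred response of each pair, and only when it clears
    both cut-offs. Thresholds are illustrative assumptions, not tuned values.
    """
    best = {}  # post_id -> (example, score of the kept response)
    for row in rows:
        # Assumption: labels == 1 means response A was preferred, 0 means B.
        if row["labels"] == 1:
            response, score = row["human_ref_A"], row["score_A"]
        else:
            response, score = row["human_ref_B"], row["score_B"]

        # Apply the cut-off: absolute score and margin over the other response.
        if score < min_score or row["score_ratio"] < min_ratio:
            continue

        # A post appears in many pairs; keep its highest-scoring response.
        post_id = row["post_id"]
        if post_id not in best or score > best[post_id][1]:
            example = {"instruction": row["history"], "output": response}
            best[post_id] = (example, score)

    return [example for example, _ in best.values()]


# Hypothetical rows in the assumed SHP layout (not real data).
sample_rows = [
    {"post_id": "p1", "history": "How do I keep bread fresh?",
     "human_ref_A": "Freeze it sliced.", "human_ref_B": "Just eat it faster.",
     "score_A": 50, "score_B": 5, "labels": 1, "score_ratio": 10.0},
    {"post_id": "p2", "history": "Is cast iron worth it?",
     "human_ref_A": "Maybe.", "human_ref_B": "Depends on what you cook.",
     "score_A": 4, "score_B": 6, "labels": 0, "score_ratio": 1.5},
    {"post_id": "p1", "history": "How do I keep bread fresh?",
     "human_ref_A": "Store it in the fridge.", "human_ref_B": "Use a bread box.",
     "score_A": 8, "score_B": 30, "labels": 0, "score_ratio": 3.75},
]

examples = to_instruction_pairs(sample_rows)
```

To run it on the real data you could pass in the rows from `datasets.load_dataset("stanfordnlp/SHP", split="train")`, assuming the column names match the ones above.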
