dojoteef t1_j1v4j4r wrote

You don't need to tell them that one is AI- or model-generated. It could be two model-generated texts or two human-written texts. Merely having another text for comparison lets people frame the task better, since otherwise they essentially need to imagine a baseline for comparison, which people rarely do.

−3

respeckKnuckles t1_j1v66iq wrote

You say it allows them to "better frame the task", but is your goal to have them maximize their accuracy, or to capture how well they can distinguish AI from human text in real-world conditions? If the latter, then establishing a "baseline" this way leads to a task with questionable ecological validity.

7

Ulfgardleo t1_j1vcqri wrote

  1. you are asking humans to solve this task untrained, which is not the same as the human ability to distinguish the two.

  2. you are then also making it harder by phrasing the task in a way that makes it difficult for the human brain to solve it.

2

respeckKnuckles t1_j1vempm wrote

> you are asking humans to solve this task untrained, which is not the same as the human ability to distinguish the two.

This is exactly my point. There are two different research questions being addressed by the two different methods. One needs to be aware of which they're addressing.

> you are then also making it harder by phrasing the task in a way that makes it difficult for the human brain to solve it.

In studying human reasoning, sometimes this is exactly what you want. In fact, for some work on Type 1 vs. Type 2 reasoning, we actually make the task harder (e.g., by adding working-memory (WM) or attentional constraints) in order to elicit certain types of reasoning. You want to see how they will perform in conditions where they're not given help. Not every study is about how to maximize human performance. Again, you need to be aware of what your study design is actually meant to do.

7

Ulfgardleo t1_j1vjc6q wrote

I don't think this is one of those cases. The question we want to answer is whether the texts are good enough that humans will not notice they are machine-generated. Making the task as hard as possible for humans is not indicative of real-world performance once people are presented with these texts more regularly.

1