ianitic t1_je0mjqx wrote on March 28, 2023 at 3:42 PM

Reply to comment by cegras in [N] OpenAI may have benchmarked GPT-4’s coding ability on its own training data by Balance-

Oh, I haven't tested this on textbooks, but I have asked ChatGPT to give me pages of a novel and it reproduced them word for word. I suspect it must have been trained on PDFs? I'm highly surprised I haven't seen any news of authors/publishers suing yet tbh.

Based on the above test, though, it is obvious whether or not a book is part of its training set.

currentscurrents t1_je12d3k wrote on March 28, 2023 at 5:22 PM

Nobody knows exactly what it was trained on, but there exist several datasets of published books.

>I'm highly surprised I haven't seen any news of authors/publishers suing yet tbh.

They still might. But they don't have a strong motivation; it doesn't really directly impact their revenue because nobody's going to sit in the ChatGPT window and read a 300-page book one prompt at a time.

mcilrain t1_je1a7cl wrote on March 28, 2023 at 6:11 PM

Current tech could be used to allow you to ask an AI assistant to read you a book.

DreamWithinAMatrix t1_je3c6kl wrote on March 29, 2023 at 2:39 AM

There was that time Google was taken to court for scanning and indexing books for Google Books or whatever and Google won:

https://en.wikipedia.org/wiki/Authors_Guild,_Inc._v._Google,_Inc.