Submitted by Balance- t3_124eyso in MachineLearning
ianitic t1_je0mjqx wrote
Reply to comment by cegras in [N] OpenAI may have benchmarked GPT-4’s coding ability on it’s own training data by Balance-
Oh, I haven't tested this on textbooks, but I have asked ChatGPT to give me pages of a novel and it reproduced them word for word. I suspect it must have been trained on PDFs? Honestly, I'm very surprised I haven't seen any news of authors or publishers suing yet.
Based on that test, though, it's obvious whether or not a book is part of its training set.
currentscurrents t1_je12d3k wrote
Nobody knows exactly what it was trained on, but there exist several datasets of published books.
>I'm highly surprised I haven't seen any news of authors/publishers suing yet tbh.
They still might. But they don't have a strong motivation: it doesn't directly impact their revenue, because nobody is going to sit in the ChatGPT window and read a 300-page book one prompt at a time.
mcilrain t1_je1a7cl wrote
Current tech could already let you ask an AI assistant to read you a book.
DreamWithinAMatrix t1_je3c6kl wrote
There was that time Google was taken to court for scanning and indexing books for Google Books, and Google won:
https://en.wikipedia.org/wiki/Authors_Guild,_Inc._v._Google,_Inc.