Submitted by vyasnikhil96 t3_1190lw8 in MachineLearning
Disastrous_Elk_6375 t1_j9nrm6w wrote
Reply to comment by currentscurrents in [R] Provable Copyright Protection for Generative Models by vyasnikhil96
> It does memorize short snippets in some cases (especially when a snippet is repeated many times in training data)
And, to be fair, how can it not? How many different ways can you write a simple for loop to print some objects, match a regex, call an API, and so on?
visarga t1_j9qxgt2 wrote
If you go down to individual words or characters, everything is reused. If you go up, a random 10-word snippet usually appears nowhere else on the internet. But boilerplate and basic constructs may be replicated in all shapes and forms.
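A rough way to see this effect is to measure what fraction of a snippet's n-grams also occur in some reference text as n grows. The sketch below is only an illustration: the helper names `ngrams` and `reuse_fraction`, the whitespace tokenization, and the toy strings are all assumptions made up for the example, not anything from the paper or a real training corpus.

```python
def ngrams(tokens, n):
    """Return the set of contiguous n-token tuples in `tokens`."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def reuse_fraction(snippet, corpus, n):
    """Fraction of the snippet's distinct n-grams that also occur in the corpus.

    Tokenization is a plain whitespace split (an assumption for this sketch).
    Returns 0.0 when the snippet is shorter than n tokens.
    """
    snippet_grams = ngrams(snippet.split(), n)
    if not snippet_grams:
        return 0.0
    corpus_grams = ngrams(corpus.split(), n)
    return len(snippet_grams & corpus_grams) / len(snippet_grams)


# Toy data, invented for illustration only.
corpus = "for item in items : print ( item ) the cat sat on the mat"
snippet = "for item in items : print ( item ) and the dog sat on the rug"

# Short n-grams overlap heavily; long ones mostly do not.
for n in (1, 3, 10):
    print(n, round(reuse_fraction(snippet, corpus, n), 2))
```

On real data you would swap in a proper tokenizer and a much larger reference corpus; the point is just that overlap tends to drop as n increases, except for boilerplate that is repeated verbatim.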