Submitted by vyasnikhil96 t3_1190lw8 in MachineLearning
1973DodgeChallenger t1_j9kz2wb wrote
It's an interesting problem... I ask ChatGPT for code, it spits out something that it mined from GitHub. Microsoft didn't just buy GitHub to spend money. They knew it was one of the best, if not the best, sources for AI code mining. So Yoink! I set all of my GitHub projects to private but I don't know if that helps. The user agreement may be structured to allow them to "anonymously mine" code, private or otherwise.
So ya...if you store your code on GitHub...I'd bet a dollar Microsoft/OpenAI will be mining it and eventually burp it out in ChatGPT.
currentscurrents t1_j9ld7we wrote
>it spits out something that it mined from GitHub.
Having used GitHub Copilot a bunch, it's doing a lot more than just mining snippets. It learns patterns and can use them to creatively solve new problems.
It does memorize short snippets in some cases (especially when a snippet is repeated many times in training data), but in the general case it comes up with new code to match your specifications.
>I set all of my github projects to private but I don't know if that helps.
Honestly, kinda selfish. We'll all benefit from these powerful new tools and I don't appreciate you trying to hamper them.
Disastrous_Elk_6375 t1_j9nrm6w wrote
> It does memorize short snippets in some cases (especially when a snippet is repeated many times in training data)
And, to be fair, how can it not? How many different ways can you write a simple for loop to print some objects, or match a regex, call an API, and so on?
visarga t1_j9qxgt2 wrote
If you go down to individual words or characters, everything is reused. If you go up, a random 10-word snippet usually appears nowhere else on the internet. But boilerplate and basic things might be replicated in all shapes and forms.
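A rough way to see this effect is to measure how many n-word snippets of one text also appear in another as n grows. The sketch below is only an illustration on toy strings: the function names `ngrams` and `shared_fraction`, the whitespace tokenization, and the two example sentences are all assumptions made for the example, not anything taken from a real training set.

```python
def ngrams(text, n):
    """Return the set of n-word snippets in text (lowercased, split on whitespace)."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def shared_fraction(sample, corpus, n):
    """Fraction of the sample's distinct n-word snippets that also occur in the corpus."""
    sample_grams = ngrams(sample, n)
    if not sample_grams:
        return 0.0
    return len(sample_grams & ngrams(corpus, n)) / len(sample_grams)


# Toy data (hypothetical): same vocabulary, different word order.
CORPUS = "the quick brown fox jumps over the lazy dog"
SAMPLE = "the lazy fox jumps over the quick dog"

for n in (1, 2, 4, 6):
    print(n, round(shared_fraction(SAMPLE, CORPUS, n), 2))
```

On these toy strings every single word is shared, but the overlap falls off quickly as the snippets get longer, which is the pattern described above. Real corpora would need proper tokenization and far more data to say anything meaningful.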
1973DodgeChallenger t1_j9lgjq4 wrote
Just for example, say you work at a company that has spent millions investing in a proprietary software product. You're saying everyone should have access to the source code, through ChatGPT or otherwise?
Can I have all of your and your company's source code, please? I'll send you my email address.
[deleted] t1_j9pa2ae wrote
[deleted]
currentscurrents t1_j9pb0by wrote
You had your source code public until you got freaked out by ChatGPT, so you were entirely okay with publishing it for everyone to see.
ChatGPT doesn't even allow direct access to source code; it's just learning how to solve problems using existing source code as training examples.
visarga t1_j9qxt97 wrote
Well, you can't. Because it is really hard to extract any verbatim replications of training data from ChatGPT. You need to put a considerable portion of the work in as the prompt, to put the model in the right place, and then sample your way ahead. Doesn't work for most stuff, like 99%.
visarga t1_j9qwzlf wrote
> Honestly, kinda selfish. We'll all benefit from these powerful new tools and I don't appreciate you trying to hamper them.
They took their little pebble from the beach back home, that'll show them.