Comments

boyetosekuji t1_iuj0v1n wrote

Great news. How much would it cost to train?

12

MostlyRocketScience t1_iujpd9l wrote

I'm excited for open-source code generation models, so I won't have to pay GitHub every month. And if this is a bigger, permissively licensed dataset, there will be no chance that it generates copyrighted code.

16

nomadiclizard t1_iujxwax wrote

I'm curious which 'permissive' licenses have terms permitting the use of the code as training data in machine learning algorithms. Are we assuming licenses which allow code to be modified/redistributed, also include this right?

What if a commercial for-profit company trains on a lot of copyleft code, then commercialises the result and refuses to release the model? Is that ethical?

39

elcomet t1_iujync7 wrote

> What if a commercial for-profit company trains on a lot of copyleft code, then commercialises the result and refuses to release the model? Is that ethical?

I would assume this is the same as for licences that allow the code to be used in commercialised software.

19

I_draw_boxes t1_iuk27ck wrote

Permissive licenses basically allow the user to do anything they want with the code save sue the author.

>What if a commercial for-profit company trains on a lot of copyleft code, then commercialises the result and refuses to release the model?

That probably isn't legal, but copyleft licenses are not permissive licenses and are not included in this dataset for that reason.

15

sitmo t1_iukbw82 wrote

As an open-source code writer, this feels like an abuse of my contributions: they are monetizing my code, building a brand out of other people's content, and will cash in big time with a stock IPO in the near future.

To take back control, I decided to change my naive, flower-power, everybody-happy MIT-licensed projects to the more protective GPLv3.

−12