Submitted by JohnyWalkerRed t3_123oovw in MachineLearning
lazybottle t1_jec8i0c wrote
Reply to comment by wind_dude in [D] Instruct Datasets for Commercial Use by JohnyWalkerRed
Alpaca is not Apache 2.0
https://huggingface.co/datasets/tatsu-lab/alpaca#licensing-information
> The dataset is available under the Creative Commons NonCommercial (CC BY-NC 4.0).
Edit: I see the source of confusion. https://github.com/tatsu-lab/stanford_alpaca
While the code is released under Apache 2.0, the instruct dataset, as OP pointed out, is not. One could potentially reproduce the generation steps, possibly with human-written ground truth, and release the result under a more permissive data license.
wind_dude t1_jec9lb4 wrote
Interesting, I didn't realise the dataset was on HF with a different license. The dataset (https://github.com/tatsu-lab/stanford_alpaca/blob/main/alpaca_data.json) is also in the code repo, which has the Apache 2.0 license, so the dataset would be covered by it.