liquiddandruff t1_iz3c7zc wrote on December 6, 2022 at 3:48 AM

Uh, how about all those guides and blogs on any number of command line utilities?

VitaminD263 t1_iz3p5wq wrote on December 6, 2022 at 5:53 AM

There's still not enough data. I believe it must have had access to some environment in which it could execute commands. Compare how well ChatGPT performs on computing tasks with how badly it performs on other topics. For example, is there significantly more data on the web about one specific kind of shell command (note that it generates the correct shell output for any kind of input) than, say, blog posts on real analysis? If you query ChatGPT on its understanding of real analysis definitions, it fails abysmally, yet there should be far more text available on that topic than on some random shell command, and there is definitely not enough shell data to cover every kind of input. I really don't believe that current-generation language models are capable of learning the semantics of terminal commands.

vino_and_data t1_iz957hr wrote on December 7, 2022 at 11:48 AM

OMG. >>I believe it must have had access to some environment in which it could have executed commands.

Calm down maybe??!

VitaminD263 t1_izax31y wrote on December 7, 2022 at 7:41 PM

Calm down?

It's not as if I'm the only one claiming that. https://twitter.com/yoavgo/status/1599886211656491008

baconninja0 t1_iz4uwd3 wrote on December 6, 2022 at 2:26 PM

The shell commands found on websites will probably be more similar from site to site than non-code topics, especially since I’m pretty sure a lot of code content farm sites just steal each other’s code anyways. This makes it much easier for the bot to learn than other topics, because it sees the exact same command so many times instead of just similar commands (which it has to learn are similar).

VitaminD263 t1_iz4w5i9 wrote on December 6, 2022 at 2:36 PM

Yea or you know you could just make up input, let it execute the code and get the output to create your training data...
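A minimal sketch of the idea in this comment, assuming a Unix-like shell is available: run made-up commands and record what they actually print, giving (input, output) pairs. Nothing here reflects how ChatGPT's training data was really built; the function names `run_command` and `build_pairs` and the prompt/completion record layout are hypothetical and chosen only for illustration.

```python
import subprocess


def run_command(command: str, timeout: float = 5.0) -> str:
    """Run a shell command and return its stdout followed by its stderr."""
    result = subprocess.run(
        command,
        shell=True,           # hand the string to the system shell
        capture_output=True,  # collect stdout and stderr instead of printing
        text=True,            # decode bytes to str
        timeout=timeout,      # do not hang on commands that never finish
    )
    return result.stdout + result.stderr


def build_pairs(commands):
    """Pair each made-up command with the output it actually produced."""
    # Hypothetical record layout: one dict per command.
    return [{"prompt": c, "completion": run_command(c)} for c in commands]


if __name__ == "__main__":
    made_up_inputs = ["echo hello", "printf 'a\\nb\\n' | wc -l"]
    for pair in build_pairs(made_up_inputs):
        print(pair)
```

Anything run this way should be executed in a throwaway sandbox, since `shell=True` executes whatever string it is given.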

PromiseChain t1_iz90i6y wrote on December 7, 2022 at 10:45 AM

You’re just demonstrating you don’t understand this technology. This is not piping anything into a terminal anywhere. There is no 3080 that actually got installed by OpenAI to provide this data; they explain their data transparently. This is most likely modeled from Stack Overflow answers.

VitaminD263 t1_izb0tz8 wrote on December 7, 2022 at 8:06 PM

I'm not saying that it's using a terminal behind the scenes. I'm saying the data used to train this was likely generated using an execution environment. There are serious NLP researchers who believe this as well: https://twitter.com/yoavgo/status/1599886211656491008

2b100k t1_izri10k wrote on December 11, 2022 at 8:10 AM

Agree with you here. There are resources available online, but I wouldn't think it's enough to train an AI on its own.

I am very impressed by ChatGPT. It immediately gave me the correct answer on how to resolve an issue when I accidentally skipped a step in installing Gentoo Linux. It also gives really detailed answers on troubleshooting all sorts of Linux programs.

It's hard to explain, but it feels too accurate a lot of the time for answers that would have to be learned from relatively small amounts of data (for an AI).

[D] OpenAI’s ChatGPT is unbelievable good in telling stories!

VitaminD263 t1_iz0dmzt wrote on December 5, 2022 at 3:27 PM