BelgianBeerGuy t1_jd2wcva wrote on March 21, 2023 at 1:31 PM

>> Say Google wants to develop an AI that writes books right? So they need a lot of text written by humans to train it right? Well, Google Docs is full of that.

I don’t think Google wants to train an AI on all the crap I wrote in Google Docs, let alone all the spelling and grammar errors people make in those docs.
For an AI that can write books, they probably just use actual books.

Ieris19 t1_jd2wndl wrote on March 21, 2023 at 1:33 PM

Read my other comment. I was more trying to make an example rather than something anyone would actually wanna do.

It was more about illustrating that the use we have for data is not necessarily the same one a company has for it.

Never said it had to be a successful AI, or a good idea

kimbosdurag t1_jd2ukbd wrote on March 21, 2023 at 1:17 PM

Interesting, I didn't think about that. It also wouldn't be tough for them to just scrape blogs, news sites, and sites like Wattpad that host writing. Lots of data out there for the taking. I'm very curious to see how AI evolves from this point on as a consumer product.

Ieris19 t1_jd2vf0v wrote on March 21, 2023 at 1:23 PM

That was more an example than something they would actually do. Of course there are a million other ways of doing it, but the more control you have over the data, the better you can develop an AI.

I mean, Google’s already mastered AI. People tend to think of natural intelligence (like humans) when they think about the development of AI.

AI is just a learning system. Google recommendations are a complex calculation on everything that you’ve recently interacted with to figure out the thing you’ll most likely want next.

Unless the function is completely static, which I doubt, it would be considered AI, even if it doesn’t attempt to imitate real intelligence. The function is probably given some weights from Google engineers (basically, what results are valuable), and through trial and error, the program is likely learning how to get more clicks. The more data it can process, the more users it has to try with, the faster it can advance.

This is of course pretty simplistic compared to the math behind how this works, but it gets the point across.
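To make the trial-and-error idea above a bit more concrete, here is a minimal sketch of an epsilon-greedy learner in Python. It is only an illustration of "try things, count clicks, favor what works" and not a description of how Google's recommendations actually work. The function name `run_recommender`, the made-up click rates, and the `epsilon` parameter are all assumptions chosen for the example.

```python
import random


def run_recommender(true_click_rates, rounds=5000, epsilon=0.1, seed=0):
    """Toy epsilon-greedy learner (hypothetical, for illustration only).

    true_click_rates: hidden probability that a user clicks each item.
    Returns how often each item was shown and how often it was clicked.
    """
    rng = random.Random(seed)
    n = len(true_click_rates)
    shows = [0] * n   # times each item was recommended
    clicks = [0] * n  # times each recommendation was clicked

    def observed_rate(i):
        # Items never shown get priority so each one is tried at least once.
        return clicks[i] / shows[i] if shows[i] else float("inf")

    for _ in range(rounds):
        if rng.random() < epsilon:
            # Explore: occasionally show a random item.
            item = rng.randrange(n)
        else:
            # Exploit: show the item with the best click rate seen so far.
            item = max(range(n), key=observed_rate)

        shows[item] += 1
        # Simulate the user: click with the item's hidden probability.
        if rng.random() < true_click_rates[item]:
            clicks[item] += 1

    return shows, clicks


if __name__ == "__main__":
    shows, clicks = run_recommender([0.05, 0.5, 0.1])
    print("times shown:", shows)
    print("clicks:", clicks)
```

The learner never sees the hidden click rates. It only sees which recommendations got clicked, which is why more users and more data let it settle on the better item faster.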

StateChemist t1_jd3aga6 wrote on March 21, 2023 at 3:11 PM

There are some potential legal issues if you scrape someone else’s data to train your AI. If your users signed the ToS, there is no legal recourse, so they can use anything.

IdlyOverthink t1_jd373mj wrote on March 21, 2023 at 2:48 PM

This speculation borders on misinformation. According to Google's privacy policy, they have no access to content you've saved in Google Drive except where required by law, or with your explicit permission.

I'm not trying to defend a big corporation; it's likely that Google is doing other questionably ethical things, but comments like this which point in patently false directions distract from the actually important transgressions.

This is entirely different from a model being trained on public GitHub code; it's not possible without Google making claims that open it up to litigation. (Companies won't do this... There's no reason to make themselves financially vulnerable like that.)

Ieris19 t1_jd39bbu wrote on March 21, 2023 at 3:03 PM

Again, that is mostly an example. Of course, it wouldn’t even be a good idea to begin with.

But people seem confused, so now my question is: how would I make it more obvious that it is just a simplified example?

IdlyOverthink t1_jd3qa8e wrote on March 21, 2023 at 4:52 PM

I think your point is that "Google likes having [the data in the services OP asks about] because it could mine that data."

Per their own site:

>We never use the content you create and store in apps like Drive, Gmail, and Photos for any ads purposes.

Here's their source for how they don't use it for training an ML model either.

I think I would choose a different example to support your point because it implies too many (false) conditions, and in doing so establishes a non-existent precedent.

>Of course, it wouldn’t even be a good idea to begin with.

This still entertains the premise that they'd try, and I think that's what I'm trying to address. It's not that it's not a good idea; it can't be an idea. Google has made commitments to making this impossible, so worrying about the ethics, whether it's worth the cost, whether it's a worthy source, etc. is a distraction from the actual possibilities/answers.

As said by others, Google Drive is a gateway drug into Google's other services. Because of that, it can be private even from Google: Google uses data from those other services to train its models and provide ads data.

For example, when you're working on a research paper, Google can glean your area of study (adjacent to "your interests"), your level of education (and more) from your search keywords, the time you're searching, etc.

Ieris19 t1_jd3r6in wrote on March 21, 2023 at 4:58 PM

The fact that they currently don’t need to and the fact that they don’t plan to doesn’t mean they can’t. They’re sitting on a huge stockpile of stuff they can use, and thinking a company will store my gigabytes of data for years on end, never delete it, and not even use it in hopes of getting me to use their other products is ridiculous. They’re clearly using it in one way or another, whichever that way turns out to be.

No one expected Microsoft to run the same shit on all their products yet here we are regardless.

alchippa t1_jd2yxlh wrote on March 21, 2023 at 1:50 PM

Suppose I stored some code in GitHub, can anyone else just take it like that? Can GitHub use my code for training without my permission? Or did I already grant permission in their fine print?

Ieris19 t1_jd2z28i wrote on March 21, 2023 at 1:51 PM

That is precisely why they’re getting sued. We’re not sure if it’s legal or ethical, or how copyright applies, since it’s not using your code but learning from it.

ELI5: Why does Google offer all these free services like Google Docs, Sheets, Drive, Sites, Forms, etc. without any ads on them? How does Google benefit from this and why do they invest so much in creating and maintaining them?

kimbosdurag t1_jd2r9tm wrote on March 21, 2023 at 12:50 PM

Ieris19 t1_jd2u1ei wrote on March 21, 2023 at 1:13 PM