So, are you using Google search? Your argument of "I don't trust it with my data...

imiric · on March 25, 2023

That's not a great comparison, as privacy-focused search engines do exist (Kagi, DDG to an extent, et al.). And you can still use mainstream search engines with frontends like SearX. Most of my privacy concerns are with adtech corporations tying my search terms to my profile, which they later sell to advertisers and to whoever else on shady data broker markets. I don't want to be complicit in my data being exploited to later manipulate me, nor do I want to make them money in exchange for a "free" service.

These are partly the same reasons I don't voluntarily use proprietary services at all. I don't want to train someone else's model, nor help them build a profile on me. Even if they're not involved in adtech—a rarity nowadays—you have no guarantees of how this data will be used in the future.

For AI tools, there's currently no alternative. Large corporations are building silos around their models, and by using their services you're giving them perpetual access to your inputs. Even if they later comply with data protection laws and allow you to delete your profile, they won't "untrain" their models, so your data is still in there somewhere. Considering that we're currently talking about 32,000 tokens' worth of input, and soon people uploading their whole codebases to it, that's an unprecedented amount of data they can learn from, compared to what they can gather from web search terms. No wonder adtech is salivating at opening up the firehose for you to feed them even more data.

The use cases of AI tools are also different, and more personal. While we use search engines for looking things up on the web, and some personal information can be extracted from that, LLMs are used in a conversational way, and often involve much more personal information. It's an entirely different ballpark of privacy concerns.

JW_00000 · on March 25, 2023

I think it's more about personal data being used for training.

I may use Google to look up if that slight itch I feel is a symptom of cancer (I'm exaggerating), and I store mails with personal details, my calendar, and messages on Google. But I also assume they're not using those texts to train an AI.

When you enter a code snippet or a personal question in ChatGPT, and press the little thumbs up/down next to the answer, you're adding your data to a training set. The next generation of the model might regurgitate that text verbatim.

throwthrowuknow · on March 25, 2023

Right, because Google doesn’t use ML or your data for marketing and advertising.

Is your concern simply that it might spit out the same thing you typed in? That’s highly unlikely unless you and thousands of other people type in exactly the same thing. I don’t see how that’s any more worrisome than Google having all of your documents and email on its servers.

fanagra32 · on March 25, 2023

Maybe they are OK with Google seeing search terms but not with Google seeing their company's code.

ornornor · on March 25, 2023

https://en.m.wikipedia.org/wiki/Whataboutism

PartiallyTyped · on March 25, 2023

This is not whataboutism.

GP identifies an action that is analogous to the original action and shares certain properties with it, in the process illuminating how the issues of approach A also exist in approach B.

ornornor · on March 25, 2023

But it is. "You're concerned about your data when using ChatGPT, but you're probably using Google, so your concerns are invalid."

PartiallyTyped · on March 25, 2023

They are not expressing that the concerns are invalid; they are expressing that one is held to a higher standard than the other.

ornornor · on March 25, 2023

Agree to disagree then.

hsjqllzlfkf · on March 25, 2023

> They are not expressing that the concerns are invalid

> Agree to disagree then

Since you're discussing what I was expressing, I can tell you who's correct, since I know what I was expressing. And you're wrong. I wasn't expressing that the concerns are invalid. They're very valid.

Instead, what I was expressing was that OP doesn't actually have those concerns, not that they're invalid.