Your argument of "I don't trust it with my data and won't until you can self host" should apply to google search as well, no?
An alternative take is that for whatever reason you've decided you didn't want to use new tools, a posteriori created an argument to justify that, and haven't realized the same argument applies to your old tools.
That's not a great comparison, as privacy-focused search engines do exist (Kagi, DDG to an extent, et al.). And you can still use mainstream search engines with frontends like SearX. Most of my privacy concerns are with adtech corporations tying my search terms to my profile, which they later sell to advertisers and to whoever else on shady data broker markets. I don't want to be complicit in my data being exploited to later manipulate me, nor do I want to make them money in exchange for a "free" service.
These are partly the same reasons I don't voluntarily use proprietary services at all. I don't want to train someone else's model, nor help them build a profile on me. Even if they're not involved in adtech—a rarity nowadays—you have no guarantees of how this data will be used in the future.
For AI tools, there's currently no alternative. Large corporations are building silos around their models, and by using their services you're giving them perpetual access to your inputs. Even if they later comply with data protection laws and allow you to delete your profile, they won't "untrain" their models, so your data is still in there somewhere. Considering that we're currently talking about 32,000 tokens' worth of input, and soon people will be uploading their whole codebases to it, that's an unprecedented amount of data they can learn from, compared to what they can gather from web search terms. No wonder adtech is salivating at opening up the firehose for you to feed them even more data.
The use cases of AI tools are also different, and more personal. While we use search engines for looking things up on the web, and some personal information can be extracted from that, LLMs are used in a conversational way, and often involve much more personal information. It's an entirely different ballpark of privacy concerns.
I think it's more about personal data being used for training.
I may use Google to look up if that slight itch I feel is a symptom of cancer (I'm exaggerating), and I store mails with personal details, my calendar, and messages on Google. But I also assume they're not using those texts to train an AI.
When you enter a code snippet or a personal question in ChatGPT, and press the little thumbs up/down next to the answer, you're adding your data to a training set. The next generation of the model might regurgitate that text verbatim.
Right, because Google doesn’t use ML or your data for marketing and advertising.
Is your concern simply that it might spit out the same thing you typed in? That’s highly unlikely unless you and thousands of other people type in exactly the same thing. I don’t see how that’s any more worrisome than Google having all of your documents and email on its servers.
GP identifies an action that is analogous to, and shares certain properties with, the original action, in the process illuminating how the issues of approach A also exist in approach B.
> They are not expressing that the concerns are invalid
> Agree to disagree then
Since you're discussing what I was expressing, I can tell you who's correct, since I know what I was expressing. And you're wrong. I wasn't expressing that the concerns are invalid. They're very valid.
Instead, what I was expressing was that OP doesn't actually have those concerns, not that they're invalid.