Hacker News

Every Git commit is likely to contain personal data, in the form of the author’s name and email address usually present in a commit’s metadata. Furthermore, unless GitHub is prohibiting users from submitting personal data via their ToS (which, given the above, would be impractical), the only thing that matters is whether the data in fact contains personal data or not. GitHub cannot just assume that it doesn’t. And processing that data for new purposes requires user consent.


By that logic, you can't use any user input to train an LLM, because what if they decide to write their own name?


Indeed, you can’t unless you have appropriate consent. Which isn’t difficult to obtain if you have clearly defined purposes, but you have to do it.


Since commits aren't code, that's no problem.

The idea that because any piece of code could possibly contain some personal data -- while 99.99% of it doesn't -- the entirety is therefore PD is not supported by the GDPR. You could just as well say that any text field anywhere could hypothetically have someone type their name into it and is thus personal data as well.


The current change applies to all input and output from and to Copilot. This can be used to create profiles about personal preferences, for example.

Personal data is about identifying a person and relating information to that person. A name in an unrelated text field isn’t personal data if you can’t tell the relation between the name and the person who input it, or any surrounding data. The contents of a repository, however, and the interaction with Copilot, can very well help identify the account holder and their personal data. For example, I might be processing personal health data identifiable as such in a private repository with the help of Copilot.


> This can be used to create profiles about personal preferences

And since it's not, so what?

> I might be processing personal health data identifiable as such in a private repository with the help of Copilot.

That remains nonsense. The fact that you could put PD in a place not intended to hold PD does not magically transform an entire dataset into PD because one record may contain it. This is covered by Art. 24 GDPR (risk-based approach) and by multiple EDPB discussions of proportionate measures. There is zero requirement to guarantee that anything collected for a different purpose is not misused by the user, assuming you're not encouraging that misuse.



