Every Git commit is likely to contain personal data, in the form of the author’s...

fph · 2026-03-27T23:10:49 1774653049

By that logic, you can't use any user input to train an LLM, because what if they decide to write their own name?

layer8 · 2026-03-27T23:15:46 1774653346

Indeed, you can’t unless you have appropriate consent. Which isn’t difficult to obtain if you have clearly defined purposes, but you have to do it.

x0x0 · 2026-03-28T00:02:19 1774656139

Since commits aren't code, that's no problem.

The idea that, because any piece of code could possibly contain some personal data (while 99.99% of it doesn't), the entirety is therefore PD is not supported by the GDPR. You could just as well say that any text field anywhere is personal data because someone could hypothetically type their name into it.

layer8 · 2026-03-28T00:59:28 1774659568

The current change applies to all input and output from and to Copilot. This can be used to create profiles about personal preferences, for example.

Personal data is about identifying a person and relating information to that person. A name in an unrelated text field isn’t personal data if you can’t tell the relation between the name and the person who input it, or any surrounding data. The contents of a repository, however, and the interaction with Copilot, can very well help identify the account holder and their personal data. For example, I might be processing personal health data, identifiable as such, in a private repository with the help of Copilot.

x0x0 · 2026-03-28T20:02:59 1774728179

> This can be used to create profiles about personal preferences

And since it isn't being used that way, so what?

> I might be processing personal health data identifiable as such in a private repository with the help of Copilot.

That remains nonsense. The fact that you could put PD in a place not intended to hold PD does not magically transform an entire dataset into PD because one record may contain it. This is covered by Art. 24 GDPR (risk-based) and multiple EDPB discussions of proportionate measures. There is zero requirement to guarantee that anything collected for a different purpose is not misused by the user, assuming you're not encouraging that misuse.