It seems to be a common view on HN that licenses and conditional access to websites should be ignored (e.g. with respect to ad blockers), but also that licenses on open-source software repositories should be respected (e.g. with respect to LLM training). I believe it is common to hold both of these contradictory views, but the conflict needs to be resolved before we can reach a conclusion on how to proceed with LLM training.
Replication is not the same as reproduction; I can replicate an API without violating someone's license or copyright (which I would violate by reproducing their work).
Developers are permitted to learn from open source code with restrictive copyrights, and apply those lessons to developing other software which does not comply with the copyright of their 'example'.
As an aside, I do believe that LLM trainers are ignoring and violating many licenses, but open-source software is not a clear example of a violation.
Depends on how you define "learn": usually, a company that wants to rebuild something and publish it under a different license bars its developers from ever having looked at the original code, to avoid the risk of accidentally copying exact snippets from memory.
Copyright only covers reproduction of non-trivial parts of the original, and the threshold for "non-trivial" is somewhat arbitrary, so you have to be careful when learning from copyrighted material. Programming books often include explicit clauses permitting snippet reuse, though not for teaching purposes.
> Sure, but developers are permitted to learn from open source code with restrictive copyrights, and apply those lessons to developing other software which does not comply with the copyright of their 'example'.
This was a different argument. And there is no contradiction in treating LLMs and people differently.
> As an aside, I do believe that LLM trainers are ignoring and violating many licenses, but open-source software is not a clear example of a violation.
You seem to be conflating copyright with access rights. Two very different things. Regardless of your feelings on either, there is no contradiction in holding different views on them.
Well no, it’s about legally gating the ability to copy so the original author doesn’t have to compete in the same market to sell his own book with every other bloke with a printing press and a copy of the book. Everything else is an addendum.
Don’t confuse the social justification with the actual purpose of copyright law just because it’s written into the US Constitution that way. America didn’t invent copyright law.
Where and when? In cases where LLM coding assistants reproduce copyleft code in someone's work assignment? In those cases the responsibility would lie with the user, not the AI.
Are you doing a full search of every GPL-licensed repository every time you use an LLM to ensure that it isn't giving you GPL-licensed code? That doesn't seem reasonable.
That's because licenses are an abstract complexity tacked onto a simple material reality in order "to promote the progress of science and useful arts".
Like many cultural rules, they keep growing in complexity until they hit a phase change and get ignored because they have become too complicated.
OSS licenses haven't grown in complexity all that much in the past forty or so years. They're being ignored more now because it's become easier to ignore them, not because it's become harder to abide by them.