Why anyone would want to stop Copilot is beyond me.
Reinventing the wheel, millions of times a day, is an atrocity.
Millions of (wo)man hours, wasted, every single day, on writing solutions to problems that have already been solved. There is a partial solution to this, and it's making people angry, it's crazy.
If you put your code publicly on the internet, you should expect that people will reuse your code at some point; no one broke into your private repositories.
Why anyone would waste their time to make other people waste more of theirs is really beyond me.
You want to use my code, without ever knowing I wrote it? You want to use my hard work, regurgitated anonymously, stripped of all credit, stripped of all attribution, stripped of all identity and ancestry and citation? FUCK YOU!
Training must be opt in, not opt out.
Every artist, every creative individual, must EXPLICITLY OPT IN to having their hard work regurgitated anonymously by Copilot or Dall-E or whatever.
If you want to donate your code or your painting or your music so it can easily be "written" or "painted", in whole or in part, by everyone else, without attribution, then go ahead and opt in.
But if they don't EXPLICITLY OPT IN, you can't use the artist's or author's creative work for training.
All these code/art washing systems, that absorb and mix and regurgitate the hard work of creative people must be strictly opt in.
Every human is using the hard work of other humans down through the entirety of history and mostly without credit or attribution.
None of us exists in a vacuum and we are all copying each other constantly.
Should students need to attribute the copyrighted textbooks and lessons that they learned from for all their future work?
Should artists attribute every reference they've used? Even if they draw stick figures based on the reference? Even if they only use small parts from multiple references?
What's the difference between a machine learning something and a human learning it?
I think in terms of practical open source/permissive licenses it makes the most sense for new licenses to be made that include no-training clauses for the rights holders that dislike machine learning.
Dall-E's use of training on non-permissive copyrighted web-scraped data seems more complicated and I imagine there will eventually be lawsuits to figure that out.
I just don't understand this at all. I publish my code as open source when I can because I want others to find it useful, either by using the software that I wrote or by reusing the code. If I didn't want that, I wouldn't publish the code. But I do want it, so I'm glad there's a way for people to access it more easily.
I understand the argument from an artist's perspective much more, since they don't really have the option to publish their work in a way that any AI or any other artist can't copy off of.
Simply being public doesn't mean it's in the public domain - this applies to movies, art, code, etc.
Examples of restrictive but public licenses include ones that require others to share their source code if it's derived from yours, ones that allow individuals to use a product but not businesses (businesses can use it under a different, likely paid-for, license), and ones that require attribution or acknowledgement that they used your code.
There is an argument for fair use if it counts as a substantial derivative, which is a different discussion from why people make it publicly viewable without making it flat out public domain.
That's great for you. I hope you choose a license and copyright terms that enable this specific vision.
The vast majority of open source licenses and copyright terms specifically stipulate the legal requirements for reproducing even just parts of the code, which at a minimum require reproducing the license and copyright notice with all software that includes the licensed and copyrighted code.
You’re missing the point. It’s not an ego problem: if you put your code on the internet with a license you should expect people to respect the license’s rules…
I think it's a gray area in the license. Much of the code was intended to be used freely and commercially by others, but not for AI training. It follows the license to the letter, but not the intent.
I expect we'll see new licenses appear making it clear whether or not the content can be used for training.
Who's to say the intent? I've published lots of code with very permissive licenses and I did so because I want people to be able to use that code for any reason. That's why I choose those licenses.
Are you saying that's fair use? If so, then we won't see new licenses appear related to it, since a license can only give you more permissions on top of fair use, not take away fair use. If not, then we still won't see new licenses appear related to it, since the existing licenses already don't allow it.
Good point. I'm not a lawyer, but looking it up, the factors for fair use are:
1. the purpose and character of the use;
2. the nature of the copyrighted work;
3. the amount and substantiality of the portion used;
4. the effect of the use upon the potential market for the original work.
All of these are quite debatable, and I'll leave it to someone more familiar with the law.
Though if it's not, I believe there are licenses that allow derivative uses of code and licenses that don't. For many of these, the intention is that they create more code, but not be used to fuel AI behemoths.
It doesn't matter what you believe. It matters what the judge and jury say when this goes to trial, and it will go to trial because Microsoft has a lot of money.
So? Most of the developed world has legal systems that do believe in intellectual property. The fact that a few people "don't believe in intellectual property" because they want to torrent movies/games is mostly irrelevant when it comes to the software engineering profession.
Except you're not licensing functions, you're licensing a repository.
If I use a sentence or even a paragraph from a copyrighted book, it's not copyright infringement.
> If I use a sentence or even a paragraph from a copyrighted book, it's not copyright infringement.
Note that this may not actually be true, and you may need to pay to license even shorter excerpts of creative work. Copyright is a complex topic. It's not always safe to assume that you have the rights you think you have, in terms of reproducing others' work.
For example: "The proportion of a total work is not the only factor, though. If you are including the most crucial aspect of a work, even if it is only a small part, then the question of “substantiality” comes into play." [1]
It can be if you fail to give attribution. Plagiarism isn't just unethical, unprofessional and immoral (not to mention evidence that the plagiarist is an uncreative dullard). It's illegal. How many words or sentences it takes to trigger a complaint is mostly governed by what it takes to prove a violation. The more material copied, the easier that can be. In this situation providing attribution (tooltip when you mouse over the code?) would probably satisfy 9/10 of potential complaints. But big companies usually won't make that kind of minimal effort without being hit upside the metaphorical head with a piece of metaphorical lumber (like with an actual lawsuit).
If you take a function from a repository (or a sentence from a book), it is the unlicensed use of copyrighted material. Everything in the repository is covered by the license, functions, files… everything.
Whether or not it is infringement depends on if the use can be considered fair use. This is a more nuanced question and is not always clear.
In this case (Copilot) the real question is how transformative the AI training is. How verbatim some of the outputs are makes the argument less clear.
>If I use a sentence or even a paragraph from a copyrighted book, it's not copyright infringement.
I'm assuming you're referring to fair use. In that case whether it's copyright infringement or not is very situational (the legal standard consists of a test with various subjective factors) and isn't as simple as "it's less than a paragraph so I can copy whatever I want".
> Reinventing the wheel, millions of times a day, is an atrocity.
> Millions of (wo)man hours, wasted, every single day, on writing solutions to problems that have already been solved. There is a partial solution to this, and it's making people angry, it's crazy.
Following this line of thought, do you think that all code from all software should be open source and publicly available (and free to copy and use), in the interest of saving more person hours from reinventing the wheel?
> Following this line of thought, do you think that all code from all software should be open source and publicly available
Let's help shape this thought: Copyright should be abolished entirely. It is one of many monetization schemes and its negative effects greatly outweigh its positives.
We know people won't stop writing software in the absence of copyright. We know they won't stop writing books, singing songs, etc. Copyright is not the primary motivator for either science or art.
Will we need new monetization structures? Of course. But generally speaking we already have them where it matters.
> Let's help shape this thought: Copyright should be abolished entirely. It is one of many monetization schemes and its negative effects greatly outweigh its positives.
Even if you're right in principle (and I would love new monetization structures), this will never happen in reality.
Meanwhile, this idealism will get applied asymmetrically in the real world. If you (or the comment I was replying to) say "Copilot is fine, all code should be publicly available anyway", it downplays the fact that this wish will never happen with big players like Microsoft and will only happen with little players like anyone who used Github to host their code. The big player will typically hide their code behind copyright and lawyers to enforce it, whereas the little players have no similar recourse.
So, I see the issue as an exploitation, as Microsoft is selling a product built on the little players and not the big players. The debate around whether copyright should exist at all, while interesting, is not that relevant to most of the concerns being aired in the context of Copilot.
There is no such thing as "owning" a work. We use that as a euphemism for owning copyrights, and the only function of copyrights is to prevent others from making copies. To prevent others from sharing.
The question is whether the monetization model presented by copyright is a net positive for the author, after accounting for its chilling effect on communications for all other people in the world.
The answer is almost certainly "no," as empirically demonstrated by entire segments of IP work opting out of copyright. The open source model clearly demonstrates that you do not need to own a work to fund it or monetize it. There are similar models in other areas of art and science which allow for the funding of works without preventing others from copying them.
It says why in the linked post. People aren't doing open source for free; they do it for the community. But Copilot is there to extract value from it, giving nothing back, not even credit.
Can't open-source programmers improve their own open-source code with Copilot? Do the inherent improvements that Copilot offers just not apply to people who write open-source code?
I understand that there is a balance, but as an open-source advocate who would love better tools to make their open-source projects better I'm lost as to why this point doesn't counter the "giving nothing back" we hear so often.
Since there's no way to know how code generated by Copilot might be licensed without expensive code-scanning tools, I don't think OSS can safely derive any substantial improvements from it.
If that's the only issue, I can't see the difference when I search for something on the web, copy the code and paste into my solution. There's no attribution, there's no giving back, nothing. Because I'm the community that you are saying the code is supposed to benefit.
Yeah, that sounds like a good definition of someone who is not part of the community. I copy code off Stack Overflow too, but often provide attribution in a comment. But like piracy, it's easier to hunt the whales than the small offenders.
Stack Overflow facilitates the same thing too, so it's an interesting comparison, but SO makes attribution easy and clear, and it actually made it effortless to contribute back.
Like everything in life. It's all about extracting value from someone else who has no control over the exploitation. You only notice when you are the one being exploited though.
As long as some company can improve its bottom line it’s all good though
That's... the exact opposite of a community. Communities are about contributing whatever you can, and taking what you need. There's more joy in giving than taking. Exploitation happens when someone is taking advantage of that tendency to give.
Eventually someone comes in and takes everything that isn't nailed down and then sells it, and that becomes the problem.
Copilot is not selling code, they're selling GPU time.
If you're ready to buy a hundred GPU/TPUs to train a new Copilot that is just as good, but for free, then go do it please, everyone will thank you
So you are saying copyright does not apply to Spotify when it comes to music because they aren’t selling the music but rather a service to play said music from a catalog?
Also to your other comment about copyright not being an issue if you just use a paragraph from a book - I am not a lawyer but I would think that copyright applies just the same way it applies to musicians who use portions of the melody of other musicians’ songs.
You're asking for people to be okay with potential copyright violations and a removal of attribution because of the common need. Like all things, there must be balance. Open source would not exist if the only use of its output was to train ML models that hide where the code comes from. Part of the allure of open source--maybe the biggest allure, honestly--is the community aspect. I get to find friends, contribute philanthropically, and feel proud of my contributions. Copilot removes any incentive I have to produce code for free.
when you put code on github.com you grant GitHub the right to show that code to others, independent of the license you choose for your code. full stop. doesn't matter if it's on a webpage, a git client, or a github-developed plugin to an IDE.
So this doesn't negate the license. Microsoft cannot just roll the code into windows for example, closed source and proprietary. They have to abide by the license, regardless what their ToS says.
Here's a fun way to see it: suppose someone writes code licensed GPL. I take it, fork it, modify a line in it or not, and also license it GPL because I have to by law. I put it on my github account and what, I now just gave Microsoft rights to the code I don't even have? So by putting it on github I'm violating a license? It doesn't add up. The license to the code is the license to the code, no matter what site it's on and no matter what any ToS says. Otherwise what's to stop me from putting a ToS on my personal website pertaining to your use of my eyeballs that says "if your creation becomes viewable by my eyeballs in any way I can use it however I want, publishing your work in such a way that it can be viewed by my eyeballs is consent to this ToS"?
> So by putting it on github I'm violating a license?
yes. if you don't have the rights to upload code to github.com, including all of the rights required of one that uploads that code to github.com, and you do so anyway, then you are in violation of the GitHub terms of service.
fortunately for you, the GPL allows what you are describing: "1. You may copy and distribute verbatim copies of the Program's source code as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice and disclaimer of warranty;..."
While that certainly covers some code on GitHub, much of the code on there is just mirrored from other locations by non-owners: you can find copies of the Linux kernel and SQLite on GitHub, for instance. The users who upload those to GitHub have the right to do so (legally) but do not have any rights that they could grant to GitHub.
again, read the document. all this talk of license violation and almost no one is reading the agreements which say what rights users have given GitHub...
without the right to grant those licenses to GitHub, by uploading that code to GitHub, you are in violation of the terms of service, and the responsibility of acting in compliance with the license is on the shoulders of the user which uploaded that code to github.com.
Said another way, GitHub has no way to know if the person mirroring SQLite (for example) is acting in accordance with their rights, so the terms of service require that you attest that you are acting within your rights, acknowledge that it is solely your responsibility if you are not, and that by uploading you grant license to GitHub and its users.
When you upload code to github you give other people the right to fork it... I knew that already. But you license it. You don't give anyone the right to fork it and not abide the license. So if I fork it, I'm still giving Microsoft rights I don't have, I'm giving them the right to violate the license. That makes it illegal for me to fork it.
Let's say I am on a git mailing list, following a project, and I upload that project to github one day. It's licensed GPL. Microsoft says I give them the right to violate the license, and in uploading it I implicitly attest that I have the right to do so. I've violated the license? It's illegal for me to upload the code, with the license, to github, because Microsoft demands rights I don't have to give? Then let's say someone else forks it. They've now also violated the law?
It's nonsensical. The license is the binding ToS here, period, it doesn't matter what Microsoft's lawyers argue. Everything else is secondary.
> When you upload code to github you give other people the right to fork it... I knew that already. But you license it. You don't give anyone the right to fork it and not abide the license.
you are talking about multiple separate things here.
when I upload code to github.com I attest that I have the rights required to do so, and the rights required to grant GitHub the licenses I've agreed to grant it by uploading.
> You don't give anyone the right to fork it and not abide the license.
correct, you can't grant a right to violate the rights granted. users of the code hold the responsibility of acting in accordance with the license.
> So if I fork it, I'm still giving Microsoft rights I don't have, I'm giving them the right to violate the license. That makes it illegal for me to fork it.
no. you did not upload code that you forked from a GitHub.com repository. if you are talking about uploading code that you copied somewhere else, and you're calling that a fork, you have violated the terms by uploading code that you do not have rights to upload. remember, by uploading code to github.com you attest that you have the rights required to do so, according to the terms of service. if you lie, you are responsible for that lie and its consequences.
> Microsoft says I give them the right to violate the license
your premise in this part is flawed. see above.
> Microsoft demands rights I don't have [the right] to give?
by uploading to GitHub.com you attest that you have the ability to grant those rights. If you lied, and you don't have those rights, that's your responsibility and your ass if a lawsuit comes around because of it.
perfectly sensible to me. GitHub gets to say that they require users to grant the rights in order to upload, and that the users necessarily had the rights to give to GitHub. if a user lied, that is not GitHub's fault; the user entered into a legal agreement saying they had the rights needed.
If an ai model is allowed to emit copy-left code verbatim in proprietary software, you can effectively create a gpl 3 stripper. I don't think that ultimately serves your goal of intellectual sharing
I don't want to stop Copilot. I put my code publicly on the internet precisely so people can use it, and not just as a user either, they can repurpose it, incorporate it into their software, whatever they want to do. It's called free software for a reason, and I mean it when I say it.
Reinventing the wheel, millions of times a day, is an atrocity.
Millions of (wo)man hours, wasted, every single day, on writing solutions to problems that have already been solved. There is a partial solution to this, and it's making people angry, it's crazy.
If you put your code publicly on the internet, you should expect that people will reuse your code at some point; no one broke into your private repositories.
Why anyone would waste their time to make other people waste more of theirs is really beyond me.
Let go of your egos for once.