GitHub is my Copilot

It isn’t, but I thought that made for a good title. You have probably heard about GitHub Copilot, the new AI-driven pair programming buddy. Copilot is trained on a wealth of publicly available code, including code under copyleft licenses like the GPL. This has led many people to question the legality of using Copilot. Does code developed with it require a copyleft license?

The legal parts

Reminder: I am not a lawyer.

No. While I’d love to see the argument play out in court, I don’t think it’s an issue. For as much as I criticize how we apply AI in society, I don’t think this is an illegal case. In the same way that the book I’m writing on program management isn’t a derivative work of all of the books and articles I’ve read over the years, Copilot-produced code isn’t a derivative work either.

“But, Ben,” you say. “What about the cases where machine learning models have produced verbatim snippets of code?” In those cases, I doubt the snippets rise to the level of copyrightability on their own. It’d be one thing to reproduce a dozen-line function. But even giving two or three lines…eh.

Verbatim reproduction gets interesting when it leaks secrets. I’ve seen anecdotal tales of Copilot helpfully suggesting private keys. Either Copilot is producing gibberish strings because it expects gibberish, or it is reproducing a string that someone accidentally checked into a repo. The latter seems more likely. And at that point, it’s not a licensing concern. I’m not sure it’s a legal concern at all. But it is a concern to the owner of the secret if that information gets out into the wild.

The community parts

But being legally permissible doesn’t mean Copilot is acceptable to the community. It certainly feels like a two-trillion-dollar company (Microsoft, the parent of GitHub) taking advantage of individual and small-team developers, people who are generally under-resourced. I can’t argue with that. I understand why people would find it gross, even if it’s legal. Of course, open source licenses by nature often permit behavior we don’t like.

Pair programming works well, or so I’m told. If a service like Copilot can be that second pair of eyes sometimes, then it will have a net benefit for open and proprietary code alike. In the right context, I think it’s a good idea. The execution needs some refinement. It would be good to see GitHub proactively address the concerns of the community in services like this. I don’t think Copilot is necessarily the best solution, but it’s a starting point.

[Full disclosure: I own a limited number of Microsoft shares.]

FOSS licenses permit, not restrict

Last week, Matthew Wilson shared a very correct take on Twitter:

A few people in the mentions argued that the GPL is doing it wrong by his definition. This is incorrect. Copyleft licenses do not prevent the user from doing things, they ensure that subsequent users can do the same thing.

This may seem like a semantic argument, but there’s substance to it. All licenses (except those that amount to a public domain dedication) contain some conditions, minimal though they may be. It’s important to remember that the default is that you can do nothing with a work. Copyright is by definition a monopoly on a work. The entire point of free and open source software licenses is to tell you what you can do, because the default position is that you can’t.

One of the most annoying things about license wars is the argument that one category of license is somehow more free than another. That’s dumb. Both copyleft and permissive licenses promote freedom, just from different perspectives. Permissive licenses give the next person in line the freedom to do (essentially) whatever they want. Copyleft licenses preserve freedoms for all subsequent users, no matter how many hands the work passes through. There are plenty of philosophical and practical reasons you might choose one class of license over the other (I tend to prefer copyleft licenses, myself), but it’s wrong to paint one or the other as anti-freedom.

Getting back to Matthew’s point, there has been a fair amount of license weaponization in the last few years. By this I mean the use of a license to try to exclude a certain class of user. Some of this I’m sympathetic to (e.g. the “ethical source” movement), some of this I’m not (e.g. the various “you can do what you want, just don’t make a successful software-as-a-service offering” licenses that have popped up). In both cases, I think copyright is the wrong mechanism for achieving the goals.

Excluding classes of users is antithetical to the ideals of free software and open source. That may be okay. As I’ve written, free software is not the end goal. But if you’re going to claim to be open source, you should act open source.

CopyleftConf was great, you should go next year

Two weeks ago, I was fortunate to attend the inaugural Copyleft Conference. It was held in Brussels, Belgium the day after FOSDEM. Since I was in town anyway, I figured I should just extend my trip by a day to attend this conference. I couldn’t be happier that I did.

Software licensing doesn’t get as much discussion at conferences as it probably should. And among the talks that do happen, only a portion cover copyleft licenses specifically. But with major projects like the Linux kernel using copyleft licenses, and with copyleft principles so important to open source software generally, the Software Freedom Conservancy decided that a dedicated conference was in order.

I was impressed with how well-organized and well-attended the conference was for a first try. The venue was excellent, apart from some acoustic issues in the main room. The schedule was terrific: three rooms all day, each filled with talks from the world’s leading experts. I commented to a friend that if the building were to collapse, 80% of the world’s copyleft expertise would disappear.

For me, some of the excitement was just being around all of those people:

Molly deBlanc’s keynote was simultaneously inspiring and disturbing. She spoke of how software freedom matters to everyone, but how it matters to marginalized people in different ways. Ad networks can expose that someone at risk is seeking help. “Smart” homes can be used by domestic abusers to torment their victims. The transparency that free software brings isn’t just a nice-to-have, it can materially impact people’s lives.

The other session that was particularly interesting to me was Chris Lamb’s discussion of the Commons Clause. Chris was more focused on the community’s response to Redis Labs’ decision to adopt it than on the Commons Clause itself. He viewed Redis Labs’ decision to adopt, and subsequent refusal to abandon, the Commons Clause as a failure of the copyleft community to make a compelling argument. Drawing on the work of Aristotle, Chris argued that we, as interested and knowledgeable parties, should have done a better job making our case. The question, of course, is who the “we” is that Chris is exhorting. This is a particularly key question for his advice to proactively address the concerns of companies.

Some of the other talks focused more directly on adapting to a new environment. Version 3 of the GNU General Public License was published in 2007. At the time, Amazon Web Services (as we currently know it) was just over a year old. The original iPhone was released on the same day. While the principles behind the GPLv3 haven’t changed, the reality of how we use software has changed dramatically. Van Lindberg’s talk on a new license he’s drafting for a client explored what copyleft looks like in 2019. And Alexios Zavras noted that the requirements to provide source code don’t necessarily apply as-written anymore.

In addition to meeting some new friends and idols, I was also able to spend some time with friends that I don’t get to see often enough. I’m already looking forward to CopyleftConf 2020.

Licensing and open source communities

At FOSDEM 2014, Eileen Evans gave a talk entitled “Licensing Models and Building an Open Source Community”. The talk is basically a discussion of how Evans changed her mind about the suitability of permissive licenses in vibrant open source communities. She proposes that a vibrant community requires excellent technology, suitable governance, and a license that the community perceives as fair.

A decade ago, Evans was working at Sun and considering what license to use for OpenSolaris. The decision at the time was that because copyleft licenses require downstream changes to be returned to the community (in the sense that they remain freely-licensed), copyleft licenses are necessary for a healthy community.

In the intervening years, many projects have adopted permissive licenses. The GPL family is no longer the majority license, according to several surveys. Vendor participation in open source projects favored strong copyleft until around 2006, but the preference has since shifted toward permissive licenses. A survey of GitHub projects showed the MIT license with a dramatic lead over the next-most-widely-used license.

Based on this, Evans concluded that permissive licenses can, in fact, be used to build a community. As further evidence, she cited a fivefold increase in contributors to CloudStack after it switched from a copyleft license to a permissive one.

Of course, there are few who would take the position these days that permissive licenses can’t be used. Even noted copyleft advocate Bradley Kuhn can be heard agreeing on the video, though he points out his view that copyleft licenses make for better communities. Perhaps the question should be phrased as “what kind of communities develop?”

In conducting research for my thesis, I came across a study that showed copyleft licenses were associated with higher user engagement, but permissive licenses were associated with higher developer engagement. This makes sense, since not all developers develop FLOSS. A developer who isn’t developing FLOSS would probably be more drawn to a project where the license was conducive to proprietary downstreams.

Evans’ anecdote about the increase in contributions to CloudStack when it switched from copyleft to permissive licensing may or may not tell us something. It may be purely coincidental. An increase in the popularity of the project or of cloud computing generally may have driven the change. And of course, there’s more to a community than the number of committers.

I suspect that the license itself may be less important than the overall governance model. It’s certainly an area that merits further research.

Open source is about more than code

The idea of open source developed in a closed manner is hardly new. The first real discussion of it came, as best as I can tell, in Eric S. Raymond’s The Cathedral and the Bazaar. A culture of open discussion and decision making is still a conscious act for projects. It’s not always pretty: consensus decision making is frustrating and some media outlets jump on every mailing list suggestion as the final word on a project’s direction. Still, it’s important for a project to make a decision about openness one way or the other.

Bradley Kuhn recently announced the copyleft.org project, which seeks to “create and disseminate useful information, tutorial material, and new policy ideas regarding all forms of copyleft licensing.” In the first substantive post on the mailing list, Richard Fontana suggested the adoption of the “Harvey Birdman Rule,” which has been used in his copyleft-next project. The limited response has been mostly favorable, though some have questioned its utility given that to date the work is almost entirely Kuhn’s. One IRC user said the rule “seems to apply only to discussions, not decisions. The former are cheap and plentiful, but the latter actually matter.”

I argue that the discussions, while cheap and plentiful, do matter. If all of the meaningful discussion happens in private, those who are not privy to the discussion will have a hard time participating in the decision-making process. For some projects, that may be okay. A ruling cadre makes the decisions and other contributors can follow along or not. But I see open source as being more than just meeting the OSI’s definition (or the FSF’s definition of free software for that matter). Open source is about the democratization of computing, and that means putting the sausage-making on public display.

Why am I giving my work away for free?!

Recently, I began writing a regular weather blog for the local newspaper.  I’m not getting paid for this, so people may wonder why I’m giving free content to a for-profit organization.  I asked myself this very question, and the answer is that I don’t find the terms sufficiently objectionable.  Although the blog appears on the Journal & Courier website, they likely don’t make too much money off the ad revenue.  And while I don’t make any money either, I get the chance to refine and showcase my writing skills for a different audience than I currently have, and I get the chance to bring a little bit of traffic here (maybe I should start selling ads).  Of course there’s always the joy of sharing my knowledge, providing a public service, and keeping all of that meteorology I learned in school in my head a little longer.  Finally, I’m a compulsive favor-doer.

More than any of that, though, I am philosophically in favor of sharing information.  The vast majority of the writing I do is released under some form of the Creative Commons licenses.  The Fedora Project requires me to use the CC-BY-SA license, which does not prohibit commercial use.  In that sense, writing documentation for Fedora and writing my weather blog could both result in people who are not me making money off my work.  That’s fine, because I’m not doing it for money (although if someone wants to leave an envelope of cash on my doorstep, that’s okay).  In both cases, I consider the free access to my effort to be a fair trade.  My Fedora work is my way of contributing to the project that provides me with free (both gratis and libre) software that I use on a daily basis.  I see the writing I do for the Journal & Courier as contributing to the betterment of my society (or at least to the lowering of my blood pressure; weather-related stupidity angers me quite effectively).  The fact that one is a non-profit and the other is for-profit is not a consideration for me.

I am a firm believer in freedom for users, but I also believe that content creators should be free to license their works as they see fit.  Copyleft licenses like the GPL are preferable to more restrictive licenses, but if someone wants to put a restrictive license on his work, that right should be available.  In each case, a decision must be reached as to what is and is not acceptable.  In the cases I’ve discussed here, I have determined that, for my own criteria, the terms are acceptable.  The nice thing about volunteer work is that if I determine at some point that the terms are no longer tolerable, I can simply stop contributing.  In the meantime, I hope as many people as possible enjoy the fruits of my labor, and I look forward to enjoying the works of others.