An AI programming tool that makes sample code easier to find might sound like a godsend for software developers, but the reception for Microsoft's new GitHub Copilot tool has been a bit chillier. Copilot launched last week in an invite-only Technical Preview, promising to save time by responding to users' code with its own smart suggestions.
Not long after Copilot’s launch, some developers started sounding alarms over the use of public code to train the tool’s AI.
One concern is that if Copilot reproduces large enough chunks of existing code, it could violate copyright or effectively launder open-source code into commercial uses without proper licensing. The tool can also spit out personal details that developers have posted publicly, and in one case it reproduced widely cited code from the 1999 PC game Quake III.
Hi. I know you’re excited about copilot.
GitHub scraped your code. And they plan to charge you for copilot after you help train it further.
It’s truly disappointing to watch people cheer at having their work and time exploited by a company worth billions.
— Brian P. Hogan (@bphogan) July 2, 2021
Cole Garry, a GitHub spokesperson, declined to comment on those issues and pointed only to the FAQ on Copilot's web page, which acknowledges that the tool can produce verbatim code snippets from its training data. This happens roughly 0.1% of the time, GitHub says, typically when users don't provide enough context around their requests or when the problem has a commonplace solution. The company's FAQ says:
“We are building an origin tracker to help detect the rare instances of code that is repeated from the training set, to help you make good real-time decisions about GitHub Copilot’s suggestions.”
In the meantime, GitHub CEO Nat Friedman has argued that training machine learning systems on public data is fair use, though he acknowledged that "IP and AI will be an interesting policy discussion" in which the company will be an eager participant. The tool also has defenders outside of Microsoft, including Google Cloud principal engineer Kelsey Hightower, who said that developers should be as afraid of GitHub Copilot as mathematicians are of calculators.