Traditionally, computer users could see the end product of what a piece of software did by, for instance, writing a document in Microsoft Word or playing a video game. But the underlying programming – the source code – was proprietary, kept from public view. Opening source code is a big deal in computer science because the more people who look at the code, the more likely it is that bugs will be found and that long-term opportunities and risks will be worked out.
Openness is increasingly a big deal in science as well, for similar reasons. The traditional approach to science involves collecting data, analyzing the data and publishing the findings in a paper. As with computer programs, the results were traditionally visible to readers, but the actual sources – the data and often the software that ran the analyses – were not freely available. Making the source available to all has obvious communitarian appeal; the business appeal of open source is less obvious.
Microsoft, Google, Facebook and Amazon have been making remarkable progress developing artificial intelligence systems. Recently they have released much of their work to the public for free use, exploration, adaptation and perhaps improvement.
This seems bizarre: why would companies reveal the methods at the core of their businesses? And what does their embrace of open-source AI say about the current state of artificial intelligence?
Each technology that’s being revealed displays remarkable capabilities that go well beyond what was possible even just 10 years ago. They center on what is called “deep learning” – an approach that organizes layers of neural networks hierarchically to analyze very large data sets not just in search of simple statistics but also seeking to identify rich and interesting abstract patterns.
Among the technologies that major tech companies have opened recently are:
To understand what is driving these trends toward open source AI, it is helpful to consider other organizations in the broader social context in which these companies operate.
One useful comparison is DARPA, the research arm of the U.S. Department of Defense. It is hard to imagine an organization more likely to be concerned about others taking advantage of open information. Yet DARPA has made a big push toward open-source machine learning technologies.
Indeed, the DARPA XDATA program resulted in a catalog of state-of-the-art machine learning, visualization and other technologies that anyone can download, use and modify to build custom AI tools. (I was a research lead on the CrossCat/BayesDB project that was supported through this program.)
The fact that DARPA and the Defense Department are so supportive of open sourcing strongly indicates that its advantages outweigh the disadvantages of making high-quality tools available to potential adversaries.
Another useful comparison is the OpenAI project, recently announced by tech entrepreneurs Elon Musk and Sam Altman, among others. The effort will study the ethics of creating and releasing machines with increasing abilities to interact with and understand the world.
While these goals will be familiar to anyone who has read Isaac Asimov, they point to a deeper issue: even experts do not understand when or how AI might become powerful enough to cause harm.
Open sourcing of code allows many people to think through the consequences both individually and together. Ideally, that effort will advance software that is increasingly powerful and useful, but also broadly understandable in its mechanisms and their implications.
AI systems involve large – often very, very large – amounts of code, so much that it stretches the ability of any single individual to understand it in both breadth and depth. Scrutiny, troubleshooting and bug-fixing are especially important in AI, where we are designing tools not to do a specific job (e.g., build a car), but to learn, adapt and make decisions in our stead. The stakes are higher for both the positive and the potentially negative outcomes.
Neither DARPA's motivations nor OpenAI's explain exactly why these commercial technology companies are open sourcing their AI code. As technology companies, their concerns are more immediate and concrete. After all, if nobody is using their products, then what good are nice clean code and well-intentioned algorithms?
There is a common view within the industry that technology companies like Google, Facebook and Amazon are not in the businesses one might assume. Over the long run, Google and Facebook are not really in the business of selling ads, and Amazon is not in the business of selling merchandise. No, these technology companies are powered by your eyeballs (and data). Their currency is users. Google, for example, gives away email and search for free to draw users to its products; it needs to innovate quickly, producing more and better products to ensure you stay with the company.
These companies open-source their AI software because they wish to be the foundations on which other people innovate. Any entrepreneur who innovates successfully on those foundations can be bought up and easily integrated into the larger parent. AI is central because, by design, it learns, adapts and even makes decisions. AI is more than a product: it is a product generator. In the near future, AI will not be relegated to serving up images or consumer products, but will be used to identify and capitalize on new opportunities by creating new products.
Open-sourcing AI serves these companies' broader goals of staying at the cutting edge of technology. In this sense, they are not giving away the keys to their success: they are paving the way to their own future.