The world of data science is awash in open source: PyTorch, TensorFlow, Python, R, and much more. But the most widely used tool in data science isn’t open source, and it’s usually not even considered a data science tool at all.
It’s Excel, and it’s running on your laptop.
Excel is “the most successful programming system in the history of homo sapiens,” says Anaconda CEO Peter Wang in an interview, “because regular ‘muggles’ can take this tool…put their data in it…ask their questions…[and] model things.” In short, it’s easy to be productive with Excel.
Superior ease and productivity: This is the future Wang envisions for the popular Python programming language.
Although Excel has succeeded without open source, Wang believes Python will succeed precisely because of open source.
Software, in short, is always a process and not really a product.
Open source was early to clue into this fact. Wang says, “What open source does is it opens the doors. It’s like the right to tinker, the right to repair, the right to extend.” In other words, open source embraces the idea of software as a service—as a process.
More important, this means that open source encourages more people to participate in its creation and success. With most software, Wang estimates that 90% to 95% of users are left out of the creation process. They might see the demos, but they’re trusting others to deliver software value on their behalf. By contrast, “open source for data science has become so successful because a whole new category of users got turned into makers and builders,” Wang says.