Data science and machine learning professionals have driven the adoption of the Python programming language, but both fields still lack key tools in business settings and have room to grow before they become essential for decision-making, according to Anaconda, the maker of a data science distribution of Python.
Most respondents (63%) said they used Python frequently or always, while 71% of educators said they’re teaching machine learning and data science with Python, which has become popular because of its ease of use and gentle learning curve. An impressive 88% of students said they were being taught Python in preparation to enter the data science/machine learning field.
Over a third of the 4,299 data science professionals, students and academics who responded to Anaconda’s online survey this April to May said their organizations had decreased investments in data science, while 26% had increased their investment and 24% said investments were flat. It’s not clear what impact the pandemic has had on investments in data science tools and technology.
Some 39% reported that “many” of their business decisions rely on data science, while 35% said only some business decisions were based on insights from their team. A quarter of respondents said they lacked the resources for effective analysis, while another quarter said decision-makers at their organization struggle with data literacy, and 11% said they or their team couldn’t demonstrate a business impact.
Only 36% described their organization’s decision-makers as “very data literate”, meaning they actually understood data visualization and models. Just over half said decision-makers were “mostly data literate”.
Anaconda also asked respondents to nominate all the skills they believe their organization currently lacks. The top missing skill was “big data management” at 38%, while 26% said their organization lacked advanced mathematics, and a quarter cited “business knowledge” as lacking.
Other commonly cited skills in short supply were deep learning (27%), communication skills (22%), data visualization (22%), machine learning (21%), Python (20%), and probability and statistics (19%).
The problem in artificial intelligence and machine learning that respondents most often said needed to be tackled was “social impacts from bias in data and models” (31%), followed by “impacts to individual privacy”. Both of these issues have been highlighted by the adoption of AI and facial recognition in public surveillance systems. Microsoft president Brad Smith recently called for the government to regulate facial recognition due to racial bias.
Other top concerns included job losses from automation (19%), advanced information warfare (15%), and lack of diversity and inclusion in the profession (10%).
Just 10% of respondents said their organization had implemented a solution to ensure fairness and mitigate bias, but Anaconda found 30% were planning to implement a step in the next year.
The explainability and interpretability of ML models was another large gap. Some 31% said their organization lacked plans to ensure explainability and interpretability, but 41% said they either already had one step in place or planned to implement some steps in the next 12 months.
Most respondents (65%) said their employers encouraged them to contribute to open-source projects, but 18% of respondents said employer support for open source decreased due to COVID-19 or other factors.
Some 41% said security bugs in open-source software were the main obstacle preventing their organization from using it. Python and many of its popular data science and machine learning packages and libraries, such as NumPy and TensorFlow, are open-source projects.
Interestingly, a quarter of respondents said they were not securing their open-source pipeline, while 20% didn’t know what steps their organization was taking to ensure vulnerabilities were managed. Anaconda provides an enterprise service to help organizations block or include packages according to an enterprise’s standards. It also has a managed library of 7,500 open-source packages for Python.