Python is one of the most widely used languages by Data Scientists and Machine Learning experts across the world. Though there is no shortage of alternatives in the form of languages like R, Julia and others, python has steadily and rightfully gained popularity.

Core Data Handling Libraries:

Numpy

Python has a strong set of data types and data structures. Yet it wasn’t designed for Machine Learning per say. Enter numpy (pronounced as num-pee). Numpy is a data handling library, particularly one which allows us to handle large multi-dimensional arrays along with a huge collection of mathematical operations. The following is a quick snippet of numpy in action.

The following are some of the goto features that make it special:

  • Matrix (and multi-dimensional array) manipulation capabilities like transpose, reshape,etc.
  • Highly efficient data-structures which boost performance and handle garbage collection with a breeze.
  • Capability to vectorize operation, again improves performance and parallelization capabilities.

Scipy

Pronounced as Sigh-Pie, this is one of the most important python libraries of all time. Scipy is a scientific computing library for python. It is also built on top of numpy and is a part of the Scipy Stack.

CatBoost

This Implementation from Yandex research is one of the leading variants of boosted trees. It provides capabilities similar to the two variants discussed above. It claims to be better at handling categorical variables and provides support for multi-GPU training. It is also one of the fastest algorithms when it comes to inference.

The three variants/competing implementations discussed above have a lot in common yet have some features better than the rest.

Keras

Keras is a high-level Deep Learning framework which has eased the way we develop and work with deep neural networks. Developed primarily in python, it rests on the shoulders of giants like Theano, Tensorflow, and MXNet (also called as backends). Keras utilizes these backends to do the heavy lifting while transparently allowing us to think in terms of layers. For Keras, the basic building block is a layer. Since, most neural networks are different configurations of layers, working in such a manner eases the workflow immensely.

FastAi

This a high-level library (similar to keras) built on top of PyTorch. As the name suggests, it enables the development of fast and accurate neural networks. It provides consistent APIs and built-in support for vision/image, text, etc

Caffe

Caffe or Convolutional Architecture for Fast Feature Embedding is a deep learning framework developed by Yangqing Jia for his PhD thesis. It was primarily used/designed for image classification and related tasks though it supports other architectures including LSTMs and Fully Connected ones as well.

Gluon

Gluon is a high-level deep learning library/api from AWS and Microsoft. It is currently available through Apache MXNet and allows for ease of use of AWS and Microsoft Azure clouds. It is designed to be developer friendly, fast and consistent.

Bokeh

Bokeh is short for visualization on steroids! No, this isn’t a joke. Bokeh provides interactive zoomable visualizations using the power of javascript in a python environment. Bokeh Visualizations are a perfect solution for sharing results through a Jupyter notebook along with its interactive visualizations.

Python Build System

The build system is easy to use for most tasks, yet it can be mind-boggling to figure out setups for some of the most widely used libraries. The tasks get slightly more complicated when you are working on an OS which has system-installed version of python. Proceed with caution and read installation steps before installing the libraries.

Tags: , , , , , , , , , , , , , , , , , , ,