The Last 20 Python Packages You Will Ever Need
https://devstyler.io/blog/2021/09/20/the-last-20-python-packages-you-will-ever-need/ — Mon, 20 Sep 2021

Here are the 20 Python packages you should know for all your Data Science, Data Engineering, and Machine Learning projects. These are the packages that machine learning engineer Sandro Luck recently listed as the most useful during his career as an engineer and Python programmer.

1. OpenCV

The open-source computer vision library OpenCV is your best friend when it comes to images and videos. It offers efficient solutions to common image problems such as face detection and object detection. If you are planning to work with images in data science, this library is a must.

2. Matplotlib

Data visualization is your main way to communicate with non-data wizards. If you think about it, even apps are merely a way to visualize various data interactions behind the scenes. Matplotlib is the basis of image visualization in Python; from visualizing your edge detection algorithm to looking at the distributions in your data, Matplotlib is your partner in crime.
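As a minimal sketch of that dual role (with made-up data): plotting a noisy signal next to a histogram of the same values, rendered off-screen so no display is needed.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no GUI/display required
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data: a noisy sine wave.
x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(x) + np.random.normal(0, 0.1, 100)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
ax1.plot(x, y, label="noisy sine")   # the signal itself
ax1.legend()
counts, bins, patches = ax2.hist(y, bins=20)  # the distribution of the same data
fig.savefig("sine.png")              # write the figure to disk
```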

3. Pip

Given that we are talking about Python packages, we have to take a moment to talk about their master, pip. Without it, you can't install any of the others. Its only purpose is to install packages from the Python Package Index or places such as GitHub, but you can also use it to install your own custom-built packages.

4. NumPy

Python wouldn't be the most popular programming language without NumPy. It is the foundation of all data science and machine learning packages, an essential package for all math-intensive computation in Python. All that nasty linear algebra and fancy math you learned in university is handled by NumPy in a very efficient way, and its syntax style can be seen in many of the important data libraries.
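A minimal sketch of what that looks like in practice: solving a small linear system and summing a million squares, without a single explicit Python loop (the numbers are arbitrary).

```python
import numpy as np

# The "nasty linear algebra": solve the system A @ x = b directly.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])
x = np.linalg.solve(A, b)   # exact solution of the 2x2 system

# Vectorized math replaces element-by-element Python loops.
v = np.arange(1_000_000)
total = int((v ** 2).sum())
```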

5. Pandas

Built mostly on NumPy, it is the heart of all the data science you will ever do with Python. "import pandas as pd" is much more than Excel on steroids. Its declared goal is to become the most powerful open-source data tool available in any language, and maybe they are more than halfway there.
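As a small, hypothetical example of that power: grouping and aggregating a table in two lines, something that takes real effort in raw Python.

```python
import pandas as pd

# A tiny made-up sales table.
df = pd.DataFrame({
    "city":  ["Sofia", "Sofia", "Plovdiv", "Plovdiv"],
    "sales": [100, 150, 80, 120],
})

# Total sales per city: group, then aggregate.
per_city = df.groupby("city")["sales"].sum()
```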

6. Python-dateutil

If you have ever worked with dates in Python, you know that doing it without dateutil is a pain. Given the current date, it can compute the next month or the distance between two dates in seconds. Most importantly, it handles timezone issues for you, which, if you have ever tried doing it without a library, can be a massive pain.
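A minimal sketch of both points, with arbitrary example dates: computing "one month later" across a month-length boundary, and comparing timezone-aware timestamps.

```python
from datetime import datetime
from dateutil.relativedelta import relativedelta
from dateutil import tz

# "Next month" done correctly, even across month-length boundaries.
d = datetime(2021, 1, 31)
next_month = d + relativedelta(months=1)   # clamps to Feb 28, not Mar 3

# Timezone-aware arithmetic: the same instant in two zones.
utc = datetime(2021, 9, 20, 12, 0, tzinfo=tz.UTC)
sofia = utc.astimezone(tz.gettz("Europe/Sofia"))
delta = (sofia - utc).total_seconds()      # same instant: 0 seconds apart
```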

7. Scikit-Learn

If machine learning is your passion, the Scikit-Learn project has you covered. It is the best place to get started and the first place to look for almost any algorithm that you could want to use for your predictions. It also features tons of handy evaluation methods and training helpers, such as grid search. Whatever predictions you are trying to get out of your data, sklearn will help you do it more efficiently.
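For example, here is a grid search sketched on the bundled iris dataset; the classifier and the hyperparameter grid are arbitrary choices for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Built-in toy dataset, split into train and test sets.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Try every value in the grid with 5-fold cross-validation.
search = GridSearchCV(KNeighborsClassifier(),
                      {"n_neighbors": [1, 3, 5, 7]}, cv=5)
search.fit(X_train, y_train)
accuracy = search.score(X_test, y_test)  # accuracy of the best model
```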

8. SciPy

This is kind of confusing, but there is a SciPy library and there is a SciPy stack, which includes NumPy, Matplotlib, IPython, and Pandas. Just like NumPy, you most probably won't use SciPy itself, but the above-mentioned Scikit-Learn library relies on it heavily. SciPy provides the core mathematical methods behind complex machine learning processes.
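A small sketch of those core methods: a root-finder from scipy.optimize and a significance test from scipy.stats (the samples are made up).

```python
import numpy as np
from scipy import optimize, stats

# Find the fixed point of cosine: the x where cos(x) == x.
root = optimize.brentq(lambda x: np.cos(x) - x, 0, 1)

# A two-sample t-test on two tiny, made-up samples.
t_stat, p_value = stats.ttest_ind([1.0, 2.0, 3.0, 4.0],
                                  [1.1, 2.1, 2.9, 4.2])
```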

9. TQDM

If you ever wondered what my favourite Python package is, look no further: it's this stupid little application called TQDM. All it really does is give you a progress bar that you can wrap around any for loop. It tells you how long each iteration takes on average and, most importantly, how long the whole thing will take, so you know exactly how long you can watch YouTube videos before you have to go back to work.
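A minimal example: wrap any iterable in tqdm() and the loop reports its own progress and ETA (the sleep stands in for real work).

```python
import time
from tqdm import tqdm

total = 0
# tqdm prints a live progress bar with rate and ETA to stderr.
for i in tqdm(range(100), desc="crunching"):
    total += i
    time.sleep(0.001)  # stand-in for real work
```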

10. TensorFlow

The most popular deep learning framework, and really what made Python what it is today. TensorFlow is an entire end-to-end open-source machine learning platform that includes many more packages and tools, such as TensorBoard, Colab, and the What-If Tool. Chosen by many of the world's leading companies for their deep learning needs, TensorFlow is, with a staggering 159,000 stars on GitHub, the most popular Python package of all time. It is used for various deep learning use cases by companies such as Coca-Cola, Twitter, Intel, and its creator Google.

11. Keras

A deep learning framework "made for humans", as its slogan goes. It made rapidly developing new neural networks a thing. Keras is built on top of TensorFlow and is often where developers start when they first experiment with a new architecture for their model. It lowered the entry barrier for programming neural networks so much that most high school students could do it by now.

12. PyTorch

TensorFlow's main competitor in the deep learning space, PyTorch has become a great alternative for developing neural networks. Its community is a bit stronger in the realm of natural language processing, while TensorFlow tends to be a bit more on the image and video side. As with Keras, it has its own simplifying library, PyTorch Lightning.

13. Statsmodels

Statsmodels, in contrast to the fancy new machine learning world, is your door to the classical world of statistics. It contains many helpful statistical evaluations and tests. These tend to be a lot more stable than their machine learning counterparts, and they are surely something any data scientist should use every now and then.

14. Plotly

The big alternative to Matplotlib is Plotly: arguably more beautiful, and far better for interactive data visualizations. The main difference from Matplotlib is that it is browser-based and slightly harder to start with, but once you understand the basics it is truly an amazing tool. Its strong integration with Jupyter makes me believe that it will become more and more standard and move people away from Matplotlib integrations.

15. NLTK

Short for Natural Language Toolkit, NLTK is your best friend when you are trying to make sense of any text. It contains extensive algorithms for various grammatical transformations, such as stemming, and incredibly useful lists of symbols that you might want to remove before feeding text to your models, such as punctuation and stop words. It can also detect what is most likely a sentence and what is not, to work around grammatical errors made by the "writers" of your dataset.

16. Scrapy

If you ever tried doing data science without data, you probably realized that it is rather pointless. Luckily, the internet contains information about almost everything, but sometimes it's not stored in a nice CSV-like format and you first have to go out into the wild and gather it. This is exactly where Scrapy can help, by making it easy to crawl websites around the globe with a few lines of code. Next time you have an idea for which no one has pre-gathered the dataset, remember Scrapy.

17. Beautiful Soup

A very similar use case: web developers often store their data in an inferior data structure called HTML. To make use of that nested craziness, Beautiful Soup was created. It helps you extract various parts of the HTML, such as titles and tags, and lets you iterate over them like normal dictionaries.
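A minimal sketch with a made-up snippet of HTML: pull out the title and every link's href.

```python
from bs4 import BeautifulSoup

# A hypothetical page fragment; in practice this would come from a crawler.
html = """
<html><head><title>Demo page</title></head>
<body><a href="/a">first</a><a href="/b">second</a></body></html>
"""

soup = BeautifulSoup(html, "html.parser")
title = soup.title.string                      # text of the <title> tag
links = [a["href"] for a in soup.find_all("a")]  # hrefs, accessed dict-style
```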

18. XGBoost

Once your dataset size crosses a certain terabyte threshold, the common vanilla implementations of machine learning algorithms can be hard to use. XGBoost is there to rescue you from waiting weeks for the computations to end. It is a highly scalable and distributed gradient boosting library that will make sure your calculations run as efficiently as possible.

19. PySpark

Data engineering is part of every data science workflow, and if you ever tried to process billions of data points you know that your conventional for loop will only get you so far. PySpark is the Python implementation of the very popular Apache Spark data processing engine. It is like Pandas, but built with distributed computing in mind from the very beginning. If you ever get the feeling that you can't process your data fast enough to keep up, this surely is exactly what you need. They have also started focusing on massively parallel machine learning for your very big data use cases.

20. Urllib3

Urllib3 is a powerful, user-friendly HTTP client for Python. If you are trying to do anything with the internet in Python, this, or something that builds on it, is a must: API crawlers and connections to various external data sources included.
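A short sketch of the client, with the actual network request left commented out so nothing is fetched here (the URL is just an example).

```python
import urllib3

# Break a URL into its parts without touching the network.
url = urllib3.util.parse_url("https://devstyler.io/blog/2021/09/20/post")
host, path = url.host, url.path

# A reusable connection pool; uncomment the request to actually fetch.
http = urllib3.PoolManager()
# resp = http.request("GET", "https://devstyler.io")
# print(resp.status)
```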

11 Hot Language Projects Riding WebAssembly
https://devstyler.io/blog/2021/06/28/11-hot-language-projects-riding-webassembly/ — Mon, 28 Jun 2021

From blazing-fast web apps to Python data science in the browser, these programming language and compiler projects offer different twists on the promise of WebAssembly.

WebAssembly is a low-level, assembly-like language with a compact binary format that runs with near-native performance in web browsers. Hailed as a way to both improve web application performance and allow languages other than JavaScript to be used in the development of browser apps, WebAssembly has led to the development of a range of new technologies, including whole new programming languages, that harness its power. So, here are 11 language projects that have made big bets on WebAssembly.

Blazor WebAssembly

Blazor WebAssembly is a framework for building interactive, client-side, single-page web apps using .NET and hosting those apps in modern browsers (including mobile browsers) on a WebAssembly-based .NET runtime. No plug-ins or recompiling of code into other languages is required. The runtime enables the .NET code to access browser functionality via WebAssembly’s JavaScript APIs.

When a Blazor WebAssembly app is run in the browser, C# code files and Razor files are compiled into .NET assemblies, which are downloaded to the browser along with the .NET runtime. The apps can be deployed standalone or with server-side support.

Cheerp

Leaning Technologies’ Cheerp is positioned as an enterprise-grade C/C++ compiler for the web, compiling C and C++, up to C++ 17, into WebAssembly, JavaScript, or a combination of the two. Cheerp is integrated into LLVM/Clang infrastructure, with custom optimizations intended to improve performance and minimize the size of the compiled output. Primarily used to port existing C/C++ libraries and applications to HTML5, Cheerp also can be used to write web applications and WebAssembly components. Cheerp is offered under open source and commercial licenses.

Binaryen

Binaryen is a compiler toolchain infrastructure library for WebAssembly. Written in C++, Binaryen is intended to make compiling to WebAssembly easy, effective, and fast. It has a C API in a single header, and it can be used from JavaScript. Input is accepted in WebAssembly-like form but a general control graph also is accepted for compilers that prefer it.

The internal IR (intermediate representation) of Binaryen uses compact data structures and draws on all CPU cores for parallel codegen and optimization. The IR also compiles down to WebAssembly easily because it is essentially a subset of WebAssembly. WebAssembly-specific optimizations improve both code size and speed, making Binaryen useful as a compiler back end by itself.

CheerpJ

Billed as “the Java compiler for the web,” this LLVM-based compiler converts any Java client application into WebAssembly, JavaScript, and HTML, enabling Java client applications to run in modern browsers. CheerpJ leverages three components: an AOT compiler, a runtime in WebAssembly and JavaScript, and JavaScript DOM interoperability APIs, to access the DOM from Java. With CheerpJ, JAR archives can be compiled using the AOT compiler. CheerpJ does not require any server-side support.

Emscripten

This open-source compiler toolchain compiles C and C++, or any other language using LLVM compiler technology, into WebAssembly for deployment on the web, Node.js, or a Wasm runtime such as Wasmer. Emscripten has been used to convert a list of real-world codebases into WebAssembly, including commercial codebases such as the Unreal Engine 4 game engine and Unity 3D platform. Emscripten supports the C and C++ standard libraries, C++ exceptions, and OpenGL/WebGL graphics commands. The Emscripten SDK used to install the Emscripten toolchain can be used on Linux, macOS, and Windows.

Forest

Forest is a functional programming language that compiles to WebAssembly. The goal behind Forest is to provide a language that makes it easier to create web apps that are complex, interactive, and functional, but without the traditional overhead of that approach, developer Nick Johnstone said.

Currently described as “pre-alpha, experimental, conceptual research software,” Forest features static typing, pattern matching, immutable data structures, multiple syntaxes, and automatic code formatting. The first syntax in development is inspired by Elm and Haskell.

Design principles of the Forest language include ease of collaboration, painless-as-possible testing, and agreement on structure and semantics while agreeing to disagree on syntax. Johnstone strives to make Forest fast enough for building complex games so that normal web apps will be “blazing fast.”

JWebAssembly

JWebAssembly, from I-Net Software, is a Java bytecode to WebAssembly compiler that takes Java class files as input and generates WebAssembly binary format (.wasm file) or text format (.wat file) as output. The target is to run natively in the browser with WebAssembly. In theory, JWebAssembly can compile any language that compiles to Java bytecode such as Clojure, Groovy, JRuby, Kotlin, and Scala, pending testing.

JWebAssembly is not yet production-ready. Although everything necessary for the JWebAssembly 1.0 release has been implemented, testing still remains to be done. The version 1.0 roadmap calls for capabilities such as a Java bytecode parser, a test framework, and a Gradle plug-in. I-Net Software expects to ship JWebAssembly 1.0 this year.

Uno Platform

An alternative to the Xamarin mobile app platform, Uno Platform is a UI platform for .NET teams to build single-codebase applications for WebAssembly, the web, Windows, macOS, Linux, iOS, and Android, using C# and XAML. Uno leverages the Mono-WASM runtime in .NET 5 to run C# code in all of the major web browsers and serves as a bridge for WinUI and UWP (Universal Windows Platform) apps to run natively on WebAssembly. For building web apps with Uno, developers can use Visual Studio or Visual Studio Code.

Pyodide

The Pyodide project, which recently moved from Mozilla to become an independent project, compiles Python and the Python scientific stack to WebAssembly, bringing the Python 3.8 runtime, NumPy, SciPy, Matplotlib, Scikit-learn, and dozens of other packages to the browser. Pyodide provides transparent conversion of objects between JavaScript and Python and gives Python access to web APIs. Pyodide began in 2018 as part of the Iodide project for doing data science in a browser. Pyodide can be tried from a REPL in the browser.

TeaVM

An ahead-of-time compiler for Java bytecode, TeaVM emits WebAssembly and JavaScript to run in the browser. However, note that WebAssembly support is currently experimental. Like close cousin GWT (Google Web Toolkit), TeaVM allows developers to write applications in Java and deploy them as JavaScript. Unlike GWT, TeaVM works with compiled class files, not source code. In addition, TeaVM relies on existing compilers such as javac, kotlinc, and scalac, so can compile Kotlin and Scala code as well as Java. TeaVM is primarily a web development tool; it’s not designed for taking large codebases in Java or Kotlin and producing JavaScript. A TeaVM subproject, Flavour, serves as a framework for writing single-page web applications.

Grain

The Grain language brings features from academic and functional languages to the 21st century, the project website states. Compiling to WebAssembly via the Binaryen toolchain and compiler infrastructure, Grain can run in the browser, on the server, and potentially anywhere. There are no runtime type errors and no need for type annotations. The Grain toolchain features a CLI, compiler, runtime, and standard library, shipping as a single binary. Developers will need Node.js and Yarn to build Grain from source, and binaries are available for Linux, macOS, and Windows.

Announcing Anaconda for Linux on IBM Z & LinuxONE
https://devstyler.io/blog/2021/05/19/announcing-anaconda-for-linux-on-ibm-z-amp-linuxone/ — Wed, 19 May 2021

IBM is bringing the Python data science platform Anaconda to the company's LinuxONE and IBM Z customers.

Anaconda is the world’s most popular Python distribution platform and boasts over 25 million users worldwide. This announcement is the latest part of IBM’s effort to bring popular data science frameworks and libraries to its enterprise platforms.  Barry Baker, VP of Product Management for IBM Z & LinuxONE, commented:

“Data scientists who already know and love Anaconda can now expand their open-source data science experience to include IBM Z & LinuxONE while continuing to work with their favourite tools and frameworks like conda, XGBoost and SciKit-Learn. This expands and enables choice in AI frameworks and tooling for end-to-end data science directly on the platform, including development, training, testing and production. Data scientists can benefit from the security capabilities, high availability and scalability of the IBM Z & LinuxONE platforms when implementing AI deployments targeting time-sensitive workloads or transactions when they are taking place.”

According to new research commissioned by IBM in partnership with Morning Consult, 90% of respondents said that being able to build and run AI projects wherever their data resides is important.  Workloads running on IBM Z & LinuxONE often need to adhere to strict latency and SLA requirements to support transactions that are key to our modern life such as online purchases. With Anaconda for Linux on Z & LinuxONE, organizations can perform AI analysis in close proximity to their data, addressing latency to deliver insights where and when they are needed.

Customers can start using Anaconda Individual Edition and Anaconda Commercial Edition by downloading the Individual Edition or Miniconda installer, and following the associated installation documentation:

  • Individual Edition
  • Miniconda

For more information on using Anaconda Individual or Commercial Edition, you can visit docs.anaconda.com.
