NVIDIA Releases Updates to CUDA-X AI Software

18 May, 2021

NVIDIA CUDA-X AI is a deep learning software stack that researchers and software developers use to build high-performance GPU-accelerated applications for conversational AI, recommendation systems, and computer vision.

NVIDIA Jarvis Open Beta

NVIDIA announced major new capabilities for Jarvis, its fully accelerated conversational AI framework. The open beta includes highly accurate automated speech recognition, real-time machine translation for multiple languages, and text-to-speech capabilities for creating expressive conversational AI agents.

Highlights include:

Speech recognition model trained on thousands of hours of audio, with greater than 90% accuracy
Real-time machine translation for five languages that runs in under 100 ms per sentence
Expressive TTS that delivers 30x higher throughput with FastPitch+HiFiGAN vs Tacotron2+WaveGlow

Triton Inference Server 2.9

NVIDIA announced Triton Inference Server 2.9. Triton is open-source inference serving software that maximizes performance and simplifies production deployment at scale. Release updates include:

Model Navigator (alpha), a new tool in Triton that automatically converts TensorFlow and PyTorch models to a TensorRT plan, validates accuracy, and sets up the deployment environment
Model Analyzer now automatically determines the optimal batch size and number of model instances to maximize performance, based on latency or throughput requirements
Support for an OpenVINO backend (beta) for high-performance inference on CPUs, a Windows build of Triton (alpha), and integration with the MLOps platforms Seldon and Allegro
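
As a rough illustration of the kind of search Model Analyzer automates, the sketch below picks the batch size and instance count with the highest throughput among configurations that stay within a latency budget. It is not Model Analyzer's API: the `Measurement` type, the `pick_config` function, and the sample numbers are all hypothetical and stand in for real profiling results.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Measurement:
    """One profiled configuration (hypothetical structure, not a Triton type)."""
    batch_size: int
    instances: int
    p99_latency_ms: float
    throughput_ips: float  # inferences per second


def pick_config(measurements: Iterable[Measurement],
                latency_budget_ms: float) -> Optional[Measurement]:
    """Return the highest-throughput configuration within the latency budget."""
    feasible = [m for m in measurements if m.p99_latency_ms <= latency_budget_ms]
    if not feasible:
        return None  # no configuration meets the latency requirement
    return max(feasible, key=lambda m: m.throughput_ips)


# Made-up numbers for illustration only.
SAMPLE = [
    Measurement(1, 1, 4.0, 250.0),
    Measurement(8, 1, 12.0, 660.0),
    Measurement(8, 2, 15.0, 1050.0),
    Measurement(32, 2, 48.0, 1300.0),
]

if __name__ == "__main__":
    print(pick_config(SAMPLE, latency_budget_ms=20.0))
```

With a throughput-only requirement, the same idea applies with the latency filter relaxed; a real tool would also have to run the measurements, which this sketch takes as given.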

TensorRT 8.0

TensorRT 8.0 is the latest version of the high-performance deep learning inference SDK. This version includes:

Quantization-aware training for FP32-level accuracy at INT8 precision
Sparsity support on Ampere GPUs, delivering up to 50% higher throughput
Up to 2x faster inference for transformer-based networks such as BERT, with new compiler optimizations

TensorRT 8.0 will be freely available to members of the NVIDIA Developer Program in Q2 2021.
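
Quantization-aware training typically works by inserting "fake quantization" steps during training, so the network learns weights that tolerate INT8 rounding. The sketch below shows that quantize-then-dequantize step in plain Python, assuming symmetric per-tensor scaling. The function names are hypothetical, and this is not TensorRT's API.

```python
from typing import List


def int8_scale(values: List[float]) -> float:
    """Symmetric per-tensor scale mapping the largest magnitude to 127."""
    max_abs = max((abs(v) for v in values), default=0.0)
    return max_abs / 127.0 if max_abs > 0 else 1.0


def fake_quantize(values: List[float]) -> List[float]:
    """Round each value to the INT8 grid, then map it back to floating point."""
    scale = int8_scale(values)
    out = []
    for v in values:
        q = round(v / scale)          # nearest integer level
        q = max(-127, min(127, q))    # clamp to the symmetric INT8 range
        out.append(q * scale)         # dequantize
    return out


if __name__ == "__main__":
    weights = [0.02, -0.75, 0.31, 1.27]
    for w, a in zip(weights, fake_quantize(weights)):
        print(f"{w:+.4f} -> {a:+.4f} (error {abs(w - a):.5f})")
```

Each value moves by at most half a quantization step, which is the rounding error the network is exposed to during training.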

NVIDIA NeMo 1.0 RC

NVIDIA NeMo is an open-source toolkit for developing state-of-the-art conversational AI models, including:

ASR collection: Added new state-of-the-art model architectures, CitriNet and Conformer-CTC. Also used the Mozilla Common Voice dataset and the AISHELL-2 corpus to add speech recognition support for multiple languages, including Mandarin, Spanish, German, French, Italian, Russian, Polish, and Catalan.
NLP collection: Added ten neural machine translation models supporting bidirectional translation between English and each of Spanish, Russian, Mandarin, German, and French.
TTS collection: Added support for the HiFiGAN, MelGAN, GlowTTS, UniGlow, and SqueezeWave model architectures, along with pre-trained models.

NGC Updates (Includes Framework Updates)

The NGC catalog is a hub of GPU-optimized containers, pre-trained models, SDKs, and Helm charts designed to accelerate end-to-end AI workflows. Updates include:

Deep Learning Frameworks
Brand new UI – enables users to navigate, find and download content faster than before with features such as improved search and filtering, tagged content, and direct links to all documentation on the home page.
New and Updated Partner Software