Yesterday, OpenAI released Triton, an open source, Python-like programming language that enables researchers to write highly efficient GPU code for AI workloads. Triton makes it possible to reach peak hardware performance with relatively little effort, OpenAI claims, producing code on par with what an expert could achieve in as few as 25 lines.
Deep neural networks have emerged as an important type of AI model, capable of achieving state-of-the-art performance across natural language processing, computer vision, and other domains. The strength of these models lies in their hierarchical structure, which generates a large amount of highly parallelizable work well-suited for multicore hardware like GPUs. Frameworks for general-purpose GPU computing such as CUDA and OpenCL have made the development of high-performance programs easier in recent years. Yet GPUs remain especially challenging to optimize, in part because their architectures rapidly evolve.
Domain-specific languages and compilers have emerged to address the problem, but these systems tend to be less flexible and slower than the best handwritten compute kernels available in libraries like cuBLAS, cuDNN, or TensorRT. Reasoning about all these factors can be challenging even for seasoned programmers. The purpose of Triton, then, is to automate these optimizations, so that developers can focus on the high-level logic of their code.