Tile IR is a portable, language-agnostic intermediate representation for CUDA kernels.
With Tile IR, we introduce a new operation set and programming model that retains CUDA’s performance across architectures while regaining portability and improving productivity for developers who use matrix operations on new architectures. We virtualize Tensor Cores and their associated programming model so that we can introduce new approaches in hardware without invalidating existing investments in software.