TileIR is a portable, language-agnostic intermediate representation for CUDA kernels
With Tile IR, we introduce a new operation set and programming model that retains CUDA’s performance across architectures while regaining portability and improving productivity for developers who use matrix operations on new architectures. We virtualize tensor cores and their associated programming model so that hardware can adopt new approaches without invalidating existing investments in software.