CUB is a flexible library of cooperative threadblock primitives and other utilities for CUDA kernel programming.
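
As an illustration of what a cooperative threadblock primitive looks like in practice, the sketch below uses `cub::BlockReduce` to sum one value per thread within each thread block. The kernel name `BlockSumKernel`, the host helper `BlockSums`, and the block size `kBlockThreads` are hypothetical names chosen for this example, not part of CUB. Error checking on the CUDA runtime calls is omitted for brevity.

```cuda
#include <vector>
#include <cuda_runtime.h>
#include <cub/cub.cuh>

// Hypothetical block size for this sketch; BlockReduce needs it at compile time.
constexpr int kBlockThreads = 128;

// Each block cooperatively sums up to kBlockThreads input values.
__global__ void BlockSumKernel(const int* in, int* out, int n)
{
    // Specialize the collective for the value type and the block size.
    using BlockReduce = cub::BlockReduce<int, kBlockThreads>;

    // Shared memory the collective needs for inter-thread communication.
    __shared__ typename BlockReduce::TempStorage temp_storage;

    // Threads past the end of the input contribute the identity for addition.
    int idx = blockIdx.x * kBlockThreads + threadIdx.x;
    int thread_value = (idx < n) ? in[idx] : 0;

    // All threads in the block must call Sum; the result is only
    // guaranteed to be valid in thread 0.
    int block_sum = BlockReduce(temp_storage).Sum(thread_value);

    if (threadIdx.x == 0) {
        out[blockIdx.x] = block_sum;
    }
}

// Hypothetical host helper: returns one partial sum per thread block.
// CUDA error checking is intentionally left out of this sketch.
std::vector<int> BlockSums(const std::vector<int>& host_in)
{
    const int n = static_cast<int>(host_in.size());
    const int num_blocks = (n + kBlockThreads - 1) / kBlockThreads;
    std::vector<int> host_out(num_blocks);
    if (n == 0) {
        return host_out;
    }

    int* d_in = nullptr;
    int* d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(int));
    cudaMalloc(&d_out, num_blocks * sizeof(int));
    cudaMemcpy(d_in, host_in.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    BlockSumKernel<<<num_blocks, kBlockThreads>>>(d_in, d_out, n);

    // This blocking copy also waits for the kernel on the default stream.
    cudaMemcpy(host_out.data(), d_out, num_blocks * sizeof(int),
               cudaMemcpyDeviceToHost);

    cudaFree(d_in);
    cudaFree(d_out);
    return host_out;
}
```

Under these assumptions, calling `BlockSums` on a vector of 512 ones would launch four blocks and return four partial sums of 128 each. The pattern of a type alias for the specialized collective, a `TempStorage` allocation in shared memory, and a single cooperative call is shared by CUB's other block-level primitives.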