WhatsGNU protein allele frequency analysis for AllTheBacteria (2.4M+ genomes)
A custom reimplementation of WhatsGNU optimised for AllTheBacteria scale. Builds an LMDB-backed, sharded database of protein allele frequencies across 2,438,285 bacterial genomes, and supports fast queries of any bacterial genome, returning per-protein allele counts and species distributions. Includes a downloader for the pre-built database hosted on OSF.
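
To make the idea of a sharded allele-frequency index concrete, here is a minimal sketch in plain Python. It is not this project's code: in-memory dictionaries stand in for the LMDB shards, and the SHA-256 key, the shard count of 16, the once-per-genome counting rule, and the names `allele_key`, `shard_of`, `build_index` and `query` are all assumptions made for illustration.

```python
import hashlib
from collections import Counter, defaultdict

NUM_SHARDS = 16  # assumed shard count, for illustration only


def allele_key(protein_seq: str) -> bytes:
    """Fixed-length key for one exact protein sequence (one allele).

    SHA-256 is an assumption; the real key scheme is not stated here.
    """
    return hashlib.sha256(protein_seq.encode("ascii")).digest()


def shard_of(key: bytes, num_shards: int = NUM_SHARDS) -> int:
    """Route a key to a shard using its first byte (hypothetical rule)."""
    return key[0] % num_shards


def build_index(genomes, num_shards: int = NUM_SHARDS):
    """Build per-shard maps of allele key -> Counter of species.

    genomes: iterable of (species_name, list_of_protein_sequences).
    Each dict stands in for one LMDB shard.
    """
    shards = [defaultdict(Counter) for _ in range(num_shards)]
    for species, proteins in genomes:
        # Assumption: an allele is counted once per genome,
        # even if the genome carries several identical copies.
        for seq in set(proteins):
            key = allele_key(seq)
            shards[shard_of(key, num_shards)][key][species] += 1
    return shards


def query(shards, proteins):
    """Return (sequence, total_count, species_distribution) per protein."""
    results = []
    for seq in proteins:
        key = allele_key(seq)
        # .get() avoids creating empty entries for unseen alleles.
        by_species = shards[shard_of(key, len(shards))].get(key, Counter())
        results.append((seq, sum(by_species.values()), dict(by_species)))
    return results
```

With this layout a query hashes each protein once and touches only the shard that owns that key, so lookup cost does not depend on how many genomes were indexed. An allele absent from the index comes back with a count of 0 and an empty species distribution.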