A lightweight CAZyme annotation tool
Dbcanlight is a lightweight rewrite of run_dbcan, a widely used CAZyme annotation tool. It uses pyhmmer, a Cython binding to HMMER3, in place of the command-line HMMER3 suite as the search backend, which improves multithreading performance. It also removes an inconvenience of run_dbcan, which requires large sequence files to be split manually beforehand.
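To illustrate how a large sequence file can be handled without manual splitting, here is a minimal pure-Python sketch that streams FASTA records and groups them into fixed-size blocks. It is not dbcanlight's actual code and does not use pyhmmer; the function names `read_fasta` and `in_blocks` and the `block_size` parameter are hypothetical, chosen only for this example.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header without ">", sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs one at a time from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        yield header, "".join(chunks)  # emit the last record


def in_blocks(records: Iterable[Record], block_size: int) -> Iterator[List[Record]]:
    """Group records into lists of at most block_size (hypothetical parameter)."""
    if block_size < 1:
        raise ValueError("block_size must be at least 1")
    block: List[Record] = []
    for record in records:
        block.append(record)
        if len(block) == block_size:
            yield block
            block = []
    if block:
        yield block  # final, possibly shorter, block


if __name__ == "__main__":
    example = [">a", "MKT", "LLV", ">b", "MAA", ">c", "MGG"]
    for block in in_blocks(read_fasta(example), block_size=2):
        print([header for header, _ in block])
```

Because both functions are generators, only one block of sequences is held in memory at a time, so each block could be passed to a search routine in turn, whatever the size of the input file.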
The main program, dbcanlight, comprises three modules: build, search and conclude. The build module downloads the required databases from the dbCAN website; the search module searches against the protein HMM, substrate HMM or diamond databases and reports the hits separately; and the conclude module gathers the results of each search and provides a brief overview. The output of dbcanlight resembles that of run_dbcan, with slight cleanup: run_dbcan outputs the same substrate several times for a gene that hits multiple profiles with the same substrate, whereas dbcanlight reports it only once.
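The substrate cleanup can be pictured with a small Python sketch. This is an illustration under assumptions, not dbcanlight's implementation: the function `collapse_substrates`, the `(gene, profile, substrate)` tuple layout, and the gene and profile names in the example are all made up for this purpose.

```python
from typing import Dict, Iterable, List, Tuple

Hit = Tuple[str, str, str]  # (gene, profile, substrate); hypothetical layout


def collapse_substrates(hits: Iterable[Hit]) -> Dict[str, List[str]]:
    """Map each gene to its substrates, listing every substrate only once.

    The first-seen order of substrates is preserved for each gene.
    """
    substrates: Dict[str, List[str]] = {}
    for gene, _profile, substrate in hits:
        seen = substrates.setdefault(gene, [])
        if substrate not in seen:
            seen.append(substrate)
    return substrates


if __name__ == "__main__":
    # Made-up hits: gene1 matches two profiles that share one substrate.
    hits = [
        ("gene1", "profile_A", "cellulose"),
        ("gene1", "profile_B", "cellulose"),
        ("gene1", "profile_C", "xylan"),
        ("gene2", "profile_D", "starch"),
    ]
    print(collapse_substrates(hits))
```

In this example `gene1` hits two profiles annotated with cellulose, but cellulose appears only once in its substrate list.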
Dbcanlight re-implements only the core features of run_dbcan, that is, searching for CAZyme and substrate matches with hmmer, diamond and dbcansub. Submodules such as signalP and CGCFinder are not implemented.