umicollapse
Accelerating the deduplication and collapsing process for reads with Unique Molecular Identifiers (UMI).
UMIs are a popular way to identify duplicate DNA/RNA reads caused by PCR amplification. Using them requires software that collapses duplicate reads sharing the same UMI while accounting for sequencing and PCR errors. This tool implements several efficient algorithms that make UMI deduplication orders of magnitude faster than previous tools (UMI-tools, etc.) while maintaining similar functionality. The speedup comes from faster data structures based on n-grams and BK-trees, along with other techniques that are carefully implemented to scale well to larger datasets and longer UMIs. Users have reported that runs on real datasets that took hours or days with a previous tool take only a few minutes with UMICollapse. DOI: 10.7717/peerj.8275.
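To give a rough idea of why a BK-tree helps here, the following is a minimal Python sketch of a BK-tree lookup for UMIs within a small number of mismatches. It is an illustration only and not UMICollapse's own code: the names `hamming`, `BKNode`, `BKTree`, `add`, and `within` are hypothetical, and the use of Hamming distance on equal-length UMIs is an assumption made for the example.

```python
from typing import Dict, List, Optional


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length UMIs."""
    if len(a) != len(b):
        raise ValueError("UMIs must have equal length")
    return sum(x != y for x, y in zip(a, b))


class BKNode:
    """One stored UMI; children are keyed by their distance to this UMI."""

    def __init__(self, umi: str) -> None:
        self.umi = umi
        self.children: Dict[int, "BKNode"] = {}


class BKTree:
    """Illustrative BK-tree over UMIs (hypothetical, not the tool's API)."""

    def __init__(self) -> None:
        self.root: Optional[BKNode] = None

    def add(self, umi: str) -> None:
        if self.root is None:
            self.root = BKNode(umi)
            return
        node = self.root
        while True:
            d = hamming(umi, node.umi)
            if d == 0:
                return  # UMI is already stored
            child = node.children.get(d)
            if child is None:
                node.children[d] = BKNode(umi)
                return
            node = child  # descend along the edge labelled d

    def within(self, query: str, k: int) -> List[str]:
        """Return all stored UMIs at distance <= k from the query."""
        found: List[str] = []
        stack = [self.root] if self.root is not None else []
        while stack:
            node = stack.pop()
            d = hamming(query, node.umi)
            if d <= k:
                found.append(node.umi)
            # Triangle inequality: a match can only sit under an edge
            # whose label lies in [d - k, d + k], so other subtrees are skipped.
            for edge, child in node.children.items():
                if d - k <= edge <= d + k:
                    stack.append(child)
        return found


tree = BKTree()
for umi in ["AAAA", "AAAT", "AATT", "GGGG", "AAAC"]:
    tree.add(umi)

neighbors = sorted(tree.within("AAAA", 1))
print(neighbors)
```

The pruning step is what avoids comparing a query UMI against every other UMI: whole subtrees are skipped whenever their edge label falls outside the allowed distance window.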
Summary
Accelerating the deduplication and collapsing process for reads with Unique Molecular Identifiers (UMI).
Last Updated
Oct 17, 2024 at 17:19
License
MIT
Total Downloads
2.6K
Supported Platforms