Pipeline to identify candidate cfDNA biomarker sequences from WGS data
cfDNA-BiomarkerDiscovery is a pipeline designed to identify candidate biomarker sequences from cell-free DNA (cfDNA) derived from blood samples.

The pipeline takes as input:
• An archive of FASTQ files containing paired-end reads from whole-genome sequencing (WGS) of one or more case cohorts and a control cohort.
• Additional required files: adapter sequences, genome version for download, genome information file, parameter file, and an archive with genome FASTA files.

The pipeline performs:
1. Preprocessing: adapter trimming, quality filtering, and alignment to the reference genome.
2. Analysis: identification of candidate genomic regions as potential biomarkers.

If informative regions are identified, the pipeline generates a CSV file listing the candidate biomarker sequences along with their associated genomic coordinates.
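
To make the quality-filtering part of the preprocessing step more concrete, here is a minimal Python sketch of one common approach: discarding a read pair when either mate has a low mean Phred score. This is an illustration only, not the pipeline's actual implementation. The function names, the tuple layout (name, sequence, quality), the threshold of 20, and the Phred+33 encoding are all assumptions made for the example.

```python
from statistics import mean


def phred_scores(quality_line, offset=33):
    """Decode a FASTQ quality string into Phred scores.

    Assumes Phred+33 encoding (hypothetical default for this sketch).
    """
    return [ord(ch) - offset for ch in quality_line]


def passes_quality(quality_line, min_mean_quality=20.0):
    """Return True if the mean Phred score reaches the threshold.

    The threshold value is an assumption, not a pipeline parameter.
    """
    if not quality_line:
        return False
    return mean(phred_scores(quality_line)) >= min_mean_quality


def filter_pairs(pairs, min_mean_quality=20.0):
    """Keep a paired-end read only if both mates pass the quality check.

    Each mate is a (name, sequence, quality) tuple.
    """
    return [
        (r1, r2)
        for r1, r2 in pairs
        if passes_quality(r1[2], min_mean_quality)
        and passes_quality(r2[2], min_mean_quality)
    ]


if __name__ == "__main__":
    example_pairs = [
        (("read1/1", "ACGT", "IIII"), ("read1/2", "TGCA", "IIII")),
        (("read2/1", "ACGT", "IIII"), ("read2/2", "TGCA", "####")),
    ]
    for mate1, mate2 in filter_pairs(example_pairs):
        print(mate1[0], mate2[0])
```

Filtering at the pair level, rather than per read, keeps the two FASTQ files synchronized, which paired-end aligners generally expect.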