BiRD / packages / singlecellpipeline 0.8.0

  • 104 total downloads
  • Last upload: 8 years and 2 months ago

Installers

  • linux-64 v0.8.0

conda install

To install this package run one of the following:
conda install bird::singlecellpipeline

Description

This pipeline provides unsupervised analysis of single-cell RNA-seq data. The fastq files are aligned to a reference genome with TopHat2 and counted with htseq-count. After filtering, the counts are normalized with DESeq2 and transformed with VST. Quality control is provided by FastQC. Finally, the unsupervised analysis is performed with WGCNA.

Prerequisites

  • The computing grid is expected to run on a BeeGFS partition (or at least a multi-thread-capable partition)
  • Miniconda3 is required

Input data

  • The fastq(.gz) files must be gathered in a single directory. The path to this directory is specified in config.json.
  • A conditionSheet.csv file is also expected in this directory. It gathers the technical and functional information about the samples to be analysed.
    • The first column is expected to be the functional names of the samples
    • The second column is expected to be the technical names of the samples (without the fastq(.gz) extension)
    • The remaining columns provide technical and functional information about the samples
  • Example of a conditionSheet.csv file :

~~~
Samplename,SeqID,samplenumber,Singlecell,libraryprep,Plate,sequencingrun,Familly,CultureBatch,Embryoscore,Embryonumber,TCmedia,Sampletype
L019H01,LD96sH1,,TRUE,1,2,1,NA,NA,NA,NA,TeSR,L019
WA09H10,LD96sH10,,TRUE,1,2,1,NA,NA,NA,NA,TeSR,WA09
WA09H11,LD96sH11,,TRUE,1,2,1,NA,NA,NA,NA,TeSR,WA09
WA09H12,LD96s_H12,,TRUE,1,2,1,NA,NA,NA,NA,TeSR,WA09
~~~
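Before launching the pipeline, it can help to confirm that every technical name in the second column of conditionSheet.csv has a matching fastq file. The following is a minimal sketch, not part of the package: the function names `strip_fastq_suffix` and `check_condition_sheet` are hypothetical, and the check assumes single-end files named `<technical name>.fastq(.gz)` sitting directly in the fastq directory.

```python
import csv
from pathlib import Path


def strip_fastq_suffix(filename):
    """Return the name without a trailing .fastq or .fastq.gz, else None."""
    for suffix in (".fastq.gz", ".fastq"):
        if filename.endswith(suffix):
            return filename[: -len(suffix)]
    return None


def check_condition_sheet(fastq_dir):
    """Compare technical names in conditionSheet.csv with the fastq files.

    Returns (missing, unlisted): technical names without a fastq file, and
    fastq base names that the sheet does not mention.
    """
    fastq_dir = Path(fastq_dir)
    with open(fastq_dir / "conditionSheet.csv", newline="") as handle:
        rows = list(csv.reader(handle))
    # Row 0 is the header; column index 1 holds the technical sample names.
    technical = {row[1] for row in rows[1:] if len(row) > 1}
    on_disk = set()
    for path in fastq_dir.iterdir():
        base = strip_fastq_suffix(path.name)
        if base is not None:
            on_disk.add(base)
    return sorted(technical - on_disk), sorted(on_disk - technical)
```

Calling `check_condition_sheet("/path/to/fastq_directory")` (a placeholder path) returns two sorted lists; both are empty when the sheet and the directory agree.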

  • A csv file which contains genes of interest (optional)
    • The first column is expected to be the name of the genes
    • The second column is expected to be the gene group
  • Example of a specificMarkers.csv file :

~~~
GENE,Lineage
CDX2,"TE"
CLDN10,"TE"
DAB2,"TE"
~~~
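To illustrate the two-column layout described above, here is a small sketch that reads such a file and groups the genes by their gene group. It is only an illustration of the format; `read_markers` is a hypothetical helper and is not how the pipeline itself loads the file.

```python
import csv
import io
from collections import defaultdict


def read_markers(handle):
    """Group gene names (1st column) by gene group (2nd column)."""
    reader = csv.reader(handle)
    next(reader, None)  # skip the header line
    groups = defaultdict(list)
    for row in reader:
        # Ignore blank or incomplete lines.
        if len(row) < 2 or not row[0].strip():
            continue
        groups[row[1].strip()].append(row[0].strip())
    return dict(groups)


# The example file from this page, held in memory for the demonstration.
example = io.StringIO('GENE,Lineage\nCDX2,"TE"\nCLDN10,"TE"\nDAB2,"TE"\n')
markers = read_markers(example)
print(markers)  # {'TE': ['CDX2', 'CLDN10', 'DAB2']}
```

For a real file, pass an open handle instead, for example `open("specificMarkers.csv", newline="")`.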

Installation of the virtual environments

~~~
conda create -n myVirtualEnvironment singlecellpipeline -c bird -c conda-forge -c bioconda -c r
conda info --env # To get the path of the directory of myVirtualEnvironment : myCondaPath
cd myCondaPath/singlecellpipeline/
conda env create -f virtualEnvs/TophatEnv.yml
~~~

config.json

~~~
|-------------------------|------------------------------------------------------------------------------------------------------------------------------------|
| FASTQPATH:              | directory of the fastq data |
|                         | fastq files will be moved to the subdirectory fastqSE/ for single-end or the subdirectory fastqPE/ for paired-end |
|-------------------------|------------------------------------------------------------------------------------------------------------------------------------|
| FASTQTYPE:              | "singleEnd" for single-end fastq files and "pairedEnd" for paired-end fastq files |
|                         | for paired-end files, the expected name pattern is fastqName.R1.fastq(.gz) and fastqName.R2.fastq(.gz) |
|-------------------------|------------------------------------------------------------------------------------------------------------------------------------|
| GENEREFERENCE:          | path of the gtf file for the analysis |
|-------------------------|------------------------------------------------------------------------------------------------------------------------------------|
| BOWTIEINDEX:            | path of the bowtie fasta reference for the analysis (ex: "/mnt/beegfs/ylelievre/singlecell/index-bowtie-2.2.4/humang1kv37") |
|-------------------------|------------------------------------------------------------------------------------------------------------------------------------|
| TOPHATCPU:              | number of threads used by TopHat2 |
|-------------------------|------------------------------------------------------------------------------------------------------------------------------------|
| TRIMMINGGENEMIN:        | for a valid sample, the minimum number of genes |
|-------------------------|------------------------------------------------------------------------------------------------------------------------------------|
| TRIMMINGQ30MIN:         | for a valid sample, the minimum Q30 |
|-------------------------|------------------------------------------------------------------------------------------------------------------------------------|
| TRIMMINGCOUNTSMIN:      | for a valid gene, the minimum number of reads |
|-------------------------|------------------------------------------------------------------------------------------------------------------------------------|
| TRIMMINGSAMPLESMIN:     | for a valid gene, the minimum number of samples that contain at least one read |
|-------------------------|------------------------------------------------------------------------------------------------------------------------------------|
| ISA2SEEDS:              | isa2 parameter, it corresponds to the number of seeds used to search for biclusters |
|-------------------------|------------------------------------------------------------------------------------------------------------------------------------|
| ISA2ROWSTRINGENCY:      | isa2 parameter, it corresponds to an arbitrary stringency value for the correlation between samples |
|-------------------------|------------------------------------------------------------------------------------------------------------------------------------|
| ISA2COLUMNSTRINGENCY:   | isa2 parameter, it corresponds to an arbitrary stringency value for the correlation between genes |
|-------------------------|------------------------------------------------------------------------------------------------------------------------------------|
| WGCNAPOWER:             | WGCNA parameter, it corresponds to the soft power (optional: calculated automatically if not provided) |
|-------------------------|------------------------------------------------------------------------------------------------------------------------------------|
| WGCNASPECIES:           | WGCNA parameter, it corresponds to the 2-letter species abbreviation. The org.XX.eg.db R annotation package must be installed |
|-------------------------|------------------------------------------------------------------------------------------------------------------------------------|
| WGCNA_MARKERS:          | WGCNA parameter, it corresponds to the file which contains genes of interest (optional). |
|                         | format: csv, header, 1st col = genes, 2nd col = gene group (ex: the tissue the gene is specific to) |
|-------------------------|------------------------------------------------------------------------------------------------------------------------------------|
~~~
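As a rough illustration of how these parameters fit together, the sketch below builds a config dictionary in Python, checks the documented constraints, and prints it as JSON. Everything concrete in it is an assumption: the paths and numeric values are placeholders, the value types (numbers versus strings) are guesses, and `check_config` is a hypothetical helper. The key spellings follow the parameter list as rendered on this page, so verify them (including any underscores) against the config.json shipped with the package before use.

```python
import json

# Keys the parameter list does not mark as optional (spelling as rendered on this page).
REQUIRED_KEYS = [
    "FASTQPATH", "FASTQTYPE", "GENEREFERENCE", "BOWTIEINDEX", "TOPHATCPU",
    "TRIMMINGGENEMIN", "TRIMMINGQ30MIN", "TRIMMINGCOUNTSMIN",
    "TRIMMINGSAMPLESMIN", "ISA2SEEDS", "ISA2ROWSTRINGENCY",
    "ISA2COLUMNSTRINGENCY", "WGCNASPECIES",
]


def check_config(config):
    """Return a list of problems found in a config dictionary."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in config]
    if config.get("FASTQTYPE") not in ("singleEnd", "pairedEnd"):
        problems.append('FASTQTYPE must be "singleEnd" or "pairedEnd"')
    return problems


# All values below are placeholders chosen for illustration only.
example_config = {
    "FASTQPATH": "/path/to/fastq_directory",
    "FASTQTYPE": "singleEnd",
    "GENEREFERENCE": "/path/to/annotation.gtf",
    "BOWTIEINDEX": "/path/to/bowtie_index_prefix",
    "TOPHATCPU": 4,
    "TRIMMINGGENEMIN": 1000,
    "TRIMMINGQ30MIN": 80,
    "TRIMMINGCOUNTSMIN": 10,
    "TRIMMINGSAMPLESMIN": 3,
    "ISA2SEEDS": 100,
    "ISA2ROWSTRINGENCY": 2,
    "ISA2COLUMNSTRINGENCY": 2,
    "WGCNASPECIES": "Hs",  # human, i.e. the org.Hs.eg.db annotation package
    # WGCNAPOWER and WGCNA_MARKERS are optional and omitted here.
}

print(json.dumps(example_config, indent=4))
print(check_config(example_config))  # [] when nothing is missing
```

Running the check before submitting jobs catches a misspelled FASTQTYPE or a forgotten key early, rather than after the cluster jobs have started.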

Execution of the pipeline

~~~
source activate myVirtualEnvironment
snakemake -p --latency-wait 60 --cluster "qsub -o ./logs/ -e ./logs/" --jobs 100 --jobscript singlecell.sh
~~~

Output data

  • fastQC : this directory contains the results obtained with FastQC on the fastq(.gz) files
  • BAM : this directory contains the resulting bam files obtained with TopHat2 on the fastq(.gz) files
  • counts : this directory contains the resulting counts files obtained with htseq-count on the bam files
  • QC : this directory contains the result of the quality control
  • analysis : this directory contains the results from the different analyses of the data
    • the quality control overviews
    • the PCA analysis
    • the bicluster
    • WGCNA
  • tables : this directory contains the raw counts table, the trimmed counts table with its corresponding trimmed condition sheet, the DESeq2-normalized counts table and the VST-transformed counts table
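The trimmed counts table depends on the trimming thresholds set in config.json. The sketch below shows one plausible reading of three of those thresholds on a tiny in-memory table. It is an assumption about the logic, not the pipeline's actual code: `trim_counts` is a hypothetical function, the order (samples first, then genes) is a guess, "number of genes" is read as genes with at least one read, and the Q30 filter is left out because it needs quality data rather than counts.

```python
def trim_counts(counts, samples, gene_min, counts_min, samples_min):
    """Filter samples, then genes, from a small counts table.

    counts: dict mapping gene -> list of read counts, one per entry in samples.
    """
    # A sample is kept when at least gene_min genes have one read or more.
    kept_idx = [
        i for i in range(len(samples))
        if sum(1 for row in counts.values() if row[i] > 0) >= gene_min
    ]
    kept_samples = [samples[i] for i in kept_idx]

    trimmed = {}
    for gene, row in counts.items():
        kept = [row[i] for i in kept_idx]
        total_reads = sum(kept)
        samples_with_reads = sum(1 for c in kept if c > 0)
        # A gene is kept when it has enough reads in enough samples.
        if total_reads >= counts_min and samples_with_reads >= samples_min:
            trimmed[gene] = kept
    return kept_samples, trimmed


samples = ["s1", "s2", "s3"]
counts = {"g1": [5, 0, 3], "g2": [2, 0, 0], "g3": [1, 0, 4]}
kept_samples, trimmed = trim_counts(counts, samples, gene_min=2, counts_min=3, samples_min=2)
print(kept_samples)  # ['s1', 's3']
print(trimmed)       # {'g1': [5, 3], 'g3': [1, 4]}
```

In this toy table, sample s2 is dropped because it has no detected gene, and gene g2 is dropped because it has only two reads in a single remaining sample.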
