This pipeline provides unsupervised analysis of single-cell RNA-seq data. The fastq files are aligned to a reference genome with TopHat2, and reads are counted with htseq-count. After filtering, the counts are normalized with DESeq2 and transformed with VST (variance-stabilizing transformation). Quality control is provided by FastQC. Finally, the unsupervised analysis is performed with WGCNA.
~~~
Samplename,SeqID,samplenumber,Singlecell,libraryprep,Plate,sequencingrun,Familly,CultureBatch,Embryoscore,Embryonumber,TCmedia,Sampletype
L019H01,LD96sH1,,TRUE,1,2,1,NA,NA,NA,NA,TeSR,L019
WA09H10,LD96sH10,,TRUE,1,2,1,NA,NA,NA,NA,TeSR,WA09
WA09H11,LD96sH11,,TRUE,1,2,1,NA,NA,NA,NA,TeSR,WA09
WA09H12,LD96s_H12,,TRUE,1,2,1,NA,NA,NA,NA,TeSR,WA09
~~~
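A sample sheet in this layout can be checked before a run. The following is a minimal sketch, not part of the pipeline: the function name `read_sample_sheet` is hypothetical, and treating empty fields and `NA` as missing values is an assumption about how the sheet is meant to be read.

```python
import csv
import io

# Column names copied from the sample sheet header shown above.
EXPECTED_COLUMNS = [
    "Samplename", "SeqID", "samplenumber", "Singlecell", "libraryprep",
    "Plate", "sequencingrun", "Familly", "CultureBatch", "Embryoscore",
    "Embryonumber", "TCmedia", "Sampletype",
]


def read_sample_sheet(handle):
    """Read a sample sheet and return one dict per sample (hypothetical helper)."""
    reader = csv.DictReader(handle)
    missing = [c for c in EXPECTED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError("missing columns: " + ", ".join(missing))
    rows = []
    for row in reader:
        # Assumption: empty fields and the literal "NA" both mean "no value".
        rows.append({k: (None if v in ("", "NA") else v) for k, v in row.items()})
    return rows


example = (
    ",".join(EXPECTED_COLUMNS) + "\n"
    "L019H01,LD96sH1,,TRUE,1,2,1,NA,NA,NA,NA,TeSR,L019\n"
)
rows = read_sample_sheet(io.StringIO(example))
print(rows[0]["Samplename"], rows[0]["TCmedia"])
```

For a file on disk, pass `open(path, newline="")` instead of the `io.StringIO` object.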
~~~
GENE,Lineage
CDX2,"TE"
CLDN10,"TE"
DAB2,"TE"
~~~
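Assuming this file is the list of genes of interest (a header line, genes in the first column, gene group in the second), it can be loaded and grouped as in the sketch below. The function name `read_markers` is hypothetical and this is not the pipeline's own loader.

```python
import csv
import io
from collections import defaultdict


def read_markers(handle):
    """Return {group: [genes]} from a two-column marker CSV with a header."""
    reader = csv.reader(handle)
    next(reader, None)  # skip the header line (GENE,Lineage)
    groups = defaultdict(list)
    for row in reader:
        if len(row) < 2:
            continue  # ignore blank or incomplete lines
        gene, group = row[0].strip(), row[1].strip()
        groups[group].append(gene)
    return dict(groups)


example = 'GENE,Lineage\nCDX2,"TE"\nCLDN10,"TE"\nDAB2,"TE"\n'
markers = read_markers(io.StringIO(example))
print(markers)
```

The `csv` module removes the double quotes around the group names, so `"TE"` and `TE` end up in the same group.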
~~~
conda create -n myVirtualEnvironment singlecellpipeline -c bird -c conda-forge -c bioconda -c r
conda info --env # To get the path of the directory of myVirtualEnvironment : myCondaPath
cd myCondaPath/singlecellpipeline/
conda env create -f virtualEnvs/TophatEnv.yml
~~~
~~~
|-------------------------|------------------------------------------------------------------------------------------------------------------------------------|
| FASTQPATH: | directory of the fastq data |
| | fastq files will be moved to the subdirectory fastqSE/ for single-end or the subdirectory fastqPE/ for paired-end |
|-------------------------|------------------------------------------------------------------------------------------------------------------------------------|
| FASTQTYPE: | "singleEnd" for single-end fastq files and "pairedEnd" for paired-end fastq files |
| | for paired-end files, the expected name pattern is fastqName.R1.fastq(.gz) and fastqName.R2.fastq(.gz) |
|-------------------------|------------------------------------------------------------------------------------------------------------------------------------|
| GENEREFERENCE: | path to the gtf file for the analysis |
|-------------------------|------------------------------------------------------------------------------------------------------------------------------------|
| BOWTIEINDEX: | path to the bowtie fasta reference for the analysis (ex: "/mnt/beegfs/ylelievre/singlecell/index-bowtie-2.2.4/humang1kv37") |
|-------------------------|------------------------------------------------------------------------------------------------------------------------------------|
| TOPHATCPU: | number of threads used by TopHat2 |
|-------------------------|------------------------------------------------------------------------------------------------------------------------------------|
| TRIMMINGGENEMIN: | for a valid sample, the minimum number of genes |
|-------------------------|------------------------------------------------------------------------------------------------------------------------------------|
| TRIMMINGQ30MIN: | for a valid sample, the minimum Q30 |
|-------------------------|------------------------------------------------------------------------------------------------------------------------------------|
| TRIMMINGCOUNTSMIN: | for a valid gene, the minimum number of reads |
|-------------------------|------------------------------------------------------------------------------------------------------------------------------------|
| TRIMMINGSAMPLESMIN: | for a valid gene, the minimum number of samples that contain at least one read |
|-------------------------|------------------------------------------------------------------------------------------------------------------------------------|
| ISA2SEEDS: | isa2 parameter, it corresponds to the number of seeds (starting points) used to search for biclusters |
|-------------------------|------------------------------------------------------------------------------------------------------------------------------------|
| ISA2ROWSTRINGENCY: | isa2 parameter, it corresponds to an arbitrary stringency value for the correlation between samples |
|-------------------------|------------------------------------------------------------------------------------------------------------------------------------|
| ISA2COLUMNSTRINGENCY: | isa2 parameter, it corresponds to an arbitrary stringency value for the correlation between genes |
|-------------------------|------------------------------------------------------------------------------------------------------------------------------------|
| WGCNAPOWER: | WGCNA parameter, it corresponds to the soft power (optional: calculated automatically if not provided) |
|-------------------------|------------------------------------------------------------------------------------------------------------------------------------|
| WGCNASPECIES: | WGCNA parameter, it corresponds to the 2-letter species abbreviation. org.XX.eg.db R annotation package must be installed |
|-------------------------|------------------------------------------------------------------------------------------------------------------------------------|
| WGCNA_MARKERS: | WGCNA parameter, it corresponds to the file which contains genes of interest (optional). |
| | format: csv, header, 1st col = genes, 2nd col = gene group (ex: the tissue the gene is specific to) |
|-------------------------|------------------------------------------------------------------------------------------------------------------------------------|
~~~
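The four TRIMMING thresholds can be read as a two-step filter on the count matrix. The sketch below is one possible interpretation, not the pipeline's actual code. It assumes that a gene counts toward TRIMMINGGENEMIN when it has at least one read in the sample, that TRIMMINGCOUNTSMIN applies to a gene's total reads over the retained samples, and that samples are filtered before genes. The function name `filter_counts` and its arguments are hypothetical, and the Q30 check is left out.

```python
def filter_counts(counts, gene_min, counts_min, samples_min):
    """Filter samples, then genes (assumed order).

    counts: dict mapping gene -> dict mapping sample -> read count.
    Returns (kept_samples, kept_genes).
    """
    samples = sorted({s for per_sample in counts.values() for s in per_sample})

    # Sample filter (TRIMMINGGENEMIN): enough genes with at least one read.
    kept_samples = []
    for sample in samples:
        detected = sum(1 for gene in counts if counts[gene].get(sample, 0) > 0)
        if detected >= gene_min:
            kept_samples.append(sample)

    # Gene filter (TRIMMINGCOUNTSMIN and TRIMMINGSAMPLESMIN),
    # evaluated on the retained samples only (assumption).
    kept_genes = []
    for gene, per_sample in counts.items():
        values = [per_sample.get(s, 0) for s in kept_samples]
        total_reads = sum(values)
        expressing = sum(1 for v in values if v > 0)
        if total_reads >= counts_min and expressing >= samples_min:
            kept_genes.append(gene)

    return kept_samples, kept_genes


# Toy counts, invented for illustration only.
toy = {
    "CDX2": {"s1": 5, "s2": 0, "s3": 3},
    "DAB2": {"s1": 1, "s2": 0, "s3": 0},
    "CLDN10": {"s1": 2, "s2": 0, "s3": 4},
}
print(filter_counts(toy, gene_min=2, counts_min=3, samples_min=2))
```

With these toy values, sample `s2` is dropped because it has no detected gene, and `DAB2` is dropped because it has a single read in a single sample.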
~~~
source activate myVirtualEnvironment
snakemake -p --latency-wait 60 --cluster "qsub -o ./logs/ -e ./logs/" --jobs 100 --jobscript singlecell.sh
~~~
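Before launching a paired-end run, it can help to confirm that the fastq files follow the expected name pattern `fastqName.R1.fastq(.gz)` and `fastqName.R2.fastq(.gz)`. The sketch below is a hypothetical pre-flight check, not something the pipeline provides; the function name `check_pairs` and the regular expression are assumptions derived from that pattern.

```python
import re

# fastqName.R1.fastq(.gz) / fastqName.R2.fastq(.gz)
PAIR_PATTERN = re.compile(r"^(?P<name>.+)\.R(?P<mate>[12])\.fastq(\.gz)?$")


def check_pairs(filenames):
    """Return (names not matching the pattern, sample names missing a mate)."""
    mates = {}
    unmatched = []
    for filename in filenames:
        match = PAIR_PATTERN.match(filename)
        if match is None:
            unmatched.append(filename)
            continue
        mates.setdefault(match.group("name"), set()).add(match.group("mate"))
    incomplete = sorted(n for n, found in mates.items() if found != {"1", "2"})
    return sorted(unmatched), incomplete


# File names invented for illustration only.
files = ["a.R1.fastq.gz", "a.R2.fastq.gz", "b.R1.fastq", "c_1.fastq"]
print(check_pairs(files))
```

To check a real directory, pass `os.listdir(path)` for the directory given as FASTQPATH.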