Workflow

Variants were called following the GATK best practices workflow. After trimming with Cutadapt, reads were mapped onto GRCh38.86 with BWA-MEM, both optical and PCR duplicates were removed with Picard, and base quality scores were recalibrated with GATK. The GATK HaplotypeCaller was used to call variants per sample, also emitting summarized evidence for non-variant sites (GVCF approach). The per-sample GVCF files were then combined and jointly genotyped with GATK. Genotyped variants were filtered using hard thresholds: an SNV was flagged as failing if it matched QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0, and an indel if it matched QD < 2.0 || FS > 200.0 || ReadPosRankSum < -20.0. Finally, SnpEff was used to predict and report variant effects. In addition, quality control was performed with FastQC, Samtools, and Picard, and the results were aggregated into an interactive report with MultiQC.
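The hard-filter criteria above flag a record as soon as any single condition holds. The Python sketch below restates them as plain comparisons on a record's INFO annotations; it is not how GATK evaluates the expressions, and the names SNV_RULES, INDEL_RULES and passes_hard_filter are hypothetical. One assumption of the sketch: an annotation that is absent from a record (rank-sum annotations such as MQRankSum are typically missing at sites without both reference and alternate reads) is treated as not triggering its condition.

```python
from typing import Callable, Dict, List, Tuple

# Each entry: (INFO annotation, condition under which the record FAILS).
# Conditions are OR-ed, mirroring the "||" in the filter expressions above.
Rule = Tuple[str, Callable[[float], bool]]

SNV_RULES: List[Rule] = [
    ("QD", lambda v: v < 2.0),
    ("FS", lambda v: v > 60.0),
    ("MQ", lambda v: v < 40.0),
    ("MQRankSum", lambda v: v < -12.5),
    ("ReadPosRankSum", lambda v: v < -8.0),
]

INDEL_RULES: List[Rule] = [
    ("QD", lambda v: v < 2.0),
    ("FS", lambda v: v > 200.0),
    ("ReadPosRankSum", lambda v: v < -20.0),
]


def passes_hard_filter(info: Dict[str, float], is_snv: bool) -> bool:
    """Return True if no failing condition applies to this record.

    Assumption of this sketch: a missing annotation never triggers its
    condition, so only annotations present in `info` are tested.
    """
    rules = SNV_RULES if is_snv else INDEL_RULES
    for annotation, fails in rules:
        value = info.get(annotation)
        if value is not None and fails(value):
            return False
    return True
```

For example, an SNV with QD = 1.5 fails, while an indel with QD = 15.0 and FS = 120.0 passes, because the FS threshold for indels is 200.0 rather than 60.0.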

Detailed software versions can be found under Rules.

Results

Calls

all.vcf.gz

Filtered and annotated variant calls as a gzipped VCF file. Variants that do not pass the filters are kept but marked with a value other than PASS in the FILTER column.

calls.tsv.gz

Filtered and annotated variant calls as a gzipped tab-separated table (TSV). Variants that do not pass the filters have been removed.
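The two files differ in whether failing records are kept. A minimal, standard-library-only Python sketch of selecting the PASS records from all.vcf.gz is below. It assumes calls.tsv.gz holds exactly the PASS records, which the descriptions imply but the report does not show in code; the vcf_to_tsv rule itself uses bcftools and rust-bio-tools (see Rules), and the function names here are hypothetical.

```python
import gzip
from typing import Iterable, Iterator, Tuple

Variant = Tuple[str, int, str, str]  # (CHROM, POS, REF, ALT)


def pass_records(lines: Iterable[str]) -> Iterator[Variant]:
    """Yield records whose FILTER column (7th field) is exactly PASS.

    Meta-information and header lines start with '#' and are skipped.
    """
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if fields[6] == "PASS":
            yield fields[0], int(fields[1]), fields[3], fields[4]


def pass_records_from_vcf(path: str) -> Iterator[Variant]:
    """Read a (b)gzipped VCF such as all.vcf.gz and keep PASS records."""
    # BGZF files are valid multi-member gzip, so Python's gzip module reads them.
    with gzip.open(path, "rt") as handle:
        yield from pass_records(handle)
```

For example, list(pass_records_from_vcf("all.vcf.gz")) would return (CHROM, POS, REF, ALT) tuples for the passing variants only. Splitting line parsing from file opening keeps the filter testable on plain strings.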

Plots

allele-freqs.svg

Allele frequency per variant and sample, i.e., m/n, where m is the number of reads supporting the variant allele and n is the total number of reads covering the variant position in that sample.
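As a worked example of m/n: if a sample has 12 reads supporting the reference allele and 4 supporting the variant allele at a site, its allele frequency there is 4/(12 + 4) = 0.25. The Python sketch below assumes m and n come from GATK's per-sample AD (allelic depth) field, with n taken as the sum over all alleles; the report does not say which field the plot uses, and summed AD can be lower than DP because AD leaves out reads that do not clearly support one allele. The function name is hypothetical.

```python
from typing import Sequence


def allele_frequency(allelic_depths: Sequence[int], alt_index: int = 1) -> float:
    """Return m / n for one variant in one sample.

    allelic_depths: per-allele read counts, reference first, then each ALT
    allele (the layout of the FORMAT/AD field).
    m = reads supporting the chosen ALT allele; n = sum of all allelic depths.
    """
    n = sum(allelic_depths)
    if n == 0:
        return float("nan")  # no informative reads: frequency is undefined
    m = allelic_depths[alt_index]
    return m / n
```

For a multi-allelic site, alt_index selects which ALT allele is counted as m, while n still includes reads for every allele.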

depths.svg

Distribution of read depth at variant sites, per sample.

Quality control

multiqc.html

Quality control results aggregated into an interactive report with MultiQC.

Statistics

If the workflow was executed on a cluster or in the cloud, the runtimes include the time spent waiting in the queue.

Rules

For each rule: number of jobs, output files, and Conda environment. No Singularity image is listed for any rule.

vcf_to_tsv (1 job)
  Output: tables/calls.tsv.gz
  Conda environment: rust-bio-tools =0.2.6, bcftools =1.8

plot_stats (1 job)
  Output: plots/depths.svg, plots/allele-freqs.svg
  Conda environment: python =3.6, matplotlib =2.2, pandas =0.23, seaborn =0.8

snpeff (1 job)
  Output: annotated/all.vcf.gz, snpeff/all.csv
  Conda environment: snpeff ==4.3.1t

multiqc (1 job)
  Output: qc/multiqc.html
  Conda environment: multiqc ==1.2, networkx <2.0

merge_calls (1 job)
  Output: filtered/all.vcf.gz
  Conda environment: picard ==2.9.2

mark_duplicates (3 jobs)
  Output: dedup/B-2.bam, qc/dedup/B-2.metrics.txt, dedup/B-1.bam, qc/dedup/B-1.metrics.txt, dedup/A-1.bam, qc/dedup/A-1.metrics.txt
  Conda environment: picard ==2.9.2

fastqc (3 jobs)
  Output: qc/fastqc/B-2.html, qc/fastqc/B-2.zip, qc/fastqc/B-1.html, qc/fastqc/B-1.zip, qc/fastqc/A-1.html, qc/fastqc/A-1.zip
  Conda environment: fastqc ==0.11.7

samtools_stats (3 jobs)
  Output: qc/samtools-stats/B-2.txt, qc/samtools-stats/B-1.txt, qc/samtools-stats/A-1.txt
  Conda environment: samtools ==1.6

hard_filter_calls (2 jobs)
  Output: filtered/all.indels.hardfiltered.vcf.gz, filtered/all.snvs.hardfiltered.vcf.gz
  Conda environment: gatk4 ==4.0.5.1

map_reads (3 jobs)
  Output: mapped/B-2.sorted.bam, mapped/B-1.sorted.bam, mapped/A-1.sorted.bam
  Conda environment: bwa ==0.7.15, samtools ==1.5, picard ==2.9.2

recalibrate_base_qualities (3 jobs)
  Output: recal/B-2.bam, recal/B-1.bam, recal/A-1.bam
  Conda environment: gatk4 ==4.0.5.1

select_calls (2 jobs)
  Output: filtered/all.indels.vcf.gz, filtered/all.snvs.vcf.gz
  Conda environment: gatk4 ==4.0.5.1

trim_reads_se (1 job)
  Output: trimmed/B-2.fastq.gz, trimmed/B-2.qc.txt
  Conda environment: cutadapt ==1.13

trim_reads_pe (2 jobs)
  Output: trimmed/B-1.1.fastq.gz, trimmed/B-1.2.fastq.gz, trimmed/B-1.qc.txt, trimmed/A-1.1.fastq.gz, trimmed/A-1.2.fastq.gz, trimmed/A-1.qc.txt
  Conda environment: cutadapt ==1.13

merge_variants (1 job)
  Output: genotyped/all.vcf.gz
  Conda environment: picard ==2.9.2

genotype_variants (1 job)
  Output: genotyped/all.21.vcf.gz
  Conda environment: gatk4 ==4.0.5.1

combine_calls (1 job)
  Output: called/all.21.g.vcf.gz
  Conda environment: gatk4 ==4.0.5.1

call_variants (2 jobs)
  Output: called/B.21.g.vcf.gz, called/A.21.g.vcf.gz
  Conda environment: gatk4 ==4.0.5.1
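The Conda environments pin packages with different operators. In Conda's match-spec syntax, a single = is a fuzzy match (bcftools =1.8 accepts 1.8, 1.8.0, 1.8.2 and so on, but not 1.80), == requires that exact version, and < sets an upper bound. The Python sketch below parses pins in the form they appear in the rule listing and checks a version against one of them. It is a simplification, not Conda's resolver: Conda's own version comparison, for instance, also treats 1.8 and 1.8.0 as equal, which plain string comparison does not. The names PIN_RE, parse_pin and satisfies are hypothetical.

```python
import re
from typing import Tuple

# "<name> <op><version>", e.g. "bcftools =1.8", "snpeff ==4.3.1t", "networkx <2.0".
# The name excludes '=', '<' and '>' so that "pkg==1.0" also splits correctly.
PIN_RE = re.compile(r"^(?P<name>[^\s=<>]+)\s*(?P<op>==|=|<)\s*(?P<version>\S+)$")


def parse_pin(pin: str) -> Tuple[str, str, str]:
    """Split a pin into (package, operator, version)."""
    match = PIN_RE.match(pin.strip())
    if match is None:
        raise ValueError(f"unrecognised pin: {pin!r}")
    return match.group("name"), match.group("op"), match.group("version")


def _numeric(version: str) -> Tuple[int, ...]:
    # Simplification: only purely numeric dotted versions can be ordered here.
    return tuple(int(part) for part in version.split("."))


def satisfies(installed: str, op: str, pinned: str) -> bool:
    """Check an installed version against one pin operator (simplified)."""
    if op == "==":
        return installed == pinned  # exact match on the version string
    if op == "=":
        # Fuzzy match: "1.8" accepts "1.8" and "1.8.x", but not "1.80".
        return installed == pinned or installed.startswith(pinned + ".")
    if op == "<":
        return _numeric(installed) < _numeric(pinned)
    raise ValueError(f"unsupported operator: {op!r}")
```

For example, parse_pin("networkx <2.0") gives ("networkx", "<", "2.0"), and satisfies("1.11", "<", "2.0") is True, so a networkx 1.11 installation would meet the multiqc rule's bound.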