Workflow
Variants were called following the GATK best practices workflow:
Reads were mapped onto GRCh38.86 with BWA mem, and both optical and PCR duplicates were removed with Picard, followed by base recalibration with GATK.
The GATK HaplotypeCaller was used to call variants per-sample, including summarized evidence for non-variant sites (GVCF approach).
Then, joint genotyping was performed with GATK across the GVCF files of all samples.
Genotyped variants were filtered using hard thresholds.
For SNVs, the criterion QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0 was used; for indels, the criterion QD < 2.0 || FS > 200.0 || ReadPosRankSum < -20.0 was used.
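To illustrate how the two criteria read, here is a minimal Python sketch. It is not how GATK evaluates the filter expressions. The function names are hypothetical, and treating a missing annotation as not triggering the filter is an assumption made for this example.

```python
def _below(info, key, threshold):
    """True if the annotation is present and below the threshold."""
    value = info.get(key)
    return value is not None and value < threshold


def _above(info, key, threshold):
    """True if the annotation is present and above the threshold."""
    value = info.get(key)
    return value is not None and value > threshold


def fails_snv_filter(info):
    """True if an SNV record matches the SNV criterion, i.e. is filtered out."""
    return (
        _below(info, "QD", 2.0)
        or _above(info, "FS", 60.0)
        or _below(info, "MQ", 40.0)
        or _below(info, "MQRankSum", -12.5)
        or _below(info, "ReadPosRankSum", -8.0)
    )


def fails_indel_filter(info):
    """True if an indel record matches the indel criterion, i.e. is filtered out."""
    return (
        _below(info, "QD", 2.0)
        or _above(info, "FS", 200.0)
        or _below(info, "ReadPosRankSum", -20.0)
    )
```

A record with FS = 100.0 and otherwise good annotations would be filtered as an SNV but kept as an indel, since the indel threshold for FS is looser.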
Finally, SnpEff was used to predict and report variant effects.
In addition, quality control was performed with FastQC, Samtools, and Picard and aggregated into an interactive report via MultiQC.
Detailed software versions can be found under Rules.
Results
Calls
Filtered and annotated variant calls as a gzipped VCF file.
Variants that do not pass filters are kept, but marked with a value other than PASS in the FILTER column.
Filtered and annotated variant calls as a gzipped tab-separated table (TSV).
All variants that do not pass filters have been removed.
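The difference between the two outputs can be sketched in plain Python. This is only an illustration, not the workflow's actual conversion step (the Rules section lists rust-bio-tools and bcftools for that). The function name `vcf_to_pass_tsv` and the choice to keep only the eight fixed VCF columns are assumptions.

```python
import csv
import io


def vcf_to_pass_tsv(vcf_text):
    """Convert uncompressed VCF text to TSV, keeping only records with FILTER == PASS."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for line in vcf_text.splitlines():
        if not line.strip() or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            # Column header line: drop the leading '#', keep the 8 fixed columns.
            writer.writerow([fields[0].lstrip("#")] + fields[1:8])
            continue
        if fields[6] == "PASS":  # FILTER is the 7th fixed column
            writer.writerow(fields[:8])
    return out.getvalue()
```

Records whose FILTER column holds anything other than PASS remain in the VCF but are absent from the table this function returns.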
Plots
allele-freqs.svg
Per-variant, per-sample allele frequency, i.e., m ⁄ n, where m is the number of reads supporting the variant allele and n is the total number of reads covering the variant site in that sample.
Read depth distribution over variant alleles of each sample.
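The allele frequency m ⁄ n defined above can be computed as in the following sketch. It assumes the counts come from a per-sample VCF AD field of the form "ref,alt1,...", and that n is the sum of those counts. Both are assumptions for illustration, and the function names are hypothetical rather than taken from the workflow's plotting script.

```python
def allele_frequency(variant_reads, total_reads):
    """Return m / n, or None when no reads cover the site."""
    if total_reads == 0:
        return None
    return variant_reads / total_reads


def frequencies_from_ad(ad_field):
    """Compute one allele frequency per alternative allele from an AD string.

    Assumes every entry is an integer count (no missing '.' values).
    """
    depths = [int(count) for count in ad_field.split(",")]
    total = sum(depths)  # assumption: n is the sum of all allelic depths
    return [allele_frequency(m, total) for m in depths[1:]]
```

For example, `frequencies_from_ad("10,5")` gives a single frequency of 5 ⁄ 15 for the one alternative allele.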
Quality control
Quality controls aggregated into an interactive report via MultiQC.
Statistics
If the workflow was executed on a cluster or in the cloud, runtimes include the waiting time in the queue.
Rules
| Rule | Jobs | Output | Singularity | Conda environment |
|---|---|---|---|---|
| vcf_to_tsv | 1 | | | rust-bio-tools =0.2.6, bcftools =1.8 |
| plot_stats | 1 | plots/depths.svg, plots/allele-freqs.svg | | python =3.6, matplotlib =2.2, pandas =0.23, seaborn =0.8 |
| snpeff | 1 | annotated/all.vcf.gz, snpeff/all.csv | | |
| multiqc | 1 | | | multiqc ==1.2, networkx <2.0 |
| merge_calls | 1 | | | |
| mark_duplicates | 3 | dedup/B-2.bam, qc/dedup/B-2.metrics.txt, dedup/B-1.bam, qc/dedup/B-1.metrics.txt, dedup/A-1.bam, qc/dedup/A-1.metrics.txt | | |
| fastqc | 3 | qc/fastqc/B-2.html, qc/fastqc/B-2.zip, qc/fastqc/B-1.html, qc/fastqc/B-1.zip, qc/fastqc/A-1.html, qc/fastqc/A-1.zip | | |
| samtools_stats | 3 | qc/samtools-stats/B-2.txt, qc/samtools-stats/B-1.txt, qc/samtools-stats/A-1.txt | | |
| hard_filter_calls | 2 | filtered/all.indels.hardfiltered.vcf.gz, filtered/all.snvs.hardfiltered.vcf.gz | | |
| map_reads | 3 | mapped/B-2.sorted.bam, mapped/B-1.sorted.bam, mapped/A-1.sorted.bam | | bwa ==0.7.15, samtools ==1.5, picard ==2.9.2 |
| recalibrate_base_qualities | 3 | recal/B-2.bam, recal/B-1.bam, recal/A-1.bam | | |
| select_calls | 2 | filtered/all.indels.vcf.gz, filtered/all.snvs.vcf.gz | | |
| trim_reads_se | 1 | trimmed/B-2.fastq.gz, trimmed/B-2.qc.txt | | |
| trim_reads_pe | 2 | trimmed/B-1.1.fastq.gz, trimmed/B-1.2.fastq.gz, trimmed/B-1.qc.txt, trimmed/A-1.1.fastq.gz, trimmed/A-1.2.fastq.gz, trimmed/A-1.qc.txt | | |
| merge_variants | 1 | | | |
| genotype_variants | 1 | | | |
| combine_calls | 1 | | | |
| call_variants | 2 | called/B.21.g.vcf.gz, called/A.21.g.vcf.gz | | |