Last updated: 2018-07-30
workflowr checks:
 ✔ R Markdown file: up-to-date 
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
 ✔ Environment: empty 
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.
 ✔ Seed: 
set.seed(12345) 
The command set.seed(12345) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
 ✔ Session information: recorded 
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
 ✔ Repository version: 422a428 
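Note that all relevant files for the analysis need to be committed to Git before generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated: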
Ignored files:
    Ignored:    .DS_Store
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/
    Ignored:    output/.DS_Store
Untracked files:
    Untracked:  data/18486.genecov.txt
    Untracked:  data/APApeaksYL.total.inbrain.bed
    Untracked:  data/Totalpeaks_filtered_clean.bed
    Untracked:  data/YL-SP-18486-T_S9_R1_001-genecov.txt
    Untracked:  data/bedgraph_peaks/
    Untracked:  data/bin200.5.T.nuccov.bed
    Untracked:  data/bin200.Anuccov.bed
    Untracked:  data/bin200.nuccov.bed
    Untracked:  data/clean_peaks/
    Untracked:  data/combined_reads_mapped_three_prime_seq.csv
    Untracked:  data/gene_cov/
    Untracked:  data/leafcutter/
    Untracked:  data/nuc6up/
    Untracked:  data/reads_mapped_three_prime_seq.csv
    Untracked:  data/smash.cov.results.bed
    Untracked:  data/smash.cov.results.csv
    Untracked:  data/smash.cov.results.txt
    Untracked:  data/smash_testregion/
    Untracked:  data/ssFC200.cov.bed
    Untracked:  output/picard/
    Untracked:  output/plots/
    Untracked:  output/qual.fig2.pdf
Unstaged changes:
    Modified:   analysis/cleanupdtseq.internalpriming.Rmd
    Modified:   analysis/dif.iso.usage.leafcutter.Rmd
    Modified:   analysis/explore.filters.Rmd
    Modified:   analysis/test.max2.Rmd
    Modified:   code/Snakefile
| File | Version | Author | Date | Message | 
|---|---|---|---|---|
| Rmd | 422a428 | Briana Mittleman | 2018-07-30 | add peak cove pipeline and combined lane qc | 
I need to create a processing pipeline that I can run each time I get more individuals that will do the following:
combine all total and nuclear libraries (as a bigwig/genome coverage)
call peaks with Yang’s script
filter peaks with Yang’s script
clean peaks
run feature counts on these peaks for all of the individuals
I can do this step in my snakefile. First, I added the following to my environment.
I want to create a bedgraph for each file. I will add a rule to my snakefile that does this and puts them in the bedgraph directory.
#add to directory
dir_bedgraph= dir_data + "bedgraph/"
#add to rule_all  
expand(dir_bedgraph + "{samples}.bg", samples=samples)
#rule
rule bedgraph: 
  input:
    bam = dir_sort + "{samples}-sort.bam"
  output: dir_bedgraph + "{samples}.bg"
  shell: "bedtools genomecov -ibam {input.bam} -bg -5 > {output}"
I want to add more memory for this rule in the cluster.json:
"bedgraph" :
    {
            "mem": 16000
    }
I will use the bedGraphToBigWig tool to convert each sorted bedgraph to a bigwig.
#add to directory
dir_bigwig= dir_data + "bigwig/"
dir_sortbg= dir_data + "bedgraph_sort/"
#add to rule_all  
expand(dir_sortbg + "{samples}.sort.bg", samples=samples)
expand(dir_bigwig + "{samples}.bw", samples=samples)
rule sort_bg:
    input: dir_bedgraph + "{samples}.bg"
    output: dir_sortbg + "{samples}.sort.bg"
    shell: "sort -k1,1 -k2,2n {input} > {output}"
rule bg_to_bw:
    input: 
        bg=dir_sortbg + "{samples}.sort.bg",
        len= chrom_length 
    output: dir_bigwig + "{samples}.bw"
    shell: "bedGraphToBigWig {input.bg} {input.len} {output}"
This next step will take all of the files in the bigwig directory and merge them. To do this I will create a script that creates a list of all of the files and then uses this list in the merge script.
mergeBW.sh
#!/bin/bash
#SBATCH --job-name=mergeBW
#SBATCH --account=pi-yangili1
#SBATCH --time=24:00:00
#SBATCH --output=mergeBW.out
#SBATCH --error=mergeBW.err
#SBATCH --partition=broadwl
#SBATCH --mem=40G
#SBATCH --mail-type=END
module load Anaconda3
source activate three-prime-env
# List the full path of every bigwig file, independent of the directory the job is submitted from
ls -d -1 /project2/gilad/briana/threeprimeseq/data/bigwig/*.bw > /project2/gilad/briana/threeprimeseq/data/list_bw/list_of_bigwig.txt
bigWigMerge -inList /project2/gilad/briana/threeprimeseq/data/list_bw/list_of_bigwig.txt /project2/gilad/briana/threeprimeseq/data/mergedBW/merged_combined_YL-SP-threeprimeseq.bg
The result of this script will be a merged bedgraph of all of the files.
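The list-building step in mergeBW.sh could also be sketched in Python. This is only an illustration and not part of the pipeline above. The function name `write_bigwig_list` is hypothetical, and the `*.bw` pattern is an assumption based on the `{samples}.bw` naming in the bg_to_bw rule. The sketch only writes the list file that `bigWigMerge -inList` would read; it does not run the merge.

```python
from pathlib import Path


def write_bigwig_list(bigwig_dir, list_path, pattern="*.bw"):
    """Write the absolute path of every bigwig file in bigwig_dir to list_path.

    One path per line, sorted by file name. Returns the list of paths written.
    The pattern default is an assumption, not something fixed by the pipeline.
    """
    bigwig_dir = Path(bigwig_dir).resolve()
    # Keep only regular files matching the pattern, so the directory itself
    # and unrelated files never end up in the list.
    paths = sorted(p for p in bigwig_dir.glob(pattern) if p.is_file())
    if not paths:
        raise FileNotFoundError(f"no files matching {pattern} in {bigwig_dir}")
    list_path = Path(list_path)
    # Create the output directory (e.g. list_bw/) if it does not exist yet.
    list_path.parent.mkdir(parents=True, exist_ok=True)
    list_path.write_text("".join(f"{p}\n" for p in paths))
    return paths
```

Calling `write_bigwig_list(".../data/bigwig", ".../data/list_bw/list_of_bigwig.txt")` with the project paths used in the script would produce the same kind of list file, and it fails loudly if the bigwig directory is empty instead of passing an empty list to the merge.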
sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.6
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     
loaded via a namespace (and not attached):
 [1] workflowr_1.1.1   Rcpp_0.12.18      digest_0.6.15    
 [4] rprojroot_1.3-2   R.methodsS3_1.7.1 backports_1.1.2  
 [7] git2r_0.23.0      magrittr_1.5      evaluate_0.11    
[10] stringi_1.2.4     whisker_0.3-2     R.oo_1.22.0      
[13] R.utils_2.6.0     rmarkdown_1.10    tools_3.5.1      
[16] stringr_1.3.1     yaml_2.1.19       compiler_3.5.1   
[19] htmltools_0.3.6   knitr_1.20       
This reproducible R Markdown analysis was created with workflowr 1.1.1