Overview¶
This chapter introduces the features, operational options, and installation requirements of the data analysis software from Real Time Genomics.
Introduction¶
RTG software enables the development of fast, efficient software pipelines for deep genomic analysis. RTG is built on innovative search technologies and new algorithms designed for processing high volumes of high-throughput sequencing data from different sequencing technology platforms. The RTG sequence search and alignment functions enable read mapping and protein searches with a unique combination of sensitivity and speed.
RTG-based data production pipelines support unprecedented breadth and depth of analysis on genomic data, transforming researcher visibility into DNA sequence analysis and biological investigation. A comprehensive suite of easy-to-integrate data analysis functions increases the productivity of bioinformatics specialists, freeing them to develop analytical solutions that amplify the investigative ability unique to their organization.
RTG software supports a variety of research and medical genomics applications, such as:
Medical Genomic Research – Compare sequence variants and structural variation between normal and disease genomes, or over a disease progression in the same individual, to identify causal loci.
Personalized Medicine – Establish reliable, high-throughput processing pipelines that analyze individual human genomes compared to one or more reference genomes. Use RTG software for detection of sequence variants (SNP and indel calling, intersection scripting), as well as structural variation (coverage depth, and copy number variation).
Model Organisms and Basic Research – Utilize RTG mapping and variant detection commands for focused research applications such as metagenomic species identification and frequency, and metabolic pathway analysis. Map microbial communities to generate gapped alignments of both DNA and protein sequence data.
Plant Genomics – Enable investigations of new crop species and variant detection in genetically diverse strains by leveraging RTG’s highly sensitive sequence search capabilities for strain and cross-species mapping applications. Flexible sensitivity tuning controls allow investigators to accommodate very high error rates associated with unique combinations of sequencing system error, genome-specific mutation, and aggressive cross-species comparisons.
RTG software description¶
RTG software is delivered as a single executable with multiple commands executed through a command line interface (CLI). Commands are delivered in product packages, and for commercial users each command can be independently enabled through a license key.
Usage:
rtg COMMAND [OPTIONS] <REQUIRED>
RTG software delivers features in four areas:
Sequence Search and Alignment – RTG software uses patented sequence search technology for the rapid production of genomic sequence data. The map command implements read mapping and gapped alignment of sequence data against a reference. The mapx command searches translated sequence data against a protein database.
Data Analysis – RTG software supports two pipelines for data analysis: variant detection and metagenomics. Purpose-built variant detection pipeline functions include several commands to identify small sequence variants, a cnv command to report copy number variation statistics for structural variation, and a coverage command to report read depth across a reference.
Reporting Options – Standard result formats and utility commands report results for validation, and ease development of custom scripts for analysis. Scripts that produce publication quality graphics for visualization of data analysis results are available through Real Time Genomics technical support.
Data Center Deployment – RTG software supports typical data center standards for enterprise deployment. RTG provides automated installation and supports industry standard operating environments and data processing systems to help maintain total cost of ownership objectives in enterprise data centers. The RTG software can be run in compute clusters of varying sizes, and commands take advantage of multi-core processors by default.
See also
For detailed information about RTG command syntax and usage of individual commands, refer to RTG Command Reference.
Sequence search and alignment¶
RTG software uses an edit-distance alignment score to determine best fit and alignment accuracy.
RTG software includes optimal sensitivity settings for error and mutation rates, plus command line controls and simulation tools that allow investigators to calibrate sensitivity settings for specific data sets. Extensive filtering and reporting options allow complete control over reported alignments, which leads to greater flexibility for downstream analysis functions.
Key functionality of RTG sequence search and alignment includes:
Read mapping by nucleotide sequence alignment to a reference genome
Protein database searching by translated nucleotide sequence searches against protein databases
Sensitivity tuning using parameter options for substitutions, indels, indel lengths, word or step sizes, and alignment scores
Filtering and reporting ambiguous reads that map to multiple locations
Benchmarking and optimization using simulation and evaluation commands
RTG mapping commands have the following characteristics:
Eliminates need for genome indexing
Aligns sequence reads of any length
Allows high mismatch levels for increased sensitivity in longer reads
Allows detection of short indels with single end (SE) or paired end (PE) data
Can optionally guarantee the mapping of reads with up to a specified number of substitutions and indels
Supports a wide range of alignment scores
See also
For detailed information about sequence search and alignment functionality, refer to Command Reference, map.
For more information about the RTG integrated software pipeline, refer to RTG product usage - baseline progressions
Data formatting¶
Prior to RTG data production, reference genome and sometimes read data sequence files are typically first converted to the RTG Sequence Data File (SDF) format. This is an efficient storage format optimized for fast retrieval during data processing.
The RTG format and cg2sdf commands convert sequencing system read and reference genome sequence data into the SDF format. The format command accepts source data in standard file formats (such as FASTA, FASTQ, SAM, and BAM) and maintains the integrity and consistency of the source data during the conversion to SDF. Similarly, the cg2sdf command accepts data in the custom data format used for read data by Complete Genomics, Inc. Read data may be single-end or paired-end reads of fixed or variable length. Sequence data can be formatted as nucleotide or protein.
An SDF is a directory containing a series of files that delineate sequence and quality score information stored in a binary format, along with metadata that describes the original sequencing system data format type:
03/19/2010 12:31 PM <DIR> .
03/19/2010 12:31 PM <DIR> ..
03/19/2010 12:31 PM 5,038 log
03/19/2010 12:31 PM 24,223 mainIndex
03/19/2010 12:31 PM 75 namedata0
03/19/2010 12:31 PM 8 nameIndex0
03/19/2010 12:31 PM 56 namepointer0
03/19/2010 12:31 PM 23,267,177 seqdata0
03/19/2010 12:31 PM 56 seqpointer0
03/19/2010 12:31 PM 8 sequenceIndex0
8 File(s) 23,296,641 bytes
2 Dir(s) 400,984,870,912 bytes free
See also
For detailed information about formatting sequencing system reads to RTG SDF, refer to Data Formatting Commands
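Because an SDF is an ordinary directory of files, a script can take a quick inventory of one before it is passed to other commands. The following is a minimal Python sketch, not part of RTG: the helper name summarize_sdf is hypothetical, and the demonstration runs against a stand-in directory that reuses a few of the file names from the listing above, not a real SDF.

```python
import os
import tempfile


def summarize_sdf(sdf_dir):
    """Return (file_count, total_bytes, sorted file names) for an SDF directory."""
    names = sorted(
        name for name in os.listdir(sdf_dir)
        if os.path.isfile(os.path.join(sdf_dir, name))
    )
    total = sum(os.path.getsize(os.path.join(sdf_dir, name)) for name in names)
    return len(names), total, names


# Demonstration on a stand-in directory (placeholder contents, not a real SDF).
with tempfile.TemporaryDirectory() as demo:
    for name, size in [("log", 10), ("mainIndex", 20), ("seqdata0", 30)]:
        with open(os.path.join(demo, name), "wb") as handle:
            handle.write(b"\0" * size)
    count, total_bytes, names = summarize_sdf(demo)

print(count, total_bytes, names)
```

The function only reads directory metadata, so it is safe to point at a large SDF.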
Read mapping and alignment¶
The map command implements read mapping and alignment of sequence data against a reference genome, supporting gapped alignments for both single-end and paired-end reads. The cgmap command performs the same function for the gapped, paired-end read data from Complete Genomics, Inc.
A summary of the mapping results is displayed at the command line following execution of the map command, as shown in the paired-end example below:
ARM MAPPINGS
left right both
6650124 6650124 13300248 64.2% mated uniquely (NH = 1)
186812 186812 373624 1.8% mated ambiguously (NH > 1)
1538777 1539520 3078297 14.9% unmated uniquely (NH = 1)
70667 70125 140792 0.7% unmated ambiguously (NH > 1)
0 0 0 0.0% unmapped due to read frequency (XC = B)
13624 13946 27570 0.1% unmapped with no matings but too many hits (XC = C)
109720 109765 219485 1.1% unmapped with poor matings (XC = d)
984 1003 1987 0.0% unmapped with too many matings (XC = e)
212158 211688 423846 2.0% unmapped with no matings and poor hits (XC = D)
0 0 0 0.0% unmapped with no matings and too many good hits (XC = E)
1569609 1569492 3139101 15.2% unmapped with no hits
10352475 10352475 20704950 100.0% total
The following display shows the summary output for single-end mapped data from the map command:
READ MAPPINGS
875007 87.5% mapped uniquely (NH = 1)
25174 2.5% mapped ambiguously (NH > 1)
71 0.0% unmapped due to read frequency (XC = B)
88729 8.9% unmapped with too many hits (XC = C)
8940 0.9% unmapped with poor hits (XC = D)
0 0.0% unmapped with too many good hits (XC = E)
2079 0.2% unmapped with no hits
1000000 100.0% total
Read mapping commands also produce HTML summary reports containing more information about mapping results.
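The summary tables shown above are plain text, so they can be read back into a script for quality checks. The sketch below is one possible approach and assumes only the layout visible in the examples: one integer column (single-end) or three (paired-end, where the last column is the "both" total), followed by a percentage and a description. The names ROW and parse_summary are hypothetical.

```python
import re

# One or more integer columns, a percentage, then a free-text category.
ROW = re.compile(r"^\s*((?:\d+\s+)+)(\d+(?:\.\d+)?)%\s+(.*\S)\s*$")


def parse_summary(text):
    """Map each category description to (count, percent).

    For paired-end summaries the last integer column ("both") is used.
    """
    rows = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            counts = [int(token) for token in match.group(1).split()]
            rows[match.group(3)] = (counts[-1], float(match.group(2)))
    return rows


# Excerpt of the single-end example above.
single_end = """READ MAPPINGS
875007 87.5% mapped uniquely (NH = 1)
25174 2.5% mapped ambiguously (NH > 1)
2079 0.2% unmapped with no hits
1000000 100.0% total
"""

rows = parse_summary(single_end)
mapped = sum(n for label, (n, _) in rows.items() if label.startswith("mapped"))
print(f"{100.0 * mapped / rows['total'][0]:.1f}% of reads mapped")
```

Header lines such as READ MAPPINGS do not match the pattern and are skipped.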
Read mapping output files¶
The map command creates alignment reports in BAM file format and a summary report file named summary.txt. There is also a file called progress that can be used to monitor overall progress during a run, and a file named map.log containing technical information that may be useful for debugging. Alignment reports may be filtered by alignment score, and/or unmapped, unmated, and ambiguous reads (those that map to multiple locations).
When mapping, the output BAM file is named alignments.bam. Reads that did not align to the reference include XC attributes in the BAM file that describe why the read did not map.
See also
For more information about the RTG map command, refer to Command Reference, map.
For details on RTG extensions to the BAM file format, refer to SAM/BAM file extensions (RTG map command output)
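The XC attributes described above can be tallied to see why reads failed to map. The following Python sketch works on SAM text records (for example, output produced as SAM instead of BAM). It assumes the XC attribute is written as a standard SAM optional field of the form XC:A:value; the helper name tally_xc and the sample records are hypothetical.

```python
from collections import Counter


def tally_xc(sam_lines):
    """Count XC attribute values across SAM records, skipping header lines."""
    reasons = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        # Optional TAG:TYPE:VALUE fields follow the 11 mandatory SAM columns.
        for tag in fields[11:]:
            name, _type, value = tag.split(":", 2)
            if name == "XC":
                reasons[value] += 1
    return reasons


# Hypothetical records for illustration only.
sam = [
    "@HD\tVN:1.3",
    "r1\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\tXC:A:C",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\tXC:A:D",
    "r3\t0\tchr1\t100\t37\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:1",
    "r4\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\tXC:A:C",
]
print(tally_xc(sam))
```

The resulting counts can be compared against the categories in the mapping summary, which uses the same XC codes.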
Read mapping sensitivity tuning¶
The RTG map command uses default sensitivity settings that balance mapping percentage and speed requirements. These settings deliver excellent results in most cases, especially in human read sequence data from Illumina runs with error rates of 2% or less.
However, some experiments demand read mapping that accommodates higher machine error, genome mutation, or cross-species comparison. For these situations, the investigator can set various tuning parameters to increase the mapping percentage.
For reads shorter than 64 bp, RTG allows an investigator to select the minimum number of substitutions and indels that the map command is guaranteed to detect. For example, using the -a parameter to set the number of allowed substitutions (i.e., mismatches) to 1 guarantees that the map command finds all alignments containing up to 1 substitution; alignments with more substitutions may still be reported but are not guaranteed.
For reads equal to or longer than 64 base pairs, RTG allows an investigator to modify word and step size parameters related to the index. These parameters are set by default to 18 or half the read length, whichever is smaller. Decreasing the values (using -w for word size and -s for step size) will increase the percentage of mapped reads at the expense of additional processing time and, in the case of step size, increased memory usage.
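The effect of word and step size can be illustrated with some simple arithmetic. The sketch below is generic seed-placement arithmetic and is not a description of RTG's internal indexing: the function names are hypothetical, integer halving for odd read lengths is an assumption, and the seed positions simply show why a smaller step produces more index entries.

```python
def default_word_size(read_length):
    """Default described above: 18 or half the read length, whichever is smaller."""
    # Integer halving for odd read lengths is an assumption.
    return min(18, read_length // 2)


def seed_positions(read_length, word, step):
    """Start offsets of words of the given size taken every `step` bases."""
    return list(range(0, read_length - word + 1, step))


for length in (20, 36, 100):
    print(length, default_word_size(length))

# Halving the step size roughly doubles the number of seeds for a 100 bp read.
print(len(seed_positions(100, 18, 18)), len(seed_positions(100, 18, 9)))
```

More seeds give more chances for a read to find a match, which is consistent with the trade-off described above between mapping percentage, processing time, and memory.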
The mismatch threshold can be altered to increase or decrease the number of mapped reads. Using the --max-mated-mismatches parameter, for example, an investigator might limit reported alignments to only those at or below the given threshold value.
See also
For more information about the RTG map command’s sensitivity and tuning parameters, refer to Command Reference, map
Protein search¶
The mapx command implements a search of translated nucleotide sequence data against one or more protein databases, with alignment sensitivity adjusted for gaps and mismatches. The mapx command accepts reads formatted as nucleotide data and a reference database formatted as protein data.
Similarly, the mapp command implements a search of untranslated protein sequences against one or more protein databases. The input to mapp is FASTA formatted protein sequences.
With mapx and mapp, an investigator can sort and classify knowns, and identify homologs and novels.
In a two-step process, queries that have one or more exact matches of a k-mer against the database during the matching phase are then aligned to the subject sequence with a full edit-distance calculation using the BLOSUM62 scoring matrix.
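The first (matching) step of such a two-step search can be pictured as a k-mer lookup. The Python sketch below illustrates the general seed-matching idea only; it is not RTG's implementation, the function names and sequences are made up for illustration, and the second step (scored alignment with BLOSUM62) is deliberately omitted.

```python
from collections import defaultdict


def build_kmer_index(database, k):
    """Map every k-mer in the database sequences to the entries that contain it."""
    index = defaultdict(set)
    for name, seq in database.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)
    return index


def candidate_subjects(query, index, k):
    """Return database entries sharing at least one exact k-mer with the query."""
    hits = set()
    for i in range(len(query) - k + 1):
        hits |= index.get(query[i:i + k], set())
    return hits


# Made-up protein sequences for illustration.
proteins = {"P1": "MKTAYIAKQR", "P2": "GAVLIPFMWS"}
index = build_kmer_index(proteins, k=4)
print(candidate_subjects("TTAYIAKK", index, k=4))
```

Only the candidates returned by the lookup would go on to the more expensive alignment step, which is what makes the two-step design fast.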
The mapx and mapp commands output the statistical significance of matches based on semi-global alignments (global across the query). Reported search results may be modified by a combination of one or more thresholds on % identity, E value, bit score and alignment score. The output results file is similar in construct to that reported by BLASTX.
See also
For more information about the RTG protein mapping commands please refer to Command Reference, mapx and Command Reference, mapp
Protein search output files¶
The mapx and mapp commands write search results and a summary file in a directory specified by the -o parameter at the command line. The summary file is named summary.txt. There is also a file called progress that can be used to monitor overall progress during a run, and a log file containing technical information that may be useful for debugging.
The protein search results are written to a file named alignments.tsv.gz. Each record in this results file, representing a valid search result, is written as tab-separated fields on a single line. The output fields are very similar to those reported by BLASTX.
See also
For detailed information about the RTG mapx and mapp command results file format refer to Mapx and mapp output file description
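Since alignments.tsv.gz is gzip-compressed, tab-separated text, it can be streamed with standard library tools. The following Python sketch is an illustration only: the helper name read_tsv_gz is hypothetical, treating lines that begin with # as headers is an assumption, and the demonstration file contains invented columns, not the real field layout (see the output file description for that).

```python
import csv
import gzip
import os
import tempfile


def read_tsv_gz(path):
    """Yield each data row of a gzipped TSV as a list of strings, skipping '#' lines."""
    with gzip.open(path, "rt", newline="") as handle:
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            if row and not row[0].startswith("#"):
                yield row


# Demonstration with an invented three-column file.
with tempfile.TemporaryDirectory() as demo:
    path = os.path.join(demo, "alignments.tsv.gz")
    with gzip.open(path, "wt") as handle:
        handle.write("#hypothetical-header\n")
        handle.write("read1\thit1\t95.0\n")
        handle.write("read2\thit2\t88.5\n")
    rows = list(read_tsv_gz(path))

print(rows)
```

Streaming row by row keeps memory use flat even for large result files.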
Protein search sensitivity tuning¶
The RTG mapx command builds a set of indexes from the translated reads and scans each query for matches according to user-specified sensitivity settings. Sensitivity is set with two parameters. The word size (-w or --word) parameter specifies match length. The mismatches (-a or --mismatches) parameter specifies the number and placement of k-mers across each translated query.
The alignment score threshold can be altered to increase or decrease the number of mapped reads. Using the --max-alignment-score parameter, for example, an investigator might limit reported alignments to only those at or below the given threshold value.
See also
For more information about the RTG mapx command’s sensitivity and tuning parameters, refer to Mapx and mapp output file description
Benchmarking and optimization utilities¶
RTG benchmarking and optimization utilities consist of simulators that generate read and reference genome sequence data, and evaluators that verify the accuracy of sequence search and data analysis functions. Investigators will use these utility commands to evaluate the use of RTG software in various read mapping and data analysis scenarios.
RTG provides several simulators:
genomesim
The genomesim command generates a reference genome with one or more segments of varying length and a percentage mix of nucleotide values. Use the command to create simulated genomes for benchmarking and evaluation.
readsim / cgsim
The readsim / cgsim commands generate synthetic read sequence data from an input reference genome, introducing errors at a specified rate. Use the commands to create simulated read sets for benchmarking and evaluation.
popsim, samplesim, childsim, samplereplay, denovosim
These variant simulation commands are used to create mutated genomes from a known reference by adding variants. Use these commands to verify accuracy of variant detection analysis software for a particular experiment using different pipeline settings.
Simulated data that is produced in SDF format can be converted into FASTA and FASTQ format sequence files for use with other tools using the sdf2fasta and sdf2fastq commands respectively.
See also
For more information about the RTG simulation commands, refer to Simulation Commands. Advice is available to ensure best results. Please contact RTG technical support for assistance.
Variant detection functions¶
The RTG variant detection pipeline includes commands for both sequence and structural variation detection: snp, family, population, somatic, tumoronly, cnv and coverage. The types of data available for analysis from the RTG software pipeline include: Bayesian sequence variant calling (snps.vcf), structural variation analysis (cnv.ratio) and alignment coverage depth (coverage.bed).
Sequence variation (SNPs, indels and complex variants)¶
The snp command uses a Bayesian probability model to identify and locate single and multiple nucleotide polymorphisms (SNPs and MNPs), indels, and complex sequence variants. The command uses standard BAM format files as input and reports computed posterior scores, base calls, mapping quality, coverage depth, and supporting statistics for all positions and for all variants. The snp command may be instructed to run in either haploid or diploid calling mode, and can perform sex-aware calling to automatically switch between haploid and diploid calling according to sex chromosomes specified for your reference species.
The snp command calls single nucleotide polymorphisms (SNPs), multiple nucleotide polymorphisms (MNPs), and complex regions from the sorted chromosome-ordered gapped alignment (BAM) files. The snp command makes consensus SNP and MNP calls on a diploid organism at every position (homozygous, heterozygous, and equal) in the reference, and calls indels and complex variants of 1-50 bp (depending on input alignments).
At each position in the reference, a base pair determination is made based on statistical analysis of the accumulated read alignments, in accordance with any priors and quality scores. The resulting predictions and accompanying statistics are reported in industry standard VCF format.
The snps.vcf output file reports all the called variants. The location and type of the call, the base pairs (reference and called), and a confidence score are present in the snps.vcf output file. Additional ancillary statistics in the output describe read alignment evidence that can be used to further evaluate confidence in the variant.
Results may be filtered (post variant calling) by posterior scores, coverage depth, or indels, and filtered report results may be integrated with the SNP calls themselves.
See also
For more information about the SNP output data, and for syntax, parameters, and usage of the map and snp commands, refer to Command Reference, map and Command Reference, snp.
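Post-calling filtering of a snps.vcf file can also be done in a custom script. The sketch below relies only on the standard VCF column order (CHROM, POS, ID, REF, ALT, QUAL, FILTER) and is not an RTG utility: the function name passing_variants, the threshold, and the sample records, including the filter label HYPOTHETICAL_FILTER, are invented for illustration.

```python
def passing_variants(vcf_lines, min_qual=0.0):
    """Yield (chrom, pos, ref, alt, qual) for unfiltered records above a QUAL floor."""
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # meta-information and column header lines
        chrom, pos, _id, ref, alt, qual, flt = line.rstrip("\n").split("\t")[:7]
        if flt not in ("PASS", "."):
            continue  # record carries at least one filter marker
        if qual != "." and float(qual) >= min_qual:
            yield chrom, int(pos), ref, alt, float(qual)


# Invented records for illustration only.
vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t50.0\tPASS\t.",
    "chr1\t200\t.\tC\tT\t5.0\tPASS\t.",
    "chr1\t300\t.\tG\tA\t80.0\tHYPOTHETICAL_FILTER\t.",
]
kept = list(passing_variants(vcf, min_qual=20.0))
print(kept)
```

The same loop works unchanged on a decompressed multi-sample VCF, because it inspects only the first seven columns.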
Sequence variation with Mendelian pedigree¶
The family command uses Bayesian analysis and the constraints of Mendelian inheritance to identify single and multiple nucleotide variants in each member of a family group. It will usually yield a better result than running the snp command on each individual because the Mendelian constraints help eliminate erroneous calls.
Family calling is restricted to families comprising a mother, father, and one or more sons and daughters. Family members are identified on the command line by sample names matching those used in the input BAM files. The family caller internally assigns the SAM records to the correct family member based on SAM read group information. If available, it automatically makes use of coverage and quality calibration information computed during mapping. It automatically selects the correct haploid/diploid calling depending on the sex of each individual.
The output is a multi-sample VCF file containing a call for each family member whenever any one of the family differs from the reference genome. Each sample reports a computed posterior, base call, and ancillary statistics as per the snp command. In addition, there is an overall posterior representing the joint likelihood of the call across all the samples. As with the other variant detection commands, the VCF output includes a filter column containing markers for high-coverage, high-ambiguity, and equivalent calls. It is not guaranteed that the resulting calls will always be Mendelian across the entire family, as de novo mutations are also identified and are automatically annotated in the output VCF.
The population command extends calling to multiple samples, which may or may not be related according to a supplied pedigree. Mendelian constraints are employed where appropriate, and in cases where many unrelated samples are being called, an iterative expectation-maximization algorithm updates Bayesian priors to give improved accuracy compared to calling samples individually with the snp command.
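Before running pedigree-aware calling, it can be useful to check that a pedigree describes the family structure you expect. The Python sketch below assumes the widely used six-column PED layout (family ID, individual ID, father ID, mother ID, sex, phenotype, with 0 meaning unknown); the function name nuclear_family and the sample pedigree are hypothetical and this is not an RTG command.

```python
def nuclear_family(ped_text):
    """Return (father, mother, children) from PED text describing one nuclear family."""
    children, fathers, mothers = [], set(), set()
    for line in ped_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        _family, individual, father, mother, _sex, _phenotype = line.split()[:6]
        if father != "0" and mother != "0":
            children.append(individual)
            fathers.add(father)
            mothers.add(mother)
    if len(fathers) != 1 or len(mothers) != 1 or not children:
        raise ValueError("expected one father, one mother, and at least one child")
    return fathers.pop(), mothers.pop(), children


# Invented pedigree for illustration only.
ped = """FAM1 father 0 0 1 0
FAM1 mother 0 0 2 0
FAM1 son father mother 1 0
FAM1 daughter father mother 2 0
"""
print(nuclear_family(ped))
```

A pedigree that fails this check (for example, children listing different fathers) is outside the mother, father, and children structure that family calling is restricted to.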
Somatic sequence variation¶
The somatic command uses Bayesian analysis to identify putative cancer explanations in a tumor sample. As with the snp command, it can identify SNPs, MNPs, indels, and complex sequence variants. It operates on two samples, an original sample (assumed to be non-cancerous) and a derived cancerous sample. The derived sample may be a mixture of non-cancerous and cancerous sequence data. The samples are provided to the somatic command in the form of BAM format files with appropriate sample names selected via the read group mechanism.
The somatic caller produces a VCF file detailing putative cancer explanations consisting of computed posterior scores, base calls, and ancillary statistics for both input samples. The somatic caller handles both haploid and diploid sequences and is sex aware. If available, it automatically makes use of coverage and quality calibration information computed during mapping.
By default the snps.vcf output file gives each variant called where the original and derived sample differ, together with a confidence. The file is sorted by genomic position. The same statistics reported by the snp command per VCF record are listed for both samples. The filter column contains markers for situations of high-coverage, high-ambiguity, and equivalent calls. This column can be used to discard unwanted results in subsequent processing.
Coverage analysis¶
The coverage command reports read depth across a reference genome with smoothing options, and outputs the results in the industry standard BED format. This can be used to view histograms of mapped coverage data and gap length distributions.
Use the coverage command as a tool to analyze mapping results and determine how much of the genome is covered with mapping alignments, and how many times the same location has been mapped.
Customizable scripts are available for enabling graphical plotting of the coverage results using gnuplot.
See also
For more information about the RTG coverage analysis, refer to Command reference, coverage
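The two questions above (how much of the genome is covered, and how deeply) can be answered from BED-style intervals with a short script. The sketch below is illustrative only: it assumes tab-separated BED records with 0-based, half-open coordinates and assumes the depth value sits in the fifth column, which should be checked against the actual coverage output. The function name and sample intervals are invented.

```python
def coverage_stats(bed_lines, depth_column=4):
    """Return (covered_bases, total_bases, mean_depth) from BED-style intervals."""
    covered = total = 0
    weighted = 0.0
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        start, end = int(fields[1]), int(fields[2])  # BED is 0-based, half-open
        depth = float(fields[depth_column])          # column choice is an assumption
        length = end - start
        total += length
        weighted += depth * length
        if depth > 0:
            covered += length
    mean_depth = weighted / total if total else 0.0
    return covered, total, mean_depth


# Invented intervals for illustration only.
bed = [
    "chr1\t0\t100\tcoverage\t0",
    "chr1\t100\t300\tcoverage\t12",
    "chr1\t300\t400\tcoverage\t6",
]
print(coverage_stats(bed))
```

Weighting each interval by its length keeps the mean depth correct when intervals have different sizes.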
Copy number variation (CNV) analysis¶
The cnv command identifies and reports copy number statistics that can be used for the investigation of structural variation.
It is used to identify aberrational CNV region(s) or copy number variations in mapped reads. The RTG cnv command identifies and reports the copy number variation ratio between two genomes.
The results of CNV detection are output to a BED file format.
Customizable scripts are available for enabling graphical plotting of the CNV results using tools such as gnuplot.
See also
For more information about the CNV output data, refer to Command Reference, cnv
Standard input and output file formats¶
RTG software produces alignment and data analysis results in standard formats to allow pipeline validation and downstream analysis.
Table : Result file formats for validation and downstream analysis
File type | Description and Usage |
---|---|
BAM, SAM | The RTG |
TXT | Many RTG commands output summary statistics as ASCII text files. |
TSV | Many RTG commands output results in tab separated ASCII text files. These files can typically be loaded directly into a spreadsheet viewing program like Microsoft Excel or Open Office. |
BED | Some RTG commands output results in standard BED formats for further analysis and reporting. |
PED | Some RTG commands utilize standard PED format text files for supplying sample pedigree and sex information. |
VCF | The |
See also
For more information about file format extensions, refer to Appendix RTG output results file descriptions
SAM/BAM files created by the RTG map command¶
The Sequence Alignment/Map (SAM/BAM) format (version 1.3) is a well-known standard for listing read alignments against reference sequences.
SAM records list the mapping and alignment data for each read, ordered by chromosome (or other DNA reference sequence) location.
A sample RTG SAM file is shown in the Appendix. It describes the relationship between a read and a reference sequence, including mismatches, insertions and deletions (indels) as determined by the RTG map aligner.
Note
RTG mapped alignments are stored in BAM format with RTG read IDs by default. This default can be overridden using the --read-names flag or changed after processing using the RTG samrename utility to label the reads with the sequence identifiers from the original source file. For more information, refer to the SAM 1.3 nomenclature and symbols online at: https://samtools.github.io/hts-specs/SAMv1.pdf
RTG has defined several extensions within the standard SAM/BAM format; be sure to review the SAM/BAM format information in SAM/BAM file extensions (RTG map command output) of the Appendix to this guide for a complete list of all extensions and differences.
By default the RTG map command produces output as compressed binary format BAM files but can be set to produce human readable SAM files instead with the --sam flag.
Variant caller output files¶
The Variant Call Format (VCF) is a widely used standard format for storing SNPs, MNPs and indels.
A sample snps.vcf file is provided in the Appendix as an example of the output produced by an RTG variant calling run. Each line in a snps.vcf output has tab-separated fields and represents a SNP variation calculated from the mapped reads against the reference genome.
Note
RTG variant calls are stored in VCF format (version 4.2). For more information about the VCF format, refer to the specification online at: https://samtools.github.io/hts-specs/VCFv4.2.pdf
RTG employs several extensions within the standard VCF format; be sure to review the VCF format information in Small-variant VCF output file description of the Appendix to this guide for a complete list of all extensions and differences.
See also
For more information about file formats, refer to the Appendix, RTG output results file descriptions
Metagenomic analysis functions¶
The RTG metagenomic analysis pipeline includes commands for sample contamination filtering, estimation of taxon abundances in a sample and finding relationships between samples.
Contamination filtering¶
The mapf command is used for filtering contaminant reads from a sample. It does this by performing alignment of the reads against a reference of known contaminants and producing an output of the reads that did not align successfully. A common use for this is to remove human DNA from a bacterial sample taken from a body site.
Taxon abundance breakdown¶
The species command is used to find the abundances of taxa within a given sample. This is accomplished by analyzing reference genome alignment data made with a metagenomic reference database of known organisms. It produces output in which each taxon is given a fraction representing its abundance in the sample with upper and lower bounds and a value indicating the confidence that the taxon is actually present in the sample. An HTML report allows interactive examination of the abundances at different taxonomic levels.
Sample relationships¶
The similarity command is used to find relationships between sample read sets. It does this by examining k-mer word frequencies and the intersections between sets of reads. This results in the output of a similarity matrix, a principal component analysis and nearest neighbor trees in the Newick and phyloXML formats.
Functional protein analysis¶
The mapx command is used to perform a translated nucleotide search of short reads against a reference protein database. This results in an output similar to that reported by BLASTX.
Pipelines¶
Included in the RTG release are some pipeline commands which perform simple end-to-end tasks using other RTG commands. These pipelines use mostly default settings for each of the commands called, and are meant as a guideline to building more complex end-to-end pipelines using our tools. The metagenomic pipeline commands are:
species composition (composition-meta-pipeline)
functional protein analysis (functional-meta-pipeline)
species composition and functional protein analysis (composition-functional-meta-pipeline)
For detailed information about individual pipeline commands see Pipeline Commands
Parallel processing¶
The comparison of genomic variation on a large scale in real time demands parallel processing capability. Parallel processing of gapped alignments and variant detection is recommended by RTG because it significantly reduces wall clock time.
RTG software includes key features that make it easier to prepare a job for parallel processing. First, RTG mapping commands can be performed on a subset of a large file or set of files, either by using the --start-read and --end-read parameters or, for commands that do not support this, by using sdfsplit to break a large SDF into smaller pieces. Second, the data analysis commands accept multiple alignment files as input on the command line. Third, many RTG commands take a --region or --bed-regions parameter to allow breaking up tasks into pieces across the reference genome.
See also
See RTG Command Reference for command-specific details, Administration & Capacity Planning for detailed information about estimating the number of multi-core servers needed (capacity planning), and Parallel processing approach for a deeper discussion of compute cluster operations.
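Splitting a read set into ranges for separate mapping jobs is simple arithmetic. The sketch below only computes the ranges and the corresponding argument pairs; it does not launch RTG. It assumes that --start-read is an inclusive lower bound and --end-read an exclusive upper bound, which should be confirmed in the Command Reference, and the function names are hypothetical.

```python
def read_chunks(total_reads, chunk_size):
    """Split [0, total_reads) into consecutive (start, end) ranges."""
    return [
        (start, min(start + chunk_size, total_reads))
        for start in range(0, total_reads, chunk_size)
    ]


def chunk_arguments(total_reads, chunk_size):
    """Build a --start-read/--end-read argument pair for each chunk."""
    # Assumes an inclusive start and an exclusive end for each range.
    return [
        ["--start-read", str(start), "--end-read", str(end)]
        for start, end in read_chunks(total_reads, chunk_size)
    ]


for arguments in chunk_arguments(1000000, 250000):
    print(" ".join(arguments))
```

Each argument list can be appended to an otherwise identical mapping command, one per cluster job, and the resulting alignment files passed together to the data analysis commands.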
Installation and deployment¶
RTG is a self-contained tool that sets minimal expectations on the environment in which it is placed. It comes with the application components it needs to execute completely, yet performance can be enhanced with some simple modifications to the deployment configuration. This section provides guidelines for installing and creating an optimal configuration, starting from a typical recommended system.
The RTG software pipeline runs in a wide range of computing environments, from dual-core processor laptops to compute clusters with racks of dual processor quad core server nodes. However, internal human genome analysis benchmarks suggest the use of six server nodes of the configuration shown in the table below.
Table : Recommended system requirements
Processor | Intel Core i7-2600 |
Memory | 48 GB RAM DDR3 |
Disk | 5 TB, 7200 RPM (prefer SAS disk) |
RTG Software can be run as a Java JAR file, but platform specific wrapper scripts are supplied to provide improved pipeline ergonomics. Instructions for a quick start installation are provided here.
For further information about setting up per-machine configuration files, please see the README.txt contained in the distribution zip file (a copy is also included in this manual’s appendix).
Quick start instructions¶
These instructions are intended for an individual to install and operate the RTG software without the need to establish root / administrator privileges.
RTG software is delivered in a compressed zip file, such as: rtg-core-3.3.zip. Unzip this file to begin installation.
Linux and Windows distributions include a Java Virtual Machine (JVM) version 1.8 that has undergone quality assurance testing. RTG may be used on other operating systems for which a JVM version 1.8 or higher is available, such as MacOS X or Solaris, by using the ‘no-jre’ distribution.
RTG for Java is delivered as a Java application accessed via an executable wrapper script (rtg on UNIX systems, rtg.bat on Windows) that allows a user to customize initial memory allocation and other configuration options. It is recommended that these wrapper scripts be used rather than directly executing the Java JAR.
Here are platform-specific instructions for RTG deployment.
Linux/MacOS X:
Unzip the RTG distribution to the desired location.
If your installation requires a license file (rtg-license.txt), copy the license file provided by Real Time Genomics into the RTG distribution directory.
In a terminal, cd to the installation directory and test for success by entering ./rtg version
On MacOS X, depending on your operating system version and configuration regarding unsigned applications, you may encounter the error message:
-bash: rtg: /usr/bin/env: bad interpreter: Operation not permitted
If this occurs, you must clear the OS X quarantine attribute with the command:
$ xattr -d com.apple.quarantine rtg
The first time rtg is executed you will be prompted with some questions to customize your installation. Follow the prompts.
Enter ./rtg help for a list of rtg commands. Help for any individual command is available using the --help flag, e.g.: ./rtg format --help
By default, RTG software scripts establish a memory space of 90% of the available RAM - this is automatically calculated. One may override this limit in the rtg.cfg settings file or on a per-run basis by supplying RTG_MEM as an environment variable or as the first program argument, e.g.: ./rtg RTG_MEM=48g map
[OPTIONAL] If you will be running RTG on multiple machines and would like to customize settings on a per-machine basis, copy rtg.cfg to /etc/rtg.cfg, editing per-machine settings appropriately (requires root privileges). An alternative that does not require root privileges is to copy rtg.cfg to rtg.HOSTNAME.cfg, editing per-machine settings appropriately, where HOSTNAME is the short host name output by the command hostname -s
Windows:
Unzip the RTG distribution to the desired location.
If your installation requires a license, copy the license file provided by Real Time Genomics (rtg-license.txt) into the RTG distribution directory.
Test for success by entering rtg version at the command line. The first time RTG is executed you will be prompted with some questions to customize your installation. Follow the prompts.
Enter rtg help for a list of rtg commands. Help for any individual command is available using the --help flag, e.g.: rtg format --help
By default, RTG software scripts establish a memory space of 90% of the available RAM - this is automatically calculated. One may override this limit by setting the RTG_MEM variable in the rtg.bat script or as an environment variable.
License Management¶
Commercial distributions of RTG products require the presence of a valid license key file for operation.
The license key file must be located in the same directory as the RTG executable. The license enables the execution of a particular command set for the purchased product(s) and features.
A license key allows flexible use of the RTG package on any node or CPU core.
To view the current license features at the command prompt, enter:
$ rtg license
See also
For more information about data center deployment and instructions for editing scripts, see Administration & Capacity Planning.
Technical assistance and support¶
For assistance with any technical or conceptual issue that may arise during use of the RTG product, contact Real Time Genomics Technical Support via email at support@realtimegenomics.com
In addition, a discussion group is available at: https://groups.google.com/a/realtimegenomics.com/forum/#!forum/rtg-users
A low-traffic announcements-only group is available at: https://groups.google.com/a/realtimegenomics.com/forum/#!forum/rtg-announce