Command Line Menus

ABITraceTCPeakCalculator
AggregatePlotter
AlignmentEndTrimmer
Alleler
AllelicExpressionDetector
AllelicMethylationDetector
AMD
BamIntensityJoiner
BamNMerIntensityParser
Bar2Gr
Bar2USeq
BaseClassifier
Bed2Bar
BedStats
BisSeq
BisSeqAggregatePlotter
BisSeqErrorAdder
BisStat
BisStatRegionMaker
CalculatePerCycleErrorRate
ChIPSeq
CHPCAligner
CompareIntersectingRegions
CompareParsedAlignments
ConcatinateFastas
CorrelatePointData
CountChromosomes
BisulfiteConvertFastas
CorrelationMaps
ConvertFastaA2G
ConvertFastqA2G
ConvertFasta2GCBoolean
ConvertFasta2GCBarGraph
DefinedRegionBisSeq
DefinedRegionDifferentialSeq
DefinedRegionRNAEditing
DefinedRegionScanSeqs
DRDSAnnotator
EnrichedRegionMaker
ElandMultiParser
ElandParser
ElandSequenceParser
ExportExons
ExportIntergenicRegions
ExportIntronicRegions
ExportTrimmedGenes
FetchGenomicSequences
FindNeighboringGenes
FindOverlappingGenes
FindSharedRegions
FileCrossFilter
FileMatchJoiner
FileJoiner
FileSplitter
FilterDuplicateAlignments
Graph2Bed
FilterIntersectingRegions
FilterPointData
GenerateOverlapStats
Gr2Bar
InosinePredict
IntersectLists
IntersectKeyWithRegions
IntersectRegions
KeggPathwayEnrichment
MaqSnps2Bed
MakeSpliceJunctionFasta
MakeTranscriptome
MaskExonsInFastaFiles
MaskRegionsInFastaFiles
MaxEntScanScore3
MaxEntScanScore5
MergeExonMetrics
MergePairedSamAlignments
MergePointData
MergeRegions
MergeUCSCGeneTable
MethylationArrayScanner
MethylationArrayDefinedRegionScanner
MicrosatelliteCounter
MiRNACorrelator
MultipleReplicaScanSeqs
MultiSampleVCFFilter
NovoalignBisulfiteParser
NovoalignIndelParser
NovoalignParser
NovoalignPairedParser
OligoTiler
OverdispersedRegionScanSeqs
ParseExonMetrics
ParseIntersectingAlignments
ParsePointDataContexts
PeakShiftFinder
PointDataManipulator
Primer3Wrapper
PrintSelectColumns
QCSeqs
Qseq2Fastq
RandomizeTextFile
RankedSetAnalysis
ReadCoverage
ReferenceMutator
RNAEditingPileUpParser
RNAEditingScanSeqs
RNASeq
RNASeqSimulator
Sam2Fastq
Sam2USeq
SamAlignmentExtractor
SamComparator
SamParser
SamTranscriptomeParser
SamFixer
SamReadDepthSubSampler
SamSVFilter
SamSubsampler
ScanSeqs
ShiftAnnotationPositions
SoapV1Parser
SubtractRegions
ScoreChromosomes
ScoreParsedBars
ScoreSequences
Sgr2Bar
Simulator
StrandedBisSeq
SRAProcessor
SubSamplePointData
Tag2Point
Text2USeq
TomatoFarmer
Telescriptor
UCSCBig2USeq
USeq2UCSCBig
USeq2Text
VCFAnnotator
VCFComparator
VCFReporter
VCFSpliceAnnotator
VCFTabix
Wig2Bar
Wig2USeq
ScoreMethylatedRegions
ScoreEnrichedRegions

**************************************************************************************
**                        ABI Trace TC Peak Calculator: July 2009                   **
**************************************************************************************
Uses a sliding window to estimate the mean T peak area, then compares it to the observed
T area for a given T or C to estimate the fraction T. Useful for calculating the
fraction of converted Cs from bisulfite treated DNA in a methylation experiment.

Required Parameters:
-f Full path file name for the tab delimited text ABI trace file.
-w Window size in bp for estimating mean T peak areas, defaults to 16.
-c Print only T and C bases, defaults to all.

Example: java -jar pathTo/Apps/ABITraceTCPeakCalculator -f /MyBisSeqData/exp1.txt -c

**************************************************************************************
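A minimal Python sketch of the windowed fraction T estimate described above, assuming the
trace has already been reduced to one base call and one T-channel peak area per position.
The helper name, the input layout, and the choice to average only called T peaks inside
the window (excluding the position being scored) are assumptions for illustration, not
the ABITraceTCPeakCalculator source.

```python
def fraction_t(bases, t_areas, window=16):
    """Estimate the fraction T at each T or C call (hypothetical helper).

    bases   -- string of base calls, e.g. "TTCTT"
    t_areas -- T-channel peak area for each call
    window  -- number of bp used to estimate the local mean T peak area (cf. -w)
    """
    half = window // 2
    results = []
    for i, base in enumerate(bases):
        if base not in "TC":
            continue
        # Window of `window` positions roughly centred on i, clipped to the trace.
        lo = max(0, i - half)
        hi = min(len(bases), i - half + window)
        # Mean peak area of the called Ts in the window, excluding position i itself.
        t_peaks = [t_areas[j] for j in range(lo, hi) if j != i and bases[j] == "T"]
        if not t_peaks:
            results.append((i, base, None))  # no T peaks to compare against
            continue
        mean_t = sum(t_peaks) / len(t_peaks)
        results.append((i, base, t_areas[i] / mean_t))
    return results
```

For example, a C whose T-channel area is half of the neighbouring T peaks scores about 0.5,
i.e. roughly half of the molecules read a T at that position.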

**************************************************************************************
**                            Aggregate Plotter:  August 2012                       **
**************************************************************************************
Fetches point data contained within each region, inverts - stranded regions, zeros
the coordinates, sums, and window averages the values. Useful for generating
class averages from a list of annotated regions. Use a spreadsheet app to graph the
results.

Options:
-t PointData directories, full path, comma delimited. These should contain chromosome
       specific xxx.bar.zip files.
-b Bed file (chr, start, stop, name, score, strand(+/-/.)), full path, containing
       regions to stack. Regions must all be the same size.
-p Peak shift, average distance between + and - strand peaks. Will be used to shift
       the PointData by 1/2 the peak shift, defaults to 0. 
-u Strand usage, defaults to 0 (combine), 1 (use only same strand), 2 (opposite
       strand), or 3 (ignore). Use this option to select particular stranded data
       to aggregate.
-r Replace scores with 1.
-d Delog2 scores. Use this if your data are in log2 space.
-v Convert each region's scores to % of total.
-n Divide scores by the number of regions.
-s Scale all regions to a particular size. Defaults to max region size.
-a Average region scores instead of summing.

Example: java -Xmx1500M -jar pathTo/USeq/Apps/AggregatePlotter -t
      /Data/PolIIRep1/,/Data/PolIIRep2/ -b /Anno/tssSites.bed -p 73 -u 1

**************************************************************************************
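The fetch, invert, zero, sum, and window average steps can be sketched in Python as
follows. This assumes a single chromosome, equally sized half-open regions, PointData held
in a plain dict of position to score, and a centred window truncated at the region edges;
the helper name is hypothetical and the -p, -u, -s, and normalization options are not
modelled.

```python
def aggregate(regions, points, window=3):
    """Stack equally sized regions and window average the summed scores (hypothetical).

    regions -- list of (start, stop, strand) half-open intervals, all the same length
    points  -- dict of position -> score for the same chromosome
    """
    size = regions[0][1] - regions[0][0]
    sums = [0.0] * size
    for start, stop, strand in regions:
        for pos in range(start, stop):
            offset = pos - start  # zero the coordinates
            if strand == "-":
                offset = size - 1 - offset  # invert - stranded regions
            sums[offset] += points.get(pos, 0.0)
    # Centred sliding window average, truncated at the region edges.
    half = window // 2
    smoothed = []
    for i in range(size):
        lo, hi = max(0, i - half), min(size, i + half + 1)
        smoothed.append(sum(sums[lo:hi]) / (hi - lo))
    return sums, smoothed
```

Inverting the - strand regions lines every region up in the same 5' to 3' orientation, so
signal at, say, a TSS stacks on the same offset regardless of the gene's strand.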

**************************************************************************************
**                            Alignment End Trimmer: April 2014                     **
**************************************************************************************
This application can be used to trim alignments according to the density of mismatches.
Each base of the alignment is compared to the reference sequence from the start of the
alignment to the end.  If the bases match, the score is increased by -m. If the bases
don't match, the score is decreased by -n.  The alignment position with the highest 
score is used as the new alignment end point. The cigar string, alignment position,
mpos and flags are all updated to reflect trimming. 

Notes:
1) Insertions, deletions and skips are currently not counted as matches or mismatches

Required:
-i Path to the original alignment, sam/bam/sam.gz OK.
-r Path to the reference sequence, gzipped OK.
-o Name of the trimmed alignment output.  Output is bam and bai.

Optional:
-m Score of match. Default 1
-n Score of mismatch. Default 2
-v Verbose output.  This will write out detailed information for every trimmed read.
    It is suggested to use this option only on small test files.
-l Min length.  If the trimmed length is less than this value, the read is switched
    to unaligned. Default 10bp
-e Turn on RNA Editing mode.  A>G (forward reads) and T>C (reverse reads) are considered
   matches.
-s Turn on mismatch scoring mode. Reads with more than -x mismatches are dropped. If 
   RNA Editing mode is on, A>G (forward reads) and T>C (reverse reads) are considered 
   matches.
-x Max number of mismatches allowed in mismatch scoring mode. Default 0

Examples: 
1) java -Xmx4G -jar /path/to/AlignmentEndTrimmer -i 1000X1.bam -o 100X1.trim.bam
           -r /path/to/hg19.fasta
2) java -Xmx4G -jar /path/to/AlignmentEndTrimmer -i 1000X1.bam -o 100X1.trim.bam
           -r /path/to/hg19.fasta -m 0.5 -n 3
3) java -Xmx4G -jar /path/to/AlignmentEndTrimmer -i 1000X1.test.bam 
           -o 100X1.test.trim.bam -r /path/to/hg19.fasta -v
**************************************************************************************
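The trimming rule above amounts to tracking the running maximum of a cumulative score. A
minimal Python sketch, assuming the read and reference have already been paired up base
by base (no insertions, deletions, or skips, in line with note 1), that only the end of
the alignment is trimmed, and that ties keep the earliest maximum. The helper name is
hypothetical; CIGAR, position, and flag updates are not shown.

```python
def trim_alignment_end(read, ref, match=1.0, mismatch=2.0, min_length=10):
    """Return the trimmed read, or None if it falls below min_length (read becomes unaligned)."""
    score = 0.0
    best_score, best_end = 0.0, 0
    for i, (read_base, ref_base) in enumerate(zip(read, ref)):
        # +match for a matching base, -mismatch otherwise (cf. -m and -n).
        score += match if read_base == ref_base else -mismatch
        if score > best_score:
            best_score, best_end = score, i + 1
    if best_end < min_length:
        return None
    return read[:best_end]
```

With the default -m 1 -n 2, two trailing mismatches after twelve matches drop the score
from 12 to 8, so the alignment is cut back to the twelfth base.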

**************************************************************************************
**                                Alleler:  Sept 2010                               **
**************************************************************************************
Intersects a list of alleles (SNPs and INDELs) with gene models and returns their
effects on coding sequences and splice-junctions. Assumes interbase coordinates. If
ambiguous bases (e.g. R,Y,S,W,K,M) are provided the non-reference base is assumed.

Options:
-a Full path file name for a table of alleles.
-e Print an example of an allele table.
-u UCSC RefFlat or RefSeq gene table file, full path. See,
       http://genome.ucsc.edu/cgi-bin/hgTables
-g Full path directory containing fasta files for reference base calling
      (e.g. chr1.fasta, chr5.fasta, ...).
-n Neighborhood to include in intergenic intersection, defaults to 1000
-d Print only non-synonymous and splice affector alleles, defaults to all.
-b Print results in bed format, defaults to detailed report.
-c Collapse multiple hits to the same gene producing the same variant.

Example: java -Xmx1500M -jar pathTo/USeq/Apps/Alleler -a /APCSeq/apcFam7Alleles.txt
      -u /Anno/ucscKnownGenes.txt -g /Anno/Hg18Fastas/ -n 5000 -d -b

**************************************************************************************

**************************************************************************************
**                      Allelic Expression Detector:  August 2014                   **
**************************************************************************************
Beta!

Required Options:
-n Sample names to process, comma delimited, no spaces.
-b Directory containing coordinate sorted bam and index files named according to their
      sample name.
-d SNP data file containing all sample snp calls.
-r Results directory.
-s SNP map bed file from the ReferenceMutator app.

Default Options:
-g Minimum GenCall score, defaults to >= 0.2
-q Minimum alignment base quality at snp, defaults to 20
-c Minimum alignment read coverage, defaults to 4

Example: java -Xmx4G -jar pathTo/USeq/Apps/ beta! 

**************************************************************************************

**************************************************************************************
**                     Allelic Methylation Detector:  March 2014                    **
**************************************************************************************
AMD identifies regions displaying allelic methylation, e.g. ~50% average mCG
methylation yet individual read pairs show a bimodal fraction distribution of either
fully methylated or unmethylated. Beta.

Options:
-s Save directory.
-f Fasta file directory.
-t BAM file directory containing one or more xxx.bam files with their associated xxx.bai
       index. The BAM files should be sorted by coordinate and have passed Picard
       validation.
-a Minimum number alignments per region, defaults to 15.
-e Minimum number Cs in each alignment, defaults to 6
-m Minimum region fraction methylation, defaults to 0.4
-x Maximum region fraction methylation, defaults to 0.6
-r Full path to R, defaults to /usr/bin/R
-c Converted CG context PointData directories, full path, comma delimited. These 
       should contain stranded chromosome specific xxx_-/+_.bar.zip files. One
       can also provide a single directory that contains multiple PointData
       directories. Use the ParsePointDataContexts app on the output of the
       NovoalignBisulfiteParser to select CG contexts. 
-n Non-converted PointData directories, ditto. 
-b Provide a bed file (chr, start, stop,...), full path, to scan a list of regions
       instead of the genome.  See, http://genome.ucsc.edu/FAQ/FAQformat#format1

Example: java -Xmx4G -jar pathTo/USeq/Apps/ beta! 

**************************************************************************************

**************************************************************************************
**                     Allelic Methylation Detector:  September 2012                **
**************************************************************************************
AMD identifies regions displaying allelic methylation, e.g. ~50% average mCG
methylation yet individual read pairs show a bimodal fraction distribution of either
fully methylated or unmethylated.  

Options:
-s Save directory.
-f Fasta file directory.
-t BAM file directory containing one or more xxx.bam files with their associated xxx.bai
       index. The BAM files should be sorted by coordinate and have passed Picard
       validation.
-a Minimum number alignments per region, defaults to 15.
-e Minimum number Cs in each alignment, defaults to 6
-m Minimum region fraction methylation, defaults to 0.4
-x Maximum region fraction methylation, defaults to 0.6
-r Full path to R, defaults to /usr/bin/R
-c Converted CG context PointData directories, full path, comma delimited. These 
       should contain stranded chromosome specific xxx_-/+_.bar.zip files. One
       can also provide a single directory that contains multiple PointData
       directories. Use the ParsePointDataContexts app on the output of the
       NovoalignBisulfiteParser to select CG contexts. 
-n Non-converted PointData directories, ditto. 
-b Provide a bed file (chr, start, stop,...), full path, to scan a list of regions
       instead of the genome.  See, http://genome.ucsc.edu/FAQ/FAQformat#format1

Example: java -Xmx4G -jar pathTo/USeq/Apps/ beta! 

**************************************************************************************

**************************************************************************************
**                          Bam Intensity Joiner : July 2013                        **
**************************************************************************************
Extracts base level intensity information from the output of modified Picard
IlluminaBaseCallsToSam app and inserts this into an alignment file. Be sure to 
synchronize the alignment output (e.g. -oSync in novoalign) so it is in the same order
as the intensity data.

Options:
-a Full path to sam/bam alignment file with header.
-i Full path to bam intensity file from running the modified IlluminaBaseCallsToSam.
-r Full path bam file for saving the merged results.
-q Minimum mapping quality score. Defaults to 20, bigger numbers are more stringent.
      This is a phred-scaled posterior probability that the mapping position of the read
      is incorrect. For RNA-Seq data from the SamTranscriptomeParser, set this to 0.
-s Maximum alignment score. Defaults to 240, smaller numbers are more stringent.
-m Filter for particular MD fields.
-u Sub sample data, printing only every XXX alignment.

Example: java -Xmx1500M -jar pathTo/USeq/Apps/BamIntensityJoiner -u 10000 -m 101 -a
      /Alignments/8341X.sam.gz -i /Ints/8341X.bam -r /Merged/8341.bam

**************************************************************************************
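The -q threshold is on the standard phred scale, P(wrong) = 10^(-Q/10), so the default of
20 corresponds to a 1% chance that the mapping position is wrong. A tiny Python sketch;
the helper names are hypothetical and treating the minimum as inclusive is an assumption.

```python
def mapq_error_probability(mapq):
    """Phred-scaled mapping quality to the probability that the mapping position is wrong."""
    return 10 ** (-mapq / 10)


def passes_mapq(mapq, minimum=20):
    """Keep alignments at or above the minimum mapping quality (cf. -q, assumed inclusive)."""
    return mapq >= minimum
```

Setting -q 0, as suggested for SamTranscriptomeParser output, lets every alignment through.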

**************************************************************************************
**                           Bam NMer Intensity Parser : April 2012                 **
**************************************************************************************
Parses a BAM file from a modified Picard IlluminaBaseCallsToSam run on raw Illumina
sequencing data to extract information regarding N mers.

Options:
-f Full path to a bam file or directory containing such. Multiple files are merged.
-r Full path file name to save results, defaults to a derivative of -f
-n Length of the N mer, defaults to 5.
-q Minimum base quality score, defaults to 20. Only N-mers where all bases pass the
       threshold are scored.

Example: java -Xmx1500M -jar pathTo/USeq/Apps/BamNMerIntensityParser -f /Data/BamFiles/
       -n 7 -q 30

**************************************************************************************
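The N-mer quality filter can be sketched as a sliding window over one read that only
counts N-mers whose bases all meet the minimum quality. This Python sketch assumes the
qualities are already decoded to integer phred scores and only tallies N-mer occurrences;
the intensity values the app extracts are not modelled, and the helper name is
hypothetical.

```python
from collections import Counter


def count_passing_nmers(seq, quals, n=5, min_qual=20):
    """Count N-mers in one read where every base quality is >= min_qual (cf. -n, -q)."""
    counts = Counter()
    for i in range(len(seq) - n + 1):
        if all(q >= min_qual for q in quals[i:i + n]):
            counts[seq[i:i + n]] += 1
    return counts
```

Counts from several reads or files can be combined with Counter.update, mirroring how
multiple input files are merged.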

**************************************************************************************
**                                 Bar2Gr: Nov 2006                                 **
**************************************************************************************
Converts xxx.bar to text xxx.gr files.

-f The full path directory/file name for your xxx.bar file(s).

Example: java -Xmx1500M -jar pathTo/T2/Apps/Bar2Gr -f /affy/BarFiles/ 

**************************************************************************************

**************************************************************************************
**                                 Bar 2 USeq: Mar 2011                             **
**************************************************************************************
Recurses through directories and subdirectories of xxx.bar(.zip/.gz OK) files
converting them to xxx.useq files (http://useq.sourceforge.net/useqArchiveFormat.html).  

Required Options:
-f Full path directory containing bar files or directories of bar files.

Default Options:
-i Index size for slicing split chromosome data (e.g. # rows per file),
      defaults to 10000.
-r For graphs, select a style, defaults to 0
      0	Bar
      1	Stairstep
      2	HeatMap
      3	Line
-h Color, hexadecimal (e.g. #6633FF), enclose in quotation marks.
-d Description, enclose in quotation marks.
-g Reset genome version, defaults to that indicated by the bar files.
-e Delete original folders, use with caution.
-m Replace bar files with new xxx.useq file in bar file directory, use with caution.

Example: java -Xmx4G -jar pathTo/USeq/Apps/Bar2USeq -f
      /AnalysisResults/ -i 5000 -h '#6633FF' -g D_rerio_Jul_2010 
      -d 'Final processed chIP-Seq results for Bcd and Hunchback, 30M reads' 

**************************************************************************************

**************************************************************************************
**                             Base Classifier : Oct 2012                            **
**************************************************************************************
Beta.

Options:

Example: java -Xmx1500M -jar pathTo/USeq/Apps/BamIntensityParser -f /Data/BamFiles/
       -n 7 -q 30

**************************************************************************************

**************************************************************************************
**                                  Bed2Bar: June 2010                              **
**************************************************************************************
Bed2Bar builds stair step graphs from bed files for display in IGB. Strands are merged
and text information removed. Also generates a merged bed file by thresholding the
graph at the -t level.

-f Full path file or directory containing xxx.bed(.zip/.gz OK) files
-v Genome version (eg H_sapiens_Mar_2006), get from UCSC Browser,
      http://genome.ucsc.edu/FAQ/FAQreleases.
-s Sum bed scores for overlapping regions, defaults to assigning the highest score.
-t Threshold, defaults to 0.
-g Maximum gap, defaults to 0.

Example: java -Xmx4G -jar pathTo/Apps/Bed2Bar -f /affy/res/zeste.bed.gz -v 
      M_musculus_Jul_2007 -g 1000 -s -t 100 

**************************************************************************************
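A minimal Python sketch of how overlapping bed regions can be collapsed into a stair step
graph, assigning the highest score by default or the summed score (cf. -s). It assumes
half-open interbase coordinates on one chromosome with strands already ignored; the
quadratic scan is for clarity only and the helper name is hypothetical.

```python
def stair_step(regions, sum_scores=False):
    """Collapse (start, stop, score) regions into non-overlapping (start, stop, value) steps."""
    # Every region boundary is a potential step boundary.
    points = sorted({p for start, stop, _ in regions for p in (start, stop)})
    steps = []
    for left, right in zip(points, points[1:]):
        covering = [score for start, stop, score in regions if start <= left and stop >= right]
        if not covering:
            continue  # gap between regions, no step
        value = sum(covering) if sum_scores else max(covering)
        if steps and steps[-1][1] == left and steps[-1][2] == value:
            steps[-1] = (steps[-1][0], right, value)  # extend an equal adjacent step
        else:
            steps.append((left, right, value))
    return steps
```

Thresholding these steps at -t and joining neighbours separated by no more than -g bp
would then give the merged bed regions.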

**************************************************************************************
**                                BedStats: June 2010                               **
**************************************************************************************
Calculates several statistics on bed files where the name column contains a short read
sequence. This includes a read length distribution and frequencies of the 1st and last
bps. Can also trim your reads to a particular length.

Options:
-b Full path file name for your alignment bed file or directory containing such. The
       name column should contain just your sequence or seq;qual .
-t Trim the 3' ends of your reads to the indicated length, defaults to not trimming.
-s Calculate base frequencies for the given 0 indexed base instead of the last base.
-r Reverse complement sequences before calculating stats and trimming.

Example: java -Xmx1500M -jar pathTo/USeq/Apps/BedStats -b /Res/ex1.bed.gz -s 9 -t 10

**************************************************************************************
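The length distribution and base frequency tallies can be sketched with Python's
collections.Counter. This assumes the bed name values are already in memory, that
trimming is applied before the statistics are computed, and that reverse complementing
(-r) is not needed; the helper name is hypothetical.

```python
from collections import Counter


def bed_stats(names, base_index=None, trim=None):
    """names: bed name column values holding 'seq' or 'seq;qual' (hypothetical helper)."""
    seqs = [name.split(";")[0] for name in names]
    if trim is not None:
        seqs = [s[:trim] for s in seqs]  # keep the first `trim` bases, i.e. trim the 3' end
    lengths = Counter(len(s) for s in seqs)
    first = Counter(s[0] for s in seqs if s)
    if base_index is None:
        other = Counter(s[-1] for s in seqs if s)  # last base by default
    else:
        other = Counter(s[base_index] for s in seqs if len(s) > base_index)  # cf. -s
    return lengths, first, other, seqs
```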

**************************************************************************************
**                                  BisSeq: July 2013                               **
**************************************************************************************
Takes two-condition (treatment and control) PointData from converted and non-converted
C bisulfite sequencing data parsed using the NovoalignBisulfiteParser and scores
regions for differential methylation using either a Fisher exact or chi-square test
for changes in methylation. A Benjamini & Hochberg correction is applied to convert
the p-values to FDRs. Data is only collected on bases that meet the minimum
read coverage threshold in both datasets. The fraction differential methylation
statistic is calculated by taking the pseudomedian of all of the log2 ratios of the
paired base level fraction methylations in a given window. Overlapping windows that
meet both the FDR (-f) and pseudomedian log2 ratio (-l) thresholds are merged when
generating enriched and reduced regions. BisSeq generates several tracks for browsing
and lists of differentially methylated regions. To examine only mCG contexts, first
filter your PointData using the ParsePointDataContexts app.

Options:
-s Save directory, full path.
-c Treatment converted PointData directories, full path, comma delimited. These should
       contain stranded chromosome specific xxx_-/+_.bar.zip files from the
       NovoalignBisulfiteParser app.
       One can also provide a single directory that contains multiple PointData
       directories.
-C Control converted PointData directories, ditto. 
-n Treatment non-converted PointData directories, ditto. 
-N Control non-converted PointData directories, ditto. 
-a Scramble control data.

Default Options:
-d Minimum per base read coverage, defaults to 5.
-w Window size, defaults to 250.
-m Minimum number reads in window, defaults to 5. 
-f FDR threshold, defaults to 30 (-10Log10(0.001)).
-l Log2Ratio threshold, defaults to 1.585 (3x).
-r Full path to R, defaults to '/usr/bin/R'
-g Don't print graph files.

Example: java -Xmx10G -jar pathTo/USeq/Apps/BisSeq -c /Sperm/Converted -n 
      /Sperm/NonConverted -C /Egg/Converted -N /Egg/NonConverted -s /Res/BisSeq
      -w 500 -m 10 -l 2 -f 50 

**************************************************************************************
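Two of the statistics named above, sketched in Python for clarity: the pseudomedian
(Hodges-Lehmann estimator, the median of all pairwise Walsh averages) of the per-base
log2 fraction methylation ratios in a window, and the Benjamini & Hochberg adjustment
reported on the -10Log10 scale used by -f (e.g. an FDR of 0.01 is 20 and 0.001 is 30).
The helper names are hypothetical, per-base fractions are assumed non-zero so the log2
ratio is defined, and this is not the BisSeq source.

```python
import math
from statistics import median


def pseudomedian(values):
    """Hodges-Lehmann estimator: median of (x_i + x_j) / 2 over all i <= j."""
    n = len(values)
    walsh = [(values[i] + values[j]) / 2 for i in range(n) for j in range(i, n)]
    return median(walsh)


def window_log2_ratio(treatment_fractions, control_fractions):
    """Pseudomedian of paired base level log2(treatment / control) fraction methylations."""
    ratios = [math.log2(t / c) for t, c in zip(treatment_fractions, control_fractions)]
    return pseudomedian(ratios)


def benjamini_hochberg(pvalues):
    """Return B&H adjusted p-values (FDRs) in the original order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, keeping the adjusted values monotone.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


def to_phred(fdr):
    """-10 * log10(FDR), e.g. an FDR of 0.01 becomes 20."""
    return -10 * math.log10(fdr)
```

The pseudomedian is less sensitive than a mean to a few extreme base level ratios, which
suits windows with uneven per-base coverage.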

**************************************************************************************
**                       Bis Seq Aggregate Plotter: October 2012                    **
**************************************************************************************
BSAP merges bisulfite data over equally sized regions to generate data for class
average aggregate plots of fraction methylation. A smoothing window is also applied.
Data for unstranded, sense, and antisense are produced.

Options:
-c Converted PointData directories, full path, comma delimited. These should
       contain stranded chromosome specific xxx_-/+_.bar.zip files. One
       can also provide a single directory that contains multiple PointData
       directories. See the NovoalignBisulfiteParser app.
-n Non-converted PointData directories, ditto. 
-b Bed file (tab delim: chr start stop name score strand(+/-/.)), full path.
-i Don't invert - stranded regions, defaults to inverting.
-s Scale all regions to a particular size. Defaults to scaling to max region size.
-m Calculate individual base fractions and then take a mean, ignoring zeros, over
       the window, instead of summing the obs in the window and taking the fraction.
-o Minimum number of observations before scoring base fraction methylation, defaults
       to 8.

Example: java -Xmx1500M -jar pathTo/USeq/Apps/BisSeqAggregatePlotter -c
      /NBP/Con -n /NBP/NonCon -b /Anno/tssSites.bed -m

**************************************************************************************
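The difference between the default pooled fraction and the -m per-base mean can be seen
in a small Python sketch. Fraction methylation is taken as non-converted / (non-converted
+ converted); reading "ignoring zeros" as skipping bases without enough observations
(cf. -o) is an assumption, and the helper names are hypothetical.

```python
def pooled_fraction(non_converted, converted):
    """Default: sum the observations in the window, then take the fraction non-converted."""
    total = sum(non_converted) + sum(converted)
    return sum(non_converted) / total if total else None


def mean_base_fraction(non_converted, converted, min_obs=8):
    """-m style: average per-base fractions over bases with at least min_obs observations."""
    fractions = [n / (n + c) for n, c in zip(non_converted, converted)
                 if n + c >= max(min_obs, 1)]
    return sum(fractions) / len(fractions) if fractions else None
```

A deeply covered base dominates the pooled fraction, whereas the per-base mean weights
each qualifying base equally: bases at 0.9 (20 reads) and 0.1 (10 reads) give about 0.63
pooled but 0.5 as a mean.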

**************************************************************************************
**                                BisSeqErrorAdder: June 2012                       **
**************************************************************************************
Takes PointData from converted and non-converted C bisulfite sequencing data parsed
using the NovoalignBisulfiteParser and simulates a worse non-conversion rate by
randomly picking converted observations and making them non-converted. This is
accomplished by first measuring the non-conversion rate in the test chromosome (e.g.
chrLambda), calculating the fraction of converted Cs that need to be flipped to
non-converted to reach the target fraction non-converted, and then using this flip
fraction to modify the data for the other chromosomes.

Options:
-s Save directory, full path.
-c Converted PointData directories, full path, comma delimited. These should
       contain stranded chromosome specific xxx_-/+_.bar.zip files. One
       can also provide a single directory that contains multiple PointData
       directories.
-n Non-converted PointData directories, ditto. 
-f Target fraction non-converted for test chromosome, this cannot be less than the
       current fraction.
-t Test chromosome, defaults to chrLambda* .

Example: java -Xmx12G -jar pathTo/USeq/Apps/BisSeqErrorAdder -c /Data/Sperm/Converted
      -n /Data/Sperm/NonConverted -f 0.02 

**************************************************************************************
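The flip fraction follows from simple arithmetic: if the test chromosome has N
non-converted and C converted observations, flipping a fraction x of the converted ones
gives (N + xC) / (N + C), so x = (target(N + C) - N) / C. A small Python sketch, assuming
per-base counts are held in plain lists and each converted observation is flipped
independently; the helper names are hypothetical, not the BisSeqErrorAdder source.

```python
import random


def flip_fraction(converted, non_converted, target):
    """Fraction of converted observations to flip so the test chromosome reaches target."""
    total = converted + non_converted
    current = non_converted / total
    if target < current:
        raise ValueError("target cannot be less than the current fraction non-converted")
    # Solve (non_converted + x * converted) / total == target for x.
    return (target * total - non_converted) / converted


def add_errors(converted_counts, non_converted_counts, flip, rng):
    """Randomly move each converted observation to non-converted with probability flip."""
    new_converted, new_non_converted = [], []
    for c, n in zip(converted_counts, non_converted_counts):
        flipped = sum(1 for _ in range(c) if rng.random() < flip)
        new_converted.append(c - flipped)
        new_non_converted.append(n + flipped)
    return new_converted, new_non_converted
```

For example, 10 non-converted out of 1000 lambda observations is 0.01; reaching -f 0.02
needs 10 of the 990 converted observations flipped, a flip fraction of about 0.0101.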

**************************************************************************************
**                                 BisStat: May 2014                                **
**************************************************************************************
Takes PointData from converted and non-converted C bisulfite sequencing data parsed
using the NovoalignBisulfiteParser and generates several xxCxx context statistics and
graphs (bp and window level fraction converted Cs) for visualization in IGB.
BisStat estimates whether a given C is methylated using a binomial distribution where
the expected fraction of non-converted Cs is either set directly (-e) or estimated from
the fraction of non-converted Cs present in the lambda data (-l). Binomial p-values are
converted to FDRs using the Benjamini & Hochberg method. This app requires
considerable RAM (10-64G).

Options:
-s Save directory, full path.
-c Converted PointData directories, full path, comma delimited. These should
       contain stranded chromosome specific xxx_-/+_.bar.zip files. One
       can also provide a single directory that contains multiple PointData
       directories.
-n Non-converted PointData directories, ditto. 
-f Directory containing chrXXX.fasta(/.fa .zip/.gz OK) files for each chromosome.

Default Options:
-p Minimum FDR for non-converted Cs to be counted as methylated, defaults to 20, a
       -10Log10(FDR = 0.01) conversion.
-e Expected fraction non-converted Cs due to partial bisulfite conversion and
       sequencing error, defaults to 0.005 .
-l Use the unmethylated lambda alignment data to set the expected fraction of
       non-converted Cs due to partial conversion and sequencing error. This is
       predicated on including a 'chrLambda' fasta sequence while aligning your data.
-o Minimum read coverage to count mC fractions, defaults to 8
-w Window size, defaults to 1000.
-m Minimum number Cs passing read coverage in window to score, defaults to 5. 
-r Full path to R, defaults to '/usr/bin/R'
-g Don't merge stranded data, defaults to running a non stranded analysis. Affects CGs.
-a First density quartile fraction methylation threshold, defaults to 0.25
-b Fourth density quartile fraction methylation threshold, defaults to 0.75

Example: java -Xmx12G -jar pathTo/USeq/Apps/BisStat -c /Data/Sperm/Converted -n 
      /Data/Sperm/NonConverted -s /Data/Sperm/BisSeq -w 5000 -m 10 -f
      /Genomes/Hg18/Fastas -o 10 

**************************************************************************************
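The per-C binomial test can be sketched as an upper tail probability: how likely are k or
more non-converted reads out of n if the only source is partial conversion and sequencing
error at the expected rate. This Python sketch assumes a one-sided test and uses
math.comb (Python 3.8+); the helper names are hypothetical, and the B&H step that turns
these p-values into FDRs for -p is not repeated here.

```python
from math import comb


def expected_from_lambda(non_converted, converted):
    """Expected fraction non-converted estimated from unmethylated lambda reads (cf. -l)."""
    return non_converted / (non_converted + converted)


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): chance of k or more non-converted reads by error alone."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))
```

With the default -e 0.005, a C covered by 10 reads with 5 non-converted calls gets a
p-value below 1e-8, so it is very unlikely to be explained by conversion or sequencing
error alone.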

**************************************************************************************
**                           BisStat Region Maker: March 2012                       **
**************************************************************************************
Takes serialized window objects from BisStat, thresholds them based on the min and max
fraction methylation params, and prints regions in bed format meeting the criteria.
May also build regions based on the density of a given fraction methylation quartile.
For example, to identify regions where at least 0.8 of the sequenced Cs are lowly
methylated (<= 0.25 with default BisStat settings), set -q 1 -m 0.8 . To find regions
with >= 0.9 of the Cs highly methylated (>= 0.75 with default BisStat settings), set
-q 3 -m 0.9 .

Options:
-s SerializedWindowObject directory from BisStat, full path.
-m Minimum fraction.
-x Maximum fraction.
-g Maximum gap, defaults to 0.
-q Merge windows based on their quartile density score, not fraction methylation, by
      indicating 1, 2, or 3 for 1st, 2nd+3rd, or 4th, respectively.

Example: java -Xmx4G -jar pathTo/USeq/Apps/BisStatRegionMaker -m 0.8 -x 1.0 -g 100
      -s /Data/BisStat/SerializedWindowObjects  

**************************************************************************************
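The threshold and merge step can be sketched in Python as below. This assumes windows are
plain (start, stop, fraction) tuples on one chromosome rather than BisStat's serialized
objects, that the -m and -x thresholds are inclusive, and that windows within -g bp of
each other are joined; the helper name is hypothetical.

```python
def make_regions(windows, min_frac, max_frac, max_gap=0):
    """Threshold (start, stop, fraction) windows and merge survivors separated by <= max_gap bp."""
    passing = sorted((start, stop) for start, stop, frac in windows
                     if min_frac <= frac <= max_frac)
    regions = []
    for start, stop in passing:
        if regions and start - regions[-1][1] <= max_gap:
            regions[-1][1] = max(regions[-1][1], stop)  # overlapping or within the gap
        else:
            regions.append([start, stop])
    return [tuple(region) for region in regions]
```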

**************************************************************************************
**                        Calculate Per Cycle Error Rate : Feb 2013                 **
**************************************************************************************
Calculates per cycle error rates provided a sorted indexed bam file and a fasta
sequence file. Only checks CIGAR M bases, not masked or INDEL bases.

Required Options:
-b Full path to a coordinate sorted bam file (xxx.bam) with its associated (xxx.bai)
      index or directory containing such. Multiple files are processed independently.
      Unsorted xxx.sam(.gz/.zip OK) files also work but are processed rather slowly.
-f Full path to the single fasta file you wish to use in calculating the error rate.
-n Require read names to begin with indicated text, defaults to accepting everything.
-o Path to log file.  Write coverage statistics to a log file instead of stdout.

Example: java -Xmx1500M -jar pathTo/USeq/Apps/CalculatePerCycleErrorRate -b /Data/Bam/
     -f /Fastas/chrPhiX_Illumina.fasta.gz -n HWI

**************************************************************************************
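A Python sketch of per-cycle error tallying over CIGAR M bases. It assumes alignments are
given as (0-based reference start, CIGAR, read sequence, is_reverse) tuples against one
reference string, ignores hard clipping when numbering cycles, and reverses the cycle
order for reverse strand reads because SAM stores them reverse complemented. The helper
name is hypothetical and this is not the CalculatePerCycleErrorRate source.

```python
import re

# CIGAR operations as (length, op) pairs, e.g. "1S3M" -> [("1", "S"), ("3", "M")]
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def per_cycle_errors(reads, reference):
    """Tally mismatches per sequencing cycle; returns {cycle: (mismatches, bases checked)}."""
    stats = {}
    for ref_start, cigar, seq, is_reverse in reads:
        read_i, ref_i = 0, ref_start
        for length, op in CIGAR_OP.findall(cigar):
            length = int(length)
            if op in "M=X":  # consume both read and reference
                if op == "M":  # only M bases are checked
                    for k in range(length):
                        cycle = read_i + k
                        if is_reverse:
                            cycle = len(seq) - 1 - cycle
                        mismatches, checked = stats.get(cycle, (0, 0))
                        is_error = seq[read_i + k].upper() != reference[ref_i + k].upper()
                        stats[cycle] = (mismatches + int(is_error), checked + 1)
                read_i += length
                ref_i += length
            elif op in "IS":  # inserted or soft-clipped (masked) bases: read only
                read_i += length
            elif op in "DN":  # deletions and skips: reference only
                ref_i += length
            # H and P consume neither
    return stats
```

The per-cycle error rate is then mismatches / bases checked for each cycle.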

**************************************************************************************
**                                   ChIPSeq: May 2014                              **
**************************************************************************************
The ChIPSeq application is a wrapper for processing ChIP-Seq data through a variety of
USeq applications. It:
   1) Parses raw alignments (sam, eland, bed, or novoalign) into binary PointData
   2) Filters PointData for duplicate alignments
   3) Makes relative ReadCoverage tracks from the PointData (reads per million mapped)
   4) Runs the PeakShiftFinder to estimate the peak shift and optimal window size
   5) Runs the MultipleReplicaScanSeqs to window scan the genome generating enrichment
        tracks using DESeq2's negative binomial p-values and B&H's FDRs
   6) Runs the EnrichedRegionMaker to identify likely chIP peaks (FDR < 1%, >2x).

Options:
-s Save directory, full path.
-t Treatment alignment file directories, full path, comma delimited, no spaces, one
       for each biological replica. These should each contain one or more text
       alignment files (gz/zip OK) for a particular replica. Alternatively, provide
       one directory that contains multiple alignment file directories.
-c Control alignment file directories, ditto. 
-y Type of alignments, either novoalign, sam, bed, or eland (sorted or export).
-v Genome version (e.g. H_sapiens_Feb_2009, M_musculus_Jul_2007), see UCSC FAQ,
      http://genome.ucsc.edu/FAQ/FAQreleases.
-r Full path to R, defaults to '/usr/bin/R'. Be sure to install DESeq2, gplots, and
      qvalue Bioconductor packages.

Advanced Options:
-m Combine any replicas and run single replica analysis (ScanSeqs), defaults to
      using DESeq2.
-a Maximum alignment score. Defaults to 60, smaller numbers are more stringent.
-q Minimum mapping quality score. Defaults to 13, bigger numbers are more stringent.
      This is a phred-scaled posterior probability that the mapping position of the read
      is incorrect. Set to 0 for RNASeq data.
-p Peak shift, defaults to the PeakShiftFinder peak shift or 150bp. Set to 0 for
      RNASeq data.
-w Window size, defaults to the PeakShiftFinder peak shift + standard deviation or
      250bp.
-i Minimum number reads in window, defaults to 10.
-f Filter bed file (tab delimited: chr start stop) to use in excluding intersecting
      windows while making peaks, e.g. satelliteRepeats.bed .
-g Print verbose output from each application.
-e Don't look for reduced regions.

Example: java -Xmx2G -jar pathTo/USeq/Apps/ChIPSeq -y eland -v D_rerio_Dec_2008 -t 
      /Data/PolIIRep1/,/Data/PolIIRep2/ -c /Data/PolIINRep1/,/Data/PolIINRep2/ -s
      /Data/Results/WtVsNull -f /Anno/satelliteRepeats.bed

**************************************************************************************

**************************************************************************************
**                                CHPC Aligner: Sept 2013                           **
**************************************************************************************
Wrapper for running novoalign on the CHPC clusters. You will need to configure ssh
keys from CHPC to your data server. See http://linuxproblem.org/art_9.html (you might
need to reset the permissions on your home directory on alta/moab with
'chmod go-w ~/'). Run this app at the CHPC.

Required Options:
-i Genome index file on CHPC
-r Working directory on CHPC, this also defines the name of the final data archive
-f First fastq file on the data server
-s (Optional) Second paired end read fastq file on the data server
-a Archive directory on the data server for saving the final alignments

Default Options:
-l Launch jobs. Defaults to not launching jobs so you can inspect and test the shell
     scripts before committing.
-w Wall time in hours, defaults to 24.
-x Number of CPUs, defaults to 16.
-e Administrator email address, defaults to david.nix@hci.utah.edu
-c (Optional) Client email addresses, comma delimited, no spaces.
-b Don't relaunch bad jobs, defaults to making 3 attempts before aborting.
-o CHPC account to draw hours from (e.g. kaplan-em), defaults to kaplan.
-d Raw data user name and server, defaults to u0028003@hci-moab.hci.utah.edu
-g Final alignment data user name and server, defaults to
     u0028003@hci-moab.hci.utah.edu
-j Aligner application, defaults to 
     '/uufs/chpc.utah.edu/common/home/hcibcore/tomato/app/novoalign/novoalign'
-p Aligner cmd line options
-n Number of reads to process per job, defaults to 1000000
-k Number of jobs to run, defaults to number of reads per job setting
-t Filter results for lines containing a 'chr' string, defaults to all.
-q Strip @SQ: lines from SAM alignment results, recommended for transcriptomes.

Example: java -Xmx4G -jar pathTo/USeq/Apps/CHPCAligner -p '-F ILMFQ -t60 -rRandom'
    -i ~/Genomes/hg19Splices34bpAdaptersNovo.index
    -r /scratch/serial/u0028003/7317X1_100602 
    -f /mnt/hci-ma/MicroarrayData/2010/7317R/GAII/100602_7317X1_s_7_1_sequence.txt.gz 
    -s /mnt/hci-ma/MicroarrayData/2010/7317R/GAII/100602_7317X1_s_7_2_sequence.txt.gz 
    -a /mnt/hci-ma/AnalysisData/2010/A115  -w 6 -e nix@gmail.com -t -b 

**************************************************************************************

**************************************************************************************
**                        Compare Intersecting Regions: Nov 2012                    **
**************************************************************************************
Compares test region file(s) against a master set of regions for intersection.
Reports the results as columns relative to the master. Assumes interbase coordinates.

Options:
-m Full path for the master bed file (tab delim: chr start stop ...).
-t Full path to the test bed file to intersect or directory of files.
-g Maximum bp gap allowed for scoring an intersection, defaults to 0 bp. Negative gaps
     force overlaps, positive gaps allow non-intersecting bases between regions.
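A minimal sketch of the -g gap rule in interbase coordinates. It assumes the gap between two regions is the later start minus the earlier stop, so overlapping regions get a negative gap equal to their overlap length and abutting regions get a gap of 0. The Region type and function names are hypothetical, and CompareIntersectingRegions' own edge-case handling may differ.

```python
from typing import NamedTuple

class Region(NamedTuple):
    chrom: str
    start: int  # interbase: zero based
    stop: int   # interbase: stop excluded

def gap(a: Region, b: Region) -> int:
    """Bases separating a and b; a negative value is the length of their overlap."""
    return max(a.start, b.start) - min(a.stop, b.stop)

def intersects(a: Region, b: Region, max_gap: int = 0) -> bool:
    """True when a and b share a chromosome and their gap is at most max_gap."""
    return a.chrom == b.chrom and gap(a, b) <= max_gap
```

Under this rule, -g -10 requires at least 10bp of overlap, while -g 1000 (as in the example) also counts regions up to 1kb apart.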

Example: java -Xmx4G -jar pathTo/Apps/CompareIntersectingRegions -g 1000
        -m /All/mergedRegions.bed.gz -t /IndividualERs/

**************************************************************************************

**************************************************************************************
**                         Compare Parsed Alignments: Nov 2009                      **
**************************************************************************************
Compares two parsed alignments for a common distribution of SNPs using R's Fisher's
exact test. Run ParseIntersectingAlignments with the same SNP table first.
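For illustration, here is a stdlib-only Python sketch of the two-sided Fisher's exact test on one 2x2 table. It assumes rows are the two alleles files and columns are reference vs. alternate allele counts at a single SNP; that table layout is an assumption. The sketch follows R's fisher.test convention of summing the probabilities of all tables no more likely than the observed one.

```python
from math import comb

def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables no more probable than the observed one; the small relative
    # tolerance mirrors R's fisher.test and absorbs floating point error.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= observed * (1 + 1e-7))
```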

Options:
-a Full path file name for the first xxx.alleles file.
-b Full path file name for the second xxx.alleles file.
-d Full path directory name for writing temporary files.
-r Full path file name for R, defaults to '/usr/bin/R'

Example: java -Xmx1500M -jar pathToUSeq/Apps/CompareParsedAlignments
     -a /SeqData/lymphSNPs.alleles -b /SeqData/normalSNPs.alleles -d /temp/

**************************************************************************************

**************************************************************************************
**                              Concatinate Fastas: Oct 2010                        **
**************************************************************************************
Concatenates a directory of fasta files into a single sequence separated by a defined
number of Ns. Outputs the merged fasta, bed files for the junctions and spacers, and
a file to be used to shift UCSC gene table annotations. Use this app to create
artificial chromosomes for poorly assembled genomes.
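The coordinate bookkeeping behind the merged fasta can be sketched as follows. The sketch assumes Ns are placed only between consecutive sequences and that coordinates are interbase; the exact layout and file formats ConcatinateFastas writes may differ. Each sequence's start is the offset you would add to its original annotations. The function name concat_layout is hypothetical.

```python
def concat_layout(sequences, spacer=1000, concat_name="chrConcat"):
    """Lay out (name, length) sequences end to end with N spacers between them.

    Returns two lists of BED-style (chrom, start, stop, name) rows in interbase
    coordinates: one for the input sequences and one for the spacers.
    """
    seq_rows, spacer_rows = [], []
    pos = 0
    for i, (name, length) in enumerate(sequences):
        if i > 0:  # Ns go only between consecutive sequences
            spacer_rows.append((concat_name, pos, pos + spacer, "spacer"))
            pos += spacer
        seq_rows.append((concat_name, pos, pos + length, name))
        pos += length
    return seq_rows, spacer_rows
```

An annotation at interbase position p in a given scaffold moves to p plus the start of that scaffold's row.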

Options:
-d Full path directory for saving the results.
-f Full path directory containing fasta files to concatenate.
-n Number of Ns to use as a spacer, defaults to 1000.
-c Name to give the concatenated sequence, defaults to chrConcat.

Example: java -Xmx4G -jar pathTo/USeq/Apps/ConcatinateFastas -n 2000 -d
    /zv8/MergedNA_Scaffolds -f /zv8/BadFastas/ -c chrNA_Scaffold 

**************************************************************************************

**************************************************************************************
**                                CorrelatePointData: Aug 2011                      **
**************************************************************************************
Calculates a Pearson correlation coefficient on the values of PointData found at the
same positions in the two datasets. Do NOT use on stair-step/heat-map graph data.
Only use on point representation data.
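Here is a minimal sketch of the calculation. Each PointData set is represented as a hypothetical {position: value} dictionary for a single chromosome and strand, and only positions present in both sets contribute.

```python
from math import sqrt

def pearson_shared(first, second):
    """Pearson r over the positions found in both {position: value} dicts."""
    shared = sorted(set(first) & set(second))
    if len(shared) < 2:
        raise ValueError("need at least two shared positions")
    x = [first[p] for p in shared]
    y = [second[p] for p in shared]
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when one set is constant")
    return cov / sqrt(ss_x * ss_y)
```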

Options:
-f First PointData set. This directory should contain chromosome specific xxx.bar.zip
       files, stranded or unstranded.
-s Second PointData set, ditto. 
-p Full path file name to use in saving paired scores, defaults to not printing.

Example: java -Xmx4G -jar pathTo/USeq/Apps/CorrelatePointData -f /BaseFracMethyl/X1
      -s /BaseFracMethyl/X2 

**************************************************************************************

***************************************************************
*                      CountChromosomes                       *
*                                                             *
* This script drives the samtools view command. It creates a  *
* report that lists read counts for standard chromosomes,     *
* extra chromosomes, phiX, and adapter. This data will be     *
* used by the ParseMetrics app.                               *
*                                                             *
* -i Input file (bam format)                                  *
* -o Output file (.txt format)                                *
* -r Reference (hg19, hg18, mm10, mm9, etc.)                  *
* -p Path to samtools                                         *
***************************************************************

**************************************************************************************
**                        Bisulfite Convert Fastas: Dec 2008                        **
**************************************************************************************
Converts all the c/C's to t/T's in fasta file(s) maintaining case.
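Here is a minimal Python sketch of a case-preserving C to T conversion. It assumes fasta header lines (those starting with '>') must be left untouched so names like '>chrC' survive. The helper names are hypothetical. Replacing the table with str.maketrans("Aa", "Gg") gives the A to G conversion described for ConvertFastaA2G.

```python
# Translation table mapping C->T and c->t; all other characters are unchanged.
BISULFITE = str.maketrans("Cc", "Tt")

def bisulfite_convert_fasta(lines):
    """Yield fasta lines with sequence C/c converted to T/t, headers untouched."""
    for line in lines:
        yield line if line.startswith(">") else line.translate(BISULFITE)
```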

Required Parameters:
-f Full path for the xxx.fasta file or directory containing such.

Example: java -Xmx2000M -jar pathTo/Apps/BisulfiteConvertFastas -f /affy/Fastas/

**************************************************************************************

To calculate p-values, a number of randomized datasets (set by -r) are created by
shuffling the expression profiles between genes; their windows are scored and pooled.
P-values for each real score are calculated based on the area under the right side of
the randomized score distribution. In addition to a spreadsheet report summary, heat
map xxx.bar files for the p-values and mean correlation are created for visualization
in IGB. Note, this analysis is not stranded. If a stranded analysis is desired, split
the gene lists by strand before running.
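The right-tail p-value described above can be sketched as the fraction of pooled randomized window scores that are at least as large as each real score. This is a hypothetical illustration (empirical_pvalues is not a USeq function), and it leaves out the shuffling and window scoring.

```python
from bisect import bisect_left

def empirical_pvalues(real_scores, random_scores):
    """Right-tail p-value of each real score: the fraction of pooled randomized
    window scores that are greater than or equal to it."""
    pooled = sorted(random_scores)
    n = len(pooled)
    # bisect_left finds the first pooled score >= s, so n - index counts them.
    return [(n - bisect_left(pooled, s)) / n for s in real_scores]
```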

Parameters:
-f Full path file name for a tab delimited gene file (name, chr, start, stop, scores).
-o GenomicRegion filter file, full path file name for a tab delimited region file
       (chrom, start, stop) used to remove genes from the correlation analysis.
-g Genome version for IGB visualizations (e.g. C_elegans_May_2007).
-w Window size, defaults to 50000bp. Setting this too small may exclude some regions.
-n Minimum number of genes required in each window, defaults to 3. Setting this too
       high will exclude some regions.
-r Number of random trials, defaults to 100.

Example: java -Xmx256M -jar pathTo/T2/Apps/CorrelationMaps -f /Mango/geneFile.txt
      -w 30000 -n 2 -o /Mango/operons.txt

**************************************************************************************

**************************************************************************************
**                        Convert Fasta A 2 G: Mar 2012                             **
**************************************************************************************
Converts all the a/A's to g/G's in fasta file(s) maintaining case.

Required Parameters:
-f Full path for the fasta file (.fa/.fasta/.gz/.zip OK) or directory containing such.
-s Full path directory to save the converted files.

Example: java -Xmx2G -jar pathTo/Apps/ConvertFastaA2G -f /mm9/Fastas/ -s
      /mm9/AGConvertedFastas/

**************************************************************************************

**************************************************************************************
**                        Convert Fastq A 2 G: Mar 2012                             **
**************************************************************************************
Converts all the sequence A's to G's, case insensitive.

Required Parameters:
-f Full path for the fastq file or directory containing such. xxx.gz/.zip OK.
-s Optional, full path directory to save the converted files.

Example: java -Xmx2G -jar pathTo/Apps/ConvertFastqA2G -f /IllData/Fastq/ 

**************************************************************************************

**************************************************************************************
**                       Convert Fasta 2 GC Boolean: Aug 2008                       **
**************************************************************************************
Converts fasta file(s) into serialized boolean[]s where every G or C base is true and
all others are false. Will also work with xxx.binarySeq files.

Required Parameters:
-f Full path for the xxx.fasta file or directory containing such.

Example: java -Xmx2000M -jar pathTo/Apps/ConvertFasta2GCBoolean -f /affy/Fastas/

**************************************************************************************

**************************************************************************************
**                     Convert Fasta 2 GC Bar Graphs: April 2011                    **
**************************************************************************************
Converts fasta files into graph files containing a 1 over each C in a CpG context.
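Here is a minimal sketch of locating the marked bases. It assumes a "C in a CpG context" means a C immediately followed by a G on the given (plus) strand, with zero-based positions. The source does not say whether the app also marks the G, which is the C on the minus strand.

```python
def cpg_c_positions(seq: str) -> list:
    """Zero-based positions of every C immediately followed by a G (case insensitive)."""
    s = seq.upper()
    return [i for i in range(len(s) - 1) if s[i] == "C" and s[i + 1] == "G"]
```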

Required Parameters:
-f Full path name for the directory containing xxx.fasta(.gz/.zip OK).
-v Versioned genome (e.g. H_sapiens_Mar_2006), see UCSC Browser,
      http://genome.ucsc.edu/FAQ/FAQreleases.

Example: java -Xmx4G -jar pathTo/Apps/ConvertFasta2GCBarGraph -f /affy/Fastas/
      -v H_sapiens_Feb_2009

**************************************************************************************

**************************************************************************************
**                             Defined Region Bis Seq: Dec 2013                     **
**************************************************************************************
Takes PointData from two conditions (treatment and control), each parsed from converted
and non-converted C bisulfite sequencing data using the NovoalignBisulfiteParser, and
scores user defined regions for differential methylation using either a Fisher's exact
or chi-square test. A Benjamini & Hochberg correction is applied to convert the
p-values to FDRs. Data is only collected on Cs that meet the minimum read coverage
threshold in both datasets. The fraction differential methylation statistic is
calculated by taking the pseudomedian of all of the log2 ratios of the paired base
level fraction methylations in a given region. To examine particular mC contexts
(e.g. mCG), first filter your PointData using the ParsePointDataContexts app.
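The pseudomedian here is assumed to be the Hodges-Lehmann estimator, the median of all pairwise (Walsh) averages, which is the same quantity R's wilcox.test estimates. The sketch below works on a region's per-base log2 ratios. The source does not describe how zero fractions are handled before taking logs, so the input is simply a list of precomputed values.

```python
from statistics import median

def pseudomedian(values):
    """Hodges-Lehmann pseudomedian: the median of all pairwise (Walsh) averages,
    including each value paired with itself."""
    n = len(values)
    walsh = [(values[i] + values[j]) / 2 for i in range(n) for j in range(i, n)]
    return median(walsh)
```

A single large ratio moves the pseudomedian less than it moves the mean. For [0, 0, 10] the mean is about 3.33, while the pseudomedian is 2.5.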

Options:
-b A bed file of regions to score (tab delimited: chr start stop ...)
-s Save directory, full path.
-c Treatment converted PointData directories, full path, comma delimited. These should
       contain stranded chromosome specific xxx_-/+_.bar.zip files from the
       NovoalignBisulfiteParser. One can also provide a single directory that contains
       multiple PointData directories.
-C Control converted PointData directories, ditto. 
-n Treatment non-converted PointData directories, ditto. 
-N Control non-converted PointData directories, ditto.

Default Options:
-d Minimum per base read coverage, defaults to 5.
-r Full path to R, defaults to '/usr/bin/R'

Example: java -Xmx10G -jar pathTo/USeq/Apps/DefinedRegionBisSeq -c /Sperm/Converted
      -n /Sperm/NonConverted -C /Egg/Converted -N /Egg/NonConverted -s /Res/DRBS
      -b /Res/CpGIslands.bed 

**************************************************************************************

**************************************************************************************
**                     Defined Region Differential Seq:   Sept 2014                 **
**************************************************************************************
DRDS takes sorted bam files, one per replica, minimum one per condition, minimum two
conditions (e.g. treatment and control or a time course/ multiple conditions) and
identifies differentially expressed genes using DESeq2 or SAMseq. DESeq2's rLog
normalized count data is used to hierarchically cluster the samples. Differential
splicing is estimated using a chi-square test of independence. When testing only a
few genes or regions, append these onto a full gene table so that DESeq2 can
appropriately estimate the library size and replica variance.
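The differential splicing test can be pictured as a contingency table with one row per condition and one column per exon of a gene, each cell holding a read count. Below is a minimal sketch of the Pearson chi-square statistic and its degrees of freedom. It assumes exons and conditions with no reads are dropped first. The p-value then comes from the chi-square distribution, for example scipy.stats.chi2.sf(stat, df) if SciPy is available. How DRDS actually builds and filters its tables is not described here.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for a table of read
    counts with one row per condition and one column per exon."""
    col_totals = [sum(col) for col in zip(*table)]
    keep = [j for j, total in enumerate(col_totals) if total > 0]  # exons with reads
    table = [[row[j] for j in keep] for row in table]
    table = [row for row in table if sum(row) > 0]  # conditions with reads
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    if len(row_totals) < 2 or len(col_totals) < 2:
        raise ValueError("need at least two non-empty rows and columns")
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df
```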

Options:
-s Save directory.
-c Conditions directory containing one directory for each condition with one xxx.bam
       file per biological replica and their xxx.bai indexes. 3-4 reps recommended per
       condition. The BAM files should be sorted by coordinate using Picard's SortSam.
       All splice junction coordinates should be converted to genomic coordinates, see
       USeq's SamTranscriptomeParser.
-r Full path to R (version 3+) loaded with DESeq2, samr, and gplots, defaults to
       '/usr/bin/R', see http://www.bioconductor.org . Type 'library(DESeq2);
       library(samr); library(gplots)' in R to see if they are installed.
-u UCSC RefFlat or RefSeq gene table file, full path. Tab delimited, see RefSeq Genes
       http://genome.ucsc.edu/cgi-bin/hgTables, (uniqueName1 name2(optional) chrom
       strand txStart txEnd cdsStart cdsEnd exonCount (commaDelimited)exonStarts
       (commaDelimited)exonEnds). Example: ENSG00000183888 C1orf64 chr1 + 16203317
       16207889 16203385 16205428 2 16203317,16205000 16203467,16207889 . NOTE:
       this table should contain only ONE composite transcript per gene (e.g. use
       Ensembl genes NOT transcripts). Use the MergeUCSCGeneTable app to collapse
       transcripts. See http://useq.sourceforge.net/usageRNASeq.html for details.
-b (Or) a bed file (chr, start, stop,...), full path, See,
       http://genome.ucsc.edu/FAQ/FAQformat#format1
-g Genome version (e.g. H_sapiens_Mar_2006), see UCSC Browser,
      http://genome.ucsc.edu/FAQ/FAQreleases.

Advanced Options:
-m Mask overlapping gene annotations, recommended for well annotated genomes.
-x Max per base alignment depth, defaults to 50000. Genes containing such high
       density coverage are ignored.
-n Max number of alignments per read. Defaults to 1, unique. Assumes 'NH' tags have
      been set by processing raw alignments with the SamTranscriptomeParser.
-e Minimum number of alignments per gene-region per replica, defaults to 10.
-i Score introns instead of exons.
-p Perform a stranded analysis. Only collect reads from the same strand as the
      annotation.
-j Reverse stranded analysis.  Only collect reads from the opposite strand of the
      annotation. This setting should be used for Illumina's strand-specific
      dUTP protocol.
-k Second read's strand is flipped. Otherwise, assumes this was not done in the 
      SamTranscriptomeParser.
-t Don't delete temp files (R script, R results, Rout, etc.).
-a Run SAMseq in place of DESeq2.  This is only recommended with five or more
      replicates per condition.

Example: java -Xmx4G -jar pathTo/USeq/Apps/DefinedRegionDifferentialSeq -c
      /Data/TimeCourse/ESCells/ -s /Data/TimeCourse/DRDS -g H_sapiens_Feb_2009
     -u /Anno/mergedHg19EnsemblGenes.ucsc.gz

**************************************************************************************

**************************************************************************************
**                           Defined Region RNA Editing: April 2014                 **
**************************************************************************************
DRRE scores regions for the pseudomedian of the base fraction edits as well as the
probability that the observations occurred by chance using a permutation test based on
the chi-square goodness of fit statistic.

Options:
-b A bed file of regions to score (tab delimited: chr start stop ...)
-e Edited PointData directory from the RNAEditingPileUpParser.
       These should contain stranded chromosome specific xxx_-/+_.bar.zip files. One
       can also provide a single directory that contains multiple PointData
       directories. These will be merged when scanning.
-r Reference PointData directory from the RNAEditingPileUpParser. Ditto.
-a Minimum base read coverage, defaults to 5.
-t Run a stranded analysis, defaults to non-stranded.
-i Remove base fraction edits that are non-zero and represented by just one edited
       base.

Example: java -Xmx4G -jar pathTo/USeq/Apps/DefinedRegionRNAEditing -b hg19UTRs.bed
      -e /PointData/Edited -r /PointData/Reference

**************************************************************************************

**************************************************************************************
**                           Defined Region Scan Seqs: March 2011                   **
**************************************************************************************
DRSS takes chromosome specific PointData xxx.bar.zip files and extracts scores under
each region to calculate several statistics including a binomial p-value, Storey
q-value FDR, an empirical FDR, a p-value for strand skew, and a chi-square test of
independence between the exon read count distributions between treatment and control
data (a test for alternative splicing). Several measures of read counts are provided
including counts for each strand, a normalized log2 ratio, and RPKMs (# reads per kb
of interrogated region per total million mapped reads). If a gene table is provided,
scores under each exon are summed to give a whole gene summary. It is also recommended
to run a gene table of introns (see the ExportIntronicRegions app) to look for
intronic retention and novel transfrags/ exons. If splice junction bed files are
provided for treatment and control RNA-Seq data (see the NovoalignParser), splice
junctions will be scored for differential expression. This is an additional
calculation unrelated to the chi-square independence test. Lastly, if control
data is not provided, simple region sums are calculated.
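The RPKM definition above works out as reads / (region length in kb) / (total mapped reads in millions). Below is a minimal sketch; the function name and example numbers are illustrative only. For a gene, the read counts and lengths would first be summed over its exons.

```python
def rpkm(region_reads: int, region_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kb of interrogated region per million total mapped reads."""
    kilobases = region_length_bp / 1_000
    millions_mapped = total_mapped_reads / 1_000_000
    return region_reads / kilobases / millions_mapped
```

For example, 500 reads over a 2kb region in a library of 10 million mapped reads gives 500 / 2 / 10 = 25 RPKM.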

Options:
-s Save directory, full path.
-t Treatment PointData directories, full path, comma delimited. These should
       contain unshifted stranded chromosome specific xxx_-/+_.bar.zip files. One
       can also provide a single directory that contains multiple PointData
       directories.
-c Control PointData directories, ditto. 
-p Peak shift, average distance between + and - strand peaks for chIP-Seq data, see
       PeakShiftFinder. For RNA-Seq set to the smallest expected fragment size. Will
       be used to shift the PointData 3' by 1/2 the peak shift.
-r Full path to R loaded with Storey's q-value library, defaults to '/usr/bin/R'
       file, see http://genomics.princeton.edu/storeylab/qvalue/
-u UCSC RefFlat or RefSeq gene table file, full path. See,
       http://genome.ucsc.edu/cgi-bin/hgTables, (name1 name2(optional) chrom strand
       txStart txEnd cdsStart cdsEnd exonCount exonStarts exonEnds)
-b (Or) a bed file (chr, start, stop,...), full path, See,
       http://genome.ucsc.edu/FAQ/FAQformat#format1

Advanced Options:
-o Don't remove overlapping exons, defaults to filtering gene annotation for overlaps.
-i Score introns instead of exons.
-f Scan for just enriched regions, defaults to look for both. Only use with chIP-Seq
       datasets where the control is input. This turns on the empFDR estimation.
-d Treatment splice junction bed file(s) from the NovoalignParser, comma delimited,
       full path.
-e Control splice junction bed file(s), comma delimited, full path.
-m Minimum number of reads in associated gene before scoring splice junctions.
       Used in estimating the expected proportion of T and scaling the log2Ratio. 
       Defaults to 100.
-w Use read score probabilities (assumes scores are > 0 and <= 1), defaults to
       assigning 1 to each read score. Experimental.

Example: java -Xmx4G -jar pathTo/USeq/Apps/DefinedRegionScanSeqs -t
      /Data/PolIIRep1/,/Data/PolIIRep2/ -c /Data/Input1/,/Data/Input2/ -s
      /Data/PolIIResults -p 100 -b /Data/selectRegions.bed -f 

**************************************************************************************

**************************************************************************************
**                                DRDS Annotator: January 2014                      **
**************************************************************************************
This application annotates DefinedRegionDifferentialSeq xlsx files using Ensembl
BioMart tab-delimited annotation files. By default, Ensembl BioMart output files list
the Ensembl gene id in the first column and the Ensembl transcript id in the second
column. This application assumes these defaults. It matches the gene id in the
first column of the BioMart file to the name listed in the 'IGB HyperLink' column
found in the 'Analyzed Genes' tab of the DRDS xlsx output. All BioMart columns after
the transcript id column are added to the output file. The data is inserted between
the 'Alt Name' and locus columns in the 'Analyzed Genes' tab.

The biomart output files can have multiple annotation lines for each gene id.  
Currently, this app uses the first annotation line encountered.
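Here is a minimal sketch of the matching rule described above. It assumes the BioMart export has a header line and uses the default column order (gene id, transcript id, then annotations), and it keeps only the first line seen per gene id. load_first_annotations is a hypothetical helper, not part of DRDSAnnotator. It accepts any iterable of text lines, such as an open file.

```python
import csv

def load_first_annotations(lines):
    """Return (annotation column names, {gene id: annotation values}), keeping
    only the first annotation line encountered for each gene id."""
    reader = csv.reader(lines, delimiter="\t")
    header = next(reader)
    annotations = {}
    for fields in reader:
        if fields and fields[0] not in annotations:
            annotations[fields[0]] = fields[2:]  # columns after the transcript id
    return header[2:], annotations
```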


Required Arguments:

-i Input file. Path to DRDS xlsx output file you wish to annotate 
-a Annotation file. Path to biomart annotation file. 
-o Annotated output file. Path to the annotated output file

Example: java -Xmx4G -jar pathTo/USeq/Apps/DRDSAnnotator -i geneStats.xlsx 
               -a mm10.biomart.txt -o geneStats.ann.xlsx

**************************************************************************************

**************************************************************************************
**                          Enriched Region Maker: July 2013                        **
**************************************************************************************
ERM combines windows from ScanSeqs xxx.swi files into larger enriched or reduced
regions based on one or more scores. For each score index, you must provide a minimal
score. Adjacent windows that exceed the minimum score(s) are merged and the best
window scores applied to the region. If treatment and control PointData are provided,
the best 25bp peak within each region will be identified and each ER rescored. To
select for ERs with a 1% FDR and 2x enrichment above control, follow the example
assuming score indexes 1,2,4 correspond to QValFDR, EmpFDR, and 
Log2Ratio. Note, if you are performing a static analysis comparing chIP vs chIP,
don't set thresholds on the EmpFDR; it is disabled and all of its values are zero.
To print descriptions of the score indexes, complete the command line and skip the 
-i option. Lastly, FDRs and p-values are represented in USeq in a transformed state,
as -10Log10(FDR/p-val) where 13 = 5%, 20 = 1%, etc. To select for regions with an FDR of
less than 1% you would set a threshold of 20 for the QValFDR and, if running a static
analysis, the EmpFDR. 
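The score transform and the window merging can be sketched together. This simplified illustration uses a single score per window, whereas ERM can require several score indexes to pass at once. It represents windows as hypothetical (start, stop, score) tuples sorted by start and treats max_gap as the -g setting. It also assumes a window passes when its score is at least the minimum.

```python
from math import log10

def to_useq_score(p_or_fdr: float) -> float:
    """USeq's transformed representation of a p-value or FDR: -10*log10(x)."""
    return -10 * log10(p_or_fdr)

def merge_windows(windows, min_score, max_gap):
    """Merge (start, stop, score) windows, sorted by start, that pass min_score.

    A passing window joins the previous region when the bases between them are
    at most max_gap; each region keeps the best (highest) window score.
    """
    regions = []
    for start, stop, score in windows:
        if score < min_score:
            continue  # windows below the threshold never seed or extend a region
        if regions and start - regions[-1][1] <= max_gap:
            region_start, region_stop, best = regions[-1]
            regions[-1] = (region_start, max(region_stop, stop), max(best, score))
        else:
            regions.append((start, stop, score))
    return regions
```

For example, an FDR of 0.05 transforms to about 13.0 and 0.01 to 20, matching the thresholds quoted above.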

Options:
-f Full path file name for the serialized xxx.swi file from ScanSeqs, if a
      directory is specified, all xxx.swi files will be processed.
-s Minimal score(s), one for each score index, comma delimited, no spaces.
-i Score index(es), one for each minimum score.

Advanced Options:
-n Make a given number of ERs, one or more, comma delimited, no spaces. Uses score
      index 0.
-m Multiply scores by -1 to make reduced regions instead of enriched regions.
-r Remove windows that intersect a list of regions. Enter a full path tab delimited
      regions file (chr start stop). Coordinates are assumed to be zero based and
      stop inclusive. Useful for excluding regions from ER generation.
-b BP buffer to subtract and add to start and stops of regions used in filtering
      intersecting windows, defaults to 0.
-e Exclude entire ERs that intersect the -r regions, defaults to removing windows.
      This is more exclusive; rather than punching holes in ERs, it throws out
      the entire ER.
-g Max gap, defaults to the size of the window used in ScanSeqs.
-t Provide treatment PointData directories, full path, comma delimited to ID the peak
       center in each ER. These should contain the same unshifted stranded chromosome
       specific xxx_-/+_.bar.zip files used in ScanSeqs.
-c Control PointData directories, ditto. 
-p Full path to R, defaults to '/usr/bin/R', required for rescanning ERs.
-w Sub window size, defaults to 25bp.

Example: java -Xmx500M -jar pathTo/USeq/Apps/EnrichedRegionMaker -f /solexa/zeste.swi
      -i 1,2,4 -s 20,20,1 -w 50

**************************************************************************************

**************************************************************************************
**                          Eland Multi Parser: October 2008                        **
**************************************************************************************
Parses an Eland xxx.eland_multi.txt alignment file tabulating hits to each fasta entry.
Good for scoring hits to a transcriptome where every fasta entry represents a
different gene.

-f The full path directory/file name of your xxx.eland_multi.txt(.zip) file(s). Files
      will be merged.
-r Full path file name for saving the results.


Example: java -Xmx1500M -jar pathToUSeq/Apps/ElandMultiParser -f 
      /data/MultiFiles/ -r /data/transcriptomeResults.xls 

**************************************************************************************

**************************************************************************************
**                               ElandParser: May 2008                              **
**************************************************************************************
Splits and converts Eland Extended xxx_export.txt or xxx_sorted.txt files
into center position alignment scored binary xxx.bar files. Coordinates are in
interbase coordinates (zero based, stop excluded). These can be directly viewed in IGB.

-v Versioned genome (e.g. hg18, dm2, ce2, mm8), see UCSC Browser,
      http://genome.ucsc.edu/FAQ/FAQreleases.
-m Minimum alignment score, Phred scale, defaults to 13. Not used with stand alone.
-f The full path directory/file name of your xxx_export.txt(.zip/.gz) or
      xxx_sorted.txt(.zip/.gz) file(s).
-r Full path directory name for saving the results, defaults to the export.txt parent.
-s Shift centered position N bps 3' to accommodate chIP-seq fragment size. Stranded.
      Note, this is far less than 1/2 the expected fragment size; determine the best
      value by visual inspection of likely positives. Defaults to 0. If you plan on
      filtering your PointData, don't shift their positions, do it in the filter app.
-p Parse stand alone Eland output file.
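The stranded -s shift can be sketched as moving the alignment's center position toward the 3' end of the read: right on the + strand, left on the - strand. The sketch assumes interbase start coordinates and rounds the center down. The function name is hypothetical, and the parser's exact rounding may differ.

```python
def shifted_center(start: int, length: int, strand: str, shift: int = 0) -> int:
    """Center of an alignment moved `shift` bp toward the 3' end of the read.

    start is the interbase (zero based) start of the alignment; the center is
    rounded down. + strand reads move right, - strand reads move left.
    """
    center = start + length // 2
    if strand == "+":
        return center + shift
    if strand == "-":
        return center - shift
    raise ValueError(f"unknown strand: {strand!r}")
```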

Example: java -Xmx1500M -jar pathToUSeq/Apps/ElandParser -f /Solexa/Run7/
     -v H_sapiens_Mar_2006 -s 38 -r /Solexa/ParsedData/PolIII/

**************************************************************************************

**************************************************************************************
**                         Eland Sequence Parser: March 2009                        **
**************************************************************************************
Parses sequence information from Eland Extended alignment summary files. For every
base, sums the quality scores generating a G, A, T, and C track xxx.bar file for 
visualization in IGB.  Also generates a consensus track (1-fraction consensus) for
each base.

-f The full path directory/file name of your xxx_export.txt(.zip/.gz) or
      xxx_sorted.txt(.zip/.gz) file(s).
-r Full path directory name for saving the results.
-g Full path directory name containing fasta files for reference base calling
      (e.g. chr1.fasta, chr5.fasta).
-v Versioned genome (e.g. hg18, dm2, ce2, mm8), see UCSC Browser,
      http://genome.ucsc.edu/FAQ/FAQreleases.
-a Minimum alignment score, -10Log10(p-value), defaults to 13.
-c Minimum consensus score, -10Log10(p-value), defaults to 60.


Example: java -Xmx1500M -jar pathToUSeq/Apps/ElandSequenceParser -v hg18 -c 90
      -f /data/ExportFiles/ -r /data/Results -g /genomes/Hg18Fastas 

**************************************************************************************

**************************************************************************************
**                              Export Exons   Sept 2013                            **
**************************************************************************************
EE takes a UCSC Gene table and prints the exons to a bed file.

Parameters:
-g Full path file name for the UCSC Gene table.
-a Expand the size of each exon by X bp, defaults to 0.
-u Remove UTRs if present, defaults to including them.
-n Append exon numbers to the gene name field. This makes the bed file compatible
      with DRDS.

Example: java -Xmx1000M -jar pathTo/T2/Apps/ExportExons -g /user/Jib/ucscPombe.txt
      -a 50
**************************************************************************************

**************************************************************************************
**                        Export Intergenic Regions    May 2007                     **
**************************************************************************************
EIR takes a gff file and uses it to mask a boolean array. Parts of the boolean array
that are not masked are returned and represent intergenic sequences. Be sure to put in
a gff line at the end of each chromosome noting the last base so you capture the last
intergenic region (e.g. chr1 GeneDB lastBase 3600000 3600001 . + . lastBase). Base
coordinates are assumed to be stop inclusive, not interbase.

Parameters:
-g Full path file text for a gff file or directory containing such.
-t Base pairs to trim from the ends of each intergenic region, defaults to 0.
-m Minimum acceptable intergenic size, those smaller will be tossed, defaults to 60bp
-s Subtract one from the start and stop coordinates.

Example: java -Xmx1000M -jar pathTo/T2/Apps/ExportIntergenicRegions -s -m 100 -g
                 /user/Jib/GffFiles/Pombe/sanger.gff
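A minimal Python sketch of the masking idea described above, assuming the gff features
have already been parsed into stop-inclusive (start, stop) pairs for one chromosome.
The helper name intergenic_regions and the order "trim first, then apply the minimum
size" are assumptions, not EIR's actual code. Because the mask is only as long as the
largest stop seen, the trailing intergenic region is recovered only when a lastBase
feature is present.

```python
def intergenic_regions(features, trim=0, min_size=60):
    """Return unmasked runs as stop-inclusive (start, stop) pairs.

    features: stop-inclusive (start, stop) pairs for one chromosome. The mask is
    sized to the largest stop, so sequence past the last feature is never seen.
    """
    if not features:
        return []
    size = max(stop for _, stop in features) + 1
    masked = [False] * size
    for start, stop in features:
        for i in range(start, stop + 1):
            masked[i] = True

    regions = []
    i = 0
    while i < size:
        if masked[i]:
            i += 1
            continue
        j = i
        while j + 1 < size and not masked[j + 1]:
            j += 1  # extend the unmasked run
        # Assumed order: trim both ends, then apply the minimum size filter.
        start, stop = i + trim, j - trim
        if stop - start + 1 >= min_size:
            regions.append((start, stop))
        i = j + 1
    return regions
```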

**************************************************************************************

**************************************************************************************
**                         Export Intronic Regions    June 2007                     **
**************************************************************************************
EIR takes a UCSC Gene table and fetches the most conservative (smallest) intronic
regions. Base coordinates are assumed to be stop inclusive, not interbase.

Parameters:
-g Full path file text for the UCSC Gene table.
-m Minimum acceptable intron size, those smaller will be tossed, defaults to 60bp
-s Subtract one from the stop coordinates of your UCSC table to convert from interbase.

Example: java -Xmx1000M -jar pathTo/T2/Apps/ExportIntronicRegions -s -m 100 -g
                 /user/Jib/ucscPombe.txt

**************************************************************************************

**************************************************************************************
**                              Export Trimmed Genes    May 2012                    **
**************************************************************************************
ETG takes a UCSC Gene table and clips each gene back to the first intron closed by a
coding sequence exon. Thus the output includes all of the 5'UTRs. Genes with no
introns are removed.

Parameters:
-g Full path file text for the UCSC Gene table.
-u Print just UTRs, defaults to UTRs plus 1st CDS intron with flanking exon.
-i Print just 1st CDS intron with flanking exons.

Example: java -Xmx1000M -jar pathTo/T2/Apps/ExportTrimmedGenes -u -g 
      /user/Jib/ucscPombe.txt
**************************************************************************************

**************************************************************************************
**                           FetchGenomicSequences: Feb 2013                        **
**************************************************************************************
Given a file containing genomic coordinates, fetches and saves the sequence. Output
columns: chrom origStart origStop fetchedStart fetchedStop completeFetch seq.

-f Full path to a file or directory containing tab delimited chrom, start,
        stop text files. Interbase coordinates (zero based, stop excluded).
-s Full path directory text containing genomic fasta files. The fasta
        header defines the name of the sequence, not the file name.
-b Fetch flanking bases, defaults to 0. Will set start to zero or stop to last base if
        boundaries are exceeded.
-r Reverse complement fetched sequences, defaults to returning the + genomic strand.
-a Output fasta format.

Example: java -Xmx1000M -jar pathTo/T2/Apps/FetchGenomicSequences -f /data/miRNAs.txt
      -s /genomes/human/v35.1/ -b 5000 -r   
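The coordinate handling described for -b and -r can be sketched as below, assuming the
chromosome sequence is already loaded as a string. fetch is a hypothetical helper, and
reading completeFetch as "the full flanked range was available" is an assumption.

```python
# Complement table for reverse complementing; other characters pass through unchanged.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def fetch(chrom_seq, start, stop, flank=0, reverse=False):
    """Fetch [start - flank, stop + flank) in interbase coordinates.

    Returns (fetched_start, fetched_stop, complete_fetch, seq). Out-of-range
    boundaries are clipped to 0 or len(chrom_seq) and complete_fetch becomes False.
    """
    fetched_start = start - flank
    fetched_stop = stop + flank
    complete = True
    if fetched_start < 0:
        fetched_start, complete = 0, False
    if fetched_stop > len(chrom_seq):
        fetched_stop, complete = len(chrom_seq), False
    seq = chrom_seq[fetched_start:fetched_stop]
    if reverse:
        seq = seq.translate(COMPLEMENT)[::-1]  # minus strand, read 5' to 3'
    return fetched_start, fetched_stop, complete, seq


print(fetch("AACCGGTT", 2, 4, flank=3))  # (0, 7, False, 'AACCGGT')
```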


**************************************************************************************

**************************************************************************************
**                          Find Neighboring Genes:   Nov 2008                      **
**************************************************************************************
FNG takes a list of genes in UCSC Gene Table format and intersects them with a list of
regions finding the closest gene to each region as well as all of the genes that fall
within a given neighborhood. Distance is measured from the center of the region to the
transcription start site (the first base of the first exon). See the Tables link under
http://genome.ucsc.edu/ . Note, output coordinates are zero based, stop inclusive.

-g Full path file text for a tab delimited UCSC Gene Table (text chrom strand txStart
      txEnd cdsStart cdsEnd exonCount exonStarts exonEnds etc...) .
-p Full path file/directory text for tab delimited region list(s) (chr, start, stop) .
-b Size of neighborhood in bp, defaults to 10000.
-f Find genes that overlap the neighborhood regardless of distance to the TSS.
-c Only print closest genes.
-o Print neighbors on one line.

Example: java -jar pathTo/T2/Apps/FindNeighboringGenes -g /anno/hg17Ensembl.txt -p
      /affy/p53/finalPicks.txt -b 5000 -c
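A minimal sketch of the distance rule above, assuming genes are dicts parsed from the
UCSC table and that the minus-strand TSS is txEnd - 1 (UCSC txEnd is interbase). The
helper names are hypothetical, and the -f overlap mode is not modelled.

```python
def tss(gene):
    """0-based TSS: txStart on +, txEnd - 1 on - (UCSC txEnd is interbase)."""
    return gene["txStart"] if gene["strand"] == "+" else gene["txEnd"] - 1


def neighboring_genes(region, genes, neighborhood=10000):
    """Return (closest gene name, names of genes whose TSS lies within neighborhood)."""
    chrom, start, stop = region
    center = (start + stop) / 2
    # Distance from the region center to each TSS on the same chromosome, nearest first.
    hits = sorted(
        (abs(tss(g) - center), g["name"]) for g in genes if g["chrom"] == chrom
    )
    closest = hits[0][1] if hits else None
    neighbors = [name for dist, name in hits if dist <= neighborhood]
    return closest, neighbors
```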

**************************************************************************************

**************************************************************************************
**                           Find Overlapping Genes: Oct 2010                       **
**************************************************************************************
Finds overlapping genes that converge, diverge, or contain one another given a UCSC
gene table.

Options:
-u UCSC RefFlat or RefSeq gene table file, full path. See,
       http://genome.ucsc.edu/cgi-bin/hgTables, (name1 name2(optional) chrom strand
       txStart txEnd cdsStart cdsEnd exonCount exonStarts exonEnds). NOTE:
       this table should contain only one composite transcript per gene (e.g. Use
       Ensembl genes NOT transcripts. See MergeUCSCGeneTable app.). 

Example: java -Xmx4G -jar pathTo/USeq/Apps/FindOverlappingGenes -u 
      /data/zv8EnsemblGenes.ucsc.gz
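One way to read the three overlap classes, assuming interbase (strand, start, stop)
loci on one chromosome: convergent genes sit on opposite strands and overlap at their
3' ends, divergent genes overlap at their 5' ends. classify_overlap is a hypothetical
helper, and the "same-strand" label for partial same-strand overlaps is an addition
not named in the menu.

```python
def classify_overlap(a, b):
    """a, b: (strand, start, stop) interbase loci on the same chromosome."""
    (strand_a, start_a, stop_a), (strand_b, start_b, stop_b) = a, b
    if stop_a <= start_b or stop_b <= start_a:
        return None  # no overlap
    if (start_a <= start_b and stop_b <= stop_a) or (start_b <= start_a and stop_a <= stop_b):
        return "contain"
    if strand_a == strand_b:
        return "same-strand"  # not one of the menu's three classes
    left = a if start_a <= start_b else b
    # Opposite strands: if the leftmost gene is on +, the two 3' ends meet in the overlap.
    return "converge" if left[0] == "+" else "diverge"
```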

**************************************************************************************

**************************************************************************************
**                            Find Shared Regions: Dec 2011                         **
**************************************************************************************
Writes out a bed file of shared regions. Interbase coordinates.

Options:
-f First bed file (tab delimited: chr start stop ...).
-s Second bed file.
-r Results file.
-m Minimum length, defaults to 0.

Example: java -Xmx4G -jar pathTo/USeq/Apps/FindSharedRegions -f 
      /Res/firstBedFile.bed -s /Res/secondBedFile.bed -r /Res/common.bed -m 100
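A sketch of a shared-region calculation for one chromosome, assuming each bed file is
first flattened (overlapping regions merged) and then intersected with a two-pointer
sweep. This is an illustration of the idea, not the app's implementation; the helper
names are hypothetical.

```python
def flatten(regions):
    """Merge overlapping or abutting interbase (start, stop) regions."""
    merged = []
    for start, stop in sorted(regions):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], stop)
        else:
            merged.append([start, stop])
    return [tuple(r) for r in merged]


def shared_regions(first, second, min_length=0):
    """Interbase regions covered by both lists (one chromosome)."""
    a, b = flatten(first), flatten(second)
    i = j = 0
    shared = []
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        stop = min(a[i][1], b[j][1])
        if stop > start and stop - start >= min_length:
            shared.append((start, stop))
        # Advance whichever region ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return shared
```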

************************************************************************************

**************************************************************************************
**                            File Cross Filter: March 2008                         **
**************************************************************************************
FCF takes a column in the matcher file and uses it to select rows from other files.
Useful for pulling out and printing, in order, the rows that match the first file.

-m Full path file text for a tab delimited txt file to use in matching.
-f Full path file text to parse, can specify a directory too.
-i Ignore duplicate keys.
-a Column index containing the unique IDs in the matcher, defaults to 0.
-b Column index containing the unique IDs in the files to parse, defaults to 0.

Example: java -jar pathTo/T2/Apps/FileCrossFilter -f /extendedArrayData/ -m
     /old/originalArray.txt -a 2 -b 2

**************************************************************************************

**************************************************************************************
**                            File Match Joiner:  July 2008                         **
**************************************************************************************
FMJ loads a key file containing a column of unique entries and appends the matching
key line to each line of the parsed file(s) whose chosen column matches a key. Useful
for appending, say, chromosome coordinates to SNP ID data.

-k Full path file text for a tab delimited txt file (key) containing unique entries.
-f Ditto but for the file to parse, can specify a directory too.
-i Collapse duplicate keys.
-j Skip duplicate keys.
-a Column index containing the unique IDs in key, defaults to 0.
-b Column index containing the unique IDs in the parsed file(s), defaults to 0.
-p Print only matches.

Example: java -jar pathTo/Apps/FileMatchJoiner -k /snpChromMap.txt -f /SNPData/
     -b 2 -p
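The join described above can be sketched in a few lines, assuming the files are already
read into lists of tab delimited lines. match_join is a hypothetical helper; it keeps the
first occurrence of a duplicate key and does not model the -i or -j options.

```python
def match_join(key_lines, parse_lines, key_col=0, parse_col=0, only_matches=False):
    """Append the matching key line to each parsed line (tab delimited)."""
    keys = {}
    for line in key_lines:
        line = line.rstrip("\n")
        # Keeps the first occurrence of a key; -i/-j duplicate handling is not modelled.
        keys.setdefault(line.split("\t")[key_col], line)
    joined = []
    for line in parse_lines:
        line = line.rstrip("\n")
        fields = line.split("\t")
        key_line = keys.get(fields[parse_col]) if parse_col < len(fields) else None
        if key_line is not None:
            joined.append(line + "\t" + key_line)
        elif not only_matches:
            joined.append(line)  # unmatched lines pass through unless only_matches (-p)
    return joined
```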

**************************************************************************************

**************************************************************************************
**                             File Joiner: Feb 2005                                **
**************************************************************************************
Joins text files into a single file, avoiding the line concatenations that occur with
'cat * >> combine.txt' when a file lacks a trailing newline. Removes empty lines.

Required Parameters:
-f Full path text for the directory containing the text files.

Example: java -jar pathTo/T2/Apps/FileJoiner -f /affy/SplitFiles/
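A minimal sketch of joining files line by line so a missing trailing newline cannot fuse
two lines, assuming files are read in name order and that whitespace-only lines count
as empty. join_texts and join_files are hypothetical helpers, not the app's code.

```python
from pathlib import Path


def join_texts(texts):
    """Join file contents line by line, dropping empty (or whitespace-only) lines."""
    lines = [line for text in texts for line in text.splitlines() if line.strip()]
    return "\n".join(lines) + "\n" if lines else ""


def join_files(directory, output):
    """Join every file in directory (sorted by name) into output.

    Write output outside directory, otherwise a rerun would include it.
    """
    paths = sorted(p for p in Path(directory).iterdir() if p.is_file())
    Path(output).write_text(join_texts(p.read_text() for p in paths))
```

With cat, a first file ending in "y" without a newline followed by a file starting with
"z" yields "yz"; join_texts keeps them on separate lines.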

**************************************************************************************

**************************************************************************************
**                          File Splitter: July 2010                                **
**************************************************************************************
Splits a big text file into smaller files given a maximum number of lines.

Required Parameters:
-f Full path file text or directory for the text file(s) (.zip/.gz OK).
-n Maximum number of lines to place in each split file.
-g GZip split files.

Example: java -Xmx256M -jar pathTo/T2/Apps/FileSplitter -f /affy/bpmap.txt -n 50000
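A sketch of splitting by a maximum line count, assuming plain or .gz input (.zip is not
handled here) and a hypothetical output naming scheme of name_1.txt, name_2.txt, ... next
to the input file; the real app's file names may differ.

```python
import gzip
from pathlib import Path


def split_lines(lines, max_lines):
    """Yield consecutive chunks of at most max_lines lines."""
    if max_lines < 1:
        raise ValueError("max_lines must be >= 1")
    chunk = []
    for line in lines:
        chunk.append(line)
        if len(chunk) == max_lines:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


def split_file(path, max_lines, gzip_output=False):
    """Write path_1.txt, path_2.txt, ... beside the input (hypothetical naming)."""
    src = Path(path)
    opener = gzip.open if src.suffix == ".gz" else open
    out_opener = gzip.open if gzip_output else open
    with opener(src, "rt") as handle:
        for n, chunk in enumerate(split_lines(handle, max_lines), start=1):
            name = f"{src.stem}_{n}.txt" + (".gz" if gzip_output else "")
            with out_opener(src.with_name(name), "wt") as out:
                out.writelines(chunk)
```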

**************************************************************************************

**************************************************************************************
**                          FilterDuplicateAlignments: Mar 2010                     **
**************************************************************************************
Filters alignments for potential amplification bias by randomly selecting X alignments
from those with the same chromosome, position, and strand. Can also filter for the
best unique alignment based on read score. Column indexes start with 0.

Options:
-f Full path file/ directory text containing tab delimited alignments.
-r Full path directory for saving the results.
-c Alignment chromosome column index.
-p Alignment position column index, assumes this is always referenced to the + strand
-s Alignment sequence column index.
-t Strand column index.
-m Save up to this maximum number of identical alignments, defaults to random
        unique sequences.
-b Save only the best alignment per start position, defined by total score. Indicate
        which column contains the quality ASCII text.
-j Include splice junction chromosomes in filtering (e.g. chr7_101267544_101272320).
        Defaults to removing them. (Only keep for RNA-Seq datasets.)

Example: java -Xmx1500M -jar pathToUSeq/Apps/FilterDuplicateAlignments -f
     /Novoalign/Run7/ -r /Novoalign/Run7/DupFiltered/ -c 7 -p 8 -s 2 -b 3 -t 9

     Use -c 10 -p 12 -s 8 -t 13 -b 9 for ELAND sorted or export alignments.
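A sketch of the two filtering modes, assuming each alignment is a tuple whose first three
fields are chromosome, position and strand (the app itself reads these from the -c, -p
and -t columns). max_per_site stands in for -m, and the Phred+33 offset used to total the
-b quality text is an assumption (older Illumina/ELAND output used +64). Splice junction
chromosome handling (-j) is not modelled.

```python
import random
from collections import defaultdict


def filter_duplicates(alignments, max_per_site=1, seed=None):
    """Randomly keep at most max_per_site alignments per (chrom, position, strand)."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for aln in alignments:
        groups[(aln[0], aln[1], aln[2])].append(aln)
    kept = []
    for key in sorted(groups):
        group = groups[key]
        kept.extend(group if len(group) <= max_per_site else rng.sample(group, max_per_site))
    return kept


def best_per_site(alignments, quality_index, offset=33):
    """Keep the alignment with the highest summed base quality at each site."""
    best = {}
    for aln in alignments:
        key = (aln[0], aln[1], aln[2])
        score = sum(ord(c) - offset for c in aln[quality_index])  # assumed Phred+33
        if key not in best or score > best[key][0]:
            best[key] = (score, aln)
    return [best[k][1] for k in sorted(best)]
```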

**************************************************************************************

**************************************************************************************
**                                 Graph 2 Bed: Feb 2011                            **
**************************************************************************************
Converts USeq stair step and heat map graphs into region bed files using a threshold.
Do not use this with non USeq generated graphs. Won't work with bar or point graphs.

Options:
-p Point Data directories, full path, comma delimited. Should contain chromosome
       specific xxx.bar.zip or xxx_-_.bar files. May also point to a single such
       directory.
-t Threshold, regions exceeding it will be saved, defaults to 0.

Example: java -Xmx1500M -jar pathTo/USeq/Apps/Graph2Bed -t 9 -p /data/ReadCoverage
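A sketch of thresholding a stair step graph into regions, assuming each value holds from
its position up to the next position (interbase) and that "exceeding" means strictly
greater than the threshold. threshold_regions is a hypothetical helper, not USeq code.

```python
def threshold_regions(positions, values, threshold=0.0):
    """Return interbase (start, stop) runs where the stair step value > threshold.

    Assumes value i spans [positions[i], positions[i + 1]); the last point only
    closes the final step.
    """
    regions = []
    start = None
    for i in range(len(positions) - 1):
        if values[i] > threshold:
            if start is None:
                start = positions[i]  # open a region
        elif start is not None:
            regions.append((start, positions[i]))  # close at the first sub-threshold step
            start = None
    if start is not None:
        regions.append((start, positions[-1]))
    return regions
```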

**************************************************************************************

**************************************************************************************
**                        Filter Intersecting Regions: Oct 2013                     **
**************************************************************************************
Flattens the mask regions and uses them to split the target file(s) into intersecting
and non intersecting regions based on the minimum fraction of intersection.

Options:
-m Full path file text for the masking bed file (tab delim: chr start stop ...).
-b Full path file text for the bed file to split into intersecting and non
        intersecting regions. Can also point to a directory of files to split.
-g (Or) Full path file text for the gff/ gtf file to split into intersecting and non
        intersecting regions. Can also point to a directory of files to split.
-i Minimum fraction of each split region required to score as an intersection with
        the flattened mask, defaults to 1x10^-1074 (effectively any overlap).

Example: java -Xmx4000M -jar pathTo/Apps/FilterIntersectingRegions -i 0.5
        -m /ArrayDesigns/repMskedDesign.bed -b /ArrayDesigns/novoMskedDesign.bed
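A sketch of the flatten-then-split logic for one chromosome, assuming interbase bed
regions. A region is counted as intersecting when the fraction of its bases covered by
the flattened mask is positive and at least min_fraction; the default of 0.0 here stands
in for the app's tiny default. The helper names are hypothetical.

```python
def flatten(regions):
    """Merge overlapping or abutting interbase (start, stop) regions."""
    merged = []
    for start, stop in sorted(regions):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], stop)
        else:
            merged.append([start, stop])
    return [tuple(r) for r in merged]


def fraction_covered(region, flat_mask):
    """Fraction of region's bases covered by a flattened (non-overlapping) mask."""
    start, stop = region
    covered = sum(max(0, min(stop, m_stop) - max(start, m_start)) for m_start, m_stop in flat_mask)
    return covered / (stop - start)


def split_by_mask(mask, regions, min_fraction=0.0):
    """Return (intersecting, non_intersecting), preserving input order."""
    flat = flatten(mask)  # flattening prevents double counting overlapping mask regions
    hits, misses = [], []
    for region in regions:
        frac = fraction_covered(region, flat)
        (hits if frac > 0 and frac >= min_fraction else misses).append(region)
    return hits, misses
```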

************************************************************************************

**************************************************************************************
**                          Filter Point Data: Oct 2012                             **
**************************************************************************************
FPD drops or saves observations from PointData that intersect a list of regions
      (e.g. repeats, interrogated regions).

Options:
-p Point Data directories, full path, comma delimited. These should contain
      chromosome specific xxx.bar.zip files. 
-r Full path file text for a tab delimited text file containing regions to use in
      filtering the intersecting data (chr start stop ..., interbase coordinates).
-i Select data that intersects the list of regions, defaults to selecting data that
      doesn't intersect.
-a Acceptable intersection, fraction, defaults to 0.5
-n Just calculate the number of observations after filtering, don't save any data.

Example: java -Xmx1500M -jar pathTo/USeq/Apps/FilterPointData -p /data/PointData 
      -r /repeats/hg18RepeatMasker.bed -a 0.75

**************************************************************************************

**************************************************************************************
**                         Generate Overlap Stats: Dec 2012                         **
**************************************************************************************
Merges proper paired alignments that pass a variety of checks and thresholds. Only
unambiguous pairs will be merged. Increases base calling accuracy in overlap and helps
avoid non-independent variant observations and other double counting issues. Identical
overlapping bases are assigned the higher quality scores. Disagreements are resolved
toward the higher quality base. If too close in quality, then the quality is set to 0.
Be certain your input bam/sam file(s) are sorted by query name, NOT coordinate. 

Options:
-f The full path file or directory containing raw xxx.sam(.gz/.zip OK) or .bam
      file(s) of paired alignments. Multiple files will be merged.

Default Options:
-a Maximum alignment score (AS:i: tag). Defaults to 120, smaller numbers are more
      stringent. Approx 30pts per mismatch for novoalignments.
-q Minimum mapping quality score, defaults to 13, larger numbers are more stringent.
      Set to 0 if processing splice junction indexed RNASeq data.
-r The second paired alignment's strand is reversed. Defaults to not reversed.
-d Maximum acceptable base pair distance for merging, defaults to 5000.
-m Don't cross check read mate coordinates, needed for merging repeat matches. Defaults
      to checking.
-l Output file name. Write merging statistics to file instead of standard output.

Example: java -Xmx1500M -jar pathToUSeq/Apps/GenerateOverlapStats -f /Novo/Run7/
     -d 10000
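The per-base overlap rules described above (agreement keeps the higher quality,
disagreement goes to the higher quality base, too-close qualities are zeroed) can be
sketched as below. min_diff is a hypothetical cutoff for "too close in quality", and
keeping the winner's quality on a clear disagreement is an assumption; the menu states
neither.

```python
def merge_base(b1, q1, b2, q2, min_diff=5):
    """Resolve one overlapping position of a read pair.

    min_diff is a hypothetical cutoff for "too close in quality"; the menu does
    not state the value the app uses.
    """
    if b1 == b2:
        return b1, max(q1, q2)  # agreement: keep the higher quality
    winner = (b1, q1) if q1 >= q2 else (b2, q2)
    if abs(q1 - q2) < min_diff:
        return winner[0], 0  # disagreement, qualities too close: zero the quality
    return winner  # disagreement: take the higher quality base


def merge_overlap(seq1, quals1, seq2, quals2, min_diff=5):
    """Merge the overlapping segment of two mates already aligned base for base."""
    merged = [merge_base(b1, q1, b2, q2, min_diff)
              for b1, q1, b2, q2 in zip(seq1, quals1, seq2, quals2)]
    return "".join(b for b, _ in merged), [q for _, q in merged]
```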

**************************************************************************************

**************************************************************************************
**                                 Gr2Bar: Nov 2006                                 **
**************************************************************************************
Converts xxx.gr.zip files to chromosome specific bar files.

-f The full path directory/file text for your xxx.gr.zip file(s).
-v Genome version (e.g. hg17, H_sapiens_Mar_2006), get from UCSC Browser,
      http://genome.ucsc.edu/FAQ/FAQreleases
-o Orientation of GR file.  If not specified, orientation is left as '.'

Example: java -Xmx1500M -jar pathTo/T2/Apps/Gr2Bar -f /affy/GrFiles/ -v hg17 

**************************************************************************************

**************************************************************************************
**                               Inosine Predict: Aug 2010                          **
**************************************************************************************
IP estimates the likelihood of ADAR RNA editing using the multiplicative 4L,4R model
described in Eggington et al. 2010.

Options:
-f Multi fasta file containing sequence(s) to score.
-m Matrix scoring file.
-p Print an example matrix.
-o Don't include the opposite strand.
-s Save directory, defaults to parent of the fasta file.
-z Name of a zip archive to create containing the results.

Example: java -Xmx2G -jar pathTo/USeq/Apps/InosinePredict -m 
    ~/ADARMatrix/hADAR1-D.matrix.txt -f ~/SeqsToScore/candidates.fasta.gz
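A sketch of a multiplicative 4L,4R score: each candidate A is scored as the product of
position-specific weights for the 4 bases on each side. The dict-of-dicts matrix layout
(offset to base to weight) is hypothetical; use -p to see the app's real matrix format.
Only the given strand is scored here; the app also scores the opposite strand unless -o.

```python
OFFSETS = (-4, -3, -2, -1, 1, 2, 3, 4)  # 4 bases left (4L) and right (4R) of the A


def score_site(seq, i, matrix):
    """Multiply the matrix weights of the 4 bases on each side of seq[i] (an A)."""
    if seq[i] != "A" or i < 4 or i + 4 >= len(seq):
        return None  # not an A, or lacking full 4L,4R context
    score = 1.0
    for offset in OFFSETS:
        score *= matrix[offset].get(seq[i + offset], 0.0)
    return score


def score_plus_strand(seq, matrix):
    """Score every A on the given strand that has full 4L,4R context."""
    seq = seq.upper()
    scores = {}
    for i in range(len(seq)):
        s = score_site(seq, i, matrix)
        if s is not None:
            scores[i] = s
    return scores
```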

**************************************************************************************

**************************************************************************************
**                            Intersect Lists: Dec 2008                             **
**************************************************************************************
IL intersects two lists (e.g. of genes) and, using randomization, calculates the
significance of the intersection and the fold enrichment over random. Note, duplicate
items are filtered from each list prior to analysis.

-a Full path file text for list A (or directory containing), one item per line.
-b Full path file text for list B (or directory containing), one item per line.
-t The total number of unique items from which A and B were drawn.
-n Number of permutations, defaults to 1000.
-p Print the intersection sets (common, unique to A, unique to B) to screen.

Example: java -Xmx1500M -jar pathTo/Apps/IntersectLists -a /Data/geneListA.txt -b 
     /Data/geneListB.txt -t 28356 -n 10000
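A sketch of the permutation test, assuming random integer IDs drawn from range(total)
stand in for the t items. The +1 pseudo-count in the p-value is a common convention
assumed here, not necessarily what IL does; intersect_lists is a hypothetical helper.

```python
import random


def intersect_lists(a, b, total, permutations=1000, seed=None):
    """Return (observed overlap, permutation p-value, fold enrichment over random)."""
    a, b = set(a), set(b)  # duplicates are removed before analysis
    observed = len(a & b)
    rng = random.Random(seed)
    hits = 0
    random_sum = 0
    for _ in range(permutations):
        ra = set(rng.sample(range(total), len(a)))
        rb = rng.sample(range(total), len(b))
        n = sum(1 for x in rb if x in ra)  # overlap of two random lists
        random_sum += n
        if n >= observed:
            hits += 1
    mean_random = random_sum / permutations
    p_value = (hits + 1) / (permutations + 1)
    fold = observed / mean_random if mean_random else float("inf")
    return observed, p_value, fold
```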

**************************************************************************************

**************************************************************************************
**                       Intersect Key With Regions: July 2012                      **
**************************************************************************************
IKWR intersects lists of genomic regions (chrom start stop(inclusive)) with a key,
assuming the lists are sorted from most confident to least confident. Multiple hits to
the same key region are ignored.

-k Full path file text for the key genomic regions file, tab delimited (chr start
      stop(inclusive)).
-r Full path file text or directory containing your region files to score.
-g Max gap, defaults to -1. A max gap of 0 = regions must abut, negative values force
      overlap (e.g. -1 = 1bp overlap, be careful not to exceed the length of the smaller
      region), positive values enable gaps (e.g. 1 = 1bp gap).
-s Subtract 1 from end coordinates.  Use for interbase.

Example: java -Xmx1500M -jar pathTo/Apps/IntersectKeyWithRegions -k /data/key.txt
      -r /data/HitLists/ 

**************************************************************************************

**************************************************************************************
**                            Intersect Regions: May 2012                           **
**************************************************************************************
IR intersects lists of regions (tab delimited: chrom start stop(inclusive)). Random
regions can also be used to calculate a p-value and fold enrichment.

-f First regions files, a single file, or a directory of files.
-s Second regions files, a single file, or a directory of files.
-g Max gap, defaults to 0. A max gap of 0 = regions must at least abut or overlap,
      negative values force overlap (ie -1= 1bp overlap, be careful not to exceed the
      length of the smaller region), positive values enable gaps (ie 1=1bp gap).
-e Score intersections where second regions are entirely contained by first regions.
-r Make random regions matched to the second regions file(s) and intersect with the
      first. Enter either a bed file or a full path directory that contains chromosome
      specific interrogated regions files (e.g. named chr1, chr2, ...; chrom start stop).
-c Match GC content of second regions file(s) when selecting random regions, rather
      slow. Provide a full path directory text containing chromosome specific genomic
      sequences.
-n Number of random region trials, defaults to 1000.
-w Write intersections and differences.
-x Write paired intersections.
-p Print length distribution histogram for gaps between first and closest second.
-q Parameters for histogram, comma delimited list, no spaces:
       minimum length, maximum length, number of bins.  Defaults to -100, 2400, 100.

Example: java -Xmx1500M -jar pathTo/Apps/IntersectRegions -f /data/miRNAs.txt
      -s /data/DroshaLists/ -g 500 -n 10000 -r /data/InterrogatedRegions/
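The max gap convention used by -g (here and in IntersectKeyWithRegions) can be written
as a single formula on stop-inclusive regions: gap = max(starts) - min(stops) - 1, so
abutting regions give 0, a 1bp overlap gives -1 and a 1bp gap gives 1. The helper names,
and counting second regions that hit any first region, are illustrative assumptions.

```python
def gap(a, b):
    """Gap in bp between stop-inclusive regions a and b; negative means overlap."""
    return max(a[0], b[0]) - min(a[1], b[1]) - 1


def intersects(a, b, max_gap=0):
    """True when the gap between a and b does not exceed max_gap."""
    return gap(a, b) <= max_gap


def count_hits(first, second, max_gap=0):
    """Number of second regions that intersect at least one first region."""
    return sum(1 for s in second if any(intersects(f, s, max_gap) for f in first))
```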


**************************************************************************************

**************************************************************************************
**                           Kegg Pathway Enrichment:  Aug 2009                     **
**************************************************************************************
KPE looks for overrepresentation of genes from a user's list in KEGG pathways using a
random permutation test. Several files are needed from http://www.genome.jp/kegg .
Gene names must be in Ensembl Gene notation and begin with ENSG.

Options:
-e Full path file text for a KeggGeneIDs : EnsemblGeneIDs file (e.g. Human 
      ftp://ftp.genome.jp/pub/kegg/genes/organisms/hsa/hsa_ensembl-hsa.list)
-p Full path file text for a KeggPathwayIDs : TextDescription file (e.g. Human 
      ftp://ftp.genome.jp/pub/kegg/pathway/map_title.tab)
-g Full path file text for a KeggGeneIDs : KeggPathwayIDs file (e.g. Human 
      ftp://ftp.genome.jp/pub/kegg/pathway/organisms/hsa/hsa_gene_map.tab)
-a Full path file text for the list of all interrogated Ensembl genes (e.g. ENSG00...),
      one gene per line.
-s Full path file text for your select gene list.
-n Number of random iterations, defaults to 10000

Example: java -Xmx1500M -jar pathTo/USeq/Apps/KeggPathwayEnrichment -e 
      /Kegg/hsa_ensembl-hsa.list -p /Kegg/map_title.tab -g /Kegg/hsa_gene_map.tab
      -a /HCV/ensemblGenesWith20OrMoreReads.txt -s /HCV/upRegInHCV_Norm.txt

**************************************************************************************

**************************************************************************************
**                              MaqSnps2Bed: June 2009                              **
**************************************************************************************
Converts a Maq snp text file (1 based coordinates) into a bed file (interbase 
      coordinates). Also writes out an Alleler formatted text file.

-f Full path file text to the file or directory containing Maq snp txt files.

Example: java -Xmx1000M -jar path2/USeq/Apps/MaqSnps2Bed -f /data/maqSnpFile.txt


**************************************************************************************

**************************************************************************************
**                        Make Splice Junction Fasta: Nov 2010                     **
**************************************************************************************

DEPRECATED, don't use! See the MakeTranscriptome app.
MSJF creates a multi fasta file containing sequences representing all possible linear
splice junctions. The header of each fasta entry is chr_endPosExonA_startPosExonB. The
length of sequence collected from each junction is 2x the radius. Be very careful
about the coordinate system used in the gene table to define the start and stop of
exons. UCSC uses interbase coordinates and this app assumes them. Check a few of the
junctions to be sure the correct splices were made. All junction sequences are from the
top/ plus strand of the genome; they are not reverse complemented. Exon sequence
shorter than the radius will be padded with Ns.

Options:
-f Fasta file directory, should contain chromosome specific xxx.fasta files.
-u UCSC gene table file, full path. See, http://genome.ucsc.edu/cgi-bin/hgTables
-s Sequence length radius.
-r Results fasta file, full path.

Example: java -Xmx1500M -jar pathTo/USeq/Apps/MakeSpliceJunctionFasta -s 32 
      -f /Genomes/Hg18/Fastas/ -u /Anno/Hg18/ucscKnownGenes.txt -r
      /Genomes/Hg18/Fastas/hg18_32_splices.fasta 

************************************************************************************

**************************************************************************************
**                           Make Transcriptome:  June 2012                         **
**************************************************************************************
Takes a UCSC refFlat table of transcripts and generates two multi fasta files of
transcripts and splices (known and theoretical). All possible unique splice junctions
are created given the exons from each gene's transcripts. For some genes this is
computationally intractable and their theoretical splices will be incomplete. Junction
sequences read through small exons into the next upstream or downstream exon, so keep
the sequence length radius to a minimum to reduce the number of junctions. Overlapping
exons are assumed to be mutually exclusive. All sequence is from the plus genomic
strand, no reverse complementation. Interbase coordinates. This app can take a very
long time to run; break up the gene table by chromosome and run on a cluster. 

To incorporate additional splice-junctions, add a new annotation line containing two
exons representing the junction to the table. If needed, set the -s option to skip
duplicates. 

Options:
-f Fasta file directory, one per chromosome (e.g. chrX.fasta or chrX.fa, .gz/.zip OK)
-u UCSC RefFlat gene table file, full path. See,
       http://genome.ucsc.edu/cgi-bin/hgTables, (geneName transcriptName chrom strand
       txStart txEnd cdsStart cdsEnd exonCount (commaDelimited)exonStarts
       (commaDelimited)exonEnds). Example: ENSG00000183888 ENST00000329454 chr1 + 
       16203317 16207889 16203385 16205428 2 16203317,16205000 16203467,16207889 .
-r Sequence length radius. Set to the read length - 4bp.
-n Max number splices per transcript, defaults to 100000.
-m Max minutes to process each gene's splices before interrupting, defaults to 10.
-s Skip subsequent occurrences of splices with the same coordinates. Memory intensive.

Example: java -Xmx4G -jar pathTo/USeq/Apps/MakeTranscriptome -f /Genomes/Hg18/Fastas/
      -u /Anno/Hg18/ensemblGenes.txt.ucsc -r 46 -s 

************************************************************************************

**************************************************************************************
**                        Mask Exons In Fasta Files: June 2011                      **
**************************************************************************************
Replaces the exonic sequence with Ns.

Options:
-f Fasta file directory, one per chromosome (e.g. chrX.fasta or chrX.fa, .gz/.zip OK)
-u UCSC RefFlat gene table file, full path. See,
       http://genome.ucsc.edu/cgi-bin/hgTables, (geneName transcriptName chrom strand
       txStart txEnd cdsStart cdsEnd exonCount (commaDelimited)exonStarts
       (commaDelimited)exonEnds). Example: ENSG00000183888 ENST00000329454 chr1 + 
       16203317 16207889 16203385 16205428 2 16203317,16205000 16203467,16207889 .
-s Save directory, full path.

Example: java -Xmx4G -jar pathTo/USeq/Apps/MaskExonsInFastaFiles -f 
      /Genomes/Hg18/Fastas/ -u /Anno/Hg18/ensemblTranscripts.txt.ucsc -s 
      /Genomes/Hg18/MaskedFastas/

************************************************************************************

**************************************************************************************
**                       Mask Regions In Fasta Files: Dec 2011                      **
**************************************************************************************
Replaces the sequence within the regions (or, with -r, outside them) with Ns.
Interbase coordinates.

Options:
-f Fasta file directory, one per chromosome (e.g. chrX.fasta or chrX.fa, .gz/.zip OK)
-b Bed file of regions to mask.
-s Save directory, full path.
-r Mask sequence not in regions, reverse mask.

Example: java -Xmx4G -jar pathTo/USeq/Apps/MaskRegionsInFastaFiles -f 
      /Genomes/Hg18/Fastas/ -b /Anno/Hg18/badRegions.bed -s 
      /Genomes/Hg18/MaskedFastas/
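A minimal sketch of region masking on one chromosome string, assuming interbase bed
regions; reverse=True mirrors the -r option by masking everything outside the regions.
mask_regions is a hypothetical helper, not the app's code.

```python
def mask_regions(seq, regions, reverse=False):
    """Replace bases in interbase (start, stop) regions with N.

    With reverse=True (like -r), bases outside the regions are masked instead.
    """
    inside = [False] * len(seq)
    for start, stop in regions:
        for i in range(max(0, start), min(len(seq), stop)):
            inside[i] = True
    # Mask a base when its inside flag differs from reverse.
    return "".join("N" if inside[i] != reverse else base for i, base in enumerate(seq))
```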

************************************************************************************

**************************************************************************************
**                               MaxEntScanScore3: Nov 2013                         **
**************************************************************************************
Implementation of MaxEntScan's score3 algorithm for scoring human 3' (acceptor) splice
sites. See Yeo and Burge 2004, http://www.ncbi.nlm.nih.gov/pubmed/15285897 

Options:
-s Full path directory name containing the me2x3acc1-9 splice model files. See
     USeq/Documentation/ or http://genes.mit.edu/burgelab/maxent/download/ 
-t Full path file name for 23mer test sequences, GATCgatc only, one per line. Fasta OK.

Example: java -Xmx10G -jar pathTo/USeq/Apps/MaxEntScanScore3 -s ~/MES/splicemodels -t
     ~/MES/seqsToTest.fasta 

**************************************************************************************

**************************************************************************************
**                               MaxEntScanScore5: Nov 2013                         **
**************************************************************************************
Implementation of MaxEntScan's score5 algorithm for scoring human 5' (donor) splice
sites. See Yeo and Burge 2004, http://www.ncbi.nlm.nih.gov/pubmed/15285897 

Options:
-s Full path directory containing the splice5sequences and me2x5 splice model files.
     See USeq/Documentation/ or http://genes.mit.edu/burgelab/maxent/download/ 
-t Full path file name for 9mer test sequences, GATCgatc only, one per line. Fasta OK.

Example: java -Xmx10G -jar pathTo/USeq/Apps/MaxEntScanScore5 -s ~/MES/splicemodels -t
     ~/MES/seqsToTest.fasta 

**************************************************************************************

**************************************************************************************
**                           MergeExonMetrics: June 2013                            **
**************************************************************************************
This app simply merges the output from several metrics html files.


Required:
-f Directory containing metrics dictionary files and an image directory.
-o Name of the combined metrics file

Example: java -Xmx1500M -jar pathTo/USeq/Apps/MergeExonMetrics -f metrics -o 9908_metrics 
**************************************************************************************

**************************************************************************************
**                          MergePairedSamAlignments: Dec 2012                      **
**************************************************************************************
Merges proper paired alignments that pass a variety of checks and thresholds. Only
unambiguous pairs will be merged. Increases base calling accuracy in overlap and helps
avoid non-independent variant observations and other double counting issues. Identical
overlapping bases are assigned the higher quality scores. Disagreements are resolved
toward the higher quality base. If too close in quality, then the quality is set to 0.
Be certain your input bam/sam file(s) are sorted by query name, NOT coordinate. 

Options:
-f The full path file or directory containing raw xxx.sam(.gz/.zip OK)/.bam file(s) of
      paired alignments that are sorted by query name (standard novoalign output).
      Multiple files will be merged.

Default Options:
-s Save file, defaults to that inferred by -f. If an xxx.sam extension is provided,
      the alignments won't be sorted by coordinate or saved as a bam file.
-a Maximum alignment score (AS:i: tag). Defaults to 120, smaller numbers are more
      stringent. Approx 30pts per mismatch for novoalignments.
-q Minimum mapping quality score, defaults to 13, larger numbers are more stringent.
      Set to 0 if processing splice junction indexed RNASeq data.
-r The second paired alignment's strand is reversed. Defaults to not reversed.
-d Maximum acceptable base pair distance for merging, defaults to 5000.
-m Don't cross check read mate coordinates, needed for merging repeat matches. Defaults
      to checking.
-o Merge all proper paired alignments. Defaults to only merging those that overlap.
-k Skip merging paired alignments. Defaults to merging. Useful for testing effect of
      merging on downstream analysis.

Example: java -Xmx1500M -jar pathToUSeq/Apps/MergePairedSamAlignments -f /Novo/Run7/
     -c -s /Novo/STPParsedBams/run7.bam -d 10000 
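
The per-base overlap rule described above (identical bases keep the higher quality,
disagreements go to the higher quality base, near ties get quality 0) can be sketched
as follows. This is an illustration, not MergePairedSamAlignments' source. The class
names are hypothetical, minQualDiff is a hypothetical stand-in for "too close in
quality" (the menu gives no value), and both the base kept on a near tie and the
quality kept for a resolved disagreement are assumptions.

```java
/**
 * Sketch of resolving one overlapping reference position in a merged read pair.
 * Not USeq's implementation; minQualDiff and the tie behavior are assumptions.
 */
final class OverlapResolver {

    /** A called base and its phred quality score. */
    static final class Call {
        final char base;
        final int qual;

        Call(char base, int qual) {
            this.base = base;
            this.qual = qual;
        }

        @Override
        public String toString() {
            return base + "/" + qual;
        }
    }

    /** Resolves the bases read 1 and read 2 report at the same reference position. */
    static Call resolve(Call r1, Call r2, int minQualDiff) {
        if (Character.toUpperCase(r1.base) == Character.toUpperCase(r2.base)) {
            // Identical bases: keep the base and assign the higher quality score.
            return new Call(r1.base, Math.max(r1.qual, r2.qual));
        }
        if (Math.abs(r1.qual - r2.qual) < minQualDiff) {
            // Disagreement with similar qualities: keep read 1's base (assumption), quality 0.
            return new Call(r1.base, 0);
        }
        // Disagreement: resolve toward the higher quality base, keeping its score (assumption).
        return r1.qual > r2.qual ? r1 : r2;
    }

    public static void main(String[] args) {
        System.out.println(resolve(new Call('A', 30), new Call('A', 20), 5)); // A/30
        System.out.println(resolve(new Call('A', 35), new Call('G', 12), 5)); // A/35
        System.out.println(resolve(new Call('A', 22), new Call('G', 20), 5)); // A/0
    }
}
```

Resolving each position independently keeps the rule easy to test; a full merge would
also need to walk both reads' CIGAR strings to line up overlapping reference positions.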

**************************************************************************************

**************************************************************************************
**                             Merge Point Data: Jan 2011                           **
**************************************************************************************
Efficiently merges PointData, collapsing by position and possibly strand. Identical
position scores are either summed or converted into counts. DO NOT use this app on
PointData that will be part of a primary chIP/RNA-seq analysis.  It is only for
bis-seq and visualization purposes.

Options:
-p Point Data directories, full path, comma delimited. Should contain chromosome
       specific xxx.bar.zip or xxx_-_.bar files. Alternatively, provide one directory
       containing multiple PointData directories.
-s Save directory, full path.
-c Don't replace scores with hit count, just sum existing scores.
-m Merge strands

Example: java -Xmx1500M -jar pathTo/USeq/Apps/MergePointData -p
      /Data/Ets1Rep1/,/Data/Ets1Rep2/ -s /Data/MergedEts1 -m 
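
The collapse step can be pictured with a sorted map keyed by position. This minimal
sketch assumes one strand's positions and scores are already loaded into arrays (it
does not read xxx.bar files), and the class name is hypothetical. By default each
position's value becomes its hit count; with sumScores set (as with -c) the existing
scores are summed. Merging strands (-m) would amount to pooling both strands' arrays
before collapsing.

```java
import java.util.TreeMap;

/** Collapses identical PointData positions (sketch, not MergePointData's source). */
final class PointCollapser {

    /**
     * When sumScores is true (like -c) the scores at a position are summed;
     * otherwise each position's value becomes the number of hits at that position.
     */
    static TreeMap<Integer, Double> collapse(int[] positions, double[] scores, boolean sumScores) {
        if (positions.length != scores.length) {
            throw new IllegalArgumentException("positions and scores differ in length");
        }
        TreeMap<Integer, Double> merged = new TreeMap<>(); // keeps positions sorted
        for (int i = 0; i < positions.length; i++) {
            double value = sumScores ? scores[i] : 1.0;
            merged.merge(positions[i], value, Double::sum);
        }
        return merged;
    }

    public static void main(String[] args) {
        int[] pos = {5, 2, 5};
        double[] sc = {1.5, 2.0, 0.5};
        System.out.println(collapse(pos, sc, false)); // {2=1.0, 5=2.0}
        System.out.println(collapse(pos, sc, true));  // {2=2.0, 5=2.0}
    }
}
```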

**************************************************************************************

**************************************************************************************
**                             Merge Regions: May 2009                              **
**************************************************************************************
Flattens tab delimited bed files (chr start stop ...). Assumes interbase coordinates.

Options:
-d Directory containing bed files.

Example: java -Xmx4000M -jar pathTo/Apps/MergeRegions -d /Anno/TilingDesign/
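
Flattening overlapping regions is a standard sort-and-sweep interval merge. The sketch
below is not MergeRegions' source; it assumes the bed lines are already parsed into
chr/start/stop and uses hypothetical class names. With interbase (half-open)
coordinates, regions that only abut (one's stop equals the next one's start) are kept
separate here. Whether MergeRegions joins abutting regions is not stated; changing `<`
to `<=` in the overlap test would join them.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Sort-and-sweep flattening of bed regions (sketch, not MergeRegions' source). */
final class RegionFlattener {

    /** One bed region in interbase (half-open) coordinates. */
    static final class Region {
        final String chr;
        final int start;
        final int stop;

        Region(String chr, int start, int stop) {
            this.chr = chr;
            this.start = start;
            this.stop = stop;
        }

        @Override
        public String toString() {
            return chr + "\t" + start + "\t" + stop;
        }
    }

    /** Returns non-overlapping regions sorted by chromosome name, then start. */
    static List<Region> flatten(List<Region> regions) {
        List<Region> sorted = new ArrayList<>(regions);
        sorted.sort(Comparator.comparing((Region r) -> r.chr).thenComparingInt(r -> r.start));

        List<Region> merged = new ArrayList<>();
        Region current = null;
        for (Region r : sorted) {
            // Interbase: r overlaps current only if it starts before current stops.
            if (current != null && current.chr.equals(r.chr) && r.start < current.stop) {
                current = new Region(current.chr, current.start, Math.max(current.stop, r.stop));
            } else {
                if (current != null) {
                    merged.add(current);
                }
                current = r;
            }
        }
        if (current != null) {
            merged.add(current);
        }
        return merged;
    }

    public static void main(String[] args) {
        List<Region> in = List.of(
                new Region("chr1", 5, 20), new Region("chr1", 0, 10),
                new Region("chr1", 20, 30), new Region("chr2", 1, 5));
        flatten(in).forEach(System.out::println);
    }
}
```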

************************************************************************************

**************************************************************************************
**                           Merge UCSC Gene Table: Feb 2013                        **
**************************************************************************************
Merges transcript models that share the same gene name (in column 0). Maximizes exons,
minimizes introns. Assumes interbase coordinates.

Options:
-u UCSC RefFlat or RefSeq gene table file, full path. See,
       http://genome.ucsc.edu/cgi-bin/hgTables, (geneName name2(optional) chrom strand
       txStart txEnd cdsStart cdsEnd exonCount exonStarts exonEnds). 

Example: java -Xmx4G -jar pathTo/USeq/Apps/MergeUCSCGeneTable -u 
      /data/zv8EnsemblGenes.ucsc.gz

**************************************************************************************

**************************************************************************************
**                        Methylation Array Scanner: May 2014                       **
**************************************************************************************
MAS takes paired or non-paired sample PointData representing beta values (0-1) from
arrays and scores regions with enriched/reduced signal using a sliding window
approach. A B&H corrected Wilcoxon signed rank test (or rank sum test for non-paired
data), the pseudo median of the log2(treat/control) ratios (or log2(pseT/pseC) for
non-paired data), and a permutation test FDR are calculated for each window. Use the
EnrichedRegionMaker to identify enriched and reduced regions by picking thresholds
(e.g. -i 0,1 -s 0.2,13). MAS generates several data tracks for visualization in IGB,
including paired sample bp log2 ratios, window level Wilcoxon FDRs, and window level
pseudomedian log2 ratios. Note, non-paired analyses are very underpowered and require
more than 30 observations per window to see any significant FDRs.

Required Options:
-s Path to a directory for saving the results.
-d Path to a directory containing individual sample PointData directories, each of
      which should contain chromosome split bar files (e.g. chr1.bar, chr2.bar, ...)
-t Names of the treatment sample directories in -d, comma delimited, no spaces.
-c Ditto but for the control samples, the ordering is critical and describes how to
      pair the samples for a paired analysis.

Advanced Options:
-n Run a non-paired analysis where t and c are treated as groups and pooled.
-w Window size, defaults to 1000.
-o Minimum number of observations in window, defaults to 10.
-p Minimum pseudomedian log2 ratio for estimating the permutation FDR, defaults to 0.2
-r Number of permutations, defaults to 5

Example: java -Xmx4G -jar pathTo/USeq/Apps/MethylationArrayScanner -s ~/MAS/Res
     -d ~/MAS/Bar/ -t Early1,Early2,Early3 -c Late1,Late2,Late3
     -w 1500
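
For reference, the Benjamini & Hochberg step-up adjustment named above can be sketched
as follows. This is the textbook procedure, not MAS source code, and the class name is
hypothetical. It returns plain adjusted p-values, without any -10Log10 scaling the app
may apply to its reported FDRs.

```java
import java.util.Arrays;

/** Textbook Benjamini & Hochberg step-up adjustment (sketch, not MAS source). */
final class BenjaminiHochberg {

    /** Returns adjusted p-values in the same order as the input. */
    static double[] adjust(double[] pValues) {
        int n = pValues.length;
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) {
            order[i] = i;
        }
        // Sort indices so that pValues[order[0]] is the smallest p-value.
        Arrays.sort(order, (a, b) -> Double.compare(pValues[a], pValues[b]));

        double[] adjusted = new double[n];
        double runningMin = 1.0;
        // Walk from the largest p-value down so the adjusted values stay monotone.
        for (int rank = n; rank >= 1; rank--) {
            int idx = order[rank - 1];
            double q = pValues[idx] * n / rank;
            runningMin = Math.min(runningMin, q);
            adjusted[idx] = runningMin;
        }
        return adjusted;
    }

    public static void main(String[] args) {
        double[] p = {0.01, 0.04, 0.03, 0.20};
        System.out.println(Arrays.toString(adjust(p)));
    }
}
```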

**************************************************************************************

**************************************************************************************
**                    Methylation Array Defined Region Scanner: July 2013           **
**************************************************************************************
MADRS takes paired sample PointData representing beta values (0-1) from arrays and
a list of regions to score for differential methylation using a B&H corrected Wilcoxon
signed rank test and pseudo median of the paired log2(treat/control) ratios. Pairs
containing a zero value are ignored. It generates a spreadsheet of statistics for each
region. If a non-paired analysis is selected, a Wilcoxon rank sum test and
log2(pseT/pseC) are calculated on each region. Note this is a very underpowered test
requiring >30 observations to see any significant FDRs.

Required Options:
-b A bed file of regions to score (tab delimited: chr start stop ...)
-d Path to a directory containing individual sample PointData directories, each of
      which should contain chromosome split bar files (e.g. chr1.bar, chr2.bar, ...)
-t Names of the treatment sample directories in -d, comma delimited, no spaces.
-c Ditto but for the control samples, the ordering is critical and describes how to
      pair the samples for a paired analysis.
-o Minimum number of paired observations per region, defaults to 3.
-z Skip printing regions with fewer than the minimum number of observations.
-n Run a non-paired analysis where t and c are treated as groups and pooled. Uneven
      numbers of t and c are allowed.

Example: java -Xmx4G -jar pathTo/USeq/Apps/MethylationArrayDefinedRegionScanner 
     -v H_sapiens_Feb_2009 -d ~/MASS/Bar/ -t Early1,Early2,Early3
     -c Late1,Late2,Late3 
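
The paired statistic can be illustrated with the Hodges-Lehmann pseudomedian, the
median of all pairwise Walsh averages of the log2(treat/control) ratios. That MADRS
uses exactly this estimator is an assumption, not something the menu states; the
sketch does follow the menu in ignoring pairs that contain a zero. The class and
method names are hypothetical.

```java
import java.util.Arrays;

/** Hodges-Lehmann pseudomedian of paired log2 ratios (sketch; estimator choice assumed). */
final class PairedPseudoMedian {

    static double log2(double x) {
        return Math.log(x) / Math.log(2.0);
    }

    /** Median of all Walsh averages of log2(treat/control), skipping pairs with a zero. */
    static double pseudoMedianLog2Ratio(double[] treat, double[] control) {
        if (treat.length != control.length) {
            throw new IllegalArgumentException("treat and control must be paired");
        }
        double[] ratios = new double[treat.length];
        int n = 0;
        for (int i = 0; i < treat.length; i++) {
            if (treat[i] == 0.0 || control[i] == 0.0) {
                continue; // pairs containing a zero value are ignored
            }
            ratios[n++] = log2(treat[i] / control[i]);
        }
        if (n == 0) {
            return Double.NaN;
        }
        // Walsh averages (r_i + r_j) / 2 for every i <= j.
        double[] walsh = new double[n * (n + 1) / 2];
        int k = 0;
        for (int i = 0; i < n; i++) {
            for (int j = i; j < n; j++) {
                walsh[k++] = (ratios[i] + ratios[j]) / 2.0;
            }
        }
        Arrays.sort(walsh);
        int m = walsh.length;
        return (m % 2 == 1) ? walsh[m / 2] : (walsh[m / 2 - 1] + walsh[m / 2]) / 2.0;
    }

    public static void main(String[] args) {
        double[] early = {0.8, 0.4, 0.5, 0.0};
        double[] late = {0.4, 0.4, 0.25, 0.3};
        System.out.println(pseudoMedianLog2Ratio(early, late)); // 0.75
    }
}
```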

**************************************************************************************

**************************************************************************************
**                       Microsatellite Counter: Jan 2014                           **
**************************************************************************************
MicrosatelliteCounter identifies and counts microsatellite repeats in MiSeq fastq
files. This iteration of the software requires you to specify the primers used in the
sequencing project. It automatically finds the most likely microsatellite by examining
all possible repeat units of length 1 through 10 and selecting the longest repeat by
total length, not by repeat unit. Two output files are generated: the first lists
primer statistics (currently only reads with both primers are used), the second lists
repeat data. Note that the input files are fastq sequences that were merged using a
program such as PEAR.


Required Arguments:

-f Merged fastq file. Path to the merged fastq file. We currently suggest using PEAR
       to merge fastq sequences.
-p Primer file. Path to the primer reference file. This file lists each primer used
       in the sequencing project with three fields: NAME, PRIMER1, and PRIMER2.
-n Sample name. This string will be appended to the output file names.
-d Directory. Output directory. Output files will be written to this directory.
-b Require both primers. Both primers must be identified in order to move forward
       with the analysis.

Example: java -Xmx4G -jar pathTo/USeq/Apps/MicrosatelliteCounter -f Merged.fastq 
      -r PrimerReference.txt -p 10511X1.primer.txt -o 10511X1.repeat.txt
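
The repeat search described above can be sketched as a brute-force scan over unit
lengths 1 through 10 that keeps the repeat with the greatest total length in bp. This
is an illustration under stated assumptions, not the app's code: primer matching is
omitted (the input is assumed to be the sequence between the primers), at least two
tandem copies are required, ties go to the shorter unit, and the class names are
hypothetical.

```java
/** Brute-force longest tandem repeat search (sketch; primer handling omitted). */
final class LongestRepeatFinder {

    /** A tandem repeat: the unit, its number of copies, and its 0-based start. */
    static final class Repeat {
        final int start;
        final String unit;
        final int copies;

        Repeat(int start, String unit, int copies) {
            this.start = start;
            this.unit = unit;
            this.copies = copies;
        }

        int length() {
            return unit.length() * copies;
        }

        @Override
        public String toString() {
            return "(" + unit + ")x" + copies + " at " + start;
        }
    }

    /** Returns the repeat with the greatest total bp length, or null if none has 2+ copies. */
    static Repeat findLongest(String seq, int maxUnitLength) {
        String s = seq.toUpperCase();
        Repeat best = null;
        for (int unitLen = 1; unitLen <= maxUnitLength; unitLen++) {
            for (int start = 0; start + unitLen <= s.length(); start++) {
                int copies = 1;
                // Extend while the next unitLen bases repeat the unit at start.
                while (start + (copies + 1) * unitLen <= s.length()
                        && s.regionMatches(start, s, start + copies * unitLen, unitLen)) {
                    copies++;
                }
                if (copies < 2) {
                    continue; // a single copy is not a tandem repeat
                }
                // Strictly greater keeps the shorter unit (and earlier start) on ties.
                if (best == null || copies * unitLen > best.length()) {
                    best = new Repeat(start, s.substring(start, start + unitLen), copies);
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(findLongest("GGTCACACACACAGG", 10)); // (CA)x5 at 3
    }
}
```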

**************************************************************************************

**************************************************************************************
**                            MiRNA Correlator: March 2014                          **
**************************************************************************************
Generates a spreadsheet to use in comparing changing miRNA levels to changes in gene
expression.

Options:
-r Results file.
-a All miRNA name file (single column of miRNA names).
-m MiRNA data (two columns: miRNA name, miRNA log2Rto).
-t Gene target to miRNA data (two columns: gene target name, miRNA name).
-e Gene expression data (three columns: gene name, log2Rto, FDR).
-f Don't print the gene expression FDR value in the spreadsheet.

Example: java -Xmx4G -jar pathTo/USeq/Apps/MiRNACorrelator -m miRNA_CLvsMOR.txt -a
      allMiRNANamesNoPs.txt -t targetGene2MiRNA.txt -e geneExp_CLvsMOR.txt -r results.xls

**************************************************************************************

**************************************************************************************
**                        Multiple Replica Scan Seqs: May 2014                      **
**************************************************************************************
MRSS uses a sliding window and Anders' DESeq negative binomial p-value -> Benjamini &
Hochberg AdjP statistics to identify enriched and reduced regions in a genome. Both
treatment and control PointData sets are required, each with one or more biological
replicas. MRSS generates window level differential count tracks for the AdjP and
normalized log2Ratio as well as a binary window object xxx.swi file for downstream use
by the EnrichedRegionMaker. MRSS also makes use of DESeq's variance corrected count
data to cluster your biological replicas. Given R's poor memory management, running
DESeq requires lots of RAM, 64bit R, and 1-3 hrs.

Options:
-s Save directory, full path.
-t Treatment replica PointData directories, full path, comma delimited, no spaces,
       one per biological replica. Use the PointDataManipulator app to merge same
       replica and technical replica datasets. Each directory should contain stranded
       chromosome specific xxx_-/+_.bar.zip files. Alternatively, provide one
       directory that contains multiple biological replica PointData directories.
-c Control replica PointData directories, ditto. 
-r Full path to 64bit R loaded with DESeq library, defaults to '/usr/bin/R' file, see
       http://www-huber.embl.de/users/anders/DESeq/ . Type 'library(DESeq)' in
       an R terminal to see if it is installed.
-p Peak shift, average distance between + and - strand peaks for chIP-Seq data, see
       PeakShiftFinder or set it to 100bp. For RNA-Seq set it to 0. It will be used
       to shift the PointData by 1/2 the peak shift.
-w Window size, defaults to the peak shift. For chIP-Seq data, a good alternative 
       is the peak shift plus the standard deviation, see the PeakShiftFinder app.
       For RNA-Seq data, set this to 100-250.

Advanced Options:
-m Minimum number of reads in a window, defaults to 15
-d Don't delete temp files

Example: java -Xmx4G -jar pathTo/USeq/Apps/MultipleReplicaScanSeqs -t
      /Data/PolIIRep1/,/Data/PolIIRep2/ -c /Data/Input1/,/Data/Input2/ -s
      /Data/PolIIResults/ -p 150 -w 250 -b 

**************************************************************************************

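**************************************************************************************
**                               MultiSampleVCFFilter                               **
**************************************************************************************
Required: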
-v Full path to a sorted multi sample vcf file (xxx.vcf/xxx.vcf.gz).
-p Full path to the output VCF (xxx.vcf/xxx.vcf.gz). Specifying xxx.vcf.gz will
      compress and index the VCF using tabix (set -t too).

Optional:
-f Print out failing records, defaults to printing those passing the filters.
-a Fail records where no sample passes the sample thresholds.
-i Fail records where the original FILTER field is not 'PASS' or '.'.
-b Filter by genotype flags. -n, -u and -l must be set.
-n Sample names ordered by category.
-u Number of samples in each category.
-l Requirement flags for each category. All samples that pass the specified filters
      must meet the flag requirements, or the variant isn't reported. At least one
      sample in each group must pass the specified filters, or the variant isn't
      reported.
         a) 'W' : homozygous common
         b) 'H' : heterozygous
         c) 'M' : homozygous rare
         d) '-W' : not homozygous common
         e) '-H' : not heterozygous
         f) '-M' : not homozygous rare
-e Strict genotype matching. If this is selected, records with no-call samples
      or samples falling below either the minimum sample genotype quality (-g) or
      the minimum sample read depth (-r) won't be reported. Only samples listed in
      -n are checked.
-d Minimum record QUAL score, defaults to 0, recommend >= 20.
-g Minimum sample genotype quality GQ, defaults to 0, recommend >= 20.
-r Minimum sample read depth DP, defaults to 0, recommend >= 10.
-s Print sample names and exit.
-t Path to tabix.

Example: java -Xmx10G -jar pathTo/USeq/Apps/MultiSampleVCFFilter
      -v DEMO.passing.vcf.gz -p DEMO.intersection.vcf.gz -b
      -n SRR504516,SRR776598,SRR504515,SRR504517,SRR504483 -u 2,2,1 -l M,H,-M

**************************************************************************************

**************************************************************************************
**                        Novoalign Bisulfite Parser: Dec 2013                      **
**************************************************************************************
Parses Novoalign -b2 and -b4 single and paired bisulfite sequence alignment files into
PointData file formats. Generates several summary statistics on converted and non-
converted C contexts. Flattens overlapping reads in a pair to call consensus bps.
Note: for paired read RNA-Seq data, first run the alignments through the
SamTranscriptomeParser.

Options:
-a Alignment file or directory containing novoalignments in SAM/BAM
      (xxx.sam(.zip/.gz OK) or xxx.bam) format. Multiple files are merged.
-f Fasta file directory, chromosome specific xxx.fa/.fasta(.zip/.gz OK) files.
-s Save directory.
-v Versioned Genome (e.g. H_sapiens_Mar_2006), see UCSC Browser,
      http://genome.ucsc.edu/FAQ/FAQreleases.

Default Options:
-p Print bed file parsed data.
-x Maximum alignment score. Defaults to 300, smaller numbers are more stringent.
-q Minimum mapping quality score. Defaults to 13, bigger numbers are more stringent.
      This is a phred-scaled posterior probability that the mapping position of the read
      is incorrect. For RNASeq data, set this to 0.
-b Minimum base quality score for reporting a non/converted C, defaults to 13.
-c Minimum base quality score for reporting an overlapping non/converted C not found
      in the other pair, defaults to 13.
-d Remove duplicate reads prior to generating PointData. Defaults to not removing
      duplicates.

Example: java -Xmx25G -jar pathToUSeq/Apps/NovoalignBisulfiteParser -x 240 -a
      /Novo/Run7/ -f /Genomes/Hg19/Fastas/ -v H_sapiens_Feb_2009 -s /Novo/Run7/NBP 

**************************************************************************************

**************************************************************************************
**                         Novoalign Indel Parser: June 2010                        **
**************************************************************************************
Parses Novoalign alignment xxx.txt(.zip/.gz) files for consensus indels, something
currently not supported by the maq apps. Generates a consensus indel allele file,
interbase coordinates, for running through the Alleler application. Also creates two
bed files for the insertions and deletions.

Options:
-f The full path directory/file name of your Novoalign xxx.txt(.zip or .gz) file(s).
-r Full path directory for saving the results.
-p Minimum alignment posterior probability (-10Log10(prob)) of being incorrect,
      defaults to 13 (0.05). Larger numbers are more stringent.
-b Minimum affected indel base quality score(s), ditto, defaults to 13.
-u Minimum number of unique reads covering indel, defaults to 2.

Example: java -Xmx1500M -jar pathToUSeq/Apps/NovoalignIndelParser -f /Novo/Run7/
     -r /Novo/Run7/indelAlleleTable.txt -p 20 -b 20 -u 3 
**************************************************************************************

**************************************************************************************
**                            Novoalign Parser: Jan 2011                            **
**************************************************************************************
Parses Novoalign xxx.txt(.zip/.gz) files into center position binary PointData xxx.bar
files, xxx.bed files, and if appropriate, a splice junction bed file. For the latter,
create a gene regions bed file and run it through the MergeRegions application to
collapse overlapping transcripts. We recommend using the following settings while
running Novoalign 'novoalign -r0.2 -q5 -d yourDataBase -f your_prb.txt | grep '>chr' >
yourResultsFile.txt'. NP works with native, colorspace, and miRNA novoalignments. 

Options:
-v Versioned Genome (e.g. H_sapiens_Mar_2006), see UCSC Browser,
      http://genome.ucsc.edu/FAQ/FAQreleases.
-f The full path directory/file name of your Novoalign xxx.txt(.zip or .gz) file(s).
-r Full path directory name for saving the results.
-p Posterior probability threshold (-10Log10(prob)) of being incorrect, defaults to 13
      (0.05). Larger numbers are more stringent. The parsed scores are delogged and
      converted to 1-prob.
-q Alignment score threshold, smaller numbers are more stringent, defaults to 60
-c Chromosome prefix, defaults to '>chr'.
-i Ignore strand when making splice junctions.
-g (Optional) Full path gene region bed file (chr start stop...) containing gene
      regions to use in scaling intersecting splice junctions.
-s Just print alignment stats, don't save any data.

Example: java -Xmx1500M -jar pathToUSeq/Apps/NovoalignParser -f /Novo/Run7/
     -v H_sapiens_Mar_2006 -p 20 -q 30 -r /Novo/Run7/mRNASeq/ -i -g
     /Anno/Hg18/mergedUCSCKnownGenes.bed 
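
The -10Log10(prob) scale used by -p (and by the quality thresholds of the other
Novoalign parsers) converts as shown below; a threshold of 13 corresponds to an error
probability of about 0.05. The last method mirrors the menu's description of delogging
scores and converting them to 1-prob. It is a sketch of the arithmetic with a
hypothetical class name, not the parser's code.

```java
/** Phred-scale arithmetic for -10Log10(prob) thresholds (sketch). */
final class PhredScores {

    /** Phred value Q = -10 * log10(p) for an error probability p. */
    static double toPhred(double errorProb) {
        return -10.0 * Math.log10(errorProb);
    }

    /** Inverse transform: p = 10^(-Q / 10). */
    static double toErrorProb(double phred) {
        return Math.pow(10.0, -phred / 10.0);
    }

    /** "Delogged" score, 1 - p, as the -p option describes. */
    static double toOneMinusProb(double phred) {
        return 1.0 - toErrorProb(phred);
    }

    public static void main(String[] args) {
        for (double q : new double[] {13, 20, 30}) {
            System.out.printf("Q=%.0f  p=%.4f  1-p=%.4f%n", q, toErrorProb(q), toOneMinusProb(q));
        }
    }
}
```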

**************************************************************************************

**************************************************************************************
**                        Novoalign Paired Parser: January 2009                     **
**************************************************************************************
Parses Novoalign paired alignment files xxx.txt(.zip/.gz) into xxx.bed format.

Options:
-f The full path directory/file name of your Novoalign xxx.txt(.zip or .gz) file(s).
-e Exclude half matches with a high quality unmatched pair, defaults to keeping them.
-m Maximum size for paired reads mapping to the same chromosome, defaults to 100000.
-s Splice junction radius, defaults to 34. See the MakeSpliceJunctionFasta app.

Example: java -Xmx1500M -jar pathToUSeq/Apps/NovoalignPairedParser -f /Novo/Run7/
 

**************************************************************************************

**************************************************************************************
**                              Oligo Tiler: Oct 2009                               **
**************************************************************************************
OT tiles oligos across genomic regions returning their forward and reverse sequences.
Won't tile oligos with non GATC characters, case insensitive. Replaces non GATC chars
in offset regions with 'a'. Note, the defaults are set for generating a 60 mer Agilent
specific tiling microarray design where the first 10bp of the 3' end are buried in the
matrix and the effective oligo length is 50bp. Adjust accordingly for other platforms.

Options:
-f Fasta file directory, should contain chromosome specific xxx.fasta files.
-r Regions file to tile (tab delimited: chr start stop ...) interbase coordinates.
-o Effective oligo size, defaults to 50.
-s Spacing to place oligos, defaults to 25.
-t Three prime offset, defaults to 10.
-m Minimum size of region to tile, defaults to 20.
-a Print oligo FASTA instead of Agilent eArray text formatted results.
-c Tile CpG (spacing not used, see max gap option).
-g Max gap between adjacent CpGs to include in same oligo, defaults to 8.
-e Split export files by strand instead of alternating strand.
-b Replace the 3' end of oligos with the human 11-nullomer 'ccgatacgtcg'. The first
       ~10bp don't contribute to hybridization on Agilent arrays.

Example: java -Xmx4000M -jar pathTo/Apps/OligoTiler -s 40 -f /Genomes/Hg18/Fastas/ 
     -r /Designs/cancerArray.bed -p -a 
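
The tiling arithmetic implied by the -o, -s, -t and -m options can be sketched as
below. This is an assumption-laden illustration, not OligoTiler's algorithm, and the
class name is hypothetical: it tiles the forward strand only, steps by the spacing
until the next effective oligo would pass the region stop (so regions shorter than the
effective oligo yield nothing), and appends the 3' offset beyond each effective oligo.
CpG tiling (-c), strand alternation, and sequence extraction are not modeled.

```java
import java.util.ArrayList;
import java.util.List;

/** Forward-strand oligo tiling arithmetic (sketch; edge handling assumed). */
final class OligoTiling {

    /**
     * Returns {start, effectiveStop, synthesizedStop} interbase coordinates for each
     * oligo tiled across [regionStart, regionStop).
     */
    static List<int[]> tile(int regionStart, int regionStop, int oligoSize, int spacing,
                            int threePrimeOffset, int minRegionSize) {
        List<int[]> oligos = new ArrayList<>();
        if (regionStop - regionStart < minRegionSize) {
            return oligos; // region too small to tile (like -m)
        }
        // Step by the spacing until the next effective oligo would pass the region stop.
        for (int s = regionStart; s + oligoSize <= regionStop; s += spacing) {
            int effectiveStop = s + oligoSize;
            // Assumption: the buried 3' offset is appended beyond the effective oligo.
            oligos.add(new int[] {s, effectiveStop, effectiveStop + threePrimeOffset});
        }
        return oligos;
    }

    public static void main(String[] args) {
        // Menu defaults -o 50, -t 10, -m 20, with the example's spacing of 40 (-s 40).
        for (int[] o : tile(1000, 1120, 50, 40, 10, 20)) {
            System.out.println(o[0] + "\t" + o[1] + "\t" + o[2]);
        }
    }
}
```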

************************************************************************************

**************************************************************************************
**                        Overdispersed Region Scan Seqs: May 2012                  **
**************************************************************************************

WARNING: this application is deprecated and no longer maintained, use the
DefinedRegionDifferentialSeq app instead!

ORSS takes bam alignment files and extracts reads under each region or gene's exons to
calculate several statistics. It uses Simon Anders' DESeq R package and its negative
binomial p-value test to control for overdispersion. A Benjamini-Hochberg FDR
correction is used to control for multiple testing. DESeq is run with and without
variance outlier filtering. A chi-square test of independence between the exon read
count distributions is used to score alternative splicing. Several read count measures
are provided, including counts for each replica, FPKMs (fragments per kilobase of
interrogated region per million mapped reads), and DESeq's variance adjusted counts
(use these for clustering, correlation, and other distance type analyses). If replicas
are provided, either the smallest all pair log2Ratio is reported (default) or the
pseudomedian. Several results files are written: two spreadsheets, one containing all
of the genes and one containing those that pass the thresholds, as well as egr, bed12,
and useq region files for visualization in genome browsers.

Required Options:
-s Save directory.
-t Treatment directory containing one xxx.bam file with xxx.bai index per biological
       replica. The BAM files should be sorted by coordinate and have passed Picard
       validation. Use the SamTranscriptomeParser to convert your aligned transcriptome
       data to genomic coordinates.
-c Control directory, ditto. 
-u UCSC RefFlat or RefSeq Gene table file, full path. See,
       http://genome.ucsc.edu/cgi-bin/hgTables, (name1 name2(optional) chrom strand
       txStart txEnd cdsStart cdsEnd exonCount exonStarts exonEnds). WARNING!!!!!!
       This table should contain only one composite transcript per gene. Use the
       MergeUCSCGeneTable app to collapse Ensembl transcripts downloaded from UCSC in
       RefFlat format.
-b (Or) a bed file (chr, start, stop,...), full path, See,
       http://genome.ucsc.edu/FAQ/FAQformat#format1
-v Versioned Genome (e.g. H_sapiens_Mar_2006, D_rerio_Jul_2010), see UCSC Browser,
      http://genome.ucsc.edu/FAQ/FAQreleases. 

Advanced/ Default Options:
-o Don't remove overlapping exons, defaults to filtering gene annotation for overlaps.
-i Score introns instead of exons.
-a Data is stranded. Only collect reads from the same strand as the annotation.
-f Minimum FDR threshold, defaults to 10 (-10Log10(FDR=0.1))
-l Minimum absolute log2 ratio threshold, defaults to 1 (2x)
-e Minimum number mapping reads per region, defaults to 20
-d Don't delete temp files used by DESeq
-p Use a pseudo median log2 ratio in place of the smallest all pair log2 ratios for
      scoring the degree of differential expression when replicas are present.
      Recommended for experiments with 4 or more replicas.
-r Full path to R loaded with DESeq library, defaults to '/usr/bin/R' file, see
       http://www-huber.embl.de/users/anders/DESeq/ . Type 'library(DESeq)' in
       an R terminal to see if it is installed.

Example: java -Xmx4G -jar pathTo/USeq/Apps/OverdispersedRegionScanSeqs -t
      /Data/PolIIRep1/,/Data/PolIIRep2/ -c /Data/Input1/,/Data/Input2/ -s
      /Data/PolIIResults/ -f 30 -e 30 -u /Anno/mergedZv9EnsemblGenes.ucsc.gz
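
The FPKM definition above works out to the region's fragment count divided by the
region length in kilobases and by the total mapped fragments in millions. A
one-function sketch, assuming the region length is the summed bp length of the
interrogated exons (or region); the class name is hypothetical.

```java
/** FPKM arithmetic as described for ORSS (sketch). */
final class Fpkm {

    /** Fragments per kilobase of interrogated region per million mapped fragments. */
    static double fpkm(long regionFragments, long regionLengthBp, long totalMappedFragments) {
        if (regionLengthBp <= 0 || totalMappedFragments <= 0) {
            throw new IllegalArgumentException("length and total must be positive");
        }
        double kilobases = regionLengthBp / 1_000.0;
        double millionsMapped = totalMappedFragments / 1_000_000.0;
        return regionFragments / kilobases / millionsMapped;
    }

    public static void main(String[] args) {
        // 500 fragments on a 2 kb gene model in a library of 25 million mapped fragments.
        System.out.println(fpkm(500, 2_000, 25_000_000)); // 10.0
    }
}
```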

**************************************************************************************

**************************************************************************************
**                         Create Exon Summary Metrics : April 2013                 **
**************************************************************************************
This app runs several summary metric programs and compiles the results. It uses R and
LaTeX to generate a formatted PDF report and can also generate HTML.


Required:
-a Alignment statistics from Picard's CollectAlignmentSummaryMetrics
-b Alignment counts from USeq's CountChromosomes
-c Coverage of CCDS exons from USeq's Sam2USeq
-d Duplication statistics from Picard's MarkDuplicates
-e Error rate from USeq's CalculatePerCycleErrorRate
-f Overlap statistics from USeq's MergePairedSamAlignments
-o Output file name
Optional:
-r Path to R
-l Path to pdflatex
-t Generate html instead
-i Generate dictionary (for pipeline)
-c Coverage file name 


Example: java -Xmx1500M -jar pathTo/USeq/Apps/VCFAnnovar -v 9908R.vcf                 
**************************************************************************************

**************************************************************************************
**                        ParseIntersectingAlignments: June 2010                    **
**************************************************************************************
Parses bed alignment files for reads intersecting the alleles in another bed file.

Options:
-s Full path file name for your SNP allele six column bed file (tab delimited chr,
      start,stop,name,score,strand)
-a Full path file name for your alignment bed file from the NovoalignParser.
-m Minimum base quality, defaults to 13

Example: java -Xmx1500M -jar pathToUSeq/Apps/ParseIntersectingAlignments 
     -s /LympAlleles/ex1.bed -a /SeqData/lymphAlignments.bed -m 13

**************************************************************************************

**************************************************************************************
**                           ParsePointDataContexts: Feb 2011                       **
**************************************************************************************
Parses PointData for particular 5bp genomic sequence contexts.

Options:
-s Save directory, full path.
-p PointData directories, full path, comma delimited. These should
       contain stranded chromosome specific xxx_-/+_.bar.zip files. One
       can also provide a single directory that contains multiple PointData
       directories. These will be merged before splitting by summing overlapping
       position scores.
-f Fasta files for each chromosome.
-c Context java regular expression, must be 5bp long, 5'->3', case insensitive, e.g.:
       '..CG.' for CG
       '..C[CAT]G' for CHG
       '..C[CAT][CAT]' for CHH
       '..C[CAT].' for nonCG
       '..C[^G].' for nonCG


Example: java -Xmx12G -jar pathTo/USeq/Apps/ParsePointDataContexts -c '..CG.' -s
      /Data/PointData/CG -f /Genomes/Hg18/Fastas -p /Data/PointData/All/  
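
Below is a sketch of applying one of the 5bp context regexes above with
java.util.regex. It assumes the PointData position sits at the center of the context
(index 2, where the C is in the examples), checks the plus strand only, and takes the
chromosome sequence as a String; minus-strand positions would need the reverse
complement of the context. These are assumptions, not a description of the app's
internals, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Filters plus-strand positions by a 5bp context regex (sketch; centering assumed). */
final class ContextFilter {

    /** Returns positions whose context chromSeq[pos-2, pos+3) fully matches the regex. */
    static List<Integer> matchingPositions(String chromSeq, int[] positions, String contextRegex) {
        Pattern context = Pattern.compile(contextRegex, Pattern.CASE_INSENSITIVE);
        List<Integer> hits = new ArrayList<>();
        for (int pos : positions) {
            if (pos - 2 < 0 || pos + 3 > chromSeq.length()) {
                continue; // 5bp context would run off the chromosome
            }
            String fiveMer = chromSeq.substring(pos - 2, pos + 3);
            if (context.matcher(fiveMer).matches()) {
                hits.add(pos);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String seq = "aaCGttCAGaaCTT";
        int[] cPositions = {2, 6, 11};
        System.out.println(matchingPositions(seq, cPositions, "..CG."));         // [2]
        System.out.println(matchingPositions(seq, cPositions, "..C[CAT]G"));     // [6]
        System.out.println(matchingPositions(seq, cPositions, "..C[CAT][CAT]")); // [11]
    }
}
```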

**************************************************************************************

**************************************************************************************
**                               PeakShiftFinder: May 2010                          **
**************************************************************************************
PeakShiftFinder estimates the bp difference between sense and antisense proximal
chIP-seq peaks. It calculates the shift in two ways: by generating a composite peak
from a set of the top peaks in a dataset and by taking the median shift for the top
peaks. The latter appears more reliable for some datasets. Inspect the results in IGB
by loading the xxx.bar graphs. When in doubt, run ScanSeqs with just your treatment
data, setting the peak shift to 0 and the window size to 50, and manually inspect the
shift in IGB.

Options:
-t Treatment Point Data directories, full path, comma delimited. These should
       contain stranded chromosome specific xxx_+_.bar.zip and xxx_-_.bar.zip  files.
-c Control Point Data directories, ditto. 
-s Save directory, full path.

Advanced Options:
-e Two chIP samples are provided, no input, scan for reduced peaks too.
-w Window size in bps, defaults to 50.
-a Minimum number window reads, defaults to 10
-d Minimum normalized window score, defaults to 2.5
-r Minimum fold of treatment to control window reads, defaults to 5
-n Number of peaks to merge for composite, defaults to 100
-p Distance off peak center to collect toward the 5' end, defaults to 500
-m Distance off peak center to collect toward the 3' end, defaults to 1000

Example: java -Xmx1500M -jar pathTo/USeq/Apps/PeakShiftFinder -t
      /Data/Ets1Rep1/,/Data/Ets1Rep2/ -c /Data/Input1/,/Data/Input2/ -s
      /Results/Ets1PeakShiftResults -w 25 -d 5

**************************************************************************************

**************************************************************************************
**                            Point Data Manipulator: Oct 2010                      **
**************************************************************************************
Manipulates PointData to merge strands, shift base positions, replace scores with 1
and sum identical positions. If multiple PointData directories are given, the data is
merged. 

Options:
-p Point Data directories, full path, comma delimited. Should contain chromosome
       specific xxx.bar.zip or xxx_-_.bar files. Alternatively, provide one directory
       containing multiple PointData directories.
-s Save directory, full path.
-o Replace PointData scores with 1
-d Shift base positions this number of bases 3', defaults to 0
-i Sum identical base position scores
-m Merge strands

Example: java -Xmx1500M -jar pathTo/USeq/Apps/PointDataManipulator -p
      /Data/Ets1Rep1/,/Data/Ets1Rep2/ -s /Data/MergedEts1 -o -i -m 
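
The four operations can be illustrated on an in-memory list of (position, strand,
score) tuples, a hypothetical stand-in for the xxx.bar.zip files the app actually
reads. The sketch assumes a 3' shift moves + strand points to higher coordinates and
- strand points to lower coordinates; check this against your own data before relying
on it.

```python
from collections import defaultdict


def manipulate(points, shift=0, replace_with_one=False,
               merge_strands=False, sum_identical=False):
    """Apply PointDataManipulator-style operations to (position, strand, score) tuples.

    Assumption: a 3' shift moves + strand points to higher coordinates and
    - strand points to lower coordinates.
    """
    out = []
    for pos, strand, score in points:
        new_pos = pos + shift if strand == "+" else pos - shift
        new_score = 1.0 if replace_with_one else score       # like -o
        new_strand = "." if merge_strands else strand          # like -m; '.' = merged
        out.append((new_pos, new_strand, new_score))
    if sum_identical:                                          # like -i
        totals = defaultdict(float)
        for pos, strand, score in out:
            totals[(pos, strand)] += score
        out = [(pos, strand, total) for (pos, strand), total in totals.items()]
    return sorted(out)


points = [(100, "+", 3.0), (160, "-", 2.0), (100, "+", 5.0)]
print(manipulate(points, shift=30, replace_with_one=True,
                 merge_strands=True, sum_identical=True))
# [(130, '.', 3.0)]
```

Shifting each strand toward the fragment center before merging is what lets reads from
both strands of the same fragment land on one position and be summed.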

**************************************************************************************

**************************************************************************************
**                            Primer3 Wrapper: Dec  2006                            **
**************************************************************************************
Wrapper for the primer3 application. Extracts sequences, formats them for primer3,
executes primer3, and parses the output into a spreadsheet. See
http://frodo.wi.mit.edu/primer3/

-f Full path file name for your sequence file, tab delimited, sequence in 1st column.
-s Pick small product sizes (45-80bp), defaults to standard (80-150bp)
-p Full path file name for the primer3_core application. Defaults to
     /nfs/transcriptome/software/noarch/T2/64Bit_Primer3_1.0.0/src/primer3_core
-m Full path file name for the mispriming library. Defaults to
     /nfs/transcriptome/software/noarch/T2/64Bit_Primer3_1.0.0/
     cat_humrep_and_simple.cgi.txt

Example: java -jar pathTo/T2/Apps/Primer3Wrapper -f /home/dnix/seqForQPCR.txt -p
    /nfs/transcriptome/software/noarch/T2/64Bit_Primer3_1.0.0/src/primer3_core
    -m /nfs/transcriptome/software/noarch/T2/64Bit_Primer3_1.0.0/
    cat_humrep_and_simple.cgi.txt -s

**************************************************************************************

**************************************************************************************
**                           Print Select Columns: Sept 2010                        **
**************************************************************************************
Spreadsheet manipulation. Prints selected columns from tab delimited text files.

Required Parameters:
-f Full path file or directory name for tab delimited text file(s)
-i Column indexes to print, comma delimited, no spaces
-n Number of initial lines to skip
-l Print only this last number of lines
-c Word to add as a new column at the start of each line
-r Add a row number column as the first column in the output
-d Add the file name to the start of each line
-s Skip blank lines and those with fewer than the indicated number of columns.
-a Print all available columns.

Example: java -jar pathTo/T2/PrintSelectColumns -f /TabFiles/ -i 0,3,9 -n 1 -c chr

**************************************************************************************

**************************************************************************************
**                                 QCSeqs: Nov 2009                                 **
**************************************************************************************
QCSeqs takes directories of chromosome specific PointData xxx.bar.zip files that
represent replicas of signature sequencing data, merges the strands, uses a sliding
window to sum the hits, and calculates Pearson correlation coefficients for the window
sums between each pair of replicas. Only windows with a sum score >= the minimum
are included in the correlation.

-d Split chromosome Point Data directories, full path, comma delimited. (These should
       contain chromosome specific xxx.bar.zip files). 
-t Temp directory, full path. This will be created and then deleted.
-w Window size in bps, defaults to 500.
-s Window step size in bps, defaults to 250.
-m Minimum window sum score, defaults to 5.
-e (Optional) Provide a full path file name in which to write the window sums.

Example: java -Xmx1500M -jar pathTo/USeq/Apps/QCSeqs -d /Solexa/PolII/Rep1PntData/,
      /Solexa/PolII/Rep2PntData/ -t /Solexa/PolII/TempDelMe -w 1000 -s 250
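
A minimal sketch of the window-sum correlation for one chromosome, assuming each replica
is a list of hit positions with strands already merged. The function names are
hypothetical, and the rule for which windows pass the minimum (here: either replica
reaches it) is an assumption; the app may apply the threshold differently.

```python
from math import sqrt


def window_sums(positions, chrom_length, window=500, step=250):
    """Hit counts in full-length sliding windows [start, start + window)."""
    return [sum(1 for p in positions if start <= p < start + window)
            for start in range(0, chrom_length - window + 1, step)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def replica_correlation(rep_a, rep_b, chrom_length, window=500, step=250, min_sum=5):
    """Correlate the window sums of two replicas over windows reaching min_sum.

    Assumption: a window is kept if either replica reaches min_sum.
    """
    sums_a = window_sums(rep_a, chrom_length, window, step)
    sums_b = window_sums(rep_b, chrom_length, window, step)
    kept = [(a, b) for a, b in zip(sums_a, sums_b) if max(a, b) >= min_sum]
    if len(kept) < 2:
        raise ValueError("too few windows pass the minimum sum")
    xs, ys = zip(*kept)
    return pearson(xs, ys)


rep1 = [10] * 6 + [600] * 8    # hypothetical hit positions, replica 1
rep2 = [20] * 7 + [620] * 9    # hypothetical hit positions, replica 2
print(round(replica_correlation(rep1, rep2, chrom_length=1000), 6))  # 1.0
```

Dropping low-count windows keeps the many empty or near-empty windows from dominating
and inflating the correlation.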

**************************************************************************************

**************************************************************************************
**                                  Qseq2Fastq: Aug 2010                            **
**************************************************************************************
Parses, filters out reads failing QC, compresses, and converts single and paired read
qseq files to Illumina fastq format. Does not concatenate tiles.

Required Parameters:
-q Qseq directory. This should contain all of the qseq files for a sequencing run,
       multiple lanes, paired and single reads. (e.g. s_5_1_0025_qseq.txt(.gz/zip OK))

Optional Parameters:
-f Fastq save directory. Defaults to the qseq directory.
-a Keep all reads. Defaults to removing those failing the QC flag. Paired reads are
       only removed if both reads fail QC.
-p Print full fastq headers. Defaults to using read count.
-d Delete qseq files upon successful parsing of all files. Be careful!
-s Silence non error output.

Example: java -Xmx2G -jar pathTo/USeq/Apps/Qseq2Fastq -f /Runs/7/Fastq -q
      /Runs/100726_SN141_0265_A207D4ABXX/Data/Intensities/BaseCalls 
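
The per-line conversion can be sketched as follows, assuming the standard 11-column
qseq layout (machine, run, lane, tile, x, y, index, read number, sequence, quality,
filter flag with 1 = passed). Both header formats and the function name are
hypothetical, and the paired-read rule (drop a pair only if both reads fail) is not
shown.

```python
def qseq_to_fastq(line, read_count, keep_failed=False, full_header=False):
    """Convert one qseq line to a fastq record, or return None if it fails QC.

    Assumes the 11-column qseq layout; header formats below are hypothetical.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 11:
        raise ValueError("expected 11 tab-delimited qseq fields")
    machine, run, lane, tile, x, y, index, read_num, seq, qual, passed = fields
    if passed != "1" and not keep_failed:   # like the default QC filtering (see -a)
        return None
    seq = seq.replace(".", "N")             # qseq uses '.' for uncalled bases
    if full_header:                         # like -p
        name = f"{machine}_{run}:{lane}:{tile}:{x}:{y}#{index}/{read_num}"
    else:                                   # default: name reads by a running count
        name = f"{read_count}/{read_num}"
    # The quality string is passed through unchanged (assumed Illumina-encoded already).
    return f"@{name}\n{seq}\n+\n{qual}\n"


line = "SN141\t265\t5\t25\t1234\t5678\t0\t1\tACG.T\thhhhB\t1"
print(qseq_to_fastq(line, 1), end="")
```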

**************************************************************************************

**************************************************************************************
**                            Randomize Text File: May 2013                         **
**************************************************************************************
Randomizes the lines of one or more text files.

Options:
-f Full path to a text file or directory containing such to randomize. Gzip/zip OK.
-n Number of lines to print, defaults to all.

Example: java -Xmx4G -jar pathTo/Apps/RandomizeTextFile -n 24560 -f
       /TilingDesign/oligos.txt.gz
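
A minimal sketch of the same idea, assuming the whole file fits in memory (consistent
with the large -Xmx in the example). The function names are hypothetical, and only
plain and .gz files are handled here, not .zip.

```python
import gzip
import random


def read_lines(path):
    """Read all lines from a plain or gzipped text file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return handle.read().splitlines()


def randomize_lines(lines, n=None, seed=None):
    """Return the lines in random order, optionally only the first n of them."""
    rng = random.Random(seed)      # a fixed seed makes the shuffle reproducible
    shuffled = list(lines)
    rng.shuffle(shuffled)
    return shuffled if n is None else shuffled[:n]
```

For the example above, randomize_lines(read_lines("/TilingDesign/oligos.txt.gz"),
n=24560) would return 24,560 randomly ordered oligo lines. Taking the first n lines
of a full shuffle gives a uniform random subset without replacement.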

**************************************************************************************

**************************************************************************************
**                          Ranked Set Analysis: Jan 2006                           **
**************************************************************************************
RSA performs set analysis (intersection, union, difference) on lists of
genomic regions (tab delimited: chrom, start, stop, score, (optional notes)).

-a Full path file name for the first list of genomic regions.
-b Full path file name for the second list of genomic regions.
-d (Optional) Full path directory containing region files for all pair analysis.
-m Max gap, bps, set negative to force an overlap, defaults to -100
-s Save comparison as a PNG, default is no.

Example: java -jar pathTo/T2/Apps/RankedSetAnalysis -a /affy/nonAmpA.txt -b
      /affy/nonAmpB.txt -s
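
One way to read the max gap rule is as a signed gap between two regions: positive
values count separating bases, negative values count overlapping bases, and a pair
intersects when the gap is <= the max gap. The sketch below assumes that reading and
interbase coordinates; the function names are hypothetical and the brute-force pairing
is only for illustration.

```python
def signed_gap(a, b):
    """Signed distance between two (chrom, start, stop) interbase regions.

    Positive values are the bases separating the regions; negative values are
    the number of overlapping bases. Returns None for different chromosomes.
    """
    if a[0] != b[0]:
        return None
    return max(a[1], b[1]) - min(a[2], b[2])


def intersect(list_a, list_b, max_gap=-100):
    """Pairs of regions whose signed gap is <= max_gap (negative forces overlap)."""
    pairs = []
    for a in list_a:
        for b in list_b:
            gap = signed_gap(a, b)
            if gap is not None and gap <= max_gap:
                pairs.append((a, b))
    return pairs


a = ("chr1", 100, 300)
b_list = [("chr1", 150, 400), ("chr1", 250, 500), ("chr1", 350, 500)]
print(intersect([a], b_list))                    # only 150-400 overlaps by >= 100 bp
print(len(intersect([a], b_list, max_gap=100)))  # 3
```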

**************************************************************************************

**************************************************************************************
**                               Read Coverage: Feb 2012                            **
**************************************************************************************
Generates read coverage stair-step xxx.bar graph files for visualization in IGB. Will
also calculate per base coverage stats for a given file of interrogated regions and
create a bed file of regions with low coverage based on the minimum number of reads.
By default, graph values are scaled per million mapped reads.

Options:
-p Point Data directories, full path, comma delimited. Should contain chromosome
       specific xxx.bar.zip or xxx_-_.bar files. Can also provide one dir containing
       PointData dirs.
-s Save directory, full path.
-k Data is stranded, defaults to merging strands while generating graphs.
-a Data contains hit counts due to running it through the MergePointData app.
-r Don't scale graph values. Leave as actual read counts. 
-i (Optional) Full path file name for a tab delimited bed file (chr start stop ...)
       containing interrogated regions to use in calculating per base coverage
       statistics. Interbase coordinates assumed.
-m Minimum number of reads for defining good coverage, defaults to 8. Use this in
       combination with the interrogated regions file to identify poor coverage regions.
-b Just calculate stats, skip coverage graph generation.
-l Plus scalar, for stranded RC output, defaults to # plus observations/1000000
-n Minus scalar, for stranded RC output, defaults to # minus observations/1000000
-c Combined scalar, defaults to # observations/1000000

Example: java -Xmx1500M -jar pathTo/USeq/Apps/ReadCoverage -p
      /Data/Ets1Rep1/,/Data/Ets1Rep2/ -s /Data/MergedHitTrckEts1 -i 
      /CapSeqDesign/interrogatedExonsChrX.bed
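
The default per million scaling divides each raw depth by the scalar described for -c
(# observations/1000000), and the low coverage bed file collects runs of bases below
the -m minimum. A minimal sketch with hypothetical function names, assuming raw depths
for one interrogated region are already in a list:

```python
def scale_coverage(depths, total_observations):
    """Scale raw per-base depths to reads per million mapped reads."""
    scalar = total_observations / 1_000_000
    return [d / scalar for d in depths]


def low_coverage_regions(chrom, region_start, depths, min_reads=8):
    """Interbase (chrom, start, stop) runs whose raw depth is below min_reads."""
    regions = []
    run_start = None
    for i, depth in enumerate(depths):
        pos = region_start + i
        if depth < min_reads and run_start is None:
            run_start = pos                      # a low coverage run begins
        elif depth >= min_reads and run_start is not None:
            regions.append((chrom, run_start, pos))
            run_start = None
    if run_start is not None:                    # run extends to the region end
        regions.append((chrom, run_start, region_start + len(depths)))
    return regions


print(scale_coverage([50, 0, 125], 25_000_000))             # [2.0, 0.0, 5.0]
print(low_coverage_regions("chrX", 1000, [10, 3, 2, 9, 9, 1]))
# [('chrX', 1001, 1003), ('chrX', 1005, 1006)]
```

Note that the low coverage test uses raw read counts, not the scaled graph values.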

**************************************************************************************

**************************************************************************************
**                            Reference Mutator  : Aug 2014                         **
**************************************************************************************
Takes a directory of fasta chromosome sequence files and converts the reference allele
to the alternate provided by a snp mapping table.

Required:
-f Full path to a directory containing chromosome specific fasta files. zip/gz OK.
-t Full path to a snp mapping table.
-s Full path to a directory to save the alternate fasta files.

Example: java -Xmx10G -jar pathTo/USeq/Apps/ReferenceMutator -f /Hg19/Fastas
    -s /Hg19/AltFastas/ -t /Hg19/omni2.5SnpMap.txt

**************************************************************************************

**************************************************************************************
**                          RNA Editing PileUp Parser: June 2013                    **
**************************************************************************************
Parses a SAMtools mpileup output file for reference A bases that show evidence of
RNA editing via A-to-G conversion, stranded. The base fraction edited is calculated for
bases passing the thresholds, for viewing in IGB and for subsequent clustering with
the RNAEditingScanSeqs app. The parsed PointData can be further processed using the
methylome analysis applications.

Options:
-p Path to an mpileup file (.gz or .zip OK, use 'samtools mpileup -Q 13 -A -B' params).
-v Versioned Genome (e.g. H_sapiens_Mar_2006), see UCSC Browser,
      http://genome.ucsc.edu/FAQ/FAQreleases.
-s Save directory, full path, defaults to pileup file directory.
-r Minimum read coverage, defaults to 5.
-t Generate strand specific reference calls, defaults to non-stranded. Required for
      stranded downstream analysis.
-m Skip processing chrM.

Example: java -Xmx4G -jar pathTo/USeq/Apps/RNAEditingPileUpParser -t -p 
      /Pileups/N2.mpileup.gz -v C_elegans_Oct_2010
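
A minimal, non-stranded sketch of the base fraction edited at one pileup position. It
decodes the mpileup read-base column ('.' and ',' match the reference; letters are
mismatches; '^' plus one mapping quality character marks a read start; '$' a read end;
'+n'/'-n' an indel of n bases). Using A + G counts as the denominator is an assumption,
the function names are hypothetical, and base quality and strand handling are omitted.

```python
import re

_INDEL = re.compile(r"[+-](\d+)")


def count_bases(read_bases):
    """Tally reference matches and mismatching bases in an mpileup read-base column."""
    counts = {"ref": 0, "A": 0, "C": 0, "G": 0, "T": 0, "N": 0}
    i = 0
    while i < len(read_bases):
        c = read_bases[i]
        if c == "^":              # read start; the next character is a mapping quality
            i += 2
            continue
        if c in "+-":             # indel: skip the length digits and the listed bases
            m = _INDEL.match(read_bases, i)
            i = m.end() + int(m.group(1))
            continue
        if c in ".,":
            counts["ref"] += 1
        elif c.upper() in "ACGTN":
            counts[c.upper()] += 1
        # '$', '*', '<' and '>' are not counted
        i += 1
    return counts


def a_to_g_fraction(ref_base, read_bases, min_coverage=5):
    """Fraction of A/G observations that are G at a reference A, or None."""
    if ref_base.upper() != "A":
        return None
    counts = count_bases(read_bases)
    covered = counts["ref"] + counts["G"]
    if covered < min_coverage:    # like -r, minimum read coverage
        return None
    return counts["G"] / covered


print(round(a_to_g_fraction("A", "..,,Gg^I.$"), 4))  # 0.2857 (2 G of 7)
```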

**************************************************************************************

**************************************************************************************
**                           RNA Editing Scan Seqs: April 2014                      **
**************************************************************************************
RESS attempts to identify clustered editing sites across a genome using a sliding
window approach. Each window is scored for the pseudomedian of the base fraction
edits as well as the probability that the observations occurred by chance, using a
permutation test based on the chi-square goodness of fit statistic.

Options:
-s Save directory, full path.
-e Edited PointData directory from the RNAEditingPileUpParser.
       These should contain stranded chromosome specific xxx_-/+_.bar.zip files. One
       can also provide a single directory that contains multiple PointData
       directories. These will be merged when scanning.
-r Reference PointData directory from the RNAEditingPileUpParser. Ditto.

Advanced Options:
-a Minimum base read coverage, defaults to 5.
-b Minimum base fraction edited to use in analysis, defaults to 0.01
-w Window size, defaults to 50.
-p Minimum window pseudomedian, defaults to 0.005.
-m Minimum number observations in window, defaults to 3. 
-t Run a stranded analysis, defaults to non-stranded.
-i Remove base fraction edits that are non zero and represented by just one edited
       base.

Example: java -Xmx4G -jar pathTo/USeq/Apps/RNAEditingScanSeqs -s /Results/RESS -p 0.01
      -e /PointData/Edited -r /PointData/Reference
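
The pseudomedian here is taken to be the Hodges-Lehmann estimator, the median of all
pairwise (Walsh) averages. The window scan below is a simplified sketch that anchors
one window at each edited site; the function names are hypothetical, and the
permutation chi-square p-value is omitted.

```python
from statistics import median


def pseudomedian(values):
    """Hodges-Lehmann pseudomedian: median of all pairwise averages with i <= j."""
    n = len(values)
    walsh = [(values[i] + values[j]) / 2 for i in range(n) for j in range(i, n)]
    return median(walsh)


def scan_windows(sites, window=50, min_obs=3, min_pseudomedian=0.005):
    """Score windows anchored at each site.

    sites: (position, base_fraction_edited) tuples sorted by position.
    Returns (start, stop, observations, pseudomedian) for passing windows.
    """
    hits = []
    for i, (start, _) in enumerate(sites):
        fractions = []
        for pos, fraction in sites[i:]:
            if pos >= start + window:
                break
            fractions.append(fraction)
        if len(fractions) >= min_obs:              # like -m
            pm = pseudomedian(fractions)
            if pm >= min_pseudomedian:             # like -p
                hits.append((start, start + window, len(fractions), pm))
    return hits


sites = [(100, 0.0), (110, 0.25), (120, 1.0), (400, 0.5)]
print(scan_windows(sites))  # [(100, 150, 3, 0.375)]
```

Unlike a plain median, the pseudomedian uses every pair of observations, so it is less
jumpy for the small number of sites typical of a 50 bp window.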

**************************************************************************************

The pipeline:
1) Converts raw SAM alignments containing splice junction coordinates into genomic
coordinates and outputs sorted BAM alignments.
2) Makes relative read depth coverage tracks.
3) Scores known genes for differential exonic and intronic expression using DESeq2
and alternative splicing with a chi-square test.
4) Identifies unannotated differentially expressed transfrags using a window
scan and DESeq2.

Use this application as a starting point in your transcriptome analysis.

Options:
-s Save directory, full path.
-t Treatment alignment file directory, full path. Contained within should be one
       directory per biological replica, each containing one or more raw
       SAM (.gz/.zip OK) files.
-c Control alignment file directory, ditto.
-n Data is stranded. Only analyze reads from the same strand as the annotation.
-j Reverse stranded analysis. Only count reads from the opposite strand of the
       annotation. Use this setting for Illumina's strand-specific dUTP protocol.
-k Flip the strand of the second read in each pair.
-b Reverse the strand of both pairs. Use this option if you would like the orientation
       of the alignments to match the orientation of the annotation in Illumina stranded
       dUTP sequencing.
-x Max per base alignment depth, defaults to 50000. Genes with coverage exceeding this
       depth are ignored.
-v Genome version (e.g. H_sapiens_Feb_2009, M_musculus_Jul_2007), see UCSC FAQ,
       http://genome.ucsc.edu/FAQ/FAQreleases.
-g UCSC RefFlat or RefSeq gene table file, full path. Tab delimited, see RefSeq Genes
       http://genome.ucsc.edu/cgi-bin/hgTables, (uniqueName1 name2(optional) chrom
       strand txStart txEnd cdsStart cdsEnd exonCount (commaDelimited)exonStarts
       (commaDelimited)exonEnds). Example: ENSG00000183888 C1orf64 chr1 + 16203317
       16207889 16203385 16205428 2 16203317,16205000 16203467,16207889 . NOTE:
       this table should contain only ONE composite transcript per gene (e.g. use
       Ensembl genes NOT transcripts). Use the MergeUCSCGeneTable app to collapse
       transcripts to genes. See the RNASeq usage guide for details.
-r Full path to R, defaults to '/usr/bin/R'. Be sure to install the DESeq2 and qvalue
       Bioconductor packages and the gplots package.

Advanced Options:
-m Combine replicas and run single replica analysis using binomial based statistics,
       defaults to DESeq and a negative binomial test.
-a Maximum alignment score. Defaults to 120, smaller numbers are more stringent.
-o Don't delete overlapping exons from the gene table.
-e Print verbose output from each application.
-p Run SAMseq in place of DESeq. Recommended when you have five or more replicates in
       each condition and not recommended with fewer. SAMseq cannot be run with fewer
       than two replicates per condition.

Example: java -Xmx2G -jar pathTo/USeq/Apps/RNASeq -v D_rerio_Dec_2008 -t
      /Data/PolIIMut/ -c /Data/PolIIWT/ -s /Data/Results/MutVsWT -g
      /Anno/zv8Genes.ucsc
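
To check that a gene table matches the -g column layout, a row can be parsed as below.
This is a sketch with hypothetical names; it splits on any whitespace so the
space-separated example from the -g description also parses, and it assumes a
10-column row means the optional name2 column was left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GeneRecord:
    name: str
    name2: str
    chrom: str
    strand: str
    tx_start: int
    tx_end: int
    cds_start: int
    cds_end: int
    exon_starts: List[int]
    exon_ends: List[int]

    def exon_lengths(self):
        return [end - start for start, end in zip(self.exon_starts, self.exon_ends)]


def parse_gene_line(line):
    """Parse one RefFlat/RefSeq style gene table row (name2 column optional)."""
    fields = line.split()
    if len(fields) == 10:          # assumption: the optional name2 column was omitted
        fields.insert(1, "")
    if len(fields) != 11:
        raise ValueError("expected 10 or 11 columns")
    name, name2, chrom, strand = fields[:4]
    tx_start, tx_end, cds_start, cds_end, exon_count = map(int, fields[4:9])
    starts = [int(v) for v in fields[9].strip(",").split(",")]
    ends = [int(v) for v in fields[10].strip(",").split(",")]
    if len(starts) != exon_count or len(ends) != exon_count:
        raise ValueError("exonCount does not match the exon coordinate lists")
    return GeneRecord(name, name2, chrom, strand, tx_start, tx_end,
                      cds_start, cds_end, starts, ends)


row = ("ENSG00000183888 C1orf64 chr1 + 16203317 16207889 16203385 16205428 2 "
       "16203317,16205000 16203467,16207889")
print(parse_gene_line(row).exon_lengths())  # [150, 2889]
```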

**************************************************************************************

**************************************************************************************
**                            RNA Seq Simulator: Aug 2011                           **
**************************************************************************************
RSS takes SAM alignment files from RNA-Seq data and simulates overdispersed, multiple
replica, differential, non-stranded RNA-Seq datasets.

Options:
-u UCSC RefFlat or RefSeq gene table file, full path. See,
       http://genome.ucsc.edu/cgi-bin/hgTables, (name1 name2(optional) chrom strand
       txStart txEnd cdsStart cdsEnd exonCount exonStarts exonEnds)
-p PointData directories, full path, comma delimited. These should contain parsed
       PointData (chromosome specific xxx_-/+_.bar.zip files) from running the
       NovoalignParser on all of your novoaligned RNA-Seq data. 
-n A full path directory name containing 3 or 4 equally split, randomized alignment
       xxx.sam (.zip or .gz) files. One for each replica you wish to simulate. Use the
       RandomizeTextFile and FileSplitter apps to generate these.

Default Options:
-g Number of genes to make differentially expressed, defaults to 500
-r Minimum number of mapped reads for a gene to be eligible for differential
       expression, defaults to 50.
-a Smallest skew factor for differential expression, defaults to 0.2
-b Largest skew factor for differential expression, defaults to 0.8
-c Smallest excluded skew factor for differential expression and for overdispersion,
       defaults to 0.45
-d Largest excluded skew factor for differential expression and for overdispersion,
       defaults to 0.55
-o Don't overdisperse datasets, defaults to overdispersing data using -c and -d params.
-s Skip intersecting genes.

Example: java -Xmx12G -jar pathTo/USeq/Apps/RNASeqSimulator -u 
       /anno/hg19RefFlatKnownGenes.ucsc.txt -p /Data/Heart/MergedPointData/ -n
       /Data/Heart/SplitSAM/ -s 46 -r 15 -g 1000 

**************************************************************************************

**************************************************************************************
**                               Sam 2 Fastq: March 2012                            **
**************************************************************************************
Extracts the original Illumina fastq data from single or paired end sam alignments.
Assumes the alignments and the fastq reads are in the same order. In novoalign, set -oSync.

Options:
-a Sam alignment txt file, full path, .gz/.zip OK.
-f First read fastq file, ditto.
-s (Optional) Second read fastq file, from paired read sequencing, ditto.

Example: java -Xmx1G -jar pathToUSeq/Apps/Sam2Fastq -a /SAM/unaligned.sam.gz -f 
     /Fastq/X1_110825_SN141_0377_AD06YNACXX_1_1.txt.gz -s 
     /Fastq/X1_110825_SN141_0377_AD06YNACXX_1_2.txt.gz
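
Because the alignments and reads share the same order, the original records can be
recovered by walking both inputs in step, advancing through the fastq until the read
name matches each SAM record. A single-end sketch with hypothetical names, assuming one
SAM record per read:

```python
def read_fastq(lines):
    """Yield (header, sequence, plus, quality) tuples from fastq lines."""
    it = iter(lines)
    for header in it:
        seq, plus, qual = next(it), next(it), next(it)
        yield header.rstrip("\n"), seq.rstrip("\n"), plus.rstrip("\n"), qual.rstrip("\n")


def fastq_name(header):
    """'@name/1' or '@name 1:N:0:...' -> 'name'."""
    name = header[1:].split()[0]
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def extract_original_reads(sam_lines, fastq_lines):
    """Return the original fastq record for each SAM record, in order."""
    records = read_fastq(fastq_lines)
    out = []
    for line in sam_lines:
        if line.startswith("@"):       # SAM header line
            continue
        qname = line.split("\t", 1)[0]
        for rec in records:            # resumes where the previous search stopped
            if fastq_name(rec[0]) == qname:
                out.append("\n".join(rec) + "\n")
                break
        else:
            raise ValueError(f"{qname} not found; are the files in the same order?")
    return out


fastq = ["@r1/1", "ACGT", "+", "IIII", "@r2/1", "GGGG", "+", "IIII",
         "@r3/1", "TTTT", "+", "IIII"]
sam = ["@HD\tVN:1.0",
       "r1\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
       "r3\t4\t*\t0\t0\t*\t*\t0\t0\tTTTT\tIIII"]
print("".join(extract_original_reads(sam, fastq)), end="")
```

The single forward pass keeps memory use flat, which is why the same-order assumption
matters: an out-of-order read would be skipped past and never found.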

**************************************************************************************

**************************************************************************************
**                                Sam 2 USeq : May 2014                             **
**************************************************************************************
Generates per base read depth stair-step graph files for genome browser visualization.
By default, values are scaled per million mapped reads with no score thresholding. Can
also generate a list of regions that pass a minimum coverage depth.

Required Options:
-f Full path to a bam or a sam file (xxx.sam(.gz/.zip OK) or xxx.bam) or directory
      containing such. Multiple files are merged.
-v Versioned Genome (e.g. H_sapiens_Mar_2006, D_rerio_Jul_2010), see UCSC Browser,
      http://genome.ucsc.edu/FAQ/FAQreleases.

Default Options:
-s Generate strand specific coverage graphs.
-m Minimum mapping quality score. Defaults to 0, bigger numbers are more stringent.
      This is a phred-scaled posterior probability that the mapping position of the
      read is incorrect.
-a Maximum alignment score. Defaults to 1000, smaller numbers are more stringent.
-r Don't scale graph values. Leave as actual read counts. 
-e Scale repeat alignments by dividing the alignment count at a given base by the
      total number of genome wide alignments for that read.  Repeat alignments are
      thus given fractional count values at a given location. Requires that the IH
      tag was set.
-b Path to a region bed file (tab delim: chr start stop ...) to use in calculating
      read coverage statistics.  Be sure these do not overlap! Run the MergeRegions app
      if in doubt.
-p Path to a file for saving per region coverage stats. Defaults to variant of -b.
-c Print regions that meet a minimum read count, defaults to 0 (don't print).
-l Print regions that also meet a minimum length, defaults to 0.
-o Path to log file.  Write coverage statistics to a log file instead of stdout.
-k Make average alignment length graph instead of read depth.

Example: java -Xmx1500M -jar pathTo/USeq/Apps/Sam2USeq -f /Data/SamFiles/ -r
     -v H_sapiens_Feb_2009 -b ccdsExons.bed.gz 
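
The -e repeat scaling can be sketched as a per-base depth in which each alignment adds
1/IH instead of 1. The names are hypothetical, the input is simplified to (chrom, 1-based
POS, CIGAR, IH) tuples, and counting only M, = and X bases (not deletions) toward depth
is an assumption.

```python
import re
from collections import defaultdict

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def aligned_positions(pos1, cigar):
    """Yield 0-based reference positions covered by aligned bases (M, = and X)."""
    ref = pos1 - 1
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op in "M=X":
            yield from range(ref, ref + n)
            ref += n
        elif op in "DN":          # deletions and introns advance the reference only
            ref += n
        # I, S, H and P do not consume reference bases


def read_depth(alignments, scale_repeats=False):
    """Per-base depth; with scale_repeats each alignment counts 1/IH."""
    depth = defaultdict(float)
    for chrom, pos1, cigar, ih in alignments:
        weight = 1.0 / ih if scale_repeats else 1.0
        for p in aligned_positions(pos1, cigar):
            depth[(chrom, p)] += weight
    return dict(depth)


alignments = [("chr1", 1, "3M", 1), ("chr1", 2, "2M1D1M", 2)]
print(sorted(read_depth(alignments, scale_repeats=True).items()))
# [(('chr1', 0), 1.0), (('chr1', 1), 1.5), (('chr1', 2), 1.5), (('chr1', 4), 0.5)]
```

With this weighting, a read aligning equally well to two loci contributes half a read
to each, so its total contribution across the genome stays at one.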

**************************************************************************************

**************************************************************************************
**                            Sam Alignment Extractor: Jan 2013                     **
**************************************************************************************
Given a bed file containing regions of interest, extracts all of the intersecting SAM
alignments.

Options:
-a Alignment directory containing one or more coordinate sorted xxx.bam files with
       their associated xxx.bai indexes.
-b A bed file (chr, start, stop,...), full path, see,
       http://genome.ucsc.edu/FAQ/FAQformat#format1
-s (Optional) File for saving extracted alignments, must end in .sam. Defaults to a
       variant of the bed file name.
-i Minimum read depth, defaults to 1
-x Maximum read depth, defaults to unlimited

Example: java -Xmx4G -jar pathTo/USeq/Apps/SamAlignmentExtractor -a
      /Data/ExonCaptureAlignmentsX1/ -b /Data/SNPCalls/9484X1Calls.bed.gz -s
      /Data/9484X1Calls.sam

**************************************************************************************

**************************************************************************************
**                             Sam Comparator  : July 2014                          **
**************************************************************************************
Compares coordinate sorted, unique alignment sam/bam files. Splits alignments into
those that match in chromosome and position and those that do not.

Required:
-a Full path sam/bam file name. zip/gz OK.
-b Full path sam/bam file name. zip/gz OK.
-s Full path to a directory to save the results.
-p Print paired mismatches to screen.

Example: java -Xmx10G -jar pathTo/USeq/Apps/SamComparator -a /hg19/ref.sam.gz
       -b /hg19/alt.sam.gz -s /hg19/SplitAlignments/

**************************************************************************************

**************************************************************************************
**                                Sam Parser: June 2013                             **
**************************************************************************************
Parses SAM and BAM files into alignment center position PointData xxx.bar files.
For RNASeq data, first run the SamTranscriptomeParser to convert splice junction
coordinates to genomic coordinates and set -m to 0 below.

Options:
-v Versioned Genome (e.g. H_sapiens_Mar_2006), see UCSC Browser,
      http://genome.ucsc.edu/FAQ/FAQreleases.
-f The full path file or directory containing xxx.sam(.gz/.zip OK) or xxx.bam file(s).
      Multiple files will be merged.
-r Full path directory for saving the results.
-m Minimum mapping quality score. Defaults to 13, bigger numbers are more stringent.
      This is a phred-scaled posterior probability that the mapping position of the
      read is incorrect. For RNA-Seq data from the SamTranscriptomeParser, set this to 0.
-a Maximum alignment score. Defaults to 60, smaller numbers are more stringent.

Example: java -Xmx1500M -jar pathToUSeq/Apps/SamParser -f /Novo/Run7/
     -v C_elegans_May_2008 -m 0 -a 120  
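
Two quantities behind this app can be computed from a SAM record's POS, CIGAR and MAPQ:
the alignment center and the error probability implied by a mapping quality threshold.
In the sketch below, the function names are hypothetical, and using a 0-based,
rounded-down center of the reference span is an assumption about how the app defines
it.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def reference_span(cigar):
    """Reference bases covered by an alignment (M, D, N, = and X consume reference)."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "MDN=X")


def center_position(pos1, cigar):
    """0-based center of the aligned interval [start, start + span), rounded down."""
    return (pos1 - 1) + reference_span(cigar) // 2


def mapq_error_probability(mapq):
    """Phred scale: MAPQ = -10 * log10(P(mapping position is wrong))."""
    return 10 ** (-mapq / 10)


print(center_position(101, "50M"))           # 125
print(center_position(101, "10S40M"))        # 120, soft clipped bases are not counted
print(round(mapq_error_probability(13), 3))  # 0.05
```

So the default -m 13 keeps alignments whose estimated chance of being misplaced is
about 5% or less.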

**************************************************************************************

**************************************************************************************
**                          Sam Transcriptome Parser: Dec 2013                      **
**************************************************************************************
STP takes SAM alignment files that were aligned against chromosomes and extended
splice junctions (see MakeTranscriptome app), converts the coordinates to genomic
space, and sorts and saves the alignments in BAM format. Although the alignments don't
need to be sorted by chromosome and position, all the alignments for a given fragment
are assumed to be grouped together.

Options:
-f The full path file or directory containing raw xxx.sam(.gz/.zip OK) file(s).
      Multiple files will be merged.

Default Options:
-s Save file, defaults to that inferred by -f. If an xxx.sam extension is provided,
      the alignments won't be sorted by coordinate or saved as a bam file.
-a Maximum alignment score. Defaults to 90, smaller numbers are more stringent.
      Approximately 30 points per mismatch.
-m Minimum mapping quality score, defaults to 0 (no filtering), larger numbers are
      more stringent. Only applies to genomic matches, not splice junctions. Set to 13
      or more to require near unique alignments.
-x Maximum mapping quality, reset reads with a mapping quality greater than the max to
      this max.
-n Maximum number of locations each read may align, defaults to 1 (unique matches).
-d If a read exceeds the maximum number of locations, save one randomly picked repeat
      alignment for that read.
-r Reverse the strand of the second paired alignment. Reversing the strand is
      needed for proper same strand visualization of paired stranded Illumina data.
-b Reverse the strand of both pairs.  Use this option if you would like the orientation
      of the alignments to match the orientation of the annotation in Illumina stranded 
      UTP sequencing.
-u Save unmapped reads and those that fail the alignment score.
-c Don't remove chrAdapt and chrPhiX alignments.
-j Only print splice junction alignments, defaults to all.
-p Merge proper paired unique alignments. Those that cannot be unambiguously merged
      are left as pairs. Recommended to avoid double counting errors and increase
      base calling accuracy. For paired Illumina UTP data, use -p -r -b .
-q Maximum acceptable base pair distance for merging, defaults to 300000.
-h Full path to a txt file containing a sam header, defaults to autogenerating the
      header from the read data.

Example: java -Xmx1500M -jar pathToUSeq/Apps/SamTranscriptomeParser -f /Novo/Run7/
     -m 20 -s /Novo/STPParsedBams/run7.bam -p -r 

**************************************************************************************

**************************************************************************************
**                                 Sam Fixer: August 2011                           **
**************************************************************************************
Parses, filters, merges, and fixes xxx.sam files.

Options:
-f The full path file or directory containing xxx.sam(.gz/.zip OK) file(s). Multiple 
      files will be merged.
-s Full path file name for saving the fixed sam file.

Default Options:
-m Minimum mapping quality score. Defaults to 0, bigger numbers are more stringent.
      This is a phred-scaled posterior probability that the mapping position of the
      read is incorrect.
-a Maximum alignment score. Defaults to 1000, smaller numbers are more stringent.
-d Don't strip optional MD fields from alignments, defaults to removing these.
-u Remove unmapped reads.
-q Don't remove poor quality reads.
-c Convert splice-junctions to genomic coordinates, by providing a splice junction
      radius. Only works for single read RNA-Seq data where a splice junction fasta
      file was included in the alignments from the USeq MakeSpliceJunctionFasta app.
      This does NOT work for paired RNA-Seq data.

Example: java -Xmx1500M -jar pathToUSeq/Apps/SamFixer -f /Novo/Run7/
     -m 20 -a 120 -s /Novo/Run7/mergedFixed.sam -c 46 -u

**************************************************************************************

**************************************************************************************
**                              SamReadDepthSubSampler: Feb 2014                    **
**************************************************************************************
Filters, randomizes, and subsamples each coordinate sorted BAM alignment file to a
target base level read depth. Useful for reducing extreme read depths over localized
areas.

Options:
-a Alignment file or directory containing coordinate sorted xxx.bam files. Each is 
      processed independently.
-t Target read depth.

Default Options:
-p Keep read groups together.  Causes greater variation in depth.
-x Maximum alignment score. Defaults to 300, smaller numbers are more stringent.
-q Minimum mapping quality score. Defaults to 13, bigger numbers are more stringent.
      For RNASeq data, set this to 0.

Example: java -Xmx25G -jar pathToUSeq/Apps/SamReadDepthSubSampler -x 240 -q 20 -a
      /Novo/Run7/ -t 100

**************************************************************************************

**************************************************************************************
**                               Sam SV Filter: March 2014                          **
**************************************************************************************
Filters SAM records based on their intersection with a list of target regions for
structural variation analysis. Paired alignments are kept if they align to at least
one target region. These are split into those that align to different targets (span),
the same target with sufficient softmasking (soft), or one target and somewhere else
(single).

Options:
-a Alignment file or directory containing NAME sorted SAM/BAM files. Multiple files
       are processed independently. xxx.sam(.gz/.zip) or xxx.bam are OK. Assumes only
       uniquely aligned reads. Remove duplicates with Picard's MarkDuplicates app.
-s Save directory for the results.
-b Bed file (tab delim: chr, start, stop, ...) of target regions in interbase coordinates.

Default Options:
-n Mark passing alignments as secondary. Needed for Delly with -n 30 novoalignments.
-d Don't coordinate sort and index alignments.
-x Maximum alignment score. Defaults to 1000, smaller numbers are more stringent.
-q Minimum mapping quality score. Defaults to 5, bigger numbers are more stringent.
-c Chromosomes to skip, defaults to 'chrAdap,chrPhi,chrM,random,chrUn'. Any SAM
       record chromosome name that contains one will be failed.
-m Minimum soft masked bases for keeping paired alignments intersecting the same
       target, defaults to 10

Example: java -Xmx25G -jar pathTo/USeq_xxx/Apps/SamSVFilter -x 150 -q 13 -a
      /Novo/Run7/ -s /Novo/Run7/SSVF/ -c 'chrPhi,_random,chrUn_' 

**************************************************************************************

**************************************************************************************
**                              SamSubsampler: July 2013                            **
**************************************************************************************
Filters, randomizes, subsamples and sorts sam/bam alignment files.

Options:
-a Alignment file or directory containing SAM/BAM (xxx.sam(.zip/.gz OK) or xxx.bam).
      Multiple files are merged.
-r Results directory.

Default Options:
-n Number of alignments to print, defaults to all passing thresholds.
-s Sort and index output alignments.
-x Maximum alignment score. Defaults to 300, smaller numbers are more stringent.
-q Minimum mapping quality score. Defaults to 13, bigger numbers are more stringent.
      For RNASeq data, set this to 0.

Example: java -Xmx25G -jar pathToUSeq/Apps/SamSubsampler -x 240 -q 20 -a
      /Novo/Run7/ -r /Novo/Run7/SR -n 10000000

**************************************************************************************

**************************************************************************************
**                                  Scan Seqs: Feb 2012                             **
**************************************************************************************
Takes unshifted stranded chromosome specific PointData and uses a sliding window to
calculate several smoothed window statistics. These include a binomial p-value, a
q-value FDR, an empirical FDR, and a Bonferroni corrected binomial p-value for peak
shift strand skew. These are saved as heat map/ stairstep xxx.bar graph files for
direct viewing in the Integrated Genome Browser. The empFDR is only calculated when
scanning for enriched regions. Provide more than twice as many control reads as
treatment reads to avoid heavy subsampling when calculating the empFDR. If control
data is not provided, simple window sums are calculated.

Options:
-s Save directory, full path.
-t Treatment PointData directories, full path, comma delimited. These should
       contain unshifted stranded chromosome specific xxx_-/+_.bar.zip files. One
       can also provide a single directory that contains multiple PointData
       directories.
-c Control PointData directories, ditto. 
-p Peak shift, see the PeakShiftFinder app. Average distance between + and - strand
       peaks. Will be used to shift the PointData and set the window size.
-r Full path to R loaded with Storey's q-value library, defaults to '/usr/bin/R',
       see http://genomics.princeton.edu/storeylab/qvalue/

Advanced Options:
-w Window size, defaults to peak shift. A good alternative window size is the
       peak shift plus the standard deviation, see the PeakShiftFinder app.
-e Scan for both reduced and enriched regions, defaults to look for only enriched
       regions. This turns off the empFDR estimation.
-j Scan only one strand, defaults to both, enter either + or - 
-q Don't filter windows using q-value FDR threshold, save all to bar graphs,
       defaults to saving those with a q-value < 40%.
-m Minimum number reads in window, defaults to 2. Increasing this threshold will
       speed up processing considerably but compromises the q-value estimation.
-f Filter windows with high control read counts. Don't use if looking for
       reduced regions.
-g Control window read count threshold, in standard deviations from the median,
       defaults to 4.
-n Print point graph window representation xxx.bar files.
-a Number of treatment observations to use in defining the expect and ratio scalars.
-b Number of control observations to use in defining the expect and ratio scalars.
-u Use read score probabilities (assumes scores are > 0 and <= 1), defaults to
       assigning 1 to each read score. Experimental.

Example: java -Xmx4G -jar pathTo/USeq/Apps/ScanSeqs -t
      /Data/PolIIRep1/,/Data/PolIIRep2/ -c /Data/Input1/,/Data/Input2/ -s
      /Data/PolIIResults -w 200 -p 100 -f -g 5 
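
To make the window statistic concrete, here is a hedged Python sketch of a one-sided binomial test for a single window, plus a Bonferroni correction. It assumes the null probability that a window read comes from the treatment is simply total treatment reads / (total treatment + total control reads). It does not reproduce how ScanSeqs scales its expectation (see -a and -b), or how it computes its q-values and empFDR.

```python
from math import comb


def binomial_upper_tail(t: int, c: int, p: float) -> float:
    """P(X >= t) for X ~ Binomial(n = t + c, p)."""
    n = t + c
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(t, n + 1))


def window_pvalue(t: int, c: int, total_t: int, total_c: int) -> float:
    """Enrichment p-value for a window with t treatment and c control reads."""
    # Assumption: under the null, each read in the window is a treatment read
    # with probability equal to the overall treatment fraction.
    p = total_t / (total_t + total_c)
    return binomial_upper_tail(t, c, p)


def bonferroni(pvalue: float, n_tests: int) -> float:
    """Bonferroni-corrected p-value, capped at 1."""
    return min(1.0, pvalue * n_tests)


print(window_pvalue(3, 0, 1_000_000, 1_000_000))  # 0.125
```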

**************************************************************************************

**************************************************************************************
**                         Shift Annotation Positions: Oct 2010                     **
**************************************************************************************
Uses the information in an xxx.shifter.txt file from the ConcatinateFastas app to
shift the annotation to match the coordinates of the concatenated sequence. Good for
working with poorly assembled genomes. Run this multiple times if you have several
shifter files. All files are assumed to use interbase coordinates.

Options:
-b Full path file name for a xxx.bed formatted annotation file.
-u (OR) Full path file name for a UCSC refflat/ refseq formatted gene table.
-s Full path file name for the xxx.shifter.txt file from the ConcatinateFastas app.

Example: java -Xmx4G -jar pathTo/USeq/Apps/ShiftAnnotationPositions 
    -u /zv8/ucscRefSeq.txt -s /zv8/BadFastas/chrScaffold.shifter.txt
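
The layout of the xxx.shifter.txt file is not described here. The following Python sketch therefore assumes the file has already been read into a hypothetical mapping from each original sequence name to the concatenated sequence name and the bp offset where that sequence was placed. Because both coordinate systems are interbase, shifting a bed record is one addition to its start and stop. Refflat gene tables would need the same shift applied to every coordinate column.

```python
from typing import Dict, Tuple

# Hypothetical shifter content: original name -> (concatenated name, bp offset)
Shifts = Dict[str, Tuple[str, int]]


def shift_bed_line(line: str, shifts: Shifts) -> str:
    """Move one interbase bed record onto the concatenated sequence."""
    fields = line.rstrip("\n").split("\t")
    if fields[0] not in shifts:
        return "\t".join(fields)  # not part of the concatenation, leave as is
    new_chrom, offset = shifts[fields[0]]
    start, stop = int(fields[1]) + offset, int(fields[2]) + offset
    fields[0:3] = [new_chrom, str(start), str(stop)]
    return "\t".join(fields)


shifts = {"Zv8_scaffold12": ("chrScaffold", 1_000)}  # hypothetical names
print(shift_bed_line("Zv8_scaffold12\t10\t20\tgeneA", shifts))
```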

**************************************************************************************

**************************************************************************************
**                                SoapV1Parser: Feb 2009                            **
**************************************************************************************
Splits and converts Soap version 1 alignment xxx.txt files into center position binary
PointData xxx.bar files. Interbase coordinates (zero based, stop excluded).
These can be directly viewed in IGB.

-v Versioned Genome (ie H_sapiens_Mar_2006), see UCSC Browser,
      http://genome.ucsc.edu/FAQ/FAQreleases.
-f The full path directory/file text of your Soap xxx.txt(.zip or .gz) file(s).
-r Full path directory text for saving the results.
-x Maximum number of best matches, defaults to 1.
-m Minimum read length, defaults to 17.
-s Sum identical PointData positions. This should not be used for any downstream USeq
      applications, only for visualization.
-p Make a read length histogram using only reads that pass filters, defaults to all.

Example: java -Xmx1500M -jar pathToUSeq/Apps/SoapV1Parser -f /Soap/Run7/
     -v H_sapiens_Mar_2006 -x 5 -m 20

**************************************************************************************

**************************************************************************************
**                             Subtract Regions: May 2009                           **
**************************************************************************************
Removes regions, and parts thereof, that intersect the masking region file.  Provide
tab delimited bed files (chr start stop ...). Assumes interbase coordinates.

Options:
-m Bed file to use in subtracting/ masking.
-d Directory containing bed files to mask.

Example: java -Xmx4000M -jar pathTo/Apps/SubtractRegions -d /Anno/TilingDesign/
       -m /Anno/repeatMaskerHg18.bed
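
A minimal Python sketch of interval subtraction on one chromosome, assuming interbase (start included, stop excluded) intervals as stated above. Each mask is applied in turn, and only the left and right remainders of any overlapped piece are kept. This illustrates the operation, not SubtractRegions' actual implementation.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # interbase: start included, stop excluded


def subtract(region: Interval, masks: List[Interval]) -> List[Interval]:
    """Parts of `region` that do not intersect any mask (same chromosome)."""
    pieces = [region]
    for m_start, m_stop in masks:
        next_pieces = []
        for start, stop in pieces:
            if m_stop <= start or m_start >= stop:
                next_pieces.append((start, stop))  # no overlap, keep whole
                continue
            if start < m_start:
                next_pieces.append((start, m_start))  # left remainder
            if m_stop < stop:
                next_pieces.append((m_stop, stop))  # right remainder
        pieces = next_pieces
    return pieces


print(subtract((0, 100), [(10, 20), (50, 60)]))  # [(0, 10), (20, 50), (60, 100)]
```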

**************************************************************************************

**************************************************************************************
**                           Score Chromosomes: Oct  2012                           **
**************************************************************************************
SC scores chromosomes for the presence of transcription factor binding sites. Use the
following options:

-g The full path directory text to the split genomic sequences (i.e. chr2L.fasta, 
      chr3R.fasta...), FASTA format.
-t Full path file text for the FASTA file containing aligned trimmed examples of
      transcription factor binding sites.  A log likelihood position specific
      probability matrix will be generated from these sequences and used to scan the
      chromosomes for hits to the matrix.
-s Score cut off for the matrix. Defaults to the score of the lowest scoring sequence
      used in making the LLPSPM.
-p Print hits to screen, default is no.
-v Provide a versioned genome (ie H_sapiens_Mar_2006), see UCSC Browser,
      http://genome.ucsc.edu/FAQ/FAQreleases, if you would like to write graph LLPSPM
      scores in xxx.bar format for direct viewing in IGB.

Example: java -Xmx4000M -jar pathTo/T2/Apps/ScoreChromosomes -g /my/affy/Hg18Seqs/ -t 
      /my/affy/fgf8.fasta -s 4.9 -v H_sapiens_Mar_2006
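
A hedged Python sketch of building and scanning a log likelihood position specific probability matrix from aligned, trimmed sites. It assumes a uniform 0.25 base background, a pseudocount of 1, log2 scores, training sites containing only A, C, G, and T, and a forward-strand-only scan. The exact LLPSPM construction used by ScoreChromosomes and ScoreSequences may differ. The default cutoff mirrors the -s description above: the lowest score among the training sequences.

```python
import math
from typing import Dict, List, Tuple

BASES = "ACGT"
Matrix = List[Dict[str, float]]


def build_llpspm(sites: List[str], pseudocount: float = 1.0,
                 background: float = 0.25) -> Matrix:
    """Log2-likelihood matrix from equal-length, aligned, A/C/G/T-only sites."""
    matrix: Matrix = []
    for i in range(len(sites[0])):
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[i].upper()] += 1
        total = sum(counts.values())
        matrix.append({b: math.log2((counts[b] / total) / background) for b in BASES})
    return matrix


def score(matrix: Matrix, window: str) -> float:
    return sum(column[base] for column, base in zip(matrix, window.upper()))


def scan(matrix: Matrix, sequence: str, cutoff: float) -> List[Tuple[int, float]]:
    """Forward-strand hits as (zero based start, score) with score >= cutoff."""
    width = len(matrix)
    hits = []
    for i in range(len(sequence) - width + 1):
        window = sequence[i:i + width].upper()
        if set(window) <= set(BASES):  # skip windows containing N etc.
            s = score(matrix, window)
            if s >= cutoff:
                hits.append((i, s))
    return hits


sites = ["ACGT", "ACGT", "ACGA"]
pssm = build_llpspm(sites)
cutoff = min(score(pssm, s) for s in sites)  # like the -s default
print([pos for pos, _ in scan(pssm, "TTACGTTT", cutoff)])  # [2]
```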

**************************************************************************************

**************************************************************************************
**                           ScoreParsedBars: Sept 2008                             **
**************************************************************************************
For each region finds the underlying scores from the chromosome specific bar files.
Prints the scores as well as their mean. A p-value for each region's score can be
calculated using random regions matched for chromosome, interrogated region, length,
number of scores, and GC content. Be sure to set the -u flag if your scores are log2
values.

-r Full path file text for your region file (tab delimited: chr start stop(inclusive)).
-b Full path directory text for the chromosome specific data xxx.bar files.
-o Bp offset to add to the position coordinates, defaults to 0.
-s Bp offset to add to the stop of each region, defaults to 0.
-u Unlog the bar values, set this flag if your scores are log2 transformed.
-g Estimate a p-value for the score associated with each region. Provide a full path
         directory text for chromosome specific gc content boolean arrays. See
         ConvertFasta2GCBoolean app. Requires option -i.
-i If estimating p-values, provide a full path file text containing the interrogated
         regions (tab delimited: chr start stop ...) to use in drawing random regions.
-n Number of random region sets, defaults to 1000.
-d Don't print individual scores to screen.

Example: java -jar pathTo/Apps/ScoreParsedBars -b /BarFiles/Oligos/
       -r /Res/miRNARegions.bed -o -30 -s -60 -i /Res/interrRegions.bed
       -g /Genomes/Hg18/GCBooleans/
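
A minimal Python sketch of the per-region lookup described above. It assumes one chromosome's bar data have already been loaded as sorted positions with matching values (reading xxx.bar files is not shown) and that region stops are inclusive. The -u behavior is shown as 2 ** value. Offsets and the GC-matched p-value estimation are not reproduced.

```python
from bisect import bisect_left, bisect_right
from statistics import mean
from typing import List, Optional, Tuple


def region_scores(positions: List[int], values: List[float], start: int, stop: int,
                  unlog: bool = False) -> Tuple[List[float], Optional[float]]:
    """Scores whose positions fall in [start, stop] (stop inclusive) and their mean."""
    lo = bisect_left(positions, start)
    hi = bisect_right(positions, stop)  # bisect_right keeps position == stop
    scores = list(values[lo:hi])
    if unlog:  # like -u: undo a log2 transform
        scores = [2.0 ** s for s in scores]
    return scores, (mean(scores) if scores else None)


positions, values = [10, 20, 30, 40], [1.0, 2.0, 3.0, 4.0]
print(region_scores(positions, values, 20, 30, unlog=True))  # ([4.0, 8.0], 6.0)
```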

**************************************************************************************

**************************************************************************************
**                           Score Sequences: July 2007                             **
**************************************************************************************
SS scores sequences for the presence of transcription factor binding sites. Use the
following options:

-g The full path FASTA formatted file text for the sequence(s) to scan.
-t Full path file text for the FASTA file containing aligned trimmed examples of
      transcription factor binding sites.  A log likelihood position specific
      probability matrix will be generated from these sequences and used to scan the
      sequences for hits to the matrix.
-s Score cut off for the matrix. Defaults to zero.

Example: java -Xmx500M -jar pathTo/T2/Apps/ScoreSequences -g /my/affy/DmelSeqs.fasta
      -t /my/affy/zeste.fasta

**************************************************************************************

**************************************************************************************
**                               Sgr2Bar: Jan 2012                                  **
**************************************************************************************
Converts xxx.sgr(.zip) files to chromosome specific bar files.

-f The full path directory/file text for your xxx.sgr(.zip or .gz) file(s).
-v Genome version (ie H_sapiens_Mar_2006, M_musculus_Jul_2007), get from UCSC Browser.
-s Strand, defaults to '.'; use '+' or '-'.
-t Graphs should be viewed as a stair-step, defaults to bar

Example: java -Xmx1500M -jar pathTo/Apps/Sgr2Bar -f /affy/sgrFiles/ -s + -t
      -v D_rerio_Jul_2006

**************************************************************************************

**************************************************************************************
**                               Simulator: Nov 2008                                **
**************************************************************************************
Generates simulated ChIP-seq sequences for aligning to a reference genome.

-f Directory containing xxx.fasta files with genomic sequence. File names should
     represent chromosome names (e.g. chr1.fasta, chrY.fasta...)
-r Results directory
-b Bed file containing repeat locations (e.g. RepeatMasker.bed)
-n Number of spike-ins, defaults to 1000
-g Number of random fragments to generate for each spike-in, defaults to 1000
-s Minimum size of a fragment, defaults to 150
-x Maximum size of a fragment, defaults to 350
-l Length of read, defaults to 26
-e Comma delimited text of per base % error rates, defaults to 0.5,0.528,0.556,...

Example: java -Xmx1500M -jar pathTo/USeq/Apps/Simulator -f /Hg18/Fastas -r /Spikes/
    -b /Hg18/Repeats/repMsker.bed -l 36
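
Two pieces of the simulation can be sketched in Python under stated assumptions. The first draws a fragment size uniformly between the -s and -x bounds and places it so that it covers the spike-in center. The second applies per-base substitution errors, reading the -e values as percent chances per read position, with positions beyond the list treated as error free. Both are illustrative guesses, not the Simulator's actual model.

```python
import random
from typing import List, Tuple

BASES = "ACGT"


def sample_fragment(center: int, chrom_length: int, rng: random.Random,
                    min_size: int = 150, max_size: int = 350) -> Tuple[int, int]:
    """Interbase (start, stop) of a random fragment covering a spike-in center."""
    size = rng.randint(min_size, max_size)  # inclusive bounds
    start = center - rng.randint(0, size - 1)  # ensures start <= center < stop
    # Keep inside the chromosome; near the ends this can move the fragment off center.
    start = max(0, min(start, chrom_length - size))
    return start, start + size


def add_errors(read: str, error_percents: List[float], rng: random.Random) -> str:
    """Substitute base i with probability error_percents[i] / 100."""
    out = []
    for i, base in enumerate(read):
        pct = error_percents[i] if i < len(error_percents) else 0.0
        if rng.random() < pct / 100.0:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)
```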

**************************************************************************************

**************************************************************************************
**                                 StandedBisSeq: Feb 2011                          **
**************************************************************************************
Looks for strand bias in CG methylation from one dataset using Fisher's exact or
chi-square tests followed by a Benjamini-Hochberg FDR correction. Merges significant
CGs within max gap into larger regions. WARNING: many bisulfite datasets display strand
bias due to preferential breakage of C rich regions.  Use this app with caution.

Options:
-s Save directory, full path.
-c Converted PointData directories, full path, comma delimited. These should
       contain stranded chromosome specific xxx_-/+_.bar.zip files. One
       can also provide a single directory that contains multiple PointData
       directories. These will be merged. Use the ParsePointDataContexts app to filter
       for just CG contexts.
-n Non-converted PointData directories, ditto. 
-f Fasta files for each chromosome.

Default Options:
-p Minimum FDR for stranded methylation, defaults to 30, a -10Log10(FDR = 0.001)
       conversion.
-l Log2Ratio threshold for stranded methylation, defaults to 1.585 (3x).
-w Window size, defaults to 500.
-m Minimum #C obs in window, defaults to 4. 
-o Minimum coverage for CG bp methylation scanning, defaults to 2.
-x Max gap between significant CGs to merge, defaults to 500bp.
-g Generate graph files for IGB, defaults to just identifying biased regions.
-r Full path to R, defaults to '/usr/bin/R'

Example: java -Xmx12G -jar pathTo/USeq/Apps/StandedBisSeq -c /Data/Sperm/Converted -n 
      /Data/Sperm/NonConverted -s /Data/Sperm/StrandedBisSeqRes -g -p 20 -l 1 -f
      /Genomes/Hg18/Fastas/ 
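
The numeric defaults above follow from simple transforms: -10 * log10(0.001) = 30 and log2(3) ≈ 1.585. The Python sketch below shows those transforms, a standard Benjamini-Hochberg adjustment, and a max-gap merge of significant CG positions. It is not this app's code. The per-CG Fisher or chi-square tests are left to a statistics library.

```python
import math
from typing import List, Tuple


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (FDRs), in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def minus10log10(fdr: float) -> float:
    """Transform used for -p: an FDR of 0.001 becomes 30."""
    return -10 * math.log10(fdr)


def merge_within_gap(positions: List[int], max_gap: int = 500) -> List[Tuple[int, int]]:
    """Merge significant CG positions separated by <= max_gap bp into regions."""
    regions: List[List[int]] = []
    for pos in sorted(positions):
        if regions and pos - regions[-1][1] <= max_gap:
            regions[-1][1] = pos
        else:
            regions.append([pos, pos])
    return [(s, e) for s, e in regions]


print(round(minus10log10(0.001), 3), round(math.log2(3), 3))  # 30.0 1.585
```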

**************************************************************************************

**************************************************************************************
**                               SRA Processor: Nov 2013                            **
**************************************************************************************
Fetches SRA files from the Sequence Read Archive and converts them to gzipped fastq.
Use in conjunction with Tomato to align these on the ember cluster. Make sure the SRA
archives you want actually contain fastq-formatted reads.

Required Parameters:
-n Names of SRRs (runs) or SRPs (projects) to fetch, comma delimited, no spaces.
       (e.g. SRR016669 or SRP000401).
-f Fastq-dump executable, full path, from the SRA Toolkit, download from
       http://www.ncbi.nlm.nih.gov/Traces/sra/?view=software
-s Save directory, full path.

Optional Parameters:
-c Full path to a cmd.txt file to copy into converted SRA folders. If the save
       directory is scanned by tomato, a tomato job is then launched,
       see http://bioserver.hci.utah.edu/BioInfo/index.php/Software:Tomato
-q Set quality score offset to 64, defaults to 33. Needed for some Illumina datasets.

Example: java -Xmx4G -jar pathTo/USeq/Apps/SRAProcessor -n SRP000401
      -s /tomato/job/Nix/SRP000401/ -f ~/sratoolkit.2.1.8-centos_linux64/fastq-dump
      -c /tomato/job/Nix/SRP000401/cmd.txt 
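
The -q option concerns the quality score offset. The same Phred value Q is stored as chr(Q + 64) in older Illumina fastq and as chr(Q + 33) in the default encoding. The Python sketch below re-encodes a Phred+64 string as Phred+33 to illustrate the difference. It is an illustration only, not what SRAProcessor or fastq-dump does internally.

```python
def phred64_to_phred33(quality: str) -> str:
    """Re-encode a Phred+64 quality string as Phred+33 (same Q values)."""
    out = []
    for ch in quality:
        q = ord(ch) - 64
        if q < 0:
            raise ValueError(f"{ch!r} is below the Phred+64 range; is this Phred+33 data?")
        out.append(chr(q + 33))
    return "".join(out)


print(phred64_to_phred33("h@"))  # I!
```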

**************************************************************************************

**************************************************************************************
**                             SubSamplePointData:  Dec 2008                        **
**************************************************************************************
SSPD takes PointData directories, randomly selects points from each directory, and
saves the merged result.

-f Comma delimited full path PointDataDirectories from which to draw or a single 
       directory containing multiple PointDataDirectories.
-n Total number of observations desired.
-s Full path file directory in which to save the results.

Example: java -Xmx1500M -jar pathTo/USeq/Apps/SubSamplePointData -n 10000000 -f
    /Data/WCE1_Point,/Data/WCE2_Point,/Data/WCE3_Point -s /Data/Sub/ 

**************************************************************************************

**************************************************************************************
**                                Tag2Point: May 2010                               **
**************************************************************************************
Splits and converts tab delimited text files (chr start stop ... strand (+ or -))
into center position binary xxx.bar files. Use the appropriate options to convert
your coordinates into interbase coordinates (zero based, stop excluded).

-v Versioned Genome (e.g. H_sapiens_Mar_2006), see UCSC Browser,
      http://genome.ucsc.edu/FAQ/FAQreleases.
-i Strand column index, defaults to 5. 1st column is zero.
-b Subtract one from the beginning of each region.
-e Add one to the stop of each region.
-s Shift centered position x bps 3', defaults to 0.
-f The full path directory/file text of your text file(s) (.gz/.zip OK).
-c Append 'chr' onto the chromosome column (your data lacks the prefix).

Example: java -Xmx1500M -jar pathTo/T2/Apps/Tag2Point -f /Solexa/BedFiles/
     -v H_sapiens_Mar_2006 -b 
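
A hedged Python sketch of how the -b, -e, and -s options combine into one center position. The coordinate adjustments follow the option descriptions above. Rounding the midpoint down is an assumption, and "3'" is taken to mean larger coordinates on the + strand and smaller ones on the - strand.

```python
def center_position(start: int, stop: int, strand: str,
                    subtract_begin: bool = False, add_end: bool = False,
                    shift_3prime: int = 0) -> int:
    """Center of a region in interbase coordinates, optionally shifted 3'."""
    if subtract_begin:  # like -b, e.g. 1-based starts -> zero based
        start -= 1
    if add_end:  # like -e, e.g. inclusive stops -> stop excluded
        stop += 1
    center = (start + stop) // 2  # assumption: integer midpoint, rounded down
    if strand == "+":
        center += shift_3prime  # 3' is toward larger coordinates on +
    elif strand == "-":
        center -= shift_3prime  # and toward smaller coordinates on -
    return center


print(center_position(1, 10, "+", subtract_begin=True))  # 5
```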

**************************************************************************************

**************************************************************************************
**                                Text 2 USeq: June 2012                            **
**************************************************************************************
Converts text genomic data files (e.g. xxx.bed, xxx.gff, xxx.sgr, etc.) to
binary USeq archives (xxx.useq).  Assumes interbase coordinates. Only select
the columns that contain relevant information.  For example, if your data isn't
stranded, or you want to ignore strands, then skip the -s option.  If your data
doesn't have a value/ score then skip the -v option. Etc. Use the USeq2Text app to
convert back to text format. 

Required Parameters:
-f Full path file/directory containing tab delimited genomic data files.
-g Genome version using DAS notation (e.g. H_sapiens_Mar_2006, M_musculus_Jul_2007),
      see http://genome.ucsc.edu/FAQ/FAQreleases#release1
-c Chromosome column index
-b Position/Beginning column index

Optional Parameters:
-s Strand column index (+, -, or .; NOT F, R)
-e End column index
-v Value column index
-t Text column index(s), comma delimited, no spaces, defines which columns
      to join using a tab.
-i Index size for slicing split chromosome data (e.g. # rows per slice),
      defaults to 10000.
-r For graphs, select a style, defaults to 0
      0	Bar
      1	Stairstep
      2	HeatMap
      3	Line
-h Color, hexadecimal (e.g. #6633FF), enclose in quotations
-d Description, enclose in quotations 
-p Prepend chr onto chromosome name.
-l Minus 10 Log10 transform values. Requires setting -v.
-m Convert chromosome names containing M to chrM.
-o Subtract one from beginning position.

Example: java -Xmx4G -jar pathTo/USeq/Apps/Text2USeq -f
      /AnalysisResults/BedFiles/ -c 0 -b 1 -e 2 -i 5000 -h '#6633FF'
      -d 'Final processed chIP-Seq results for Bcd and Hunchback, 30M reads'
      -g H_sapiens_Feb_2009 

Indexes for common formats:
       bed3 -c 0 -b 1 -e 2
       bed5 -c 0 -b 1 -e 2 -t 3 -v 4 -s 5
       bed12 -c 0 -b 1 -e 2 -t 3,6,7,8,9,10,11 -v 4 -s 5
       gff w/scr,stnd,name -c 0 -b 3 -e 4 -v 5 -s 6 -t 8
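
The column-index options map one tab delimited line onto named fields. The Python sketch below shows that mapping for a single line, together with the -p, -o, and -l adjustments. It is an assumption-labeled illustration of the option semantics only. The returned dict is hypothetical and is not the USeq archive format.

```python
import math
from typing import Dict, List, Optional


def parse_record(line: str, c: int, b: int, e: Optional[int] = None,
                 v: Optional[int] = None, s: Optional[int] = None,
                 t: Optional[List[int]] = None, prepend_chr: bool = False,
                 subtract_one: bool = False, minus10log10: bool = False) -> Dict:
    """Pull the selected zero-based columns out of one tab delimited line."""
    f = line.rstrip("\n").split("\t")
    chrom = ("chr" + f[c]) if prepend_chr else f[c]  # like -p
    start = int(f[b]) - (1 if subtract_one else 0)  # like -o
    rec: Dict = {"chrom": chrom, "start": start}
    if e is not None:
        rec["stop"] = int(f[e])
    if v is not None:
        value = float(f[v])
        rec["value"] = -10 * math.log10(value) if minus10log10 else value  # like -l
    if s is not None:
        rec["strand"] = f[s]
    if t is not None:
        rec["text"] = "\t".join(f[i] for i in t)  # like -t: join columns with a tab
    return rec


# bed5-style indexes from the table above: -c 0 -b 1 -e 2 -t 3 -v 4 -s 5
print(parse_record("1\t99\t200\tpeak1\t0.01\t+", 0, 1, e=2, t=[3], v=4, s=5,
                   prepend_chr=True, minus10log10=True))
```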

**************************************************************************************

**************************************************************************************
**                                   TomatoFarmer                                   **
**************************************************************************************

Required Arguments:

-d Job directory. This directory must be a subdirectory of /tomato/version/job. It
       can be several directory levels below /tomato/version/job.
       Example: '-d /tomato/version/job/krustofsky/demo'.
-e Email address. TomatoFarmer emails you once the job completes or fails. You can
       also opt to receive all tomato emails as individual jobs start/end (see -x).
       Example: '-e hershel.krustofsky@hci.utah.edu'.
-y Analysis pipeline. The analysis pipeline or step to run. Current options are:
       Full Pipeline
       1) exome_best - Full exome analysis, current core best practices.
       2) exome_bwa_raw - Full exome analysis, using bwa and raw filtering (GATK
          best practices).
       3) exome_bwa_vqsr - Full exome analysis, using bwa and vqsr filtering (GATK
          best practices).
       A la carte
       4) exome_align_bwa - Alignment/recalibration only (bwa).
       5) exome_align_best - Alignment using core best practices.
       6) exome_metrics - Sample QC metrics only. Requires *mate.bam and
          *split.bam from one of 1-5 in the launch directory.
       7) exome_variant_raw - Variant detection and filtering (raw settings).
          Requires *reduced.bam from one of 1-5 in the launch directory.
       8) exome_variant_vqsr - Variant detection and filtering (vqsr).
          Requires *reduced.bam from one of 1-5 in the launch directory.
       9) exome_variant_best - Variant detection using core best practices.
       Example: '-y exome_bwa'.
-p Properties file. This file contains a list of cluster-specific paths and options;
       it doesn't need to be changed by the user. Example: '-p properties.txt'

Optional Arguments:

-t Target regions. Setting this argument restricts coverage metrics and variant
       detection to targeted regions. This speeds up variant detection and reduces
       noise. Options are:
       1) AgilentAllExonV4
       2) AgilentAllExonV5
       3) AgilentAllExonV5UTR
       4) AgilentAllExon50MB
       5) NimbleGenEZCapV2
       6) NimbleGenEZCapV3
       7) TruSeq
       8) Path to a custom target bed file.
       If nothing is specified for this argument, the full genome will be queried for
       variants and CCDS exomes will be used for capture metrics. Example: '-t truseq'.
-g 1K Genome samples. Use this option if you want to spike in 200 1K Genome samples
       as the background sample set. This should improve VQSR variant calling and
       VAAST, but it will take much longer to process. BETA, only works for core
       users!
-w Wall time. Use this option followed by a new wall time, in hours, if you want
       less wall time than the default 240 hours. Useful when there is upcoming CHPC
       downtime. Example: '-w 40'.
-s Study name. Set this if you want your VCF files to have a prefix other than
       'STUDY'. Example: '-s DEMO'.
-n No splitting. Set this option if you want to run variant calling on the entire
       genome at one time. This is only suggested when you have a very small capture
       region. By default, the genome is split by callable region. Example: '-n'.
-x Unsuppress tomato emails. Receive both tomato and TomatoFarmer emails.
       Example: '-x'.
-v Validate fastq files. TomatoFarmer will validate your fastq files before running.
       This is required if any of your samples use ASCII-64 quality encoding.

Example: java -Xmx4G -jar pathTo/USeq/Apps/TomatoFarmer -d /tomato/version/job/demo/
-e herschel.krustofsky@hci.utah.edu -y exome_bwa -s DEMO -c -t AgilentAllExon50MB

**************************************************************************************

**************************************************************************************
**                              Telescriptor:  Sept 2014                            **
**************************************************************************************
Compares two RNASeq datasets for possible telescripting. Generates a spreadsheet of
statistics for each gene as well as a variety of graphs in exonic bp space. The
ordering of A and B is important since A is window scanned to identify the maximal 5'
region. Thus A should be where you suspect telescripting, B where you do not.

Options:
-t Directory of bam files representing the first condition A.
-c Directory of bam files representing the second condition B.
-u UCSC refflat formatted Gene table. Run MergeUCSCGeneTable on a transcript table.
-s Directory in which to save the results.
-r Full path to R, defaults to '/usr/bin/R', with installed ggplot2 package.

Default Options:
-g Minimum gene alignment count, defaults to 50
-a Minimum window + background alignment count, defaults to 25
-k Minimum Log2(ASkew/BSkew), defaults to 2. Set to 0 to print all.
-b Minimum base read coverage for log2Ratio graph output, defaults to 10
-l Minimum transcript exonic length, defaults to 250
-w Size of 5' window for scanning, defaults to 125
-f Fraction of exonic gene length to calculate background, defaults to 0.5
-i Data is not stranded, assumes both first and second reads follow annotation.

Example: java -Xmx4G -jar pathTo/USeq/Apps/Telescriptor -u hg19EnsTrans.ucsc -t Bam/T
       -c Bam/C -s GV_MOR 

**************************************************************************************

**************************************************************************************
**                              UCSC Big 2 USeq: Jan 2013                           **
**************************************************************************************
Converts UCSC bigWig (xxx.bw) or bigBed (xxx.bb) archives to xxx.useq archives.

Options:
-b Full path file/directory containing xxx.bw and xxx.bb files. Recurses through
       subdirectories if a directory is given.
-d Full path directory containing the UCSC bigWigToBedGraph, bigWigToWig, and 
       bigBedToBed apps, download from http://hgdownload.cse.ucsc.edu/admin/exe/ and
       make executable (e.g. chmod 755 /MyApps/UCSC/*).
-v Genome version (e.g. H_sapiens_Mar_2006), get from UCSC Browser,
      http://genome.ucsc.edu/FAQ/FAQreleases or IGB 
      http://bioviz.org/igb/releases/current/igb-large.jnlp
-f Force conversion of xxx.bw or xxx.bb overwriting any existing xxx.useq archives.
       Defaults to skipping those already converted.
-e Only print error messages.

Example: java -Xmx4G -jar pathTo/USeq/Apps/UCSCBig2USeq -v M_musculus_Jul_2007 -b
      /AnalysisResults/USeqDataArchives/ -d /MyApps/UCSC/

**************************************************************************************

**************************************************************************************
**                              USeq 2 UCSC Big: Sept 2013                          **
**************************************************************************************
Converts USeq archives to UCSC bigWig (xxx.bw) or bigBed (xxx.bb) archives based on
the data type. WARNING: bigBed format conversion will clip any associated scores to
between 0-1000. 

Options:
-u Full path file/directory containing xxx.useq files. Recurses through
       subdirectories if a directory is given.
-d Full path directory containing the UCSC wigToBigWig and bedToBigBed apps, download
       from http://hgdownload.cse.ucsc.edu/admin/exe/ and make executable with chmod.
-f Force conversion of xxx.useq to xxx.bw or xxx.bb, overwriting any existing UCSC
       big files. Defaults to skipping those already converted.
-e Only print error messages.
-t Sandbox the UCSC apps by providing a full path file name to the timeout.pl app.
       Download from https://github.com/pshved/timeout. Maximum time and memory per
       file conversion are 1 hr and 4G.
-m Don't delete temp files.

Example: java -Xmx4G -jar pathTo/USeq/Apps/USeq2UCSCBig -u
      /AnalysisResults/USeqDataArchives/ -d /Apps/UCSC/
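
The bigBed warning above reflects the BED score field, which holds an integer from 0 to 1000. The Python sketch below shows the clipping behavior. Rounding to the nearest integer before clamping is an assumption, not documented USeq behavior.

```python
def clip_bed_score(score: float) -> int:
    """Clamp a score into the BED score range 0-1000 (rounding is an assumption)."""
    return int(min(1000, max(0, round(score))))


print([clip_bed_score(x) for x in (-5, 12.6, 1500.2)])  # [0, 13, 1000]
```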

**************************************************************************************

**************************************************************************************
**                                USeq 2 Text: Oct 2012                             **
**************************************************************************************
Converts USeq archives to text either as minimal native, bed, or wig graph format. 


Options:
-f Full path file/directory containing xxx.useq files.
-b Print bed format, defaults to native text format.
-c Convert scores to bed format 0-1000.
-w Print wig graph format (var step or bed graph), defaults to native format.


Example: java -Xmx4G -jar pathTo/USeq/Apps/USeq2Text -f
      /AnalysisResults/USeqDataArchives/ 

**************************************************************************************

**************************************************************************************
**                              VCF Annotator : March 2013                          **
**************************************************************************************
VCFAnnotator adds user-specified annotations to the VCF file INFO field.  Only hg19 is 
supported at this time.  If your VCF file has more than 500,000 records, it will be 
split into smaller VCF files that are annotated separately.  Once annotation is 
complete, the individual annotated files are merged and compressed.  This application 
uses a lot of memory when annotating large VCF files, so start java with 20 GB of
memory (e.g. -Xmx20G).

Required:
-v VCF file. Path to a multi-sample vcf file, compressed ok (XXX.vcf/XXX.vcf.gz).
-o Output VCF file.  Path to the annotated vcf file, can be specified as XXX.vcf or 
   XXX.vcf.gz. If XXX.vcf.gz, the file will be compressed and indexed using tabix.

Optional:
-d dbSNP database.  By default, this application uses dbSNP 137 for annotation. Use 
      this option along with dbSNP database identifier to use a different version, 
      i.e. snp129. The annovar-formatted dbSNP database must be in the annovar data 
      directory for this option to work.
-e Ethnicity.  By default, the 1K frequency is calculated across all ethnicities.  
      If you want to restrict it to one of EUR, AFR, ASN or AMR, use this option 
      followed by the ethnicity identifier.
-a Annotations to add.  By default, this application uses all available annovar 
      annotations.  Use a comma-separated list of keys to specify a custom set. 
      Available annotations with (keys): ensembl gene annotations (ENSEMBL), refSeq 
      gene names (REFSEQ), transcription factor binding sites (TFBS), segmental 
      duplications (SEGDUP), database of genomic variants (DGV), variant scores 
      (SCORES), GWAS catalog annotations (GWAS), dbsnp annotations (DBSNP), 1K 
      genomes annotations (ONEK), COSMIC annotations (COSMIC), ESP annotations (ESP),
      OMIM genes and diseases (OMIM), flagged VAAST genes (V-FLAG), ACMG genes (ACMG),
      and NIST callable regions (NIST).  The SCORES option includes SIFT, PolyPhen2, 
      MutationTaster, MutationAssessor, LRT, GERP++, FATHMM, PhyloP and SiPhy.
      The ENSEMBL option includes the columns EnsemblRegion, EnsemblName, VarType and 
      VarDesc. 
-n VAAST output.  If a VAAST output file is specified, the VCF file is annotated with
      the VAAST variation score and gene rank.
-p Path to annovar directory.
-t Path to tabix directory.


Example: java -Xmx20G -jar pathTo/USeq/Apps/VCFAnnotator -v 9908R.vcf 
      -o 9908_ann.vcf.gz 

**************************************************************************************

**************************************************************************************
**                            VCF Comparator : August 2014                          **
**************************************************************************************
Compares test vcf file(s) against a gold standard key of trusted vcf calls. Only calls
that fall in the common interrogated regions are compared. WARNING: tabix gzipped files
often fail to parse correctly with java. Seeing odd error messages? Try uncompressing.

Required Options:
-a VCF file for the key dataset (xxx.vcf(.gz/.zip OK)).
-b Bed file of interrogated regions for the key dataset (xxx.bed(.gz/.zip OK)).
-c VCF file for the test dataset (xxx.vcf(.gz/.zip OK)). May also provide a directory
       containing xxx.vcf(.gz/.zip OK) files to compare.
-d Bed file of interrogated regions for the test dataset (xxx.bed(.gz/.zip OK)).

Optional Options:
-g Require the genotype to match, defaults to scoring a match when the alternate
       allele is present.
-f Only require the position to match, don't consider the alt base or genotype.
-v Use VQSLOD score as ranking statistic in place of the QUAL score.
-s Only compare SNPs, defaults to all.
-n Only compare non SNPs, defaults to all.
-p Provide a full path directory for saving the parsed data. Defaults to not saving.
-e Exclude test and key records whose FILTER field is not . or PASS. Defaults to
       scoring all.

Example: java -Xmx10G -jar pathTo/USeq/Apps/VCFComparator -a /NIST/NA12878/key.vcf
       -b /NIST/NA12878/regions.bed.gz -c /EdgeBio/Exome/testHaploCaller.vcf.zip
       -d /EdgeBio/Exome/NimbleGenExomeV3.bed -g -v -s -e -p /CompRes/ 
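
A hedged Python sketch of the three matching modes described above: position only (like -f), a shared alternate allele (the default), and an exact genotype (like -g). It also includes the FILTER check used by -e. It assumes calls have already been parsed into a hypothetical Call record and restricted to the shared interrogated regions. VCFComparator's ranking by QUAL or VQSLOD is not shown.

```python
from typing import List, NamedTuple


class Call(NamedTuple):
    """Hypothetical minimal VCF call (not a USeq class)."""
    chrom: str
    pos: int
    ref: str
    alt: str  # ALT column, possibly comma separated
    genotype: str  # GT value, e.g. "0/1" or "1|0"


def passes_filter(filter_field: str) -> bool:
    """Like -e: keep records whose FILTER is '.' or 'PASS'."""
    return filter_field in (".", "PASS")


def _gt(genotype: str) -> List[str]:
    # Treat phased and unphased genotypes alike and ignore allele order.
    return sorted(genotype.replace("|", "/").split("/"))


def calls_match(key: Call, test: Call, mode: str = "allele") -> bool:
    if (key.chrom, key.pos) != (test.chrom, test.pos):
        return False
    if mode == "position":  # like -f
        return True
    if key.ref != test.ref:
        return False
    if mode == "allele":  # default: a shared alternate allele is enough
        return bool(set(key.alt.split(",")) & set(test.alt.split(",")))
    if mode == "genotype":  # like -g
        return key.alt == test.alt and _gt(key.genotype) == _gt(test.genotype)
    raise ValueError(f"unknown mode: {mode}")
```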

**************************************************************************************

**************************************************************************************
**                            VCF Reporter: April 2013                              **
**************************************************************************************
This application takes a VCF file as input and returns either a modified VCF file or a
tab-delimited text file containing user-specified and optionally formatted INFO
fields.  The modified VCF file is useful if you want to view annotations in IGV.  The
standard set of INFO fields is quite large and can't fit in the IGV window. The
tab-delimited text file allows the annotations to be viewed in Excel for easier sorting
and filtering. If the number of VCF records is greater than 500,000, the reporting
will be done in chunks.  The chunks are merged and compressed automatically at the end
of the run.

Required:
-v VCF file. Full path to a multi sample vcf file (xxx.vcf(.gz/.zip OK)).
-o Output file.  Full path to the output file. If xxx.txt is specified, output will 
      be a tab-delimited text file.  If xxx.vcf is specified, output will be an 
      uncompressed vcf file.  If xxx.vcf.gz is specified, output will be a tabix 
      compressed and indexed vcf file.

Optional:
-d Desired Columns.  A comma-separated list of INFO-field names that will be reported
      in the output vcf.
-u Unwanted Columns. A comma-separated list of INFO-field names that will not be 
      reported in the output vcf.
-r Reporting Style. INFO field styles.  Only two styles are currently supported, 
      'unmodified' and 'short'. Unmodified is used by default.  Short truncates some
      of the longer fields, which helps visibility in IGV.
-a Annotations only.  Report standard annotations using the 'short' reporting style.
      Skip INFO fields reported by GATK to reduce clutter.  The skipped fields are 
      used by GATK to determine variation quality and might not be useful to the 
      general user.
-x Damaging only.  Only report nonsynonymous, frameshift or splicing variants.
-p Path to tabix directory.  Set this variable if the application is not run on 
      moab/alta.

Tab-delimited only options:
-k Generate key.  Text document that lists descriptions of each column in the output
      table.


Example: java -Xmx10G -jar pathTo/USeq/Apps/VCFReporter -v 9908R.vcf 
      -d SIFT,LRT,MT,MT_P -r short -o 9908.ann.txt 

**************************************************************************************

**************************************************************************************
**                            VCF Splice Annotator : May 2014                       **
**************************************************************************************
Scores variants for changes in splicing using the MaxEntScan algorithms. See Yeo and
Burge 2004, http://www.ncbi.nlm.nih.gov/pubmed/15285897 for details. Known splice
acceptors and donors are scored for loss of a junction.  Exonic, intronic, and splice
bases are scanned for novel junctions.  Note, indels in exons causing frameshifts are
not annotated; this app only looks for changes in splicing. P-values for gained
exonic or intronic splices are estimated by generating null distributions of masked
sequence scores. Likewise, damaged splice p-values are calculated using a score
distribution from scoring high confidence splice junctions. A detailed spreadsheet
and a modified vcf file are generated. 

Required Options:
-r Save directory for writing results files.
-v VCF file or directory containing such (xxx.vcf(.gz/.zip OK)).
-f Fasta file directory, chromosome specific xxx.fa/.fasta(.zip/.gz OK) files.
-u UCSC RefFlat or RefSeq transcript (not merged genes) file, full path. See RefSeq 
       http://genome.ucsc.edu/cgi-bin/hgTables, (uniqueName1 name2(optional) chrom
       strand txStart txEnd cdsStart cdsEnd exonCount (commaDelimited)exonStarts
       (commaDelimited)exonEnds). Example: ENSG00000183888 C1orf64 chr1 + 16203317
       16207889 16203385 16205428 2 16203317,16205000 16203467,16207889 .
-m Full path directory name containing the me2x3acc1-9, splice5sequences and me2x5
       splice model files. See USeq/Documentation/ or 
       http://genes.mit.edu/burgelab/maxent/download/ 
-h Histogram object file for estimating pvalues, or complete -j and -t below to
       generate one.

Optional options:
-e Don't scan exonic bases for novel splice junctions.
-i Don't scan intronic bases for novel splice junctions.
-s Don't scan known splice junctions for novel splice junctions.
-x Export category for adding info to the vcf file, defaults to 0:
       0 All types (gain or damaged in exon, intron, and splice)
       1 Just damaged splices
       2 Damaged splices and novel splices in exons and splice junctions
-a Minimum 5' threshold for scoring the presence of a splice junction, defaults to 4.
-b Minimum 3' threshold for scoring the presence of a splice junction, defaults to 4.
-c Minimum difference for loss or gain of a 5' splice junction, defaults to 4.
-d Minimum difference for loss or gain of a 3' splice junction, defaults to 4.
-p Minimum -10Log10(pvalue) for reporting a splice junction, defaults to 13.

Options for generating reusable splice score histograms:
-j Bam alignment splice file from running the SamTranscriptomeParser with -j, used
       to check that a known splice junction is actually used before scoring it for
       inclusion in the known junction histograms. Be sure to remove duplicates with
       the Picard MarkDuplicates app too.
-t CCDS transcript file, see -u above.
-k Minimum read coverage for known splice junction scoring, defaults to 10

Example: java -Xmx10G -jar ~/USeq/Apps/VCFSpliceAnnotator -f ~/Hg19/Fa/ -v ~/exm2.vcf
       -m ~/USeq/Documentation/splicemodels -i -u ~/Hg19/hg19EnsTrans.ucsc.zip -t
       ~/Hg19/hg19CCDSTrans.ucsc.zip -j ~/Hg19/allReadsSTP.bam -r ~/ExmSJAnno/
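The -p threshold is on a -10Log10(pvalue) scale, so the default of 13 corresponds to a
p-value of about 0.05 (10^-1.3 is roughly 0.0501). The sketch below shows a generic
empirical p-value against a null score distribution and the conversion to that scale.
It assumes larger scores are more extreme and uses a plain list of null scores as a
hypothetical stand-in for the app's histogram object; it is not VCFSpliceAnnotator's code.

```python
import bisect
import math
from typing import List

def empirical_pvalue(score: float, null_scores: List[float]) -> float:
    """Fraction of null scores >= the observed score, with a +1 correction
    so the estimate is never exactly zero."""
    ordered = sorted(null_scores)
    n_ge = len(ordered) - bisect.bisect_left(ordered, score)
    return (n_ge + 1) / (len(ordered) + 1)

def minus_10_log10(pvalue: float) -> float:
    """Convert a p-value to the -10*log10(p) scale used by the -p option."""
    return -10.0 * math.log10(pvalue)

def passes(score: float, null_scores: List[float], min_score: float = 13.0) -> bool:
    """True if the transformed p-value reaches the reporting threshold."""
    return minus_10_log10(empirical_pvalue(score, null_scores)) >= min_score
```

With 99 null scores and an observed score above all of them, the p-value is 1/100, or
20 on the transformed scale, which clears the default threshold of 13.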

**************************************************************************************

**************************************************************************************
**                                VCFTabix: Jan 2013                                **
**************************************************************************************
Converts vcf files to a SAMTools compressed vcf tabix format. Recursive.

Required Options:
-v Full path file or directory containing xxx.vcf(.gz/.zip OK) file(s). Recursive!
-t Full path tabix directory containing the compiled bgzip and tabix executables. See
      http://sourceforge.net/projects/samtools/files/tabix/

Optional Options:
-f Force overwriting of existing indexed vcf files, defaults to skipping.
-d Do not delete non-gzipped vcf files after successful indexing, defaults to deleting.
-e Only print error messages.

Example: java -jar pathToUSeq/Apps/VCFTabix -v /VarScan2/VCFFiles/
     -t /Samtools/Tabix/tabix-0.2.6/ 
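The conversion relies on the bgzip and tabix executables found in the -t directory. A
minimal sketch of the equivalent steps, assuming only plain xxx.vcf inputs (no .gz/.zip
handling) and the standard bgzip -f and tabix -f -p vcf command lines, is below. Running
bgzip in place removes the original xxx.vcf, mirroring the default deletion behavior;
the function names are hypothetical.

```python
import os
import subprocess
from typing import List, Tuple

def find_vcfs(root: str) -> List[str]:
    """Recursively collect plain-text xxx.vcf files under root."""
    hits: List[str] = []
    for dirpath, _dirs, files in os.walk(root):
        hits.extend(os.path.join(dirpath, f) for f in files if f.endswith(".vcf"))
    return sorted(hits)

def tabix_commands(vcf: str, tabix_dir: str) -> Tuple[List[str], List[str]]:
    """Commands to bgzip xxx.vcf into xxx.vcf.gz and index it (xxx.vcf.gz.tbi)."""
    bgzip = os.path.join(tabix_dir, "bgzip")
    tabix = os.path.join(tabix_dir, "tabix")
    return [bgzip, "-f", vcf], [tabix, "-f", "-p", "vcf", vcf + ".gz"]

def index_all(root: str, tabix_dir: str) -> None:
    """Compress and index every vcf found under root, stopping on the first error."""
    for vcf in find_vcfs(root):
        compress, index = tabix_commands(vcf, tabix_dir)
        subprocess.run(compress, check=True)  # replaces xxx.vcf with xxx.vcf.gz
        subprocess.run(index, check=True)     # writes xxx.vcf.gz.tbi
```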

**************************************************************************************

**************************************************************************************
**                                Wig2Bar: Oct 2009                                 **
**************************************************************************************
Converts variable step and fixed step xxx.wig(.zip/.gz OK) files to chrom specific
bar files.

-f The full path directory/file text for your xxx.wig(.gz/.zip OK) file(s).
-v Genome version (e.g. H_sapiens_Mar_2006), get from UCSC Browser,
      http://genome.ucsc.edu/FAQ/FAQreleases
-s Skip wig lines with designated value/score.

Example: java -Xmx1500M -jar pathTo/Apps/Wig2Bar -f /WigFiles/ -v hg18 -s 0.0 
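Wig files come in two flavors: fixedStep blocks, where each value line advances the
position by step from start, and variableStep blocks, where each line carries its own
position. The sketch below expands both into (chrom, position, value) points and drops
values equal to the -s skip value. It follows the UCSC wiggle conventions (1-based
positions) and ignores span, as Wig2USeq notes; it is an illustration, not Wig2Bar's parser.

```python
from typing import Iterable, Iterator, Optional, Tuple

def parse_wig(lines: Iterable[str],
              skip: Optional[float] = None) -> Iterator[Tuple[str, int, float]]:
    """Yield (chrom, 1-based position, value) from fixedStep/variableStep lines.
    Values equal to `skip` (like the -s option) are dropped."""
    chrom, pos, step, fixed = "", 0, 1, False
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(("track", "browser", "#")):
            continue
        if line.startswith(("fixedStep", "variableStep")):
            parts = line.split()
            params = dict(p.split("=", 1) for p in parts[1:])
            fixed = parts[0] == "fixedStep"
            chrom = params["chrom"]
            if fixed:
                pos = int(params["start"])
                step = int(params.get("step", "1"))
            continue
        if fixed:
            here, value = pos, float(line)
            pos += step  # the next value sits one step further along
        else:
            start, val = line.split()
            here, value = int(start), float(val)
        if skip is not None and value == skip:
            continue
        yield chrom, here, value
```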

**************************************************************************************

**************************************************************************************
**                                Wig 2 USeq: May 2012                              **
**************************************************************************************
Converts variable step, fixed step, and bedGraph xxx.wig/bedGraph4(.zip/.gz OK) files
into stair step/heat map useq archives. Span parameters are not supported.

-f The full path directory/file text for your xxx.wig(.gz/.zip OK) file(s).
-v Genome version (e.g. H_sapiens_Mar_2006), get from UCSC Browser,
      http://genome.ucsc.edu/FAQ/FAQreleases
-s Skip wig lines with designated value/score.
-i Index size for slicing split chromosome data (e.g. # rows per file), defaults to
      100000.
-r Initial graph style, defaults to 1
      0	Bar
      1	Stairstep
      2	HeatMap
      3	Line
-h Initial graph color, hexadecimal (e.g. #6633FF), enclose in quotations!
-d Description, enclose in quotations! 
-p Prepend a 'chr' onto bedGraph chromosomes.

Example: java -Xmx1G -jar path2/Apps/Wig2USeq -f /WigFiles/ -v H_sapiens_Feb_2009
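The -i index size controls how many rows of a chromosome's data go into each slice of
the archive, and -p adds a 'chr' prefix to bedGraph chromosome names. A minimal sketch
of both ideas, assuming four-column bedGraph rows sorted by position before slicing,
is shown below; the function names are hypothetical and the archive writing itself is
not modeled.

```python
from typing import List, Sequence, Tuple

Row = Tuple[str, int, int, float]  # bedGraph: chrom, start, end, value

def prepend_chr(chrom: str) -> str:
    """Like -p: prepend 'chr' to a bedGraph chromosome name."""
    return "chr" + chrom

def slice_rows(rows: Sequence[Row], index_size: int = 100000) -> List[List[Row]]:
    """Sort one chromosome's rows by position, then split them into
    consecutive chunks of at most index_size rows (one chunk per file)."""
    ordered = sorted(rows, key=lambda r: (r[1], r[2]))
    return [list(ordered[i:i + index_size]) for i in range(0, len(ordered), index_size)]
```

With the default of 100000, a chromosome with 250,000 rows would be split into three
slices of 100,000, 100,000 and 50,000 rows.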

**************************************************************************************

**************************************************************************************
**                        Score Methylated Regions: Dec 2013                        **
**************************************************************************************
For each region, finds the underlying methylation data. A Bonferroni-corrected p-value
for each region's fraction methylated (# nonConObs / # totalObs), as well as a fold
enrichment, can be calculated using randomly drawn regions matched by chromosome,
region length, # obs, and GC content.

Options:
-c Converted PointData directories, full path, comma delimited. These should
       contain stranded chromosome specific xxx_-/+_.bar.zip files. One
       can also provide a single directory that contains multiple PointData
       directories. See the NovoalignBisulfiteParser app.
-n Non-converted PointData directories, ditto. 
-r Full path file text for your region of interest file (tab delim: chr start stop).
-g To calculate p-values for methylation enrichment/reduction, provide a full path
       directory containing chromosome specific GC content boolean arrays. See
       the ConvertFasta2GCBoolean app. Also complete option -i.
-i Likewise, to calculate p-values, also provide a full path file text containing the
       interrogated regions (tab delim: chr start stop ...) to use in drawing
       random regions.
-u Number of random region sets, defaults to 1000.
-m Minimum number of observations in a region to score, defaults to 10.
-o Minimum read coverage to count mC fraction, defaults to 8
-b Minimum number of Cs passing read coverage in region to score, defaults to 1
-p Print only regions that pass thresholds, defaults to all

Example: java -jar pathTo/Apps/ScoreMethylatedRegions -c /Data/Sperm/Converted -n 
      /Data/Sperm/NonConverted -r /Res/miRNARegions.bed -i /Res/interrRegions.bed
       -g /Genomes/Hg18/GCBooleanArrays/
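The scoring described above compares a region's fraction methylated with those of its
matched random regions. The sketch below shows one plausible way to do this: an
empirical p-value (with a +1 correction), Bonferroni correction by the number of
regions tested, and fold enrichment as observed over mean random fraction. The exact
statistic ScoreMethylatedRegions uses is not given here, so treat this as an
assumption-laden illustration with hypothetical function names.

```python
from typing import Sequence, Tuple

def fraction_methylated(non_con_obs: int, total_obs: int) -> float:
    """# nonConObs / # totalObs for a region."""
    return non_con_obs / total_obs

def score_region(observed: float, random_fractions: Sequence[float],
                 n_regions: int, enriched: bool = True) -> Tuple[float, float]:
    """Return (Bonferroni-corrected empirical p-value, fold enrichment).
    enriched=True tests for more methylation than the random regions,
    enriched=False for less."""
    n = len(random_fractions)
    if enriched:
        n_extreme = sum(1 for f in random_fractions if f >= observed)
    else:
        n_extreme = sum(1 for f in random_fractions if f <= observed)
    p = (n_extreme + 1) / (n + 1)
    p_bonf = min(1.0, p * n_regions)  # Bonferroni: multiply by # tests, cap at 1
    mean_random = sum(random_fractions) / n
    fold = observed / mean_random if mean_random > 0 else float("inf")
    return p_bonf, fold
```

For a region with 8 of 10 observations non-converted (0.8) and random fractions of 0.2,
0.4, 0.4 and 0.9, one random region is at least as methylated, giving p = 2/5 before
correction.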

**************************************************************************************

Advanced Options:
-x Max per base alignment depth, defaults to 50000. Genes containing such high
       density coverage are ignored. Warnings are thrown.
-f Pseudocounts added to each region, defaults to 10.

Example: java -Xmx10G -jar pathTo/USeq/Apps/ScoreEnrichedRegions -c
       /Data/TimeCourse/ESCells/ -r regionOfInterest.bed -i sequencedRegions.bed
       -g gcContent/ -o resultsGoHere.txt

**************************************************************************************