Core Applications

Launch these applications using the -jar option, for example, 'java -jar pathTo/USeq/Apps/FileSplitter'.
Click the name to see the command line menu.

ChIP-Seq Apps
ChIP Seq: Wraps several of the following USeq apps to run a differential treatment vs control/input ChIP-Seq analysis. It parses raw alignment files, filters duplicate reads, generates ReadCoverage tracks, estimates the peak shift using the PeakShiftFinder, and lastly, runs the MultipleReplicaScanSeqs and the EnrichedRegionMaker to identify ChIP-Seq peaks.
QC Seqs: QCSeqs takes directories of chromosome specific PointData xxx.bar.zip files that represent replicas of signature sequencing data, merges the strands, uses a sliding window to sum the hits, and calculates Pearson correlation coefficients for the window sums between each pair of replicas.
Peak Shift Finder: Scans stranded point data for a skew in the distribution of peaks, a signature of a real ChIP-Seq peak. Calculates the bp difference.
Scan Seqs: Takes stranded chromosome specific PointData and uses a sliding window to calculate several smoothed window statistics. These include a binomial p-value, a q-value FDR, an empirical FDR, and a Bonferroni corrected binomial p-value for flagging peak shift strand skew.
Multiple Replica Scan Seqs: MRSS uses a sliding window and Anders' DESeq negative binomial p-values, converted to Benjamini & Hochberg FDRs, to identify enriched and reduced regions in a genome. Both treatment and control PointData sets are required, each with one or more biological replicas. MRSS generates window level differential count tracks for the FDR and normalized log2Ratio as well as a binary window object xxx.swi file for downstream use by the EnrichedRegionMaker. Lastly, MRSS also makes use of DESeq's variance corrected count data to cluster your biological replicas.
Enriched Region Maker: Combines scored windows from ScanSeqs into larger Enriched Regions/Binding Peaks given one or more thresholds. Can also be used to find the best peak within each Enriched Region.
Score Enriched Regions: Determines if a set of regions specified by the user is more or less enriched than a randomly generated set of regions matched on chromosome, region and GC content. The program checks each region individually and the dataset as a whole.
Simulator: Generates simulated ChIP-Seq reads for alignment to a reference.
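
The QC Seqs entry above describes summing hits in a sliding window and correlating the window sums between replicas. The following is a minimal Python sketch of that idea only, not the USeq implementation. The function names, the window and step parameters, and the toy hit positions are assumptions made for illustration.

```python
from math import sqrt


def window_sums(positions, window, step, length):
    """Count hits falling in each sliding window over [0, length).

    positions: hit positions for one replica (hypothetical toy input)
    window:    window size in bp
    step:      distance between consecutive window starts
    """
    sums = []
    for start in range(0, max(length - window, 0) + 1, step):
        end = start + window
        sums.append(sum(1 for p in positions if start <= p < end))
    return sums


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant window sums")
    return sxy / sqrt(sxx * syy)


# Toy example: two replicas over a 30 bp region, 10 bp non-overlapping windows.
replica_a = window_sums([1, 2, 11, 25], window=10, step=10, length=30)
replica_b = window_sums([3, 4, 5, 6, 12, 13, 26, 27], window=10, step=10, length=30)
r = pearson(replica_a, replica_b)
```

Here replica_a is [2, 1, 1] and replica_b is [4, 2, 2], so r is 1.0. Real replicas would be compared pairwise across all chromosomes.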
 
RNA-Seq Apps
RNA Seq: Wraps several of the following USeq apps to run a differential treatment vs control gene expression RNA-Seq analysis. It parses raw alignment files, generates ReadCoverage tracks, runs the MultipleReplicaScanSeqs and the EnrichedRegionMaker to identify novel transfrags, and identifies differential gene expression and splicing using the MultipleReplicaDefinedRegionScanSeqs.
Defined Region Differential Seq: DRDS takes bam files representing a treatment and control experiment or a multiple condition/time series and identifies differentially expressed genes under any pairwise comparison using Simon Anders' DESeq. Alternative splicing is estimated using a chi-square test of independence.
DRDS Annotator: This application annotates DefinedRegionDifferentialSeq xlsx files using Ensembl biomart tab-delimited annotation files.
Telescriptor: Compares two RNA-Seq datasets for possible telescripting. Generates a spreadsheet of statistics for each gene as well as a variety of graphs in exonic bp space.
Allelic Expression Detector: Application for identifying allelic expression based on a table of snps and bam alignments that have been filtered for alignment bias.
MiRNA Correlator: Generates a spreadsheet to use in comparing changing miRNA levels to changes in gene expression.
Make Transcriptome: Makes all known and theoretical splice junctions and transcripts from a table of gene models. Read-through occurs with small exons to the next one upstream or downstream. Include the splices fasta file along with the chromosome files when building a genome index for RNA-Seq alignments.
RNASeq Simulator: RSS takes novoalignments from RNA-Seq data and simulates overdispersed, multiple replica, differential, non-stranded RNA-Seq datasets.
 
RNA Editing Apps
Inosine Predict: IP estimates the likelihood of ADAR RNA editing using the multiplicative 4L,4R model described in Eggington et al. 2010.
RNA Editing PileUp Parser: Parses a SAMTools mpileup output file for refseq A bases that show evidence of RNA editing via conversion to Gs, stranded. Base fraction editing is calculated for bases passing the thresholds for viewing in IGB and subsequent clustering with the RNAEditingScanSeqs app.
RNA Editing ScanSeqs: RESS attempts to identify clustered editing sites across a genome using a sliding window approach. Each window is scored for the pseudomedian of the base fraction edits as well as the probability that the observations occurred by chance using a permutation test based on the chi-square goodness of fit statistic.
Defined Region RNA Editing: DRRE scores regions for the pseudomedian of the base fraction edits as well as the probability that the observations occurred by chance using a permutation test based on the chi-square goodness of fit statistic.
 
Sam Alignment Apps
Sam Transcriptome Parser: Takes SAM alignment files that were aligned against chromosomes and extended splice junctions (see MakeTranscriptome app) and converts the coordinates to genomic space.
Sam 2 USeq: Generates per base read depth stair-step xxx.useq graph files for visualization in IGB from SAM alignment files. Options are available for generating relative and stranded graphs.
Sam SV Filter: Filters SAM records based on their intersection with a list of target regions for structural variation analysis. Paired alignments are kept if they align to at least one target region. These are split into those that align to different targets (span), the same target with sufficient softmasking (soft), or one target and somewhere else (single).
Merge Paired Sam Alignments: Merges proper paired alignments that pass a variety of checks and thresholds. Useful for avoiding non-independent variant observations and other double counting issues when reads overlap.
Alignment End Trimmer: This application can be used to trim alignments according to the density of mismatches.
Sam Alignment Extractor: Parses all of the sam alignments that intersect a given bed file of genomic regions.
Sam Comparator: Compares coordinate sorted, unique, alignment sam/bam files. Splits alignments with the same name into those that match chrom and position or mismatch. Use to remove alignment bias for allelic expression.
Sam Read Depth Sub Sampler: Filters, randomizes, and subsamples each coordinate sorted bam alignment file to a target base level read depth. Useful for reducing extreme read depths over localized areas.
Compare Parsed Alignments: Compares two parsed alignments for a common distribution of snps using R's Fisher's exact test. Run the ParseIntersectingAlignments with the same snp table first.
Sam Subsampler: Filters, randomizes, subsamples and sorts sam/bam alignment files.
Calculate Per Cycle Error Rate: Calculates per cycle error rates provided a sorted indexed bam file and a fasta sequence file.
Sam Parser: Parses SAM and BAM files into center position binary PointData xxx.bar files.
 
VCF Apps
Multi Sample VCF Filter: Splits vcf file(s) containing multiple sample records into those that pass and fail the user defined tests and sample level thresholds.
VCF Comparator: Compares test vcf file(s) against a gold standard key of trusted vcf calls.
VCF Annotator: Adds a variety of ANNOVAR annotations to the INFO field of each vcf record.
VCF Splice Annotator: Scores vcf variants for gain or loss of splice junctions using the MaxEntScan algorithms.
VCF Reporter: Exports particular VCF record INFO to a spreadsheet or secondary VCF file.
VCFTabix: Converts vcf files to a SAMTools compressed vcf tabix format.
 
Bisulfite Sequencing & Methylation Array Apps
Novoalign Bisulfite Parser: Parses Novoalign single and paired bisulfite sequence alignment files into xxx.bed and PointData file formats. Generates several summary statistics on converted and non-converted C contexts. Flattens overlapping reads in a pair to call consensus bps.
Bis Stat: Takes PointData from converted and non-converted C bisulfite sequencing data parsed using the NovoalignBisulfiteParser and generates several xxCxx context statistics and graphs for visualization in IGB. Estimates whether a given C is methylated using a binomial distribution.
Bis Stat Region Maker: Takes serialized window objects from BisStat, thresholds based on the min and max fraction methylation params and prints regions in bed format meeting the criteria.
Bis Seq: Takes two condition (treatment and control) PointData from converted and non-converted C bisulfite sequencing data parsed using the NovoalignBisulfiteParser and identifies differentially methylated regions using either Fisher's exact or chi-square test p-values converted to B&H FDRs.
Defined Region Bis Seq: Takes two condition (treatment and control) PointData from converted and non-converted C bisulfite sequencing data parsed using the NovoalignBisulfiteParser and scores regions defined by the user for differential methylation using either Fisher's exact or chi-square test p-values converted to B&H FDRs.
Stranded Bis Seq: Looks for strand bias in CG methylation from one dataset using Fisher's exact/chi-square tests followed by a Benjamini and Hochberg FDR correction. WARNING: many bisulfite datasets display strand bias due to preferential breakage of C rich strands. Use this app with caution.
Score Methylated Regions: For each region fetches the underlying methylation data. A p-value for each region's fraction methylated as well as a fold enrichment over random can be calculated using randomly drawn regions matched by chromosome, region length, # obs, and GC content.
Parse PointData Contexts: Parses PointData for particular 5bp genomic sequence contexts. Useful for splitting data into CG and nonCG contexts.
BisSeq Aggregate Plotter: BSAP merges bisulfite data over defined regions to generate data for class average aggregate plots of fraction methylation.
Bis Seq Error adder: Takes PointData from the NovoalignBisulfiteParser and simulates a worse non-conversion rate by randomly picking converted observations and making them non-converted.
Allelic Methylation Detector: AMD identifies regions displaying allelic methylation, e.g. ~50% average mCG methylation yet individual read pairs show a bimodal fraction distribution of either fully methylated or unmethylated.
Methylation Array Scanner: MAS takes paired and non-paired sample PointData representing beta values (0-1) from arrays and attempts to identify regions with enriched/reduced signal using a sliding window approach.
Methylation Array Defined Region Scanner: MADRS takes paired and non-paired sample PointData representing beta values (0-1) from arrays and scores user defined regions for differences in methylation.
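
Several entries above (Bis Seq, Defined Region Bis Seq, Stranded Bis Seq) mention converting test p-values to Benjamini and Hochberg (B&H) FDRs. The sketch below shows the standard B&H step-up adjustment in plain Python. It is a generic illustration, not code taken from USeq, and the function name and example p-values are assumptions; the apps may transform or report the values differently.

```python
def benjamini_hochberg(pvalues):
    """Return B&H adjusted p-values (FDRs) in the same order as the input.

    For the p-value with rank r (1 = smallest) among n tests, the raw
    adjustment is p * n / r. Walking from the largest rank down and keeping
    a running minimum makes the adjusted values monotone.
    """
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * n / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical p-values from four windows or regions.
fdrs = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

For these four values the adjusted FDRs are approximately 0.02, 0.04, 0.04 and 0.02, in input order.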
 
Fasta and Fastq Apps
SRA Processor: Fetches SRA files from the Sequence Read Archive and converts them to gzipped fastq. Can be used with tomato to launch alignments etc. on each fastq file.
Reference Mutator: Takes a directory of fasta chromosome sequence files and converts the reference allele to the alternate provided by a snp mapping table.
Convert Fasta 2 GC BarGraph: Converts fasta files into graph files containing a 1 over each C in a CpG context.
Mask Exons In Fasta Files: Replaces exonic sequence with Ns.
Mask Regions In Fasta Files: Replaces the region (or non region) sequence with Ns.
Convert Fasta A2G: Converts all the a/A's to g/G's in fasta file(s) maintaining case.
Convert Fastq A2G: Converts all the sequence A's to G's, case insensitive.
Fetch Genomic Sequences: Given a file containing genomic coordinates, fetches and saves the sequence.
Convert Fasta 2 GC Boolean: Converts fasta sequences to GC boolean arrays for use by other applications.
Make Splice Junction Fasta: Deprecated, use MakeTranscriptome. MSJF creates a multi fasta file containing sequences representing all possible linear splice junctions.
Concatinate Fastas: Concatenates a directory of fasta files into a single sequence separated by a defined number of Ns. Use this app to create artificial chromosomes for poorly assembled genomes.
Shift Annotation Positions: Uses the information in an xxx.shifter.txt file from the ConcatinateFastas app to shift the annotation to match the coordinates of the concatenated sequence.
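
The Convert Fasta A2G entry above describes replacing every a/A with g/G while maintaining case. A minimal Python sketch of that transformation follows. It is an illustration under the assumption that header lines start with '>' and must be left untouched; the function name is hypothetical and this is not the USeq code.

```python
# Translation table mapping a -> g and A -> G; all other characters pass through.
A_TO_G = str.maketrans("aA", "gG")


def convert_fasta_a2g(lines):
    """Convert A to G in sequence lines, preserving case and header lines."""
    converted = []
    for line in lines:
        if line.startswith(">"):
            converted.append(line)  # header line: keep the name as-is
        else:
            converted.append(line.translate(A_TO_G))
    return converted


# Toy record; note the 'A' in the header is not changed.
result = convert_fasta_a2g([">chrA", "ACGTacgt", "NNaaAA"])
```

The result is [">chrA", "GCGTgcgt", "NNggGG"].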
 
Region Analysis Apps
Intersect Regions: Performs an intersection analysis on lists of genomic regions. Uses random regions matched for GC content, length, array interrogated regions, and chromosome to calculate an enrichment over random and p-value. Also generates a distance to nearest region distribution histogram.
Compare Intersecting Regions: Compares test region file(s) against a master set of regions for intersection. Reports the results as columns relative to the master.
Filter Intersecting Regions: Sorts a bed file into those regions that intersect or don't intersect any regions in a second bed file. Fast.
Merge Regions: Merges genomic regions. Good for collapsing duplicate regions for creating microarray seq capture designs.
Subtract Regions: Removes regions from a genomic regions file. Good for RepeatMasking genomic regions prior to tiling.
Parse Intersecting Alignments: Parses bed alignment files for intersecting reads provided another bed file of regions of interest (e.g. alleles).
Find Shared Regions: Writes out a bed file of shared regions.
Bed Stats: Calculates several statistics on bed files where the name column contains a short read sequence. This includes a read length distribution and frequencies of the 1st and last bp.
Intersect Key With Regions: IR intersects lists of genomic regions with a key of known regions. Generates TPR, FPR, FDR, etc. at each threshold.
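
The Merge Regions entry above describes collapsing genomic regions. One common way to do this is to sort by chromosome and start, then fold each region into the previous one when they overlap. The Python sketch below shows that approach. It assumes interbase (start inclusive, stop exclusive) coordinates and treats directly abutting regions as mergeable; both choices, and the function name, are assumptions rather than documented USeq behavior.

```python
def merge_regions(regions):
    """Merge overlapping or abutting (chrom, start, stop) regions.

    Returns a sorted list of merged tuples.
    """
    merged = []
    for chrom, start, stop in sorted(regions):
        last = merged[-1] if merged else None
        if last is not None and last[0] == chrom and start <= last[2]:
            # Overlaps or touches the previous region: extend its stop.
            last[2] = max(last[2], stop)
        else:
            merged.append([chrom, start, stop])
    return [tuple(region) for region in merged]


# Toy input with one overlapping pair on chr1.
collapsed = merge_regions([
    ("chr1", 10, 20),
    ("chr1", 15, 30),
    ("chr2", 5, 8),
    ("chr1", 40, 50),
])
```

Here collapsed is [("chr1", 10, 30), ("chr1", 40, 50), ("chr2", 5, 8)].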
 
General Analysis Apps
Aggregate Plotter: Fetches point data contained within each region, zeros the coordinates, scales, sums, and window averages the values. Useful for generating class averages from a list of annotated regions. Use a spreadsheet app to graph the results.
Intersect Lists: Intersects two lists (of gene names) and, using randomization, calculates the significance of the intersection and the fold enrichment over random.
Ranked Set Analysis: Performs an intersection analysis on lists of ranked regions creating a visual box-line-box representation as well as a rank based % intersection graph.
Correlation Maps: CM creates correlation maps from gene expression data to look for physical gene clusters (aka gene expression neighborhoods, chromosome territories).
Correlate PointData: Calculates a Pearson Correlation Coefficient on the values of PointData found with the same positions in the two datasets.
Score Chromosomes: Scores a genome for hits to a transcription factor binding matrix, LLPSPM.
Score Sequences: Scores a multi-FASTA file of sequences for hits to a transcription factor binding matrix, LLPSPM.
Score Parsed Bars: Given a list of regions and a directory of graph data in bar format, extracts all the values under each region, calculates their mean and compares it to a random background model to generate a p-value for the associated means.
Kegg Pathway Enrichment: Looks for overrepresentation of genes from a user's list in Kegg pathways using a random permutation test.
Max Ent Scan Score3: Implementation of Max Ent Scan's score3 algorithm for human splice site detection. See Yeo and Burge 2004.
Max Ent Scan Score5: Implementation of Max Ent Scan's score5 algorithm for human splice site detection. See Yeo and Burge 2004.
Microsatellite Counter: MicrosatelliteCounter identifies and counts microsatellite repeats in MiSeq fastq files.
TomatoFarmer: TomatoFarmer controls an exome analysis from start to finish. It creates alignment jobs for each of the samples in your directory, waits for all jobs to finish and then launches metrics and variant calling jobs.
Parse Exon Metrics: This script runs several summary metric programs and compiles the results. It uses R and LaTeX to generate a formatted pdf as output.
 
PointData Manipulation Apps
Filter Point Data: FPD drops observations from PointData that intersect a list of regions (e.g. repeats).
Sub Sample Point Data: Creates a random sub sampling of Point Data. Useful for matching treatment and control datasets for visualization.
Point Data Manipulator: Manipulates point data to merge datasets, merge strands, shift base positions, replace scores with 1, and sum identical positions.
Merge Point Data: Efficiently merges PointData, collapsing by position and possibly strand. Identical position scores are either summed or converted into counts.
 
UCSC Gene Table Apps
Find Neighboring Genes: FNG takes a list of genes in UCSC Gene Table format and intersects them with a list of regions finding the closest gene to each region as well as all of the genes that fall within a given neighborhood.
Export Exons: Takes a UCSC gene table and exports the exons to a bed file +/- a bp buffer.
Export Intergenic Regions: Takes a GFF file and exports regions not covered by any annotation, the intergenic regions.
Export Intronic Regions: Takes a UCSC gene table and exports the most conservative estimate of intronic sequence.
Merge UCSC Gene Table: Merges transcript models that share the same gene name. Maximizes exons, minimizes introns.
 
Misc Analysis Apps
Alleler: Intersects a list of alleles (SNPs and INDELs) with gene models and returns their effects on coding sequences (synonymous/non-synonymous/frame shift) and splice-junctions.
Oligo Tiler: Tiles oligos across genomic regions returning their forward and reverse sequences. Good for creating microarray seq capture designs.
Primer 3 Wrapper: Wrapper for the primer3 (http://frodo.wi.mit.edu/primer3/) application. Extracts sequence, formats for primer3, executes, and parses the output to a spreadsheet. Useful for bulk qPCR primer picking. Yes, you do need to validate your results.
 
Converter Apps
Maq Snps 2 Bed: Converts a Maq snp text file (1 based coordinates) into a bed file (interbase coordinates).
Text 2 USeq: Converts genomic data text files (xxx.bed, xxx.gff, xxx.sgr, etc) to ultra compact, indexed, USeq binary files.
Wig 2 USeq: Converts UCSC variable step, fixed step, and bedGraph xxx.wig/bedGraph4 (.zip/.gz OK) files into stair step/heat map useq archives.
UCSC Big 2 USeq: Converts UCSC bigWig (xxx.bw) or bigBed (xxx.bb) archives to USeq archives (xxx.useq).
USeq 2 UCSC Big: Converts USeq archives to UCSC bigWig (xxx.bw) or bigBed (xxx.bb) archives based on the data type.
USeq 2 Text: Converts USeq binary files to native, bed, or wig formats.
Bar 2 USeq: Converts directories of xxx.bar files into useq archives, recursive.
Bar 2 Gr: Converts xxx.bar files to text xxx.gr files.
Gr 2 Bar: Converts chromosome specific xxx.gr.zip files (position score) to binary chromosome specific xxx.bar files.
Sgr 2 Bar: Converts xxx.sgr.zip files (chr position score) to binary chromosome specific xxx.bar files.
Bed 2 Bar: Converts xxx.bed files (chr start stop... name score strand) into binary chromosome specific xxx.bar file graphs. Can also generate a merged composite bed file thresholded at the sum of the overlapping bed scores.
Wig 2 Bar: Converts variable step and fixed step xxx.wig(Var) files to chrom specific bar files.
Graph 2 Bed: Converts USeq stair step and heat map graphs into region bed files using a threshold.
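
The Maq Snps 2 Bed entry above mentions converting 1 based coordinates to interbase coordinates. Assuming "interbase" here means the bed convention of a zero-based start and an exclusive stop, a single-base position p maps to the interval (p - 1, p). The short Python sketch below illustrates only that arithmetic; the function name is hypothetical and the Maq file parsing is omitted.

```python
def one_based_to_interbase(chrom, position):
    """Map a 1-based single-base position to a (chrom, start, stop) interval.

    Assumes bed-style interbase coordinates: start is zero-based and
    inclusive, stop is exclusive, so a single base spans exactly 1 bp.
    """
    if position < 1:
        raise ValueError("1-based positions start at 1")
    return (chrom, position - 1, position)


# A SNP reported at 1-based position 100 on chr1.
snp_interval = one_based_to_interbase("chr1", 100)
```

Here snp_interval is ("chr1", 99, 100).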
 
Utility Apps
Randomize Text File: Randomizes the lines in a text file. Good for randomly selecting reverse oligos to pad a microarray design.
File Cross Filter: FCF takes a column in the matcher file and uses it to parse the rows from other files. Useful for pulling out and printing in order the rows that match the first file.
File Match Joiner: FMJ loads a file and a particular column containing unique entries, a key, and then appends the key line to lines in the parsed file that match a particular column. Useful for appending chromosome coordinates to snp data based on a common ID, etc.
File Joiner: Joins many text files together, taking care to avoid fusing the last and first lines from two files, a major headache with 'cat * >>'.
File Splitter: Splits a text file into many files containing a given number of lines.
Print Select Columns: Spreadsheet/tab delimited file manipulations.
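
The File Joiner entry above points out that plain concatenation fuses the last line of one file with the first line of the next when the first file lacks a trailing newline. The Python sketch below shows one way to avoid that, operating on already-loaded text so it stays self-contained. The function name is hypothetical and this is not the USeq implementation.

```python
def join_texts(texts):
    """Concatenate text blocks, ensuring each non-empty block ends in a newline.

    Without the added newline, "a\\nb" followed by "c\\n" would yield a fused
    line "bc", which is the problem described for naive concatenation.
    """
    parts = []
    for text in texts:
        if not text:
            continue  # skip empty files entirely
        parts.append(text)
        if not text.endswith("\n"):
            parts.append("\n")
    return "".join(parts)


# The first block has no trailing newline; the second does.
joined = join_texts(["a\nb", "c\n"])
```

Here joined is "a\nb\nc\n" rather than the fused "a\nbc\n". To apply this to files, read each path into a string (or stream it line by line) and write the result to the output file.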
 
Odd Alignment Parser Apps
Eland Parser: Splits and converts stand alone or Solexa pipeline Eland Extended xxx_export.txt and xxx_sorted.txt alignment files into center position alignment scored binary PointData xxx.bar files.
Eland Multi Parser: Parses an Eland xxx.eland_multi.txt file tabulating hits to each fasta entry. Good for scoring hits to a transcriptome where every fasta entry represents a different gene.
Eland Sequencing Parser: Generates sequencing summary tracks for each base and a called consensus from Eland export and sorted alignment files.
Novoalign Parser: Parses Novoalign files into center position binary PointData xxx.bar files for USeq analysis and xxx.bed files.
Novoalign Paired Parser: Parses paired Novoalign files into 12 column xxx.bed files. Useful for visualizing sequenced fragments.
Novoalign Indel Parser: Parses Novoalign alignment files for consensus indels, something currently not supported by the maq apps. See Alleler app.
Soap V1 Parser: Parses a Soap version 1 alignment txt file into PointData, split by chromosome and strand.
Tag 2 Point: Splits and converts tab delimited text files (chr start stop ... strand, e.g. xxx.bed) into center position binary xxx.bar PointData files used by ScanSeqs and IGB for analysis and visualization respectively. Very fast loading. Small size.
Filter Duplicate Alignments: Filters identical reads from alignment files by random subsampling, finding the best, or limiting to a max number. Useful for minimizing PCR amplification artifacts in sequencing data.