Core Applications

Launch these applications using the -jar option, for example, 'java -jar pathTo/USeq/Apps/FileSplitter'.

ChIP-Seq Apps

ChIP Seq Wraps a variety of the following USeq apps to run a differential treatment vs. control (input) ChIP-Seq analysis. It parses raw alignment files, filters duplicate reads, generates ReadCoverage tracks, estimates the peak shift using the PeakShiftFinder, and lastly runs the MultipleReplicaScanSeqs and the EnrichedRegionMaker to identify ChIP-Seq peaks.

QC Seqs QCSeqs takes directories of chromosome-specific PointData xxx.bar.zip files that represent replicas of signature sequencing data, merges the strands, uses a sliding window to sum the hits, and calculates Pearson correlation coefficients for the window sums between each pair of replicas.
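The strand merging, window summing, and correlation steps can be sketched as follows. This is a minimal illustration, not QCSeqs' own code: it assumes each replica's plus- and minus-strand hit positions are already loaded as integer lists, and the function names, window size, and step are hypothetical.

```python
from bisect import bisect_left
from math import sqrt

def window_sums(plus, minus, window, step, chrom_length):
    """Merge the strands, then count hits in each window [start, start + window)."""
    positions = sorted(plus + minus)  # merged strands
    sums = []
    for start in range(0, chrom_length - window + 1, step):
        # number of hits with start <= position < start + window
        count = bisect_left(positions, start + window) - bisect_left(positions, start)
        sums.append(count)
    return sums

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists of window sums."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# hypothetical hit positions for two replicas on a 30 bp chromosome
rep1 = window_sums(plus=[1, 2], minus=[3, 12], window=10, step=10, chrom_length=30)
rep2 = window_sums(plus=[2, 13], minus=[4, 14], window=10, step=10, chrom_length=30)
print(rep1, rep2, round(pearson(rep1, rep2), 4))  # [3, 1, 0] [2, 2, 0] 0.7559
```

Here step equals window, so the windows do not overlap; a step smaller than the window gives overlapping, sliding windows.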

Peak Shift Finder Scans stranded PointData for a skew in the distribution of peaks between the two strands, the signature of a real ChIP-seq peak, and calculates the bp difference between the plus and minus strand peaks.

Scan Seqs Takes stranded chromosome specific PointData and uses a sliding window to calculate several smoothed window statistics. These include a binomial p-value, a q-value FDR, an empirical FDR, and a Bonferroni corrected binomial p-value for flagging peak shift strand skew.
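ScanSeqs' exact formulation is not given here. As a hedged sketch, a binomial p-value for one window can be computed by treating each read in the window as a trial whose null probability of being a treatment read equals the treatment's share of all reads, followed by a Bonferroni correction for the number of windows tested. This null model and the function names are assumptions for illustration.

```python
from math import comb

def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def window_pvalue(t_window, c_window, t_total, c_total):
    """One-sided p-value that a window holds more treatment reads than expected.

    Assumed null model: each read in the window is a treatment read with
    probability equal to the treatment's share of all reads.
    """
    p = t_total / (t_total + c_total)
    return binomial_upper_tail(t_window, t_window + c_window, p)

def bonferroni(pvalue, num_windows):
    """Bonferroni correction: multiply by the number of tests, capped at 1."""
    return min(1.0, pvalue * num_windows)

p = window_pvalue(t_window=10, c_window=0, t_total=1000, c_total=1000)
print(p, bonferroni(p, 100))  # 0.0009765625 0.09765625
```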

Multiple Replica Scan Seqs MRSS uses a sliding window with Anders' DESeq negative binomial p-values, converted to Benjamini & Hochberg FDRs, to identify enriched and reduced regions in a genome. Treatment and control PointData sets, each with one or more biological replicas, are required. MRSS generates window-level differential count tracks for the FDR and normalized log2Ratio, as well as a binary window object xxx.swi file for downstream use by the EnrichedRegionMaker. MRSS also uses DESeq's variance-corrected count data to cluster your biological replicas.
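The Benjamini & Hochberg step that converts window p-values into FDRs can be sketched as below. It covers only the B&H adjustment; the DESeq negative binomial p-values are assumed to be computed already, and the function name is hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return B&H adjusted p-values (FDRs) in the same order as the input."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotonic.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

print(benjamini_hochberg([0.01, 0.04, 0.03, 0.2]))
```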

Enriched Region Maker Combines scored windows from ScanSeqs into larger Enriched Regions/ Binding Peaks given one or more thresholds. Can also be used to find the best peak within each Enriched Region.
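A rough sketch of the window-merging idea, assuming scored windows from one chromosome arrive sorted by start as (start, stop, score) tuples: windows at or above the threshold are merged when they overlap or sit within a maximum gap, and the best-scoring window in each region is kept as its peak. The tuple layout, the max_gap parameter, and the function name are hypothetical, not EnrichedRegionMaker's actual options.

```python
def merge_windows(windows, threshold, max_gap=0):
    """windows: (start, stop, score) tuples from one chromosome, sorted by start.

    Returns (start, stop, best_window) tuples; the merge rule is an assumption.
    """
    regions = []  # each entry: [start, stop, best_window]
    for start, stop, score in windows:
        if score < threshold:
            continue  # window fails the threshold
        if regions and start - regions[-1][1] <= max_gap:
            region = regions[-1]
            region[1] = max(region[1], stop)  # extend the current region
            if score > region[2][2]:
                region[2] = (start, stop, score)  # new best window (peak)
        else:
            regions.append([start, stop, (start, stop, score)])
    return [tuple(region) for region in regions]

windows = [(0, 100, 5), (50, 150, 8), (100, 200, 2), (300, 400, 6)]
print(merge_windows(windows, threshold=4))
# [(0, 150, (50, 150, 8)), (300, 400, (300, 400, 6))]
```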

Score Enriched Regions Determines whether a user-specified set of regions is more or less enriched than a randomly generated set of regions matched on chromosome, region length, and GC content. The program checks each region individually and the dataset as a whole.

Simulator Generates simulated ChIP-seq reads for alignment to a reference.

RNA-Seq Apps

RNA Seq Wraps a variety of the following USeq apps to run a differential treatment vs control gene expression RNA-Seq analysis. It parses raw alignment files, generates ReadCoverage tracks, runs the MultipleReplicaScanSeqs and the EnrichedRegionMaker to identify novel transfrags, and identifies differential gene expression and splicing using the MultipleReplicaDefinedRegionScanSeqs.

Defined Region Differential Seq DRDS takes bam files representing a treatment and control experiment, or a multiple-condition or time-series experiment, and identifies differentially expressed genes under any pairwise comparison using Simon Anders' DESeq. Alternative splicing is estimated using a chi-square test of independence.
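DRDS's exact table layout is not described here. As an illustration only, a chi-square test of independence for alternative splicing can use a table with one row per condition and one column per exon (read counts); a large statistic suggests exon usage depends on condition. The sketch computes Pearson's statistic and degrees of freedom; turning it into a p-value needs the chi-square survival function, for example scipy.stats.chi2.sf (or scipy.stats.chi2_contingency for the whole test). The function name and counts are hypothetical.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c count table.

    Assumes every row and column total is non-zero (drop exons with no reads first).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof

# rows: treatment, control; columns: reads on exons 1-3 of one gene (made-up counts)
print(chi_square_independence([[50, 10, 40], [50, 40, 10]]))  # (36.0, 2)
```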

DRDS Annotator This application annotates DefinedRegionDifferentialSeq xlsx files using Ensembl biomart tab-delimited annotation files.

Telescriptor Compares two RNASeq datasets for possible telescripting. Generates a spreadsheet of statistics for each gene as well as a variety of graphs in exonic bp space.

Allelic Expression Detector Identifies allelic expression given a table of snps and bam alignments that have been filtered for alignment bias.

MiRNA Correlator Generates a spreadsheet to use in comparing changing miRNA levels to changes in gene expression.

Make Transcriptome Makes all known and theoretical splice junctions and transcripts from a table of gene models. For small exons, the junction sequence reads through to the next exon up- or downstream. Include the resulting splice fasta file along with the chromosome files when building a genome index for RNA-Seq alignments.

RNASeq Simulator RSS takes Novoalign alignments from RNA-Seq data and simulates overdispersed, multiple replica, differential, non-stranded RNA-Seq datasets.

RNA Editing Apps

Inosine Predict IP estimates the likelihood of ADAR RNA editing using the multiplicative 4L,4R model described in Eggington et al. 2010.

RNA Editing PileUp Parser Parses a SAMtools mpileup output file, by strand, for reference A bases that show evidence of RNA editing via conversion to G. The base fraction edited is calculated for bases passing the thresholds, for viewing in IGB and subsequent clustering with the RNAEditingScanSeqs app.

RNA Editing ScanSeqs RESS attempts to identify clustered editing sites across a genome using a sliding window approach. Each window is scored for the pseudomedian of the base fraction edits as well as the probability that the observations occurred by chance, using a permutation test based on the chi-square goodness of fit statistic.
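The pseudomedian is assumed here to be the Hodges-Lehmann estimator: the median of all pairwise (Walsh) averages of the values, including each value paired with itself. A minimal sketch for one window's base fraction edits follows; the permutation test is not shown, and the function name is hypothetical.

```python
from statistics import median

def pseudomedian(values):
    """Hodges-Lehmann estimator: median of the Walsh averages (x_i + x_j) / 2, i <= j."""
    n = len(values)
    walsh = [(values[i] + values[j]) / 2 for i in range(n) for j in range(i, n)]
    return median(walsh)

# base fraction edits for the sites in one window (made-up values)
print(pseudomedian([0.1, 0.2, 0.9]))
```

Unlike the plain median, the pseudomedian uses every pair of observations, so it is less jumpy for the small number of sites typical of a single window.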

Defined Region RNA Editing DRRE scores regions for the pseudomedian of the base fraction edits as well as the probability that the observations occurred by chance, using a permutation test based on the chi-square goodness of fit statistic.

Sam Alignment Apps

Sam Transcriptome Parser Takes SAM alignment files that were aligned against chromosomes and extended splice junctions (see MakeTranscriptome app) and converts the coordinates to genomic space.

Sam 2 USeq Generates per base read depth stair-step xxx.useq graph files for visualization in IGB from SAM alignment files. Options are available for generating relative and stranded graphs.

Sam SV Filter Filters SAM records based on their intersection with a list of target regions for structural variation analysis. Paired alignments are kept if they align to at least one target region. These are split into those that align to different targets (span), the same target with sufficient softmasking (soft), or one target and somewhere else (single).

Merge Paired Sam Alignments Merges properly paired alignments that pass a variety of checks and thresholds. Useful for avoiding non-independent variant observations and other double-counting issues when the reads in a pair overlap.

Alignment End Trimmer This application can be used to trim alignments according to the density of mismatches.

Sam Alignment Extractor Extracts all of the sam alignments that intersect a given bed file of genomic regions.

Sam Comparator Compares coordinate sorted, unique, alignment sam/bam files. Splits alignments with the same name into those that match chrom and position or mismatch. Use to remove alignment bias for allelic expression.

Sam Read Depth Sub Sampler Filters, randomizes, and subsamples each coordinate-sorted bam alignment file to a target base-level read depth. Useful for reducing extreme read depths over localized areas.

Compare Parsed Alignments Compares two parsed alignments for a common distribution of snps using R's Fisher's exact test. Run ParseIntersectingAlignments with the same snp table first.

Sam Subsampler Filters, randomizes, subsamples and sorts sam/bam alignment files.

Calculate Per Cycle Error Rate Calculates per cycle error rates provided a sorted indexed bam file and a fasta sequence file.

Sam Parser Parses SAM and BAM files into center position binary PointData xxx.bar files.

VCF Apps

Multi Sample VCF Filter Splits vcf file(s) containing multiple sample records into those that pass and fail the user defined tests and sample level thresholds.

VCF Comparator Compares test vcf file(s) against a gold standard key of trusted vcf calls.

VCF Annotator Adds a variety of ANNOVAR annotations to the INFO field of each vcf record.

VCF Splice Annotator Scores vcf variants for gain or loss of splice junctions using the MaxEntScan algorithms.

VCF Reporter Exports selected INFO fields from VCF records to a spreadsheet or secondary VCF file.

VCFTabix Converts vcf files into the SAMtools compressed, tabix-indexed vcf format.

Bisulfite Sequencing & Methylation Array Apps

Novoalign Bisulfite Parser Parses Novoalign single and paired bisulfite sequence alignment files into xxx.bed and PointData file formats. Generates several summary statistics on converted and non-converted C contexts. Flattens overlapping reads in a pair to call consensus bps.

Bis Stat Takes PointData from converted and non-converted C bisulfite sequencing data parsed using the NovoalignBisulfiteParser and generates several xxCxx context statistics and graphs for visualization in IGB. Estimates whether a given C is methylated using a binomial distribution.

Bis Stat Region Maker Takes serialized window objects from BisStat, thresholds based on the min and max fraction methylation params and prints regions in bed format meeting the criteria.

Bis Seq Takes two-condition (treatment and control) PointData from converted and non-converted C bisulfite sequencing data parsed using the NovoalignBisulfiteParser and identifies differentially methylated regions using either Fisher's exact or chi-square test p-values converted to B&H FDRs.
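For a single region, the counts can be arranged as a 2x2 table of non-converted (methylated) and converted C observations in treatment and control. A minimal two-sided Fisher's exact test using only the standard library is sketched below; the table layout and function name are assumptions, and the resulting p-values would still need B&H conversion to FDRs.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Assumed layout: rows = treatment, control; columns = non-converted C, converted C.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denominator = comb(n, col1)

    def table_prob(x):
        # hypergeometric probability of x in the top-left cell with fixed margins
        return comb(row1, x) * comb(n - row1, col1 - x) / denominator

    observed = table_prob(a)
    low, high = max(0, col1 - (n - row1)), min(row1, col1)
    # sum all tables no more probable than the observed one (small float tolerance)
    p = sum(prob for prob in map(table_prob, range(low, high + 1))
            if prob <= observed * (1 + 1e-7))
    return min(1.0, p)

print(fisher_exact_two_sided(3, 1, 1, 3))  # about 0.4857 (= 17/35)
```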

Defined Region Bis Seq Takes two-condition (treatment and control) PointData from converted and non-converted C bisulfite sequencing data parsed using the NovoalignBisulfiteParser and scores user-defined regions for differential methylation using either Fisher's exact or chi-square test p-values converted to B&H FDRs.

Stranded Bis Seq Looks for strand bias in CG methylation in one dataset using Fisher's exact/chi-square tests followed by a Benjamini and Hochberg FDR correction. WARNING: many bisulfite datasets display strand bias due to preferential breakage of C-rich strands. Use this app with caution.

Score Methylated Regions For each region fetches the underlying methylation data. A p-value for each region's fraction methylated as well as a fold enrichment over random can be calculated using randomly drawn regions matched by chromosome, region length, # obs, and GC content.

Parse PointData Contexts Parses PointData for particular 5bp genomic sequence contexts. Useful for splitting data into CG and nonCG contexts.

BisSeq Aggregate Plotter BSAP merges bisulfite data over defined regions to generate data for class average aggregate plots of fraction methylation.

Bis Seq Error Adder Takes PointData from the NovoalignBisulfiteParser and simulates a worse non-conversion rate by randomly picking converted observations and making them non-converted.

Allelic Methylation Detector AMD identifies regions displaying allelic methylation, e.g. ~50% average mCG methylation yet individual read pairs show a bimodal fraction distribution of either fully methylated or unmethylated.

Methylation Array Scanner MAS takes paired and non-paired sample PointData representing beta values (0-1) from arrays and attempts to identify regions with enriched/ reduced signal using a sliding window approach.

Methylation Array Defined Region Scanner MADRS takes paired and non-paired sample PointData representing beta values (0-1) from arrays and scores user defined regions for differences in methylation.

Fasta and Fastq Apps

SRA Processor Fetches SRA files from the Sequence Read Archive and converts them to gzipped fastq. Can be used with tomato to launch alignments etc. on each fastq file.

Reference Mutator Takes a directory of fasta chromosome sequence files and converts the reference allele to the alternate provided by a snp mapping table.

Convert Fasta 2 GC BarGraph Converts fasta files into graph files containing a 1 over each C in a CpG context.

Mask Exons In Fasta Files Replaces exonic sequence with Ns.

Mask Regions In Fasta Files Replaces the region (or non region) sequence with Ns.

Convert Fasta A2G Converts every a/A to g/G in fasta file(s), preserving case.
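A case-preserving A-to-G conversion is a simple character translation. The sketch below uses Python's str.translate and assumes fasta header lines (starting with '>') should be left untouched; the function names are hypothetical.

```python
A_TO_G = str.maketrans("aA", "gG")  # a->g, A->G; every other character unchanged

def convert_a2g(sequence):
    return sequence.translate(A_TO_G)

def convert_fasta_lines(lines):
    """Convert sequence lines and leave '>' header lines as they are (assumption)."""
    return [line if line.startswith(">") else convert_a2g(line) for line in lines]

print(convert_fasta_lines([">chrA", "ACgtaN"]))  # ['>chrA', 'GCgtgN']
```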

Convert Fastq A2G Converts all the sequence A's to G's, case insensitive.

Fetch Genomic Sequences Given a file containing genomic coordinates, fetches and saves the sequence.

Convert Fasta 2 GC Boolean Converts fasta sequences to GC boolean arrays for use by other applications.

Make Splice Junction Fasta Deprecated, use MakeTranscriptome. MSJF creates a multi fasta file containing sequences representing all possible linear splice junctions.

Concatinate Fastas Concatenates a directory of fasta files into a single sequence, with each separated by a defined number of Ns. Use this app to create artificial chromosomes for poorly assembled genomes.

Shift Annotation Positions Uses the information in an xxx.shifter.txt file from the ConcatinateFastas app to shift the annotation to match the coordinates of the concatenated sequence.
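Concatenation and the later coordinate shift both come down to recording where each input sequence starts in the combined sequence and adding that offset to annotation coordinates. The sketch keeps the offsets in a dictionary rather than an xxx.shifter.txt file, whose format is not described here; the spacer length and function names are hypothetical.

```python
def concatenate(records, spacer=100):
    """records: list of (name, sequence). Returns (combined sequence, {name: offset})."""
    parts, offsets, position = [], {}, 0
    for i, (name, sequence) in enumerate(records):
        if i > 0:
            parts.append("N" * spacer)  # N spacer between consecutive sequences
            position += spacer
        offsets[name] = position  # where this sequence starts in the combined one
        parts.append(sequence)
        position += len(sequence)
    return "".join(parts), offsets

def shift(name, start, stop, offsets):
    """Move an annotation on an original sequence onto the concatenated sequence."""
    return start + offsets[name], stop + offsets[name]

combined, offsets = concatenate([("contig1", "ACGT"), ("contig2", "GGCC")], spacer=3)
start, stop = shift("contig2", 0, 2, offsets)
print(combined, offsets, combined[start:stop])
# ACGTNNNGGCC {'contig1': 0, 'contig2': 7} GG
```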

Region Analysis Apps

Intersect Regions Performs an intersection analysis on lists of genomic regions, uses random regions matched for GC content, length, array interrogated regions, and chromosome to calculate an enrichment over random and p-value. Also generates a distance to nearest region distribution histogram.

Compare Intersecting Regions Compares test region file(s) against a master set of regions for intersection. Reports the results as columns relative to the master.

Filter Intersecting Regions Sorts a bed file into those regions that intersect or don't intersect any regions in a second bed file. Fast.
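One fast approach is to merge each chromosome's second-file regions into sorted, non-overlapping blocks and then binary-search those blocks for each query region. The sketch assumes half-open bed coordinates held in memory as (chrom, start, stop) tuples; it illustrates the idea and is not the app's implementation, and the function names are hypothetical.

```python
from bisect import bisect_right

def build_index(mask):
    """Merge mask regions per chromosome into sorted, non-overlapping blocks."""
    blocks = {}
    for chrom, start, stop in sorted(mask):
        merged = blocks.setdefault(chrom, [])
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], stop)
        else:
            merged.append([start, stop])
    return {chrom: ([s for s, _ in b], [e for _, e in b]) for chrom, b in blocks.items()}

def split_by_intersection(regions, mask):
    """Return (regions that intersect the mask, regions that don't)."""
    index = build_index(mask)
    hits, misses = [], []
    for chrom, start, stop in regions:
        starts, stops = index.get(chrom, ([], []))
        i = bisect_right(stops, start)  # first block whose stop lies past this start
        if i < len(starts) and starts[i] < stop:
            hits.append((chrom, start, stop))
        else:
            misses.append((chrom, start, stop))
    return hits, misses

mask = [("chr1", 100, 200), ("chr1", 150, 250), ("chr1", 400, 500)]
regions = [("chr1", 0, 100), ("chr1", 240, 260), ("chr1", 300, 400),
           ("chr1", 450, 460), ("chr2", 0, 1000)]
print(split_by_intersection(regions, mask))
```

With half-open coordinates, a region ending exactly where a mask block starts (chr1 0-100 against 100-250) does not count as intersecting.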

Merge Regions Merges genomic regions. Good for collapsing duplicate regions for creating microarray seq capture designs.
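Merging regions reduces to sorting by chromosome and start, then collapsing each run of overlapping regions. This sketch assumes half-open bed coordinates and also merges regions that merely abut (start equal to the previous stop); whether MergeRegions treats abutting regions the same way is an assumption, and the function name is hypothetical.

```python
def merge_regions(regions):
    """regions: (chrom, start, stop) tuples. Returns merged regions, sorted."""
    merged = {}
    for chrom, start, stop in sorted(regions):
        blocks = merged.setdefault(chrom, [])
        # overlapping or abutting (assumed) regions extend the previous block
        if blocks and start <= blocks[-1][1]:
            blocks[-1][1] = max(blocks[-1][1], stop)
        else:
            blocks.append([start, stop])
    return [(chrom, s, e) for chrom in sorted(merged) for s, e in merged[chrom]]

print(merge_regions([("chr1", 100, 200), ("chr1", 150, 300), ("chr1", 300, 400),
                     ("chr1", 500, 600), ("chr2", 0, 10), ("chr1", 100, 200)]))
# [('chr1', 100, 400), ('chr1', 500, 600), ('chr2', 0, 10)]
```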

Subtract Regions Removes regions from a genomic regions file. Good for RepeatMasking genomic regions prior to tiling.

Parse Intersecting Alignments Parses bed alignment files for intersecting reads provided another bed file of regions of interest (e.g. alleles).

Find Shared Regions Writes out a bed file of shared regions.

Bed Stats Calculates several statistics on bed files where the name column contains a short read sequence. This includes a read length distribution and frequencies of the 1st and last bp.

Intersect Key With Regions IR intersects lists of genomic regions with a key of known regions. Generates TPR, FPR, FDR, etc at each threshold.

General Analysis Apps

Aggregate Plotter Fetches point data contained within each region, zeros the coordinates, scales, sums, and window averages the values. Useful for generating class averages from a list of annotated regions. Use a spreadsheet app to graph the results.

Intersect Lists Intersects two lists (e.g. of gene names) and, using randomization, calculates the significance of the intersection and the fold enrichment over random.

Ranked Set Analysis Performs an intersection analysis on lists of ranked regions creating a visual box-line-box representation as well as a rank based % intersection graph.

Correlation Maps CM creates correlation maps from gene expression data to look for physical gene clusters (aka gene expression neighborhoods, chromosome territories).

Correlate PointData Calculates a Pearson correlation coefficient on the values of PointData found at the same positions in the two datasets.

Score Chromosomes Scores a genome for hits to a transcription factor binding matrix, LLPSPM.

Score Sequences Scores a multi-FASTA file of sequences for hits to a transcription factor binding matrix, LLPSPM.

Score Parsed Bars Given a list of regions and a directory of graph data in bar format, extracts all the values under each region, calculates their mean and compares it to a random background model to generate a p-value for the associated means.

Kegg Pathway Enrichment Looks for overrepresentation of genes from a user's list in KEGG pathways using a random permutation test.

Max Ent Scan Score3 Implementation of Max Ent Scan's score3 algorithm for human splice site detection. See Yeo and Burge 2004.

Max Ent Scan Score5 Implementation of Max Ent Scan's score5 algorithm for human splice site detection. See Yeo and Burge 2004.

Microsatellite Counter MicrosatelliteCounter identifies and counts microsatellite repeats in MiSeq fastq files.

TomatoFarmer TomatoFarmer controls an exome analysis from start to finish. It creates alignment jobs for each of the samples in your directory, waits for all jobs to finish and then launches metrics and variant calling jobs.

Parse Exon Metrics This script runs a set of summary metric programs and compiles the results. It uses R and LaTeX to generate a formatted PDF report as output.

PointData Manipulation Apps

Filter Point Data FPD drops observations from PointData that intersect a list of regions (e.g. repeats).

Sub Sample Point Data Creates a random sub sampling of Point Data. Useful for matching treatment and control datasets for visualization.

Point Data Manipulator Manipulates point data to merge datasets, merge strands, shift base positions, replace scores with 1, and sum identical positions.

Merge Point Data Efficiently merges PointData, collapsing by position and possibly strand. Identical position scores are either summed or converted into counts.

UCSC Gene Table Apps

Find Neighboring Genes FNG takes a list of genes in UCSC Gene Table format and intersects them with a list of regions finding the closest gene to each region as well as all of the genes that fall within a given neighborhood.

Export Exons Takes a UCSC gene table and exports the exons to a bed file +/- a bp buffer.

Export Intergenic Regions Takes a GFF file and exports regions not covered by any annotation, the intergenic regions.

Export Intronic Regions Takes a UCSC gene table and exports the most conservative estimate of intronic sequence.

Merge UCSC Gene Table Merges transcript models that share the same gene name. Maximizes exons, minimizes introns.

Misc Analysis Apps

Alleler Intersects a list of alleles (SNPs and INDELs) with gene models and returns their effects on coding sequences (synonymous/ non-synonymous/ frame shift) and splice-junctions.

Oligo Tiler Tiles oligos across genomic regions returning their forward and reverse sequences. Good for creating microarray seq capture designs.

Primer 3 Wrapper Wrapper for the primer3 (http://frodo.wi.mit.edu/primer3/) application. Extracts sequence, formats for primer3, executes, and parses the output to a spreadsheet. Useful for bulk qPCR primer picking. Yes, you do need to validate your results.

Converter Apps

Maq Snps 2 Bed Converts a Maq snp text file (1 based coordinates) into a bed file (interbase coordinates).
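The coordinate change is the only subtle part: a SNP at 1-based position p occupies the interbase interval [p - 1, p). The sketch assumes the Maq snp line is tab-delimited with the chromosome in the first column and the 1-based position in the second; the function name is hypothetical and the remaining columns are ignored.

```python
def maq_snp_to_bed(line):
    """Convert one Maq snp line (1-based position) to a 3-column bed line (interbase)."""
    fields = line.rstrip("\n").split("\t")
    chrom, position = fields[0], int(fields[1])  # assumed column order
    return f"{chrom}\t{position - 1}\t{position}"

print(repr(maq_snp_to_bed("chr1\t100\tA\tG\n")))  # 'chr1\t99\t100'
```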

Text 2 USeq Converts genomic data text files (xxx.bed, xxx.gff, xxx.sgr, etc) to ultra compact, indexed, USeq binary files.

Wig 2 USeq Converts UCSC variable step, fixed step, and bedGraph xxx.wig/bedGraph4 (.zip/.gz OK) files into stair step/ heat map useq archives.

UCSC Big 2 USeq Converts UCSC bigWig (xxx.bw) or bigBed (xxx.bb) archives to USeq archives (xxx.useq).

USeq 2 UCSC Big Converts USeq archives to UCSC bigWig (xxx.bw) or bigBed (xxx.bb) archives based on the data type.

USeq 2 Text Converts USeq binary files to native, bed, or wig formats.

Bar 2 USeq Converts directories of xxx.bar files into useq archives, recursive.

Bar 2 Gr Converts xxx.bar files to text xxx.gr files.

Gr 2 Bar Converts chromosome specific xxx.gr.zip files (position score) to binary chromosome specific xxx.bar files.

Sgr 2 Bar Converts xxx.sgr.zip files (chr position score) to binary chromosome specific xxx.bar files.

Bed 2 Bar Converts xxx.bed files (chr start stop... name score strand) into binary chromosome specific xxx.bar file graphs. Can also generate a merged composite bed file thresholded at the sum of the overlapping bed scores.

Wig 2 Bar Converts variable step and fixed step xxx.wig(Var) files to chrom specific bar files.

Graph 2 Bed Converts USeq stair step and heat map graphs into region bed files using a threshold.

Utility Apps

Randomize Text File Randomizes the lines in a text file. Good for randomly selecting reverse oligos to pad a microarray design.

File Cross Filter FCF takes a column in the matcher file and uses it to parse the rows from other files. Useful for pulling out and printing, in order, the rows that match the first file.

File Match Joiner FMJ loads a key file indexed on a column of unique entries, then appends the matching key line to each line in the parsed file whose designated column matches that entry. Useful for appending chromosome coordinates to snp data based on a common ID, etc.

File Joiner Joins many text files together, taking care not to fuse the last line of one file with the first line of the next, a major headache with 'cat * >>'.
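The fix for fused lines is to make sure each file's contents end with a newline before the next file is appended. A minimal sketch, assuming the files fit in memory; the function names are hypothetical.

```python
def join_texts(texts):
    """Concatenate texts, adding a newline after any text that lacks a final one."""
    parts = []
    for text in texts:
        parts.append(text)
        if text and not text.endswith("\n"):
            parts.append("\n")  # prevents fusing this last line with the next first line
    return "".join(parts)

def join_files(paths, out_path):
    texts = []
    for path in paths:
        with open(path) as handle:
            texts.append(handle.read())
    with open(out_path, "w") as out:
        out.write(join_texts(texts))

print(repr(join_texts(["a\nb", "c\n", "d"])))  # 'a\nb\nc\nd\n'
```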

File Splitter Splits a text file into many files containing a given number of lines.

Print Select Columns Manipulates spreadsheet (tab-delimited) files, for example by printing selected columns.

Odd Alignment Parser Apps

Eland Parser Splits and converts stand-alone or Solexa pipeline Eland Extended xxx_export.txt and xxx_sorted.txt alignment files into center position alignment scored binary PointData xxx.bar files.

Eland Multi Parser Parses an Eland xxx.eland_multi.txt file tabulating hits to each fasta entry. Good for scoring hits to a transcriptome where every fasta entry represents a different gene.

Eland Sequencing Parser Generates sequencing summary tracks for each base and a called consensus from Eland export and sorted alignment files.

Novoalign Parser Parses Novoalign files into center position binary PointData xxx.bar files for USeq analysis and xxx.bed files.

Novoalign Paired Parser Parses paired Novoalign files into 12 column xxx.bed files. Useful for visualizing sequenced fragments.

Novoalign Indel Parser Parses Novoalign alignment files for consensus indels, something currently not supported by the maq apps. See Alleler app.

Soap V1 Parser Parses a Soap version 1 alignment txt file into PointData, split by chromosome and strand.

Tag 2 Point Splits and converts tab delimited text (chr start stop ... strand, e.g. xxx.bed) text files into center position binary xxx.bar PointData files used by ScanSeqs and IGB for analysis and visualization respectively. Very fast loading. Small size.

Filter Duplicate Alignments Filters identical reads from alignment files by random subsampling, finding the best, or limiting to a max number. Useful for minimizing PCR amplification artifacts in sequencing data.