Core Applications

Launch these applications using the -jar option, for example, 'java -jar pathTo/USeq/Apps/FileSplitter'.
Click the name to see the command line menu.

Alignment Parser Apps
Eland Parser: Splits and converts stand-alone or Solexa pipeline Eland Extended xxx_export.txt and xxx_sorted.txt alignment files into center position, alignment scored, binary PointData xxx.bar files.
Eland Multi Parser: Parses an Eland xxx.eland_multi.txt file, tabulating hits to each fasta entry. Good for scoring hits to a transcriptome where every fasta entry represents a different gene.
Eland Sequencing Parser: Generates sequencing summary tracks for each base and a called consensus from Eland export and sorted alignment files.
Novoalign Parser: Parses Novoalign files into center position binary PointData xxx.bar files for USeq analysis and xxx.bed files.
Novoalign Paired Parser: Parses paired Novoalign files into 12 column xxx.bed files. Useful for visualizing sequenced fragments.
Novoalign Bisulfite Parser: Parses Novoalign single and paired bisulfite sequence alignment files into xxx.bed and PointData file formats. Generates several summary statistics on converted and non-converted C contexts. Flattens overlapping reads in a pair to call consensus bps.
Novoalign Indel Parser: Parses Novoalign alignment files for consensus indels, something currently not supported by the Maq apps. See the Alleler app.
Sam Parser: Parses SAM and BAM files into center position binary PointData xxx.bar files.
Sam Transcriptome Parser: Takes SAM alignment files that were aligned against chromosomes and extended splice junctions (see the MakeTranscriptome app) and converts the coordinates to genomic space.
Soap V1 Parser: Parses a Soap version 1 alignment txt file into PointData, split by chromosome and strand.
Tag 2 Point: Splits and converts tab delimited text files (chr start stop ... strand, e.g. xxx.bed) into center position binary xxx.bar PointData files used by ScanSeqs and IGB for analysis and visualization, respectively. Very fast loading. Small size.
SRA Processor: Fetches SRA files from the Sequence Read Archive and converts them to gzipped fastq. Can be used with tomato to launch alignments etc. on each fastq file.
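
Several of the parsers above reduce each alignment to a single "center position" and split the result by chromosome and strand. The Python sketch below illustrates that idea on tab delimited bed-style lines. It is not USeq code: the function name center_positions is hypothetical, and the integer midpoint rounding is an assumption, since the rule the USeq parsers actually use is not stated here.

```python
from collections import defaultdict


def center_positions(bed_lines):
    """Group alignment center positions by (chromosome, strand).

    Each line is tab delimited: chr, start, stop, ..., strand.
    """
    points = defaultdict(list)
    for line in bed_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        chrom, start, stop = fields[0], int(fields[1]), int(fields[2])
        # Treat the last column as the strand; fall back to "." if absent.
        strand = fields[-1] if fields[-1] in ("+", "-") else "."
        # Assumption: integer midpoint of the interval, rounded down.
        points[(chrom, strand)].append((start + stop) // 2)
    for positions in points.values():
        positions.sort()
    return dict(points)
```

Calling center_positions on the lines of a small bed file returns a dictionary keyed by (chromosome, strand), which mirrors the "split by chromosome and strand" layout described for PointData.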
 
ChIP/RNA-Seq Apps
RNA Seq: Wraps a variety of the following USeq apps to run a differential treatment vs control gene expression RNA-Seq analysis. It parses raw alignment files, generates ReadCoverage tracks, runs the MultipleReplicaScanSeqs and the EnrichedRegionMaker to identify novel transfrags, and identifies differential gene expression and splicing using the MultipleReplicaDefinedRegionScanSeqs.
ChIP Seq: Wraps a variety of the following USeq apps to run a differential treatment vs control/input ChIP-Seq analysis. It parses raw alignment files, filters duplicate reads, generates ReadCoverage tracks, estimates the peak shift using the PeakShiftFinder, and lastly, runs the MultipleReplicaScanSeqs and the EnrichedRegionMaker to identify ChIP-Seq peaks.
QC Seqs: QCSeqs takes directories of chromosome specific PointData xxx.bar.zip files that represent replicas of signature sequencing data, merges the strands, uses a sliding window to sum the hits, and calculates Pearson correlation coefficients for the window sums between each pair of replicas.
Peak Shift Finder: Scans stranded point data for a skew in the distribution of peaks, a signature of a real ChIP-seq peak. Calculates the bp difference.
Scan Seqs: Takes stranded chromosome specific PointData and uses a sliding window to calculate several smoothed window statistics. These include a binomial p-value, a q-value FDR, an empirical FDR, and a Bonferroni corrected binomial p-value for flagging peak shift strand skew.
Multiple Replica Scan Seqs: MRSS uses a sliding window and Anders' DESeq negative binomial p-values, converted to Benjamini & Hochberg FDRs, to identify enriched and reduced regions in a genome. Both treatment and control PointData sets are required, with one or more biological replicas each. MRSS generates window level differential count tracks for the FDR and normalized log2Ratio as well as a binary window object xxx.swi file for downstream use by the EnrichedRegionMaker. Lastly, MRSS also makes use of DESeq's variance corrected count data to cluster your biological replicas.
Defined Region Differential Seq: DRDS takes BAM files representing a treatment and control experiment or a multiple condition/time series and identifies differentially expressed genes under any pairwise comparison using Simon Anders' DESeq. Alternative splicing is estimated using a chi-square test of independence.
EnrichedRegionMaker: Combines scored windows from ScanSeqs into larger Enriched Regions/Binding Peaks given one or more thresholds. Can also be used to find the best peak within each Enriched Region.
Simulator: Generates simulated ChIP-seq reads for alignment to a reference.
RNASeq Simulator: RSS takes novoalignments from RNA-Seq data and simulates over dispersed, multiple replica, differential, non-stranded RNA-Seq datasets.
Score Enriched Regions: Determines if a set of regions specified by the user is more or less enriched than a randomly generated set of regions matched on chromosome, region, and GC content. The program checks each region individually and the dataset as a whole.
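
The QC Seqs entry above describes summing hits in a sliding window and computing Pearson correlation coefficients between replicas. The following Python sketch shows that calculation for one chromosome with the strands already merged. The function names, the default window and step sizes, and the handling of constant data are illustrative assumptions, not QCSeqs settings.

```python
import math
from bisect import bisect_left


def window_sums(positions, chrom_length, window=500, step=250):
    """Count hits in each window [start, start + window) along a chromosome."""
    sorted_pos = sorted(positions)
    sums = []
    for start in range(0, max(chrom_length - window, 0) + 1, step):
        stop = start + window
        # Number of positions p with start <= p < stop.
        sums.append(bisect_left(sorted_pos, stop) - bisect_left(sorted_pos, start))
    return sums


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when either replica is constant
    return cov / math.sqrt(var_x * var_y)
```

For two replicas, compute window_sums for each with the same chromosome length, window, and step, then pass the two lists to pearson.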
 
Sequencing Apps
Filter Duplicate Alignments: Filters identical reads from alignment files by random subsampling, finding the best, or limiting to a max number. Useful for minimizing PCR amplification artifacts in sequencing data.
Calculate Per Cycle Error Rate: Calculates per cycle error rates provided a sorted indexed bam file and a fasta sequence file.
Read Coverage: Deprecated, use Sam2USeq. Generates read coverage stats and stair-step xxx.bar graph files for visualization in IGB. If provided with an interrogated regions file, will calculate the fraction of interrogated bps with 1, 2, 3... or more overlapping reads.
Multi Sample VCF Filter: Splits vcf file(s) containing multiple sample records into those that pass and fail the user defined tests and sample level thresholds.
VCF Comparator: Compares test vcf file(s) against a gold standard key of trusted vcf calls.
Maq Snps 2 Bed: Converts a Maq snp text file (1 based coordinates) into a bed file (interbase coordinates).
Alleler: Intersects a list of alleles (SNPs and INDELs) with gene models and returns their effects on coding sequences (synonymous/non-synonymous/frame shift) and splice-junctions.
Oligo Tiler: Tiles oligos across genomic regions, returning their forward and reverse sequences. Good for creating microarray seq capture designs.
Merge Paired Sam Alignments: Merges proper paired alignments that pass a variety of checks and thresholds. Useful for avoiding non-independent variant observations and other double counting issues when reads overlap.
Merge Regions: Merges genomic regions. Good for collapsing duplicate regions for creating microarray seq capture designs.
Subtract Regions: Removes regions from a genomic regions file. Good for RepeatMasking genomic regions prior to tiling.
Randomize Text File: Randomizes the lines in a text file. Good for randomly selecting reverse oligos to pad a microarray design.
Filter Intersecting Regions: Sorts a bed file into those regions that intersect or don't intersect any regions in a second bed file. Useful for merging two microarray designs (e.g. RepeatMasked and AlignmentMasked).
Parse Intersecting Alignments: Parses bed alignment files for intersecting reads provided another bed file of regions of interest (e.g. alleles).
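
As an illustration of what merging genomic regions involves (see the Merge Regions entry above), here is a minimal Python sketch that collapses duplicate and overlapping regions per chromosome. The function name merge_regions is hypothetical, and treating abutting regions as mergeable is an assumption about interbase coordinates rather than documented USeq behavior.

```python
def merge_regions(regions):
    """Merge overlapping or abutting (chrom, start, stop) regions.

    Coordinates are assumed to be interbase: start inclusive, stop exclusive.
    """
    merged = []
    for chrom, start, stop in sorted(regions):
        last = merged[-1] if merged else None
        if last is not None and last[0] == chrom and start <= last[2]:
            # Overlaps or touches the previous region: extend it.
            last[2] = max(last[2], stop)
        else:
            merged.append([chrom, start, stop])
    return [tuple(region) for region in merged]
```

Sorting first means each region only has to be compared with the most recently merged one, so the pass over the sorted list is linear.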
 
Bisulfite Sequencing & Methylation Array Apps
Novoalign Bisulfite Parser: Parses Novoalign single and paired bisulfite sequence alignment files into xxx.bed and PointData file formats. Generates several summary statistics on converted and non-converted C contexts. Flattens overlapping reads in a pair to call consensus bps.
Bis Stat: Takes PointData from converted and non-converted C bisulfite sequencing data parsed using the NovoalignBisulfiteParser and generates several xxCxx context statistics and graphs for visualization in IGB. Estimates whether a given C is methylated using a binomial distribution.
Bis Stat Region Maker: Takes serialized window objects from BisStat, thresholds based on the min and max fraction methylation params, and prints regions in bed format meeting the criteria.
Bis Seq: Takes two condition (treatment and control) PointData from converted and non-converted C bisulfite sequencing data parsed using the NovoalignBisulfiteParser and identifies differentially methylated regions using either Fisher exact or chi-square test p-values converted to B&H FDRs.
Defined Region Bis Seq: Takes two condition (treatment and control) PointData from converted and non-converted C bisulfite sequencing data parsed using the NovoalignBisulfiteParser and scores regions defined by the user for differential methylation using either Fisher exact or chi-square test p-values converted to B&H FDRs.
Stranded Bis Seq: Looks for strand bias in CG methylation from one dataset using Fisher exact/chi-square tests followed by a Benjamini and Hochberg FDR correction. WARNING: many bisulfite datasets display strand bias due to preferential breakage of C rich strands. Use this app with caution.
Score Methylated Regions: For each region, fetches the underlying methylation data. A p-value for each region's fraction methylated as well as a fold enrichment over random can be calculated using randomly drawn regions matched by chromosome, region length, # obs, and GC content.
Parse PointData Contexts: Parses PointData for particular 5bp genomic sequence contexts. Useful for splitting data into CG and nonCG contexts.
BisSeq Aggregate Plotter: BSAP merges bisulfite data over defined regions to generate data for class average aggregate plots of fraction methylation.
Bis Seq Error Adder: Takes PointData from the NovoalignBisulfiteParser and simulates a worse non-conversion rate by randomly picking converted observations and making them non-converted.
Methylation Array Scanner: MAS takes paired and non-paired sample PointData representing beta values (0-1) from arrays and attempts to identify regions with enriched/reduced signal using a sliding window approach.
Methylation Array Defined Region Scanner: MADRS takes paired and non-paired sample PointData representing beta values (0-1) from arrays and scores user defined regions for differences in methylation.
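
Several entries above mention converting test p-values to Benjamini and Hochberg (B&H) FDRs. The Python sketch below shows the standard B&H step-up adjustment on a plain list of p-values. It is a generic textbook version with a hypothetical function name; how the USeq apps store or transform the resulting FDRs is not described here and is not modeled.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(pvalues)
    # Indexes of the p-values sorted from smallest to largest.
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted
```

Each adjusted value is the smallest FDR threshold at which the corresponding test would still be called significant.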
 
General Analysis Apps
Aggregate Plotter: Fetches point data contained within each region, zeros the coordinates, scales, sums, and window averages the values. Useful for generating class averages from a list of annotated regions. Use a spreadsheet app to graph the results.
Find Neighboring Genes: FNG takes a list of genes in UCSC Gene Table format and intersects them with a list of regions, finding the closest gene to each region as well as all of the genes that fall within a given neighborhood.
Intersect Regions: Performs an intersection analysis on lists of genomic regions. Uses random regions matched for GC content, length, array interrogated regions, and chromosome to calculate an enrichment over random and a p-value. Also generates a distance to nearest region distribution histogram.
Compare Intersecting Regions: Compares test region file(s) against a master set of regions for intersection. Reports the results as columns relative to the master.
Intersect Lists: Intersects two lists (of gene names) and, using randomization, calculates the significance of the intersection and the fold enrichment over random.
Ranked Set Analysis: Performs an intersection analysis on lists of ranked regions, creating a visual box-line-box representation as well as a rank based % intersection graph.
Correlation Maps: CM creates correlation maps from gene expression data to look for physical gene clusters (aka gene expression neighborhoods, chromosome territories).
Correlate PointData: Calculates a Pearson correlation coefficient on the values of PointData found at the same positions in the two datasets.
Score Chromosomes: Scores a genome for hits to a transcription factor binding matrix, LLPSPM.
Score Parsed Bars: Given a list of regions and a directory of graph data in bar format, extracts all the values under each region, calculates their mean, and compares it to a random background model to generate a p-value for the associated means.
Score Sequences: Scores a multi-FASTA file of sequences for hits to a transcription factor binding matrix, LLPSPM.
Kegg Pathway Enrichment: Looks for overrepresentation of genes from a user's list in Kegg pathways using a random permutation test.
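
The Intersect Lists entry above describes using randomization to estimate the significance of an overlap and its fold enrichment over random. One way such a test can be set up is sketched below in Python. The function name, the explicit universe argument, the number of trials, the seed, and the add-one p-value estimate are all assumptions made for illustration; the procedure the USeq app actually follows may differ.

```python
import random


def intersect_lists(list_a, list_b, universe, trials=1000, seed=0):
    """Randomization test for the overlap between two gene lists.

    Returns (observed overlap, p-value, fold enrichment over random).
    """
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    set_a = set(list_a)
    set_b = set(list_b)
    observed = len(set_a & set_b)
    pool = sorted(set(universe))
    hits = 0
    total = 0
    for _ in range(trials):
        # Draw a random list the same size as list_b from the universe.
        draw = rng.sample(pool, len(set_b))
        overlap = sum(1 for gene in draw if gene in set_a)
        total += overlap
        if overlap >= observed:
            hits += 1
    mean_random = total / trials
    # Add-one estimate keeps the p-value above zero.
    p_value = (hits + 1) / (trials + 1)
    fold = observed / mean_random if mean_random > 0 else float("inf")
    return observed, p_value, fold
```

The universe should contain every name that could have appeared in either list, for example all genes on the array or in the annotation, since it defines what "random" means for the test.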
 
Utility Apps
Text 2 USeq: Converts genomic data text files (xxx.bed, xxx.gff, xxx.sgr, etc.) to ultra compact, indexed, USeq binary files.
Wig 2 USeq: Converts UCSC variable step, fixed step, and bedGraph xxx.wig/bedGraph4 (.zip/.gz OK) files into stair step/heat map useq archives.
UCSC Big 2 USeq: Converts UCSC bigWig (xxx.bw) or bigBed (xxx.bb) archives to USeq archives (xxx.useq).
USeq 2 UCSC Big: Converts USeq archives to UCSC bigWig (xxx.bw) or bigBed (xxx.bb) archives based on the data type.
USeq 2 Text: Converts USeq binary files to native, bed, or wig formats.
Sam 2 USeq: Generates per base read depth stair-step xxx.useq graph files for visualization in IGB from SAM alignment files. Options are available for generating relative and stranded graphs.
Sam Subsampler: Filters, randomizes, subsamples, and sorts sam/bam alignment files.
Bar 2 USeq: Converts directories of xxx.bar files into useq archives, recursively.
Bar 2 Gr: Converts xxx.bar files to text xxx.gr files.
Gr 2 Bar: Converts chromosome specific xxx.gr.zip files (position score) to binary chromosome specific xxx.bar files.
Sgr 2 Bar: Converts xxx.sgr.zip files (chr position score) to binary chromosome specific xxx.bar files.
Bed 2 Bar: Converts xxx.bed files (chr start stop... name score strand) into binary chromosome specific xxx.bar file graphs. Can also generate a merged composite bed file thresholded at the sum of the overlapping bed scores.
Wig 2 Bar: Converts variable step and fixed step xxx.wig(Var) files to chrom specific bar files.
Graph 2 Bed: Converts USeq stair step and heat map graphs into region bed files using a threshold.
Filter Point Data: FPD drops observations from PointData that intersect a list of regions (e.g. repeats).
Sub Sample Point Data: Creates a random sub sampling of Point Data. Useful for matching treatment and control datasets for visualization.
Point Data Manipulator: Manipulates point data to merge datasets, merge strands, shift base positions, replace scores with 1, and sum identical positions.
Primer 3 Wrapper: Wrapper for the primer3 (http://frodo.wi.mit.edu/primer3/) application. Extracts sequence, formats for primer3, executes, and parses the output to a spreadsheet. Useful for bulk qPCR primer picking. Yes, you do need to validate your results.
Export Exons: Takes a UCSC gene table and exports the exons to a bed file +/- a bp buffer.
Export Intergenic Regions: Takes a GFF file and exports regions not covered by any annotation, the intergenic regions.
Export Intronic Regions: Takes a UCSC gene table and exports the most conservative estimate of intronic sequence.
Fetch Genomic Sequences: Given a file containing genomic coordinates, fetches and saves the sequence.
Convert Fasta 2 GC Boolean: Converts fasta sequences to GC boolean arrays for use by other applications.
Make Splice Junction Fasta: Deprecated, use MakeTranscriptome. MSJF creates a multi fasta file containing sequences representing all possible linear splice junctions.
Concatinate Fastas: Concatenates a directory of fasta files into a single sequence separated by a defined number of Ns. Use this app to create artificial chromosomes for poorly assembled genomes.
Shift Annotation Positions: Uses the information in an xxx.shifter.txt file from the ConcatinateFastas app to shift the annotation to match the coordinates of the concatenated sequence.
Make Transcriptome: Makes all known and theoretical splice junctions and transcripts from a table of gene models. Read through occurs with small exons to the next up or downstream. Include the splices fasta file along with the chromosome files when building a genome index for RNA-Seq alignments.
Merge UCSC Gene Table: Merges transcript models that share the same gene name. Maximizes exons, minimizes introns.
Find Overlapping Genes: Finds overlapping genes that converge, diverge, or contain one another given a UCSC gene table.
File Cross Filter: FCF takes a column in the matcher file and uses it to parse the rows from other files. Useful for pulling out and printing in order the rows that match the first file.
File Match Joiner: FMJ loads a file and a particular column containing unique entries, a key, and then appends the key line to lines in the parsed file that match a particular column. Useful for appending chromosome coordinates to snp data based on a common ID, etc.
File Joiner: Joins many text files together, taking care to avoid fusing the last and first lines from two files, a major headache with cat * >>.
File Splitter: Splits a text file into many files containing a given number of lines.
Print Select Columns: Spreadsheet/tab delimited file manipulations.
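
The File Joiner entry above mentions the problem of fused lines when one file lacks a trailing newline. The Python sketch below shows one simple way to guard against it: make sure every non-empty file ends with a newline before appending the next. The function names join_texts and join_files are hypothetical and this is not the FileJoiner source.

```python
def join_texts(texts):
    """Concatenate text chunks so no two chunks share a line."""
    parts = []
    for text in texts:
        if not text:
            continue  # an empty file contributes nothing
        # Terminate the final line if the file did not end with a newline.
        parts.append(text if text.endswith("\n") else text + "\n")
    return "".join(parts)


def join_files(paths, out_path):
    """Write the contents of the given text files to out_path, in order."""
    with open(out_path, "w") as out:
        for path in paths:
            with open(path) as handle:
                out.write(join_texts([handle.read()]))
```

The join_files version reads each file whole, which is fine for modest text files. For very large inputs a line-by-line copy with the same trailing newline check would use less memory.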