USeq results typically include:
- Note: all p-values and FDRs are transformed as -10*log10(p-value or FDR), thus 13 = 5%, 20 = 1%, and 30 = 0.1%.
- Three data folders containing:
- PointData: These folders contain mapped read data split by chromosome and strand. For each read, its center position and any associated score are saved. These are used by various USeq applications to perform analysis. They can be viewed in IGB, but be aware that PointData that share the same position are graphed on top of each other in IGB, not summed.
- Windows: several different measurements related to the overlapping window scanning summary statistics created by ScanSeqs. These are best used for visualization in IGB. For each summary score, two types of window representations may be generated.
HeatMap is a window representation best viewed in IGB as a stair-step or heatmap.
Point window summaries are where the window score is assigned to the center position in the window. Both use the xxx.bar.zip binary format.
- BinPVal: -10Log10(p-values) - Binomial p-values for each window calculated by comparing the number of treatment reads to the number of control reads. Not multiple test corrected. Treat this as a variance corrected score for ranking purposes.
- QValFDR: -10Log10(q-values) - Window level binomial p-values converted into q-value FDRs using John Storey's R package.
- EmpFDR: -10Log10(FDR) - Empirical false discovery rates based on a control vs control null distribution.
- Sum, Sum+, Sum-: A sum of the window point data, combined or for each strand, only provided when a treatment only analysis is run.
- xxx.swi: Serialized window object array generated by ScanSeqs and used by the EnrichedRegionMaker application.
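
The -10*log10 transform applied to the p-value and FDR scores above can be sketched as follows. This is a minimal illustration, not USeq code; the function names `to_score` and `from_score` are hypothetical.

```python
import math


def to_score(p):
    """Transform a p-value or FDR (0 < p <= 1) into the -10*log10 scale."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in the interval (0, 1]")
    return -10.0 * math.log10(p)


def from_score(score):
    """Convert a -10*log10 transformed score back into a p-value or FDR."""
    return 10.0 ** (-score / 10.0)


if __name__ == "__main__":
    # Thresholds quoted in the note: 5%, 1%, and 0.1%.
    for p in (0.05, 0.01, 0.001):
        print(p, round(to_score(p), 2))
```

A transformed score of 20 therefore corresponds to `from_score(20)`, a p-value or FDR of 1%.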
- Enriched Regions: Enriched Regions (or Reduced Regions) from the EnrichedRegionMaker are collapsed overlapping windows best used for downstream analysis. Several different sets of ERs can be created by specifying multiple thresholds or asking the EnrichedRegionMaker application to produce the top 100, 200, 400, etc. ERs.
For each ER set, a folder (name_threshold_#ERs) is created containing the following.
- xxx.xls: a spreadsheet report ordered by the best window binomial p-value score, with the most significant on top. Some or all of the following will be found:
- #Hyperlinks - if the Integrated Genome Browser is open, clicking these links will take you to that region in the genome browser.
- Chr - chromosome
- Start - start bp coordinate (0 based) for the entire enriched (or reduced) region
- Stop - stop bp coordinate (end excluded) for the entire enriched (or reduced) region
- #Windows - number of windows that were merged into the enriched region
- #T - if PointData were provided, the number of treatment reads in the enriched region
- #Unique_T - if PointData were provided, the number of unique treatment reads in the enriched region
- #C - if PointData were provided, the number of control reads in the enriched region
- #Unique_C - if PointData were provided, the number of unique control reads in the enriched region
- ER_BinPVal - if PointData were provided, the binomial p-value for the entire enriched region. These are not multiple test corrected.
- ER_Log2((#T+1)/(#C+1)) - if PointData were provided, the normalized log2 ratio for the enriched region.
- BSW_Start - if PointData were provided, start bp coordinate of the best sub window
- BSW_Stop - if PointData were provided, end bp coordinate of the best sub window
- BSW_BinPVal - if PointData were provided, the binomial p-value for the best sub window. These are not multiple test corrected.
- BW_Start - start bp coordinate for the best window from all the windows merged
- BW_Stop - end bp coordinate for the best window from all the windows merged
- BW_BinPVal - binomial p-value for the best window
- BW_QValueFDR - FDR estimation derived from the binomial p-value using Storey's q-value method
- BW_EmpFDR - FDR estimation based on generating a null distribution with your input data, only used with a static ChIP-seq analysis, skipped with a dynamic analysis
- BW_SkewPVal - Bonferroni corrected binomial p-value looking at whether there is a skew in the distribution of + vs - stranded reads. Some folks insist this can be used to filter out false positives; in our hands it hurts more than it helps and should be ignored
- BW_Log2((sumT+1)/(sumC+1)) - normalized log2 ratio, a measure of enrichment
- BW_SumT+ - number of reads in the best window for the treatment + strand
- BW_SumT- - number of reads in the best window for the treatment - strand
- BW_SumC+ - number of reads in the best window for the control + strand
- BW_SumC- - number of reads in the best window for the control - strand
- Info for entire run - genome version and total read counts
- xxx.egr: a multiple score EGR format file for viewing in IGB.
- xxx.gff: a GFF file
- XXXbpSubWinData: If treatment and control data were provided to the EnrichedRegionMaker, each ER is rescanned using a small window to identify the best peaks within the ER. These are included in the spreadsheet report and are also output as graph bar files, xxx.egr, and xxx.gff files.
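
The pseudocounted log2 ratio named in the ER_Log2((#T+1)/(#C+1)) and BW_Log2((sumT+1)/(sumC+1)) columns above can be sketched as follows. This is a minimal illustration that takes the column name literally: it assumes the treatment and control counts are already comparable and does not reproduce whatever read depth normalization USeq applies. The function name `log2_ratio` is hypothetical.

```python
import math


def log2_ratio(treatment_count, control_count):
    """Log2 ratio with a pseudocount of 1 added to each count.

    The pseudocount keeps the ratio defined when either count is zero.
    """
    return math.log2((treatment_count + 1) / (control_count + 1))


if __name__ == "__main__":
    # 7 treatment reads and 0 control reads give log2(8 / 1).
    print(log2_ratio(7, 0))
```

Positive values indicate enrichment in the treatment, negative values indicate reduction.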
- Defined Region Scan Seqs output: a spreadsheet containing the following:
- #Name - if the Integrated Genome Browser is open, clicking these links will take you to that defined region in the genome browser.
- Chr - chromosome
- Strand - strand of the defined region
- Start - start bp coordinate (0 based) for the defined region
- Stop - stop bp coordinate (end excluded) for the defined region
- pVal - uncorrected binomial p-value
- qValFDR - FDR estimation derived from the binomial p-value using Storey's q-value method
- eFDR - Empirical FDR estimation based on generating a null distribution with your control/input data, only used with a static analysis, skipped with a dynamic analysis
- pValSkew - Bonferroni corrected binomial p-value looking at whether there is a skew in the distribution of + vs - stranded reads. Some folks insist this can be used to filter out false positives; in our hands it hurts more than it helps and should be ignored
- pValDiffDist - Bonferroni corrected chi-square test of independence p-value looking for differences in the distribution of reads between the exons for a particular defined region, good for flagging alternative splicing
- TotalRegionBPs - total bps in defined region
- Log2((sumT+1)/(sumC+1)) - normalized log2 ratio
- tSumPlus - number of reads from the treatment + strand
- tSumMinus - number of reads from the treatment - strand
- tRPKM - treatment reads per kb of interrogated region per total million mapped reads
- cSumPlus - number of reads from the control + strand
- cSumMinus - number of reads from the control - strand
- cRPKM - control reads per kb of interrogated region per total million mapped reads
- SpliceJunctions - numerous columns looking at the difference in splice junction reads that map to a particular defined region. Only calculated when novoalign bed format alignments were provided to the DRSS.
- Info for entire run - genome version and total read counts
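
The Start and Stop columns above are described as 0 based with the end excluded. A minimal sketch of what that convention implies for region length and for conversion to 1 based, end included coordinates is shown below; the function names are hypothetical and are not part of USeq.

```python
def region_length(start, stop):
    """Length in bp of a region given as 0 based start, end excluded stop."""
    return stop - start


def to_one_based_inclusive(start, stop):
    """Convert (0 based start, excluded stop) to (1 based start, included stop)."""
    return start + 1, stop


if __name__ == "__main__":
    # A region reported as Start=100, Stop=150 covers 50 bases.
    print(region_length(100, 150))
    print(to_one_based_inclusive(100, 150))
```

With this convention no +1 correction is needed when computing a region's length from the spreadsheet columns.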
- Defined Region Differential Seq output: a spreadsheet containing the following:
- #DisplayName - if the Integrated Genome Browser is open, clicking these hyperlinks will take you to that defined region/gene in the genome browser.
- Name - present if found in the gene table.
- Chr - chromosome
- Strand - strand of the defined region
- Start - start bp coordinate (0 based) for the defined region
- Stop - stop bp coordinate (end excluded) for the defined region
- TotalBPs - total base pairs of the defined region's or gene's exons.
- PVal_ConditionA_ConditionB - uncorrected negative binomial -10*log10(pval) from DESeq for the paired ConditionA vs ConditionB comparison
- FDR_ConditionA_ConditionB - Benjamini-Hochberg -10*log10(FDR) estimation derived from the DESeq p-values
- VarCorLg2Rto_ConditionA_ConditionB - Log2 ratio estimation (pseudo median of ConditionA VarCor values - pseudo median of ConditionB VarCor values) for the degree of differential expression based on DESeq's variance corrected counts for each sample.
- SpliceChiPVal_ConditionA_ConditionB - Bonferroni corrected chi-square test of independence -10*log10(pval) looking for differences in the distribution of reads between the exons for a particular gene. Replica counts are merged for this analysis.
- SpliceMaxLg2Rto_ConditionA_ConditionB - The maximum observed log2Ratio for differential exonic splicing normalized for total gene counts.
- SpliceMaxExon_Treatment_Control - The zero based index for which exon displayed the maximum log2Ratio splicing difference.
- Counts_ConditionReplica - number of fragments from each replica that aligned to the defined region or gene exons.
- VarCorCounts_ConditionReplica - DESeq's variance corrected fragment counts for each replica that aligned to the defined region or gene exons.
- FPKM_ConditionReplica - FPKM corrected fragment counts for each replica that aligned to the defined region or gene exons. FPKM = #fragments / (TotalBPs/1000) / (#fragmentsInDataSet/1M)
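
The FPKM calculation described above (fragments per kilobase of exon per million fragments in the data set) can be sketched as follows. This is a minimal illustration with made-up numbers; the function and parameter names are hypothetical and are not taken from USeq.

```python
def fpkm(fragments, total_bps, fragments_in_dataset):
    """Fragments per kilobase of region per million fragments in the data set."""
    kilobases = total_bps / 1000.0
    millions_of_fragments = fragments_in_dataset / 1_000_000.0
    return fragments / kilobases / millions_of_fragments


if __name__ == "__main__":
    # 500 fragments on 2,000 bp of exons in a data set of 10 million fragments.
    print(fpkm(500, 2000, 10_000_000))
```

Dividing by region length and by data set size makes values comparable across genes of different sizes and across replicas of different sequencing depth.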