PAVIS is a tool for facilitating ChIP-seq data analysis and hypothesis generation. It offers two main functions: annotation and visualization. The annotation function reports the location of each query peak relative to genes and to other comparison peaks in a genome, along with the relative enrichment levels of peaks in different genomic regions. The visualization function offers a simultaneous view of multiple peaks in the context of genomic features and nearby comparison peaks. PAVIS takes as input the peak location data generated by a peak-calling tool (e.g., MACS). The default input format is the UCSC BED format. PAVIS also supports the GFF3 format and can use peak data files from most ChIP-seq data analysis tools (e.g., EpiCenter).

UPDATES

Last update on 02-05-2018: the genome visualization browser function has been suspended until the related browser issue is resolved. The server has been upgraded with the latest Python packages.

  1. 04-08-2016:
    • added support for annotating strand-specific peak data, i.e., peaks known to be associated with a specific chromosome strand. Note: to use strand-specific annotation, strand (+/-) information must be included in your peak data file, e.g., in the 6th field of the UCSC BED format or in the 7th field of the GFF3 format (thanks to feedback from Silvia Bottini).
    • added the genomic feature category of peak center location to the full annotation file.
    • added the option to output a Microsoft Excel file for the full annotation data on the CLEAR interface.
    • added the option to include additional fields from the input peak file in the full annotation file (thanks to feedback from Silvia Bottini).
    • fixed a bug in UTR annotation that occurred when a UTR spans multiple exons (thanks to feedback from Benjamin Cossins).
    • other changes to enhance PAVIS's robustness and efficiency.
  2. 08-10-2014: added support for the Drosophila melanogaster genome annotation R5/dm3 and the latest R6 from FlyBase.
  3. 08-11-2014: added gene ID field in output annotation files.
  4. 09-11-2014: added GRCh38/hg38, GRCh37/hg19, and GRCm38/mm10 Ensembl annotation data
  5. 04-08-2015: upgraded to a new release (v1.8) with the main updates listed below
    • added support for the Zebrafish (Zv9) genome from Ensembl
    • added the capability to query genes of different types (e.g. protein-coding, microRNA, etc.) for GRCh38/hg38, GRCm38/mm10 and Zebrafish Zv9
    • added support for the 43 plant genomes from Phytozome (10.1)
    • updated Web-UI to group genome annotations
  6. 04-10-2015: added the Gossypium arboreum (A2) Genome from Cotton Genome Project (CGP)
  7. 04-28-2015: added the Gossypium hirsutum (v1) Genome from Cotton Genome Project (CGP)
  8. 05-12-2015: changed TSS annotation mapping from the gene level to the transcript level (using the nearest TSS) for the GRCh38/hg38, GRCh37/hg19, GRCm38/mm10, and Zebrafish (Zv9) genome annotations
  9. 06-02-2015: added Drosophila melanogaster genome annotations R6.04 and R6.05, and changed TSS mapping from the gene level to the transcript level
  10. 07-21-2015: added two new Arabidopsis thaliana TAIR10 gene annotation datasets, one including all regular genes and transposable element genes, and the other having all regular genes, transposable element genes, and transposons
  11. 10-28-2015: added Dog and Cow Ensembl r82 annotation, and updated the annotation of Human, Mouse, Rat, C. elegans, Yeast, and Zebrafish to the latest Ensembl r82 annotation
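
The 04-08-2016 note above places the strand in the 6th field of a UCSC BED record and the 7th field of a GFF3 record. As a quick way to check a peak file before submitting it for strand-specific annotation, the sketch below reads the strand from one tab-delimited record. The function name `strand_of` and the sample records are hypothetical and for illustration only; this is not PAVIS code.

```python
def strand_of(line, file_format):
    """Return '+' or '-' from a tab-delimited BED or GFF3 record, else None."""
    fields = line.rstrip("\r\n").split("\t")
    # 1-based field 6 for BED and field 7 for GFF3 (0-based indexes 5 and 6).
    index = {"bed": 5, "gff3": 6}[file_format]
    if len(fields) <= index:
        return None  # record is too short to carry a strand field
    return fields[index] if fields[index] in ("+", "-") else None


# Hypothetical sample records, not taken from any real dataset.
bed_line = "chr1\t1000\t1500\tpeak_1\t0\t+"
gff3_line = "chr1\tMACS\tpeak\t1001\t1500\t.\t-\t.\tID=peak_1"

print(strand_of(bed_line, "bed"))
print(strand_of(gff3_line, "gff3"))
```

A record for which this returns None (for example, a 3-column BED line) carries no usable strand information.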

Click here for the INTUITIVE interface

Species/Genome Assembly/Gene Set:
Upstream Length:
Downstream Length:

The query peak file to be annotated: strand-specific peaks
File format: UCSC BED GFF3 EpiCenter Report Other text file
If other, please specify the delimiter and column numbers:
field delimiter: tab whitespace comma semicolon pipe
column number (1-based): chromosome:, start position:, end position:
strand (required for strand-specific) : , optional extra fields (e.g., 5-7, 10. 0=NONE):

The optional comparison peak files:

File format: UCSC BED GFF3 EpiCenter Report Other text file
If other, please specify the delimiter and column numbers:
field delimiter: tab whitespace comma semicolon pipe
column number: chromosome:, start position:, end position:
Search distance to query peaks:

Output file format: Tab-delimited text Excel format


USING PAVIS

Species/Genome Assembly/Gene Set
specifies the organism, the genome assembly version, and the particular gene set used for the analysis.
Upstream and downstream Length
Besides gene regions, PAVIS can also annotate peaks in regions upstream of the Transcription Start Site (TSS), as well as in regions downstream of the Transcription Termination Site (TTS). The default upstream length is 5000 bp, and the default downstream length is 1000 bp. If you do not want to limit the length of a region, set it to 0 (no limit); the region will then include almost all non-gene regions.
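
As a rough illustration of how the upstream and downstream lengths bound the flanking regions, the following Python sketch classifies a peak center against a single gene. The function name `classify_peak`, its arguments, the region labels, and the single-gene simplification are all assumptions made for this example; PAVIS's own annotation logic is not shown here and may differ.

```python
def classify_peak(peak_center, tss, tts, strand,
                  upstream_len=5000, downstream_len=1000):
    """Return 'upstream', 'gene', 'downstream', or 'other' for one gene.

    A length of 0 is treated as "no limit", mirroring the option above.
    """
    # On the minus strand the gene runs toward smaller coordinates,
    # so offsets are flipped to keep "upstream" negative relative to the TSS.
    sign = 1 if strand == "+" else -1
    rel_tss = (peak_center - tss) * sign  # < 0 means before the TSS
    rel_tts = (peak_center - tts) * sign  # > 0 means past the TTS

    if rel_tss < 0:
        within = upstream_len == 0 or -rel_tss <= upstream_len
        return "upstream" if within else "other"
    if rel_tts > 0:
        within = downstream_len == 0 or rel_tts <= downstream_len
        return "downstream" if within else "other"
    return "gene"


# Hypothetical plus-strand gene with TSS at 10000 and TTS at 20000.
print(classify_peak(7000, 10000, 20000, "+"))
print(classify_peak(4000, 10000, 20000, "+"))
print(classify_peak(4000, 10000, 20000, "+", upstream_len=0))
```

With the default lengths, a peak 3000 bp before the TSS falls in the upstream region and one 6000 bp before it does not; setting the upstream length to 0 brings the second peak into the upstream region as well.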
Peak data file:
each input peak data file must be a plain or gzipped text file. A gzipped file must have the .gz filename extension, and is preferred because it reduces the time needed to transfer large data over the Internet. The suggested file size limit for each peak data file is 10 MB.
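
One way to prepare a peak file for upload is to gzip it and compare its size with the suggested limit, as sketched below with the Python standard library. The helper names and the reading of "10 MB" as 10 * 1024 * 1024 bytes are assumptions for this example.

```python
import gzip
import os
import shutil

# Assumption: the suggested 10 MB limit is interpreted as 10 MiB.
SUGGESTED_LIMIT_BYTES = 10 * 1024 * 1024


def gzip_peak_file(path):
    """Write a gzipped copy of `path` with the .gz extension; return its path."""
    gz_path = path + ".gz"
    with open(path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return gz_path


def within_suggested_limit(path, limit=SUGGESTED_LIMIT_BYTES):
    """Return True if the file at `path` is no larger than `limit` bytes."""
    return os.path.getsize(path) <= limit
```

For example, `within_suggested_limit(gzip_peak_file("peaks.bed"))` would compress a hypothetical `peaks.bed` to `peaks.bed.gz` and report whether the compressed file fits the suggested size.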
File format:
the default input data file format is the UCSC BED format. If your file is not in the UCSC BED, GFF3, or EpiCenter format, you can specify the field delimiter and the column numbers of the three required fields: chromosome, start position, and end position. Column numbers are 1-based.
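
To make the 1-based column convention concrete, the sketch below pulls the three required fields out of a delimited text file using the same delimiter choices offered on the form. The function `extract_peaks`, the `DELIMITERS` table, the skipping of `#` comment lines, and the sample rows are assumptions for this example rather than part of PAVIS; no coordinate conversion is attempted.

```python
# Delimiter names as listed on the form; None makes str.split() use
# any run of whitespace.
DELIMITERS = {
    "tab": "\t",
    "whitespace": None,
    "comma": ",",
    "semicolon": ";",
    "pipe": "|",
}


def extract_peaks(lines, delimiter, chrom_col, start_col, end_col):
    """Return (chromosome, start, end) tuples using 1-based column numbers."""
    sep = DELIMITERS[delimiter]
    peaks = []
    for line in lines:
        line = line.rstrip("\r\n")
        # Assumption: blank lines and lines starting with '#' are skipped.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split(sep)
        # Subtract 1 to turn a 1-based column number into a 0-based index.
        peaks.append((fields[chrom_col - 1],
                      int(fields[start_col - 1]),
                      int(fields[end_col - 1])))
    return peaks


# Hypothetical comma-delimited rows: name, chromosome, start, end.
rows = ["peak1,chr2,100,250", "peak2,chrX,4000,4600"]
print(extract_peaks(rows, "comma", chrom_col=2, start_col=3, end_col=4))
```

Here the chromosome sits in column 2, so the form entry would be 2, not 1 as a 0-based index would suggest.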

CITATION

W. Huang, R. Loganantharaj, B. Schroeder, D. Fargo, and L. Li. PAVIS: a tool for Peak Annotation and Visualization. Bioinformatics, 2013, doi:10.1093/bioinformatics/btt520.

Contact weichun dot huang at nih dot gov with questions, comments, or other feedback.