PAVIS is a tool for facilitating ChIP-seq data analysis and hypothesis generation. It offers two main functions: annotation and visualization. The annotation function reports where query peaks lie relative to genes and to other comparison peaks in a genome, and reports the relative enrichment levels of peaks in different genomic regions. The visualization function offers a simultaneous view of multiple peaks in the context of genomic features and nearby comparison peaks.
PAVIS takes as input the peak location data generated by a peak-calling tool (e.g., MACS). The default format for input peak data files is the UCSC BED format.
PAVIS also supports the GFF3 format, and can use peak data files from most ChIP-seq data analysis tools (e.g., EpiCenter).
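As an illustration of how the two supported formats differ, the following minimal sketch reads one BED line and one GFF3 line into a common record. It is not PAVIS code: the Peak type and the two function names are hypothetical, and the sketch assumes tab-separated fields. In BED, the strand is the 6th field and the start coordinate is 0-based; in GFF3, the strand is the 7th field and coordinates are 1-based and inclusive.

```python
from typing import NamedTuple


class Peak(NamedTuple):
    """Hypothetical common record for a peak (not a PAVIS data structure)."""
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # exclusive
    strand: str  # "+", "-", or "." when unknown


def parse_bed_line(line):
    """Parse one tab-separated UCSC BED line; strand is the optional 6th field."""
    fields = line.rstrip("\n").split("\t")
    strand = fields[5] if len(fields) >= 6 else "."
    return Peak(fields[0], int(fields[1]), int(fields[2]), strand)


def parse_gff3_line(line):
    """Parse one tab-separated GFF3 line; strand is the 7th field."""
    fields = line.rstrip("\n").split("\t")
    # GFF3 start is 1-based and inclusive, so subtract 1 to match BED.
    return Peak(fields[0], int(fields[3]) - 1, int(fields[4]), fields[6])


bed_peak = parse_bed_line("chr1\t100\t200\tpeak1\t50\t-\n")
gff_peak = parse_gff3_line("chr1\tMACS\tpeak\t101\t200\t50\t-\t.\tID=peak1\n")
```

Both calls describe the same interval, so bed_peak and gff_peak compare equal once the GFF3 start has been shifted to the 0-based convention.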
UPDATES
Last updated on 02-05-2018: the genome visualization browser function has been suspended until the related browser issue is resolved. The server has been upgraded with the latest Python packages.
- 04-08-2016:
- added support for annotating strand-specific peak data, i.e., peaks known to be associated with a specific chromosome strand. Note: to use strand-specific annotation, strand (+/-) information must be included in your peak data file, e.g., in the 6th field of the UCSC BED format or the 7th field of the GFF3 format (thanks to the feedback from Silvia Bottini).
- added the genomic feature category of peak center location to the full annotation file.
- added the option to output a Microsoft Excel file for the full annotation data on the CLEAR interface.
- added the option to include additional fields from the input peak file in the full annotation file (thanks to the feedback from Silvia Bottini).
- fixed a bug in UTR annotation that occurred when a UTR includes multiple exons (thanks to the feedback from Benjamin Cossins).
- other changes to enhance PAVIS's robustness and efficiency.
- 08-10-2014: added support for the Drosophila melanogaster genome annotation R5/dm3 and the latest R6 from FlyBase.
- 08-11-2014: added gene ID field in output annotation files.
- 09-11-2014: added GRCh38/hg38, GRCh37/hg19, and GRCm38/mm10 Ensembl annotation data
- 04-08-2015: upgraded to a new release (v1.8) with the main updates listed below
- added support for the Zebrafish (Zv9) genome from Ensembl
- added the capability to query genes of different types (e.g., protein-coding, microRNA, etc.) for GRCh38/hg38, GRCm38/mm10 and Zebrafish Zv9
- added support for the 43 plant genomes from Phytozome (10.1)
- updated Web-UI to group genome annotations
- 04-10-2015: added the Gossypium arboreum (A2) Genome from Cotton Genome Project (CGP)
- 04-28-2015: added the Gossypium hirsutum (v1) Genome from Cotton Genome Project (CGP)
- 05-12-2015: changed TSS annotation mapping from the gene level to the transcript level (using the nearest TSS) for the GRCh38/hg38, GRCh37/hg19, GRCm38/mm10, and Zebrafish (Zv9) genome annotations
- 06-02-2015: added Drosophila melanogaster genome annotations R6.04 and R6.05, and changed TSS mapping from the gene level to the transcript level
- 07-21-2015: added two new Arabidopsis thaliana TAIR10 gene annotation datasets, one including all regular genes and transposable element genes, and the other having all regular genes, transposable element genes, and transposons
- 10-28-2015: added Dog and Cow Ensembl r82 annotation, and updated the annotation of Human, Mouse, Rat, C. elegans, Yeast, and Zebrafish to the latest Ensembl r82 annotation
USING PAVIS
- Species/Genome Assembly/Gene Set
- specifies the organism/genome assembly version/a particular gene set for the analysis
- Upstream and downstream Length
- Besides gene regions, PAVIS can also annotate peaks in regions upstream of the Transcription Start Site (TSS), as well as in regions downstream of the Transcription Termination Site (TTS). The default upstream length is 5000 bp and the default downstream length is 1000 bp. If you do not want to limit a region's length, set it to 0 (no limit), and the region will include almost all non-gene regions.
- Peak data file:
- each input peak data file must be a plain or gzipped text file. A gzipped file must have the .gz filename extension, and is preferred because it reduces the time needed to transfer large data over the Internet. The suggested file size limit for each peak data file is 10 MB.
- File format:
- the default input data file format is the UCSC BED format. If your file is not in the UCSC BED, GFF3, or EpiCenter format, you can specify the field delimiter and the column numbers of the three required fields: chromosome, start position, and end position. Column numbers are 1-based.
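The settings above can be illustrated with a short sketch. It is not PAVIS's implementation: the function names are hypothetical, and the classification rule (labeling a peak by its center position against a single gene) is an assumption made for the example. It shows how a .gz extension could select gzip reading, how 1-based column numbers map onto a delimited line, and how upstream and downstream lengths (with 0 meaning no limit) could be applied on either strand.

```python
import gzip


def open_peak_file(path):
    """Open a plain or gzipped text file; gzip is chosen by the .gz extension."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def parse_peak_line(line, delimiter="\t", chrom_col=1, start_col=2, end_col=3):
    """Extract (chrom, start, end) using 1-based column numbers."""
    fields = line.rstrip("\n").split(delimiter)
    return (fields[chrom_col - 1],
            int(fields[start_col - 1]),
            int(fields[end_col - 1]))


def classify_center(center, gene_start, gene_end, strand,
                    upstream=5000, downstream=1000):
    """Label a peak center relative to one gene (assumed rule, for illustration).

    A length of 0 is treated as "no limit".
    """
    if gene_start <= center <= gene_end:
        return "gene"
    before = gene_start - center  # positive when left of the gene
    after = center - gene_end     # positive when right of the gene
    # On the minus strand the TSS is at gene_end, so the sides swap.
    if strand == "+":
        up_dist, down_dist = before, after
    else:
        up_dist, down_dist = after, before
    if up_dist > 0 and (upstream == 0 or up_dist <= upstream):
        return "upstream"
    if down_dist > 0 and (downstream == 0 or down_dist <= downstream):
        return "downstream"
    return "intergenic"


# A comma-delimited line with chromosome, start, and end in columns 2, 3, and 4.
chrom, start, end = parse_peak_line("peak1,chr2,9000,9200\n", delimiter=",",
                                    chrom_col=2, start_col=3, end_col=4)
label = classify_center((start + end) // 2, 10000, 20000, "+")
```

With the default lengths, the peak center at 9100 lies 900 bp before a plus-strand gene starting at 10000, so it falls within the 5000 bp upstream region.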
CITATION
W. Huang, R. Loganantharaj, B. Schroeder, D. Fargo, and L. Li. PAVIS: a tool for Peak Annotation and Visualization.
Bioinformatics, 2013, doi:10.1093/bioinformatics/btt520.