DOMINE: Database of Protein Domain Interactions

Help Topics

A. Overview - general information on DOMINE
B. Getting started - how to use the database, searching DOMINE
C. Terms of Use
D. Data Download
E. Citing DOMINE
F. Frequently Asked Questions (FAQ)

A. Overview

DOMINE is a database of known and predicted protein domain (domain-domain) interactions. It contains interactions inferred from PDB entries, and those that are predicted by 13 different computational approaches using Pfam domain definitions. The domain-domain interaction (DDI) data in the DOMINE database was gathered from 15 different sources altogether.

Data source or method	Number of interactions	Comments
iPfam	4,030	iPfam is a database of DDIs that are observed in PDB entries. Interaction data, dated 17 Feb 2007, was used.
3did	6,066	3did is a collection of DDIs in proteins for which high-resolution three-dimensional structures are known. Interaction data, downloaded in Sep 2010 was used.
ME	2,391	ME refers to Lee et al.'s integrated approach (2006) for the prediction of DDIs. This method used a Bayesian approach to integrate domain interactions predicted using a maximum likelihood estimation (MLE) approach on yeast, worm, fruitfly and human protein interaction networks with the gene ontology and domain fusion data to arrive at a total of 2,391 high-confidence predictions.
RCDP	960	Jothi et al.'s Relative Co-evolution of Domain Pairs (RCDP) approach uses sequence co-evolution to predict the domain pair that is most likely to mediate a given protein-protein interaction. Given a protein-protein interaction, RCDP computes the degree of sequence co-evolution between all pairs of domains between the two proteins, and predicts the domain pair with the highest degree of co-evolution to be the mediating domain pair. The set of 960 DDIs predicted from 1,180 yeast protein-protein interactions was used.
P-value	596	"P-value" refers to Nye et al.'s statistical approach that assigns p-values to pairs of domain superfamilies, measuring the strength of evidence that this pair of superfamilies form contacts within a set of protein interactions. A set of p-values is calculated for SCOP superfamily pairs based on a pooled data set of interactions from yeast. These p-values were then used to predict which domains come into contact in an interacting protein pair. This scheme was applied on protein complexes in the Protein Quaternary Structure (PQS) database to predict domain-domain contacts for 705 interacting protein pairs. Since interactions were predicted between SCOP domains, SGD was used to map SCOP domains to Pfam domains, for every yeast protein used, and convert 705 interactions between SCOP domain families to 596 DDIs among Pfam domain families.
Interdom	2,768	Domain-domain interactions inferred using domain fusion hypothesis as inferred by Ng et al. Interaction data from version 1.1, released on 15 Feb 2003, was used.
DPEA	1,812	Riley et al.'s Domain Pair Exclusion Analysis (DPEA) is a statistical approach to infer DDIs from the incomplete sets of protein-protein interactions from multiple organisms. It employs an expectation maximization algorithm to obtain a maximum likelihood estimate or the probability of interaction of each potentially interacting domain pair. For each potential domain pair, a change in likelihood, expressed as a log odds score, is computed by excluding this domain pair from being considered as a potentially interacting domain pair. The set of 3,005 domain pairs with log odds score ≥ 3.0 were designated as high-confidence interactions. Since only interactions between Pfam-A domains are being considered, 1,193 interactions in which at least one of the interacting partner is a Pfam-B domain were discarded, reducing the number of interactions to 1,812.
PE	2,588	Guimaraes et al.'s Linear Programming approach is an optimization approach, which relies on the parsimonious explanation (PE) that "DDI partners are predicted by identifying the minimal weighted set of domain pairs that can justify a given protein-protein interaction network". Given a protein-protein interaction network, the PE approach computes a LP-score, in the range (0,1), for every domain pair that could possibly justify interaction between two proteins. False positives in the protein-protein interaction network are handled using a probabilistic construction (p-scores). Domain pairs with an LP-score above a certain threshold are considered as interacting. A set of 3,499 domain pairs with LP-score ≥ 0.5 and 0.0 ≤ p-score ≤ 0.1 was used. Since only interactions between Pfam-A domains are considered, 911 interactions in which at least one of the interacting partner is a Pfam-B domain were discarded, reducing the number of interactions to 2,588.
GPE	1,563	GPE builds upon the PE approach by unifying domains that always occur together in a protein as a singular 'supra -domain', and uses the linear-programming framework as used by PE. GPE was applied on the redefined Riley et al. dataset (Guimaraes dataset), and the set of interactions only between Pfam-A domains with LP-score ≥ 0.60 and pw-score ≥ 0.01 was used. Supra-domains were expanded back to individual Pfam-A domains to obtain 1,563 interactions.
DIPD	2,157	DIPD constructs feature vectors for each protein pair within the sets of PPIs (Riley et al. dataset) and non-PPIs, and uses a discriminative classifier to identify the minimum set of domain pairs/triplets that can discriminate PPIs and non-PPIs. Each selected feature (domain pair) is a putative DDI. The sets of predictions on input datasets from Jothi et al., Riley et al., and Guimaraes and Przytycka were used to predict a combined 2,157 interactions.
RDFF	2,475	Chen and Liu's Random Decision Forest Framework (RDFF) approach explores all possible DDIs and predicts protein-protein interactions based on protein domains. The decision tree-based model is used to infer domain–domain interactions for each correctly predicted protein-protein interaction pair. The set of 2,475 DDIs between Pfam-A domains was used.
K-GIDDI	386	K-GIDDI uses gene ontology information to construct an initial DDI network using the top s% of DDIs inferred from cross-species protein interaction networks, and then expands the DDI network by predicting additional interactions using a graph theoretical approach based on a parameter b. The latter procedure allows for prediction of interactions that can otherwise not predictable by methods that rely solely on protein interaction data. The set of 386 interactions predicted using s=10 and b=50 was used.
Insite	2,408	Insite uses a naive Bayes model to build upon features in DPEA. Its novel formulation of evidence models for protein interactions and DDIs helps address noise (false-positives) generated by high-throughput assays. The set of 2,408 interactions with score ≥ 1 was used.
DomainGA	459	DomainGA is a genetic algorithm-type machine learning approach based on multi-parameter optimization. It uses the available protein interaction data to compute a score for every domain pair, which are then used to predict protein interactions. Yeast protein interaction dataset was used to identify 867 putative DDIs between domains defined based on information derived from the Interpro database. The set of 459 interactions only between Pfam domains was used.
DIMA	8,.012	DIMA contains DDI predictions based on phylogenetic profiling. A set of 8,012 interactions reported in Pagel et al.'s J Mol Biol (2004) paper was used.

DOMINE contains a total of 26,219 DDIs out of which 6,634 (gold-standard positives) are inferred from PDB entries (the union of the sets of interactions from iPfam and 3did), and 21,620 are predicted by at least one out of the 13 computational approaches.

The confidence levels of predictions by each computational approach was computed based on how well an approach's predictions are also confirmed by other approaches. For every pair of methods, Jaccard index values were computed. Then, these were used to compute prediction overlap index(POI) for each method. The figure below shows the schematic overview of the DOMINE database construction including the new classification scheme to assign confidence levels to each DDI. The POI scores for 12 methods are shown as a table within this figure. A method whose predictions do not overlap with those of any other methods will receive a POI of one, whereas a method whose predictions overlap completely with those of at least one other method will receive a POI of not more than 0.5. Thus, a high POI for a method indicates this methods predictions are mostly either outliers (false-positives) or novel putative DDIs. The confidence score for each predicted DDI is computed by summing up the POIs of methods predicting. Based on a predicted DDI's confidence score and gene ontology information on the interacting domains, it is either classified as either a high-confidence, medium-confidence, or a low-confidence prediction (HCP, MCP, and LCP, respectively). We made a decision to classify predictions by the ME approach HCPs since its predictions are based on three sufficiently different types of information (protein interaction data, gene ontology and gene fusion), and over 50% of them are known to be true.

This new scheme resulted in 2,989, 2,537, and 16094 DDIs classified as HCP, MCP, and LCP, respectively. The following figure shows the percentages of predictions by each method classified as HCP, MCP, or LCP

B. Getting Started

How to use the database?

Domain-domain interaction information contained in the DOMINE database can be accessed by clicking either the "Browse" or the "Search" option on the menu.

Users who wish to browse the database have the option to browse a list of Pfam domain IDs based on their gene ontology (GO) classification.

Users who wish to search the database may perform their search using either a keyword (e.g., Insulin) or Pfam domain ID (e.g., Insulin) or accession (e.g., PF00049 or 00049). Users may also search the database using Interpro (e.g., IPR016179 or 016179 or 16179) and GO (e.g., GO:0005179 or 0005179 or 5179) identifiers. If you wish to search using multiple identifiers, be sure to separate those using spaces. For convenience, a search toolbar is provided right within the menu bar (top-right corner of the Web Site).

What do the results mean?

Clicking on a domain name (Pfam ID) from anywhere on the Web Site displays interaction information, if available, for that domain. For each interacting domain, the list of domains that it is known/predicted to interact with are displayed along with external links to the Pfam, Interpro, and GO databases. For each predicted interaction, information on whether DOMINE considers it to be a high-, medium-, or low-confidence prediction is provided in addition to the source(s) of evidence.

Notes

	Interaction observed in PDB crystal structure(s).
	Predicted interaction with either confidence score ≥ 2, or confidence score ≥ 1 and the domains share a ontology term (part of the same biological process).
	Predicted interaction with either confidence score ≥ 1, or the domains share a gene ontology term (part of the same biological process).
	Predicted interaction that is not classified as HCP or MCP.

Source

	Interaction observed in PDB crystal structure(s), as inferred by iPfam
	Interaction observed in PDB crystal structure(s), as inferred by 3did
	Interaction predicted by Lee et al.'s integrated approach
	Interaction predicted by Jothi et al.'s using sequence co-evolution
	Interaction predicted by Nye et al.'s statistical approach
	Interaction predicted using domain fusion hypothesis, as inferred by Interdom
	Interaction predicted by Riley et al.'s using domain pair exclusion analysis
	Interaction predicted by Guimaraes et al.'s based on parsimonious explanation argument
	Interaction predicted by Guimaraes and Przytycka's generalized parsimonious explanation argument
	Interaction predicted by Zhao et al.'s discriminative classifier approach
	Interaction predicted by Chen and Liu's random forest algorithm
	Interaction predicted by Liu et al.'s graph-theoretical approach based on gene ontology data
	Interaction predicted by Wang et al.'s naive Bayes model
	Interaction predicted by Singhal and Resat's genetic algorithm-type machine learning approach
	Interaction predicted by Pagel et al.'s phylogenetic profiling approach

C. Terms of Use

The following terms and conditions apply to all those who use the DOMINE database or this Web Site. By visiting, accessing, browsing and/or using this web site, you acknowledge that you have read, understood, and agree, to be bound by these terms and conditions. If you do not agree to these terms, do not use this web site.
It is forbidden to redistribute any of the Content of this Web Site in any manner or create a database in electronic form or manually by downloading and storing any such content without an express written consent from the DOMINE team. Moreover, it is forbidden to sell any information derived from this Web Site.
You may not make copies or mirrors of the DOMINE database without a prior authorization from the DOMINE team.
The DOMINE team do not assume any legal liability or responsibility for the quality, accuracy, truth, suitability, completeness, or usefulness of any data, material or other information contained on this Web Site.
The DOMINE team reserves the right to change these terms and conditions by posting changes on this page of the Web Site and you will be deemed to have accepted such changes if you use this Web Site after the DOMINE team has published the amended terms and conditions on this page of the Web Site.

D. Download Data

Click here to download data from the DOMINE database.

E. Citing DOMINE

Yellaboina S, Tasneem A, Zaykin DV, Raghavachari B, and Jothi R. DOMINE: A comprehensive collection of known and predicted domain-domain interactions, Nucleic Acids Research, Vol 39 (Database Issue), D730-735, 2011.

Raghavachari B, Tasneem A, Przytycka, T, and Jothi R. DOMINE: A database of protein domain interactions, Nucleic Acids Research, Vol 36 (Database Issue), D656-661, 2008.

F. Frequently Asked Questions

1. What is DOMINE?

DOMINE is a database of protein domain (domain-domain) interactions inferred from PDB entries, and those that are predicted by 8 different computational approaches using Pfam domain definitions.

2. What is a domain?

Domain is a structural or functional subunit of a protein. The concept of the domain was first proposed in 1973 by Wetlaufer, who defined domains as stable units of protein structure that could fold autonomously. We refer the reader to Wikipedia to learn more about domains.

3. Which domain definition is being use by DOMINE?

DOMINE uses domains as defined by the Pfam.

4. What is the difference between a protein-protein interaction and a domain-domain interaction?

Interaction between two proteins are referred to as a protein-protein interaction. Domain is a structural or functional subunit of a protein, and Interaction between two proteins often involves binding between pair(s) of their constituent domains. Interaction between two domains of the same protein are referred to as intra-chain domain-domain interaction, and interaction between two domains from different proteins are referred to as inter-chain domain-domain interaction.

5. DOMINE does not contain interaction information (or partners) for YYYY domain!

None of the 10 data sources we used inferred YYYY domain to be interacting with any other domain. It is possible that YYYY domain may actually be interacting with other domains. If you know of such an interaction, and have supporting information, please write to us. We would like to be alerted on new domain-domain interactions, and would be happy to add them to the database after examination.

6. Can I search my protein against DOMINE?

No. Unfortunately, the current version of DOMINE does not have that capability. However, the domain composition of the protein of interest can be obtained using Pfam search, and these domains can be searched against DOMINE.