In summary, ComPPI contains 386 861 major and 390 598 minor localizations for 132 935 proteins (14 982 proteins have no localization). On average, a protein has 2.93 major localizations. The localization and interaction scoring algorithm works with major localizations (detailed below).
The final distribution of the proteins in the major localizations:
The evidence for a protein's subcellular localization can be experimentally verified, predicted by computational methods, or of unknown origin.
We standardized the subcellular localizations based on Gene Ontology (GO) cellular component terms and arranged them by manual curation into a non-redundant localization tree with more than 1800 individual GO terms. Our localization tree contains only those GO cellular component terms that are contained in the ComPPI database. For more details visit the whole localization tree or download it in SVG format.
Localization data differ in resolution across sources, and the score-based filtering method needs a manageable number of localizations. For both reasons we separated the subcellular localization data into major and minor localizations.
Based on the localization tree of the GO terms, we arranged the minor localizations into 6 major cellular components (cytosol, nucleus, mitochondrion, secretory-pathway, membrane, extracellular). For more details visit the list of minor localizations belonging to major cellular components.
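The grouping of minor localizations under major cellular components can be pictured as a walk up the localization tree. The following is a minimal sketch only: the six major component names come from the text above, but the `PARENT` excerpt, the placement of each term, and the helper names `major_of` and `branch_sizes` are assumptions made for illustration and are not taken from the ComPPI tree.

```python
from collections import Counter

# The six major cellular components named in the text.
MAJOR = {
    "cytosol", "nucleus", "mitochondrion",
    "secretory-pathway", "membrane", "extracellular",
}

# Hypothetical excerpt of a localization tree as child -> parent links.
# The placement of each term is assumed, not copied from ComPPI.
PARENT = {
    "nucleolus": "nucleus",
    "nuclear speck": "nucleus",
    "mitochondrial matrix": "mitochondrion",
    "mitochondrial inner membrane": "mitochondrion",
    "Golgi apparatus": "secretory-pathway",
    "Golgi cisterna": "Golgi apparatus",
}


def major_of(term):
    """Walk up the tree until a major component is reached.

    Returns None when the term is not attached to any major component.
    """
    seen = set()  # guards against accidental cycles in the toy data
    while term is not None and term not in seen:
        if term in MAJOR:
            return term
        seen.add(term)
        term = PARENT.get(term)
    return None


def branch_sizes(minor_terms):
    """Count how many minor localizations fall under each major component."""
    majors = (major_of(t) for t in minor_terms)
    return Counter(m for m in majors if m is not None)
```

Calling `branch_sizes(PARENT)` on this toy excerpt counts two minor localizations each for nucleus, mitochondrion, and secretory-pathway; the same kind of count over the full tree would give the branch distribution.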
The distribution of the branches in the tree shows how many minor localizations (GO terms) were assigned to a certain major localization:
Because the subcellular localization data come from diverse origins, we grouped the evidence types into three classes: experimental (1), predicted (2), and unknown (0). To learn more about our evidence types click here.
The nomenclature of the evidence is redundant, so we mapped the synonyms of each evidence to a single common name. To learn more about our evidence synonyms click here.
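As an illustration of the synonym mapping and the numeric evidence classes, here is a minimal sketch. The class codes 0, 1, and 2 come from the text; the `CANONICAL` table, the choice of example synonyms, the function name `normalize`, and the decision to return unknown (0) for unmapped names are all assumptions of this sketch, not ComPPI's actual tables or behavior.

```python
from enum import IntEnum


class EvidenceType(IntEnum):
    UNKNOWN = 0
    EXPERIMENTAL = 1
    PREDICTED = 2


# Hypothetical synonym table: several spellings map to one common name.
CANONICAL = {
    "ida": "IDA",
    "inferred from direct assay": "IDA",
    "ipi": "IPI",
    "inferred from physical interaction": "IPI",
}

# Class per common name. IPI is predicted, as stated for ComPPI;
# IDA is experimental, following GO's own grouping.
TYPE_OF = {
    "IDA": EvidenceType.EXPERIMENTAL,
    "IPI": EvidenceType.PREDICTED,
}


def normalize(raw):
    """Return (common_name, evidence_type) for a raw evidence label."""
    key = " ".join(raw.split()).lower()  # collapse whitespace, ignore case
    name = CANONICAL.get(key)
    if name is None:
        # Assumption of this sketch: unmapped labels count as unknown.
        return raw.strip(), EvidenceType.UNKNOWN
    return name, TYPE_OF[name]
```

Mapping first and classifying second keeps the class table small, since it only needs one entry per common name rather than one per spelling.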
The Gene Ontology subcellular localization data is a major data source for ComPPI. GO classifies its cellular component annotations by evidence level, which allows us to work with a more detailed subcellular localization dataset. The one exception in ComPPI's classification compared to GO is Inferred from Physical Interaction (IPI), which ComPPI treats as predicted evidence because it originates from interaction detection experiments.
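The IPI exception can be expressed as an override applied before GO's own grouping. This is a sketch under stated assumptions: the set of experimental codes follows GO's evidence code documentation, the override reflects the exception described above, and the function name `classify_go_code` is hypothetical. Codes outside GO's experimental group are deliberately left unclassified, because the text does not say how ComPPI assigns them.

```python
# GO's experimental evidence codes, per the GO evidence code documentation.
GO_EXPERIMENTAL = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}

# Exception described in the text: IPI is treated as predicted.
OVERRIDES = {"IPI": "predicted"}


def classify_go_code(code):
    """Classify a GO evidence code, applying the IPI override first.

    Returns None for codes this sketch does not cover.
    """
    code = code.strip().upper()
    if code in OVERRIDES:
        return OVERRIDES[code]
    if code in GO_EXPERIMENTAL:
        return "experimental"
    return None
```

Checking the override table before the GO group means that a further exception would only need one more dictionary entry.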
The structure of the GO evidence levels and the resolved abbreviations in the source data: