HPO Annotation Guide / File Formats
We represent clinical annotations using a simple tab-delimited format that was designed to be as similar as possible to the format used by the Gene Ontology consortium. This document describes the process of assigning HPO terms to disease entities such as Mendelian disorders. Each line in the annotation file represents a link between a disease entity (such as Noonan syndrome) and one of the clinical features characteristically seen in that disease. Each of the features of a disease is to be listed on a separate line.
Note that this file (and format) is intended to be used for the annotation of disease entities (such as "Noonan syndrome") and not individuals (such as a person with Noonan syndrome). The Human Phenotype Ontology consortium is currently developing software for use in clinical research projects where clinical findings of individuals with hereditary diseases are to be described. Interested parties are requested to contact us for further information.
File format
The flat file format comprises 12 tab-delimited fields.
1) DB
This field refers to the database from which the identifier in DB_Object_ID (column 2) is drawn. At present, only annotations from the OMIM database are available, but we are planning to add annotations for chromosomal disorders.
2) DB_Object_ID
This is the identifier of the annotated disease within the database indicated in column 1. Note that for OMIM identifiers, the symbol preceding the MIM number (*, #, + or %) is omitted.
3) DB_Name
This is the name of the disease associated with the DB_Object_ID in the database. Only the accepted name should be used; synonyms should not be listed here.
4) Qualifier
This optional field can be used to qualify the annotation shown in field 5. Possible values of this field are "NOT", "SECONDARY", "MILD", "MODERATE", "SEVERE" and "FREQUENCY". If multiple qualifiers are shown, they are separated by a comma (","). The meaning of the individual qualifiers is as follows:
- NOT: The disorder being annotated is NOT characterized by the feature associated with HPO_ID in column 5.
- SECONDARY: The feature is a secondary consequence of a primary pathophysiological event in another organ. For instance, although Jaundice is observed in the skin or sclerae, it is secondary to abnormalities in other organs. To indicate that jaundice seen in a certain disease is secondary to Cholestatic liver disease, we would annotate SECONDARY(HP:0002611), where HP:0002611 is the HPO identifier of the term Cholestatic liver disease.
- MILD, MODERATE, SEVERE: In general, it is preferred to annotate with a term describing the underlying abnormality, such as Hearing loss, and to use qualifiers such as MILD, MODERATE, SEVERE if thought necessary to describe the severity of the clinical involvement. This is preferred because of the assumption that mild and moderate manifestations of specific medical abnormalities result from mild or moderate disturbances of the same cellular and physiological networks and also because distinctions that are often heard in clinical practice such as "mild-to-moderate" or grade "II-III/VI" often seem more or less arbitrary. It is an error to use more than one of the qualifiers MILD, MODERATE, SEVERE in one annotation.
- FREQUENCY: This modifier can be used to give the exact numbers of affected persons manifesting a given trait. For instance, if a study showed that 5 of 8 patients display the feature indicated by HPO_ID in column 5, we could use the modifier FREQUENCY(5/8).
- OBLIGATE, COMMON, FREQUENT, OCCASIONAL, UNCOMMON: If exact numbers are unknown or unavailable, it is possible to use these modifiers. As a general guide, obligate features are those that are found in 95% to 100% of affected persons, common features are found in at least 50%, frequent features are found in 25% to 50%, occasional features are found in 10% to 25%, and uncommon features are found in less than 10% of affected persons but are clearly related to the disease.
Some examples of modifier entries are "MILD,FREQUENCY(18/23)" and "SECONDARY(HP:0004321),UNCOMMON".
It is expected that software does not depend on the qualifiers being listed in a certain order.
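The qualifier rules above can be sketched in code. The following is a minimal, non-authoritative sketch in Python; the function name `parse_qualifiers` and the keys of the returned dictionary are assumptions made for illustration and are not part of the file format. It reads the qualifiers in whatever order they appear and rejects a field that combines more than one of MILD, MODERATE and SEVERE.

```python
import re

SEVERITY = {"MILD", "MODERATE", "SEVERE"}
FREQUENCY_CATEGORIES = {"OBLIGATE", "COMMON", "FREQUENT", "OCCASIONAL", "UNCOMMON"}

# A qualifier is either NAME or NAME(argument), e.g. MILD or FREQUENCY(5/8).
_QUALIFIER_RE = re.compile(r"([A-Z]+)(?:\(([^()]*)\))?")


def parse_qualifiers(field):
    """Parse a comma-separated qualifier field (hypothetical helper).

    The result does not depend on the order of the qualifiers.
    """
    result = {
        "not": False,
        "secondary_to": None,        # HPO ID given as SECONDARY(HP:...)
        "severity": None,            # MILD, MODERATE or SEVERE
        "frequency": None,           # (affected, examined) from FREQUENCY(n/m)
        "frequency_category": None,  # OBLIGATE, COMMON, ...
    }
    tokens = [t.strip() for t in field.split(",") if t.strip()]
    for token in tokens:
        match = _QUALIFIER_RE.fullmatch(token)
        if match is None:
            raise ValueError(f"malformed qualifier: {token!r}")
        name, arg = match.groups()
        if name == "NOT":
            result["not"] = True
        elif name == "SECONDARY":
            result["secondary_to"] = arg
        elif name in SEVERITY:
            if result["severity"] is not None:
                raise ValueError("only one of MILD, MODERATE, SEVERE may be used")
            result["severity"] = name
        elif name == "FREQUENCY":
            counts = re.fullmatch(r"(\d+)/(\d+)", arg or "")
            if counts is None:
                raise ValueError(f"FREQUENCY needs an n/m argument: {token!r}")
            result["frequency"] = (int(counts.group(1)), int(counts.group(2)))
        elif name in FREQUENCY_CATEGORIES:
            result["frequency_category"] = name
        else:
            raise ValueError(f"unknown qualifier: {name!r}")
    return result


# The two example entries from the text above.
print(parse_qualifiers("MILD,FREQUENCY(18/23)"))
print(parse_qualifiers("SECONDARY(HP:0004321),UNCOMMON"))
```

Returning a dictionary keyed by meaning rather than a list of tokens is one way to keep downstream code independent of the order in which the qualifiers were written.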
5) HPO ID
This field holds the HPO identifier of the term attributed to the DB_Object_ID. This field is mandatory, cardinality 1.
6) DB:Reference
This required field indicates the source of the information used for the annotation. This may be the clinical experience of the annotator or may be taken from an article as indicated by a PubMed ID. Each collaborating center of the Human Phenotype Ontology consortium is assigned an HPO:Ref ID. In addition, if appropriate, a PubMed ID for an article describing the clinical abnormality may be used.
7) Evidence code
This required field indicates the level of evidence supporting the annotation. At the kickoff of the HPO, most annotations were extracted by parsing the Clinical Features sections of the omim.txt file. These annotations are assigned the evidence code "IEA". Other codes include "PCS" for published clinical study. This code should be used for information extracted from articles in the medical literature. Generally, annotations of this type will include the PubMed ID of the published study in the DB:Reference field. Finally, "ICE" can be used for annotations based on individual clinical experience. This may be appropriate for disorders with a limited amount of published data. An ICE annotation must be accompanied by an entry in the DB:Reference field denoting the individual or center performing the annotation together with an identifier. For instance, GH:007 might be used to refer to the seventh such annotation made by a specialist from Gotham Hospital (assuming the prefix GH has been registered with the HPO). We have also included "ITM" to mark annotations retrieved by text-mining (inferred by text-mining).
8) Onset modifier
A term ID from the sub-ontology below the term "Age of onset" (HP:0003674).
9) Frequency modifier
A percentage value reflecting the frequency with which the particular abnormality occurs in patients having the syndrome. Another possibility is to specify the number n of patients that have this feature out of the m patients investigated (n of m). If exact data are not available, categories from the following table may also be used to indicate the frequency of a phenotypic feature. As a rough guide, the HPO consortium interprets these categories as having roughly the following numerical meaning (Table 2).
| Description | Percent of patients |
| --- | --- |
| very rare | 1 % |
| rare | 5 % |
| occasional | 7.5 % |
| frequent | 33 % |
| typical | 50 % |
| common | 75 % |
| hallmark | 90 % |
| obligate | 100 % |
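The three ways of expressing a frequency described above (a percentage, n of m patients, or a category from the table) can be reduced to one number for comparison. The following Python sketch is illustrative only: the function name `frequency_to_percent` is hypothetical, and the exact spellings it accepts ("33 %", "17 of 18", "17/18") are assumptions, since the text does not fix how these values are written in the file.

```python
import re

# Rough numerical meaning of each category, copied from the table above.
CATEGORY_PERCENT = {
    "very rare": 1.0,
    "rare": 5.0,
    "occasional": 7.5,
    "frequent": 33.0,
    "typical": 50.0,
    "common": 75.0,
    "hallmark": 90.0,
    "obligate": 100.0,
}


def frequency_to_percent(value):
    """Convert a frequency modifier to a percentage (hypothetical helper)."""
    text = value.strip().lower()

    # Case 1: a category name from the table.
    if text in CATEGORY_PERCENT:
        return CATEGORY_PERCENT[text]

    # Case 2: an explicit percentage such as "33 %" or "7.5%".
    match = re.fullmatch(r"(\d+(?:\.\d+)?)\s*%", text)
    if match:
        return float(match.group(1))

    # Case 3: n of m patients, written here as "17 of 18" or "17/18".
    match = re.fullmatch(r"(\d+)\s*(?:of|/)\s*(\d+)", text)
    if match:
        affected, examined = int(match.group(1)), int(match.group(2))
        if examined == 0 or affected > examined:
            raise ValueError(f"implausible counts: {value!r}")
        return 100.0 * affected / examined

    raise ValueError(f"unrecognized frequency modifier: {value!r}")


for example in ("hallmark", "33 %", "5 of 8"):
    print(example, "->", frequency_to_percent(example))
```

Because the category values are only a rough guide, a percentage derived from a category should be treated as less precise than one derived from actual counts.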
10) With
This field is not currently used. However, it is intended for information about characteristics that accompany the rest of the annotation. For instance, in order to annotate a co-occurrence of two features in some disease, one could add the qualifier COOCCURS(17/18) and the identifier of some other characteristic in the WITH field to denote that the HPO term listed in the HPO ID field occurred in 17 of 18 cases of the disease listed in the DB_Object_ID field. In the future, the meaning of this field may be extended to include other information such as repeat length in order to correlate, say, average age of onset of symptoms in Huntington disease with the number of CAG repeats in the huntingtin gene.
11) Aspect
One of O (organ abnormality), I (inheritance) or C (onset and clinical course). This field is mandatory; cardinality 1.
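To tie the fields together, the following Python sketch splits one line of the file into the eleven fields described above and checks a few of the stated constraints. It is a sketch under assumptions, not a reference implementation: the function names, the dictionary keys and the example line are invented for illustration. Since the format is said to comprise 12 fields but only 11 are described in this section, any further columns are kept unmodified under the key `extra`. The set of evidence codes contains only the four codes named above and may not be exhaustive.

```python
import re

# Key names are illustrative; they follow the order of fields 1 to 11 above.
FIELD_NAMES = (
    "db", "db_object_id", "db_name", "qualifier", "hpo_id", "db_reference",
    "evidence_code", "onset_modifier", "frequency_modifier", "with", "aspect",
)
KNOWN_EVIDENCE_CODES = {"IEA", "PCS", "ICE", "ITM"}  # codes named in this guide
ASPECTS = {"O", "I", "C"}


def parse_annotation_line(line):
    """Split one tab-delimited line into named fields (hypothetical helper)."""
    columns = line.rstrip("\r\n").split("\t")
    if len(columns) < len(FIELD_NAMES):
        raise ValueError(
            f"expected at least {len(FIELD_NAMES)} columns, got {len(columns)}"
        )
    record = dict(zip(FIELD_NAMES, columns))
    # Columns beyond the eleven described here are preserved as-is.
    record["extra"] = columns[len(FIELD_NAMES):]
    return record


def check_annotation(record):
    """Return a list of problems found in a parsed record (may be empty)."""
    problems = []
    if not re.fullmatch(r"HP:\d{7}", record["hpo_id"]):
        problems.append("hpo_id must look like HP:0002611")
    if not record["db_reference"]:
        problems.append("db_reference is required")
    if record["evidence_code"] not in KNOWN_EVIDENCE_CODES:
        problems.append(f"unrecognized evidence code: {record['evidence_code']!r}")
    if record["aspect"] not in ASPECTS:
        problems.append("aspect must be one of O, I, C")
    return problems


# A made-up line for illustration only; it is not taken from a real file.
EXAMPLE_LINE = "\t".join([
    "OMIM", "000000", "Example syndrome", "MILD", "HP:0002611",
    "GH:007", "ICE", "", "5 of 8", "", "O",
])

record = parse_annotation_line(EXAMPLE_LINE)
print(record["db_name"], record["hpo_id"], check_annotation(record))
```

Collecting problems in a list, rather than raising on the first one, lets an annotator see every issue in a line at once.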