- Selecting Vocabulary Terms
- ==========================
- Ontologies: what and why?
- -------------------------
- Tripal 3 requires all bundles and fields to be associated with a Controlled Vocabulary (CV). CVs are dictionaries of defined terms (CV terms) that make data machine-accessible and ensure that uniform terms are used across experiments, organisms and websites. Without CV terms, our scientific knowledge might be split by "dialects". Plant biologists might study temperature stress, while animal biologists study heat shock. Each group might benefit from the knowledge of the other, but they use different vocabulary to describe the same thing, which creates challenges for data discovery and exchange. CV terms make this easier not just for people, but especially for machines. Ontologies take this a step further. Where CVs are controlled lists of CV terms, ontologies are a controlled language that includes hierarchical relationships between terms.
- Tripal leverages vocabularies to make use of the `Semantic Web <https:
- $term = tripal_insert_cvterm([
- 'id' => 'OBI:0100026',
- 'name' => 'organism',
- 'cv_name' => 'OBI',
- 'definition' => 'A material entity that is an individual living system, such as animal,
- plant, bacteria or virus, that is capable of replicating or reproducing, growth and maintenance
- in the right environment. An organism may be unicellular or made up, like humans, of many
- billions of cells divided into specialized tissues and organs.',
- ]);
- Note that in the code above the namespace is provided as the **cv_name** element and the full accession (including the short name) is provided as the **id** element. In this case the OBI CV already exists by default in the Tripal database, so we did not need to add the vocabulary record. If OBI did not exist, we could add it using the following API calls. First, we insert the "database" record for the ontology.
- .. code-block:: php
- <?php
- tripal_insert_db(array(
- 'name' => 'obi',
- 'description' => 'The Ontology for Biomedical Investigation.',
- 'url' => 'http://obi-ontology.org/page/Main_Page',
- 'urlprefix' => 'http://purl.obolibrary.org/obo/{db}_{accession}',
- ));
- Notice here that the **name** element is the **namespace** (short name converted to lower case) for the vocabulary. The **url** is the web address for the ontology online. The **urlprefix** is a URL pattern used to construct a link that takes the user to any term in the vocabulary. Almost all vocabularies have a common URL pattern for all terms. Tripal will automatically substitute the short name into the **{db}** token and the term accession into the **{accession}** token to generate the URL.
- Second, we insert the record for the controlled vocabulary.
-
- .. code-block:: php
- <?php
- tripal_insert_cv(
- 'OBI',
- 'Ontology for Biomedical Investigation. The Ontology for Biomedical Investigations (OBI) is built in a collaborative, international effort and will serve as a resource for annotating biomedical investigations, including the study design, protocols and instrumentation used, the data generated and the types of analysis performed on the data. This ontology arose from the Functional Genomics Investigation Ontology (FuGO) and will contain both terms that are common to all biomedical investigations, including functional genomics investigations and those that are more domain specific.'
- );
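The token substitution described for the **urlprefix** can be sketched in plain PHP. This is only an illustration and not Tripal's actual implementation: the function name ``example_build_term_url`` is hypothetical, and the sketch assumes the ``{db}`` token receives the short name exactly as it is written in the accession (for example ``OBI``).

.. code-block:: php

  <?php
  /**
   * Hypothetical helper (not part of the Tripal API): fills the {db} and
   * {accession} tokens of a urlprefix to build a link to a single term.
   */
  function example_build_term_url($urlprefix, $short_name, $accession) {
    // strtr() replaces every occurrence of each token in a single pass.
    return strtr($urlprefix, [
      '{db}' => $short_name,
      '{accession}' => $accession,
    ]);
  }

  // Uses the urlprefix from the OBI database record shown earlier.
  $url = example_build_term_url(
    'http://purl.obolibrary.org/obo/{db}_{accession}',
    'OBI',
    '0100026'
  );
  echo $url . PHP_EOL;

Keeping the pattern in one place means a single database record can serve every term in the vocabulary, which is why almost all vocabularies only need one **urlprefix**.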
- Case 2: Ontologies with a defined namespace
- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- Consider the entry for `CDS <https:
- $term = tripal_insert_cvterm([
- 'id' => 'SO:0000316',
- 'name' => 'CDS',
- 'cv_name' => 'sequence',
- 'definition' => 'A contiguous sequence which begins with, and includes, a start codon and ends with, and includes, a stop codon. [ http://www.sequenceontology.org/browser/current_svn/term/SO:ma ].',
- ]);
- Notice that in the code above we set the **cv_name** to the namespace, ``sequence``, rather than to the short name SO.
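To make the distinction between short name, accession and namespace concrete, the following sketch splits a full accession into its two parts and pairs them with a namespace that is supplied separately. The helper name ``example_split_term_id`` is hypothetical and is not part of the Tripal API.

.. code-block:: php

  <?php
  /**
   * Hypothetical helper: splits 'SO:0000316' into short name and accession.
   */
  function example_split_term_id($id) {
    // Split on the first colon only, so the accession part is kept whole.
    $parts = explode(':', $id, 2);
    if (count($parts) !== 2 || $parts[0] === '' || $parts[1] === '') {
      throw new InvalidArgumentException("Expected '<short name>:<accession>', got '$id'.");
    }
    return ['short_name' => $parts[0], 'accession' => $parts[1]];
  }

  $parts = example_split_term_id('SO:0000316');
  // The namespace is not encoded in the id, so it has to be looked up
  // in the ontology entry and supplied separately.
  $namespace = 'sequence';
  printf("%s / %s / %s\n", $parts['short_name'], $parts['accession'], $namespace);
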
- Case 3: Ontologies with multiple namespaces
- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- Some ontologies are broken into sub-ontologies. This includes the Gene Ontology (GO). Let's consider the example GO term `cell aggregation <http://www.ebi.ac.uk/ols/ontologies/go/terms?iri=http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FGO_0098743>`_. Looking at the EBI entry, the teal box is GO, the orange box is GO:0098743, and the has_obo_namespace is biological_process. However, GO provides two other namespaces: cellular_component and molecular_function. Be sure to pay attention to these different namespaces if you ever need to manually insert a term.
- .. figure:: select_vocab_terms.3.go.png
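As a rough sketch of how the GO namespaces affect a manual insert, the helper below builds the array for a GO term and rejects any namespace other than the three named above. The names ``EXAMPLE_GO_NAMESPACES`` and ``example_go_term_record`` are hypothetical. The sketch assumes the resulting array would be handed to ``tripal_insert_cvterm()`` on a Tripal site, and it leaves out the definition.

.. code-block:: php

  <?php
  // The three GO namespaces named in the text (hypothetical constant).
  const EXAMPLE_GO_NAMESPACES = [
    'biological_process',
    'cellular_component',
    'molecular_function',
  ];

  /**
   * Hypothetical helper: builds a term array for a GO term, using the
   * namespace (not the short name "GO") as the cv_name.
   */
  function example_go_term_record($accession, $name, $namespace) {
    if (!in_array($namespace, EXAMPLE_GO_NAMESPACES, TRUE)) {
      throw new InvalidArgumentException("Unknown GO namespace: $namespace");
    }
    return [
      'id' => 'GO:' . $accession,
      'name' => $name,
      'cv_name' => $namespace,
    ];
  }

  $record = example_go_term_record('0098743', 'cell aggregation', 'biological_process');
  print_r($record);
  // On a Tripal site, $record could then be passed to tripal_insert_cvterm().
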
- Case 4: Ontologies with multiple short names
- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- The EDAM ontology builds its term accessions using several different short names rather than the name of the ontology itself. Consider the EDAM term for `Sequence <http://www.ebi.ac.uk/ols/ontologies/edam/terms?iri=http%3A%2F%2Fedamontology.org%2Fdata_2044>`_. The teal box is EDAM, the orange box is data:2044, and there is no **namespace**.
- .. figure:: select_vocab_terms.4.edam.png
- For this case, the **namespace** is EDAM, the short name is **data**, and the accession is 2044. Unfortunately, this breaks the paradigm that Chado expects. Typically the **short name** is the teal box (EDAM). To make Chado handle ontologies like this properly, we have to reverse the short name and **namespace** values when creating our record:
- .. code-block:: php
- <?php
- $term= tripal_insert_cvterm([
- 'id' => 'data:2044',
- 'name' => 'sequence',
- 'cv_name' => 'EDAM',
- 'definition' => 'One or more molecular sequences, possibly with associated annotation.',
- ]);
- tripal_insert_db(array(
- 'name' => 'data',
- 'description' => 'Bioinformatics operations, data types, formats, identifiers and topics.',
- 'url' => 'http:
- tripal_insert_cv(
- 'EDAM',
- 'EDAM is an ontology of well established, familiar concepts that are prevalent within bioinformatics, including types of data and data identifiers, data formats, operations and topics. EDAM is a simple ontology - essentially a set of terms with synonyms and definitions - organised into an intuitive hierarchy for convenient use by curators, software developers and end-users. EDAM is suitable for large-scale semantic annotations and categorization of diverse bioinformatics resources. EDAM is also suitable for diverse application including for example within workbenches and workflow-management systems, software distributions, and resource registries.'
- );
- Case 5: You really cant find a term
- $term = tripal_insert_cvterm([
- 'id' => 'local:shame_on_you',
- 'name' => 'shame_on_you',
- 'cv_name' => 'local',
- 'definition' => 'You should really find a good CVterm.',
- ]);
- Notice in the above code the **short name** and **namespace** are both "local" as this is a local term on the site.