|
@@ -1,8 +1,8 @@
|
|
|
Selecting Vocabulary Terms
|
|
|
-============================
|
|
|
+==========================
|
|
|
|
|
|
Ontologies: what and why?
|
|
|
---------------------------
|
|
|
+-------------------------
|
|
|
|
|
|
Tripal 3 requires all bundles and fields to be associated with a Controlled Vocabulary (CV). CVs are dictionaries of defined terms (CV terms) that make data machine-accessible, ensuring uniform terms are used across experiments, organisms and websites. Without CVterms, our scientific knowledge might be split by "dialects". Plant biologists might study temperature stress, while animal biologists study heat shock. Each group might benefit from the knowledge of the other, but they use a different vocabulary to describe the same thing, creating challenges for data discovery and exchange. CV terms make this easier not just for people, but especially for machines. Ontologies take this a step further. Where CVs are controlled lists of CVterms, ontologies are a controlled language, that include heirarchical relationships of terms.
|
|
|
|
|
@@ -11,17 +11,17 @@ Tripal leverages vocabularies to make use of the `Semantic Web <https://en.wikip
|
|
|
Before you can create a new Bundle or Field the vocabulary term must be present in your local Tripal site. You can check if a term exists by using the Tripal lookup service on your local site using the URL path cv/lookup (e.g. http://your-site/cv/lookup). If the term is not present then you'll need to add it. You can do so manually by using Tripal's controlled vocabulary admin pages. For creating new bundles this is all you need to do. However, when creating Fields you will want to programmatically add the term. This is important because Fields are meant to be shared. If you create an awesome field that you want to share with others then you need to make sure the terms get added programmatically. The following sections describe how terms are stored in Chado and how you can add them using Tripal API calls.
|
|
|
|
|
|
Storage of Terms in Chado
|
|
|
---------------------------
|
|
|
+-------------------------
|
|
|
|
|
|
In Chado, CVs are stored by two tables: the **db** and **cv** tables. Chado was designed to store a record for the online database that a vocabulary lives at in the **db** table, and the namespace of a vocabulary in the **cv** table. For example, the sequence ontology uses the namespace, sequence, which is stored in the **cv** table but uses the short name of SO which is stored in the **db** table. As we'll see later, sometimes the distinction between what gets stored in the **cv** vs the **db** tables can get a bit fuzzy with some vocabularies. The terms themselves are stored in the cvterm table. The cvterm table has a foreign key to the **cv** table via the **cv_id** field. Every controlled vocabulary term has an accession. For example the term gene in the Sequence Ontology has an accession number of SO:0000704. Accession numbers consist of two parts: a vocabulary "short name", followed by a unique identifier separated by a colon. Within Chado, the accession for any term is stored in the dbxref table. This table has a foreign key to the **db** table via the **db_id** as well as a foreign key to the cvterm table via the **dbxref_id** field.
|
|
|
|
|
|
Vocabulary Short Names and Namespaces
|
|
|
---------------------------------------
|
|
|
+-------------------------------------
|
|
|
|
|
|
How can you tell what the **short name** and **namespace** values will be for a vocabulary term that you want to insert into Chado for your custom Bundle or Field? Hint: use the information is in the EMBL-EBI `Ontology Lookup Service <http://www.ebi.ac.uk/ols/index>`_ (OLS). The following sections provide three examples for different cases.
|
|
|
|
|
|
Case 1: Ontologies without a defined namespace
|
|
|
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
|
|
Consider the term for `organism <http://www.ebi.ac.uk/ols/ontologies/obi/terms?iri=http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FOBI_0100026>`_.
|
|
|
|
|
@@ -74,7 +74,7 @@ Second, we insert the record for the controlled vocabulary.
|
|
|
|
|
|
|
|
|
Case 2: Ontologies with a defined namespace
|
|
|
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
|
|
Consider the entry for `CDS <https://www.ebi.ac.uk/ols/ontologies/so/terms?iri=http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FSO_0000316>`_.
|
|
|
|
|
@@ -97,9 +97,7 @@ Notice that in the Term Info box on the right there is the term **has_obo_namesp
|
|
|
Notice in the code above we can properly set the cv_name to sequence.
|
|
|
|
|
|
Case 3: Ontologies with multiple namespaces
|
|
|
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
-
|
|
|
-
|
|
|
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
|
|
Some ontologies are b into sub-ontologies. This includes the Gene Ontology (GO). Let's consider the example GO term `cell aggregation <http://www.ebi.ac.uk/ols/ontologies/go/terms?iri=http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FGO_0098743>`_. Looking at the EBI entry, the teal box is GO, the orange box is GO:0098743, and the has_obo_namespace is biological_process. However, the GO provides two other namespaces: cellular_component and molecular_function. Be sure to pay attention to these different namespaces if you ever need to manually insert a term.
|
|
|
|
|
@@ -107,7 +105,7 @@ Some ontologies are b into sub-ontologies. This includes the Gene Ontology (GO).
|
|
|
|
|
|
|
|
|
Case 4: Ontologies with muliptle short names
|
|
|
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
|
|
The EDAM ontology builds its term accessions using different short names instead of the ontology. Consider the EDAM term for `Sequence <http://www.ebi.ac.uk/ols/ontologies/edam/terms?iri=http%3A%2F%2Fedamontology.org%2Fdata_2044>`_. The teal box is EDAM, the orange box is data:2044, and there is no **namespace**.
|
|
|
|
|
@@ -141,8 +139,8 @@ For this case, the **namespace** is EDAM, the short name is **data**, and the ac
|
|
|
|
|
|
|
|
|
|
|
|
-Case 5: You really cant a term!
|
|
|
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
+Case 5: You really cant find a term!
|
|
|
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
|
|
Sometimes a good CVterm just doesn't exist for what you want to describe. If you can't find a CV term, you can insert a term into the "local" CV. This is meant to be used as a last resort. In these cases, before you use a local term, consider contributing the term to an existing CV or ontology. Any terms that are invented for a local site may mean that the data exposed by your site cannot be discovered by other sites or tools. In this case, the accession will not be numeric, but is the same as the term name.
|
|
|
|