
Finished blast docs

Stephen Ficklin 6 years ago
commit 9711b19098

+ 1 - 1
docs/user_guide.rst

@@ -10,9 +10,9 @@ User's Guide
    user_guide/what_is_tripal
    user_guide/install_tripal
    user_guide/drupal_overview
-   user_guide/example_genomics
    user_guide/learn_chado
    user_guide/content_types
+   user_guide/example_genomics
    user_guide/mviews
    user_guide/searching
    user_guide/job_management

+ 3 - 43
docs/user_guide/example_genomics/func_annots.rst

@@ -1,46 +1,5 @@
-Adding Functional Annotations
-=============================
-.. note::
-
-  Remember you must set the ``$DRUPAL_HOME`` environment variable if you want to cut-and-paste the commands below. See :doc:`../install_tripal/drupal_home`
-  
-  
-For this example we will be load functional data for our gene. To do this we will use the Blast, KEGG, and InterPro extension modules. However, these extension modules are not part of the "core" Tripal package but are available as separate extensions.  Anyone may create extensions for Tripal.  These extensions are useful for genomic data and therefore are included in this tutorial. 
-
-To download these modules:
-
-  ::
-  
-    cd $DRUPAL_HOME    
-    drush pm-download tripal_analysis_blast
-    drush pm-download tripal_analysis_kegg
-    drush pm-download tripal_analysis_interpro
-
-Now, enable these extension modules:
-
-  ::
-  
-    drush pm-enable tripal_analysis_blast
-    drush pm-enable tripal_analysis_interpro
-    drush pm-enable tripal_analysis_kegg
-
-For this example, we will use the following files which are available for downloading:
-
-- `Citrus sinensis-orange1.1g015632m.g.iprscan.xml <http://www.gmod.org/mediawiki/images/0/0c/Citrus_sinensis-orange1.1g015632m.g.iprscan.xml>`_
-- `Citrus sinensis-orange1.1g015632m.g.KEGG.heir.tar.gz <http://www.gmod.org/mediawiki/images/1/13/Citrus_sinensis-orange1.1g015632m.g.KEGG.heir.tar.gz>`_
-- `Blastx citrus sinensis-orange1.1g015632m.g.fasta.0 vs uniprot sprot.fasta.out <http://www.gmod.org/mediawiki/images/e/e8/Blastx_citrus_sinensis-orange1.1g015632m.g.fasta.0_vs_uniprot_sprot.fasta.out>`_
-- `Blastx citrus sinensis-orange1.1g015632m.g.fasta.0 vs nr.out <http://www.gmod.org/mediawiki/images/2/24/Blastx_citrus_sinensis-orange1.1g015632m.g.fasta.0_vs_nr.out>`_
-
-Download these files to the /var/www/html/sites/default/files directory. To do so quickly run these commands:
-
-  ::
-  
-    cd $DRUPAL_HOME 
-
-    wget http://www.gmod.org/mediawiki/images/0/0c/Citrus_sinensis-orange1.1g015632m.g.iprscan.xml
-    wget http://www.gmod.org/mediawiki/images/1/13/Citrus_sinensis-orange1.1g015632m.g.KEGG.heir.tar.gz
-    wget http://www.gmod.org/mediawiki/images/e/e8/Blastx_citrus_sinensis-orange1.1g015632m.g.fasta.0_vs_uniprot_sprot.fasta.out
-    wget http://www.gmod.org/mediawiki/images/2/24/Blastx_citrus_sinensis-orange1.1g015632m.g.fasta.0_vs_nr.out
+Functional Annotations
+======================
 
 
 .. toctree::
@@ -48,6 +7,7 @@ Download these files to the /var/www/html/sites/default/files directory. To do s
    :caption: Table of Contents
    :glob:
 
+   func_annots/setup
    func_annots/blast
    func_annots/interpro
    func_annots/kegg

+ 93 - 57
docs/user_guide/example_genomics/func_annots/blast.rst

@@ -8,7 +8,7 @@ Adding BLAST Results
 Adding BLAST Databases
 ----------------------
 
-Before we load our BLAST results we want to add some external databases.  For this tutorial we have protein BLAST results against NCBI nr and ExPASy SwissProt.  We would like the BLAST hits to be clickable such that they link back to their respective databases. To do this, we must add some additional databases.  Navigate to Tripal → Chado Modules → Databases and click the link titled Add a Database. The resulting page provides fields for adding a new database.  Add two new datbases, one for NCBI nr and the other for ExPASy SwissProt.
+Before we load our BLAST results we want to add some external databases.  For this tutorial we have protein BLAST results against NCBI nr and ExPASy SwissProt.  We would like the BLAST hits to be clickable such that they link back to their respective databases. To do this, we must add some additional databases.  Navigate to **Tripal → Data Loaders → Chado Databases** and click the link titled **Add a Database**. The resulting page provides fields for adding a new database.  Add two new databases, one for NCBI nr and the other for ExPASy SwissProt.
 
 Use these values for adding the NCBI nr database:
 
@@ -33,7 +33,7 @@ Use these values for adding the SwssProt database:
 
 Configure Parsing of BLAST Results
 ----------------------------------
-First, we need to ensure that the BLAST module can properly parse the BLAST hits. To do this, navigate to Tripal → Extension Modules → Tripal Blast Analyses. On this page are configuration settings for the Tripal BLAST Analysis extension module.
+First, we need to ensure that the BLAST module can properly parse the BLAST hits. To do this, navigate to **Tripal → Extensions → Tripal Blast Analyses**. On this page are configuration settings for the Tripal BLAST Analysis extension module.
 
 Within the section titled BLAST Parsing, you can specify a different, more meaningful name for the sequence library file (a.k.a. database) used for BLASTing. This name will be displayed with BLAST results. You can also provide regular expressions for parsing BLAST hits. For example, the following is a line for a match from SwissProt:
 
@@ -42,19 +42,19 @@ Within the section titled BLAST Parsing, you can specify a different, more meani
     sp|P43288|KSG1_ARATH Shaggy-related protein kinase alpha OS=Arabidopsis thaliana GN=ASK1 PE=2 SV=3
 
 
-Here the hit name is `KSG1_ARATH`, the accession is `P43288`, the hit description is `Shaggy-related protein kinase alpha OS=Arabidopsis thaliana` and the organism is `Arabidopsis thaliana`. We need regular expressions to tell Tripal how to extract these unique parts from the match text. Because Tripal is a PHP application, the syntax for regular expressions follows the PHP method. Documentation for regular expressions used in PHP can be found here. The following regular expressions can be used to extract the hit name, the accession, hit description and organism for the example SwissProt line above:
+Here the hit name is ``KSG1_ARATH``, the accession is ``P43288``, the hit description is ``Shaggy-related protein kinase alpha OS=Arabidopsis thaliana`` and the organism is ``Arabidopsis thaliana``. We need regular expressions to tell Tripal how to extract these parts from the match text. Because Tripal is a PHP application, the regular expressions use PHP's PCRE (Perl-compatible) syntax, which is documented in the PHP manual. The following regular expressions extract the hit name, accession, hit description and organism from the example SwissProt line above:
 
 .. csv-table::
   :header: "Element", "Regular Expression"
   
-  "Hit Name", `^sp\|.*?\|(.*?)\s.*?$`
-  "Hit Description", `^sp\|.*?\|.*?\s(.*)$`
-  "Hit Accession", `^sp\|(.*?)\|.*?\s.*?$`
-  "Hit Organism", `^.*?OS=(.*?)\s\w\w=.*$`
+  "Hit Name", ``^sp\|.*?\|(.*?)\s.*?$``
+  "Hit Description", ``^sp\|.*?\|.*?\s(.*)$``
+  "Hit Accession", ``^sp\|(.*?)\|.*?\s.*?$``
+  "Hit Organism", ``^.*?OS=(.*?)\s\w\w=.*$``
 
 In this tutorial, we will be adding BLAST results for the two databases we just created: ExPASy SwissProt and NCBI nr. First, select ExPASy SwissProt from the drop-down menu. A form will appear:
 
-Tripal2.0 Blast settings.png
+.. image:: blast1.png
 
 In the form fields, add the following values:
 
@@ -62,73 +62,99 @@ In the form fields, add the following values:
   :header: "Field", "Value"
   
   "Title for the BLAST analysis", "(leave blank)"
-  "Regular expression for Hit Name", "`^sp\|.*?\|(.*?)\s.*?$`"
-  "Regular expression for Hit Description", "`^sp\|.*?\|.*?\s(.*)$`"
-  "Regular expression for Hit Accession:", "`^sp\|(.*?)\|.*?\s.*?$`"
-  "Regular expression for Organism", "`^.*?OS=(.*?)\s\w\w=.*$`"
+  "Regular expression for Hit Name", ``^sp\|.*?\|(.*?)\s.*?$``
+  "Regular expression for Hit Description", ``^sp\|.*?\|.*?\s(.*)$``
+  "Regular expression for Hit Accession:", ``^sp\|(.*?)\|.*?\s.*?$``
+  "Regular expression for Organism", ``^.*?OS=(.*?)\s\w\w=.*$``
   "Organism Name", "(leave blank)"
 
-Click Save Settings.
+Click **Save Settings**.
 
 The match accession will be used for building web links to the external database. The accession will be appended to the URL Prefix set earlier when the database record was first created.
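
As a rough illustration of how such a link is formed, the sketch below simply appends an accession to a URL prefix. The prefix shown is an assumption chosen for illustration (a UniProt-style URL); use whatever **URL Prefix** you actually entered for the database record.

```shell
#!/usr/bin/env bash
# Hypothetical URL Prefix; substitute the value saved on your database record.
url_prefix='https://www.uniprot.org/uniprot/'
# Accession as extracted by the Hit Accession regular expression.
accession='P43288'

# The link is the URL prefix with the accession appended to it.
hit_url="${url_prefix}${accession}"
echo "$hit_url"
```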
 
-Now select the NCBI nr database from the drop-down. NCBI databases use a format that is compatible with BLAST. Therefore, the hit name, accession and description are handled differently in the BLAST XML results. To correctly parse results from an NCBI database click the Use Genbank style parser checkbox. This should disable all other fields and is all we need for this database.  Clikc the Save Settings button.
+Now select the **NCBI nr** database from the drop-down. NCBI databases use a format that is compatible with BLAST. Therefore, the hit name, accession and description are handled differently in the BLAST XML results. To correctly parse results from an NCBI database, click the **Use Genbank style parser** checkbox. This should disable all other fields and is all we need for this database. Click **Save Settings**.
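
To see what the parser will be working with, you can list the hit identifiers stored in a BLAST XML file. This is a sketch, not part of Tripal: it assumes the standard NCBI BLAST XML elements ``Hit_id``, ``Hit_def`` and ``Hit_accession`` each sit on their own line, and it uses ``sed`` rather than a real XML parser. The ``list_hits`` function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the id, definition and accession of every hit in a
# BLAST XML file, tab-separated, one hit per line.
# Assumes each Hit_* element is on its own line, in the order NCBI BLAST writes
# them (Hit_id, Hit_def, Hit_accession). XML entities are not decoded.
list_hits() {
  local xml_file="$1"
  sed -n \
    -e 's/.*<Hit_id>\(.*\)<\/Hit_id>.*/\1/p' \
    -e 's/.*<Hit_def>\(.*\)<\/Hit_def>.*/\1/p' \
    -e 's/.*<Hit_accession>\(.*\)<\/Hit_accession>.*/\1/p' \
    "$xml_file" | paste - - -
}
```

For example, ``list_hits $DRUPAL_HOME/sites/default/files/Blastx_citrus_sinensis-orange1.1g015632m.g.fasta.0_vs_nr.out | head`` shows the first few hits, which makes it easier to judge whether the Genbank-style parser or custom regular expressions are appropriate for a given database.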
 
-Load the BLAST Results
-----------------------
-Now we can create out analysis page. Navigate to Create Content page and select the Analysis: BLAST content type. Add the following values for this analysis. In the fields set the following values:
+Create the Analysis Page
+------------------------
+
+.. note::
+
+  We recommend creating an analysis page whenever you import data. The analysis page describes how the imported data was derived or collected.
+
+Now we can create our analysis page. Navigate to **Content → Tripal Content** and click the **Add Tripal Content** link. This page contains a variety of content types that the site supports.  Scroll to the **Other** section and find the content type named **Blast Results**:
+
+.. image:: blast2.png
+
+Here we can save details about the analysis used to create the BLAST results.  Enter the following in the fields that appear on the page:
 
 .. csv-table::
   :header: "Field", "Value"
   
-    "Analysis Name", "blastx Citrus sinensis v1.0 genes vs ExPASy SwissProt"
-    "Program", "blastall"
-    "Program Version", "2.2.25"
-    "Algorithm", "blastx"
-    "Source name", "C. sinensis mRNA vs ExPASy SwissProt"
-    "Time Executed", "(today's date)"
-    "Materials & Methods", "C. sinensis mRNA sequences were BLAST'ed against the ExPASy SwissProt protein database using a local installation of BLAST on in-house linux server. Expectation value was set at 1e-6"
-    "BLAST Settings", "
-       - Database: ExPASy SwissProt
-       - BLAST XML File/Directory: /var/www/html/sites/default/files/Blastx_citrus_sinensis-orange1.1g015632m.g.fasta.0_vs_uniprot_sprot.fasta.out
-       - Query Type: mRNA
-       - Parameters: -p blastx -e 1e-6 -m 7
-       - Submit a job to parse the XML output: checked
-       - Keywords for custom search: checked"
-
-Click the Save button. You can now see our new Analysis.
-
-Tripal2.0 Blast analysis.png
-
-Now we need to manually run the job to parse the BLAST results:
+  "Name", "blastx Citrus sinensis v1.0 genes vs ExPASy SwissProt"
+  "Description", "Citrus sinensis mRNA sequences were BLAST'ed against the ExPASy SwissProt protein database using a local installation of BLAST on in-house linux server. Expectation value was set at 1e-6"
+  "BLAST Program", "blastx"
+  "BLAST Version", "2.2.25"
+  "Data Source Name", "Citrus sinensis mRNA vs ExPASy SwissProt"
+  "Date Performed", "(today's date)"
 
-::
+Click the **Save** button. You can now see our new BLAST analysis page.
 
-  drush trp-run-jobs --username=administrator --root=$DRUPAL_HOME
+.. image:: blast3.png
 
-The results should now be loaded. if we visit our feature page, for feature `orange1.1g015615m` (http://localhost/feature/citrus/sinensis/mRNA/PAC%3A18136219) we should now see BLAST results by clicking the 'Homology' link in the left table of contents.
+Create a second Analysis page for the results of the NCBI nr BLAST analysis. Use the following values:
 
-Tripal2.0 feature homology.png
+.. csv-table::
+  :header: "Field", "Value"
+  
+  "Name", "blastx Citrus sinensis v1.0 genes vs NCBI nr"
+  "Description", "Citrus sinensis mRNA sequences were BLAST'ed against the NCBI non-redundant protein database using a local installation of BLAST on in-house linux server. Expectation value was set at 1e-6"
+  "BLAST Program", "blastx"
+  "BLAST Version", "2.2.25"
+  "Data Source Name", "Citrus sinensis mRNA vs NCBI nr"
+  "Date Performed", "(today's date)"
+
+
+Import the BLAST XML results
+----------------------------
+First, we will load BLAST results for our citrus gene vs ExPASy SwissProt. Now that the database records are set up and configured and the analysis record is created, we are ready to import the BLAST results. To do this, navigate to **Tripal → Data Loaders → Chado BLAST XML results loader**. The following page will be presented:
+
+.. image:: blast4.png
 
-Now we want to add the results for NCBI nr. Repeat the steps above to add a new analysis with the following details:
+The top section of this page provides three ways to supply the results file: an upload interface, a remote URL, or a file path that is local to the server. In most cases you will upload the file or provide a remote URL. However, we downloaded the files earlier and stored them in ``$DRUPAL_HOME/sites/default/files``, so in this case we can use a path on the local server. Provide the following values for this form:
 
 .. csv-table::
   :header: "Field", "Value"
   
-  "Analysis Name", "blastx Citrus sinensis v1.0 genes vs NCBI nr"
-  "Program", "blastall"
-  "Program Version", "2.2.25"
-  "Algorithm", "blastx"
-  "Source name", "C. sinensis mRNA vs NCBI nr"
-  "Time Executed", "(today's date)"
-  "Materials & Methods: C. sinensis mRNA sequences were BLAST'ed against the NCBI nr protein database using a local installation of BLAST on in-house linux server. Expectation value was set at 1e-6"
-  "Blast Settings", "
-        - Database: NCBI nr
-        - Blast XML File/Directory: /var/www/html/sites/default/files/Blastx_citrus_sinensis-orange1.1g015632m.g.fasta.0_vs_nr.out
-        - Query Type: mRNA
-        - Parameters: -p blastx -e 1e-6 -m 7
-        - Submit a job to parse the XML output: checked
-        - Keywords for custom search: checked"
+  "Server path", "sites/default/files/Blastx_citrus_sinensis-orange1.1g015632m.g.fasta.0_vs_uniprot_sprot.fasta.out"
+  "Analysis", "blastx Citrus sinensis v1.0 genes vs ExPASy SwissProt (blastall 2.2.25, Citrus sinensis mRNA vs ExPASy SwissProt)"
+  "Database", "ExPASy SwissProt"
+  "BLAST XML File Extension", "out"
+  "Query Type", "mRNA"
+
+.. note::
+
+  For the **Server path** we need not give the full path. Because we downloaded the files into the Drupal directory, we can leave off the leading ``$DRUPAL_HOME`` portion of the path and Tripal will resolve it. Otherwise, we would provide the full path.
+  
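If the importer reports that it cannot find the file, you can check from the command line that a **Server path** value points at a real file. This is a sketch that assumes relative paths are resolved against ``$DRUPAL_HOME``, as the note above describes; the ``check_server_path`` function is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether a Server path value points at a readable,
# non-empty file. Relative paths are resolved from $DRUPAL_HOME (an assumption
# based on the note above). Readability is checked for the current user only,
# not for the web server user.
check_server_path() {
  local path="$1"
  case "$path" in
    /*) ;;                               # absolute path: use it as given
    *)
      if [ -z "${DRUPAL_HOME:-}" ]; then
        echo "DRUPAL_HOME is not set" >&2
        return 1
      fi
      path="$DRUPAL_HOME/$path"          # relative path: resolve from the Drupal root
      ;;
  esac
  if [ -s "$path" ] && [ -r "$path" ]; then
    echo "OK: $path"
  else
    echo "Missing, unreadable or empty: $path" >&2
    return 1
  fi
}
```

For example, run ``check_server_path sites/default/files/Blastx_citrus_sinensis-orange1.1g015632m.g.fasta.0_vs_uniprot_sprot.fasta.out`` before submitting the form.
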
+.. note::
+
+  Specifying **ExPASy SwissProt** as the database will allow the importer to use the database configuration settings we entered earlier.
+  
+Clicking the **Import BLAST file** button adds a job, which we can execute manually with the following command:
+
+::
+
+  drush trp-run-jobs --username=administrator --root=$DRUPAL_HOME
+
+The results should now be loaded. Next, we add the results for NCBI nr. Return to the BLAST XML results loader and provide the following values:
+
+.. csv-table::
+  :header: "Field", "Value"
+  
+  "Server path", "sites/default/files/Blastx_citrus_sinensis-orange1.1g015632m.g.fasta.0_vs_nr.out"
+  "Analysis", "blastx Citrus sinensis v1.0 genes vs NCBI nr (blastall 2.2.25, Citrus sinensis mRNA vs NCBI nr)"
+  "Database", "NCBI nr"
+  "BLAST XML File Extension", "out"
+  "Query Type", "mRNA"
 
 Click the **Import BLAST file** button and manually run the job:
 
@@ -136,4 +162,14 @@ Click the Save button and manually run the job:
 
   drush trp-run-jobs --username=administrator --root=$DRUPAL_HOME
 
-Return to the example feature page to view the newly added results: http://localhost/feature/citrus/sinensis/mRNA/PAC%3A18136219
+To view the results we must find an mRNA that has BLAST hits. For this example, click the **mRNA Search** link in the **Data Search** block and search for the mRNA named ``orange1.1g015615m``. On its page, click the **BLAST results** link in the left table of contents to see the BLAST results.
+
+.. image:: blast5.png
+
+Notice that, when viewing the results, the SwissProt matches are links. Clicking one redirects the user to the SwissProt website, where they can find more information about that protein.
+
+.. image:: blast6.png
+
+.. note::
+
+  The match links are able to link out to SwissProt and NCBI because of the initial setup where we added the database settings and we set regular expressions for parsing the match accessions.

BIN
docs/user_guide/example_genomics/func_annots/blast1.png


BIN
docs/user_guide/example_genomics/func_annots/blast2.png


BIN
docs/user_guide/example_genomics/func_annots/blast3.png


BIN
docs/user_guide/example_genomics/func_annots/blast4.png


BIN
docs/user_guide/example_genomics/func_annots/blast5.png


BIN
docs/user_guide/example_genomics/func_annots/blast6.png


+ 73 - 0
docs/user_guide/example_genomics/func_annots/setup.rst

@@ -0,0 +1,73 @@
+Module Setup
+============
+.. note::
+
+  Remember you must set the ``$DRUPAL_HOME`` environment variable if you want to cut-and-paste the commands below. See :doc:`../../install_tripal/drupal_home`
+  
+  
+For this example we will load functional data for our gene. To do this we will use the BLAST, KEGG, and InterPro extension modules. These modules are not part of the "core" Tripal package; they are available as separate extensions (anyone may create extensions for Tripal). They are included in this tutorial because they are useful for genomic data.
+
+To download these modules:
+
+  ::
+  
+    cd $DRUPAL_HOME    
+    drush pm-download tripal_analysis_blast
+    drush pm-download tripal_analysis_kegg
+    drush pm-download tripal_analysis_interpro
+
+Now, enable these extension modules:
+
+  ::
+  
+    drush pm-enable tripal_analysis_blast
+    drush pm-enable tripal_analysis_interpro
+    drush pm-enable tripal_analysis_kegg
+
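To confirm that all three extension modules are enabled, you can compare them against Drush's list of enabled modules. This sketch assumes Drush 8-style ``pm-list`` options (``--type``, ``--status``, ``--pipe``); the ``check_modules`` function is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: warn about any of the three Tripal analysis extensions
# that is not enabled. Returns non-zero if at least one is missing.
check_modules() {
  local enabled missing=0
  # --pipe prints a whitespace-delimited list of module machine names.
  enabled=$(drush pm-list --type=module --status=enabled --pipe --root="$DRUPAL_HOME")
  for module in tripal_analysis_blast tripal_analysis_interpro tripal_analysis_kegg; do
    # $enabled is left unquoted on purpose so each name lands on its own line.
    if ! printf '%s\n' $enabled | grep -qx "$module"; then
      echo "Not enabled: $module" >&2
      missing=1
    fi
  done
  return "$missing"
}
```
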
+For this example, we will use the following files which are available for downloading:
+
+- `Citrus_sinensis-orange1.1g015632m.g.iprscan.xml <http://www.gmod.org/mediawiki/images/0/0c/Citrus_sinensis-orange1.1g015632m.g.iprscan.xml>`_
+- `Citrus_sinensis-orange1.1g015632m.g.KEGG.heir.tar.gz <http://www.gmod.org/mediawiki/images/1/13/Citrus_sinensis-orange1.1g015632m.g.KEGG.heir.tar.gz>`_
+- `Blastx_citrus_sinensis-orange1.1g015632m.g.fasta.0_vs_uniprot_sprot.fasta.out <http://www.gmod.org/mediawiki/images/e/e8/Blastx_citrus_sinensis-orange1.1g015632m.g.fasta.0_vs_uniprot_sprot.fasta.out>`_
+- `Blastx_citrus_sinensis-orange1.1g015632m.g.fasta.0_vs_nr.out <http://www.gmod.org/mediawiki/images/2/24/Blastx_citrus_sinensis-orange1.1g015632m.g.fasta.0_vs_nr.out>`_
+
+Download these files to the ``$DRUPAL_HOME/sites/default/files`` directory. To do so quickly, run these commands:
+
+  ::
+  
+    cd $DRUPAL_HOME/sites/default/files
+    wget http://www.gmod.org/mediawiki/images/0/0c/Citrus_sinensis-orange1.1g015632m.g.iprscan.xml
+    wget http://www.gmod.org/mediawiki/images/1/13/Citrus_sinensis-orange1.1g015632m.g.KEGG.heir.tar.gz
+    wget http://www.gmod.org/mediawiki/images/e/e8/Blastx_citrus_sinensis-orange1.1g015632m.g.fasta.0_vs_uniprot_sprot.fasta.out
+    wget http://www.gmod.org/mediawiki/images/2/24/Blastx_citrus_sinensis-orange1.1g015632m.g.fasta.0_vs_nr.out
+
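After the downloads finish, you may want to confirm that all four files arrived and are not empty, for example in case one of the downloads failed. The sketch below only checks that each file exists and has a non-zero size; the ``verify_downloads`` function is hypothetical and assumes the files were saved under ``$DRUPAL_HOME/sites/default/files`` unless another directory is passed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check that each tutorial file exists and is non-empty in
# the given directory (defaults to $DRUPAL_HOME/sites/default/files).
verify_downloads() {
  local dir="${1:-$DRUPAL_HOME/sites/default/files}" status=0 f
  for f in \
    Citrus_sinensis-orange1.1g015632m.g.iprscan.xml \
    Citrus_sinensis-orange1.1g015632m.g.KEGG.heir.tar.gz \
    Blastx_citrus_sinensis-orange1.1g015632m.g.fasta.0_vs_uniprot_sprot.fasta.out \
    Blastx_citrus_sinensis-orange1.1g015632m.g.fasta.0_vs_nr.out
  do
    if [ -s "$dir/$f" ]; then
      echo "found: $f"
    else
      echo "missing or empty: $f" >&2
      status=1
    fi
  done
  return "$status"
}
```
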
+Each of these modules provides new fields for both the **gene** and **mRNA** content types. To add these new fields to those content types, navigate to **Structure → Tripal Content Types** and click the **manage fields** link in the row for the **mRNA** content type. Click the link titled **Check for new fields**. After a few moments the page will refresh and you will be notified that new fields have been added.
+
+.. image:: setup1.png
+
+Next, we need to position the new fields. Using the skills you learned in :doc:`../../content_types/configuring_page_display`, create three new **Tripal Panes** named:
+
+- Blast Results
+- Protein Domains
+- KEGG Pathways
+
+Be sure to:
+
+- Place each of the three new fields into its respective pane.
+- Move the panes out of the **disabled** section.
+- Set the label for each field to **Hidden**.
+
+The following shows an example of this layout:
+
+.. image:: setup2.png
+
+The fields are now ready for display once data is added!  
+
+.. note::
+
+   If you want both the **Gene** and **mRNA** content types to have BLAST, InterPro and KEGG result fields, you must repeat the steps above for each of them.
+
+.. note::
+
+  Any time you install a Tripal v3 extension module, you should check for new fields and then place those fields in the layout. Extension modules often will not do this for you because they do not assume you want the new fields.
+  

BIN
docs/user_guide/example_genomics/func_annots/setup1.png


BIN
docs/user_guide/example_genomics/func_annots/setup2.png