BLAST Target Databases
======================

"Target database" is the BLAST term for a database that your users can BLAST against. For example, the NCBI BLAST website offers both nucleotide and protein target databases.

Creating BLAST Indices
----------------------

This section explains how to prepare a FASTA file for use with BLAST. We use the NCBI BLAST command ``formatdb``, which should have been installed alongside the other BLAST command-line tools. The following command creates a nucleotide database from the FASTA file ``my_nucleotide.fasta``; the resulting files use the base name ``Genus_species_version_genome``.

.. code:: bash

  formatdb -p F -o T -i my_nucleotide.fasta -t Genus_species_version_genome -n Genus_species_version_genome

.. note::

  The following indicates what each parameter does:

  .. code::

    formatdb --help

    formatdb 2.2.26   arguments:

    -t  Title for database file [String]  Optional
    -i  Input file(s) for formatting [File In]  Optional
    -n  Base name for BLAST files [String]  Optional
    -p  Type of file [T/F]  Optional
        T - protein
        F - nucleotide
    -o  Parse options
        T - True: Parse SeqId and create indexes.
        F - False: Do not parse SeqId. Do not create indexes.

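
Because ``-o T`` parses sequence identifiers and builds indexes, it can be worth checking that every identifier in the FASTA file is unique before running ``formatdb``. The following is a minimal sketch rather than part of the BLAST tool set: ``find_duplicate_ids`` is a hypothetical helper, and it assumes the identifier is the text between ``>`` and the first whitespace character.

.. code:: bash

  # Hypothetical helper (not part of BLAST): print every sequence identifier
  # that occurs more than once in the FASTA file given as $1.
  # Assumption: the identifier is the text between ">" and the first whitespace.
  find_duplicate_ids() {
    grep '^>' "$1" | sed 's/^>//' | awk '{print $1}' | sort | uniq -d
  }

  # Usage: no output means every identifier is unique.
  # find_duplicate_ids my_nucleotide.fasta

If the helper prints anything, consider renaming those records before formatting the database.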
Add Blast Database
------------------

To add a target database to the "BLAST Databases" drop-down on the BLAST program forms, go to "Add Content" > "Blast Database" in the "Navigation" menu. Then fill out the form with the human-readable name of your BLAST database (shown to the user in the drop-down) and the path to the BLAST database (passed to NCBI BLAST).

.. image:: targetdbs.1.nodeform.png

For example, the above form will add "Tripalus Databasica Genome v1.0" to the "BLAST Databases" drop-down on the Nucleotide BLAST (blastn) form.

Linkouts
--------

These settings are used to transform the hit name into a link to additional information.

.. image:: targetdbs.2.linkouts.png

Linkout Type
^^^^^^^^^^^^

The linkout type determines how the URL is formed. When configuring the linkouts for a given BLAST database, first choose the type (Generic, GBrowse, or JBrowse) based on the descriptions below. The right choice depends heavily on the FASTA headers used to create the BLAST database.

- **Generic Link**: Creates a generic link using a Tripal External Database and the backbone names from the BLAST database.
- **GBrowse Link**: Creates a link to highlight BLAST results on an existing GBrowse. This requires that the BLAST database consist of backbone sequences of the same name and version as the GBrowse instance.
- **JBrowse Link**: Creates a link to highlight BLAST results on an existing JBrowse. This requires that the BLAST database consist of backbone sequences of the same name and version as the JBrowse instance.

.. warning::

  You cannot use the **GBrowse and JBrowse linkout types** unless your target BLAST database consists of the same records, with the same names, as the backbone of your GBrowse/JBrowse instance. For example, if your JBrowse instance consists of Lens culinaris genome v1.0 with LcChr1, LcChr2, etc., then your BLAST database must consist of the exact same genome version, with the original FASTA record containing ``>LcChr1``.

.. note::

  **Generic linkouts** are great for linking BLAST results to either your own Tripal pages or external pages such as NCBI Genbank.

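
One way to check the name-matching requirement for GBrowse and JBrowse linkouts is to compare the record names in the FASTA file against a list of backbone names from the genome browser. The sketch below is illustrative only: both helpers are hypothetical, the record name is assumed to be the text between ``>`` and the first whitespace, and the backbone list is assumed to be a plain-text file with one name per line that you have prepared yourself.

.. code:: bash

  # Hypothetical helper: list the record names in a FASTA file, sorted.
  # Assumption: the name is the text between ">" and the first whitespace.
  fasta_record_names() {
    grep '^>' "$1" | sed 's/^>//' | awk '{print $1}' | sort -u
  }

  # Hypothetical helper: print names that appear in only one of the two inputs.
  # $1 is the FASTA file; $2 is a text file with one backbone name per line.
  # Names found only in the FASTA file are printed flush left; names found
  # only in the backbone list are printed with a leading tab (comm's format).
  compare_record_names() {
    expected=$(mktemp)
    sort -u "$2" > "$expected"
    fasta_record_names "$1" | comm -3 - "$expected"
    rm -f "$expected"
  }

  # Usage: no output means the two sets of names are identical.
  # compare_record_names my_nucleotide.fasta backbone_names.txt

This compares names only; it does not confirm that the genome versions match.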
FASTA Header Format
^^^^^^^^^^^^^^^^^^^

This section indicates the format of the original FASTA records used to create the BLAST database. For example, if you downloaded a FASTA file from NCBI Genbank and then used ``formatdb`` to make it your target BLAST database, choose "NCBI Genbank" as the **FASTA Header Format**.

If your FASTA header doesn't match any of the available formats, choose **Custom Format** and enter your own `PHP-compliant regular expression <http://php.net/manual/en/reference.pcre.pattern.syntax.php>`_. The regular expression should include the opening and closing forward slashes (i.e. ``/``) and parentheses around the section you would like to use for the linkout. For example, ``/^>.*(LcChr\d+).*$/`` captures LcChr1, LcChr2, etc. It is always a good idea to test your regular expression using `online tools <https://www.phpliveregex.com/>`_.

.. image:: targetdbs.3.regextest.png

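
A custom expression can also be tried on the command line before you enter it in the form. The sketch below uses Perl's regular expression engine as a stand-in for PHP's PCRE functions, which is an assumption that holds for simple patterns like the example above. ``extract_linkout_ids`` is a hypothetical helper, and the sample headers are made up for illustration.

.. code:: bash

  # Hypothetical helper: print the text captured by the parentheses for each
  # header line that matches. Reads the files given as arguments, or standard
  # input if none are given. Perl stands in for PHP's PCRE engine here.
  extract_linkout_ids() {
    perl -ne 'print "$1\n" if /^>.*(LcChr\d+).*$/' "$@"
  }

  # Made-up sample headers; the third does not match and prints nothing.
  printf '>LcChr1 Lens culinaris chromosome 1\n>LcChr12\n>scaffold7\n' | extract_linkout_ids
  # prints:
  # LcChr1
  # LcChr12

To test real data, pass the header lines of your own FASTA file, for example with ``grep '^>' my_nucleotide.fasta | extract_linkout_ids``.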
External Database
^^^^^^^^^^^^^^^^^

This section uses the Tripal API (i.e. Tripal External Databases) to let you choose the URL prefix for your linkouts. A Tripal External Database consists of a label, which is shown in the drop-down, and both a URL and a URL prefix. The URL prefix is combined with the record name extracted using the FASTA header settings above to create the linkout for your users. If the Tripal External Database already exists on your Tripal site, simply select it from the drop-down.

If it does not already exist, first create it by going to Administration > Tripal > Data Loaders > Chado Databases > Add Database. The most important elements are the "Database Name", which will appear in the drop-down on the "Blast Database" page once you refresh it, and the "URL Prefix", which will be used to create the linkout. For more information on configuring Tripal databases, see the `Tripal User's Guide <https://tripal.readthedocs.io/en/latest/user_guide/example_genomics/cross_refs.html>`_.

.. image:: targetdbs.4.externaldb.png
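
To make the relationship between the URL prefix and the extracted record name concrete, the sketch below joins the two by simple concatenation. That joining rule is an assumption made for illustration, as are the ``build_linkout`` helper and the ``example.org`` prefix; check a generated link on your own site to confirm the actual result.

.. code:: bash

  # Hypothetical helper: join a URL prefix ($1) and a record name ($2).
  # Simple concatenation is an assumption, not confirmed module behaviour.
  build_linkout() {
    printf '%s%s\n' "$1" "$2"
  }

  # Example values only: a placeholder prefix and a record name as it might
  # be extracted by the FASTA header settings.
  build_linkout "https://example.org/feature/" "LcChr1"
  # prints: https://example.org/feature/LcChr1

Under this assumption, a prefix that does not end in the separator your target site expects (such as ``/`` or ``=``) would produce a broken link.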