Importing Publications
======================

Tripal provides an interface for automatically and manually adding publications.

Manually Adding a Publication
-----------------------------

First, we will manually add a publication. Click the Add Tripal Content link in the administrative menu and then Publication.

.. image:: pub_import.1.png

We will add information about the Tripal publication. Enter the following values:

.. csv-table::
  :header: "Field Name", "Value"

  "Title", "Tripal v1.1: a standards-based toolkit for construction of online genetic and genomic databases."
  "Series Name", "Database"
  "Publication Year", "2013"
  "Unique Local Identifier", "Tripal v1.1: a standards-based toolkit for construction of online genetic and genomic databases."
  "Type", "Journal Article"
  "Publication Date", "2013 Oct 25"
  "Cross Reference", "Database: PMID"
  "Accession", "24163125"
  "Authors", "Sanderson LA, Ficklin SP, Cheng CH, Jung S, Feltus FA, Bett KE, Main D"
  "Citation", "Sanderson LA, Ficklin SP, Cheng CH, Jung S, Feltus FA, Bett KE, Main D. Tripal: a construction Toolkit for Online Genome Databases. Database, Oct 25 2013. bat075"
  "Abstract", "Tripal is an open-source freely available toolkit for construction of online genomic and genetic databases. It aims to facilitate development of community-driven biological websites by integrating the GMOD Chado database schema with Drupal, a popular website creation and content management software. Tripal provides a suite of tools for interaction with a Chado database and display of content therein. The tools are designed to be generic to support the various ways in which data may be stored in Chado. Previous releases of Tripal have supported organisms, genomic libraries, biological stocks, stock collections and genomic features, their alignments and annotations. Also, Tripal and its extension modules provided loaders for commonly used file formats such as FASTA, GFF, OBO, GAF, BLAST XML, KEGG heir files and InterProScan XML. Default generic templates were provided for common views of biological data, which could be customized using an open Application Programming Interface to change the way data are displayed. Here, we report additional tools and functionality that are part of release v1.1 of Tripal. These include (i) a new bulk loader that allows a site curator to import data stored in a custom tab delimited format; (ii) full support of every Chado table for Drupal Views (a powerful tool allowing site developers to construct novel displays and search pages); (iii) new modules including ‘Feature Map’, ‘Genetic’, ‘Publication’, ‘Project’, ‘Contact’ and the ‘Natural Diversity’ modules. Tutorials, mailing lists, download and set-up instructions, extension modules and other documentation can be found at the Tripal website located at http://tripal.info."

To complete the page, click the **Save** button at the bottom.

Import of Publications
----------------------

Tripal supports importing publications from remote databases such as NCBI PubMed.

Creating an importer is an administrative function. A publication importer is created by the site administrator and consists of a set of search criteria for finding multiple publications at one time. When the importer is run, it queries the remote database, retrieves the publications that match the criteria, and adds them to the database. Because we loaded genomic data for Citrus sinensis, we will create an importer that finds all publications related to this species.

First, navigate to **Tripal → Data Loaders → Chado Bulk Publication Importers** and click the link New Importer. You will see the following page:

.. image:: pub_import.2.png

Enter the following values in the fields:

.. csv-table::
  :header: "Field Name", "Value"

  "Remote Database", "PubMed"
  "Loader Name", "Pubs for Citrus sinensis"
  "Criteria #1", "
  - Scope: Abstract/Title
  - Search Terms: Citrus sinensis
  - is Phrase?: checked"

Now, click the 'Test Importer' button. This will connect to PubMed and search for all publications that match our provided criteria. It may take a few minutes to complete. On the date this portion of the tutorial was written, over 800 publications were found:

.. image:: pub_import.3.png

Now, save this importer. You should see that we have one importer in the list:

.. image:: pub_import.4.png

We can use this importer to load all publications related to Citrus sinensis from PubMed into our database (how to load these will be shown later). However, what if new publications are added? We would like this importer to run monthly so that new publications are added automatically as they become available, but we do not need to reload the full set of publications every time the loader runs. We will therefore create a second importer that only finds publications from the last 30 days. To do this, click the link New Importer. Now, add the following criteria:

.. csv-table::
  :header: "Field Name", "Value"

  "Remote Database", "PubMed"
  "Loader Name", "Pubs for Citrus sinensis last 30 days"
  "Days since record modified", "30"
  "Criteria #1", "
  - Scope: Abstract/Title
  - Search Terms: Citrus sinensis
  - is Phrase?: checked"

Now, when we test the importer, we find only 1 publication that has been added to (created in) PubMed in the last 30 days:

.. image:: pub_import.5.png

Save this importer.
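
As an optional cross-check of the count that Test Importer reports, a similar search can be run against PubMed directly. The sketch below only builds a URL for the NCBI E-utilities ``esearch`` endpoint. It is not what Tripal runs internally: the ``pubmed_count_url`` function name, the mapping of the Abstract/Title scope to the ``[Title/Abstract]`` field tag, and the use of ``datetype=mdat`` for "Days since record modified" are all assumptions made for illustration, so the count may differ from the importer's.

.. code-block:: shell

  # Hypothetical helper (not part of Tripal): print an esearch URL that counts
  # PubMed records matching a phrase within the last N days.
  # $1: phrase to search for, $2: number of days to look back
  pubmed_count_url() {
    # Encode spaces as '+' (enough for simple phrases such as a species name).
    phrase=$(printf '%s' "$1" | sed 's/ /+/g')
    # %22 is a double quote; %5B and %5D are square brackets.
    printf 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&rettype=count&term=%%22%s%%22%%5BTitle/Abstract%%5D&reldate=%s&datetype=mdat\n' "$phrase" "$2"
  }

For example, ``curl -s "$(pubmed_count_url 'Citrus sinensis' 30)"`` should return a small XML document whose ``Count`` element holds the number of matching records.
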

Next, there are two ways to import these publications. The first is to import them manually. There is a Drush command for importing publications. Return to the terminal and run the following command:

::

  cd /var/www/html
  drush trp-import-pubs --username=administrator

You should see output to the terminal that begins like this:

::

  NOTE: Loading of publications is performed using a database transaction.
  If the load fails or is terminated prematurely then the entire set of
  insertions/updates is rolled back and will not be found in the database
  Importing: Pubs for Citrus sinensis

The importer will import 100 publications at a time and pause between each set of 100 as it requests more.

Some things to know about the publication importer:

1. The importer keeps track of publications from the remote database using the publication accession (e.g. PubMed ID).
2. If a publication with an accession (e.g. PubMed ID) already exists in the local database, the record will be updated.
3. If a publication in the local database matches by title, journal and year with one that is to be imported, then the record will be updated.

Finally, run the newly created Tripal Job to finish the import:

::

  cd /var/www/html
  drush trp-run-jobs --username=administrator
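
The two manual steps above can be combined into a small shell function. This is a sketch, not part of Tripal: ``import_pubs`` and the ``DRUSH``, ``SITE_ROOT`` and ``TRIPAL_USER`` variables are names chosen for illustration, and it assumes ``trp-run-jobs`` accepts the same ``--username`` option as ``trp-import-pubs``. The ``-r`` option passes the site root to Drush instead of changing directory first.

.. code-block:: shell

  # Illustrative defaults; override them in the environment if your site differs.
  DRUSH="${DRUSH:-drush}"
  SITE_ROOT="${SITE_ROOT:-/var/www/html}"
  TRIPAL_USER="${TRIPAL_USER:-administrator}"

  # Hypothetical helper: import publications, then run the queued Tripal job.
  import_pubs() {
    # Return early if the import fails so that no jobs are launched afterwards.
    "$DRUSH" -r "$SITE_ROOT" trp-import-pubs --username="$TRIPAL_USER" || return 1
    "$DRUSH" -r "$SITE_ROOT" trp-run-jobs --username="$TRIPAL_USER"
  }

After defining the function in a shell session, call ``import_pubs``. Because the second command runs only if the first succeeds, pending jobs are not launched after a failed import.
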

The second way to import publications is to add an entry to the UNIX cron. We did this previously for the Tripal Jobs management system when we first installed Tripal. We will add another entry for importing publications. But first, now that we have imported all of the relevant pubs, we need to return to the importers list at **Tripal → Data Loaders → Chado Bulk Publication Importers** and disable the first importer we created. We do not want to run that importer again, as we've already imported all historical publications on record at PubMed. Click the edit button next to the importer named Pubs for Citrus sinensis, click the disable checkbox and then save the importer. The importer should now be disabled.

Now only the importer titled **Pubs for Citrus sinensis last 30 days** is enabled. This is the importer we want to run on a monthly basis. The cron entry will do this for us. On the terminal, open the crontab with the following command:

::

  sudo crontab -e

Now add the following line to the bottom of the crontab:

::

  30 8 1,15 * * su - www-data -c '/usr/local/drush/drush -r /var/www/html -l http://[site url] trp-import-pubs --report=[your email] > /dev/null'

Where

- [site url] is the full URL of your site
- [your email] is the email address of the user that should receive an email containing a list of publications that were imported. You can separate multiple email addresses with a comma.

The cron entry above will launch the importer at 8:30am on the first and fifteenth days of the month. We run this importer twice a month in case one of the runs fails (e.g. the server is down).
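
To avoid typos when substituting the placeholders, the crontab line can be generated rather than edited by hand. The helper below is a hypothetical convenience, not a Tripal command. It prints the same entry shown above with the site URL and report address filled in. The values ``http://example.org`` and ``curator@example.org`` in the usage note are placeholders.

.. code-block:: shell

  # Hypothetical helper: print the crontab entry for the publication importer.
  # $1: full site URL, $2: comma-separated report email address(es)
  make_pub_cron_line() {
    # Cron fields: minute 30, hour 8, days 1 and 15 of the month, any month, any weekday.
    printf "30 8 1,15 * * su - www-data -c '/usr/local/drush/drush -r /var/www/html -l %s trp-import-pubs --report=%s > /dev/null'\n" "$1" "$2"
  }

Running ``make_pub_cron_line http://example.org curator@example.org`` prints a line that can be pasted into the editor opened by ``sudo crontab -e``.
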