Plant Biology 2014 Special Minisymposium: Bioinformatic Resources for Plant Biology Research

I had been eagerly waiting to write a post on this topic. This workshop attracted a lot of conference attendees, since present-day plant biology research cannot move forward without these resources. Unfortunately, the session tried to pack too many eggs into one basket within a limited time, and it could not meet the expectations of many attendees. However, a short note on the tools and resources described might be helpful for our readers.

Tanya Berardini introduced the audience to the past and future of The Arabidopsis Information Resource (TAIR), which collects and maintains genetic and molecular biology data for the model plant Arabidopsis thaliana. TAIR is managed by Phoenix Bioinformatics Corporation and is supported through institutional, lab and personal subscriptions. You can search and view the data using GBrowse or the interactive Sequence Viewer. Several tools are also available for different analyses, such as MapViewer, AraCyc pathways, PatMatch and Motif search. The datasets can also be downloaded in bulk.

Next, Chris Town from the J. Craig Venter Institute (JCVI) talked about the first release of the Arabidopsis Information Portal (AIP). This is an NSF- and BBSRC-funded collaborative project between JCVI, the Texas Advanced Computing Center (TACC) at the University of Texas at Austin and the University of Cambridge, with technical assistance from TAIR. The aim is to build a flexible, community-extensible portal to enable the next generation of Arabidopsis research. It offers enhanced search, retrieval and display capabilities. It also enables community participation in functional annotation and supports language- and region-specific presentation of scientific content.

The next talk was on the iPlant Collaborative. Jason Williams from CSHL introduced the initiative in a talk titled “Scalable Cyberinfrastructure for life science.” CyberInfrastructure (CI) consists of hardware, software and people. It provides technological and sociological solutions for high-throughput computational biology. The iPlant Collaborative is a dynamic virtual organization led by scientists at the University of Arizona (UA), the Texas Advanced Computing Center (TACC), Cold Spring Harbor Laboratory (CSHL) and the University of North Carolina at Wilmington (UNCW). It includes a data storage facility, an interactive web-based analytical platform, and cloud infrastructure for storage and computational data analysis. iPlant also helps scale computational algorithms to run on large, high-speed computers. It runs education and training programs that teach scientists at all levels how to use the CI.

Pankaj Jaiswal from Oregon State University delivered the next talk, entitled “Gramene: A resource for comparative plant genomics.” Gramene mainly deals with grass genomes. As an information resource, Gramene helps researchers by adding value to existing grass genome data. It uses the genomic sequence known in one species to identify and understand the corresponding genes, pathways and phenotypes in other related species. It contains information from the taxonomy, gene, plant, trait and environment ontologies. The genome module is adapted from Ensembl, and the site offers CMap and BLAST. The literature supporting all the data is organized in a Literature database. Compared with the earlier version, the site has a completely new look that is user-friendly and informative.

Peifen Zhang from the Carnegie Institution for Science talked about “PMN: metabolic pathway databases of 17 viridiplantae species, an introduction and demo of use cases.” PMN (Plant Metabolic Network) brings together biochemical pathway databases focused on plant metabolism. There is a growing need to place sequenced and annotated genomes in a biochemical context. Doing so facilitates the discovery of enzymes and the engineering of metabolism. PMN was first made public in June 2008 and provides an infrastructure for drawing together diverse sources of plant metabolism information.

The next talk was on Medicago truncatula genome resources at JCVI. Chris Town described the project, which was initiated with a generous grant from the Samuel Roberts Noble Foundation to the University of Oklahoma. Beginning in 2003 (and renewed in 2006), the National Science Foundation and the European Union’s Sixth Framework Programme provided funding to complete sequencing of the euchromatic gene space.

Asher Pasha from the University of Toronto delivered a talk titled “Data Sets, Web services and Visualization Apps from the Bio-Analytic Resource for use in the Arabidopsis Information Portal and other Cyberinfrastructure Assets.” As part of a Genome Canada grant, the Bio-Analytic Resource for Plant Biology (BAR) is providing seven modules to the Arabidopsis Information Portal:

- two modules for gene expression (transcript abundance and transcript structure), based on published microarray and RNA-seq data sets;
- a module covering 90,000 conserved sequences in the Brassicaceae from Mathieu Blanchette and collaborators via the VEGI Project, shown in a novel sequence conservation viewer (GeneSlider);
- a module for a database of almost 100k protein-protein interactions, plus a viewer app;
- a module for viewing ca. 500 experimentally determined protein structures and predicted structures covering ~70% of the Arabidopsis proteome;
- an Expressolog/Synteny module;
- a zoomable user interface app for navigating between levels of biological organization in the Arabidopsis Information Portal (AIP).

Progress on these and other potential modules was discussed.

Doreen Ware and Sunita Kumari from CSHL talked about “The DOE Systems Biology Knowledgebase: An integrated knowledgebase for biofuel research.” In 2011, the DOE Office of Biological and Environmental Research (BER) launched KBase (KBase.us), an open-source, open-architecture software and data environment for systems biology research. It provides a computational framework for integrating and analyzing large datasets. High-level data types currently supported by KBase include:

- genomes (of bacteria, archaea, and eukaryotes), metagenomes, transcriptomes and proteomes (mapped to genomes);
- interactomes, phenotypes, 16S amplicons and expression data;
- enzymes, ontologies, pathway data and protein annotations;
- protein-protein interactions, regulons, and ribosomes.

KBase pulls these data types from existing international repositories, so data submitted to these standard resources in the future will ultimately be integrated into the KBase system. Powerful tools in KBase allow users to analyze and simulate data for generating and testing hypotheses, designing biological functions or proposing new experiments.
