Genomes to Life Contractor-Grantee Workshop III
February 6-9, 2005, Washington, D.C.

Genomics:GTL Program Projects

Oak Ridge National Laboratory and Pacific Northwest National Laboratory

12

Center for Molecular and Cellular Systems: Analysis and Visualization of Data from a High-Throughput Protein Complex Identification Pipeline Using Modular and Automated Tools

W. Hayes McDonald1 (mcdonaldwh@ornl.gov), Joshua N. Adkins2, Deanna L. Auberry2, Kenneth J. Auberry2, Gregory B. Hurst1, Vladimir Kery2, Frank W. Larimer1, Manesh B. Shah1, Denise D. Schmoyer1, Eric F. Strittmatter2, and Dave L. Wabb1

1Oak Ridge National Laboratory, Oak Ridge, TN and 2Pacific Northwest National Laboratory, Richland, WA

Global or systems-level analysis of biological processes is becoming increasingly common, and some of the best examples are emerging from the field of proteomics. The Center for Molecular and Cellular Systems is focused on high-throughput isolation and characterization of protein complexes. The core experimental pipeline of this effort uses two parallel and complementary approaches: affinity purification of endogenously expressed tagged proteins (endogenous pulldown), and heterologous expression of tagged proteins that are then used to isolate interacting proteins from a cell lysate (exogenous pulldown). This integrated pipeline is currently being applied to the study of protein complexes from Rhodopseudomonas palustris and Shewanella oneidensis. After isolation, constituents of these complexes are identified using high-performance liquid chromatography coupled to either tandem mass spectrometry (LC-MS/MS) or high-resolution mass spectrometry (LC-MS).

The two affinity isolation protocols and the two mass spectrometry protocols have differing analysis requirements, so we have modularized our data analysis and visualization tools. This lets us automate and integrate data from these different sources, and also “plug in” and evaluate new tools readily. Each of the following modules uses one or more software tools to accomplish its task: (a) MS data extraction and preparation: extraction of data, filtering, and output to the necessary format; (b) MS search: MS/MS database searches using SEQUEST or DBDigger, or MS searches against an Accurate Mass Tag (AMT) database; (c) Search result filtering and summarization: protein identification and confidence; (d) Experimental filtering: reproducibility and background subtraction built on statistical evaluation and expert analysis; (e) Network visualization: using Cytoscape to view both simple and weighted networks of interactions. Currently, the major modules are automated; future work will require seamless integration across modules and between PNNL and ORNL. Taken together, this tool set lets us automate our data analysis and quickly explore and compare the relative strengths and utilities of both the experimental and data analysis pipelines.
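As an illustration only, modules (d) and (e) could be sketched as follows. This is not the Center's actual software: the data layout (each bait mapped to a list of replicate pulldowns, each a set of protein identifiers), the function names, the replicate threshold, and the use of a simple set of background proteins in place of statistical evaluation and expert analysis are all assumptions made for the example. Filters are passed in as a list of functions to mirror the “plug in” design described above.

```python
from collections import Counter
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical data shapes, chosen for this sketch only.
Pulldowns = Dict[str, List[Set[str]]]      # bait -> replicate pulldowns (sets of protein IDs)
PreyCounts = Dict[str, int]                # protein ID -> number of replicates it appeared in
Filter = Callable[[PreyCounts], PreyCounts]
Edge = Tuple[str, str, int]                # (bait, prey, replicate count used as edge weight)


def reproducibility_filter(min_replicates: int) -> Filter:
    """Keep proteins seen in at least `min_replicates` replicate pulldowns."""
    def apply(preys: PreyCounts) -> PreyCounts:
        return {p: n for p, n in preys.items() if n >= min_replicates}
    return apply


def background_filter(background: Set[str]) -> Filter:
    """Drop proteins that belong to an assumed set of common contaminants."""
    def apply(preys: PreyCounts) -> PreyCounts:
        return {p: n for p, n in preys.items() if p not in background}
    return apply


def build_edges(pulldowns: Pulldowns, filters: List[Filter]) -> List[Edge]:
    """Count prey occurrences per bait, run each plug-in filter, emit weighted edges."""
    edges: List[Edge] = []
    for bait, replicates in sorted(pulldowns.items()):
        preys: PreyCounts = dict(Counter(p for rep in replicates for p in rep))
        for f in filters:
            preys = f(preys)
        for prey, n in sorted(preys.items()):
            if prey != bait:  # the bait itself is not an interaction partner
                edges.append((bait, prey, n))
    return edges


def to_sif(edges: List[Edge]) -> str:
    """Write edges as 'source<TAB>pp<TAB>target' lines (simple, unweighted network)."""
    return "\n".join(f"{bait}\tpp\t{prey}" for bait, prey, _ in edges)


if __name__ == "__main__":
    # Made-up identifiers; "K" stands in for a background protein.
    example: Pulldowns = {
        "baitA": [{"baitA", "P1", "P2", "K"}, {"baitA", "P1", "K"}, {"baitA", "P1", "P3", "K"}],
    }
    filters = [reproducibility_filter(2), background_filter({"K"})]
    print(to_sif(build_edges(example, filters)))
```

Because each filter is just a function from prey counts to prey counts, a different scoring scheme could be swapped in without changing `build_edges`. The replicate count carried on each edge is one possible weight for a weighted network view.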

* Presenting author