Genomes to Life Contractor-Grantee Workshop III
February 6-9, 2005, Washington, D.C.
Microbial Genomics
87
Data Analysis and Protein Identification Strategy for the Systems-Level Protein-Protein Interaction Networks of Shewanella oneidensis MR-1
Gordon A. Anderson*1 (gordon@pnl.gov), James E. Bruce2, Xiaoting Tang2, Gerhard Munske2, and Nikola Tolic1
1Pacific Northwest National Laboratory, Richland, WA and 2Washington State University, Pullman, WA
The Protein-Protein Interaction Networks research program aims to identify proteome-wide protein interactions using cross-linking and high-performance mass spectrometry. Cross-linking coupled with mass spectrometry is a widely used technique in protein interaction research, but it presents a number of informatics challenges. In particular, cross-linking the proteins and then enzymatically digesting them yields complex mixtures of peptides and cross-linked peptides that are difficult to analyze. The number of possible cross-linked peptide pairs grows as the square of the number of tryptic peptides in the organism under study. This problem is amplified by other factors such as incomplete digestion and posttranslational modification.
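To make the quadratic growth concrete, the following minimal sketch counts candidate pairs for a given number of tryptic peptides. The function name `candidate_pair_count` and the choice to count unordered pairs (including a peptide linked to another copy of itself) are illustrative assumptions, not part of the project's software.

```python
def candidate_pair_count(n_peptides: int, ordered: bool = False) -> int:
    """Count candidate cross-linked peptide pairs among n_peptides peptides.

    ordered=True counts (a, b) and (b, a) separately, giving n^2.
    ordered=False counts unordered pairs, self-pairs included: n(n+1)/2.
    Both grow as the square of n_peptides.
    """
    if n_peptides < 0:
        raise ValueError("n_peptides must be non-negative")
    if ordered:
        return n_peptides * n_peptides
    return n_peptides * (n_peptides + 1) // 2
```

Either way of counting shows why an exhaustive pairwise search quickly becomes impractical without additional constraints.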
This research project has developed unique cross-linker molecules called Protein Interaction Reporters (PIRs) that contain bonds that can be cleaved with high specificity in the mass spectrometer. This allows detection of the cross-linked peptide mass and then, after low-energy collision-induced dissociation (CID), detection of the individual peptide masses. Using this mass information and the amino acid specificity of the cross-linker molecule, peptide identification is possible in many cases without further mass spectrometry. An additional feature of this technique is that the spacer chain of the cross-linker is detected in the low-energy CID spectrum. This provides a mass “signature” in the CID spectrum indicating that cross-linked peptide data are present. This presentation will outline the required data analysis steps and an identification strategy.
High-performance mass spectrometry will be used to enable high-throughput identification of cross-linked proteins. The cross-linked proteins will be extracted from the organism, purified, digested, and prepared for analysis by mass spectrometry. Fourier transform ion cyclotron resonance (FTICR) mass spectrometry will be used because of its high mass measurement accuracy: FTICR allows peptide masses to be measured to within 1 to 5 ppm. This mass measurement accuracy, coupled with the peptide constraints imposed by the PIR approach, will enable peptide and protein identification based on mass measurement accuracy alone.
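As a reminder of what a ppm figure means in practice, here is a small sketch of the standard parts-per-million error calculation. The helper names `ppm_error` and `within_ppm` are hypothetical and chosen only for illustration.

```python
def ppm_error(measured: float, theoretical: float) -> float:
    """Signed mass error in parts per million relative to the theoretical mass."""
    return (measured - theoretical) / theoretical * 1e6


def within_ppm(measured: float, theoretical: float, tol_ppm: float) -> bool:
    """True if the measured mass lies within +/- tol_ppm of the theoretical mass."""
    return abs(ppm_error(measured, theoretical)) <= tol_ppm
```

For a peptide of mass 1000 Da, an error of 5 ppm corresponds to about 0.005 Da, which gives a sense of how narrow the matching window is.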
The analysis of cross-linked peptides is based on a mass spectrometry experiment that first captures a precursor MS scan containing potential cross-linked peptides. The next spectrum is a low-energy CID spectrum of the same ions detected in the precursor spectrum. This second spectrum contains the cross-linked peptides with the cross-linker fragmented; the fragmentation occurs without affecting the peptide backbone. The pair of spectra (precursor and low-energy CID) allows identification of the peptide masses that were cross-linked. The low-energy CID spectrum will also contain a “signature”, the mass of the cross-linker's core or spacer, indicating that a cross-linked peptide pair was present in the precursor spectrum. This “signature” flags spectra that can potentially identify a cross-linked peptide pair. The two spectra can then be analyzed to identify the two peptide masses: the mass of the cross-linker core plus the masses of the two cross-linked peptides must equal the mass of an ion detected in the precursor spectrum. This analysis step will ensure that peptide identification is attempted only for masses resulting from cross-linked peptides.
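The mass-balance check described above can be sketched as follows. This is only an illustration under stated assumptions: all inputs are taken to be neutral, deconvoluted masses; the released peptide masses are used exactly as observed in the CID spectrum; and the function names, the `core_mass` parameter, and the default tolerance are hypothetical rather than taken from the project's software.

```python
from itertools import combinations_with_replacement


def has_signature(cid_masses, core_mass, tol_ppm):
    """True if the cross-linker core ("signature") mass appears in the CID spectrum."""
    return any(abs(m - core_mass) / core_mass * 1e6 <= tol_ppm for m in cid_masses)


def find_crosslinked_pairs(precursor_masses, cid_masses, core_mass, tol_ppm=5.0):
    """Return (precursor, peptide_a, peptide_b) triples satisfying
    peptide_a + peptide_b + core_mass == precursor within tol_ppm.

    Spectra lacking the signature mass are skipped entirely.
    """
    if not has_signature(cid_masses, core_mass, tol_ppm):
        return []
    # Exclude the signature peak itself from the candidate peptide masses.
    candidates = sorted(
        m for m in cid_masses
        if abs(m - core_mass) / core_mass * 1e6 > tol_ppm
    )
    hits = []
    # "With replacement" also covers two peptides of identical mass,
    # which would appear as a single peak in the CID spectrum.
    for a, b in combinations_with_replacement(candidates, 2):
        total = a + b + core_mass
        for p in precursor_masses:
            if abs(total - p) / p * 1e6 <= tol_ppm:
                hits.append((p, a, b))
    return hits
```

With made-up values, a CID spectrum containing masses 500.0 (taken as the core), 900.3, 1200.6, and 1500.8 and a precursor at 3201.4 yields the single pair (1200.6, 1500.8), since 1200.6 + 1500.8 + 500.0 = 3201.4.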
The next step in the process is the identification of these peptides from their accurate masses. This identification is assisted by the peptide constraints imposed by the cross-linker molecule. This presentation will show the mass measurement accuracy needed for identification with and without the peptide constraints this methodology provides. The analysis is performed using the genomic data from Shewanella oneidensis. These data are used to enumerate all possible cross-linkable peptides, and the masses of these peptide sequences are then calculated. The uniqueness of the resulting masses is evaluated at various levels of mass measurement accuracy to determine the feasibility of identifying the peptides using mass accuracy alone.
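One simple way to quantify mass uniqueness at a given accuracy is sketched below. It assumes the candidate peptide masses have already been computed from the predicted proteome; the function name `fraction_unique` and the neighbor-based definition of "unique" are illustrative assumptions, not the authors' stated algorithm.

```python
def fraction_unique(masses, tol_ppm):
    """Fraction of masses with no other mass within tol_ppm of them.

    A mass counts as unique if both of its nearest neighbors in the
    sorted list lie farther away than its own ppm window.
    """
    ms = sorted(masses)
    n = len(ms)
    if n == 0:
        return 0.0
    unique = 0
    for i, m in enumerate(ms):
        window = m * tol_ppm * 1e-6  # absolute width of the ppm window at mass m
        left_clear = i == 0 or (m - ms[i - 1]) > window
        right_clear = i == n - 1 or (ms[i + 1] - m) > window
        if left_clear and right_clear:
            unique += 1
    return unique / n
```

Evaluating this fraction over a range of tolerances, once for all tryptic peptides and once for only those satisfying the cross-linker's constraints, would show how much the constraints relax the accuracy required.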
This presentation uses simulation to demonstrate the analysis algorithms that will be used to identify cross-linked peptides and to illustrate the mass spectrometry performance necessary for high-throughput operation.
This research was supported by the Office of Science (BER), U.S. Department of Energy, Grant No. DE-FG02-04ER63924.
* Presenting author