Genomes to Life Contractor-Grantee Workshop III
February 6-9, 2005, Washington, D.C.
Genomics:GTL Program Projects
Sandia National Laboratories
21
Predicting Protein-Protein Interactions Using Signature Products with an Application to β-Strand Ordering
Shawn Martin¹ (smartin@sandia.gov), W. Michael Brown¹, Charlie Strauss², Mark D. Rintoul*¹, and Jean-Loup Faulon³
¹Sandia National Laboratories, Albuquerque, NM; ²Los Alamos National Laboratory, Los Alamos, NM; and ³Sandia National Laboratories, Livermore, CA
As part of the project entitled “Carbon Sequestration in Synechococcus sp.: From Molecular Machines to Hierarchical Modeling,” we have developed a computational method for predicting protein-protein interactions from amino acid sequence and experimental data (Martin, Roe et al. 2004). The method is based on symmetric tensor products of amino acid sequence fragments, which we call signature products. These products occur with different frequencies in interacting and non-interacting protein pairs. We can therefore predict whether a protein pair interacts by comparing the frequencies of the signature products in that pair with the corresponding frequencies in protein pairs known to interact. Computationally, these comparisons are encoded in a Support Vector Machine (SVM) framework (Cristianini and Shawe-Taylor 2000), in which the signature products are implemented using kernel functions. The final result is an automated classification system that can extrapolate from experimental results to complexes and to entire proteomes, based only on experimental data and primary sequence.
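The kernel idea can be illustrated with a minimal sketch. It assumes that each protein is described by counts of overlapping length-3 sequence fragments; the actual signature definition in Martin, Roe et al. (2004) may differ. The function names `fragment_counts`, `dot`, and `pair_kernel` are hypothetical. The pair kernel equals, up to a constant factor, the inner product of the symmetrized tensor products of the two pairs, so it never has to form the tensor product explicitly and does not depend on the order of the proteins within a pair.

```python
from collections import Counter


def fragment_counts(seq, k=3):
    """Count overlapping length-k fragments of an amino acid sequence.

    Assumption: plain k-mers stand in for the signatures used in the abstract.
    """
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def dot(u, v):
    """Inner product of two sparse fragment-count vectors."""
    if len(u) > len(v):
        u, v = v, u  # iterate over the shorter vector
    return sum(count * v[frag] for frag, count in u.items())


def pair_kernel(pair1, pair2, k=3):
    """Symmetric tensor-product kernel between protein pairs (A, B) and (C, D).

    K = <A,C><B,D> + <A,D><B,C>, which is invariant under swapping
    the two proteins within either pair.
    """
    a, b = (fragment_counts(s, k) for s in pair1)
    c, d = (fragment_counts(s, k) for s in pair2)
    return dot(a, c) * dot(b, d) + dot(a, d) * dot(b, c)
```

A kernel of this form could be supplied to any SVM implementation that accepts a precomputed kernel matrix over protein pairs.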
We have expended significant effort in benchmarking our method against competing techniques. In particular, we have compared the signature product method with methods based on products of InterPro signatures (Sprinzak and Margalit 2001; Mulder, Apweiler et al. 2003), concatenation of full-length amino acid sequences (Bock and Gough 2001), and a method combining multiple data sources (Jansen, Yu et al. 2003). In all cases our method performed as well as or better than the competing methods, as measured by 10-fold cross-validation. We have applied our method to the prediction of protein-protein interactions in yeast SH3 domains (Tong, Drees et al. 2002; 80.7% accuracy), the full yeast proteome (69% accuracy), and H. pylori (Rain, Selig et al. 2001; 83.4% accuracy). In addition to these results, which appear in Martin, Roe et al. (2004), we have applied our method using COG networks (Tatusov, Koonin et al. 1997) to Synechocystis sp. (91% accuracy) and Nostoc sp. (69% accuracy).
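For readers unfamiliar with the evaluation protocol, the following is a minimal sketch of k-fold cross-validation accuracy. It is a generic illustration, not the benchmarking code used for the results above; `k_fold_accuracy` and the `train_fn` callback are hypothetical names, and the sketch assumes there are at least k examples so that no fold is empty.

```python
import random


def k_fold_accuracy(examples, labels, train_fn, k=10, seed=0):
    """Mean held-out accuracy over k folds.

    train_fn(train_examples, train_labels) must return a function that
    maps one example to a predicted label (hypothetical interface).
    """
    indices = list(range(len(examples)))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    accuracies = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in indices if i not in held]
        predict = train_fn([examples[i] for i in train],
                           [labels[i] for i in train])
        correct = sum(predict(examples[i]) == labels[i] for i in held_out)
        accuracies.append(correct / len(held_out))
    return sum(accuracies) / k
```

Each example is used for testing exactly once, and the model that scores it is never trained on it.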
Using our signature product approach, we have gone on to develop a method for ordering β-strands, which can in turn be used to improve the results of ab initio protein folding. The first step in our method is to train a signature product model for predicting β-strand interactions. This model was trained by extracting all β-strands from the Protein Data Bank (PDB) (Berman, Westbrook et al. 2000) using the Dictionary of Protein Secondary Structure (DSSP) method (Kabsch and Sander 1983). After removing identical sequences from the ~20,000 structures available in the PDB, we obtained ~90,000 β-strands with more than three residues. Using the signature product method, we trained a β-strand interaction predictor that achieved 77% accuracy on a randomly selected split, with 80% of the β-strands in the training set and the remaining 20% in the test set. The next step in our method, as yet unimplemented, applies to individual proteins. For a given protein, we will apply our model to every possible pair of β-strands within the protein, and then consider every possible ordering of these strands. We will select the ordering that gives the highest average interaction score, as measured by our β-strand interaction predictor. We will validate our results by comparing our predicted ordering with the ordering actually present in the PDB.
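The planned ordering step can be sketched as an exhaustive search. This is an illustration under stated assumptions, not the authors' implementation, which the abstract describes as not yet built. It assumes that the average is taken over strands adjacent in the ordering, that scores are symmetric, and that there are at least two strands. The name `best_strand_ordering` and the `score` matrix are hypothetical.

```python
from itertools import permutations


def best_strand_ordering(n_strands, score):
    """Return (ordering, average score) maximizing the mean adjacent-pair score.

    score[i][j] is a hypothetical symmetric interaction score between
    strands i and j, e.g. the output of a trained pair predictor.
    """
    best_order, best_avg = None, float("-inf")
    for order in permutations(range(n_strands)):
        if order[0] > order[-1]:
            continue  # an ordering and its reverse have the same score
        pairs = list(zip(order, order[1:]))
        avg = sum(score[i][j] for i, j in pairs) / len(pairs)
        if avg > best_avg:
            best_order, best_avg = order, avg
    return best_order, best_avg
```

The search visits n!/2 orderings, so it is practical only for proteins with a small number of strands; a larger strand count would call for a heuristic or dynamic-programming search instead.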
Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the U.S. Department of Energy’s National Nuclear Security Administration under Contract DE-AC04-94AL85000.
* Presenting author