Genomes to Life Contractor-Grantee Workshop III
February 6-9, 2005, Washington, D.C.
Genomics:GTL Program Projects
Oak Ridge National Laboratory and Pacific Northwest National Laboratory
11
Center for Molecular and Cellular Systems: Statistical Screens for Datasets from High-Throughput Protein Pull-Down Assays
Frank W. Larimer*1 (larimerfw@ornl.gov), Kenneth K. Anderson2, Deanna L. Auberry2, Don S. Daly2, Vladimir Kery2, Denise D. Schmoyer1, Manesh B. Shah1, and Amanda M. White2
1Oak Ridge National Laboratory, Oak Ridge, TN and 2Pacific Northwest National Laboratory, Richland, WA
The large-scale analysis pipeline developed at ORNL and PNNL to identify protein complexes uses a high-throughput affinity-tag “pulldown” isolation step followed by denaturing elution, trypsin digestion, and combined liquid chromatography-tandem mass spectrometry (LC-MS/MS) analysis. Each pulldown experiment identifies potential associations between the target proteins and the tagged protein used in the isolation step. Any protein complex analysis method may miss some protein:protein interactions and identify artifactual associations.
We are developing informatics-based criteria for assigning significance to protein identifications and associations. Data from blanks and quality assurance standards are used to identify analysis problems such as carry-over between samples. Wild-type controls and replicate pulldowns are used to estimate repeatability. Proteins that appear with statistically significant frequency across a large number of experiments are used to establish background profiles. As our mass spectrometry dataset of pulldown experiments grows, it will facilitate validation of this approach. With an understanding of the experimental noise, the significance of specific pulldown results can be estimated quantitatively.
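As an illustration of how a background profile might be derived from identification frequencies, the following minimal sketch flags proteins that occur in more pulldowns than an assumed nonspecific base rate would predict. It is not the screen used in this project: the binomial model, the `base_rate` and `alpha` values, and the function and protein names are all assumptions made for the example.

```python
from collections import Counter
from math import comb


def binomial_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def background_proteins(pulldowns, base_rate=0.05, alpha=0.01):
    """Flag proteins seen in unexpectedly many pulldowns.

    pulldowns: dict mapping an experiment label to the set of proteins
               identified in that experiment (hypothetical input layout).
    base_rate: assumed chance that a protein appears in any one pulldown
               by nonspecific binding (an illustrative value, not measured).
    alpha:     significance cutoff for calling a protein "background".
    """
    n = len(pulldowns)
    # Count each protein at most once per experiment.
    counts = Counter(p for proteins in pulldowns.values() for p in set(proteins))
    flagged = {}
    for protein, k in counts.items():
        p_value = binomial_tail(k, n, base_rate)
        if p_value < alpha:
            flagged[protein] = p_value
    return flagged


# Toy data: ten pulldowns, each with one unique prey; a hypothetical
# "sticky" protein is also identified in eight of the ten.
pulldowns = {f"bait{i}": {f"prey{i}"} for i in range(10)}
for i in range(8):
    pulldowns[f"bait{i}"].add("sticky")

background = background_proteins(pulldowns)
```

In this toy dataset only the protein present in eight of ten pulldowns is flagged; proteins seen once are not. In practice the base rate would itself have to be estimated from blanks and wild-type controls rather than fixed in advance.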
We are also evaluating published statistical frameworks for interpreting protein association data. In tandem, we have developed a statistical screen for high-throughput pulldown experiments to reduce the labeling of spurious associations and strengthen the identification of true associations. Initial results, though promising, emphasize the difficulties in developing a valid estimator of the probability of association between two proteins.
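To make the idea of scoring an association concrete, the sketch below applies a generic hypergeometric enrichment test: given how often a prey protein is identified across all pulldowns, how surprising is its count among the pulldowns of one tagged protein? This is a textbook test chosen for illustration, not the screen described above, and the function name and all counts are hypothetical.

```python
from math import comb


def hypergeom_tail(k, N, K, n):
    """Return P(X >= k) under a hypergeometric null.

    N: total number of pulldown experiments
    K: number of experiments in which the prey was identified
    n: number of experiments using the tagged protein of interest
    k: number of those n experiments in which the prey was identified
    """
    upper = min(K, n)
    total = comb(N, n)
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible terms drop out of the sum.
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


# A prey seen in only 5 of 100 pulldowns, 3 of which are among the
# 4 replicates of one tagged protein: unlikely under the null.
p_specific = hypergeom_tail(3, N=100, K=5, n=4)

# A prey seen in 80 of 100 pulldowns and in 3 of the same 4 replicates:
# consistent with background.
p_background = hypergeom_tail(3, N=100, K=80, n=4)
```

A small tail probability indicates only that co-occurrence is unlikely under random sampling; it is not a probability of association. It also ignores carry-over, protein abundance, and dependence between experiments, which is part of why a valid estimator is difficult to construct.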
* Presenting author