Genomes to Life Contractor-Grantee Workshop III
February 6-9, 2005, Washington, D.C.
Genomics:GTL Program Projects
Sandia National Laboratories
19
Mapping of Biological Pathways and Networks across Microbial Genomes
F. Mao, V. Olman, Z. Su, P. Dam, and Ying Xu* (xyn@bmb.uga.edu)
University of Georgia, Athens, GA and Oak Ridge National Laboratory, Oak Ridge, TN
Homology exists beyond the level of individual genes; it can also exist at the level of biological pathways and networks. A number of databases collect experimentally validated and reliably predicted pathways and networks, providing a rich source of information for genome annotation and systems-level biological studies. The key to using this information effectively is accurate identification of orthologous genes. However, existing methods for mapping these known pathways and networks across genomes have serious limitations that greatly reduce the utility of this information. Virtually all existing mapping methods rely on sequence similarity alone, using tools such as reciprocal BLAST search or COG mapping. The fundamental problem with such methods is that sequence similarity by itself does not contain all the information needed to identify true orthologous genes.
We have recently developed a computational method and software package, P-MAP, for mapping a known pathway or network from one microbial organism to another by combining homology information with genomic structure information. The basic idea is that in microbes, the genes working in the same pathway can generally be decomposed into a few operons or, in the case of complex pathways and networks, regulons. Such information has not previously been used effectively in pathway mapping. To map known pathways, we first predict all the operons in a genome using our operon prediction program. The predictions are then validated against microarray data, mainly by checking that genes predicted to be in the same operon or in adjacent operons show consistent expression patterns. Our evaluation indicates a prediction accuracy close to 90%. With this information, we then search for a mapping of the genes in a pathway template to the target genome that simultaneously gives relatively high sequence similarity between the predicted orthologous gene pairs and groups all the mapped genes into a number of operons, preferably operons that are co-regulated according to predicted cis-regulatory elements and available microarray data. We have formulated the mapping problem as a linear integer programming (LIP) problem and solve it using an LIP solver called COIN.
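The trade-off between sequence similarity and operon grouping can be illustrated with a small sketch. This is not the P-MAP formulation, which the abstract does not spell out. It assumes a simplified objective (the sum of similarity scores minus a fixed penalty for every distinct operon used) and replaces the LIP solver with exhaustive search, which is only feasible for toy inputs. The function name `map_pathway`, the gene and operon labels, the scores, and the `operon_penalty` parameter are all hypothetical.

```python
from itertools import product


def map_pathway(similarity, operon_of, operon_penalty):
    """Pick one target gene per template gene by exhaustive search.

    similarity: {template_gene: {candidate_target_gene: score}}
    operon_of:  {target_gene: predicted operon id}
    Assumed objective (illustrative only): total similarity minus
    operon_penalty for each distinct operon the mapped genes fall into.
    Returns (best_score, {template_gene: target_gene}).
    """
    template_genes = sorted(similarity)
    candidates = [sorted(similarity[g]) for g in template_genes]
    best_score, best_mapping = float("-inf"), None
    for choice in product(*candidates):
        # A target gene may be the ortholog of at most one template gene.
        if len(set(choice)) < len(choice):
            continue
        total_similarity = sum(
            similarity[g][t] for g, t in zip(template_genes, choice)
        )
        operons_used = {operon_of[t] for t in choice}
        score = total_similarity - operon_penalty * len(operons_used)
        if score > best_score:
            best_score = score
            best_mapping = dict(zip(template_genes, choice))
    return best_score, best_mapping


# Hypothetical toy data: three template genes A, B, C and their candidate
# homologs in the target genome, with made-up integer similarity scores.
similarity = {
    "A": {"t1": 90, "t5": 95},
    "B": {"t2": 80},
    "C": {"t3": 70, "t6": 75},
}
# Hypothetical operon predictions: t1, t2, t3 share one operon.
operon_of = {"t1": "op1", "t2": "op1", "t3": "op1", "t5": "op2", "t6": "op3"}

# Similarity alone picks the top-scoring hit for each gene.
print(map_pathway(similarity, operon_of, operon_penalty=0))
# Penalizing scattered operons favors the co-located candidates.
print(map_pathway(similarity, operon_of, operon_penalty=10))
```

With no operon penalty, the search picks the best-scoring hit for each gene (t5, t2, t6), which are scattered across three operons. With the penalty, it prefers the slightly lower-scoring candidates t1, t2, and t3, which all lie in a single predicted operon. A real instance would encode the same choice with binary assignment variables and hand it to an integer programming solver.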
We have applied P-MAP to map known biological pathways in KEGG and MetaCyc to the cyanobacterial genomes and are currently mapping them to the Shewanella oneidensis MR-1 genome. Some of the mapping results can be found at http://csbl.bmb.uga.edu/WH8102.
Acknowledgement: This project is supported by the U.S. Department of Energy’s Genomics:GTL Program under the project “Carbon Sequestration in Synechococcus sp.: From Molecular Machines to Hierarchical Modeling” (http://www.genomes-to-life.org).
* Presenting author