Workflow Process of the Protein Production and Characterization Facility
Note: Numbers and italicized words in parentheses refer to terms used on the charts below.
Inputs (1)
In its DNA sequence, every gene contains information needed by a cell to produce a specific protein. Scientists can use this information to make the same protein in the laboratory. The Protein Production and Characterization Facility will make proteins beginning with one of two inputs: actual pieces of DNA that serve as molecular templates for producing given proteins (Gene Clones), or gene sequence information stored in databases, in effect virtual pieces of DNA (Gene Sequences).
Genomics (1)
With standard techniques, the gene sequence information can be used to construct a gene clone (Gene Synthesis). Cloning is accomplished by inserting the synthesized DNA segment into a cloning vector, usually a specific microbe or bacterial virus designed to over-express the protein of interest (Cloning). The choice of vector will vary, since not all DNA sequences can be cloned in the same vector, nor can all proteins be produced in the same vector. In some cases, specific DNA sequence modifications will be needed before insertion [e.g., to increase the resultant protein’s solubility or to change the way it interacts with other proteins (Modification)].
Cloning and modification can introduce errors into a given DNA sequence. A critical quality-control step, one of several in the protein-production process, is verification that the gene clone’s DNA sequence is correct. This process uses the high-throughput DNA sequencing technology developed as part of the Human Genome Project (DNA Sequencing).
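The sequence-verification step above can be illustrated with a minimal sketch. It assumes the sequenced clone and the intended reference sequence are already aligned, so they can be compared position by position; real pipelines would first align sequencing reads to the reference. The function names find_mismatches and clone_passes_qc are hypothetical and chosen only for illustration.

```python
def find_mismatches(reference: str, observed: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, expected base, observed base) for each difference.

    Assumption: both sequences are already aligned. A length difference is
    reported as a gap character "-" on the shorter sequence.
    """
    ref = reference.upper()
    obs = observed.upper()
    mismatches = []
    for i in range(max(len(ref), len(obs))):
        expected = ref[i] if i < len(ref) else "-"
        found = obs[i] if i < len(obs) else "-"
        if expected != found:
            mismatches.append((i + 1, expected, found))
    return mismatches


def clone_passes_qc(reference: str, observed: str) -> bool:
    """A clone passes this simplified check only if no position differs."""
    return not find_mismatches(reference, observed)
```

For example, find_mismatches("ATGGCC", "ATGACC") reports a single difference at position 4, so that clone would be flagged for re-cloning rather than passed on to protein production.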
Virtually all steps in this process can be automated. A technician can obtain gene sequence information from a database and use genomics software to automatically direct a series of robots to produce a gene clone, verify the sequence, insert the clone into the appropriate vector, and produce DNA samples ready for making proteins. A laboratory can run this process simultaneously on hundreds of different target gene samples.
Protein Production (2–3)
No single method will work equally well for all proteins, so several methods will be needed to produce different proteins from gene clones or gene sequence information. Preproduction screening will optimize production and purification methods for each protein of interest. Various production conditions will be tested using nanoliter volumes of reagents and a “lab on a chip” on which large numbers of synthesis and analysis steps can be carried out in parallel. Robotics and microfluidic processes will be used to test various combinations of cloning vectors, reagents, and reaction conditions. The presence, level, and purity of protein expression will be checked using microchannel separations of reaction products combined with molecular-weight markers and various detection techniques (e.g., mass spectrometry, ultraviolet absorbance, and light scattering). Data will be entered into a computer and analyzed, and the best conditions and methods for large-scale protein production will be identified automatically.
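The preproduction screen described above amounts to testing every combination of a few variables and keeping the best one. The sketch below is one simple way to express that idea, not the facility's actual software. The variables screened (vector, temperature, inducer concentration), the purity threshold, and the measure callable, which stands in for the instrument readout, are all assumptions made for illustration.

```python
from itertools import product


def select_best_condition(vectors, temperatures_c, inducer_mm, measure, min_purity=0.9):
    """Test every combination and return the highest-expressing one that is pure enough.

    measure(vector, temperature_c, inducer_mm) is a hypothetical callable that
    returns (expression_level, purity_fraction) for one tested condition.
    Returns None when no condition meets the purity threshold.
    """
    candidates = []
    for vector, temp, inducer in product(vectors, temperatures_c, inducer_mm):
        expression, purity = measure(vector, temp, inducer)
        if purity >= min_purity:
            candidates.append((expression, vector, temp, inducer))
    if not candidates:
        return None
    expression, vector, temp, inducer = max(candidates, key=lambda c: c[0])
    return {
        "vector": vector,
        "temperature_c": temp,
        "inducer_mm": inducer,
        "expression": expression,
    }
```

Filtering on purity before ranking by expression level reflects the text's point that presence, level, and purity are all checked; a real system might instead combine them into a single weighted score.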
During cellular protein production, vectors carrying the gene clone of interest are inserted into a bacterial host whose cellular machinery is used to produce the specific protein of interest (Cellular Production). The protein is extracted from the host cells and purified. Alternatively, proteins can be produced by mixing a DNA template with a set of purified enzymes and chemicals normally used by the cell for protein production; only the protein of interest is produced without the need for a living cell (Cell-Free Production). Finally, well-established chemical-synthesis methods can be used to make short strings of amino acids that must then be hooked together to make complete proteins (Chemical Synthesis). These methods, especially chemical synthesis, can be used to introduce specific changes in a protein sequence such as modification of protein subunits or incorporation of radioactive isotopes needed in downstream analysis. All proteins will undergo purification using a variety of separation technologies (e.g., liquid chromatography, capillary electrophoresis, or affinity columns). Proteins also will need to be collected and maintained under specific conditions that enable them to fold into their natural, functionally active configurations.
The protein-production process can be automated and run simultaneously on hundreds of samples to generate a vast array of normal or modified proteins ready for characterization.
Characterization (4)
In addition to verifying the sequences of gene clones, we need to characterize the proteins produced (and the processes used to produce them) to ensure their purity and biological behavior (Quality Control and Quality Assurance).
All proteins produced will be run through a battery of tests and screening procedures (Biophysical Characterization) to assess their quality and to provide initial insights into their structures. For each protein, molecular weight, stability, and proper folding must be determined. No single test will be sufficient to characterize every protein adequately and accurately. Instead, a combination of various spectroscopic, separation, and imaging techniques will be used. Some proteins of particular interest to DOE, such as those involved in hydrogen production or cleanup of environmental contaminants, will be characterized further for biological function by assaying for specific enzymatic activity or binding properties.
Automated systems will simultaneously characterize hundreds of proteins for purity and, in some cases, function.
Affinity Reagent Production (5)
A very useful product of this facility will be affinity reagents that can serve as molecular markers needed to “see” the proteins in cells as parts of multiprotein complexes or as they interact with other proteins or molecules in their normal functions. Multiple affinity reagents, produced by a variety of methods, will be needed for each protein, since each reagent will recognize and bind to a particular feature (e.g., a specific physical conformation or shape as well as specific sites responsible for protein function or activity).
Affinity reagents can be produced from “libraries” of potential binders (Libraries). Each contains, for example, millions of different antibody-like molecules. These libraries can be screened rapidly to identify sets of affinity reagents for each protein. Proteins also can be produced or synthesized (see Protein Production above) with molecular markers or tags built into each (Synthesis).
Almost all steps in this process can be automated and run in parallel so millions of potential affinity reagents can be made simultaneously and hundreds of proteins can be screened against these large libraries to identify binding markers.
Computing and Information (6)
Both the production and research components of this facility need robust tools for tracking the many processes and products and associated R&D operations. A laboratory information management system (LIMS) is needed to track every sample and product that goes into or out of the facility and every process carried out as part of the facility (Workflow). LIMS will enable tracking of process efficiencies, product locations, status and availability of all facility research tools, and status of ongoing user projects. LIMS will allow facility managers and researchers to monitor production strategies (Production Strategies) for both proteins and molecular tags, keep track of all data generated by the facility including successes and failures, and use all that information to predict, for example, which specific strategy would be most likely to work for a given protein (Data and Tools). Developing these data-analysis and process-simulation capabilities will increase facility operational efficiency and reduce costs (Simulation and Analysis). Moreover, the publicly available protocols of “lessons learned” will be a valuable resource that speeds progress in laboratories of scientists not physically using this facility.
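The idea of recording successes and failures and then using them to predict a likely strategy can be sketched in a few lines. This is an in-memory illustration only; an actual LIMS would be backed by a database and track far more detail. The class name ProductionLog, the notion of a "protein class", and the strategy labels are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ProductionLog:
    """Counts production attempts and successes per (protein class, strategy)."""

    # Maps (protein_class, strategy) -> [successes, attempts]
    _counts: dict = field(default_factory=dict)

    def record(self, protein_class: str, strategy: str, succeeded: bool) -> None:
        """Log one production attempt, whether it worked or not."""
        entry = self._counts.setdefault((protein_class, strategy), [0, 0])
        entry[1] += 1
        if succeeded:
            entry[0] += 1

    def success_rate(self, protein_class: str, strategy: str):
        """Return the observed success fraction, or None if never attempted."""
        successes, attempts = self._counts.get((protein_class, strategy), (0, 0))
        return successes / attempts if attempts else None

    def suggest_strategy(self, protein_class: str):
        """Return the strategy with the best observed success rate, or None."""
        rates = {
            strategy: successes / attempts
            for (pclass, strategy), (successes, attempts) in self._counts.items()
            if pclass == protein_class and attempts > 0
        }
        return max(rates, key=rates.get) if rates else None
```

Keeping failures alongside successes is what makes the suggestion meaningful: a strategy attempted twice with one failure ranks below one that succeeded both times, which is the kind of prediction the text describes.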
Cryogenic Archives (7)
Samples (DNA, proteins, affinity reagents) used and produced by this facility will be stored for future use, shipped to current users, and received from new users (Shipping, Receiving, Storage). As part of the centralized LIMS, all storage, shipping, and receiving data are key components in operating this high-throughput user facility. Many aspects of sample storage and shipping can be automated.
Technology Research and Development (8)
Item 8 is illustrated on the first chart only. While technologies currently exist to carry out all production and analysis steps described above, additional research and development are needed to make each individual step more efficient, cost-effective, and part of an automated, high-production assembly line (High Production, Automation). Development and use of computational tools for all aspects of facility operations will be extremely important (Computing Tools).
