November 14, 2000

Subject:  Proteomics: PEB and Celera
Author:  tomheadrick

I've been looking into the proteomics research field with an aim to better understand some of the techniques being developed for protein identification and how informatics will play a role in the future.

Some things I've learned about the instruments that PEB has on the market, and what they have in beta test, were really neat. PEB currently markets an analyzer called the Voyager, which uses mass-spec techniques. The protein samples are acquired by 2D gel separation, and there are a few different ways of using 2D gels to produce a sample that can be extracted and prepared on a mass-spec slide for identification. Automation of 2D sample prep is improving today thanks to techniques developed and marketed by companies like Genomic Solutions and Amersham Pharma. In the past, though, sample prep techniques varied enough that it was difficult to search databases for proteins a researcher wanted results or information on. Different labs used different color (staining) techniques, let the sample prep sit for different lengths of time before analysis, and differed in other things that can affect the outcome for a protein sample. The companies mentioned above build standards for sample prep into their automation procedures to eliminate these differences. As more researchers use the same prep techniques, these post-identification "problems" should start to go away. These automated prep stations also greatly increase the number of samples a researcher can run, from roughly 10 to 100 in a day. This is good. PEB's Voyager is used in conjunction with the standardized 2D gel samples or 1D strips. Mass-spec instruments from other vendors, similar to PEB's, are used as well.

PEB recently developed the MALDI-TOF/TOF mass-spec instrument and announced that Oxford Glyco will be an early-access user of it in 2001. This instrument still requires a sample prepared by 2D gel, 1D gel, or HPLC. HPLC prep times are about 4 hours, and about 1000 samples can be loaded on a slide at one time.

Today I talked to a woman at PEB about the development of a mass-spec machine that requires none of 2D, 1D, or HPLC, and she said they do have such an instrument in beta test. It is the ICAT (Isotope-Coded Affinity Tag, developed by Ruedi Aebersold) double quadrupole mass-spec machine, and it is in beta at Celera. The machine automatically fragments peptides and identifies the proteins from which they were derived by comparison with protein sequence databases. I could give you an extremely detailed textbook walk-through, but it would bore and confuse the snot out of a lay person. The obvious caveat: if the protein sequence is not in the database, you can't make an identification. So 2D gel is not going to go away overnight. It will more than likely phase out (over a few years, I guess) as more and more protein sequence data becomes available. That is the PEB end of this proteomics approach.
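For anyone who wants to see the database caveat in action, here is a rough sketch of the general idea of matching measured peptide masses against a sequence database. It is not PEB's or Celera's software, and it is much simpler than what the ICAT machine does (it matches whole-peptide masses only and ignores fragment spectra and the isotope tags). The protein names, the sequences and the 0.01 Da tolerance are all made up for illustration; the residue masses are the standard monoisotopic values.

```python
# Simplified peptide-mass matching. Protein names and sequences are invented.

# Monoisotopic residue masses (daltons) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free ends of the chain


def tryptic_peptides(sequence):
    """Cut after every K or R, unless the next residue is P (trypsin rule)."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        next_aa = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def peptide_mass(peptide):
    """Theoretical monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def identify(observed_masses, database, tolerance=0.01):
    """Return {protein name: number of observed masses it explains}."""
    hits = {}
    for name, sequence in database.items():
        theoretical = [peptide_mass(p) for p in tryptic_peptides(sequence)]
        matched = sum(
            1 for m in observed_masses
            if any(abs(m - t) <= tolerance for t in theoretical)
        )
        if matched:
            hits[name] = matched
    return hits


# Hypothetical two-entry "database" standing in for a real sequence database.
TOY_DATABASE = {
    "toy_protein_A": "MKTAYIAKQR",
    "toy_protein_B": "GSHMLEDPVR",
}

if __name__ == "__main__":
    observed = [peptide_mass("TAYIAK"), peptide_mass("QR")]
    print(identify(observed, TOY_DATABASE))   # peptides from toy_protein_A
    print(identify([9999.0], TOY_DATABASE))   # a mass no entry explains
```

The second call comes back empty, which is the caveat in miniature: a peptide from a protein that is not in the database cannot be assigned to anything.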

Informatics solutions for this approach:

Protein informatics requires sequence information, functional information, structural information and interaction information, as well as the effects of time and environment. High-throughput screening at all of these levels will require laboratory information management systems (LIMS) to track samples and the results of the analyses. Software tools for this type of information management will be key. Simple organisms have about 6000 proteins, while more complex organisms have 200 different tissue types with 10,000 proteins each, so this will be a massive, multi-year task.
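To put rough numbers on "massive", here is a back-of-the-envelope calculation using the figures in the paragraph above. The throughput number borrows the roughly 100 samples a day quoted earlier for an automated prep station, and it assumes one sample equals one protein measurement on a single station, which is my simplification and not anything PEB or Celera has said.

```python
# Back-of-the-envelope scale estimate from the figures quoted in the post.
SIMPLE_ORGANISM_PROTEINS = 6_000
TISSUE_TYPES = 200
PROTEINS_PER_TISSUE = 10_000
SAMPLES_PER_DAY = 100  # assumption: one station, one protein per sample

# Protein-by-tissue measurements for a complex organism.
complex_total = TISSUE_TYPES * PROTEINS_PER_TISSUE

# How much bigger that is than the simple-organism case.
scale_factor = complex_total / SIMPLE_ORGANISM_PROTEINS

# Time for a single station to work through all of them.
days_at_one_station = complex_total / SAMPLES_PER_DAY
years_at_one_station = days_at_one_station / 365

print(f"{complex_total:,} measurements, about {scale_factor:.0f}x a simple organism")
print(f"{days_at_one_station:,.0f} station-days, about {years_at_one_station:.0f} station-years")
```

Under those assumptions that is two million measurements, which is why sample tracking and many instruments running in parallel matter so much.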

The most comprehensive proteomics database is SWISS-PROT. It is curated and annotated. In addition to protein sequences, it has annotations covering expression, function, domain structure, variants, protein processing and post-translational modifications. It has only 84,000 entries, small compared to GenBank, but SWISS-PROT is of very high quality and is revered as the gold standard of protein databases. The biggest challenge SWISS-PROT faces is the speed of genome sequencing and scientific publication, which makes it impossible to keep up comprehensively. A company called Proteome Inc has taken on a couple of projects, or rather organisms, specifically yeast and C. elegans. It has begun to develop additional information like codon bias, chromosome position and gene structure, and it provides this more focused information for these organisms via HTML links through SWISS-PROT. Proteome has similar databases for C. albicans and P. carinii.

Last spring, when Celera announced the ICAT machine and the proteomics factory in the same press release, they also said they were partnering with Denis Hochstrasser and SWISS-PROT. Last month Celera released news of one-year exclusivity on certain Proteome Inc. data and an ongoing arrangement on other data. In my view these are moves by Celera to combine "gold-standard" protein sequence databases with what they have been sequencing for some time now, in an effort to create a seamless central repository of sequence data. The ICAT mass-spec instrument's data interface will be wired to the internet, and I suspect a default setting for searching for your related protein information and data management (LIMS; see above) will be set to [...]. The more mass-spec instruments PEB gets into the hands of researchers, both public and private, the more it could drive users to Celera.

As I noted earlier, the SWISS-PROT database has only 84,000 entries. Clearly, everybody in research has a very long way to go. Celera's stock is not going to blast to the sun tomorrow or next week because of this. The purpose of this post was to attempt to connect some dots. I'm not a genetics researcher or bioinformaticist. Heck, I don't even work in science.


