March 28, 2000

Subject:  Fly Meeting Report
Author:  drosophilosopher

If this is duplicating a prior post, I'm sorry - I came back from the fly meeting in Pittsburgh to discover that I was 877 messages behind, and I certainly didn't read them all before posting this.

I was at the Drosophila Genome Sequence workshop at the Fly meeting, and as promised will report on it. This message will not give you any clues as to Celera's business model, or future stock prices, so if that's what you want, just click "Next."

It was absolutely amazing to go to the session and find on the seats a CD containing 120,000,000 bases of the DNA sequence (the first three rows also got an assembled jigsaw puzzle of the genome overview figure from Science). Not that long ago it was a struggle to sequence a few kilobases. The reports in the New York Times, etc. were pretty accurate as far as they went. What I can add are some comments on the annotation and the analysis tools, and a cute anecdote.

Craig Venter (Celera) spoke briefly, then Mark Adams (C) who was in charge of the sequencing side, then Eugene Myers (C) who was in charge of the assembly, then Gerry Rubin (Berkeley) who was in charge of the publicly funded portion of the sequencing, then Suzanna Lewis and Susan Celniker from Berkeley who were in charge of the mapping and annotation, and the development of the Java-based analysis tools.

Gene Myers talked about the assembly, and I went away pretty convinced that they can handle the repeats in the human genome. Quite a computing tour de force. He also said that with their current software tools they could do it over again in one month, rather than the nine months it took the first time. Wow. After that someone asked how much they would charge to sequence another Drosophila species, if it would only take a month. Myers said that he wasn't in the business end so he couldn't answer that question. Michael Ashburner then hollered "Craig is sitting right there, have him answer the question." Venter said "Less." Ashburner then said "Come on Craig, name a price," and Venter said "Talk to me when human and mouse are done."

The annotation is spectacular, and is a real tribute to the combined efforts. Rubin was the only one who really had a lot to lose by Celera jumping in to the genome project, and according to several reports, he was instrumental in setting aside his ego and keeping all the other egos from disrupting the collaboration. A real model for Francis Collins to follow, but not very likely. Last fall they had an "annotation jamboree" at Celera which has been described in some of the news reports. By all accounts, Celera learned a lot about how to annotate a genome, and what the biologists want in a sequence database. It's not done yet, but what is there is pretty amazing. You can scroll graphically along the chromosomes watching genes pass by, color coded to indicate the evidence for the existence of the gene (BLAST hit, cDNA sequence, computer prediction), and then you can click on a gene and get information about it including links to the GenBank entries, exon/intron junctions, homologies to other genes, etc. etc. It's amazing. Eventually you will be able to do a BLAST search and then end up in the right spot on the map. It's way cooler than what GenBank has.

A lot of this is thanks to the Berkeley effort, but Celera certainly learned a lot from the collaboration. The Berkeley group will be continuing on with finishing up the small gaps, and doing more annotation, and doing cDNA sequencing. The computer algorithms for finding genes are not as good yet as they need to be. The accuracy is 1 error in 10,000, which could be improved.
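
To put that error rate in perspective, here is a rough back-of-the-envelope sketch. It assumes the "1 in 10,000" figure is a per-base sequence error rate and that it applies uniformly across the 120,000,000 bases on the CD; both are my assumptions, not something stated at the workshop, and the function name is just for illustration.

```python
# Rough estimate only: assumes a uniform per-base error rate across
# the whole released sequence (an assumption, not a workshop figure).
SEQUENCED_BASES = 120_000_000  # bases on the CD, as reported above
BASES_PER_ERROR = 10_000       # "1 error in 10,000"


def expected_errors(bases: int, bases_per_error: int) -> float:
    """Expected number of erroneous bases under a uniform error rate."""
    return bases / bases_per_error


if __name__ == "__main__":
    print(expected_errors(SEQUENCED_BASES, BASES_PER_ERROR))
```

Under those assumptions that works out to something on the order of 12,000 wrong bases scattered through the genome, which is why there is still room for improvement.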

It is a really incredible research tool. It used to be that a big slow bottleneck in research was cloning and sequencing your gene, and that step is now done for you. Instead of fishing out a gene by bench work, you do it by computer, and then move on to asking interesting biological questions using the sequence. It changes the field in a very dramatic and powerful way.

In the last year Celera did an animal with a genome 5% the size of the human genome, and managed to scale up and speed up enough that they could now do it almost 10 times faster. What will the next year bring?

