Shortly after their press conferences, the two groups that had been striving for several years to map the human genome published their findings:
- the International Human Genome Sequencing Consortium (IHGSC) in the 15 February 2001 issue of Nature;
- Celera Genomics, a company in Rockville, Maryland, in the 16 February 2001 issue of Science.
These achievements were monumental, but before we examine them, let us be clear as to what they were not.
- Neither group had determined the complete sequence of the human genome.
Each of our chromosomes is a single molecule of DNA. Some day the sequence of base pairs in each will be known from one end to the other. But in 2001, thousands of gaps remained to be filled.
What they had done was present a series of draft sequences that represented about 90% (probably the most interesting 90%) of the genome.
- Even taken together, the results did not provide an accurate count of the number of protein-encoding genes in our genome (in contrast to such genomes as those of
One reason: the large number and large size of the introns that split these genes make it difficult to recognize the open reading frames (ORFs) that encode proteins.
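To make the difficulty concrete, here is a minimal sketch of a naive ORF scan. It is an illustration only, not the method either group used: the function name `find_orfs` and the toy exon and intron sequences are invented for this example, and real gene finders are far more elaborate. The scan recovers the full reading frame from the spliced sequence, but in the "genomic" version the intron introduces a premature stop codon and the second exon is missed.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(dna, min_codons=1):
    """Scan the three forward reading frames for ATG...stop stretches.

    Returns (start, end, frame) tuples; end is exclusive and includes
    the stop codon. min_codons counts the codons before the stop.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs

# Hypothetical toy sequences, made up for this illustration.
exon1 = "ATGGCTGCT"            # ATG GCT GCT
exon2 = "GGTGCTGCATAA"         # GGT GCT GCA TAA
intron = "GTAAGTTAGCTTTCAG"    # begins with GT, ends with AG

spliced = exon1 + exon2            # what the mature mRNA encodes
genomic = exon1 + intron + exon2   # what the chromosome contains

print(find_orfs(spliced))  # [(0, 21, 0)]  the whole coding sequence
print(find_orfs(genomic))  # [(0, 18, 0)]  cut short by a stop inside the intron
```

In a real human gene the introns are usually much longer than the exons, so the true coding sequence is scattered among many short, unremarkable stretches rather than standing out as one long ORF.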
The two groups came up with slightly different estimates of the number of protein-encoding genes, but both in the range of 30 to 38 thousand:
- only two to three times larger than the genomes of
- and representing only 1–2% of the total DNA in the cell;
- and a third of the 100,000 genes that many had predicted would be found.
- (By the end of 2004, the number had been reduced to some 20,000–25,000.)
Are we only twice as complex as the tiny roundworm and fruit fly?
Probably not, although we share many homologous genes (called "orthologs") with both these animals.
But,
Although there are some giants such as
- dystrophin with its 79 exons spread over 2.4 million base pairs of DNA;
- titin whose exons (Celera identified 234; IHGSC only 178) encode a single polypeptide of ~27,000 amino acids;
the average human gene contains 4 exons totaling 1,350 base pairs and thus encodes an average protein of 450 amino acids.
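The arithmetic behind that last figure follows from the triplet genetic code: three base pairs of coding sequence specify one amino acid. A minimal sketch (the function names are just for illustration, and the calculation ignores the stop codon and any untranslated sequence):

```python
BASES_PER_CODON = 3  # the genetic code is read in triplets

def amino_acids_encoded(coding_bp):
    """Amino acids specified by a given length of coding sequence."""
    return coding_bp // BASES_PER_CODON

def coding_bp_required(amino_acids):
    """Base pairs of coding sequence needed for a polypeptide."""
    return amino_acids * BASES_PER_CODON

# Average human gene, using the figure given in the text.
print(amino_acids_encoded(1350))    # 450

# Titin, using the approximate figure given in the text.
print(coding_bp_required(27_000))   # 81000
```

By the same rule, the roughly 27,000 amino acids of titin need about 81,000 base pairs of exon sequence, some 60 times the average.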
The density of genes on the different chromosomes varies from
- 23 genes per million base pairs on chromosome 19 (for a total of 1,400 genes) to
- only 5 genes per million base pairs on chromosome 13.
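These figures can be cross-checked with a little arithmetic. The sketch below uses only the numbers quoted above; the function name is invented for the example, and the result is the chromosome length implied by those numbers, not a measured value.

```python
def implied_length_mb(total_genes, genes_per_mb):
    """Length (in millions of base pairs) implied by a gene count and density."""
    return total_genes / genes_per_mb

# Chromosome 19: 1,400 genes at 23 genes per million base pairs.
chr19_mb = implied_length_mb(1400, 23)
print(round(chr19_mb, 1))       # 60.9

# How much more gene-dense is chromosome 19 than chromosome 13?
density_ratio = 23 / 5
print(round(density_ratio, 1))  # 4.6
```

So the figures imply about 61 million base pairs for chromosome 19, with genes packed between four and five times more densely than on chromosome 13.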
Humans, and presumably most vertebrates, have genes not found in invertebrate animals like Drosophila and C. elegans.
These include genes encoding
- antibodies and T cell receptors for antigen (TCRs)
- the transplantation antigens of the major histocompatibility complex (HLA, the MHC of humans)
- cell-signaling molecules including the many types of cytokines
- the molecules that participate in blood clotting
- mediators of apoptosis. Although these proteins occur in Drosophila and C. elegans, we have a much richer assortment of them.
Both groups added to the list of human genes that have arisen by repeated duplication (e.g., by unequal crossing over) from a single precursor gene; for examples,
Both groups verified the presence of large amounts of repetitive DNA. In fact, this DNA, with similar sequences occurring over and over, is one of the main obstacles to assembling the DNA sequences in proper order.
All told, repetitive DNA probably accounts for over 50% of our total genome.
- Fill in the gaps.
This will make the human genome truly complete. On October 21, 2004, the IHGSC announced that they had pretty much (99%) completed the job with only 341 gaps remaining. However, they still could not determine the exact number of our genes (probably 20,000–25,000 of them).
- Determine the human proteome; that is, the total complement of proteins we synthesize.
- Analyze how clusters of genes are coordinately expressed
- in various types of cells
- at different times in the life of a cell.
Such analysis will benefit greatly from the availability of gene chip technology and will also help us to understand how such a modest increase in gene number from Drosophila to humans could produce such a different outcome!
- Determine the genomes of other vertebrates, e.g., the mouse and the chimpanzee.
This will not only help us recognize more human genes but will give us insight into what makes us unique.
Already we know that large sections of our genome have closely related homologs in the mouse.
Examples:
- The collection of genes — and even their order — on human chromosome 17 matches closely those of mouse chromosome 11. The same is true of human chromosome 20 and mouse chromosome 2.
- Humans and mice (also rats) share several hundred absolutely identical stretches of DNA extending for 200–800 base pairs.
- Some are present in the exons of genes, especially genes involved in RNA processing.
- Some are found in or near the introns of genes, especially genes encoding proteins involved in DNA transcription.
- Some are found between genes — especially those, like Pax-6, essential to embryonic development — and may serve as enhancers.
To have avoided any mutations for 60 million years since humans and rodents went their separate evolutionary ways suggests that these regions perform functions absolutely essential to mammalian life.
Some External Links
- The journal Nature has made the contents of its genome issue available online, even to nonsubscribers.
- How to Sequence a Genome: illustrated descriptions of sequencing strategies. (Requires Flash)
9 November 2004