Tuesday, December 30, 2008

Goals of the original Human Genome Project (HGP)

  • identify all the approximately 20,000-25,000 genes in human DNA,
  • determine the sequences of the 3 billion chemical base pairs that make up human DNA,
  • store this information in databases,
  • improve tools for data analysis,
  • transfer related technologies to the private sector, and
  • address the ethical, legal, and social issues (ELSI) that may arise from the project.

The goals of the original HGP were not only to determine all 3 billion base pairs in the human genome with a minimal error rate, but also to identify all the genes in this vast amount of data. This part of the project is still ongoing although a preliminary count indicates about 30,000 genes in the human genome, which is far fewer than predicted by most scientists.Another goal of the HGP was to develop faster, more efficient methods for DNA sequencing and sequence analysis and the transfer of these technologies to industry.The sequence of the human DNA is stored in databases available to anyone on the Internet. The U.S. National Center for Biotechnology Information (and sister organizations in Europe and Japan) house the gene sequence in a database known as Genbank, along with sequences of known and hypothetical genes and proteins. Other organizations such as the University of California, Santa Cruz, and ENSEMBL present additional data and annotation and powerful tools for visualizing and searching it. Computer programs have been developed to analyze the data, because the data themselves are difficult to interpret without them.The process of identifying the boundaries between genes and other features in raw DNA sequence is called genome annotation and is the domain of bioinformatics. While expert biologists make the best annotators, their work proceeds slowly, and computer programs are increasingly used to meet the high-throughput demands of genome sequencing projects. The best current technologies for annotation make use of statistical models that take advantage of parallels between DNA sequences and human language, using concepts from computer science such as formal grammars.Another, often overlooked, goal of the HGP is the study of its ethical, legal, and social implications. It is important to research these issues and find the most appropriate solutions before they become large dilemmas whose effect will manifest in the form of major political concerns.All humans have unique gene sequences, therefore the data published by the HGP does not represent the exact sequence of each and every individual's genome. It is the combined genome of a small number of anonymous donors. The HGP genome is a scaffold for future work in identifying differences among individuals. Most of the current effort in identifying differences among individuals involves single nucleotide polymorphisms and the HapMap.


How it was accomplished

The publicly funded groups NIH, the Sanger Institute in Great Britain, and numerous groups from around the world broke the genome into larger pieces; approximately 150,000 base pairs in length. These pieces are called "bacterial artificial chromosomes", or BACs, because they can be inserted into bacteria where they are copied by the bacterial replication machinery. Each of these pieces was then sequenced separately as a small "shotgun" project and then assembled. The larger, 150,000 base pair chunks were then stitched together to create chromosomes. This is known as the "hierarchical shotgun" approach, because the genome is first broken into relatively large chunks, which are then mapped to chromosomes before being selected for sequencing. The whole-genome shotgun (WGS) method is faster and cheaper, and by 2003 - thanks to the availability of clever assembly algorithms - it had become the standard approach to sequencing most mammalian genomes.


Whose genome was sequenced?

In the international public-sector Human Genome Project (HGP), researchers collected blood (female) or sperm (male) samples from a large number of donors. Only a few of many collected samples were processed as DNA resources. Thus the donor identities were protected so neither donors nor scientists could know whose DNA was sequenced. DNA clones from many different libraries were used in the overall project, with most of those libraries being created by Dr. Pieter J. de Jong. It has been informally reported, and is well known in the genomics community, that much of the DNA for the public HGP came from a single anonymous male donor from the state of New York.Technically, it is much easier to prepare DNA cleanly from sperm than from other cell types because of the much higher ratio of DNA to protein in sperm and the much smaller volume in which purifications can be done. Using sperm does provide all chromosomes for study, including equal numbers of sperm with the X (female) or Y (male) sex chromosomes. HGP scientists also used white cells from the blood of female donors so as to include female-originated samples. One minor technical issue is that sperm samples contain only half as much DNA from the X and Y chromosomes as from the other 22 chromosomes (the autosomes); this happens because each sperm cell contains only one X or one Y chromosome, but not both. Thus in 100 sperm cells, on average there will be 50 X and 50 Y chromosomes, as compared to 100 copies of each of the other chromosomes.Although the main sequencing phase of the HGP has been completed, studies of DNA variation continue in the International HapMap Project, whose goal is to identify patterns of SNP groups (called haplotypes, or “haps”).

The DNA samples for the HapMap came from a total of 270 individuals: Yoruba people in Ibadan, Nigeria; Japanese in Tokyo; Han Chinese in Beijing; and the French Centre d’Etude du Polymorphisme Humain (CEPH) resource, which consisted of residents of the United States having ancestry from Western and Northern Europe.In the Celera Genomics private-sector project, DNAs from five different individuals were used for sequencing. The lead scientist of Celera Genomics at that time, Craig Venter, later acknowledged (in a public letter to the journal Science) that his DNA was one of those in the pool.

No comments: