Contents
· How it was accomplished?
2. Public versus private approaches
4. Potential benefits from HGP
Introduction
What is the Human Genome Project?
Begun formally in October 1990, the U.S. Human Genome Project was a 13-year effort coordinated by the U.S. Department of Energy and the National Institutes of Health. The project originally was planned to last 15 years, but rapid technological advances accelerated the completion date to 2003.
The Human Genome Project (HGP) is an international scientific research project whose primary goals are to determine the sequence of the chemical base pairs that make up DNA and to identify and map the approximately 20,000–25,000 genes of the human genome from both a physical and a functional standpoint (Robert Krulwich, 2001).
The most ambitious research project made possible by DNA technology is the effort to map the entire human genome. Four complementary approaches are being used:
1) Genetic (Linkage) Mapping of the Human Genome
The initial goal is to locate at least 3000 genetic markers (genes or other identifiable loci on the DNA) spaced evenly through the chromosomes. This approach is made feasible by the abundance of restriction fragment length polymorphisms (RFLPs) in the human genome, and many of the markers used will be RFLP markers. The resulting map will make it easy for researchers to find the loci of other markers, including genes, by testing for genetic linkage to known markers.
2) Physical Mapping of the Human Genome
This is done by cutting each chromosome into a number of identifiable fragments, and then determining their actual order in the chromosome.
3) Sequencing of the Human Genome
This is the process of determining the exact order of the nucleotide pairs of each chromosome. Since a haploid set of human chromosomes contains approximately 3 billion nucleotide pairs, this is potentially the most time-consuming part of the project.
4) Analyzing the Genome of Other Species
The project also includes similar analysis of the genomes of other species used in genetic research, such as E. coli, yeast, the plant Arabidopsis, and the mouse.
These four approaches are yielding data that together will give a complete map of Human genome and an understanding of how the Human genome compares to those of other organisms.
The project began in 1990 and was initially headed by Ari Patrinos, head of the Office of Biological and Environmental Research; Francis Collins directed the National Institutes of Health National Human Genome Research Institute efforts. The DNA used in the project is from four individuals. A working draft of the genome was released in 2000 and a complete one in 2003, with further, more detailed analysis still being published.
In May 2006, Human Genome Project (HGP) researchers announced the completion of the DNA sequence for the last of the 24 human chromosomes.
Most of the government-sponsored sequencing was performed in universities and research centers in the United States, the United Kingdom, Japan, France, Germany, and China. The mapping of human genes is an important step in the development of medicines and other aspects of health care.
From its inception, the Human Genome Project revolved around two key principles (International Human Genome Sequencing Consortium, 2001). First, it welcomed collaborators from any nation in an effort to move beyond borders, to establish an all-inclusive effort aimed at understanding our shared molecular heritage, and to benefit from diverse approaches. The group of publicly funded researchers that eventually assembled was known as the International Human Genome Sequencing Consortium (IHGSC). Second, the project required that all human genome sequence information be freely and publicly available within 24 hours of its assembly. This founding principle ensured unrestricted access for scientists in academia and in industry, and it provided the means for rapid and novel discoveries by researchers of all types.
Project goals were to:
- identify all the approximately 20,000-25,000 genes in human DNA,
- determine the sequences of the 3 billion chemical base pairs that make up human DNA,
- store this information in databases,
- improve tools for data analysis,
- transfer related technologies to the private sector, and
- address the ethical, legal, and social issues (ELSI) that may arise from the project.
To help achieve these goals, researchers also studied the genetic makeup of several nonhuman organisms, including the common human gut bacterium Escherichia coli, the fruit fly, and the laboratory mouse. The project remains one of the largest single investigational projects in modern science.
The Department of Energy's Human Genome Program and the National Institutes of Health's National Human Genome Research Institute (NHGRI) together sponsored the U.S. Human Genome Project. The Human Genome Project originally aimed to map the nucleotides contained in a human haploid reference genome (more than three billion). Several groups have announced efforts to extend this to diploid human genomes, including the International HapMap Project, Applied Biosystems, Perlegen, Illumina, JCVI, Personal Genome Project, and Roche-454.
The "genome" of any given individual (except for identical twins and cloned organisms) is unique; mapping "the human genome" involves sequencing multiple variations of each gene (Harmon & Katherine, 2010). The project did not study the entire DNA found in human cells; some heterochromatic areas (about 8% of the total genome) remain un-sequenced.
Background
What's a Genome? And Why is it Important?
A genome is the entire DNA in an organism, including its genes. Genomes vary widely in size: the smallest known genome for a free-living organism (a bacterium) contains about 600,000 DNA base pairs, while human and mouse genomes have some 3 billion. Except for mature red blood cells, all human cells contain a complete genome.
· Genes carry information for making all the proteins required by all organisms. These proteins determine, among other things, how the organism looks, how well its body metabolizes food or fights infection, and sometimes even how it behaves.
· DNA is made up of four similar chemicals (called bases and abbreviated A, T, C, and G) that are repeated millions or billions of times throughout a genome. The human genome, for example, has 3 billion pairs of bases.
· The particular order of As, Ts, Cs, and Gs is extremely important. The order underlies all of life's diversity, even dictating whether an organism is human or another species such as yeast, rice, or fruit fly, all of which have their own genomes and are themselves the focus of genome projects. Because all organisms are related through similarities in DNA sequences, insights gained from nonhuman genomes often lead to new knowledge about human biology.
The human genome is the total DNA in a complete set of human chromosomes: that is, 22 pairs of ordinary chromosomes (or autosomes) and a pair of sex chromosomes (X and Y).
Fig 1: The Human Genome (a display of the chromosomes like this is called a 'karyotype')
The DNA sequence contained in a genome contains the complete code that determines which genes and proteins will be present in human cells.
Fig 2: From Genes to Proteins (Knowledge of a genome unlocks the secrets of which DNA makes which proteins. This will ultimately help scientists to better understand the inner workings of biology)
The project began with the culmination of several years of work supported by the United States Department of Energy, in particular workshops in 1984 (Cook Deegan, 1989) and 1986 and a subsequent initiative of the US Department of Energy (Barnhart, Benjamin, 1989).
A 1987 report from this initiative stated boldly, "The ultimate goal of this initiative is to understand the human genome" and "knowledge of the human genome is as necessary to the continuing progress of medicine and other health sciences as knowledge of human anatomy has been for the present state of medicine." Candidate technologies were already being considered for the proposed undertaking at least as early as 1985 (DeLisi & Charles, 2001).
James D. Watson was head of the National Center for Human Genome Research at the National Institutes of Health (NIH) in the United States starting from 1988. Largely due to his disagreement with his boss, Bernadine Healy, over the issue of patenting genes, Watson was forced to resign in 1992. He was replaced by Francis Collins in April 1993, and the name of the Center was changed to the National Human Genome Research Institute (NHGRI) in 1997.
The $3-billion project was formally founded in 1990 by the United States Department of Energy and the U.S. National Institutes of Health, and was expected to take 15 years. In addition to the United States, the international consortium comprised geneticists in the United Kingdom, France, Germany, Japan, China, and India.
Due to widespread international cooperation and advances in the field of genomics (especially in sequence analysis), as well as major advances in computing technology, a 'rough draft' of the genome was finished in 2000 (announced jointly by then US president Bill Clinton and the British Prime Minister Tony Blair on June 26, 2000). This first available rough draft assembly of the genome was completed by the UCSC Genome Bioinformatics Group, primarily led by then graduate student Jim Kent. Ongoing sequencing led to the announcement of the essentially complete genome in April 2003, 2 years earlier than planned (Noble, Ivan, 2003).
In May 2006, another milestone was passed on the way to completion of the project, when the sequence of the last chromosome was published in the journal Nature.
A unique aspect of the U.S. Human Genome Project is that it was the first large scientific undertaking to address potential ELSI implications arising from project data.
Another important feature of the project was the federal government's long-standing dedication to the transfer of technology to the private sector. By licensing technologies to private companies and awarding grants for innovative research, the project catalyzed the multibillion-dollar U.S. biotechnology industry and fostered the development of new medical applications.
Landmark papers detailing sequence and analysis of the human genome were published in February 2001 and April 2003 issues of Nature and Science.
There are multiple definitions of the "complete sequence of the human genome". According to some of these definitions, the genome has already been completely sequenced, and according to other definitions, the genome has yet to be completely sequenced.
The first working draft of the human genome sequence was hailed with much excitement and fanfare as the “completion of the human genome” in the media. However, this first draft was not considered to be complete by scientists because of significant gaps in the sequences (this draft was 90% complete). For scientists, the high-quality reference sequence publicly released in April 2003 represents the first real step to having “finished” human sequence on hand (this draft represents sequence information that is considered to be 99% complete).
There have been multiple popular press articles reporting that the genome was "complete." The genome has been completely sequenced using the definition employed by the International Human Genome Project. A graphical history of the human genome project shows that most of the human genome was complete by the end of 2003.
However, there are a number of regions of the human genome that can be considered unfinished:
· First, the central regions of each chromosome, known as centromeres, are highly repetitive DNA sequences that are difficult to sequence using current technology. The centromeres are millions (possibly tens of millions) of base pairs long and for the most part these are entirely un-sequenced.
· Second, the ends of the chromosomes, called telomeres, are also highly repetitive, and for most of the 46 chromosome ends these too are incomplete. It is not known precisely how much sequence remains before the telomeres of each chromosome are reached, but as with the centromeres, current technological constraints are prohibitive.
· Third, there are several loci in each individual's genome that contain members of multigene families that are difficult to disentangle with shotgun sequencing methods – these multigene families often encode proteins important for immune functions.
· Other than these regions, there remain a few dozen gaps scattered around the genome, some of them rather large, but there is hope that all these will be closed in the next couple of years.
In summary: the best estimates of total genome size indicate that about 92.3% of the genome has been completed and it is likely that the centromeres and telomeres will remain un-sequenced until new technology is developed that facilitates their sequencing. Most of the remaining DNA is highly repetitive and unlikely to contain genes, but it cannot be truly known until it is entirely sequenced. Understanding the functions of all the genes and their regulation is far from complete.
The roles of junk DNA, the evolution of the genome, the differences between individuals, and many other questions are still the subject of intense interest by laboratories all over the world.
Goals
The completion of the human DNA sequence in the spring of 2003 coincided with the 50th anniversary of Watson and Crick's description of the fundamental structure of DNA.
The Human Genome Project was marked by accelerated progress. In June 2000, the rough draft of the human genome was completed a year ahead of schedule. In February 2001, the working draft was completed, and special issues of Science and Nature containing the working draft sequence and analysis were published. Additional papers were published in April 2003, when the project was completed.
The project's first 5-year plan, intended to guide research in FYs 1990-1995, was revised in 1993 due to unexpected progress, and the second plan outlined goals through FY 1998. The third and final plan [Science, 23 October 1998] was developed during a series of DOE (Department of Energy) and NIH (National Institutes of Health) workshops.
Some 18 countries have participated in the worldwide effort, with significant contributions from the Sanger Center in the United Kingdom and research centers in Germany, France, and Japan.
Human Genome Project Goals and Completion Dates
· resolution map (600 – 1,500 markers)
· Goal: 95% of gene-containing part of human sequence finished to 99.99% accuracy. Achieved: 99% of gene-containing part of human sequence finished to 99.99% accuracy
· Capacity and Cost of Finished Sequence. Goal: Sequence 500 Mb/year at < $0.25 per finished base. Achieved: Sequence >1,400 Mb/year at <$0.09 per finished base
· Goal: 100,000 mapped human SNPs. Achieved: 3.7 million mapped human SNPs
· 15,000 full-length human cDNAs
· Goal: Complete genome sequences of E. coli, S. cerevisiae, C. elegans, D. melanogaster. Achieved: Finished genome sequences of E. coli, S. cerevisiae, C. elegans, D. melanogaster, plus whole-genome drafts of several others, including C. briggsae, D. pseudoobscura, mouse and rat
· Goal: Develop genomic-scale technologies. Achieved: High-throughput oligonucleotide synthesis; eukaryotic, whole-genome knockouts (yeast); scale-up of two-hybrid system for protein-protein interaction
The sequence of the human DNA is stored in databases available to anyone. The U.S. National Center for Biotechnology Information (and sister organizations in Europe and Japan) house the gene sequence in a database known as GenBank, along with sequences of known and hypothetical genes and proteins. Other organizations, such as the University of California, Santa Cruz, and Ensembl, present additional data and annotation and powerful tools for visualizing and searching it. Computer programs have been developed to analyze the data, because the data itself is difficult to interpret without such programs.
The process of identifying the boundaries between genes and other features in a raw DNA sequence is called genome annotation and is the domain of bioinformatics. While expert biologists make the best annotators, their work proceeds slowly, and computer programs are increasingly used to meet the high-throughput demands of genome sequencing projects. The best current technologies for annotation make use of statistical models that take advantage of parallels between DNA sequences and human language, using concepts from computer science such as formal grammars.
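As a rough illustration of the statistical models mentioned above, the sketch below runs the Viterbi algorithm on a two-state hidden Markov model, a model class that corresponds to a stochastic regular grammar. It is not taken from any annotation pipeline: the state names (GC-rich, AT-rich) and every probability in START, TRANS and EMIT are made-up values chosen only for demonstration, and the input is assumed to contain only the uppercase letters A, C, G and T.

```python
import math

# Hypothetical two-state model; all probabilities are invented for illustration.
STATES = ("GC-rich", "AT-rich")
START = {"GC-rich": 0.5, "AT-rich": 0.5}
TRANS = {
    "GC-rich": {"GC-rich": 0.9, "AT-rich": 0.1},
    "AT-rich": {"GC-rich": 0.1, "AT-rich": 0.9},
}
EMIT = {
    "GC-rich": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
    "AT-rich": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
}


def viterbi(seq):
    """Return the most probable state path for seq (log-space Viterbi)."""
    if not seq:
        return []
    # score[s] = best log-probability of any path that ends in state s
    score = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    back = []  # back[t][s] = best predecessor of state s at position t + 1
    for base in seq[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            pointers[s] = prev
            new_score[s] = (score[prev] + math.log(TRANS[prev][s])
                            + math.log(EMIT[s][base]))
        score = new_score
        back.append(pointers)
    # Trace back from the best final state to recover the full path.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Toy input: a GC-rich stretch followed by an AT-rich stretch.
labels = viterbi("GCGCGCGC" + "ATATATAT")
print(labels)
```

Real gene finders use far richer models, with states for exons, introns, splice sites and so on, and with parameters estimated from annotated training data rather than chosen by hand.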
Another, often overlooked, goal of the HGP is the study of its ethical, legal, and social implications. It is important to research these issues and find the most appropriate solutions before they become large dilemmas whose effect will manifest in the form of major political concerns.
All humans have unique gene sequences. Therefore the data published by the HGP does not represent the exact sequence of each and every individual's genome. It is the combined "reference genome" of a small number of anonymous donors. The HGP genome is a scaffold for future work in identifying differences among individuals. Most of the current effort in identifying differences among individuals involves single-nucleotide polymorphisms (SNPs) and the HapMap.
Findings
Key findings of the draft (2001) and complete (2004) genome sequences include:
1. There are approximately 20,500 genes in human beings, the same range as in mice and twice that of roundworms. Understanding how these genes express themselves will provide clues to how diseases are caused.
2. Between 1.1% and 1.4% of the genome's sequence codes for proteins.
3. The human genome has significantly more segmental duplications (nearly identical, repeated sections of DNA) than other mammalian genomes. These sections may underlie the creation of new primate-specific genes.
4. At the time when the draft sequence was published, less than 7% of protein families appeared to be vertebrate specific.
The human genome sequence tells us the following:
By the Numbers
· The human genome contains 3164.7 million chemical nucleotide bases (A, C, T, and G).
· The average gene consists of 3000 bases, but sizes vary greatly, with the largest known human gene being dystrophin at 2.4 million bases.
· The total number of genes is estimated at 30,000 - much lower than previous estimates of 80,000 to 140,000 that had been based on extrapolations from gene-rich areas as opposed to a composite of gene-rich and gene-poor areas.
· Almost all (99.9%) nucleotide bases are exactly the same in all people.
· The functions are unknown for over 50% of discovered genes.
· The human genome's gene-dense "urban centers" are predominantly composed of the DNA building blocks G and C.
· In contrast, the gene-poor "deserts" are rich in the DNA building blocks A and T. GC and AT rich regions usually can be seen through a microscope as light and dark bands on chromosomes.
· Genes appear to be concentrated in random areas along the genome, with vast expanses of noncoding DNA between.
· Stretches of up to 30,000 C and G bases repeating over and over often occur adjacent to gene-rich areas, forming a barrier between the genes and the "junk DNA." These CpG islands are believed to help regulate gene activity.
· Chromosome 1 has the most genes (2968), and the Y chromosome has the fewest (231).
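The contrast between GC-rich and AT-rich regions described in the list above can be measured directly from sequence. A minimal sketch, assuming the input is a plain DNA string; the function names, the window size and the toy sequence are invented for illustration and do not come from the project's own tools.

```python
def gc_fraction(seq: str) -> float:
    """Return the fraction of bases in seq that are G or C."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def windowed_gc(seq: str, window: int) -> list[float]:
    """GC fraction of consecutive, non-overlapping windows of the given size."""
    return [gc_fraction(seq[i:i + window]) for i in range(0, len(seq), window)]


# Made-up toy sequence: a GC-rich stretch followed by an AT-rich stretch.
toy = "GCGCGGCCGC" + "ATATTAATAT"
print(windowed_gc(toy, 10))  # [1.0, 0.0]
```

On real chromosomes the windows would be thousands of bases wide, and the resulting profile is what distinguishes the gene-dense "urban centers" from the gene-poor "deserts".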
How the Human Compares with Other Organisms
· Unlike the human's seemingly random distribution of gene-rich areas, many other organisms' genomes are more uniform, with genes evenly spaced throughout.
· Humans have on average three times as many kinds of proteins as the fly or worm because of mRNA transcript "alternative splicing" and chemical modifications to the proteins. This process can yield different protein products from the same gene.
· Humans share most of the same protein families with worms, flies, and plants, but the number of gene family members has expanded in humans, especially in proteins involved in development and immunity.
· The human genome has a much greater portion (50%) of repeat sequences than the mustard weed (11%), the worm (7%), and the fly (3%).
· Scientists have proposed many theories to explain evolutionary contrasts between humans and other organisms, including those of life span, litter sizes, inbreeding, and genetic drift.
How It Was Accomplished
The first printout of the human genome to be presented as a series of books, displayed at the Wellcome Collection, London.
The Human Genome Project was started in 1989 with the goal of sequencing and identifying all three billion chemical units in the human genetic instruction set, finding the genetic roots of disease and then developing treatments. With the sequence in hand, the next step was to identify the genetic variants that increase the risk for common diseases like cancer and diabetes.
It was far too expensive at that time to think of sequencing patients' whole genomes, so the National Institutes of Health embraced the idea of a "shortcut": looking just at sites on the genome where many people have a variant DNA unit. The theory behind the shortcut was that since the major diseases are common, so too would be the genetic variants that caused them. Natural selection keeps the human genome free of variants that damage health before children are grown, the theory held, but fails against variants that strike later in life, allowing them to become quite common. In 2002 the National Institutes of Health started a $138 million project called the HapMap to catalog the common variants in European, East Asian and African genomes.
The genome was broken into smaller pieces, approximately 150,000 base pairs in length. These pieces were then ligated into a type of vector known as "bacterial artificial chromosomes", or BACs, which are derived from bacterial chromosomes that have been genetically engineered. The vectors containing the inserted fragments can be introduced into bacteria, where they are copied by the bacterial DNA replication machinery. Each of these pieces was then sequenced separately as a small "shotgun" project and then assembled. The larger 150,000-base-pair pieces are in turn put together to reconstruct the chromosomes. This is known as the "hierarchical shotgun" approach, because the genome is first broken into relatively large chunks, which are then mapped to chromosomes before being selected for sequencing.
The Human Genome Project employed a two-phase approach to tackle the human genome sequence (IHGSC, 2001).
The first phase, called the shotgun phase, divided human chromosomes into DNA segments of an appropriate size, which were then further subdivided into smaller, overlapping DNA fragments that were sequenced. The Human Genome Project relied upon the physical map of the human genome established earlier, which served as a platform for generating and analyzing the massive amounts of DNA sequence data that emerged from the shotgun phase.
Next, the second phase of the project, called the finishing phase, involved filling in gaps and resolving DNA sequences in ambiguous areas not obtained during the shotgun phase.
The shotgun phase of the Human Genome Project itself consisted of three steps:
1. Obtaining a DNA clone to sequence.
2. Sequencing the DNA clone.
3. Assembling sequence data from multiple clones to determine overlap and establish a contiguous sequence.
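The third step, assembly, can be sketched with a toy greedy overlap merger. This is a simplified stand-in, not the software the consortium used: it assumes error-free reads from a single strand, ignores repeats and reads contained entirely inside other reads, and the function names and the min_len threshold are illustrative choices.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 when the best overlap is shorter than min_len.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the largest overlap.

    Returns the remaining contigs; more than one contig means gaps remain.
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no overlaps left: remaining contigs are separated by gaps
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Three overlapping toy reads taken from the made-up sequence ATGGCGTACGTTAGC.
reads = ["ACGTTAGC", "ATGGCGTA", "GCGTACGT"]
print(greedy_assemble(reads))  # ['ATGGCGTACGTTAGC']
```

When no pair of sequences overlaps by at least min_len bases, the function returns several contigs, which is the toy analogue of the gaps that the finishing phase later had to close.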
Fig 3: Idealized Representation of the Hierarchical Shotgun Sequencing Approach
The approach used by the members of the IHGSC was called the hierarchical shotgun method, because the team members systematically generated overlapping clones mapped to individual human chromosomes, which were individually sequenced using a shotgun approach. The clones were derived from DNA libraries made by ligating DNA fragments generated by partial restriction enzyme digestion of genomic DNA from anonymous human donors into bacterial artificial chromosome (BAC) vectors, which could be propagated in bacteria.
When possible, the DNA fragments within the library vectors were mapped to chromosomal regions by screening for sequence-tagged sites (STSs), which are DNA fragments, usually less than 500 base pairs in length, of known sequence and chromosomal location that can be amplified using polymerase chain reaction (PCR). Library clones were also digested with the restriction enzyme HindIII, and the sizes of the resulting DNA fragments were determined using agarose gel electrophoresis. Each library clone exhibited a DNA fragment "fingerprint," which could be compared to that of all other library clones in order to identify overlapping clones. Fluorescence in situ hybridization (FISH) was also used to map library clones to specific chromosomal regions. Collectively, the STS, DNA fingerprint, and FISH data allowed the IHGSC to generate contigs, which consisted of multiple overlapping bacterial artificial chromosome (BAC) library clones spanning each of the 24 different human chromosomes (i.e., 22 autosomes and the X and Y chromosomes).
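The fingerprint comparison can be sketched as follows. The HindIII recognition sequence (AAGCTT, cut after the first A) is standard; everything else is an assumption made for illustration: the clone insert is treated as a linear string, the digest is complete, fragment sizes are known exactly rather than estimated from a gel, and the toy sequences and function names are invented.

```python
HINDIII_SITE = "AAGCTT"  # HindIII recognition sequence
CUT_OFFSET = 1           # the enzyme cuts after the first A of the site


def digest_sizes(clone: str) -> list[int]:
    """Sorted fragment lengths from a complete HindIII digest of a linear insert."""
    cuts = []
    pos = clone.find(HINDIII_SITE)
    while pos != -1:
        cuts.append(pos + CUT_OFFSET)
        pos = clone.find(HINDIII_SITE, pos + 1)
    edges = [0] + cuts + [len(clone)]
    return sorted(end - start for start, end in zip(edges, edges[1:]))


def shared_fragments(sizes_a, sizes_b, tolerance=0):
    """Count fragments of a that have a not-yet-used size match in b."""
    remaining = list(sizes_b)
    shared = 0
    for size in sizes_a:
        for other in remaining:
            if abs(size - other) <= tolerance:
                remaining.remove(other)
                shared += 1
                break
    return shared


# Toy "genome" with four HindIII sites, and two clones that overlap in the middle.
genome = ("CC" + HINDIII_SITE + "GGGG" + HINDIII_SITE + "CCCCCCC"
          + HINDIII_SITE + "GGGGGGGGG" + HINDIII_SITE + "CCC")
clone_a = genome[0:35]
clone_b = genome[9:49]

print(digest_sizes(clone_a))  # [3, 9, 10, 13]
print(digest_sizes(clone_b))  # [4, 8, 13, 15]
print(shared_fragments(digest_sizes(clone_a), digest_sizes(clone_b)))  # 1
```

The single shared 13-base fragment lies between the two sites that both clones contain. In practice many shared fragments, measured with some size tolerance, would be needed before two clones were called overlapping.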
Next, individual BAC clones selected for DNA sequence analysis were further fragmented, and the smaller genomic DNA fragments were subcloned into vectors to generate a BAC derived shotgun library. The inserts were sequenced using primers matching the vector sequence flanking the genomic DNA insert, and overlapping shotgun clones were used to generate a DNA sequence spanning the entire BAC clone.
A summary of this step is shown in following figure:
Fig 4: Levels of clone and sequence coverage
The members of the IHGSC agreed that each center would obtain an average of fourfold sequence coverage, with no clone having less than threefold coverage. The term "shotgun" comes from the fact that the original BAC clone was randomly fragmented and sequenced, and the raw DNA sequence data was then subjected to computational analyses to generate an ordered set of DNA sequences that spanned the BAC clone.
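To get a feel for what threefold or fourfold coverage means, a common idealization (an assumption here, not something stated by the IHGSC) is that reads start at uniformly random positions. The number of reads covering a given base is then roughly Poisson distributed with mean equal to the fold coverage c, so the chance that a base is missed entirely is about e^-c. The read count and read length in the example are hypothetical.

```python
import math


def fold_coverage(n_reads: int, read_len: int, target_len: int) -> float:
    """Average number of times each base of the target is sequenced."""
    return n_reads * read_len / target_len


def fraction_covered(coverage: float) -> float:
    """Expected fraction of bases hit by at least one read (Poisson model)."""
    return 1.0 - math.exp(-coverage)


# Hypothetical example: 1,200 reads of 500 bases from a 150,000-base BAC clone.
c = fold_coverage(1200, 500, 150_000)
print(c)  # 4.0

for c in (3, 4):
    print(f"{c}-fold coverage: about {fraction_covered(c):.1%} of bases sampled")
```

Under this simple model, fourfold coverage leaves roughly 2% of bases unsampled, which is consistent with the draft needing a separate finishing phase.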
After the completion of the draft phase of the Human Genome Project, the IHGSC pursued the second phase of the project: the finishing phase (IHGSC, 2004). During this phase, the researchers filled in gaps and resolved DNA sequences in ambiguous areas that were not solved during the shotgun phase. The finishing phase yielded 99% of the human genome in final form. The final form of the human genome contained 2.85 billion nucleotides, with a predicted error rate of 1 event per 100,000 bases sequenced.
Furthermore, the IHGSC reduced the number of gaps by 400-fold; only 341 gaps out of 147,821 gaps remained. The remaining gaps were associated with technically challenging chromosomal regions. Although the earlier draft publications had predicted as many as 40,000 protein encoding genes, the finishing phase reduced this estimate to between 20,000 and 25,000 protein-encoding genes.
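The figures above can be checked with a few lines of arithmetic. The sketch only restates numbers quoted in the text (2.85 billion finished nucleotides, 1 error per 100,000 bases, 147,821 and 341 gaps); the variable names are illustrative.

```python
gaps_in_draft = 147_821
gaps_remaining = 341
fold_reduction = gaps_in_draft / gaps_remaining

finished_bases = 2_850_000_000
bases_per_error = 100_000
expected_errors = finished_bases // bases_per_error

print(round(fold_reduction))  # 433
print(expected_errors)        # 28500
```

So the quoted 400-fold reduction is a rounded figure (the ratio is about 433), and the predicted error rate corresponds to an expected total on the order of 28,500 errors across the finished sequence.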
Future challenges identified by the IHGSC during this phase included the identification of polymorphisms as a platform for understanding genetic links to human disease, the identification of functional elements within the genome (genes, proteins, elements involved in gene regulation, and structural elements), and the identification of gene and protein "modules" that act in concert with one another.
Funding came from the US government through the National Institutes of Health in the United States, and a UK charity organization, the Wellcome Trust, as well as numerous other groups from around the world. The funding supported a number of large sequencing centers including those at Whitehead Institute, the Sanger Centre, Washington University, and Baylor College of Medicine.
The Human Genome Project is considered a Mega Project because the human genome has approximately 3.3 billion base-pairs; if the cost of sequencing is US $3 per base-pair, then the approximate cost will be US $10 billion.
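The estimate above is a single multiplication, shown here with integer arithmetic in cents to avoid rounding. The second line is purely for scale and is an assumption of this sketch: it applies the rate of under $0.09 per finished base listed in the goals table to the same 3.3 billion figure, ignoring the difference between a base pair and a finished base and the multiple coverage actually needed.

```python
BASE_PAIRS = 3_300_000_000


def total_cost_dollars(base_pairs: int, cents_per_base: int) -> int:
    """Total cost in whole dollars for a given per-base price in cents."""
    return base_pairs * cents_per_base // 100


print(total_cost_dollars(BASE_PAIRS, 300))  # 9900000000, i.e. about $10 billion
print(total_cost_dollars(BASE_PAIRS, 9))    # 297000000, i.e. about $0.3 billion
```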
Fig 5: Total amount of Human sequence in high Throughput Genome Sequence (HTGS) division of GenBank [The total is the sum of finished (red) and unfinished sequence (yellow)]
Phase I - Conceptualization / Initiation: [~1985-1990]
· In 1985, Charles DeLisi of the Department of Energy (DOE) began discussion of a mammoth project 'of a scale unprecedented in biology' to sequence the complete human genome.
· In 1989, the Department of Energy (DOE) moved ahead, soon challenged by the National Institutes of Health (NIH). Result: a 'national' program to sequence the human genome for $3,000,000,000 in government spending over 15 years.
The Human Genome Project - Infrastructure: the US Human Genome Project is the result of the combined effort of two government organizations, the Department of Energy (DOE) and the National Institutes of Health (NIH).
· The primary Human Genome Project sequencing sites: The "G5"
Phase II - The First 5 years: [1990-93]
· NIH establishes itself as lead agency, with funding apportioned 2:1 (NIH:DOE). The first HGP director, James Watson, captured control of the project for NIH and designed both the scientific and organizational strategy for its implementation.
· Watson also expanded the national program into an international venture, particularly with the Sanger Center (Wellcome Trust), called the International Human Genome Sequencing Consortium (IHGSC) [nationwide and worldwide genome centers].
· Scientific Five-Year Goals of the U.S. Human Genome Project from the NIH-DOE Five Year Plan Implemented October 1, 1990:
1. Identify all the 'approximately 100,000 genes' in human DNA (now known to be ~20,000-25,000 genes),
2. Make them accessible for further biological study,
3. Determine the sequences of the 3 billion bases that make up human DNA,
4. Store this information in databases,
5. Develop faster, more efficient sequencing technologies,
6. Develop tools for data analysis, and
7. Address the ethical, legal, and social issues (ELSI) that may arise from the project.
· Watson resigns from the Genome Project, April 1992: this stage ended in a crisis for the HGP when Watson resigned in a dispute with the NIH director Bernadine Healy and J. Craig Venter, an NIH (but not HGP) scientist, who wanted the government to patent ESTs.
Phase III -Gathering Speed: [1993 – 1998]
ü A new leader, Dr. Francis Collins, MD, PhD, a geneticist from the University of Michigan takes command of HGP. With the project falling somewhat behind scale in progress, the HGP under Collins soon began to greatly accelerate the pace of the HGP in terms of progress and project growth. This phase ended with another crisis due to external threat, this time again from J. Craig Venter.
ü Big Result from this period -1997: A Gene Map of the Human Genome 5 September 1997.
ü Landmark technology: High throughput automated DNA sequencers.
ü Another important policy decision of the time: NHGRI Rapid Data Release Policy: A main 'ground rule' of the HGP: 'Bermuda principles'.
In 1996 February, at a meeting in Bermuda, international partners in the genome project agree to formalize the conditions of data access, "which expressly call for automatic, rapid release (in this case, within 24 hours) of sequence assemblies of 1 to 2 kilobase (kb) or greater to the public domain. These principles were publicly endorsed by U.S. President Bill Clinton and British Prime Minister Tony Blair in a joint statement issued in March 2000."
The three main ideas of the Bermuda Principles scribbled on a blackboard by John Sulston in 1996
o Automatic release of sequence assemblies larger than 1 kb (preferably within 24 hours).
o Immediate publication of finished annotated sequences.
o Aim to make the entire sequence freely available in the public domain for both research and development in order to maximise benefits to society.
ü "The highest priority of the International Human Genome Sequencing Consortium is ensuring that sequencing data from the human genome is available to the world's scientists rapidly, freely and without restriction." From NHGRI's Data Release and Access Principles and Policy.
ü May 1998: Race for the Genome: J. Craig Venter, with Michael Hunkapiller of Perkin-Elmer's Applied Biosystems, creates Celera Genomics. Goal: sequence the entire human genome by December 31, 2001 (2 years before the HGP's planned completion, and for a mere $300 million) by a method untested on a complex eukaryotic genome: Whole Genome Shotgun Sequencing, using 300 high-speed automated DNA sequencers running in parallel, 24 hours a day. Venter calls the plan a "mutually rewarding partnership between public and private institutions."
Phase IV – Reorientation: [October 1998-2001]
ü In response to what was widely perceived as a race to sequence the human genome, the HGP shifted dramatically to a crash project. Collins reoriented the HGP, altering its scientific and organizational strategy.
ü "The controversy escalated, as did the acrimony, public visibility, and political pressures surrounding the completion of the HGP". President Bill Clinton, Prime Minister Tony Blair, and DOE became involved.
ü On June 21, 2000: Draft Sequence - the completion of the Rough Draft was announced: Hundreds of thousands of gaps, many 'flipped' or misassembled regions, but still a good draft!
Phase V – Finishing: [2001-2005]
ü Mouse Genome Sequencing Consortium December 2002.
ü Human Chromosome 14 Completed, January 2003.
ü April 13, 2003- Finished Sequence for gene containing regions - 13 years after the HGP began; 2 years ahead of schedule, and in time for the 50th anniversary of the Watson and Crick publication of the structure of DNA: "International Consortium Completes Human Genome Project: All Goals Achieved; New Vision for Genome Research Unveiled."
ü The finished sequence produced by the Human Genome Project covers about 99 percent of the human genome's gene-containing regions, and it has been sequenced to an accuracy of 99.99% (no more than 1 error in 10,000).
ü Human Chromosome Y Completed, June 2003.
ü Human Chromosome 7 Completed, July 2003.
ü The Finished Sequence for Euchromatin, Nature 431, 931-945 (21 Oct 2004): 2,851,330,913 nucleotides finished, with an error rate of no more than 1 error in 100,000 bases (99.999% accuracy). 341 gaps remain (~228,000,000 nucleotides, of which ~198,000,000 bases are highly condensed heterochromatin, involved in chromosomal replication and maintenance, and only ~28,000,000 bases are active, gene-containing euchromatin).
ü Human Gene Count Estimates Changed to 20,000 to 25,000, October 21, 2004.
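The accuracy percentages quoted for the finished sequence can be restated as "no more than one error in N bases". The following is a minimal sketch of that conversion; the function name bases_per_error is hypothetical, and decimal arithmetic is used only to avoid floating-point rounding.

```python
from decimal import Decimal


def bases_per_error(accuracy_percent: str) -> int:
    """Convert an accuracy such as '99.99' (percent) into N, as in '1 error in N bases'."""
    # Fraction of bases that may be wrong, e.g. 99.99% accurate -> 0.0001
    error_fraction = (Decimal(100) - Decimal(accuracy_percent)) / Decimal(100)
    return int(1 / error_fraction)


if __name__ == "__main__":
    for accuracy in ("99.9", "99.99", "99.999"):
        print(f"{accuracy}% accurate = 1 error in {bases_per_error(accuracy):,} bases")
```

Read this way, 99.99% accuracy corresponds to one error in 10,000 bases and 99.999% to one error in 100,000 bases.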
Major Events in the U.S. Human Genome Project
1990
· DOE and NIH present joint 5-year U.S. HGP plan to Congress. The 15-year project formally begins.
· Projects begun to mark gene sites on chromosome maps as sites of mRNA expression.
· Research and development begun for efficient production of more stable, large-insert BACs.
1991
· Human chromosome mapping data repository, GDB, established.
1992
· Low-resolution genetic linkage map of entire human genome published.
· Guidelines for data release and resource sharing announced by DOE and NIH.
1993
· International IMAGE (Integrated Molecular Analysis of Gene Expression) Consortium established to coordinate efficient mapping and sequencing of gene-representing cDNAs.
· DOE-NIH ELSI Working Group's Task Force on Genetic and Insurance Information releases recommendations.
· DOE and NIH revise 5-year goals.
· French Généthon provides mega YACs to the genome community.
· IOM (Institute of Medicine) releases U.S. HGP-funded report, "Assessing Genetic Risks."
· LBNL (Lawrence Berkeley National Laboratory) implements novel transposon-mediated chromosome-sequencing system.
· GRAIL (Gene Recognition and Analysis Internet Link) sequence-interpretation service provides Internet access at ORNL (Oak Ridge National Laboratory).
1994
· Genetic-mapping 5-year goal achieved 1 year ahead of schedule.
· Completion of second-generation DNA clone libraries representing each human chromosome by LLNL (Lawrence Livermore National Laboratory) and LBNL (Lawrence Berkeley National Laboratory).
· Genetic Privacy Act, first U.S. HGP legislative product, proposed to regulate collection, analysis, storage, and use of DNA samples and genetic information obtained from them, endorsed by ELSI (ethical, legal, and social issues) Working Group.
· DOE MGP (Microbial Genome Project) launched; spin-off of HGP.
· LLNL chromosome paints commercialized.
· SBH (Sequencing by hybridization) technologies from ANL (Argonne National Laboratory) commercialized.
· DOE HGP Information Web site activated for public and researchers.
1995
· LANL (Los Alamos National Laboratory) and LLNL (Lawrence Livermore National Laboratory) announce high-resolution physical maps of chromosome 16 and chromosome 19, respectively.
· Moderate-resolution maps of chromosomes 3, 11, 12, and 22 published.
· Physical map with over 15,000 STS markers published.
· First (nonviral) whole genome sequenced (for the bacterium Haemophilus influenzae).
· Sequence of smallest bacterium, Mycoplasma genitalium, completed; provides a model of the minimum number of genes needed for independent existence.
· EEOC guidelines extend ADA employment protection to cover discrimination based on genetic information related to illness, disease, or other conditions.
1996
· Methanococcus jannaschii genome sequenced; confirms existence of third major branch of life on earth.
· DOE initiates 6 pilot projects on BAC (bacterial artificial chromosome) end sequencing.
· Health Insurance Portability and Accountability Act prohibits use of genetic information in certain health-insurance eligibility decisions, requires DHHS (Department of Health and Human Services) to enforce health-information privacy provisions.
· HGP participants agree on sequencing data release policies at the Bermuda conference.
· DOE and NCHGR (National Center for Human Genome Research) issue guidelines on use of human subjects for large-scale sequencing projects.
· Saccharomyces cerevisiae (yeast) genome sequence completed by international consortium.
· Sequence of the human T-cell receptor region completed.
· Wellcome Trust sponsors large-scale sequencing strategy meeting for international coordination of human genome sequencing.
1997
· NIH NCHGR (National Center for Human Genome Research) becomes National Human Genome Research Institute (NHGRI).
· Escherichia coli genome sequence completed.
· Second large-scale sequencing strategy meeting held in Bermuda.
· High-resolution physical maps of chromosomes X and 7 completed.
· DOE-NIH Task Force on Genetic Testing releases final report and recommendations.
· DOE forms Joint Genome Institute for implementing high-throughput activities at DOE human genome centers, initially in sequencing and functional genomics.
· UNESCO (United Nations Educational, Scientific, and Cultural Organization) adopts Universal Declaration on the Human Genome and Human Rights.
1998
· Hospital for Sick Children, Toronto, Ontario, to continue GDB data collection, curation.
· Caenorhabditis elegans genome sequence completed.
· DOE (Department of Energy) and NIH (National Institutes of Health) reveal new five-year plan for HGP, predict project completion by 2003.
· JGI exceeds sequencing goal, achieves 20 Mb for FY 1998.
· GeneMap'98 containing 30,000 markers released.
· Incyte Pharmaceuticals announces plans to sequence human genome in 2 years.
· Mycobacterium tuberculosis bacterium sequenced.
· Celera Genomics formed to sequence much of human genome in 3 years using HGP-generated resources.
· DOE (Department of Energy) funds production BAC end sequencing projects
· Largest-ever ELSI (ethical, legal, and social issues) meeting attended by over 800 from diverse disciplines and sponsored by DOE; Whitehead Institute; and the American Society of Law, Medicine, and Ethics.
· Human Genome Project passes midpoint.
1999
· First Human Chromosome Completely Sequenced. On December 1, researchers in the Human Genome Project announced the complete sequencing of the DNA making up human chromosome 22.
· Joint Genome Institute sequencing facility opens in Walnut Creek, CA.
· Major Drug Firms Create Public SNP Consortium
· The Billion Base Pair Celebration, November 23, 1999. Bruce Alberts, President, National Academy of Sciences and early planner of the Genome Project; Francis Collins, Director, NHGRI; Secretary of HHS, Donna Shalala; Secretary of DOE, Bill Richardson.
· HGP advances goal for obtaining a draft sequence of the entire human genome from 2001 to 2000.
2000
· HGP leaders and President Clinton announce the completion of a "working draft" DNA sequence of the human genome.
· An interview with Ari Patrinos, Director, U.S. DOE Human Genome Program.
· International research consortium publishes chromosome 21 genome, the smallest human chromosome and the second to be completely sequenced.
· DOE researchers announce completion of chromosomes 5, 16, and 19 draft sequence.
· International collaborators publish genome of fruit fly Drosophila melanogaster.
· President Clinton signs executive order prohibiting federal departments and agencies from using genetic information in hiring or promoting workers.
2001
· Human Chromosome 20 Finished - Chromosome 20 is the third chromosome completely sequenced to the high quality specified by the Human Genome Project.
· Publication of Initial Working Draft Sequence February 12, 2001
Special issues of Science (Feb. 16, 2001) and Nature (Feb. 15, 2001) contain the working draft of the human genome sequence. The Nature papers include initial analyses and descriptions of the sequence generated by the publicly sponsored Human Genome Project, while the Science publications focus on the draft sequence reported by the private company Celera Genomics. A press conference was held at 10 a.m., Monday, February 12, 2001, to discuss the landmark publications.
· Pieter de Jong's team (now at the Oakland Children's Hospital, Oakland, CA) was a major provider of the BAC libraries used in the sequencing of the human and several other genomes.
2002
· Mouse Genome Sequencing Consortium publishes its draft sequence of mouse genome in the December 5, 2002, issue of Nature.
· International consortium led by the DOE (Department of Energy) Joint Genome Institute publishes draft sequence of Fugu rubripes.
2003
· Human Chromosome 6 Completed, October 2003.
· Human Chromosome 7 Completed, July 2003.
· Human Chromosome Y Completed, June 2003.
· Human Genome Project Declared Complete, April 2003 [Corresponding Nature and Science]
· Human Chromosome 14 Finished - Chromosome 14 is the fourth chromosome to be completely sequenced.
2004
· Human Chromosome 16 Completed, December 2004.
· Finishing the euchromatic sequence of the human genome, Nature, Oct. 21, 2004
· Human Gene Count Estimates Changed to 20,000 to 25,000, October 2004.
· Human Chromosome 5 Completed, September 2004.
· Human genome: Quality assessment of the human genome sequence.
· Human Chromosome 9 Completed, May 2004.
· Human Chromosome 10 Completed, May 2004.
· Human Chromosome 18 Completed, March 2004.
· Human Chromosome 19 Completed, March 2004.
· Human Chromosome 13 Completed, March 2004.
2005
· Human Chromosome 4 Completed, April 2005.
· Human Chromosome 2 Completed, April 2005.
· Human Chromosome X Completed, March 2005.
2006
· Human Chromosome 1 Completed, May 2006.
· Human Chromosome 3 Completed, April 2006.
· Human Chromosome 17 Completed, April 2006.
· Human Chromosome 11 Completed, March 2006.
· Human Chromosome 12 Completed, March 2006.
· Human Chromosome 15 Completed, March 2006.
· Human Chromosome 8 Completed, January 2006.
2008
· Genetic Information Nondiscrimination Act (GINA) Becomes Law, May 2008.
· Landmark Paper: Mapping and sequencing of structural variation from eight human genomes, Nature, May 1, 2008.
Public versus Private Approaches
In 1998, a similar, privately funded quest was launched by the American researcher Craig Venter, and his firm Celera Genomics. Venter was a scientist at the NIH during the early 1990s when the project was initiated. The $300,000,000 Celera effort was intended to proceed at a faster pace and at a fraction of the cost of the roughly $3 billion publicly funded project.
Celera used a technique called whole genome shotgun sequencing, employing pairwise end sequencing (Roach, Boysen, Wang, Hood, 1995), which had been used to sequence bacterial genomes of up to six million base pairs in length, but not for anything nearly as large as the three billion base pair human genome.
Celera initially announced that it would seek patent protection on "only 200–300" genes, but later amended this to seeking "intellectual property protection" on "fully-characterized important structures" amounting to 100–300 targets. The firm eventually filed preliminary ("place-holder") patent applications on 6,500 whole or partial genes. Celera also promised to publish its findings in accordance with the terms of the 1996 "Bermuda Statement" by releasing new data annually (the HGP released its new data daily), although, unlike the publicly funded project, it would not permit free redistribution or scientific use of the data. For this reason, the publicly funded competitor at UC Santa Cruz was compelled to publish the first draft of the human genome before Celera. On July 7, 2000, the UCSC Genome Bioinformatics Group released a first working draft on the web. The scientific community downloaded one-half trillion bytes of information from the UCSC genome server in the first 24 hours of free and unrestricted access to the first ever assembled blueprint of the human species.
In March 2000, President Clinton announced that the genome sequence could not be patented, and should be made freely available to all researchers. The statement sent Celera's stock plummeting and dragged down the biotechnology-heavy Nasdaq. The biotechnology sector lost about $50 billion in market capitalization in two days.
Although the working draft was announced in June 2000, it was not until February 2001 that Celera and the HGP scientists published details of their drafts.
Special issues of Nature (which published the publicly funded project's scientific paper) and Science (which published Celera's paper) described the methods used to produce the draft sequence and offered analysis of the sequence. These drafts covered about 83% of the genome (90% of the euchromatic regions with 150,000 gaps and the order and orientation of many segments not yet established).
In February 2001, at the time of the joint publications, press releases announced that the project had been completed by both groups. Improved drafts were announced in 2003 and 2005, bringing coverage to ≈92% of the sequence.
The competition proved to be very good for the project, spurring the public groups to modify their strategy in order to accelerate progress. The rivals at UC Santa Cruz initially agreed to pool their data, but the agreement fell apart when Celera refused to deposit its data in the unrestricted public database GenBank. Celera had incorporated the public data into their genome, but forbade the public effort to use Celera data.
HGP is the most well known of many international genome projects aimed at sequencing the DNA of a specific organism. While the human DNA sequence offers the most tangible benefits, important developments in biology and medicine are predicted as a result of the sequencing of model organisms, including mice, fruit flies, zebrafish, yeast, nematodes, plants, and many microbial organisms and parasites.
In 2004, researchers from the International Human Genome Sequencing Consortium (IHGSC) of the HGP announced a new estimate of 20,000 to 25,000 genes in the human genome. Previously 30,000 to 40,000 had been predicted, while estimates at the start of the project ranged as high as 2,000,000. The number continues to fluctuate, and it is now expected that it will take many years to agree on a precise value for the number of genes in the human genome.
History
In 1976, the genome of the RNA virus Bacteriophage MS2 was the first complete genome to be determined, by Walter Fiers and his team at the University of Ghent (Ghent, Belgium).
The idea for the shotgun technique came from the use of an algorithm that combined sequence information from many small fragments of DNA to reconstruct a genome. This technique was pioneered by Frederick Sanger to sequence the genome of the phage Φ-X174, a virus (bacteriophage) that primarily infects bacteria; in 1977 it became the first fully sequenced DNA genome. The technique was called shotgun sequencing because the genome was broken into millions of pieces, as if it had been blasted with a shotgun.
In order to scale up the method, both the sequencing and the genome assembly had to be automated, as they were in the 1980s.
Those techniques were shown to be applicable to the sequencing of the first free-living bacterial genome, that of Haemophilus influenzae (1.8 million base pairs), in 1995, and of the first animal genome (~100 Mbp). This involved automated sequencers and longer individual sequences, of approximately 500 base pairs at that time. Paired sequences separated by a fixed distance of around 2,000 base pairs were critical elements enabling the development of the first genome assembly programs for reconstruction of large regions of genomes (also known as 'contigs').
Three years later, in 1998, the announcement by the newly-formed Celera Genomics that it would scale up the pairwise end sequencing method to the human genome was greeted with skepticism in some circles.
The shotgun technique breaks the DNA into fragments of various sizes, ranging from 2,000 to 300,000 base pairs in length, forming what is called a DNA "library".
Using an automated DNA sequencer the DNA is read in 800bp lengths from both ends of each fragment. Using a complex genome assembly algorithm and a supercomputer, the pieces are combined and the genome can be reconstructed from the millions of short, 800 base pair fragments.
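The core idea of reconstructing a genome from overlapping reads can be illustrated with a toy example. The sketch below is not Celera's assembler: it is a simple greedy overlap-merge procedure, assuming error-free reads, a repeat-free toy sequence, and reads cut at a fixed step rather than at random. The names overlap, greedy_assemble, shred and TOY_GENOME are hypothetical.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b` (0 if shorter than min_len)."""
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the largest overlap until none overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_size, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            return contigs  # nothing left to merge: these are the final contigs
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


def shred(genome: str, read_len: int, step: int) -> list[str]:
    """Cut overlapping reads at a fixed step (real shotgun libraries are random)."""
    return [genome[s:s + read_len] for s in range(0, len(genome) - read_len + 1, step)]


TOY_GENOME = "ATGCGTACGTTAGCCTAGGATCCAT"  # made-up 25-base sequence

if __name__ == "__main__":
    reads = shred(TOY_GENOME, read_len=10, step=5)
    print(reads)
    print(greedy_assemble(reads))
```

A greedy approach like this breaks down on repeated sequence, which is one reason paired reads separated by a known distance were so important for assembling large genomes.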
The success of both the public and privately funded effort hinged upon a new, more highly automated capillary DNA sequencing machine, called the Applied Biosystems 3700, that ran the DNA sequences through an extremely fine capillary tube rather than a flat gel.
Even more critical was the development of a new, larger-scale genome assembly program, which could handle the 30–50 million sequences that would be required to sequence the entire human genome with this method. At the time, such a program did not exist.
One of the first major projects at Celera Genomics was the development of this assembler, which was written in parallel with the construction of a large, highly automated genome sequencing factory.
Development of the assembler was led by Gene Myers. The first version of this assembler was demonstrated in 2000, when the Celera team joined forces with Professor Gerald Rubin to sequence the fruit fly Drosophila melanogaster using the whole-genome shotgun method. At 130 million base pairs, it was at least 10 times larger than any genome previously shotgun assembled.
One year later, the Celera team published their assembly of the three billion base pair human genome.
The Human Genome Project was a 13-year mega project, launched in 1990 and completed in 2003. The project is closely associated with the branch of biology called bioinformatics. The Human Genome Project international consortium announced the publication of a draft sequence and analysis of the human genome, the genetic blueprint for the human being.
Both Celera, an American company led by Craig Venter, and the large international collaboration of distinguished scientists led by Francis Collins, director of the U.S. National Human Genome Research Institute, published their findings.
This mega project was coordinated by the U.S. Department of Energy and the National Institutes of Health. During the early years of the project, the Wellcome Trust (U.K.) became a major partner, and other countries such as Japan, Germany, China and France contributed significantly. The atlas has already revealed some startling facts.
The two factors that made this project a success are:
1. Genetic Engineering Techniques, with which it is possible to isolate and clone any segment of DNA.
2. Availability of simple and fast technologies for determining DNA sequences.
As the most complex organisms, human beings were expected to have more than 100,000 genes, the stretches of DNA that provide instructions for every characteristic of the body. Instead, the studies showed that humans have only about 30,000 genes: around the same as mice, three times as many as flies, and only five times as many as bacteria.
Scientists reported that not only are the numbers similar, but the genes themselves, barring a few, are alike in mice and men. In a companion volume to the Book of Life, scientists have created a catalogue of 1.4 million single-letter differences, or single-nucleotide polymorphisms (SNPs), and specified their exact locations in the human genome. This SNP map, the world's largest publicly available catalogue of SNPs, promises to revolutionize both the mapping of diseases and the tracing of human history.
The sequence information from the consortium has been immediately and freely released to the world, with no restrictions on its use or redistribution. The information is scanned daily by scientists in academia and industry, as well as commercial database companies, providing key information services to bio-technologists.
Already, many genes have been identified from the genome sequence, including more than 30 that play a direct role in human diseases. By dating the three million repeat elements and examining the pattern of interspersed repeats on the Y chromosome, scientists estimated the relative mutation rates in the X and the Y chromosomes and in the male and the female germ lines. They found that the ratio of mutations in the male versus the female germ line is 2:1.
Scientists point to several possible reasons for the higher mutation rate in the male germ line, including the fact that there are a greater number of cell divisions involved in the formation of sperm than in the formation of eggs.
At least 18 countries have established human genome research programs. Some of the larger programs are in Australia, Brazil, Canada, China, Denmark, the European Union, France, Germany, Israel, Italy, Japan, Korea, Mexico, the Netherlands, Russia, Sweden, the United Kingdom, and the United States. Some developing countries participate through studies of molecular biology techniques for genome research and studies of organisms that are of particular interest in their geographical regions. The Human Genome Organisation (HUGO) helped to coordinate international collaboration in the genome project.
Methods
The IHGSC used paired-end sequencing plus whole-genome shotgun mapping of large (≈100 Kbp) plasmid clones and shotgun sequencing of smaller plasmid sub-clones, plus a variety of other mapping data, to orient and check the assembly of each human chromosome.
The Celera group emphasized the importance of the “whole-genome shotgun” sequencing method, relying on sequence information to orient and locate their fragments within the chromosome. However, they used the publicly available data from the HGP to assist in the assembly and orientation process, raising concerns that the Celera sequence was not independently derived (Waterston et al., 2003).
Before the IHGSC had completed the first phase of the Human Genome Project, a private biotechnology company called Celera Genomics also entered the race to sequence the human genome. Led by Dr. Craig Venter, Celera proclaimed that it would sequence the entire human genome within three years.
Celera used two independent data sets together with two distinct computational approaches to determine the sequence of the human genome (Venter et al., 2001). The first data set was generated by Celera and consisted of 27.27 million DNA sequence reads, each with an average length of 543 base pairs, derived from five different individuals. The second data set was obtained from the publicly funded Human Genome Project and was derived from the BAC contigs (called bactigs); here, Celera "shredded" the Human Genome Project DNA sequence into 550-base-pair sequence reads representing a total of 16.05 million sequence reads. The company then used a whole genome assembly method and a regional chromosome assembly method to sequence the human genome.
In the whole-genome assembly method (also called the whole-genome random shotgun method), Celera generated a massive shotgun library derived from its own DNA sequence data combined with the "shredded" Human Genome Project DNA sequence data, which together corresponded to a total of 43.32 million sequence reads (Venter et al., 2001). Celera used computational methods and sophisticated algorithms to identify overlapping DNA sequences and to reconstruct the human genome by generating a set of scaffolds. As shown in figure:
Fig 6: Anatomy of Whole-Genome Assembly
In contrast, with the regional chromosome assembly approach (also called the compartmentalized shotgun assembly method), Celera organized its own data and the Human Genome Project sequence data into the largest possible chromosomal segments, followed by shotgun assembly of the sequence data within each segment (Venter et al., 2001); this approach was similar to the hierarchical shotgun approach used by the IHGSC. The first step of the regional assembly approach involved separating Celera reads that matched Human Genome Project reads from those that were distinct from the public sequence data. Of the 27.27 million Celera reads, 21.38 million matched a Human Genome Project bactig, and 5.89 million did not match the public sequence data. These reads were assembled into Celera-specific or Human Genome Project-specific scaffolds, which were then combined and analyzed using whole-genome assembly algorithms. The resulting bactig data were again "shredded" to permit unbiased assembly of the combined sequence data.
Celera's whole-genome and regional chromosome assembly methods were independent of each other, permitting direct comparison of the data. Celera found that the regional chromosome assembly method was slightly more consistent than the whole-genome assembly method. Using these complementary approaches, Celera generated data that was in strong agreement with that of the IHGSC.
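The read counts quoted above can be checked with simple arithmetic, and the same figures give a rough idea of how many times over the genome was sampled. The sketch below uses only numbers from the text (read counts, read lengths, and a genome of about three billion base pairs); the constant and function names are hypothetical, and the coverage values are back-of-the-envelope estimates rather than figures reported by Celera.

```python
CELERA_READS = 27_270_000      # reads generated by Celera
CELERA_READ_LEN = 543          # average Celera read length (bp)
SHREDDED_READS = 16_050_000    # reads "shredded" from HGP bactigs
SHREDDED_READ_LEN = 550        # length of each shredded read (bp)
MATCHED_READS = 21_380_000     # Celera reads matching an HGP bactig
UNMATCHED_READS = 5_890_000    # Celera reads with no public match
GENOME_SIZE = 3_000_000_000    # approximate haploid genome size (bp)


def fold_coverage(n_reads: int, read_len: float, genome_size: int) -> float:
    """Average number of times each base is covered: total read bases / genome size."""
    return n_reads * read_len / genome_size


total_reads = CELERA_READS + SHREDDED_READS
celera_coverage = fold_coverage(CELERA_READS, CELERA_READ_LEN, GENOME_SIZE)
shredded_coverage = fold_coverage(SHREDDED_READS, SHREDDED_READ_LEN, GENOME_SIZE)

if __name__ == "__main__":
    print(f"combined reads: {total_reads:,}")
    print(f"matched + unmatched: {MATCHED_READS + UNMATCHED_READS:,}")
    print(f"Celera-only coverage: about {celera_coverage:.1f}x")
    print(f"combined coverage: about {celera_coverage + shredded_coverage:.1f}x")
```

The totals agree with the text: 27.27 million plus 16.05 million reads gives the 43.32 million used in the whole-genome assembly, and the matched and unmatched Celera reads sum back to 27.27 million.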
Sequencing of the genome was performed in the following steps:
ü Chromosomes, which range in size from 50 million to 250 million bases, must first be broken into much shorter pieces (subcloning step).
ü Each short piece is used as a template to generate a set of fragments that differ in length from each other by a single base that will be identified in a later step (template preparation and sequencing reaction steps).
ü The fragments in a set are separated by gel electrophoresis (separation step).
ü New fluorescent dyes allow separation of all four fragments in a single lane on the gel.
ü The final base at the end of each fragment is identified (base-calling step). This process recreates the original sequence of As, Ts, Cs, and Gs for each short piece generated in the first step.
ü Automated sequencers analyze the resulting electropherograms and the output is a four-color chromatogram showing peaks that represent each of the four DNA bases.
ü After the bases are "read," computers are used to assemble the short sequences (in blocks of about 500 bases each, called the read length) into long continuous stretches that are analyzed for errors, gene-coding regions, and other characteristics.
ü Finished sequence is submitted to major public sequence databases, such as GenBank. Human Genome Project sequence data are thus made freely available to anyone around the world.
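The base-calling step in the list above can be sketched in a few lines. This is a simplified illustration, not the software used by the sequencing centers: it assumes peaks have already been detected and that each peak is described by one intensity per dye channel. The function name call_bases, the min_fraction threshold and the example intensities are all hypothetical.

```python
BASES = "ACGT"


def call_bases(peaks: list[dict[str, float]], min_fraction: float = 0.6) -> str:
    """Call one base per peak from four dye-channel intensities.

    The base with the strongest signal is called; if it accounts for less than
    `min_fraction` of the total signal, the position is reported as 'N' (ambiguous).
    """
    called = []
    for peak in peaks:
        total = sum(peak[base] for base in BASES)
        best = max(BASES, key=lambda base: peak[base])
        if total > 0 and peak[best] / total >= min_fraction:
            called.append(best)
        else:
            called.append("N")
    return "".join(called)


# Made-up intensities for four consecutive peaks in a chromatogram.
EXAMPLE_PEAKS = [
    {"A": 0.90, "C": 0.05, "G": 0.03, "T": 0.02},
    {"A": 0.10, "C": 0.80, "G": 0.05, "T": 0.05},
    {"A": 0.40, "C": 0.10, "G": 0.45, "T": 0.05},  # two dyes compete: ambiguous
    {"A": 0.00, "C": 0.02, "G": 0.03, "T": 0.95},
]

if __name__ == "__main__":
    print(call_bases(EXAMPLE_PEAKS))
```

Ambiguous calls like the third peak are one reason each region is read several times before the sequence is considered finished.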
As previously mentioned, the IHGSC and Celera used different approaches to determine the sequence of the human genome. However, they used the same general method for the DNA sequencing step (Hood & Galas, 2003). This method uses DNA polymerase, the same enzyme used in DNA replication, to produce DNA sequence information. As shown in Figure (7a), DNA polymerase binds to a single-stranded DNA template and adds DNA bases to the 3′ end of the complementary DNA strand it synthesizes. DNA polymerase requires an existing primer with a free 3′ end to which it adds new DNA bases in a 5′ to 3′ manner, and it moves along the template strand in a 3′ to 5′ direction.
Researchers from both the IHGSC and Celera combined the DNA template they were interested in sequencing with DNA polymerase, a single-stranded DNA primer, free deoxynucleotide bases (dATP, dCTP, dGTP, and dTTP), and a sparse mixture of fluorescently labeled dideoxynucleotide bases (ddATP, ddCTP, ddGTP, and ddTTP) that were each labeled with a different color and would terminate new DNA strand synthesis once incorporated into the end of a growing DNA strand. The mixture was first heated to denature the template DNA strand; this was followed by a cooling step to allow the DNA primer to anneal. Following primer annealing, the polymerase synthesized a complementary DNA strand. The new strand would grow in length until a dideoxynucleotide base (ddNTP) was incorporated; the conditions were such that this occurred at random along the length of the newly synthesized DNA strands. In the end, the researchers were left with a mixture of newly synthesized DNA strands that differed in length by a single nucleotide, and that were labeled at their 3′ end with the color of the ddNTP-associated dye molecule (Figure 7b).
In order to determine the sequence of the newly synthesized, color-coded DNA strands, researchers needed a way to separate them based on their size, which differed by only one DNA nucleotide. To accomplish this, they electrophoresed the DNA through a gel matrix that permitted single-base differences in size to be easily distinguished. Small fragments run more quickly through the gel, and larger fragments run more slowly (Figure 7c).
By putting the entire mixture into a single well of the gel, a laser can be used to scan the DNA bands as they move through the gel and determine their color; this data can be used to generate a sequence trace (also called an electropherogram), showing the color and signal intensity of each DNA band that passes through the gel (Figure 7d).
The color of each band represents the final 3′ base incorporated at that position, and by reading from the bottom to the top of the gel, one can determine the sequence of the newly synthesized DNA strand from the 5′ to the 3′ end.
Fig 7: How to Sequence DNA
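The chain-termination procedure described above can be simulated to show why sorting fragments by size recovers the sequence. The sketch below is a simplified model under stated assumptions: the primer is ignored, every position has the same chance of incorporating a ddNTP, and the gel separates fragments perfectly. The function names, the termination probability and the example template are hypothetical.

```python
import random

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def synthesize_fragments(template_3to5: str, n_molecules: int = 5000,
                         p_terminate: float = 0.1, seed: int = 0) -> list[tuple[int, str]]:
    """Simulate chain-terminated copies of the strand complementary to the template.

    The template is written 3'->5', so the new strand, read 5'->3', is its
    base-by-base complement. Returns (length, dye-labeled terminal base) pairs.
    """
    rng = random.Random(seed)
    new_strand = [COMPLEMENT[base] for base in template_3to5]
    fragments = []
    for _ in range(n_molecules):
        for position, base in enumerate(new_strand, start=1):
            if rng.random() < p_terminate:   # a labeled ddNTP was incorporated
                fragments.append((position, base))
                break                        # synthesis of this molecule stops
        # molecules that never incorporate a ddNTP carry no dye and are ignored
    return fragments


def read_gel(fragments: list[tuple[int, str]]) -> str:
    """Order fragments by size (shortest migrates fastest) and read the dye colors."""
    color_by_length = dict(fragments)        # equal lengths share a terminal base
    longest = max(color_by_length)
    return "".join(color_by_length.get(n, "N") for n in range(1, longest + 1))


if __name__ == "__main__":
    template = "TACGGATCCATG"                # made-up template, written 3'->5'
    print(read_gel(synthesize_fragments(template)))
```

Because every fragment of a given length ends in the same labeled base, reading the colors from the shortest fragment to the longest spells out the new strand from its 5′ end to its 3′ end.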
Genome Donors
In the IHGSC international public-sector Human Genome Project (HGP), researchers collected blood (female) or sperm (male) samples from a large number of donors. Only a few of many collected samples were processed as DNA resources. Thus the donor identities were protected so neither donors nor scientists could know whose DNA was sequenced.
DNA clones from many different libraries were used in the overall project, with most of those libraries being created by Dr. Pieter J. de Jong. It has been informally reported, and is well known in the genomics community, that much of the DNA for the public HGP came from a single anonymous male donor from Buffalo, New York (code name RP11) (Osoegawa et al., 2001).
HGP scientists used white blood cells from the blood of two male and two female donors (randomly selected from 20 of each), with each donor yielding a separate DNA library.
One of these libraries (RP11) was used considerably more than others, due to quality considerations. One minor technical issue is that male samples contain just over half as much DNA from the sex chromosomes (one X chromosome and one Y chromosome) compared to female samples (which contain two X chromosomes). The other 22 chromosomes (the autosomes) are the same for both genders.
Although the main sequencing phase of the HGP has been completed, studies of DNA variation continue in the International HapMap Project, whose goal is to identify patterns of single-nucleotide polymorphism (SNP) groups (called haplotypes, or “haps”).
The DNA samples for the HapMap came from a total of 270 individuals: Yoruba people in Ibadan, Nigeria; Japanese people in Tokyo; Han Chinese in Beijing; and the French Centre d'Etude du Polymorphisme Humain (CEPH) resource, which consisted of residents of the United States having ancestry from Western and Northern Europe.
In the Celera Genomics private-sector project, DNA from five different individuals was used for sequencing. The lead scientist of Celera Genomics at that time, Craig Venter, later acknowledged (in a public letter to the journal Science) that his DNA was one of 21 samples in the pool, five of which were selected for use (Kennedy, 2002 & Venter, 2003).
On September 4, 2007, a team led by Craig Venter published his complete DNA sequence (Levy, Sutton, Ng PC, et al., September 2007), unveiling the six-billion-nucleotide genome of a single individual for the first time.
Potential Benefits of Human Genome Project Research
Technology and resources generated by the Human Genome Project and other genomics research are already having a major impact on research across the life sciences. The potential for commercial development of genomics research presents U.S. industry with a wealth of opportunities, including sales of DNA-based products and technologies in the biotechnology industry (Consulting Resources Corporation Newsletter, Spring 1999).
The possibilities from the information that will be obtained from the project are virtually endless. It will most likely change many biological and medical research techniques and many of the practices used by our medical professionals today. The knowledge that will be obtained will help lead to new ways of diagnosing, treating, and possibly preventing diseases. Through the discovery of the human genome, the possibilities are endless for agriculture, health services, and new energy sources also. The end result of the HGP will be information about the structure, function and organization of DNA, as we know it today (Daniel Melaas, 1999).
The project will reap fantastic benefits for humankind, some that we can anticipate and others that will surprise us. Generations of biologists and researchers have been provided with detailed DNA information that will be key to understanding the structure, organization, and function of DNA in chromosomes. Genome maps of other organisms will provide the basis for comparative studies that are often critical to understanding more complex biological systems. Information generated and technologies developed are revolutionizing future biological explorations.
Some current and potential applications of genome research include:
· Energy sources and environmental applications
· Bioarchaeology, anthropology, evolution, and human migration
· DNA forensics (identification)
· Agriculture, livestock breeding, and bioprocessing
· Pharmaceutical products
(1) Molecular Medicine
· Improved diagnosis of disease.
· Earlier detection of genetic predispositions to disease.
· Gene therapy and control systems for drugs.
· Pharmacogenomics "custom drugs".
Technology and resources promoted by the Human Genome Project are starting to have profound impacts on biomedical research and promise to revolutionize the wider spectrum of biological research and clinical medicine. Increasingly detailed genome maps have aided researchers seeking genes associated with dozens of genetic conditions, including myotonic dystrophy, fragile X syndrome, neurofibromatosis types 1 and 2, inherited colon cancer, Alzheimer's disease, and familial breast cancer.
On the horizon is a new era of molecular medicine characterized less by treating symptoms and more by looking to the most fundamental causes of disease. Rapid and more specific diagnostic tests will make possible earlier treatment of countless maladies. Medical researchers also will be able to devise novel therapeutic regimens based on new classes of drugs, immunotherapy techniques, and avoidance of environmental conditions that may trigger disease, and possible augmentation or even replacement of defective genes through gene therapy.
(2) Energy and Environmental Applications
· Use microbial genomics research to create new energy sources (biofuels).
· Use microbial genomics research to develop environmental monitoring techniques to detect pollutants.
· Use microbial genomics research for safe, efficient environmental remediation.
· Use microbial genomics research for carbon sequestration.
In 1994, taking advantage of new capabilities developed by the genome project, DOE initiated the Microbial Genome Program to sequence the genomes of bacteria useful in energy production, environmental remediation, toxic waste reduction, and industrial processing.
A follow-on program, the Genomic Science Program (GSP), builds on data and resources from the Human Genome Project, the Microbial Genome Program, and systems biology. GSP will accelerate understanding of dynamic living systems for solutions to DOE mission challenges in energy and the environment.
Despite our reliance on the inhabitants of the microbial world, we know little of their number or their nature: estimates are that less than 0.01% of all microbes have been cultivated and characterized. Microbial genome sequencing will help lay a foundation for knowledge that will ultimately benefit human health and the environment. The economy will benefit from further industrial applications of microbial capabilities.
Information gleaned from the characterization of complete microbial genomes will lead to insights into the development of such new energy-related biotechnologies as photosynthetic systems, microbial systems that function in extreme environments and organisms that can metabolize readily available renewable resources and waste material with equal facility.
Expected benefits also include development of diverse new products, processes, and test methods that will open the door to a cleaner environment. Biomanufacturing will use nontoxic chemicals and enzymes to reduce the cost and improve the efficiency of industrial processes. Microbial enzymes have been used to bleach paper pulp, stone wash denim, remove lipstick from glassware, break down starch in brewing, and coagulate milk protein for cheese production.
In the health arena, microbial sequences may help researchers find new human genes and shed light on the disease-producing properties of pathogens.
Microbial genomics may also help find new energy sources through the sequencing of bacterial genomes. This could lead to discoveries that are useful in energy production, toxic waste reduction, and industrial processing.
Microbial genomics will also help pharmaceutical researchers gain a better understanding of how pathogenic microbes cause disease. Sequencing these microbes will help reveal vulnerabilities and identify new drug targets.
Gaining a deeper understanding of the microbial world also will provide insights into the strategies and limits of life on this planet. Data generated in this young program have helped scientists identify the minimum number of genes necessary for life and confirm the existence of a third major domain of life (the Archaea).
Additionally, the new genetic techniques now allow us to establish more precisely the diversity of microorganisms and identify those critical to maintaining or restoring the function and integrity of large and small ecosystems; this knowledge also can be useful in monitoring and predicting environmental change. Finally, studies on microbial communities provide models for understanding biological interactions and evolutionary history.
(3) Risk Assessment
· Assess health damage and risks caused by radiation exposure, including low-dose exposures.
· Assess health damage and risks caused by exposure to mutagenic chemicals and cancer-causing toxins.
· Reduce the likelihood of heritable mutations.
Understanding the human genome will have an enormous impact on the ability to assess risks posed to individuals by exposure to toxic agents. Scientists know that genetic differences make some people more susceptible and others more resistant to such agents. The genetic basis of such variability will directly address DOE's long term mission to understand the effects of low-level exposures to radiation and other energy-related agents, especially in terms of cancer risk.
Human Genome Project technologies also can help to assess health damage and risks caused by radiation exposure, including low-dose exposures. Furthermore, damage and risks caused by exposure to mutagenic chemicals and cancer-causing toxins also can be assessed. Consequently, the likelihood of heritable mutations can be reduced.
(4) Bioarchaeology, Anthropology, Evolution, and Human Migration
· Study evolution through germline mutations in lineages.
· Study migration of different population groups based on female genetic inheritance.
· Study mutations on the Y chromosome to trace lineage and migration of males.
· Compare breakpoints in the evolution of mutations with ages of populations and historical events.
The HGP can also be very useful for understanding human evolution and human migration. It may help scientists find out how humans have evolved and how humans are evolving today. It will also help us understand the common biology that we share with all life on earth. Comparing our genome with others may help reveal associations between diseases and certain traits. Comparative genomics between humans and other organisms such as mice has already led to the identification of similar genes associated with diseases and traits. Further comparative studies will help determine the as yet unknown function of thousands of other genes.
Comparing the DNA sequences of entire genomes of different microbes will provide new insights about relationships among the three domains of life: Archaea, Bacteria, and Eukarya.
(5) DNA Forensics (Identification)
· Identify potential suspects whose DNA may match evidence left at crime scenes.
· Exonerate persons wrongly accused of crimes.
· Identify crime and catastrophe victims.
· Establish paternity and other family relationships.
· Identify endangered and protected species as an aid to wildlife officials (could be used for prosecuting poachers).
· Detect bacteria and other organisms that may pollute air, water, soil, and food.
· Match organ donors with recipients in transplant programs.
· Determine pedigree for seed or livestock breeds.
· Authenticate consumables such as caviar and wine.
Any type of organism can be identified by examination of DNA sequences unique to that species. Identifying individuals is less precise, although when DNA sequencing technologies progress further, direct characterization of very large DNA segments, and possibly even whole genomes, will become feasible and practical and will allow precise individual identification.
To identify individuals, forensic scientists scan about 10 DNA regions that vary from person to person and use the data to create a DNA profile of that individual (sometimes called a DNA fingerprint). There is an extremely small chance that another person has the same DNA profile for a particular set of regions.
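The reason a coincidental match is so unlikely can be illustrated with the product rule: if the regions are inherited independently, the chance that an unrelated person matches at all of them is the product of the per-region genotype frequencies. The sketch below is only an illustration under that independence assumption; the function name and the frequencies (0.1 at each of 10 regions) are made up for the example and are not real population data.

```python
from math import prod


def random_match_probability(genotype_frequencies):
    """Probability that an unrelated person matches at every region.

    Assumes the regions are statistically independent, so the
    per-region genotype frequencies can simply be multiplied.
    """
    return prod(genotype_frequencies)


# Hypothetical example: 10 regions, each genotype shared by 10% of people.
frequencies = [0.1] * 10
rmp = random_match_probability(frequencies)
print(f"{rmp:.1e}")  # 1.0e-10, i.e. about 1 in 10 billion
```

Even with a fairly common genotype at each region, combining 10 regions gives a match probability far smaller than one in the world's population, which is why a small panel of variable regions is enough for a profile.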
(6) Agriculture, Livestock Breeding, and Bioprocessing
· Disease-, insect-, and drought-resistant crops.
· Healthier, more productive, disease-resistant farm animals.
· More nutritious produce.
· Edible vaccines incorporated into food products.
· New environmental cleanup uses for plants like tobacco.
Understanding plant and animal genomes will allow us to create stronger, more disease-resistant plants and animals, reducing the costs of agriculture and providing consumers with more nutritious, pesticide-free foods. Growers are already using bioengineered seeds to grow insect- and drought-resistant crops that require little or no pesticide. Farmers have been able to increase outputs and reduce waste because their crops and herds are healthier.
This technology could help develop disease-, insect-, and drought-resistant crops, making it possible to produce more food for the world. It would also help produce healthier, more productive, and possibly disease-resistant animals for market.
Alternate uses for crops such as tobacco have been found. One researcher has genetically engineered tobacco plants in his laboratory to produce a bacterial enzyme that breaks down explosives such as TNT and dinitroglycerin. Waste that would take centuries to break down in the soil can be cleaned up by simply growing these special plants in the polluted area.
(7) Variations and Mutations
· Scientists have identified about 1.4 million locations where single-base DNA differences (SNPs) occur in humans. This information promises to revolutionize the processes of finding chromosomal locations for disease-associated sequences and tracing human history.
· The ratio of germ line (sperm or egg cell) mutations is 2:1 in males vs. females. Researchers point to several reasons for the higher mutation rate in the male germ line, including the greater number of cell divisions required for sperm formation than for eggs.
(8) Gene Therapy
Much has been made of the possibilities of curing diseases caused by single abnormal genes. But despite over a decade of hype, the worldwide score for gene therapy is: Cures: 0, Deaths: at least 5, Serious Adverse Events: at least 1,000. Mapping the human genome is unlikely to help gene therapy move beyond being only occasionally effective, in a few rare diseases, at great cost.
(9) Pharmaceutical Products
It is claimed that precise knowledge of the proteins coded for by specific genes will allow the creation of drugs more accurately aimed at specific proteins. In diseases caused by a variety of genes, each individual would be tested to see which gene he/she carried, and a precisely tailored drug could then be prescribed. The problem is that most chronic diseases, affecting enough Western people for the pharmaceutical industry to be interested in them, are multifactorial. Not only are several genes involved in the etiology, but so is the environment. In the case of high blood pressure only about 30% of the disease burden is genetically determined.
There will be a relatively small number of instances in which testing in adult years will produce results that have an important influence on life decisions. For instance, a positive test for Huntington's disease presymptomatically might lead to a decision not to have children.
Deriving meaningful knowledge from the DNA sequence will define research through the coming decades to inform our understanding of biological systems. This enormous task will require the expertise and creativity of tens of thousands of scientists from varied disciplines in both the public and private sectors worldwide.
Yet we have also discovered that over 50% of the human genome is repetitive sequence that does not code for any proteins, and the function of this large portion of “junk” DNA is still puzzling scientists. Along similar lines, the HGP has shown us that the average expressed gene is about 3,000 bases long.
Genome sequence information has helped scientists more easily identify candidate disease genes; however, we also realize that over 50% of the genes discovered in the human genome are still classified as having unknown function.
Human genome sequence information reveals that genome sequences from person to person are almost (99.9%) identical. Interestingly, comparative genomics shows 95% sequence similarity between the human and chimpanzee genomes. Scientists are just beginning to understand how this small amount of variation contributes to differences in disease incidences in different populations. The discovery of about 3 million locations that have single base differences in the human genome (called single nucleotide polymorphisms or SNPs) offers insights into how genomic information could be used to discover information related to the incidence of common human traits, including susceptibility to certain diseases and illnesses.
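The figures in this paragraph fit together, as a rough calculation shows. The sketch below assumes a haploid genome of about 3 billion base pairs (the figure given earlier in this document) and treats the 99.9% identity as uniform across the genome, which is a simplification; the function name is chosen only for illustration.

```python
HAPLOID_GENOME_BP = 3_000_000_000  # approximate haploid genome size
PERSON_TO_PERSON_IDENTITY = 0.999  # 99.9% identical between two people


def expected_differences(genome_size_bp, identity):
    """Rough count of positions expected to differ between two genomes."""
    return round(genome_size_bp * (1 - identity))


differences = expected_differences(HAPLOID_GENOME_BP, PERSON_TO_PERSON_IDENTITY)
print(f"{differences:,}")  # 3,000,000
```

A 0.1% difference over 3 billion base pairs is therefore on the order of 3 million positions, in line with the number of SNP locations quoted above.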
The HGP has also shown us that the powerful methods of genome sequencing technology raise important ethical and policy issues for individuals and society. Access to genome sequence information, privacy related issues and the appropriate use of this sort of information are all important issues for researchers, governments, and policy makers worldwide.
For example, the worldwide SARS epidemic is now under control, and scientists have a better understanding of the disease, due in part to knowledge derived from genome sequencing. Less than one month passed between the isolation of the virus and the publication of the SARS genome sequence. This pace of research can be attributed directly to the technology and expertise that emerged from the HGP, and it illustrates how genome sequencing technology and bioinformatics will benefit our basic understanding of life and disease processes.
The draft sequence already is having an impact on finding genes associated with disease. A number of genes have been pinpointed and associated with breast cancer, muscle disease, deafness, and blindness. Additionally, finding the DNA sequences underlying such common diseases as cardiovascular disease, diabetes, arthritis, and cancers is being aided by the human variation maps (SNPs) generated in the HGP in cooperation with the private sector. These genes and SNPs provide focused targets for the development of effective new therapies.
One of the greatest impacts of having the sequence may well be in enabling an entirely new approach to biological research. In the past, researchers studied one or a few genes at a time. With whole-genome sequences and new high-throughput technologies, they can approach questions systematically and on a grand scale. They can study all the genes in a genome, for example, or all the transcripts in a particular tissue or organ or tumor, or how tens of thousands of genes and proteins work together in interconnected networks to orchestrate the chemistry of life.
The work on interpretation of genome data is still in its initial stages. It is anticipated that detailed knowledge of the human genome will provide new avenues for advances in medicine and biotechnology.
Clear practical results of the project emerged even before the work was finished. For example, a number of companies, such as Myriad Genetics, started offering easy ways to administer genetic tests that can show predisposition to a variety of illnesses, including breast cancer, disorders of hemostasis, cystic fibrosis, liver diseases, and many others.
Also, the etiologies for cancers, Alzheimer's disease and other areas of clinical interest are considered likely to benefit from genome information and possibly may lead in the long term to significant advances in their management.
There are also many tangible benefits for biological scientists. For example, a researcher investigating a certain form of cancer may have narrowed down his/her search to a particular gene.
By visiting the human genome database on the World Wide Web, this researcher can examine what other scientists have written about this gene, including (potentially) the three-dimensional structure of its product, its function(s), its evolutionary relationships to other human genes, or to genes in mice or yeast or fruit flies, possible detrimental mutations, interactions with other genes, body tissues in which this gene is activated, diseases associated with this gene or other datatypes.
The HGP has great potential to benefit society.
Fig 8: The Diversity of Genomic Applications to Society (Genomics holds promise for advances in fields ranging from medicine and agriculture all the way to energy production. This global impact is just beginning to be felt.)
A merging of cytogenetic approaches with the human genome sequence will continue to propel our understanding of human disease to an entirely new level. Thus, although it was met with skepticism at its inception, the Human Genome Project will certainly be heralded as one of the most important scientific endeavors of our time.
Further, a deeper understanding of disease processes at the level of molecular biology may point to new therapeutic procedures. Given the established importance of DNA in molecular biology and its central role in determining the fundamental operation of cellular processes, it is likely that expanded knowledge in this area will facilitate medical advances in numerous areas of clinical interest that would not have been possible without it.
The analysis of similarities between DNA sequences from different organisms is also opening new avenues in the study of evolution. In many cases, evolutionary questions can now be framed in terms of molecular biology; indeed, many major evolutionary milestones (the emergence of the ribosome and organelles, the development of embryos with body plans, the vertebrate immune system) can be related to the molecular level.
Many questions about the similarities and differences between humans and our closest relatives (the primates, and indeed the other mammals) are expected to be illuminated by the data from this project.
The Human Genome Diversity Project (HGDP), spinoff research aimed at mapping the DNA that varies between human ethnic groups, was rumored to have been halted but in fact continued and has since yielded new conclusions.
In the future, HGDP could possibly expose new data in disease surveillance, human development and anthropology. HGDP could unlock secrets behind and create new strategies for managing the vulnerability of ethnic groups to certain diseases. It could also show how human populations have adapted to these vulnerabilities.
Advantages of Human Genome Project:
1. Knowledge of the effects of DNA variation among individuals can revolutionize the ways we diagnose, treat, and even prevent a number of diseases that affect human beings.
2. It provides clues to the understanding of human biology.
Ethical, Legal and Social Issues
The project's goals included not only identifying all of the approximately 24,000 genes in the human genome but also addressing the ethical, legal, and social issues (ELSI) that might arise from the availability of genetic information. Five percent of the annual budget was allocated to address the ELSI arising from the project.
The U.S. Department of Energy (DOE) and the National Institutes of Health (NIH) devoted 3% to 5% of their annual Human Genome Project (HGP) budgets toward studying the ethical, legal, and social issues (ELSI) surrounding availability of genetic information. This represents the world's largest bioethics program, which has become a model for ELSI programs around the world.
Societal Concerns Arising from the New Genetics
Ø Fairness in the use of genetic information. This issue mainly concerns who should have access to genetic records and how they can be used. Those who might seek access include insurers, employers, courts, schools, and the military. If such agencies use this information, there could be discrimination based on genetic disorders, ranging from diseases that run in a family to mental disorders that a person cannot help.
- Who should have access to personal genetic information, and how will it be used?
Ø Privacy and confidentiality of genetic information. Many people would not want anyone else to see their genetic makeup. There are also concerns about the psychological effects of knowing one's own genetic makeup: someone who learns that they have a good chance of developing a rare disease would most likely change their outlook on life drastically. In reproduction, two individuals might learn that they are unlikely to have healthy children together, which would cause stress in many people's lives.
- Who owns and controls genetic information?
Ø Psychological impact and stigmatization due to an individual's genetic differences.
- How does personal genetic information affect an individual and society's perceptions of that individual?
- How does genomic information affect members of minority communities?
Ø Reproductive issues including adequate informed consent for complex and potentially controversial procedures, use of genetic information in reproductive decision making, and reproductive rights.
- Do healthcare personnel properly counsel parents about the risks and limitations of genetic technology?
- How reliable and useful is fetal genetic testing? What are the larger societal issues raised by new reproductive technologies?
Ø Use of gene therapy to treat disease. Using a person's genome to tell whether that person carries a genetic disease will help in treating these diseases. In gene therapy, a faulty gene is replaced with a normal gene so that the individual does not display the trait they were born with. Many people feel that this is wrong because it amounts to taking over the course of nature.
Ø Clinical issues including the education of doctors and other health service providers, patients, and the general public in genetic capabilities, scientific limitations, and social risks; and implementation of standards and quality-control measures in testing procedures.
- How will genetic tests be evaluated and regulated for accuracy, reliability, and utility? (Currently, there is little regulation at the federal level.)
- How do we prepare healthcare professionals for the new genetics?
- How do we prepare the public to make informed choices?
- How do we as a society balance current scientific limitations and social risk with long-term benefits?
Ø Uncertainties associated with gene tests for susceptibilities and complex conditions (e.g., heart disease) linked to multiple genes and gene-environment interactions.
- Should testing be performed when no treatment is available?
- Should parents have the right to have their minor children tested for adult-onset diseases?
- Are genetic tests reliable and interpretable by the medical community?
Ø Conceptual and philosophical implications regarding human responsibility, free will vs. genetic determinism, and concepts of health and disease.
- Do people's genes make them behave in a particular way?
- Can people always control their behavior?
- What is considered acceptable diversity?
- Where is the line between medical treatment and enhancement?
Ø Health and environmental issues concerning genetically modified (GM) foods and microbes.
- Are GM foods and other products safe to humans and the environment?
- How will these technologies affect developing nations' dependence on the West?
Ø Commercialization of products including property rights (patents, copyrights, and trade secrets) and accessibility of data and materials. If only a few agencies are working on the project, who will get the rights to the technology? The major concerns will most likely be over the patents and copyrights on the technology.
- Who owns genes and other pieces of DNA?
- Will patenting DNA sequences limit their accessibility and development into useful products?
Not all of these topics are directly related to the HGP. For example, forensics definitely raises some ethical concerns, but it is not directly related to the HGP, nor are gene testing, gene therapy, cloning, and behavioral genetics.
The use of sequencing will have a profound impact on the genetic screening of individuals. Medical professionals will be able to tell many things about a person just by looking at that person's genome.
Debra Harry, Executive Director of the U.S. group Indigenous Peoples Council on Biocolonialism (IPCB), says that despite a decade of ELSI funding, the burden of genetics education has fallen on the tribes themselves to understand the motives of the Human Genome Project and its potential impacts on their lives.
Meanwhile, the government has been busily funding projects studying indigenous groups without any meaningful consultation with the groups (Harry, 2001).
The main criticism of ELSI is its failure to address the conditions raised by population-based research, especially with regard to unique processes for group decision-making and cultural worldviews. Genetic variation research such as the HGP is group population research, but most ethical guidelines, according to Harry, focus on individual rights instead of group rights. She says the research represents a clash of cultures: indigenous peoples' lives revolve around collectivity and group decision-making, whereas Western culture promotes individuality. Harry suggests that one of the challenges of ethical research is to include respect for collective review and decision-making while also upholding the Western model of individual rights.
There are also critics of the HGP who contend that the high cost of the project is not justified. Some critics also say that the ability to diagnose a genetic disorder before any treatment is available causes more harm than good, because it will create anxiety and frustration among individuals. There is also the very big question of what is "normal". When and where will genetic material be used in society once the HGP is finished?
Summary
Within a span of only 13 years, an amalgam of public and private researchers was able to successfully complete the Human Genome Project. Although these scientists used a number of different methods in their work, they nonetheless obtained the same results. In doing so, the researchers not only silenced their critics, but they also beat their own estimated project timeline by two entire years. Perhaps even more importantly, these scientists inspired an ongoing revolution in our fight against human disease and provided a new vision of the future of medicine, although that future has yet to be fully realized.
References
1) Adams, MD., et al. (2000) The genome sequence of Drosophila melanogaster, Science 287 (5461): 2185–2195.
3) C. elegans Sequencing Consortium (1998) Genome sequence of the nematode C. elegans: A platform for investigating biology, Science 282 (5396): 2012- 18.
4) Chial, H. (2008) DNA sequencing technologies key to the Human Genome Project, Nature Education 1(1).
7) Fiers, W., Contreras, R., Duerinck F., et al. (April 1976) Complete nucleotide sequence of bacteriophage MS2 RNA: primary and secondary structure of the replicase gene, Nature 260 (5551): 500–7.
8) Fleischmann, R. D., et al. (1995) Whole-genome random sequencing and assembly of Haemophilus influenzae Rd, Science 269 (5223): 496–512.
9) Fox, J. (Aug 2003) The Human Genome Project: The impact of human genome sequencing on human health, Science 300(5624): 1399-404.
13) Harry (January/April 2001) Biopiracy and Globalization: Indigenous Peoples face a new wave of Colonialism, Splice 7 (2 & 3).
14) Hood, L., & Galas, D. (2003) The digital code of DNA. Nature 421: 444–448.
19) International Human Genome Sequencing Consortium, Finishing the euchromatic sequence of the human genome (2004) Nature 431: 931–945.
21) Kennedy, D. (2002) Not wicked, perhaps, but tacky, Science 297 (5585): 1237.
24) Marshall, Elliot (27 Sept 1996) Whose Genome Is It Anyway? Science 273: 1788-1789.
25) Marshall, Elliot (16 July, 1999) Commercial Firms Win U.S. Sequencing Funds. Science 285: 310.
26) Marshall, Elliot (1998-2003) New Goals for the U.S. Human Genome Project, Science 282: 682-688.
27) Melaas, D. (1999) Human genome Project.
28) Nature (15-02-2001) Issue 409(6822).
29) Nature Publishing Group, IHGSC, Initial sequencing and analysis of the human genome (2001) Nature 409: 860-921.
32) Osoegawa, Kazutoyo, Mammoser, A.G., Wu, C., Frengen, E., Zeng, C., Catanese, J.J., De Jong, P.J. (2001) A Bacterial Artificial Chromosome Library for Sequencing the Complete Human Genome, Genome Research 11 (3): 483–96. http://www.genome.org/cgi/content/full/11/3/483.
34) Sanger, F., Air, GM., Barrell, BG., et al. (February 1977) Nucleotide sequence of bacteriophage phi X174 DNA, Nature 265 (5596): 687–95.
35) Science (16-02-2001) Issue 291(5507).
36) Science (30 May 2003) 300(5624): 1399-404.
40) Venter, J. C., et al. (2001) The sequence of the human genome, Science 291: 1304–1351.
41) Venter, J. C. (2003) A Part of the Human Genome Sequence, Science 299 (5610): 1183–4.