Genes

Each DNA molecule contains many genes--the basic physical and functional units of heredity. A gene is a specific sequence of nucleotide bases that carries the information required for constructing proteins, which provide the structural components of cells and tissues as well as enzymes for essential biochemical reactions. The human genome is estimated to comprise approximately 80,000 genes.

Human genes vary widely in length, often extending over thousands of bases, but only about 10% of the genome is known to include the protein-coding sequences (exons) of genes. Interspersed within many genes are intron sequences, which have no coding function. The balance of the genome is thought to consist of other noncoding regions (such as control sequences and intergenic regions), whose functions are obscure.

All living organisms are composed largely of proteins; humans can synthesize about 80,000 different kinds. Proteins are large, complex molecules made up of long chains of subunits called amino acids. Twenty different kinds of amino acids are usually found in proteins.

Within the gene, each specific sequence of three DNA bases (a codon) directs the cell's protein-synthesizing machinery to add a specific amino acid. For example, the base sequence ATG codes for the amino acid methionine. Since 3 bases code for 1 amino acid, the protein coded by an average-sized gene (3000 bp) will contain 1000 amino acids. The genetic code is thus a series of codons that specify which amino acids are required to make up specific proteins.
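The codon-to-amino-acid mapping described above can be sketched in a few lines of Python. This is a minimal illustration, not a full translator: the table lists only a handful of the 64 codons of the standard genetic code, and the names CODON_TABLE, split_codons, translate, and max_amino_acids are invented for this example.

```python
# Partial codon table (standard genetic code, written in DNA letters).
# Only a few of the 64 codons are listed to keep the sketch short.
CODON_TABLE = {
    "ATG": "Met",  # methionine, the example given in the text
    "TTT": "Phe", "TTC": "Phe",
    "GGA": "Gly", "GGC": "Gly",
    "AAA": "Lys", "AAG": "Lys",
    "TGG": "Trp",
    "TAA": "Stop", "TAG": "Stop", "TGA": "Stop",
}


def split_codons(dna):
    """Split a coding sequence into consecutive three-base codons."""
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return [dna[i:i + 3] for i in range(0, usable, 3)]


def translate(dna):
    """Map each codon to an amino acid, stopping at the first stop codon."""
    protein = []
    for codon in split_codons(dna.upper()):
        residue = CODON_TABLE.get(codon, "???")  # codon missing from the partial table
        if residue == "Stop":
            break
        protein.append(residue)
    return protein


def max_amino_acids(coding_bases):
    """Upper bound on protein length: 3 bases per amino acid."""
    return coding_bases // 3


print(translate("ATGTTTGGAAAATGGTAA"))  # ['Met', 'Phe', 'Gly', 'Lys', 'Trp']
print(max_amino_acids(3000))            # 1000
```

The second call reproduces the arithmetic in the text: 3000 bases divided by 3 bases per codon gives 1000 amino acids.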

From Genes to Proteins

The protein-coding instructions from the genes are transmitted indirectly through messenger ribonucleic acid (mRNA), a transient intermediary molecule similar to a single strand of DNA. For the information within a gene to be expressed, a complementary RNA strand is produced (a process called transcription) from the DNA template in the nucleus. This mRNA is moved from the nucleus to the cellular cytoplasm, where it serves as the template for protein synthesis. The cell's protein-synthesizing machinery then translates the codons into a string of amino acids that will constitute the protein molecule for which the mRNA codes. In the laboratory, the mRNA molecule can be isolated and used as a template to synthesize a complementary DNA (cDNA) strand, which can then be used to locate the corresponding genes on a chromosome map.
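As a rough illustration of the base pairing behind transcription and cDNA synthesis, the following Python sketch builds a complementary strand from a template. It models pairing only (A with U or T, G with C) and ignores everything else the cell or the laboratory enzyme does. The function names and the toy sequence are assumptions made for this example; every strand is written 5' to 3'.

```python
# Base pairing used when an RNA strand is built on a DNA template.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}
# Base pairing used when a cDNA strand is built on an mRNA template.
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}


def transcribe(template_dna):
    """Return the mRNA complementary to a DNA template strand.

    Complementary strands run in opposite directions, so the template is
    read in reverse to report the mRNA in the usual 5'-to-3' order.
    """
    return "".join(DNA_TO_RNA[base] for base in reversed(template_dna.upper()))


def reverse_transcribe(mrna):
    """Return the cDNA strand complementary to an mRNA."""
    return "".join(RNA_TO_DNA[base] for base in reversed(mrna.upper()))


template = "TTAAAACAT"           # toy template strand (hypothetical)
mrna = transcribe(template)
print(mrna)                      # AUGUUUUAA
print(reverse_transcribe(mrna))  # TTAAAACAT
```

In this toy case the cDNA has the same sequence as the original template strand, which is why a cDNA can be matched back to the gene it came from.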

Chromosomes

The 3 billion bp in the human genome are organized into 24 distinct, physically separate microscopic units called chromosomes. All genes are arranged linearly along the chromosomes. The nucleus of most human cells contains 2 sets of chromosomes, 1 set given by each parent. Each set has 23 single chromosomes--22 autosomes and an X or Y sex chromosome. (A normal female will have a pair of X chromosomes; a male will have an X and Y pair.)
Chromosomes contain roughly equal parts of protein and DNA; chromosomal DNA contains an average of 150 million bases. DNA molecules are among the largest molecules now known.

Chromosomes can be seen under a light microscope and, when stained with certain dyes, reveal a pattern of light and dark bands reflecting regional variations in the amounts of A and T vs G and C. Differences in size and banding pattern allow the 24 chromosomes to be distinguished from each other, an analysis called a karyotype. A few types of major chromosomal abnormalities, including missing or extra copies of a chromosome or gross breaks and rejoinings (translocations), can be detected by microscopic examination; Down syndrome, in which an individual's cells contain a third copy of chromosome 21, is diagnosed by karyotype analysis.
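The idea of regional variation in A and T vs G and C content can be made concrete with a short Python sketch that measures the G+C fraction in consecutive windows of a sequence. It only illustrates the quantity that varies along a chromosome and is not a model of staining or banding. The window size, the toy sequence, and the function names are arbitrary choices for this example.

```python
def gc_fraction(seq):
    """Fraction of bases in seq that are G or C (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def windowed_gc(seq, window):
    """G+C fraction of each consecutive, non-overlapping window of seq."""
    return [gc_fraction(seq[i:i + window]) for i in range(0, len(seq), window)]


# Toy sequence: an AT-only region, a GC-only region, then a mixed region.
toy = "ATATATATAT" + "GCGCGCGCGC" + "ATGCATGCAT"
print(windowed_gc(toy, 10))  # [0.0, 1.0, 0.4]
```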

Most changes in DNA, however, are too subtle to be detected by this technique and require molecular analysis. These subtle DNA abnormalities (mutations) are responsible for many inherited diseases such as cystic fibrosis and sickle cell anemia or may predispose an individual to cancer, major psychiatric illnesses, and other complex diseases.
 

TERMINOLOGY DEFINITIONS


 
Gene: The fundamental physical and functional unit of heredity. A gene is an ordered sequence of nucleotides located in a particular position on a particular chromosome that encodes a specific functional product (i.e., a protein or RNA molecule).
 

Gene expression: The process by which a gene's coded information is converted into the structures present and operating in the cell. Expressed genes include those that are transcribed into mRNA and then translated into protein and those that are transcribed into RNA but not translated into protein (e.g., transfer and ribosomal RNAs).

Gene family: Group of closely related genes that make similar products.
 

DNA (deoxyribonucleic acid): The molecule that encodes genetic information. DNA is a double-stranded molecule held together by weak bonds between base pairs of nucleotides.
 

Genetic code: The sequence of nucleotides, coded in triplets (codons) along the mRNA, that determines the sequence of amino acids in protein synthesis. The DNA sequence of a gene can be used to predict the mRNA sequence, and the genetic code can in turn be used to predict the amino acid sequence.
 

Gene mapping: Determination of the relative positions of genes on a DNA molecule (chromosome or plasmid) and of the distance, in linkage units or physical units, between them.
 

Gene product: The biochemical material, either RNA or protein, resulting from expression of a gene. The amount of gene product is used to measure how active a gene is; abnormal amounts can be correlated with disease-causing alleles.
 

Genetics: The study of the patterns of inheritance of specific traits.
 

Genome: All the genetic material in the chromosomes of a particular organism; its size is generally given as its total number of base pairs.
 

Chromosome: The self-replicating genetic structure of cells containing the cellular DNA that bears in its nucleotide sequence the linear array of genes. In prokaryotes, chromosomal DNA is circular, and the entire genome is carried on one chromosome. Eukaryotic genomes consist of a number of chromosomes whose DNA is associated with different kinds of proteins.
 

Mitosis: The process of nuclear division in cells that produces daughter cells that are genetically identical to each other and to the parent cell.
 

Meiosis: The process of two consecutive cell divisions in the diploid progenitors of sex cells. Meiosis results in four rather than two daughter cells, each with a haploid set of chromosomes.
 

Mutation: Any heritable change in DNA sequence.
 

Nucleus: The cellular organelle in eukaryotes that contains the genetic material.
 

Physical map: A map of the locations of identifiable landmarks on DNA (e.g., restriction enzyme cutting sites, genes), regardless of inheritance. Distance is measured in base pairs.
 

Protein: A large molecule composed of one or more chains of amino acids in a specific order; the order is determined by the base sequence of nucleotides in the gene coding for the protein. Proteins are required for the structure, function, and regulation of the body's cells, tissues, and organs, and each protein has unique functions. Examples are hormones, enzymes, and antibodies.
 

Recombination: The process by which progeny derive a combination of genes different from that of either parent. In higher organisms, this can occur by crossing over.
 

Regulatory region or sequence: A DNA base sequence that controls gene expression.
 

Sequencing: Determination of the order of nucleotides (base sequences) in a DNA or RNA molecule or the order of amino acids in a protein.
 

Sex chromosome: The X or Y chromosome in human beings that determines the sex of an individual. Females have two X chromosomes in diploid cells; males have an X and a Y chromosome. The sex chromosomes comprise the 23rd chromosome pair in a karyotype.
 

Transcription: The synthesis of an RNA copy from a sequence of DNA (a gene); the first step in gene expression.
 

Exogenous DNA: DNA originating outside an organism.
 

Transformation: A process by which the genetic material carried by an individual cell is altered by incorporation of exogenous DNA into its genome.
 
 

Translation: The process in which the genetic code carried by mRNA directs the synthesis of proteins from amino acids. Compare transcription.
 

Virus: A noncellular biological entity that can reproduce only within a host cell. Viruses consist of nucleic acid covered by protein; some animal viruses are also surrounded by membrane. Inside the infected cell, the virus uses the synthetic capability of the host to produce progeny virus.
 

Clone: A group of cells derived from a single ancestor.
 

Cloning: The process of asexually producing a group of cells (clones), all genetically identical, from a single ancestor. In recombinant DNA technology, the use of DNA manipulation procedures to produce multiple copies of a single gene or segment of DNA is referred to as cloning DNA.
 
 

Implementing the HGP: Goals 1998-2003


In September 1998, advisory committees at DOE and NIH approved new 5-year goals aimed at completing the Human Genome Project (HGP) 2 years earlier than originally planned in 1990. The target date of 2003 also will mark the 50th anniversary of Watson and Crick's description of DNA's fundamental structure.

The new plan was published in the October 23, 1998, issue of Science, which also cited the contributions of international partners. These partners include the Sanger Centre in the United Kingdom and research centers in Germany, Japan, and France.

The U.S. HGP began officially in 1990 as a $3-billion, 15-year program to find the estimated 80,000 human genes and determine the sequence of the 3 billion DNA building blocks that underlie all of human biology and its diversity. The early phase of the HGP was characterized by efforts to create the biological, instrumentation, and computing resources necessary for efficient production-scale DNA sequencing. The first 5-year plan was revised in 1993 due to remarkable technological progress, and the second plan projected goals through FY 1998. The latest plan was developed during a series of individual and joint DOE and NIH workshops held over the past 2 years.

Observers have predicted that the 21st century will be the "biology century." The analytical power arising from the reference DNA sequences of several entire genomes and other genomic resources is anticipated to help jump-start the new millennium.

Human DNA Sequencing

The HGP's continued emphasis is on obtaining a complete and highly accurate reference sequence (1 error in 10,000 bases) that is largely continuous across each human chromosome. Scientists believe that knowing this sequence is critically important for understanding human biology and for applications to other fields.
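To put the accuracy target in perspective, the short Python calculation below applies a rate of 1 error in 10,000 bases to the roughly 3 billion bases of the genome. It is back-of-the-envelope arithmetic only, and it assumes errors are spread evenly, which real sequence data need not satisfy. The constant and function names are invented for this example.

```python
GENOME_SIZE_BP = 3_000_000_000  # approximate size of the human genome
BASES_PER_ERROR = 10_000        # accuracy target: 1 error in 10,000 bases


def expected_errors(genome_size_bp, bases_per_error):
    """Number of errors expected if they occur at exactly the target rate."""
    return genome_size_bp // bases_per_error


def percent_accuracy(bases_per_error):
    """Target accuracy expressed as a percentage of correct bases."""
    return 100 * (bases_per_error - 1) / bases_per_error


print(expected_errors(GENOME_SIZE_BP, BASES_PER_ERROR))  # 300000
print(percent_accuracy(BASES_PER_ERROR))                 # 99.99
```

Even at 99.99% accuracy, a sequence of this length would therefore be expected to contain on the order of 300,000 errors.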

A March 1999 update of the October 1998 plan calls for generating a "working draft" of the human genome DNA sequence by the spring of 2000--accelerating the efforts of the 1998 plan, which called for a draft by December 2001. The working draft will comprise shotgun sequence data from mapped clones, with gaps and ambiguities unresolved. If these data sets can be merged with those from the private sector, they may increase the depth of the mapped draft, which scientists expect will contain about half the genes. The draft sequence will provide a foundation for obtaining the high-quality finished sequence and also will be a valuable tool for researchers hunting disease genes.