The Language of Human Genetics
The science of genetics required a whole new language to explain the observations made by researchers. Many of these terms were coined in the early part of the twentieth century as genetic research flourished. The terms described below are as relevant to medical genetics as they were to the fly researchers who gave them their meaning.
The two copies of a given gene in an individual may be identical, in which case the instructions to the cell are identical. If the two genes are not identical, the instructions to the cell may differ. One set of gene instructions may dominate the other. Not surprisingly, such a gene is called dominant, while the other gene is called recessive. Recessive genes lie dormant in the cell but are passed on in egg and sperm just as dominant genes are. Recessive genes "speak" only when there is no dominant version of a gene present, that is, when the genes are both recessive. If the instructions from the two genes are not identical but both "speak," the genes are called co-dominant.
A person is called homozygous, or a homozygote, for a given gene if the two genes of a pair are identical. The genes may be dominant or recessive. If the two genes differ in any way, the individual is called heterozygous, or a heterozygote. Different versions of the same gene are called alleles. In a homozygous individual, the two alleles are identical. In a heterozygous individual, the alleles are different.
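The vocabulary above can be turned into a few lines of Python. This is only a minimal sketch: the function names are made up for illustration, and the example uses the ABO blood group (where A and B are co-dominant and O is recessive), which is a well-known case but is not discussed in this article.

```python
def zygosity(allele_1, allele_2):
    """Return 'homozygous' if the two alleles are identical, else 'heterozygous'."""
    return "homozygous" if allele_1 == allele_2 else "heterozygous"


def speaking_alleles(allele_1, allele_2, dominant):
    """Return the set of alleles that "speak" for one gene pair.

    dominant: the set of alleles treated as dominant (an assumption
    supplied by the caller, not something the code can discover).
    """
    pair = {allele_1, allele_2}
    speaking = pair & dominant
    # A recessive allele speaks only when no dominant allele is present.
    return speaking if speaking else pair


# Illustrative assumption: ABO blood group, A and B dominant over O.
ABO_DOMINANT = {"A", "B"}

print(zygosity("A", "O"), speaking_alleles("A", "O", ABO_DOMINANT))  # dominant A masks O
print(zygosity("O", "O"), speaking_alleles("O", "O", ABO_DOMINANT))  # recessive O speaks
print(zygosity("A", "B"), speaking_alleles("A", "B", ABO_DOMINANT))  # co-dominant: both speak
```

Real traits are often messier than this two-category model suggests, so the sketch is best read as a way to keep the terms straight.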
Genetics and Human Health
As discussed earlier, the basic principles of genetics were discovered in experiments that looked at easily observed characteristics, such as flower color in peas and eye color in fruit flies. Human genetics has followed the same course, focusing on the inheritance of readily seen traits, such as the color of urine or the occurrence in families of serious diseases.
It had been known since ancient times that certain conditions "ran in families," but real human genetics had to await the rediscovery of Mendel's work for progress to be made. The English physician Archibald Garrod suggested in 1902 that certain diseases were due to "inborn errors of metabolism." By this he meant that defects in the way the body processed specific chemicals had a genetic cause. The main evidence for this was his observation that the same disease clustered in certain families. His insight has been borne out. Many disease conditions in people are due to the inheritance of defective genes that lead to the production of defective enzymes and other proteins. But identifying specific genes that caused disease was a slow process.
In the mid-1950s, protein studies identified abnormal hemoglobin proteins in patients with sickle-cell anemia. Scientists used the protein sequence to deduce the DNA sequence and determined that the hemoglobin abnormalities were due to a simple chemical change in the DNA code (there is a "T" where there is normally an "A"). In the 1970s, with the development of DNA sequencing methods, sickle-cell anemia became the first genetic disease for which the underlying defect in the DNA itself was identified.
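The one-letter change described above can be illustrated in Python. The article gives only the A-to-T swap; the specific three-letter "word" used here (GAG becoming GTG, which changes the amino acid glutamic acid to valine in the hemoglobin protein) is the commonly cited form of the sickle-cell change and is added as background. The function name and the tiny lookup table are illustrative assumptions, not a full genetic code.

```python
# Only the two codons needed for this example; a real table has 64 entries.
CODON_TO_AMINO_ACID = {
    "GAG": "glutamic acid",
    "GTG": "valine",
}


def substitute_base(codon, position, new_base):
    """Return a copy of codon with the letter at position replaced by new_base."""
    return codon[:position] + new_base + codon[position + 1:]


normal_codon = "GAG"
sickle_codon = substitute_base(normal_codon, 1, "T")  # the middle A becomes T

print(normal_codon, "->", CODON_TO_AMINO_ACID[normal_codon])
print(sickle_codon, "->", CODON_TO_AMINO_ACID[sickle_codon])
```

A single changed letter is enough to change one building block of the protein, which is why protein studies could point researchers to the DNA change before DNA sequencing existed.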
It was not until relatively recently that such basic knowledge as the correct chromosome count for humans was determined. In 1956, the number of chromosomes in humans was established as 23 pairs, or 46 chromosomes. Improvements in techniques for observing chromosomes allowed scientists to determine in 1959 that three congenital disorders, Down syndrome, Klinefelter syndrome, and Turner syndrome, were due to abnormal chromosome numbers.
The field of medical genetics expanded as more and more inherited disorders were described. In the early 1960s, Victor McKusick, hailed as the father of medical genetics, began his ongoing project of cataloging the various disorders in his book, Mendelian Inheritance in Man. The work, which is available online, now contains over 11,000 entries and fills 11 printed books.
In the 1980s, a new technique revolutionized medical genetics. Rather than linking known traits to new traits, variants in the DNA could be used to create a rough map of the human genome. These techniques allowed scientists in 1987 to identify the gene for Duchenne type muscular dystrophy and determine its precise location on the X chromosome. Prior to this breakthrough, scientists did not know which gene was responsible for Duchenne type muscular dystrophy, although they did know that it was located on the X chromosome. Subsequently, gene defects responsible for causing cystic fibrosis, Huntington's disease, breast cancer, and other diseases have been identified. So far, over 500 specific gene defects that cause human disease have been identified.
Many important questions remain about the human genome. The first question is "How many genes are there?" Initial estimates for this number ranged from 70,000 to 150,000. A recent analysis puts the value much lower: 30,000 to 40,000. The Human Genome Project is a large federal research program, begun in 1990, which was organized to sequence the entire human genome and discover all human genes. The project passed a significant milestone when the rough draft of the human genome was announced in mid-2000.
The sequence of the human genome is just the beginning. Knowing the sequence of a gene does not necessarily reveal its function. Years of work remain before all the secrets of human DNA are fully revealed.
The Human Genome Project
The Human Genome Project is an international effort under way since 1990. It has several minor goals and the following two major goals:
- Sequence all of the approximately 3 billion base pairs ("DNA letters") in human DNA.
- Identify all 30 to 40 thousand genes in human DNA. (The complete set of human DNA, including these genes, is called the "human genome.")
These goals sound complicated, but the ideas behind them are simple. You only have to know the difference between letters and words, as we explain below.
Goal 1: Sequencing DNA
Your DNA is like a mammoth encyclopedia that describes you. An encyclopedia written in the English language uses 26 letters over and over again. DNA uses the equivalent of 4 letters over and over again: A, C, G, and T.
An encyclopedia may have 30 million letters. There are about 3 billion DNA letters (technical name: "base pairs") in human DNA. You can picture these letters as pearls on a string, each one labelled with an A, C, G, or T.
The first goal of the Human Genome Project is to list all of these DNA letters -- in order. Special machines do this work, and they work fast. If it took one second for each letter, the project would take 95 years!
Goal 2: Finding Genes
Once all the DNA letters have been listed, the next step is figuring out where the genes are. A gene is a group of DNA letters that tells the body how to make a protein molecule.
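The 95-year figure quoted for listing every DNA letter can be checked with a little arithmetic. The short Python sketch below uses the article's own numbers (3 billion DNA letters, one second per letter, 30 million letters per encyclopedia); the variable names and the choice of a 365.25-day year are assumptions made for the example.

```python
DNA_LETTERS = 3_000_000_000          # about 3 billion base pairs
ENCYCLOPEDIA_LETTERS = 30_000_000    # about 30 million letters
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60  # assumes an average year of 365.25 days

# Reading one letter per second, without ever stopping.
years_to_read = DNA_LETTERS / SECONDS_PER_YEAR
encyclopedias = DNA_LETTERS / ENCYCLOPEDIA_LETTERS

print(round(years_to_read))   # 95
print(round(encyclopedias))   # 100
```

In other words, human DNA holds about as many letters as 100 encyclopedias, and reading them one per second around the clock would take most of a century.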
Finding genes in DNA letters is a challenge similar to finding words in an encyclopedia that has no spacing between words and no punctuation marks. Suppose, for example, you ran across this sequence of letters in your non-punctuated encyclopedia:
fourscoreandsevenyearsago
With your knowledge of the English language, you could identify the following English words in the sequence:
fours core and seven years ago
But, of course, this is not right. Using additional knowledge -- American history -- you could rearrange the word spacing to be:
four score and seven years ago
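The word-finding puzzle above is easy to try on a computer. The following Python sketch lists every way to split a run of letters into words from a small word list. The function name and the eight-word vocabulary are assumptions chosen for this example; a real dictionary would produce many more candidate splits.

```python
def segmentations(text, vocabulary):
    """Return every way to split text into a sequence of words from vocabulary."""
    if not text:
        return [[]]  # one way to split nothing: no words at all
    results = []
    for end in range(1, len(text) + 1):
        word = text[:end]
        if word in vocabulary:
            # Keep this word and try to split whatever letters remain.
            for rest in segmentations(text[end:], vocabulary):
                results.append([word] + rest)
    return results


# Illustrative word list, just large enough to show the ambiguity.
VOCABULARY = {"four", "fours", "score", "core", "and", "seven", "years", "ago"}

for words in segmentations("fourscoreandsevenyearsago", VOCABULARY):
    print(" ".join(words))
```

With this word list the program finds both readings, "four score and seven years ago" and "fours core and seven years ago". The dictionary alone cannot say which one is right; that takes the extra knowledge the text describes.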
The situation is similar for scientists. They look at very long sequences of DNA letters that have no punctuation.
Scientists use different kinds of knowledge, such as biology and statistics, to identify where genes start and stop within the sequence of DNA letters. (Actually, they program this knowledge into computers that do the analysis.) For example, the three letters TAA, when they fall in the right position, can signal the end of a gene's protein-making instructions. Two other three-letter groups, TAG and TGA, do the same job.
A gene may have hundreds or thousands of DNA letters, so the problem is not simple, even for a computer. This is one reason the Human Genome Project would have been impossible to do just 20 years ago. It also shows that the Human Genome Project is just as much a computer project as it is a biology project.
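To give a feel for the computer side of gene finding, here is a toy Python sketch. It assumes the standard three-letter signals used in most organisms: ATG marks a possible start, and TAA, TAG, or TGA marks a stop. It reads the letters three at a time and reports stretches that begin with a start signal and run to the first stop signal. The function name, the minimum-length setting, and the short test sequence are all made up for illustration. Real gene-finding programs also check the opposite DNA strand, handle interruptions within genes, and lean heavily on statistics.

```python
START_CODON = "ATG"
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_open_reading_frames(dna, min_codons=2):
    """Return (start, end) positions of stretches that begin with ATG and
    run to the first stop codon read in the same three-letter frame.

    min_codons: smallest number of codons before the stop to report.
    Only the given strand is scanned; this is a deliberate simplification.
    """
    dna = dna.upper()
    found = []
    for frame in range(3):  # three possible ways to group letters in threes
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == START_CODON:
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    found.append((start, i + 3))
                start = None  # look for the next start signal
    return sorted(found)


sequence = "CCATGGCTTAAGG"  # made-up sequence for the example
for start, end in find_open_reading_frames(sequence):
    print(start, end, sequence[start:end])  # 2 11 ATGGCTTAA
```

Even this toy version has to try three different ways of grouping the letters, which hints at why the full problem, with billions of letters, demands serious computing power.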