
By Rhianna Wisniewski
Now that the human genome and other mammalian genomes have been sequenced, scientists can begin gleaning insights from the copious genetic information. Lawrence Schook, a professor of pathobiology and animal sciences at the University of Illinois at Urbana-Champaign and a faculty fellow at NCSA, is mining the data in an effort to identify exposure to disease at the genetic level and to predict the outcome of an infection.
Schook says that while infectious diseases have always been a threat, the rapid movement of people into urban areas and their increased mobility around the globe allow diseases to spread more rapidly, easily, and extensively.
"As we begin to move people and animals into more intensely populated areas, we increase contact and no longer have that natural barrier," he said.
The head of the U.S. Centers for Disease Control and Prevention agreed when she recently referred to the global reach of emerging infectious diseases such as SARS (severe acute respiratory syndrome) and monkeypox as "the new normal."
"We're not going to be able to go back to the good old days because world travel has become so commonplace," Dr. Julie Gerberding said in a June address to the American Medical Association.
Schook's research is designed to meet the challenges of this "new normal" environment.
"We are looking at how we can take new genomic tools to assess the damage to date and make predictions," he said.
Traditional diagnostic methods take time. After an animal or person is exposed to an infectious agent, symptoms may not appear for days or even weeks. In this asymptomatic phase, the only way to determine whether an infection is brewing is to see whether the body is producing antibodies to fight the invading germs, and antibodies may take days or weeks to make their appearance.
Under these circumstances, quarantine is one of the few techniques available to curtail the spread of an infectious disease, and it has obvious shortcomings. Sequestering large numbers of animals or people is cumbersome, complicated, and expensive. Economic havoc arises when quarantine is deemed insufficient and "culling" of livestock becomes the next, drastic step. During the outbreak of the highly contagious hoof-and-mouth disease in Great Britain in 2001, thousands of animals that still appeared healthy were killed to prevent the spread of the disease, resulting in tremendous economic loss for the nation's agricultural industry.
Quarantine is utterly ineffective for diseases, such as monkeypox, in which an animal (for example, the prairie dog) is merely a carrier of a disease and is not made ill by the infectious agent.
Schook believes there is a better way. He believes that by mining genetic data, researchers can map the pathway of disease, identifying infections before the body launches a counterattack of antibodies and before visible symptoms have begun. Instead of relying on quarantines, health care workers could quickly determine which people or animals were at the first stage on the pathway of infection.
Schook is looking for the subtle changes an infection sparks in the body's gene expression, the process by which a gene's coded information is converted into the structures operating in a cell. Because the cells infected with a disease are often not readily accessible for testing (lung tissue in SARS, for example), Schook hopes to find changes in "sentinel cells" in the blood or saliva. The key question is, what happens in the body's cells between exposure and the appearance of disease?
"I want to be able to ask, can I take a sentinel sample and predict what is going on inside the organ?" Schook said.
This research will involve analyzing mountains of data: genetic samples of healthy humans and animals, samples taken immediately after exposure to an infectious agent, samples taken a day after exposure, two days after exposure, 10 days after exposure, and so on.
To tackle this analysis challenge, Schook has teamed with Michael Welge, the director of NCSA's Automated Learning Group (ALG). Welge introduced him to the ALG-created data-mining framework called D2K (data to knowledge). D2K is a rapid, flexible data-mining system that integrates hundreds of modules performing both common and unique data-mining functions. These modules can, among other things, clean the data sets and prepare them for computations, search for patterns, make predictions, identify unusual features, and visualize the data for further analysis. D2K allows users to easily connect these modules to form applications tailored to their needs.
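The idea of connecting small modules into a tailored application can be illustrated with a minimal Python sketch. This is not D2K's actual interface; the names `drop_incomplete`, `center`, and `connect`, and the sample values, are hypothetical and exist only to show the general pattern of chaining a cleaning step into a preparation step.

```python
from functools import reduce

# Hypothetical illustration only: this is NOT D2K's API.
# Each "module" is a function that takes a list of samples
# (one list of expression readings per sample) and returns a new list.

def drop_incomplete(samples):
    """Cleaning module: discard samples that have a missing reading."""
    return [s for s in samples if all(v is not None for v in s)]

def center(samples):
    """Preparation module: subtract each gene's mean across all samples."""
    if not samples:
        return []
    n = len(samples)
    means = [sum(column) / n for column in zip(*samples)]
    return [[v - m for v, m in zip(s, means)] for s in samples]

def connect(*modules):
    """Chain modules so the output of one feeds the next."""
    def application(samples):
        return reduce(lambda data, module: module(data), modules, samples)
    return application

# Made-up readings for two genes in three samples; one reading is missing.
raw = [[1.0, 4.0], [3.0, None], [3.0, 8.0]]
prepare = connect(drop_incomplete, center)
prepared = prepare(raw)
print(prepared)
```

Because every module shares the same input and output shape, a pattern-search or prediction step could be appended to `connect(...)` without changing the earlier ones.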
Through collaboration with the ALG and the use of D2K, Schook hopes to mine the genetic data to find patterns that would allow physicians and veterinarians to quickly pinpoint infected patients and animals based on changes in their gene expression. Epidemiologists also could use this technique to track the history of a disease outbreak. The tell-tale clues found in the gene expression of an infected individual could indicate, for example, that initial exposure was five days earlier, which would lead investigators to look for the roots of the infection wherever the individual was five days ago.
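As a rough illustration of how an expression profile might point back to a time of exposure, the sketch below compares a new sample with the average profile recorded at each known number of days after exposure and picks the closest one. It is an assumption-laden toy, not Schook's or the ALG's method: the nearest-average approach, the function names, and all numbers are invented for the example.

```python
import math
from collections import defaultdict

# Toy illustration only: the technique (nearest average profile) and
# every number below are assumptions, not taken from the research.

def build_centroids(labeled):
    """Average the expression profiles recorded at each day count.

    labeled: list of (days_since_exposure, expression_profile) pairs.
    """
    groups = defaultdict(list)
    for days, profile in labeled:
        groups[days].append(profile)
    return {
        days: [sum(column) / len(column) for column in zip(*profiles)]
        for days, profiles in groups.items()
    }

def estimate_days(profile, centroids):
    """Return the day count whose average profile is closest (Euclidean)."""
    return min(centroids, key=lambda d: math.dist(profile, centroids[d]))

# Made-up reference samples: two genes, measured 0, 1, and 5 days after exposure.
reference = [
    (0, [1.0, 1.0]), (0, [1.2, 0.8]),
    (1, [2.0, 1.0]), (1, [2.2, 1.2]),
    (5, [4.0, 3.0]), (5, [4.2, 3.2]),
]
centroids = build_centroids(reference)
estimate = estimate_days([3.9, 3.1], centroids)
print(estimate)
```

Real expression data would involve thousands of genes and far noisier measurements, so a practical system would need more careful modeling and validation than this comparison suggests.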
With the knowledge mined from genetic data, Schook hopes to give all of us a head start against infectious diseases.
Access Online | Posted 7-15-2003