Despite great advances in the use of computers to
create images from scientific data, supercomputers are
still churning out vast volumes of data faster than
they can be made into useful images that scientists
can understand. Specialists in the field of parallel
volume rendering harness the power of supercomputers
themselves to address this problem.
Michael E. Palmer, a recent Ph.D. recipient in computer science from the California Institute of Technology, has made an important breakthrough in parallel volume rendering on NCSA's POWER CHALLENGEarray. "I rendered a 7.1 GB volume derived from the Visible Female dataset. To our knowledge, this is the largest volume dataset ever visualized," said Palmer. The National Library of Medicine maintains two 3D color datasets -- the Visible Male and Female -- that were created by slicing two frozen cadavers into thousands of layers, 1 mm thick for the male dataset and 0.33 mm thick for the female dataset.

Volume rendering is the process of converting 3D data into a 2D representation of the data. This operation is notoriously expensive, but parallel computers have recently been able to attain near-interactive and interactive rendering frame rates for very large datasets. The 3D data is modeled as a cloudy material of varying color and opacity. Rays are cast from the viewpoint of the observer and accumulate color and opacity as they pass through the data volume. The final color of each ray is assigned to the appropriate pixel, generating an image of the dataset from the given viewpoint. By changing the viewpoint, a medical researcher can "fly" through the digital representations of the human body.

Despite the apparent simplicity of this technique, the sheer size of the Visible Female dataset makes it impractical on all but the fastest computers. Prior to Palmer's work, the largest dataset ever rendered was 1 GB. His goal was to increase the rendering speed so that medical researchers could interactively fly through digital images of the body. To accomplish this goal, he developed a method for volume rendering on non-uniform memory access (NUMA) parallel processors, like NCSA's POWER CHALLENGEarray. The POWER CHALLENGEarray, produced by Silicon Graphics, consists of two to eight Power Challenge nodes, each containing up to 18 MIPS R8000 or R10000 processors.
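The ray-casting step described above, in which each ray accumulates color and opacity as it passes through the cloudy volume, can be illustrated with a minimal Python sketch. This is a generic front-to-back compositing loop, not Palmer's renderer: the function name `composite_ray`, the sample format, and the 0.999 cutoff for stopping a ray early are all assumptions made for illustration.

```python
def composite_ray(samples):
    """Front-to-back compositing of samples along a single ray.

    samples: iterable of ((r, g, b), opacity) pairs, ordered from the
    viewer outward. Returns the accumulated (r, g, b) and opacity.
    """
    color = [0.0, 0.0, 0.0]
    alpha = 0.0  # opacity accumulated so far along the ray
    for (r, g, b), a in samples:
        # Only the light not already blocked by earlier samples contributes.
        weight = (1.0 - alpha) * a
        color[0] += weight * r
        color[1] += weight * g
        color[2] += weight * b
        alpha += weight
        # Assumed cutoff: once the ray is nearly opaque, deeper samples
        # cannot change the pixel noticeably, so stop early.
        if alpha >= 0.999:
            break
    return tuple(color), alpha


# A half-transparent red sample in front of a half-transparent green one.
pixel, opacity = composite_ray([((1.0, 0.0, 0.0), 0.5), ((0.0, 1.0, 0.0), 0.5)])
print(pixel, opacity)
```

The returned color is what would be written to the pixel that the ray passes through; repeating the loop for every pixel and every viewpoint is what makes the operation so expensive on a multi-gigabyte volume.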
In theory, the peak computational power of a parallel computer increases proportionally to the number of processors. However, for most problems, increasing the number of processors requires increased communication among them. This increased communication cost often precludes "parallel speedup" proportional to the number of processors.

The memory system of most computers consists of a hierarchy of increasingly small amounts of increasingly fast memory. The POWER CHALLENGEarray has a particularly deep memory hierarchy, consisting of remote memory, local shared memory, and the level two (L2) and level one (L1) caches of individual processors. Palmer was able to reduce communication costs with careful optimization at each level of this deep memory hierarchy, by exploiting the fact that nearby rays tend to pass through the same regions of the volume dataset.

In the past, researchers working with distributed parallel computers would either replicate the entire dataset on each distributed node, or accept high performance penalties due to the costs of communication. Palmer's optimizations included a system to dynamically migrate blocks of the Visible Female dataset among the four Power Challenge nodes. This system allowed rendering with no replication of the dataset to attain nearly the same performance as if the entire dataset had been replicated in the local memory of each node. Furthermore, it allowed the rendering of the 7.1 GB Visible Female dataset, which is larger than the 4 GB memory of an individual Power Challenge node.

While his work has already realized medical researchers' dreams of flight, the next generation of parallel computers may allow them to fly higher and faster. Palmer believes that his technique will adapt easily to the newly installed SGI Cray Origin2000, leading to ever greater rendering speeds.
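To see why the observation that nearby rays pass through the same regions matters, consider the rough Python sketch below. It is not Palmer's migration system, whose details the article does not give. It assumes a hypothetical per-node cache (`BlockCache`) that keeps a limited number of volume blocks in local memory, fetches a block from another node on a miss, and evicts the least recently used block when full. The block size of 16 voxels and the helper `block_of` are likewise illustrative assumptions.

```python
from collections import OrderedDict


class BlockCache:
    """Hypothetical per-node store of volume blocks with LRU eviction."""

    def __init__(self, capacity):
        self.capacity = capacity      # max number of blocks held locally
        self.blocks = OrderedDict()   # block id -> data, least recent first
        self.remote_fetches = 0       # how often we had to go off-node

    def get(self, block_id, fetch_remote):
        if block_id in self.blocks:
            self.blocks.move_to_end(block_id)  # mark as most recently used
            return self.blocks[block_id]
        data = fetch_remote(block_id)          # the expensive, off-node path
        self.remote_fetches += 1
        self.blocks[block_id] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)    # evict least recently used
        return data


def block_of(x, y, z, block_size=16):
    """Map a voxel coordinate to the id of the block that contains it."""
    return (x // block_size, y // block_size, z // block_size)


# Two neighbouring rays march 64 voxels along z through the same blocks.
cache = BlockCache(capacity=4)
accesses = 0
for x in (0, 1):
    for z in range(64):
        cache.get(block_of(x, 0, z), lambda bid: f"voxels{bid}")
        accesses += 1
print(accesses, cache.remote_fetches)
```

In this toy run the second ray finds every block it needs already in local memory, so 128 voxel accesses cost only 4 remote fetches. The same reasoning applies at each level of the hierarchy, from remote memory down to the L1 cache.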
"As I try to imagine the ideal parallel architecture for volume rendering, I find myself describing the architecture that SGI has already implemented in the new Origin," he said.
Images used in this story are copyrighted by Michael E. Palmer. Any reproduction is forbidden without permission from the copyright holder.