
NCSA NEWS

Globus Goes to Battle For Distributed Computing


The 100,000 vehicles in the simulation were divided into force groups and distributed among the computing nodes spanning multiple sites.



 

In previous simulations, the researchers manually reserved computing resources at each participating site, specifying the number of nodes and the software they needed to run their executable. They then started each portion of the program individually on its particular machine, as is typical with distributed applications. Globus provided a single interface to these resources. The team was able to start the entire distributed application with one interactive command and to deal with problematic jobs as they arose. This new fault-tolerance feature let the team adjust immediately to unforeseen glitches, connectivity problems, or system failures that delayed certain machines. Eventually, Globus's global resource management capabilities will also handle all resource reservations.
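The idea of starting every piece of a distributed job through one call, while steering around machines that are not responding, can be illustrated with a small Python sketch. This is not Globus code and does not use any Globus API. The `Site` class, the `launch_all` function, the site names, and the node counts are all hypothetical. The rule of splitting vehicles in proportion to node count and skipping unhealthy sites is likewise an assumption made for illustration, not a description of how SF Express assigned its force groups.

```python
from dataclasses import dataclass


@dataclass
class Site:
    """A hypothetical computing site taking part in the run."""
    name: str             # illustrative label, not a real site
    nodes: int            # nodes made available at this site
    healthy: bool = True  # stand-in for the result of fault detection


def launch_all(sites, total_vehicles):
    """Plan the whole distributed run from a single call.

    Unhealthy sites are skipped, and the vehicles are split among the
    remaining sites in proportion to their node counts (an assumed rule).
    Returns a dict mapping site name to the number of vehicles it simulates.
    """
    usable = [s for s in sites if s.healthy]
    if not usable:
        raise RuntimeError("no healthy sites available")

    total_nodes = sum(s.nodes for s in usable)
    plan = {}
    assigned = 0
    # Give every site but the last its proportional share, rounded down.
    for site in usable[:-1]:
        share = total_vehicles * site.nodes // total_nodes
        plan[site.name] = share
        assigned += share
    # The last site absorbs the rounding remainder so nothing is lost.
    plan[usable[-1].name] = total_vehicles - assigned
    return plan


if __name__ == "__main__":
    sites = [
        Site("site_a", nodes=100),
        Site("site_b", nodes=300),
        Site("site_c", nodes=100, healthy=False),  # simulated failure
    ]
    print(launch_all(sites, total_vehicles=100_000))
```

In this sketch the failed site simply receives no work and its share is spread over the others. A real system would also have to detect failures that occur after start-up, which the sketch does not attempt.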

"Automating these processes did not remove all the hurdles to distributed computing but lowered them to a tolerable level," says Foster. "We aren't at the point yet where a physicist can sit down and say 'I want 100 gigaflops' and the system will find it transparently, but that day is coming."

The Globus team at ISI worked with the Caltech researchers to integrate several Globus tools -- program start-up, security, input/output, and fault detection -- into the demonstration. Both Alliance and NPACI sites provided computing nodes, including Caltech and Hewlett-Packard's HPs, the University of California at San Diego's IBM SP, NCSA's SGI/CRAY Origin2000, and the Maui High Performance Computing Center's two IBM SPs. The Department of Defense's (DoD) four Major Shared Resource Centers also participated. Three of these are working with the Alliance: the Army Research Laboratory, Aberdeen Proving Ground; the Aeronautical Systems Center, Wright-Patterson Air Force Base; and the U.S. Army Engineer Waterways Experiment Station, Vicksburg. The fourth, the Naval Oceanographic Office, Stennis Space Center, is working with NPACI. Teams at all nine sites held a practice run during the week preceding the full run, eliminating a few site-specific glitches. Sharon Brunett of Caltech's CACR coordinated the simulation.

In the end, Globus executed successfully and SF Express attained its computing goal four years ahead of schedule. It was the kind of battle that portends victory for the future of distributed computing.

"The goal was to attain this size simulation in 2002 but here we are in 1998 and we've done it," says Messina. "The reason we were able to scale the simulation up was because we could use the best computers anywhere and a world-class research team."

     
