APE4NOW: Automatic Parallelization Environment for Network of Workstations
APE4NOW is a system for the automatic parallelization of large-scale scientific and engineering applications on distributed-memory computers. It integrates multiple facets of parallel computing into a single environment. The target platform is a set of workstations connected by a scalable network. APE4NOW supports several programming paradigms so that both experts and novice programmers can write high-performance applications that run, directly or indirectly, on commodity computer networks. The end result is that application execution can be sped up by using multiple workstations.

Sequential Programs

Programmers who know nothing about parallel programming can still write sequential FORTRAN or C programs in whatever style they prefer. APE4NOW uses the extended Stanford SUIF compiler (implemented at Wayne) as its front end, which translates sequential programs into multi-threaded C code. The compiler detects the portions of a program that can be parallelized automatically. Its multi-threaded output can run on workstations with multiple processors, where it normally executes faster than the original sequential program. In this way APE4NOW removes the requirement for parallel programming expertise.

Hand-written Parallel Programs

Programmers who are experts in parallel computing can write parallel programs directly. Such programs may be more efficient, or more readable to their authors. APE4NOW accepts this code as well.

Hand-written Threaded MPI Programs

Experts can also write MPI programs and execute them directly on networks of workstations (NOWs). In that case programmers must manage not only the parallelism but also the communication between workstations, which can be far more complicated, error-prone, and difficult to debug. This is the traditional parallel programming style. APE4NOW aims to reduce this burden so that programmers can focus on the application problem itself.
Note that each workstation must be a uniprocessor. Otherwise, the programmer also has to write multi-threaded code, which increases the programming complexity drastically.

Parallelizing Compilers

SUIF handles uniform loops over dense, regular data structures. For loops with non-uniform data dependences, we proposed the concept of the Complete Dependence Convex Hull, which contains the entire dependence information of the program. We also proposed the concepts of Unique Head Sets and Unique Tail Sets, which isolate the dependence information. The relationship between the unique head and tail sets forms the foundation for partitioning the iteration space. Depending on the relative placement of these unique sets, we considered various cases and suggested several partitioning schemes.

Runtime Parallelization

The multi-threaded code is shared-memory code, so it can run only on a single workstation with one or more processors. Some parallelizable portions of a program are not detected at compile time. For others, parallelizability cannot be determined at compile time because it depends on run-time behavior. For these cases, the Runtime Parallelization module converts sequential code into parallel code during execution. APE4NOW has two new run-time techniques for parallelizing loops with indirect access patterns. The schemes can handle any type of loop-carried dependence. They follow the DOACROSS inspector/executor approach and improve on previous algorithms of the same generality by allowing concurrent reads of the same location and by increasing the overlap of dependent iterations. The algorithms are based on stamping rules and are implemented with multithreading tools. The two algorithms differ in that one allows partially concurrent reads without adding overhead to its inspector, while the other allows fully concurrent reads at the cost of a slight overhead in the dependence analysis.

Distributed vs. Shared Memory Code

Workstations are now ubiquitous. Shared-memory code can be transformed into distributed-memory code to exploit the computing power of multiple workstations. APE4NOW provides a conversion module that performs this transformation automatically, shielding programmers from complicated communication issues.

The APE4NOW environment makes it easy to program in parallel, parallelizes programs automatically, and provides supercomputer-like performance on commodity networks of workstations. The developed system will be made available to the entire computing community.
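
The inspector/executor idea described under Runtime Parallelization can be illustrated with a minimal C sketch. It is a generic wavefront-style scheme written for illustration, not APE4NOW's stamping algorithms, whose rules are not given on this page. The loop form `a[w[i]] = a[r[i]] + 1.0` and the names `inspect`, `execute`, `last_write`, and `last_read` are all assumptions. The inspector scans the index arrays once and gives each iteration a stage number such that flow, anti, and output dependences always point from a lower stage to a higher one, while iterations that only read the same location may share a stage.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical loop to be parallelized at run time:
 *     for (i = 0; i < n; i++) a[w[i]] = a[r[i]] + 1.0;
 * r[] and w[] are known only at run time (indirect access pattern). */

/* Inspector: writes a stage number for every iteration into stage[] and
 * returns the number of stages (0 for an empty loop, -1 if allocation
 * fails). m is the length of the data array a[]. */
int inspect(const int *r, const int *w, int n, int m, int *stage)
{
    if (n <= 0 || m <= 0)
        return 0;

    int *last_write = malloc((size_t)m * sizeof *last_write);
    int *last_read = malloc((size_t)m * sizeof *last_read);
    if (!last_write || !last_read) {
        free(last_write);
        free(last_read);
        return -1;
    }
    for (int x = 0; x < m; x++)
        last_write[x] = last_read[x] = -1;   /* -1: location not touched yet */

    int num_stages = 0;
    for (int i = 0; i < n; i++) {
        assert(r[i] >= 0 && r[i] < m && w[i] >= 0 && w[i] < m);

        int s = last_write[r[i]] + 1;        /* flow: read after last write */
        if (last_write[w[i]] + 1 > s)
            s = last_write[w[i]] + 1;        /* output: write after last write */
        if (last_read[w[i]] + 1 > s)
            s = last_read[w[i]] + 1;         /* anti: write after earlier reads */

        stage[i] = s;
        if (s > last_read[r[i]])
            last_read[r[i]] = s;             /* readers may share a stage */
        last_write[w[i]] = s;
        if (s + 1 > num_stages)
            num_stages = s + 1;
    }

    free(last_write);
    free(last_read);
    return num_stages;
}

/* Executor: runs the loop stage by stage. Iterations within one stage are
 * mutually independent, so a threaded executor could distribute them over
 * threads and synchronize between stages. Here they are simply visited in
 * reverse order to show that the order inside a stage does not matter. */
void execute(double *a, const int *r, const int *w, int n,
             const int *stage, int num_stages)
{
    for (int s = 0; s < num_stages; s++)
        for (int i = n - 1; i >= 0; i--)
            if (stage[i] == s)
                a[w[i]] = a[r[i]] + 1.0;
}
```

For example, with `r = {0, 0, 1, 2, 3}` and `w = {1, 2, 3, 3, 0}`, the first two iterations both read location 0 and are placed in the same stage, while the remaining iterations are pushed to later stages by the dependences on locations 1, 2, 3, and 0. A stage-by-stage barrier is the simplest choice. A DOACROSS executor like the one described above would instead let dependent iterations overlap, which this sketch does not attempt.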