Applications for distributed memory systems are cumbersome to develop because programmers must handle communication primitives explicitly, as when coding with MPI. In addition, applications have to be tuned for each individual architecture to achieve reasonable performance. Since hardware shared memory machines do not scale well and are relatively expensive to build, software distributed shared memory (DSM) systems, which provide a logically shared memory over physically distributed memory, are gaining popularity. These systems combine the programming advantages of shared memory with the cost advantages of distributed memory. The programmer is given the illusion of a large global address space encompassing all available memory, which eliminates the task of explicitly moving data between processes located on separate machines.
DSM systems share data at the relatively large granularity of a virtual memory page and can therefore suffer from "false sharing", in which two processes simultaneously write to different data items that reside on the same page. If only a single writer is permitted at a time, the page may ping-pong between the nodes. One solution to this problem is to "hold" a freshly arrived page for some time before releasing it to another requester. Relaxed memory consistency models that allow multiple concurrent writers have also been proposed to alleviate the problem. These models ensure that all nodes see the same data only at well-defined points in the program, usually when synchronization occurs, so extra effort is required to ensure program correctness.
Another technique that has been investigated to improve DSM performance is the use of multiple threads of control in the system, and current third-generation DSM systems combine relaxed consistency models with multithreading. Strings is built using POSIX threads, which can be multiplexed onto kernel lightweight processes.
The kernel can schedule lightweight processes across multiple processors on symmetric multiprocessors (SMPs) for better performance. Therefore, in Strings, each thread can be assigned to any processor on the SMP unless a specific placement is requested, and all local threads can run in parallel if enough processors are available. Strings exploits data parallelism at the application level by allowing multiple application threads to share the same address space on a node, and task parallelism at the run-time level: the protocol handler is itself multithreaded, and a dedicated communication thread avoids the overhead of interrupt-driven network I/O.
Related Publications
S. Roy, V. Chaudhary, S. Jia, and P. Menon. In Proc. of ISCA 12th Intl. Conference on Parallel and Distributed Computing Systems, pp. 15-21, Ft. Lauderdale, Florida, Aug. 18-20, 1999.
V. Chaudhary, C. Xu, S. Roy, S. Jia, G. A. Ezzell, and C. Kota. In Proc. of ISCA 12th Intl. Conference on Parallel and Distributed Computing Systems, pp. 534-539, Ft. Lauderdale, Florida, Aug. 18-20, 1999.
D. Thaker, V. Chaudhary, G. Edjlali, and S. Roy. In Proc. of the Intl. Conference on Parallel and Distributed Processing Techniques and Applications, pp. 718-724, Las Vegas, Nevada, Jun. 28-Jul. 1, 1999.
S. Roy and V. Chaudhary. In Cluster Computing: The Journal of Networks, Software Tools and Applications, vol. 2, no. 3, pp. 177-186, 1999.
S. Roy and V. Chaudhary. In Proc. of the 1999 IEEE Intl. Performance, Computing, and Communications Conference, pp. 1-7, Phoenix, Arizona, Feb. 10-12, 1999.
S. Roy and V. Chaudhary. In Proc. of the National Conference on Communications, pp. 409-416, Kharagpur, India, Jan. 30-31, 1999.
S. Roy and V. Chaudhary. In Proc. of the Seventh IEEE Intl. Symposium on High Performance Distributed Computing, pp. 90-97, Chicago, Illinois, Jul. 28-31, 1998.