摘要:Block multithreaded architectures tolerate large memory and synchronization latencies by switching contexts on every remote-memory-access or on a failed synchronization request. We study the performance of a waiting mechanism called switch-blocking where waiting threads are disabled (but not unloaded) and signalled at the completion of the wait in comparison with switch_spinning where waiting threads poll and execute in a round-robin fashion. We present an implementation of switch-blocking on a cycle-by-cycle simulator for Alewife (a block multithreaded machine) for both remote memory accesses and synchronization operations and discuss results from the simulator. Our results indicate that while switch-blocking almost always has better performance than switch-spinning, its performance is similar to switch-spinning under heavy lock contention. Support for switch-blocking for remote memory accesses may be appropriate in the future due to their strong interactions with synchronization operations.