Abstract: Settling on a simple abstraction that programmers aim at, and hardware and software systems people enable and support, is an important step towards convergence to a robust many-core platform. The current paper: (i) advocates incorporating a quest for the simplest possible abstraction in the debate on the future of many-core computers, (ii) suggests "immediate concurrent execution (ICE)" as a new abstraction, and (iii) argues that an XMT architecture is one possible demonstration of ICE providing an easy-to-program general-purpose many-core platform.

1. Case for Abstraction

In 2004, standard (desktop) computers comprised one processor core. In 2008, some have 8 cores. By 2012, 64-core computers (another factor of 8) are expected. The transition from serial computing to parallel computing mandates the reinvention of the very heart of computer science (CS). These highly parallel computers need to be built and programmed in a new way. Current solutions by leading vendors do not scale to tens of cores. Given that clock speeds have not been improving for quite a few years, the use of parallel processing to improve single-program completion time is a critical target for future designs. We need to figure out how to build scalable many-core computers, how to program them effectively so that programmers can get strong performance with minimal programming effort, how to train the workforce, and how to teach this new environment at all levels, including introductory programming courses for college freshmen and K-12 students. Foremost among current challenges is timely convergence to a robust many-core platform that will serve the world for many years to come.
The basic motivation behind the current position paper, a matter critical to the economy and workforce, is bringing about the reinvention of CS to meet this challenge: 1) Andy Grove (Intel) noted that the software spiral (hardware improvements lead to software improvements that lead back to hardware improvements) had been an engine of sustained growth for IT; but (as explained in [6], and since convergence is yet to happen) it is now broken! 2) Both under-trained and mistrained for a future certain to be dominated by parallelism, most CS students study only the old serial paradigm, acquiring serial habits that complicate a later transition to parallelism. But how should we approach the convergence challenge, and, in particular, what should the first step be? The final posting in a special series on the Computing Community Consortium blog [5], on why research advances are needed to overcome the problems posed by multicore processors, perhaps implies a perception of despair in the community. The problem is not new. Many parallel computer architectures have been proposed and built over the last 40 years, but with limited success; exploiting the parallelism present in them has often eluded their users. The main source of encouragement in [5] is a call on all involved communities to collaboratively start with a clean slate, rather than have language researchers locked into mechanisms supported by commodity hardware and hardware researchers locked into fully supporting any current software. This is not the first time that CS has faced a complex system problem requiring a solution that involves many different players and must remain robust over time in the face of system upgrades. It has become a signature intellectual success story of CS to address such problems by figuring out a simple abstraction that acts as "a single nail holding everything together".
In fact, abstractions that present the user with a virtual machine that is easier to understand and program than the underlying hardware, but still allows effective use of the hardware, have facilitated significant CS accomplishments. Broad consensus built around these simple abstractions was the key. (One of the dictionary definitions of abstract is difficult to understand, or abstruse. In CS, however, abstraction has become synonymous with the quest for simplicity. Interestingly, the word for abstraction in Hebrew shares the same root with simple, as well as undress and expand.) Some formative abstractions were: (i) that any single instruction available for execution in a serial program executes immediately, henceforth called immediate serial execution (ISE); note that since an instruction may apply to any location in memory, ISE extended another formative abstraction that we call "immediate memory access (IMA)": that any particular word of an indefinitely large memory is immediately available; and (ii) that a computer is serving exclusively the task that the user is currently working on, henceforth exclusive computer availability (ECA). The IMA abstraction abstracts away a hierarchy of memories, each with greater capacity, but slower access time, than the preceding one, and the ISE abstraction extends it to immediate execution of any operation. The ECA abstraction abstracts away virtual file systems that can be implemented in local storage or over a local or global network, access to the Internet, and other tasks that may be concurrently using the same computer system. These abstractions have improved the productivity of programmers and other users, and contributed towards broadening participation in computing. Some simple and robust abstraction can be the first writing on the clean slate sought in [5]. We will then need to build a consensus around such an abstraction as a way to reproduce past CS success stories for the many-core era.
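The contrast between ISE and the ICE abstraction proposed in the abstract can be made concrete. Under ISE, instructions execute one after another; under ICE, all instructions declared available for concurrent execution are assumed to execute immediately, at the same instant. The following plain-Python sketch is our own illustration (the function names and the snapshot-based simulation are ours, not from the paper or from XMTC): a single concurrent step is simulated by reading the entire old state before any writes, so no virtual thread observes another thread's update within the step.

```python
def ice_step(a):
    """One ICE-style concurrent step: conceptually, every position i
    executes a[i] = a[i] + a[i-1] at the same instant. Simulated by
    snapshotting the old state, so all reads logically precede all
    writes within the step."""
    old = list(a)  # snapshot: no thread sees another thread's write
    for i in range(1, len(a)):
        a[i] = old[i] + old[i - 1]
    return a

def serial_loop(a):
    """The same statement under ISE: iterations run one after another,
    so iteration i sees the value just written into a[i-1]."""
    for i in range(1, len(a)):
        a[i] = a[i] + a[i - 1]
    return a

print(ice_step([1, 1, 1, 1]))     # [1, 2, 2, 2]
print(serial_loop([1, 1, 1, 1]))  # [1, 2, 3, 4] (prefix sums)
```

The same source statement yields different results under the two abstractions, which is exactly why the programmer needs a single, agreed-upon mental model of when reads and writes take effect.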
Finding the best many-core platform requires a battle of ideas whose outcome will affect a rather broad community. The need for acceptance by all relevant segments of the community suggests the necessity of benchmarks for predicting the success of a many-core platform. Development of such benchmarks is, in fact, long overdue. Abstractions provide an effective way for lowering the bar towards broadening participation in the debate to all relevant participants. While the utility of abstraction will become much clearer once such benchmarks are available, there is no reason not to focus on abstractions immediately.
The desired abstraction will: (i) be simple, hiding the details of the underlying hardware; (ii) be accessible to the broadest possible groups of users; (iii) allow strong speedups for applications; (iv) be scalable: a user of a 16-core computer should rely on the same abstraction as a user of a future-generation 1024-core computer, or else performance code will have to be continuously rewritten; this will also help put the above-noted software spiral back on track; (v) extend, rather than replace, existing (successful) abstractions; in particular, when code provides no parallelism, the user will need to be able to fall back on the serial abstraction ISE; and, last but definitely not least, (vi) be buildable: we must be able to build an actual computer system that provides good performance for users who rely on the abstraction. Note also that the ECA abstraction does
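Point (v), the fallback to ISE, suggests programs that freely mix concurrent and serial sections. XMTC, the C-based language of the XMT platform mentioned in the abstract, expresses concurrent sections with a spawn construct; the plain-Python sketch below is our own illustrative stand-in (the names pardo and add_and_total are ours, and the sequential simulation is only a model of the abstraction, not of XMT hardware).

```python
def pardo(n, body):
    """Illustrative stand-in for an ICE-style spawn of n virtual
    threads: conceptually all n start immediately. Here they are
    merely simulated in turn, so body must not depend on order."""
    for tid in range(n):
        body(tid)

def add_and_total(a, b):
    c = [0] * len(a)
    # Concurrent section (ICE view): one virtual thread per element,
    # each writing a distinct location, so order is irrelevant.
    def body(i):
        c[i] = a[i] + b[i]
    pardo(len(a), body)
    # Serial section: no parallelism is declared, so the programmer
    # falls back on the familiar ISE abstraction, per point (v).
    total = 0
    for x in c:
        total += x
    return c, total

print(add_and_total([1, 2, 3], [4, 5, 6]))  # ([5, 7, 9], 21)
```

Note that the serial section is ordinary code under the ordinary serial abstraction; the concurrent section adds to, rather than replaces, what the programmer already knows, which is the extension property the list above demands.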