Proceedings ArticleDOI

Maxwell - a 64 FPGA Supercomputer

TL;DR: The machine itself, Maxwell, and its hardware and software environment are described, and very early benchmark results from runs of the demonstrators are presented.
Abstract: We present the initial results from the FHPCA Supercomputer project at the University of Edinburgh. The project has successfully built a general-purpose 64 FPGA computer and ported to it three demonstration applications from the oil, medical and finance sectors. This paper describes in brief the machine itself - Maxwell - its hardware and software environment and presents very early benchmark results from runs of the demonstrators.


Citations
Journal ArticleDOI
TL;DR: The authors deployed the reconfigurable fabric in a bed of 1,632 servers and FPGAs in a production datacenter and successfully used it to accelerate the ranking portion of the Bing Web search engine by nearly a factor of two.
Abstract: Datacenter workloads demand high computational capabilities, flexibility, power efficiency, and low cost. It is challenging to improve all of these factors simultaneously. To advance datacenter capabilities beyond what commodity server designs can provide, we designed and built a composable, reconfigurable hardware fabric based on field programmable gate arrays (FPGA). Each server in the fabric contains one FPGA, and all FPGAs within a 48-server rack are interconnected over a low-latency, high-bandwidth network. We describe a medium-scale deployment of this fabric on a bed of 1,632 servers, and measure its effectiveness in accelerating the ranking component of the Bing web search engine. We describe the requirements and architecture of the system, detail the critical engineering challenges and solutions needed to make the system robust in the presence of failures, and measure the performance, power, and resilience of the system. Under high load, the large-scale reconfigurable fabric improves the ranking throughput of each server by 95% at a desirable latency distribution or reduces tail latency by 29% at a fixed throughput. In other words, the reconfigurable fabric enables the same throughput using only half the number of servers.

835 citations


Cites methods from "Maxwell - a 64 FPGA Supercomputer"

  • ...Maxwell [4] is the most similar to our design, as it directly connects FPGAs in a 2-D torus using InfiniBand cables, although the FPGAs do not implement routing logic....


Journal ArticleDOI
14 Jun 2014
TL;DR: The requirements and architecture of the fabric are described, the critical engineering challenges and solutions needed to make the system robust in the presence of failures are detailed, and the performance, power, and resilience of the system when ranking candidate documents are measured.
Abstract: Datacenter workloads demand high computational capabilities, flexibility, power efficiency, and low cost. It is challenging to improve all of these factors simultaneously. To advance datacenter capabilities beyond what commodity server designs can provide, we have designed and built a composable, reconfigurable fabric to accelerate portions of large-scale software services. Each instantiation of the fabric consists of a 6x8 2-D torus of high-end Stratix V FPGAs embedded into a half-rack of 48 machines. One FPGA is placed into each server, accessible through PCIe, and wired directly to other FPGAs with pairs of 10 Gb SAS cables. In this paper, we describe a medium-scale deployment of this fabric on a bed of 1,632 servers, and measure its efficacy in accelerating the Bing web search engine. We describe the requirements and architecture of the system, detail the critical engineering challenges and solutions needed to make the system robust in the presence of failures, and measure the performance, power, and resilience of the system when ranking candidate documents. Under high load, the large-scale reconfigurable fabric improves the ranking throughput of each server by a factor of 95% for a fixed latency distribution, or, while maintaining equivalent throughput, reduces the tail latency by 29%.
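As a concrete illustration of the wrap-around wiring a 6x8 2-D torus implies, the sketch below computes the four directly cabled neighbours of an FPGA at grid position (x, y). This is a hypothetical helper for illustration only, not code from the paper:

```python
def torus_neighbours(x, y, cols=6, rows=8):
    """Return the west, east, north, and south neighbours of FPGA (x, y)
    on a cols x rows 2-D torus; modular arithmetic gives the wrap-around
    links at the grid edges."""
    return [((x - 1) % cols, y), ((x + 1) % cols, y),
            (x, (y - 1) % rows), (x, (y + 1) % rows)]

# A corner node's links wrap to the opposite edges of the grid:
print(torus_neighbours(0, 0))  # [(5, 0), (1, 0), (0, 7), (0, 1)]
```

Every node has exactly four links, so the cable count per half-rack stays fixed regardless of position in the grid.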

712 citations


Cites methods from "Maxwell - a 64 FPGA Supercomputer"

  • ...Maxwell [4] is the most similar to our design, as it directly connects FPGAs in a 2-D torus using InfiniBand cables, although the FPGAs do not implement routing logic....


Proceedings ArticleDOI
15 Oct 2016
TL;DR: A new cloud architecture that uses reconfigurable logic to accelerate both network plane functions and applications, and is much more scalable than prior work which used secondary rack-scale networks for inter-FPGA communication.
Abstract: Hyperscale datacenter providers have struggled to balance the growing need for specialized hardware (efficiency) with the economic benefits of homogeneity (manageability). In this paper we propose a new cloud architecture that uses reconfigurable logic to accelerate both network plane functions and applications. This Configurable Cloud architecture places a layer of reconfigurable logic (FPGAs) between the network switches and the servers, enabling network flows to be programmably transformed at line rate, enabling acceleration of local applications running on the server, and enabling the FPGAs to communicate directly, at datacenter scale, to harvest remote FPGAs unused by their local servers. We deployed this design over a production server bed, and show how it can be used for both service acceleration (Web search ranking) and network acceleration (encryption of data in transit at high speeds). This architecture is much more scalable than prior work which used secondary rack-scale networks for inter-FPGA communication. By coupling to the network plane, direct FPGA-to-FPGA messages can be achieved at comparable latency to previous work, without the secondary network. Additionally, the scale of direct inter-FPGA messaging is much larger. The average round-trip latencies observed in our measurements among 24, 1,000, and 250,000 machines are under 3, 9, and 20 microseconds, respectively. The Configurable Cloud architecture has been deployed at hyperscale in Microsoft's production datacenters worldwide.

512 citations


Cites background from "Maxwell - a 64 FPGA Supercomputer"

  • ...Another example is Maxwell [18], which also provides rack-scale direct FPGA-to-FPGA communication....


Proceedings ArticleDOI
21 Feb 2010
TL;DR: A Map-Reduce framework for the Axel cluster is presented which exploits spatial and temporal locality through different types of processing elements and communication channels, and enables the first demonstration of FPGAs, GPUs and CPUs running collaboratively for N-body simulation.
Abstract: This paper describes a heterogeneous computer cluster called Axel. Axel contains a collection of nodes; each node can include multiple types of accelerators such as FPGAs (Field Programmable Gate Arrays) and GPUs (Graphics Processing Units). A Map-Reduce framework for the Axel cluster is presented which exploits spatial and temporal locality through different types of processing elements and communication channels. The Axel system enables the first demonstration of FPGAs, GPUs and CPUs running collaboratively for N-body simulation. Performance improvement from 4.4 times to 22.7 times has been achieved using our approach, which shows that the Axel system can combine the benefits of the specialization of FPGA, the parallelism of GPU, and the scalability of computer clusters.
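The map-reduce pattern that Axel's framework builds on can be sketched in a few lines. The single-machine Python below illustrates the programming model only; it is not the Axel framework's actual API, and the word-count map/reduce functions are hypothetical examples:

```python
from collections import defaultdict

def map_reduce(inputs, map_fn, reduce_fn):
    """Generic map-reduce skeleton: map each input record to (key, value)
    pairs, group values by key, then reduce each group to one result."""
    groups = defaultdict(list)
    for record in inputs:                      # map phase
        for key, value in map_fn(record):
            groups[key].append(value)
    return {key: reduce_fn(key, values)        # reduce phase
            for key, values in groups.items()}

# Classic word-count example of the model:
docs = ["a b a", "b c"]
counts = map_reduce(docs,
                    lambda doc: [(w, 1) for w in doc.split()],
                    lambda key, values: sum(values))
print(counts)  # {'a': 2, 'b': 2, 'c': 1}
```

In a heterogeneous cluster such as Axel, the map tasks are what get dispatched to FPGAs, GPUs, or CPUs according to which processing element suits each workload.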

176 citations


Cites methods from "Maxwell - a 64 FPGA Supercomputer"

  • ...Building a cluster of computers is a common technique to realize the parallel computing model....


Patent
17 Apr 2012
TL;DR: In this article, a method, system, and computer-readable storage medium for using a distributed computing system are disclosed, and a workflow is generated to indicate that the application is to be executed using the computing resource(s).
Abstract: A method, system, and computer-readable storage medium for using a distributed computing system are disclosed. For example, one method involves receiving one or more parameters. The one or more parameters indicate one or more operations. The method also involves selecting one or more computing resources from computing resources. This selecting is based on the parameter(s). An application is configured to be executed using the computing resource(s). The method also involves generating a workflow. The workflow indicates that the application is to be executed using the computing resource(s). The workflow indicates that the application performs the operation(s). The method also involves communicating at least a part of the workflow to one or more nodes, where the node(s) include the computing resource(s).

98 citations

References
Journal ArticleDOI
TL;DR: In this paper, a theoretical valuation formula for options is derived, based on the assumption that options are correctly priced in the market and it should not be possible to make sure profits by creating portfolios of long and short positions in options and their underlying stocks.
Abstract: If options are correctly priced in the market, it should not be possible to make sure profits by creating portfolios of long and short positions in options and their underlying stocks. Using this principle, a theoretical valuation formula for options is derived. Since almost all corporate liabilities can be viewed as combinations of options, the formula and the analysis that led to it are also applicable to corporate liabilities such as common stock, corporate bonds, and warrants. In particular, the formula can be used to derive the discount that should be applied to a corporate bond because of the possibility of default.
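This is the Black-Scholes paper, one of Maxwell's finance-demonstrator references. The standard closed form it derives for a European call, with spot S, strike K, maturity T, risk-free rate r, and volatility sigma, can be rendered in a few lines of Python (a minimal sketch of the textbook formula, not code from either paper):

```python
from math import log, sqrt, exp, erf

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def black_scholes_call(S, K, T, r, sigma):
    """Black-Scholes price of a European call:
    C = S*N(d1) - K*exp(-r*T)*N(d2)."""
    d1 = (log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return S * norm_cdf(d1) - K * exp(-r * T) * norm_cdf(d2)

# At-the-money call, 1 year out, r = 5%, sigma = 20%:
print(round(black_scholes_call(100.0, 100.0, 1.0, 0.05, 0.2), 2))  # 10.45
```

Monte Carlo evaluation of option prices like this is a typical workload for the FPGA-accelerated finance demonstrator described in the Maxwell paper.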

28,434 citations

Journal ArticleDOI
TL;DR: The hardware aspects of reconfigurable computing machines, from single chip architectures to multi-chip systems, including internal structures and external coupling are explored, and the software that targets these machines is focused on.
Abstract: Due to its potential to greatly accelerate a wide variety of applications, reconfigurable computing has become a subject of a great deal of research. Its key feature is the ability to perform computations in hardware to increase performance, while retaining much of the flexibility of a software solution. In this survey, we explore the hardware aspects of reconfigurable computing machines, from single chip architectures to multi-chip systems, including internal structures and external coupling. We also focus on the software that targets these machines, such as compilation tools that map high-level algorithms directly to the reconfigurable substrate. Finally, we consider the issues involved in run-time reconfigurable systems, which reuse the configurable hardware during program execution.

1,666 citations

Proceedings ArticleDOI
23 Apr 2007
TL;DR: The reconfigurable computing cluster (RCC) project is a multi-institution, multi-disciplinary project investigating the use of Platform FPGAs to build cost-effective petascale computers and a 64-node prototype cluster is described.
Abstract: While medium- and large-sized computing centers have increasingly relied on clusters of commodity PC hardware to provide cost-effective capacity and capability, it is not clear that this technology will scale to the PetaFLOP range. It is expected that semiconductor technology will continue its exponential advancements over the next fifteen years; however, new issues are rapidly emerging and the relative importance of current performance metrics is shifting. Future PetaFLOP architectures will require system designers to solve computer architecture problems ranging from how to house, power, and cool the machine, all the while remaining sensitive to cost. The reconfigurable computing cluster (RCC) project is a multi-institution, multi-disciplinary project investigating the use of Platform FPGAs to build cost-effective petascale computers. This paper describes the nascent project's objectives and a 64-node prototype cluster. Specifically, the aim is to provide a detailed motivation for the project, describe the design principles guiding development, and present a preliminary performance assessment. Microbenchmark results are reported to answer several pragmatic questions about key subsystems, including the system software, network performance, memory bandwidth, and power consumption of nodes in the cluster. Results suggest that the approach is sound.

67 citations

Journal ArticleDOI
TL;DR: A comparative analysis of FPGAs and traditional processors is presented, focusing on floating-point performance and procurement costs, revealing economic hurdles in the adoption of FPGAs for general high-performance computing (HPC).
Abstract: For certain applications, custom computational hardware created using field programmable gate arrays (FPGAs) can produce significant performance improvements over processors, leading some in academia and industry to call for the inclusion of FPGAs in supercomputing clusters. This paper presents a comparative analysis of FPGAs and traditional processors, focusing on floating-point performance and procurement costs, revealing economic hurdles in the adoption of FPGAs for general high-performance computing (HPC).

63 citations

Journal ArticleDOI
TL;DR: The marine controlled-source electromagnetic (CSEM) sounding method is rapidly gaining acceptance as an exploration tool for detecting and delineating hydrocarbon reservoirs, as discussed by the authors; however, distinguishing hydrocarbon fluids from water within these structures is more problematic.
Abstract: The marine controlled-source electromagnetic (CSEM) sounding method is rapidly gaining acceptance as an exploration tool for detecting and delineating hydrocarbon reservoirs. Whereas seismic surveys can detect the structures that may contain hydrocarbons with great accuracy, distinguishing hydrocarbon fluids from water within these structures is more problematic. As a result, less than a third of exploration wells result in a commercial discovery.

52 citations