
Showing papers on "Benchmark (computing) published in 2013"


Proceedings ArticleDOI
23 Jun 2013
TL;DR: Large scale experiments are carried out with various evaluation criteria to identify effective approaches for robust tracking and provide potential future research directions in this field.
Abstract: Object tracking is one of the most important components in numerous applications of computer vision. While much progress has been made in recent years with efforts on sharing code and datasets, it is of great importance to develop a library and benchmark to gauge the state of the art. After briefly reviewing recent advances of online object tracking, we carry out large scale experiments with various evaluation criteria to understand how these algorithms perform. The test image sequences are annotated with different attributes for performance evaluation and analysis. By analyzing quantitative results, we identify effective approaches for robust tracking and provide potential future research directions in this field.
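
One common way such tracking benchmarks score results is an overlap-based success rate: per-frame bounding-box overlap (intersection over union) against the annotated ground truth is thresholded and summarized across a sequence. The sketch below illustrates only that general idea; the function names and threshold grid are illustrative assumptions, not the paper's exact protocol.

```python
import numpy as np

def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x, y, w, h)."""
    xa, ya = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    xb = min(box_a[0] + box_a[2], box_b[0] + box_b[2])
    yb = min(box_a[1] + box_a[3], box_b[1] + box_b[3])
    inter = max(0.0, xb - xa) * max(0.0, yb - ya)
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0

def success_curve(pred_boxes, gt_boxes, thresholds=np.linspace(0, 1, 21)):
    """Fraction of frames whose overlap exceeds each threshold; the area
    under this curve is a common single-number tracking score."""
    overlaps = np.array([iou(p, g) for p, g in zip(pred_boxes, gt_boxes)])
    return np.array([(overlaps > t).mean() for t in thresholds])
```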

3,828 citations


Proceedings ArticleDOI
01 Dec 2013
TL;DR: An efficient sparse combination learning framework based on inherent redundancy of video structures achieves decent performance in the detection phase without compromising result quality and reaches high detection rates on benchmark datasets at a speed of 140-150 frames per second on average.
Abstract: Speedy abnormal event detection meets the growing demand to process an enormous number of surveillance videos. Based on inherent redundancy of video structures, we propose an efficient sparse combination learning framework. It achieves decent performance in the detection phase without compromising result quality. The short running time is guaranteed because the new method effectively turns the original complicated problem to one in which only a few costless small-scale least square optimization steps are involved. Our method reaches high detection rates on benchmark datasets at a speed of 140-150 frames per second on average when computing on an ordinary desktop PC using MATLAB.
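
The speed in the detection phase comes from reducing the test for each feature to a handful of small least-squares fits against pre-learned combinations of basis vectors; an event is flagged abnormal when even the best combination reconstructs it poorly. A minimal sketch of that test is given below under assumed inputs (the learned combinations and the threshold are hypothetical, and the training stage that learns the sparse combinations is not shown).

```python
import numpy as np

def min_reconstruction_error(x, combinations):
    """Smallest squared reconstruction error of feature x over the learned
    combinations, each a (d, k) matrix with small k, so every solve is a
    cheap small-scale least-squares problem."""
    errors = []
    for S in combinations:
        coef, *_ = np.linalg.lstsq(S, x, rcond=None)
        errors.append(float(np.linalg.norm(S @ coef - x) ** 2))
    return min(errors)

def is_abnormal(x, combinations, threshold):
    """Flag the feature as abnormal if no combination reconstructs it well."""
    return min_reconstruction_error(x, combinations) > threshold
```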

995 citations


Journal ArticleDOI
TL;DR: In this paper, the authors present a set of 175 benchmark functions for unconstrained optimization problems with diverse properties in terms of modality, separability, and valley landscape, which can be used for validation of new optimization algorithms in the future.
Abstract: Test functions are important to validate and compare the performance of optimization algorithms. There have been many test or benchmark functions reported in the literature; however, there is no standard list or set of benchmark functions. Ideally, test functions should have diverse properties so that they can be truly useful for testing new algorithms in an unbiased way. For this purpose, we have reviewed and compiled a rich set of 175 benchmark functions for unconstrained optimization problems with diverse properties in terms of modality, separability, and valley landscape. This is by far the most complete set of functions so far in the literature, and it can be expected that this complete set of functions can be used for validation of new optimization algorithms in the future.
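
To make the mentioned properties concrete, two classic test functions that commonly appear in such collections are sketched below: the sphere function (unimodal, separable) and the Rastrigin function (highly multimodal, separable). Whether these exact functions are part of this particular set of 175 is an assumption.

```python
import numpy as np

def sphere(x):
    """Unimodal, separable: f(x) = sum(x_i^2), global minimum 0 at the origin."""
    x = np.asarray(x, dtype=float)
    return float(np.sum(x ** 2))

def rastrigin(x):
    """Multimodal, separable: many regularly spaced local minima,
    global minimum 0 at the origin."""
    x = np.asarray(x, dtype=float)
    return float(10 * x.size + np.sum(x ** 2 - 10 * np.cos(2 * np.pi * x)))
```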

944 citations


Proceedings ArticleDOI
20 Jun 2013
TL;DR: A new parameter adaptation technique for DE which uses a historical memory of successful control parameter settings to guide the selection of future control parameter values, and is competitive with the state-of-the-art DE algorithms.
Abstract: Differential Evolution is a simple but effective approach for numerical optimization. Since the search efficiency of DE depends significantly on its control parameter settings, there has been much recent work on developing self-adaptive mechanisms for DE. We propose a new parameter adaptation technique for DE which uses a historical memory of successful control parameter settings to guide the selection of future control parameter values. The proposed method is evaluated by comparison on 28 problems from the CEC2013 benchmark set, as well as the CEC2005 benchmarks and the set of 13 classical benchmark problems. The experimental results show that a DE using our success-history based parameter adaptation method is competitive with the state-of-the-art DE algorithms.
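
A minimal sketch of the success-history idea: keep a small circular memory of control-parameter values that recently produced improvements, and sample new (F, CR) pairs around randomly chosen memory entries. This is deliberately simplified relative to the published method (which, for instance, samples F from a Cauchy distribution and updates memory cells with weighted means); the names and constants here are illustrative.

```python
import random

class SuccessHistory:
    """Toy success-history memory for DE control parameters (F, CR)."""

    def __init__(self, size=5):
        self.mem_f = [0.5] * size
        self.mem_cr = [0.5] * size
        self.k = 0  # index of the next memory cell to overwrite

    def sample(self):
        """Draw an (F, CR) pair around a randomly chosen memory entry."""
        i = random.randrange(len(self.mem_f))
        f = min(1.0, max(0.0, random.gauss(self.mem_f[i], 0.1)))
        cr = min(1.0, max(0.0, random.gauss(self.mem_cr[i], 0.1)))
        return f, cr

    def update(self, successful_f, successful_cr):
        """After a generation, record the parameter values that produced
        improved trial vectors (plain means here; the paper uses weighted means)."""
        if successful_f:
            self.mem_f[self.k] = sum(successful_f) / len(successful_f)
            self.mem_cr[self.k] = sum(successful_cr) / len(successful_cr)
            self.k = (self.k + 1) % len(self.mem_f)
```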

906 citations


Journal ArticleDOI
TL;DR: In this paper, the authors present a set of 175 benchmark functions for unconstrained optimisation problems with diverse properties in terms of modality, separability, and valley landscape.
Abstract: Test functions are important to validate and compare the performance of optimisation algorithms. There have been many test or benchmark functions reported in the literature; however, there is no standard list or set of benchmark functions. Ideally, test functions should have diverse properties to be truly useful for testing new algorithms in an unbiased way. For this purpose, we have reviewed and compiled a rich set of 175 benchmark functions for unconstrained optimisation problems with diverse properties in terms of modality, separability, and valley landscape. This is by far the most complete set of functions so far in the literature, and it can be expected that this complete set of functions can be used for validation of new optimisation algorithms in the future.

876 citations


Posted Content
TL;DR: This paper proposes a new benchmark corpus for measuring progress in statistical language modeling, consisting of almost one billion words of training data, which can be used to quickly evaluate novel language modeling techniques and to compare their contribution when combined with other advanced techniques.
Abstract: We propose a new benchmark corpus to be used for measuring progress in statistical language modeling. With almost one billion words of training data, we hope this benchmark will be useful to quickly evaluate novel language modeling techniques, and to compare their contribution when combined with other advanced techniques. We show performance of several well-known types of language models, with the best results achieved with a recurrent neural network based language model. The baseline unpruned Kneser-Ney 5-gram model achieves perplexity 67.6; a combination of techniques leads to a 35% reduction in perplexity, or a 10% reduction in cross-entropy (bits), over that baseline. The benchmark is available as a publicly hosted project; besides the scripts needed to rebuild the training/held-out data, it also makes available log-probability values for each word in each of ten held-out data sets, for each of the baseline n-gram models.
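
The quoted numbers are mutually consistent because perplexity is the exponential of cross-entropy, so a 35% perplexity reduction over a baseline of 67.6 corresponds to roughly a 10% reduction in bits per word:

```latex
\[
  \mathrm{PPL} = 2^{H}
  \quad\Longrightarrow\quad
  \frac{\log_2 67.6 \;-\; \log_2(0.65 \times 67.6)}{\log_2 67.6}
  \;=\; \frac{6.08 - 5.46}{6.08} \;\approx\; 0.10 .
\]
```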

865 citations


Proceedings Article
Adam Coates, Brody Huval, Tao Wang, David J. Wu, Bryan Catanzaro, Andrew Ng
16 Jun 2013
TL;DR: This paper presents technical details and results from their own system based on Commodity Off-The-Shelf High Performance Computing (COTS HPC) technology: a cluster of GPU servers with Infiniband interconnects and MPI, and shows that it can scale to networks with over 11 billion parameters using just 16 machines.
Abstract: Scaling up deep learning algorithms has been shown to lead to increased performance in benchmark tasks and to enable discovery of complex high-level features. Recent efforts to train extremely large networks (with over 1 billion parameters) have relied on cloud-like computing infrastructure and thousands of CPU cores. In this paper, we present technical details and results from our own system based on Commodity Off-The-Shelf High Performance Computing (COTS HPC) technology: a cluster of GPU servers with Infiniband interconnects and MPI. Our system is able to train networks with 1 billion parameters on just 3 machines in a couple of days, and we show that it can scale to networks with over 11 billion parameters using just 16 machines. As this infrastructure is much more easily marshaled by others, the approach enables much wider-spread research with extremely large neural networks.

740 citations


Proceedings ArticleDOI
01 Aug 2013
TL;DR: This work introduces a real-world benchmark data set for traffic sign detection together with carefully chosen evaluation metrics, baseline results, and a web-interface for comparing approaches, and presents the best-performing algorithms of the IJCNN competition.
Abstract: Real-time detection of traffic signs, the task of pinpointing a traffic sign's location in natural images, is a challenging computer vision task of high industrial relevance. Various algorithms have been proposed, and advanced driver assistance systems supporting detection and recognition of traffic signs have reached the market. Despite the many competing approaches, there is no clear consensus on what the state of the art in this field is. This can be attributed to the lack of comprehensive, unbiased comparisons of those methods. We aim at closing this gap with the “German Traffic Sign Detection Benchmark”, presented as a competition at IJCNN 2013 (International Joint Conference on Neural Networks). We introduce a real-world benchmark data set for traffic sign detection together with carefully chosen evaluation metrics, baseline results, and a web interface for comparing approaches. In our evaluation, we separate sign detection from classification, but still measure the performance on relevant categories of signs to allow for benchmarking specialized solutions. The considered baseline algorithms represent some of the most popular detection approaches, such as the Viola-Jones detector based on Haar features and a linear classifier relying on HOG descriptors. Further, a recently proposed problem-specific algorithm exploiting shape and color in a model-based Hough-like voting scheme is evaluated. Finally, we present the best-performing algorithms of the IJCNN competition.

717 citations


Journal ArticleDOI
TL;DR: In this paper, Aaronson and Arkhipov's model of computation with photons in integrated optical circuits was implemented and the authors set a benchmark for a type of quantum computer that can potentially outperform a conventional computer by using only a few photons and linear optical elements.
Abstract: The boson-sampling problem is experimentally solved by implementing Aaronson and Arkhipov's model of computation with photons in integrated optical circuits. These results set a benchmark for a type of quantum computer that can potentially outperform a conventional computer by using only a few photons and linear optical elements.

710 citations


01 Jan 2013
TL;DR: Introducing imbalance between the contributions of various subcomponents, subcomponents with nonuniform sizes, and conforming and conflicting overlapping functions are among the major new features proposed in this report.
Abstract: This report proposes 15 large-scale benchmark problems as an extension to the existing CEC’2010 large-scale global optimization benchmark suite. The aim is to better represent a wider range of real-world large-scale optimization problems and to provide convenience and flexibility for comparing various evolutionary algorithms specifically designed for large-scale global optimization. Introducing imbalance between the contributions of various subcomponents, subcomponents with nonuniform sizes, and conforming and conflicting overlapping functions are among the major new features proposed in this report.
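
To illustrate two of the new features, the toy composition below combines subcomponents of different sizes whose contributions are made imbalanced by widely differing weights, and whose variable blocks overlap with their neighbors. The sizes, weights, and overlap are illustrative only, not the suite's actual definitions.

```python
import numpy as np

def sphere(z):
    return float(np.sum(z ** 2))

def imbalanced_overlapping(x, sizes=(20, 30, 50), overlap=5,
                           weights=(1.0, 1e3, 1e6)):
    """Toy composite function: consecutive subcomponents share `overlap`
    variables, and the widely differing weights make their contributions
    imbalanced. Requires len(x) >= sum(sizes) - overlap * (len(sizes) - 1)."""
    x = np.asarray(x, dtype=float)
    total, start = 0.0, 0
    for size, w in zip(sizes, weights):
        total += w * sphere(x[start:start + size])
        start += size - overlap  # next block overlaps the previous one
    return total
```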

626 citations


Journal ArticleDOI
TL;DR: Global state-space error bounds are developed that justify the method's design and highlight its advantages in terms of minimizing components of these error bounds, and a 'sample mesh' concept is introduced that enables a distributed, computationally efficient implementation of the GNAT method in finite-volume-based computational-fluid-dynamics (CFD) codes.

Journal ArticleDOI
TL;DR: A new metaheuristic optimization algorithm, called bat algorithm (BA), is used to solve constraint optimization tasks, and the optimal solutions obtained are found to be better than the best solutions provided by the existing methods.
Abstract: In this study, we use a new metaheuristic optimization algorithm, called bat algorithm (BA), to solve constraint optimization tasks. BA is verified using several classical benchmark constraint problems. For further validation, BA is applied to three benchmark constraint engineering problems reported in the specialized literature. The performance of the bat algorithm is compared with various existing algorithms. The optimal solutions obtained by BA are found to be better than the best solutions provided by the existing methods. Finally, the unique search features used in BA are analyzed, and their implications for future research are discussed in detail.
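
For context, the core BA move couples a frequency-tuned velocity update with attraction toward the current best solution; the loudness and pulse-rate schedules, the local random walk, and the constraint handling used in the study are omitted in the bare-bones sketch below, and all names and constants are illustrative.

```python
import numpy as np

def bat_step(positions, velocities, best, f_min=0.0, f_max=2.0):
    """One simplified BA iteration: each bat draws a random frequency, its
    velocity is updated toward the best-known solution, and the bat then
    moves by its velocity."""
    n = positions.shape[0]
    freq = f_min + (f_max - f_min) * np.random.rand(n, 1)
    velocities = velocities + (positions - best) * freq
    return positions + velocities, velocities
```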

Journal ArticleDOI
TL;DR: In this article, a comprehensive review of step-up single-phase non-isolated inverters suitable for ac-module applications is presented, where the selected solutions are designed and simulated complying with the benchmark obtaining passive and semiconductor components ratings.
Abstract: This paper presents a comprehensive review of step-up single-phase non-isolated inverters suitable for ac-module applications. In order to compare the most feasible solutions of the reviewed topologies, a benchmark is set. This benchmark is based on a typical ac-module application considering the requirements for the solar panels and the grid. The selected solutions are designed and simulated complying with the benchmark obtaining passive and semiconductor components ratings in order to perform a comparison in terms of size and cost. A discussion of the analyzed topologies regarding the obtained ratings as well as ground currents is presented. Recommendations for topological solutions complying with the application benchmark are provided.

Journal ArticleDOI
TL;DR: The paper presents an efficient Hybrid Genetic Search with Advanced Diversity Control for a large class of time-constrained vehicle routing problems, introducing several new features to manage the temporal dimension.

Journal ArticleDOI
TL;DR: In this paper, the authors presented a test benchmark model for the evaluation of fault detection and accommodation schemes for a wind turbine on a system level, and it includes sensor, actuator, and system faults, namely faults in the pitch system, the drive train, the generator, and the converter system.
Abstract: This paper presents a test benchmark model for the evaluation of fault detection and accommodation schemes. This benchmark model deals with the wind turbine on a system level, and it includes sensor, actuator, and system faults, namely faults in the pitch system, the drive train, the generator, and the converter system. Since it is a system-level model, converter and pitch system models are simplified because these are controlled by internal controllers working at higher frequencies than the system model. The model represents a three-bladed pitch-controlled variable-speed wind turbine with a nominal power of 4.8 MW. The fault detection and isolation (FDI) problem was addressed by several teams, and five of the solutions are compared in the second part of this paper. This comparison relies on additional test data in which the faults occur in different operating conditions than in the test data used for the FDI design.

Journal ArticleDOI
TL;DR: A hybrid PSO algorithm is proposed, called DNSPSO, which employs a diversity enhancing mechanism and neighborhood search strategies to achieve a trade-off between exploration and exploitation abilities.

Journal ArticleDOI
TL;DR: In this paper, the authors investigated the potential of using occupancy information to realize a more energy efficient building climate control, focusing on Swiss office buildings equipped with Integrated Room Automation (IRA), i.e. the integrated control of Heating, Ventilation, Air Conditioning (HVAC) as well as lighting and blind positioning of a building zone or room.

Journal ArticleDOI
01 Jan 2013
TL;DR: This work provides an all-around survey of 12 state-of-the-art geo-textual indices and proposes a benchmark that enables comparison of spatial keyword query performance, uncovering new insights that may guide index selection as well as further research.
Abstract: Geo-textual indices play an important role in spatial keyword querying. The existing geo-textual indices have not been compared systematically under the same experimental framework. This makes it difficult to determine which indexing technique best supports specific functionality. We provide an all-around survey of 12 state-of-the-art geo-textual indices. We propose a benchmark that enables the comparison of the spatial keyword query performance. We also report on the findings obtained when applying the benchmark to the indices, thus uncovering new insights that may guide index selection as well as further research.

Journal ArticleDOI
TL;DR: Results show that the tool produces hardware solutions of comparable quality to a commercial high-level synthesis tool, and results demonstrate the ability of the tool to explore the hardware/software codesign space by varying the amount of a program that runs in software versus hardware.
Abstract: It is generally accepted that a custom hardware implementation of a set of computations will provide superior speed and energy efficiency relative to a software implementation. However, the cost and difficulty of hardware design is often prohibitive, and consequently, a software approach is used for most applications. In this article, we introduce a new high-level synthesis tool called LegUp that allows software techniques to be used for hardware design. LegUp accepts a standard C program as input and automatically compiles the program to a hybrid architecture containing an FPGA-based MIPS soft processor and custom hardware accelerators that communicate through a standard bus interface. In the hybrid processor/accelerator architecture, program segments that are unsuitable for hardware implementation can execute in software on the processor. LegUp can synthesize most of the C language to hardware, including fixed-sized multidimensional arrays, structs, global variables, and pointer arithmetic. Results show that the tool produces hardware solutions of comparable quality to a commercial high-level synthesis tool. We also give results demonstrating the ability of the tool to explore the hardware/software codesign space by varying the amount of a program that runs in software versus hardware. LegUp, along with a set of benchmark C programs, is open source and freely downloadable, providing a powerful platform that can be leveraged for new research on a wide range of high-level synthesis topics.

Proceedings ArticleDOI
01 Dec 2013
TL;DR: A unified benchmark dataset of 100 RGBD videos with high diversity is constructed, different kinds of RGBD tracking algorithms using 2D or 3D models are proposed, and a quantitative comparison of various algorithms with RGB or RGBD input is presented.
Abstract: Despite significant progress, tracking is still considered to be a very challenging task. Recently, the increasing popularity of depth sensors has made it possible to obtain reliable depth easily. This may be a game changer for tracking, since depth can be used to prevent model drift and handle occlusion. We also observe that current tracking algorithms are mostly evaluated on a very small number of videos collected and annotated by different groups. The lack of a reasonably sized and consistently constructed benchmark has prevented a persuasive comparison among different algorithms. In this paper, we construct a unified benchmark dataset of 100 RGBD videos with high diversity, propose different kinds of RGBD tracking algorithms using 2D or 3D models, and present a quantitative comparison of various algorithms with RGB or RGBD input. We aim to lay the foundation for further research in both RGB and RGBD tracking, and our benchmark is available at http://tracking.cs.princeton.edu.

Journal ArticleDOI
TL;DR: In this article, the authors systematically cover the significant developments of the last decade, including surrogate modeling of electrical machines and direct and stochastic search algorithms for both single and multi-objective design optimization problems.
Abstract: This paper systematically covers the significant developments of the last decade, including surrogate modeling of electrical machines and direct and stochastic search algorithms for both single- and multi-objective design optimization problems. The specific challenges and the dedicated algorithms for electric machine design are discussed, followed by benchmark studies comparing response surface (RS) and differential evolution (DE) algorithms on a permanent-magnet-synchronous-motor design with five independent variables and a strong nonlinear multiobjective Pareto front and on a function with eleven independent variables. The results show that RS and DE are comparable when the optimization employs only a small number of candidate designs and DE performs better when more candidates are considered.

01 Jan 2013
TL;DR: It is believed that it is now time to adopt a unifying framework for evaluating niching methods, so that further advances in this area can be made with ease.
Abstract: Evolutionary Algorithms (EAs) in their original forms are usually designed for locating a single global solution. These algorithms typically converge to a single solution because of the global selection scheme used. Nevertheless, many real-world problems are “multimodal” by nature, i.e., multiple satisfactory solutions exist. It may be desirable to locate many such satisfactory solutions so that a decision maker can choose the one that is most appropriate in his/her problem domain. Numerous techniques have been developed in the past for locating multiple optima (global or local). These techniques are commonly referred to as “niching” methods. A niching method can be incorporated into a standard EA to promote and maintain the formation of multiple stable subpopulations within a single population, with the aim of locating multiple globally optimal or suboptimal solutions. Many niching methods have been developed in the past, including crowding [1], fitness sharing [2], deterministic crowding [3], derating [4], restricted tournament selection [5], parallelization [6], stretching and deflation [7], clustering [8], clearing [9], and speciation [10]. Although these niching methods have been around for many years, further advances in this area have been hindered by several obstacles: most studies focus on very low-dimensional multimodal problems (2 or 3 dimensions), so it is difficult to assess these methods' scalability to high dimensions; some niching methods introduce new parameters which are difficult to set, making these methods difficult to use; and different benchmark test functions, or different variants of the same functions, are used, so comparing the performance of different niching methods is difficult. We believe it is now time to adopt a unifying framework for evaluating niching methods, so that further advances in this area can be made with ease. In this technical report, we put together 20 benchmark test functions (including several identical functions with different dimension sizes), with different characteristics, for evaluating niching algorithms. The first 10 benchmark functions are simple, well known and widely used functions, largely based on some recent studies on niching [11], [12], [13]. The remaining benchmark functions
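
Of the methods listed above, fitness sharing [2] is among the simplest to state: each individual's raw fitness is divided by a niche count that grows with the number of nearby individuals, which discourages the population from crowding around a single optimum. A minimal sketch follows (the sharing radius and exponent are illustrative):

```python
import numpy as np

def shared_fitness(fitness, population, sigma_share=0.1, alpha=1.0):
    """Classic fitness sharing for maximization: divide each individual's
    fitness by its niche count so crowded regions look less attractive."""
    pop = np.asarray(population, dtype=float)
    fit = np.asarray(fitness, dtype=float)
    dist = np.linalg.norm(pop[:, None, :] - pop[None, :, :], axis=-1)
    share = np.where(dist < sigma_share,
                     1.0 - (dist / sigma_share) ** alpha, 0.0)
    niche_count = share.sum(axis=1)  # includes the individual itself (share = 1)
    return fit / niche_count
```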

ReportDOI
01 Jun 2013
TL;DR: A new high performance conjugate gradient (HPCG) benchmark is described, composed of computations and data access patterns more commonly found in applications, with the aim of achieving a better correlation to real scientific application performance.
Abstract: The High Performance Linpack (HPL), or Top 500, benchmark [1] is the most widely recognized and discussed metric for ranking high performance computing systems. However, HPL is increasingly unreliable as a true measure of system performance for a growing collection of important science and engineering applications. In this paper we describe a new high performance conjugate gradient (HPCG) benchmark. HPCG is composed of computations and data access patterns more commonly found in applications. Using HPCG we strive for a better correlation to real scientific application performance and expect to drive computer system design and implementation in directions that will better impact performance improvement.
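
At the core of the benchmark is a preconditioned conjugate gradient solve, whose sparse matrix-vector products and global reductions stress memory bandwidth and communication rather than dense floating-point throughput. For orientation, a plain unpreconditioned CG loop is sketched below; the actual benchmark additionally applies preconditioning and prescribes a specific sparse problem, neither of which is shown here.

```python
import numpy as np

def conjugate_gradient(matvec, b, tol=1e-8, max_iter=500):
    """Plain CG for a symmetric positive definite system A x = b, where
    `matvec` applies A; each iteration is one sparse matrix-vector product
    plus a few vector updates and dot products."""
    x = np.zeros_like(b, dtype=float)
    r = b - matvec(x)
    p = r.copy()
    rs = r @ r
    for _ in range(max_iter):
        Ap = matvec(p)
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x
```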

Journal ArticleDOI
TL;DR: This paper claims that the results of these simulations represent strong benchmarks, which can be used as a basis for evaluating the accuracy of other codes, including other approaches than particle-in-cell simulations.
Abstract: Benchmarking is generally accepted as an important element in demonstrating the correctness of computer simulations. In the modern sense, a benchmark is a computer simulation result that has evidence of correctness, is accompanied by estimates of relevant errors, and which can thus be used as a basis for judging the accuracy and efficiency of other codes. In this paper, we present four benchmark cases related to capacitively coupled discharges. These benchmarks prescribe all relevant physical and numerical parameters. We have simulated the benchmark conditions using five independently developed particle-in-cell codes. We show that the results of these simulations are statistically indistinguishable, within bounds of uncertainty that we define. We, therefore, claim that the results of these simulations represent strong benchmarks, which can be used as a basis for evaluating the accuracy of other codes. These other codes could include other approaches than particle-in-cell simulations, where benchmarking could examine not just implementation accuracy and efficiency, but also the fidelity of different physical models, such as moment or hybrid models. We discuss an example of this kind in the Appendix. Of course, the methodology that we have developed can also be readily extended to a suite of benchmarks with coverage of a wider range of physical and chemical phenomena.

Proceedings Article
03 Nov 2013
TL;DR: SQUARE, an open source shared task framework including benchmark datasets, defined tasks, standard metrics, and reference implementations with empirical results for several popular methods, is presented.
Abstract: While many statistical consensus methods now exist, relatively little comparative benchmarking and integration of techniques has made it increasingly difficult to determine the current state-of-the-art, to evaluate the relative benefit of new methods, to understand where specific problems merit greater attention, and to measure field progress over time. To make such comparative evaluation easier for everyone, we present SQUARE, an open source shared task framework including benchmark datasets, defined tasks, standard metrics, and reference implementations with empirical results for several popular methods. In addition to measuring performance on a variety of public, real crowd datasets, the benchmark also varies supervision and noise by manipulating training size and labeling error. We envision SQUARE as dynamic and continually evolving, with new datasets and reference implementations being added according to community needs and interest. We invite community contributions and participation.

Proceedings ArticleDOI
20 Jun 2013
TL;DR: This paper proposes a cooperative coevolution framework that is capable of optimizing large scale (in decision variable space) multi-objective optimization problems and compares its proposed algorithm with respect to two state-of-the-art multi-objective evolutionary algorithms.
Abstract: Many real-world multi-objective optimization problems have hundreds or even thousands of decision variables, which contrast with the current practice of multi-objective metaheuristics whose performance is typically assessed using benchmark problems with a relatively low number of decision variables (normally, no more than 30). In this paper, we propose a cooperative coevolution framework that is capable of optimizing large scale (in decision variable space) multi-objective optimization problems. We adopt a benchmark that is scalable in the number of decision variables (the ZDT test suite) and compare our proposed algorithm with respect to two state-of-the-art multi-objective evolutionary algorithms (GDE3 and NSGA-II) when using a large number of decision variables (from 200 up to 5000). The results clearly indicate that our proposed approach is effective as well as efficient for solving large scale multi-objective optimization problems.
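
The cooperative coevolution idea is to split the decision vector into lower-dimensional groups and to evolve each group in turn against a shared context holding the current values of all other variables. The sketch below shows only that decomposition loop, in a single-objective form for brevity; in the paper's setting each subproblem would be handled by a multi-objective evolutionary algorithm such as GDE3 or NSGA-II, and the grouping and the `optimize_group` callback here are illustrative assumptions.

```python
import numpy as np

def cooperative_coevolution(objective, dim, group_size, optimize_group, cycles=10):
    """Toy cooperative coevolution loop: variables are partitioned into groups,
    and each group is optimized while the remaining variables are frozen at
    their values in the shared context vector."""
    context = np.random.rand(dim)
    groups = [list(range(i, min(i + group_size, dim)))
              for i in range(0, dim, group_size)]
    for _ in range(cycles):
        for idx in groups:
            def subproblem(sub):
                trial = context.copy()
                trial[idx] = sub       # evaluate the group within the context
                return objective(trial)
            # optimize_group: any optimizer returning the best found sub-vector
            context[idx] = optimize_group(subproblem, len(idx))
    return context
```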

Proceedings ArticleDOI
20 Jun 2013
TL;DR: This paper evaluates the performance of Success-History based Adaptive DE (SHADE) on the benchmark set for the CEC2013 Competition on Real-Parameter Single Objective Optimization.
Abstract: This paper evaluates the performance of Success-History based Adaptive DE (SHADE) on the benchmark set for the CEC2013 Competition on Real-Parameter Single Objective Optimization. SHADE is an adaptive differential evolution algorithm which uses a history-based parameter adaptation scheme. Experimental results on 28 problems from the CEC2013 benchmarks for 10, 30, and 50 dimensions are presented, including measurements of algorithmic complexity. In addition, we investigate the parameter adaptation behavior of SHADE on these instances.

Journal ArticleDOI
01 Apr 2013
TL;DR: This paper introduces a new hybrid algorithmic approach based on Particle Swarm Optimization (PSO) for successfully solving one of the most popular supply chain management problems, the Vehicle Routing Problem with Stochastic Demands (VRPSD).
Abstract: This paper introduces a new hybrid algorithmic approach based on Particle Swarm Optimization (PSO) for successfully solving one of the most popular supply chain management problems, the Vehicle Routing Problem with Stochastic Demands (VRPSD). The VRPSD is a well-known NP-hard problem in which a vehicle with finite capacity leaves the depot with a full load and has to serve a set of customers whose demands are known only when the vehicle arrives at them. A number of different variants of PSO are tested, and the one that performs best is used for solving benchmark instances from the literature.
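
For reference, the canonical PSO velocity and position update that such variants build on is sketched below; the inertia and acceleration coefficients are typical textbook values rather than those used in the paper, and the encoding of VRPSD routes as particle positions is omitted.

```python
import numpy as np

def pso_step(x, v, pbest, gbest, w=0.7, c1=1.5, c2=1.5):
    """Canonical PSO update: inertia plus stochastic attraction toward each
    particle's personal best and the swarm's global best."""
    r1 = np.random.rand(*x.shape)
    r2 = np.random.rand(*x.shape)
    v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
    return x + v, v
```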

Proceedings ArticleDOI
23 Feb 2013
TL;DR: A compiler-based approach to automatically generate optimized OpenCL code from data-parallel OpenMP programs for GPUs brings together the benefits of a clear high-level language (OpenMP) and an emerging standard (OpenCL) for heterogeneous multi-cores.
Abstract: General purpose GPU based systems are highly attractive as they give potentially massive performance at little cost. Realizing such potential is challenging due to the complexity of programming. This paper presents a compiler-based approach to automatically generate optimized OpenCL code from data-parallel OpenMP programs for GPUs. Such an approach brings together the benefits of a clear high-level language (OpenMP) and an emerging standard (OpenCL) for heterogeneous multi-cores. A key feature of our scheme is that it leverages existing transformations, especially data transformations, to improve performance on GPU architectures and uses predictive modeling to automatically determine whether it is worthwhile running the OpenCL code on the GPU or the OpenMP code on the multi-core host. We applied our approach to the entire NAS parallel benchmark suite and evaluated it on two distinct GPU based systems: Core i7/NVIDIA GeForce GTX 580 and Core i7/AMD Radeon 7970. We achieved average (up to) speedups of 4.51× and 4.20× (143× and 67×) respectively over a sequential baseline. This is, on average, a factor of 1.63 and 1.56 times faster than a hand-coded, GPU-specific OpenCL implementation developed by independent expert programmers.

01 Jul 2013
TL;DR: BEAVRS is a new multi-cycle full-core Pressurized Water Reactor (PWR) depletion benchmark based on two operational cycles of a commercial nuclear power plant, providing a detailed description of fuel assemblies, burnable absorbers, in-core fission detectors, core loading patterns, and numerous in-vessel components.
Abstract: Advances in parallel computing have made possible the development of high-fidelity tools for the design and analysis of nuclear reactor cores, and such tools require extensive verification and validation. This paper introduces BEAVRS, a new multi-cycle full-core Pressurized Water Reactor (PWR) depletion benchmark based on two operational cycles of a commercial nuclear power plant that provides a detailed description of fuel assemblies, burnable absorbers, in-core fission detectors, core loading patterns, and numerous in-vessel components. This benchmark enables analysts to develop extremely detailed reactor core models that can be used for testing and validation of coupled neutron transport, thermal-hydraulics, and fuel isotopic depletion. The benchmark also provides measured reactor data for Hot Zero Power (HZP) physics tests, boron letdown curves, and three-dimensional in-core flux maps from fifty-eight instrumented assemblies. Initial comparisons between calculations performed with MIT's OpenMC Monte Carlo neutron transport code and measured cycle 1 HZP test data are presented, and these results display an average deviation of approximately 100 pcm for the various critical configurations and control rod worth measurements. Computed HZP radial fission detector flux maps also agree reasonably well with the available measured data. All results indicate that this benchmark will be extremely useful in validation of coupled-physics codes and uncertainty quantification of in-core physics computational predictions. The detailed BEAVRS specification and its associated data package is hosted online at the MIT Computational Reactor Physics Group web site (http://crpg.mit.edu/), where future revisions and refinements to the benchmark specification will be made publicly available.