
Showing papers published by "Hewlett-Packard" in 2008


Journal ArticleDOI
01 May 2008-Nature
TL;DR: It is shown, using a simple analytical example, that memristance arises naturally in nanoscale systems in which solid-state electronic and ionic transport are coupled under an external bias voltage.
Abstract: Anyone who ever took an electronics laboratory class will be familiar with the fundamental passive circuit elements: the resistor, the capacitor and the inductor. However, in 1971 Leon Chua reasoned from symmetry arguments that there should be a fourth fundamental element, which he called a memristor (short for memory resistor). Although he showed that such an element has many interesting and valuable circuit properties, until now no one has presented either a useful physical model or an example of a memristor. Here we show, using a simple analytical example, that memristance arises naturally in nanoscale systems in which solid-state electronic and ionic transport are coupled under an external bias voltage. These results serve as the foundation for understanding a wide range of hysteretic current-voltage behaviour observed in many nanoscale electronic devices that involve the motion of charged atomic or molecular species, in particular certain titanium dioxide cross-point switches.
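
As a brief recap of the analytical example reported in this paper (symbols assumed here: w is the width of the doped region, D the film thickness, and mu_v the average dopant mobility), the coupled electronic and ionic transport gives

\[ v(t) = \left( R_{\mathrm{ON}} \frac{w(t)}{D} + R_{\mathrm{OFF}} \left( 1 - \frac{w(t)}{D} \right) \right) i(t), \qquad \frac{dw(t)}{dt} = \mu_v \frac{R_{\mathrm{ON}}}{D} \, i(t), \]

so that for R_ON much smaller than R_OFF the memristance reduces to

\[ M(q) \approx R_{\mathrm{OFF}} \left( 1 - \frac{\mu_v R_{\mathrm{ON}}}{D^2} \, q(t) \right), \]

a resistance that depends on the total charge q that has flowed through the device.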

8,971 citations


Journal ArticleDOI
TL;DR: Experimental evidence is provided to support this general model of memristive electrical switching in oxide systems, and micro- and nanoscale TiO2 junction devices with platinum electrodes that exhibit fast bipolar nonvolatile switching are built.
Abstract: Nanoscale metal/oxide/metal switches have the potential to transform the market for nonvolatile memory and could lead to novel forms of computing. However, progress has been delayed by difficulties in understanding and controlling the coupled electronic and ionic phenomena that dominate the behaviour of nanoscale oxide devices. An analytic theory of the ‘memristor’ (memory-resistor) was first developed from fundamental symmetry arguments in 1971, and we recently showed that memristor behaviour can naturally explain such coupled electron–ion dynamics. Here we provide experimental evidence to support this general model of memristive electrical switching in oxide systems. We have built micro- and nanoscale TiO2 junction devices with platinum electrodes that exhibit fast bipolar nonvolatile switching. We demonstrate that switching involves changes to the electronic barrier at the Pt/TiO2 interface due to the drift of positively charged oxygen vacancies under an applied electric field. Vacancy drift towards the interface creates conducting channels that shunt, or short-circuit, the electronic barrier to switch ON. The drift of vacancies away from the interface annihilates such channels, recovering the electronic barrier to switch OFF. Using this model we have built TiO2 crosspoints with engineered oxygen vacancy profiles that predictively control the switching polarity and conductance. Nanoscale metal/oxide/metal devices that are capable of fast non-volatile switching have been built from platinum and titanium dioxide. The devices could have applications in ultrahigh density memory cells and novel forms of computing.

2,744 citations


Journal ArticleDOI
TL;DR: The capacity of the two-user Gaussian interference channel has been open for 30 years; the best known achievable region is due to Han and Kobayashi, but its characterization is very complicated.
Abstract: The capacity of the two-user Gaussian interference channel has been open for 30 years. The understanding on this problem has been limited. The best known achievable region is due to Han and Kobayashi but its characterization is very complicated. It is also not known how tight the existing outer bounds are. In this work, we show that the existing outer bounds can in fact be arbitrarily loose in some parameter ranges, and by deriving new outer bounds, we show that a very simple and explicit Han-Kobayashi type scheme can achieve to within a single bit per second per hertz (bit/s/Hz) of the capacity for all values of the channel parameters. We also show that the scheme is asymptotically optimal at certain high signal-to-noise ratio (SNR) regimes. Using our results, we provide a natural generalization of the point-to-point classical notion of degrees of freedom to interference-limited scenarios.
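
As a pointer to the generalized degrees-of-freedom notion mentioned above, the symmetric-channel result (recalled here; alpha denotes the ratio log INR / log SNR) takes the well-known "W" shape:

\[ d_{\mathrm{sym}}(\alpha) = \begin{cases} 1 - \alpha, & 0 \le \alpha \le 1/2 \\ \alpha, & 1/2 \le \alpha \le 2/3 \\ 1 - \alpha/2, & 2/3 \le \alpha \le 1 \\ \alpha/2, & 1 \le \alpha \le 2 \\ 1, & \alpha \ge 2 \end{cases} \]

with interference-free performance (d = 1) recovered only when interference is either negligible or very strong.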

1,473 citations


Posted Content
TL;DR: In this paper, a study of social interactions within Twitter shows that the driver of usage is a sparse and hidden network of connections underlying the declared set of friends and followers, and that the linked structures of social networks do not reveal actual interactions among people.

Abstract: Scholars, advertisers and political activists see massive online social networks as a representation of social interactions that can be used to study the propagation of ideas, social bond dynamics and viral marketing, among others. But the linked structures of social networks do not reveal actual interactions among people. Scarcity of attention and the daily rhythms of life and work make people default to interacting with those few that matter and that reciprocate their attention. A study of social interactions within Twitter reveals that the driver of usage is a sparse and hidden network of connections underlying the declared set of friends and followers.

1,151 citations


Proceedings ArticleDOI
15 Dec 2008
TL;DR: This paper considers the one-class problem under the CF setting and proposes two frameworks to tackle OCCF: one based on weighted low rank approximation, the other on negative example sampling.
Abstract: Many applications of collaborative filtering (CF), such as news item recommendation and bookmark recommendation, are most naturally thought of as one-class collaborative filtering (OCCF) problems. In these problems, the training data usually consist simply of binary data reflecting a user's action or inaction, such as page visitation in the case of news item recommendation or webpage bookmarking in the bookmarking scenario. Usually this kind of data are extremely sparse (a small fraction are positive examples), therefore ambiguity arises in the interpretation of the non-positive examples. Negative examples and unlabeled positive examples are mixed together and we are typically unable to distinguish them. For example, we cannot really attribute a user not bookmarking a page to a lack of interest or lack of awareness of the page. Previous research addressing this one-class problem only considered it as a classification task. In this paper, we consider the one-class problem under the CF setting. We propose two frameworks to tackle OCCF. One is based on weighted low rank approximation; the other is based on negative example sampling. The experimental results show that our approaches significantly outperform the baselines.
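
To make the first framework concrete, below is a minimal NumPy sketch of weighted low-rank approximation for binary feedback, assuming a uniform low confidence weight on the unobserved (assumed-negative) entries; function and parameter names are illustrative, not the paper's code.

import numpy as np

def weighted_als(R, rank=10, w_neg=0.1, reg=0.05, iters=20, seed=0):
    """Weighted low-rank approximation for one-class (binary) feedback.

    R      : binary user-item matrix (1 = observed positive, 0 = unknown)
    w_neg  : confidence weight given to unobserved entries; positives get weight 1.0
    """
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    U = 0.01 * rng.standard_normal((n_users, rank))
    V = 0.01 * rng.standard_normal((n_items, rank))
    W = np.where(R > 0, 1.0, w_neg)        # per-entry confidence weights
    I = reg * np.eye(rank)

    for _ in range(iters):
        # Fix V, solve a small weighted ridge regression for each user.
        for u in range(n_users):
            Wu = np.diag(W[u])
            U[u] = np.linalg.solve(V.T @ Wu @ V + I, V.T @ Wu @ R[u])
        # Fix U, solve for each item.
        for i in range(n_items):
            Wi = np.diag(W[:, i])
            V[i] = np.linalg.solve(U.T @ Wi @ U + I, U.T @ Wi @ R[:, i])
    return U, V   # score items for a user with U[u] @ V.T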

1,058 citations


Posted Content
TL;DR: Early patterns of Digg votes (diggs) and YouTube views reflect long-term user interest, allowing the long-term popularity of online content to be predicted from early measurements of user access.
Abstract: We present a method for accurately predicting the long time popularity of online content from early measurements of user access. Using two content sharing portals, Youtube and Digg, we show that by modeling the accrual of views and votes on content offered by these services we can predict the long-term dynamics of individual submissions from initial data. In the case of Digg, measuring access to given stories during the first two hours allows us to forecast their popularity 30 days ahead with remarkable accuracy, while downloads of Youtube videos need to be followed for 10 days to attain the same performance. The differing time scales of the predictions are shown to be due to differences in how content is consumed on the two portals: Digg stories quickly become outdated, while Youtube videos are still found long after they are initially submitted to the portal. We show that predictions are more accurate for submissions for which attention decays quickly, whereas predictions for evergreen content will be prone to larger errors.
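
The prediction itself rests on a log-linear relation between early and late popularity (recalled here; N_s(t) is the cumulative view or vote count of submission s at time t, r a portal- and time-dependent scaling factor, and xi a noise term):

\[ \ln N_s(t_2) = \ln r(t_1, t_2) + \ln N_s(t_1) + \xi_s(t_1, t_2), \]

so a submission's expected popularity at t_2 is obtained by rescaling its early count N_s(t_1).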

880 citations


Journal ArticleDOI
TL;DR: A study of social interactions within Twitter reveals that the driver of usage is a sparse and hidden network of connections underlying the "declared" set of friends and followers, showing that the linked structures of social networks do not reveal actual interactions among people.

Abstract: Scholars, advertisers and political activists see massive online social networks as a representation of social interactions that can be used to study the propagation of ideas, social bond dynamics and viral marketing, among others. But the linked structures of social networks do not reveal actual interactions among people. Scarcity of attention and the daily rhythms of life and work make people default to interacting with those few that matter and that reciprocate their attention. A study of social interactions within Twitter reveals that the driver of usage is a sparse and hidden network of connections underlying the “declared” set of friends and followers.

787 citations


Book ChapterDOI
23 Jun 2008
TL;DR: This paper describes a CF algorithm, alternating-least-squares with weighted-λ-regularization (ALS-WR), which is implemented on a parallel Matlab platform, and shows empirically that the performance of ALS-WR monotonically improves with both the number of features and the number of ALS iterations.

Abstract: Many recommendation systems suggest items to users by utilizing the techniques of collaborative filtering (CF) based on historical records of items that the users have viewed, purchased, or rated. Two major problems that most CF approaches have to contend with are scalability and sparseness of the user profiles. To tackle these issues, in this paper, we describe a CF algorithm, alternating-least-squares with weighted-λ-regularization (ALS-WR), which is implemented on a parallel Matlab platform. We show empirically that the performance of ALS-WR (in terms of root mean squared error (RMSE)) monotonically improves with both the number of features and the number of ALS iterations. We applied the ALS-WR algorithm on a large-scale CF problem, the Netflix Challenge, with 1000 hidden features and obtained an RMSE score of 0.8985, which is one of the best results based on a pure method. In addition, combining with the parallel version of other known methods, we achieved a performance improvement of 5.91% over Netflix's own CineMatch recommendation system. Our method is simple and scales well to very large datasets.
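
For reference, the weighted-λ-regularized objective minimized by ALS-WR has the form (notation assumed here: u_i and m_j are the user and movie factor vectors, I the set of observed ratings, and n_{u_i}, n_{m_j} the numbers of ratings associated with user i and movie j):

\[ f(U, M) = \sum_{(i,j) \in I} \left( r_{ij} - u_i^{T} m_j \right)^2 + \lambda \left( \sum_i n_{u_i} \| u_i \|^2 + \sum_j n_{m_j} \| m_j \|^2 \right), \]

which is solved by alternately fixing M and solving a small ridge regression for each u_i, then fixing U and solving for each m_j.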

776 citations


Proceedings ArticleDOI
01 Mar 2008
TL;DR: This paper proposes and validates a power management solution that coordinates different individual approaches, and performs a detailed quantitative sensitivity analysis to draw conclusions about the impact of different architectures, implementations, workloads, and system design choices.
Abstract: Power delivery, electricity consumption, and heat management are becoming key challenges in data center environments. Several past solutions have individually evaluated different techniques to address separate aspects of this problem, in hardware and software, and at local and global levels. Unfortunately, there has been no corresponding work on coordinating all these solutions. In the absence of such coordination, these solutions are likely to interfere with one another, in unpredictable (and potentially dangerous) ways. This paper seeks to address this problem. We make two key contributions. First, we propose and validate a power management solution that coordinates different individual approaches. Using simulations based on 180 server traces from nine different real-world enterprises, we demonstrate the correctness, stability, and efficiency advantages of our solution. Second, using our unified architecture as the base, we perform a detailed quantitative sensitivity analysis and draw conclusions about the impact of different architectures, implementations, workloads, and system design choices.

707 citations


Journal ArticleDOI
01 Jun 2008
TL;DR: The authors believe that, in comparison with an electrically-connected many-core alternative that uses the same on-stack interconnect power, Corona can provide 2 to 6 times more performance on many memory intensive workloads, while simultaneously reducing power.
Abstract: We expect that many-core microprocessors will push performance per chip from the 10 gigaflop to the 10 teraflop range in the coming decade. To support this increased performance, memory and inter-core bandwidths will also have to scale by orders of magnitude. Pin limitations, the energy cost of electrical signaling, and the non-scalability of chip-length global wires are significant bandwidth impediments. Recent developments in silicon nanophotonic technology have the potential to meet these off- and on-stack bandwidth requirements at acceptable power levels. Corona is a 3D many-core architecture that uses nanophotonic communication for both inter-core communication and off-stack communication to memory or I/O devices. Its peak floating-point performance is 10 teraflops. Dense wavelength division multiplexed optically connected memory modules provide 10 terabyte per second memory bandwidth. A photonic crossbar fully interconnects its 256 low-power multithreaded cores at 20 terabyte per second bandwidth. We have simulated a 1024 thread Corona system running synthetic benchmarks and scaled versions of the SPLASH-2 benchmark suite. We believe that in comparison with an electrically-connected many-core alternative that uses the same on-stack interconnect power, Corona can provide 2 to 6 times more performance on many memory intensive workloads, while simultaneously reducing power.

688 citations


Journal ArticleDOI
R. Williams
TL;DR: A memristor is a two-terminal memory resistor whose resistance depends on the voltage applied to it and on how long that voltage has been applied; when the voltage is turned off, the memristor remembers its most recent resistance until the next time it is turned on.

Abstract: This article discusses the development of a memristor and how it works. The term memristor is a contraction of "memory resistor": a two-terminal device whose resistance depends on the voltage applied to it and the length of time that voltage has been applied. This device remembers its history; that is, when you turn off the voltage, the memristor remembers its most recent resistance until the next time you turn it on.

Journal ArticleDOI
TL;DR: In this paper, a junction between a silicon strip waveguide and an ultra-compact silicon microring resonator is demonstrated, which minimizes spurious light scattering and increases the critical dimensions of the geometry.
Abstract: We demonstrate a junction between a silicon strip waveguide and an ultra-compact silicon microring resonator that minimizes spurious light scattering and increases the critical dimensions of the geometry. We show cascaded silicon microring resonators with radii around 1.5 μm and effective mode volumes around 1.0 μm³ that are critically coupled to a waveguide with coupled Q's up to 9,000. The radius of 1.5 μm is smaller than the operational wavelength, and is close to the theoretical size limit of the silicon microring resonator for the same Q. The device is fabricated with a widely-available SEM-based lithography system using a stitch-free design based on a U-shaped waveguide.

Proceedings ArticleDOI
07 Jun 2008
TL;DR: This paper describes the simple model the authors would like to provide for C++ threads programmers, as part of an effort to give explicit thread semantics in the next revision of the C++ standard, and explains how this, together with some practical but often under-appreciated implementation constraints, drives the decisions described.
Abstract: Currently multi-threaded C or C++ programs combine a single-threaded programming language with a separate threads library. This is not entirely sound [7]. We describe an effort, currently nearing completion, to address these issues by explicitly providing semantics for threads in the next revision of the C++ standard. Our approach is similar to that recently followed by Java [25], in that, at least for a well-defined and interesting subset of the language, we give sequentially consistent semantics to programs that do not contain data races. Nonetheless, a number of our decisions are often surprising even to those familiar with the Java effort: we (mostly) insist on sequential consistency for race-free programs, in spite of implementation issues that came to light after the Java work; we give no semantics to programs with data races (there are no benign C++ data races); and we use weaker semantics for trylock than existing languages or libraries, allowing us to promise sequential consistency with an intuitive race definition, even for programs with trylock. This paper describes the simple model we would like to be able to provide for C++ threads programmers, and explains how this, together with some practical, but often under-appreciated, implementation constraints, drives us towards the above decisions.

Journal ArticleDOI
TL;DR: A manufacturer's problem of managing his direct online sales channel together with an independently owned bricks-and-mortar retail channel is studied for the case in which the channels compete in service.
Abstract: We study a manufacturer’s problem of managing his direct online sales channel together with an independently owned bricks-and-mortar retail channel, when the channels compete in service. We incorporate a detailed consumer channel choice model in which the demand faced in each channel depends on the service levels of both channels as well as the consumers’ valuation of the product and shopping experience. The direct channel’s service is measured by the delivery lead time for the product; the retail channel’s service is measured by product availability. We identify optimal dual channel strategies that depend on the channel environment described by factors such as the cost of managing a direct channel, retailer inconvenience, and some product characteristics. We also determine when the manufacturer should establish a direct channel or a retail channel if he is already selling through one of these channels. Finally, we conduct a sequence of controlled experiments with human subjects to investigate whether our model makes reasonable predictions of human behavior. We determine that the model accurately predicts the direction of changes in the subjects’ decisions, as well as their channel strategies in response to the changes in the channel environment. These observations suggest that the model can be used in designing channel strategies for an actual dual channel environment.

Journal ArticleDOI
TL;DR: The thermal challenges in next-generation electronic systems, as identified through panel presentations and ensuing discussions at the workshop, Thermal Challenges in Next Generation Electronic Systems, held in Santa Fe, NM, January 7-10, 2007, are summarized in this article.
Abstract: Thermal challenges in next-generation electronic systems, as identified through panel presentations and ensuing discussions at the workshop, Thermal Challenges in Next Generation Electronic Systems, held in Santa Fe, NM, January 7-10, 2007, are summarized in this paper. Diverse topics are covered, including electrothermal and multiphysics codesign of electronics, new and nanostructured materials, high heat flux thermal management, site-specific thermal management, thermal design of next-generation data centers, thermal challenges for military, automotive, and harsh environment electronic systems, progress and challenges in software tools, and advances in measurement and characterization. Barriers to further progress in each area that require the attention of the research community are identified.

Proceedings ArticleDOI
Greg Snider
12 Jun 2008
TL;DR: The key ideas are to factor out two synaptic state variables to pre- and post-synaptic neurons and to separate computational communication from learning by time-division multiplexing of pulse-width-modulated signals through synapses.
Abstract: The neuromorphic paradigm is attractive for nanoscale computation because of its massive parallelism, potential scalability, and inherent defect-, fault-, and failure-tolerance. We show how to implement timing-based learning laws, such as spike-timing-dependent plasticity (STDP), in simple, memristive nanodevices, such as those constructed from certain metal oxides. Such nanoscale "synapses" can be combined with CMOS "neurons" to create neuromorphic hardware several orders of magnitude denser than is possible in conventional CMOS. The key ideas are: (1) to factor out two synaptic state variables to pre- and post-synaptic neurons; and (2) to separate computational communication from learning by time-division multiplexing of pulse-width-modulated signals through synapses. This approach offers the advantages of: better control over power dissipation; fewer constraints on the design of memristive materials used for nanoscale synapses; learning dynamics can be dynamically turned on or off (e.g. by attentional priming mechanisms communicated extra-synaptically); greater control over the precise form and timing of the STDP equations; the ability to implement a variety of other learning laws besides STDP; better circuit diversity since the approach allows different learning laws to be implemented in different areas of a single chip using the same memristive material for all synapses.
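
For context, the timing-based learning law targeted here is the standard pair-based STDP window; the sketch below illustrates that law in ordinary software with illustrative parameters, and says nothing about the memristive, pulse-width-modulated implementation described in the paper.

import math

def stdp_dw(dt_ms, a_plus=0.01, a_minus=0.012, tau_plus=20.0, tau_minus=20.0):
    """Pair-based STDP weight change for a spike-time difference dt = t_post - t_pre (ms).

    Positive dt (pre fires before post) potentiates the synapse; negative dt depresses it.
    Parameter values are illustrative, not taken from the paper.
    """
    if dt_ms >= 0:
        return a_plus * math.exp(-dt_ms / tau_plus)
    return -a_minus * math.exp(dt_ms / tau_minus)

# A presynaptic spike 5 ms before the postsynaptic spike strengthens the synapse.
print(stdp_dw(5.0))   # small positive increment
print(stdp_dw(-5.0))  # small negative increment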

Proceedings ArticleDOI
21 Apr 2008
TL;DR: This paper formalizes the problem of Basic Graph Pattern (BGP) optimization for SPARQL queries and main memory graph implementations of RDF data and defines and analyzes the characteristics of heuristics for selectivity-based static BGP optimization.
Abstract: In this paper, we formalize the problem of Basic Graph Pattern (BGP) optimization for SPARQL queries and main memory graph implementations of RDF data. We define and analyze the characteristics of heuristics for selectivity-based static BGP optimization. The heuristics range from simple triple pattern variable counting to more sophisticated selectivity estimation techniques. Customized summary statistics for RDF data enable the selectivity estimation of joined triple patterns and the development of efficient heuristics. Using the Lehigh University Benchmark (LUBM), we evaluate the performance of the heuristics for the queries provided by the LUBM and discuss some of them in more detail.
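
As a rough illustration of the simplest heuristic discussed, variable counting, the sketch below orders a BGP so that triple patterns with fewer unbound variables (heuristically more selective) run first; the prefixes and names are illustrative, not the paper's implementation.

def variable_count(triple_pattern):
    """Count unbound variables in a (subject, predicate, object) pattern.
    Variables are strings starting with '?'; everything else is bound."""
    return sum(1 for term in triple_pattern if term.startswith("?"))

def order_bgp(patterns):
    """Order a Basic Graph Pattern so that patterns with fewer unbound
    variables are evaluated first."""
    return sorted(patterns, key=variable_count)

bgp = [("?x", "rdf:type", "ub:GraduateStudent"),
       ("?x", "ub:takesCourse", "?c"),
       ("?c", "ub:name", "Course1")]
print(order_bgp(bgp))   # patterns with more bound terms come first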

Proceedings ArticleDOI
09 Jun 2008
TL;DR: Overall, overheads and optimizations that explain a total difference of about a factor of 20x in raw performance are identified, and it is shown that there is no single "high pole in the tent" in modern (memory resident) database systems, but that substantial time is spent in logging, latching, locking, B-tree, and buffer management operations.
Abstract: Online Transaction Processing (OLTP) databases include a suite of features - disk-resident B-trees and heap files, locking-based concurrency control, support for multi-threading - that were optimized for computer technology of the late 1970's. Advances in modern processors, memories, and networks mean that today's computers are vastly different from those of 30 years ago, such that many OLTP databases will now fit in main memory, and most OLTP transactions can be processed in milliseconds or less. Yet database architecture has changed little. Based on this observation, we look at some interesting variants of conventional database systems that one might build that exploit recent hardware trends, and speculate on their performance through a detailed instruction-level breakdown of the major components involved in a transaction processing database system (Shore) running a subset of TPC-C. Rather than simply profiling Shore, we progressively modified it so that after every feature removal or optimization, we had a (faster) working system that fully ran our workload. Overall, we identify overheads and optimizations that explain a total difference of about a factor of 20x in raw performance. We also show that there is no single "high pole in the tent" in modern (memory resident) database systems, but that substantial time is spent in logging, latching, locking, B-tree, and buffer management operations.

Proceedings Article
07 Dec 2008
TL;DR: This analysis shows that a model based on OS utilization metrics and CPU performance counters is generally most accurate across the machines and workloads tested, and is particularly useful for machines whose dynamic power consumption is not dominated by the CPU, as well as machines with aggressively power-managed CPUs.
Abstract: Dynamic power management in enterprise environments requires an understanding of the relationship between resource utilization and system-level power consumption. Power models based on resource utilization have been proposed in the context of enabling specific energy-efficiency optimizations on specific machines, but the accuracy and portability of different approaches to modeling have not been systematically compared. In this work, we use a common infrastructure to fit a family of high-level full-system power models, and we compare these models over a wide variation of workloads and machines, from a laptop to a server. This analysis shows that a model based on OS utilization metrics and CPU performance counters is generally most accurate across the machines and workloads tested. It is particularly useful for machines whose dynamic power consumption is not dominated by the CPU, as well as machines with aggressively power-managed CPUs, two classes of systems that are increasingly prevalent.
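
A hedged sketch of the kind of high-level full-system model compared in this work: a linear least-squares fit from utilization metrics and performance counters to measured wall power. The feature names and numbers below are illustrative placeholders, not measurements from the paper.

import numpy as np

# Each row: [cpu_util, disk_util, mem_bandwidth_counter, instructions_retired_counter],
# normalized, measured alongside wall power (watts) on the target machine.
features = np.array([[0.05, 0.01, 0.10, 0.08],
                     [0.20, 0.05, 0.25, 0.22],
                     [0.50, 0.10, 0.40, 0.55],
                     [0.70, 0.30, 0.55, 0.70],
                     [0.90, 0.05, 0.70, 0.92],
                     [0.30, 0.60, 0.20, 0.25]])
power_w = np.array([95.0, 110.0, 140.0, 165.0, 185.0, 130.0])

# Fit P = c0 + c1*x1 + ... by least squares; c0 approximates idle power.
X = np.hstack([np.ones((features.shape[0], 1)), features])
coeffs, *_ = np.linalg.lstsq(X, power_w, rcond=None)

def predict_power(sample):
    return coeffs[0] + coeffs[1:] @ np.asarray(sample)

print(predict_power([0.70, 0.20, 0.50, 0.65]))  # estimated watts for a new utilization sample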

Journal ArticleDOI
TL;DR: In this paper, the authors proposed a quantum non-demolition method to measure a single-electron spin in a quantum dot inside a microcavity where a negatively charged exciton strongly couples to the cavity mode.
Abstract: We propose a quantum nondemolition method---a giant optical Faraday rotation near the resonant regime to measure a single-electron spin in a quantum dot inside a microcavity where a negatively charged exciton strongly couples to the cavity mode. Left-circularly and right-circularly polarized light reflected from the cavity acquires different phase shifts due to cavity quantum electrodynamics and the optical spin selection rule. This yields giant and tunable Faraday rotation that can be easily detected experimentally. Based on this spin-detection technique, a deterministic photon-spin entangling gate and a scalable scheme to create remote spin entanglement via a single photon are proposed.

Proceedings ArticleDOI
22 Sep 2008
TL;DR: A new paradigm -- the Compliance Budget -- is presented as a means of understanding how individuals perceive the costs and benefits of compliance with organisational security goals, and a range of approaches that security managers can use to influence employees' perceptions are identified.
Abstract: A significant number of security breaches result from employees' failure to comply with security policies. Many organizations have tried to change or influence security behaviour, but found it a major challenge. Drawing on previous research on usable security and economics of security, we propose a new approach to managing employee security behaviour. We conducted interviews with 17 employees from two major commercial organizations, asking why they do or don't comply with security policies. Our results show that key factors in the compliance decision are the actual and anticipated cost and benefits of compliance to the individual employee, and perceived cost and benefits to the organization. We present a new paradigm -- the Compliance Budget -- as a means of understanding how individuals perceive the costs and benefits of compliance with organisational security goals, and identify a range of approaches that security managers can use to influence employees' perceptions (which, in turn, influence security behaviour). The Compliance Budget should be understood and managed in the same way as any financial budget, as compliance directly affects, and can place a cap on, the effectiveness of organisational security measures.

Journal ArticleDOI
TL;DR: It is shown through an analysis of a massive data set from YouTube that the productivity exhibited in crowdsourcing exhibits a strong positive dependence on attention, measured by the number of downloads.
Abstract: The tragedy of the digital commons does not prevent the copious voluntary production of content that one witnesses in the web. We show through an analysis of a massive data set from YouTube that the productivity exhibited in crowdsourcing exhibits a strong positive dependence on attention, measured by the number of downloads. Conversely, a lack of attention leads to a decrease in the number of videos uploaded and the consequent drop in productivity, which in many cases asymptotes to no uploads whatsoever. Moreover, uploaders compare themselves to others when having low productivity and to themselves when exceeding a threshold.

Journal ArticleDOI
16 Jan 2008
TL;DR: It is shown that there is in principle a four order of magnitude bandwidth-to-power ratio advantage for the photonic interconnect, which indicates that it could be possible to dramatically improve chip performance without scaling transistors but rather utilize the capability of existing transistors much more efficiently.
Abstract: A significant performance limitation in integrated circuits has become the metal interconnect, which is responsible for depressing the on-chip data bandwidth while consuming an increasing percentage of power. These problems will grow as wire diameters scale down and the resistance-capacitance product of the interconnect wires increases hyperbolically, which threatens to choke off the computational performance increases of chips that we have come to expect over time. We examine some of the quantitative implications of these trends by analyzing the International Technology Roadmap for Semiconductors. We compare the potential of replacing the global electronic interconnect of future chips with a photonic interconnect and see that there is in principle a four order of magnitude bandwidth-to-power ratio advantage for the latter. This indicates that it could be possible to dramatically improve chip performance without scaling transistors but rather utilize the capability of existing transistors much more efficiently. However, at this time it is not clear if these advantages can be realized. We discuss various issues related to the architecture and components necessary to implement on-chip photonic interconnect.

Posted Content
TL;DR: The solution removes the burden of verification from the customer, alleviates both the customer's and the storage service's fear of data leakage, and provides a method for independent arbitration of data retention contracts.
Abstract: A growing number of online services, such as Google, Yahoo!, and Amazon, are starting to charge users for their storage. Customers often use these services to store valuable data such as email, family photos and videos, and disk backups. Today, a customer must entirely trust such external services to maintain the integrity of hosted data and return it intact. Unfortunately, no service is infallible. To make storage services accountable for data loss, we present protocols that allow a third-party auditor to periodically verify the data stored by a service and assist in returning the data intact to the customer. Most importantly, our protocols are privacy-preserving, in that they never reveal the data contents to the auditor. Our solution removes the burden of verification from the customer, alleviates both the customer’s and storage service’s fear of data leakage, and provides a method for independent arbitration of data retention contracts.
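
To illustrate the general idea only (this toy sketch is not the paper's protocol), an auditor can hold precomputed keyed hashes over already-encrypted data and challenge the service with one-time keys, never seeing the data contents:

import hmac, hashlib, os

def precompute_challenges(encrypted_blob, n=5):
    """Run once by the data owner: build one-time (key, expected MAC) pairs over the
    *encrypted* data and hand them to the auditor, who never sees the data itself."""
    pairs = []
    for _ in range(n):
        k = os.urandom(32)
        pairs.append((k, hmac.new(k, encrypted_blob, hashlib.sha256).digest()))
    return pairs

def service_answer(challenge_key, stored_blob):
    """Run by the storage service: prove possession by MACing the stored blob
    under the freshly revealed challenge key."""
    return hmac.new(challenge_key, stored_blob, hashlib.sha256).digest()

# One audit round (each precomputed pair is used only once):
encrypted_blob = b"...opaque ciphertext held by the service..."
challenges = precompute_challenges(encrypted_blob)
key, expected = challenges.pop()
assert hmac.compare_digest(service_answer(key, encrypted_blob), expected)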

Journal ArticleDOI
01 Jun 2008
TL;DR: This study uses CACTI-D to model all components of the memory hierarchy including L1, L2, last level SRAM, logic process based DRAM or commodity DRAM L3 caches, and main memory DRAM chips, and finds that commodity DRAM technology is most attractive for stacked last level caches, with significantly lower energy-delay products.
Abstract: In this paper we introduce CACTI-D, a significant enhancement of CACTI 5.0. CACTI-D adds support for modeling of commodity DRAM technology and support for main memory DRAM chip organization. CACTI-D enables modeling of the complete memory hierarchy with consistent models all the way from SRAM based L1 caches through main memory DRAMs on DIMMs. We illustrate the potential applicability of CACTI-D in the design and analysis of future memory hierarchies by carrying out a last level cache study for a multicore multithreaded architecture at the 32nm technology node. In this study we use CACTI-D to model all components of the memory hierarchy including L1, L2, last level SRAM, logic process based DRAM or commodity DRAM L3 caches, and main memory DRAM chips. We carry out architectural simulation using benchmarks with large data sets and present results of their execution time, breakdown of power in the memory hierarchy, and system energy-delay product for the different system configurations. We find that commodity DRAM technology is most attractive for stacked last level caches, with significantly lower energy-delay products.

Journal ArticleDOI
TL;DR: A study of social interactions within Twitter reveals that the driver of usage is a sparse and hidden network of connections underlying the “declared” set of friends and followers.
Abstract: Scholars, advertisers and political activists see massive online social networks as a representation of social interactions that can be used to study the propagation of ideas, social bond dynamics and viral marketing, among others. But the linked structures of social networks do not reveal actual interactions among people. Scarcity of attention and the daily rhythms of life and work make people default to interacting with those few that matter and that reciprocate their attention. A study of social interactions within Twitter reveals that the driver of usage is a sparse and hidden network of connections underlying the "declared" set of friends and followers.

Proceedings ArticleDOI
01 Dec 2008
TL;DR: An automated model generation procedure effectively characterizes the different virtualization overheads of two diverse hardware platforms and that the models have median prediction error of less than 5% for both the RUBiS and TPC-W benchmarks.
Abstract: Next Generation Data Centers are transforming labor-intensive, hard-coded systems into shared, virtualized, automated, and fully managed adaptive infrastructures. Virtualization technologies promise great opportunities for reducing energy and hardware costs through server consolidation. However, to safely transition an application running natively on real hardware to a virtualized environment, one needs to estimate the additional resource requirements incurred by virtualization overheads. In this work, we design a general approach for estimating the resource requirements of applications when they are transferred to a virtual environment. Our approach has two key components: a set of microbenchmarks to profile the different types of virtualization overhead on a given platform, and a regression-based model that maps the native system usage profile into a virtualized one. This derived model can be used for estimating resource requirements of any application to be virtualized on a given platform. Our approach aims to eliminate error-prone manual processes and presents a fully automated solution. We illustrate the effectiveness of our methodology using the Xen virtual machine monitor. Our evaluation shows that our automated model generation procedure effectively characterizes the different virtualization overheads of two diverse hardware platforms and that the models have median prediction error of less than 5% for both the RUBiS and TPC-W benchmarks.
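
A hedged sketch of the regression step: mapping a native resource-usage profile to a predicted virtualized CPU requirement. The feature names and numbers are illustrative placeholders, not the paper's microbenchmark profiles.

import numpy as np

# Native usage profiles gathered while running profiling microbenchmarks:
# columns: [native_cpu_util, network_packets_per_sec, disk_io_per_sec]
native = np.array([[10.0,  1000.0,  50.0],
                   [25.0,  8000.0, 120.0],
                   [40.0, 20000.0, 300.0],
                   [60.0, 35000.0, 450.0],
                   [80.0, 50000.0, 600.0]])
# CPU utilization observed when the same benchmarks run in the virtualized environment.
virtual_cpu = np.array([14.0, 36.0, 58.0, 85.0, 112.0])

# Linear model: virt_cpu ~ c0 + c1*cpu + c2*net + c3*disk; the I/O terms capture the
# extra per-packet / per-request virtualization cost.
X = np.hstack([np.ones((native.shape[0], 1)), native])
coeffs, *_ = np.linalg.lstsq(X, virtual_cpu, rcond=None)

def predict_virtual_cpu(profile):
    return coeffs[0] + coeffs[1:] @ np.asarray(profile)

print(predict_virtual_cpu([50.0, 30000.0, 400.0]))  # estimated CPU needed after virtualization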

Proceedings Article
22 Jun 2008
TL;DR: This paper reduces execution costs for conventional NICs by 56% on the receive path, and achieves close to direct I/O performance for network devices supporting multiple hardware receive queues, making the Xen driver domain model an attractive solution for I/O virtualization for a wider range of scenarios.
Abstract: The paravirtualized I/O driver domain model, used in Xen, provides several advantages including device driver isolation in a safe execution environment, support for guest VM transparent services including live migration, and hardware independence for guests. However, these advantages currently come at the cost of high CPU overhead which can lead to low throughput for high bandwidth links such as 10 gigabit Ethernet. Direct I/O has been proposed as the solution to this performance problem but at the cost of removing the benefits of the driver domain model. In this paper we show how to significantly narrow the performance gap by improving the performance of the driver domain model. In particular, we reduce execution costs for conventional NICs by 56% on the receive path, and we achieve close to direct I/O performance for network devices supporting multiple hardware receive queues. These results make the Xen driver domain model an attractive solution for I/O virtualization for a wider range of scenarios.

Proceedings ArticleDOI
11 Jun 2008
TL;DR: Several new bottom-up approaches to problems in role engineering for Role-Based Access Control (RBAC) are described, including fast graph reductions that allow recovery of the solution from the solution to a problem on a smaller input graph and a new polynomial-time approximation.
Abstract: We describe several new bottom-up approaches to problems in role engineering for Role-Based Access Control (RBAC). The salient problems are all NP-complete, even to approximate, yet we find that in instances that arise in practice these problems can be solved in minutes. We first consider role minimization, the process of finding a smallest collection of roles that can be used to implement a pre-existing user-to-permission relation. We introduce fast graph reductions that allow recovery of the solution from the solution to a problem on a smaller input graph. For our test cases, these reductions either solve the problem, or reduce the problem enough that we find the optimum solution with a (worst-case) exponential method. We introduce lower bounds that are sharp for seven of nine test cases and are within 3.4% on the other two. We introduce and test a new polynomial-time approximation that on average yields 2% more roles than the optimum. We next consider the related problem of minimizing the number of connections between roles and users or permissions, and we develop effective heuristic methods for this problem as well. Finally, we propose methods for several related problems.
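
As a hedged illustration of the role-minimization problem itself (not the paper's graph reductions, bounds, or approximation algorithm), here is a simple greedy, set-cover-style baseline over a user-to-permission relation:

def greedy_roles(user_perms):
    """user_perms: dict mapping user -> frozenset of permissions.
    Greedily pick candidate roles (permission sets) until every user's
    permissions are expressible as a union of assigned roles."""
    candidates = {frozenset(p) for p in user_perms.values()}   # candidate roles from user profiles
    uncovered = {u: set(p) for u, p in user_perms.items()}
    roles, assignment = [], {u: [] for u in user_perms}
    while any(uncovered.values()):
        # Pick the candidate covering the most still-uncovered permissions, counting
        # only users whose full permission set actually contains the candidate role.
        def gain(role):
            return sum(len(role & uncovered[u]) for u in user_perms if role <= user_perms[u])
        best = max(candidates, key=gain)
        roles.append(best)
        for u in user_perms:
            if best <= user_perms[u] and best & uncovered[u]:
                assignment[u].append(best)
                uncovered[u] -= best
    return roles, assignment

users = {"alice": frozenset({"read", "write"}),
         "bob": frozenset({"read"}),
         "carol": frozenset({"read", "write", "admin"})}
roles, assignment = greedy_roles(users)
print(len(roles), assignment)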

Journal ArticleDOI
01 Jun 2008
TL;DR: A new solution that incorporates volume non-server-class components in novel packaging solutions, with memory sharing and flash-based disk caching, has promise, with a 2X improvement on average in performance-per-dollar for the benchmark suite.
Abstract: This paper seeks to understand and design next-generation servers for emerging "warehouse-computing" environments. We make two key contributions. First, we put together a detailed evaluation infrastructure including a new benchmark suite for warehouse-computing workloads, and detailed performance, cost, and power models, to quantitatively characterize bottlenecks. Second, we study a new solution that incorporates volume non-server-class components in novel packaging solutions, with memory sharing and flash-based disk caching. Our results show that this approach has promise, with a 2X improvement on average in performance-per-dollar for our benchmark suite.