Showing papers by "Hewlett-Packard" published in 2019


Posted ContentDOI
08 Apr 2019-bioRxiv
TL;DR: KofamKOALA is a web server to assign KEGG Orthologs (KOs) to protein sequences by homology search against a database of profile hidden Markov models (KOfam) with pre-computed adaptive score thresholds.
Abstract: Summary: KofamKOALA is a web server to assign KEGG Orthologs (KOs) to protein sequences by homology search against a database of profile hidden Markov models (KOfam) with pre-computed adaptive score thresholds. KofamKOALA is faster than existing KO assignment tools, with accuracy comparable to the best-performing tools. Function annotation by KofamKOALA helps link genes to KEGG resources such as the KEGG pathway maps and facilitates molecular network reconstruction. Availability: KofamKOALA, KofamScan, and KOfam are freely available from https://www.genome.jp/tools/kofamkoala/ Contact: ogata@kuicr.kyoto-u.ac.jp
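The thresholding idea in this abstract can be illustrated with a minimal sketch. It assumes that each KO profile carries its own pre-computed score cutoff and that a KO is assigned when the protein's score against that profile meets the cutoff. The function name `assign_kos`, the KO labels, and all numbers are hypothetical; this is not the KofamKOALA or KofamScan implementation.

```python
from typing import Dict, List, Tuple


def assign_kos(scores: Dict[str, float],
               thresholds: Dict[str, float]) -> List[Tuple[str, float]]:
    """Return (KO, score) pairs whose score meets that KO's own threshold.

    Hits are sorted from highest to lowest score. KOs without a known
    threshold are skipped rather than guessed.
    """
    hits = [
        (ko, score)
        for ko, score in scores.items()
        if ko in thresholds and score >= thresholds[ko]
    ]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical profile-HMM scores for one protein (made-up values).
scores = {"KO_A": 310.5, "KO_B": 120.0, "KO_C": 45.2}
# Hypothetical per-KO adaptive thresholds (made-up values).
thresholds = {"KO_A": 250.0, "KO_B": 150.0, "KO_C": 40.0}

print(assign_kos(scores, thresholds))
```

Because each profile has its own cutoff, `KO_C` is accepted with a low score while `KO_B` is rejected with a higher one. A single global cutoff would not behave this way.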

457 citations


Journal ArticleDOI
TL;DR: It is demonstrated experimentally that the synaptic weights shared in different time steps in an LSTM can be implemented with a memristor crossbar array, which has a small circuit footprint, can store a large number of parameters and offers in-memory computing capability that contributes to circumventing the ‘von Neumann bottleneck’.
Abstract: Recent breakthroughs in recurrent deep neural networks with long short-term memory (LSTM) units have led to major advances in artificial intelligence. However, state-of-the-art LSTM models with significantly increased complexity and a large number of parameters have a bottleneck in computing power resulting from both limited memory capacity and limited data communication bandwidth. Here we demonstrate experimentally that the synaptic weights shared in different time steps in an LSTM can be implemented with a memristor crossbar array, which has a small circuit footprint, can store a large number of parameters and offers in-memory computing capability that contributes to circumventing the ‘von Neumann bottleneck’. We illustrate the capability of our crossbar system as a core component in solving real-world problems in regression and classification, which shows that memristor LSTM is a promising low-power and low-latency hardware platform for edge inference. Deep neural networks are increasingly popular in data-intensive applications, but are power-hungry. New types of computer chips that are suited to the task of deep learning, such as memristor arrays where data handling and computing take place within the same unit, are required. A well-used deep learning model called long short-term memory, which can handle temporal sequential data analysis, is now implemented in a memristor crossbar array, promising an energy-efficient and low-footprint deep learning platform.

251 citations


Proceedings ArticleDOI
04 Apr 2019
TL;DR: The Programmable Ultra-efficient Memristor-based Accelerator (PUMA) enhances memristor crossbars with general purpose execution units to enable the acceleration of a wide variety of Machine Learning (ML) inference workloads.
Abstract: Memristor crossbars are circuits capable of performing analog matrix-vector multiplications, overcoming the fundamental energy efficiency limitations of digital logic. They have been shown to be effective in special-purpose accelerators for a limited set of neural network applications. We present the Programmable Ultra-efficient Memristor-based Accelerator (PUMA) which enhances memristor crossbars with general purpose execution units to enable the acceleration of a wide variety of Machine Learning (ML) inference workloads. PUMA's microarchitecture techniques exposed through a specialized Instruction Set Architecture (ISA) retain the efficiency of in-memory computing and analog circuitry, without compromising programmability. We also present the PUMA compiler which translates high-level code to PUMA ISA. The compiler partitions the computational graph and optimizes instruction scheduling and register allocation to generate code for large and complex workloads to run on thousands of spatial cores. We have developed a detailed architecture simulator that incorporates the functionality, timing, and power models of PUMA's components to evaluate performance and energy consumption. A PUMA accelerator running at 1 GHz can reach area and power efficiency of 577 GOPS/s/mm² and 837 GOPS/s/W, respectively. Our evaluation of diverse ML applications from image recognition, machine translation, and language modelling (5M-800M synapses) shows that PUMA achieves up to 2,446× energy and 66× latency improvement for inference compared to state-of-the-art GPUs. Compared to an application-specific memristor-based accelerator, PUMA incurs small energy overheads at similar inference latency and added programmability.
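The analog matrix-vector multiplication mentioned at the start of this abstract can be sketched numerically. The sketch assumes an ideal crossbar, with no wire resistance, device noise, or nonlinearity: each cell contributes a current V_i × G_ij by Ohm's law, and each column sums its cell currents by Kirchhoff's current law. The function name and the example values are illustrative and are not taken from the PUMA simulator.

```python
from typing import List


def crossbar_mvm(conductances: List[List[float]],
                 voltages: List[float]) -> List[float]:
    """Column currents of an ideal crossbar: I_j = sum_i V_i * G[i][j].

    conductances[i][j] is the cell at row i, column j, in siemens.
    voltages[i] is the voltage applied to row i, in volts.
    """
    n_rows = len(conductances)
    if n_rows != len(voltages):
        raise ValueError("one input voltage per crossbar row is required")
    n_cols = len(conductances[0])
    return [
        sum(voltages[i] * conductances[i][j] for i in range(n_rows))
        for j in range(n_cols)
    ]


# Hypothetical 2x2 crossbar: conductances in siemens, inputs in volts.
G = [[1.0e-6, 2.0e-6],
     [3.0e-6, 4.0e-6]]
V = [0.5, 0.25]

currents = crossbar_mvm(G, V)
print(currents)  # column currents in amperes
```

Every column current is produced in the same physical step, which is why the multiplication is described as happening inside the memory array itself.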

228 citations


Journal ArticleDOI
01 Mar 2019
TL;DR: An experimental demonstration of reinforcement learning on a three-layer 1-transistor 1-memristor (1T1R) network using a modified learning algorithm tailored for the authors' hybrid analogue–digital platform, which has the potential to achieve a significant boost in speed and energy efficiency.
Abstract: Reinforcement learning algorithms that use deep neural networks are a promising approach for the development of machines that can acquire knowledge and solve problems without human input or supervision. At present, however, these algorithms are implemented in software running on relatively standard complementary metal–oxide–semiconductor digital platforms, where performance will be constrained by the limits of Moore’s law and von Neumann architecture. Here, we report an experimental demonstration of reinforcement learning on a three-layer 1-transistor 1-memristor (1T1R) network using a modified learning algorithm tailored for our hybrid analogue–digital platform. To illustrate the capabilities of our approach in robust in situ training without the need for a model, we performed two classic control problems: the cart–pole and mountain car simulations. We also show that, compared with conventional digital systems in real-world reinforcement learning tasks, our hybrid analogue–digital computing system has the potential to achieve a significant boost in speed and energy efficiency. A reinforcement learning algorithm can be implemented on a hybrid analogue–digital platform based on memristive arrays for parallel and energy-efficient in situ training.

225 citations


Journal ArticleDOI
14 Nov 2019-Cell
TL;DR: The relative contribution of gene expression changes is found to be significantly lower in polar than in non-polar waters, and it is hypothesized that in polar regions, alterations in community activity in response to ocean warming will be driven more strongly by changes in organismal composition than by gene regulatory mechanisms.

217 citations


Journal ArticleDOI
TL;DR: The present survey investigates and discusses DevOps challenges from the perspective of engineers, managers, and researchers, and develops a DevOps conceptual map, correlating the DevOps automation tools with these concepts.
Abstract: DevOps is a collaborative and multidisciplinary organizational effort to automate continuous delivery of new software updates while guaranteeing their correctness and reliability. The present survey investigates and discusses DevOps challenges from the perspective of engineers, managers, and researchers. We review the literature and develop a DevOps conceptual map, correlating the DevOps automation tools with these concepts. We then discuss their practical implications for engineers, managers, and researchers. Finally, we critically explore some of the most relevant DevOps challenges reported by the literature.

184 citations


Journal ArticleDOI
13 Jun 2019-Chem
TL;DR: In this paper, the authors show that the use of NH4+ results in battery performance governed by the chemical nature of the ion-electrode interaction, and that H bonding between NH4+ and a bi-layered V2O5 electrode is coupled with prominent pseudocapacitive behavior.

164 citations


Journal ArticleDOI
TL;DR: In situ training of a five-level convolutional neural network that self-adapts to non-idealities of the one-transistor one-memristor array to classify the MNIST dataset is experimentally demonstrated, achieving a 75% reduction in weights without compromising accuracy.
Abstract: The explosive growth of machine learning is largely due to the recent advancements in hardware and architecture. The engineering of network structures, taking advantage of the spatial or temporal translational isometry of patterns, naturally leads to bio-inspired, shared-weight structures such as convolutional neural networks, which have markedly reduced the number of free parameters. State-of-the-art microarchitectures commonly rely on weight-sharing techniques, but still suffer from the von Neumann bottleneck of transistor-based platforms. Here, we experimentally demonstrate the in situ training of a five-level convolutional neural network that self-adapts to non-idealities of the one-transistor one-memristor array to classify the MNIST dataset, achieving similar accuracy to the memristor-based multilayer perceptron with a reduction in trainable parameters of ~75% owing to the shared weights. In addition, the memristors encoded both spatial and temporal translational invariance simultaneously in a convolutional long short-term memory network—a memristor-based neural network with intrinsic 3D input processing—which was trained in situ to classify a synthetic MNIST sequence dataset using just 850 weights. These proof-of-principle demonstrations combine the architectural advantages of weight sharing and the area/energy efficiency boost of the memristors, paving the way to future edge artificial intelligence. Memristive devices can provide energy-efficient neural network implementations, but they must be tailored to suit different network architectures. Wang et al. develop a trainable weight-sharing mechanism for memristor-based CNNs and ConvLSTMs, achieving a 75% reduction in weights without compromising accuracy.

155 citations


Journal ArticleDOI
TL;DR: Low-voltage and high-performance digital and analog CNT TFT circuits based on high-yield and ultrahigh purity polymer-sorted semiconducting CNTs and the first tunable-gain amplifier with 1,000 gain at 20 kHz are reported.
Abstract: The carbon nanotube (CNT) thin-film transistor (TFT) is a promising candidate for flexible and wearable electronics. However, it usually suffers from low semiconducting tube purity, low device yield, and the mismatch between p- and n-type TFTs. Here, we report low-voltage and high-performance digital and analog CNT TFT circuits based on high-yield (19.9%) and ultrahigh purity (99.997%) polymer-sorted semiconducting CNTs. Using high-uniformity deposition and pseudo-CMOS design, we demonstrated CNT TFTs with good uniformity and high performance at a low operation voltage of 3 V. We tested forty-four 2-µm channel 5-stage ring oscillators on the same flexible substrate (1,056 TFTs). All worked as expected with gate delays of 42.7 ± 13.1 ns. With these high-performance TFTs, we demonstrated 8-stage shift registers running at 50 kHz and the first tunable-gain amplifier with 1,000 gain at 20 kHz. These results show the great potential of solution-processed CNT TFTs for large-scale flexible electronics. The carbon nanotube thin-film transistor is promising for solution-processed, large-scale flexible electronics, but device yields have remained poor to date. Lei et al. show low-voltage flexible digital and analog circuits based on high-purity and high-yield separation of semiconducting carbon nanotubes.
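The reported gate delay can be related to an oscillation frequency with the standard ring-oscillator relation f = 1 / (2·N·τ), where N is the number of stages and τ the per-stage delay. The abstract does not report this frequency, so the sketch below is only a back-of-the-envelope estimate from the mean delay of 42.7 ns and N = 5; the function name is illustrative.

```python
def ring_oscillator_frequency(n_stages: int, gate_delay_s: float) -> float:
    """Estimate the oscillation frequency in Hz from the per-stage delay.

    A transition must travel around the ring twice per period,
    so the period is 2 * n_stages * gate_delay_s.
    """
    if n_stages <= 0 or gate_delay_s <= 0:
        raise ValueError("stage count and gate delay must be positive")
    return 1.0 / (2 * n_stages * gate_delay_s)


# Values taken from the abstract: 5 stages, mean gate delay 42.7 ns.
frequency_hz = ring_oscillator_frequency(5, 42.7e-9)
print(f"{frequency_hz / 1e6:.2f} MHz")
```

With the mean delay this gives a frequency of a few megahertz. The ±13.1 ns spread in the delay translates into a correspondingly wide spread in frequency across the 44 oscillators.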

132 citations


Posted Content
TL;DR: The Programmable Ultra-efficient Memristor-based Accelerator (PUMA) is presented which enhances memristor crossbars with general purpose execution units to enable the acceleration of a wide variety of Machine Learning (ML) inference workloads.
Abstract: Memristor crossbars are circuits capable of performing analog matrix-vector multiplications, overcoming the fundamental energy efficiency limitations of digital logic. They have been shown to be effective in special-purpose accelerators for a limited set of neural network applications. We present the Programmable Ultra-efficient Memristor-based Accelerator (PUMA) which enhances memristor crossbars with general purpose execution units to enable the acceleration of a wide variety of Machine Learning (ML) inference workloads. PUMA's microarchitecture techniques exposed through a specialized Instruction Set Architecture (ISA) retain the efficiency of in-memory computing and analog circuitry, without compromising programmability. We also present the PUMA compiler which translates high-level code to PUMA ISA. The compiler partitions the computational graph and optimizes instruction scheduling and register allocation to generate code for large and complex workloads to run on thousands of spatial cores. We have developed a detailed architecture simulator that incorporates the functionality, timing, and power models of PUMA's components to evaluate performance and energy consumption. A PUMA accelerator running at 1 GHz can reach area and power efficiency of 577 GOPS/s/mm² and 837 GOPS/s/W, respectively. Our evaluation of diverse ML applications from image recognition, machine translation, and language modelling (5M-800M synapses) shows that PUMA achieves up to 2,446× energy and 66× latency improvement for inference compared to state-of-the-art GPUs. Compared to an application-specific memristor-based accelerator, PUMA incurs small energy overheads at similar inference latency and added programmability.

108 citations


Journal ArticleDOI
TL;DR: Ex situ HRTEM and corresponding EDX mapping results suggest that NO3- insertion de-crystallizes the structure of Mn3O4 and may open a new direction for novel low-cost aqueous dual-ion batteries.
Abstract: We report reversible electrochemical insertion of NO3- into manganese(II, III) oxide (Mn3O4) as a cathode for aqueous dual-ion batteries. Characterization by TGA, FTIR, EDX, XANES, EXAFS, and EQCM collectively provides unequivocal evidence that reversible oxidative NO3- insertion takes place inside Mn3O4. Ex situ HRTEM and corresponding EDX mapping results suggest that NO3- insertion de-crystallizes the structure of Mn3O4. Kinetic studies reveal fast migration of NO3- in the Mn3O4 structure. This finding may open a new direction for novel low-cost aqueous dual-ion batteries.

Journal ArticleDOI
TL;DR: In this article, the authors present a survey of DevOps challenges from the perspective of engineers, managers, and researchers, and discuss their practical implications for developers, managers and researchers.
Abstract: DevOps is a collaborative and multidisciplinary organizational effort to automate continuous delivery of new software updates while guaranteeing their correctness and reliability. The present survey investigates and discusses DevOps challenges from the perspective of engineers, managers, and researchers. We review the literature and develop a DevOps conceptual map, correlating the DevOps automation tools with these concepts. We then discuss their practical implications for engineers, managers, and researchers. Finally, we critically explore some of the most relevant DevOps challenges reported by the literature.

Journal ArticleDOI
22 Jan 2019-ACS Nano
TL;DR: This work uses water-based and biocompatible graphene and hBN inks to fabricate all-2D material and inkjet-printed capacitors, and demonstrates an areal capacitance of 2.0 ± 0.3 nF cm⁻² for a dielectric thickness of ∼3 μm and negligible leakage currents, averaged across more than 100 devices.
Abstract: A well-defined insulating layer is of primary importance in the fabrication of passive (e.g., capacitors) and active (e.g., transistors) components in integrated circuits. One of the most widely known two-dimensional (2D) dielectric materials is hexagonal boron nitride (hBN). Solution-based techniques are cost-effective and allow simple methods to be used for device fabrication. In particular, inkjet printing is a low-cost, noncontact approach, which also allows for device design flexibility, produces no material wastage, and offers compatibility with almost any surface of interest, including flexible substrates. In this work, we use water-based and biocompatible graphene and hBN inks to fabricate all-2D material and inkjet-printed capacitors. We demonstrate an areal capacitance of 2.0 ± 0.3 nF cm⁻² for a dielectric thickness of ∼3 μm and negligible leakage currents, averaged across more than 100 devices. This gives rise to a derived dielectric constant of 6.1 ± 1.7. The inkjet printed hBN dielectric has ...
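The step from areal capacitance to dielectric constant can be checked with the parallel-plate relation C/A = ε0·εr/d. The sketch below assumes an ideal parallel-plate geometry and uses the nominal values quoted in the abstract (2.0 nF cm⁻² and roughly 3 μm). The exact thickness data behind the reported 6.1 ± 1.7 are not given here, so the result is only a consistency check, and the function name is illustrative.

```python
EPSILON_0 = 8.8541878128e-12  # vacuum permittivity in F/m


def relative_permittivity(areal_capacitance_nf_per_cm2: float,
                          thickness_um: float) -> float:
    """Dielectric constant from C/A = eps0 * eps_r / d (ideal parallel plate)."""
    # Convert nF/cm^2 to F/m^2: 1 nF = 1e-9 F and 1 cm^2 = 1e-4 m^2.
    capacitance_per_area = areal_capacitance_nf_per_cm2 * 1e-9 / 1e-4
    thickness_m = thickness_um * 1e-6
    return capacitance_per_area * thickness_m / EPSILON_0


# Nominal values from the abstract; the thickness is only approximate.
eps_r = relative_permittivity(2.0, 3.0)
print(round(eps_r, 2))
```

With exactly 3 μm the estimate comes out a little above the reported central value of 6.1 but inside the stated ±1.7 range, which is what one would expect for a thickness quoted only approximately.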

Journal ArticleDOI
TL;DR: It is demonstrated that InP-based quantum well lasers can be grown onto silicon waveguides by using a growth template, and this generic concept can be applied to other material systems to provide higher integration density, more functionalities and lower total cost for photonics as well as microelectronics, MEMS, and many other applications.
Abstract: Silicon photonics is becoming a mainstream data-transmission solution for next-generation data centers, high-performance computers, and many emerging applications. The inefficiency of light emission in silicon still requires the integration of a III/V laser chip or optical gain materials onto a silicon substrate. A number of integration approaches, including flip-chip bonding, molecule or polymer wafer bonding, and monolithic III/V epitaxy, have been extensively explored in the past decade. Here, we demonstrate a novel photonic integration method of epitaxial regrowth of III/V on a III/V-on-SOI bonding template to realize heterogeneous lasers on silicon. This method decouples the correlated root causes, i.e., lattice, thermal, and domain mismatches, which are all responsible for a large number of detrimental dislocations in the heteroepitaxy process. The grown multi-quantum well vertical p-i-n diode laser structure shows a significantly low dislocation density of 9.5 × 10⁴ cm⁻², two orders of magnitude lower than the state-of-the-art conventional monolithic growth on Si. This low dislocation density would eliminate defect-induced laser lifetime concerns for practical applications. The fabricated lasers show room-temperature pulsed and continuous-wave lasing at 1.31 μm, with a minimal threshold current density of 813 A/cm². This generic concept can be applied to other material systems to provide higher integration density, more functionalities and lower total cost for photonics as well as microelectronics, MEMS, and many other applications.

Journal ArticleDOI
TL;DR: A novel fabrication method for flexible gas sensors for toxic gases based on sequential wet chemical reaction using zinc oxide nanowires and palladium nanoparticles is developed, which shows a high sensitivity, fast response, and outstanding selectivity to other toxic gases.
Abstract: We have developed a novel fabrication method for flexible gas sensors for toxic gases based on sequential wet chemical reaction. Specifically, zinc oxide (ZnO) nanowires were locally synthesized and...

Journal ArticleDOI
TL;DR: DC-current or voltage-driven periodic spiking with sub-20 ns pulse widths from a single device composed of a thin VO2 film with a metallic carbon nanotube as a nanoscale heater, without using an external capacitor is demonstrated.
Abstract: The recent surge of interest in brain-inspired computing and power-efficient electronics has dramatically bolstered development of computation and communication using neuron-like spiking signals. Devices that can produce rapid and energy-efficient spiking could significantly advance these applications. Here we demonstrate direct current or voltage-driven periodic spiking with sub-20 ns pulse widths from a single device composed of a thin VO2 film with a metallic carbon nanotube as a nanoscale heater, without using an external capacitor. Compared with VO2-only devices, adding the nanotube heater dramatically decreases the transient duration and pulse energy, and increases the spiking frequency, by up to 3 orders of magnitude. This is caused by heating and cooling of the VO2 across its insulator-metal transition being localized to a nanoscale conduction channel in an otherwise bulk medium. This result provides an important component of energy-efficient neuromorphic computing systems and a lithography-free technique for energy-scaling of electronic devices that operate via bulk mechanisms.


Proceedings ArticleDOI
13 May 2019
TL;DR: This paper argues that new data structures for far memory need to be built, borrowing techniques from concurrent data structures and distributed systems, and shows how to realize them using simple hardware extensions.
Abstract: Technologies like RDMA and Gen-Z, which give access to memory outside the box, are gaining in popularity. These technologies provide the abstraction of far memory, where memory is attached to the network and can be accessed by remote processors without mediation by a local processor. Unfortunately, far memory is hard to use because existing data structures are mismatched to it. We argue that we need new data structures for far memory, borrowing techniques from concurrent data structures and distributed systems. We examine the requirements of these data structures and show how to realize them using simple hardware extensions.

Proceedings ArticleDOI
01 Nov 2019
TL;DR: A new framework to accurately detect the abnormalities and automatically generate medical reports is presented, based on hierarchical recurrent neural network (HRNN), and a topic matching mechanism is introduced to HRNN, so as to make generated reports more accurate and diverse.
Abstract: Medical images are widely used in the medical domain for the diagnosis and treatment of diseases. Reading a medical image and summarizing its insights is a routine, yet nonetheless time-consuming task, which often represents a bottleneck in the clinical diagnosis process. Automatic report generation can relieve this issue. However, generating medical reports presents two major challenges: (i) it is hard to accurately detect all the abnormalities simultaneously, especially the rare diseases; (ii) a medical image report consists of many paragraphs and sentences, which are longer than natural image captions. We present a new framework to accurately detect the abnormalities and automatically generate medical reports. The report generation model is based on a hierarchical recurrent neural network (HRNN). We introduce a topic matching mechanism to the HRNN, so as to make generated reports more accurate and diverse. A soft attention mechanism is also introduced to the HRNN model. Experimental results on two image-paragraph pair datasets show that our framework outperforms all the state-of-the-art methods.

Journal ArticleDOI
TL;DR: The status in the understanding of the most common redox-based memristive devices is presented and a rational design of the materials stacks will be required, enabling nanoscale control over the ionic dynamics that gives these devices their variety of capabilities.
Abstract: Memristive devices have been a hot topic in nanoelectronics for the last two decades in both academia and industry. Originally proposed as digital (binary) nonvolatile random access memories, research in this field was predominantly driven by the search for higher performance solid-state drive technologies (e.g., flash replacement) or higher density memories (storage class memory). However, based on their large dynamic range in resistance with analog-tunability along with complex switching dynamics, memristive devices enable revolutionary novel functions and computing paradigms. We present the prospects, opportunities, and materials challenges of memristive devices in computing applications, both near and far terms. Memristive devices offer at least three main types of novel computing applications: in-memory computing, analog computing, and state dynamics. We will present the status in the understanding of the most common redox-based memristive devices while addressing the challenges that materials research will need to tackle in the future. In order to pave the way toward novel computing paradigms, a rational design of the materials stacks will be required, enabling nanoscale control over the ionic dynamics that gives these devices their variety of capabilities.

Proceedings ArticleDOI
11 Oct 2019
TL;DR: GRACE is presented, a DNN-aware compression algorithm that facilitates edge inference by significantly reducing network bandwidth consumption without disturbing the inference performance, and achieves superior compression performance over existing strategies for key DNN applications.
Abstract: IoT and deep learning based computer vision together create an immense market opportunity, but running deep neural networks (DNNs) on resource-constrained IoT devices remains challenging. Offloading DNN inference to an edge server is a promising solution, but limited wireless bandwidth bottlenecks its end-to-end performance and scalability. While IoT devices can adopt source compression to cope with the limited bandwidth, existing compression algorithms (or codecs) are not designed for DNNs (but for human eyes), and thus suffer from either low compression rates or high DNN inference errors. This paper presents GRACE, a DNN-aware compression algorithm that facilitates edge inference by significantly reducing network bandwidth consumption without disturbing the inference performance. Given a target DNN, GRACE (i) analyzes the DNN's perception model w.r.t. both spatial frequencies and colors and (ii) generates an optimized compression strategy for the model in a one-time offline process. Next, GRACE deploys the generated compression strategy at IoT devices (or source) to perform online source compression within the existing codec framework, adding no extra overhead. We prototype GRACE on JPEG (the most popular image codec framework), and our evaluation results show that GRACE indeed achieves superior compression performance over existing strategies for key DNN applications. For semantic segmentation tasks, GRACE reduces the source size by 23% compared to JPEG with similar inference accuracy (0.38% lower than GRACE). Further, GRACE achieves 7.5% higher inference accuracy than JPEG at a commonly used quality level of 75. For classification tasks, GRACE reduces the bandwidth consumption by 90% over JPEG with the same inference accuracy.

Journal ArticleDOI
20 Jun 2019
TL;DR: A waveguide-coupled silicon-germanium avalanche photodiode (APD) detector with three electric terminals was demonstrated with breakdown voltage of −6 V, bandwidth of 18.9 GHz, DC photocurrent gain of 15, open-eye diagram at a data rate of 35 Gb/s, and sensitivity of −11.4 dBm.
Abstract: A CMOS-compatible avalanche photodiode (APD) with high speed and high sensitivity is a critical component of a low-cost, high-data-rate, and energy-efficient optical communication link. A novel waveguide-coupled silicon–germanium APD detector with three electric terminals was demonstrated with breakdown voltage of −6 V, bandwidth of 18.9 GHz, DC photocurrent gain of 15, open-eye diagram at a data rate of 35 Gb/s, and sensitivity of −11.4 dBm at a data rate of 25 Gb/s. This three-terminal APD allows high-yield fabrication in the standard CMOS process and provides robust high-sensitivity operation under small voltage supply.

Journal ArticleDOI
TL;DR: In this paper, an analog content-addressable-memory concept and circuit is proposed to overcome the limitations of traditional content-addressable memory by utilizing the analog conductance tunability of memristors.
Abstract: A content-addressable-memory compares an input search word against all rows of stored words in an array in a highly parallel manner. While supplying a very powerful functionality for many applications in pattern matching and search, it suffers from large area, cost and power consumption, limiting its use. Past improvements have been realized by using memristors to replace the static-random-access-memory cell in conventional designs, but employ similar schemes based only on binary or ternary states for storage and search. We propose a new analog content-addressable-memory concept and circuit to overcome these limitations by utilizing the analog conductance tunability of memristors. Our analog content-addressable-memory stores data within the programmable conductance and can take as input either analog or digital search values. Experimental demonstrations, scaled simulations and analysis show that our analog content-addressable-memory can reduce area and power consumption, which enables the acceleration of existing applications, but also new computing application areas.
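The parallel search described in this abstract can be illustrated with a small behavioural model. The abstract does not state how analog values are encoded, so the sketch makes a loudly stated assumption: each cell stores an acceptance range `(low, high)`, and a row matches when every query value falls inside its cell's range. The function name and all values are hypothetical, and the loop only emulates in software what the array would evaluate for all rows at once.

```python
from typing import List, Sequence, Tuple

Range = Tuple[float, float]


def acam_search(stored_rows: Sequence[Sequence[Range]],
                query: Sequence[float]) -> List[int]:
    """Return indices of stored rows that match the query.

    Assumption for illustration: a cell matches when low <= value <= high,
    and a row matches only if all of its cells match.
    """
    matches = []
    for index, row in enumerate(stored_rows):
        if len(row) != len(query):
            raise ValueError("query width must equal the stored word width")
        if all(low <= value <= high for (low, high), value in zip(row, query)):
            matches.append(index)
    return matches


# Hypothetical stored words, normalised to the range 0..1.
rows = [
    [(0.0, 0.3), (0.5, 1.0)],
    [(0.2, 0.6), (0.0, 0.4)],
    [(0.0, 1.0), (0.0, 1.0)],  # full-range cells behave like "don't care"
]

print(acam_search(rows, [0.25, 0.7]))
```

Under this assumption a binary or ternary CAM is a special case: narrow ranges around 0 or 1 act as binary cells, and a full-range cell acts as a wildcard.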

Journal ArticleDOI
TL;DR: This work devised a novel solution, “multi-criteria active deep learning” (MCADL), to learn an active learning strategy for deep neural networks in image classification and demonstrates that the proposed method consistently outperforms highly competitive active learning approaches.
Abstract: As a robust and heuristic technique in machine learning, active learning has been established as an effective method for addressing large volumes of unlabeled data; it interactively queries users (or certain information sources) to obtain desired outputs at new data points. With regard to deep learning techniques (e.g., CNN) and their applications (e.g., image classification), labeling work is of great significance, as training processes for obtaining parameters in neural networks require abundant labeled samples. Although a few active learning algorithms have been proposed for devising certain straightforward sampling strategies (e.g., density, similarity, uncertainty, and label-based measure) for deep learning algorithms, these employ onefold sampling strategies and do not consider the relationship among multiple sampling strategies. To this end, we devised a novel solution, “multi-criteria active deep learning” (MCADL), to learn an active learning strategy for deep neural networks in image classification. Our sample selection strategy selects informative samples by considering multiple criteria simultaneously (i.e., density, similarity, uncertainty, and label-based measure). Moreover, our approach is capable of adjusting weights adaptively to fuse the results from multiple criteria effectively by exploring the utilities of the criteria at different training stages. Through extensive experiments on two popular image datasets (i.e., MNIST and CIFAR-10), we demonstrate that our proposed method consistently outperforms highly competitive active learning approaches; thereby, it can be verified that our multi-criteria active learning proposal is rational and our solution is effective.

Journal ArticleDOI
TL;DR: The analysis of GDPR from a systems perspective reveals the phenomenon of metadata explosion, wherein large quantities of metadata need to be stored along with the personal data to satisfy the GDPR requirements.
Abstract: The General Data Protection Regulation (GDPR) provides new rights and protections to European people concerning their personal data. We analyze GDPR from a systems perspective, translating its legal articles into a set of capabilities and characteristics that compliant systems must support. Our analysis reveals the phenomenon of metadata explosion, wherein large quantities of metadata need to be stored along with the personal data to satisfy the GDPR requirements. Our analysis also helps us identify new workloads that must be supported under GDPR. We design and implement an open-source benchmark called GDPRbench that consists of workloads and metrics needed to understand and assess personal-data processing database systems. To gauge the readiness of modern database systems for GDPR, we follow best practices and developer recommendations to modify Redis, PostgreSQL, and a commercial database system to be GDPR compliant. Our experiments demonstrate that the resulting GDPR-compliant systems achieve poor performance on GDPR workloads, and that performance scales poorly as the volume of personal data increases. We discuss the real-world implications of these findings, and identify research challenges towards making GDPR compliance efficient in production environments. We release all of our software artifacts and datasets at this http URL

Proceedings ArticleDOI
11 Oct 2019
TL;DR: A large-scale measurement campaign on an operational mobile video telephony service is conducted, showing that the application-layer video codec and transport-layer protocols remain highly uncoordinated, which represents one major reason for the low QoE.
Abstract: Despite the pervasive use of real-time video telephony services, the users' quality of experience (QoE) remains unsatisfactory, especially over the mobile Internet. Previous work studied the problem via controlled experiments, while a systematic and in-depth investigation in the wild is still missing. To bridge the gap, we conduct a large-scale measurement campaign on an operational mobile video telephony service. Our measurement logs fine-grained performance metrics over 1 million video call sessions. Our analysis shows that the application-layer video codec and transport-layer protocols remain highly uncoordinated, which represents one major reason for the low QoE. We thus propose a machine-learning-based framework to resolve the issue. Instead of blindly following the transport layer's estimation of network capacity, the framework reviews historical logs of both layers and extracts high-level features of codec/network dynamics, based on which it determines the highest bitrates for forthcoming video frames without incurring congestion. To attain this ability, we train the framework on the aforementioned massive data traces using a custom-designed imitation learning algorithm, which enables it to learn from past experience. We have implemented the framework and incorporated it into the service. Our experiments show that it outperforms state-of-the-art solutions, improving video quality while reducing stalling time severalfold under various practical scenarios.

Journal ArticleDOI
TL;DR: The reversible insertion of a large molecular dication, methyl viologen, into the crystal structure of an aromatic solid electrode, 3,4,9,10-perylenetetracarboxylic dianhydride, is reported, the largest insertion charge carrier when non-solvated ever reported for batteries.
Abstract: The interactions between charge carriers and electrode structures represent one of the most important considerations in the search for new energy storage devices. Currently, ionic bonding dominates the battery chemistry. Here we report the reversible insertion of a large molecular dication, methyl viologen, into the crystal structure of an aromatic solid electrode, 3,4,9,10-perylenetetracarboxylic dianhydride. This is the largest insertion charge carrier when non-solvated ever reported for batteries; surprisingly, the kinetic properties of the (de)insertion of methyl viologen are excellent with 60% of capacity retained when the current rate is increased from 100 mA g⁻¹ to 2000 mA g⁻¹. Characterization reveals that the insertion of methyl viologen causes phase transformation of the organic host, and embodies guest-host chemical bonding. First-principles density functional theory calculations suggest strong guest-host interaction beyond the pure ionic bonding, where a large extent of covalency may exist. This study extends the boundary of battery chemistry to large molecular ions as charge carriers and also highlights the electrochemical assembly of a supramolecular system.

Journal ArticleDOI
TL;DR: This work proposes memristor-based TCAM (ternary content-addressable memory) circuits to accelerate regular expression (RegEx) matching through in-memory processing of finite automata, demonstrating a promising path to wire-speed RegEx matching on large-scale rulesets.
Abstract: We propose memristor-based TCAM (ternary content-addressable memory) circuits to accelerate regular expression (RegEx) matching through in-memory processing of finite automata. RegEx matching is a key function in network security to find malicious actors. However, RegEx matching latency and power can be very high, and current proposals struggle to perform wire-speed matching for large rulesets. Our approach dramatically decreases operating power and enables high throughput, and the use of nanoscale memristor TCAM circuits (mTCAMs) enables compression techniques to expand rulesets. We fabricated and demonstrated nanoscale memristor TCAM cells. SPICE simulations investigate performance at scale, and an mTCAM dynamic power model using 16 nm layout parameters demonstrates ~0.2 fJ/bit/search energy for a 36 × 250 mTCAM array. A tiled architecture is proposed to implement a Snort ruleset and assess application performance. Compared to a state-of-the-art FPGA approach (2 Gbps, ~1 W), we show 4× the throughput (8 Gbps) at 55% of the power (0.55 W) without standard TCAM power-saving techniques. Our performance comparison improves further when striding (searching multiple characters at once) is considered, resulting in 47.2 Gbps at 1.2 W for our approach compared to 3.9 Gbps at 630 mW for strided FPGA NFA, demonstrating a promising path to wire-speed RegEx matching on large-scale rulesets.
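
How a TCAM can process a finite automaton is easiest to see in a small software emulation: each row stores a ternary pattern over the current state and the input character, and a lookup returns the next state of the first matching row. The sketch below is an illustration under assumptions, not the paper's circuit or automaton encoding: the state numbering, the row layout, the helper names, and the example expression `ab+c` are all made up here, and input is assumed to consist of 8-bit characters.

```python
def ternary_match(pattern, key):
    """True if every non-wildcard ('X') symbol of the pattern equals the key bit."""
    return len(pattern) == len(key) and all(p in ("X", k) for p, k in zip(pattern, key))

def encode(state, char, state_bits=2):
    """Search key: the state id in binary followed by the 8-bit character code."""
    return format(state, f"0{state_bits}b") + format(ord(char), "08b")

def char_bits(char):
    return format(ord(char), "08b")

def tcam_search(rows, key):
    """Emulate a TCAM lookup: the first (highest-priority) matching row wins."""
    for pattern, next_state in rows:
        if ternary_match(pattern, key):
            return next_state
    return None

# Hypothetical table for an unanchored search for the expression ab+c.
# States: 0 = start, 1 = seen 'a', 2 = seen 'ab+', 3 = accept.
ROWS = [
    ("XX" + char_bits("a"), 1),  # 'a' from any state (re)starts a match
    ("01" + char_bits("b"), 2),
    ("10" + char_bits("b"), 2),
    ("10" + char_bits("c"), 3),
    ("X" * 10, 0),               # default row: any other input returns to start
]

def run(rows, text, accept=3):
    """Feed the text one character per lookup; report whether accept is reached."""
    state = 0
    for ch in text:
        state = tcam_search(rows, encode(state, ch))
        if state == accept:
            return True
    return False
```

The wildcard symbols are what allow one row to stand for many transitions: the first row covers the `a` transition from all four states, and the last row covers every remaining (state, character) pair. In hardware all rows are compared in parallel in a single search, whereas this emulation scans them sequentially.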

Journal ArticleDOI
20 Oct 2019
TL;DR: This paper reports the first O-band InAs quantum dot (QD) waveguide photodiode (PD) heterogeneously integrated on silicon, and the authors demonstrate a device sensitivity of −11 dBm at 10 Gb/s and open-eye diagrams up to 12.5 Gb/s.
Abstract: Silicon photonics provides a promising platform for energy-efficient interconnects within supercomputers and data centers. However, developing a complementary metal–oxide–semiconductor compatible high-speed photodetector with low dark current has long presented a challenge in the field. In this paper, we report the first O-band InAs quantum dot (QD) waveguide photodiode (PD) heterogeneously integrated on silicon. Record low dark currents as low as 0.01 nA, responsivities of 0.34 A/W at 1310 nm and 0.9 A/W at 1280 nm, and a record high 3 dB bandwidth of 15 GHz were measured. Avalanche gain was observed, and a maximum gain of up to 45 and a gain bandwidth product (GBP) of 240 GHz were achieved, which are also record high results for any QD avalanche photodetector (APD) on silicon. Additionally, we demonstrate a device sensitivity of −11 dBm at 10 Gb/s and open-eye diagrams up to 12.5 Gb/s. These QD-based PDs are able to operate as p-i-n PDs or APDs under different bias conditions and offer a promising alternative to heterogeneous InGaAs-on-silicon and SiGe counterparts in low-power optical communication links. They also leverage the same epitaxial layers and processing steps as heterogeneously integrated QD lasers, significantly simplifying the processing and reducing the cost of a fully integrated QD transceiver on silicon.
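
For readers less used to optical power units, the quoted sensitivity and responsivity can be related with a short calculation. The sketch below converts dBm to watts and multiplies by a responsivity to obtain the primary photocurrent. Pairing the −11 dBm sensitivity with the 0.34 A/W responsivity at 1310 nm is an assumption made here for illustration (the abstract does not state the wavelength or bias of the sensitivity measurement), and avalanche gain is ignored.

```python
def dbm_to_watts(p_dbm):
    """Convert optical power from dBm (decibels relative to 1 mW) to watts."""
    return 1e-3 * 10 ** (p_dbm / 10)

def photocurrent_amps(p_dbm, responsivity_a_per_w):
    """Primary photocurrent for a given input power, without avalanche gain."""
    return responsivity_a_per_w * dbm_to_watts(p_dbm)

sensitivity_w = dbm_to_watts(-11)            # about 79.4 microwatts
current_1310 = photocurrent_amps(-11, 0.34)  # about 27 microamps (assumed pairing)
```

Under these assumptions the photocurrent at the sensitivity limit is several orders of magnitude above the quoted 0.01 nA dark current, which is one way to see why a low dark current matters for low-power links.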

Proceedings ArticleDOI
05 Aug 2019
TL;DR: Wang et al. conducted a large-scale active-passive measurement study of TCP performance over LTE on HSR, quantitatively studied the impact of frequent cellular handover on HSR networking performance, and conducted an in-depth examination of TCP CUBIC and TCP BBR.
Abstract: High-speed rail (HSR) systems potentially provide a more efficient way of door-to-door transportation than airplanes. However, they also pose unprecedented challenges in delivering seamless Internet service for on-board passengers. In this paper, we conduct a large-scale active-passive measurement study of TCP performance over LTE on HSR. Our measurement targets the HSR routes in China operating at above 300 km/h. We performed extensive data collection through both controlled settings and passive monitoring, obtaining 1732.9 GB of data collected over 135719 km of trips. Leveraging such a unique dataset, we measure important performance metrics such as TCP goodput, latency, and loss rate, as well as key characteristics of TCP flows, application breakdown, and users' behaviors. We further quantitatively study the impact of frequent cellular handover on HSR networking performance, and conduct an in-depth examination of the performance of two widely deployed transport-layer protocols: TCP CUBIC and TCP BBR. Our findings reveal the performance of today's commercial HSR networks "in the wild" and identify several performance inefficiencies, which motivate us to design a simple yet effective congestion control algorithm based on BBR that further boosts throughput by up to 36.5%. Together, they highlight the need to develop dedicated protocol mechanisms that are friendly to extreme mobility.