
Showing papers by "Hewlett-Packard" published in 2020


Journal ArticleDOI
TL;DR: KofamKOALA is a web server to assign KEGG Orthologs (KOs) to protein sequences by homology search against a database of profile hidden Markov models (KOfam) with pre-computed adaptive score thresholds.
Abstract: SUMMARY KofamKOALA is a web server to assign KEGG Orthologs (KOs) to protein sequences by homology search against a database of profile hidden Markov models (KOfam) with pre-computed adaptive score thresholds. KofamKOALA is faster than existing KO assignment tools with its accuracy being comparable to the best performing tools. Function annotation by KofamKOALA helps linking genes to KEGG resources such as the KEGG pathway maps and facilitates molecular network reconstruction. AVAILABILITY AND IMPLEMENTATION KofamKOALA, KofamScan and KOfam are freely available from GenomeNet (https://www.genome.jp/tools/kofamkoala/). SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.

607 citations


Journal ArticleDOI
TL;DR: 3D printing enables on-demand solutions for a wide spectrum of needs, ranging from personal protection equipment to medical devices and isolation wards, and is suited to address supply–demand imbalances caused by socio-economic trends and disruptions in supply chains.
Abstract: 3D printing enables on-demand solutions for a wide spectrum of needs ranging from personal protection equipment to medical devices and isolation wards. This versatile technology is suited to address supply–demand imbalances caused by socio-economic trends and disruptions in supply chains.

181 citations


Journal ArticleDOI
01 Jul 2020
TL;DR: A memristor-based annealing system that uses an analogue neuromorphic architecture based on a Hopfield neural network can solve non-deterministic polynomial-time (NP)-hard max-cut problems in an approach that is potentially more efficient than current quantum, optical and digital approaches.
Abstract: To tackle important combinatorial optimization problems, a variety of annealing-inspired computing accelerators, based on several different technology platforms, have been proposed, including quantum-, optical- and electronics-based approaches. However, to be of use in industrial applications, further improvements in speed and energy efficiency are necessary. Here, we report a memristor-based annealing system that uses an energy-efficient neuromorphic architecture based on a Hopfield neural network. Our analogue–digital computing approach creates an optimization solver in which massively parallel operations are performed in a dense crossbar array that can inject the needed computational noise through the analogue array and device errors, amplified or dampened by using a novel feedback algorithm. We experimentally show that the approach can solve non-deterministic polynomial-time (NP)-hard max-cut problems by harnessing the intrinsic hardware noise. We also use experimentally grounded simulations to explore scalability with problem size, which suggest that our memristor-based approach can offer a solution throughput over four orders of magnitude higher per power consumption relative to current quantum, optical and fully digital approaches.

174 citations
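
As a rough software illustration of the annealing idea in the abstract above (not the authors' hardware, crossbar circuit, or feedback algorithm), the sketch below runs asynchronous Hopfield-style updates on a small max-cut instance and injects Gaussian noise whose amplitude decays over time. The function names, the linear noise schedule, and the example graph are all assumptions made for illustration.

```python
import random


def cut_value(edges, spins):
    """Count the edges whose two endpoints carry opposite spins."""
    return sum(1 for u, v in edges if spins[u] != spins[v])


def noisy_hopfield_maxcut(n, edges, steps=2000, noise0=2.0, seed=0):
    """Search for a large cut with noisy asynchronous Hopfield updates.

    Hypothetical sketch: couplings are -1 on every edge, so each node is
    pushed toward the sign opposite to the sum of its neighbors.
    """
    rng = random.Random(seed)
    neighbors = [[] for _ in range(n)]
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)

    spins = [rng.choice((-1, 1)) for _ in range(n)]
    best, best_cut = list(spins), cut_value(edges, spins)

    for t in range(steps):
        # Assumed schedule: noise amplitude decays linearly toward zero.
        noise = noise0 * (1 - t / steps)
        i = rng.randrange(n)
        field = -sum(spins[j] for j in neighbors[i]) + rng.gauss(0.0, noise)
        spins[i] = 1 if field >= 0 else -1

        current = cut_value(edges, spins)
        if current > best_cut:
            best, best_cut = list(spins), current

    return best, best_cut


if __name__ == "__main__":
    ring = [(0, 1), (1, 2), (2, 3), (3, 0)]  # a 4-node ring as a toy instance
    print(noisy_hopfield_maxcut(4, ring))
```

The function keeps the best partition seen so far, so early high-noise exploration cannot make the returned answer worse than the random starting point.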


Journal ArticleDOI
23 Sep 2020-Nature
TL;DR: This work shows how multiple electrophysical processes, including Mott transition dynamics, form a nanoscale third-order circuit element, and demonstrates simple transistorless networks of third-order elements that perform Boolean operations and find analogue solutions to a computationally hard graph-partitioning problem.
Abstract: Current hardware approaches to biomimetic or neuromorphic artificial intelligence rely on elaborate transistor circuits to simulate biological functions. However, these can instead be more faithfully emulated by higher-order circuit elements that naturally express neuromorphic nonlinear dynamics. Generating neuromorphic action potentials in a circuit element theoretically requires a minimum of third-order complexity (for example, three dynamical electrophysical processes), but there have been few examples of second-order neuromorphic elements, and no previous demonstration of any isolated third-order element. Using both experiments and modelling, here we show how multiple electrophysical processes, including Mott transition dynamics, form a nanoscale third-order circuit element. We demonstrate simple transistorless networks of third-order elements that perform Boolean operations and find analogue solutions to a computationally hard graph-partitioning problem. This work paves a way towards very compact and densely functional neuromorphic computing primitives, and energy-efficient validation of neuroscientific models.

163 citations



Journal ArticleDOI
TL;DR: This review points to the important primitives of a brain-inspired computer that could drive another decade-long wave of computer engineering.
Abstract: Computers have undergone tremendous improvements in performance over the last 60 years, but those improvements have significantly slowed down over the last decade, owing to fundamental limits in the underlying computing primitives. However, the generation of data and demand for computing are increasing exponentially with time. Thus, there is a critical need to invent new computing primitives, both hardware and algorithms, to keep up with the computing demands. The brain is a natural computer that outperforms our best computers in solving certain problems, such as instantly identifying faces or understanding natural language. This realization has led to a flurry of research into neuromorphic or brain-inspired computing that has shown promise for enhanced computing capabilities. This review points to the important primitives of a brain-inspired computer that could drive another decade-long wave of computer engineering.

113 citations


Journal ArticleDOI
TL;DR: A neural multimodal cooperative learning model is presented to split the consistent component and the complementary component by a novel relation-aware attention mechanism and outperforms the state-of-the-art methods on a real-world micro-video dataset.
Abstract: The prevailing characteristics of micro-videos limit the descriptive power of each modality. The micro-video representations proposed by several pioneering efforts are limited to implicitly exploring the consistency between the information in different modalities and ignore their complementarity. In this paper, we focus on how to explicitly separate the consistent features and the complementary features from the mixed information and harness their combination to improve the expressiveness of each modality. Toward this end, we present a neural multimodal cooperative learning (NMCL) model to split the consistent component and the complementary component by a novel relation-aware attention mechanism. Specifically, the computed attention score can be used to measure the correlation between the features extracted from different modalities. Then, a threshold is learned for each modality to distinguish the consistent and complementary features according to the score. Thereafter, we integrate the consistent parts to enhance the representations and supplement the complementary ones to reinforce the information in each modality. To address the problem of redundant information, which may cause overfitting and is hard to distinguish, we devise an attention network to dynamically capture the features that are closely related to the category and output a discriminative representation for prediction. The experimental results on a real-world micro-video dataset show that the NMCL outperforms the state-of-the-art methods. Further studies verify the effectiveness and cooperative effects brought by the attentive mechanism.

107 citations


Journal ArticleDOI
TL;DR: An analog content-addressable memory (analog CAM) concept and circuit are proposed that reduce area and power consumption by utilizing the analog conductance tunability of memristors.
Abstract: A content-addressable memory compares an input search word against all rows of stored words in an array in a highly parallel manner. While supplying a very powerful functionality for many applications in pattern matching and search, it suffers from large area, cost and power consumption, limiting its use. Past improvements have been realized by using memristors to replace the static random-access memory cell in conventional designs, but employ similar schemes based only on binary or ternary states for storage and search. We propose a new analog content-addressable memory concept and circuit to overcome these limitations by utilizing the analog conductance tunability of memristors. Our analog content-addressable memory stores data within the programmable conductance and can take as input either analog or digital search values. Experimental demonstrations, scaled simulations and analysis show that our analog content-addressable memory can reduce area and power consumption, which enables not only the acceleration of existing applications but also new computing application areas. Designing low power and high performance content-addressable memory remains a challenge. Here, the authors demonstrate a content-addressable memory concept and circuit which leverages the analog conductance tunability of memristors, reduces power consumption, and enables new functionalities and applications.

69 citations


Journal ArticleDOI
TL;DR: PANTHER, an ISA-programmable training accelerator with compiler support, is developed and can be integrated into other accelerators in the literature to enhance their efficiency.
Abstract: The wide adoption of deep neural networks has been accompanied by ever-increasing energy and performance demands due to the expensive nature of training them. Numerous special-purpose architectures have been proposed to accelerate training: both digital and hybrid digital-analog using resistive RAM (ReRAM) crossbars. ReRAM-based accelerators have demonstrated the effectiveness of ReRAM crossbars at performing matrix-vector multiplication operations that are prevalent in training. However, they still suffer from inefficiency due to the use of serial reads and writes for performing the weight gradient and update step. A few works have demonstrated the possibility of performing outer products in crossbars, which can be used to realize the weight gradient and update step without the use of serial reads and writes. However, these works have been limited to low precision operations which are not sufficient for typical training workloads. Moreover, they have been confined to a limited set of training algorithms for fully-connected layers only. To address these limitations, we propose a bit-slicing technique for enhancing the precision of ReRAM-based outer products, which is substantially different from bit-slicing for matrix-vector multiplication only. We incorporate this technique into a crossbar architecture with three variants catered to different training algorithms. To evaluate our design on different types of layers in neural networks (fully-connected, convolutional, etc.) and training algorithms, we develop PANTHER, an ISA-programmable training accelerator with compiler support. Our design can also be integrated into other accelerators in the literature to enhance their efficiency. Our evaluation shows that PANTHER achieves up to 8.02×, 54.21×, and 103× energy reductions as well as 7.16×, 4.02×, and 16× execution time reductions compared to digital accelerators, ReRAM-based accelerators, and GPUs, respectively.

63 citations


Journal ArticleDOI
TL;DR: The first experimental demonstration of two computational models in memristor TCAM arrays is reported: regular expression matching in a finite state machine for network security intrusion detection and definable inexact pattern matching in a Levenshtein automaton for genomic sequencing.
Abstract: The dramatic rise of data-intensive workloads has revived application-specific computational hardware for continuing speed and power improvements, frequently achieved by limiting data movement and implementing "in-memory computation". However, conventional complementary metal oxide semiconductor (CMOS) circuit designs can still suffer low power efficiency, motivating designs leveraging nonvolatile resistive random access memory (ReRAM), with many studies focusing on crossbar circuit architectures. Another circuit primitive, content addressable memory (CAM), shows great promise for mapping a diverse range of computational models for in-memory computation, with recent ReRAM-CAM designs proposed but few experimentally demonstrated. Here, programming and control of memristors across an 86 × 12 memristor ternary CAM (TCAM) array integrated with CMOS are demonstrated, and parameter tradeoffs for optimizing speed and search margin are evaluated. In addition to smaller area, this memristor TCAM results in significantly lower power due to very low programmable conductance states, motivating CAM use in a wider range of computational applications than conventional TCAMs are confined to today. Finally, the first experimental demonstration of two computational models in memristor TCAM arrays is reported: regular expression matching in a finite state machine for network security intrusion detection and definable inexact pattern matching in a Levenshtein automaton for genomic sequencing.

56 citations
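
The two matching behaviours named in the abstract above can be mimicked in software. The sketch below is only a functional illustration, not the memristor circuit or the authors' automaton mapping: `tcam_search` compares a key against stored ternary words where `X` is a don't-care, and `matches_within` makes the same accept/reject decision as a Levenshtein automaton by computing the edit distance with dynamic programming. All function names and the example data are hypothetical.

```python
def tcam_search(rows, key):
    """Return indices of stored ternary words ('0', '1', 'X') that match key."""
    return [
        i
        for i, row in enumerate(rows)
        if len(row) == len(key)
        and all(r == "X" or r == k for r, k in zip(row, key))
    ]


def levenshtein(a, b):
    """Edit distance counting insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def matches_within(pattern, text, k):
    """Accept text if it is within k edits of pattern (inexact match)."""
    return levenshtein(pattern, text) <= k


if __name__ == "__main__":
    print(tcam_search(["10X1", "1XX0", "0000"], "1011"))
    print(matches_within("GATTACA", "GATTTACA", 1))
```

A hardware TCAM evaluates every stored row at once, whereas this list comprehension visits rows one by one; the returned set of matching rows is the same.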


Journal ArticleDOI
TL;DR: Ru is studied as a new type of mobile species for memristors to achieve low switching current, fast speed, good reliability, scalability, and analog switching property simultaneously.
Abstract: The switching parameters and device performance of memristors are predominantly determined by their mobile species and matrix materials. Devices with oxygen or oxygen vacancies as the mobile species usually exhibit good retention but also need a relatively high switching current (e.g., >30 µA), while devices with Ag or Cu as cation mobile species do not require a high switching current but usually show poor retention. Here, Ru is studied as a new type of mobile species for memristors to achieve low switching current, fast speed, good reliability, scalability, and analog switching property simultaneously. An electrochemical metallization-like memristor with a stack of Pt/Ta2O5/Ru is developed. Migration of Ru ions is revealed by energy-dispersive X-ray spectroscopy mapping and in situ transmission electron microscopy within a sub-10 nm active device area before and after switching. The results open up a new avenue to engineer memristors for desired properties.

Journal ArticleDOI
TL;DR: The proposed G-CNN employs a detector with two branches to predict the locations and categories of objects, respectively, as well as an inter-class loss to help detectors learn discrepant information among categories so that the learned detectors could better differentiate similar objects of different categories.

Journal ArticleDOI
01 Mar 2020
TL;DR: GDPRbench is an open-source benchmark that consists of workloads and metrics needed to understand and assess personal-data processing database systems; the underlying analysis also identifies new workloads that must be supported under GDPR.
Abstract: The General Data Protection Regulation (GDPR) provides new rights and protections to European people concerning their personal data. We analyze GDPR from a systems perspective, translating its legal articles into a set of capabilities and characteristics that compliant systems must support. Our analysis reveals the phenomenon of metadata explosion, wherein large quantities of metadata need to be stored along with the personal data to satisfy the GDPR requirements. Our analysis also helps us identify new workloads that must be supported under GDPR. We design and implement an open-source benchmark called GDPRbench that consists of workloads and metrics needed to understand and assess personal-data processing database systems. To gauge the readiness of modern database systems for GDPR, we follow best practices and developer recommendations to modify Redis, PostgreSQL, and a commercial database system to be GDPR compliant. Our experiments demonstrate that the resulting GDPR-compliant systems achieve poor performance on GDPR workloads, and that performance scales poorly as the volume of personal data increases. We discuss the real-world implications of these findings, and identify research challenges towards making GDPR-compliance efficient in production environments. We release all of our software artifacts and datasets at http://www.gdprbench.org.

Journal ArticleDOI
TL;DR: The label generation, attached to the textual attention mechanism, and the image caption generation, have been merged to form an end-to-end trainable framework to tackle the challenges of the lack of image information and the deviation from the core content of the image.
Abstract: As a domain at the intersection of computer vision and natural language processing, image caption generation has been an active research topic in recent years, contributing to the multimodal social media translation from unstructured image data to structured text data. Conventional research works have proposed a series of image captioning methods, such as template-based, retrieval-based, and encode-decode methods. Among these, the encode-decode framework is widely used in image caption generation: the encoder extracts the image features with a Convolutional Neural Network (CNN), and the decoder adopts a Recurrent Neural Network (RNN) to generate the image description. The Neural Image Caption (NIC) model has achieved good performance in image captioning; however, some challenges remain to be addressed. To tackle the challenges of the lack of image information and the deviation from the core content of the image, our proposed model explores visual attention to deepen the understanding of the image, incorporating the image labels generated by a Fully Convolutional Network (FCN) into the generation of the image caption. Furthermore, our proposed model exploits textual attention to increase the integrity of the information. Finally, the label generation, attached to the textual attention mechanism, and the image caption generation have been merged to form an end-to-end trainable framework. In this paper, extensive experiments have been carried out on the AIC-ICC image caption benchmark dataset, and the experimental results show that our proposed model is effective and feasible in image caption generation.

Journal ArticleDOI
TL;DR: Li et al. employed 1,1,2,2-tetrafluoroethyl 2,2,2-trifluoroethyl ether as a co-solvent in the electrolyte of Li-S batteries to meet the demands.

Journal ArticleDOI
TL;DR: In this article, a waveguide Si-Ge APD with a low breakdown voltage of −10 V, successfully achieving 60 Gb/s PAM4, was demonstrated and compared to a PIN PD receiver.
Abstract: Silicon-germanium (Si-Ge) avalanche photodiodes (APDs) have large gain bandwidth product (GBP) and low excess noise due to the low impact ionization coefficient ratio of silicon. Optical receivers using APDs are able to achieve high-speed and energy efficient optical transceiver systems. We demonstrate a waveguide Si-Ge APD with low breakdown voltage of −10 V, achieving 60 Gb/s PAM4 successfully. A compact APD circuit model was constructed to allow photonic devices and transceiver circuitry co-design. The APD receiver has achieved −16 dBm sensitivity at 50 Gb/s PAM4 with a bit error rate (BER) of $2.4 \times 10^{-4}$. The sensitivity of APD receivers changes with the multiplication gain. In our analysis, compared to a PIN PD receiver the APD receiver operating at optimum gain can obtain $\sim$8 dB more sensitivity for NRZ signaling at 50 Gb/s and 3–4 dB more sensitivity for PAM4 signaling at 50–100 Gb/s. Also, the APD receiver operating at optimum gain can reduce power consumption by $\sim 10\%$ at PAM4 data rates of 50 Gb/s and $\sim 15\%$ at 100 Gb/s in a silicon carrier-depletion microring modulator based WDM photonic link.

Posted Content
TL;DR: Slingshot is an interconnection network for large scale computing systems based on high-radix switches that provides efficient adaptive routing and congestion control algorithms, and highly tunable traffic classes, and it is found that applications running on Slingshot are less affected by congestion compared to previous generation networks.
Abstract: The interconnect is one of the most critical components in large scale computing systems, and its impact on the performance of applications is going to increase with the system size. In this paper, we will describe Slingshot, an interconnection network for large scale computing systems. Slingshot is based on high-radix switches, which allow building exascale and hyperscale datacenters networks with at most three switch-to-switch hops. Moreover, Slingshot provides efficient adaptive routing and congestion control algorithms, and highly tunable traffic classes. Slingshot uses an optimized Ethernet protocol, which allows it to be interoperable with standard Ethernet devices while providing high performance to HPC applications. We analyze the extent to which Slingshot provides these features, evaluating it on microbenchmarks and on several applications from the datacenter and AI worlds, as well as on HPC applications. We find that applications running on Slingshot are less affected by congestion compared to previous generation networks.

Posted Content
TL;DR: This paper presented ShiftAddNet, whose main inspiration is drawn from a common practice in energy-efficient hardware implementation, that is, multiplication can be instead performed with additions and logical bit-shifts, yielding a new type of deep network that involves only bit-shift and additive weight layers.
Abstract: Multiplication (e.g., convolution) is arguably a cornerstone of modern deep neural networks (DNNs). However, intensive multiplications cause expensive resource costs that challenge DNNs' deployment on resource-constrained edge devices, driving several attempts for multiplication-less deep networks. This paper presented ShiftAddNet, whose main inspiration is drawn from a common practice in energy-efficient hardware implementation, that is, multiplication can be instead performed with additions and logical bit-shifts. We leverage this idea to explicitly parameterize deep networks in this way, yielding a new type of deep network that involves only bit-shift and additive weight layers. This hardware-inspired ShiftAddNet immediately leads to both energy-efficient inference and training, without compromising the expressive capacity compared to standard DNNs. The two complementary operation types (bit-shift and add) additionally enable finer-grained control of the model's learning capacity, leading to more flexible trade-off between accuracy and (training) efficiency, as well as improved robustness to quantization and pruning. We conduct extensive experiments and ablation studies, all backed up by our FPGA-based ShiftAddNet implementation and energy measurements. Compared to existing DNNs or other multiplication-less models, ShiftAddNet aggressively reduces over 80% hardware-quantified energy cost of DNNs training and inference, while offering comparable or better accuracies. Codes and pre-trained models are available at this https URL.
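
The hardware practice the abstract above refers to, replacing multiplication with additions and bit-shifts, can be shown on plain integers. The sketch below is a generic illustration of that practice and is not ShiftAddNet's layers or training procedure; the function names are hypothetical.

```python
def shift_add_multiply(a, b):
    """Multiply two non-negative integers using only shifts and additions."""
    if a < 0 or b < 0:
        raise ValueError("this sketch handles non-negative integers only")
    result = 0
    while b:
        if b & 1:        # lowest bit of b set: add the current shifted a
            result += a
        a <<= 1          # a doubles for the next bit position
        b >>= 1          # move on to the next bit of b
    return result


def power_of_two_scale(x, exponent):
    """Scale integer x by 2**exponent with one shift.

    Negative exponents use a right shift, which floors the result.
    """
    return x << exponent if exponent >= 0 else x >> -exponent


if __name__ == "__main__":
    print(shift_add_multiply(13, 11))
    print(power_of_two_scale(12, 2), power_of_two_scale(12, -2))
```

If a weight is restricted to a signed power of two, a product with it collapses to the single shift in `power_of_two_scale`, which is the kind of saving that makes shift-based layers cheap in hardware.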

Proceedings ArticleDOI
20 Aug 2020
TL;DR: SLINGSHOT is an interconnection network for large scale computing systems based on high-radix switches, which allow building exascale and hyper-scale datacenters networks with at most three switch-to-switch hops.
Abstract: The interconnect is one of the most critical components in large scale computing systems, and its impact on the performance of applications is going to increase with the system size. In this paper, we will describe SLINGSHOT, an interconnection network for large scale computing systems. SLINGSHOT is based on high-radix switches, which allow building exascale and hyper-scale datacenters networks with at most three switch-to-switch hops. Moreover, SLINGSHOT provides efficient adaptive routing and congestion control algorithms, and highly tunable traffic classes. SLINGSHOT uses an optimized Ethernet protocol, which allows it to be interoperable with standard Ethernet devices while providing high performance to HPC applications. We analyze the extent to which SLINGSHOT provides these features, evaluating it on microbenchmarks and on several applications from the datacenter and AI worlds, as well as on HPC applications. We find that applications running on SLINGSHOT are less affected by congestion compared to previous generation networks.

Journal ArticleDOI
TL;DR: In this paper, the authors presented widely tunable quantum-dot lasers heterogeneously integrated on silicon-on-insulator substrate. The tuning mechanism is based on Vernier dual-ring geometry, and a 47 nm tuning range with 52 dB side-mode suppression ratio is observed.
Abstract: Heterogeneously integrated lasers in the O-band are a key component in realizing low-power optical interconnects for data centers and high-performance computing. Quantum-dot-based materials have been particularly appealing for light generation due to their ultralow lasing thresholds, small linewidth enhancement factor, and low sensitivity to reflections. Here, we present widely tunable quantum-dot lasers heterogeneously integrated on silicon-on-insulator substrate. The tuning mechanism is based on Vernier dual-ring geometry, and a 47 nm tuning range with 52 dB side-mode suppression ratio is observed. These parameters show an increase to 52 nm and 58 dB, respectively, when an additional wavelength filter in the form of a Mach–Zehnder interferometer is added to the cavity. The Lorentzian linewidth of the lasers is measured as low as 5.3 kHz.
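
For readers unfamiliar with the Vernier dual-ring tuning mentioned above: two rings with slightly different free spectral ranges (FSRs) only align their resonances once per much larger combined period, which is what extends the tuning range. The sketch below evaluates the standard textbook relation FSR_vernier = FSR1 · FSR2 / |FSR1 − FSR2|. The ring FSR values in the example are hypothetical and are not taken from the paper.

```python
def vernier_fsr(fsr1_nm, fsr2_nm):
    """Combined free spectral range (nm) of two coupled rings (Vernier effect)."""
    if fsr1_nm <= 0 or fsr2_nm <= 0:
        raise ValueError("free spectral ranges must be positive")
    if fsr1_nm == fsr2_nm:
        raise ValueError("identical FSRs give no Vernier extension")
    return fsr1_nm * fsr2_nm / abs(fsr1_nm - fsr2_nm)


if __name__ == "__main__":
    # Hypothetical ring FSRs chosen only to illustrate the relation.
    print(vernier_fsr(5.0, 5.5))
```

The closer the two FSRs, the larger the combined period, at the cost of weaker suppression of neighbouring resonances.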

Journal ArticleDOI
TL;DR: In this paper, the authors demonstrate low-voltage waveguide silicon-germanium avalanche photodiodes (APDs) integrated with distributed Bragg reflectors (DBRs).
Abstract: We demonstrate low-voltage waveguide silicon-germanium avalanche photodiodes (APDs) integrated with distributed Bragg reflectors (DBRs). The internal quantum efficiency is improved from 60% to 90% at 1550 nm assisted with DBRs while still achieving a 25 GHz bandwidth. A low breakdown voltage of 10 V and a gain bandwidth product of near 500 GHz are obtained. APDs with DBRs at a data rate of 64 Gb/s pulse amplitude modulation with four levels (PAM4) show a 30%–40% increase in optical modulation amplitude (OMA) compared to APDs with no DBR. A sensitivity of around −13 dBm at a data rate of 64 Gb/s PAM4 and a bit error rate of 2.4 × 10⁻⁴ is realized for APDs with DBRs, which improves the sensitivity by ∼2 dB compared to APDs with no DBR.

Proceedings ArticleDOI
30 Jul 2020
TL;DR: REM, Reliable Extreme Mobility management for 4G, 5G, and beyond, is devised, and evaluation with operational high-speed rail datasets shows that REM reduces failures to levels comparable to static and low mobility, with low signaling and latency cost.
Abstract: Extreme mobility has become a norm rather than an exception. However, 4G/5G mobility management is not always reliable in extreme mobility, with non-negligible failures and policy conflicts. The root cause is that existing mobility management is primarily based on wireless signal strength. While reasonable in static and low mobility, it is vulnerable to dramatic wireless dynamics from extreme mobility in triggering, decision, and execution. We devise REM, Reliable Extreme Mobility management for 4G, 5G, and beyond. REM shifts to movement-based mobility management in the delay-Doppler domain. Its signaling overlay relaxes feedback via cross-band estimation, simplifies policies with provable conflict freedom, and stabilizes signaling via scheduling-based OTFS modulation. Our evaluation with operational high-speed rail datasets shows that REM reduces failures to levels comparable to static and low mobility, with low signaling and latency cost.

Journal ArticleDOI
TL;DR: A symmetric nonlinear photonic device is presented as the fundamental building block that can use self-phase modulation in two microring resonators to emulate a continuously tunable, symmetrically bistable Ising node.
Abstract: We propose an integrated photonic circuit that acts as an optical coherent Ising machine and simulates its performance on the basis of some example problems. In contrast to previous all-optical approaches, the proposed integrated Ising machine does not require an optical parametric oscillator and can, hence, operate at a single wavelength, reducing the overall design complexity. We present a symmetric nonlinear photonic device as the fundamental building block that can use self-phase modulation in two microring resonators to emulate a continuously tunable, symmetrically bistable Ising node. We derive and verify using numerical simulation the key properties of the single Ising node device and the full Ising machine circuit. We estimate the full Ising machine's tolerance to realistic fabrication errors on the basis of randomly sampled example problems, and we discuss which technologies are required to obtain large-scale systems.

Journal ArticleDOI
TL;DR: In this article, the authors presented an analysis of a ring-based DWDM silicon photonic (SiP) link architecture with a comb laser source and p-i-n photodetectors.
Abstract: Current electronic interconnections in high performance computing (HPC) systems are reaching their limit in supporting high data traffic demands. Dense wavelength-division multiplexed (DWDM) links have gained interest as they can potentially alleviate these interconnect bandwidth demands while also lowering the cost and energy consumption compared to traditional electronic links. In this article we present an analysis of a ring-based DWDM silicon photonic (SiP) link architecture with a comb laser source and p-i-n photodetectors. Specifically, we consider microring resonators (MRRs) with narrow bus waveguides and carrier-injection ring modulators. We propose a new method to select the optimal comb source setting to minimize the laser power consumption at a particular data rate. Additionally, we leverage power penalty models supported by measurements to estimate the effective received optical power at the receiver input of each of the DWDM channels which yields a bit error rate (BER) of $10^{-12}$ or lower. We show that the analyzed comb source has the lowest power consumption per channel for 24 consecutive lines. For these comb settings, the maximum channel data rate of non-return to zero on-off keying (NRZ-OOK) signals is 22 Gbps, and the minimum energy consumption is 3.28 pJ/bit.

Journal ArticleDOI
TL;DR: This paper presents a comprehensive analysis of a comb source microring-based SiP link architecture with p-i-n photodetectors, and shows that a select few comb configurations satisfy the requirements for a 1 Tbps aggregated data rate, and that energy consumption as low as 3 pJ/bit is achievable.
Abstract: The electrical interconnects in high performance computing (HPC) systems are reaching their bandwidth capacities in supporting data-intensive applications. Currently, communication between compute nodes through these interconnects is the main bottleneck for overall HPC system performance. Optical interconnects based on the emerging silicon photonics (SiP) platform are considered to be a promising replacement to boost the speed of the data transfer with reduced cost and energy consumption compared to electrical interconnects. In this paper, we present a comprehensive analysis of a comb source microring-based SiP link architecture with p-i-n photodetectors. In particular, we direct our focus on improved grating coupler and bus waveguide designs to reduce the link power penalties. Additionally, we map the required performance from the comb laser to provide an aggregated data rate of 1 Tbps under the constraints of free spectral range (FSR) and nonlinearities of the microring resonators (MRRs). We show that a select few comb configurations satisfy these requirements, and energy consumption as low as 3 pJ/bit is achievable.

Journal ArticleDOI
TL;DR: This paper discusses a strategy for developing appropriate optical link solutions for different data traffic scenarios in memory-driven HPCs, and reviews in detail recent work demonstrating, for the first time, fully photonics-electronics-integrated single- and multi-wavelength directly modulated laser (DML) transmitters on silicon.
Abstract: Optical connectivity, which has been widely deployed in today's datacenters and high-performance computing (HPC) systems, is a disruptive technological revolution to the IT industry in the new millennium. In our journey to debut an Exascale supercomputer, a completely new computing concept, called memory-driven computing, was recently introduced. This new computing architecture brings challenges and opportunities for novel optical interconnect solutions. Here, we first discuss our strategy to develop appropriate optical link solutions for different data traffic scenarios in memory-driven HPCs. Then, we present a detailed review of recent work to demonstrate fully photonics-electronics-integrated single- and multi-wavelength directly modulated laser (DML) transmitters on silicon for the first time. Compact heterogeneous microring lasers and laser arrays were fabricated as photonic engines to work with a customized complementary metal-oxide semiconductor (CMOS) driver circuit. Microring lasers based on conventional quantum well and new quantum dot lasing media were compared in the experiment. Thermal shunt and MOS capacitor structures were integrated into the lasers for effective thermal management and ultra low-energy tuning. This enables a controllable dense wavelength division multiplexing (DWDM) link architecture in an HPC environment. An equivalent microring laser circuit model was constructed to allow photonics-electronics co-simulation. Equalization functionality in the CMOS driver circuit proved to be critical to achieve up to 14 Gb/s direct modulation with 6 dB extinction ratio. Finally, ongoing and future work towards more robust, higher speed, and more energy efficient DML transmitters is discussed.

Journal ArticleDOI
Dejan Milojicic
TL;DR: Computer hosts a virtual roundtable with three experts to discuss the opportunities and obstacles regarding edge-to-cloud technology.
Abstract: Computer hosts a virtual roundtable with three experts to discuss the opportunities and obstacles regarding edge-to-cloud technology.

Proceedings ArticleDOI
06 Jul 2020
TL;DR: An efficient topology and route management approach in Software-Defined Wide Area Networks (SD-WAN) is presented and a centralized control approach that minimizes the total cost while satisfying the quality of service (QoS) on all flows is proposed.
Abstract: This paper presents an efficient topology and route management approach in Software-Defined Wide Area Networks (SD-WAN). Traditional WANs suffer from low utilization and the lack of a global view of the network. Therefore, during failures, topology/service/traffic changes, or new policy requirements, the system does not always converge to the global optimal state. Using Software-Defined Networking architectures in WANs provides the opportunity to design WANs with higher fault tolerance, scalability, and manageability. We exploit the correlation matrix between the virtual links, derived from the monitoring system, to infer the underlying route topology and propose a route update approach that minimizes the total route update cost on all flows. We formulate the problem as an integer linear programming optimization problem and provide a centralized control approach that minimizes the total cost while satisfying the quality of service (QoS) on all flows. Experimental results on real network topologies demonstrate the effectiveness of the proposed approach in terms of disruption cost and average disrupted flows.

Posted ContentDOI
26 Jun 2020-bioRxiv
TL;DR: Swarm Learning (SL) is a decentralized machine learning approach that unites edge computing, blockchain-based peer-to-peer networking and coordination, and privacy protection without the need for a central coordinator.
Abstract: Identification of patients with life-threatening diseases including leukemias or infections such as tuberculosis and COVID-19 is an important goal of precision medicine. We recently illustrated that leukemia patients are identified by machine learning (ML) based on their blood transcriptomes. However, there is an increasing divide between what is technically possible and what is allowed because of privacy legislation. To facilitate integration of any omics data from any data owner world-wide without violating privacy laws, we here introduce Swarm Learning (SL), a decentralized machine learning approach uniting edge computing, blockchain-based peer-to-peer networking and coordination as well as privacy protection without the need for a central coordinator, thereby going beyond federated learning. Using more than 14,000 blood transcriptomes derived from over 100 individual studies with non-uniform distribution of cases and controls and significant study biases, we illustrate the feasibility of SL to develop disease classifiers based on distributed data for COVID-19, tuberculosis or leukemias that outperform those developed at individual sites. Still, SL completely protects local privacy regulations by design. We propose this approach to noticeably accelerate the introduction of precision medicine.

Journal ArticleDOI
TL;DR: A Si-Ge waveguide avalanche photodiode with extremely high temperature stability is demonstrated: the breakdown voltage increases ∼4.2 mV/°C, bandwidth reduces ∼0.09%/°C, and gain-bandwidth product reduces ∼0.24%/°C as temperature increases from 30 °C to 90 °C.
Abstract: A Si-Ge waveguide avalanche photodiode with extremely high temperature stability is demonstrated. The breakdown voltage increases ∼4.2 mV/°C, bandwidth reduces ∼0.09%/°C, and gain-bandwidth product reduces ∼0.24%/°C with temperature increased from 30 °C to 90 °C. Additionally, it maintains superior performance with low breakdown voltage of ∼10 V, high multiplication gain of >15, high bandwidth of ∼24.6 GHz, high gain-bandwidth product of >240 GHz, high internal quantum efficiency of ∼100%, and clear eye diagrams with 64 Gbps PAM4 modulation at 90 °C.