Showing papers by "NEC" published in 2014


Proceedings ArticleDOI
22 Aug 2014
TL;DR: This work identifies additional steps that will be required for ONOS to support use cases such as core network traffic engineering and scheduling, and to become a usable open source, distributed network OS platform that the SDN community can build upon.
Abstract: We present our experiences to date building ONOS (Open Network Operating System), an experimental distributed SDN control platform motivated by the performance, scalability, and availability requirements of large operator networks. We describe and evaluate two ONOS prototypes. The first version implemented core features: a distributed, but logically centralized, global network view; scale-out; and fault tolerance. The second version focused on improving performance. Based on experience with these prototypes, we identify additional steps that will be required for ONOS to support use cases such as core network traffic engineering and scheduling, and to become a usable open source, distributed network OS platform that the SDN community can build upon.

1,137 citations


Journal ArticleDOI
TL;DR: This paper reviews basic experiments on the LSSE; the LSSE in insulators provides a novel and versatile pathway to thermoelectric generation in combination with the inverse spin-Hall effect.
Abstract: The spin Seebeck effect refers to the generation of spin voltage as a result of a temperature gradient in ferromagnetic or ferrimagnetic materials. When a conductor is attached to a magnet under a temperature gradient, the thermally generated spin voltage in the magnet injects a spin current into the conductor, which in turn produces electric voltage owing to the spin-orbit interaction. The spin Seebeck effect is of increasing importance in spintronics, since it enables direct generation of a spin current from heat and appears in a variety of magnets ranging from metals and semiconductors to insulators. Recent studies on the spin Seebeck effect have been conducted mainly in paramagnetic metal/ferrimagnetic insulator junction systems in the longitudinal configuration in which a spin current flowing parallel to the temperature gradient is measured. This 'longitudinal spin Seebeck effect' (LSSE) has been observed in various sample systems and exclusively established by separating the spin-current contribution from extrinsic artefacts, such as conventional thermoelectric and magnetic proximity effects. The LSSE in insulators also provides a novel and versatile pathway to thermoelectric generation in combination with the inverse spin-Hall effect. In this paper, we review basic experiments on the LSSE and discuss its potential thermoelectric applications with several demonstrations.

223 citations


Journal ArticleDOI
TL;DR: The authors demonstrate wavelength- and mode-division multiplexed transmission over a fiber re-circulating loop comprising 50 km of low-DMGD few-mode fiber and an optimized few-mode EDFA with reduced wavelength-dependent gain and mode-dependent gain.
Abstract: We demonstrate wavelength- and mode-division multiplexed transmission over a fiber re-circulating loop comprising 50-km of low-DMGD few-mode fiber, and an optimized few-mode EDFA with reduced wavelength-dependent gain and mode-dependent gain. We characterize the channel matrix in terms of its singular value spread, and investigate its long-term stability.

180 citations


Journal ArticleDOI
TL;DR: The four-phased deployments and demonstration of new networking capabilities enabled by SDN played an important role in maturing SDN and its ecosystem and the experiences and lessons learned are shared.

176 citations


Journal ArticleDOI
Yasuharu Okamoto
TL;DR: First-principles calculations were done to examine the energetics of alkali metal intercalation into graphite, based on exchange-correlation functionals that include a nonlocal correlation.
Abstract: First-principles calculations were done to examine the energetics of alkali metal intercalation into graphite. Calculations based on the exchange-correlation functionals that include a nonlocal correlation were found to give reasonable agreement with experiments concerning the crystal structure of graphite and LiC6, binding energy of graphene sheets, and Li intercalation potential. We found that K intercalation from KC8 to KC6 cannot be achieved through electrochemical reactions. We also found that the absence of the stage-I structures for Na graphite intercalation compounds such as NaC6 or NaC8 is linked to a relatively high redox potential of Na/Na+ compared to that of Li/Li+.

158 citations


Journal ArticleDOI
TL;DR: The proposed approach can be attractive for monetizing optical access/aggregation networks via on-demand support for high-speed, low latency, high quality of service (QoS) applications over legacy fiber infrastructure.
Abstract: We propose and discuss the extension of software-defined networking (SDN) and OpenFlow principles to optical access/aggregation networks for dynamic flex-grid wavelength circuit creation. The first experimental demonstration of an OpenFlow 1.0-based flex-grid λ-flow architecture for dynamic 150 Mb/s per-cell 4G Orthogonal Frequency Division Multiple Access (OFDMA) mobile backhaul (MBH) overlays onto 10 Gb/s passive optical networks (PON) without optical network unit (ONU)-side optical filtering, amplification, or coherent detection, over 20 km standard single mode fiber (SSMF) with a 1:64 passive split is also detailed. The proposed approach can be attractive for monetizing optical access/aggregation networks via on-demand support for high-speed, low latency, high quality of service (QoS) applications over legacy fiber infrastructure.

97 citations


Book ChapterDOI
Kazuhiko Minematsu
11 May 2014
TL;DR: The key idea of the proposal is a novel usage of two-round Feistel permutation, where the round functions are derived from the theory of tweakable blockcipher, which attains similar characteristics as the seminal OCB mode, without using the inverse block cipher.
Abstract: This paper proposes a new scheme for authenticated encryption (AE) which is typically realized as a blockcipher mode of operation. The proposed scheme has attractive features for fast and compact operation. When it is realized with a blockcipher, it requires one blockcipher call to process one input block (i.e. rate-1), and uses the encryption function of the blockcipher for both encryption and decryption. Moreover, the scheme enables one-pass, parallel operation under two-block partition. The proposed scheme thus attains similar characteristics as the seminal OCB mode, without using the inverse blockcipher. The key idea of our proposal is a novel usage of two-round Feistel permutation, where the round functions are derived from the theory of tweakable blockcipher. We also provide basic software results, and describe some ideas on using a non-invertible primitive, such as a keyed hash function.

87 citations
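
To illustrate why a two-round Feistel permutation, as mentioned in the entry above, can be inverted without an inverse blockcipher, here is a minimal sketch. The round function `toy_round` (built from SHA-256), the tweak labels `r1`/`r2`, and the 16-byte half-block size are assumptions made for this example; this is not the scheme defined in the paper and it is not meant to be secure.

```python
import hashlib

BLOCK = 16  # bytes per half-block in this toy example (an assumption)


def toy_round(key: bytes, tweak: bytes, data: bytes) -> bytes:
    """Hypothetical keyed round function, standing in for a tweakable blockcipher call."""
    return hashlib.sha256(key + tweak + data).digest()[:BLOCK]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def feistel2_encrypt(key: bytes, left: bytes, right: bytes):
    # Round 1 masks the right half with F(left);
    # round 2 masks the left half with F(first output half).
    c_left = xor(toy_round(key, b"r1", left), right)
    c_right = xor(toy_round(key, b"r2", c_left), left)
    return c_left, c_right


def feistel2_decrypt(key: bytes, c_left: bytes, c_right: bytes):
    # Only the forward direction of toy_round is used here as well.
    left = xor(toy_round(key, b"r2", c_left), c_right)
    right = xor(toy_round(key, b"r1", left), c_left)
    return left, right


key = b"k" * 16
plain = (b"A" * BLOCK, b"B" * BLOCK)
cipher = feistel2_encrypt(key, *plain)
recovered = feistel2_decrypt(key, *cipher)
```

Decryption re-evaluates the same round function in the opposite order, so the underlying primitive never has to be inverted.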


Journal ArticleDOI
TL;DR: This work realizes a PPLO with Josephson-junction circuitry and operates it as a sensitive phase detector, demonstrating the demodulation of a weak binary phase-shift keying microwave signal of the order of a femtowatt and applying it to dispersive readout of a superconducting qubit.
Abstract: The parametric phase-locked oscillator (PPLO) is a class of frequency-conversion device, originally based on a nonlinear element such as a ferrite ring, that served as a fundamental logic element for digital computers more than 50 years ago. Although it has long since been overtaken by the transistor, there have been numerous efforts more recently to realize PPLOs in different physical systems such as optical photons, trapped atoms, and electromechanical resonators. This renewed interest is based not only on the fundamental physics of nonlinear systems, but also on the realization of new, high-performance computing devices with unprecedented capabilities. Here we realize a PPLO with Josephson-junction circuitry and operate it as a sensitive phase detector. Using a PPLO, we demonstrate the demodulation of a weak binary phase-shift keying microwave signal of the order of a femtowatt. We apply PPLO to dispersive readout of a superconducting qubit, and achieved high-fidelity, single-shot and non-destructive readout with Rabi-oscillation contrast exceeding 90%.

76 citations


Patent
Nobuyuki Yamashita, Kaoru Uchida
05 Sep 2014
TL;DR: A customer behavior analysis system includes an image information acquisition unit that acquires input image information on an image taken of a presentation area where a product is presented to a customer, an action detection unit that detects whether the customer is holding the product and looking at an identification display of the product based on the input image information, and a customer behavior analysis information generation unit that generates customer behavior analysis information containing a relationship between a result of the detection and a purchase result of the product by the customer.
Abstract: A customer behavior analysis system (10) includes an image information acquisition unit (11) that acquires input image information on an image taken of a presentation area where a product is presented to a customer, an action detection unit (12) that detects whether the customer is holding the product and looking at an identification display of the product based on the input image information, and a customer behavior analysis information generation unit (13) that generates customer behavior analysis information containing a relationship between a result of the detection and a purchase result of the product by the customer. This enables more detailed analysis of customer behavior.

71 citations


Journal ArticleDOI
TL;DR: In this paper, the temporal evolution of the longitudinal spin Seebeck effect in a YIG$|$Pt bilayer system was studied and it was shown that the temporal behavior of this effect depends on the time development of the temperature gradient in the magnetic material close to the interface.
Abstract: We present the temporal evolution of the longitudinal spin Seebeck effect in a YIG$|$Pt bilayer system. Our findings reveal that this effect is a submicrosecond fast phenomenon governed by the thermal-magnon diffusion along the thermal gradient inside the magnetic material. A comparison of experimental results with the thermal-driven magnon-diffusion model demonstrates that the temporal behavior of this effect depends on the time development of the temperature gradient in the magnetic material close to the interface. The effective thermal-magnon diffusion length for the YIG$|$Pt system is estimated to be around 500 nm.

69 citations


Journal ArticleDOI
TL;DR: In this article, the authors report the results of two field trials aimed at achieving high fiber capacity over regional and longhaul distances, achieving the highest field trial capacity to date at 54.2 Tb/s in regional distances.
Abstract: We report the results of two field trials aimed at achieving high fiber capacity over regional and long-haul distances. In the first trial, 41 superchannels with digital Nyquist pulse-shaping were generated and tightly packed to fill up both C-band and L-band. Each subcarrier was modulated with 24.8-Gbaud dual-polarization 16 quadrature amplitude modulation (DP-16QAM) data. The signal carrying net 54.2 Tb/s data was transmitted over 634 km of dispersion uncompensated field-installed standard single mode fiber with the aid of hybrid EDFA and Raman amplification and digital coherent detection. In the second trial for long-haul distances, we extended the transmission distance over 1,822 km. This increase in reach was achieved by reducing the net total capacity to 40.5 Tb/s and modulating the signals with dual-polarization 8 quadrature amplitude modulation (DP-8QAM) Nyquist carrier modulation. A novel rate-adaptive low-density parity-check coding was employed, so that the transmitted channels can exhibit different code rates, adapted by the concatenation of hard-decision and soft-decision forward error correcting codes for enhancing error-correction capability. To the best of our knowledge, we achieved the highest field trial capacity to date at 54.2 Tb/s in regional distances. Furthermore, in long-haul applications, the reported capacity × distance product of 73.79 Pb/s·km is the highest to date.
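
The capacity × distance figure quoted in the abstract above can be checked from the stated net capacities and distances. The short sketch below does that arithmetic; the function name is chosen for this example only.

```python
def capacity_distance_product_pbps_km(capacity_tbps: float, distance_km: float) -> float:
    """Return capacity x distance in Pb/s·km (1 Pb/s = 1000 Tb/s)."""
    return capacity_tbps * distance_km / 1000.0


# Values taken from the abstract: 40.5 Tb/s over 1,822 km and 54.2 Tb/s over 634 km.
long_haul = capacity_distance_product_pbps_km(40.5, 1822)
regional = capacity_distance_product_pbps_km(54.2, 634)

print(f"long-haul: {long_haul:.2f} Pb/s·km")
print(f"regional:  {regional:.2f} Pb/s·km")
```

The long-haul trial gives 40.5 × 1,822 / 1,000 ≈ 73.79 Pb/s·km, matching the reported value, while the regional trial has the higher capacity but the smaller product.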

Proceedings ArticleDOI
10 Jun 2014
TL;DR: This paper presents MT-MPI, an internally multithreaded MPI implementation that transparently coordinates with the threading runtime system to share idle threads with the application; it requires modifications to both the MPI implementation and the OpenMP runtime in order to share appropriate information between them.
Abstract: Many-core architectures, such as the Intel Xeon Phi, provide dozens of cores and hundreds of hardware threads. To utilize such architectures, application programmers are increasingly looking at hybrid programming models, where multiple threads interact with the MPI library (frequently called "MPI+X" models). A common mode of operation for such applications uses multiple threads to parallelize the computation, while one of the threads also issues MPI operations (i.e., MPI FUNNELED or SERIALIZED thread-safety mode). In MPI+OpenMP applications, this is achieved, for example, by placing MPI calls in OpenMP critical sections or outside the OpenMP parallel regions. However, such a model often means that the OpenMP threads are active only during the parallel computation phase and idle during the MPI calls, resulting in wasted computational resources. In this paper, we present MT-MPI, an internally multithreaded MPI implementation that transparently coordinates with the threading runtime system to share idle threads with the application. It is designed in the context of OpenMP and requires modifications to both the MPI implementation and the OpenMP runtime in order to share appropriate information between them. We demonstrate the benefit of such internal parallelism for various aspects of MPI processing, including derived datatype communication, shared-memory communication, and network I/O operations.

Patent
Toshiyuki Nomura, Kota Iwamoto, Kyota Higa, Keishi Ohashi, Wataru Hattori
17 Feb 2014
TL;DR: An information processing apparatus is presented that effectively counts, on a type basis, articles of a plurality of types displayed in a depth direction on a display shelf.
Abstract: An apparatus of this invention is directed to an information processing apparatus that effectively counts, on a type basis, articles of a plurality of types displayed in a depth direction on a display shelf. The information processing apparatus includes a display count acquirer that acquires display count information of articles using article presence/absence sensors provided on the display shelf on which the articles are placed, an article identifier that acquires article identification information capable of identifying the types of articles based on an image acquired by capturing the display shelf, and a display recognizer that recognizes, based on the display count information and the article identification information, display count of each type of the articles.

Book ChapterDOI
07 Dec 2014
TL;DR: In this paper, the authors studied the problem of semantically hiding plaintext information in order-preserving encryption (OPE) and showed that some plaintext bits can be semantically hidden by OPE encryptions.
Abstract: Semantic-security of individual plaintext bits given the corresponding ciphertext is a fundamental notion in modern cryptography. We initiate the study of this basic problem for Order-Preserving Encryption (OPE), asking “what plaintext information can be semantically hidden by OPE encryptions?” OPE has gained much attention in recent years due to its usefulness for secure databases, and has received a thorough formal treatment with innovative and useful security notions. However, all previous notions are one-way based, and tell us nothing about partial-plaintext indistinguishability (semantic security).
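
To make the order-preserving property concrete, here is a toy sketch of a keyed, strictly increasing mapping from plaintexts to ciphertexts. The construction (a running sum of key-derived positive gaps) and the names `gap` and `build_toy_ope` are assumptions for illustration only; it is not the scheme studied in the paper and offers no real security.

```python
import hashlib


def gap(key: bytes, i: int) -> int:
    """Pseudo-random positive gap in [1, 256] derived from the key and the index."""
    return hashlib.sha256(key + i.to_bytes(4, "big")).digest()[0] + 1


def build_toy_ope(key: bytes, domain_size: int) -> list:
    """Return a strictly increasing table: table[m] is the ciphertext of plaintext m."""
    table, total = [], 0
    for m in range(domain_size):
        total += gap(key, m)  # every gap is at least 1, so the table strictly increases
        table.append(total)
    return table


table = build_toy_ope(b"demo-key", 16)

# Comparing ciphertexts gives the same answer as comparing plaintexts.
same_order = (table[3] < table[9]) == (3 < 9)
```

Because anyone holding two ciphertexts can compare them, the order of the plaintexts is always revealed; the question the abstract raises is what information can still be hidden given that unavoidable leakage.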

Patent
Hisashi Futaki
04 Sep 2014
TL;DR: In this paper, the authors propose an improved technique for allowing the M2M terminal that is supporting a special coverage enhancement processing for MTCs to camp on an appropriate cell.
Abstract: A machine-to-machine (M2M) terminal (11) comprises a radio communication unit (111) and a controller (112). The radio communication unit (111) is configured to communicate with a base station (13). The controller (112) is configured to change at least one of a cell selection operation, a cell reselection operation, and a handover operation according to whether a specific coverage enhancement processing is required or according to whether the specific coverage enhancement processing is supported by at least one of a cell (13) on which the M2M terminal (11) camps and a neighbouring cell (14) of the cell (13) on which the M2M terminal (11) camps. It is thus possible to provide an improved technique for allowing the M2M terminal that is supporting a special coverage enhancement processing for M2M terminals to camp on an appropriate cell.

Book ChapterDOI
03 Mar 2014
TL;DR: In this paper, the authors define and analyze the security of a blockcipher mode of operation for provably secure authenticated encryption with associated data, and prove it secure in a reduction-based provable security paradigm, under the assumption that the block cipher is a pseudorandom permutation.
Abstract: We define and analyze the security of a blockcipher mode of operation, \(\mathrm {CLOC}\), for provably secure authenticated encryption with associated data. The design of \(\mathrm {CLOC}\) aims at optimizing previous schemes, CCM, EAX, and EAX-prime, in terms of the implementation overhead beyond the blockcipher, the precomputation complexity, and the memory requirement. With these features, \(\mathrm {CLOC}\) is suitable for handling short input data, say 16 bytes, without needing precomputation nor large memory. This property is especially beneficial to small microprocessors, where the word size is typically 8 bits or 16 bits, and there are significant restrictions in the size and the number of registers. \(\mathrm {CLOC}\) uses a variant of CFB mode in its encryption part and a variant of CBC MAC in the authentication part. We introduce various design techniques in order to achieve the above mentioned design goals. We prove \(\mathrm {CLOC}\) secure, in a reduction-based provable security paradigm, under the assumption that the blockcipher is a pseudorandom permutation. We also present our preliminary implementation results.

Journal ArticleDOI
TL;DR: In this paper, the authors investigated properties of Co/Pt multilayer for reference layer in CoFeB-MgO magnetic tunnel junctions with perpendicular easy axis.
Abstract: We investigated properties of a Co/Pt multilayer for the reference layer in CoFeB–MgO magnetic tunnel junctions with perpendicular easy axis. A sufficient thermal stability factor of 284 was obtained under zero applied field in a 40-nm-diameter Co/Pt multilayer based reference layer annealed at 350 °C. By applying a synthetic ferrimagnetic (SyF) structure to the Co/Pt multilayer based reference layer, the shift of the center of minor resistance-magnetic field curves was suppressed, leading to higher thermal stability of the antiparallel magnetization configuration than that without a SyF structure.

Journal ArticleDOI
TL;DR: A survey on OpenFlow related technologies that have been proposed as a means for researchers, network service creators, and others to easily design, test, and deploy their innovative ideas in experimental or production networks to accelerate research activities on network technologies.
Abstract: The paper presents a survey on OpenFlow related technologies that have been proposed as a means for researchers, network service creators, and others to easily design, test, and deploy their innovative ideas in experimental or production networks to accelerate research activities on network technologies. Rather than having programmability within each network node, separated OpenFlow controllers provide network control through pluggable software modules; thus, it is easy to develop new network control functions in executable form and test them in production networks. The emergence of OpenFlow has started various research activities. The paper surveys these activities and their results.
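
The separation described in the abstract above, where control logic lives in a controller and the network node only applies installed rules, can be sketched as a small match-action table. The class names `FlowRule` and `FlowTable`, the field names, and the action strings are hypothetical and chosen for this example; they do not reproduce the OpenFlow message formats.

```python
from dataclasses import dataclass, field


@dataclass
class FlowRule:
    match: dict        # header fields that must equal the packet's values
    action: str        # illustrative labels such as "output:2" or "drop"
    priority: int = 0


@dataclass
class FlowTable:
    rules: list = field(default_factory=list)

    def install(self, rule: FlowRule) -> None:
        """Called by the separate controller logic to program the switch."""
        self.rules.append(rule)
        self.rules.sort(key=lambda r: r.priority, reverse=True)

    def lookup(self, packet: dict) -> str:
        """Return the action of the highest-priority rule that matches the packet."""
        for rule in self.rules:
            if all(packet.get(k) == v for k, v in rule.match.items()):
                return rule.action
        return "send_to_controller"  # table miss: defer the decision to the controller


table = FlowTable()
table.install(FlowRule({"ip_dst": "10.0.0.2"}, "output:2", priority=10))
table.install(FlowRule({}, "drop", priority=0))  # empty match acts as a catch-all
```

New behaviour is added by installing different rules from the controller side, which is the kind of pluggable control function the survey refers to; the forwarding loop in `lookup` stays unchanged.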

Patent
09 Apr 2014
TL;DR: A wireless communication system (100) is disclosed in which UEs (120a - 120c) can operate in a cellular mode, where data is transmitted from one UE to another via one or more access nodes in a cellular network, and at least some UEs are D2D-UEs which can also operate in a direct communication mode.
Abstract: A wireless communication system (100) is disclosed in which UEs (120a - 120c) can operate in a cellular mode wherein data is transmitted from one UE to another via one or more access nodes in a cellular network, and at least some UEs (120c) are D2D-UEs which can also operate in a direct communication mode wherein a pair of D2D-UEs transmit data directly from one to the other, wherein D2D-UEs operating in the direct communication mode maintain control signaling connection with the network and periodically change to the cellular mode and send CSI to the network, and the network uses CSI and/or network available geographical location information for D2D-UEs in determining whether to cause D2D-UEs to operate in the direct communication mode or the cellular mode.

Proceedings ArticleDOI
01 Dec 2014
TL;DR: An extensible programming framework to separate platform-specific optimizations from application codes, and to incrementally improve the performance of an existing application without messing up the code is proposed.
Abstract: This paper proposes an extensible programming framework to separate platform-specific optimizations from application codes. The framework allows programmers to define their own code translation rules for special demands of individual systems, compilers, libraries, and applications. Code translation rules associated with user-defined compiler directives are defined in an external file, and the application code is just annotated by the directives. For code transformations based on the rules, the framework exposes the abstract syntax tree (AST) of an application code as an XML document to expert programmers. Hence, the XML document of an AST can be transformed using any XML-based technologies. Our case studies using real applications demonstrate that the framework is effective to separate platform-specific optimizations from application codes, and to incrementally improve the performance of an existing application without messing up the code.

Journal ArticleDOI
TL;DR: With the authors' parameter settings, VMM rejuvenation with prs job interruption improves the performance of job execution regardless of the aging type, with performance degradation taken into account.
Abstract: This article analyzes the completion time of a job running on a virtualized server subject to software aging and rejuvenation in a virtual machine monitor (VMM). A job running on the server may be interrupted by virtual machine (VM) failure, VMM failure or VMM rejuvenation. The job interruption is categorized as either preemptive-repeat (prt), in which case the interrupted job needs to restart from the beginning, or preemptive-resume (prs), in which case the job resumes execution from the point of interruption. Using a semi-Markov process (SMP) to model the server behavior, the steady-state server availability is computed and the theory developed in Kulkarni et al. [1987] is used to obtain the Laplace-Stieltjes transform (LST) of the job completion time. In the numerical experiments, we introduce four types of aging behavior of VMM. The effectiveness of VMM rejuvenation on job completion time is discussed in association with the type of interruption it causes and the VMM aging type. With our parameter settings, VMM rejuvenation with prs job interruption improves the performance of job execution regardless of the aging type, with performance degradation taken into account.
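
The difference between preemptive-resume (prs) and preemptive-repeat (prt) interruptions can be illustrated with a small Monte Carlo sketch. This swaps simulation in for the analytical SMP/LST approach used in the article, and it assumes a single interruption type arriving as a Poisson process while the job runs, followed by a fixed downtime; all parameter values are made up for the example.

```python
import random


def completion_time(work, rate, downtime, resume, rng):
    """Simulate one job; interruptions arrive at the given rate while the job is running."""
    elapsed, remaining = 0.0, work
    while True:
        until_interrupt = rng.expovariate(rate)
        if until_interrupt >= remaining:
            return elapsed + remaining      # job finishes before the next interruption
        elapsed += until_interrupt + downtime
        if resume:
            remaining -= until_interrupt    # prs: keep the work done so far
        else:
            remaining = work                # prt: restart from the beginning


def mean_completion(work, rate, downtime, resume, runs=20000, seed=1):
    rng = random.Random(seed)
    total = sum(completion_time(work, rate, downtime, resume, rng) for _ in range(runs))
    return total / runs


prs = mean_completion(work=10.0, rate=0.1, downtime=1.0, resume=True)
prt = mean_completion(work=10.0, rate=0.1, downtime=1.0, resume=False)
print(f"mean completion time  prs: {prs:.2f}  prt: {prt:.2f}")
```

Under these assumed settings the prt job takes noticeably longer on average, because every interruption discards the work already done.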

Journal ArticleDOI
Ryota Yuge, Noriyuki Tamura, Takashi Manako, Kaichiro Nakano, Kentaro Nakahara
TL;DR: In this article, a mixture of graphite, vapor grown carbon fibers (VGCFs), and carbon nanohorns (CNHs) was heat-treated in Ar atmosphere and carbon-coated by using a chemical vapor deposition (CVD) method.

Patent
22 Aug 2014
TL;DR: In this paper, the authors propose a reconfigurable and flexible rate shared rate multi-transponder network architecture, where the transponders are configured to map one or more signals to multiple parallel Virtual Ethernet Links and insert blocks of idle characters to enable transmission over a lower rate transmission pipe.
Abstract: Systems and methods for data transport, including receiving one or more signals into a reconfigurable and flexible rate shared rate multi-transponder network architecture, wherein the network architecture includes one or more transponders with multiple line side interfaces and one or more client side interfaces. The transponders are configured to map one or more signals to multiple parallel Virtual Ethernet Links, remove idle characters from the one or more signals, buffer one or more blocks of characters using an intermediate block buffer, activate and deactivate one or more portions of input/output electrical lanes of an Ethernet module, multiplex and demultiplex the one or more signals to and from the input/output electrical lanes to enable sharing of a single optical transceiver by multiple independent signals, and insert blocks of idle characters to enable transmission over a lower rate transmission pipe.

Proceedings ArticleDOI
24 Nov 2014
TL;DR: Si-photonic hybrid ring-filter external cavity (SHREC) wavelength tunable lasers by passive alignment techniques with over 100-mW fiber-coupled power and linewidth narrower than 15 kHz along the whole C-band are demonstrated in this paper.
Abstract: Si-photonic Hybrid Ring-filter External Cavity (SHREC) wavelength tunable lasers by passive alignment techniques with over 100-mW fiber-coupled power and linewidth narrower than 15 kHz along the whole C-band are demonstrated. Obtained results show excellent features of Si-photonics towards commercial products.

Journal ArticleDOI
TL;DR: This paper presents algorithm, architecture, and fabrication results of a nonvolatile context-driven search engine that reduces energy consumption as well as computational delay compared to classical hardware and software-based approaches.
Abstract: This paper presents algorithm, architecture, and fabrication results of a nonvolatile context-driven search engine that reduces energy consumption as well as computational delay compared to classical hardware and software-based approaches. The proposed architecture stores only associations between items from multiple search fields in the form of binary links, and merges repeated field items to reduce the memory requirements and accesses. The fabricated chip achieves $13.6\times$ memory reduction and 89% energy saving compared to a classical field-based approach in hardware, based on content-addressable memory (CAM). Furthermore, it achieves $8.6\times$ reduced number of clock cycles in performing search operations compared to the CAM, and five orders of magnitude reduced number of clock cycles compared to a fabricated and measured ultra low-power CPU-based counterpart running a classical search algorithm in software. The energy consumption of the proposed architecture is on average three orders of magnitude smaller than that of a software-based approach. A magnetic tunnel junction (MTJ)-based logic-in-memory architecture is presented that allows simple routing and eliminates leakage current in standby using 90 nm CMOS/MTJ-hybrid technologies.

Patent
Phong Thanh Nguyen, Yuanrong Lan
12 Sep 2014
TL;DR: A signalling method is disclosed for an advanced wireless communication network that supports FDD-TDD carrier aggregation (CA), in which the UE is configured, by establishing a radio resource control (RRC) connection with the network through the first access node, for data transmission on the first duplex mode carrier as a primary component carrier (PCell).
Abstract: A signalling method is disclosed for use in an advanced wireless communication network that supports FDD-TDD carrier aggregation (CA). The signalling method comprises configuring the UE (by establishing radio resource control (RRC) connection with the network through the first access node) for data transmission between the UE and the network through the first access node on the first duplex mode carrier as a primary component carrier (PCell), configuring the UE (via dedicated RRC signalling on the PCell) for data transmission between the UE and the network through the second access node on the second duplex mode carrier as a secondary component carrier (SCell), and performing scheduling for data transmission on the aggregated SCell using either self-scheduling or cross-carrier scheduling.


Proceedings Article
02 Apr 2014
TL;DR: This paper addresses simultaneous model selection issues of PLMs, namely partition structure determination and feature selection of individual experts, and extends factorized asymptotic Bayesian inference for hierarchical mixture.
Abstract: Piecewise linear models (PLMs) have been widely used in many enterprise machine learning problems, which assign linear experts to individual partitions on feature spaces and express whole models as patches of local experts. This paper addresses simultaneous model selection issues of PLMs: partition structure determination and feature selection of individual experts. Our contributions are mainly three-fold. First, we extend factorized asymptotic Bayesian (FAB) inference for hierarchical mixture

Patent
Phong Thanh Nguyen, Yuanrong Lan
29 May 2014
TL;DR: Methods are provided for use in control signaling in advanced wireless communication systems that support flexible allocation of TDD UL-DL configurations.
Abstract: Methods are provided for use in control signaling in advanced wireless communication systems that support flexible allocation of TDD UL-DL configurations. Where HARQ-ACK bundling is used, PDCCH/EPDCCH transmissions indicating DL SPS release and PDSCH transmissions with corresponding PDCCH/EPDCCH are scheduled only on DL and/or special subframes in a DL association set which are not after the subframe in the DL association set carrying an UL grant. Where HARQ-ACK multiplexing is used, the value of the DL assignment index (\(V_{\mathrm{DAI}}^{\mathrm{UL}}\)) is set to the number of subframes in the DL association set. Cross-subframe scheduling is also used.

Proceedings ArticleDOI
20 Oct 2014
TL;DR: This approach augments the software memcached running on the host CPU by caching its data and some operations at the FPGA-equipped network interface card (NIC) mounted on the server, and estimates that the latency improved by an order of magnitude over software memcached running on a high performance CPU.
Abstract: Memcached is a technology that improves response speed of web servers by caching data on DRAMs in distributed servers. In order to achieve higher performance, memcached has been evaluated on various platforms. Among them, FPGA seems to be the most efficient platform to run memcached, and several research groups are trying to achieve higher throughput with it. However, it is difficult to utilize a large amount of memory (several dozen gigabytes) with an FPGA. Some groups are trying to solve this problem by using an embedded CPU for memory allocation and another group is employing an SSD. Unlike other approaches that try to replace memcached itself on FPGAs, our approach augments the software memcached running on the host CPU by caching its data and some operations at the FPGA-equipped network interface card (NIC) mounted on the server. The locality of memcached data enables the FPGA NIC to have a fairly high hit rate with a smaller memory. We first explore the cache parameters by software simulations and estimate the effectiveness of our approach, and then prototype a system to prove its effectiveness. Through our evaluation with YCSB, a standard key-value store (KVS) benchmarking tool, we estimate that the latency improved by an order of magnitude over software memcached running on a high performance CPU.
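
The claim that locality lets a small cache achieve a fairly high hit rate can be explored with a software simulation along the lines the abstract mentions. The sketch below is an assumption-laden illustration: the LRU replacement policy, the Zipf-like key popularity, and all sizes are chosen for this example and are not taken from the paper.

```python
import random
from collections import OrderedDict


def lru_hit_rate(requests, capacity):
    """Replay a key sequence through an LRU cache and return the fraction of hits."""
    cache, hits = OrderedDict(), 0
    for key in requests:
        if key in cache:
            hits += 1
            cache.move_to_end(key)          # mark as most recently used
        else:
            cache[key] = True
            if len(cache) > capacity:
                cache.popitem(last=False)   # evict the least recently used key
    return hits / len(requests)


rng = random.Random(0)
num_keys, num_requests, capacity = 10000, 50000, 500

# Skewed workload: key popularity falls off as 1/rank (an assumed Zipf-like distribution).
keys = list(range(num_keys))
weights = [1.0 / (rank + 1) for rank in keys]
skewed_requests = rng.choices(keys, weights=weights, k=num_requests)

# Baseline without locality: every key is equally likely.
uniform_requests = [rng.randrange(num_keys) for _ in range(num_requests)]

skewed = lru_hit_rate(skewed_requests, capacity)
uniform = lru_hit_rate(uniform_requests, capacity)
print(f"hit rate  skewed: {skewed:.2f}  uniform: {uniform:.2f}")
```

With a cache holding only 5% of the keys, the skewed workload hits far more often than the uniform one, which is the effect a small NIC-side cache would rely on.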