scispace - formally typeset
Search or ask a question
Topic

Memory management

About: Memory management is a research topic. Over the lifetime, 16743 publications have been published within this topic receiving 312028 citations. The topic is also known as: memory allocation.


Papers
More filters
Proceedings ArticleDOI
13 Apr 2008
TL;DR: An algorithm, called ACC, is presented in this paper for multiple pattern matching that uses a novel model, namely cached deterministic finite automate (CDFA), and a new scheme named next-state addressing (NSA) to store and access transition rules of DFA in memory.
Abstract: Pattern matching is one of the most important components for the content inspection based applications of network security, and it requires well designed algorithms and architectures to keep up with the increasing network speed. For most of the solutions, AC and its derivative algorithms are widely used. They are based on the DFA model but utilize large amount of memory because of so many transition rules. An algorithm, called ACC, is presented in this paper for multiple pattern matching. It uses a novel model, namely cached deterministic finite automate (CDFA). In ACC, by using CDFA, only 4.1% transition rules for ClamAV (20.8% for Snort) are needed to represent the same function using DFA built by AC. This paper also proposes a new scheme named next-state addressing (NSA) to store and access transition rules of DFA in memory. Using this method, transition rules can be efficiently stored and directly accessed. Finally the architecture for multiple pattern matching is optimized by several approaches. Experiments show our architecture can achieve matching speed faster than 10 Gbps with very efficient memory utilization, i.e., 81KB memory for 1.8 K Snort rules with total 29 K characters, and 9.5 MB memory for 50 K ClamAV rules with total 4.44 M characters. A single architecture is memory efficient for large pattern set, and it is possible to support more than 10 M patterns with at most half amount of the memory utilization compared to the state-of-the-art architectures.

86 citations

Journal ArticleDOI
TL;DR: Compared with a variety of Intel i7s and Nvidia GPUs, the KiloCore at 1.1 V has geometric mean improvements of 4.3 $\times$ higher throughput per area and 9.3 pJ/instruction for AES encryption, 4095-b low-density parity-check decoding, 4096-point complex fast Fourier transform, and 100-B record sorting applications.
Abstract: A processor array containing 1000 independent processors and 12 memory modules was fabricated in 32-nm partially depleted silicon on insulator CMOS. The programmable processors occupy 0.055 mm2 each, contain no algorithm-specific hardware, and operate up to an average maximum clock frequency of 1.78 GHz at 1.1 V. At 0.9 V, processors operating at an average of 1.24 GHz dissipate 17 mW while issuing one instruction per cycle. At 0.56 V, processors operating at an average of 115 MHz dissipate 0.61 mW while issuing one instruction per cycle, resulting in an energy consumption of 5.3 pJ/instruction. On-die communication is performed by complementary circuit and packet-based networks that yield a total array bisection bandwidth of 4.2 Tb/s. Independent memory modules handle data and instructions and operate up to an average maximum clock frequency of 1.77 GHz at 1.1 V. All processors, their packet routers, and the memory modules contain unconstrained clock oscillators within independent clock domains that adapt to large supply voltage noise. Compared with a variety of Intel i7s and Nvidia GPUs, the KiloCore at 1.1 V has geometric mean improvements of 4.3 $\times$ higher throughput per area and 9.4 $\times$ higher energy efficiency for AES encryption, 4095-b low-density parity-check decoding, 4096-point complex fast Fourier transform, and 100-B record sorting applications.

86 citations

Proceedings ArticleDOI
Zoltan Majo1, Thomas R. Gross1
30 May 2011
TL;DR: This paper experimentally analyzes the behavior of the memory controllers of a commercial multicore processor, the Intel Xeon 5520 (Nehalem), and develops a simple model to characterize the sharing of local and remote memory bandwidth.
Abstract: Modern multicore processors with an on-chip memory controller form the base for NUMA (non-uniform memory architecture) multiprocessors. Each processor accesses part of the physical memory directly and has access to the other parts via the memory controller of other processors. These other processors are reached via the cross-processor interconnect. As a consequence a processor's memory controller must satisfy two kinds of requests: those that are generated by the local cores and those that arrive via the interconnect from other processors. On the other hand, a core (respectively the core's cache) can obtain data from multiple sources: data can be supplied by the local memory controller or by a remote memory controller on another processor. In this paper we experimentally analyze the behavior of the memory controllers of a commercial multicore processor, the Intel Xeon 5520 (Nehalem). We develop a simple model to characterize the sharing of local and remote memory bandwidth. The uneven treatment of local and remote accesses has implications for mapping applications onto such a NUMA multicore multiprocessor. Maximizing data locality does not always minimize execution time; it may be more advantageous to allocate data on a remote processor (and then to fetch these data via the cross-processor interconnect) than to store the data of all processes in local memory (and consequently over-loading the on-chip memory controller).

86 citations

Patent
Dotan Sokolov1, Barak Rotbard1
22 Mar 2010
TL;DR: In this paper, the authors present a non-volatile memory system for data storage, which includes a host having a host memory and a memory controller that is separate from the host.
Abstract: A method for data storage includes, in a system that includes a host having a host memory and a memory controller that is separate from the host and stores data for the host in a non-volatile memory including multiple analog memory cells, storing in the host memory information items relating to respective groups of the analog memory cells of the non-volatile memory. A command that causes the memory controller to access a given group of the analog memory cells is received from the host. In response to the command, a respective information item relating to the given group of the analog memory cells is retrieved from the host memory by the memory controller, and the given group of the analog memory cells is accessed using the retrieved information item.

85 citations

Proceedings ArticleDOI
23 May 2021
TL;DR: A new protocol for constant-round interactive ZK proofs that simultaneously allows for a highly efficient prover and low communication and an improved subfield Vector Oblivious Linear Evaluation (sVOLE) protocol with malicious security that is of independent interest is presented.
Abstract: Efficient zero-knowledge (ZK) proofs for arbitrary boolean or arithmetic circuits have recently attracted much attention. Existing solutions suffer from either significant prover overhead (i.e., high memory usage) or relatively high communication complexity (at least κ bits per gate, for computational security parameter κ). In this paper, we propose a new protocol for constant-round interactive ZK proofs that simultaneously allows for an efficient prover with asymptotically optimal memory usage and significantly lower communication compared to protocols with similar memory efficiency. Specifically:•The prover in our ZK protocol has linear running time and, perhaps more importantly, memory usage linear in the memory needed to evaluate the circuit non-cryptographically. This allows our proof system to scale easily to very large circuits.•for statistical security parameter ρ = 40, our ZK protocol communicates roughly 9 bits/gate for boolean circuits and 2–4 field elements/gate for arithmetic circuits over large fields.Using 5 threads, 400 MB of memory, and a 200 Mbps network to evaluate a circuit with hundreds of billions of gates, our implementation (ρ = 40, κ = 128) runs at a rate of 0.45 μs/gate in the boolean case, and 1.6 μs/gate for an arithmetic circuit over a 61-bit field.We also present an improved subfield Vector Oblivious Linear Evaluation (sVOLE) protocol with malicious security that is of independent interest.

85 citations


Network Information
Related Topics (5)
Cache
59.1K papers, 976.6K citations
94% related
Scalability
50.9K papers, 931.6K citations
92% related
Server
79.5K papers, 1.4M citations
89% related
Virtual machine
43.9K papers, 718.3K citations
87% related
Scheduling (computing)
78.6K papers, 1.3M citations
86% related
Performance
Metrics
No. of papers in the topic in previous years
YearPapers
202333
202288
2021629
2020467
2019461
2018591