There is, I think, something ethereal about i —the square root of minus one. I remember first hearing about it at school. It seemed an odd beast at that time—an intruder hovering on the edge of reality.

Usually familiarity dulls this sense of the bizarre, but in the case of i it was the reverse: over the years the sense of its surreal nature intensified. It seemed that it was impossible to write mathematics that described the real world in …

I and i

On robust estimation of the location parameter

Aspect] is a simple and practical aspect-oriented extension to Java With just a few new constructs, AspectJ provides support for modular implementation of a range of crosscutting concerns. In AspectJ's dynamic join point model, join points are well-defined points in the execution of the program; pointcuts are collections of join points; advice are special method-like constructs that can be attached to pointcuts; and aspects are modular units of crosscutting implementation, comprising pointcuts, advice, and ordinary Java member declarations. AspectJ code is compiled into standard Java bytecode. Simple extensions to existing Java development environments make it possible to browse the crosscutting structure of aspects in the same kind of way as one browses the inheritance structure of classes. Several examples show that AspectJ is powerful, and that programs written using it are easy to understand.

An overview of AspectJ

Deep neural networks (DNNs) are currently widely used for many artificial intelligence (AI) applications including computer vision, speech recognition, and robotics. While DNNs deliver state-of-the-art accuracy on many AI tasks, it comes at the cost of high computational complexity. Accordingly, techniques that enable efficient processing of DNNs to improve energy efficiency and throughput without sacrificing application accuracy or increasing hardware cost are critical to the wide deployment of DNNs in AI systems. This article aims to provide a comprehensive tutorial and survey about the recent advances toward the goal of enabling efficient processing of DNNs. Specifically, it will provide an overview of DNNs, discuss various hardware platforms and architectures that support DNNs, and highlight key trends in reducing the computation cost of DNNs either solely via hardware design changes or via joint hardware design and DNN algorithm changes. It will also summarize various development resources that enable researchers and practitioners to quickly get started in this field, and highlight important benchmarking metrics and design considerations that should be used for evaluating the rapidly growing number of DNN hardware designs, optionally including algorithmic codesigns, being proposed in academia and industry. The reader will take away the following concepts from this article: understand the key design considerations for DNNs; be able to evaluate different DNN hardware implementations with benchmarks and comparison metrics; understand the tradeoffs between various hardware architectures and platforms; be able to evaluate the utility of various DNN design techniques for efficient processing; and understand recent implementation trends and opportunities.

Efficient Processing of Deep Neural Networks: A Tutorial and Survey

The objective of this paper is to give a comprehensive introduction to applied cryptography with an engineer or computer scientist in mind. The emphasis is on the knowledge needed to create practical systems which supports integrity, confidentiality, or authenticity. Topics covered includes an introduction to the concepts in cryptography, attacks against cryptographic systems, key use and handling, random bit generation, encryption modes, and message authentication codes. Recommendations on algorithms and further reading is given in the end of the paper. This paper should make the reader able to build, understand and evaluate system descriptions and designs based on the cryptographic components described in the paper.

[서평]「Applied Cryptography」

We have developed a highly-efficient and simple parallel hardware design for merging two sorted lists residing in banked (or multi-ported) memory. The FPGA implementation uses half the hardware resources required for implementing the current state-of-the-art architecture. This is achieved with better performance and half the latency, for the same amount of parallelism. The challenges for the merge operations in FPGAs have been the low clock frequency due to the feedback datapath of the merger being the critical path for timing, and also the high resource utilisation in recent attempts to eliminate/remove the feedback datapath. Our solution uses a modified version of the bitonic merge block, as found in a bitonic sorter, repurposed for performing parallel merge for streaming data. As with the state-of-the-art, it can be considered feedback-less since it only nests one parallel comparison for any desired level of parallelism. This leads to high operating frequency designs, 1.3 times higher than the previous best on our test platform. Since the new design uses 2 times fewer hardware resources, it allows more parallelism and leaves room for additional logic.

FLiMS: Fast Lightweight Merge Sorter

We present new scalable hardware designs of modular multiplication, modular exponentiation and primality test. These operations are at the core of most public-key crypto-systems. All the modules are based on an original Montgomery modular multiplier. Our multiplier is the first Montgomery multiplier design with variable pipeline stages and variable serial replications. It is 8 times faster than the best existing hardware implementation and 30 times faster than an optimised software implementation on an Intel Core 2 Duo running at 2.8 GHz. Our exponentiator is 2.4 times faster than an optimised software implementation. It reaches the performance of a more complex FPGA design using DSP blocks which is the fastest in the literature. Our prime tester is 2.2 times faster than the software implementation and is 85 times faster than hardware implementations of the same algorithm with only 60% area overhead.

/pdf/parametric-encryption-hardware-design-42hpaqpgtb.pdf

Parametric encryption hardware design

Exotic options are financial derivatives which have complex features including path-dependency. These complex features make them difficult to price, as only computationally intensive Monte-Carlo methods can provide accurate prices. This paper proposes an FPGA-accelerated control variate Monte-Carlo (CVMC) framework for pricing exotic options. An optimised implementation of arithmetic Asian option pricing under this framework in a Virtex-5 xc5vlx330t FPGA at 200MHz is 24 times faster than a multi-threaded software implementation on a Xeon E5420 at 2.5GHz; it is also 2.4 times faster than the Tesla C1060 GPU at 1.3 GHz.

Reconfigurable Control Variate Monte-Carlo Designs for Pricing Exotic Options

Through energy harvesting system, new energy sources are made available immediately for many advanced applications based on environmentally embedded systems. However, the harvested power, such as the solar energy, varies significantly under different ambient conditions, which in turn affects the energy conversion efficiency. In this paper, we propose an approach for designing power-adaptive computing systems to maximize the energy utilization under variable solar power supply. Using the geometric programming technique, the proposed approach can generate a customized parallel computing structure effectively. Then, based on the prediction of the solar energy in the future time slots by a multilayer perceptron neural network, a convex model-based adaptation strategy is used to modulate the power behavior of the real-time computing system. The developed power-adaptive computing system is implemented on the hardware and evaluated by a solar harvesting system simulation framework for five applications. The results show that the developed power-adaptive systems can track the variable power supply better. The harvested solar energy utilization efficiency is 2.46 times better than the conventional static designs and the rule-based adaptation approaches. Taken together, the present thorough design approach for self-powered embedded computing systems has a better utilization of ambient energy sources.

/pdf/power-adaptive-computing-system-design-for-solar-energy-4plc6oy9ph.pdf

Power-Adaptive Computing System Design for Solar-Energy-Powered Embedded Systems

Generative adversarial networks (GANs) are a class of artificial intelligence algorithms used in unsupervised machine learning, implemented by a system of two neural networks: a generative network (generator) and a discriminative network (discriminator). These two networks compete with each other to perform better at their respective tasks. The generator is typically a deconvolutional neural network and the discriminator is a convolutional neural network (CNN). Deconvolution performs a fundamentally new type of mathematical operation which differs from convolution. While the FPGA-based CNN accelerators have been widely studied in prior work, the acceleration of deconvolutional networks on FPGA is rarely explored. This paper proposes a novel parametrized deconvolutional architecture based on an FPGA-friendly method, in contrast to the transposed convolution implementation in CPUs and GPUs. Hardware design templates which map this architecture to FPGAs are provided with configurable deconvolutional layer parameters. Furthermore, a memory-efficient architecture with a new tiling method is proposed to accelerate the generator of GANs, by storing all intermediate data in on-chip memories and significantly reducing off-chip data transfers. The performance of the proposed accelerator is evaluated using a variety of GANs on a Xilinx Zynq 706 board, which shows 2.3x higher speed and 8.2x off-chip memory access reduction than an optimized Vanilla FPGA design. Compared to the respective implementations on CPUs and GPUs, the achieved improvements are in the range of 30x-92x in speed over an Intel 8-core i7-950 CPU, and 8x-108x in terms of Performance-per-Watt over an NVIDIA Titan X GPU.

/pdf/memory-efficient-architecture-for-accelerating-generative-2kn52ngifp.pdf

Wayne Luk

Papers

FLiMS: Fast Lightweight Merge Sorter

Parametric encryption hardware design

Reconfigurable Control Variate Monte-Carlo Designs for Pricing Exotic Options

Power-Adaptive Computing System Design for Solar-Energy-Powered Embedded Systems

Memory-Efficient Architecture for Accelerating Generative Networks on FPGA