•Journal•ISSN: 2185-2839

International journal of networking and computing

IJNC Editorial Committee

About: International journal of networking and computing (also known as IJNC) is an open-access academic journal published by the IJNC Editorial Committee. The journal publishes mainly in the areas of computer science and shared memory. Its ISSN is 2185-2839. Over its lifetime, it has published 223 papers, which have received 1021 citations.

Topics: Computer science, Shared memory, CUDA, Parallel algorithm, Encryption

Journal Article•DOI•

Implementations of a Parallel Algorithm for Computing Euclidean Distance Map in Multicore Processors and GPUs

Duhu Man¹, Kenji Uda¹, Hironobu Ueyama¹, Yasuaki Ito¹, Koji Nakano¹•Institutions (1)

Hiroshima University¹

01 Jul 2011-International journal of networking and computing

TL;DR: A simple parallel algorithm for the EDM is developed and implemented and it achieves a speedup factor of 18 over the performance of a sequential algorithm using a single processor in the same system.

Abstract: Given a 2-D binary image of size n×n, the Euclidean Distance Map (EDM) is a 2-D array of the same size in which each element stores the Euclidean distance to the nearest black pixel. It is known that a sequential algorithm can compute the EDM in O(n²) time, and thus this algorithm is optimal. Work-time optimal parallel algorithms for the shared memory model have also been presented. However, these parallel algorithms are too complicated to implement on existing shared memory parallel machines. The main contribution of this paper is to develop a simple parallel algorithm for the EDM and implement it on two different parallel platforms: multicore processors and Graphics Processing Units (GPUs). We have implemented our parallel algorithm on a Linux server with four Intel six-core processors (Intel Xeon X7460 2.66GHz). We have also implemented it on two modern GPU systems, Tesla C1060 and GTX 480. The experimental results show that, for an input binary image of size 9216×9216, our implementation on the multicore system achieves a speedup factor of 18 over a sequential algorithm using a single processor in the same system. Meanwhile, for the same input binary image, our implementation on the GPU achieves a speedup factor of 26 over the sequential algorithm implementation.
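
To make the EDM definition in the abstract above concrete, here is a minimal brute-force sketch in Python. It is not the paper's parallel algorithm, and it is far from the O(n²) sequential bound; the function name `euclidean_distance_map` and the convention that `1` marks a black pixel are assumptions made for illustration only.

```python
import math

def euclidean_distance_map(image):
    """Brute-force EDM: for every pixel, scan all black pixels.

    image: list of equal-length rows; 1 = black pixel, 0 = white (assumed encoding).
    Returns a list of rows of floats with the same shape as the input.
    """
    n_rows = len(image)
    n_cols = len(image[0]) if n_rows else 0
    black = [(r, c) for r in range(n_rows) for c in range(n_cols) if image[r][c] == 1]
    if not black:
        # No black pixel exists, so every distance is undefined; use infinity.
        return [[math.inf] * n_cols for _ in range(n_rows)]
    edm = []
    for r in range(n_rows):
        # Each output element depends only on the input image, so the
        # rows (and the pixels within a row) could be computed independently.
        edm.append([min(math.hypot(r - br, c - bc) for br, bc in black)
                    for c in range(n_cols)])
    return edm

image = [
    [0, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
]
edm = euclidean_distance_map(image)
```

Because every output element is independent of the others, even this naive version shows why the problem parallelizes naturally across pixels; the difficulty addressed by the paper is doing so without the extra work of the brute-force scan.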

72 citations

Journal Article•DOI•

Acceleration of AES encryption on CUDA GPU

Keisuke Iwai, Naoki Nishikawa, Takakazu Kurokawa

06 Jan 2012-International journal of networking and computing

TL;DR: Results of several experiments showed that the decision of granularity and memory allocation is the most important factor for effective processing in AES encryption on GPU.

Abstract: GPUs offer high capability for applications with a high level of parallelism despite their low cost. The support of integer and logical instructions by the latest generation of GPUs enables us to implement cipher algorithms more easily. However, decisions such as parallel processing granularity and memory allocation impose a heavy burden on programmers. Therefore, this paper presents results of several experiments that were conducted to elucidate the relation between the memory allocation styles of AES variables and the granularity of the parallelism exploited in AES encryption, using CUDA with an NVIDIA GeForce GTX285 (Nvidia Corp.). Results of these experiments showed that the 16 bytes/thread granularity had the highest performance. It achieved approximately 35 Gbps throughput. Differences in memory allocation and granularity affected performance by around 2%–30% in the standard implementation. This shows that the decision of granularity and memory allocation is the most important factor for effective processing in AES encryption on GPU. Moreover, an implementation that overlaps processing with data transfer yielded 22.5 Gbps throughput including the data transfer time.
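
The notion of "granularity" in the abstract above can be illustrated with a small index-mapping sketch. It only shows how a bytes-per-thread choice determines the number of threads and which bytes each thread owns; it performs no encryption and is not the authors' CUDA code. The helper names (`thread_count`, `byte_range`, `aes_block_index`) are hypothetical.

```python
AES_BLOCK_BYTES = 16  # AES operates on 128-bit (16-byte) blocks

def thread_count(total_bytes, bytes_per_thread):
    """Number of threads needed when each thread handles bytes_per_thread bytes."""
    if bytes_per_thread <= 0 or total_bytes % bytes_per_thread != 0:
        raise ValueError("total_bytes must be a positive multiple of bytes_per_thread")
    return total_bytes // bytes_per_thread

def byte_range(thread_id, bytes_per_thread):
    """Half-open byte range [start, end) of the plaintext owned by one thread."""
    start = thread_id * bytes_per_thread
    return start, start + bytes_per_thread

def aes_block_index(thread_id, bytes_per_thread):
    """Index of the AES block that contains the first byte owned by this thread."""
    return (thread_id * bytes_per_thread) // AES_BLOCK_BYTES

# With 16 bytes/thread, one thread owns exactly one AES block.
coarse = thread_count(1024, 16)
# With 4 bytes/thread, four threads must cooperate on each AES block.
fine = thread_count(1024, 4)
```

At 16 bytes/thread each thread encrypts one whole block on its own, so no data needs to be exchanged between threads within a block; finer granularities expose more threads but require cooperation among the threads sharing a block.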

57 citations

Journal Article•DOI•

Implementation and Evaluation of FPGA-based Annealing Processor for Ising Model by use of Resource Sharing

Chihiro Yoshimura¹, Masato Hayashi¹, Takuya Okuyama¹, Masanao Yamaoka¹•Institutions (1)

Hitachi¹

03 Jul 2017-International journal of networking and computing

TL;DR: An FPGA-based prototyping environment is described to develop the annealing processor's architecture for the Ising model and it supports a highly complex topology.

Abstract: The non-von Neumann computer architecture has been widely studied to prepare us for the post-Moore era. The authors implemented this kind of architecture, which finds the lower energy state of the Ising model using circuit operations inspired by simulated annealing in SRAM-based integrated circuits. Our previous prototype was suited for the Ising model because of its simple and typical structure such as its three-dimensional lattice topology, but it could not be used in real world applications. A reconfigurable prototyping environment is needed to develop the architecture and to make it suitable for applications. Here, we describe an FPGA-based prototyping environment to develop the annealing processor's architecture for the Ising model. We implemented the new architecture using a prototyping environment. The new architecture performs approximated simulated annealing for the Ising model, and it supports a highly complex topology. It consists of units having fully-connected multiple spins. Multiple units are placed in a two-dimensional lattice topology, and neighboring units are connected to perform interactions between spins. The number of logic elements was reduced by sharing the operator among multiple spins within the unit. Furthermore, a pseudo-random number generator, which produces random pulse sequences for annealing, is also shared among all the units. As a result, the number of logic elements was reduced to less than 1/10, and the solution accuracy became comparable to that of a conventional computer's simulated annealing.
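
For readers unfamiliar with the technique, the following is a minimal software sketch of conventional simulated annealing on an Ising model, the kind of baseline the abstract above compares against. It is not the FPGA architecture or its approximated annealing. The energy convention H = -Σ J_ij s_i s_j - Σ h_i s_i, the geometric cooling schedule, and all parameter values are assumptions chosen for illustration.

```python
import math
import random

def ising_energy(spins, couplings, fields=None):
    """Energy H = -sum J_ij s_i s_j - sum h_i s_i, with spins in {-1, +1}."""
    energy = -sum(j * spins[a] * spins[b] for (a, b), j in couplings.items())
    if fields is not None:
        energy -= sum(h * s for h, s in zip(fields, spins))
    return energy

def anneal(couplings, n_spins, steps=2000, t_start=2.0, t_end=0.05, seed=0):
    """Single-spin-flip simulated annealing; returns the best state seen."""
    rng = random.Random(seed)
    spins = [rng.choice((-1, 1)) for _ in range(n_spins)]
    energy = ising_energy(spins, couplings)
    best_spins, best_energy = spins[:], energy
    for step in range(steps):
        # Geometric cooling from t_start down to t_end (assumed schedule).
        temperature = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        i = rng.randrange(n_spins)
        spins[i] = -spins[i]                      # propose flipping one spin
        delta = ising_energy(spins, couplings) - energy
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            energy += delta                       # accept the flip
            if energy < best_energy:
                best_spins, best_energy = spins[:], energy
        else:
            spins[i] = -spins[i]                  # reject: undo the flip
    return best_spins, best_energy

# A ferromagnetic ring of 8 spins: every neighbouring pair prefers to align.
ring = {(i, (i + 1) % 8): 1.0 for i in range(8)}
state, energy = anneal(ring, 8)
```

The random acceptance test is where a software implementation draws random numbers; per the abstract, the hardware version shares one pseudo-random number generator among all units to save logic elements.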

36 citations

Journal Article•DOI•

High-Performance Symmetric Block Ciphers on Multicore CPU and GPUs

Naoki Nishikawa, Keisuke Iwai, Takakazu Kurokawa

05 Jul 2012-International journal of networking and computing

TL;DR: This study targeted five 128-bit symmetric block ciphers, AES, Camellia, CIPHERUNICORN-A, Hierocrypt-3, and SC2000, from an e-government recommended ciphers list by the CRYPTography Research and Evaluation Committees (CRYPTREC) in Japan, to test the hypothesis that two findings from earlier AES work (the best granularity and the best memory allocation style) are applicable to implementations of other symmetric block ciphers on two generations of GPUs.

Abstract: As data protection with encryption becomes more important day by day, encryption processing using General Purpose computation on a Graphics Processing Unit (GPGPU) has attracted attention as one of the methods to realize high-speed data protection technology. GPUs have evolved in recent years into powerful parallel computing devices with a high cost-performance ratio. However, many factors affect GPU performance. In earlier work to gain higher AES performance using GPGPU in various ways, we obtained the following two findings: (1) 16 bytes/thread is the best granularity; (2) storing the extended key and substitution table in shared memory and the plaintext in registers is the best memory allocation style. However, AES is not the only cipher algorithm widely used in the real world. For this reason, this study was undertaken to test the hypothesis that these two findings are applicable to implementations of other symmetric block ciphers on two generations of GPUs. In this study, we targeted five 128-bit symmetric block ciphers, AES, Camellia, CIPHERUNICORN-A, Hierocrypt-3, and SC2000, from an e-government recommended ciphers list by the CRYPTography Research and Evaluation Committees (CRYPTREC) in Japan. We evaluated the performance of these five symmetric block ciphers on a machine including a 4-core CPU and each GPU using three methods: (A) throughput without data transfer, (B) throughput with data transfer and overlapping encryption processing on GPU, (C) throughput with data transfer and non-overlapping encryption processing on GPU. Results demonstrate that the throughput of the SC2000 implementation in method (A) on Tesla C2050 reached 73.4 Gbps. The throughput obtained using methods (B) and (C) dropped to 33.4 Gbps and 18.3 Gbps, respectively. Method (B) showed an effective throughput approximately 4.7 times higher than that obtained when using 8 threads on a 4-core CPU.
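
The difference between methods (B) and (C) in the abstract above comes down to whether data transfer and encryption run concurrently. The sketch below is an idealized two-stage pipeline model, not the paper's measurement procedure: it assumes equal-sized chunks, a single transfer stage, and hypothetical per-chunk timings, and the function names are invented for illustration.

```python
def throughput_gbps(n_bytes, seconds):
    """Convert a byte count and elapsed time into gigabits per second."""
    return n_bytes * 8 / seconds / 1e9

def total_time(chunks, transfer_s, compute_s, overlap):
    """Elapsed time to process `chunks` equal-sized chunks.

    Without overlap, each chunk is transferred and then encrypted in turn.
    With overlap, the transfer of the next chunk runs while the current
    chunk is being encrypted, so after the pipeline fills, each further
    chunk costs only the slower of the two stages.
    """
    if not overlap:
        return chunks * (transfer_s + compute_s)
    return transfer_s + compute_s + (chunks - 1) * max(transfer_s, compute_s)

# Hypothetical timings: 4 chunks, 1 s to transfer and 2 s to encrypt each.
serial = total_time(4, 1.0, 2.0, overlap=False)
pipelined = total_time(4, 1.0, 2.0, overlap=True)
```

In this toy model the pipelined run approaches the speed of the slower stage as the number of chunks grows, which is the qualitative reason an overlapped method can sit between the compute-only and non-overlapped throughputs.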

30 citations

Journal Article•DOI•

Efficient Exhaustive Verification of the Collatz Conjecture using DSP blocks of Xilinx FPGAs

Yasuaki Ito¹, Koji Nakano¹•Institutions (1)

Hiroshima University¹

14 Jan 2011-International journal of networking and computing

TL;DR: This paper presents an efficient implementation of a coprocessor that performs the exhaustive search to verify the Collatz conjecture using a Xilinx Virtex-6 FPGA with DSP blocks, each of which contains one multiplier and one adder.

Abstract: Consider the following operation on an arbitrary positive number: if the number is even, divide it by two, and if the number is odd, triple it and add one. The Collatz conjecture asserts that, starting from any positive number m, repeated iteration of the operation eventually produces the value 1. The main contribution of this paper is to present an efficient implementation of a coprocessor that performs the exhaustive search to verify the Collatz conjecture using a Xilinx Virtex-6 FPGA with DSP blocks, each of which contains one multiplier and one adder. The experimental results show that our coprocessor can verify 4.99×10⁸ 64-bit numbers per second. Also, we have implemented a multi-coprocessor system that has 380 coprocessors on the FPGA. The experimental results show that our multi-coprocessor system can verify 1.64×10¹¹ 64-bit numbers per second.
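
The operation described in the abstract above is simple enough to state directly in code. The sketch below is a plain software check over a small range, not the FPGA coprocessor design; the function names and the `max_steps` safety cap are assumptions added for illustration.

```python
def collatz_step(m):
    """One application of the operation: halve if even, else triple and add one."""
    return m // 2 if m % 2 == 0 else 3 * m + 1

def reaches_one(m, max_steps=10_000):
    """Return True if iterating from m reaches 1 within max_steps steps."""
    steps = 0
    while m != 1:
        if steps >= max_steps:
            return False  # gave up: inconclusive, not a counterexample
        m = collatz_step(m)
        steps += 1
    return True

def verify_range(start, stop):
    """Exhaustively check every starting value in [start, stop)."""
    return all(reaches_one(m) for m in range(start, stop))

ok = verify_range(1, 1000)
```

The odd branch needs one multiplication and one addition, which matches the abstract's observation that each DSP block contains one multiplier and one adder.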

23 citations

Performance

Metrics

Papers: 234

Citations: 1,022

No. of papers from the Journal in previous years
Year	Papers
2023	16
2022	20
2021	13
2020	20
2019	21
2018	20