Author

Satoshi Goto

Other affiliations: Fudan University
Bio: Satoshi Goto is an academic researcher from Waseda University. The author has contributed to research in topics: Motion estimation & Encoder. The author has an h-index of 27 and has co-authored 444 publications receiving 3,612 citations. Previous affiliations of Satoshi Goto include Fudan University.


Papers
Proceedings Article
07 May 2012
TL;DR: Experimental results show that the proposed algorithm speeds up the original HEVC intra coding by up to 73.7%, and by 44.91% on average, for 4k×2k video sequences.
Abstract: The quadtree-based picture partition scheme in HEVC contributes to a significant coding efficiency improvement, especially for high-resolution videos, but encoding complexity also increases dramatically. This paper proposes a two-stage prediction unit size decision algorithm to speed up the original intra coding in HEVC. In the pre-stage, the texture complexity of the down-sampled largest coding unit (LCU) and its four sub-blocks is analyzed according to video content, to filter out unnecessary prediction units for both the LCU and its sub-blocks. In the second stage, during intra coding, the prediction unit sizes of encoded neighboring blocks are used to skip small prediction unit candidates for the current block. Experimental results show that the proposed algorithm speeds up the original HEVC intra coding by up to 73.7%, and by 44.91% on average, for 4k×2k video sequences. Meanwhile, the peak signal-to-noise ratio degradation is less than 0.04 dB and the bit rate stays almost the same as that of the original HEVC intra coding.

109 citations
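The pre-stage described in the abstract above (analyzing the texture complexity of a down-sampled LCU to filter out prediction unit sizes) can be illustrated with a minimal sketch. The abstract does not give the complexity metric, the down-sampling factor, the thresholds, or which sizes are dropped, so everything here is an assumption chosen for illustration: mean absolute deviation as the metric, 4×4 averaging, the hypothetical thresholds `low` and `high`, and the hypothetical function names.

```python
def downsample(block, factor):
    """Average non-overlapping factor x factor tiles of a square 2-D list."""
    n = len(block) // factor
    return [
        [
            sum(block[y * factor + dy][x * factor + dx]
                for dy in range(factor) for dx in range(factor)) / factor ** 2
            for x in range(n)
        ]
        for y in range(n)
    ]


def mean_abs_dev(block):
    """Assumed texture-complexity metric: mean absolute deviation of the pixels."""
    pixels = [p for row in block for p in row]
    mean = sum(pixels) / len(pixels)
    return sum(abs(p - mean) for p in pixels) / len(pixels)


def candidate_pu_sizes(lcu, low=2.0, high=20.0):
    """Return the PU sizes still worth testing for a 64x64 LCU.

    `low` and `high` are hypothetical thresholds, not values from the paper.
    """
    complexity = mean_abs_dev(downsample(lcu, 4))
    if complexity < low:
        return [64, 32]        # flat content: skip the small candidates
    if complexity > high:
        return [16, 8, 4]      # busy content: skip the large candidates
    return [64, 32, 16, 8, 4]  # undecided: keep every candidate


flat_lcu = [[128] * 64 for _ in range(64)]
print(candidate_pu_sizes(flat_lcu))
```

The same function could be applied to each 32×32 sub-block with a shorter candidate list. The second stage, which uses the sizes chosen by already-encoded neighbors, is not sketched here.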

Journal Article
TL;DR: An H.264/AVC baseline-profile real-time encoder for HDTV-1080p at 30 fps is proposed, and the design considerations for its chief components, including high-throughput integer motion estimation, data-reusing fractional motion estimation, and hardware-friendly mode reduction for intra prediction, are described.
Abstract: An H.264/AVC baseline-profile real-time encoder for HDTV-1080p at 30 fps is proposed in this paper. On the basis of the specifications and algorithm optimizations, the dedicated hardware engines and one 32-bit media embedded processor (MeP) equipped with hardware extensions are mapped into a three-stage macroblock pipelining system architecture. This paper describes the design considerations for the chief components, including high-throughput integer motion estimation, data-reusing fractional motion estimation, and hardware-friendly mode reduction for intra prediction. An 11.5 Gbps 64 Mb system-in-silicon DRAM is embedded to relieve the external memory bandwidth. Using TSMC one-poly six-metal 0.18 μm CMOS technology, the prototype chip is implemented with 1140 k logic gates and 108.3 KB internal SRAM. The SoC core occupies 27.1 mm² of die area and consumes 1.41 W at 200 MHz execution speed under typical working conditions.

82 citations

Journal Article
TL;DR: A lossless frame recompression technique and a partial MB reordering scheme are proposed to reduce the DRAM access of a QFHD video decoder chip, and the core energy is reduced by 54% through pipelining and parallelization.
Abstract: The increased resolution of Quad Full High Definition (QFHD) offers a significantly enhanced visual experience. However, the corresponding huge data throughput of up to 530 Mpixels/s greatly challenges the design of real-time video decoder VLSI, with extensive requirements on both DRAM bandwidth and computational power. In this work, a lossless frame recompression technique and a partial MB reordering scheme are proposed to reduce the DRAM access of a QFHD video decoder chip. In addition, pipelining and parallelization techniques such as NAL/slice-parallel entropy decoding are implemented to efficiently enhance its computational power. The chip, supporting H.264/AVC high profile, is fabricated in 90 nm CMOS and verified. It delivers a maximum throughput of 4096×2160@60fps, which is at least 4.3 times higher than the state of the art. The DRAM bandwidth requirement is typically reduced by 51%, which fits the design into a 64-bit LPDDR SDRAM interface and results in a 58% DRAM power saving. Meanwhile, the core energy is reduced by 54% through pipelining and parallelization.

71 citations
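The "up to 530 Mpixels/s" figure in the QFHD decoder abstract above follows directly from the stated maximum format, 4096×2160 at 60 frames per second. A short check, with the helper name `pixel_rate` introduced here only for illustration:

```python
def pixel_rate(width: int, height: int, fps: int) -> int:
    """Raw pixel throughput in pixels per second for a given video format."""
    return width * height * fps


rate = pixel_rate(4096, 2160, 60)
print(f"{rate / 1e6:.1f} Mpixels/s")  # prints "530.8 Mpixels/s"
```

This counts one sample position per pixel. The actual DRAM traffic of a decoder also depends on chroma format, bit depth, and reference-frame fetches, none of which are given in the abstract.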

Journal Article
TL;DR: In this article, a text detection method based on the combination of connected component analysis and texture feature analysis of unknown text region contours is proposed to accurately detect text in color images, possibly with a complex background.
Abstract: Text detection in color images has become an active research area in the past few decades. In this paper, we present a novel approach to accurately detect text in color images, possibly with a complex background. The proposed algorithm is based on the combination of connected component analysis and texture feature analysis of unknown text region contours. First, we utilize an elaborate color image edge detection algorithm to extract all possible text edge pixels. Connected component analysis is performed on these edge pixels to detect the external contour and possible internal contours of potential text regions. The gradient and geometrical characteristics of each region contour are carefully examined to construct candidate text regions and to identify some of the non-text regions. Then each candidate text region is verified with texture features derived from the wavelet domain. Finally, the expectation-maximization algorithm is introduced to binarize each text region to prepare data for recognition. In contrast to previous approaches, our algorithm combines the efficiency of connected-component-based methods with the robustness of texture-based analysis. Experimental results show that our proposed algorithm is robust in text detection with respect to different character sizes, orientations, colors, and languages, and can provide reliable text binarization results.

64 citations

Journal Article
TL;DR: This work proposes a bilinear quarter-pixel approximation, together with a search pattern based on it, to reduce the complexity of the interpolation and fractional search process, and achieves more than 52% improvement in power efficiency relative to previous works in H.264.
Abstract: Fractional motion estimation (FME) significantly enhances video compression efficiency, but its high computational complexity also limits the real-time processing capability. In this brief, we present a VLSI implementation of an FME design in High Efficiency Video Coding for ultrahigh-definition video applications. We first propose a bilinear quarter-pixel approximation, together with a search pattern based on it, to reduce the complexity of the interpolation and fractional search process. Furthermore, a data reuse strategy is exploited to reduce the hardware cost of the transform. In addition, using the considered pixel parallelism and a dedicated access pattern for memory, we fully pipeline the computation and achieve high hardware utilization. This design has been implemented as a 65-nm CMOS chip and verified. The measured throughput reaches 995 Mpixels/s for 7680×4320@30 frames/s at 188 MHz, at least 4.7 times faster than prior art. The corresponding power dissipation is 198.6 mW, with a power efficiency of 0.2 nJ/pixel. Due to the optimization, our work achieves more than 52% improvement in power efficiency relative to previous works in H.264.

60 citations
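The figures quoted in the FME abstract above are mutually consistent, which a few lines of arithmetic can confirm: 7680×4320 at 30 frames/s gives about 995 Mpixels/s, and 198.6 mW divided by that rate gives about 0.2 nJ/pixel. The function names below are hypothetical, and the pixels-per-cycle value is a derived quantity that the abstract does not state.

```python
def energy_per_pixel_nj(power_w: float, pixels_per_s: float) -> float:
    """Energy spent per pixel in nanojoules: power divided by throughput."""
    return power_w / pixels_per_s * 1e9


def pixels_per_cycle(pixels_per_s: float, clock_hz: float) -> float:
    """Average number of pixels that must be processed in each clock cycle."""
    return pixels_per_s / clock_hz


rate = 7680 * 4320 * 30  # 995,328,000 pixels per second
print(f"throughput: {rate / 1e6:.0f} Mpixels/s")
print(f"energy:     {energy_per_pixel_nj(0.1986, rate):.2f} nJ/pixel")
print(f"per cycle:  {pixels_per_cycle(rate, 188e6):.2f} pixels at 188 MHz")
```

The last line suggests the design must sustain a little over five pixels per clock on average, which matches the abstract's emphasis on pixel parallelism and full pipelining.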


Cited by
Proceedings Article
13 Jun 2010
TL;DR: A novel image operator is presented that seeks to find the value of stroke width for each image pixel, and its use on the task of text detection in natural images is demonstrated.
Abstract: We present a novel image operator that seeks to find the value of stroke width for each image pixel, and demonstrate its use on the task of text detection in natural images. The suggested operator is local and data dependent, which makes it fast and robust enough to eliminate the need for multi-scale computation or scanning windows. Extensive testing shows that the suggested scheme outperforms the latest published algorithms. Its simplicity allows the algorithm to detect texts in many fonts and languages.

1,531 citations

Proceedings Article
16 Jun 2012
TL;DR: A system that detects texts of arbitrary orientations in natural images, using a two-level classification scheme and two sets of features specially designed to capture the intrinsic characteristics of texts; a new dataset and an evaluation protocol are also proposed to compare it with competing algorithms.
Abstract: With the increasing popularity of practical vision systems and smart phones, text detection in natural scenes becomes a critical yet challenging task. Most existing methods have focused on detecting horizontal or near-horizontal texts. In this paper, we propose a system which detects texts of arbitrary orientations in natural images. Our algorithm is equipped with a two-level classification scheme and two sets of features specially designed for capturing both the intrinsic characteristics of texts. To better evaluate our algorithm and compare it with other competing algorithms, we generate a new dataset, which includes various texts in diverse real-world scenarios; we also propose a protocol for performance evaluation. Experiments on benchmark datasets and the proposed dataset demonstrate that our algorithm compares favorably with the state-of-the-art algorithms when handling horizontal texts and achieves significantly enhanced performance on texts of arbitrary orientations in complex natural scenes.

750 citations

Journal Article
TL;DR: The purpose of this paper is to provide a complete survey of the traditional and recent approaches to background modeling for foreground detection, and categorize the different approaches in terms of the mathematical models used.

664 citations