Author

Kishor Sarawadekar

Bio: Kishor Sarawadekar is an academic researcher at the Indian Institute of Technology (BHU) Varanasi. He has contributed to research on the topics of Gesture and JPEG 2000, has an h-index of 7, and has co-authored 39 publications receiving 182 citations. His previous affiliations include Xilinx and the Indian Institute of Technology Kharagpur.

Papers
Proceedings ArticleDOI
01 Oct 2019
TL;DR: A new warm restart technique, inspired by the cyclical learning rate and stochastic gradient descent with warm restarts, is introduced; it uses the “poly” LR policy, which helps the DNN converge faster and achieves slightly higher classification accuracy.
Abstract: Learning rate (LR) is one of the most important hyper-parameters in any deep neural network (DNN) optimization process. It controls the speed of convergence toward the global minimum by navigating the non-convex loss surface. The performance of a DNN is affected by the presence of local minima, saddle points, etc. in the loss surface. Decaying the learning rate by a factor at a fixed number of epochs, or exponentially, is the conventional way of varying the LR. Recently, two new approaches for setting the learning rate have been introduced, namely the cyclical learning rate and stochastic gradient descent with warm restarts. In both approaches, the learning rate is varied in a cyclic pattern between two boundary values. This paper introduces another warm restart technique, inspired by these two approaches, which uses the “poly” LR policy. The proposed technique, called polynomial learning rate with warm restart, requires only a single warm restart. The proposed LR policy helps the DNN converge faster and achieves slightly higher classification accuracy. Its performance is demonstrated on the CIFAR-10, CIFAR-100, and Tiny ImageNet datasets with CNN, ResNet, and Wide Residual Network (WRN) architectures.
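The “poly” policy decays the learning rate as base_lr · (1 − t/T)^power; with a single warm restart, the schedule resets to the base LR once and then decays again over the remaining epochs. A minimal sketch of such a schedule (the restart point, power value, and function name here are illustrative assumptions, not the paper's exact settings):

```python
def poly_lr_warm_restart(base_lr, epoch, total_epochs, power=2.0, restart_at=0.5):
    """Sketch of a 'poly' LR schedule with a single warm restart.

    The LR decays as base_lr * (1 - t/T)^power; at the restart point the
    schedule resets to base_lr and decays again over the remaining epochs.
    """
    restart_epoch = int(total_epochs * restart_at)
    if epoch < restart_epoch:
        t, span = epoch, restart_epoch          # first decay phase
    else:
        t, span = epoch - restart_epoch, total_epochs - restart_epoch  # after restart
    return base_lr * (1.0 - t / span) ** power
```

At the restart epoch the LR jumps back to `base_lr`, which is the "warm restart" the paper builds on.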

57 citations

Journal ArticleDOI
TL;DR: To encode all samples in a stripe-column concurrently, a new technique named compact context coding is devised; it attains high throughput while also cutting down the hardware requirement.
Abstract: Embedded block coding with optimized truncation (EBCOT) is a key algorithm in the JPEG 2000 image compression system. Applications such as medical imaging, satellite imagery, and digital cinema require a high-speed, high-performance EBCOT architecture. Although efficient EBCOT architectures have been proposed, their hardware requirements are very high and their throughput is low. To address this problem, we investigated the rate of concurrent context generation. Our study revealed that the rate of generating four or more context pairs in an image is about 68.9%. Therefore, to encode all samples in a stripe-column concurrently, a new technique named compact context coding is devised. As a consequence, high throughput is attained and the hardware requirement is also cut down. The performance of the MQ coder is improved by operating the renormalization and byte-out stages concurrently. The entire EBCOT encoder design is tested on a field-programmable gate array platform. The implementation results show that the throughput of the proposed architecture is 163.59 MSamples/s, which is equivalent to encoding a 1080p (1920 × 1080, 4:2:2) high-definition TV picture sequence at 39 f/s. The bit plane coder (BPC) architecture alone operates at 315.06 MHz, making it 2.86 times faster than the fastest BPC design available so far. Moreover, it is capable of encoding digital cinema frames (2048 × 1080) at 42 f/s. Thus, it satisfies the requirements of applications such as cartography, medical imaging, and satellite imagery, which demand a high-speed real-time image compression system.

29 citations

Journal ArticleDOI
TL;DR: A new dactylology is proposed to achieve functionality similar to a standard keyboard, together with a new feature extraction technique called reduced shape signature (RSS), which is rotation, translation, and scale invariant.

25 citations

Journal ArticleDOI
TL;DR: A new transform kernel for HEVC is proposed that uses a new set of real-valued DCT coefficients; it reduces the hardware cost and processing time by lowering the complexity as well as the intermediate data length.
Abstract: Integer discrete cosine transform (DCT) reduces the complexity of the transform kernel in High Efficiency Video Coding (HEVC) by eliminating the need for floating point multiplications. However, the dynamic range of integer DCT is large and therefore hardware cost is high. In this brief, a new transform kernel for HEVC is proposed which uses a new set of real-valued DCT coefficients. The proposed real-valued DCT reduces the hardware cost and the processing time by reducing the complexity as well as intermediate data length. However, it maintains coding performance similar to that of the integer DCT. Further, a hardware efficient data flow model of 2D-DCT architecture is also presented, which shows that a transpose memory of 15-bit data depth is enough to process 9-bit residual data. Field-programmable gate array implementation of the proposed 1-D DCT architecture reduces the area-delay product and power by 37.5% and 43.4%, respectively, as compared to that of the integer DCT. The proposed architecture requires 88.6K logic gates to produce a constant throughput of 32 samples per clock and it operates at 256.4 MHz on CMOS 90-nm ASIC platform.
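The separable 2-D DCT at the heart of such transform kernels applies a 1-D transform along rows and then along columns. A minimal sketch of an orthonormal real-valued DCT-II, for illustration only (this is the textbook coefficient set, not the paper's proposed one):

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II transform matrix with real-valued coefficients."""
    k = np.arange(n).reshape(-1, 1)   # frequency index
    i = np.arange(n).reshape(1, -1)   # sample index
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)        # DC row uses a smaller scale factor
    return m

def dct2(block):
    """Separable 2-D DCT: transform rows, then columns."""
    m = dct_matrix(block.shape[0])
    return m @ block @ m.T
```

For a constant block all the energy lands in the DC coefficient, which is a quick sanity check that the matrix is correct.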

20 citations

Journal ArticleDOI
TL;DR: A relative figure of merit is computed to compare the overall efficiency of all architectures, which shows that the proposed architecture provides a good balance between throughput and hardware cost.

14 citations


Cited by
Journal ArticleDOI
TL;DR: The proposed method is evaluated on a public dataset of real depth images captured from various users, and experimental results show that it outperforms state-of-the-art recognition accuracy under a leave-one-out evaluation strategy.
Abstract: Sign language is the most natural and effective way of communication between deaf and hearing people. American Sign Language (ASL) alphabet recognition (i.e., fingerspelling) using a marker-less vision sensor is a challenging task due to the difficulties in hand segmentation and appearance variations among signers. Existing color-based sign language recognition systems suffer from many challenges, such as complex backgrounds, hand segmentation, and large inter-class and intra-class variations. In this paper, we propose a new user-independent recognition system for the American Sign Language alphabet using depth images captured from the low-cost Microsoft Kinect depth sensor. Exploiting depth information instead of color images overcomes many problems owing to its robustness against illumination and background variations. The hand region can be segmented by applying a simple preprocessing algorithm over the depth image. Feature learning using convolutional neural network architectures is applied instead of classical hand-crafted feature extraction methods. Local features extracted from the segmented hand are effectively learned using a simple unsupervised Principal Component Analysis Network (PCANet) deep learning architecture. Two strategies for learning the PCANet model are proposed: training a single PCANet model from samples of all users, and training a separate PCANet model for each user. The extracted features are then recognized using a linear Support Vector Machine (SVM) classifier. The performance of the proposed method is evaluated on a public dataset of real depth images captured from various users. Experimental results show that the proposed method outperforms state-of-the-art recognition accuracy under a leave-one-out evaluation strategy.
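The first stage of PCANet learns its convolutional filters as the top principal components of mean-removed image patches. A rough sketch of that idea (the patch size, filter count, and random sampling scheme are illustrative assumptions, not the paper's settings):

```python
import numpy as np

def learn_pca_filters(images, patch=5, n_filters=8, samples_per_image=200, rng=None):
    """Sketch of PCANet-style filter learning: the filters are the top
    principal components of mean-removed patches sampled from the images."""
    if rng is None:
        rng = np.random.default_rng(0)
    patches = []
    for img in images:
        for _ in range(samples_per_image):
            r = rng.integers(0, img.shape[0] - patch)
            c = rng.integers(0, img.shape[1] - patch)
            p = img[r:r + patch, c:c + patch].ravel()
            patches.append(p - p.mean())          # remove the patch mean
    x = np.stack(patches)
    # rows of Vt are the principal directions of the patch set
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    return vt[:n_filters].reshape(n_filters, patch, patch)
```

The learned filters would then be convolved with the segmented hand image, and the responses fed (after binarization and histogramming, in the full PCANet pipeline) to the linear SVM.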

69 citations

Posted Content
Tu Zheng, Hao Fang, Yi Zhang, Wenjian Tang, Zheng Yang, Haifeng Liu, Deng Cai
TL;DR: A novel module named REcurrent Feature-Shift Aggregator (RESA) is proposed to enrich lane features after preliminary feature extraction with an ordinary CNN; it achieves state-of-the-art results on two popular lane detection benchmarks (CULane and Tusimple).
Abstract: Lane detection is one of the most important tasks in self-driving. Due to various complex scenarios (e.g., severe occlusion, ambiguous lanes, etc.) and the sparse supervisory signals inherent in lane annotations, the lane detection task is still challenging. Thus, it is difficult for an ordinary convolutional neural network (CNN) trained on general scenes to capture subtle lane features from raw images. In this paper, we present a novel module named REcurrent Feature-Shift Aggregator (RESA) to enrich lane features after preliminary feature extraction with an ordinary CNN. RESA takes advantage of the strong shape priors of lanes and captures spatial relationships of pixels across rows and columns. It shifts sliced feature maps recurrently in the vertical and horizontal directions and enables each pixel to gather global information. With the help of slice-by-slice information propagation, RESA can conjecture lanes accurately in challenging scenarios with weak appearance clues. Moreover, we also propose a Bilateral Up-Sampling Decoder, which combines coarse-grained and fine detailed features in the up-sampling stage and can meticulously recover the low-resolution feature map into a pixel-wise prediction. Our method achieves state-of-the-art results on two popular lane detection benchmarks (CULane and Tusimple). The code will be made publicly available.
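The core idea of RESA is to shift sliced feature maps by decreasing strides and aggregate them so that each position gathers information from across the map. A heavily simplified sketch of that propagation pattern (the strides, the additive weighting, and the use of np.roll are illustrative assumptions; the actual module operates on CNN feature slices with learned convolutions and nonlinearities):

```python
import numpy as np

def recurrent_feature_shift(fmap, strides=(8, 4, 2, 1), alpha=0.5):
    """Simplified RESA-style propagation: the map is repeatedly combined
    with shifted copies of itself, so information spreads across rows and
    columns in a logarithmic number of steps."""
    out = fmap.astype(float).copy()
    for s in strides:
        out = out + alpha * np.roll(out, s, axis=0)   # vertical shift
        out = out + alpha * np.roll(out, s, axis=1)   # horizontal shift
    return out
```

After a few shift-and-add rounds, a response at one pixel influences distant pixels, which is what lets the module infer occluded lane segments from visible ones.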

69 citations

Journal ArticleDOI
TL;DR: This paper presents a modular and extensible quadrotor architecture and its specific prototyping for automatic tracking applications, including what is, to the best of the authors' knowledge, the first proposed hardware architecture for SBPG compression integrated with an SDC.
Abstract: Image or video exchange over the Internet of Things (IoT) is a requirement in diverse applications, including smart health care, smart structures, and smart transportations. This paper presents a modular and extensible quadrotor architecture and its specific prototyping for automatic tracking applications. The architecture is extensible and based on off-the-shelf components for easy system prototyping. A target tracking and acquisition application is presented in detail to demonstrate the power and flexibility of the proposed design. Complete design details of the platform are also presented. The designed module implements the basic proportional–integral–derivative control and a custom target acquisition algorithm. Details of the sliding-window-based algorithm are also presented. This algorithm performs 20× faster than comparable approaches in OpenCV with equal accuracy. Additional modules can be integrated for more complex applications, such as search-and-rescue, automatic object tracking, and traffic congestion analysis. A hardware architecture for the newly introduced Better Portable Graphics (BPG) compression algorithm is also introduced in the framework of the extensible quadrotor architecture. Since its introduction in 1987, the Joint Photographic Experts Group (JPEG) graphics format has been the de facto choice for image compression. However, the new compression technique BPG outperforms JPEG in terms of compression quality and size of the compressed file. The objective is to present a hardware architecture for enhanced real-time compression of the image. Finally, a prototyping platform of a hardware architecture for a secure digital camera (SDC) integrated with the secure BPG (SBPG) compression algorithm is presented. The proposed architecture is suitable for high-performance imaging in the IoT and is prototyped in Simulink. To the best of our knowledge, this is the first ever proposed hardware architecture for SBPG compression integrated with an SDC.

61 citations

Journal ArticleDOI
TL;DR: This paper presents a novel approach to Multimodality Medical Image Fusion (MMIF) for analyzing lesions for diagnostic purposes and post-treatment review of NCC, showing promising results superior to state-of-the-art wavelet-based fusion algorithms.

50 citations