Author

R. Kumaraswamy

Bio: R. Kumaraswamy is an academic researcher from Siddaganga Institute of Technology. The author has contributed to research in topics: Speaker recognition & Voice activity detection. The author has an h-index of 6 and has co-authored 30 publications receiving 127 citations. Previous affiliations of R. Kumaraswamy include National Aerospace Laboratories & Indian Institute of Technology Madras.

Papers
Proceedings ArticleDOI
14 Nov 2005
TL;DR: The efficacy of the performance-based fusion method is demonstrated by applying it to the classification of short video clips into six popular TV broadcast genres, namely cartoon, commercial, news, cricket, football, and tennis.
Abstract: In this paper, we investigate the problem of video classification into predefined genres by combining the evidence from multiple classifiers. It is well known in the pattern recognition community that the accuracy of classification obtained by combining decisions made by independent classifiers can be substantially higher than the accuracy of the individual classifiers. The conventional method for combining individual classifiers weights each classifier equally (sum or vote rule fusion). In this paper, we study a method that estimates the performances of the individual classifiers and combines them by weighting each according to its estimated performance. We demonstrate the efficacy of the performance-based fusion method by applying it to the classification of short video clips (20 seconds) into six popular TV broadcast genres, namely cartoon, commercial, news, cricket, football, and tennis. The individual classifiers are trained using different spatial and temporal features derived from the video sequences, and two different classifier methodologies, namely hidden Markov models (HMMs) and support vector machines (SVMs). The experiments were carried out on more than 3 hours of video data. A classification rate of 93.12% for all six classes and 97.14% for the sports category alone has been achieved, which is significantly higher than the performance of the individual classifiers.
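The contrast between equal-weight (sum rule) fusion and performance-based fusion described in the abstract can be illustrated with a minimal Python sketch. The function name `fuse_scores`, the score values, and the accuracy estimates are hypothetical and chosen only for illustration; they are not taken from the paper, and weighting by normalized accuracy is just one plausible way to weight classifiers by estimated performance.

```python
# Genres named in the abstract.
GENRES = ["cartoon", "commercial", "news", "cricket", "football", "tennis"]


def fuse_scores(scores, accuracies):
    """Combine per-classifier class scores with performance-based weights.

    scores:     one list of class scores per classifier (same class order).
    accuracies: one estimated accuracy per classifier (assumed to come
                from held-out data; hypothetical in this sketch).
    Returns (index of winning class, fused score per class).
    """
    total = sum(accuracies)
    weights = [a / total for a in accuracies]  # normalize weights to sum to 1
    n_classes = len(scores[0])
    fused = [
        sum(w * s[c] for w, s in zip(weights, scores))
        for c in range(n_classes)
    ]
    return fused.index(max(fused)), fused


# Made-up scores from two hypothetical classifiers for one clip.
hmm_scores = [0.10, 0.05, 0.05, 0.50, 0.20, 0.10]
svm_scores = [0.05, 0.05, 0.10, 0.25, 0.50, 0.05]

# Equal weights reproduce the conventional sum rule.
equal_label, _ = fuse_scores([hmm_scores, svm_scores], [1.0, 1.0])

# Performance-based weights favour the classifier assumed to be more accurate.
weighted_label, _ = fuse_scores([hmm_scores, svm_scores], [0.6, 0.9])

print(GENRES[equal_label], GENRES[weighted_label])
```

With these illustrative numbers the two rules disagree, which is the situation in which weighting by estimated performance can change the final decision.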

44 citations

Journal ArticleDOI
TL;DR: A two-stage separation process is proposed in which the initial stage is based on empirical mode decomposition (EMD) and the Hilbert transform, together known as the Hilbert–Huang transform; the results show significant improvement in objective measures over other existing single-channel speech separation methods.
Abstract: In this paper we discuss an unsupervised approach for co-channel speech separation where two speakers are speaking simultaneously over the same channel. We propose a two-stage separation process where the initial stage is based on empirical mode decomposition (EMD) and the Hilbert transform, together known as the Hilbert–Huang transform. EMD decomposes the mixed signal into oscillatory functions known as intrinsic mode functions. The Hilbert transform is applied to find the instantaneous amplitudes, and Fuzzy C-Means clustering is applied to group the speakers at the initial stage. In the second stage of separation, the speaker groups are transformed into the time–frequency domain using the short-time Fourier transform (STFT). Time–frequency ratios are computed by dividing the STFT matrix of the mixed speech signal by the STFT matrix of the stage 1 recovered speech signals. The histogram of the ratios obtained can be used to estimate the ideal binary mask for each speaker. These masks are applied to the speech mixture and the underlying speakers are estimated. The masks are estimated from the speech mixture and help in imputing the missing values after the stage 1 grouping of speakers. Results obtained show significant improvement in objective measures over other existing single-channel speech separation methods.

27 citations

Proceedings ArticleDOI
01 Dec 2015
TL;DR: The speech recognition is implemented using MATLAB and the results are validated against the Hidden Markov Model Toolkit (HTK), an open source tool for speech recognition.
Abstract: This paper gives details of the development of a speech recognition system for a voice-activated Ground Control Station (GCS). The speech recognition is implemented using MATLAB and the results are validated against the Hidden Markov Model Toolkit (HTK), an open source tool for speech recognition. The menu items of Mission Planner, a typical open source GCS used for flying Micro Air Vehicles (MAVs), are used for the experiments.

13 citations

Proceedings ArticleDOI
01 Nov 2013
TL;DR: Collection of speech data in Kannada language for prosodically guided phonetic search engine and the issues involved in transcription are explained.
Abstract: Development and availability of spoken language corpora in regional languages is of utmost importance for a multicultural and multilingual country like India. The issues of regional bias, accent, unique style and diversity associated with each geographical region and language will have a significant effect on the performance of speech recognition/synthesis systems. In this paper, collection of speech data in Kannada language for prosodically guided phonetic search engine and the issues involved in transcription are explained. The speech corpus consists of data in three different contexts namely, read mode, conversation mode and extempore mode. A four layered transcription namely, phonetic transcription using IPA symbols, syllabification, pitch marking and break marking is done for the entire data. A baseline recognition system for Kannada language is built using HTK for the data collected in different modes and the results are presented.

13 citations

Proceedings ArticleDOI
04 Apr 2019
TL;DR: The objective of the proposed system is to distinguish weeds from crops using machine learning algorithms, based on an exhaustive dataset collected for four different commercial crops and two types of weeds, Para grass and Nutsedge.
Abstract: Weed control is essential to agricultural productivity, as weeds act as pests to crops. The conventional methods of weed removal are time-consuming and require considerable manual labour. Hence there is a need to automate this process. The objective of the proposed system is to distinguish weeds from crops using machine learning algorithms. An exhaustive dataset is collected for four different commercial crops and two types of weeds, Para grass and Nutsedge. The excess green method and Otsu's thresholding are used to mask the soil and extract the region of interest. The shape features of an image are extracted to provide properties that distinguish weed from crop. The classification of weed and crop is tested with three different classifiers: Support Vector Machine, Artificial Neural Network and Convolutional Neural Network. The performance comparison of the weed detection algorithms is carried out on the OpenCV and Keras platforms using the Python language.
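The soil-masking step mentioned in the abstract (excess green followed by Otsu's thresholding) can be sketched in plain Python as follows. The abstract does not give the exact formula, so this sketch assumes the commonly used excess green index on normalized chromatic coordinates, ExG = 2g - r - b; the function names, the 64-bin histogram, and the sample pixel values are hypothetical and for illustration only.

```python
def excess_green(pixel):
    """Excess green index of one (R, G, B) pixel on normalized channels.

    Assumed definition: ExG = 2g - r - b with r, g, b = R, G, B / (R+G+B).
    """
    r, g, b = pixel
    total = r + g + b
    if total == 0:
        return 0.0  # black pixel: no colour information
    return (2 * g - r - b) / total


def otsu_threshold(values, bins=64):
    """Threshold that maximizes between-class variance (Otsu's method)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return lo
    width = (hi - lo) / bins
    hist = [0] * bins
    for v in values:
        hist[min(int((v - lo) / width), bins - 1)] += 1

    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_var, best_bin = -1.0, 0
    w0, sum0 = 0, 0
    for i in range(bins - 1):
        w0 += hist[i]            # pixels at or below bin i
        sum0 += i * hist[i]
        w1 = total - w0          # pixels above bin i
        if w0 == 0 or w1 == 0:
            continue
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        var = w0 * w1 * (mu0 - mu1) ** 2  # proportional to between-class variance
        if var > best_var:
            best_var, best_bin = var, i
    return lo + (best_bin + 1) * width


def vegetation_mask(pixels):
    """True where a pixel is classed as vegetation, False where soil."""
    exg = [excess_green(p) for p in pixels]
    threshold = otsu_threshold(exg)
    return [v > threshold for v in exg]


# Made-up pixels: two brownish (soil-like) and two greenish (plant-like).
sample = [(120, 90, 60), (110, 85, 70), (40, 160, 50), (60, 180, 40)]
print(vegetation_mask(sample))
```

In an OpenCV pipeline the thresholding step would more likely use `cv2.threshold` with the `cv2.THRESH_OTSU` flag on an 8-bit ExG image; the pure-Python version here is only meant to make the two steps explicit.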

9 citations


Cited by
Journal ArticleDOI
TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.
Abstract: Machine Learning is the study of methods for programming computers to learn. Computers are applied to a wide range of tasks, and for most of these it is relatively easy for programmers to design and implement the necessary software. However, there are many tasks for which this is difficult or impossible. These can be divided into four general categories. First, there are problems for which there exist no human experts. For example, in modern automated manufacturing facilities, there is a need to predict machine failures before they occur by analyzing sensor readings. Because the machines are new, there are no human experts who can be interviewed by a programmer to provide the knowledge necessary to build a computer system. A machine learning system can study recorded data and subsequent machine failures and learn prediction rules. Second, there are problems where human experts exist, but where they are unable to explain their expertise. This is the case in many perceptual tasks, such as speech recognition, hand-writing recognition, and natural language understanding. Virtually all humans exhibit expert-level abilities on these tasks, but none of them can describe the detailed steps that they follow as they perform them. Fortunately, humans can provide machines with examples of the inputs and correct outputs for these tasks, so machine learning algorithms can learn to map the inputs to the outputs. Third, there are problems where phenomena are changing rapidly. In finance, for example, people would like to predict the future behavior of the stock market, of consumer purchases, or of exchange rates. These behaviors change frequently, so that even if a programmer could construct a good predictive computer program, it would need to be rewritten frequently. A learning program can relieve the programmer of this burden by constantly modifying and tuning a set of learned prediction rules. 
Fourth, there are applications that need to be customized for each computer user separately. Consider, for example, a program to filter unwanted electronic mail messages. Different users will need different filters. It is unreasonable to expect each user to program his or her own rules, and it is infeasible to provide every user with a software engineer to keep the rules up-to-date. A machine learning system can learn which mail messages the user rejects and maintain the filtering rules automatically. Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis. Statistics focuses on understanding the phenomena that have generated the data, often with the goal of testing different hypotheses about those phenomena. Data mining seeks to find patterns in the data that are understandable by people. Psychological studies of human learning aspire to understand the mechanisms underlying the various learning behaviors exhibited by people (concept learning, skill acquisition, strategy change, etc.).

13,246 citations

Journal ArticleDOI

1,011 citations

Journal ArticleDOI
TL;DR: In this article, a review of existing deep learning-based weed detection and classification techniques is presented, which includes data acquisition, dataset preparation, DL techniques employed for detection, location and classification of weeds in crops, and evaluation metrics approaches.

128 citations

01 Dec 2007
TL;DR: In this paper, the authors propose an alternative reference frame for climate anomalies, the modulated annual cycle (MAC) that allows the annual cycle to change from year to year, for defining anomalies.
Abstract: In climate science, an anomaly is the deviation of a quantity from its annual cycle. There are many ways to define annual cycle. Traditionally, this annual cycle is taken to be an exact repeat of itself year after year. This stationary annual cycle may not reflect well the intrinsic nonlinearity of the climate system, especially under external forcing. In this paper, we re-examine the reference frame for anomalies by re-examining the annual cycle. We propose an alternative reference frame for climate anomalies, the modulated annual cycle (MAC) that allows the annual cycle to change from year to year, for defining anomalies. In order for this alternative reference frame to be useful, we need to be able to define the instantaneous annual cycle: we therefore also introduce a new method to extract the MAC from climatic data. In the presence of a MAC, modulated in both amplitude and frequency, we can then define an alternative version of an anomaly, this time with respect to the instantaneous MAC rather than a permanent and unchanging AC. Based on this alternative definition of anomalies, we re-examine some familiar physical processes: in particular SST re-emergence and ENSO phase locking to the annual cycle. We find that the re-emergence mechanism may be alternatively interpreted as an explanation of the change of the annual cycle instead of an explanation of the interannual to interdecadal persistence of SST anomalies. We also find that the ENSO phase locking can largely be attributed to the residual annual cycle (the difference of the MAC and the corresponding traditional annual cycle) contained in the traditional anomaly, and, therefore, can be alternatively interpreted as a part of the annual cycle phase locked to the annual cycle itself. In addition to the examples of reinterpretation of physics of well known climate phenomena, we also present an example of the implications of using a MAC against which to define anomalies. 
We show that using MAC as a reference framework for anomaly can bypass the difficulty brought by concepts such as “decadal variability of summer (or winter) climate” for understanding the low-frequency variability of the climate system. The concept of an amplitude and frequency modulated annual cycle, a method to extract it, and its implications for the interpretation of physical processes, all may contribute potentially to a more consistent and fruitful way of examining past and future climate variability and change.
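For contrast with the MAC, the traditional definition that the abstract starts from (an anomaly as the deviation from an annual cycle that repeats exactly every year) can be sketched in Python. This is a minimal sketch of the stationary case only, not of the MAC extraction method, which the abstract does not specify; the function names and the assumption of monthly data starting in January are hypothetical.

```python
def monthly_climatology(series):
    """Stationary annual cycle: mean of each calendar month over all years.

    series: monthly values starting in January, covering whole years
            (an assumption made for this sketch).
    """
    if len(series) == 0 or len(series) % 12 != 0:
        raise ValueError("series must cover a whole number of years")
    years = len(series) // 12
    return [
        sum(series[y * 12 + m] for y in range(years)) / years
        for m in range(12)
    ]


def anomalies(series):
    """Deviation of each value from the stationary annual cycle."""
    cycle = monthly_climatology(series)
    return [value - cycle[i % 12] for i, value in enumerate(series)]


# Made-up two-year series: the second year is uniformly 2 units warmer.
year_one = [float(m) for m in range(12)]
year_two = [v + 2.0 for v in year_one]
print(anomalies(year_one + year_two))
```

Because the cycle here is fixed, any year-to-year change in the seasonal cycle itself ends up in the anomalies, which is the limitation the MAC reference frame is proposed to address.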

113 citations

Journal ArticleDOI
TL;DR: This article proposes a methodology for classifying the genre of television programmes in which extracted features are used to train a parallel neural network system able to distinguish between seven video genres; the method reaches a classification accuracy rate of 95%.
Abstract: Improvements in digital technology have made possible the production and distribution of huge quantities of digital multimedia data. Tools for high-level multimedia documentation are becoming indispensable to efficiently access and retrieve desired content from such data. In this context, automatic genre classification provides a simple and effective solution to describe multimedia contents in a structured and well understandable way. We propose in this article a methodology for classifying the genre of television programmes. Features are extracted from four informative sources, which include visual-perceptual information (colour, texture and motion), structural information (shot length, shot distribution, shot rhythm, shot clusters duration and saturation), cognitive information (face properties, such as number, positions and dimensions) and aural information (transcribed text, sound characteristics). These features are used for training a parallel neural network system able to distinguish between seven video genres: football, cartoons, music, weather forecast, newscast, talk show and commercials. Experiments conducted on more than 100 h of audiovisual material confirm the effectiveness of the proposed method, which reaches a classification accuracy rate of 95%.

60 citations