
Showing papers by "International Institute of Information Technology, Hyderabad published in 2011"


Proceedings ArticleDOI
06 Nov 2011
TL;DR: This approach proposes to use the template-based model to detect a distinctive part for the class, followed by detecting the rest of the object via segmentation on image specific information learnt from that part, and achieves accuracy comparable to the state-of-the-art on the PASCAL VOC competition, which includes other models such as bag-of-words.
Abstract: Template-based object detectors such as the deformable parts model of Felzenszwalb et al. [11] achieve state-of-the-art performance for a variety of object categories, but are still outperformed by simpler bag-of-words models for highly flexible objects such as cats and dogs. In these cases we propose to use the template-based model to detect a distinctive part for the class, followed by detecting the rest of the object via segmentation on image specific information learnt from that part. This approach is motivated by two observations: (i) many object classes contain distinctive parts that can be detected very reliably by template-based detectors, whilst the entire object cannot; (ii) many classes (e.g. animals) have fairly homogeneous coloring and texture that can be used to segment the object once a sample is provided in an image. We show quantitatively that our method substantially outperforms whole-body template-based detectors for these highly deformable object categories, and indeed achieves accuracy comparable to the state-of-the-art on the PASCAL VOC competition, which includes other models such as bag-of-words.
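A minimal sketch of the detect-then-segment idea, using OpenCV's GrabCut as a stand-in for the paper's own segmentation stage; the function name, box format, and iteration count are our choices, not the paper's:

```python
import cv2
import numpy as np

def segment_from_part(image, part_box):
    """Given a reliably detected part (e.g. a cat's head box from a
    template-based detector), seed a colour/texture segmentation so the
    foreground grows from the part into the whole body."""
    x, y, w, h = part_box
    mask = np.full(image.shape[:2], cv2.GC_PR_BGD, dtype=np.uint8)
    mask[y:y + h, x:x + w] = cv2.GC_FGD   # the detected part is sure foreground
    bgd = np.zeros((1, 65), np.float64)   # GrabCut's internal GMM buffers
    fgd = np.zeros((1, 65), np.float64)
    cv2.grabCut(image, mask, None, bgd, fgd, 5, cv2.GC_INIT_WITH_MASK)
    return np.isin(mask, (cv2.GC_FGD, cv2.GC_PR_FGD))

# usage: body_mask = segment_from_part(cv2.imread('cat.jpg'), head_box)
```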

148 citations


Proceedings Article
24 Jun 2011
TL;DR: An automatic question generation system that can generate gap-fill questions for content in a document by first blanking keys from the sentences and then determining the distractors for these keys.
Abstract: In this paper, we present an automatic question generation system that can generate gap-fill questions for content in a document. Gap-fill questions are fill-in-the-blank questions with multiple choices (one correct answer and three distractors) provided. The system finds the informative sentences from the document and generates gap-fill questions from them by first blanking keys from the sentences and then determining the distractors for these keys. Syntactic and lexical features are used in this process without relying on any external resource apart from the information in the document. We evaluated our system on two chapters of a standard biology textbook and presented the results.
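A minimal sketch of the pipeline described above (informative-sentence selection, key blanking, distractor choice). The scoring heuristics are illustrative stand-ins, not the paper's actual syntactic and lexical features:

```python
import re
from collections import Counter

def informative_sentences(document, top_n=5):
    """Rank sentences by how many of the document's frequent terms they contain."""
    sentences = re.split(r'(?<=[.!?])\s+', document)
    freq = Counter(re.findall(r'[a-z]+', document.lower()))
    def score(sent):
        return sum(freq[w] for w in re.findall(r'[a-z]+', sent.lower()))
    return sorted(sentences, key=score, reverse=True)[:top_n]

def make_gap_fill(sentence, key, candidates, n_distractors=3):
    """Blank the key and pick distractors that look most like it."""
    stem = sentence.replace(key, '_____')
    similar = sorted(candidates - {key},
                     key=lambda w: len(set(w) & set(key)), reverse=True)
    return stem, [key] + similar[:n_distractors]

doc = "Mitochondria produce ATP. ATP powers cellular processes. Ribosomes build proteins."
sent = informative_sentences(doc, top_n=1)[0]
print(make_gap_fill(sent, "ATP", {"ADP", "GTP", "DNA", "ATP"}))
```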

113 citations


Proceedings ArticleDOI
06 Nov 2011
TL;DR: This paper presents a realtime, incremental multibody visual SLAM system that allows choosing between full 3D reconstruction or simply tracking of the moving objects, and enables building of a unified dynamic 3D map of scenes involving multiple moving objects.
Abstract: This paper presents a realtime, incremental multibody visual SLAM system that allows choosing between full 3D reconstruction and simple tracking of the moving objects. Motion reconstruction of dynamic points or objects from a monocular camera is considered very hard due to well known problems of observability. We attempt to solve the problem with Bearing-only Tracking (BOT), accomplished through a particle filter that integrates multiple cues from the reconstruction pipeline to avoid observability issues. With the help of these cues, many real world scenarios which are considered unobservable with a monocular camera are solved to reasonable accuracy. This enables building of a unified dynamic 3D map of scenes involving multiple moving objects. Tracking and reconstruction are preceded by motion segmentation and detection, which makes use of efficient geometric constraints to avoid difficult degenerate motions, where objects move in the epipolar plane. Results reported on multiple challenging real world image sequences verify the efficacy of the proposed framework.
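The BOT stage can be illustrated with a toy 2D bearing-only particle filter; the constant-velocity motion model, noise levels, and resampling rule below are our own simplifications of what such a tracker does:

```python
import numpy as np

def bot_particle_filter_step(particles, weights, bearing_meas, cam_pos,
                             sigma_bearing=0.05, process_noise=0.1):
    """One predict/update/resample step of a 2D bearing-only tracker.
    particles: (N, 4) array of [x, y, vx, vy] hypotheses for the object."""
    # Predict: constant-velocity motion plus process noise.
    particles[:, 0:2] += particles[:, 2:4]
    particles += np.random.normal(0, process_noise, particles.shape)
    # Update: weight by agreement with the bearing measured from the camera.
    predicted = np.arctan2(particles[:, 1] - cam_pos[1],
                           particles[:, 0] - cam_pos[0])
    err = np.angle(np.exp(1j * (predicted - bearing_meas)))  # wrap to [-pi, pi]
    weights *= np.exp(-0.5 * (err / sigma_bearing) ** 2)
    weights /= weights.sum()
    # Resample when the effective sample size collapses.
    if 1.0 / np.sum(weights ** 2) < len(weights) / 2:
        idx = np.random.choice(len(weights), len(weights), p=weights)
        particles = particles[idx]
        weights = np.full(len(weights), 1.0 / len(weights))
    return particles, weights
```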

94 citations


Journal ArticleDOI
08 Aug 2011-Small
TL;DR: A water-based, non-'seed-mediated', straightforward method for the synthesis of gold nanoparticles with well-developed surface spikes is reported here, and the yield of multispiked gold particles is very high.
Abstract: Multispiked gold nanoparticles are required in large quantities for many fundamental studies and applications like (bio)sensing, but their preparation in high yield by the bottom-up chemical synthetic method is challenging. A water-based, non-‘seed-mediated’, straightforward method for the synthesis of gold nanoparticles with well-developed surface spikes is reported here. The yield of multispiked gold particles is very high (>90%). The method allows the tuning of the number and size of the spikes and the overall size of the particles, and hence the localized surface plasmon resonances of the particles over the broad spectral range in the visible and near-infrared. A mechanism for the evolution of twinned, sharp-tipped surface protrusions has been proposed based on systematic spectrophotometric and transmission electron microscopic studies, which were employed to elucidate the morphological features, structure, chemical composition, and optical properties of the multispiked gold nanoparticles.

87 citations


Proceedings ArticleDOI
21 Mar 2011
TL;DR: An efficient CFP-growth algorithm is proposed, with new pruning techniques to reduce the search space; experimental results show that the proposed pruning techniques are effective.
Abstract: Frequent patterns are an important class of regularities that exist in a transaction database. Certain frequent patterns with a low minimum support (minsup) value can provide useful information in many real-world applications. However, extraction of these frequent patterns with single-minsup-based frequent pattern mining algorithms such as Apriori and FP-growth leads to the "rare item problem." That is, at a high minsup value, the frequent patterns with low minsup are missed, and at a low minsup value, the number of frequent patterns explodes. In the literature, the "multiple minsups framework" was proposed to discover these frequent patterns, and mining techniques such as the Multiple Support Apriori and Conditional Frequent Pattern-growth (CFP-growth) algorithms have been proposed. As the frequent patterns mined with this framework do not satisfy the downward closure property, the algorithms follow different types of pruning techniques to reduce the search space. In this paper, we propose an efficient CFP-growth algorithm with new pruning techniques. Experimental results show that the proposed pruning techniques are effective.
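In the multiple-minsups framework each item carries its own minimum item support (MIS), and a pattern's threshold is conventionally the lowest MIS among its items, which is what lets rare items form patterns without flooding the output. A toy check of that condition, not the CFP-growth pruning itself:

```python
def pattern_support(pattern, transactions):
    return sum(1 for t in transactions if pattern <= t)

def is_frequent(pattern, transactions, mis):
    """A pattern is frequent under multiple minsups if its support meets
    the lowest MIS among its items (so rare items get a lower bar)."""
    threshold = min(mis[item] for item in pattern)
    return pattern_support(pattern, transactions) >= threshold

transactions = [frozenset(t) for t in
                [{'bread', 'milk'}, {'bread', 'caviar'}, {'milk'}]]
mis = {'bread': 2, 'milk': 2, 'caviar': 1}   # the rare item gets a low MIS
print(is_frequent(frozenset({'bread', 'caviar'}), transactions, mis))  # True
```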

68 citations


Proceedings Article
24 Jun 2011
TL;DR: A system that automatically generates questions from natural language text using discourse connectives; it looks at the problem beyond the sentence level and divides the QG task into content selection and question formation.
Abstract: In this paper, we present a system that automatically generates questions from natural language text using discourse connectives. We explore the usefulness of discourse connectives for Question Generation (QG), which looks at the problem beyond the sentence level. Our work divides the QG task into content selection and question formation. Content selection consists of finding the relevant part of the text to frame a question from, while question formation involves sense disambiguation of the discourse connectives, identification of the question type, and applying syntactic transformations on the content. The system is evaluated manually for syntactic and semantic correctness.
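A toy sketch of the connective-to-question-type step; the mapping table and the naive clause split are illustrative, and the paper additionally disambiguates connective senses and applies proper syntactic transformations:

```python
# Map a discourse connective to a question type, then question the
# clause before the connective (a crude form of content selection).
CONNECTIVE_QTYPE = {'because': 'Why', 'since': 'Why',
                    'although': 'Despite what', 'when': 'When'}

def generate_question(sentence):
    for conn, qtype in CONNECTIVE_QTYPE.items():
        left, sep, right = sentence.partition(f' {conn} ')
        if sep:
            return f"{qtype} {left[0].lower() + left[1:].rstrip('.')}?"
    return None

print(generate_question("The match was cancelled because it rained."))
# -> 'Why the match was cancelled?'  (syntactic transformation omitted)
```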

66 citations


Journal ArticleDOI
TL;DR: In this article, the authors suggest a prepaid energy meter behaving like a prepaid mobile phone, where the prepaid card communicates with the power utility using mobile communication infrastructure. The proposed prepaid meter is implemented in a software model, and Matlab has been used for simulation.
Abstract: Energy meters in India have dominantly been electromechanical in nature but are gradually being replaced by more sophisticated and accurate digital and electronic meters. A high percentage of electricity revenue is lost to power theft, incorrect meter reading and billing, and reluctance of consumers towards paying electricity bills on time. A considerable amount of these revenue losses can be reduced by using prepaid energy meters. A prepaid energy meter enables power utilities to collect energy bills from consumers prior to the usage of power, by delivering only as much as has been paid for. This paper suggests a prepaid energy meter behaving like a prepaid mobile phone. The meter contains a prepaid card analogous to a mobile SIM card. The prepaid card communicates with the power utility using mobile communication infrastructure. Once the prepaid card is out of balance, the consumer load is disconnected from the utility supply by the contactor. The power utility can recharge the prepaid card remotely through mobile communication based on customer requests. Prior billing is bound to do away with the problems of unpaid bills and human error in meter readings, thereby ensuring justified revenue for the utility. The proposed prepaid meter is implemented in a software model, and Matlab has been used for simulation.
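The metering logic reduces to a balance that is debited per unit of energy and a contactor that opens when the balance is exhausted. A toy model of that loop, with the mobile-communication plumbing stubbed out and an invented tariff:

```python
class PrepaidMeter:
    """Toy model of the metering logic: energy drawn debits the prepaid
    balance; an empty balance opens the contactor. Communication with
    the utility is stubbed out."""
    def __init__(self, balance_rupees, tariff_per_kwh=5.0):
        self.balance = balance_rupees
        self.tariff = tariff_per_kwh
        self.connected = True

    def consume(self, kwh):
        if not self.connected:
            return
        self.balance -= kwh * self.tariff
        if self.balance <= 0:
            self.balance = 0.0
            self.connected = False   # contactor disconnects the load

    def recharge(self, rupees):      # remote top-up via the utility
        self.balance += rupees
        self.connected = True

meter = PrepaidMeter(balance_rupees=20)
meter.consume(kwh=5)     # 25 rupees' worth -> balance exhausted
print(meter.connected)   # False until the card is recharged
meter.recharge(100)
print(meter.connected)   # True
```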

62 citations


Book ChapterDOI
24 May 2011
TL;DR: An efficient model based on the notion of "multiple constraints" is proposed; the periodic-frequent patterns discovered with this model satisfy the downward closure property and can therefore be discovered efficiently, without generating uninteresting patterns.
Abstract: Recently, the temporal occurrences of frequent patterns in a transactional database have been exploited as an interestingness criterion to discover a class of user-interest-based frequent patterns, called periodic-frequent patterns. Informally, a frequent pattern is said to be periodic-frequent if it occurs at regular intervals specified by the user throughout the database. The basic model of periodic-frequent patterns is based on the notion of "single constraints." The use of this model to mine periodic-frequent patterns containing both frequent and rare items leads to a dilemma called the "rare item problem." To confront the problem, an alternative model based on the notion of "multiple constraints" has been proposed in the literature. The periodic-frequent patterns discovered with this model do not satisfy the downward closure property. As a result, it is computationally expensive to mine periodic-frequent patterns with the model. Furthermore, it has been observed that this model still generates some uninteresting patterns as periodic-frequent patterns. With this motivation, we propose an efficient model based on the notion of "multiple constraints." The periodic-frequent patterns discovered with this model satisfy the downward closure property. Hence, periodic-frequent patterns can be efficiently discovered. A pattern-growth algorithm has also been discussed for the proposed model. Experimental results show that the proposed model is effective.
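Periodic-frequent mining is commonly formalized by requiring both a minimum support and a maximum periodicity, i.e. the largest gap between consecutive occurrences of the pattern. A toy check under that reading (the parameter names are ours):

```python
def is_periodic_frequent(pattern, transactions, min_sup, max_per):
    """A pattern is periodic-frequent if it is frequent and the largest
    gap between consecutive occurrences (including the database borders)
    stays within max_per. One common formalization of the basic model."""
    positions = [i for i, t in enumerate(transactions) if pattern <= t]
    if len(positions) < min_sup:
        return False
    gaps = [positions[0] + 1]
    gaps += [b - a for a, b in zip(positions, positions[1:])]
    gaps.append(len(transactions) - positions[-1])
    return max(gaps) <= max_per

db = [frozenset(t) for t in
      [{'a', 'b'}, {'a'}, {'a', 'b'}, {'c'}, {'a', 'b'}]]
print(is_periodic_frequent(frozenset({'a', 'b'}), db, min_sup=3, max_per=2))  # True
```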

55 citations


Proceedings ArticleDOI
13 Sep 2011
TL;DR: This work devises techniques to understand the nature of the sparse matrix and then choose appropriate data structures accordingly, and is able to improve the performance of the spmv kernel on an Nvidia Tesla GPU by a factor of up to 80% in some instances, and about 25% on average, compared to the best results of Bell and Garland on the standard dataset.
Abstract: Multiplying a sparse matrix with a vector (spmv for short) is a fundamental operation in many linear algebra kernels. Having an efficient spmv kernel on modern architectures such as the GPUs is therefore of principal interest. The computational challenges that spmv poses are significantly different compared to that of the dense linear algebra kernels. Recent work in this direction has focused on designing data structures to represent sparse matrices so as to improve the efficiency of spmv kernels. However, as the nature of sparseness differs across sparse matrices, there is no clear answer as to which data structure to use given a sparse matrix. In this work, we address this problem by devising techniques to understand the nature of the sparse matrix and then choose appropriate data structures accordingly. By using our technique, we are able to improve the performance of the spmv kernel on an Nvidia Tesla GPU (C1060) by a factor of up to 80% in some instances, and about 25% on average, compared to the best results of Bell and Garland [3] on the standard dataset (cf. Williams et al. SC'07) used in recent literature. We also use our spmv in the conjugate gradient method and show an average 20% improvement compared to using the HYB spmv of [3], on the dataset obtained from The University of Florida Sparse Matrix Collection [9].
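The core idea, choosing a sparse format from the structure of the matrix, can be sketched with simple row-length statistics; the thresholds below are illustrative, not the rules derived in the paper:

```python
import numpy as np

def choose_spmv_format(row_lengths):
    """Pick a sparse-matrix layout from row-length statistics."""
    mean, mx = np.mean(row_lengths), np.max(row_lengths)
    if mx <= 1.5 * mean:     # near-uniform rows pad cheaply and vectorize well
        return 'ELL'
    if mx > 10 * mean:       # a few very long rows ruin ELL padding
        return 'COO'         # (or HYB: ELL body plus COO for the tail)
    return 'CSR'

rows = np.random.poisson(lam=8, size=10_000)   # synthetic row lengths
print(choose_spmv_format(rows))
```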

52 citations


Proceedings ArticleDOI
25 Mar 2011
TL;DR: The experimental results show that the legal-term cosine similarity method performs better than the all-term cosine similarity method, and that the bibliographic coupling-based similarity method improves the performance over the co-citation approach.
Abstract: In this paper, we have made an effort to propose approaches to find similar legal judgements by extending the popular techniques used in information retrieval and search engines. Legal judgements are complex in nature and refer to other judgements. We have analyzed all-term, legal-term, co-citation and bibliographic coupling-based similarity methods to find similar judgements. The experimental results show that the legal-term cosine similarity method performs better than the all-term cosine similarity method. Also, the results show that the bibliographic coupling similarity method improves the performance over the co-citation approach.
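A hedged sketch of two of the analyzed measures: term-vector cosine (the same formula serves the all-term and legal-term variants, only the vocabulary changes) and bibliographic coupling over cited precedents. The citation identifiers are made up:

```python
import math
from collections import Counter

def cosine_similarity(terms_a, terms_b):
    """All-term / legal-term cosine: same formula, different term lists."""
    va, vb = Counter(terms_a), Counter(terms_b)
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * \
           math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0

def bibliographic_coupling(cites_a, cites_b):
    """Judgements are similar if they cite many of the same precedents."""
    return len(cites_a & cites_b) / math.sqrt(len(cites_a) * len(cites_b))

print(cosine_similarity(['appeal', 'bail', 'bail'], ['bail', 'appeal']))
print(bibliographic_coupling({'AIR1973SC', 'AIR1980SC'}, {'AIR1973SC'}))
```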

48 citations


Proceedings Article
01 Jan 2011
TL;DR: The authors' results evidence that the tunings in Carnatic and Hindustani music differ, the former tending to a just intonation system and the latter showing stronger equal-tempered influences.
Abstract: The issue of tuning in Indian classical music has been, historically, a matter of theoretical debate. In this paper, we study its contemporary practice in sung performances of Carnatic and Hindustani music following an empiric and quantitative approach. To do so, we select stable fundamental frequencies, estimated via a standard algorithm, and construct interval histograms from a pool of recordings. We then compare such histograms against the ones obtained for different music sources and against the theoretical values derived from 12-note just intonation and equal temperament. Our results evidence that the tunings in Carnatic and Hindustani music differ, the former tending to a just intonation system and the latter showing stronger equal-tempered influences. Carnatic music also presents signs of a more continuous distribution of pitches. Further subdivisions of the octave are partially investigated, finding no strong evidence of them.
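A sketch of the histogram construction: stable f0 values are folded into one octave above the tonic and binned in cents, and the resulting peaks can be compared against just-intonation and equal-tempered reference intervals. The bin resolution and example frequencies are our own:

```python
import numpy as np

def interval_histogram(f0_hz, tonic_hz, bins_per_octave=120):
    """Fold stable f0 estimates to one octave above the tonic and
    histogram them in cents; the paper pools many recordings this way."""
    cents = 1200 * np.log2(np.asarray(f0_hz) / tonic_hz) % 1200
    hist, edges = np.histogram(cents, bins=bins_per_octave, range=(0, 1200))
    return hist, edges

# Reference interval sets the pooled histograms are compared against.
equal_tempered = np.arange(12) * 100                 # 0, 100, ..., 1100 cents
just_intonation = 1200 * np.log2([1, 16/15, 9/8, 6/5, 5/4, 4/3,
                                  45/32, 3/2, 8/5, 5/3, 9/5, 15/8])

hist, _ = interval_histogram([261.6, 327.0, 392.4], tonic_hz=261.6)
print(hist.nonzero()[0] * 10)   # peak locations in cents: [0, 380, 700]
```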

Proceedings Article
01 Jan 2011
TL;DR: Pitch, duration and strength modification factors for emotion conversion are derived using the syllable-like units of initial, middle and final regions from an emotion speech database having different speakers, texts and emotions.
Abstract: This work uses instantaneous pitch and strength of excitation along with duration of syllable-like units as the parameters for emotion conversion. Instantaneous pitch and duration of the syllable-like units of the neutral speech are modified by the prosody modification of its linear prediction (LP) residual using the instants of significant excitation. The strength of excitation is modified by scaling the Hilbert envelope (HE) of the LP residual. The target emotion speech is then synthesized using the prosody and strength modified LP residual. The pitch, duration and strength modification factors for emotion conversion are derived using the syllable-like units of initial, middle and final regions from an emotion speech database having different speakers, texts and emotions. The effectiveness of the region wise modification of source and supra segmental features over the gross level modification is confirmed by the waveforms, spectrograms and subjective evaluations. Index Terms: Emotions, ZFF, strength of excitation, instantaneous pitch, duration
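A sketch of the strength-of-excitation step using scipy's Hilbert transform; the region-wise scale factors stand in for values that the paper derives from the emotion speech database:

```python
import numpy as np
from scipy.signal import hilbert

def scale_excitation_strength(lp_residual, factors):
    """Scale the strength of excitation by modifying the Hilbert envelope
    of the LP residual while keeping its fine structure (cosine phase)."""
    analytic = hilbert(lp_residual)
    envelope = np.abs(analytic)
    fine_structure = np.cos(np.unwrap(np.angle(analytic)))
    return (envelope * factors) * fine_structure

residual = np.random.randn(300)                 # stand-in for one syllable's residual
factors = np.concatenate([np.full(100, 1.4),    # initial region
                          np.full(100, 1.1),    # middle region
                          np.full(100, 1.3)])   # final region
modified = scale_excitation_strength(residual, factors)
```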

Proceedings ArticleDOI
03 Oct 2011
TL;DR: This paper provides new variations of the incremental growing neural gas algorithm that exploit, in an incremental way, knowledge from clusters about their current labeling along with cluster distance measure data; this leads to a significant gain in performance for all types of datasets, especially for the clustering of complex heterogeneous textual data.
Abstract: Neural clustering algorithms show high performance in the general context of the analysis of homogeneous textual datasets. This is especially true for the recent adaptive versions of these algorithms, like the incremental growing neural gas algorithm (IGNG) and the labeling-maximization-based incremental growing neural gas algorithm (IGNG-F). In this paper we highlight that there is a drastic decrease in the performance of these algorithms, as well as that of more classical algorithms, when a heterogeneous textual dataset is considered as input. Specific quality measures and cluster labeling techniques that are independent of the clustering method are used for precise performance evaluation. We provide new variations of the incremental growing neural gas algorithm that exploit, in an incremental way, knowledge from clusters about their current labeling along with cluster distance measure data. This solution leads to a significant gain in performance for all types of datasets, especially for the clustering of complex heterogeneous textual data.

01 Jan 2011
TL;DR: This work investigates the properties of a raaga and the natural process by which people identify the raaga, and surveys the past raaga recognition techniques correlating them with human techniques, in both north Indian (Hindustani) and south Indian (Carnatic) music systems.
Abstract: Raaga is the spine of Indian classical music. It is the single most crucial element of the melodic framework on which the music of the subcontinent thrives. Naturally, automatic raaga recognition is an important step in computational musicology as far as Indian music is considered. It has several applications like indexing Indian music, automatic note transcription, comparing, classifying and recommending tunes, and teaching to mention a few. Simply put, it is the first logical step in the process of creating computational methods for Indian classical music. In this work, we investigate the properties of a raaga and the natural process by which people identify the raaga. We survey the past raaga recognition techniques correlating them with human techniques, in both north Indian (Hindustani) and south Indian (Carnatic) music systems. We identify the main drawbacks and propose minor, but multiple improvements to the state-of-the-art raaga recognition technique.

Journal ArticleDOI
TL;DR: A set of axioms of concordance in preference orderings and a new class of concordance measures are proposed; the new measures outperform classic measures like Kendall's τ and W and Spearman's ρ in sensitivity and apply to large sets of orderings instead of just to pairs of orderings.
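For reference, the classic pairwise measures the proposed class is compared against can be computed with scipy; a concordance measure over a whole set of orderings (like Kendall's W) generalizes these beyond pairs:

```python
from scipy.stats import kendalltau, spearmanr

ranking_a = [1, 2, 3, 4, 5]     # two judges ranking five alternatives
ranking_b = [2, 1, 3, 5, 4]
tau, _ = kendalltau(ranking_a, ranking_b)
rho, _ = spearmanr(ranking_a, ranking_b)
print(tau, rho)                 # pairwise concordance only
```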

Journal ArticleDOI
01 Apr 2011
TL;DR: In this paper, the authors propose the Doubly Cognitive WSN, which works by progressively allocating the sensing resources only to the most promising areas of the spectrum and is based on pattern analysis and learning.
Abstract: Scarcity of spectrum is increasing not only in cellular communication but also in wireless sensor networks. Adding cognition to the existing wireless sensor network (WSN) infrastructure has helped. As sensor nodes in a WSN are limited by constraints such as power, efforts are required to increase the lifetime and other performance measures of the network. In this article, the authors propose the Doubly Cognitive WSN, which works by progressively allocating the sensing resources only to the most promising areas of the spectrum and is based on pattern analysis and learning. As the load on the sensing resources is reduced significantly, this approach saves the energy of the nodes and reduces the sensing time dramatically. The proposed method can be enhanced by periodic pattern analysis to review the sensing strategy. Finally, the ongoing research work and contributions on cognitive wireless sensor networks at the Communication Research Centre, IIIT-H, are discussed.
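The progressive-allocation idea can be sketched as spending a limited sensing budget on the channels that learned patterns mark as most promising; the greedy rule and simulated sensing below are illustrative stand-ins:

```python
import random

def doubly_cognitive_scan(channel_scores, budget):
    """Progressively spend the sensing budget on the most promising
    channels (highest learned idle probability) instead of sweeping
    the whole band. Scores would come from pattern analysis of past
    scans; this greedy allocation is our own simplification."""
    ranked = sorted(channel_scores, key=channel_scores.get, reverse=True)
    sensed = {}
    for ch in ranked[:budget]:
        sensed[ch] = random.random() < channel_scores[ch]   # simulated sensing
    return sensed

scores = {'ch1': 0.9, 'ch2': 0.2, 'ch3': 0.7, 'ch4': 0.1}
print(doubly_cognitive_scan(scores, budget=2))   # senses only ch1 and ch3
```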

Journal ArticleDOI
TL;DR: This work proposes a simple and statistical methodology called review summary (RSUMM) and uses it in combination with well-known feature selection methods to extract subjectivity; the experimental results prove the effectiveness of the proposed methodology.
Abstract: With the growth of social media, document sentiment classification has become an active area of research in this decade. It can be viewed as a special case of topical classification applied only to subjective portions of a document (sources of sentiment). Hence, the key task in document sentiment classification is extracting subjectivity. Existing approaches to extract subjectivity rely heavily on linguistic resources such as sentiment lexicons and complex supervised patterns based on part-of-speech (POS) information. This makes the task of subjective feature extraction complex and resource dependent. In this work, we try to minimize the dependency on linguistic resources in sentiment classification. We propose a simple and statistical methodology called review summary (RSUMM) and use it in combination with well-known feature selection methods to extract subjectivity. Our experimental results on a movie review dataset prove the effectiveness of the proposed methodology.

Proceedings ArticleDOI
28 Jun 2011
TL;DR: An efficient method to rank the research papers from various fields of research published in various conferences over the years using a modified version of the PageRank algorithm, which takes into account the time factor to reduce the bias against the recent papers.
Abstract: In this paper we propose an efficient method to rank research papers from various fields of research published in various conferences over the years. This ranking method is based on the citation network. The importance of a research paper is captured well by peer vote, which in this case is the research paper being cited in other research papers. Using a modified version of the PageRank algorithm, we rank the research papers, assigning each of them an authoritative score. Using the scores of the research papers calculated by the above-mentioned method, we formulate scores for conferences and authors and rank them as well. We have introduced a new metric in the algorithm which takes into account the time factor in ranking the research papers, to reduce the bias against recent papers, which get less time to be studied and consequently cited by researchers as compared to older papers. Often a researcher is more interested in finding the top conferences in a particular year rather than the overall conference ranking. Considering the year of publication of the papers, in addition to the paper scores we also calculate the year-wise score of each conference by a slight modification of the above-mentioned algorithm.
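A sketch of a time-aware PageRank over the citation graph; the exponential recency boost folded into the teleport vector is one plausible way to realize the described metric, not necessarily the paper's exact formula:

```python
import numpy as np

def time_weighted_pagerank(adj, years, current_year, d=0.85, decay=0.05,
                           iters=100):
    """PageRank over a citation graph with a recency boost so new papers
    are not penalized for short citation histories. adj[i][j] = 1 when
    paper i cites paper j."""
    n = len(adj)
    A = np.asarray(adj, dtype=float)
    out = A.sum(axis=1, keepdims=True)
    A = np.divide(A, out, out=np.zeros_like(A), where=out > 0)  # row-stochastic
    boost = np.exp(-decay * (current_year - np.asarray(years)))
    boost /= boost.sum()                 # recency-weighted teleport vector
    r = np.full(n, 1.0 / n)
    for _ in range(iters):
        r = (1 - d) * boost + d * (A.T @ r)
    return r

adj = [[0, 1, 1], [0, 0, 1], [0, 0, 0]]  # paper 0 cites papers 1 and 2, ...
print(time_weighted_pagerank(adj, years=[2010, 2008, 2005], current_year=2011))
```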

Proceedings ArticleDOI
17 Sep 2011
TL;DR: This paper attempts to adapt a state-of-the-art English POS tagger, which is trained on the Wall-Street-Journal corpus, to noisy text, and demonstrates the working of the proposed models on a Short Message Service (SMS) dataset, achieving a significant improvement over the baseline accuracy.
Abstract: With the increase in the number of people communicating through the internet, there has been a steady increase in the amount of text available online. Most such text is different from the standard language, as people try to use various kinds of short forms for words to save time and effort. We call that noisy text. Part-Of-Speech (POS) tagging has reached high levels of accuracy, enabling the use of automatic POS tags in various language processing tasks; however, tagging performance on noisy text degrades very fast. This paper is an attempt to adapt a state-of-the-art English POS tagger, which is trained on the Wall-Street-Journal (WSJ) corpus, to noisy text. We classify the noise in text into different types and evaluate the tagger with respect to each type of noise. The problem of tagging noisy text is attacked in two ways: (a) trying to overcome noise as a post-processing step to the tagging; (b) cleaning the noise and then doing the tagging. We propose techniques to solve the problem in both ways and critically compare them based on error analysis. We demonstrate the working of the proposed models on a Short Message Service (SMS) dataset, achieving a significant improvement over the baseline accuracy of tagging noisy words by a state-of-the-art English POS tagger.
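A toy version of the "clean then tag" route: noisy tokens are rewritten into standard English before the sentence is handed to a WSJ-trained tagger. The expansion lexicon and rules here are tiny illustrative stand-ins:

```python
import re

# Tiny expansion lexicon; a real system would learn or curate a much
# larger one and resolve ambiguity ('2' -> 'to'/'two') in context.
EXPANSIONS = {'u': 'you', 'r': 'are', 'gr8': 'great', '2moro': 'tomorrow',
              'pls': 'please', 'msg': 'message'}

def normalize_sms(text):
    """Rewrite noisy tokens into standard English before POS tagging."""
    cleaned = []
    for tok in text.lower().split():
        tok = re.sub(r'(.)\1{2,}', r'\1\1', tok)   # soooo -> soo (squeeze repeats)
        cleaned.append(EXPANSIONS.get(tok, tok))
    return ' '.join(cleaned)

print(normalize_sms("pls msg me 2moro u r gr8"))
# -> 'please message me tomorrow you are great'
```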

23 Jun 2011
TL;DR: A novel unsupervised approach to the problem of multi-document summarization of scientific articles, in which the document collection is a list of papers cited together within the same source article, otherwise known as a co-citation, is presented.
Abstract: We present a novel unsupervised approach to the problem of multi-document summarization of scientific articles, in which the document collection is a list of papers cited together within the same source article, otherwise known as a co-citation. At the heart of the approach is a topic based clustering of fragments extracted from each co-cited article and relevance ranking using a query generated from the context surrounding the co-cited list of papers. This analysis enables the generation of an overview of common themes from the co-cited papers that relate to the context in which the co-citation was found. We present a system called SciSumm that embodies this approach and apply it to the 2008 ACL Anthology. We evaluate this summarization system for relevant content selection using gold standard summaries prepared using principle-based guidelines. Evaluation with gold standard summaries demonstrates that our system performs better in content selection than an existing summarization system (MEAD). We present a detailed summary of our findings and discuss possible directions for future research.
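The relevance-ranking stage can be sketched with TF-IDF vectors and cosine similarity against a query built from the citation context; the clustering stage is omitted, and this stand-in is not SciSumm's actual machinery:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def rank_fragments(fragments, citation_context, top_k=3):
    """Rank fragments extracted from co-cited papers by relevance to the
    text surrounding the co-citation."""
    vec = TfidfVectorizer(stop_words='english')
    X = vec.fit_transform(fragments + [citation_context])
    sims = cosine_similarity(X[-1], X[:-1]).ravel()
    return [fragments[i] for i in sims.argsort()[::-1][:top_k]]

frags = ["We train a CRF for named entity recognition.",
         "Our parser uses a lexicalized PCFG.",
         "NER features include gazetteers and capitalization."]
print(rank_fragments(frags, "named entity recognition approaches", top_k=2))
```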

Proceedings Article
01 Nov 2011
TL;DR: A novel supervised, domain-independent model for product attribute extraction from user reviews is proposed for user-generated content, where conventional grammar-dependent tools like part-of-speech taggers, named entity recognizers, and parsers do not perform at expected levels.
Abstract: The world of E-commerce is expanding, posing a large arena of products, their descriptions, and customer and professional reviews that are pertinent to them. Most of the product attribute extraction techniques in the literature work on structured descriptions using several text analysis tools. However, attributes in these descriptions are limited compared to those in customer reviews of a product, where users discuss deeper and more specific attributes. In this paper, we propose a novel supervised, domain-independent model for product attribute extraction from user reviews. The user-generated content contains unstructured and semi-structured text, where conventional grammar-dependent tools like part-of-speech taggers, named entity recognizers, and parsers do not perform at expected levels. We used Wikipedia and the Web to identify product attributes from customer reviews and achieved an F1 score of 0.73.

Proceedings ArticleDOI
18 Dec 2011
TL;DR: This work uses a new model of multicore computing where the computation is performed simultaneously on a control device, such as a CPU, and an accelerator, such as a GPU, to address the issues related to the design of hybrid solutions.
Abstract: The advent of multicore and many-core architectures saw them being deployed to speed up computations across several disciplines and application areas. Prominent examples include semi-numerical algorithms such as sorting, graph algorithms, image processing, scientific computations, and the like. In particular, using GPUs for general purpose computations has attracted a lot of attention given that GPUs can deliver more than one TFLOP of computing power at very low prices. In this work, we use a new model of multicore computing called hybrid multicore computing, where the computation is performed simultaneously on a control device, such as a CPU, and an accelerator, such as a GPU. To this end, we use two case studies to explore the algorithmic and analytical issues in hybrid multicore computing. Our case studies involve two different ways of designing hybrid multicore algorithms. The main contribution of this paper is to address the issues related to the design of hybrid solutions. We show that our hybrid algorithm for list ranking is faster by 50% compared to the best known implementation [Z. Wei, J. JaJa; IPDPS 2010]. Similarly, our hybrid algorithm for graph connected components is faster by 25% compared to the best known GPU implementation [26].
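At the heart of a hybrid algorithm is the decision of how much work each device gets; a simple throughput-proportional static split, with made-up calibration numbers, illustrates the idea:

```python
def split_work(n_items, cpu_rate, gpu_rate):
    """Static split for a hybrid algorithm: give each device a share
    proportional to its measured throughput so both finish together.
    Rates would come from a calibration run; the numbers are invented."""
    gpu_share = gpu_rate / (cpu_rate + gpu_rate)
    n_gpu = round(n_items * gpu_share)
    return n_items - n_gpu, n_gpu   # (CPU items, GPU items)

print(split_work(1_000_000, cpu_rate=2.0, gpu_rate=14.0))  # -> (125000, 875000)
```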

Proceedings ArticleDOI
17 Sep 2011
TL;DR: The project is an attempt to implement an integrated platform for OCR of different Indian languages; the OCR is currently being enhanced to handle space and time constraints, achieve higher recognition accuracies, and add new functionalities.
Abstract: This paper presents an integration and testing scheme for managing a large multilingual OCR project. The project is an attempt to implement an integrated platform for OCR of different Indian languages. Software engineering, workflow management and testing processes are discussed in this paper. The OCR has now been experimentally deployed for some specific applications and is currently being enhanced for handling the space and time constraints, achieving higher recognition accuracies and adding new functionalities.

Journal ArticleDOI
TL;DR: This paper proposes a framework for an agent simulation environment built on a Hadoop cloud, and shows how agents are represented, how they perform their computation and communication, and how they are mapped to datanodes.

Proceedings ArticleDOI
28 Mar 2011
TL;DR: This paper presents an Associative Classification (AC) approach to LAM for tail campaigns, where pairs of features are used to derive rules to build a Rule-based Associative Classifier, with the rules being sorted by frequency-weighted log-likelihood ratio (F-LLR).
Abstract: Online advertising offers significantly finer granularity, which has been leveraged in state-of-the-art targeting methods, like Behavioral Targeting (BT). Such methods have been further complemented by recent work in Look-alike Modeling (LAM), which helps in creating models that are customized to each advertiser's requirements and each campaign's characteristics, and which show ads to users who are most likely to convert on them, not just click them. In Look-alike Modeling, given data about converters and non-converters obtained from advertisers, we would like to train models automatically for each ad campaign. Such custom models would help target more users who are similar to the set of converters the advertiser provides. The advertisers get more freedom to define their preferred sets of users which should be used as a basis to build custom targeting models. In behavioral data, the number of conversions (positive class) per campaign is very small (conversions per impression for the advertisers in our data set are much less than 10⁻⁴), giving rise to a highly skewed training dataset, which has most records pertaining to the negative class. Campaigns with very few conversions are called tail campaigns, and those with many conversions are called head campaigns. Creation of look-alike models for tail campaigns is very challenging and tricky using popular classifiers like Linear SVM and GBDT, because of the very small number of positive class examples such campaigns contain. In this paper, we present an Associative Classification (AC) approach to LAM for tail campaigns. Pairs of features are used to derive rules to build a rule-based associative classifier, with the rules being sorted by frequency-weighted log-likelihood ratio (F-LLR). The top k rules, sorted by F-LLR, are then applied to any test record to score it. Individual features can also form rules by themselves, though the number of such rules in the top k rules and the whole rule-set is very small. Our algorithm is based on Hadoop, and is thus very efficient in terms of speed.
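A hedged reading of the rule scoring: each (feature pair → converter) rule gets a log-likelihood ratio weighted by how often the pair fires, and a test record is scored by the top-k rules it satisfies. The exact F-LLR definition in the paper may differ from this sketch:

```python
import math

def f_llr(pos_with, pos_total, neg_with, neg_total):
    """Frequency-weighted log-likelihood ratio of a rule: the LLR of
    conversion given the feature (pair), weighted by how often the
    feature fires. One plausible reading, not the paper's exact formula."""
    eps = 1e-9
    p_pos = (pos_with + eps) / (pos_total + eps)
    p_neg = (neg_with + eps) / (neg_total + eps)
    return (pos_with + neg_with) * math.log(p_pos / p_neg)

def score_record(record_features, rules, k=50):
    """Apply the top-k rules whose features the test record contains."""
    top = sorted(rules, key=lambda r: r['f_llr'], reverse=True)[:k]
    return sum(r['f_llr'] for r in top if r['features'] <= record_features)

rules = [{'features': frozenset({'sports', 'auto'}),
          'f_llr': f_llr(pos_with=40, pos_total=100,
                         neg_with=200, neg_total=100_000)}]
print(score_record(frozenset({'sports', 'auto', 'news'}), rules))
```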

Proceedings ArticleDOI
28 Mar 2011
TL;DR: These results support the hypothesis that Wikipedia can be used to evaluate the parallel coefficient between sentences, which can then be used to build bilingual dictionaries, and support the proposed classification-based approach.
Abstract: This paper details a novel classification based approach to identify parallel sentences between two languages in a language independent way. We substitute the required language specific resources with richly structured multilingual content, Wikipedia. Our approach is particularly useful for extracting parallel sentences for under-resourced languages like most Indian and African languages, where resources are not readily available with the necessary accuracies. We extract various statistics based on the cross lingual links present in Wikipedia and use them to generate feature vectors for each sentence pair. Binary classification of each pair of sentences into parallel or non-parallel has been done using these feature vectors. We achieved a precision of up to 78%, which is encouraging when compared to other state-of-the-art approaches. These results support our hypothesis of using Wikipedia to evaluate the parallel coefficient between sentences, which can be used to build bilingual dictionaries.
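A minimal sketch of the classification setup: each sentence pair becomes a small feature vector derived from Wikipedia cross-lingual links, fed to a binary classifier. Logistic regression is a stand-in, and the feature set and toy data are ours:

```python
from sklearn.linear_model import LogisticRegression

def sentence_pair_features(src_tokens, tgt_tokens, crosslingual_links):
    """Feature vector for one sentence pair. crosslingual_links maps a
    source-language Wikipedia title to its target-language title; the
    features are illustrative, not the paper's exact set."""
    linked = sum(1 for w in src_tokens
                 if crosslingual_links.get(w) in tgt_tokens)
    return [linked / max(len(src_tokens), 1),          # aligned-entity overlap
            min(len(src_tokens), len(tgt_tokens)) /
            max(len(src_tokens), len(tgt_tokens))]     # length ratio

links = {'Gandhi': 'గాంధీ', 'India': 'భారతదేశం'}
X = [sentence_pair_features(['Gandhi', 'led', 'India'],
                            ['గాంధీ', 'భారతదేశం'], links),
     sentence_pair_features(['weather', 'today'], ['గాంధీ'], links)]
y = [1, 0]                                             # parallel / non-parallel
clf = LogisticRegression().fit(X, y)
print(clf.predict(X))
```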

Proceedings ArticleDOI
07 Jul 2011
TL;DR: The focus of this study is to develop a conversational system which is adaptable to the users over a period of time, in the sense that fewer interactions with the system are needed to get the required information.
Abstract: We demonstrate a speech based conversation system under development for information access by farmers in rural and semi-urban areas of India. The challenges are that the system should take care of the significant variations in pronunciation and also the highly natural, and hence unstructured, dialog in the usage of the system. The focus of this study is to develop a conversational system which is adaptable to the users over a period of time, in the sense that fewer interactions with the system are needed to get the required information. Some other novel features of the system include multiple decoding schemes and accounting for the wide variations in dialog, pronunciation and environment. A video demonstrating the Mandi information system is available at http://speech.iiit.ac.in/index.php/demos.html

Book ChapterDOI
22 Apr 2011
TL;DR: This paper proposes an improved approach by introducing a new interestingness measure to discover periodic-frequent patterns that occur almost periodically in the database and shows that the proposed model is effective.
Abstract: Periodic-frequent patterns are a class of user-interest-based frequent patterns that exist in a transactional database. A frequent pattern can be said periodic-frequent if it appears at a regular user-specified interval in a database. In the literature, an approach has been proposed to extract periodic-frequent patterns that occur periodically throughout the database. However, it is generally difficult for a frequent pattern to appear periodically throughout the database without any interruption in many real-world applications. In this paper, we propose an improved approach by introducing a new interestingness measure to discover periodic-frequent patterns that occur almost periodically in the database. A pattern-growth algorithm has been proposed to discover the complete set of periodic-frequent patterns. Experimental results show that the proposed model is effective.

Journal ArticleDOI
TL;DR: This article presents temperature and soil moisture sensors that can be placed at suitable locations in the field for monitoring the temperature and moisture of the soil, the two parameters to which crops are most sensitive.
Abstract: The recent trends in developing low cost techniques to support cost effective agriculture in developing countries with large populations have motivated the development of low cost sensing systems to provide for low cost irrigation facilities and for conservation of water at the same time. The current paper highlights the development of temperature and soil moisture sensors that can be placed at suitable locations in the field for monitoring the temperature and moisture of the soil, the two parameters to which crops are most sensitive. The sensing system is based on a feedback control mechanism with a centralized control unit which regulates the flow of water onto the field in real time based on the instantaneous temperature and moisture values. Depending on the varied requirements of different crops, a lookup table has been prepared and is referred to for ascertaining the amount of water needed by each crop. The system, based on a microcontroller, has been applied on rice and maize fields spanning an area of 1 acre each for 3 weeks, and more than 94% of the
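The feedback loop reduces to a per-crop lookup table giving a target moisture band and a valve driven by the live sensor readings; the thresholds and temperature adjustment below are illustrative, not the deployed system's calibration:

```python
# Per-crop target bands; (min %, max %) volumetric soil moisture.
CROP_TABLE = {
    'rice':  (60, 80),
    'maize': (30, 50),
}

def control_step(crop, moisture_pct, temperature_c, valve_open):
    """One iteration of the feedback loop: decide the valve state from
    the crop's lookup-table band and the instantaneous sensor values."""
    low, high = CROP_TABLE[crop]
    if temperature_c > 35:         # hot days shift the band upward slightly
        low += 5
    if moisture_pct < low:
        return True                # open the valve, start irrigating
    if moisture_pct > high:
        return False               # close the valve, conserve water
    return valve_open              # inside the band: keep the current state

print(control_step('rice', moisture_pct=55, temperature_c=30, valve_open=False))
```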

Journal ArticleDOI
TL;DR: The interaction of chitooligosaccharides with pumpkin phloem exudate lectin was investigated by isothermal titration calorimetry and computational methods, suggesting that hydrogen bonds and van der Waals' interactions are the main factors that stabilize PPL-saccharide association.
Abstract: The interaction of chitooligosaccharides [(GlcNAc)₂₋₆] with pumpkin phloem exudate lectin (PPL) was investigated by isothermal titration calorimetry and computational methods. The dimeric PPL binds to (GlcNAc)₃₋₅ with binding constants of 1.26-1.53 × 10⁵ M⁻¹ at 25 °C, whereas chitobiose exhibits approximately 66-fold lower affinity. Interestingly, chitohexaose shows nearly 40-fold higher affinity than chitopentaose, with a binding constant of 6.16 × 10⁶ M⁻¹. The binding stoichiometry decreases with an increase in the oligosaccharide size, from 2.26 for chitobiose to 1.08 for chitohexaose. The binding reaction was essentially enthalpy driven with a negative entropic contribution, suggesting that hydrogen bonds and van der Waals' interactions are the main factors that stabilize PPL-saccharide association. The three-dimensional structure of PPL was predicted by homology modeling, and binding of chitooligosaccharides was investigated by molecular docking and molecular dynamics simulations, which showed that the protein binding pocket can accommodate up to three GlcNAc residues, whereas additional residues in chitotetraose and chitopentaose did not exhibit any interactions with the binding pocket. Docking studies with chitohexaose indicated that the two triose units of the molecule could interact with different protein binding sites, suggesting formation of higher order complexes by the higher oligomers of GlcNAc via their simultaneous interaction with two protein molecules.
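The reported binding constants translate directly into standard free energies of binding via ΔG° = -RT ln Ka, which anchors the enthalpy/entropy split discussed above; a quick check on the values quoted for PPL:

```python
import math

R = 8.314  # gas constant, J mol^-1 K^-1

def delta_g(Ka, temp_c=25.0):
    """Standard free energy of binding from an ITC binding constant."""
    return -R * (temp_c + 273.15) * math.log(Ka)   # J/mol

# Binding constants reported above for PPL.
for name, Ka in [('chitotriose', 1.26e5), ('chitohexaose', 6.16e6)]:
    print(name, round(delta_g(Ka) / 1000, 1), 'kJ/mol')
# -> roughly -29.1 and -38.7 kJ/mol
```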