Author

L. Venkata Subramaniam

Bio: L. Venkata Subramaniam is an academic researcher from IBM. The author has contributed to research in topics including noisy text and data cleansing. The author has an h-index of 20 and has co-authored 93 publications receiving 1,470 citations. Previous affiliations of L. Venkata Subramaniam include the Indian Institute of Technology Delhi.


Papers
Proceedings ArticleDOI
16 Apr 2012
TL;DR: This paper proposes generative models that can discover communities based on the discussed topics, interaction types and the social connections among people and shows that it performs better than existing community discovery models.
Abstract: In recent years, social networking sites have not only enabled people to connect with each other using social links but have also allowed them to share, communicate and interact over diverse geographical regions. Social networks provide a rich source of heterogeneous data which can be exploited to discover previously unknown relationships and interests among groups of people. In this paper, we address the problem of discovering topically meaningful communities from a social network. We assume that a person's membership in a community is conditioned on his or her social relationships, the type of interaction and the information communicated with other members of that community. We propose generative models that can discover communities based on the discussed topics, interaction types and the social connections among people. In our models a person can belong to multiple communities and a community can participate in multiple topics. This allows us to discover both community interests and user interests based on the information and linked associations. We demonstrate the effectiveness of our model on two real-world data sets and show that it performs better than existing community discovery models.

183 citations
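
The generative process sketched in the abstract can be made concrete with a short forward-sampling toy. Everything below is our own illustrative assumption, not the paper's exact model: the Dirichlet priors, the sizes, and the simplified conditional structure (a user draws a community, the community draws per-word topics).

    # Hypothetical forward-sampling sketch of a community-topic generative
    # process in the spirit of the paper; the actual model may condition
    # differently on interaction types and social links.
    import numpy as np

    rng = np.random.default_rng(0)
    n_users, n_communities, n_topics, vocab = 50, 4, 6, 200

    # Dirichlet priors (assumed hyperparameters)
    user_comm = rng.dirichlet(np.ones(n_communities), size=n_users)   # user -> community mix
    comm_topic = rng.dirichlet(np.ones(n_topics), size=n_communities) # community -> topic mix
    topic_word = rng.dirichlet(np.ones(vocab), size=n_topics)         # topic -> word dist

    def generate_message(user, length=10):
        """Sample one message: pick a community for the user, then a topic
        per word from that community's topic mix, then the word itself."""
        c = rng.choice(n_communities, p=user_comm[user])
        words = []
        for _ in range(length):
            z = rng.choice(n_topics, p=comm_topic[c])
            words.append(rng.choice(vocab, p=topic_word[z]))
        return c, words

    community, message = generate_message(user=7)

Note that a user's community mixture lets one person belong to several communities, and a community's topic mixture lets one community span several topics, mirroring the two properties the abstract highlights.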

Proceedings ArticleDOI
23 Jul 2009
TL;DR: A survey of the existing measures for noise in text is presented and application areas that ingest this noisy text for various tasks like Information Retrieval and Information Extraction are covered.
Abstract: Noise is ubiquitous in real-world text communications. Text produced by processing signals intended for human use is often noisy for automated computer processing. Automatic speech recognition, optical character recognition and machine translation all introduce processing noise. Digital text produced in informal settings such as online chat, SMS, emails, message boards, newsgroups, blogs, wikis and web pages also contains considerable noise. In this paper, we present a survey of the existing measures for noise in text. We also cover application areas that ingest this noisy text for various tasks like Information Retrieval and Information Extraction.

88 citations
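
One family of measures such a survey covers is edit-distance based, for instance word error rate: the word-level edit distance between the noisy text and a clean reference, normalized by reference length. A minimal self-contained sketch (the function names and the example sentence pair are our own):

    def edit_distance(a, b):
        """Levenshtein distance between two token sequences, single-row DP."""
        dp = list(range(len(b) + 1))
        for i, x in enumerate(a, 1):
            prev, dp[0] = dp[0], i
            for j, y in enumerate(b, 1):
                prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (x != y))
        return dp[-1]

    def word_error_rate(noisy, clean):
        """Word-level edit distance, normalized by reference length."""
        ref = clean.split()
        return edit_distance(noisy.split(), ref) / len(ref)

    print(word_error_rate("c u 2morrow at d mall", "see you tomorrow at the mall"))  # 4/6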

Proceedings Article
23 Aug 2010
TL;DR: An unsupervised method for translating noisy text to clean text, in which a weighted list of possible clean tokens is obtained for each noisy token.
Abstract: In this paper we look at the problem of cleansing noisy text using a statistical machine translation model. Noisy text is produced in informal communications such as Short Message Service (SMS), Twitter and chat. A typical statistical machine translation system is trained on parallel text comprising noisy and clean sentences. In this paper we propose an unsupervised method for the translation of noisy text to clean text. Our method has two steps. For a given noisy sentence, a weighted list of possible clean tokens is obtained for each noisy token. The clean sentence is then obtained by maximizing the product of the weighted lists and the language model scores.

76 citations
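
A toy decoding sketch of the two-step method just described. The candidate lists and the bigram language model below are hand-written stand-ins (the paper derives the candidate weights without supervision), and the exhaustive search is only workable for short sentences:

    import math
    from itertools import product  # exhaustive; fine for toy-sized sentences

    # Step 1 (assumed precomputed): weighted clean candidates per noisy token.
    candidates = {
        "gud": [("good", 0.9), ("god", 0.1)],
        "mrng": [("morning", 1.0)],
    }

    def lm_logprob(tokens, bigram_logp):
        """Score a clean-token sequence under a toy bigram language model."""
        return sum(bigram_logp.get((a, b), -10.0) for a, b in zip(["<s>"] + tokens, tokens))

    def decode(noisy_tokens, bigram_logp):
        """Step 2: pick the combination maximizing candidate weights x LM score."""
        best, best_score = None, -math.inf
        lists = [candidates.get(t, [(t, 1.0)]) for t in noisy_tokens]
        for combo in product(*lists):
            tokens = [w for w, _ in combo]
            score = sum(math.log(p) for _, p in combo) + lm_logprob(tokens, bigram_logp)
            if score > best_score:
                best, best_score = tokens, score
        return best

    toy_lm = {("<s>", "good"): -0.5, ("good", "morning"): -0.3}
    print(decode(["gud", "mrng"], toy_lm))  # -> ['good', 'morning']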

Proceedings ArticleDOI
02 Aug 2009
TL;DR: This work presents an efficient search algorithm that does not require any training data or SMS normalization and can handle semantic variations in question formulation, and demonstrates the effectiveness of the approach on two real-life datasets.
Abstract: Short Messaging Service (SMS) is popularly used to provide information access to people on the move. This has resulted in the growth of SMS based Question Answering (QA) services. However, automatically handling SMS questions poses significant challenges due to the inherent noise in SMS questions. In this work we present an automatic FAQ-based question answering system for SMS users. We handle the noise in an SMS query by formulating the query similarity over FAQ questions as a combinatorial search problem. The search space consists of combinations of all possible dictionary variations of tokens in the noisy query. We present an efficient search algorithm that does not require any training data or SMS normalization and can handle semantic variations in question formulation. We demonstrate the effectiveness of our approach on two real-life datasets.

73 citations
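
The following toy illustrates the flavor of matching a noisy SMS query against FAQ questions; the character-level similarity and the additive scoring are our own simplifications of the paper's combinatorial search over dictionary variations of tokens:

    from difflib import SequenceMatcher

    faq = [
        "how do i reset my password",
        "what are the bank opening hours",
    ]

    def sim(a, b):
        """Character-level similarity, standing in for a dictionary-variation score."""
        return SequenceMatcher(None, a, b).ratio()

    def score(query_tokens, question):
        """For each noisy token, take its best-matching token in the FAQ question."""
        q_tokens = question.split()
        return sum(max(sim(t, q) for q in q_tokens) for t in query_tokens)

    def answer(noisy_query):
        tokens = noisy_query.split()
        return max(faq, key=lambda question: score(tokens, question))

    print(answer("hw 2 reset pwd"))  # -> "how do i reset my password"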

Proceedings ArticleDOI
03 Nov 2003
TL;DR: A system called BioAnnotator for identifying and annotating biological terms in documents and a system called MedSummarizer that uses the extracted terms to identify the common concepts in a given group of genes.
Abstract: Journals and conference proceedings represent the dominant mechanisms of reporting new biomedical results. The unstructured nature of such publications makes it difficult to utilize data mining or automated knowledge discovery techniques. Annotation (or markup) of these unstructured documents represents the first step in making them machine analyzable. In this paper we first present a system called BioAnnotator for identifying and annotating biological terms in documents. BioAnnotator uses domain based dictionary look-up for recognizing known terms and a rule engine for discovering new terms. The combination of dictionary look-up and rules results in good performance (87% precision and 94% recall on the GENIA 1.1 corpus for extracting general biological terms based on an approximate matching criterion). To demonstrate the subsequent mining and knowledge discovery activities that are made feasible by BioAnnotator, we also present a system called MedSummarizer that uses the extracted terms to identify the common concepts in a given group of genes.

54 citations
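
The two-stage idea, dictionary look-up for known terms plus rules for discovering new ones, can be sketched in a few lines. The dictionary entries and the single "-ase" suffix rule below are invented examples, not BioAnnotator's actual resources:

    import re

    dictionary = {"interleukin-2", "nf-kappa b", "t cell"}

    # One illustrative rule: tokens ending in "-ase" are candidate enzyme names.
    rule = re.compile(r"\b\w+ase\b", re.IGNORECASE)

    def annotate(text):
        spans = []
        lower = text.lower()
        for term in dictionary:                       # known-term look-up
            start = lower.find(term)
            if start != -1:
                spans.append((term, start, start + len(term)))
        for m in rule.finditer(text):                 # rule-based discovery
            spans.append((m.group(), m.start(), m.end()))
        return sorted(spans, key=lambda s: s[1])

    print(annotate("Interleukin-2 activates a tyrosine kinase and a phosphatase."))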


Cited by
Journal ArticleDOI
TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.
Abstract: Machine Learning is the study of methods for programming computers to learn. Computers are applied to a wide range of tasks, and for most of these it is relatively easy for programmers to design and implement the necessary software. However, there are many tasks for which this is difficult or impossible. These can be divided into four general categories. First, there are problems for which there exist no human experts. For example, in modern automated manufacturing facilities, there is a need to predict machine failures before they occur by analyzing sensor readings. Because the machines are new, there are no human experts who can be interviewed by a programmer to provide the knowledge necessary to build a computer system. A machine learning system can study recorded data and subsequent machine failures and learn prediction rules. Second, there are problems where human experts exist, but where they are unable to explain their expertise. This is the case in many perceptual tasks, such as speech recognition, hand-writing recognition, and natural language understanding. Virtually all humans exhibit expert-level abilities on these tasks, but none of them can describe the detailed steps that they follow as they perform them. Fortunately, humans can provide machines with examples of the inputs and correct outputs for these tasks, so machine learning algorithms can learn to map the inputs to the outputs. Third, there are problems where phenomena are changing rapidly. In finance, for example, people would like to predict the future behavior of the stock market, of consumer purchases, or of exchange rates. These behaviors change frequently, so that even if a programmer could construct a good predictive computer program, it would need to be rewritten frequently. A learning program can relieve the programmer of this burden by constantly modifying and tuning a set of learned prediction rules. Fourth, there are applications that need to be customized for each computer user separately. Consider, for example, a program to filter unwanted electronic mail messages. Different users will need different filters. It is unreasonable to expect each user to program his or her own rules, and it is infeasible to provide every user with a software engineer to keep the rules up-to-date. A machine learning system can learn which mail messages the user rejects and maintain the filtering rules automatically. Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis. Statistics focuses on understanding the phenomena that have generated the data, often with the goal of testing different hypotheses about those phenomena. Data mining seeks to find patterns in the data that are understandable by people. Psychological studies of human learning aspire to understand the mechanisms underlying the various learning behaviors exhibited by people (concept learning, skill acquisition, strategy change, etc.).

13,246 citations
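
The fourth category above, a per-user mail filter learned from examples, is easy to make concrete with a minimal naive Bayes classifier; the training messages and add-one smoothing below are our own choices for the illustration:

    from collections import Counter
    import math

    def train(messages):
        """messages: list of (text, label) with label 'spam' or 'ham'."""
        counts = {"spam": Counter(), "ham": Counter()}
        labels = Counter()
        for text, label in messages:
            labels[label] += 1
            counts[label].update(text.lower().split())
        return counts, labels

    def classify(text, counts, labels):
        vocab = set(counts["spam"]) | set(counts["ham"])
        best, best_lp = None, -math.inf
        for label in counts:
            lp = math.log(labels[label] / sum(labels.values()))
            total = sum(counts[label].values()) + len(vocab)   # add-one smoothing
            for w in text.lower().split():
                lp += math.log((counts[label][w] + 1) / total)
            if lp > best_lp:
                best, best_lp = label, lp
        return best

    model = train([("win money now", "spam"), ("meeting at noon", "ham")])
    print(classify("free money", *model))  # -> 'spam'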

01 Jan 2013

1,098 citations

Book
05 Jul 2012
TL;DR: Data matching (also known as record or data linkage, entity resolution, object identification, or field matching) is the task of identifying, matching and merging records that correspond to the same entities from several databases or even within one database as mentioned in this paper.
Abstract: Data matching (also known as record or data linkage, entity resolution, object identification, or field matching) is the task of identifying, matching and merging records that correspond to the same entities from several databases or even within one database. Based on research in various domains including applied statistics, health informatics, data mining, machine learning, artificial intelligence, database management, and digital libraries, significant advances have been achieved over the last decade in all aspects of the data matching process, especially on how to improve the accuracy of data matching and its scalability to large databases. Peter Christen's book is divided into three parts: Part I, Overview, introduces the subject by presenting several sample applications and their special challenges, as well as a general overview of a generic data matching process. Part II, Steps of the Data Matching Process, then details its main steps like pre-processing, indexing, field and record comparison, classification, and quality evaluation. Lastly, Part III, Further Topics, deals with specific aspects like privacy, real-time matching, or matching unstructured data. Finally, it briefly describes the main features of many research and open source systems available today. By providing the reader with a broad range of data matching concepts and techniques and touching on all aspects of the data matching process, this book helps researchers as well as students specializing in data quality or data matching aspects to familiarize themselves with recent research advances and to identify open research challenges in the area of data matching. To this end, each chapter of the book includes a final section that provides pointers to further background and research material. Practitioners will better understand the current state of the art in data matching as well as the internal workings and limitations of current systems. In particular, they will learn that it is often not feasible to simply implement an existing off-the-shelf data matching system without substantial adaptation and customization. Such practical considerations are discussed for each of the major steps in the data matching process.

713 citations
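
The main steps the book details, indexing/blocking, field comparison, and classification, compress into a short sketch; the records, the city blocking key, and the 0.8 threshold below are invented for illustration:

    from difflib import SequenceMatcher

    db_a = [{"id": 1, "name": "Jon Smith", "city": "Delhi"}]
    db_b = [{"id": 9, "name": "John Smith", "city": "Delhi"},
            {"id": 10, "name": "Mary Jones", "city": "Mumbai"}]

    def block_key(rec):
        """Indexing step: only compare records sharing a cheap blocking key."""
        return rec["city"]

    def field_sim(a, b):
        return SequenceMatcher(None, a.lower(), b.lower()).ratio()

    def match(threshold=0.8):
        pairs = []
        for ra in db_a:
            for rb in db_b:
                if block_key(ra) != block_key(rb):
                    continue                      # pruned by blocking
                score = field_sim(ra["name"], rb["name"])
                if score >= threshold:            # classification step
                    pairs.append((ra["id"], rb["id"], round(score, 2)))
        return pairs

    print(match())  # -> [(1, 9, 0.95)]; Mary Jones is never compared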

Journal Article
TL;DR: The Health Insurance Portability and Accountability Act, also known as HIPAA, was designed to protect health insurance coverage for workers and their families while between jobs and establishes standards for electronic health care transactions.
Abstract: The Health Insurance Portability and Accountability Act, also known as HIPAA, was first delivered to Congress in 1996 and consisted of just two Titles. It was designed to protect health insurance coverage for workers and their families while between jobs. It establishes standards for electronic health care transactions and addresses the issues of privacy and security when dealing with Protected Health Information (PHI). HIPAA is applicable only in the United States of America.

561 citations