Sameep Mehta

Other affiliations: Lady Hardinge Medical College, All India Institute of Medical Sciences, Ohio State University

Bio: Sameep Mehta is an academic researcher from IBM. The author has contributed to research in topics: Service (business) & Resource (project management). The author has an h-index of 22 and has co-authored 160 publications receiving 2,093 citations. Previous affiliations of Sameep Mehta include Lady Hardinge Medical College & All India Institute of Medical Sciences.

Papers published on a yearly basis: 2002 to 2023

Papers

Proceedings Article•

Mining spatial object associations for scientific data

Hui Yang¹, Srinivasan Parthasarathy¹, Sameep Mehta¹

Ohio State University¹

30 Jul 2005

TL;DR: This work develops algorithms to discover two types of spatial association patterns in scientific data whose features are modeled as geometric objects rather than points, and defines multiple distance metrics that take into account objects' extent.

Abstract: In this paper, we present efficient algorithms to discover spatial associations among features extracted from scientific datasets. In contrast to previous work in this area, features are modeled as geometric objects rather than points. We define multiple distance metrics that take into account objects' extent. We have developed algorithms to discover two types of spatial association patterns in scientific data. We present experimental results to demonstrate the efficacy of our approach on real datasets drawn from the bioinformatic domain. We also highlight the importance of the discovered patterns by integrating the underlying domain knowledge.

9 citations

Proceedings Article•DOI•

Correlation preserving discretization

Sameep Mehta¹, Srinivasan Parthasarathy¹, Hui Yang¹

Ohio State University¹

01 Nov 2004

TL;DR: A novel PCA-based unsupervised algorithm for the discretization of continuous attributes in multivariate datasets is presented, which leverages the underlying correlation structure in the dataset to obtain the discrete intervals, and ensures that the inherent correlations are preserved.

Abstract: Discretization is a crucial preprocessing primitive for a variety of data warehousing and mining tasks. In this article we present a novel PCA-based unsupervised algorithm for the discretization of continuous attributes in multivariate datasets. The algorithm leverages the underlying correlation structure in the dataset to obtain the discrete intervals, and ensures that the inherent correlations are preserved. The approach also extends easily to datasets containing missing values. We demonstrate the efficacy of the approach on real datasets and as a preprocessing step for both classification and frequent item set mining tasks. We also show that the intervals are meaningful and can uncover hidden patterns in data.

9 citations

Proceedings Article•DOI•

Theme Based Clustering of Tweets

Rudra M. Tripathy¹, Shashank Sharma², Sachindra Joshi³, Sameep Mehta³, Amitabha Bagchi²

Silicon Institute of Technology¹, Indian Institute of Technology Delhi², IBM³

21 Mar 2014

TL;DR: This paper proposes using the Wikipedia topic taxonomy to discover the themes in tweets and using those themes along with a traditional word-based similarity metric to cluster tweets.

Abstract: In this paper, we present an overview of our approach for clustering tweets. Because tweets are short, traditional text clustering mechanisms alone may not produce optimal results. We believe that an underlying theme/topic is present in the majority of tweets, as is evident from the growing usage of the hashtag feature in the Twitter network. Clustering tweets based on these themes seems a more natural way of grouping them. We propose to use the Wikipedia topic taxonomy to discover the themes from the tweets and to use the themes along with a traditional word-based similarity metric for clustering. We show some of our initial results to demonstrate the effectiveness of our approach.

8 citations

Patent•

Optimizing Cloud Service Delivery within a Cloud Computing Environment

Kalapriya Kannan¹, Sameep Mehta¹

IBM¹

24 Dec 2014

TL;DR: A cloud service request (CSR) is received from a cloud customer in the cloud computing environment, the CSR comprising at least one parameter of one or more existing cloud services accessed by the cloud customer that are provided by one or more existing cloud service providers.

Abstract: Embodiments of the invention provide systems, methods and computer program products for optimizing cloud service delivery within a cloud computing environment. A cloud service request (CSR) is received from a cloud customer in the cloud computing environment, the CSR comprising at least one parameter of one or more existing cloud services accessed by the cloud customer that are provided by one or more existing cloud service providers. At least one parameter of the CSR is monitored in a cloud service registry comprising a plurality of cloud services provided by a plurality of cloud service providers and one or more parameters corresponding to each cloud service of the plurality of cloud services. Based on the monitoring, a new cloud service provider is determined who may provide a better cloud service with respect to the at least one parameter in the CSR being monitored.

8 citations

Patent•

Assessing Value of One or More Data Sets in the Context of a Set of Applications

Rema Ananthanarayanan¹, Kalapriya Kannan¹, Sameep Mehta¹

IBM¹

11 Apr 2016

TL;DR: A computer-implemented method includes selecting analytic applications of interest based on a characterization of data attributes of each of the available data sets; automatically determining the impact of each data attribute on the end value of each analytic application of interest; automatically computing the amount of improvement to that end value based on inclusion of an additional data set; and automatically determining a value attributed to the additional data set by comparing the cost of adding it to the available data sets with the computed amount of improvement.

Abstract: Methods, systems, and computer program products for assessing value of one or more data sets in the context of a set of applications are provided herein. A computer-implemented method includes selecting analytic applications of interest based on a characterization of data attributes of each of the available data sets; automatically determining an impact of each of the data attributes of each of the available data sets on an end value of each of the analytic applications of interest; automatically computing an amount of improvement to the end value of each of the analytic applications of interest based on inclusion of an additional data set; and automatically determining a value attributed to the additional data set based on a comparison of (i) the cost of adding the additional data set to the available data sets to (ii) the computed amount of improvement based on the inclusion of the additional data set.

8 citations

Cited by

Journal Article•DOI•

The spread of true and false news online

Soroush Vosoughi¹, Deb Roy¹, Sinan Aral¹

Massachusetts Institute of Technology¹

09 Mar 2018-Science

TL;DR: A large-scale analysis of tweets reveals that false rumors spread further and faster than the truth, and false news was more novel than true news, which suggests that people were more likely to share novel information.

Abstract: We investigated the differential diffusion of all of the verified true and false news stories distributed on Twitter from 2006 to 2017. The data comprise ~126,000 stories tweeted by ~3 million people more than 4.5 million times. We classified news as true or false using information from six independent fact-checking organizations that exhibited 95 to 98% agreement on the classifications. Falsehood diffused significantly farther, faster, deeper, and more broadly than the truth in all categories of information, and the effects were more pronounced for false political news than for false news about terrorism, natural disasters, science, urban legends, or financial information. We found that false news was more novel than true news, which suggests that people were more likely to share novel information. Whereas false stories inspired fear, disgust, and surprise in replies, true stories inspired anticipation, sadness, joy, and trust. Contrary to conventional wisdom, robots accelerated the spread of true and false news at the same rate, implying that false news spreads more than the truth because humans, not robots, are more likely to spread it.

4,241 citations

Journal Article•

Learning Report on “Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks”

杉山拓海

12 Sep 2017-Computers & Graphics

3,940 citations

Social Network Analysis

Tom A. B. Snijders

01 Jan 2012

3,692 citations

Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification

Joy Buolamwini, Timnit Gebru

21 Jan 2018

TL;DR: It is shown that the highest error involves images of dark-skinned women, while the most accurate result is for light-skinned men, in commercial API-based classifiers of gender from facial images, including IBM Watson Visual Recognition.

Abstract: The paper “Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification” by Joy Buolamwini and Timnit Gebru, that will be presented at the Conference on Fairness, Accountability, and Transparency (FAT*) in February 2018, evaluates three commercial API-based classifiers of gender from facial images, including IBM Watson Visual Recognition. The study finds these services to have recognition capabilities that are not balanced over genders and skin tones [1]. In particular, the authors show that the highest error involves images of dark-skinned women, while the most accurate result is for light-skinned men.

2,528 citations

Posted Content•

A Survey on Bias and Fairness in Machine Learning

Ninareh Mehrabi¹, Fred Morstatter¹, Nripsuta Saxena¹, Kristina Lerman¹, Aram Galstyan¹

Information Sciences Institute¹

23 Aug 2019-arXiv: Learning

TL;DR: This survey investigated different real-world applications that have shown biases in various ways, and created a taxonomy for fairness definitions that machine learning researchers have defined to avoid the existing bias in AI systems.

Abstract: With the widespread use of AI systems and applications in our everyday lives, it is important to take fairness issues into consideration while designing and engineering these types of systems. Such systems can be used in many sensitive environments to make important and life-changing decisions; thus, it is crucial to ensure that the decisions do not reflect discriminatory behavior toward certain groups or populations. We have recently seen work in machine learning, natural language processing, and deep learning that addresses such challenges in different subdomains. With the commercialization of these systems, researchers are becoming aware of the biases that these applications can contain and have attempted to address them. In this survey we investigated different real-world applications that have shown biases in various ways, and we listed different sources of biases that can affect AI applications. We then created a taxonomy for fairness definitions that machine learning researchers have defined in order to avoid the existing bias in AI systems. In addition to that, we examined different domains and subdomains in AI showing what researchers have observed with regard to unfair outcomes in the state-of-the-art methods and how they have tried to address them. There are still many future directions and solutions that can be taken to mitigate the problem of bias in AI systems. We are hoping that this survey will motivate researchers to tackle these issues in the near future by observing existing work in their respective fields.

1,571 citations
