Home
/
Authors
/
Shashi Shekhar

Author

Shashi Shekhar

Other affiliations: Indian Institute of Technology Guwahati, IBM, University of California, Berkeley ...read more

Bio: Shashi Shekhar is an academic researcher from University of Minnesota. The author has contributed to research in topics: Spatial analysis & Flow network. The author has an hindex of 56, co-authored 400 publications receiving 13629 citations. Previous affiliations of Shashi Shekhar include Indian Institute of Technology Guwahati & IBM.

Papers published on a yearly basis

2023
2022
2021
2020
2019
2018
2017
2016
2015
2014
2013
2012
2011
2010
2009
2008
2007
2006
2005
2004
2003
2002
2001
2000
1999
1998
1997
1996
1995
1994
1993
1992
1991
1990
1989
1988

Papers

PDF

Open Access

More filters

Journal Article•DOI•

Multilevel hypergraph partitioning: applications in VLSI domain

[...]

George Karypis¹, R. Aggarwal², Vipin Kumar¹, Shashi Shekhar¹•Institutions (2)

University of Minnesota¹, Lattice Semiconductor²

01 Mar 1999-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: A new hypergraph-partitioning algorithm that is based on the multilevel paradigm, which scales very well for large hypergraphs and produces high-quality partitioning in a relatively small amount of time.

...read moreread less

Abstract: In this paper, we present a new hypergraph-partitioning algorithm that is based on the multilevel paradigm. In the multilevel paradigm, a sequence of successively coarser hypergraphs is constructed. A bisection of the smallest hypergraph is computed and it is used to obtain a bisection of the original hypergraph by successively projecting and refining the bisection to the next level finer hypergraph. We have developed new hypergraph coarsening strategies within the multilevel framework. We evaluate their performance both in terms of the size of the hyperedge cut on the bisection, as well as on the run time for a number of very large scale integration circuits. Our experiments show that our multilevel hypergraph-partitioning algorithm produces high-quality partitioning in a relatively small amount of time. The quality of the partitionings produced by our scheme are on the average 6%-23% better than those produced by other state-of-the-art schemes. Furthermore, our partitioning algorithm is significantly faster, often requiring 4-10 times less time than that required by the other schemes. Our multilevel hypergraph-partitioning algorithm scales very well for large hypergraphs. Hypergraphs with over 100 000 vertices can be bisected in a few minutes on today's workstations. Also, on the large hypergraphs, our scheme outperforms other schemes (in hyperedge cut) quite consistently with larger margins (9%-30%).

...read moreread less

941 citations

Proceedings Article•DOI•

Multilevel hypergraph partitioning: application in VLSI domain

[...]

George Karypis¹, Rajat Aggarwal¹, Vipin Kumar¹, Shashi Shekhar¹•Institutions (1)

University of Minnesota¹

13 Jun 1997

TL;DR: The experiments show that the multilevel hypergraph partitioning algorithm produces high quality partitioning in relatively small amount of time and outperforms other schemes (in hyperedge cut) quite consistently with larger margins.

...read moreread less

Abstract: In this paper, we present a new hypergraph partitioning algorithmthat is based on the multilevel paradigm. In the multilevel paradigm,a sequence of successively coarser hypergraphs is constructed. Abisection of the smallest hypergraph is computed and it is used toobtain a bisection of the original hypergraph by successively projectingand refining the bisection to the next level finer hypergraph.We evaluate the performance both in terms of the size of the hyper-edgecut on the bisection as well as run time on a number of VLSIcircuits. Our experiments show that our multilevel hypergraph partitioningalgorithm produces high quality partitioning in relativelysmall amount of time. The quality of the partitionings produced byour scheme are on the average 4% to 23% better than those producedby other state-of-the-art schemes. Furthermore, our partitioning algorithmissignificantly faster, often requiring 4 to 15 times less timethan that required by the other schemes. Our multilevel hypergraphpartitioning algorithm scales very well for large hypergraphs. Hypergraphswith over 100,000 vertices can be bisected in a few minuteson today's workstations. Also, on the large hypergraphs, ourscheme outperforms other schemes (in hyperedge cut) quite consistentlywith larger margins (9% to 30%).

...read moreread less

887 citations

Book•

Spatial Databases: A Tour

[...]

Shashi Shekhar, Sanjay Chawla

20 Jun 2003

TL;DR: An introduction to Spatial Databases and Trends in Spatial Data Mining.

...read moreread less

Abstract: 1. Introduction to Spatial Databases. 2. Spatial Concepts and Data Models. 3. Spatial Query Language. 4. Spatial Storage and Indexing. 5. Query Processing and Optimization. 6. Spatial Networks. 7. Introduction to Spatial Data Mining. 8. Trends in Spatial Databases.

...read moreread less

714 citations

Journal Article•DOI•

Theory-Guided Data Science: A New Paradigm for Scientific Discovery from Data

[...]

Anuj Karpatne¹, Gowtham Atluri², James H. Faghmous³, Michael Steinbach¹, Arindam Banerjee¹, Auroop R. Ganguly⁴, Shashi Shekhar¹, Nagiza F. Samatova⁵, Vipin Kumar¹ - Show less +5 more•Institutions (5)

University of Minnesota¹, University of Cincinnati², Icahn School of Medicine at Mount Sinai³, Northeastern University⁴, North Carolina State University⁵

01 Oct 2017-IEEE Transactions on Knowledge and Data Engineering

TL;DR: The paradigm of theory-guided data science is formally conceptualized and a taxonomy of research themes in TGDS is presented and several approaches for integrating domain knowledge in different research themes are described using illustrative examples from different disciplines.

...read moreread less

Abstract: Data science models, although successful in a number of commercial domains, have had limited applicability in scientific problems involving complex physical phenomena. Theory-guided data science (TGDS) is an emerging paradigm that aims to leverage the wealth of scientific knowledge for improving the effectiveness of data science models in enabling scientific discovery. The overarching vision of TGDS is to introduce scientific consistency as an essential component for learning generalizable models. Further, by producing scientifically interpretable models, TGDS aims to advance our scientific understanding by discovering novel domain insights. Indeed, the paradigm of TGDS has started to gain prominence in a number of scientific disciplines such as turbulence modeling, material discovery, quantum chemistry, bio-medical science, bio-marker discovery, climate science, and hydrology. In this paper, we formally conceptualize the paradigm of TGDS and present a taxonomy of research themes in TGDS. We describe several approaches for integrating domain knowledge in different research themes using illustrative examples from different disciplines. We also highlight some of the promising avenues of novel research for realizing the full potential of theory-guided data science.

...read moreread less

621 citations

Journal Article•DOI•

Theory-guided Data Science: A New Paradigm for Scientific Discovery from Data

[...]

University of Minnesota¹, University of Cincinnati², Icahn School of Medicine at Mount Sinai³, Northeastern University⁴, North Carolina State University⁵

27 Dec 2016-arXiv: Learning

TL;DR: The theory-guided data science (TGDS) paradigm as mentioned in this paper is an emerging paradigm that aims to leverage the wealth of scientific knowledge for improving the effectiveness of data science models in enabling scientific discovery.

...read moreread less

532 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85

Collapse

Cited by

PDF

Open Access

More filters

Book•

Data Mining: Concepts and Techniques

[...]

Jiawei Han¹, Micheline Kamber², Jian Pei²•Institutions (2)

University of Illinois at Urbana–Champaign¹, Simon Fraser University²

08 Sep 2000

TL;DR: This book presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projects, and provides a comprehensive, practical look at the concepts and techniques you need to get the most out of real business data.

...read moreread less

Abstract: The increasing volume of data in modern business and science calls for more complex and sophisticated tools. Although advances in data mining technology have made extensive data collection much easier, it's still always evolving and there is a constant need for new techniques and tools that can help us transform this data into useful information and knowledge. Since the previous edition's publication, great advances have been made in the field of data mining. Not only does the third of edition of Data Mining: Concepts and Techniques continue the tradition of equipping you with an understanding and application of the theory and practice of discovering patterns hidden in large data sets, it also focuses on new, important topics in the field: data warehouses and data cube technology, mining stream, mining social networks, and mining spatial, multimedia and other complex data. Each chapter is a stand-alone guide to a critical topic, presenting proven algorithms and sound implementations ready to be used directly or with strategic modification against live data. This is the resource you need if you want to apply today's most powerful data mining techniques to meet real business challenges. * Presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projects. * Addresses advanced topics such as mining object-relational databases, spatial databases, multimedia databases, time-series databases, text databases, the World Wide Web, and applications in several fields. *Provides a comprehensive, practical look at the concepts and techniques you need to get the most out of real business data

...read moreread less

23,600 citations

Journal Article•DOI•

Anomaly detection: A survey

[...]

Varun Chandola¹, Arindam Banerjee¹, Vipin Kumar¹•Institutions (1)

University of Minnesota¹

30 Jul 2009-ACM Computing Surveys

TL;DR: This survey tries to provide a structured and comprehensive overview of the research on anomaly detection by grouping existing techniques into different categories based on the underlying approach adopted by each technique.

...read moreread less

Abstract: Anomaly detection is an important problem that has been researched within diverse research areas and application domains. Many anomaly detection techniques have been specifically developed for certain application domains, while others are more generic. This survey tries to provide a structured and comprehensive overview of the research on anomaly detection. We have grouped existing techniques into different categories based on the underlying approach adopted by each technique. For each category we have identified key assumptions, which are used by the techniques to differentiate between normal and anomalous behavior. When applying a given technique to a particular domain, these assumptions can be used as guidelines to assess the effectiveness of the technique in that domain. For each category, we provide a basic anomaly detection technique, and then show how the different existing techniques in that category are variants of the basic technique. This template provides an easier and more succinct understanding of the techniques belonging to each category. Further, for each category, we identify the advantages and disadvantages of the techniques in that category. We also provide a discussion on the computational complexity of the techniques since it is an important issue in real application domains. We hope that this survey will provide a better understanding of the different directions in which research has been done on this topic, and how techniques developed in one area can be applied in domains for which they were not intended to begin with.

...read moreread less

9,627 citations

Data Mining - Concepts and Techniques.

[...]

Petra Perner

01 Jan 2002

9,314 citations

Journal Article•DOI•

Statistics for Spatial Data.

[...]

Andrew B. Lawson¹, Noel A Cressie•Institutions (1)

University of Dundee¹

01 Mar 1993-The Statistician

6,278 citations

Journal Article•DOI•

Cluster ensembles --- a knowledge reuse framework for combining multiple partitions

[...]

Alexander Strehl¹, Joydeep Ghosh¹•Institutions (1)

University of Texas at Austin¹

01 Mar 2003-Journal of Machine Learning Research

TL;DR: This paper introduces the problem of combining multiple partitionings of a set of objects into a single consolidated clustering without accessing the features or algorithms that determined these partitionings and proposes three effective and efficient techniques for obtaining high-quality combiners (consensus functions).

...read moreread less

Abstract: This paper introduces the problem of combining multiple partitionings of a set of objects into a single consolidated clustering without accessing the features or algorithms that determined these partitionings. We first identify several application scenarios for the resultant 'knowledge reuse' framework that we call cluster ensembles. The cluster ensemble problem is then formalized as a combinatorial optimization problem in terms of shared mutual information. In addition to a direct maximization approach, we propose three effective and efficient techniques for obtaining high-quality combiners (consensus functions). The first combiner induces a similarity measure from the partitionings and then reclusters the objects. The second combiner is based on hypergraph partitioning. The third one collapses groups of clusters into meta-clusters which then compete for each object to determine the combined clustering. Due to the low computational costs of our techniques, it is quite feasible to use a supra-consensus function that evaluates all three approaches against the objective function and picks the best solution for a given situation. We evaluate the effectiveness of cluster ensembles in three qualitatively different application scenarios: (i) where the original clusters were formed based on non-identical sets of features, (ii) where the original clustering algorithms worked on non-identical sets of objects, and (iii) where a common data-set is used and the main purpose of combining multiple clusterings is to improve the quality and robustness of the solution. Promising results are obtained in all three situations for synthetic as well as real data-sets.

...read moreread less

4,375 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse