Home
/
Authors
/
Sameet Agarwal

Author

Sameet Agarwal

Bio: Sameet Agarwal is an academic researcher. The author has contributed to research in topics: Computation & Multidimensional analysis. The author has an hindex of 3, co-authored 3 publications receiving 686 citations.

Papers

PDF

Open Access

More filters

Proceedings Article•

On the Computation of Multidimensional Aggregates

[...]

Sameet Agarwal, Rakesh Agrawal, Prasad M. Deshpande, Ashish Gupta, Jeffrey F. Naughton, Raghu Ramakrishnan, Sunita Sarawagi - Show less +3 more

03 Sep 1996

TL;DR: In this article, the authors present fast algorithms for computing a collection of group bys, which is equivalent to the union of a number of standard group-by operations, and show how the structure of CUBE computation can be viewed in terms of a hierarchy of groupby operations.

...read moreread less

Abstract: At the heart of all OLAP or multidimensional data analysis applications is the ability to simultaneously aggregate across many sets of dimensions. Computing multidimensional aggregates is a performance bottleneck for these applications. This paper presents fast algorithms for computing a collection of group bys. We focus on a special case of the aggregation problem - computation of the CUBE operator. The CUBE operator requires computing group-bys on all possible combinations of a list of attributes, and is equivalent to the union of a number of standard group-by operations. We show how the structure of CUBE computation can be viewed in terms of a hierarchy of group-by operations. Our algorithms extend sort-based and hashbased grouping methods with several .optimizations, like combining common operations across multiple groupbys, caching, and using pre-computed group-by8 for computing other groupbys. Empirical evaluation shows that the resulting algorithms give much better performance compared to straightforward meth

...read moreread less

608 citations

Book•

On the computation of multidimensional aggregates

[...]

Sameet Agarwal, Rakesh Agarwal, Prasad M. Deshpande, Ashish Gupta, Jeffrey F. Naughton, Raghu Ramakrishnan - Show less +2 more

01 Jun 1999

55 citations

Computation of Multidimensional Aggregates

[...]

Prasad M. Deshpande, Sameet Agarwal, Jeffrey F. Naughton, Raghu Ramakrishnan

01 Jan 1997

TL;DR: This paper presents fast algorithms for computing a collection of group bys using sort-based and hashbased grouping methods with several .optimizations, like combining common operations across multiple groupbys, caching, and using pre-computed group-by8 for computing other groupbys.

...read moreread less

27 citations

Cited by

PDF

Open Access

More filters

Journal Article•DOI•

An overview of data warehousing and OLAP technology

[...]

Surajit Chaudhuri¹, Umeshwar Dayal²•Institutions (2)

Microsoft¹, Hewlett-Packard²

01 Mar 1997

TL;DR: An overview of data warehousing and OLAP technologies, with an emphasis on their new requirements, is provided, based on a tutorial presented at the VLDB Conference, 1996.

...read moreread less

Abstract: Data warehousing and on-line analytical processing (OLAP) are essential elements of decision support, which has increasingly become a focus of the database industry. Many commercial products and services are now available, and all of the principal database management system vendors now have offerings in these areas. Decision support places some rather different requirements on database technology compared to traditional on-line transaction processing applications. This paper provides an overview of data warehousing and OLAP technologies, with an emphasis on their new requirements. We describe back end tools for extracting, cleaning and loading data into a data warehouse; multidimensional data models typical of OLAP; front end client tools for querying and data analysis; server extensions for efficient query processing; and tools for metadata management and for managing the warehouse. In addition to surveying the state of the art, this paper also identifies some promising research issues, some of which are related to problems that the database research community has worked on for years, but others are only just beginning to be addressed. This overview is based on a tutorial that the authors presented at the VLDB Conference, 1996.

...read moreread less

2,835 citations

Proceedings Article•DOI•

Automatic subspace clustering of high dimensional data for data mining applications

[...]

Rakesh Agrawal¹, Johannes Gehrke¹, Dimitrios Gunopulos¹, Prabhakar Raghavan¹•Institutions (1)

IBM¹

01 Jun 1998

TL;DR: CLIQUE is presented, a clustering algorithm that satisfies each of these requirements of data mining applications including the ability to find clusters embedded in subspaces of high dimensional data, scalability, end-user comprehensibility of the results, non-presumption of any canonical data distribution, and insensitivity to the order of input records.

...read moreread less

Abstract: Data mining applications place special requirements on clustering algorithms including: the ability to find clusters embedded in subspaces of high dimensional data, scalability, end-user comprehensibility of the results, non-presumption of any canonical data distribution, and insensitivity to the order of input records. We present CLIQUE, a clustering algorithm that satisfies each of these requirements. CLIQUE identifies dense clusters in subspaces of maximum dimensionality. It generates cluster descriptions in the form of DNF expressions that are minimized for ease of comprehension. It produces identical results irrespective of the order in which input records are presented and does not presume any specific mathematical form for data distribution. Through experiments, we show that CLIQUE efficiently finds accurate cluster in large high dimensional datasets.

...read moreread less

2,782 citations

Data Mining: Concepts and Techniques (2nd edition)

[...]

Jiawei Han, Micheline Kamber

01 Jan 2006

TL;DR: There have been many data mining books published in recent years, including Predictive Data Mining by Weiss and Indurkhya [WI98], Data Mining Solutions: Methods and Tools for Solving Real-World Problems by Westphal and Blaxton [WB98], Mastering Data Mining: The Art and Science of Customer Relationship Management by Berry and Linofi [BL99].

...read moreread less

Abstract: The book Knowledge Discovery in Databases, edited by Piatetsky-Shapiro and Frawley [PSF91], is an early collection of research papers on knowledge discovery from data. The book Advances in Knowledge Discovery and Data Mining, edited by Fayyad, Piatetsky-Shapiro, Smyth, and Uthurusamy [FPSSe96], is a collection of later research results on knowledge discovery and data mining. There have been many data mining books published in recent years, including Predictive Data Mining by Weiss and Indurkhya [WI98], Data Mining Solutions: Methods and Tools for Solving Real-World Problems by Westphal and Blaxton [WB98], Mastering Data Mining: The Art and Science of Customer Relationship Management by Berry and Linofi [BL99], Building Data Mining Applications for CRM by Berson, Smith, and Thearling [BST99], Data Mining: Practical Machine Learning Tools and Techniques by Witten and Frank [WF05], Principles of Data Mining (Adaptive Computation and Machine Learning) by Hand, Mannila, and Smyth [HMS01], The Elements of Statistical Learning by Hastie, Tibshirani, and Friedman [HTF01], Data Mining: Introductory and Advanced Topics by Dunham, and Data Mining: Multimedia, Soft Computing, and Bioinformatics by Mitra and Acharya [MA03]. There are also books containing collections of papers on particular aspects of knowledge discovery, such as Machine Learning and Data Mining: Methods and Applications edited by Michalski, Brakto, and Kubat [MBK98], and Relational Data Mining edited by Dzeroski and Lavrac [De01], as well as many tutorial notes on data mining in major database, data mining and machine learning conferences.

...read moreread less

2,591 citations

Journal Article•DOI•

Data cube: a relational aggregation operator generalizing GROUP-BY, CROSS-TAB, and SUB-TOTALS

[...]

Jim Gray¹, A. Bosworth¹, A. Lyaman¹, Hamid Pirahesh²•Institutions (2)

Microsoft¹, IBM²

26 Feb 1996

TL;DR: The data cube operator as discussed by the authors generalizes the histogram, cross-tabulation, roll-up, drill-down, and sub-total constructs found in most report writers.

...read moreread less

Abstract: Data analysis applications typically aggregate data across many dimensions looking for unusual patterns. The SQL aggregate functions and the GROUP BY operator produce zero-dimensional or one-dimensional answers. Applications need the N-dimensional generalization of these operators. The paper defines that operator, called the data cube or simply cube. The cube operator generalizes the histogram, cross-tabulation, roll-up, drill-down, and sub-total constructs found in most report writers. The cube treats each of the N aggregation attributes as a dimension of N-space. The aggregate of a particular set of attribute values is a point in this space. The set of points forms an N-dimensionaI cube. Super-aggregates are computed by aggregating the N-cube to lower dimensional spaces. Aggregation points are represented by an "infinite value": ALL, so the point (ALL,ALL,...,ALL, sum(*)) represents the global sum of all items. Each ALL value actually represents the set of values contributing to that aggregation.

...read moreread less

2,308 citations

Posted Content•

Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals

[...]

Jim Gray, Surajit Chaudhuri, Adam Bosworth, Andrew Layman, Don Reichart, Murali Venkatrao, Frank Pellow, Hamid Pirahesh - Show less +4 more

25 Jan 2007-arXiv: Databases

TL;DR: The cube operator as discussed by the authors generalizes the histogram, cross-tabulation, roll-up, drill-down, and sub-total constructs found in most report writers, and treats each of the N aggregation attributes as a dimension of N-space.

...read moreread less

Abstract: Data analysis applications typically aggregate data across many dimensions looking for anomalies or unusual patterns. The SQL aggregate functions and the GROUP BY operator produce zero-dimensional or one-dimensional aggregates. Applications need the N-dimensional generalization of these operators. This paper defines that operator, called the data cube or simply cube. The cube operator generalizes the histogram, cross-tabulation, roll-up, drill-down, and sub-total constructs found in most report writers. The novelty is that cubes are relations. Consequently, the cube operator can be imbedded in more complex non-procedural data analysis programs. The cube operator treats each of the N aggregation attributes as a dimension of N-space. The aggregate of a particular set of attribute values is a point in this space. The set of points forms an N-dimensional cube. Super-aggregates are computed by aggregating the N-cube to lower dimensional spaces. This paper (1) explains the cube and roll-up operators, (2) shows how they fit in SQL, (3) explains how users can define new aggregate functions for cubes, and (4) discusses efficient techniques to compute the cube. Many of these features are being added to the SQL Standard.

...read moreread less

1,870 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123

Collapse