Author

Gautam Das

Bio: Gautam Das is an academic researcher from the University of Texas at Arlington. The author has contributed to research in topics: Tuple & Ranking. The author has an h-index of 54 and has co-authored 253 publications receiving 11,363 citations. Previous affiliations of Gautam Das include the University of Toronto & Qatar Computing Research Institute.


Papers
Proceedings ArticleDOI
07 Aug 2002
TL;DR: DBXplorer, a system that enables keyword-based searches in relational databases using a commercial relational database and Web server and allows users to interact via a browser front-end is discussed.
Abstract: Internet search engines have popularized the keyword-based search paradigm. While traditional database management systems offer powerful query languages, they do not allow keyword-based search. In this paper, we discuss DBXplorer, a system that enables keyword-based searches in relational databases. DBXplorer has been implemented using a commercial relational database and Web server and allows users to interact via a browser front-end. We outline the challenges and discuss the implementation of our system, including results of extensive experimental evaluation.

818 citations

Proceedings Article
27 Aug 1998
TL;DR: In this article, the problem of finding rules relating patterns in a time series to other patterns in that series, or patterns in one series to patterns in another series, was considered, and adaptive methods for finding rules of the above type from time-series data were described.
Abstract: We consider the problem of finding rules relating patterns in a time series to other patterns in that series, or patterns in one series to patterns in another series. A simple example is a rule such as "a period of low telephone call activity is usually followed by a sharp rise in call volume". Examples of rules relating two or more time series are "if the Microsoft stock price goes up and Intel falls, then IBM goes up the next day," and "if Microsoft goes up strongly for one day, then declines strongly on the next day, and on the same days Intel stays about level, then IBM stays about level". Our emphasis is on the discovery of local patterns in multivariate time series, in contrast to traditional time series analysis, which largely focuses on global models. Thus, we search for rules whose conditions refer to patterns in time series. However, we do not want to define beforehand which patterns are to be used; rather, we want the patterns to be formed from the data in the context of rule discovery. We describe adaptive methods for finding rules of the above type from time-series data. The methods are based on discretizing the sequence by methods resembling vector quantization. We first form subsequences by sliding a window through the time series, and then cluster these subsequences by using a suitable measure of time-series similarity. The discretized version of the time series is obtained by taking the cluster identifiers corresponding to the subsequences. Once the time series is discretized, we use simple rule-finding methods to obtain rules from the sequence. We present empirical results on the behavior of the method.

713 citations
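
The discretization step described in the abstract above (slide a window, cluster the subsequences, emit cluster identifiers) can be sketched roughly as follows. This is a minimal illustration, not the paper's method: the function names (`sliding_windows`, `discretize`), the mean-centering step, the Euclidean distance, and the simple "leader" clustering with a fixed `radius` are all assumptions chosen to keep the example short and deterministic.

```python
from math import sqrt

def sliding_windows(series, width):
    """All contiguous subsequences of the given width."""
    return [series[i:i + width] for i in range(len(series) - width + 1)]

def center(window):
    """Subtract the mean so that only the shape of the window matters (an assumption)."""
    mean = sum(window) / len(window)
    return [x - mean for x in window]

def euclidean(a, b):
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def discretize(series, width, radius):
    """Map each window to the id of the first cluster center within `radius`.

    A window that is far from every existing center starts a new cluster.
    This leader clustering is a stand-in for whatever clustering is actually used.
    """
    centers = []
    symbols = []
    for window in sliding_windows(series, width):
        shape = center(window)
        for cluster_id, c in enumerate(centers):
            if euclidean(shape, c) <= radius:
                symbols.append(cluster_id)
                break
        else:
            centers.append(shape)
            symbols.append(len(centers) - 1)
    return symbols, centers

series = [1, 2, 3, 2, 1, 2, 3, 2, 1]
symbols, centers = discretize(series, width=3, radius=0.5)
print(symbols)  # [0, 1, 2, 3, 0, 1, 2]
```

The resulting symbol sequence is what a rule-finding step would then scan, for example by counting how often one symbol is followed by another within a fixed number of positions.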

Journal ArticleDOI
TL;DR: This paper gives a simple algorithm for constructing sparse spanners for arbitrary weighted graphs and applies this algorithm to obtain specific results for planar graphs and Euclidean graphs.
Abstract: Given a graph G, a subgraph G' is a t-spanner of G if, for every u, v ∈ V, the distance from u to v in G' is at most t times longer than the distance in G. In this paper we give a simple algorithm for constructing sparse spanners for arbitrary weighted graphs. We then apply this algorithm to obtain specific results for planar graphs and Euclidean graphs. We discuss the optimality of our results and present several nearly matching lower bounds.

654 citations
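
To make the t-spanner definition above concrete, here is a sketch of the widely known greedy construction: scan edges in order of increasing weight and keep an edge only if the spanner built so far does not already connect its endpoints within t times the edge weight. The abstract does not spell out the algorithm, so treating this greedy rule as the one intended is an assumption, and the names `shortest_path` and `greedy_spanner` are hypothetical.

```python
import heapq

def shortest_path(adj, source, target, cutoff):
    """Dijkstra from source; returns the distance to target if it is reached,
    otherwise infinity. Nodes farther than `cutoff` are not expanded."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            return d
        if d > dist.get(u, float("inf")) or d > cutoff:
            continue
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return float("inf")

def greedy_spanner(edges, t):
    """edges: iterable of (u, v, weight) for an undirected graph."""
    adj = {}    # adjacency list of the spanner built so far
    kept = []
    for u, v, w in sorted(edges, key=lambda e: e[2]):
        # Keep the edge only if the current spanner stretches u-v by more than t.
        if shortest_path(adj, u, v, t * w) > t * w:
            kept.append((u, v, w))
            adj.setdefault(u, []).append((v, w))
            adj.setdefault(v, []).append((u, w))
    return kept

triangle = [("a", "b", 1.0), ("b", "c", 1.0), ("a", "c", 1.5)]
print(greedy_spanner(triangle, t=2))    # drops the a-c edge
print(greedy_spanner(triangle, t=1.2))  # keeps all three edges
```

With t = 2 the path a-b-c has length 2, which is within 2 × 1.5, so the a-c edge is redundant; with t = 1.2 the same path exceeds 1.8 and the edge must stay.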

Journal ArticleDOI
01 Aug 2009
TL;DR: This article proposes a formal semantics for group recommendation that accounts for both item relevance to a group and disagreements among group members, and evaluates the resulting algorithms on the MovieLens data set with 10M ratings.
Abstract: We study the problem of group recommendation. Recommendation is an important information exploration paradigm that retrieves interesting items for users based on their profiles and past activities. Single user recommendation has received significant attention in the past due to its extensive use in Amazon and Netflix. How to recommend to a group of users who may or may not share similar tastes, however, is still an open problem. The need for group recommendation arises in many scenarios: a movie for friends to watch together, a travel destination for a family to spend a holiday break, and a good restaurant for colleagues to have a working lunch. Intuitively, items that are ideal for recommendation to a group may be quite different from those for individual members. In this paper, we analyze the desiderata of group recommendation and propose a formal semantics that accounts for both item relevance to a group and disagreements among group members. We design and implement algorithms for efficiently computing group recommendations. We evaluate our group recommendation method through a comprehensive user study conducted on Amazon Mechanical Turk and demonstrate that incorporating disagreements is critical to the effectiveness of group recommendation. We further evaluate the efficiency and scalability of our algorithms on the MovieLens data set with 10M ratings.

346 citations
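
The idea of scoring an item by both its relevance to the group and the disagreement among members can be illustrated with a small sketch. The specific choices here are assumptions for illustration only: ratings are taken to be normalized to [0, 1], relevance is the average rating, disagreement is the average pairwise absolute difference, and the two are combined linearly with hypothetical weights `w_rel` and `w_dis`. The abstract does not state these formulas.

```python
from itertools import combinations

def group_relevance(ratings):
    """Average of the members' predicted ratings for one item."""
    return sum(ratings) / len(ratings)

def group_disagreement(ratings):
    """Average absolute difference over all pairs of members."""
    pairs = list(combinations(ratings, 2))
    if not pairs:
        return 0.0
    return sum(abs(a - b) for a, b in pairs) / len(pairs)

def consensus_score(ratings, w_rel=0.8, w_dis=0.2):
    """Higher is better: reward relevance, penalize disagreement."""
    return w_rel * group_relevance(ratings) + w_dis * (1.0 - group_disagreement(ratings))

# Two items with the same average rating but different levels of agreement.
item_a = [0.8, 0.8, 0.8]
item_b = [1.0, 1.0, 0.4]
print(consensus_score(item_a) > consensus_score(item_b))  # True
```

Both items have the same average rating, yet the item on which the group agrees scores higher, which is the behavior the abstract argues for.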

Book ChapterDOI
24 Jun 1997
TL;DR: This paper presents an intuitive model for measuring the similarity between two time series that takes into account outliers, different scaling functions, and variable sampling rates, and shows the naturalness of this notion of similarity.
Abstract: Similarity of objects is one of the crucial concepts in several applications, including data mining. For complex objects, similarity is nontrivial to define. In this paper we present an intuitive model for measuring the similarity between two time series. The model takes into account outliers, different scaling functions, and variable sampling rates. Using methods from computational geometry, we show that this notion of similarity can be computed in polynomial time. Using statistical approximation techniques, the algorithms can be speeded up considerably. We give preliminary experimental results that show the naturalness of the notion.

336 citations


Cited by
Journal ArticleDOI
TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.
Abstract: Machine Learning is the study of methods for programming computers to learn. Computers are applied to a wide range of tasks, and for most of these it is relatively easy for programmers to design and implement the necessary software. However, there are many tasks for which this is difficult or impossible. These can be divided into four general categories. First, there are problems for which there exist no human experts. For example, in modern automated manufacturing facilities, there is a need to predict machine failures before they occur by analyzing sensor readings. Because the machines are new, there are no human experts who can be interviewed by a programmer to provide the knowledge necessary to build a computer system. A machine learning system can study recorded data and subsequent machine failures and learn prediction rules. Second, there are problems where human experts exist, but where they are unable to explain their expertise. This is the case in many perceptual tasks, such as speech recognition, hand-writing recognition, and natural language understanding. Virtually all humans exhibit expert-level abilities on these tasks, but none of them can describe the detailed steps that they follow as they perform them. Fortunately, humans can provide machines with examples of the inputs and correct outputs for these tasks, so machine learning algorithms can learn to map the inputs to the outputs. Third, there are problems where phenomena are changing rapidly. In finance, for example, people would like to predict the future behavior of the stock market, of consumer purchases, or of exchange rates. These behaviors change frequently, so that even if a programmer could construct a good predictive computer program, it would need to be rewritten frequently. A learning program can relieve the programmer of this burden by constantly modifying and tuning a set of learned prediction rules. 
Fourth, there are applications that need to be customized for each computer user separately. Consider, for example, a program to filter unwanted electronic mail messages. Different users will need different filters. It is unreasonable to expect each user to program his or her own rules, and it is infeasible to provide every user with a software engineer to keep the rules up-to-date. A machine learning system can learn which mail messages the user rejects and maintain the filtering rules automatically. Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis. Statistics focuses on understanding the phenomena that have generated the data, often with the goal of testing different hypotheses about those phenomena. Data mining seeks to find patterns in the data that are understandable by people. Psychological studies of human learning aspire to understand the mechanisms underlying the various learning behaviors exhibited by people (concept learning, skill acquisition, strategy change, etc.).

13,246 citations

01 Jan 2002

9,314 citations

Proceedings ArticleDOI
03 Jun 2002
TL;DR: This overview paper motivates the need for, and the research issues arising from, a new model of data processing in which data does not take the form of persistent relations but instead arrives in multiple, continuous, rapid, time-varying data streams.
Abstract: In this overview paper we motivate the need for and research issues arising from a new model of data processing. In this model, data does not take the form of persistent relations, but rather arrives in multiple, continuous, rapid, time-varying data streams. In addition to reviewing past work relevant to data stream systems and current projects in the area, the paper explores topics in stream query languages, new requirements and challenges in query processing, and algorithmic issues.

2,933 citations

Journal ArticleDOI
TL;DR: This work surveys the current techniques to cope with the problem of string matching that allows errors, and focuses on online searching and mostly on edit distance, explaining the problem and its relevance, its statistical behavior, its history and current developments, and the central ideas of the algorithms.
Abstract: We survey the current techniques to cope with the problem of string matching that allows errors. This is becoming a more and more relevant issue for many fast growing areas such as information retrieval and computational biology. We focus on online searching and mostly on edit distance, explaining the problem and its relevance, its statistical behavior, its history and current developments, and the central ideas of the algorithms and their complexities. We present a number of experiments to compare the performance of the different algorithms and show which are the best choices. We conclude with some directions for future work and open problems.

2,723 citations
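
As a concrete instance of online searching under edit distance, the classical dynamic-programming approach can be sketched as below: it keeps one column of the edit-distance matrix and lets a match start at any text position by holding the top cell at zero. This is a baseline sketch only and does not represent the faster algorithms the survey compares; the function name `approximate_matches` is hypothetical.

```python
def approximate_matches(pattern, text, k):
    """Return 1-based end positions in `text` where some substring ending
    there is within edit distance k of `pattern`."""
    m = len(pattern)
    # col[i] = best edit distance between pattern[:i] and a suffix of the text read so far.
    col = list(range(m + 1))
    ends = []
    for j, ch in enumerate(text, start=1):
        prev_diag = col[0]  # value of the cell up-left of the one being computed
        col[0] = 0          # an occurrence may start at any text position
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == ch else 1
            new = min(prev_diag + cost,  # match or substitution
                      col[i] + 1,        # extra text character
                      col[i - 1] + 1)    # missing pattern character
            prev_diag = col[i]
            col[i] = new
        if col[m] <= k:
            ends.append(j)
    return ends

print(approximate_matches("abc", "xabdx", 1))  # [3, 4]
print(approximate_matches("abc", "zabc", 0))   # [4]
```

The scan takes time proportional to the pattern length times the text length and uses memory proportional to the pattern length.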