Author

Daniel Tam

Other affiliations: IBM
Bio: Daniel Tam is an academic researcher from the University of Michigan. The author has contributed to research in topics: Automatic summarization & Stub file. The author has an h-index of 6 and has co-authored 8 publications receiving 1237 citations. Previous affiliations of Daniel Tam include IBM.

Papers
Journal Article
TL;DR: Presents MEAD, a multi-document summarizer that generates summaries using cluster centroids produced by a topic detection and tracking system, together with an evaluation scheme based on sentence utility and subsumption.
Abstract: We present a multi-document summarizer, MEAD, which generates summaries using cluster centroids produced by a topic detection and tracking system. We describe two new techniques, a centroid-based summarizer, and an evaluation scheme based on sentence utility and subsumption. We have applied this evaluation to both single and multiple document summaries. Finally, we describe two user studies that test our models of multi-document summarization.

1,121 citations
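
The centroid idea in the abstract above can be illustrated with a minimal sketch. This is not MEAD's implementation: the tokenizer, the TF-IDF weighting, and the function name `centroid_summary` are assumptions made for illustration, and the sketch scores sentences only by their overlap with a cluster centroid.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokenizer (an assumption; any tokenizer would do)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def centroid_summary(documents, num_sentences=2):
    """Pick the sentences that best match the cluster centroid.

    The centroid here is a bag of words weighted by average term
    frequency times a smoothed inverse document frequency. This weighting
    is a hypothetical choice, not the one used by MEAD.
    """
    # Split every document in the cluster into sentences.
    sentences = []
    for doc in documents:
        for sent in re.split(r"(?<=[.!?])\s+", doc.strip()):
            if sent:
                sentences.append(sent)

    n_docs = len(documents)
    doc_tokens = [tokenize(doc) for doc in documents]

    # Document frequency and total term frequency over the cluster.
    df = Counter()
    tf = Counter()
    for tokens in doc_tokens:
        df.update(set(tokens))
        tf.update(tokens)

    centroid = {w: (tf[w] / n_docs) * math.log(1 + n_docs / df[w]) for w in tf}

    # Score each sentence by the centroid weight of the distinct words it contains.
    scored = [
        (sum(centroid.get(w, 0.0) for w in set(tokenize(sent))), i)
        for i, sent in enumerate(sentences)
    ]

    # Take the top-scoring sentences, then restore original order.
    top = sorted(scored, key=lambda pair: (-pair[0], pair[1]))[:num_sentences]
    return [sentences[i] for _, i in sorted(top, key=lambda pair: pair[1])]
```

Called on a small cluster of related documents, the function returns the sentences whose vocabulary is most central to the cluster, in their original order.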

Proceedings Article
03 Nov 2003
TL;DR: The results using the JHU summary corpus indicate that RU is a reasonable and often superior alternative to several common evaluation metrics.
Abstract: We present a series of experiments to demonstrate the validity of Relative Utility (RU) as a measure for evaluating extractive summarizers. RU is applicable in both single-document and multi-document summarization, is extendable to arbitrary compression rates with no extra annotation effort, and takes into account both random system performance and interjudge agreement. Our results using the JHU summary corpus indicate that RU is a reasonable and often superior alternative to several common evaluation metrics.

48 citations

Patent
10 Oct 2006
TL;DR: The Source Code Author Identifier (SCAI) automates the otherwise manual process of comparing versions of a source code file to identify changes, and associates each identified change with the author who made it.
Abstract: A Source Code Author Identifier (SCAI) automates the process of manually running a comparison to identify changes between versions of a source code file and associates identified changes with the author who made the change. After a developer identifies a segment of code in a first file, wherein the first file is a newer version of a second file, SCAI compares the segment of code in the first file to a corresponding segment of code in the second file. SCAI identifies the author of the first file whenever a difference is detected between the segment of code in the first file and the corresponding segment of code in the second file. SCAI displays the author of the first file next to the detected difference between the segment of code from the first file and the corresponding segment of code from the second file. SCAI can repeat the comparison across a plurality of versions of the file, comparing each version with the previously created version.

47 citations
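
The version-by-version comparison described in the SCAI abstract can be sketched as follows. This is only an illustration under stated assumptions, not the patented method: the function name `attribute_changes` and the `(author, text)` input format are hypothetical, and the diff itself is delegated to Python's standard `difflib` module.

```python
import difflib


def attribute_changes(versions):
    """Attribute changed lines to the author of the version that introduced them.

    `versions` is a list of (author, text) pairs ordered oldest first
    (a hypothetical input format). Each version is compared with the one
    created just before it, and every line that was inserted or replaced
    is credited to the author of the newer version.
    """
    changes = []
    for (_, old_text), (author, new_text) in zip(versions, versions[1:]):
        old_lines = old_text.splitlines()
        new_lines = new_text.splitlines()
        matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
        for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
            # "replace" and "insert" are the opcodes that add new text.
            if tag in ("replace", "insert"):
                for line in new_lines[j1:j2]:
                    changes.append((author, line))
    return changes


history = [
    ("alice", "x = 1\ny = 2\n"),
    ("bob", "x = 1\ny = 3\n"),
    ("carol", "x = 1\ny = 3\nz = 4\n"),
]
for author, line in attribute_changes(history):
    print(f"{author}: {line}")
```

The sketch ignores deleted lines and does not restrict the comparison to a developer-selected segment, both of which a fuller treatment would need to address.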

Patent
28 Sep 2006
TL;DR: A method and system for providing information obtained from both online and offline stores and for offering customers more purchasing options. A customer can specify a particular item with detailed aspects for an information search, along with location information to define a local geographic area.
Abstract: The present invention is directed to a method and system for providing information obtained from both online stores and offline stores and for offering more purchasing options to customers. A customer can specify a particular item with detailed aspects for an information search, along with location information to define a local geographic area. The gathered local price information is presented to the customer over a network connection. The customer can purchase the item from either online stores or offline stores over a network connection.

42 citations

01 Jan 2003
TL;DR: Reports the results of Michigan's participation in DUC 2003. Measured by mean length-adjusted coverage, the system obtained the best score of all systems on task 4 (question-focused summaries).
Abstract: We present the results of Michigan’s participation in DUC 2003. Using mean length-adjusted coverage, we obtained the best score of all systems on task 4 - question-focused summaries.

36 citations


Cited by
Book
27 Jun 2011
TL;DR: The challenges that remain open, in particular the need for language generation and deeper semantic understanding of language that would be necessary for future advances in the field are discussed.
Abstract: It has now been 50 years since the publication of Luhn’s seminal paper on automatic summarization. During these years the practical need for automatic summarization has become increasingly urgent and numerous papers have been published on the topic. As a result, it has become harder to find a single reference that gives an overview of past efforts or a complete view of summarization tasks and necessary system components. This article attempts to fill this void by providing a comprehensive overview of research in summarization, including the more traditional efforts in sentence extraction as well as the most novel recent approaches for determining important content, for domain and genre specific summarization and for evaluation of summarization. We also discuss the challenges that remain open, in particular the need for language generation and deeper semantic understanding of language that would be necessary for future advances in the field. We would like to thank the anonymous reviewers, our students and Noemie Elhadad, Hongyan Jing, Julia Hirschberg, Annie Louis, Smaranda Muresan and Dragomir Radev for their helpful feedback. This paper was supported in part by the U.S. National Science Foundation (NSF) under IIS-05-34871 and CAREER 09-53445. Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the NSF. Full text available at: http://dx.doi.org/10.1561/1500000015

697 citations

Journal Article
TL;DR: A comprehensive survey of extractive text summarization approaches developed in the last decade, concluding with a discussion of future directions that can help researchers identify areas where further research is needed.
Abstract: As information is available in abundance for every topic on the internet, condensing the important information in the form of a summary would benefit a number of users. Hence, there is growing interest among the research community in developing new approaches to automatically summarize text. An automatic text summarization system generates a summary, i.e. a short text that includes all the important information of the document. Since the advent of text summarization in the 1950s, researchers have been trying to improve techniques for generating summaries so that a machine-generated summary matches a human-made summary. A summary can be generated through extractive as well as abstractive methods. Abstractive methods are highly complex as they need extensive natural language processing. Therefore, the research community is focusing more on extractive summaries, trying to achieve more coherent and meaningful summaries. Over the past decade, several extractive approaches have been developed for automatic summary generation that implement a number of machine learning and optimization techniques. This paper presents a comprehensive survey of recent extractive text summarization approaches developed in the last decade. Their needs are identified and their advantages and disadvantages are listed in a comparative manner. A few abstractive and multilingual text summarization approaches are also covered. Summary evaluation is another challenging issue in this research field. Therefore, both intrinsic and extrinsic methods of summary evaluation are described in detail, along with text summarization evaluation conferences and workshops. Furthermore, evaluation results of extractive summarization approaches are presented on some shared DUC datasets. Finally, this paper concludes with a discussion of useful future directions that can help researchers identify areas where further research is needed.

581 citations

Journal Article
TL;DR: In this article, the authors examined three major online review platforms, TripAdvisor, Expedia, and Yelp, in terms of information quality related to online reviews about the entire hotel population in Manhattan, New York City.

549 citations

Book Chapter
01 Jan 2012
TL;DR: This chapter gives a broad overview of existing approaches based on how representation, sentence scoring or summary selection strategies alter the overall performance of the summarizer, and points out some of the peculiarities of the task of summarization.
Abstract: Numerous approaches for identifying important content for automatic text summarization have been developed to date. Topic representation approaches first derive an intermediate representation of the text that captures the topics discussed in the input. Based on these representations of topics, sentences in the input document are scored for importance. In contrast, in indicator representation approaches, the text is represented by a diverse set of possible indicators of importance which do not aim at discovering topicality. These indicators are combined, very often using machine learning techniques, to score the importance of each sentence. Finally, a summary is produced by selecting sentences in a greedy approach, choosing the sentences that will go in the summary one by one, or globally optimizing the selection, choosing the best set of sentences to form a summary. In this chapter we give a broad overview of existing approaches based on these distinctions, with particular attention on how representation, sentence scoring or summary selection strategies alter the overall performance of the summarizer. We also point out some of the peculiarities of the task of summarization which have posed challenges to machine learning approaches for the problem, and some of the suggested solutions.

538 citations
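
The distinction the chapter abstract draws between greedy and globally optimized sentence selection can be made concrete with a small sketch. Everything here is an assumption for illustration: the function names, the word-count budget, and the use of exhaustive search as a stand-in for global optimization (practical systems would use a more scalable optimizer).

```python
from itertools import combinations


def greedy_select(sentences, scores, budget):
    """Add sentences one by one, best score first, while they fit the word budget."""
    chosen, used = [], 0
    remaining = set(range(len(sentences)))
    while remaining:
        # Highest score wins; ties go to the earlier sentence.
        best = max(remaining, key=lambda i: (scores[i], -i))
        remaining.remove(best)
        length = len(sentences[best].split())
        if used + length <= budget:
            chosen.append(best)
            used += length
    return sorted(chosen)


def global_select(sentences, scores, budget):
    """Try every subset and keep the best-scoring one that fits the budget.

    Exhaustive search is exponential in the number of sentences and is
    used here only to show what a global optimum looks like.
    """
    best_set, best_score = (), 0.0
    indices = range(len(sentences))
    for k in range(1, len(sentences) + 1):
        for combo in combinations(indices, k):
            length = sum(len(sentences[i].split()) for i in combo)
            total = sum(scores[i] for i in combo)
            if length <= budget and total > best_score:
                best_set, best_score = combo, total
    return sorted(best_set)


sentences = ["a b c d e", "f g h", "i j k"]
scores = [5.0, 3.0, 3.0]
print(greedy_select(sentences, scores, budget=6))  # [0]
print(global_select(sentences, scores, budget=6))  # [1, 2]
```

In this toy input the greedy strategy commits to the single highest-scoring sentence and then has no room left, while the global search finds that the two shorter sentences together score higher within the same budget.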