Author

Robert E. Frederking

Other affiliations: Siemens
Bio: Robert E. Frederking is an academic researcher at Carnegie Mellon University. The author has contributed to research on machine translation and speech translation, has an h-index of 18, and has co-authored 70 publications that have received 1389 citations.


Papers
Proceedings Article
01 Jan 1997
TL;DR: New TIR methods are introduced and it is shown that using bilingual corpora for extraction of term equivalences in context outperforms other methods.
Abstract: Translingual information retrieval (TIR) consists of providing a query in one language and searching document collections in one or more different languages. This paper introduces new TIR methods and reports on comparative TIR experiments with these new methods and with previously reported ones in a realistic setting. Methods fall into two categories: query-translation-based approaches and statistical IR approaches establishing translingual associations. The results show that using bilingual corpora for automated extraction of term equivalences in context outperforms other methods. Translingual versions of the Generalized Vector Space Model (GVSM) and Latent Semantic Indexing (LSI) perform relatively well, as does translingual pseudo relevance feedback (PRF). All showed relatively small performance loss between monolingual and translingual versions. Query translation based on a general machine-readable bilingual dictionary, heretofore the most popular method, did not match the performance of other, more sophisticated methods. Also, the previous very high LSI results in the literature were disconfirmed by more realistic relevance-based evaluations.

199 citations

01 Jan 2011
TL;DR: A study on automatically clustering and classifying Twitter messages into different categories, inspired by the approaches taken by news aggregating services like Google News, suggests that the clusters produced by traditional unsupervised methods can often be incoherent from a topical perspective.
Abstract: In the emerging field of micro-blogging and social communication services, users post millions of short messages every day. Keeping track of all the messages posted by your friends and the conversation as a whole can become tedious or even impossible. In this paper, we present a study on automatically clustering and classifying Twitter messages, also known as “tweets”, into different categories, inspired by the approaches taken by news aggregating services like Google News. Our results suggest that the clusters produced by traditional unsupervised methods can often be incoherent from a topical perspective, but that a supervised methodology that uses hash-tags as indicators of topics produces surprisingly good results. We also offer a discussion on temporal effects of our methodology and training set size considerations. Lastly, we describe a simple method of finding the most representative tweet in a cluster, and provide an analysis of the results.

169 citations

Proceedings Article
13 Oct 1994
TL;DR: In the latest version of the Pangloss MT project, the results of three independent translation engines are collected in a chart data structure, scored with simple heuristics, and combined by finding the highest-scoring sequence of chunks (the "best cover").
Abstract: Machine translation (MT) systems do not currently achieve optimal quality translation on free text, whatever translation method they employ. Our hypothesis is that the quality of MT will improve if an MT environment uses output from a variety of MT systems working on the same text. In the latest version of the Pangloss MT project, we collect the results of three translation engines---typically, subsentential chunks---in a chart data structure. Since the individual MT systems operate completely independently, their results may be incomplete, conflicting, or redundant. We use simple scoring heuristics to estimate the quality of each chunk, and find the highest-score sequence of chunks (the "best cover"). This paper describes in detail the combining method, presenting the algorithm and illustrations of its progress on one of many actual translations it has produced. It uses dynamic programming to efficiently compare weighted averages of sets of adjacent scored component translations. The current system operates primarily in a human-aided MT mode. The translation delivery system and its associated post-editing aide are briefly described, as is an initial evaluation of the usefulness of this method. Individual MT engines will be reported separately and are not, therefore, described in detail here.

162 citations
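
The "best cover" search described in the Pangloss abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the `Chunk` type, the `best_cover` function, the score scale, and the choice to weight each chunk's score by the number of source words it covers are all assumptions made for illustration. The abstract only states that dynamic programming is used to compare weighted averages of adjacent scored chunks.

```python
from typing import List, NamedTuple, Optional, Tuple


class Chunk(NamedTuple):
    """One chart entry (hypothetical layout, not taken from the paper)."""
    start: int    # index of the first source word covered
    end: int      # index one past the last source word covered
    score: float  # heuristic quality estimate for this chunk
    text: str     # candidate translation of source[start:end]


def best_cover(chunks: List[Chunk], n: int) -> Tuple[float, List[Chunk]]:
    """Return (length-weighted average score, chunk sequence) covering words 0..n.

    Assumption: a cover is a sequence of adjacent, non-overlapping chunks, and
    its quality is the average chunk score weighted by chunk length.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    # best[i] holds (weighted score total, path) for the best cover of source[0:i].
    best: List[Optional[Tuple[float, List[Chunk]]]] = [None] * (n + 1)
    best[0] = (0.0, [])
    for end in range(1, n + 1):
        for c in chunks:
            # Only consider well-formed chunks that finish exactly at `end`
            # and start at a position that is itself reachable.
            if c.end != end or not (0 <= c.start < c.end):
                continue
            prefix = best[c.start]
            if prefix is None:
                continue
            total = prefix[0] + c.score * (c.end - c.start)
            if best[end] is None or total > best[end][0]:
                best[end] = (total, prefix[1] + [c])
    if best[n] is None:
        raise ValueError("no sequence of chunks covers the whole input")
    total, path = best[n]
    return total / n, path


if __name__ == "__main__":
    chart = [
        Chunk(0, 2, 0.9, "the house"),
        Chunk(0, 1, 0.5, "the"),
        Chunk(1, 2, 0.4, "home"),
        Chunk(1, 3, 0.6, "home is"),
        Chunk(2, 3, 0.8, "is"),
    ]
    average, cover = best_cover(chart, 3)
    print(round(average, 3), [c.text for c in cover])
```

Because every complete cover spans the same number of source words, maximizing the length-weighted average is equivalent to maximizing the weighted sum, which is what lets the table be filled position by position.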

Journal Article
TL;DR: The results show that using bilingual corpora for automated extraction of term equivalences in context outperforms dictionary-based methods and performs comparably to other statistical corpus-based methods.

107 citations

Proceedings Article
01 May 2012
TL;DR: This article investigates additional semantic features, including signal words and Freebase categories, and pre-processing steps to improve automatic key phrase extraction; some of these features lead to significant improvements in accuracy.
Abstract: Fast and effective automated indexing is critical for search and personalized services. Key phrases that consist of one or more words and represent the main concepts of the document are often used for the purpose of indexing. In this paper, we investigate the use of additional semantic features and pre-processing steps to improve automatic key phrase extraction. These features include the use of signal words and Freebase categories. Some of these features lead to significant improvements in the accuracy of the results. We also experimented with two forms of document pre-processing that we call light filtering and co-reference normalization. Light filtering removes sentences that are judged peripheral to the document's main content. Co-reference normalization unifies several written forms of the same named entity into a unique form. We also needed a “Gold Standard”: a set of labeled documents for training and evaluation. While the subjective nature of key phrase selection precludes a true “Gold Standard”, we used Amazon's Mechanical Turk service to obtain a useful approximation. Our data indicate that the biggest improvements in performance were due to shallow semantic features, news categories, and rhetorical signals (nDCG 78.47% vs. 68.93%). The inclusion of deeper semantic features such as Freebase sub-categories was not beneficial by itself, but in combination with pre-processing, did cause slight improvements in the nDCG scores.

56 citations


Cited by
01 Mar 1999

3,234 citations

Proceedings Article
08 Aug 2006
TL;DR: A new, intuitive measure for evaluating machine translation output that avoids the knowledge intensiveness of more meaning-based approaches, and the labor-intensiveness of human judgments is defined.
Abstract: We examine a new, intuitive measure for evaluating machine-translation output that avoids the knowledge intensiveness of more meaning-based approaches, and the labor-intensiveness of human judgments. Translation Edit Rate (TER) measures the amount of editing that a human would have to perform to change a system output so it exactly matches a reference translation. We show that the single-reference variant of TER correlates as well with human judgments of MT quality as the four-reference variant of BLEU. We also define a human-targeted TER (or HTER) and show that it yields higher correlations with human judgments than BLEU—even when BLEU is given human-targeted references. Our results indicate that HTER correlates with human judgments better than HMETEOR and that the four-reference variants of TER and HTER correlate with human judgments as well as—or better than—a second human judgment does.

2,210 citations
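
The edit-counting idea behind TER in the abstract above can be sketched as follows. This is a simplified illustration and not the published metric: full TER also allows shifts of word sequences as a single edit, which this sketch omits, so what it computes is a plain word-level edit rate against a single reference. The function names `word_edit_distance` and `edit_rate` are hypothetical.

```python
from typing import List


def word_edit_distance(hyp: List[str], ref: List[str]) -> int:
    """Minimum number of word insertions, deletions, and substitutions
    needed to turn `hyp` into `ref` (standard Levenshtein distance)."""
    # prev[j] is the distance between the hypothesis words seen so far
    # and the first j reference words.
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            substitution = prev[j - 1] + (0 if h == r else 1)
            deletion = prev[j] + 1       # drop hypothesis word h
            insertion = curr[j - 1] + 1  # add reference word r
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def edit_rate(hypothesis: str, reference: str) -> float:
    """Edits per reference word; shifts are NOT modelled (simplification)."""
    hyp, ref = hypothesis.split(), reference.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return word_edit_distance(hyp, ref) / len(ref)


if __name__ == "__main__":
    print(edit_rate("the cat sat on mat", "the cat sat on the mat"))
```

A lower value means fewer edits are needed to match the reference. The human-targeted variant (HTER) described in the abstract would apply the same kind of computation against a reference that a human has adapted to the system output.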

Journal Article
TL;DR: In this book, Barwise and Perry tackle the slippery subject of "meaning," a subject that has long vexed linguists, language philosophers, and logicians.
Abstract: In this provocative book, Barwise and Perry tackle the slippery subject of "meaning," a subject that has long vexed linguists, language philosophers, and logicians.

1,834 citations

Journal Article
TL;DR: This paper provides an extensive survey of mobile cloud computing research, highlights the specific concerns in mobile cloud computing, presents a taxonomy based on the key issues in this area, and discusses the different approaches taken to tackle these issues.

1,671 citations