Author

Lev Finkelstein

Bio: Lev Finkelstein is an academic researcher from Technion – Israel Institute of Technology. The author has contributed to research in topics: Scheduling (computing) & Schedule. The author has an h-index of 9 and has co-authored 11 publications receiving 2,550 citations.

Papers
Journal Article
TL;DR: A new conceptual paradigm for performing search in context is presented that largely automates the search process, providing even non-professional users with highly relevant results.
Abstract: Keyword-based search engines are in widespread use today as a popular means for Web-based information retrieval. Although such systems seem deceptively simple, a considerable amount of skill is required in order to satisfy non-trivial information needs. This paper presents a new conceptual paradigm for performing search in context that largely automates the search process, providing even non-professional users with highly relevant results. This paradigm is implemented in practice in the IntelliZap system, where search is initiated from a text query marked by the user in a document she views, and is guided by the text surrounding the marked query in that document (“the context”). The context-driven information retrieval process involves semantic keyword extraction and clustering to automatically generate new, augmented queries. The latter are submitted to a host of general and domain-specific search engines. Search results are then semantically reranked, using context. Experimental results testify that using context to guide search effectively offers even inexperienced users an advanced search tool on the Web.

1,615 citations

Proceedings ArticleDOI
01 Apr 2001
TL;DR: A new conceptual paradigm for performing search in context is presented that largely automates the search process, providing even non-professional users with highly relevant results.
Abstract: Keyword-based search engines are in widespread use today as a popular means for Web-based information retrieval. Although such systems seem deceptively simple, a considerable amount of skill is required in order to satisfy non-trivial information needs. This paper presents a new conceptual paradigm for performing search in context that largely automates the search process, providing even non-professional users with highly relevant results. This paradigm is implemented in practice in the IntelliZap system, where search is initiated from a text query marked by the user in a document she views, and is guided by the text surrounding the marked query in that document (“the context”). The context-driven information retrieval process involves semantic keyword extraction and clustering to automatically generate new, augmented queries. The latter are submitted to a host of general and domain-specific search engines. Search results are then semantically reranked, using context. Experimental results testify that using context to guide search effectively offers even inexperienced users an advanced search tool on the Web.

922 citations
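The pipeline described above (keyword extraction from the surrounding context, query augmentation, semantic reranking) can be illustrated with a minimal sketch. The function names and the bag-of-words cosine scoring below are illustrative assumptions, not the IntelliZap implementation, which uses semantic networks and a host of search engines.

```python
# Minimal sketch of context-driven search: extract keywords from the text
# surrounding a marked query, augment the query, then rerank results by
# similarity to the context. The scoring is a toy bag-of-words cosine,
# not IntelliZap's semantic reranking.
from collections import Counter
import math

STOPWORDS = {"the", "a", "an", "of", "in", "and", "to", "is", "for", "at"}

def tokens(text):
    return [w for w in text.lower().split() if w.isalpha() and w not in STOPWORDS]

def context_keywords(context, k=3):
    """Pick the k most frequent content words from the surrounding text."""
    return [w for w, _ in Counter(tokens(context)).most_common(k)]

def augment_query(query, context, k=3):
    """Form an augmented query: the marked text plus context keywords."""
    return query + " " + " ".join(context_keywords(context, k))

def cosine(a, b):
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[w] * cb[w] for w in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

def rerank(results, context):
    """Reorder search results by similarity to the context."""
    ctx = tokens(context)
    return sorted(results, key=lambda r: cosine(tokens(r), ctx), reverse=True)

context = "The jaguar is a large cat native to the Americas and hunts at night"
print(augment_query("jaguar", context))
print(rerank(["jaguar cat habitat and prey", "jaguar car dealership prices"], context))
```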

Proceedings ArticleDOI
28 Jul 2002
TL;DR: This paper provides a run-time schedule for m parallel processors, proves that it is optimal for all m, and offers general guidelines for the use of parallel processors in the design of real-time systems.
Abstract: Anytime algorithms offer a tradeoff between computation time and the quality of the result returned. They can be divided into two classes: contract algorithms, for which the total run time must be specified in advance, and interruptible algorithms, which can be queried at any time for a solution. An interruptible algorithm can be constructed from a contract algorithm by repeatedly activating the contract algorithm with increasing run times. The acceleration ratio of a run-time schedule is a worst-case measure of how inefficient the constructed interruptible algorithm is compared to the contract algorithm. The smallest acceleration ratio achievable on a single processor is known. Using multiple processors, smaller acceleration ratios are possible. In this paper, we provide a schedule for m processors and prove that it is optimal for all m. Our results provide general guidelines for the use of parallel processors in the design of real-time systems.

27 citations
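For intuition on the single-processor case: running the contract algorithm with doubling contract lengths 1, 2, 4, … achieves the known single-processor optimum acceleration ratio of 4. The sketch below computes that ratio empirically; it illustrates the classic one-processor construction, not the paper's m-processor schedule.

```python
# Minimal sketch (single processor): build an interruptible algorithm by
# running a contract algorithm with doubling run times 1, 2, 4, 8, ...
# The acceleration ratio is the worst case, over interruption times t, of
# t divided by the longest contract completed by time t. Doubling yields
# the known single-processor optimum of 4.

def acceleration_ratio(num_contracts=30):
    lengths = [2.0 ** i for i in range(num_contracts)]
    worst = 0.0
    elapsed = 0.0
    for i, run in enumerate(lengths):
        elapsed += run
        # Worst interruption: just before the *next* contract finishes,
        # when the best completed contract still has length `run`.
        if i + 1 < len(lengths):
            t = elapsed + lengths[i + 1]
            worst = max(worst, t / run)
    return worst

print(acceleration_ratio())  # approaches 4 as the schedule grows
```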

Proceedings Article
09 Aug 2003
TL;DR: It is demonstrated that search strategies and contract schedules are formally equivalent, and a formula relating the acceleration ratio of a schedule to the time-competitive ratio of the corresponding search strategy is derived.
Abstract: We study two apparently different, but formally similar, scheduling problems. The first problem involves contract algorithms, which can trade off run time for solution quality, as long as the amount of available run time is known in advance. The problem is to schedule contract algorithms to run on parallel processors, under the condition that an interruption can occur at any time, and upon interruption a solution to any one of a number of problems can be requested. Schedules are compared in terms of acceleration ratio, which is a worst-case measure of efficiency. We provide a schedule and prove its optimality among a particular class of schedules. Our second problem involves multiple robots searching for a goal on one of multiple rays. Search strategies are compared in terms of time-competitive ratio, the ratio of the total search time to the time it would take for one robot to traverse directly to the goal. We demonstrate that search strategies and contract schedules are formally equivalent. In addition, for our class of schedules, we derive a formula relating the acceleration ratio of a schedule to the time-competitive ratio of the corresponding search strategy.

26 citations
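The ray-search side of the equivalence is easy to simulate. The sketch below evaluates the classic cyclic geometric strategy for a single robot on m rays (turn depths r^i with r = m/(m-1)); its time-competitive ratio approaches the known optimum 1 + 2*m^m/(m-1)^(m-1), which is 9 for the two-ray cow-path problem. This is a single-robot illustration, not the paper's multi-robot strategies.

```python
# Minimal sketch: one robot searching m rays with a cyclic geometric
# strategy (turn depths r**i, visiting rays round-robin). The worst case
# is a goal placed just beyond a turn point: it is found only on the next
# visit to that ray, after excursions on the other m - 1 rays.

def competitive_ratio(m, steps=200):
    r = m / (m - 1)
    depths = [r ** i for i in range(steps)]
    worst = 0.0
    for i in range(steps - m):
        # Goal just beyond depths[i]: robot has fully traversed (out and
        # back) every excursion before reaching it again on step i + m.
        traveled = 2 * sum(depths[: i + m]) + depths[i]
        worst = max(worst, traveled / depths[i])
    return worst

for m in (2, 3, 4):
    print(m, competitive_ratio(m))  # ~9, ~14.5, ~19.96
```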

Patent
14 Jun 2001
TL;DR: A method and system for classification, including the steps of searching a data structure including categories for elements related to an input, calculating statistics describing the relevance of each of the elements to the input, ranking the elements by relevance to the input, determining if the ranked elements exceed a threshold confidence value, and returning a set of elements from the ranked elements when the threshold confidence value is exceeded.
Abstract: A method and system for classification, including the steps of searching a data structure including categories for elements related to an input, calculating statistics describing the relevance of each of the elements to the input, ranking the elements by relevance to the input, determining if the ranked elements exceed a threshold confidence value, and returning a set of elements from the ranked elements when the threshold confidence value is exceeded.

21 citations
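A minimal sketch of the claimed pipeline follows, with a Jaccard-overlap score standing in for the relevance statistics (the patent does not prescribe a particular statistic, so the scoring, threshold, and category structure here are all illustrative assumptions).

```python
# Minimal sketch of the claimed pipeline: search a category structure for
# elements related to an input, score each element's relevance, rank, and
# return results only when the top score clears a confidence threshold.

def classify(input_text, categories, threshold=0.3, top_k=3):
    query = set(input_text.lower().split())
    scored = []
    for name, keywords in categories.items():
        kw = set(k.lower() for k in keywords)
        # Relevance statistic (a stand-in): Jaccard overlap with the input.
        score = len(query & kw) / len(query | kw) if query | kw else 0.0
        scored.append((score, name))
    scored.sort(reverse=True)                 # rank by relevance
    if scored and scored[0][0] >= threshold:  # confidence check
        return [name for score, name in scored[:top_k] if score > 0]
    return []                                 # below confidence: no answer

categories = {
    "databases": ["sql", "index", "query", "transaction"],
    "networking": ["tcp", "packet", "router", "query"],
}
print(classify("sql query index tuning", categories))
```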


Cited by
Proceedings ArticleDOI
01 Oct 2014
TL;DR: A new global log-bilinear regression model that combines the advantages of the two major model families in the literature, global matrix factorization and local context window methods, and produces a vector space with meaningful substructure.
Abstract: Recent methods for learning vector space representations of words have succeeded in capturing fine-grained semantic and syntactic regularities using vector arithmetic, but the origin of these regularities has remained opaque. We analyze and make explicit the model properties needed for such regularities to emerge in word vectors. The result is a new global log-bilinear regression model that combines the advantages of the two major model families in the literature: global matrix factorization and local context window methods. Our model efficiently leverages statistical information by training only on the nonzero elements in a word-word co-occurrence matrix, rather than on the entire sparse matrix or on individual context windows in a large corpus. The model produces a vector space with meaningful substructure, as evidenced by its performance of 75% on a recent word analogy task. It also outperforms related models on similarity tasks and named entity recognition.

30,558 citations
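The core of the model is a weighted least-squares objective fit only on nonzero co-occurrence counts: minimize the sum over nonzero X_ij of f(X_ij) * (w_i . c_j + b_i + b_j - log X_ij)^2, with f(x) = min(1, (x/x_max)^0.75). Below is a toy sketch of that objective with plain SGD; the paper uses AdaGrad and real corpora, and the co-occurrence counts here are made up.

```python
# Minimal sketch of the GloVe objective: fit word/context vectors so that
# w_i . c_j + b_i + b_j approximates log X_ij, training only on nonzero
# entries of the co-occurrence matrix, each weighted by f(X_ij).
import numpy as np

rng = np.random.default_rng(0)
V, d = 5, 4                                                   # vocab size, dim
cooc = {(0, 1): 10.0, (0, 2): 3.0, (1, 2): 7.0, (3, 4): 2.0}  # nonzero X_ij

W = rng.normal(scale=0.1, size=(V, d))   # word vectors
C = rng.normal(scale=0.1, size=(V, d))   # context vectors
bw = np.zeros(V); bc = np.zeros(V)       # biases
x_max, lr = 10.0, 0.05

for epoch in range(500):
    for (i, j), x in cooc.items():
        f = min(1.0, (x / x_max) ** 0.75)            # weighting function
        err = W[i] @ C[j] + bw[i] + bc[j] - np.log(x)
        g = 2 * f * err
        W[i], C[j] = W[i] - lr * g * C[j], C[j] - lr * g * W[i]
        bw[i] -= lr * g; bc[j] -= lr * g

# After training, dot products track log co-occurrence counts:
for (i, j), x in cooc.items():
    print(i, j, round(W[i] @ C[j] + bw[i] + bc[j], 2), round(np.log(x), 2))
```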

Journal ArticleDOI
TL;DR: This paper proposes a new approach based on the skip-gram model, in which each word is represented as a bag of character n-grams and its vector is the sum of the n-gram representations, allowing models to be trained on large corpora quickly and word representations to be computed for words that did not appear in the training data.
Abstract: Continuous word representations, trained on large unlabeled corpora, are useful for many natural language processing tasks. Popular models that learn such representations ignore the morphology of words by assigning a distinct vector to each word. This is a limitation, especially for languages with large vocabularies and many rare words. In this paper, we propose a new approach based on the skip-gram model, where each word is represented as a bag of character n-grams. A vector representation is associated with each character n-gram, and words are represented as the sum of these representations. Our method is fast, allowing models to be trained on large corpora quickly, and allows us to compute word representations for words that did not appear in the training data. We evaluate our word representations on nine different languages, on both word similarity and analogy tasks. By comparing to recently proposed morphological word representations, we show that our vectors achieve state-of-the-art performance on these tasks.

7,537 citations
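The subword representation itself is simple to sketch: decompose a word into character n-grams with boundary markers, hash each n-gram into a bucket table, and sum the bucket vectors. The tiny table and random vectors below are stand-ins for trained parameters; the paper hashes into roughly two million buckets and also keeps a vector for the whole word.

```python
# Minimal sketch of the subword idea: a word's vector is the sum of the
# vectors of its character n-grams (with < and > boundary markers),
# hashed into a fixed bucket table.
import numpy as np

rng = np.random.default_rng(0)
NUM_BUCKETS, DIM = 4096, 8                 # tiny demo table, random vectors
table = rng.normal(size=(NUM_BUCKETS, DIM))

def char_ngrams(word, n_min=3, n_max=6):
    """All character n-grams of the word, with boundary markers."""
    w = f"<{word}>"
    return [w[i:i + n] for n in range(n_min, n_max + 1)
            for i in range(len(w) - n + 1)]

def word_vector(word):
    """Sum the hashed n-gram vectors to get the word representation."""
    idx = [hash(g) % NUM_BUCKETS for g in char_ngrams(word)]
    return table[idx].sum(axis=0)

print(char_ngrams("where")[:4])    # ['<wh', 'whe', 'her', 'ere']
print(word_vector("where").shape)  # (8,) -- works for unseen words too,
                                   # since their n-grams overlap with
                                   # words seen in training
```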

Book
01 Dec 1999
TL;DR: It is now clear that HAL's creator, Arthur C. Clarke, was a little optimistic in predicting when an artificial agent such as HAL would be available, as discussed by the authors.
Abstract: …is one of the most recognizable characters in 20th century cinema. HAL is an artificial agent capable of such advanced language behavior as speaking and understanding English, and at a crucial moment in the plot, even reading lips. It is now clear that HAL's creator, Arthur C. Clarke, was a little optimistic in predicting when an artificial agent such as HAL would be available. But just how far off was he? What would it take to create at least the language-related parts of HAL? We call programs like HAL that converse with humans in natural…

3,077 citations

Posted Content
TL;DR: A new approach based on the skip-gram model, where each word is represented as a bag of character n-grams and its vector is the sum of these n-gram representations, achieving state-of-the-art performance on word similarity and analogy tasks.
Abstract: Continuous word representations, trained on large unlabeled corpora, are useful for many natural language processing tasks. Popular models that learn such representations ignore the morphology of words by assigning a distinct vector to each word. This is a limitation, especially for languages with large vocabularies and many rare words. In this paper, we propose a new approach based on the skip-gram model, where each word is represented as a bag of character n-grams. A vector representation is associated with each character n-gram, and words are represented as the sum of these representations. Our method is fast, allowing models to be trained on large corpora quickly, and allows us to compute word representations for words that did not appear in the training data. We evaluate our word representations on nine different languages, on both word similarity and analogy tasks. By comparing to recently proposed morphological word representations, we show that our vectors achieve state-of-the-art performance on these tasks.

2,425 citations

Proceedings Article
06 Jan 2007
TL;DR: This work proposes Explicit Semantic Analysis (ESA), a novel method that represents the meaning of texts in a high-dimensional space of concepts derived from Wikipedia that results in substantial improvements in correlation of computed relatedness scores with human judgments.
Abstract: Computing semantic relatedness of natural language texts requires access to vast amounts of common-sense and domain-specific world knowledge. We propose Explicit Semantic Analysis (ESA), a novel method that represents the meaning of texts in a high-dimensional space of concepts derived from Wikipedia. We use machine learning techniques to explicitly represent the meaning of any text as a weighted vector of Wikipedia-based concepts. Assessing the relatedness of texts in this space amounts to comparing the corresponding vectors using conventional metrics (e.g., cosine). Compared with the previous state of the art, using ESA results in substantial improvements in correlation of computed relatedness scores with human judgments: from r = 0.56 to 0.75 for individual words and from r = 0.60 to 0.72 for texts. Importantly, due to the use of natural concepts, the ESA model is easy to explain to human users.

2,285 citations
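A minimal sketch of the ESA representation: an inverted index maps each word to its TF-IDF weights over Wikipedia concepts, a text's vector is the sum of its words' concept vectors, and relatedness is the cosine between those vectors. The hand-made index and weights below stand in for one built from an actual Wikipedia dump.

```python
# Minimal sketch of Explicit Semantic Analysis: texts are represented as
# weighted vectors of Wikipedia-derived concepts, and relatedness is the
# cosine between concept vectors.
import math
from collections import defaultdict

# word -> {concept: tf-idf weight}: a toy inverted index over concepts
INDEX = {
    "jaguar": {"Jaguar (animal)": 0.9, "Jaguar Cars": 0.7},
    "cat":    {"Jaguar (animal)": 0.6, "Felidae": 0.9},
    "engine": {"Jaguar Cars": 0.8, "Internal combustion engine": 0.9},
}

def concept_vector(text):
    """Sum the concept vectors of the text's words."""
    vec = defaultdict(float)
    for word in text.lower().split():
        for concept, weight in INDEX.get(word, {}).items():
            vec[concept] += weight
    return vec

def relatedness(a, b):
    """Cosine similarity in concept space."""
    va, vb = concept_vector(a), concept_vector(b)
    dot = sum(va[c] * vb[c] for c in va)
    na = math.sqrt(sum(v * v for v in va.values()))
    nb = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

print(relatedness("jaguar cat", "jaguar engine"))  # shared concepts -> positive
print(relatedness("cat", "engine"))                # disjoint concepts -> 0.0
```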