
Showing papers on "Document retrieval" published in 1970


Journal Article
Gerard Salton
TL;DR: The methods are evaluated and it is shown that the effectiveness of the mixed language processing is approximately equivalent to that of the standard process operating within a single language only.
Abstract: Experiments conducted over the last few years with the SMART document retrieval system have shown that fully automatic text processing methods using relatively simple English language analysis tools are as effective for document indexing, classification, search, and retrieval as the more elaborate manual methods normally used. The present study describes an extension of the SMART procedures to German language materials. A multilingual thesaurus is used for the analysis of documents and search requests, and tools are provided which make it possible to process English documents against German queries, and vice versa. The methods are evaluated and it is shown that the effectiveness of the mixed language processing is approximately equivalent to that of the standard process operating within a single language only.

137 citations
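The cross-language matching in the SMART abstract above depends on a multilingual thesaurus that maps words from either language onto shared concept classes, so that an English document and a German query can be compared directly. The following is a minimal sketch that assumes a hypothetical four-entry English/German thesaurus (`THESAURUS`), whitespace tokenization, and cosine similarity over concept counts. None of these names or entries come from the SMART system itself.

```python
import math
from collections import Counter

# Hypothetical toy thesaurus: English and German words map to shared concept IDs.
# The entries are illustrative only and are not taken from the SMART thesaurus.
THESAURUS = {
    "retrieval": "C1", "wiederauffinden": "C1",
    "document": "C2", "dokument": "C2",
    "language": "C3", "sprache": "C3",
    "automatic": "C4", "automatisch": "C4",
}

def concept_vector(text):
    """Replace each known word with its concept ID and count the concepts.
    Words missing from the thesaurus are ignored."""
    words = text.lower().split()
    return Counter(THESAURUS[w] for w in words if w in THESAURUS)

def cosine(a, b):
    """Cosine similarity between two sparse concept-count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

english_doc = "automatic document retrieval"
german_query = "automatisch dokument wiederauffinden"
print(cosine(concept_vector(english_doc), concept_vector(german_query)))
```

Both texts reduce to the same three concept classes, so the English document and the German query score as a match (cosine of 1, up to floating-point rounding) even though they share no words.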


Journal Article
TL;DR: Three case studies, corresponding to three different types of retrieval systems, investigate how design equations can be derived; such equations, when and if they become available, will be the keystones of retrieval system theory.
Abstract: In recent research on document retrieval systems, considerable attention has been devoted to the problem of defining appropriate performance measures, but very little has been done to derive design equations that make use of them. Design equations show the relationships that obtain between retrieval performance and the design characteristics of the system under analysis. Because design equations, when and if they become available, will be the keystones of retrieval system theory, the question of how they can be derived is an important one. In this paper, the question is investigated in three case studies corresponding to three different types of retrieval systems. A design equation is derived for each of the three system types, the equation in each case showing the relationship between expected search length, used as a performance measure, and certain system characteristics having to do with the distribution of index terms over the document collection and the number of errors in the search requests.

11 citations
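The design equations in the abstract above use expected search length as their performance measure. Under its usual definition, this measure counts the non-relevant documents a user examines before finding the desired number of relevant ones, when the system returns documents in ranked levels whose members are examined in random order. The sketch below computes the measure for a given weak ordering. The function name and input format are assumptions made for illustration, and the code does not reproduce the paper's design equations.

```python
from fractions import Fraction

def expected_search_length(levels, wanted):
    """Expected number of non-relevant documents examined before `wanted`
    relevant documents are found.

    levels: list of (relevant, nonrelevant) counts for each rank level, best
    level first. Levels are examined in order; documents within a level are
    assumed to be examined in random order.
    """
    if wanted < 1:
        raise ValueError("wanted must be at least 1")
    nonrel_before = 0  # non-relevant documents in fully examined levels
    need = wanted      # relevant documents still to be found
    for relevant, nonrelevant in levels:
        if relevant >= need:
            # Final level: with r relevant and i non-relevant documents in
            # random order, the expected number of non-relevant documents
            # seen before the s-th relevant one is s * i / (r + 1).
            return nonrel_before + Fraction(need * nonrelevant, relevant + 1)
        need -= relevant
        nonrel_before += nonrelevant
    raise ValueError("the ordering holds fewer relevant documents than wanted")

print(expected_search_length([(1, 3), (2, 4)], 2))
```

For example, `expected_search_length([(1, 3), (2, 4)], 2)` returns `Fraction(13, 3)`. The user reads all 3 non-relevant documents in the first level, plus an expected 4/3 from the second level before reaching the one remaining relevant document that is needed.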


Journal Article
TL;DR: This paper describes a document retrieval system implemented with a subset of the medical literature and introduces methods for computation of term-term association factors, indexing, assignment of term-document relevance values, and computations for recall and relevance.
Abstract: This paper describes a document retrieval system implemented with a subset of the medical literature. With the exception of the development of a negative dictionary, all system operations are completely automatic. Introduced are methods for computation of term-term association factors, indexing, assignment of term-document relevance values, and computations for recall and relevance. High weights are provided for low-frequency terms, and retrieval is performed directly from highly connected term-document files without elaboration. Recall and relevance are based on quantitative internal system computations, and results are compared with user evaluations.

10 citations
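The medical retrieval abstract above gives high weights to low-frequency terms and computes term-term association factors, but it does not state the formulas. The sketch below substitutes two common textbook choices: an inverse document frequency weight, log(N / df), and the Dice coefficient over the documents in which two terms occur. Both formulas are assumptions, and so is the toy collection of index-term sets.

```python
import math
from collections import defaultdict

def term_weights(docs):
    """IDF-style weights log(N / df): terms that occur in fewer documents weigh more.

    docs: list of sets of index terms, one set per document.
    """
    n = len(docs)
    df = defaultdict(int)
    for terms in docs:
        for t in terms:
            df[t] += 1
    return {t: math.log(n / d) for t, d in df.items()}

def association(docs, a, b):
    """Dice coefficient 2|A & B| / (|A| + |B|), where A and B are the sets of
    documents that contain terms a and b."""
    da = {i for i, terms in enumerate(docs) if a in terms}
    db = {i for i, terms in enumerate(docs) if b in terms}
    if not da and not db:
        return 0.0
    return 2 * len(da & db) / (len(da) + len(db))

# Hypothetical toy collection; each set holds one document's index terms.
docs = [
    {"insulin", "diabetes", "therapy"},
    {"diabetes", "diet"},
    {"therapy", "radiation"},
    {"insulin", "diabetes"},
]
weights = term_weights(docs)
print(weights)
print(association(docs, "insulin", "diabetes"))
```

With these four documents, `diet` (one occurrence) outweighs `insulin` (two occurrences), which in turn outweighs `diabetes` (three occurrences). The association `association(docs, "insulin", "diabetes")` is 2·2/(2 + 3) = 0.8.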


Posted Content
01 Jan 1970 - ChemRxiv
TL;DR: This paper applied text mining with named entity recognition (NER), along with entity normalization, for large-scale information extraction from the published materials science literature and achieved an overall F1 score of 87% on a test set.
Abstract: Over the past decades, the number of published materials science articles has increased manyfold. Now, a major bottleneck in the materials discovery pipeline arises in connecting new results with the previously established literature. A potential solution to this problem is to map the unstructured raw text of published articles onto a structured database entry that allows for programmatic querying. To this end, we apply text mining with named entity recognition (NER), along with entity normalization, for large-scale information extraction from the published materials science literature. The NER is based on supervised machine learning with a recurrent neural network architecture, and the model is trained to extract summary-level information from materials science documents, including inorganic material mentions, sample descriptors, phase labels, material properties and applications, as well as any synthesis and characterization methods used. Our classifier, with an overall accuracy (F1) of 87% on a test set, is applied to information extraction from 3.27 million materials science abstracts, the most information-dense section of published articles. Overall, we extract more than 80 million materials-science-related named entities, and the content of each abstract is represented as a database entry in a structured format. Our database shows far greater recall in document retrieval when compared to traditional text-based searches due to an entity normalization procedure that recognizes synonyms. We demonstrate that simple database queries can be used to answer complex "meta-questions" of the published literature that would have previously required laborious, manual literature searches to answer.
All of our data has been made freely available for bulk download; we have also made a public-facing application programming interface (https://github.com/materialsintelligence/matscholar) and a website (http://matscholar.herokuapp.com/search) for easy interfacing with the data, trained models and functionality described in this paper. These results will allow researchers to access targeted information on a scale and with a speed that has not been previously available, and can be expected to accelerate the pace of future materials science discovery.

3 citations
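The materials science abstract above attributes its higher recall to entity normalization, which maps synonymous mentions onto one canonical entity before indexing. The sketch below illustrates the idea with a hand-written synonym table (`CANONICAL`) and a dictionary-based index. Both are hypothetical stand-ins for the trained normalization model and database that the authors describe, and nothing here uses the matscholar API.

```python
from collections import defaultdict

# Hypothetical synonym table mapping lower-cased surface forms to a canonical
# entity; illustrative only, not the authors' normalization model.
CANONICAL = {
    "tio2": "TiO2",
    "titania": "TiO2",
    "titanium dioxide": "TiO2",
    "pv": "photovoltaic",
    "photovoltaics": "photovoltaic",
}

def normalize(entity):
    """Return the canonical form of an entity, or its lower-cased text if unknown."""
    key = entity.lower()
    return CANONICAL.get(key, key)

def build_index(docs):
    """docs: mapping doc_id -> list of extracted entity strings.
    Returns a mapping canonical entity -> set of doc_ids."""
    index = defaultdict(set)
    for doc_id, entities in docs.items():
        for entity in entities:
            index[normalize(entity)].add(doc_id)
    return index

def search(index, entity):
    """Normalize the query the same way as the documents, then look it up."""
    return index.get(normalize(entity), set())

def literal_search(docs, entity):
    """Baseline: exact string match with no normalization."""
    return {doc_id for doc_id, entities in docs.items() if entity in entities}

docs = {
    "a1": ["titania", "photovoltaics"],
    "a2": ["TiO2"],
    "a3": ["titanium dioxide", "PV"],
}
index = build_index(docs)
print(sorted(literal_search(docs, "TiO2")))
print(sorted(search(index, "TiO2")))
```

A literal string match for `TiO2` finds only `a2`. The normalized index also returns `a1` and `a3`, which mention the same compound as "titania" and "titanium dioxide".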


Patent
James E Young
14 Jul 1970

2 citations



01 Feb 1970
TL;DR: Response times of on-line document retrieval systems are analyzed; linear and inverted file organizations are considered and their response times are evaluated.
Abstract: Response times of on-line document retrieval systems are analyzed. Linear and inverted file organizations are considered and their response times are evaluated.

1 citation
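The abstract above compares linear and inverted file organizations. One way to see why they differ is to count the records accessed for a single-term query. A linear file is read from end to end, while an inverted file keeps, for each term, a list of the records that contain it. The sketch below uses that record count as a rough stand-in for response time. This is an assumption made for illustration, not the report's response-time model, and it ignores the cost of searching the inverted file's directory.

```python
from collections import defaultdict

def linear_search(records, term):
    """Linear file: read every record in order and test it.
    Returns (matching record numbers, records read)."""
    matches = [i for i, terms in enumerate(records) if term in terms]
    return matches, len(records)

def build_inverted_file(records):
    """Inverted file: map each term to the ascending list of record
    numbers that contain it (its postings list)."""
    inverted = defaultdict(list)
    for i, terms in enumerate(records):
        for t in terms:
            inverted[t].append(i)
    return inverted

def inverted_search(inverted, term):
    """Look the term up, then read only the records on its postings list.
    Returns (matching record numbers, records read)."""
    postings = inverted.get(term, [])
    return list(postings), len(postings)

# Hypothetical file of five records, each a set of index terms.
records = [
    {"retrieval", "online"},
    {"indexing"},
    {"retrieval", "thesaurus"},
    {"files"},
    {"indexing", "files"},
]
inverted = build_inverted_file(records)
print(linear_search(records, "retrieval"))
print(inverted_search(inverted, "retrieval"))
```

Both methods return records 0 and 2 for `retrieval`. However, the linear scan reads all 5 records, while the inverted file reads only the 2 records on the postings list. For rare terms, the gap grows with file size, because the linear scan always reads every record.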