Journal Article
An algorithm for suffix stripping
TLDR
An algorithm for suffix stripping is described, which has been implemented as a short, fast program in BCPL, and performs slightly better than a much more elaborate system with which it has been compared.
Abstract:
The automatic removal of suffixes from words in English is of particular interest in the field of information retrieval. An algorithm for suffix stripping is described, which has been implemented as a short, fast program in BCPL. Although simple, it performs slightly better than a much more elaborate system with which it has been compared. It effectively works by treating complex suffixes as compounds made up of simple suffixes, and removing the simple suffixes in a number of steps. In each step the removal of the suffix is made to depend upon the form of the remaining stem, which usually involves a measure of its syllable length.
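The stepwise idea in the abstract can be sketched in a few lines. This is a minimal illustration in Python, not the BCPL program the abstract refers to. The names `measure`, `STEPS`, and `strip_suffixes` are hypothetical, and the rule table is a small illustrative subset chosen for this sketch, since the listing does not reproduce the paper's full rule set. The stem measure is assumed to count vowel-run to consonant-run transitions, which serves as a rough proxy for syllable length.

```python
VOWELS = set("aeiou")


def is_consonant(word: str, i: int) -> bool:
    """Return True if word[i] acts as a consonant."""
    ch = word[i]
    if ch in VOWELS:
        return False
    if ch == "y":
        # Assumption: 'y' counts as a vowel when it follows a consonant.
        return i == 0 or not is_consonant(word, i - 1)
    return True


def measure(stem: str) -> int:
    """Count vowel-run -> consonant-run transitions in the stem."""
    m = 0
    prev_vowel = False
    for i in range(len(stem)):
        cons = is_consonant(stem, i)
        if cons and prev_vowel:
            m += 1
        prev_vowel = not cons
    return m


# Illustrative subset only: (suffix, replacement, minimum measure to exceed).
# Each inner list is one step; compound suffixes are peeled off step by step.
STEPS = [
    [("ization", "ize", 0), ("ational", "ate", 0), ("fulness", "ful", 0)],
    [("icate", "ic", 0), ("ful", "", 0), ("ness", "", 0)],
    [("ize", "", 1), ("ate", "", 1), ("al", "", 1)],
]


def strip_suffixes(word: str) -> str:
    """Apply at most one rule per step, gated by the measure of the stem."""
    for rules in STEPS:
        for suffix, replacement, min_measure in rules:
            if word.endswith(suffix):
                stem = word[: -len(suffix)]
                if measure(stem) > min_measure:
                    word = stem + replacement
                break  # only the first matching suffix in a step is tried
    return word
```

With these illustrative rules, `strip_suffixes("hopefulness")` first rewrites `-fulness` to `-ful` and then drops `-ful`, giving `hope`. For `relational` the first step yields `relate`, and the later `-ate` rule is not applied because the remaining stem `rel` is too short by the measure. That second case is the dependence on the form of the remaining stem that the abstract describes.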
Citations
Journal Article
Natural Language Processing (Almost) from Scratch
TL;DR: A unified neural network architecture and learning algorithm that can be applied to various natural language processing tasks including part-of-speech tagging, chunking, named entity recognition, and semantic role labeling is proposed.
Book
Semi-Supervised Learning
TL;DR: Semi-supervised learning (SSL) is the middle ground between supervised learning (in which all training examples are labeled) and unsupervised learning (in which no labels are given).
Journal Article
bibliometrix: An R-tool for comprehensive science mapping analysis
Massimo Aria, Corrado Cuccurullo
TL;DR: This paper proposes a unique open-source tool, designed by the authors, called bibliometrix, for performing comprehensive science mapping analysis, programmed in R, and can be rapidly upgraded and integrated with other statistical R-packages.
Journal Article
RCV1: A New Benchmark Collection for Text Categorization Research
TL;DR: This work describes the coding policy and quality control procedures used in producing the RCV1 data, the intended semantics of the hierarchical category taxonomies, and the corrections necessary to remove errorful data.
Journal Article
From frequency to meaning: vector space models of semantics
Peter D. Turney, Patrick Pantel
TL;DR: The goal in this survey is to show the breadth of applications of VSMs for semantics, to provide a new perspective on VSMs, and to provide pointers into the literature for those who are less familiar with the field.
References
Development of a Stemming Algorithm
TL;DR: A new version of a context-sensitive, longest-match stemming algorithm for English is proposed; though developed for use in a library information transfer system, it is of general application.
Aslib Cranfield research project - Factors determining the performance of indexing systems; Volume 1, Design; Part 2, Appendices
TL;DR: An essential requirement of the project involved cooperation of a large number of research scientists, and the response to the request was most satisfactory, and I acknowledge with thanks the generous assistance of some two hundred scientists.
Journal Article
FIRST: Flexible Information Retrieval System for Text
TL;DR: An on‐line document retrieval system is described which combines a data base management system with automatic processing of natural language queries and abstracts, providing direct access to documents with specified bibliographic or descriptor items.