An algorithm for suffix stripping

M. F. Porter
01 Dec 1997, Vol. 40, Iss. 3, pp. 313-316
Abstract
The automatic removal of suffixes from words in English is of particular interest in the field of information retrieval. An algorithm for suffix stripping is described, which has been implemented as a short, fast program in BCPL. Although simple, it performs slightly better than a much more elaborate system with which it has been compared. It effectively works by treating complex suffixes as compounds made up of simple suffixes, and removing the simple suffixes in a number of steps. In each step the removal of the suffix is made to depend upon the form of the remaining stem, which usually involves a measure of its syllable length.
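The stepwise idea in the abstract can be illustrated with a minimal sketch. It is not the published rule set: the three small rule tables below are an illustrative subset chosen for this example, and the names `measure`, `apply_step`, `strip_suffixes`, and `STEPS` are assumptions introduced here. The "measure" is taken to be the number of vowel-consonant sequences in the stem, which serves as a rough proxy for syllable length.

```python
VOWELS = set("aeiou")


def is_consonant(word, i):
    """Return True if word[i] acts as a consonant.

    'y' is treated as a vowel when it follows a consonant, and as a
    consonant otherwise (including at the start of the word).
    """
    ch = word[i]
    if ch in VOWELS:
        return False
    if ch == "y":
        return i == 0 or not is_consonant(word, i - 1)
    return True


def measure(stem):
    """Count vowel-to-consonant transitions, i.e. m in [C](VC)^m[V]."""
    m = 0
    prev_vowel = False
    for i in range(len(stem)):
        cons = is_consonant(stem, i)
        if cons and prev_vowel:
            m += 1
        prev_vowel = not cons
    return m


# Illustrative subset only (an assumption of this sketch, not the full
# rule set). Each rule is (suffix, replacement, minimum measure that the
# remaining stem must exceed).
STEPS = [
    [("ization", "ize", 0), ("ational", "ate", 0), ("fulness", "ful", 0)],
    [("ful", "", 0), ("ness", "", 0)],
    [("ize", "", 1), ("ate", "", 1)],
]


def apply_step(word, rules):
    """Apply the first rule whose suffix matches, if the stem is long enough."""
    for suffix, replacement, min_m in rules:
        if word.endswith(suffix):
            stem = word[: len(word) - len(suffix)]
            if measure(stem) > min_m:
                return stem + replacement
            return word  # suffix matched but the stem is too short
    return word


def strip_suffixes(word):
    """Remove a compound suffix one simple suffix at a time."""
    for rules in STEPS:
        word = apply_step(word, rules)
    return word


if __name__ == "__main__":
    for w in ["hopefulness", "generalization", "relational", "rational"]:
        print(w, "->", strip_suffixes(w))
```

In this sketch, `hopefulness` loses `-ness` and then `-ful` in separate steps, while `relational` stops at `relate` because the stem `rel` is too short for the later `-ate` rule to fire. That is the sense in which each removal depends on the form of the remaining stem.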

