Topic

Character (computing)

About: Character (computing) is a research topic. Over its lifetime, 8,210 publications have been published on this topic, receiving 53,115 citations. The topic is also known as: char & rune.


Papers
Proceedings Article
12 Feb 2016
TL;DR: A simple neural language model that relies only on character-level inputs is able to encode, from characters only, both semantic and orthographic information; the results suggest that on many languages, character inputs are sufficient for language modeling.
Abstract: We describe a simple neural language model that relies only on character-level inputs. Predictions are still made at the word-level. Our model employs a convolutional neural network (CNN) and a highway network over characters, whose output is given to a long short-term memory (LSTM) recurrent neural network language model (RNN-LM). On the English Penn Treebank the model is on par with the existing state-of-the-art despite having 60% fewer parameters. On languages with rich morphology (Arabic, Czech, French, German, Spanish, Russian), the model outperforms word-level/morpheme-level LSTM baselines, again with fewer parameters. The results suggest that on many languages, character inputs are sufficient for language modeling. Analysis of word representations obtained from the character composition part of the model reveals that the model is able to encode, from characters only, both semantic and orthographic information.
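A minimal sketch of the character-composition step described in this abstract: character embeddings pass through a convolution and a highway layer to produce one vector per word, which a word-level LSTM would then consume. The sizes, random weights, `tanh` activation, max-over-time pooling, and gate bias are illustrative assumptions rather than details taken from the abstract, and the LSTM itself is omitted.

```python
import math
import random

random.seed(0)
ALPHABET = "abcdefghijklmnopqrstuvwxyz"
CHAR_DIM, WIDTH, N_FILTERS = 4, 3, 6  # hypothetical sizes, chosen for readability


def rand_matrix(rows, cols):
    """Random weights standing in for trained parameters."""
    return [[random.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]


char_emb = rand_matrix(len(ALPHABET), CHAR_DIM)
# Each filter spans WIDTH consecutive character embeddings, flattened to one row.
conv_filters = rand_matrix(N_FILTERS, WIDTH * CHAR_DIM)
W_h = rand_matrix(N_FILTERS, N_FILTERS)  # highway transform weights
W_t = rand_matrix(N_FILTERS, N_FILTERS)  # highway gate weights
GATE_BIAS = -2.0  # assumed negative bias: the gate initially favours carrying the input


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def word_vector(word):
    """Build a word representation from its characters only."""
    rows = [char_emb[ALPHABET.index(c)] for c in word.lower()]
    while len(rows) < WIDTH:  # zero-pad short words so one window exists
        rows.append([0.0] * CHAR_DIM)
    # Slide a window of WIDTH characters over the word.
    windows = [sum(rows[i:i + WIDTH], []) for i in range(len(rows) - WIDTH + 1)]
    # Convolution followed by max-over-time pooling: one feature per filter.
    pooled = [max(math.tanh(dot(f, w)) for w in windows) for f in conv_filters]
    # Highway layer: gate t mixes a transformed copy with the untouched input.
    out = []
    for i, (wh, wt) in enumerate(zip(W_h, W_t)):
        t = sigmoid(dot(wt, pooled) + GATE_BIAS)
        h = max(0.0, dot(wh, pooled))
        out.append(t * h + (1.0 - t) * pooled[i])
    return out


vec = word_vector("character")
print(len(vec))
```

Because the vector depends only on the characters, words that share spelling patterns share convolution windows, which is one plausible reading of how orthographic information ends up in the representation.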

1,499 citations

Journal Article
TL;DR: A method for locating specific character strings embedded in character text is described and an implementation of this method in the form of a compiler is discussed.
Abstract: A method for locating specific character strings embedded in character text is described and an implementation of this method in the form of a compiler is discussed. The compiler accepts a regular expression as source language and produces an IBM 7094 program as object language. The object program then accepts the text to be searched as input and produces a signal every time an embedded string in the text matches the given regular expression. Examples, problems, and solutions are also presented.
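The compile-then-search workflow in this abstract can be sketched with Python's standard `re` module as a stand-in. This is not the paper's compiler: `re` targets its own internal matcher rather than IBM 7094 code, and `finditer` reports non-overlapping matches, which may differ from the signalling behaviour the paper describes. The helper name `compile_search` is hypothetical.

```python
import re


def compile_search(pattern):
    """Compile a regular expression once and return a reusable search function."""
    compiled = re.compile(pattern)

    def search(text):
        # Report (start offset, matched string) for each non-overlapping match.
        return [(m.start(), m.group()) for m in compiled.finditer(text)]

    return search


find = compile_search(r"a(b|c)*d")
for offset, matched in find("xxadyyabcbdzz"):
    print(offset, matched)
```

Separating compilation from searching means the cost of translating the pattern is paid once, after which the resulting object can scan any number of texts.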

897 citations

Journal Article
TL;DR: Digital signal processing provides a set of novel and useful tools for solving highly relevant problems in genomic information science and technology. For example, in the form of local texture, color spectrograms visually provide significant information about biomolecular sequences, which facilitates understanding of local nature, structure, and function.
Abstract: Genomics is a highly cross-disciplinary field that creates paradigm shifts in such diverse areas as medicine and agriculture. It is believed that many significant scientific and technological endeavors in the 21st century will be related to the processing and interpretation of the vast information that is currently revealed from sequencing the genomes of many living organisms, including humans. Genomic information is digital in a very real sense; it is represented in the form of sequences of which each element can be one out of a finite number of entities. Such sequences, like DNA and proteins, have been mathematically represented by character strings, in which each character is a letter of an alphabet. In the case of DNA, the alphabet has size 4 and consists of the letters A, T, C and G; in the case of proteins, the size of the corresponding alphabet is 20. As the list of references shows, biomolecular sequence analysis has already been a major research topic among computer scientists, physicists, and mathematicians. The main reason that the field of signal processing does not yet have significant impact in the field is that it deals with numerical sequences rather than character strings. However, if we properly map a character string into one or more numerical sequences, then digital signal processing (DSP) provides a set of novel and useful tools for solving highly relevant problems. For example, in the form of local texture, color spectrograms visually provide significant information about biomolecular sequences, which facilitates understanding of local nature, structure, and function. Furthermore, both the magnitude and the phase of properly defined Fourier transforms can be used to predict important features like the location and certain properties of protein coding regions in DNA. Even the process of mapping DNA into proteins and the interdependence of the two kinds of sequences can be analyzed using simulations based on digital filtering. These and other DSP-based approaches result in alternative mathematical formulations and may provide improved computational techniques for the solution of useful problems in genomic information science and technology.
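One way to picture the mapping from a character string to numerical sequences is a set of binary indicator sequences, one per DNA letter, followed by a discrete Fourier transform. The sketch below assumes that mapping and sums the squared DFT magnitudes at one frequency. The indicator encoding, the function names, and the choice of frequency are illustrative assumptions, not details given in the abstract.

```python
import cmath


def indicator_sequences(dna):
    """Map a DNA string to four 0/1 sequences, one per letter of the alphabet."""
    dna = dna.upper()
    return {base: [1 if c == base else 0 for c in dna] for base in "ATCG"}


def dft_coefficient(x, k):
    """k-th coefficient of the discrete Fourier transform of sequence x."""
    n = len(x)
    return sum(v * cmath.exp(-2j * cmath.pi * k * i / n) for i, v in enumerate(x))


def power_at(dna, k):
    """Sum of squared DFT magnitudes over the four indicator sequences."""
    return sum(abs(dft_coefficient(x, k)) ** 2
               for x in indicator_sequences(dna).values())


# A toy sequence with an exact period of 3: frequency index N/3 stands out.
seq = "ATG" * 10
n = len(seq)
print(power_at(seq, n // 3), power_at(seq, 7))
```

For this toy input the spectrum concentrates at index N/3 because every letter recurs every three positions. Real sequences would show a much weaker and noisier peak.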

453 citations

Patent
Ken Kocienda
11 Aug 2008
TL;DR: A system for displaying accented or related characters for characters selected by a user through a virtual keyboard operating in a multi-language environment, in which the order of the accented characters can be based on their frequency of occurrence in the language currently being typed.
Abstract: The disclosed implementations include displays of accented or related characters for characters selected by a user through a virtual keyboard operating in a multi-language environment. In one aspect, when a user clicks and holds down a key, a popup displays accented characters for the character associated with the key. In another aspect, the order of accented characters can be based on a frequency of occurrence of the accented character in the current language being typed by the user. In another aspect, when a character is at the edge of a display, the popup is visually displayed in a different location and the ordering of the accents in the display is set with the more frequently occurring accents being more quickly accessible. In another aspect, auto correction is used to correct accented equivalents for compounds. In another aspect, a different visual keyboard layout is provided for different languages.
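The frequency-based ordering mentioned in this abstract could look roughly like the sketch below. The tables `ACCENT_FREQ` and `ALL_VARIANTS`, the counts inside them, and the function `popup_order` are all hypothetical placeholders invented for illustration. They are not real language statistics and do not come from the patent.

```python
# Hypothetical per-language usage counts (made-up numbers, for illustration only).
ACCENT_FREQ = {
    "fr": {"e": {"é": 90, "è": 40, "ê": 20, "ë": 5}},
    "es": {"e": {"é": 50}},
}

# Default popup contents for each base key, in a fixed fallback order.
ALL_VARIANTS = {"e": ["è", "é", "ê", "ë"]}


def popup_order(base, language):
    """Return the accented variants of `base`, most frequent first for `language`."""
    counts = ACCENT_FREQ.get(language, {}).get(base, {})
    variants = ALL_VARIANTS.get(base, [])
    # sorted() is stable, so variants with equal counts keep the fallback order.
    return sorted(variants, key=lambda ch: -counts.get(ch, 0))


print(popup_order("e", "fr"))
print(popup_order("e", "es"))
```

Falling back to a fixed order when a language has no counts keeps the popup usable for languages the table does not cover.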

380 citations


Network Information
Related Topics (5)
MX record
179 papers, 7.8K citations
84% related
C++ string handling
4.7K papers, 49.2K citations
83% related
Taskbar
129 papers, 3.8K citations
82% related
Percent-encoding
145 papers, 6.5K citations
82% related
IndoWordNet
35 papers, 12.6K citations
81% related
Performance
Metrics
No. of papers in the topic in previous years
Year  Papers
2021  65
2020  123
2019  151
2018  190
2017  207
2016  185