Journal ISSN: 2052-3033

Sequence 

About: Sequence is an academic journal. It publishes mainly in the areas of data compression and context-adaptive binary arithmetic coding. Its ISSN is 2052-3033. Over its lifetime, the journal has published 73 papers, which have received 2,706 citations. The journal is also known as: ordered list.


Papers
Proceedings Article
11 Jun 1997, Sequence
TL;DR: The resemblance and containment of two documents are reduced to set intersection problems that can be easily evaluated by a process of random sampling done independently for each document.
Abstract: Given two documents A and B we define two mathematical notions: their resemblance r(A, B) and their containment c(A, B) that seem to capture well the informal notions of "roughly the same" and "roughly contained." The basic idea is to reduce these issues to set intersection problems that can be easily evaluated by a process of random sampling that can be done independently for each document. Furthermore, the resemblance can be evaluated using a fixed size sample for each document. This paper discusses the mathematical properties of these measures and the efficient implementation of the sampling process using Rabin (1981) fingerprints.

1,989 citations
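
The sampling idea in the abstract above can be illustrated with a minimal Python sketch. It is not the paper's implementation. It assumes word-level shingles of width `w`, uses BLAKE2b from `hashlib` as a stand-in for Rabin fingerprints, and takes the `s` smallest fingerprints as the fixed-size sample. All function names are hypothetical.

```python
import hashlib

def shingles(text, w=4):
    """Set of all contiguous w-word windows (shingles) in text."""
    words = text.lower().split()
    if len(words) < w:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + w]) for i in range(len(words) - w + 1)}

def fingerprint(shingle):
    """64-bit fingerprint; BLAKE2b is an assumed stand-in for Rabin fingerprints."""
    data = " ".join(shingle).encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")

def sketch(text, s=64, w=4):
    """Fixed-size sample: the s smallest fingerprints of the shingle set."""
    return sorted({fingerprint(sh) for sh in shingles(text, w)})[:s]

def exact_resemblance(a, b, w=4):
    """r(A, B) = |S(A) & S(B)| / |S(A) | S(B)| on the full shingle sets."""
    sa, sb = shingles(a, w), shingles(b, w)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0

def estimated_resemblance(sk_a, sk_b, s=64):
    """Estimate r(A, B) from two sketches that were computed independently."""
    set_a, set_b = set(sk_a), set(sk_b)
    smallest = sorted(set_a | set_b)[:s]  # s smallest fingerprints of the union
    if not smallest:
        return 1.0
    both = sum(1 for f in smallest if f in set_a and f in set_b)
    return both / len(smallest)
```

Each sketch depends on one document only, so sketches can be computed once and stored. When a document has fewer than `s` distinct shingles, the sketch holds every fingerprint and the estimate coincides with the exact resemblance (barring fingerprint collisions).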

Proceedings Article
11 Jun 1997, Sequence
TL;DR: The authors isolate the most basic issues in molecular biological group testing and formulate a set of novel group testing problems for designing cost effective experiments.
Abstract: Group testing is a basic paradigm for experimental design. In computational biology, group testing problems come up in designing experiments with sequences for mapping, screening libraries, etc. While a great deal of classical research has been done on group testing over the last fifty years, the current biological applications bring up many new issues in group testing which had not been previously considered. The authors isolate the most basic issues in molecular biological group testing. Given these, they formulate a set of novel group testing problems for designing cost effective experiments. For some of these problems they give solutions, while leaving others open.

101 citations
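
For readers unfamiliar with the paradigm, the following Python sketch shows one classical adaptive group testing strategy, recursive pool splitting. It is not one of the problems or solutions from the paper. The `pool_is_positive` callback is a hypothetical stand-in for a laboratory test that reports whether a pool contains at least one positive item.

```python
def find_positives(items, pool_is_positive):
    """Return (positives, number_of_tests) using recursive pool splitting."""
    tests = 0
    positives = []
    stack = [list(items)]
    while stack:
        pool = stack.pop()
        if not pool:
            continue
        tests += 1
        if not pool_is_positive(pool):
            continue  # one test clears the whole pool
        if len(pool) == 1:
            positives.append(pool[0])
            continue
        mid = len(pool) // 2
        stack.append(pool[mid:])  # right half is examined second
        stack.append(pool[:mid])  # left half is examined first
    return positives, tests
```

With few positives among many items, this strategy needs far fewer tests than testing each item individually. With many positives the saving disappears, which is one reason the design of the pools matters.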

Proceedings Article
11 Jun 1997, Sequence
TL;DR: A survey of the theoretical literature on code trees and parse trees, covering, among other topics, variants of Huffman coding in which the assignment of 0s and 1s within codewords is significant, such as bidirectionality and synchronization, and questions related to counting and representing code and parse trees.
Abstract: This paper surveys the theoretical literature on fixed-to-variable-length lossless source code trees, called code trees, and on variable-length-to-fixed lossless source code trees, called parse trees. In particular, the following code tree topics are outlined in this survey: characteristics of the Huffman (1952) code tree; Huffman-type coding for infinite source alphabets and universal coding; the Huffman problem subject to a lexicographic constraint, or the Hu-Tucker (1982) problem; the Huffman problem subject to maximum codeword length constraints; code trees which minimize other functions besides average codeword length; coding for unequal cost code symbols, or the Karp problem, and finite state channels; and variants of Huffman coding in which the assignment of 0s and 1s within codewords is significant such as bidirectionality and synchronization. The literature on parse tree topics is less extensive. Treated here are: variants of Tunstall (1968) parsing; dualities between parsing and coding; dual tree coding in which parsing and coding are combined to yield variable-length-to-variable-length codes; and parsing and random number generation. Finally, questions related to counting and representing code and parse trees are also discussed.

84 citations
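
As a concrete reference point for the code tree topics listed above, here is a minimal Python sketch of the basic Huffman construction: repeatedly merge the two lightest subtrees, then read codewords off the tree. It covers only the unconstrained binary case, not the constrained or unequal-cost variants the survey discusses. Symbols are assumed to be strings, since tuples are used internally for tree nodes.

```python
import heapq
from itertools import count

def huffman_code(weights):
    """Return {symbol: codeword} for a dict of positive symbol weights."""
    if not weights:
        return {}
    if len(weights) == 1:
        return {sym: "0" for sym in weights}
    tie = count()  # tie-breaker so the heap never compares tree nodes
    heap = [(w, next(tie), sym) for sym, w in weights.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next(tie), (left, right)))
    code = {}
    stack = [(heap[0][2], "")]
    while stack:
        node, prefix = stack.pop()
        if isinstance(node, tuple):  # internal node: descend into both children
            stack.append((node[0], prefix + "0"))
            stack.append((node[1], prefix + "1"))
        else:  # leaf: the accumulated path is the codeword
            code[node] = prefix
    return code
```

The choice of which child receives 0 and which receives 1 is arbitrary here and does not affect the average codeword length. The variants mentioned in the abstract are exactly those where that assignment does matter.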

Proceedings Article
11 Jun 1997, Sequence
TL;DR: This work proposes the use of a signature-based technique to "shrink" the data sequences into signatures, and search the signatures instead of the real sequences, with further comparison being required only when a possible match is indicated.
Abstract: Jagadish et al. (see Proc. ACM SIGACT-SIGMOD-SIGART PODS, p.36-45, 1995) developed a general framework for posing queries based on similarity. The framework enables a formal definition of the notion of similarity for an application domain of choice, and then its use in queries to perform similarity-based search. We adapt this framework to the specialized domain of real-valued sequences, although some of the ideas we present are applicable to other types of data as well. In particular, we focus on whole-match queries, by which we mean the case where the user has to specify the whole sequence. Similarity-based search can be computationally very expensive, and the computation cost depends heavily on the length of the sequences being compared. To make such similarity testing feasible on large data sets, we propose the use of a signature-based technique. In a nutshell, our approach is to "shrink" the data sequences into signatures, and search the signatures instead of the real sequences, with further comparison being required only when a possible match is indicated. Being shorter, signatures can usually be compared much faster than the original sequences. In addition, signatures are usually easier to index. For such a signature-based technique to be effective, one has to ensure that (1) the signature comparison is fast, and (2) the signature comparison gives few false alarms and no false dismissals. We obtain measures of goodness for our technique. The technique is illustrated with a couple of very different examples.

71 citations
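
The filter-and-verify pattern described in the abstract can be sketched in Python as follows. The signature used here (means of contiguous segments, compared under Euclidean distance) is an assumption chosen for illustration, not the authors' signature. It is used because the signature distance never exceeds the true distance, so filtering produces no false dismissals. All names are hypothetical.

```python
import math

def signature(seq, k=4):
    """Shrink seq to (mean, length) pairs for k contiguous, near-equal segments."""
    n = len(seq)
    bounds = [i * n // k for i in range(k + 1)]
    return [(sum(seq[lo:hi]) / (hi - lo), hi - lo)
            for lo, hi in zip(bounds, bounds[1:]) if hi > lo]

def signature_distance(sig_a, sig_b):
    """Lower bound on the Euclidean distance between the full sequences."""
    return math.sqrt(sum(length * (ma - mb) ** 2
                         for (ma, length), (mb, _) in zip(sig_a, sig_b)))

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def whole_match(query, database, eps, k=4):
    """Indices of sequences within eps of query (all sequences of equal length)."""
    sig_q = signature(query, k)
    sigs = [signature(s, k) for s in database]  # in practice precomputed and indexed
    hits = []
    for i, (seq, sig) in enumerate(zip(database, sigs)):
        if signature_distance(sig_q, sig) > eps:
            continue  # safe to skip: the true distance is at least this large
        if euclidean(query, seq) <= eps:  # full comparison removes false alarms
            hits.append(i)
    return hits
```

The full comparison runs only for sequences whose signature passes the filter, which is where the savings come from when signatures are much shorter than the sequences.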

Proceedings Article
11 Jun 1997, Sequence
TL;DR: It is shown that the descriptions of the alphabet and of the sequence can be separated in such a way that the encoding of the actual sequence can be performed independently of the alphabet description, and sequential coding methods for such sequences are presented.
Abstract: For lossless universal source coding of memoryless sequences with an a priori unknown alphabet size (multialphabet coding), the alphabet of the sequence must be described as well as the sequence itself. Usually an efficient description of the alphabet can be made only by taking into account some additional information. We show that these descriptions can be separated in such a way that the encoding of the actual sequence can be performed independently of the alphabet description, and present sequential coding methods for such sequences. Such methods have applications in coding methods where the alphabet description is made available sequentially, such as PPM.

46 citations
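
To make the separation concrete, the Python sketch below computes an ideal code length for the sequence part only, and returns the newly encountered symbols as a separate list to be described by other means. The escape estimator is an assumption borrowed from PPM method A (a new symbol is signalled with probability 1/(t+1) after t symbols), not the scheme from the paper, and the function name is hypothetical.

```python
import math

def sequence_code_length(seq):
    """Return (bits, new_symbols) for a simple escape-based sequential coder.

    bits covers only the sequence structure: which earlier symbol recurs, or
    that some new symbol occurs. The identities of new symbols are returned
    separately so that the alphabet can be described independently.
    """
    counts = {}
    new_symbols = []
    bits = 0.0
    for t, sym in enumerate(seq):
        seen = counts.get(sym, 0)
        if seen:
            p = seen / (t + 1)  # symbol already in the alphabet
        else:
            p = 1 / (t + 1)  # escape: a new symbol appears
            new_symbols.append(sym)
        bits += -math.log2(p)
        counts[sym] = seen + 1
    return bits, new_symbols
```

At each step the probabilities of all seen symbols plus the escape sum to one, so the total is a valid ideal code length. How `new_symbols` is encoded is deliberately left open, which mirrors the independence described in the abstract.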

Network Information
Related Journals (5)
Information Processing Letters: 7.7K papers, 189.7K citations, 78% related
SIAM Journal on Computing: 3.5K papers, 327.5K citations, 77% related
Discrete Applied Mathematics: 9.1K papers, 178.6K citations, 74% related
Theoretical Computer Science: 12.4K papers, 368.9K citations, 74% related
Journal of the ACM: 2.9K papers, 426.3K citations, 74% related
Performance
Metrics
Number of papers from the journal in previous years
Year    Papers
1997    31
1990    42