Journal•ISSN: 2052-3033

Sequence

About: Sequence is an academic journal that publishes mainly in the areas of data compression and context-adaptive binary arithmetic coding. Its ISSN is 2052-3033. Over its lifetime, the journal has published 73 publications, which have received 2706 citations. The journal is also known as: ordered list.

Topics: Data compression, Context-adaptive binary arithmetic coding, Lossless compression, Lossy compression, Prefix code ...

Papers

Proceedings Article

On the resemblance and containment of documents

Andrei Z. Broder

11 Jun 1997-Sequence

TL;DR: The basic idea is to reduce these issues to set intersection problems that can be easily evaluated by a process of random sampling that can be done independently for each document.

Abstract: Given two documents A and B we define two mathematical notions: their resemblance r(A, B) and their containment c(A, B) that seem to capture well the informal notions of "roughly the same" and "roughly contained." The basic idea is to reduce these issues to set intersection problems that can be easily evaluated by a process of random sampling that can be done independently for each document. Furthermore, the resemblance can be evaluated using a fixed size sample for each document. This paper discusses the mathematical properties of these measures and the efficient implementation of the sampling process using Rabin (1981) fingerprints.
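
The set-intersection view in this abstract can be illustrated with a minimal Python sketch. It assumes that each document is reduced to a set of word shingles, that resemblance is the ratio of intersection to union of those sets, and that containment is the ratio of intersection to the size of the first set. The helper names (`shingles`, `sketch`, `estimate_resemblance`), the shingle width, and the use of a BLAKE2 hash as a stand-in for Rabin fingerprints are assumptions made for illustration, not details taken from the paper.

```python
import hashlib


def shingles(text, w=2):
    """Set of contiguous w-word sequences (shingles) of a document."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + w]) for i in range(len(tokens) - w + 1)}


def resemblance(a, b):
    """|A ∩ B| / |A ∪ B| on shingle sets ("roughly the same")."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def containment(a, b):
    """|A ∩ B| / |A| on shingle sets (A "roughly contained" in B)."""
    return len(a & b) / len(a) if a else 0.0


def fingerprint(shingle):
    """64-bit fingerprint; BLAKE2 is a stand-in here, NOT a Rabin fingerprint."""
    data = " ".join(shingle).encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def sketch(shingle_set, s):
    """Fixed-size sample: the s smallest fingerprints, computed per document."""
    return sorted(fingerprint(x) for x in shingle_set)[:s]


def estimate_resemblance(sketch_a, sketch_b, s):
    """Estimate resemblance from two sketches only, without the documents."""
    smallest = set(sorted(set(sketch_a) | set(sketch_b))[:s])
    if not smallest:
        return 0.0
    return len(smallest & set(sketch_a) & set(sketch_b)) / len(smallest)


doc_a = shingles("a rose is a rose is a rose")
doc_b = shingles("a rose is a flower which is a rose")
print(resemblance(doc_a, doc_b))   # 0.5
print(containment(doc_a, doc_b))   # 1.0
print(estimate_resemblance(sketch(doc_a, 4), sketch(doc_b, 4), 4))
```

Each sketch is computed from one document alone, so sketches can be stored and compared later. When the sketch size is at least the number of distinct shingles in both documents, the estimate coincides with the exact resemblance; smaller sketches trade accuracy for space.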

1,989 citations

Proceedings Article

Group testing problems with sequences in experimental molecular biology

Martin Farach¹, Sampath Kannan¹, E. Knill¹, S. Muthukrishnan•Institutions (1)

Rutgers University¹

11 Jun 1997-Sequence

TL;DR: The authors isolate the most basic issues in molecular biological group testing and formulate a set of novel group testing problems for designing cost effective experiments.

Abstract: Group testing is a basic paradigm for experimental design. In computational biology, group testing problems come up in designing experiments with sequences for mapping, screening libraries, etc. While a great deal of classical research has been done on group testing over the last fifty years, the current biological applications bring up many new issues in group testing which had not been previously considered. The authors isolate the most basic issues in molecular biological group testing. Given these, they formulate a set of novel group testing problems for designing cost effective experiments. For some of these problems they give solutions, while leaving others open.
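
For readers unfamiliar with the paradigm, the following Python sketch shows classical adaptive group testing by recursive splitting. It is a textbook illustration, not one of the novel problems formulated in the paper. The function name `find_positives` and the `pooled_test` callback, which reports whether a pool contains at least one positive item, are hypothetical.

```python
def find_positives(items, pooled_test):
    """Return the positive items, testing pools rather than single items.

    `pooled_test(pool)` is an assumed oracle that returns True when the
    pool contains at least one positive item.
    """
    items = list(items)
    # One pooled test clears the whole group when it comes back negative.
    if not items or not pooled_test(items):
        return []
    if len(items) == 1:
        return items
    mid = len(items) // 2
    return (find_positives(items[:mid], pooled_test)
            + find_positives(items[mid:], pooled_test))


positives = {3, 17}
tests_run = []


def pooled_test(pool):
    tests_run.append(len(pool))  # record every experiment performed
    return any(item in positives for item in pool)


print(find_positives(range(32), pooled_test))  # [3, 17]
print(len(tests_run) < 32)                     # True
```

When positives are rare, most pools test negative and are discarded in a single experiment, which is why pooling can need fewer tests than checking every item individually.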

101 citations

Proceedings Article

Code and parse trees for lossless source encoding

J. Abrahams¹•Institutions (1)

Office of Naval Research¹

11 Jun 1997-Sequence

TL;DR: This survey discusses questions related to counting and representing code and parse trees, as well as variants of Huffman coding in which the assignment of 0s and 1s within codewords is significant, such as bidirectionality and synchronization.

Abstract: This paper surveys the theoretical literature on fixed-to-variable-length lossless source code trees, called code trees, and on variable-length-to-fixed lossless source code trees, called parse trees. In particular, the following code tree topics are outlined in this survey: characteristics of the Huffman (1952) code tree; Huffman-type coding for infinite source alphabets and universal coding; the Huffman problem subject to a lexicographic constraint, or the Hu-Tucker (1982) problem; the Huffman problem subject to maximum codeword length constraints; code trees which minimize other functions besides average codeword length; coding for unequal cost code symbols, or the Karp problem, and finite state channels; and variants of Huffman coding in which the assignment of 0s and 1s within codewords is significant such as bidirectionality and synchronization. The literature on parse tree topics is less extensive. Treated here are: variants of Tunstall (1968) parsing; dualities between parsing and coding; dual tree coding in which parsing and coding are combined to yield variable-length-to-variable-length codes; and parsing and random number generation. Finally, questions related to counting and representing code and parse trees are also discussed.
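
As a concrete instance of a fixed-to-variable-length code tree, the Python sketch below builds a binary Huffman code with the standard greedy merge of the two least probable subtrees. The function name `huffman_code` and the example probabilities are illustrative assumptions. The constrained and generalized variants listed in the abstract are not covered.

```python
import heapq
from itertools import count


def huffman_code(probs):
    """Map each symbol to a binary codeword that minimizes average length."""
    if len(probs) == 1:
        return {next(iter(probs)): "0"}
    tie = count()  # unique tie-breaker so the heap never compares dicts
    heap = [(p, next(tie), {sym: ""}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, left = heapq.heappop(heap)   # least probable subtree
        p1, _, right = heapq.heappop(heap)  # second least probable subtree
        # Merging prepends one bit to every codeword in each subtree.
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (p0 + p1, next(tie), merged))
    return heap[0][2]


probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
code = huffman_code(probs)
print(code)  # {'a': '0', 'b': '10', 'c': '110', 'd': '111'}
print(sum(probs[s] * len(c) for s, c in code.items()))  # 1.75
```

For these dyadic probabilities the average codeword length of 1.75 bits equals the source entropy, and the codeword lengths satisfy the Kraft inequality with equality, as expected for a full binary code tree.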

84 citations

Proceedings Article

A signature technique for similarity-based queries

Christos Faloutsos¹, H. V. Jagadish¹, Alberto O. Mendelzon¹, Tova Milo¹•Institutions (1)

University of Maryland, College Park¹

11 Jun 1997-Sequence

TL;DR: This work proposes the use of a signature-based technique to "shrink" the data sequences into signatures, and search the signatures instead of the real sequences, with further comparison being required only when a possible match is indicated.

Abstract: Jagadish et al. (see Proc. ACM SIGACT-SIGMOD-SIGART PODS, p.36-45, 1995) developed a general framework for posing queries based on similarity. The framework enables a formal definition of the notion of similarity for an application domain of choice, and then its use in queries to perform similarity-based search. We adapt this framework to the specialized domain of real-valued sequences (although some of the ideas we present are applicable to other types of data as well). In particular, we focus on whole-match queries. By whole-match query we mean the case where the user has to specify the whole sequence. Similarity-based search can be computationally very expensive. The computation cost depends heavily on the length of sequences being compared. To make such similarity testing feasible on large data sets, we propose the use of a signature-based technique. In a nutshell, our approach is to "shrink" the data sequences into signatures, and search the signatures instead of the real sequences, with further comparison being required only when a possible match is indicated. Being shorter, signatures can usually be compared much faster than the original sequences. In addition, signatures are usually easier to index. For such a signature-based technique to be effective one has to ensure that (1) the signature comparison is fast, and (2) the signature comparison gives few false alarms, and no false dismissals. We obtain measures of goodness for our technique. The technique is illustrated with a couple of very different examples.
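
The filter-and-verify pattern described in this abstract can be sketched in Python as follows. The signature used here (per-segment means with Euclidean distance) is an assumed stand-in chosen because its lower-bounding property is easy to check; it is not the signature construction from the paper. All function names are hypothetical, and sequences are assumed to have equal length divisible by the number of segments.

```python
import math


def signature(seq, segments):
    """Shrink a sequence to `segments` values: the mean of each segment."""
    m = len(seq) // segments  # assumes len(seq) is divisible by segments
    return [sum(seq[i * m:(i + 1) * m]) / m for i in range(segments)]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def signature_distance(sig_a, sig_b, seg_len):
    """Lower bound on the true Euclidean distance (by Cauchy-Schwarz)."""
    return math.sqrt(seg_len) * euclidean(sig_a, sig_b)


def whole_match_search(query, database, eps, segments):
    """Return (candidates, matches) for sequences within eps of the query."""
    seg_len = len(query) // segments
    q_sig = signature(query, segments)
    # Filter step: cheap comparison on signatures only. In practice the
    # database signatures would be precomputed and indexed.
    candidates = [s for s in database
                  if signature_distance(q_sig, signature(s, segments), seg_len) <= eps]
    # Verify step: the full comparison runs only on surviving candidates.
    matches = [s for s in candidates if euclidean(query, s) <= eps]
    return candidates, matches


database = [[1, 2, 3, 4], [1, 2, 3, 5], [10, 10, 10, 10], [4, 3, 2, 1], [2, 1, 4, 3]]
candidates, matches = whole_match_search([1, 2, 3, 4], database, eps=1.5, segments=2)
print(candidates)  # [[1, 2, 3, 4], [1, 2, 3, 5], [2, 1, 4, 3]]
print(matches)     # [[1, 2, 3, 4], [1, 2, 3, 5]]
```

Because the signature distance never exceeds the true distance, any sequence rejected by the filter is also a true non-match, so there are no false dismissals. The sequence `[2, 1, 4, 3]` is a false alarm: it shares the query's segment means, survives the filter, and is discarded only by the full comparison.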

71 citations

Proceedings Article

Multialphabet coding with separate alphabet description

J. Aberg¹, Yu. M. Shtarkov², Ben Smeets•Institutions (2)

Lund University¹, Russian Academy of Sciences²

11 Jun 1997-Sequence

TL;DR: It is shown that the descriptions of the alphabet and of the sequence can be separated in such a way that the encoding of the actual sequence can be performed independently of the alphabet description, and sequential coding methods for such sequences are presented.

Abstract: For lossless universal source coding of memoryless sequences with an a priori unknown alphabet size (multialphabet coding), the alphabet of the sequence must be described as well as the sequence itself. Usually an efficient description of the alphabet can be made only by taking into account some additional information. We show that these descriptions can be separated in such a way that the encoding of the actual sequence can be performed independently of the alphabet description, and present sequential coding methods for such sequences. Such methods have applications in coding methods where the alphabet description is made available sequentially, such as PPM.

46 citations

Network Information

Related Journals (5)

Journal	Papers	Citations	Related
Information Processing Letters	7.7K	189.7K	78%
SIAM Journal on Computing	3.5K	327.5K	77%
Discrete Applied Mathematics	9.1K	178.6K	74%
Theoretical Computer Science	12.4K	368.9K	74%
Journal of the ACM	2.9K	426.3K	74%

Performance

Metrics

Papers

2,869

Citations

No. of papers from the Journal in previous years
Year	Papers
1997	31
1990	42