Author

Shmuel T. Klein

Other affiliations: Ashkelon Academic College, IBM, Weizmann Institute of Science
Bio: Shmuel T. Klein is an academic researcher from Bar-Ilan University. The author has contributed to research in topics: Huffman coding & Data compression. The author has an h-index of 28 and has co-authored 162 publications receiving 2,496 citations. Previous affiliations of Shmuel T. Klein include Ashkelon Academic College & IBM.


Papers
Patent
Michael Hirsch, Haim Bitner, Lior Aronovich, Ron Asher, Eitan Bachmat, Shmuel T. Klein
19 Mar 2009
TL;DR: A similarity measure is used to locate similar data segments in a repository, and the common (identical) data sections of those segments are then determined, regardless of their order and position in the repository and input.
Abstract: Systems and methods enabling search of a repository for the location of data that is similar to input data, using a defined measure of similarity, in a time that is independent of the size of the repository and linear in a size of the input data, and a space that is proportional to a small fraction of the size of the repository. The similar data segments thus located are further analyzed to determine their common (identical) data sections, regardless of the order and position of the common data sections in the repository and input, and in a time that is linear in the segment size and in constant space.

186 citations

Proceedings ArticleDOI
04 May 2009
TL;DR: It is shown how to combine similarity matching schemes with byte-by-byte comparison or hash-based identity schemes, and a novel type of similarity signature is presented and its advantages in the context of deduplication requirements are explained.
Abstract: We describe some of the design choices that were made during the development of a fast, scalable, inline deduplication device. The system's design goals and how they were achieved are presented. This is the first deduplication device that uses similarity matching. The paper provides the following original research contributions: we show how similarity signatures can serve in a deduplication scheme; a novel type of similarity signature is presented and its advantages in the context of deduplication requirements are explained. It is also shown how to combine similarity matching schemes with byte-by-byte comparison or hash-based identity schemes.

132 citations

Journal ArticleDOI
TL;DR: A simple parallel algorithm for decoding a Huffman encoded file is presented, exploiting the tendency of Huffman codes to resynchronize quickly, i.e. recovering after possible decoding errors, in most cases.
Abstract: A simple parallel algorithm for decoding a Huffman encoded file is presented, exploiting the tendency of Huffman codes to resynchronize quickly, i.e. recovering after possible decoding errors, in most cases. The average number of bits that have to be processed until synchronization is analyzed and shows good agreement with empirical data. As Huffman coding is also a part of the JPEG image compression standard, the suggested algorithm is then adapted to the parallel decoding of JPEG files.

116 citations
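
The resynchronization property mentioned in the abstract above can be illustrated with a toy sketch. This is not the parallel algorithm from the paper: the four-symbol prefix code `CODE` and the helper names `encode`, `boundaries`, and `sync_point` are assumptions made up for this example. The sketch decodes the same bit string from the true start and from a wrong offset, and reports the first bit position at which both decodings agree on a codeword boundary; from that point on, the two decodings are identical.

```python
# Hypothetical complete prefix code, chosen only for illustration.
CODE = {"a": "0", "b": "10", "c": "110", "d": "111"}


def encode(text: str) -> str:
    """Concatenate the codewords of the symbols in text."""
    return "".join(CODE[ch] for ch in text)


def boundaries(bits: str, start: int) -> list:
    """Bit positions at which a codeword ends when decoding begins at start."""
    codewords = set(CODE.values())
    ends = []
    current = ""
    for pos in range(start, len(bits)):
        current += bits[pos]
        if current in codewords:
            ends.append(pos + 1)
            current = ""
    return ends


def sync_point(bits: str, start: int):
    """First boundary shared by the decoding from start and the true decoding.

    Returns None if the two decodings never line up within bits.
    """
    true_ends = set(boundaries(bits, 0))
    for end in boundaries(bits, start):
        if end in true_ends:
            return end
    return None


if __name__ == "__main__":
    bits = encode("dcba")
    # Start decoding one bit too late and see where it realigns.
    print(bits, sync_point(bits, 1))
```

In a parallel setting, a worker that starts at an arbitrary bit offset would decode garbage only until such a shared boundary is reached, which is the behavior the abstract relies on.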

Journal ArticleDOI
TL;DR: New universal and complete sequences of variable-length codewords are proposed, based on representing the integers in a binary Fibonacci numeration system, which can be used as alternatives to Huffman codes when the optimal compression of the latter is not required and simplicity, faster processing and robustness are preferred.

103 citations
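
As a rough illustration of the idea in the entry above, the following sketch implements the classical Fibonacci code, which writes a positive integer as a sum of non-consecutive Fibonacci numbers (its Zeckendorf representation) and appends a final 1 bit so that every codeword ends in `11`. It is an assumption that this textbook variant is close to the sequences proposed in the paper; the function names `fib_encode` and `fib_decode` are invented for this example.

```python
from typing import List


def fib_encode(n: int) -> str:
    """Return the classical Fibonacci codeword of a positive integer n."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    # Fibonacci numbers 1, 2, 3, 5, 8, ... that do not exceed n.
    fibs = [1, 2]
    while fibs[-1] <= n:
        fibs.append(fibs[-1] + fibs[-2])
    fibs.pop()  # the last element is the first one exceeding n
    bits = ["0"] * len(fibs)
    remainder = n
    # Greedy choice from the largest Fibonacci number down never picks
    # two adjacent ones, so "11" cannot occur inside the representation.
    for i in range(len(fibs) - 1, -1, -1):
        if fibs[i] <= remainder:
            bits[i] = "1"
            remainder -= fibs[i]
    # The appended 1 creates the terminating "11" pattern.
    return "".join(bits) + "1"


def fib_decode(stream: str) -> List[int]:
    """Decode a concatenation of Fibonacci codewords back into integers."""
    values = []
    fibs = [1, 2]
    total, index, prev = 0, 0, "0"
    for bit in stream:
        if bit == "1" and prev == "1":
            # Second consecutive 1: end of the current codeword.
            values.append(total)
            total, index, prev = 0, 0, "0"
            continue
        if bit == "1":
            while index >= len(fibs):
                fibs.append(fibs[-1] + fibs[-2])
            total += fibs[index]
        index += 1
        prev = bit
    return values


if __name__ == "__main__":
    numbers = [1, 2, 3, 4, 12]
    stream = "".join(fib_encode(n) for n in numbers)
    print(stream, fib_decode(stream))
```

Because `11` appears only at the end of a codeword, a decoder can find codeword boundaries without a code table, which relates to the simplicity and robustness the TL;DR mentions.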

Patent
Michael Hirsch, Haim Bitner, Lior Aronovich, Ron Asher, Eitan Bachmat, Shmuel T. Klein
29 Jul 2005
TL;DR: A system that enables search of a repository for the location of data that is similar to input data, using a defined measure of similarity, in a time that is independent of the size of the repository and linear in the size of the input data.
Abstract: Systems and methods enabling search of a repository for the location of data that is similar to input data, using a defined measure of similarity, in a time that is independent of the size of the repository and linear in a size of the input data, and a space that is proportional to a small fraction of the size of the repository. Additionally, remote operations are accomplished with significantly reduced system bandwidth by implementing remote differencing operations.

86 citations


Cited by
Journal ArticleDOI
TL;DR: This tutorial introduces the key techniques in the area of text indexing, describing both a core implementation and how the core can be enhanced through a range of extensions.
Abstract: The technology underlying text search engines has advanced dramatically in the past decade. The development of a family of new index representations has led to a wide range of innovations in index storage, index construction, and query evaluation. While some of these developments have been consolidated in textbooks, many specific techniques are not widely known or the textbook descriptions are out of date. In this tutorial, we introduce the key techniques in the area, describing both a core implementation and how the core can be enhanced through a range of extensions. We conclude with a comprehensive bibliography of text indexing literature.

1,218 citations

DOI
01 Jan 1998
TL;DR: Topic Detection and Tracking (TDT) is a DARPA-sponsored initiative to investigate the state of the art in finding and following new events in a stream of broadcast news stories.
Abstract: Topic Detection and Tracking (TDT) is a DARPA-sponsored initiative to investigate the state of the art in finding and following new events in a stream of broadcast news stories. The TDT problem consists of three major tasks: (1) segmenting a stream of data, especially recognized speech, into distinct stories; (2) identifying those news stories that are the first to discuss a new event occurring in the news; and (3) given a small number of sample news stories about an event, finding all following stories in the stream.

1,097 citations

Journal ArticleDOI
TL;DR: A novel hybrid threshold adaptable quantum secret sharing scheme, using an m-bonacci orbital angular momentum pump, Lagrange interpolation polynomials, and reverse Huffman-Fibonacci-tree coding, which can detect eavesdropping without joint quantum operations and permits secret sharing for an arbitrary but no less than threshold-value number of classical participants with much lower bandwidth.
Abstract: With prevalent attacks in communication, sharing a secret between communicating parties is an ongoing challenge. Moreover, it is important to integrate quantum solutions with classical secret sharing schemes with low computational cost for real-world use. This paper proposes a novel hybrid threshold adaptable quantum secret sharing scheme, using an m-bonacci orbital angular momentum (OAM) pump, Lagrange interpolation polynomials, and reverse Huffman-Fibonacci-tree coding. To be exact, we employ entangled states prepared by m-bonacci sequences to detect eavesdropping. Meanwhile, we encode m-bonacci sequences in Lagrange interpolation polynomials to generate the shares of a secret with reverse Huffman-Fibonacci-tree coding. The advantages of the proposed scheme are that it can detect eavesdropping without joint quantum operations, and that it permits secret sharing for an arbitrary but no less than threshold-value number of classical participants with much lower bandwidth. Also, in comparison with existing quantum secret sharing schemes, it still works when there are dynamic changes, such as the unavailability of some quantum channel, the arrival of new participants and the departure of participants. Finally, we provide security analysis of the new hybrid quantum secret sharing scheme and discuss its useful features for modern applications.

812 citations

Journal ArticleDOI
TL;DR: A new implementation of arithmetic coding is described that incorporates several improvements over a widely used earlier version by Witten, Neal, and Cleary, which has become a de facto standard; a modular structure that separates the coding, modeling, and probability estimation components of a compression system is also described.
Abstract: Over the last decade, arithmetic coding has emerged as an important compression tool. It is now the method of choice for adaptive coding on multisymbol alphabets because of its speed, low storage requirements, and effectiveness of compression. This article describes a new implementation of arithmetic coding that incorporates several improvements over a widely used earlier version by Witten, Neal, and Cleary, which has become a de facto standard. These improvements include fewer multiplicative operations, greatly extended range of alphabet sizes and symbol probabilities, and the use of low-precision arithmetic, permitting implementation by fast shift/add operations. We also describe a modular structure that separates the coding, modeling, and probability estimation components of a compression system. To motivate the improved coder, we consider the needs of a word-based text compression program. We report a range of experimental results using this and other models. Complete source code is available.

569 citations