Patent

Method for estimating the probability of collisions of fingerprints

TLDR
A computerized method estimates the probability of a collision among fingerprints of dissimilar strings: each new fingerprint is masked, its unmasked portion is compared with the stored fingerprints, and the number of matching masked fingerprints is recorded.
Abstract
Strings, such as Web pages or other documents, are fingerprinted in order to detect substantially similar strings, so as to avoid processing duplicate strings. At the same time, a computerized method estimates the probability of a collision among fingerprints of dissimilar strings. As fingerprints are generated for strings presented for processing, when the fingerprint of a string is determined not to be identical to any fingerprint in a set of stored fingerprints, the new fingerprint is masked and the unmasked portion of the fingerprint is compared with the corresponding portion of the fingerprints in the stored set. Information is recorded regarding the number of matching masked fingerprints.
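
The procedure in the abstract can be sketched roughly as follows. This is only an illustrative sketch, not the patented implementation: the class name `CollisionMonitor`, the `kept_bits` parameter, the choice to keep the low-order bits as the unmasked portion, and the use of BLAKE2b as a 64-bit fingerprint function are all assumptions made for the example.

```python
import hashlib
from collections import Counter


def fingerprint(text: str) -> int:
    """Return a 64-bit fingerprint (BLAKE2b is a stand-in chosen for this sketch)."""
    digest = hashlib.blake2b(text.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


class CollisionMonitor:
    """Hypothetical helper: deduplicates strings and counts masked matches."""

    def __init__(self, kept_bits: int = 16):
        # Assumption: the unmasked portion is the low `kept_bits` bits.
        self.keep_mask = (1 << kept_bits) - 1
        self.stored = set()              # full fingerprints seen so far
        self.partial_counts = Counter()  # unmasked portion -> number of stored fingerprints
        self.masked_matches = 0          # recorded matches among masked fingerprints

    def process_fingerprint(self, fp: int) -> bool:
        """Return True if fp is new; False if it duplicates a stored fingerprint."""
        if fp in self.stored:
            return False  # identical fingerprint: treat the string as a duplicate
        partial = fp & self.keep_mask
        # Full fingerprints differ, so any agreement on the unmasked portion
        # is a collision between fingerprints of different strings.
        self.masked_matches += self.partial_counts[partial]
        self.stored.add(fp)
        self.partial_counts[partial] += 1
        return True

    def process(self, text: str) -> bool:
        return self.process_fingerprint(fingerprint(text))
```

Because the masked fingerprints are much shorter than the full ones, matches among them occur often enough to be counted in practice. How the recorded count is turned into a collision probability for full-length fingerprints is not stated in the abstract, so that step is left out of the sketch.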


Citations
Patent

A method, apparatus, and system for clustering and classification

Seth Patinkin
TL;DR: In this article, a method, apparatus and system for classification and clustering electronic data streams such as email, images and sound files for identification, sorting and efficient storage is presented.
Patent

Processing of textual electronic communication distributed in bulk

TL;DR: In this paper, a process for blocking electronic text communication distributed in bulk is described, in which a first electronic submission and a second electronic submission are received, and a first code is determined for a first portion and a second code for a second portion.
Patent

Unsolicited electronic mail reduction

TL;DR: In this article, a method for automatically detecting unsolicited electronic mail from a mailer and automatically notifying facilitators of the mailer of the unsolicited e-mail is disclosed.
Patent

Processing of unsolicited bulk electronic mail

TL;DR: In this paper, a method for automatically processing electronic mail, in which an electronic mail message is loaded, is presented. However, the method is not suitable for handling large amounts of unsolicited electronic mail distributed in bulk.
Patent

Web crawler system using parallel queues for queuing data sets having common address and concurrently downloading data associated with data set in each queue

TL;DR: In this paper, a method and system for scheduling downloads in a web crawler is presented, where each thread enqueues URLs as new URLs are discovered in the course of downloading web pages.
References
Journal ArticleDOI

Universal classes of hash functions

TL;DR: An input independent average linear time algorithm for storage and retrieval on keys that makes a random choice of hash function from a suitable class of hash functions.
Journal ArticleDOI

Probabilistic Algorithms in Finite Fields

TL;DR: Probabilistic algorithms for the problems of finding an irreducible polynomial of degree n over a finite field, finding roots of a polynomial, and factoring a polynomial into its irreducible factors over a finite field are presented.
Book ChapterDOI

Some applications of Rabin’s fingerprinting method

TL;DR: This paper presents an implementation and several applications of Rabin's fingerprinting scheme that take considerable advantage of its algebraic properties.
Patent

Method and apparatus for recognizing a bit pattern in a string of bits, altering the string of bits, and removing the alteration from the string of bits

TL;DR: In this paper, a method and apparatus for recognizing a search pattern in an input string of bits is provided, and a determination is then made whether the currently selected portion contains any of the occurrence patterns in the first set of occurrence patterns.
Patent

System and method utilizing multiple search trees to route data within a data processing network

TL;DR: In this article, a search tree is utilized within the bridge/router in order to store addresses, which is used by the router for more efficiently and more quickly determining where one or more received frames of data are to be transmitted within the network.