Author

Monika H. Henzinger

Bio: Monika H. Henzinger is an academic researcher from Google. The author has contributed to research on topics including Web search query and Query expansion. The author has an h-index of 13 and has co-authored 22 publications receiving 1695 citations.

Papers
Patent
Monika H. Henzinger
03 Aug 2007
TL;DR: Improved duplicate and near-duplicate detection techniques may assign a number of fingerprints to a given document by extracting parts from the document, assigning the extracted parts to one or more of a predetermined number of lists, and generating a fingerprint from each of the populated lists.
Abstract: Improved duplicate and near-duplicate detection techniques may assign a number of fingerprints to a given document by (i) extracting parts from the document, (ii) assigning the extracted parts to one or more of a predetermined number of lists, and (iii) generating a fingerprint from each of the populated lists. Two documents may be considered to be near-duplicates if any one of their fingerprints match.
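
The three steps in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented method: treating whitespace-separated words as the "parts", choosing a list by hashing each part, and fingerprinting a list with SHA-1 are all hypothetical choices, as are the names `NUM_LISTS`, `fingerprints`, and `near_duplicates`.

```python
import hashlib

NUM_LISTS = 4  # hypothetical value for the "predetermined number of lists"


def _hash64(text: str) -> int:
    """Stable 64-bit hash (Python's built-in hash() is salted per process)."""
    return int.from_bytes(hashlib.sha1(text.encode("utf-8")).digest()[:8], "big")


def fingerprints(document: str, num_lists: int = NUM_LISTS) -> set[tuple[int, int]]:
    """Return one (list index, fingerprint) pair per populated list."""
    lists: list[set[str]] = [set() for _ in range(num_lists)]
    # (i) extract parts; (ii) assign each part to a list (assumed: by hash).
    for part in document.lower().split():
        lists[_hash64(part) % num_lists].add(part)
    # (iii) fingerprint every populated list; empty lists yield nothing.
    return {
        (index, _hash64("\x00".join(sorted(parts))))
        for index, parts in enumerate(lists)
        if parts
    }


def near_duplicates(doc_a: str, doc_b: str) -> bool:
    """Two documents are near-duplicates if any fingerprint matches."""
    return not fingerprints(doc_a).isdisjoint(fingerprints(doc_b))
```

Under these assumptions, adding a single new word changes only the one list that word hashes to, so the fingerprints of all other lists are unaffected.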

528 citations

Patent
02 Mar 2001
TL;DR: A search query is received, a list of responsive documents is identified, and the responsive documents are organized based in whole or in part on usage statistics.
Abstract: Methods and apparatus consistent with the invention provide improved organization of documents responsive to a search query. In one embodiment, a search query is received and a list of responsive documents is identified. The responsive documents are organized based in whole or in part on usage statistics.

304 citations

Patent
07 Feb 2001
TL;DR: A system receives a voice search query from a user, derives one or more recognition hypotheses, each associated with a weight, from the voice search query, and constructs a weighted boolean query using the recognition hypotheses.
Abstract: A system provides search results from a voice search query. The system receives a voice search query from a user, derives one or more recognition hypotheses, each being associated with a weight, from the voice search query, and constructs a weighted boolean query using the recognition hypotheses. The system then provides the weighted boolean query to a search system and provides the results of the search system to a user.
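
One way to picture the query-construction step is the sketch below. It assumes each recognition hypothesis is a text string with a numeric weight, that the words of a hypothesis are combined with AND, and that alternative hypotheses are combined with OR. The `^weight` suffix is a made-up notation for illustration, not the syntax of the patent or of any particular search system, and `Hypothesis` and `weighted_boolean_query` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    """A speech-recognition hypothesis and its weight (assumed structure)."""
    text: str
    weight: float


def weighted_boolean_query(hypotheses: list[Hypothesis]) -> str:
    """Build an OR of weighted AND-clauses, highest weight first."""
    clauses = []
    for hyp in sorted(hypotheses, key=lambda h: h.weight, reverse=True):
        terms = hyp.text.lower().split()
        if not terms:
            continue  # skip empty hypotheses
        clauses.append(f"({' AND '.join(terms)})^{hyp.weight:.2f}")
    return " OR ".join(clauses)


query = weighted_boolean_query([
    Hypothesis("wreck a nice beach", 0.3),
    Hypothesis("recognize speech", 0.7),
])
# query == "(recognize AND speech)^0.70 OR (wreck AND a AND nice AND beach)^0.30"
```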

199 citations

Patent
31 Mar 2004
TL;DR: Techniques are disclosed that locate implicitly defined semantic structures in a document, such as implicitly defined lists in an HTML document, which can be used in the calculation of distance values between terms in the document.
Abstract: Techniques are disclosed that locate implicitly defined semantic structures in a document, such as, for example, implicitly defined lists in an HTML document. The semantic structures can be used in the calculation of distance values between terms in the documents. The distance values may be used, for example, in the generation of ranking scores that indicate a relevance level of the document to a search query.

181 citations

Patent
30 Jun 2011
TL;DR: A system performs cross-language query translation by locating documents in the first language that contain references matching the terms of the search query and that identify documents in the second language, then using those second-language documents as parallel corpora to disambiguate among possible translations.
Abstract: A system performs cross-language query translations. The system receives a search query that includes terms in a first language and determines possible translations of the terms of the search query into a second language. The system also locates documents for use as parallel corpora to aid in the translation by: (1) locating documents in the first language that contain references that match the terms of the search query and identify documents in the second language; (2) locating documents in the first language that contain references that match the terms of the query and refer to other documents in the first language and identify documents in the second language that contain references to the other documents; or (3) locating documents in the first language that match the terms of the query and identify documents in the second language that contain references to the documents in the first language. The system may use the second language documents as parallel corpora to disambiguate among the possible translations of the terms of the search query and identify one of the possible translations as a likely translation of the search query into the second language.

114 citations


Cited by
Patent
11 Jan 2011
TL;DR: An intelligent automated assistant system engages with the user in an integrated, conversational manner using natural language dialog, and invokes external services when appropriate to obtain information or perform various actions.
Abstract: An intelligent automated assistant system engages with the user in an integrated, conversational manner using natural language dialog, and invokes external services when appropriate to obtain information or perform various actions. The system can be implemented using any of a number of different platforms, such as the web, email, smartphone, and the like, or any combination thereof. In one embodiment, the system is based on sets of interrelated domains and tasks, and employs additional functionality powered by external services with which the system can interact.

1,462 citations

Proceedings Article
06 Jul 2001
TL;DR: This model transforms a source-language parse tree into a target-language string by applying stochastic operations at each node, and produces word alignments that are better than those produced by IBM Model 5.
Abstract: We present a syntax-based statistical translation model. Our model transforms a source-language parse tree into a target-language string by applying stochastic operations at each node. These operations capture linguistic differences such as word order and case marking. Model parameters are estimated in polynomial time using an EM algorithm. The model produces word alignments that are better than those produced by IBM Model 5.

924 citations

Patent
24 Sep 2003
TL;DR: Advertisers may place targeted ads on pages on the web (or some other document of any media type) by obtaining content that includes available spots for ads, determining ads relevant to the content, and/or combining the content with the ads determined to be relevant to it.
Abstract: Advertisers are permitted to put targeted ads on pages on the web (or some other document of any media type). The present invention may do so by (i) obtaining content that includes available spots for ads, (ii) determining ads relevant to content, and/or (iii) combining content with ads determined to be relevant to the content.

809 citations

Patent
12 Jun 2009
TL;DR: Improved capabilities are described for displaying mobile content in association with a website on a mobile communication facility, based at least in part on receiving a website request from a mobile carrier gateway, receiving contextual information relating to the requested website, associating the received contextual information with mobile content, and displaying the mobile content with the website on the mobile communication facility.
Abstract: In embodiments of the present invention improved capabilities are described for displaying mobile content in association with a website on a mobile communication facility based at least in part on receiving a website request from a mobile carrier gateway, receiving contextual information relating to the requested website, associating the received contextual information with a mobile content, and, finally, displaying the mobile content with the website on a mobile communication facility.

675 citations

Proceedings Article
08 May 2007
TL;DR: This work demonstrates that Charikar's fingerprinting technique is appropriate for near-duplicate detection and presents an algorithmic technique for identifying existing f-bit fingerprints that differ from a given fingerprint in at most k bit-positions, for small k.
Abstract: Near-duplicate web documents are abundant. Two such documents differ from each other in a very small portion that displays advertisements, for example. Such differences are irrelevant for web search. So the quality of a web crawler increases if it can assess whether a newly crawled web page is a near-duplicate of a previously crawled web page or not. In the course of developing a near-duplicate detection system for a multi-billion page repository, we make two research contributions. First, we demonstrate that Charikar's fingerprinting technique is appropriate for this goal. Second, we present an algorithmic technique for identifying existing f-bit fingerprints that differ from a given fingerprint in at most k bit-positions, for small k. Our technique is useful for both online queries (single fingerprints) and batch queries (multiple fingerprints). Experimental evaluation over real data confirms the practicality of our design.
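
The two ideas in this abstract can be illustrated with a small sketch. It assumes unweighted whitespace tokens as features and SHA-1 as the per-token hash, and the lookup is a plain linear scan rather than the efficient search technique the paper contributes. The names `simhash`, `hamming`, and `within_k`, and the defaults of 64 bits and k = 3, are illustrative assumptions.

```python
import hashlib

F = 64  # fingerprint width in bits (fixed here; 8 bytes of SHA-1 are used)


def simhash(text: str) -> int:
    """Charikar-style fingerprint: per-bit vote over hashed tokens."""
    counts = [0] * F
    for token in text.lower().split():
        h = int.from_bytes(hashlib.sha1(token.encode("utf-8")).digest()[:8], "big")
        for bit in range(F):
            # Each token votes +1 where its hash has a 1 bit, -1 otherwise.
            counts[bit] += 1 if (h >> bit) & 1 else -1
    # A fingerprint bit is set when the votes for that position are positive.
    return sum(1 << bit for bit in range(F) if counts[bit] > 0)


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def within_k(query: int, stored: list[int], k: int = 3) -> list[int]:
    """Brute-force scan for stored fingerprints at most k bits from query."""
    return [fp for fp in stored if hamming(query, fp) <= k]
```

A linear scan like `within_k` costs one comparison per stored fingerprint, which is why a multi-billion page repository needs a faster lookup structure of the kind the paper describes.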

631 citations