Monika H. Henzinger

Bio: Monika H. Henzinger is an academic researcher from Google. The author has contributed to research on topics including web search queries and query expansion. The author has an h-index of 13 and has co-authored 22 publications receiving 1695 citations.

Papers

Patent•

Detecting duplicate and near-duplicate files

Monika H. Henzinger¹•Institutions (1)

Google¹

03 Aug 2007

TL;DR: Improved duplicate and near-duplicate detection techniques assign a number of fingerprints to a given document by extracting parts from the document, assigning the extracted parts to one or more of a predetermined number of lists, and generating a fingerprint from each of the populated lists.

Abstract: Improved duplicate and near-duplicate detection techniques may assign a number of fingerprints to a given document by (i) extracting parts from the document, (ii) assigning the extracted parts to one or more of a predetermined number of lists, and (iii) generating a fingerprint from each of the populated lists. Two documents may be considered to be near-duplicates if any one of their fingerprints match.
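The three steps in this abstract can be illustrated with a minimal sketch. It is not the patented method: the abstract does not say what a "part" is, how parts are assigned to lists, or how a list is fingerprinted. The sketch assumes that parts are whitespace-separated words, that each word goes to exactly one list chosen by a hash of the word, and that a list's fingerprint is a hash of its contents. The names `fingerprints`, `near_duplicates`, and `num_lists` are hypothetical.

```python
import hashlib


def _stable_hash(text: str) -> int:
    """Deterministic 64-bit hash (assumption: SHA-1 truncated to 8 bytes)."""
    return int.from_bytes(hashlib.sha1(text.encode("utf-8")).digest()[:8], "big")


def fingerprints(document: str, num_lists: int = 4) -> set[tuple[int, int]]:
    """Return one (list index, fingerprint) pair per populated list."""
    lists: list[list[str]] = [[] for _ in range(num_lists)]
    # (i) extract parts: here, lowercase whitespace-separated words (assumption)
    for part in document.lower().split():
        # (ii) assign each part to one of a predetermined number of lists
        lists[_stable_hash(part) % num_lists].append(part)
    # (iii) generate a fingerprint from each populated list
    return {
        (index, _stable_hash(" ".join(parts)))
        for index, parts in enumerate(lists)
        if parts
    }


def near_duplicates(doc_a: str, doc_b: str, num_lists: int = 4) -> bool:
    """Two documents are near-duplicates if any one of their fingerprints match."""
    shared = fingerprints(doc_a, num_lists) & fingerprints(doc_b, num_lists)
    return bool(shared)
```

Under these assumptions, a small edit changes only the lists that receive the edited words, so the remaining lists can still produce matching fingerprints. With a single list, any change to the document changes its only fingerprint.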

528 citations

Patent•

Methods and Apparatus for Employing Usage Statistics in Document Retrieval

Jeffrey Dean¹, Benedict A. Gomes¹, Krishna Bharat¹, Georges R. Harik¹, Monika H. Henzinger¹•Institutions (1)

Google¹

02 Mar 2001

TL;DR: A search query is received, a list of responsive documents is identified, and the responsive documents are organized based in whole or in part on usage statistics.

Abstract: Methods and apparatus consistent with the invention provide improved organization of documents responsive to a search query. In one embodiment, a search query is received and a list of responsive documents is identified. The responsive documents are organized based in whole or in part on usage statistics.

304 citations

Patent•

Voice interface for a search engine

Alexander Franz¹, Monika H. Henzinger¹, Sergey Brin¹, Brian Milch¹•Institutions (1)

Google¹

07 Feb 2001

TL;DR: A system receives a voice search query from a user, derives one or more recognition hypotheses, each associated with a weight, from the voice search query, and constructs a weighted boolean query using the recognition hypotheses.

Abstract: A system provides search results from a voice search query. The system receives a voice search query from a user, derives one or more recognition hypotheses, each being associated with a weight, from the voice search query, and constructs a weighted boolean query using the recognition hypotheses. The system then provides the weighted boolean query to a search system and provides the results of the search system to a user.
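One way to picture the construction of a weighted boolean query is the sketch below. The abstract does not specify the query syntax or how weights are combined, so everything here is an assumption: each hypothesis becomes an AND clause over its words, clauses are joined with OR, and each clause carries its weight normalized to sum to 1, written with a made-up `^weight` suffix. `Hypothesis` and `build_weighted_boolean_query` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    """One speech-recognition hypothesis with its weight (hypothetical type)."""
    text: str
    weight: float


def build_weighted_boolean_query(hypotheses: list[Hypothesis]) -> str:
    """Join hypotheses into an OR of weighted AND clauses (assumed syntax)."""
    # Ignore hypotheses that contain no words at all.
    usable = [h for h in hypotheses if h.text.split()]
    total = sum(h.weight for h in usable)
    if total <= 0:
        raise ValueError("need at least one hypothesis with positive weight")
    clauses = []
    # Highest-weighted hypothesis first, so the likeliest reading leads the query.
    for h in sorted(usable, key=lambda h: h.weight, reverse=True):
        terms = " AND ".join(h.text.lower().split())
        clauses.append(f"({terms})^{h.weight / total:.2f}")
    return " OR ".join(clauses)


query = build_weighted_boolean_query(
    [Hypothesis("wreck a nice beach", 1.0), Hypothesis("recognize speech", 3.0)]
)
print(query)
# (recognize AND speech)^0.75 OR (wreck AND a AND nice AND beach)^0.25
```

A search system that understands such a syntax could then score documents matching the higher-weighted clause above those matching only the lower-weighted one.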

199 citations

Patent•

Document ranking based on semantic distance between terms in a document

Georges R. Harik¹, Monika H. Henzinger¹•Institutions (1)

Google¹

31 Mar 2004

TL;DR: Techniques are disclosed that locate implicitly defined semantic structures in a document, such as implicitly defined lists in an HTML document, which can be used in the calculation of distance values between terms in the documents.

Abstract: Techniques are disclosed that locate implicitly defined semantic structures in a document, such as, for example, implicitly defined lists in an HTML document. The semantic structures can be used in the calculation of distance values between terms in the documents. The distance values may be used, for example, in the generation of ranking scores that indicate a relevance level of the document to a search query.

181 citations

Patent•

Systems and methods for using anchor text as parallel corpora for cross-language information retrieval

Luis Gravano¹, Monika H. Henzinger¹•Institutions (1)

Google¹

30 Jun 2011

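TL;DR: A system performs cross-language query translation by determining possible translations of the query terms into a second language and locating documents, such as first-language documents whose references match the query terms and identify second-language documents, for use as parallel corpora to disambiguate among the possible translations.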

Abstract: A system performs cross-language query translations. The system receives a search query that includes terms in a first language and determines possible translations of the terms of the search query into a second language. The system also locates documents for use as parallel corpora to aid in the translation by: (1) locating documents in the first language that contain references that match the terms of the search query and identify documents in the second language; (2) locating documents in the first language that contain references that match the terms of the query and refer to other documents in the first language and identify documents in the second language that contain references to the other documents; or (3) locating documents in the first language that match the terms of the query and identify documents in the second language that contain references to the documents in the first language. The system may use the second language documents as parallel corpora to disambiguate among the possible translations of the terms of the search query and identify one of the possible translations as a likely translation of the search query into the second language.

114 citations

Cited by

Patent•

Intelligent Automated Assistant

Thomas R. Gruber¹, Adam Cheyer¹, Dag Kittlaus¹, Didier Rene Guzzoni¹, Christopher Dean Brigham¹, Richard Donald Giuli¹, Marcello Bastea-Forte¹, Harry J. Saddler¹•Institutions (1)

Apple Inc.¹

11 Jan 2011

TL;DR: An intelligent automated assistant system engages with the user in an integrated, conversational manner using natural language dialog, and invokes external services when appropriate to obtain information or perform various actions.

Abstract: An intelligent automated assistant system engages with the user in an integrated, conversational manner using natural language dialog, and invokes external services when appropriate to obtain information or perform various actions. The system can be implemented using any of a number of different platforms, such as the web, email, smartphone, and the like, or any combination thereof. In one embodiment, the system is based on sets of interrelated domains and tasks, and employs additional functionality powered by external services with which the system can interact.

1,462 citations

Proceedings Article•DOI•

A Syntax-based Statistical Translation Model

Kenji Yamada¹, Kevin Knight¹•Institutions (1)

University of Southern California¹

06 Jul 2001

TL;DR: This model transforms a source-language parse tree into a target-language string by applying stochastic operations at each node, and produces word alignments that are better than those produced by IBM Model 5.

Abstract: We present a syntax-based statistical translation model. Our model transforms a source-language parse tree into a target-language string by applying stochastic operations at each node. These operations capture linguistic differences such as word order and case marking. Model parameters are estimated in polynomial time using an EM algorithm. The model produces word alignments that are better than those produced by IBM Model 5.

924 citations

Patent•

Serving advertisements based on content

Darrell Anderson¹, Paul T. Buchheit¹, Alexander Paul Carobus¹, Yingwei Cui¹, Jeffrey Dean¹, Georges R. Harik¹, Deepak Jindal¹, Narayanan Shivakumar¹•Institutions (1)

Google¹

24 Sep 2003

TL;DR: A method for placing targeted ads on pages on the web (or on some other document of any media type) by obtaining content that includes available spots for ads, determining ads relevant to the content, and/or combining the content with ads determined to be relevant to it.

Abstract: Advertisers are permitted to put targeted ads on pages on the web (or some other document of any media type). The present invention may do so by (i) obtaining content that includes available spots for ads, (ii) determining ads relevant to content, and/or (iii) combining content with ads determined to be relevant to the content.

809 citations

Patent•

Contextual Mobile Content Placement on a Mobile Communication Facility

Jorey Ramer, Adam Soroca, Dennis Doughty

12 Jun 2009

TL;DR: Improved capabilities are described for displaying mobile content in association with a website on a mobile communication facility, based at least in part on receiving a website request from a mobile carrier gateway, receiving contextual information relating to the requested website, associating the received contextual information with mobile content, and displaying the mobile content with the website on the mobile communication facility.

Abstract: In embodiments of the present invention improved capabilities are described for displaying mobile content in association with a website on a mobile communication facility based at least in part on receiving a website request from a mobile carrier gateway, receiving contextual information relating to the requested website, associating the received contextual information with a mobile content, and, finally, displaying the mobile content with the website on a mobile communication facility.

675 citations

Proceedings Article•DOI•

Detecting near-duplicates for web crawling

Gurmeet Singh Manku¹, Arvind Jain¹, Anish Das Sarma²•Institutions (2)

Google¹, Stanford University²

08 May 2007

TL;DR: This work demonstrates that Charikar's fingerprinting technique is appropriate for near-duplicate detection and presents an algorithmic technique for identifying existing f-bit fingerprints that differ from a given fingerprint in at most k bit-positions, for small k.

Abstract: Near-duplicate web documents are abundant. Two such documents differ from each other in a very small portion that displays advertisements, for example. Such differences are irrelevant for web search. So the quality of a web crawler increases if it can assess whether a newly crawled web page is a near-duplicate of a previously crawled web page or not. In the course of developing a near-duplicate detection system for a multi-billion page repository, we make two research contributions. First, we demonstrate that Charikar's fingerprinting technique is appropriate for this goal. Second, we present an algorithmic technique for identifying existing f-bit fingerprints that differ from a given fingerprint in at most k bit-positions, for small k. Our technique is useful for both online queries (single fingerprints) and batch queries (multiple fingerprints). Experimental evaluation over real data confirms the practicality of our design.
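The "at most k bit-positions" condition can be made concrete with a small sketch. It is a naive baseline, not the indexed technique the paper presents: it scans every stored fingerprint and compares Hamming distances. The `simhash` function follows the general idea of Charikar-style fingerprinting, under the assumptions that features are whitespace-separated words with equal weight and that feature hashes come from truncated SHA-1. All function names are hypothetical.

```python
import hashlib


def simhash(text: str, f: int = 64) -> int:
    """Charikar-style f-bit fingerprint (f <= 64; unit feature weights assumed)."""
    counts = [0] * f
    for token in text.lower().split():
        h = int.from_bytes(hashlib.sha1(token.encode("utf-8")).digest()[:8], "big")
        for bit in range(f):
            # Each feature votes +1 or -1 on every bit position.
            counts[bit] += 1 if (h >> bit) & 1 else -1
    # A bit is set in the fingerprint when its votes sum to a positive value.
    return sum(1 << bit for bit in range(f) if counts[bit] > 0)


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def find_near(fingerprint: int, existing: list[int], k: int = 3) -> list[int]:
    """Linear scan for stored fingerprints within k bits of the given one."""
    return [fp for fp in existing if hamming_distance(fp, fingerprint) <= k]
```

A linear scan costs time proportional to the number of stored fingerprints per query, which is the cost the paper's algorithmic technique is designed to avoid at multi-billion-page scale.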

631 citations
