Marcus Fontoura

Other affiliations: Princeton University, Google, IBM, …

Bio: Marcus Fontoura is a researcher at Microsoft whose work covers topics including sets (abstract data type) and inverted indexes. Fontoura has an h-index of 33 and has co-authored 122 publications, which have received 3,606 citations. Previous affiliations include Princeton University and Google.

[Figure: papers published per year, 1997 to 2022]

Papers

Proceedings Article

Resource Central: Understanding and Predicting Workloads for Improved Resource Management in Large Cloud Platforms

Eli Cortez¹, Anand Bonde¹, Alexandre Muzio, Mark Russinovich¹, Marcus Fontoura¹, Ricardo Bianchini¹

Microsoft¹

14 Oct 2017

TL;DR: An extensive characterization of Microsoft Azure's VM workload is introduced, including distributions of the VMs' lifetime, deployment size, and resource consumption, along with Resource Central, a system that collects VM telemetry, learns these behaviors offline, and provides predictions online to various resource managers via a general client-side library.

Abstract: Cloud research to date has lacked data on the characteristics of the production virtual machine (VM) workloads of large cloud providers. A thorough understanding of these characteristics can inform the providers' resource management systems, e.g. VM scheduler, power manager, server health manager. In this paper, we first introduce an extensive characterization of Microsoft Azure's VM workload, including distributions of the VMs' lifetime, deployment size, and resource consumption. We then show that certain VM behaviors are fairly consistent over multiple lifetimes, i.e. history is an accurate predictor of future behavior. Based on this observation, we next introduce Resource Central (RC), a system that collects VM telemetry, learns these behaviors offline, and provides predictions online to various resource managers via a general client-side library. As an example of RC's online use, we modify Azure's VM scheduler to leverage predictions in oversubscribing servers (with oversubscribable VM types), while retaining high VM performance. Using real VM traces, we then show that the prediction-informed schedules increase utilization and prevent physical resource exhaustion. We conclude that providers can exploit their workloads' characteristics and machine learning to improve resource management substantially.

479 citations
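
The abstract's core idea, that a VM's past behavior predicts its future behavior and that such predictions can inform oversubscription, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Resource Central's actual design: grouping history by subscription, using the mean P95 CPU utilization as the prediction, and the `HistoryPredictor` class and `fits` admission check are all hypothetical choices made here.

```python
from collections import defaultdict
from statistics import mean


class HistoryPredictor:
    """Hypothetical predictor: assumes a new VM's P95 CPU utilization
    resembles the mean observed for earlier VMs of the same subscription."""

    def __init__(self, default_p95: float = 1.0) -> None:
        # With no history, conservatively assume the VM uses all its cores.
        self.default_p95 = default_p95
        self.history: dict[str, list[float]] = defaultdict(list)

    def observe(self, subscription: str, p95_util: float) -> None:
        # Offline step: record the P95 utilization (0.0 to 1.0) of a past VM.
        self.history[subscription].append(p95_util)

    def predict(self, subscription: str) -> float:
        # Online step: history is taken as the predictor of future behavior.
        past = self.history.get(subscription)
        return mean(past) if past else self.default_p95


def fits(server_cores: int, placed: list[tuple[str, int]],
         new_vm: tuple[str, int], predictor: HistoryPredictor,
         max_util: float = 1.0) -> bool:
    """Admit new_vm if the *predicted* peak core usage of all VMs on the
    server stays within capacity, even if allocated cores exceed it."""
    predicted = sum(cores * predictor.predict(sub)
                    for sub, cores in placed + [new_vm])
    return predicted <= server_cores * max_util
```

For example, after observing P95 utilizations of 0.2, 0.3, and 0.25 for a subscription, an 8-core server would admit two 8-core VMs from that subscription (predicted usage of about 4 cores), but would reject a second 8-core VM from a subscription with no history. A real scheduler would also need to handle mispredictions, which this sketch ignores.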

Proceedings Article

A semantic approach to contextual advertising

Andrei Z. Broder¹, Marcus Fontoura¹, Vanja Josifovski¹, Lance Riedel¹

Yahoo!¹

23 Jul 2007

TL;DR: A system for contextual ad matching based on a combination of semantic and syntactic features is proposed, with the aim of improving the user experience and reducing the number of irrelevant ads.

Abstract: Contextual advertising or Context Match (CM) refers to the placement of commercial textual advertisements within the content of a generic web page, while Sponsored Search (SS) advertising consists of placing ads on result pages from a web search engine, with ads driven by the originating query. In CM there is usually an intermediary commercial ad-network entity in charge of optimizing the ad selection with the twin goal of increasing revenue (shared between the publisher and the ad-network) and improving the user experience. With these goals in mind it is preferable to have ads relevant to the page content, rather than generic ads. The SS market developed more quickly than the CM market, and most textual ads are still characterized by "bid phrases" representing those queries where the advertisers would like to have their ad displayed. Hence, the first technologies for CM have relied on previous solutions for SS, by simply extracting one or more phrases from the given page content, and displaying ads corresponding to searches on these phrases, in a purely syntactic approach. However, due to the vagaries of phrase extraction, and the lack of context, this approach leads to many irrelevant ads. To overcome this problem, we propose a system for contextual ad matching based on a combination of semantic and syntactic features.

356 citations
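
The abstract proposes combining semantic and syntactic features but does not give a scoring formula. The sketch below assumes one simple option: a weighted blend of cosine similarity over taxonomy-class vectors (semantic) and over term-frequency vectors (syntactic). The weight `alpha`, the class labels, and all helper names are hypothetical, and the class vectors are assumed to come from a page/ad classifier that is not shown.

```python
import math
from collections import Counter


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def term_vector(text: str) -> dict[str, float]:
    # Syntactic features: raw frequencies of lowercased whitespace tokens.
    return {t: float(c) for t, c in Counter(text.lower().split()).items()}


def ad_score(page_text, page_classes, ad_text, ad_classes, alpha=0.8):
    # Hypothetical linear blend of semantic and syntactic similarity.
    semantic = cosine(page_classes, ad_classes)
    syntactic = cosine(term_vector(page_text), term_vector(ad_text))
    return alpha * semantic + (1 - alpha) * syntactic


def rank_ads(page_text, page_classes, ads, alpha=0.8):
    """ads: list of (ad_id, ad_text, ad_classes); returns ad ids, best first."""
    scored = sorted(ads, reverse=True,
                    key=lambda ad: ad_score(page_text, page_classes,
                                            ad[1], ad[2], alpha))
    return [ad_id for ad_id, _, _ in scored]


if __name__ == "__main__":
    page = "cheap flights to paris"
    page_classes = {"Travel/Air": 0.9, "Travel/Europe": 0.6}
    ads = [("ad1", "discount airfare europe", {"Travel/Air": 1.0}),
           ("ad2", "paris hilton perfume", {"Shopping/Beauty": 1.0})]
    print(rank_ads(page, page_classes, ads))             # ['ad1', 'ad2']
    print(rank_ads(page, page_classes, ads, alpha=0.0))  # ['ad2', 'ad1']
```

With `alpha=0.0` the ranking is purely syntactic and prefers the perfume ad because it shares the word "paris" with the page, the kind of irrelevant match the abstract attributes to phrase-based approaches; the semantic term lets the airfare ad win.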

Proceedings Article

Robust classification of rare queries using web knowledge

Andrei Z. Broder¹, Marcus Fontoura¹, Evgeniy Gabrilovich¹, Amruta Joshi¹, Vanja Josifovski¹, Tong Zhang¹

Yahoo!¹

23 Jul 2007

TL;DR: This work proposes a methodology for building a practical robust query classification system that can identify thousands of query classes with reasonable accuracy, while dealing in real-time with the query volume of a commercial web search engine.

Abstract: We propose a methodology for building a practical robust query classification system that can identify thousands of query classes with reasonable accuracy, while dealing in real-time with the query volume of a commercial web search engine. We use a blind feedback technique: given a query, we determine its topic by classifying the web search results retrieved by the query. Motivated by the needs of search advertising, we primarily focus on rare queries, which are the hardest from the point of view of machine learning, yet in aggregation account for a considerable fraction of search engine traffic. Empirical evaluation confirms that our methodology yields a considerably higher classification accuracy than previously reported. We believe that the proposed methodology will lead to better matching of online ads to rare queries and overall to a better user experience.

207 citations
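
The blind-feedback step (classifying a query by classifying the web results it retrieves) can be sketched as follows. The aggregation rule, an unweighted average of per-document class scores over the top `top_k` results, is an assumption, as are the `classify_query` name and the keyword-based `toy_classifier`, which stands in for a trained document classifier; the abstract does not describe the actual classifier or weighting.

```python
from collections import Counter, defaultdict
from typing import Callable

# A document classifier maps text to {class_label: score}; assumed, not shown.
Classifier = Callable[[str], dict[str, float]]


def classify_query(results: list[str], classify_doc: Classifier,
                   top_k: int = 10, top_n: int = 3) -> list[tuple[str, float]]:
    """Blind feedback: label a query by averaging the class scores of the
    top_k documents it retrieved, then keep the top_n classes."""
    docs = results[:top_k]
    votes: dict[str, float] = defaultdict(float)
    for doc in docs:
        for label, score in classify_doc(doc).items():
            votes[label] += score / len(docs)
    return sorted(votes.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


def toy_classifier(doc: str) -> dict[str, float]:
    # Hypothetical keyword classifier standing in for a trained model.
    keywords = {"golf": "Sports/Golf", "club": "Sports/Golf",
                "mortgage": "Finance/Loans"}
    hits = Counter(keywords[w] for w in doc.lower().split() if w in keywords)
    total = sum(hits.values())
    return {label: n / total for label, n in hits.items()} if total else {}
```

For a rare query whose own words mean nothing to the classifier (for instance, a specific product model name), the retrieved pages can still carry recognizable vocabulary; passing results such as "best golf club drivers" and "golf club fitting guide" yields "Sports/Golf" as the top class.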

Journal Article

Querying XML streams

Vanja Josifovski¹, Marcus Fontoura¹, Attila Barta²

IBM¹, University of Toronto²

01 Apr 2005

TL;DR: The TurboXPath path processor is proposed, which accepts a language equivalent to a subset of the for-let-where constructs of XQuery over a single document, and can be extended to provide full XQuery support or used to augment federated database engines for efficient handling of queries over XML data streams produced by external sources.

Abstract: Efficient querying of XML streams will be one of the fundamental features of next-generation information systems. In this paper we propose the TurboXPath path processor, which accepts a language equivalent to a subset of the for-let-where constructs of XQuery over a single document. TurboXPath can be extended to provide full XQuery support or used to augment federated database engines for efficient handling of queries over XML data streams produced by external sources. Internally, TurboXPath uses a tree-shaped path expression with multiple outputs to drive the execution. The result of a query execution is a sequence of tuples of XML fragments matching the output nodes. Based on a streamed execution model, TurboXPath scales up to large documents and has limited memory consumption for increased concurrency. Experimental evaluation of a prototype demonstrates performance gains compared to other state-of-the-art path processors.

172 citations
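
The streamed execution model behind TurboXPath can be illustrated with Python's standard `xml.sax` module. This sketch is far narrower than what the abstract describes: it handles only absolute child-axis paths with an optional `*` wildcard and a single output node, not the for-let-where subset of XQuery, and `PathMatcher` is a hypothetical name. It keeps only the stack of open elements and the text of the element currently being matched, so memory grows with document depth rather than document size.

```python
import xml.sax
from typing import Optional


class PathMatcher(xml.sax.ContentHandler):
    """Single-pass evaluator for absolute child-axis paths such as
    /catalog/book/title; '*' matches any element name at that step."""

    def __init__(self, path: str) -> None:
        super().__init__()
        self.steps = path.strip("/").split("/")
        self.stack: list[str] = []               # names of open elements
        self.buffer: Optional[list[str]] = None  # text of the matched element
        self.results: list[str] = []

    def _matches(self) -> bool:
        return len(self.stack) == len(self.steps) and all(
            step in ("*", name) for step, name in zip(self.steps, self.stack))

    def startElement(self, name, attrs):
        self.stack.append(name)
        if self._matches():
            self.buffer = []

    def characters(self, content):
        # Text of nested children is kept too, giving the element's string value.
        if self.buffer is not None:
            self.buffer.append(content)

    def endElement(self, name):
        if self.buffer is not None and self._matches():
            self.results.append("".join(self.buffer))
            self.buffer = None
        self.stack.pop()


if __name__ == "__main__":
    doc = (b"<catalog><book><title>XML Streams</title></book>"
           b"<book><title>SAX</title></book></catalog>")
    matcher = PathMatcher("/catalog/book/title")
    xml.sax.parseString(doc, matcher)
    print(matcher.results)  # ['XML Streams', 'SAX']
```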

Patent

System and method for querying XML streams

Marcus Fontoura¹, Vanja Josifovski¹

IBM¹

14 Apr 2003

TL;DR: A system and method for querying a stream of XML data in a single pass using standard XQuery expressions is presented. It consists of an expression parser that receives a query and generates a parse tree; a SAX events API that receives the stream of XML data and generates a stream of SAX events; an evaluator that receives the parse tree and the stream of SAX events and buffers matching fragments from the stream; and a tuple constructor that joins the fragments into result tuples.

Abstract: A system and method for querying a stream of XML data in a single pass using standard XQuery expressions. The system comprises: an expression parser that receives a query and generates a parse tree; a SAX events API that receives the stream of XML data and generates a stream of SAX events; an evaluator that receives the parse tree and stream of SAX events and buffers fragments from the stream of SAX events that meet an evaluation criteria; and a tuple constructor that joins fragments to form a set of tuple results that satisfies the query for the stream of XML data.

133 citations
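
The patent's pipeline (SAX events feeding an evaluator that buffers matching fragments, followed by a tuple constructor that joins them) can be approximated in a single pass. Several assumptions are made here: the query is reduced to one absolute binding path plus relative child-axis output paths rather than a parsed XQuery expression, Python's `xml.sax` plays the role of the SAX events API, and fragments are joined with a per-binding cross product. `TupleBuilder` is a hypothetical name.

```python
import itertools
import xml.sax
from typing import Optional


class TupleBuilder(xml.sax.ContentHandler):
    """Hypothetical single-pass evaluator: for each element at binding_path,
    buffer the text of children at the relative output paths, then join the
    buffered fragments into tuples when the binding element closes."""

    def __init__(self, binding_path: str, outputs: list[str]) -> None:
        super().__init__()
        self.binding = binding_path.strip("/").split("/")
        self.outputs = [o.strip("/").split("/") for o in outputs]
        self.stack: list[str] = []
        self.fragments: Optional[list[list[str]]] = None  # one list per output
        self.active: Optional[int] = None  # index of output being captured
        self.text: list[str] = []
        self.tuples: list[tuple[str, ...]] = []

    def startElement(self, name, attrs):
        self.stack.append(name)
        if self.stack == self.binding:
            self.fragments = [[] for _ in self.outputs]
        elif self.fragments is not None and self.active is None:
            rel = self.stack[len(self.binding):]
            for i, out in enumerate(self.outputs):
                if rel == out:
                    self.active, self.text = i, []
                    break

    def characters(self, content):
        if self.active is not None:
            self.text.append(content)

    def endElement(self, name):
        rel = self.stack[len(self.binding):]
        if self.active is not None and rel == self.outputs[self.active]:
            # Evaluator: buffer one complete fragment for this output.
            self.fragments[self.active].append("".join(self.text))
            self.active = None
        elif self.stack == self.binding:
            # Tuple constructor: cross product of the buffered fragments.
            self.tuples.extend(itertools.product(*self.fragments))
            self.fragments = None
        self.stack.pop()


if __name__ == "__main__":
    doc = (b"<lib><book><title>A</title><author>X</author><author>Y</author>"
           b"</book><book><title>B</title><author>Z</author></book></lib>")
    builder = TupleBuilder("/lib/book", ["title", "author"])
    xml.sax.parseString(doc, builder)
    print(builder.tuples)  # [('A', 'X'), ('A', 'Y'), ('B', 'Z')]
```

Because the join is a cross product, a book with two authors yields two tuples and a book with no author yields none, which resembles inner-join semantics; whether the patented tuple constructor behaves this way is not stated in the abstract.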

Cited by

[Book Review] "Applied Cryptography"

염흥렬

01 Apr 1997

TL;DR: The objective of this paper is to give a comprehensive introduction to applied cryptography with an engineer or computer scientist in mind, emphasizing the knowledge needed to create practical systems that support integrity, confidentiality, or authenticity.

Abstract: The objective of this paper is to give a comprehensive introduction to applied cryptography with an engineer or computer scientist in mind. The emphasis is on the knowledge needed to create practical systems that support integrity, confidentiality, or authenticity. Topics covered include an introduction to the concepts in cryptography, attacks against cryptographic systems, key use and handling, random bit generation, encryption modes, and message authentication codes. Recommendations on algorithms and further reading are given at the end of the paper. This paper should enable the reader to build, understand, and evaluate system descriptions and designs based on the cryptographic components described in the paper.

2,188 citations

[Book Review] "The Unified Modeling Language User Guide"

강문설

01 Dec 1999

1,636 citations

[Book Review] "Component Software: Beyond Object-Oriented Programming"

하수철

01 Apr 2000

1,154 citations

Book

Search Engines: Information Retrieval in Practice

W. Bruce Croft, Donald Metzler, Trevor Strohman

16 Feb 2009

TL;DR: This text provides the background and tools needed to evaluate, compare, and modify search engines, and its numerous programming exercises make extensive use of Galago, a Java-based open source search engine.

Abstract: KEY BENEFIT: Written by a leader in the field of information retrieval, this text provides the background and tools needed to evaluate, compare and modify search engines. KEY TOPICS: Coverage of the underlying IR and mathematical models reinforces key concepts. Numerous programming exercises make extensive use of Galago, a Java-based open source search engine. MARKET: A valuable tool for search engine and information retrieval professionals.

1,050 citations

Book

Automatic Summarization

Ani Nenkova¹, Sameer Maskey², Yang Liu³

University of Pennsylvania¹, IBM², University of Texas at Dallas³

27 Jun 2011

TL;DR: The challenges that remain open, in particular the need for language generation and deeper semantic understanding of language that would be necessary for future advances in the field are discussed.

Abstract: It has now been 50 years since the publication of Luhn’s seminal paper on automatic summarization. During these years the practical need for automatic summarization has become increasingly urgent and numerous papers have been published on the topic. As a result, it has become harder to find a single reference that gives an overview of past efforts or a complete view of summarization tasks and necessary system components. This article attempts to fill this void by providing a comprehensive overview of research in summarization, including the more traditional efforts in sentence extraction as well as the most novel recent approaches for determining important content, for domain and genre specific summarization and for evaluation of summarization. We also discuss the challenges that remain open, in particular the need for language generation and deeper semantic understanding of language that would be necessary for future advances in the field. We would like to thank the anonymous reviewers, our students and Noemie Elhadad, Hongyan Jing, Julia Hirschberg, Annie Louis, Smaranda Muresan and Dragomir Radev for their helpful feedback. This paper was supported in part by the U.S. National Science Foundation (NSF) under IIS-05-34871 and CAREER 09-53445. Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the NSF. Full text available at: http://dx.doi.org/10.1561/1500000015

697 citations