Author

Huaiyu Zhu

Other affiliations: Aston University, Santa Fe Institute
Bio: Huaiyu Zhu is an academic researcher from IBM. The author has contributed to research on topics including information extraction and Bayesian probability. The author has an h-index of 18 and has co-authored 56 publications receiving 1,242 citations. Previous affiliations of Huaiyu Zhu include Aston University and the Santa Fe Institute.


Papers
Journal ArticleDOI
20 Mar 2009
TL;DR: The extraction algebra is described, and the effectiveness of the optimization techniques in providing orders-of-magnitude reductions in the running time of complex extraction tasks is demonstrated.
Abstract: As applications within and outside the enterprise encounter increasing volumes of unstructured data, there has been renewed interest in the area of information extraction (IE) -- the discipline concerned with extracting structured information from unstructured text. Classical IE techniques developed by the NLP community were based on cascading grammars and regular expressions. However, due to the inherent limitations of grammar-based extraction, these techniques are unable to: (i) scale to large data sets, and (ii) support the expressivity requirements of complex information extraction tasks. At the IBM Almaden Research Center, we are developing SystemT, an IE system that addresses these limitations by adopting an algebraic approach. By leveraging well-understood database concepts such as declarative queries and cost-based optimization, SystemT enables scalable execution of complex information extraction tasks. In this paper, we motivate the SystemT approach to information extraction. We describe our extraction algebra and demonstrate the effectiveness of our optimization techniques in providing orders of magnitude reduction in the running time of complex extraction tasks.

160 citations
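
For readers unfamiliar with the algebraic view of extraction described above, the toy sketch below shows the general flavor: extraction operators that produce spans over text, and a join operator that combines spans by proximity. The Span class, the operator names, and the example document are assumptions made purely for illustration; they are not SystemT's actual algebra or AQL syntax.

```python
import re
from dataclasses import dataclass

# Minimal span data model: a piece of matched text plus its character offsets.
@dataclass(frozen=True)
class Span:
    begin: int
    end: int
    text: str

def regex_extract(doc: str, pattern: str) -> list[Span]:
    """Extraction operator: produce one span per regex match."""
    return [Span(m.start(), m.end(), m.group()) for m in re.finditer(pattern, doc)]

def follows_join(left: list[Span], right: list[Span], max_gap: int) -> list[tuple[Span, Span]]:
    """Join operator: pair spans where a right span starts within max_gap chars after a left span ends."""
    return [(l, r) for l in left for r in right
            if 0 <= r.begin - l.end <= max_gap]

doc = "Please reach Jane Doe at jane.doe@example.com for details."
names = regex_extract(doc, r"[A-Z][a-z]+ [A-Z][a-z]+")
emails = regex_extract(doc, r"\b[\w.]+@[\w.]+\b")
print(follows_join(names, emails, max_gap=5))
```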

Proceedings ArticleDOI
07 Apr 2008
TL;DR: This work proposes an algebraic approach to rule-based IE that addresses scalability issues through query optimization, presents the operators of this algebra, and proposes several optimization strategies motivated by the text-specific characteristics of those operators.
Abstract: Traditional approaches to rule-based information extraction (IE) have primarily been based on regular expression grammars. However, these grammar-based systems have difficulty scaling to large data sets and large numbers of rules. Inspired by traditional database research, we propose an algebraic approach to rule-based IE that addresses these scalability issues through query optimization. The operators of our algebra are motivated by our experience in building several rule-based extraction programs over diverse data sets. We present the operators of our algebra and propose several optimization strategies motivated by the text-specific characteristics of our operators. Finally, we validate the potential benefits of our approach by extensive experiments over real-world blog data.

129 citations
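
The abstract above mentions text-specific optimization strategies without detailing them. The sketch below illustrates one plausible strategy of that general kind: restricting an expensive regular expression to small windows around hits of a cheap dictionary operator rather than scanning the whole document. The patterns, trigger words, window size, and cost intuition are illustrative assumptions, not the paper's actual optimizations.

```python
import re

# Expensive operator: a phone-number regex. Cheap operator: a dictionary of trigger words.
PHONE = re.compile(r"\(\d{3}\)\s*\d{3}-\d{4}")
TRIGGERS = ("call", "phone", "tel")

def extract_phones_naive(doc: str):
    """Scan the entire document with the expensive regex."""
    return [m.group() for m in PHONE.finditer(doc)]

def extract_phones_restricted(doc: str, window: int = 40):
    """Run the regex only in a small window after each cheap dictionary hit."""
    hits = []
    lowered = doc.lower()
    for trigger in TRIGGERS:
        start = 0
        while (i := lowered.find(trigger, start)) != -1:
            region = doc[i:i + window]
            hits.extend(m.group() for m in PHONE.finditer(region))
            start = i + 1
    return sorted(set(hits))

doc = "For support, call (408) 555-0143 or email support@example.com."
assert extract_phones_naive(doc) == extract_phones_restricted(doc)
```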

03 Jul 1997
TL;DR: The problem of regression under Gaussian assumptions is treated in this paper, where the relationship between Bayesian prediction, regularization and smoothing is elucidated; the ideal regression is the posterior mean, whose computation scales as O(n^3).
Abstract: The problem of regression under Gaussian assumptions is treated generally. The relationship between Bayesian prediction, regularization and smoothing is elucidated. The ideal regression is the posterior mean and its computation scales as O(n^3), where n is the sample size. We show that the optimal m-dimensional linear model under a given prior is spanned by the first m eigenfunctions of a covariance operator, which is a trace-class operator. This is an infinite-dimensional analogue of principal component analysis. The importance of Hilbert space methods to practical statistics is also discussed.

114 citations
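
The posterior-mean computation mentioned in the abstract can be written down directly. The sketch below is a minimal Gaussian-process regression example, assuming a squared-exponential kernel and a fixed noise variance chosen only for illustration; the n x n linear solve is the O(n^3) step referred to above.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential covariance k(x, x') = exp(-(x - x')^2 / (2 l^2))."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior_mean(x_train, y_train, x_test, noise_var=0.1):
    """Posterior mean of GP regression; the n x n solve below is the O(n^3) step."""
    K = rbf_kernel(x_train, x_train) + noise_var * np.eye(len(x_train))
    K_star = rbf_kernel(x_test, x_train)
    return K_star @ np.linalg.solve(K, y_train)

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=30)
y = np.sin(x) + 0.1 * rng.standard_normal(30)
x_new = np.linspace(-3, 3, 5)
print(gp_posterior_mean(x, y, x_new))
```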

Proceedings ArticleDOI
27 Jun 2006
TL;DR: In this demonstration, the overall architecture of the Avatar Semantic Search engine is described, and the superiority of the AVATAR approach over traditional keyword search engines is demonstrated using the Enron email data set and a blog corpus.
Abstract: We present Avatar Semantic Search, a prototype search engine that exploits annotations in the context of classical keyword search. The annotation process is performed offline, using high-precision information extraction techniques to extract facts, concepts, and relationships from text. These facts and concepts are represented and indexed in a structured data store. At runtime, keyword queries are interpreted in the context of these extracted facts and converted into one or more precise queries over the structured store. In this demonstration we describe the overall architecture of the Avatar Semantic Search engine. We also demonstrate the superiority of the AVATAR approach over traditional keyword search engines using the Enron email data set and a blog corpus.

90 citations
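
As a rough illustration of the offline-annotation / online-interpretation pipeline described above, the sketch below extracts simple email facts into a structured store and rewrites a keyword query into a precise lookup over that store. The fact schema, the "email" rewrite rule, and the two-document corpus are invented for illustration; they are not Avatar's actual design.

```python
import re

EMAIL = re.compile(r"\b[\w.]+@[\w.]+\b")

corpus = {
    1: "Reach Alice at alice@example.org.",
    2: "Bob's address is bob@example.org.",
}

# Offline step: run extraction over each document and index the facts in a
# structured store (a plain list of dicts standing in for a database table).
store = [{"doc": doc_id, "type": "email", "value": m.group()}
         for doc_id, text in corpus.items()
         for m in EMAIL.finditer(text)]

def interpret(query: str):
    """Runtime step: rewrite a keyword query into a precise lookup over the store.
    A query mentioning 'email' becomes a selection on email facts, filtered by the
    remaining keywords against the source documents."""
    terms = [t for t in query.lower().split() if t != "email"]
    return [fact for fact in store
            if fact["type"] == "email"
            and all(t in corpus[fact["doc"]].lower() for t in terms)]

print(interpret("alice email"))  # -> [{'doc': 1, 'type': 'email', 'value': 'alice@example.org'}]
```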

Proceedings ArticleDOI
01 Jul 2015
TL;DR: This paper presents a two-stage method that enables the construction of SRL models for resource-poor languages by exploiting monolingual SRL and multilingual parallel data, and shows that this method outperforms existing methods.
Abstract: Semantic role labeling (SRL) is crucial to natural language understanding as it identifies the predicate-argument structure in text with semantic labels. Unfortunately, resources required to construct SRL models are expensive to obtain and simply do not exist for most languages. In this paper, we present a two-stage method to enable the construction of SRL models for resource-poor languages by exploiting monolingual SRL and multilingual parallel data. Experimental results show that our method outperforms existing methods. We use our method to generate Proposition Banks of high to reasonable quality for seven languages in three language families, and we release these resources to the research community.

88 citations
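
The method above builds on transferring semantic-role labels through multilingual parallel data. The toy sketch below shows only the basic projection step via word alignments, with made-up sentences, labels, and alignment pairs; the paper's two-stage method includes filtering and completion steps that are not reproduced here.

```python
# Toy annotation projection: copy semantic-role labels from a labeled (source)
# sentence to an unlabeled (target) sentence through word alignments.
src_tokens = ["Mary", "sold", "the", "house"]
src_labels = ["A0", "V", "O", "A1"]           # predicate and its arguments
tgt_tokens = ["Maria", "verkaufte", "das", "Haus"]
alignment = [(0, 0), (1, 1), (2, 2), (3, 3)]  # (source index, target index)

def project(src_labels, tgt_len, alignment):
    """Assign each aligned target token the label of its source counterpart."""
    tgt_labels = ["O"] * tgt_len
    for s, t in alignment:
        tgt_labels[t] = src_labels[s]
    return tgt_labels

print(list(zip(tgt_tokens, project(src_labels, len(tgt_tokens), alignment))))
```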


Cited by
Book
23 Nov 2005
TL;DR: The treatment is comprehensive and self-contained, targeted at researchers and students in machine learning and applied statistics, and deals with the supervised learning problem for both regression and classification.
Abstract: Gaussian processes (GPs) provide a principled, practical, probabilistic approach to learning in kernel machines. GPs have received increased attention in the machine-learning community over the past decade, and this book provides a long-needed systematic and unified treatment of theoretical and practical aspects of GPs in machine learning. The treatment is comprehensive and self-contained, targeted at researchers and students in machine learning and applied statistics. The book deals with the supervised-learning problem for both regression and classification, and includes detailed algorithms. A wide variety of covariance (kernel) functions are presented and their properties discussed. Model selection is discussed from both a Bayesian and a classical perspective. Many connections to other well-known techniques from machine learning and statistics are discussed, including support-vector machines, neural networks, splines, regularization networks, relevance vector machines and others. Theoretical issues including learning curves and the PAC-Bayesian framework are treated, and several approximation methods for learning with large datasets are discussed. The book contains illustrative examples and exercises, and code and datasets are available on the Web. Appendixes provide mathematical background and a discussion of Gaussian Markov processes.

11,357 citations
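
Among the topics listed above, model selection via the marginal likelihood lends itself to a compact illustration. The sketch below scores a few length scales of a squared-exponential kernel by the standard zero-mean GP log marginal likelihood; the data, kernel choice, and noise level are assumptions made only for illustration.

```python
import numpy as np

def rbf(a, b, ell):
    """Squared-exponential kernel with length scale ell."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / ell) ** 2)

def log_marginal_likelihood(x, y, ell, noise_var=0.1):
    """log p(y | X) for a zero-mean GP: -1/2 y^T K^-1 y - 1/2 log|K| - n/2 log(2 pi)."""
    K = rbf(x, x, ell) + noise_var * np.eye(len(x))
    _, logdet = np.linalg.slogdet(K)
    return -0.5 * y @ np.linalg.solve(K, y) - 0.5 * logdet - 0.5 * len(x) * np.log(2 * np.pi)

rng = np.random.default_rng(1)
x = rng.uniform(-3, 3, 40)
y = np.sin(x) + 0.1 * rng.standard_normal(40)
for ell in (0.1, 1.0, 10.0):
    print(ell, log_marginal_likelihood(x, y, ell))   # pick the length scale with the highest score
```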

Journal Article
TL;DR: In this paper, the authors describe the EM algorithm for finding the parameters of a mixture of Gaussian densities and a hidden Markov model (HMM) for both discrete and Gaussian mixture observation models.
Abstract: We describe the maximum-likelihood parameter estimation problem and how the Expectation-Maximization (EM) algorithm can be used for its solution. We first describe the abstract form of the EM algorithm as it is often given in the literature. We then develop the EM parameter estimation procedure for two applications: 1) finding the parameters of a mixture of Gaussian densities, and 2) finding the parameters of a hidden Markov model (HMM) (i.e., the Baum-Welch algorithm) for both discrete and Gaussian mixture observation models. We derive the update equations in fairly explicit detail, but we do not prove any convergence properties. We try to emphasize intuition rather than mathematical rigor.

2,455 citations
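
The Gaussian-mixture case described above can be illustrated compactly. The sketch below runs EM for a one-dimensional two-component mixture, alternating responsibility computation (E-step) and weighted parameter updates (M-step); the initialization, iteration count, and synthetic data are illustrative choices, not the tutorial's own code.

```python
import numpy as np

def em_gmm_1d(x, k=2, iters=50, seed=0):
    """EM for a 1-D Gaussian mixture with k components."""
    rng = np.random.default_rng(seed)
    mu = rng.choice(x, size=k, replace=False)   # initialize means at random data points
    var = np.full(k, x.var())
    pi = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        dens = pi * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate means, variances, and mixing weights from soft assignments.
        nk = resp.sum(axis=0)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk
        pi = nk / len(x)
    return mu, var, pi

rng = np.random.default_rng(2)
data = np.concatenate([rng.normal(-2, 0.5, 200), rng.normal(3, 1.0, 300)])
print(em_gmm_1d(data))
```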

Journal ArticleDOI
TL;DR: It is shown that one cannot say: if the empirical misclassification rate is low, the Vapnik-Chervonenkis dimension of your generalizer is small, and the training set is large, then with high probability your OTS error is small.
Abstract: This is the first of two papers that use off-training set (OTS) error to investigate the assumption-free relationship between learning algorithms. This first paper discusses the senses in which there are no a priori distinctions between learning algorithms. (The second paper discusses the senses in which there are such distinctions.) In this first paper it is shown, loosely speaking, that for any two algorithms A and B, there are “as many” targets (or priors over targets) for which A has lower expected OTS error than B as vice versa, for loss functions like zero-one loss. In particular, this is true if A is cross-validation and B is “anti-cross-validation” (choose the learning algorithm with largest cross-validation error). This paper ends with a discussion of the implications of these results for computational learning theory. It is shown that one cannot say: if empirical misclassification rate is low, the Vapnik-Chervonenkis dimension of your generalizer is small, and the training set is large, then with high probability your OTS error is small. Other implications for “membership queries” algorithms and “punting” algorithms are also discussed.

1,371 citations
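
The averaging argument behind the result above can be illustrated with a tiny enumeration: averaged uniformly over all boolean targets on a handful of off-training-set points, any two fixed predictors incur the same expected zero-one OTS error. The setup below is a deliberately simplified illustration, not the paper's formal framework.

```python
from itertools import product

# Two fixed predictors and three off-training-set points.
ots_points = 3
always_zero = lambda _: 0
always_one = lambda _: 1

def avg_ots_error(predictor):
    """Average zero-one OTS error over every possible boolean target on the OTS points."""
    errs = []
    for target in product([0, 1], repeat=ots_points):   # all 2^3 targets
        errs.append(sum(predictor(i) != target[i] for i in range(ots_points)) / ots_points)
    return sum(errs) / len(errs)

print(avg_ots_error(always_zero), avg_ots_error(always_one))  # both 0.5
```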

Journal ArticleDOI
29 Nov 1999
TL;DR: This work performs a theoretical investigation of the variance of a variant of the cross-validation estimator of the generalization error that takes into account the variability due to the randomness of the training set as well as the test examples, and proposes new estimators of this variance.
Abstract: In order to compare learning algorithms, experimental results reported in the machine learning literature often use statistical tests of significance to support the claim that a new learning algorithm generalizes better. Such tests should take into account the variability due to the choice of training set and not only that due to the test examples, as is often done. Ignoring the training-set variability could lead to gross underestimation of the variance of the cross-validation estimator, and to the wrong conclusion that the new algorithm is significantly better when it is not. We perform a theoretical investigation of the variance of a variant of the cross-validation estimator of the generalization error that takes into account the variability due to the randomness of the training set as well as the test examples. Our analysis shows that all variance estimators based only on the results of the cross-validation experiment must be biased. This analysis allows us to propose new estimators of this variance. We show, via simulations, that hypothesis tests about the generalization error using those new variance estimators have better properties than tests involving variance estimators currently in use and listed in Dietterich (1998). In particular, the new tests have correct size and good power. That is, the new tests do not reject the null hypothesis too often when the hypothesis is true, but they tend to frequently reject the null hypothesis when it is false.

925 citations
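
One widely used correction of the kind discussed above is the variance inflation from Nadeau and Bengio's corrected resampled t-test; whether it matches the estimators proposed in this particular paper is an assumption here. The sketch below contrasts the naive variance of the mean of resampled test errors (which treats the resamples as independent) with that corrected form, which accounts for overlapping training sets.

```python
import numpy as np

def naive_and_corrected_variance(errors, n_train, n_test):
    """Naive variance of the mean of J resampled errors vs. the
    corrected form (1/J + n_test/n_train) * s^2."""
    errors = np.asarray(errors, dtype=float)
    J = len(errors)
    s2 = errors.var(ddof=1)            # sample variance of the J error estimates
    naive = s2 / J                     # assumes the J estimates are independent
    corrected = (1.0 / J + n_test / n_train) * s2
    return naive, corrected

fold_errors = [0.12, 0.15, 0.11, 0.18, 0.14]   # hypothetical errors from 5 resamples
print(naive_and_corrected_variance(fold_errors, n_train=800, n_test=200))
```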