Home
/
Authors
/
John G. Cleary

Author

John G. Cleary

Other affiliations: University of Calgary

Bio: John G. Cleary is an academic researcher from University of Waikato. The author has contributed to research in topics: Discrete event simulation & Arithmetic coding. The author has an hindex of 24, co-authored 79 publications receiving 8455 citations. Previous affiliations of John G. Cleary include University of Calgary.

Papers published on a yearly basis

2015
2014
2013
2011
2005
2002
2000
1999
1998
1997
1996
1995
1994
1993
1992
1990
1989
1988
1987
1986
1985
1984
1983
1982

Papers

PDF

Open Access

More filters

Journal Article•DOI•

Arithmetic coding for data compression

[...]

Ian H. Witten¹, Radford M. Neal¹, John G. Cleary¹•Institutions (1)

University of Calgary¹

01 Jun 1987-Communications of The ACM

TL;DR: The state of the art in data compression is arithmetic coding, not the better-known Huffman method, which gives greater compression, is faster for adaptive models, and clearly separates the model from the channel encoding.

...read moreread less

Abstract: The state of the art in data compression is arithmetic coding, not the better-known Huffman method. Arithmetic coding gives greater compression, is faster for adaptive models, and clearly separates the model from the channel encoding.

...read moreread less

3,188 citations

Journal Article•DOI•

Data Compression Using Adaptive Coding and Partial String Matching

[...]

John G. Cleary¹, Ian H. Witten¹•Institutions (1)

University of Calgary¹

01 Apr 1984-IEEE Transactions on Communications

TL;DR: This paper describes how the conflict can be resolved with partial string matching, and reports experimental results which show that mixed-case English text can be coded in as little as 2.2 bits/ character with no prior knowledge of the source.

...read moreread less

Abstract: The recently developed technique of arithmetic coding, in conjunction with a Markov model of the source, is a powerful method of data compression in situations where a linear treatment is inappropriate. Adaptive coding allows the model to be constructed dynamically by both encoder and decoder during the course of the transmission, and has been shown to incur a smaller coding overhead than explicit transmission of the model's statistics. But there is a basic conflict between the desire to use high-order Markov models and the need to have them formed quickly as the initial part of the message is sent. This paper describes how the conflict can be resolved with partial string matching, and reports experimental results which show that mixed-case English text can be coded in as little as 2.2 bits/ character with no prior knowledge of the source.

...read moreread less

1,318 citations

Book•

Text Compression

[...]

Tim Bell, John G. Cleary, Ian H. Witten

01 Feb 1990

1,149 citations

Book Chapter•DOI•

K*: An Instance-based Learner Using an Entropic Distance Measure

[...]

John G. Cleary¹, Leonard Eric Trigg¹•Institutions (1)

University of Waikato¹

01 Jan 1995

TL;DR: K*, an instance-based learner that uses entropy as a distance measure, is described, and results that compare favourably with several machine learning algorithms are presented.

...read moreread less

Abstract: The use of entropy as a distance measure has several benefits. It provides a consistent approach to handling symbolic attributes, real valued attributes and missing values. We discuss the approach of taking all possible transformation paths between instances. We describe K*, an instance-based learner that uses such a measure, and present results that compare favourably with several machine learning algorithms.

...read moreread less

759 citations

Journal Article•DOI•

Modeling for text compression

[...]

Tim Bell¹, Ian H. Witten², John G. Cleary²•Institutions (2)

University of Canterbury¹, University of Calgary²

01 Dec 1989-ACM Computing Surveys

TL;DR: This paper surveys successful strategies for adaptive modeling that are suitable for use in practical text compression systems, and falls into three main classes: finite-context modeling, in which the last few characters are used to condition the probability distribution for the next one.

...read moreread less

Abstract: The best schemes for text compression use large models to help them predict which characters will come next. The actual next characters are coded with respect to the prediction, resulting in compression of information. Models are best formed adaptively, based on the text seen so far. This paper surveys successful strategies for adaptive modeling that are suitable for use in practical text compression systems.The strategies fall into three main classes: finite-context modeling, in which the last few characters are used to condition the probability distribution for the next one; finite-state modeling, in which the distribution is conditioned by the current state (and which subsumes finite-context modeling as an important special case); and dictionary modeling, in which strings of characters are replaced by pointers into an evolving dictionary. A comparison of different methods on the same sample texts is included, along with an analysis of future research directions.

...read moreread less

315 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16

Collapse

Cited by

PDF

Open Access

More filters

Book•

Data Mining: Practical Machine Learning Tools and Techniques

[...]

Ian H. Witten, Eibe Frank, Mark Hall

25 Oct 1999

TL;DR: This highly anticipated third edition of the most acclaimed work on data mining and machine learning will teach you everything you need to know about preparing inputs, interpreting outputs, evaluating results, and the algorithmic methods at the heart of successful data mining.

...read moreread less

Abstract: Data Mining: Practical Machine Learning Tools and Techniques offers a thorough grounding in machine learning concepts as well as practical advice on applying machine learning tools and techniques in real-world data mining situations. This highly anticipated third edition of the most acclaimed work on data mining and machine learning will teach you everything you need to know about preparing inputs, interpreting outputs, evaluating results, and the algorithmic methods at the heart of successful data mining. Thorough updates reflect the technical changes and modernizations that have taken place in the field since the last edition, including new material on Data Transformations, Ensemble Learning, Massive Data Sets, Multi-instance Learning, plus a new version of the popular Weka machine learning software developed by the authors. Witten, Frank, and Hall include both tried-and-true techniques of today as well as methods at the leading edge of contemporary research. *Provides a thorough grounding in machine learning concepts as well as practical advice on applying the tools and techniques to your data mining projects *Offers concrete tips and techniques for performance improvement that work by transforming the input or output in machine learning methods *Includes downloadable Weka software toolkit, a collection of machine learning algorithms for data mining tasks-in an updated, interactive interface. Algorithms in toolkit cover: data pre-processing, classification, regression, clustering, association rules, visualization

...read moreread less

20,196 citations

Book•

A wavelet tour of signal processing

[...]

Stéphane Mallat

01 Jan 1998

TL;DR: An introduction to a Transient World and an Approximation Tour of Wavelet Packet and Local Cosine Bases.

...read moreread less

Abstract: Introduction to a Transient World. Fourier Kingdom. Discrete Revolution. Time Meets Frequency. Frames. Wavelet Zoom. Wavelet Bases. Wavelet Packet and Local Cosine Bases. An Approximation Tour. Estimations are Approximations. Transform Coding. Appendix A: Mathematical Complements. Appendix B: Software Toolboxes.

...read moreread less

17,693 citations

Journal Article•DOI•

Metagenomic biomarker discovery and explanation

[...]

Nicola Segata¹, Jacques Izard¹, Jacques Izard², Levi Waldron¹, Dirk Gevers³, Larisa Miropolsky¹, Wendy S. Garrett¹, Curtis Huttenhower¹ - Show less +4 more•Institutions (3)

Harvard University¹, The Forsyth Institute², Broad Institute³

24 Jun 2011-Genome Biology

TL;DR: A new method for metagenomic biomarker discovery is described and validates by way of class comparison, tests of biological consistency and effect size estimation to address the challenge of finding organisms, genes, or pathways that consistently explain the differences between two or more microbial communities.

...read moreread less

Abstract: This study describes and validates a new method for metagenomic biomarker discovery by way of class comparison, tests of biological consistency and effect size estimation. This addresses the challenge of finding organisms, genes, or pathways that consistently explain the differences between two or more microbial communities, which is a central problem to the study of metagenomics. We extensively validate our method on several microbiomes and a convenient online interface for the method is provided at http://huttenhower.sph.harvard.edu/lefse/.

...read moreread less

9,057 citations

Book•

Information Theory, Inference and Learning Algorithms

[...]

David J. C. MacKay

06 Oct 2003

TL;DR: A fun and exciting textbook on the mathematics underpinning the most dynamic areas of modern science and engineering.

...read moreread less

Abstract: Fun and exciting textbook on the mathematics underpinning the most dynamic areas of modern science and engineering.

...read moreread less

8,091 citations

Journal Article•

Natural Language Processing (Almost) from Scratch

[...]

Ronan Collobert, Jason Weston¹, Léon Bottou, Michael Karlen, Koray Kavukcuoglu², Pavel P. Kuksa³ - Show less +2 more•Institutions (3)

Google¹, New York University², Rutgers University³

01 Feb 2011-Journal of Machine Learning Research

TL;DR: A unified neural network architecture and learning algorithm that can be applied to various natural language processing tasks including part-of-speech tagging, chunking, named entity recognition, and semantic role labeling is proposed.

...read moreread less

Abstract: We propose a unified neural network architecture and learning algorithm that can be applied to various natural language processing tasks including part-of-speech tagging, chunking, named entity recognition, and semantic role labeling. This versatility is achieved by trying to avoid task-specific engineering and therefore disregarding a lot of prior knowledge. Instead of exploiting man-made input features carefully optimized for each task, our system learns internal representations on the basis of vast amounts of mostly unlabeled training data. This work is then used as a basis for building a freely available tagging system with good performance and minimal computational requirements.

...read moreread less

6,734 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse