scispace - formally typeset
Author

Alberto Segre

Bio: Alberto Segre is an academic researcher. The author has contributed to research in the topics of Computer science and Automaton. The author has an h-index of 1, and has co-authored 1 publication receiving 7,843 citations.

Papers
01 Jan 1994
TL;DR: In his new book, C4.5: Programs for Machine Learning, Quinlan has put together a definitive, much needed description of his complete system, including the latest developments, which will be a welcome addition to the library of many researchers and students.
Abstract: Algorithms for constructing decision trees are among the most well known and widely used of all machine learning methods. Among decision tree algorithms, J. Ross Quinlan's ID3 and its successor, C4.5, are probably the most popular in the machine learning community. These algorithms and variations on them have been the subject of numerous research papers since Quinlan introduced ID3. Until recently, most researchers looking for an introduction to decision trees turned to Quinlan's seminal 1986 Machine Learning journal article [Quinlan, 1986]. In his new book, C4.5: Programs for Machine Learning, Quinlan has put together a definitive, much needed description of his complete system, including the latest developments. As such, this book will be a welcome addition to the library of many researchers and students.
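The information-gain criterion at the heart of ID3 (which C4.5 refines into the gain ratio) can be sketched in a few lines. The weather-style data below is a toy illustration, not an example from the book:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label list, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Entropy reduction from splitting the rows on attribute `attr`."""
    n = len(labels)
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[attr], []).append(y)
    remainder = sum(len(ys) / n * entropy(ys) for ys in groups.values())
    return entropy(labels) - remainder

# Toy data (hypothetical): which attribute would ID3 split on first?
rows = [
    {"outlook": "sunny", "windy": False},
    {"outlook": "sunny", "windy": True},
    {"outlook": "rain",  "windy": False},
    {"outlook": "rain",  "windy": True},
]
labels = ["no", "no", "yes", "yes"]
best = max(["outlook", "windy"], key=lambda a: information_gain(rows, labels, a))
```

Here "outlook" perfectly predicts the label (gain 1 bit) while "windy" carries no information (gain 0), so the tree splits on "outlook" first.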

8,046 citations

Journal ArticleDOI
TL;DR: CDI incidence among insurance enrollees exposed to a recently hospitalized family member was 73% greater than among enrollees not exposed, and incidence increased with length of hospitalization among family members.
Abstract: We evaluated whether hospitalized patients without diagnosed Clostridioides difficile infection (CDI) increased the risk for CDI among their family members after discharge. We used 2001–2017 US insurance claims data to compare monthly CDI incidence between persons in households with and without a family member hospitalized in the previous 60 days. CDI incidence among insurance enrollees exposed to a recently hospitalized family member was 73% greater than among enrollees not exposed, and incidence increased with length of hospitalization among family members. We identified a dose-response relationship between total days of within-household hospitalization and CDI incidence rate ratio. Compared with persons whose family members were hospitalized <1 day, the incidence rate ratio increased from 1.30 (95% CI 1.19–1.41) for 1–3 days of hospitalization to 2.45 (95% CI 1.66–3.60) for >30 days of hospitalization. Asymptomatic C. difficile carriers discharged from hospitals could be a major source of community-associated CDI cases.
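The incidence rate ratios quoted above can be illustrated with a small sketch. The case counts and person-time below are invented for illustration (the 73% excess corresponds to an IRR of 1.73) and are not the study's data:

```python
import math

def incidence_rate_ratio(cases_exp, time_exp, cases_unexp, time_unexp, z=1.96):
    """Incidence rate ratio with a Wald 95% CI computed on the log scale."""
    irr = (cases_exp / time_exp) / (cases_unexp / time_unexp)
    se = math.sqrt(1 / cases_exp + 1 / cases_unexp)  # SE of log(IRR)
    lo = math.exp(math.log(irr) - z * se)
    hi = math.exp(math.log(irr) + z * se)
    return irr, lo, hi

# Illustrative counts only, chosen so the IRR comes out to 1.73:
irr, lo, hi = incidence_rate_ratio(173, 1_000_000, 100, 1_000_000)
```

An IRR of 1.73 with a confidence interval excluding 1 is what "73% greater incidence" means in this setting.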

9 citations

Proceedings ArticleDOI
14 Aug 2022
TL;DR: A new method to construct deterministic finite automata to explain RNN, which identifies small sets of hidden states determined by patterns with finer granularity in data, and allows the automata states to be formed adaptively during the extraction.
Abstract: Recurrent neural networks (RNN) are widely used for handling sequence data. However, their black-box nature makes it difficult for users to interpret the decision-making process. We propose a new method to construct deterministic finite automata to explain RNN. In an automaton, states are abstracted from hidden states produced by the RNN, and the transitions represent input symbols. Thus, users can follow the paths of transitions, called patterns, to understand how a prediction is produced. Existing methods for extracting automata partition the hidden state space at the beginning of the extraction, which often leads to solutions that are either inaccurate or too large in size to comprehend. Unlike previous methods, our approach allows the automata states to be formed adaptively during the extraction. Instead of defining patterns on pre-determined clusters, our proposed model, AdaAX, identifies small sets of hidden states determined by patterns with finer granularity in data. Then these small sets are gradually merged to form states, allowing users to trade fidelity for lower complexity. Experiments show that our automata can achieve higher fidelity while being significantly smaller in size than baseline methods on synthetic and complex real datasets.
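As a rough illustration of the fixed-partition extraction that the abstract contrasts with (AdaAX itself forms states adaptively and merges them, which this sketch does not do), hidden states can be abstracted into automaton states and transitions recorded per input symbol. The rounding-based "clustering" and the traces below are hypothetical:

```python
def extract_automaton(traces, resolution=1):
    """Build a finite automaton by abstracting RNN hidden states.

    `traces` is a list of runs; each run is an ordered list of
    (symbol, hidden_vector) pairs. Hidden vectors are mapped to abstract
    states by rounding to `resolution` decimal places (a crude stand-in
    for clustering), and transitions are recorded per symbol.
    """
    def abstract(h):
        return tuple(round(x, resolution) for x in h)

    transitions = {}  # (state, symbol) -> set of successor states
    for run in traces:
        state = "START"
        for symbol, hidden in run:
            nxt = abstract(hidden)
            transitions.setdefault((state, symbol), set()).add(nxt)
            state = nxt
    return transitions

# Two toy runs of a hypothetical RNN over symbols a/b; nearby hidden
# states (0.11 vs 0.12) collapse into the same abstract state.
traces = [
    [("a", [0.11, 0.9]), ("b", [0.8, 0.2])],
    [("a", [0.12, 0.9]), ("b", [0.8, 0.2])],
]
dfa = extract_automaton(traces)
```

A user can then follow transition paths (the "patterns") to see how a prediction is produced; the fidelity/size trade-off the paper studies corresponds to how coarsely states are abstracted.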
Proceedings ArticleDOI
01 Nov 2022
TL;DR: In this article, the authors present the first epidemic threshold results on temporal bipartite networks for the Susceptible-Infected-Susceptible (SIS) model and leverage the epidemic threshold result to pose the HAI mitigation problem as minimizing the spectral radius of the system matrix while removing only a few nodes or edges.
Abstract: Healthcare associated infections (HAIs) impose a substantial burden, both on patients and on the healthcare system. Designing effective strategies by using interventions such as vaccination, isolation, cleaning, mobility modification, etc., to reduce HAI spread is an important computational challenge. Spectral approaches are quite useful for modeling and solving problems of reducing disease spread over contact networks, but they have not been used for disease-spread models and contact networks that are specific for HAIs. Our main contribution in this paper is to close this gap. We make 3 specific contributions. (i) We present the first epidemic threshold results on temporal bipartite networks, i.e., a time-varying sequence of bipartite people-location network, for the Susceptible-Infected-Susceptible (SIS) model. (ii) We leverage our epidemic threshold result to pose the HAI mitigation problem as minimizing the spectral radius of the system matrix, while removing few nodes or edges. We present a scalable combinatorial algorithm that provides approximation guarantees. (iii) Through extensive experiments on actual healthcare contact networks derived from operations data from the University of Iowa Hospitals and Clinics, Carilion Clinic, and several other healthcare facilities, we show that our algorithm consistently outperforms a number of baselines (random, degree, top-k, eigen centrality) both in terms of reducing the spectral radius of the system matrix and in terms of reducing infections.
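A minimal sketch of the spectral-radius machinery the abstract refers to: power iteration to estimate the spectral radius, and a greedy node-removal loop in the spirit of (but not identical to) the paper's combinatorial algorithm, which additionally provides approximation guarantees. The star graph is a toy example:

```python
def spectral_radius(adj, iters=500):
    """Largest eigenvalue of a nonnegative adjacency matrix, by power
    iteration on A + I (the shift avoids oscillation on bipartite
    graphs; for nonnegative A, rho(A + I) = rho(A) + 1)."""
    n = len(adj)
    v = [1.0] * n
    lam = 1.0
    for _ in range(iters):
        w = [v[i] + sum(adj[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = max(w)
        v = [x / lam for x in w]
    return lam - 1.0

def greedy_node_removal(adj, k):
    """Remove k nodes, each time taking the node whose removal shrinks
    the spectral radius the most (a simple greedy heuristic)."""
    adj = [row[:] for row in adj]
    removed = []
    for _ in range(k):
        def rho_without(i):
            masked = [[0 if i in (r, c) else adj[r][c]
                       for c in range(len(adj))] for r in range(len(adj))]
            return spectral_radius(masked)
        best = min((i for i in range(len(adj)) if i not in removed),
                   key=rho_without)
        removed.append(best)
        for j in range(len(adj)):
            adj[best][j] = adj[j][best] = 0
    return removed, spectral_radius(adj)

# Star graph: hub 0 joined to leaves 1-3; its spectral radius is sqrt(3).
star = [[0, 1, 1, 1],
        [1, 0, 0, 0],
        [1, 0, 0, 0],
        [1, 0, 0, 0]]
removed, rho_after = greedy_node_removal(star, 1)
```

On the star graph the greedy choice is the hub, which drives the spectral radius (and hence the epidemic threshold condition) to zero, mirroring why degree- and centrality-based baselines are natural comparison points.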

Cited by
Journal ArticleDOI
TL;DR: This article gives an introduction to the subject of classification and regression trees by reviewing some widely available algorithms and comparing their capabilities, strengths, and weaknesses in two examples.
Abstract: Classification and regression trees are machine-learning methods for constructing prediction models from data. The models are obtained by recursively partitioning the data space and fitting a simple prediction model within each partition. As a result, the partitioning can be represented graphically as a decision tree. Classification trees are designed for dependent variables that take a finite number of unordered values, with prediction error measured in terms of misclassification cost. Regression trees are for dependent variables that take continuous or ordered discrete values, with prediction error typically measured by the squared difference between the observed and predicted values. This article gives an introduction to the subject by reviewing some widely available algorithms and comparing their capabilities, strengths, and weaknesses in two examples. © 2011 John Wiley & Sons, Inc. WIREs Data Mining Knowl Discov 2011, 1, 14–23. DOI: 10.1002/widm.8
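The recursive-partitioning step for a regression tree reduces, in one dimension, to scanning candidate thresholds for the split that minimizes the squared error of mean predictions in the two partitions. A toy sketch with invented data:

```python
def best_split(xs, ys):
    """Find the threshold on x minimizing the total squared error of
    constant (mean) predictions in the two resulting partitions."""
    def sse(vals):
        if not vals:
            return 0.0
        m = sum(vals) / len(vals)
        return sum((v - m) ** 2 for v in vals)

    pairs = sorted(zip(xs, ys))
    best_err, best_thr = float("inf"), None
    # Candidate thresholds: midpoints between consecutive sorted x values.
    for i in range(1, len(pairs)):
        thr = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for x, y in pairs if x <= thr]
        right = [y for x, y in pairs if x > thr]
        err = sse(left) + sse(right)
        if err < best_err:
            best_err, best_thr = err, thr
    return best_thr

# Step-shaped data: y jumps between x = 3 and x = 4.
xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.1, 0.9, 5.0, 5.1, 4.9]
thr = best_split(xs, ys)
```

A full tree applies this search recursively within each partition, which is exactly the structure that makes the model representable as a decision tree.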

16,974 citations

Proceedings Article
01 Jul 1998
TL;DR: Two new algorithms for solving this problem that are fundamentally different from the known algorithms are presented, and empirical evaluation shows that these algorithms outperform the known algorithms by factors ranging from three for small problems to more than an order of magnitude for large problems.
Abstract: We consider the problem of discovering association rules between items in a large database of sales transactions. We present two new algorithms for solving this problem that are fundamentally different from the known algorithms. Empirical evaluation shows that these algorithms outperform the known algorithms by factors ranging from three for small problems to more than an order of magnitude for large problems. We also show how the best features of the two proposed algorithms can be combined into a hybrid algorithm, called AprioriHybrid. Scale-up experiments show that AprioriHybrid scales linearly with the number of transactions. AprioriHybrid also has excellent scale-up properties with respect to the transaction size and the number of items in the database.
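The pruning idea underlying the Apriori family of algorithms, that every subset of a frequent itemset must itself be frequent, can be sketched as follows. The transactions are a toy example:

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Frequent itemsets via Apriori: candidates of size k are built
    only from frequent itemsets of size k-1, since any itemset with an
    infrequent subset cannot itself be frequent."""
    transactions = [frozenset(t) for t in transactions]
    items = {i for t in transactions for i in t}

    def support(itemset):
        return sum(itemset <= t for t in transactions)

    frequent = {}
    current = [frozenset([i]) for i in items
               if support(frozenset([i])) >= min_support]
    k = 1
    while current:
        frequent.update((s, support(s)) for s in current)
        k += 1
        # Candidate generation: union pairs, keep size-k sets whose
        # (k-1)-subsets are all frequent, then count support.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        current = [c for c in candidates
                   if all(frozenset(sub) in frequent
                          for sub in combinations(c, k - 1))
                   and support(c) >= min_support]
    return frequent

transactions = [{"bread", "milk"}, {"bread", "butter"},
                {"bread", "milk", "butter"}, {"milk"}]
freq = apriori(transactions, min_support=2)
```

With support threshold 2, {milk, butter} is infrequent, so the triple {bread, milk, butter} is pruned before its support is ever counted; that pruning is what makes the approach scale.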

10,863 citations

Book ChapterDOI
21 Apr 1998
TL;DR: This paper explores the use of Support Vector Machines for learning text classifiers from examples and analyzes the particular properties of learning with text data and identifies why SVMs are appropriate for this task.
Abstract: This paper explores the use of Support Vector Machines (SVMs) for learning text classifiers from examples. It analyzes the particular properties of learning with text data and identifies why SVMs are appropriate for this task. Empirical results support the theoretical findings. SVMs achieve substantial improvements over the currently best performing methods and behave robustly over a variety of different learning tasks. Furthermore they are fully automatic, eliminating the need for manual parameter tuning.
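Bag-of-words featurization is the representation step for text classifiers like these. As a stand-in for the SVM itself, the sketch below trains a plain perceptron, which also learns a separating hyperplane over word-count features but lacks the margin maximization the paper credits for SVMs' robustness; the documents and labels are invented:

```python
def bag_of_words(docs):
    """Map documents to sparse term-count vectors keyed by word."""
    return [{w: doc.split().count(w) for w in set(doc.split())}
            for doc in docs]

def train_perceptron(features, labels, epochs=20):
    """Perceptron stand-in for a linear-kernel SVM: learns a sparse
    weight vector over words, updating only on misclassified examples."""
    w = {}
    for _ in range(epochs):
        for x, y in zip(features, labels):   # y in {-1, +1}
            score = sum(w.get(t, 0.0) * c for t, c in x.items())
            if y * score <= 0:               # misclassified: update weights
                for t, c in x.items():
                    w[t] = w.get(t, 0.0) + y * c
    return w

# Toy sentiment data (hypothetical):
docs = ["good great film", "great acting good",
        "bad boring film", "boring bad plot"]
labels = [1, 1, -1, -1]
w = train_perceptron(bag_of_words(docs), labels)

def predict(doc):
    return 1 if sum(w.get(t, 0.0) for t in doc.split()) > 0 else -1
```

The high-dimensional, sparse feature space this produces is precisely the setting where the paper argues SVMs are well suited.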

8,658 citations

Proceedings Article
Yoav Freund, Robert E. Schapire
03 Jul 1996
TL;DR: This paper describes experiments carried out to assess how well AdaBoost, with and without pseudo-loss, performs on real learning problems, and compares boosting to Breiman's "bagging" method when used to aggregate various classifiers.
Abstract: In an earlier paper, we introduced a new "boosting" algorithm called AdaBoost which, theoretically, can be used to significantly reduce the error of any learning algorithm that consistently generates classifiers whose performance is a little better than random guessing. We also introduced the related notion of a "pseudo-loss" which is a method for forcing a learning algorithm of multi-label concepts to concentrate on the labels that are hardest to discriminate. In this paper, we describe experiments we carried out to assess how well AdaBoost, with and without pseudo-loss, performs on real learning problems. We performed two sets of experiments. The first set compared boosting to Breiman's "bagging" method when used to aggregate various classifiers (including decision trees and single attribute-value tests). We compared the performance of the two methods on a collection of machine-learning benchmarks. In the second set of experiments, we studied in more detail the performance of boosting using a nearest-neighbor classifier on an OCR problem.
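AdaBoost's reweighting loop over weak learners can be sketched as follows; the weak learners here are 1-D threshold stumps and the data is invented, a simplification of the decision trees and attribute-value tests used in the paper's experiments:

```python
import math

def adaboost_stumps(xs, ys, rounds=5):
    """AdaBoost over threshold stumps on a single feature.

    Each round fits the stump (threshold, sign) minimizing weighted
    error, then reweights examples so the next stump concentrates on
    the current mistakes."""
    n = len(xs)
    w = [1.0 / n] * n
    ensemble = []  # (alpha, threshold, sign)
    thresholds = sorted(set(xs))
    for _ in range(rounds):
        best = None  # (weighted error, threshold, sign)
        for thr in thresholds:
            for sign in (1, -1):
                preds = [sign if x <= thr else -sign for x in xs]
                err = sum(wi for wi, p, y in zip(w, preds, ys) if p != y)
                if best is None or err < best[0]:
                    best = (err, thr, sign)
        err, thr, sign = best
        if err >= 0.5:          # weak learner no better than chance
            break
        err = max(err, 1e-12)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, thr, sign))
        preds = [sign if x <= thr else -sign for x in xs]
        w = [wi * math.exp(-alpha * y * p) for wi, y, p in zip(w, ys, preds)]
        total = sum(w)
        w = [wi / total for wi in w]

    def predict(x):
        s = sum(a * (sg if x <= t else -sg) for a, t, sg in ensemble)
        return 1 if s > 0 else -1
    return predict

# Toy labels no single stump can fit; the boosted ensemble can.
xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, 1, -1, -1]
predict = adaboost_stumps(xs, ys)
```

The point x = 4 is misclassified by the best single stump; after reweighting, later rounds focus on it, and the weighted vote of a few stumps fits all six points.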

7,601 citations

01 Jan 1996

7,386 citations