Topic: Data modeling

About: Data modeling is a research topic. Over its lifetime, 29,624 publications have been published within this topic, receiving 470,187 citations.


Papers
Journal Article
TL;DR: If the goal as a field is to use data to solve problems, then the statistical community needs to move away from exclusive dependence on data models and adopt a more diverse set of tools.
Abstract: There are two cultures in the use of statistical modeling to reach conclusions from data. One assumes that the data are generated by a given stochastic data model. The other uses algorithmic models and treats the data mechanism as unknown. The statistical community has been committed to the almost exclusive use of data models. This commitment has led to irrelevant theory, questionable conclusions, and has kept statisticians from working on a large range of interesting current problems. Algorithmic modeling, both in theory and practice, has developed rapidly in fields outside statistics. It can be used both on large complex data sets and as a more accurate and informative alternative to data modeling on smaller data sets. If our goal as a field is to use data to solve problems, then we need to move away from exclusive dependence on data models and adopt a more diverse set of tools.

1,735 citations
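
For readers less familiar with the dichotomy, the sketch below (an illustration of ours, not Breiman's code; it assumes NumPy and scikit-learn) fits a "data model" in the first culture's sense (linear regression with interpretable coefficients) and an "algorithmic model" (a random forest judged only by predictive fit) to the same synthetic nonlinear data.

```python
# A minimal sketch of Breiman's two cultures, assuming scikit-learn:
# a stochastic "data model" (linear regression) vs. an "algorithmic
# model" (random forest) fit to the same synthetic data.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(500, 1))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.standard_normal(500)  # nonlinear truth

# Data-modeling culture: assume y = Xb + noise, interpret coefficients.
data_model = LinearRegression().fit(X, y)

# Algorithmic-modeling culture: treat the mechanism as unknown and
# judge the model by predictive accuracy alone.
algo_model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

print("linear R^2:", data_model.score(X, y))
print("forest R^2:", algo_model.score(X, y))
```

On data like this, where the assumed linear form is wrong, the algorithmic model fits far better, which is exactly the kind of case the paper uses to argue for a more diverse toolset.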

Journal ArticleDOI
John Mullahy
TL;DR: These alternatives permit more flexible specification of the data-generating process (DGP) than do familiar count data models, and provide a natural means for modeling data that are over- or underdispersed by the standards of the basic models.

1,700 citations
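
As a hedged illustration of the dispersion issue the paper addresses (this uses off-the-shelf statsmodels fits, not Mullahy's own modified estimators), the sketch below generates over-dispersed counts and compares a basic Poisson fit with a negative binomial fit.

```python
# Over-dispersion sketch: counts whose variance exceeds the mean break
# the basic Poisson model; a negative binomial model fits better.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 1000
x = rng.standard_normal(n)
mu = np.exp(0.5 + 0.8 * x)
# Gamma-mixed Poisson draws => marginal variance exceeds the mean.
y = rng.poisson(mu * rng.gamma(shape=2.0, scale=0.5, size=n))

X = sm.add_constant(x)
poisson_fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()
nb_fit = sm.GLM(y, X, family=sm.families.NegativeBinomial()).fit()

# Lower AIC for the negative binomial signals the Poisson's equal
# mean-variance assumption is too restrictive for these data.
print("Poisson AIC:", poisson_fit.aic)
print("NegBin  AIC:", nb_fit.aic)
```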

Journal Article
TL;DR: This work classifies data quality problems that are addressed by data cleaning and provides an overview of the main solution approaches and discusses current tool support for data cleaning.
Abstract: We classify data quality problems that are addressed by data cleaning and provide an overview of the main solution approaches. Data cleaning is especially required when integrating heterogeneous data sources and should be addressed together with schema-related data transformations. In data warehouses, data cleaning is a major part of the so-called ETL process. We also discuss current tool support for data cleaning.

1,675 citations
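
The sketch below is a minimal, hypothetical pandas example of the problem classes the survey covers: duplicate records, inconsistent formats, and missing values. The column names and cleaning rules are made up for illustration; they are not from the paper.

```python
# Hypothetical single-source cleaning pass: normalize formats, fix
# types, drop missing keys, and remove exact duplicates.
import pandas as pd

raw = pd.DataFrame({
    "customer": ["Ann", "ann ", "Bob", "Bob", None],
    "city":     ["NYC", "nyc", "Boston", "Boston", "Boston"],
    "amount":   ["10.5", "10.5", "7", None, "3.2"],
})

clean = (
    raw
    .assign(
        customer=raw["customer"].str.strip().str.title(),  # unify case/whitespace
        city=raw["city"].str.upper(),                      # unify value format
        amount=pd.to_numeric(raw["amount"], errors="coerce"),  # fix data type
    )
    .dropna(subset=["customer"])  # records without a key are unusable
    .drop_duplicates()            # remove exact duplicate records
)
print(clean)
```

In an ETL setting, steps like these would run in the transformation stage, after extraction from the heterogeneous sources and before loading into the warehouse.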

Journal ArticleDOI
01 Aug 2003
TL;DR: The paper describes the basic processing model and architecture of Aurora, a new system for managing data streams for monitoring applications, along with its stream-oriented set of operators.
Abstract: This paper describes the basic processing model and architecture of Aurora, a new system to manage data streams for monitoring applications. Monitoring applications differ substantially from conventional business data processing. The fact that a software system must process and react to continual inputs from many sources (e.g., sensors) rather than from human operators requires one to rethink the fundamental architecture of a DBMS for this application area. In this paper, we present Aurora, a new DBMS currently under construction at Brandeis University, Brown University, and M.I.T. We first provide an overview of the basic Aurora model and architecture and then describe in detail a stream-oriented set of operators.

1,518 citations
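
The generator pipeline below is a rough sketch of the stream-operator idea, not Aurora's actual operator set or API: continual tuples from a source flow through composable filter, map, and sliding-window operators rather than being queried at rest.

```python
# Stream operators as Python generators: each operator consumes an
# upstream iterator and yields results downstream as tuples arrive.
from collections import deque

def op_filter(stream, pred):
    for item in stream:
        if pred(item):
            yield item

def op_map(stream, fn):
    for item in stream:
        yield fn(item)

def op_window(stream, size):
    """Sliding window over the last `size` tuples, emitted per arrival."""
    buf = deque(maxlen=size)
    for item in stream:
        buf.append(item)
        if len(buf) == size:
            yield list(buf)

sensor = iter([3, 9, 4, 12, 8, 15])  # stand-in for a continual sensor feed
pipeline = op_window(
    op_map(op_filter(sensor, lambda v: v > 3),  # drop out-of-range readings
           lambda v: v * 1.8 + 32),             # convert units
    size=2)
for window in pipeline:
    print(window)
```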

Journal ArticleDOI
TL;DR: In this paper, the authors consider the problem of learning model parameters from data distributed across multiple edge nodes, without sending raw data to a centralized place, and propose a control algorithm that determines the best tradeoff between local update and global parameter aggregation to minimize the loss function under a given resource budget.
Abstract: Emerging technologies and applications including Internet of Things, social networking, and crowd-sourcing generate large amounts of data at the network edge. Machine learning models are often built from the collected data to enable the detection, classification, and prediction of future events. Due to bandwidth, storage, and privacy concerns, it is often impractical to send all the data to a centralized location. In this paper, we consider the problem of learning model parameters from data distributed across multiple edge nodes, without sending raw data to a centralized place. Our focus is on a generic class of machine learning models that are trained using gradient-descent-based approaches. We analyze the convergence bound of distributed gradient descent from a theoretical point of view, based on which we propose a control algorithm that determines the best tradeoff between local updates and global parameter aggregation to minimize the loss function under a given resource budget. The performance of the proposed algorithm is evaluated via extensive experiments with real datasets, both on a networked prototype system and in a larger-scale simulated environment. The experimental results show that our proposed approach performs close to the optimum with various machine learning models and different data distributions.

1,441 citations
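
Below is a simplified sketch of the local-update versus global-aggregation tradeoff the paper analyzes, written in plain NumPy. The fixed step count tau stands in for the quantity the authors' control algorithm adapts under a resource budget; this is our illustration, not the authors' code.

```python
# Distributed gradient descent on edge nodes: tau local steps per node,
# then the server averages the local models into a new global model.
import numpy as np

rng = np.random.default_rng(2)
true_w = np.array([1.0, -2.0, 0.5])

# Four "edge nodes", each holding its own local dataset (raw data never leaves).
data = []
for _ in range(4):
    X = rng.standard_normal((50, 3))
    y = X @ true_w + 0.1 * rng.standard_normal(50)
    data.append((X, y))

def local_update(w, X, y, steps, lr=0.01):
    """Run `steps` gradient steps on one node's local squared loss."""
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

w_global = np.zeros(3)
tau = 5  # local steps between global aggregations (the tuned knob)
for _ in range(40):
    # Each node refines the current global model on its own data,
    # then the server averages the results.
    local_models = [local_update(w_global.copy(), X, y, steps=tau) for X, y in data]
    w_global = np.mean(local_models, axis=0)

print("estimate:", np.round(w_global, 2))  # approaches true_w
```

Larger tau saves communication rounds but lets the local models drift apart; smaller tau aggregates more often at higher network cost, which is precisely the tradeoff the paper's control algorithm navigates.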


Network Information
Related Topics (5)
Feature extraction: 111.8K papers, 2.1M citations, 92% related
Cluster analysis: 146.5K papers, 2.9M citations, 88% related
Wireless sensor network: 142K papers, 2.4M citations, 88% related
Artificial neural network: 207K papers, 4.5M citations, 88% related
Deep learning: 79.8K papers, 2.1M citations, 87% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    124
2022    410
2021    2,121
2020    2,034
2019    1,550
2018    2,042