Author

Durga Toshniwal

Other affiliations: Indian Institutes of Technology

Bio: Durga Toshniwal is an academic researcher from Indian Institute of Technology Roorkee. The author has contributed to research on topics including cluster analysis and the naive Bayes classifier. The author has an h-index of 17 and has co-authored 86 publications receiving 1,209 citations. Previous affiliations of Durga Toshniwal include Indian Institutes of Technology.

Papers published on a yearly basis (chart; years shown: 2004-2018 and 2021)

Papers

Journal Article

Hybrid prediction model for Type-2 diabetic patients

B. M. Patil¹, Ramesh C. Joshi¹, Durga Toshniwal¹

Indian Institute of Technology Roorkee¹

01 Dec 2010-Expert Systems With Applications

TL;DR: This study proposes a Hybrid Prediction Model (HPM) that uses the Simple K-means clustering algorithm to validate the chosen class label of the given data and subsequently applies a classification algorithm to the resulting set.

Abstract: A wide range of computational methods and tools for data analysis are available. In this study we took advantage of these technological advances to develop models for predicting Type-2 diabetes in patients. We aim to investigate how diabetes incidence is affected by patients' characteristics and measurements. Efficient predictive modeling is required by medical researchers and practitioners. This study proposes a Hybrid Prediction Model (HPM) that uses the Simple K-means clustering algorithm to validate the chosen class label of the given data (incorrectly classified instances are removed, i.e. a pattern is extracted from the original data) and subsequently applies a classification algorithm to the resulting set. The C4.5 algorithm is used to build the final classifier model, using k-fold cross-validation. The Pima Indians diabetes data was obtained from the University of California at Irvine (UCI) machine learning repository. A wide range of classification methods have previously been applied by various researchers in order to find the best-performing algorithm on this dataset. The accuracies achieved have been in the range of 59.4-84.05%. However, the proposed HPM obtained a classification accuracy of 92.38%. To evaluate the performance of the proposed method, sensitivity and specificity, two performance measures commonly used in medical classification studies, were used.
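The sensitivity and specificity measures mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' evaluation code; the function name and the toy labels below are assumptions made purely for illustration.

```python
def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) for binary labels.

    sensitivity = TP / (TP + FN): share of actual positives that were detected.
    specificity = TN / (TN + FP): share of actual negatives that were cleared.
    """
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    # Guard against classes that do not occur in y_true.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical labels (1 = diabetic, 0 = non-diabetic); not data from the study.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Reporting both measures matters in medical settings because overall accuracy alone can hide a classifier that misses most positive cases.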

160 citations

Journal Article

A data mining framework to analyze road accident data

Sachin Kumar¹, Durga Toshniwal¹

Indian Institute of Technology Roorkee¹

21 Nov 2015-Journal of Big Data

TL;DR: This paper proposes a framework that uses K-modes clustering as a preliminary step to segment 11,574 road accidents on the road network of Dehradun (India) between 2009 and 2014, and finds that combining K-modes clustering with association rule mining is very promising.

Abstract: One of the key objectives in accident data analysis is to identify the main factors associated with road and traffic accidents. However, the heterogeneous nature of road accident data makes the analysis difficult. Data segmentation has been widely used to overcome this heterogeneity. In this paper, we propose a framework that uses the K-modes clustering technique as a preliminary step to segment 11,574 road accidents on the road network of Dehradun (India) between 2009 and 2014 (both included). Next, association rule mining is used to identify the circumstances associated with the occurrence of an accident, both for the entire data set (EDS) and for the clusters identified by the K-modes algorithm. The findings of the cluster-based analysis and the EDS analysis are then compared. The results reveal that the combination of K-modes clustering and association rule mining is very promising, as it produces important information that would remain hidden if no segmentation were performed before generating association rules. Further, a trend analysis was performed for each cluster and for the EDS; it finds different trends in different clusters, whereas the EDS shows a positive trend. The trend analysis also shows that segmenting accident data prior to analysis is very important.
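To make the idea of scoring an association rule on categorical accident records concrete, here is a small sketch that computes support and confidence for one rule. It is not the paper's framework: the attribute names, values, and records are hypothetical, and a full study would use a rule miner such as Apriori over each cluster and over the EDS.

```python
def rule_metrics(records, antecedent, consequent):
    """Compute (support, confidence) of the rule antecedent -> consequent.

    records:    list of dicts mapping attribute name to categorical value
    antecedent: dict of attribute/value pairs on the left-hand side
    consequent: dict of attribute/value pairs on the right-hand side
    """
    def matches(record, condition):
        return all(record.get(k) == v for k, v in condition.items())

    n = len(records)
    n_a = sum(1 for r in records if matches(r, antecedent))
    n_ab = sum(1 for r in records
               if matches(r, antecedent) and matches(r, consequent))
    support = n_ab / n if n else 0.0          # P(antecedent and consequent)
    confidence = n_ab / n_a if n_a else 0.0   # P(consequent | antecedent)
    return support, confidence


# Hypothetical accident records; attribute names and values are illustrative only.
records = [
    {"road": "highway", "vehicle": "two_wheeler", "severity": "injury"},
    {"road": "highway", "vehicle": "two_wheeler", "severity": "injury"},
    {"road": "highway", "vehicle": "car", "severity": "minor"},
    {"road": "local", "vehicle": "two_wheeler", "severity": "minor"},
]
support, confidence = rule_metrics(
    records, {"road": "highway"}, {"severity": "injury"}
)
print(f"support={support:.2f}, confidence={confidence:.2f}")
```

Running the same function on the records of a single cluster rather than on all records shows how a rule that is weak overall can be strong within one segment.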

118 citations

Journal Article

A data mining approach to characterize road accident locations

Sachin Kumar¹, Durga Toshniwal¹

Indian Institute of Technology Roorkee¹

11 Feb 2016-Journal of Modern Transportation

TL;DR: This paper applied the k-means algorithm to group accident locations into three categories (high-frequency, moderate-frequency and low-frequency accident locations) and used association rule mining to characterize these locations.

Abstract: Data mining has been proven to be a reliable technique for analyzing road accidents and providing productive results. Most road accident data analyses use data mining techniques and focus on identifying factors that affect the severity of an accident. However, any damage resulting from road accidents is always unacceptable in terms of health, property damage and other economic factors. Sometimes, road accidents are found to occur more frequently at certain specific locations. Analyzing these locations can help identify the features that make road accidents occur frequently there. Association rule mining is one of the popular data mining techniques that identifies correlations among the attributes of road accidents. In this paper, we first applied the k-means algorithm to group the accident locations into three categories: high-frequency, moderate-frequency and low-frequency accident locations. The k-means algorithm takes the accident frequency count as a parameter to cluster the locations. We then used association rule mining to characterize these locations. The rules revealed different factors associated with road accidents at locations with varying accident frequencies. The association rules for high-frequency accident locations disclosed that intersections on highways are more dangerous for every type of accident. High-frequency accident locations mostly involved two-wheeler accidents in hilly regions. In moderate-frequency accident locations, colonies near local roads and intersections on highway roads were found to be dangerous for pedestrian hit accidents. Low-frequency accident locations are scattered throughout the district, and most of the accidents at these locations were not critical. Although the data set was limited to some selected attributes, our approach extracted useful hidden information from the data that can be used to take preventive measures at these locations.
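The grouping of locations by accident frequency count can be sketched with a one-dimensional k-means (Lloyd's algorithm). This is an illustrative sketch, not the authors' implementation: the deterministic initialisation, the function names, and the counts are assumptions.

```python
def kmeans_1d(values, k=3, max_iters=100):
    """Cluster scalar values with Lloyd's algorithm; returns (labels, centers).

    Assumes k >= 2 and len(values) >= k. Initial centers are spread evenly
    across the sorted values so the result is deterministic.
    """
    ordered = sorted(values)
    centers = [ordered[(len(ordered) - 1) * i // (k - 1)] for i in range(k)]

    def assign(ctrs):
        # Each value goes to its nearest center.
        return [min(range(k), key=lambda j: abs(v - ctrs[j])) for v in values]

    for _ in range(max_iters):
        labels = assign(centers)
        new_centers = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            # Keep the old center if a cluster ends up empty.
            new_centers.append(sum(members) / len(members) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return assign(centers), centers


def frequency_categories(counts):
    """Map each accident count to 'low', 'moderate', or 'high' frequency."""
    labels, centers = kmeans_1d(counts, k=3)
    order = sorted(range(3), key=lambda j: centers[j])  # rank clusters by center
    names = {cluster: name for cluster, name in zip(order, ["low", "moderate", "high"])}
    return [names[lab] for lab in labels]


# Hypothetical accident counts per location; not data from the paper.
counts = [1, 2, 2, 3, 10, 12, 11, 30, 35, 32]
print(frequency_categories(counts))
```

Naming the clusters by the rank of their centers avoids relying on the arbitrary cluster indices that k-means produces.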

117 citations

Proceedings Article

Association Rule for Classification of Type-2 Diabetic Patients

B. M. Patil¹, Ramesh C. Joshi¹, Durga Toshniwal¹

Indian Institute of Technology Roorkee¹

09 Feb 2010

TL;DR: A new approach to generating association rules on numeric data and a modified equal-width binning interval approach to discretizing continuous-valued attributes are introduced to help doctors explore their data and better understand the discovered rules.

Abstract: The discovery of knowledge from medical databases is important for making effective medical diagnoses. The aim of data mining is to extract information from a database and generate clear and understandable descriptions of patterns. In this study we introduce a new approach to generating association rules on numeric data. We propose a modified equal-width binning interval approach to discretizing continuous-valued attributes. The approximate width of the desired intervals is chosen based on the opinion of a medical expert and is provided as an input parameter to the model. First, we converted numeric attributes into categorical form using the above technique. The Apriori algorithm, which is usually used for market basket analysis, was then used to generate rules on the Pima Indian diabetes data. The data set was taken from the UCI machine learning repository and contains 768 instances and 8 numeric attributes. We find that the often neglected pre-processing steps in knowledge discovery are the most critical elements in determining the success of a data mining application. Lastly, we generated association rules that are useful for identifying general associations in the data and for understanding the relationship between the measured fields and whether the patient goes on to develop diabetes. We present a step-by-step approach to help doctors explore their data and better understand the discovered rules.
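As a rough illustration of discretizing a numeric attribute with an expert-supplied interval width, the sketch below implements plain equal-width binning. The abstract does not specify the authors' modification, so this is the standard technique rather than their variant; the function name and the sample ages are assumptions.

```python
def discretize_equal_width(values, width):
    """Convert numeric values into categorical interval labels.

    Bins start at min(values) and each spans `width` units, so a value v
    falls into the half-open interval [start, start + width).
    """
    if width <= 0:
        raise ValueError("width must be positive")
    low = min(values)
    labels = []
    for v in values:
        index = int((v - low) // width)  # which bin the value belongs to
        start = low + index * width
        labels.append(f"[{start}, {start + width})")
    return labels


# Hypothetical patient ages with an assumed expert-chosen width of 10 years.
ages = [21, 25, 33, 47, 52, 60]
print(discretize_equal_width(ages, 10))
```

The resulting interval labels can be treated as categorical items, which is the form a rule miner such as Apriori expects.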

81 citations

Proceedings Article

Analysing road accident data using association rule mining

Sachin Kumar¹, Durga Toshniwal¹

Indian Institute of Technology Roorkee¹

01 Dec 2015

TL;DR: Data mining techniques are used to analyze the data provided by EMRI: the accident data is first clustered, and association rule mining is then applied to identify the circumstances in which an accident may occur for each cluster.

Abstract: Road accidents are a crucial area of research in India. A variety of research has been done on data collected through police records covering a limited portion of highways. The analysis of such data can reveal information about that portion only, but accidents occur not only on highways but also on local roads. A different source of road accident data in India is the Emergency Management Research Institute (EMRI), which serves and keeps track of every accident record on every type of road and covers road accident information for the entire state. In this paper, we use data mining techniques to analyze the data provided by EMRI: we first cluster the accident data and then apply association rule mining to identify the circumstances in which an accident may occur for each cluster. The results can be used to direct accident prevention efforts to the areas identified for different categories of accidents, in order to reduce the number of accidents.

54 citations

Cited by

Data Mining - Concepts and Techniques.

Petra Perner

01 Jan 2002

9,314 citations

Journal Article

Learning from class-imbalanced data

Guo Haixiang¹, Li Yijing¹, Jennifer Shang², Gu Mingyun¹, Huang Yuanyue¹, Gong Bing³

China University of Geosciences (Wuhan)¹, University of Pittsburgh², Technical University of Madrid³

01 May 2017-Expert Systems With Applications

TL;DR: An in-depth review of rare event detection from an imbalanced learning perspective and a comprehensive taxonomy of the existing application domains of imbalanced learning are provided.

Abstract: 527 articles related to imbalanced data and rare events are reviewed. The reviewed papers are viewed from both technical and practical perspectives. Existing methods and corresponding statistics are summarized using a new taxonomy. 162 application papers are categorized into 13 domains and introduced. Some open questions are discussed at the end of the manuscript. Rare events, especially those that could potentially negatively impact society, often require human decision-making responses. Detecting rare events can be viewed as a prediction task in the data mining and machine learning communities. As these events are rarely observed in daily life, the prediction task suffers from a lack of balanced data. In this paper, we provide an in-depth review of rare event detection from an imbalanced learning perspective. Five hundred and seventeen related papers that have been published in the past decade were collected for the study. The initial statistics suggested that rare event detection and imbalanced learning are of concern across a wide range of research areas, from management science to engineering. We reviewed all collected papers from both a technical and a practical point of view. Modeling methods discussed include techniques such as data preprocessing, classification algorithms and model evaluation. For applications, we first provide a comprehensive taxonomy of the existing application domains of imbalanced learning, and then we detail the applications for each category. Finally, some suggestions from the reviewed papers are combined with our experience and judgment to offer further research directions for the imbalanced learning and rare event detection fields.

1,448 citations

Intercomparison, interpretation, and assessment of spring phenology in North America estimated from remote sensing for 1982-2006

Michael A. White, Kirsten M. de Beurs, Kamel Didan, David W. Inouye, ..., Allard De Wit

01 Jan 2009

TL;DR: In this paper, the authors assess 10 start-of-spring (SOS) methods for North America between 1982 and 2006 and find that SOS estimates were more related to the first leaf and first flowers expanding phenological stages.

Abstract: Shifts in the timing of spring phenology are a central feature of global change research. Long-term observations of plant phenology have been used to track vegetation responses to climate variability but are often limited to particular species and locations and may not represent synoptic patterns. Satellite remote sensing is instead used for continental to global monitoring. Although numerous methods exist to extract phenological timing, in particular start-of-spring (SOS), from time series of reflectance data, a comprehensive intercomparison and interpretation of SOS methods has not been conducted. Here, we assess 10 SOS methods for North America between 1982 and 2006. The techniques include consistent inputs from the 8 km Global Inventory Modeling and Mapping Studies Advanced Very High Resolution Radiometer NDVIg dataset, independent data for snow cover, soil thaw, lake ice dynamics, spring streamflow timing, over 16,000 individual measurements of ground-based phenology, and two temperature-driven models of spring phenology. Compared with an ensemble of the 10 SOS methods, we found that individual methods differed in average day-of-year estimates by ±60 days and in standard deviation by ±20 days. The ability of the satellite methods to retrieve SOS estimates was highest in northern latitudes and lowest in arid, tropical, and Mediterranean ecoregions. The ordinal rank of SOS methods varied geographically, as did the relationships between SOS estimates and the cryospheric/hydrologic metrics. Compared with ground observations, SOS estimates were more related to the first leaf and first flowers expanding phenological stages. We found no evidence for time trends in spring arrival from ground- or model-based data; using an ensemble estimate from two methods that were more closely related to ground observations than other methods, SOS

828 citations

Journal Article

Machine Learning for Internet of Things Data Analysis: A Survey

Mohammad Saeid Mahdavinejad¹,², Mohammadreza Rezvan¹,², Mohammadamin Barekatain³, Peyman Adibi², Payam Barnaghi⁴, Amit P. Sheth¹

Wright State University¹, University of Isfahan², Technische Universität München³, University of Surrey⁴

12 Oct 2017-Digital Communications and Networks

TL;DR: This article assesses the different machine learning methods that deal with the challenges in IoT data by considering smart cities as the main use case and presents a taxonomy of machine learning algorithms explaining how different techniques are applied to the data in order to extract higher level information.

690 citations

Journal Article

Comparing different supervised machine learning algorithms for disease prediction

Shahadat Uddin¹, Arif Khan¹, Ekramul Hossain¹, Mohammad Ali Moni¹

University of Sydney¹

21 Dec 2019-BMC Medical Informatics and Decision Making

TL;DR: The Support Vector Machine (SVM) algorithm is found to be applied most frequently (in 29 studies), followed by the Naïve Bayes algorithm (in 23 studies); however, the Random Forest algorithm showed comparatively superior accuracy.

Abstract: Supervised machine learning algorithms have been a dominant method in the data mining field. Disease prediction using health data has recently emerged as a potential application area for these methods. This study aims to identify the key trends among different types of supervised machine learning algorithms, and their performance and usage for disease risk prediction. In this study, extensive research efforts were made to identify studies that applied more than one supervised machine learning algorithm to the prediction of a single disease. Two databases (i.e., Scopus and PubMed) were searched for different types of search items. We thus selected 48 articles in total for the comparison among variant supervised machine learning algorithms for disease prediction. We found that the Support Vector Machine (SVM) algorithm is applied most frequently (in 29 studies), followed by the Naive Bayes algorithm (in 23 studies). However, the Random Forest (RF) algorithm showed comparatively superior accuracy. Of the 17 studies where it was applied, RF showed the highest accuracy in 9 of them, i.e., 53%. This was followed by SVM, which topped 41% of the studies in which it was considered. This study provides a wide overview of the relative performance of different variants of supervised machine learning algorithms for disease prediction. This information on relative performance can help researchers select an appropriate supervised machine learning algorithm for their studies.

580 citations
