# Applicability of Data Mining Techniques for Predicting Electrical Resistivity of Soils Based on Thermal Resistivity

VIT University

01 Oct 2013 - International Journal of Geomechanics (American Society of Civil Engineers) - Vol. 13, Iss. 5, pp. 692-697

TL;DR: In this article, two data mining techniques, support vector machine (SVM) and least squares SVM (LSSVM), were used for prediction of soil electrical resistivity based on soil pr...

Abstract: This article adopts two data mining techniques, support vector machine (SVM) and least-squares support vector machine (LSSVM), for prediction of soil electrical resistivity based on soil pr...

##### Citations


••

TL;DR: An apples-to-apples comparison of the literature of nine stakeholder-centric engineering domains reveals that many problem types are shared across different domains and are approached differently in those domains, e.g., transportation problems have similar characteristics to critical care, food science, robotics, and civil engineering.

Abstract: Multimodal data fusion (MMDF) is the process of combining disparate data streams (of different dimensionality, resolution, type, etc.) to generate information in a form that is more understandable or usable. Despite the explosion of data availability in recent decades, as yet there is no well-developed theoretical basis for multimodal data fusion, i.e., no way to determine a priori which approach is best suited to combine an arbitrary set of available data to achieve a stated goal for a given application. This has resulted in exploration of a wide variety of approaches across numerous domains but as yet very little integration of conclusions at a meta (cross-disciplinary) level. In response, this manuscript poses the following questions: (1) How convergent (or divergent) are approaches within single disciplines? (2) How similar are the challenges posed across different disciplines, i.e., might there be opportunity for successes in MMDF achieved in one field to inform progress in other areas as well? and (3) Where are the outstanding gaps in MMDF research, and what does this imply as targets for high impact research in the coming years? To begin to answer these questions, an apples-to-apples comparison of the literature of nine stakeholder-centric engineering domains (civil engineering, transportation, energy, environmental engineering, food engineering, critical care (healthcare), neuroscience, manufacturing/automation, and robotics) was created by quantifying the numbers and dimensionalities of modalities and sensors in each published project and classifying the algorithms used and purposes for which they are used. Within disciplines, it is shown there is often a tendency for use of similar methodologies, both in choice of level of fusion and data algorithm class. 
Yet this analysis also reveals that many problem types (defined by data dimensionality, modality number and type, and fusion purpose) are shared across domains but approached differently in each, e.g., transportation problems have similar characteristics to those in critical care, food science, robotics, and civil engineering. Of the disciplines studied, most (>75%) share problem characteristics with 3–5 others; to support leveraging these resources, lookup tables indexed by data dimensions, number of modalities, etc. are provided as a starting point for cross-disciplinary MMDF literature searches for new applications. Critical gaps identified are (1) a drop-off in the number of published studies as the number of distinct modalities increases and (2) a dearth of publications tackling challenges with high-dimensionality inputs, especially time-series 2D and 3D data. These gaps may point to topics where algorithm development will be fruitful in enabling future solutions as video and other high-dimensionality sensors decrease in price. Finally, the lack of a shared vocabulary across disciplines makes analyses like the one conducted here challenging, as does the often implicit incorporation of expert knowledge into design; progress toward better use of the current state of knowledge and toward a theoretical MMDF framework therefore depends critically on improved cross-disciplinary communication and coordination on this topic.

20 citations

••

TL;DR: In this paper, the authors developed a labour-saving activity sequence classification model for earthwork construction simulation that flexibly modifies simulation activities according to weather conditions. Three heterogeneous semi-supervised classifiers with complementary characteristics were ensembled to reduce the workload of manual labelling and to improve the generalization ability of activity sequence classification based on weather data.

Abstract: Construction activity sequence changes are among the most crucial considerations in establishing construction simulation models. However, conventional simulation models are designed with a fixed sequence of simulation activities, which is unable to update automatically. Existing machine learning methods require abundant time for manual labelling and are unsuitable for construction data with high‐dimensional and heterogeneous characteristics. The motivation for this work is to develop a labour‐saving activity sequence classification model for earthwork construction simulation which realizes flexibly modifying simulation activities according to different weather conditions. Three heterogeneous semi‐supervised classifiers with complementary characteristics were ensembled to reduce the workload of manual labelling and enhance the generalization ability of activity sequence classification based on weather data. Furthermore, Dempster–Shafer‐based evidence reasoning improved by a security filtering mechanism was adopted to enhance the accuracy of semi‐supervised classification. The proposed enhanced ensemble semi‐supervised activity sequence classification model was embedded in an earthwork construction simulation model, which was evaluated in a case study of rockfill dam construction. The proposed classifier outperformed four common semi‐supervised methods and two common supervised methods in terms of accuracy and generalization ability. Additionally, the proposed simulation method outperformed conventional simulation methods in terms of the construction schedule, construction intensity and consistency of the simulated activity sequence with the true values by 65.55%, 28.47% and 88.15%, respectively.

1 citation

••

TL;DR: In this paper, three artificial intelligence models, namely group method of data handling (GMDH), multi expression programming (MEP), and random forest (RF), are proposed to predict soil thermal conductivity.

##### References


••

Bell Labs

TL;DR: High generalization ability of support-vector networks utilizing polynomial input transformations is demonstrated, and the performance of the support-vector network is compared to various classical learning algorithms that all took part in a benchmark study of Optical Character Recognition.

Abstract: The support-vector network is a new learning machine for two-group classification problems. The machine conceptually implements the following idea: input vectors are non-linearly mapped to a very high-dimension feature space. In this feature space a linear decision surface is constructed. Special properties of the decision surface ensure high generalization ability of the learning machine. The idea behind the support-vector network was previously implemented for the restricted case where the training data can be separated without errors. We here extend this result to non-separable training data. High generalization ability of support-vector networks utilizing polynomial input transformations is demonstrated. We also compare the performance of the support-vector network to various classical learning algorithms that all took part in a benchmark study of Optical Character Recognition.
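For orientation, the extension to non-separable training data described in this abstract is commonly written as the soft-margin problem below. This is the now-standard textbook form, not a formula quoted from this page; the symbols $w$, $b$, $\xi_i$, $C$, and the feature map $\varphi$ are notation assumed here for illustration.

$$
\min_{w,\,b,\,\xi}\ \frac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{n}\xi_i
\quad\text{subject to}\quad
y_i\bigl(w^{\top}\varphi(x_i) + b\bigr) \ge 1 - \xi_i,\qquad \xi_i \ge 0,\quad i = 1,\dots,n.
$$

Here $\varphi$ is the non-linear map into the high-dimensional feature space, $y_i \in \{-1, +1\}$ is the class label, and $C > 0$ trades margin width against training errors. Forcing every slack variable $\xi_i$ to zero recovers the separable case mentioned in the abstract.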

37,861 citations

### "Applicability of Data Mining Techni..." refers background or methods in this paper

...More details are found in many publications (Boser et al. 1992; Cortes and Vapnik 1995; Gualtieri et al. 1999; Vapnik 1998)....

[...]

...The SVM algorithm developed by Vapnik (Cortes and Vapnik 1995) is based on statistical learning theory....

[...]

01 Jan 1998

TL;DR: Presenting a method for determining the necessary and sufficient conditions for consistency of learning process, the author covers function estimates from small data pools, applying these estimations to real-life problems, and much more.

Abstract: A comprehensive look at learning and generalization theory. The statistical theory of learning and generalization concerns the problem of choosing desired functions on the basis of empirical data. Highly applicable to a variety of computer science and robotics fields, this book offers lucid coverage of the theory as a whole. Presenting a method for determining the necessary and sufficient conditions for consistency of learning process, the author covers function estimates from small data pools, applying these estimations to real-life problems, and much more.

26,531 citations

••

01 Jul 1992

TL;DR: A training algorithm that maximizes the margin between the training patterns and the decision boundary is presented, applicable to a wide variety of the classification functions, including Perceptrons, polynomials, and Radial Basis Functions.

Abstract: A training algorithm that maximizes the margin between the training patterns and the decision boundary is presented. The technique is applicable to a wide variety of the classification functions, including Perceptrons, polynomials, and Radial Basis Functions. The effective number of parameters is adjusted automatically to match the complexity of the problem. The solution is expressed as a linear combination of supporting patterns. These are the subset of training patterns that are closest to the decision boundary. Bounds on the generalization performance based on the leave-one-out method and the VC-dimension are given. Experimental results on optical character recognition problems demonstrate the good generalization obtained when compared with other learning algorithms.

11,211 citations

••

TL;DR: A least squares version of support vector machine (SVM) classifiers whose solution follows from solving a set of linear equations, instead of the quadratic programming required for classical SVMs.

Abstract: In this letter we discuss a least squares version for support vector machine (SVM) classifiers. Due to equality type constraints in the formulation, the solution follows from solving a set of linear equations, instead of quadratic programming for classical SVMs. The approach is illustrated on a two-spiral benchmark classification problem.
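The point that training reduces to a set of linear equations can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the RBF kernel, the values of `gamma` and `sigma`, the toy data, and all function names are hypothetical choices made here, and the linear system is the commonly cited LSSVM classifier form.

```python
import math


def rbf_kernel(a, b, sigma=1.0):
    """RBF kernel between two points (kernel choice is an assumption)."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def solve_linear(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [r] for row, r in zip(A, rhs)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def lssvm_train(X, y, gamma=10.0, sigma=1.0):
    """Build and solve the (n+1) x (n+1) system
        [ 0   y^T           ] [ b     ]   [ 0 ]
        [ y   Omega + I/gamma ] [ alpha ] = [ 1 ]
    with Omega[i][j] = y_i * y_j * K(x_i, x_j). No quadratic program is needed.
    """
    n = len(X)
    A = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        A[0][i + 1] = y[i]
        A[i + 1][0] = y[i]
        for j in range(n):
            A[i + 1][j + 1] = y[i] * y[j] * rbf_kernel(X[i], X[j], sigma)
        A[i + 1][i + 1] += 1.0 / gamma  # regularization on the diagonal
    solution = solve_linear(A, [0.0] + [1.0] * n)
    return solution[0], solution[1:]  # bias b, multipliers alpha


def lssvm_predict(x, X, y, b, alpha, sigma=1.0):
    """Classify x by the sign of sum_i alpha_i * y_i * K(x, x_i) + b."""
    score = b + sum(a * yi * rbf_kernel(x, xi, sigma)
                    for a, yi, xi in zip(alpha, y, X))
    return 1 if score >= 0 else -1


# Hypothetical toy data: two well-separated clusters in the plane.
X_toy = [[0, 0], [0, 1], [1, 0], [3, 3], [3, 4], [4, 3]]
y_toy = [-1, -1, -1, 1, 1, 1]
b, alpha = lssvm_train(X_toy, y_toy)
print(lssvm_predict([0.5, 0.5], X_toy, y_toy, b, alpha))
print(lssvm_predict([3.5, 3.5], X_toy, y_toy, b, alpha))
```

Because the constraints are equalities, every training point generally receives a non-zero multiplier, so the sparsity of a classical SVM solution is given up in exchange for the simpler solve.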

8,811 citations

•

03 Dec 1996

TL;DR: This work compares support vector regression (SVR) with a committee regression technique (bagging) based on regression trees and ridge regression done in feature space and expects that SVR will have advantages in high dimensionality space because SVR optimization does not depend on the dimensionality of the input space.

Abstract: A new regression technique based on Vapnik's concept of support vectors is introduced. We compare support vector regression (SVR) with a committee regression technique (bagging) based on regression trees and ridge regression done in feature space. On the basis of these experiments, it is expected that SVR will have advantages in high dimensionality space because SVR optimization does not depend on the dimensionality of the input space.
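One way to read the remark about input dimensionality is that a kernel method's optimization works with the n-by-n matrix of pairwise kernel values, whose size is set by the number of training samples rather than the number of input features. The sketch below illustrates only that point; the RBF kernel, the random data, and the helper names are assumptions made here and are not taken from the paper.

```python
import math
import random


def rbf_kernel(a, b, sigma=1.0):
    """RBF kernel between two points (kernel choice is an assumption)."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def gram_matrix(X, kernel=rbf_kernel):
    """Pairwise kernel values: always len(X) x len(X)."""
    return [[kernel(xi, xj) for xj in X] for xi in X]


def random_samples(n, dim, seed=0):
    """Hypothetical data: n Gaussian points with `dim` features each."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]


# Same number of samples, very different input dimensionality.
low_dim = gram_matrix(random_samples(n=5, dim=2))
high_dim = gram_matrix(random_samples(n=5, dim=1000))

gram_shapes = [(len(G), len(G[0])) for G in (low_dim, high_dim)]
print(gram_shapes)
```

Evaluating each kernel entry still costs time proportional to the input dimension; what stays fixed is the number of unknowns the optimizer has to handle.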

4,009 citations

### "Applicability of Data Mining Techni..." refers background in this paper

...It has given excellent performance on many regression and time series prediction problems (Muller et al. 1997; Drucker et al. 1997)....

[...]