
Showing papers in "Scientific Programming in 2019"


Journal ArticleDOI
TL;DR: The empirical results indicate that the proposed heterogeneous consensus clustering-based undersampling scheme yields better predictive performance.
Abstract: Class imbalance is an important problem encountered in machine learning applications, where one class (the minority class) has an extremely small number of instances and the other class (the majority class) has an immense quantity of instances. Imbalanced datasets arise in several real-world applications, including medical diagnosis, malware detection, anomaly identification, bankruptcy prediction, and spam filtering. In this paper, we present a consensus clustering-based undersampling approach to imbalanced learning. In this scheme, the number of instances in the majority class was undersampled by utilizing a consensus clustering-based scheme. In the empirical analysis, 44 small-scale and 2 large-scale imbalanced classification benchmarks have been utilized. In the consensus clustering schemes, five clustering algorithms (namely, k-means, k-modes, k-means++, self-organizing maps, and the DIANA algorithm) and their combinations were taken into consideration. In the classification phase, five supervised learning methods (namely, naive Bayes, logistic regression, support vector machines, random forests, and the k-nearest neighbor algorithm) and three ensemble learner methods (namely, AdaBoost, bagging, and the random subspace algorithm) were utilized. The empirical results indicate that the proposed heterogeneous consensus clustering-based undersampling scheme yields better predictive performance.
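The abstract does not include an implementation; as a rough illustration of the undersampling idea, the following Python sketch balances a toy dataset by clustering the majority class with a single k-means run (the paper's consensus of five clustering algorithms is not reproduced, and the function name and parameters are hypothetical).

```python
# Hypothetical sketch of clustering-based undersampling (not the paper's exact
# consensus scheme): cluster the majority class and keep one representative
# per cluster so the two classes end up balanced.
import numpy as np
from sklearn.cluster import KMeans

def cluster_undersample(X_majority, n_keep, random_state=0):
    """Reduce the majority class to n_keep instances, one per k-means cluster."""
    km = KMeans(n_clusters=n_keep, n_init=10, random_state=random_state)
    labels = km.fit_predict(X_majority)
    kept = []
    for c in range(n_keep):
        members = np.where(labels == c)[0]
        # keep the member closest to the cluster centre as the representative
        dists = np.linalg.norm(X_majority[members] - km.cluster_centers_[c], axis=1)
        kept.append(members[np.argmin(dists)])
    return X_majority[kept]

# usage: balance a toy imbalanced set
rng = np.random.default_rng(0)
X_maj = rng.normal(size=(500, 4))   # majority class
X_min = rng.normal(size=(50, 4))    # minority class
X_maj_reduced = cluster_undersample(X_maj, n_keep=len(X_min))
print(X_maj_reduced.shape)          # (50, 4)
```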

80 citations


Journal ArticleDOI
TL;DR: This paper proposes a framework called defect prediction via attention-based recurrent neural network (DP-ARNN), which first parses abstract syntax trees of programs and extracts them as vectors and employs the attention mechanism to further generate significant features for accurate defect prediction.
Abstract: In order to improve software reliability, software defect prediction is applied during software maintenance to identify potential bugs. Traditional methods of software defect prediction mainly focus on designing static code metrics, which are fed into machine learning classifiers to predict defect probabilities of the code. However, these handcrafted metrics do not capture the syntactic structures and semantic information of programs. Such information is more significant than manual metrics and can yield a more accurate predictive model. In this paper, we propose a framework called defect prediction via attention-based recurrent neural network (DP-ARNN). More specifically, DP-ARNN first parses the abstract syntax trees (ASTs) of programs and extracts them as vectors. Then it encodes the vectors, which serve as inputs of DP-ARNN, by dictionary mapping and word embedding. After that, it can automatically learn syntactic and semantic features. Furthermore, it employs the attention mechanism to generate significant features for accurate defect prediction. To validate our method, we chose seven open-source Java projects in Apache, using the F1-measure and the area under the curve (AUC) as evaluation criteria. The experimental results show that, on average, DP-ARNN improves the F1-measure by 14% and the AUC by 7% compared with state-of-the-art methods.
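As a hedged illustration of the attention-over-recurrent-states idea described above (not the authors' exact DP-ARNN architecture), a minimal PyTorch sketch operating on integer-encoded AST token sequences might look like this; all layer sizes and the toy inputs are arbitrary assumptions.

```python
# Hypothetical PyTorch sketch of an attention-based recurrent classifier over
# integer-encoded AST token sequences (the paper's exact architecture may differ).
import torch
import torch.nn as nn

class AttentionRNN(nn.Module):
    def __init__(self, vocab_size, embed_dim=64, hidden_dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden_dim, 1)   # scores each time step
        self.out = nn.Linear(2 * hidden_dim, 1)    # defect logit per program

    def forward(self, tokens):                     # tokens: (batch, seq_len) int64
        h, _ = self.rnn(self.embed(tokens))        # (batch, seq_len, 2*hidden)
        weights = torch.softmax(self.attn(h), dim=1)   # attention over time steps
        context = (weights * h).sum(dim=1)         # weighted sum of hidden states
        return self.out(context).squeeze(-1)

model = AttentionRNN(vocab_size=5000)
logits = model(torch.randint(1, 5000, (8, 120)))   # 8 programs, 120 tokens each
loss = nn.BCEWithLogitsLoss()(logits, torch.zeros(8))
print(logits.shape, loss.item())
```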

59 citations


Journal ArticleDOI
TL;DR: A novel CF recommendation approach in which opinion-based sentiment analysis is used to build a hotel feature matrix by polarity identification; the developed system not only handles heterogeneous data using the big data Hadoop platform but also recommends hotel class based on guest type using fuzzy rules.
Abstract: In recent times, selecting a suitable hotel location and reserving accommodation have become critical issues for travelers. Online hotel search has grown at a very fast pace and has become very time-consuming due to the huge amount of online information. Recommender systems (RSs) are gaining importance due to their significance in making decisions and providing detailed information about the required product or service. Acquiring hotel recommendations while dealing with textual hotel reviews, numerical ranks, votes, ratings, and numbers of video views has become difficult. To generate true recommendations, we have proposed an intelligent approach which also deals with large-sized heterogeneous data to fulfill the needs of potential customers. The collaborative filtering (CF) approach is one of the most popular RS techniques for generating recommendations. We have proposed a novel CF recommendation approach in which opinion-based sentiment analysis is used to build a hotel feature matrix by polarity identification. Our approach combines lexical analysis, syntax analysis, and semantic analysis to understand sentiment towards hotel features and to profile the guest type (solo, family, couple, etc.). The proposed system recommends hotels based on the hotel features and guest type for personalized recommendation. The developed system not only has the ability to handle heterogeneous data using the big data Hadoop platform but also recommends hotel class based on guest type using fuzzy rules. Different experiments are performed over real-world datasets obtained from two hotel websites. Moreover, precision, recall, and F-measure have been calculated, and the results are discussed in terms of improved accuracy and response time, significantly better than the traditional approaches.

46 citations


Journal ArticleDOI
TL;DR: This study aims to identify data mining classification algorithms and use them to predict default risks, avoid possible payment difficulties, and reduce potential problems in extending credit.
Abstract: Big data and its analysis have become widespread practice in recent times, applicable to multiple industries. Data mining is a technique based on statistical applications that extracts previously undetermined data items from large quantities of data. The banking and insurance industries use data mining analysis to detect fraud, offer the appropriate credit or insurance solutions to customers, and better understand customer demands. This study aims to identify data mining classification algorithms and use them to predict default risks, avoid possible payment difficulties, and reduce potential problems in extending credit. The data for this study, which contain demographic and socioeconomic characteristics of individuals, were obtained from the Turkish Statistical Institute 2015 survey. Six classification algorithms (naive Bayes, Bayesian networks, J48, random forest, multilayer perceptron, and logistic regression) were applied to the dataset using WEKA 3.9 data mining software. These algorithms were compared on the basis of root mean squared error, receiver operating characteristic area, accuracy, precision, F-measure, and recall. The best algorithm, logistic regression, was then applied to the real dataset to determine the attributes causing default risk by using odds ratios. The socioeconomic and demographic characteristics of the individuals were examined, and based on the odds ratio values, conclusions were reached about which individuals and characteristics were more likely to default. These results are not only beneficial to the literature but also have a significant influence in the financial industry in terms of the ability to predict customers' default risk.
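The final step described above, reading odds ratios from a fitted logistic regression, can be sketched as follows with scikit-learn; the data and feature names here are synthetic stand-ins, not the Turkish Statistical Institute survey.

```python
# Hypothetical sketch: fit a logistic regression on socio-demographic features
# and read default-risk odds ratios from the exponentiated coefficients.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 3))                    # e.g. age, income, household size
y = (X[:, 1] < -0.5).astype(int)                  # toy "default" label

clf = LogisticRegression(max_iter=1000).fit(X, y)
odds_ratios = np.exp(clf.coef_[0])
for name, ratio in zip(["age", "income", "household_size"], odds_ratios):
    print(f"{name}: odds ratio = {ratio:.2f}")    # >1 raises, <1 lowers default odds
```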

45 citations


Journal ArticleDOI
TL;DR: The experimental results show that the proposed ensemble meta-based tree model (EMT) achieves a high accuracy of 98.5%, a superior result compared with the other techniques.
Abstract: In recent decades, predicting student performance in the academic field has attracted the attention of researchers seeking to address weaknesses and provide support for future students. To facilitate this task, educational data mining (EDM) techniques are utilized to construct prediction models built from students' academic historical records. These models present the embedded knowledge in a form that is more readable and interpretable by humans. Hence, the contributions of this paper are threefold: (i) providing a thorough analysis of the selected features and their effects on the performance value using statistical analysis techniques, (ii) building and studying the performance of several classifiers from different families of machine learning (ML) techniques, and (iii) proposing an ensemble meta-based tree model (EMT) classifier technique for predicting student performance. The experimental results show that the EMT ensemble technique achieves a high accuracy of 98.5% (or 0.985), a superior result compared with the other techniques.

36 citations


Journal ArticleDOI
TL;DR: The paper presents state of the art of energy-aware high-performance computing (HPC), in particular identification and classification of approaches by system and device types, optimization metrics, and energy/power control methods.
Abstract: The paper presents the state of the art of energy-aware high-performance computing (HPC), in particular the identification and classification of approaches by system and device types, optimization metrics, and energy/power control methods. System types include single devices, clusters, grids, and clouds, while the considered device types include CPUs, GPUs, multiprocessors, and hybrid systems. Optimization goals include various combinations of metrics such as execution time, energy consumption, and temperature, with consideration of imposed power limits. Control methods include scheduling, DVFS/DFS/DCT, power capping with programmatic APIs such as Intel RAPL and NVIDIA NVML, as well as application optimizations and hybrid methods. We discuss tools and APIs for energy/power management as well as tools and environments for prediction and/or simulation of energy/power consumption in modern HPC systems. Finally, programming examples, i.e., applications and benchmarks used in particular works, are discussed. Based on our review, we identify a set of open areas and important up-to-date problems concerning methods and tools for modern HPC systems allowing energy-aware processing.
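As a small, hedged example of the programmatic power-monitoring APIs mentioned above, the following sketch reads GPU power draw through the NVML Python bindings (pynvml); it assumes an NVIDIA GPU and the pynvml package, and it is not taken from any of the surveyed works.

```python
# Illustrative sketch of NVML-based power monitoring, assuming the pynvml
# bindings and at least one NVIDIA GPU are available on the machine.
import time
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)          # first GPU
try:
    for _ in range(5):
        power_w = pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0  # mW -> W
        print(f"GPU power draw: {power_w:.1f} W")
        time.sleep(1)
    # Power capping (requires privileges) would use
    # pynvml.nvmlDeviceSetPowerManagementLimit(handle, limit_in_mW).
finally:
    pynvml.nvmlShutdown()
```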

35 citations


Journal ArticleDOI
TL;DR: The experimental results show that certain subfeatures are more important than others and are vital for enhancing the performance of classification models, and that feature selection can find promisingly optimal features for improving those models.
Abstract: The speech entailed in the human voice comprises essentially paralinguistic information used in many voice-recognition applications. Gender is considered one of the pivotal attributes to be detected from a given voice, a task that involves certain complications. In order to distinguish gender from a voice signal, a set of techniques has been employed to determine relevant features to be utilized for building a model from a training set. This model is useful for determining the gender (i.e., male or female) from a voice signal. The contributions are threefold: (i) providing analysis information about well-known voice signal features using a prominent dataset, (ii) studying various machine learning models of different theoretical families to classify voice gender, and (iii) using three prominent feature selection algorithms to find promisingly optimal features for improving classification models. The experimental results show the importance of some subfeatures over others, which is vital for enhancing classification performance. Experimentation reveals that the best recall value reaches 99.97%; the best recall value is 99.7% for two models, deep learning (DL) and support vector machine (SVM), and with feature selection, the best recall value is 100% for the SVM technique.
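A hedged scikit-learn sketch of the feature-selection-plus-classifier pipeline evaluated in the paper is shown below; the random data, the choice of SelectKBest with f_classif, and the RBF SVM are illustrative assumptions rather than the paper's exact setup.

```python
# Sketch of a feature-selection + SVM pipeline on stand-in acoustic features.
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 20))          # 20 acoustic features per voice sample
y = rng.integers(0, 2, size=300)        # 0 = male, 1 = female (toy labels)

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(f_classif, k=8)),   # keep the 8 most informative features
    ("svm", SVC(kernel="rbf")),
])
scores = cross_val_score(pipe, X, y, cv=5, scoring="recall")
print("mean recall:", scores.mean())
```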

33 citations


Journal ArticleDOI
TL;DR: The proposed mesh simplification method based on an energy-operator can dramatically collapse the excessive details in relatively smooth areas and preserve more important salient features during the simplification process and maintain a trade-off between time efficiency and salient feature-preserving accuracy.
Abstract: To avoid excessive detail in three-dimensional (3D) geometric models, and thereby omit less important content, this study proposes a fast mesh simplification method based on an energy-operator, with salient feature-preserving efficiency. The energy-operator can evaluate the smoothness and complexity of a regional mesh in 3D models. Accordingly, it can be directly used to simultaneously reduce a candidate triangle and its three neighboring triangles. The proposed method can dramatically collapse the excessive details in relatively smooth areas and preserve the more important salient features during the simplification process. It can also maintain a trade-off between time efficiency and salient feature-preserving accuracy. The effectiveness and efficiency of the new method are demonstrated by comparing it with OpenMesh, the most popular mesh operation software, which is capable of producing accurate mesh simplification models. The new mesh simplification method based on the energy-operator can provide accurate and concise models for interactive 3D rendering, calculation, simulation, and analysis.

17 citations


Journal ArticleDOI
TL;DR: The proposed LDA-based approach emphasizes instance-based and profile-based classifications of an author’s text that can handle the heterogeneity of the dataset, diversity in writing, and the inherent ambiguity of the Urdu language.
Abstract: In this paper, a novel approach is presented for authorship identification in English and Urdu text using the LDA model with n-gram texts of authors and cosine similarity. The proposed approach uses similarity metrics to identify various learned representations of stylometric features and uses them to identify the writing style of a particular author. The proposed LDA-based approach supports both instance-based and profile-based classification of an author's text. Here, LDA suitably handles high-dimensional and sparse data by allowing a more expressive representation of text. The presented approach is an unsupervised computational methodology that can handle the heterogeneity of the dataset, diversity in writing, and the inherent ambiguity of the Urdu language. A large corpus has been used for performance testing of the presented approach. The results of the experiments show the superiority of the proposed approach over state-of-the-art representations and other algorithms used for authorship identification. The contribution of the presented work is the use of cosine similarity with n-gram-based LDA topics to measure similarity between vectors of text documents. The approach achieves an overall accuracy of 84.52% on the PAN12 dataset and 93.17% on Urdu news articles without using any labels for the authorship identification task.
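As an illustrative sketch (not the authors' implementation), n-gram LDA topic vectors and cosine similarity can be combined roughly as follows using scikit-learn; the toy documents and the two-topic setting are assumptions.

```python
# Sketch: n-gram LDA topic vectors + cosine similarity for attribution.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.metrics.pairwise import cosine_similarity

known = ["the ship sailed at dawn and the crew sang",
         "markets rallied as investors weighed the new policy"]
unknown = ["at dawn the crew sang while the ship sailed on"]

vec = CountVectorizer(ngram_range=(1, 2))          # word unigrams and bigrams
X = vec.fit_transform(known + unknown)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
topics = lda.fit_transform(X)                      # one topic vector per document

sims = cosine_similarity(topics[-1:], topics[:-1])[0]
best = sims.argmax()
print(f"unknown text attributed to author {best} (cosine similarity {sims[best]:.2f})")
```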

16 citations


Journal ArticleDOI
TL;DR: This work proposes a novel PSR-based solution that accounts for the information contained at multiple time scales inherent in APM concentrations and demonstrates that the prediction error of all the machine learning techniques is smaller for the proposed PSR approach than for the traditional approach.
Abstract: The prediction of atmospheric particulate matter (APM) concentration is essential to reduce adverse effects on human health and to enforce emission restrictions. The dynamics of APM are inherently nonlinear and chaotic. Phase space reconstruction (PSR) is one of the widely used methods for chaotic time series analysis. APM mass concentrations are an outcome of complex anthropogenic contributors evolving with time, which may operate on multiple time scales. Thus, the traditional single-variable PSR-based prediction algorithm, in which the data points of the last embedding dimension are used as the target set, may fail to account for the multiple time scales inherent in APM concentrations. To address this issue, we propose a novel PSR-based solution that accounts for the information contained at multiple time scales. Different machine learning algorithms are used to evaluate the performance of the proposed and traditional PSR techniques for predicting mass concentrations of particulate matter up to 2.5 micron (PM2.5), up to 10 micron (PM10.0), and the ratio PM2.5/PM10.0. Hourly time series data of PM2.5 and PM10.0 mass concentrations were collected from January 2014 to September 2015 at the Masfalah air quality monitoring station (a couple of kilometers from the Holy Mosque in Makkah, Saudi Arabia). The performances of the various learning algorithms are evaluated using RMSE and MAE. The results demonstrate that the prediction error of all the machine learning techniques is smaller for the proposed PSR approach than for the traditional approach. For PM2.5, FFNN leads to the best results (both RMSE and MAE 0.04 μgm−3), followed by SVR-L (RMSE 0.01 μgm−3 and MAE 0.09 μgm−3) and RF (RMSE 1.27 μgm−3 and MAE 0.86 μgm−3). For PM10.0, SVR-L leads to the best results (both RMSE and MAE 0.06 μgm−3), followed by FFNN (RMSE 0.13 μgm−3 and MAE 0.09 μgm−3) and RF (RMSE 1.60 μgm−3 and MAE 1.16 μgm−3). For PM2.5/PM10.0, FFNN is the best and most accurate method for prediction (0.001 for both RMSE and MAE), followed by RF (0.02 for both RMSE and MAE) and SVR-L (RMSE 0.05 and MAE 0.04).
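The core PSR step, time-delay embedding, can be sketched as below; the embedding dimension m and delay tau are assumed to be chosen beforehand, and the paper's multi-scale target construction and learning stage are not reproduced.

```python
# Minimal sketch of phase space reconstruction via time-delay embedding.
import numpy as np

def delay_embed(series, m, tau):
    """Build the PSR matrix: row i is [x(i), x(i+tau), ..., x(i+(m-1)*tau)]."""
    n = len(series) - (m - 1) * tau
    return np.column_stack([series[i * tau : i * tau + n] for i in range(m)])

x = np.sin(0.1 * np.arange(500)) + 0.05 * np.random.default_rng(0).normal(size=500)
E = delay_embed(x, m=4, tau=6)
# In the traditional scheme the last column is the prediction target and the
# remaining columns are the inputs of the learning algorithm.
X_in, y_target = E[:, :-1], E[:, -1]
print(E.shape, X_in.shape, y_target.shape)
```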

13 citations


Journal ArticleDOI
TL;DR: The utility of these experiments is to show developers which factors and settings most influence energy consumption when different web-based asynchronous communication methods are used, helping them to choose the most beneficial solution where possible, in order to reduce the power consumption of the front-end of web applications for mobile devices.
Abstract: Currently, mobile devices are the most popular pervasive computing devices, and they are becoming the primary way of accessing the Internet. The battery is a critical resource in such personal computing gadgets, and network communication is one of the primary energy-consuming activities in any mobile app. Indeed, as web-based communication is the most used, explicitly or implicitly, by mobile devices, HTTP-based traffic is the most power-demanding kind. Therefore, mobile web developers should be aware of how much energy the different web-based communication alternatives demand. The goal of this paper is to measure and compare the energy consumption of three asynchronous HTTP-based methods on mobile devices in different browsers. Our experiments focus on three HTTP-based asynchronous communication models that allow a web server to push data to a client browser through an HTTP/1.1 interaction: Polling, Long Polling, and WebSockets. The resulting measurements are then analysed to gain a more accurate understanding of the impact of the selected method, and of the mobile browser, on the energy consumption of asynchronous HTTP-based communication. The utility of these experiments is to show developers which factors and settings most influence energy consumption when different web-based asynchronous communication methods are used, helping them to choose the most beneficial solution where possible. With this information, mobile web developers should be able to reduce the power consumption of the front-end of web applications for mobile devices, simply by selecting and configuring the best asynchronous method or mobile browser, improving the performance of HTTP-based communication in terms of energy demand.

Journal ArticleDOI
Chi Zhang1, Yuxin Wang1, Yuanchen Lv1, Hao Wu1, He Guo1 
TL;DR: A resource management strategy to reduce both energy consumption and Service Level Agreement (SLA) violations in cloud data centers is proposed and it contains three improved methods for subproblems in dynamic virtual machine (VM) consolidation.
Abstract: Reducing the energy consumption of data centers is an important way for cloud providers to improve their investment yield, but they must also ensure that the services delivered meet the various requirements of consumers. In this paper, we propose a resource management strategy to reduce both energy consumption and Service Level Agreement (SLA) violations in cloud data centers. It contains three improved methods for subproblems in dynamic virtual machine (VM) consolidation. To make host detection more effective and improve the VM selection results: first, the overloaded-host detection method sets a dynamic independent saturation threshold for each host, taking the CPU utilization trend into consideration; second, the underutilized-host detection method uses multiple factors besides CPU utilization, together with a naive Bayesian classifier, to calculate the combined weights of hosts in the prioritization step; and third, the VM selection method considers both the current CPU usage and the future growth space of the CPU demand of VMs. To evaluate the performance of the proposed strategy, it is simulated in CloudSim and compared with five existing energy-saving strategies using real-world workload traces. The experimental results show that our strategy outperforms the others with minimum energy consumption and SLA violations.
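As a purely hypothetical illustration of a per-host dynamic threshold that reacts to the CPU utilization trend (the paper's actual formula is not reproduced), consider the following sketch; the base threshold and sensitivity constants are invented.

```python
# Hypothetical illustration: lower the overload threshold for a host whose
# recent CPU utilization is trending upward, so it is consolidated earlier.
import numpy as np

def dynamic_threshold(cpu_history, base=0.9, sensitivity=0.5):
    """Tighten the overload threshold when the recent utilization trend rises."""
    t = np.arange(len(cpu_history))
    slope = np.polyfit(t, cpu_history, 1)[0]          # utilization trend per step
    return float(np.clip(base - sensitivity * max(slope, 0.0) * len(cpu_history),
                         0.5, base))

history = [0.55, 0.60, 0.66, 0.73, 0.81]              # rising load on one host
thr = dynamic_threshold(history)
print(f"threshold = {thr:.2f}, overloaded = {history[-1] > thr}")
```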

Journal ArticleDOI
TL;DR: Magnesium deficiency is among the most common elemental-status disorders in the Ukrainian population, especially in children and adolescents; the article reviews literature data and the authors' own findings on assessing magnesium levels in somatic diseases and on correcting the deficit with organic magnesium salts and pyridoxine (Magne-B6®).
Abstract: Among disorders of elemental status in the population of Ukraine, magnesium deficiency occupies a leading position. Such a deficiency is detected especially often in childhood and adolescence, when, owing to the child's intensive development, the increased need for magnesium is not matched by its dietary intake. The article presents literature data and the results of the authors' own studies, which indicate the advisability of determining magnesium levels in many somatic diseases, as well as the need for timely correction of magnesium deficiency. Correcting magnesium deficiency with organic magnesium salts and pyridoxine (Magne-B6®) improves cellular metabolism and energy supply, stabilizes membranes, and creates the physiological conditions for the body's cells to respond to specific therapy for the underlying disease. The use of Magne-B6® makes it possible to influence the basic links in the pathogenesis of many diseases associated with magnesium deficiency and disturbances of energy and electrolyte metabolism, and to help restore the body's adaptive reserves. Key words: magnesium deficiency, children, Magne-B6®.

Journal ArticleDOI
TL;DR: This paper reports on the first steps in porting LFRic to the FPGAs of the EuroExa architecture, developing and building a high-performance architecture based upon ARM CPUs with FPGA acceleration targeting exascale-class performance within a realistic power budget.
Abstract: In recent years, there has been renewed interest in the use of field-programmable gate arrays (FPGAs) for high-performance computing (HPC). In this paper, we explore the techniques required by traditional HPC programmers in porting HPC applications to FPGAs, using as an example the LFRic weather and climate model. We report on the first steps in porting LFRic to the FPGAs of the EuroExa architecture. We have used Vivado High-Level Synthesis to implement a matrix-vector kernel from the LFRic code on a Xilinx UltraScale+ development board containing an XCZU9EG multiprocessor system-on-chip. We describe the porting of the code, discuss the optimization decisions, and report performance of 5.34 Gflop/s with double precision and 5.58 Gflop/s with single precision. We discuss sources of inefficiencies, comparisons with peak performance, comparisons with CPU and GPU performance (taking into account power and price), comparisons with published techniques, and comparisons with published performance, and we conclude with some comments on the prospects for future progress with FPGA acceleration of the weather forecast model. The realization of practical exascale-class high-performance computing systems requires significant improvements in the energy efficiency of such systems and their components. This has generated interest in computer architectures which utilize accelerators alongside traditional CPUs. FPGAs offer huge potential as an accelerator which can deliver performance for scientific applications at high levels of energy efficiency. The EuroExa project is developing and building a high-performance architecture based upon ARM CPUs with FPGA acceleration targeting exascale-class performance within a realistic power budget.

Journal ArticleDOI
TL;DR: The process contains four phases, each providing a structured deliverable that reports the information required to replicate the measurement, and it guides the researcher through a threats-to-validity analysis to be included in each deliverable.
Abstract: Energy consumption information for devices, as available in the literature, is typically obtained with ad hoc approaches, thus making replication and comparison of consumption data difficult. We propose a process for measuring the energy consumption of a software application. The process contains four phases, each providing a structured deliverable that reports the information required to replicate the measurement. The process also guides the researcher through a threats-to-validity analysis to be included in each deliverable. This analysis ensures better reliability, trust, and confidence when reusing the collected consumption data. Such a process produces structured consumption data for any kind of electronic device (IoT devices, mobile phones, personal computers, servers, etc.), which can be published and shared with other researchers, fostering comparison or further investigation. A real case example demonstrates how to apply the process and how to create the required deliverables.

Journal ArticleDOI
TL;DR: In this article, the authors examine whether the exchange of arguments within a debate may connect critical pedagogy to the teachings of classical rhetorical paideia, which begins with the sophistic movement.
Abstract: In the movie Dead Poets Society, the on-screen teacher, John Keating, used unconventional teaching methods in order to exhort his students to think about themselves, the world, and their position in it from a new perspective. Gaining a new perspective under which students shape their individual way of thinking and become critical and active citizens constitutes a diachronic and essential goal of various pedagogical approaches. Within the context of the current research, our interest is focused on two pedagogical approaches, distant from each other in time, which emphatically underline both the need for and the possibility of students' empowerment as individuals and as citizens: (a) rhetorical paideia and (b) critical pedagogy. In particular, we intend to examine whether the exchange of arguments within a debate may connect critical pedagogy to the teachings of classical rhetorical paideia, which begins with the sophistic movement (Egglezou, 2017). We firmly believe that such an attempt could contribute to the pedagogical empowerment of students as critical thinkers and active citizens within the modern educational system. Before examining the hypotheses that led to the writing of the current paper, it is important to describe the axes around which debate revolves. Debating consists of a formal dialogic process of exchanging arguments, according to certain rules, between two groups of participants. The controversy refers to a carefully and intentionally chosen wedge issue of contemporary life, which is inextricably related to the historic, political, and social context in which it arises (Erickson et al., 2003). During the debate, each group of participants struggles to support its position on the issue.

Journal ArticleDOI
TL;DR: The corpus developed in this study will help to foster research in an underresourced language of Urdu and will be useful in the development, comparison, and evaluation of cross-lingual plagiarism detection systems for Urdu-English language pair.
Abstract: Cross-lingual plagiarism occurs when the source (or original) text(s) is in one language and the plagiarized text is in another language. In recent years, cross-lingual plagiarism detection has attracted the attention of the research community because a large amount of digital text is easily accessible in many languages through online digital repositories, and machine translation systems are readily available, making it easier to perform cross-lingual plagiarism and harder to detect it. To develop and evaluate cross-lingual plagiarism detection systems, standard evaluation resources are needed. The majority of earlier studies have developed cross-lingual plagiarism corpora for English and other European language pairs. However, for the Urdu-English language pair, the problem of cross-lingual plagiarism detection has not been thoroughly explored, although a large amount of digital text is readily available in Urdu and it is spoken in many countries of the world (particularly in Pakistan, India, and Bangladesh). To fill this gap, this paper presents a large benchmark cross-lingual corpus for the Urdu-English language pair. The proposed corpus contains 2,395 source-suspicious document pairs (540 are automatic translations, 539 are artificially paraphrased, 508 are manually paraphrased, and 808 are nonplagiarized). Furthermore, our proposed corpus contains three types of cross-lingual examples, including artificial (automatic translation and artificially paraphrased), simulated (manually paraphrased), and real (nonplagiarized), which have not been previously reported in the development of cross-lingual corpora. A detailed analysis of our proposed corpus was carried out using n-gram overlap and longest common subsequence approaches. Using word unigrams, mean similarity scores of 1.00, 0.68, 0.52, and 0.22 were obtained for automatic translation, artificially paraphrased, manually paraphrased, and nonplagiarized documents, respectively. These results show that the documents in the proposed corpus were created using different obfuscation techniques, which makes the dataset more realistic and challenging. We believe that the corpus developed in this study will help to foster research in Urdu, an under-resourced language, and will be useful in the development, comparison, and evaluation of cross-lingual plagiarism detection systems for the Urdu-English language pair. Our proposed corpus is free and publicly available for research purposes.
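The two similarity measures used in the corpus analysis, word n-gram overlap and longest common subsequence, can be sketched as follows on toy English sentences; the exact normalization used by the authors may differ.

```python
# Sketch of word n-gram containment and longest common subsequence (in words).
def ngram_containment(source, suspicious, n=1):
    src = set(zip(*[source.split()[i:] for i in range(n)]))
    sus = set(zip(*[suspicious.split()[i:] for i in range(n)]))
    return len(src & sus) / len(sus) if sus else 0.0

def lcs_length(a, b):
    a, b = a.split(), b.split()
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, wa in enumerate(a):
        for j, wb in enumerate(b):
            dp[i + 1][j + 1] = dp[i][j] + 1 if wa == wb else max(dp[i][j + 1], dp[i + 1][j])
    return dp[-1][-1]

src = "the committee approved the new budget proposal"
sus = "the committee approved a revised budget proposal"
print(ngram_containment(src, sus, n=1))   # unigram overlap score
print(lcs_length(src, sus))               # common subsequence length in words
```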

Journal ArticleDOI
TL;DR: It is evident that a training set of images covering as many license plate angles and sizes as possible improves the performance of every classifier; however, the results show that including images affected by rain, snow, or fog in the training sets does not improve the accuracy of the classifier when detecting license plates in pictures affected by these weather conditions.
Abstract: License Plate Detection (LPD) is one of the most important steps of an Automatic License Plate Recognition (ALPR) system because it is the seed of the entire recognition process. In indoor, controlled environments, there are many effective methods for detecting license plates. However, outdoor LPD is still a challenge due to the large number of factors that may affect the process and the results obtained. It is evident that a complete training set of images, including as many license plate angles and sizes as possible, improves the performance of every classifier. Along this line of work, numerous training sets contain images taken under different weather conditions. However, no studies have tested the differences in the effectiveness of different descriptors for these different conditions. In this paper, various classifiers were trained with features extracted from a set of rainfall images using different kinds of texture-based descriptors. The accuracy of these specifically trained classifiers over a test set of rainfall images was compared with the accuracy of the same descriptor-classifier pair trained with features extracted from a set of images taken under ideal conditions. In the same way, we repeated the experiment with images affected by challenging illumination. The research concludes, on the one hand, that including images affected by rain, snow, or fog in the training sets does not improve the accuracy of the classifier when detecting license plates in images affected by these weather conditions. Classifiers trained with ideal-condition images improve the accuracy of license plate detection in images affected by rainfall by up to 19%, depending on the kind of extracted features. On the other hand, the results show that including images affected by low illumination, regardless of the kind of selected feature, increases the accuracy of the classifier by up to 29%.

Journal ArticleDOI
TL;DR: In this paper, a logarithmic classifier is proposed for the classification of high-order M-QAM modulation schemes such as 128-QAM, 256-QAM, 512-QAM, and 1024-QAM.
Abstract: Computing distinct features from input data prior to classification is part of the complexity of automatic modulation classification (AMC) methods, which treat modulation classification as a pattern recognition problem. The algorithms that focus on multilevel quadrature amplitude modulation (M-QAM) under different channel scenarios are well detailed in the literature. However, a search of the literature revealed that few studies have been performed on the classification of high-order M-QAM modulation schemes such as 128-QAM, 256-QAM, 512-QAM, and 1024-QAM. This work investigates the capability of natural logarithmic properties and the possibility of extracting higher-order cumulant (HOC) features from raw received input data. The HOC features were extracted under an additive white Gaussian noise (AWGN) channel, with four effective parameters defined to distinguish the modulation types from the set 4-QAM∼1024-QAM. This approach makes the classifier more intelligent and improves the success rate of classification. The simulation results show that a very good classification rate is achieved at a low SNR of 5 dB under statistical noisy channel models. This shows the potential of the logarithmic classifier model for M-QAM signal classification. Furthermore, most results were promising and showed that the logarithmic classifier works well under both AWGN and different fading channels, and that it can achieve a reliable recognition rate even at a lower signal-to-noise ratio (less than zero dB). It can be considered an integrated automatic modulation classification (AMC) system for identifying high-order M-QAM signals, with a unique logarithmic classifier providing high versatility. Hence, it offers superior performance compared with previous work on automatic modulation identification systems.
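As a hedged sketch of the kind of feature extraction described above, the following code estimates two standard fourth-order cumulants of a noisy QAM signal and takes their natural logarithms as features; the paper's full feature set, fading-channel models, and classifier are not reproduced, and the SNR setup is only illustrative.

```python
# Hedged sketch: estimate fourth-order cumulants C40 and C42 of a received
# QAM signal and use their natural logarithms as classification features.
import numpy as np

def hoc_log_features(x):
    """x: complex baseband samples (assumed zero mean)."""
    m20 = np.mean(x ** 2)
    m21 = np.mean(np.abs(x) ** 2)
    m40 = np.mean(x ** 4)
    m42 = np.mean(np.abs(x) ** 4)
    c40 = m40 - 3 * m20 ** 2                     # fourth-order cumulant C40
    c42 = m42 - np.abs(m20) ** 2 - 2 * m21 ** 2  # fourth-order cumulant C42
    return np.log(np.abs([c40, c42]) + 1e-12)    # natural-log features

rng = np.random.default_rng(3)
# toy 16-QAM symbols plus AWGN at roughly 5 dB SNR
levels = np.array([-3, -1, 1, 3])
symbols = rng.choice(levels, 4000) + 1j * rng.choice(levels, 4000)
symbols = symbols / np.sqrt(np.mean(np.abs(symbols) ** 2))
noise = (rng.normal(size=4000) + 1j * rng.normal(size=4000)) * np.sqrt(10 ** (-5 / 10) / 2)
print(hoc_log_features(symbols + noise))
```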

Journal ArticleDOI
TL;DR: According to the IDAI guidelines, there are five pillars in the management of type-1 DM in children: insulin injection, blood glucose monitoring, nutrition, physical activity, and education; policymaker involvement and community support are needed so that children with type-1 DM are managed well.
Abstract: The incidence of Type-1 Diabetes Mellitus (DM) in children continues to rise worldwide and in Indonesia. According to data from the Indonesian Pediatric Society (IDAI), 1,220 children with type-1 DM were recorded in 2018. Public and health-worker awareness of DM in children is still low, reflected in the high proportion of children diagnosed with type-1 DM only when presenting with diabetic ketoacidosis, which reached 71% in 2017. According to the IDAI guidelines, there are five pillars in the management of type-1 DM in children: insulin injection, blood glucose monitoring, nutrition, physical activity, and education. IDAI recommends insulin at least twice a day using basal and rapid-acting insulin. Self-monitoring of blood glucose should be performed at least 4-6 times a day. The involvement of policymakers, including the government, and community support are needed so that children with type-1 DM are managed well.

Journal ArticleDOI
Xianzhe Zhang1, Gang Chen1, Jiechen Wang1, Manchun Li1, Liang Cheng1 
TL;DR: There is a significant positive spatial-temporal correlation in the South China Sea shipping network, with characteristics of time dynamics and spatial heterogeneity, and the forecasting accuracy of marine traffic volume based on the space-time autoregressive moving average (STARMA) model is better than that of the traditional time-series-based forecasting model.
Abstract: Research on the forecasting of marine traffic flows can provide a basis for port planning, water-area layout planning, and ship navigation management, and it provides a practical background for the sustainable development evaluation of shipping. Most traditional marine traffic volume forecasting studies focus on the variation of the traffic volume of a single port or section in the time dimension, with less research on the traffic correlation of associated ports in shipping networks. To reveal the spatial-temporal autocorrelation characteristics of the shipping network and to establish a suitable space-time forecasting model for marine traffic volume, this paper uses AIS data from 2011 to 2016 for the South China Sea to construct a regional shipping network. An adjacency discrimination rule based on network correlation is proposed, and the traffic demand between ports is estimated based on the gravity model. On this basis, the STARMA (space-time autoregressive moving average) model is introduced to describe the interaction between the traffic volumes of adjacent ports in the shipping network. The experimental results show that (1) there is a significant positive spatial-temporal correlation in the South China Sea shipping network, and this spatial-temporal correlation has the characteristics of time dynamics and spatial heterogeneity; and (2) the forecasting accuracy of marine traffic volume based on the spatial-temporal model is better than that of the traditional time-series-based forecasting model, and the spatial-temporal model can better portray the spatial-temporal autocorrelation of maritime traffic.
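A minimal sketch of the gravity-model estimate of inter-port traffic demand mentioned above is given below; the port volumes, distances, and calibration constants k and beta are invented, and the STARMA fitting step is not shown.

```python
# Sketch of a gravity-model estimate of traffic demand between ports.
import numpy as np

def gravity_demand(volume_i, volume_j, distance_ij, k=1.0, beta=2.0):
    """Estimated flow between two ports: k * V_i * V_j / d^beta."""
    return k * volume_i * volume_j / distance_ij ** beta

port_volume = np.array([120.0, 80.0, 45.0])            # e.g. monthly ship calls
dist = np.array([[0, 300, 520], [300, 0, 410], [520, 410, 0]], dtype=float)

demand = np.zeros_like(dist)
for i in range(3):
    for j in range(3):
        if i != j:
            demand[i, j] = gravity_demand(port_volume[i], port_volume[j], dist[i, j])
print(np.round(demand, 3))
```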

Journal ArticleDOI
TL;DR: Comparing the performance of local and global models through a large-scale empirical study based on six open-source projects with 227,417 changes shows that local models perform worse than global models in classification performance, but have significantly better effort-aware prediction performance in the cross-validation and cross-project-validation scenarios.
Abstract: Just-in-time software defect prediction (JIT-SDP) is an active topic in software defect prediction, which aims to identify defect-inducing changes. Recently, some studies have found that the variability of defect datasets can affect the performance of defect predictors, and that using local models can help improve the performance of prediction models. However, previous studies have focused on module-level defect prediction, so whether local models are still valid in the context of JIT-SDP is an important issue. To this end, we compare the performance of local and global models through a large-scale empirical study based on six open-source projects with 227,417 changes. The experiment considers three evaluation scenarios: cross-validation, cross-project-validation, and timewise-cross-validation. To build local models, the experiment uses the k-medoids algorithm to divide the training set into several homogeneous regions. In addition, logistic regression and effort-aware linear regression (EALR) are used to build classification models and effort-aware prediction models, respectively. The empirical results show that local models perform worse than global models in classification performance. However, local models have significantly better effort-aware prediction performance than global models in the cross-validation and cross-project-validation scenarios. In particular, when the number of clusters k is set to 2, local models obtain optimal effort-aware prediction performance. Therefore, local models are promising for effort-aware JIT-SDP.
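A hedged sketch of the local-model idea, partitioning training changes with k-medoids and fitting one classifier per region, is shown below; it assumes the scikit-learn-extra package for KMedoids and uses random stand-in change metrics rather than the studied projects.

```python
# Hedged sketch of "local models": k-medoids regions (k = 2, as in the best
# setting reported above) with one logistic regression per region.
# Assumes scikit-learn-extra is installed for KMedoids.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn_extra.cluster import KMedoids

rng = np.random.default_rng(4)
X_train = rng.normal(size=(400, 6))            # change-level metrics
y_train = (X_train[:, 0] + X_train[:, 3] > 1).astype(int)
X_test = rng.normal(size=(50, 6))

km = KMedoids(n_clusters=2, random_state=0).fit(X_train)
local_models = {c: LogisticRegression(max_iter=1000)
                   .fit(X_train[km.labels_ == c], y_train[km.labels_ == c])
                for c in range(2)}

# route each test change to its region's model
regions = km.predict(X_test)
preds = np.array([local_models[r].predict(x.reshape(1, -1))[0]
                  for r, x in zip(regions, X_test)])
print(preds[:10])
```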

Journal ArticleDOI
TL;DR: The system thus developed is capable of classifying 3D-accelerometer signals in real time and issuing remote alerts while keeping power consumption low and improving on the present state-of-the-art solutions in the literature.
Abstract: Accidental falls are the main cause of fatal and nonfatal injuries, which typically lead to hospital admissions among elderly people. A wearable system capable of detecting unintentional falls and sending remote notifications will clearly improve the quality of life of such subjects and also help to reduce public health costs. In this paper, we describe an edge computing wearable system based on deep learning techniques. In particular, we give special attention to the description of the classification and communication modules, which have been developed keeping in mind the limits in terms of computational power, memory occupancy, and power consumption of the designed wearable device. The system thus developed is capable of classifying 3D-accelerometer signals in real time and issuing remote alerts while keeping power consumption low and improving on the present state-of-the-art solutions in the literature.

Journal ArticleDOI
TL;DR: The diagnosis of community-acquired pneumonia in outpatient practice remains difficult; the authors review the main diagnostic criteria used in Ukraine and other countries and propose their unification for effective registration of the disease.
Abstract: The diagnosis of community-acquired pneumonia in outpatient practice remains a difficult task. To this day, both overdiagnosis and underdiagnosis, as well as late hospitalization of children, are still encountered. The authors review the main criteria for diagnosing pneumonia in Ukraine and other countries of the world and propose their unification for effective registration of the disease and for enabling the use of internationally accepted methods of diagnosis and treatment. Key words: community-acquired pneumonia, diagnosis, outpatient practice.

Journal ArticleDOI
TL;DR: This work proposes the use of class expression learning (CEL), an ontology-based data mining technique, for the recognition of ADL, a technique based on combining the entities in the ontology, trying to find the expressions that best describe those activities.
Abstract: The miniaturization and price reduction of sensors have encouraged the proliferation of smart environments, in which multitudinous sensors detect and describe the activities carried out by inhabitants. In this context, the recognition of activities of daily living (ADL) has been one of the most developed research areas in recent years. Its objective is to determine which daily activity is performed by the inhabitants of a smart environment. In this field, many proposals have been presented in the literature, many of them based on ad hoc ontologies to formalize logical rules, which hinders their reuse in other contexts. In this work, we propose the use of class expression learning (CEL), an ontology-based data mining technique, for the recognition of ADL. This technique is based on combining the entities in the ontology, trying to find the expressions that best describe those activities. As far as we know, this is the first time that this technique has been applied to this problem. To evaluate the performance of CEL for the automatic recognition of activities, we first developed a framework that is able to convert many of the available datasets to all the ontology models we have found in the literature for dealing with ADL. Two different CEL algorithms have been employed for the recognition of eighteen activities in two different datasets. Although all the available ontologies in the literature focus on describing the context of the activities, the results show that, in general terms, the sequence of the events produced by the sensors is more relevant for their automatic recognition.

Journal ArticleDOI
TL;DR: The review shows that the work being done in ML (for classification and recommendation), in both research and industrial environments, is far from early life-cycle stages such as business requirements and analysis, and the article suggests the development of new ML research lines to facilitate its application in different domains.
Abstract: Today, recommendation algorithms are widely used by companies in multiple sectors with the aim of increasing their profits or offering a more specialized service to their customers. Moreover, there are countless applications in which classification algorithms are used, seeking to find patterns that are difficult for people to detect or whose detection cost is very high. Sometimes, it is necessary to use a mixture of both kinds of algorithms to give an optimal solution to a problem. This is the case of ADAGIO, an R&D project that combines machine learning (ML) strategies from heterogeneous data sources to generate valuable knowledge based on the available open data. In order to support the ADAGIO project requirements, the main objective of this paper is to provide a clear vision of the existing classification and recommendation ML systems to help researchers and practitioners choose the best option. To achieve this goal, this work presents a systematic review applied in two contexts: scientific and industrial. More than a thousand papers have been analyzed, resulting in 80 primary studies. The conclusions show that the combination of these two kinds of algorithms (classification and recommendation) is not widely used in practice. In fact, the validation presented for both cases is very scarce in the industrial environment. From the point of view of the software development life cycle, this review also shows that the work being done in ML (for classification and recommendation), in both research and industrial environments, is far from earlier stages such as business requirements and analysis. This makes it very difficult to find efficient and effective solutions that support real business needs from an early stage. The article therefore suggests the development of new ML research lines to facilitate its application in the different domains.

Journal ArticleDOI
TL;DR: A pattern-based development approach for the Interaction Flow Modeling Language is presented as a way to finally automate repetitive specification tasks of CRUD operations and presents evidence of a significant productivity improvement obtained.
Abstract: Development and deployment technologies for data-intensive web applications have evolved considerably in recent years. Domain-specific frameworks and model-driven web engineering approaches are examples of these technologies. They have made it possible to face implicit problems of these systems, such as quickly evolving business rules or severe time-to-market requirements. Both approaches propose the automation of redundant development tasks as the key factor for their success. The implementation of CRUD operations is a clear example of a repetitive and recurrent task that may be automated. However, although web application frameworks have provided mechanisms to automate the implementation of CRUD operations, model-driven web engineering approaches have generally ignored them, so automation has not yet been properly addressed. This paper presents a pattern-based development approach for the Interaction Flow Modeling Language (IFML) as a way to finally automate repetitive specification tasks. Our approach is illustrated by defining and applying IFML patterns for CRUD operations. Additionally, a supporting tool, which enables automation, is shown. The suitability of our approach and the utility of its tool have been evaluated by applying them to several real projects developed by a software company specialized in model-driven web application development. The results obtained present evidence of a significant productivity improvement achieved by automating the IFML specification of CRUD operations.

Journal ArticleDOI
TL;DR: For both the benchmark functions and clustering problem, the numerical results show that the hybrid approach for aeDE (HaeDE) outperforms others in both accuracy and computational cost.
Abstract: In this paper, a hybrid approach that combines a population-based method, adaptive elitist differential evolution (aeDE), with a powerful gradient-based method, spherical quadratic steepest descent (SQSD), is proposed and then applied to clustering analysis. This combination not only inherits the advantages of both aeDE and SQSD but also helps reduce computational cost significantly. First, thanks to the aeDE's globally explorative manner in the initial steps, the proposed approach can quickly reach a region that contains the global optimal value. Next, thanks to the SQSD's locally effective exploitative manner in the later steps, the proposed approach can find the global optimal solution rapidly and accurately, thereby reducing the computational cost. The proposed method is first tested on 32 benchmark functions to verify its robustness and effectiveness. Then, it is applied to clustering analysis, which is one of the problems of interest in statistics, machine learning, and data mining. In this application, the proposed method is used to find the positions of the cluster centers for which an internal validity measure is optimized. For both the benchmark functions and the clustering problem, the numerical results show that the hybrid approach for aeDE (HaeDE) outperforms the others in both accuracy and computational cost.
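As a rough illustration of the two-stage global-then-local idea (not the authors' aeDE or SQSD implementations), the sketch below uses SciPy's differential evolution followed by a generic local minimizer to place cluster centers on synthetic data.

```python
# Hedged sketch: global differential-evolution search for cluster centres,
# followed by a local gradient-based refinement; SciPy's routines stand in
# for the paper's aeDE and SQSD variants, and the data are synthetic.
import numpy as np
from scipy.optimize import differential_evolution, minimize
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
k, d = 3, X.shape[1]

def sse(flat_centers):
    centers = flat_centers.reshape(k, d)
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return np.sum(dists.min(axis=1) ** 2)        # within-cluster sum of squares

bounds = [(X[:, j].min(), X[:, j].max()) for _ in range(k) for j in range(d)]
global_stage = differential_evolution(sse, bounds, maxiter=50, seed=0, tol=1e-6)
local_stage = minimize(sse, global_stage.x)      # refine the DE solution locally
print("SSE after DE:", round(global_stage.fun, 2),
      "after local refinement:", round(local_stage.fun, 2))
```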

Journal ArticleDOI
TL;DR: This cross-sectional study determined lipid-profile cut-off values in pediatric sepsis patients and found that total cholesterol and triglyceride levels are significantly associated with disease severity as measured by the PELOD-2 score.
Abstract: Background. Sepsis is the leading cause of death among infants and children in hospital. Sepsis and septic shock cause neuroendocrine and metabolic changes, including changes in the concentration and composition of plasma lipids and lipoproteins. However, the relationship between lipid profile and disease severity in children with sepsis remains unclear. Objective. To determine lipid-profile cut-off values in pediatric sepsis patients and the relationship between lipid profile and disease severity in children with sepsis. Methods. A cross-sectional study was conducted from July to October 2017 in the PICU of Haji Adam Malik General Hospital. The sample comprised children aged 1 month to <18 years diagnosed with sepsis. Patients with diabetes mellitus, nephrotic syndrome, overnutrition, or severe malnutrition, or receiving statin or insulin therapy, were excluded. Thirty children met the criteria. Disease severity was assessed using the PELOD-2 score. Cut-off values for each lipid parameter were derived from ROC curves. The relationship between lipid profile and disease severity was analyzed with the chi-square or Fisher exact test; p < 0.05 was considered statistically significant. Results. From the ROC curves, the cut-off values for total cholesterol and triglycerides were 93.5 mg/dL and 199 mg/dL, and the cut-off values for HDL and LDL were 20.5 mg/dL and 48.5 mg/dL. The associations between lipid profile and disease severity (PELOD-2) had p values of 0.007 for total cholesterol, 0.005 for triglycerides, 0.063 for HDL, and 0.279 for LDL. Conclusion. There is a significant association between total cholesterol and triglyceride levels and disease severity based on the PELOD-2 score. There is no significant association between HDL or LDL and disease severity based on the PELOD-2 score.

Journal ArticleDOI
TL;DR: This paper proposes the use of the MEdit4CEP-CPN model-driven tool as a solution for conducting such quantitative analysis of events of interest for an application domain, without requiring knowledge of any scientific programming language for implementing the pattern conditions.
Abstract: Complex event processing (CEP) is a computational intelligence technology capable of analyzing big data streams for event pattern recognition in real time. In particular, this technology is vastly useful for analyzing multicriteria conditions in a pattern, which will trigger alerts (complex events) upon their fulfillment. However, one of the main challenges to be faced by CEP is how to define the quantitative analysis to be performed in response to the produced complex events. In this paper, we propose the use of the MEdit4CEP-CPN model-driven tool as a solution for conducting such quantitative analysis of events of interest for an application domain, without requiring knowledge of any scientific programming language for implementing the pattern conditions. Precisely, MEdit4CEP-CPN facilitates domain experts to graphically model event patterns, transform them into a Prioritized Colored Petri Net (PCPN) model, modify its initial marking depending on the application scenario, and make the quantitative analysis through the simulation and monitor capabilities provided by CPN tools.