
Showing papers in "Global transitions proceedings in 2022"


Journal Article
TL;DR: In this paper, the authors provide an overview of data pre-processing in machine learning, covering the problems that arise while building machine learning models, and discuss flipping, slight rotation, and other techniques for augmenting image data.
Abstract: This review paper provides an overview of data pre-processing in machine learning, focusing on the problems that arise while building machine learning models. It deals with two significant issues in the pre-processing process: (i) issues with data and (ii) the steps to follow to analyze data with the best approach. As raw data are vulnerable to noise, corruption, missing values, and inconsistency, it is necessary to perform pre-processing steps, which is done using classification, clustering, association, and many other available pre-processing techniques. Poor data can primarily affect accuracy and lead to false predictions, so it is necessary to improve the dataset's quality. Data pre-processing is the best way to deal with such problems. It makes knowledge extraction from the data set much easier through cleaning, integration, transformation, and reduction methods. Missing data and significant differences in the variety of data always exist, as the information is collected from multiple sources and from real-world applications. The data augmentation approach therefore generates data for machine learning models, to decrease the dependency on training data and to improve the performance of the machine learning model. This paper discusses flipping, rotating by slight degrees, and other methods to augment image data, and shows how to perform data augmentation without distorting the original data.

56 citations
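
The flipping and slight rotation named in the abstract above can be sketched in a few lines. This is only an illustration, not the paper's implementation: the image is assumed to be a list of rows of pixel values, and the function names `flip_horizontal` and `rotate_small`, the nearest-neighbour sampling, and the `fill` value are all assumptions made for the example.

```python
import math


def flip_horizontal(img):
    """Mirror each row left-to-right."""
    return [row[::-1] for row in img]


def rotate_small(img, degrees, fill=0):
    """Rotate about the image centre using nearest-neighbour sampling.

    Output pixels whose source falls outside the image get `fill`.
    """
    h, w = len(img), len(img[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    theta = math.radians(degrees)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Inverse mapping: find the source pixel for each output pixel.
            sx = cos_t * (x - cx) + sin_t * (y - cy) + cx
            sy = -sin_t * (x - cx) + cos_t * (y - cy) + cy
            ix, iy = round(sx), round(sy)
            if 0 <= ix < w and 0 <= iy < h:
                out[y][x] = img[iy][ix]
    return out


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
augmented = [flip_horizontal(image), rotate_small(image, 5), rotate_small(image, -5)]
```

Keeping the angle small limits how many border pixels are replaced by `fill`, which is one way to read the abstract's aim of augmenting without distorting the original data.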


Journal Article
TL;DR: In this article, the authors propose a method to detect leaf diseases in the tomato plant using Support Vector Machine (SVM), Convolutional Neural Network (CNN), and K-Nearest Neighbor (K-NN) classifiers.
Abstract: Agriculture provides food for all human beings, even as the population increases rapidly. Predicting plant diseases at an early stage is essential in agriculture to supply food to the overall population. Unfortunately, diseases are hard to predict at the early stage of the crops. The idea behind the paper is to raise awareness among farmers of cutting-edge technologies that reduce diseases in plant leaves. Since tomato is a commonly available vegetable, machine learning and image processing approaches with an accurate algorithm are identified to detect leaf diseases in the tomato plant. In this investigation, samples of tomato leaves with disorders are considered. With these samples, farmers can easily find the diseases based on the early symptoms. First, the samples of tomato leaves are resized to 256 × 256 pixels, and then histogram equalization is used to improve the quality of the samples. K-means clustering is introduced to partition the data space into Voronoi cells. The boundary of the leaf samples is extracted using contour tracing. Multiple descriptors, namely Discrete Wavelet Transform, Principal Component Analysis, and Grey Level Co-occurrence Matrix, are used to extract the informative features of the leaf samples. Finally, the extracted features are classified using machine learning approaches such as Support Vector Machine (SVM), Convolutional Neural Network (CNN), and K-Nearest Neighbor (K-NN). The accuracy of the proposed model is tested using SVM (88%), K-NN (97%), and CNN (99.6%) on tomato disordered samples.

54 citations
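
The histogram equalization step mentioned in the abstract above can be illustrated with a minimal sketch. It assumes an 8-bit grayscale image stored as a list of rows and uses the standard cumulative-distribution remapping; the function name `equalize_histogram` is hypothetical, and the paper's own implementation and colour handling are not given in the abstract.

```python
def equalize_histogram(img, levels=256):
    """Spread the used gray levels across the full range via the CDF."""
    n = len(img) * len(img[0])
    hist = [0] * levels
    for row in img:
        for p in row:
            hist[p] += 1

    # Cumulative distribution of gray levels.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)

    cdf_min = next(c for c in cdf if c > 0)
    if cdf_min == n:
        # A constant image has nothing to stretch.
        return [row[:] for row in img]

    # Lookup table mapping each old level to its equalized level.
    lut = [max(0, round((c - cdf_min) / (n - cdf_min) * (levels - 1))) for c in cdf]
    return [[lut[p] for p in row] for row in img]


low_contrast = [[50, 50], [100, 150]]
equalized = equalize_histogram(low_contrast)
```

In this toy input the darkest level maps to 0 and the brightest to 255, so the narrow 50 to 150 range is stretched over the full 8-bit scale.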


Journal Article
TL;DR: In this paper, a movie recommendation system is described that uses cosine similarity to recommend movies similar to the one chosen by the user; the system also performs sentiment analysis on the reviews of the chosen movie using machine learning.
Abstract: In the modern world, where technology is at the forefront of every industry, there has been an overload of information and data. Thus, a recommendation system comes in handy to deal with this large volume of data and filter out the useful information which is fast and relevant to the user's choice. This paper describes an approach to a movie recommendation system using Cosine Similarity to recommend similar movies based on the one chosen by the user. Although the existing recommendation systems get the job done, it does not justify if the movie is worth spending time on. To enhance the user experience, this system performs sentiment analysis on the reviews of the movie chosen using machine learning. Two of the supervised machine learning algorithms Naïve Bayes (NB) Classifier and Support Vector Machine (SVM) Classifier are used to increase the accuracy and efficiency. This paper also gives a comparison between NB and SVM on the basis of parameters like Accuracy, Precision, Recall and F1 Score. The accuracy score of SVM came out to be 98.63% whereas accuracy score of NB is 97.33%. Thus, SVM outweighs NB and proves to be a better fit for Sentiment Analysis.

26 citations
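
As a rough illustration of the cosine-similarity recommendation described in the abstract above, the sketch below ranks movies by the cosine of the angle between feature vectors. The titles, the three-element feature vectors, and the function names are hypothetical; the abstract does not say which features the authors used.

```python
import math


def cosine_similarity(a, b):
    """cos(theta) = (a . b) / (|a| |b|); returns 0.0 for a zero vector."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(title, features, k=2):
    """Return the k titles most similar to `title`."""
    target = features[title]
    scored = [
        (cosine_similarity(target, vec), other)
        for other, vec in features.items()
        if other != title
    ]
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]


# Hypothetical feature vectors: (action, romance, sci-fi) weights.
movie_features = {
    "Movie A": [1.0, 0.0, 1.0],
    "Movie B": [1.0, 0.0, 0.9],
    "Movie C": [0.0, 1.0, 0.0],
}
top_pick = recommend("Movie A", movie_features, k=1)
```

Cosine similarity ignores vector length, so two movies with the same mix of features score as similar even if one has larger raw weights.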


Journal Article
TL;DR: In this paper, Hindi news articles are collected from various news sources, and machine learning algorithms such as Naïve Bayes, logistic regression, and Long Short-Term Memory (LSTM) are used to detect fake news.
Abstract: With the increase in social networks, more number of people are creating and sharing information than ever before, many of them have no relevance to reality. Due to this, fake news for various political and commercial purposes are spreading quickly. Online newspaper has made it challenging to identify trustworthy news sources. In this work, Hindi news articles from various news sources are collected. Preprocessing, feature extraction, classification and prediction processes are discussed in detail. Different machine learning algorithms such as Naïve Bayes, logistic regression and Long Short-Term Memory (LSTM) are used to detect the fake news. The preprocessing step includes data cleaning, stop words removal, tokenizing and stemming. Term frequency inverse document frequency(TF-IDF) is used for feature extraction. Naïve Bayes, logistic regression and LSTM classifiers are used and compared for fake news detection with probability of truth. It is observed that among these three classifiers, LSTM achieved best accuracy of 92.36%.

16 citations
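
The TF-IDF feature extraction mentioned in the abstract above can be sketched as follows. This uses the plain textbook weighting tf × log(N / df) with whitespace tokenization, both of which are assumptions for the example; library implementations often apply smoothing, and the paper's exact variant and its Hindi tokenizer are not specified.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return weights


sample_docs = ["fake news spreads fast", "real news travels slowly"]
doc_weights = tf_idf(sample_docs)
```

A term that occurs in every document, such as "news" here, gets weight zero, while terms that distinguish one document from the others get positive weight.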


Journal Article
TL;DR: In this paper, Logistic Regression (LR) is applied to the UCI dataset to classify cardiac disease using train/test splitting ratios of 90:10, 80:20, 70:30, 40:60, and 50:50; the 90:10 ratio gives the best accuracy, and the LR model obtained 87.10% accuracy.
Abstract: One of the most life-threatening diseases is cardiovascular disease. Its high mortality rate contributes to nearly 17 million deaths all over the world. Early diagnosis helps to treat the disease in a timely manner to prevent mortality. Several machine learning and deep learning techniques are available to classify the presence or absence of the disease. In this research, Logistic Regression (LR) is applied to the UCI dataset to classify cardiac disease. To improve the performance of the model, the data were pre-processed by cleaning the dataset and finding the missing values, and feature selection was performed by computing each feature's correlation with the target value. The highly positively correlated features were selected. Classification was then performed by dividing the dataset into training and testing sets in the ratios 90:10, 80:20, 70:30, 40:60, and 50:50. The splitting ratio of 90:10 gives the best accuracy as listed below. The LR model obtained 87.10% accuracy.

15 citations
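
The correlation-based feature selection and the train/test splitting ratios described in the abstract above can be sketched as below. The column names, the toy values, the 0.5 threshold, and the helper names are all assumptions made for illustration; the abstract does not state the threshold or the features the authors kept.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either series is constant."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def select_features(columns, target, threshold=0.5):
    """Keep features whose correlation with the target exceeds the threshold."""
    return [name for name, values in columns.items() if pearson(values, target) > threshold]


def split(rows, train_fraction, seed=0):
    """Shuffle a copy of the rows and cut it at the requested fraction."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical toy data: two candidate features and a binary target.
columns = {"age": [1, 2, 3, 4], "noise": [2, 1, 2, 1]}
target = [0, 0, 1, 1]
selected = select_features(columns, target)

rows = list(range(10))
splits = {frac: split(rows, frac) for frac in (0.9, 0.8, 0.7, 0.4, 0.5)}
```

Each entry of `splits` would then be used to fit and score a classifier, so that the accuracies for the five ratios can be compared.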


Journal Article
TL;DR: In this article, a seagull optimization algorithm (SOA) based 3-degree-of-freedom (DOF) proportional-integral-derivative (3DOFPID) controller is suggested for load frequency control of a multi-area interconnected power system (MAIPS).
Abstract: In this paper, a seagull optimization algorithm (SOA) based 3-degree-of-freedom (DOF) proportional-integral-derivative (3DOFPID) controller is suggested for load frequency control of a multi-area interconnected power system (MAIPS). The considered MAIPS comprises two areas with thermal-hydro-nuclear generation units in each area. Analysis has been carried out by subjecting area-1 of the MAIPS to a step load disturbance (SLD) of 10%. The superiority of the presented SOA-tuned 3DOFPID controller in regulating the stability of the MAIPS is revealed by comparison with the performance of 2DOFPID and conventional PID controllers. The MAIPS is analyzed dynamically without and with the nonlinear realistic constraint of communication time delays (CTDs) to demonstrate their impact on load frequency control performance. Simulation results show that the MAIPS dynamic behavior deviates slightly more when CTDs are considered, and this is justified.

11 citations


Journal Article
TL;DR: In this paper, the authors use topic modeling techniques such as LDA with BoW to discover topics and to detect trends and paradigms as predictors associated with each topic.
Abstract: Change is the only constant. In many sectors, a change is being witnessed that is getting increasingly rapid. This carries a plethora of new innovation possibilities with it. This necessitates well-founded data about trends, future developments and their consequences. This study seeks to catch the new directions, paradigms as predictors with an association of each topic which will be discovered through topic modeling techniques like LDA with BoW. For this, empirical analysis on 3269 research articles from the Journal of Applied Intelligence was done, which were gathered during a 30-year span. The inferred topics were then structured into a way suitable for performing predictive analysis. This is significant in the sense that it will help to predict what technology will be encountered in the future, as well as how far human's ability to innovate and discover things may lead this world to. The final model using TF-IDF scores has outperformed the baseline model by a margin of 41%.

10 citations


Journal Article
TL;DR: In this paper, an approach to Deepfake detection is provided in which ResNext, a Convolutional Neural Network (CNN), and Long Short-Term Memory (LSTM) are used to detect Deepfake videos.
Abstract: With the development of technology and ease of creation of fake content, the manipulation of media is carried out on a large scale in recent times. The rise of AI altered videos or Deepfake media has posed a great threat to media integrity and is being produced and spread widely across social media platforms, the detection of which is seen to be a major challenge. In this paper, an approach for Deepfake detection has been provided. ResNext, a Convolutional Neural Network (CNN) algorithm and Long Short-Term Memory (LSTM) is used as an approach to detect the Deepfake videos. The approach and its steps are discussed in this paper. The accuracy obtained for the developed Deep-Learning (DL) model over the Celeb-Df dataset is 91%.

9 citations


Journal Article
TL;DR: In this article, the authors propose a Secure Reliable Message Communication (SEC-RMC) protocol that uses the Mosquitto MQTT message broker with cryptographic enhancements to offer security services and to provide mutual authentication in the IoT environment at the transport layer.
Abstract: Recent advancements in the communication protocols and the networking technologies have enabled connectivity of a wide range of objects, resulting in the Internet of Things (IoT) network. The protocols like MQ Telemetry Transport (MQTT), as well as Constrained Application Protocol (CoAP) are moderately capable of providing the management of heterogeneous wireless sensor networks even in an environment with very limited bandwidth. In this paper, we develop a lightweight encryption algorithm to obtain reliable secure data transmission between IoT devices. We propose a Secure Reliable Message Communication (SEC-RMC) protocol using Mosquitto MQTT message broker with cryptographic enhancements to offer security services and also provide the mutual authentication in the IoT environment at the transport layer. The proposed scheme decreases the number of messages transmitted between the devices. Also, the authentication scheme provides resistance to DNS hacking, routing table poisoning and packet mistreatment. On comparison with the existing methods, the transmission time has been reduced by 80% in this work.

8 citations


Journal Article
TL;DR: In this article, the authors propose using graph theory and graph coloring to schedule resources better and, at the same time, to have fewer dependent tasks, ensuring fewer conflicts among the resources.
Abstract: Cloud Computing has redefined the industrial side of application development and the delivery of software that has a strong base with the creation of both web apps and mobile apps. There is a significant shift in the mindset of technology organizations about the creation of specified services with the introduction of concepts like pay-for-use and subscription-based modeling. The distributed nature of resources makes it easier on the client-side to focus on the business aspects more than the maintenance of infrastructure and integrations scenarios. In a huge setup with a possibility of conflict among resources, it is essential to have a hierarchy that provides optimized execution time of processes and also ensures minimal waiting time of the resources involved in the processes. This paper proposes the use of Graph theory and the concepts of Graph coloring to schedule resources better and, at the same time, have lesser depended on tasks to ensure lesser conflicts among the resources. This paper also explores a scenario that helps understand the use of graph coloring techniques to handle better the dependency of involved processes in a cloud setup provided by the cloud vendors.

7 citations
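
The graph-coloring idea in the abstract above can be illustrated with a greedy coloring sketch: tasks are vertices, an edge joins two tasks that contend for the same resource, and each color stands for a time slot. The task names, the conflict graph, and the highest-degree-first ordering are assumptions for the example; the abstract does not say which coloring algorithm the paper uses.

```python
def greedy_coloring(conflicts):
    """Assign each task the smallest slot not used by a conflicting task.

    `conflicts` maps every task to the set of tasks it conflicts with.
    Greedy coloring is a heuristic and does not always minimise slots.
    """
    # Visit the most constrained tasks first.
    order = sorted(conflicts, key=lambda task: len(conflicts[task]), reverse=True)
    slot = {}
    for task in order:
        used = {slot[other] for other in conflicts[task] if other in slot}
        candidate = 0
        while candidate in used:
            candidate += 1
        slot[task] = candidate
    return slot


# Hypothetical tasks: A shares a resource with B and with C; D is independent.
task_conflicts = {
    "A": {"B", "C"},
    "B": {"A"},
    "C": {"A"},
    "D": set(),
}
schedule = greedy_coloring(task_conflicts)
```

Tasks that receive the same slot can run concurrently because no edge joins them, so the number of distinct slots bounds how long the conflicting tasks must wait for one another.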


Journal Article
TL;DR: In this article, the authors develop an Arabic text classification model using various algorithms, namely Multinomial Naïve Bayesian (MNB), Bernoulli Naïve Bayesian (BNB), Stochastic Gradient Descent (SGD), Logistic Regression (LR), Support Vector Classifier (SVC), Linear SVC, and convolutional neural networks (CNN).
Abstract: Arabic text classification is one application of Natural Language Processing (NLP). It has been used to analyze and categorize Arabic text. Analyzing text has become an essential part of our lives because of the increasing number of text data which makes text classification a big data problem. Arabic text classification systems become significant to maintain vital information in many domains such as education, and health sector, and public services. In the presented research work, the Arabic text classification model is developed using various algorithms namely Multinomial Naïve Bayesian (MNB), Bernoulli Naïve Bayesian (BNB), Stochastic Gradient Descent (SGD), Logistic Regression (LR), Support vector classifier (SVC), Linear SVC, and convolutional neural networks (CNN). These algorithms have been implemented utilizing the Al-Khaleej dataset. The experiments are carried out with various representation models and it is observed that CNN with character level model outperforms others. The result of CNN exceeds the state-of-the-art machine learning method with an accuracy equal to 98. The presented methods will be useful in different domains, particularly on social media.

Journal Article
TL;DR: In this paper, the authors use transfer learning techniques to categorize various food products into their appropriate categories; using EfficientNetB0, the developed model classified 101 distinct food kinds with an accuracy of 80%.
Abstract: In the subject of object detection using computer vision, image classification is becoming a prominent and promising aspect. However, studies so far have only scratched the surface of food image classification. In assessing the nutritional abilities of people of different nationalities, the categorization of their traditional cuisine has a significant influence. Existing models categorize different sorts of foods, but they can only categorize a small number of meals at a given time, whereas a single model should recognize the maximum number of foods. This work focuses on the creation of a recognition model that uses transfer learning techniques to categorize various food products into their appropriate categories. Using EfficientNetB0, a transfer learning technique, the developed model classified 101 distinct food kinds with an accuracy of 80%. When compared to other state-of-the-art models, our model performed with the best accuracy.

Journal Article
TL;DR: In this paper, the authors use K-Means clustering and a Support Vector Machine algorithm in MATLAB to detect and distinguish different types of leaf and skin diseases in images.
Abstract: Agricultural production is something on which the economy significantly relies. Leaf diseases in agriculture are a key issue for every nation, as food demand is expanding rapidly due to a rise in population. Skin disorders are commonly seen in animals and humans; they are a particular sort of illness caused by germs or infection. Early and accurate identification and diagnosis of leaf and skin diseases are vital to keeping them from spreading. Image processing techniques, which involve mathematical equations and mathematical transformations, can be used for disease detection. To the human eye, an image is a mixture of RGB colours, and these colours let us extract some features from the image. A modern computer, however, stores an image in a mathematical format, which means the computer sees the image as numbers. After representing the image as a numeric array or matrix, various transforms are performed on it, and these transforms extract specific details from the picture. Before transformation, the image must undergo various operations, such as feature adjustment, which are also carried out mathematically. The project is implemented using K-Means clustering and a Support Vector Machine algorithm in MATLAB, through which different types of leaf and skin diseases can be detected and distinguished.

Journal Article
TL;DR: In this paper, the reliability of the Over Current protection (OCP) scheme in protecting microgrids with inverter-interfaced RES in low voltage distribution networks is evaluated using the PSCAD/EMTDC simulation software.
Abstract: This paper aimed to demonstrate the reliability of the Over Current protection (OCP) scheme in protecting microgrids with inverter interfaced RES for low voltage distribution networks. To prove this reliability, the PSCAD/EMTDC simulation software was used to conduct simulations for the OCP scheme, while comparing throughout grid-connected mode with and without PV generation, as well as in island mode. The computations are carried out using a model of a CIGRE low voltage distribution system. The OCP average relay tripping time for SLG faults through grid mode without PV has been 0.131 s, & 0.121 s for LLL faults. With regards to PV generators, the average relay tripping time increased to 0.199 s & 0.135 s, including both. This is due to the fault current contributed by PV generation inclusion, which restricts the current seen by the predefined OC relays. The findings revealed that some OC relays failed to trip in island mode causing a loss of coordination and a decrease in fault currents. The system was further tested for different generation levels (15%, 57%, and 81%) in island mode and gave a negligible difference in average tripping time for different generation levels.

Journal Article
TL;DR: In this paper, regular tweets are analyzed with a sentiment analysis technique in the Hadoop ecosystem, a distributed environment that processes large and varied datasets.
Abstract: Recently, Twitter has become a well-known public network that acquires a huge number of tweets. Sentiment analysis of Twitter data is tremendously valuable in social media monitoring, as it gives an overview of extensive global opinion on a certain issue. These data are utilized for industrial, government, social, and economic purposes by analyzing the tweets as per the requirements of the user. Processing and storing these data are complicated. Hadoop is a distributed environment that processes large and varied datasets and supports processing components that are collectively called the Hadoop ecosystem. In this paper, regular tweets are analyzed with a sentiment analysis technique in the Hadoop ecosystem. The datasets are taken from the Kaggle data repository. This research was carried out with Apache Pig on Demonetization and Covid-19 Twitter datasets.

Journal Article
TL;DR: In this paper, a waste segregator and decomposer (WSD) model is proposed that focuses on segregating non-biodegradable wastes automatically using AI techniques and on framing an effective degradation strategy for commonly used synthetic plastics using novel microorganisms and associated enzymes.
Abstract: The increasing accumulation of mess up plastic waste in natural environments creates a serious threat to our oceans, human health, flora and fauna. There is an urgent need to develop new approaches towards the disposal of non-biodegradable waste materials like plastics. It is now possible to develop novel biological treatment strategies concerning non-biodegradable waste (plastics) management because of the increasing literatures on the microbial degradation of the synthetic polymers like plastics. The valuable enzyme sources of microbes are capable of degrading synthetic polymers. The proposed waste segregator and decomposer (WSD) model focuses on the segregation of the non-biodegradable wastes automatically using AI techniques and also to frame an effective degradation strategy for commonly used synthetic plastics using novel microorganisms and associated enzymes.

Journal Article
TL;DR: In this paper, logic gates are designed using reversible gates, and some higher-end circuits are also designed, such as Binary-to-Gray, Gray-to-Binary, Adder, and Subtractor.
Abstract: Reversible logic is also called information-lossless logic, since the information embedded in the circuits can be recovered if lost. Research carried out by Landauer and Bennett proved that energy dissipation would not occur if computation is made reversible. With this aim, a number of reversible gates were designed and invented, for example the Fredkin gate, the Toffoli gate, the Peres gate, and the Feynman gate. Reversible logic has extensive applications and is considered one of the futuristic technologies. However, logic circuit design is based on logic gates, which are non-reversible. This paper presents the design of logic gates using reversible gates. These logic gates help in the future implementation of higher-end circuits. In this paper, an attempt is made to design logic gates using reversible gates, and some higher-end circuits are also designed, such as Binary-to-Gray, Gray-to-Binary, Adder, and Subtractor.
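
To make the idea concrete, the sketch below models the four reversible gates named in the abstract as functions on bits and derives ordinary logic from them by fixing some inputs to constants. The gate definitions are the standard ones; which constructions the paper itself uses is not stated in the abstract, so the derived `not_gate`, `and_gate`, `or_gate`, `half_adder`, and `binary_to_gray` are illustrative assumptions.

```python
from itertools import product


def feynman(a, b):
    """Feynman (CNOT) gate: (a, b) -> (a, a XOR b)."""
    return a, a ^ b


def toffoli(a, b, c):
    """Toffoli gate: flips c when both controls a and b are 1."""
    return a, b, c ^ (a & b)


def fredkin(c, a, b):
    """Fredkin gate: swaps a and b when the control c is 1."""
    return (c, b, a) if c else (c, a, b)


def peres(a, b, c):
    """Peres gate: (a, b, c) -> (a, a XOR b, (a AND b) XOR c)."""
    return a, a ^ b, (a & b) ^ c


def not_gate(a):
    return feynman(1, a)[1]        # control fixed to 1


def and_gate(a, b):
    return toffoli(a, b, 0)[2]     # target fixed to 0


def or_gate(a, b):
    return fredkin(a, b, 1)[1]     # second output is a OR b


def half_adder(a, b):
    _, total, carry = peres(a, b, 0)
    return total, carry


def binary_to_gray(bits):
    """Bits are MSB first; each Gray bit is the XOR of adjacent binary bits."""
    return [bits[0]] + [feynman(bits[i - 1], bits[i])[1] for i in range(1, len(bits))]


# Each gate is a bijection on its inputs, which is what makes it reversible.
toffoli_outputs = {toffoli(*bits) for bits in product((0, 1), repeat=3)}
```

The constant inputs and the unused outputs in these constructions are the ancilla and garbage lines that reversible designs typically try to minimise.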

Journal Article
TL;DR: In this article, various black-box models are compared using different performance metrics, explanations of these models are provided using a model-agnostic explainer, and the best model-explainer combination is proposed along with potential areas of future exploration.
Abstract: Machine learning is fast becoming one of the central solutions to various real-world problems. Thanks to powerful hardware and large datasets, training a machine learning model has become easier and more rewarding. However, an inherent problem in various machine learning models is a lack of understanding of what goes on ’under the hood’. A lack of explainability and interpretability leads to lower levels of trust in the model's predictions, which means it can't be used in sensitive applications like diagnosing medical ailments and detecting terrorism. This has led to various advances in making machine learning explainable. In this paper various black-box models are used to classify credit card defaulters. These models are compared using different performance metrics, and explanations of these models are provided using a model-agnostic explainer. Finally, the best model-explainer combo is proposed with potential areas of future exploration.

Journal Article
TL;DR: In this paper, the authors discuss wireless optical fiber communication networks and illustrate the many difficulties that optical fiber installation and processing face.
Abstract: The broad spectrum of optical wireless communication meets the needs of high-speed wireless communication, which is optical wireless communication's primary advantage over traditional wireless communication technologies. Optical fiber communications, as significant use of laser technology, are vital facilitators for the contemporary information era. With the rise of new technologies such as the Internet of Things, big data, cloud computing, virtual reality, and artificial intelligence, there is an increasing need in society for high-capacity data transmission, raising the bar for optical fiber communication technology. Many new technologies are coming our way, which has made our lives a lot simpler. But now that this new technology has arrived, we've run out of patience. To do whatever in the shortest possible period. Furthermore, in today's fast-paced society, sluggish walkers are quickly left behind while the rest of the world keeps moving forward. Many innovative methods for speeding up and simplifying our work have been identified. With optical fiber technology, our scientists have achieved a breakthrough, allowing us to go from one place to another in a matter of seconds. Wireless optical fiber communication networks are discussed in this research. This study also illustrates the many difficulties that optical fiber installation and processing face.

Journal Article
TL;DR: An Arduino UNO-based solar-powered Grasscutter is designed to cut healthy grass in places like parks, hotels, and public places. The Grasscutter is designed with IoT (Internet of Things) technology and is controlled remotely through the Blynk application, supported by a Bluetooth module.
Abstract: An Arduino UNO-based solar-powered Grasscutter is designed to cut healthy grass in places like parks, hotels, and public places. The Grasscutter is designed with IoT (Internet of Things) technology and is controlled remotely through the Blynk application, supported by a Bluetooth module. The proposed model consists of hardware components such as an Arduino UNO, a solar panel, a DC motor, a motor driver, rechargeable batteries, and a Bluetooth module. The designed model is programmed through the Arduino IDE to control the operation of the Grasscutter. The control mechanism covers forward, backward, right, and left movement, the on and off mechanisms, and a stop function for the Grasscutter prototype. An ultrasonic sensor connected to the head of the model prevents the system from colliding with obstacles while in movement.

Journal Article
TL;DR: In this paper, a computer automation system is proposed to detect diseases of citrus leaves using machine learning and deep learning techniques; the KNN classifier achieves an accuracy of 89.9% for K = 3, and the DNN achieves 99.89% with an error of 0.0219.
Abstract: The citrus family provides healthy fruits to humans. The quality and quantity of the citrus fruits depend on the quality of the citrus leaves. Due to the diseases of citrus leaves, the quality and productivity of citrus fruit are degraded. This paper provides a computer automation system to detect the diseases of the citrus leaves using machine learning and deep learning techniques. In this paper, images of citrus leaves are captured using an Android Smartphone in natural environmental light. Three different classes of citrus leaves are collected using the Smartphone and destructive method. These are citrus healthy, citrus greening, and citrus CTV (Citrus Tristeza virus). Citrus images are processed for image resizing, image noise removal, image enhancement, and feature extraction. Features of the images are calculated using Gray Level Co-occurrence Matrix in the color and gray domain of the images. Finally, the detection and classification are done using K- Nearest Neighbor (KNN) and Deep Neural Network (DNN) classifier. The accuracy of the KNN classifier is 89.9% for the K value 3 whereas the accuracy of DNN is 99.89% with an error of 0.0219. The proposed system helps the farmer to detect diseases at an early stage.
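
The Gray Level Co-occurrence Matrix feature step named in the abstract can be sketched as follows. The example counts how often a pixel of level i has a right-hand neighbour of level j and then derives the common contrast feature; the offset, the two-level toy image, and the choice of contrast are assumptions for illustration, since the abstract does not list the offsets or features the authors computed.

```python
def glcm(img, levels, dx=1, dy=0):
    """Count co-occurrences of gray levels at the pixel offset (dx, dy)."""
    matrix = [[0] * levels for _ in range(levels)]
    height, width = len(img), len(img[0])
    for y in range(height):
        for x in range(width):
            ny, nx = y + dy, x + dx
            if 0 <= ny < height and 0 <= nx < width:
                matrix[img[y][x]][img[ny][nx]] += 1
    return matrix


def contrast(matrix):
    """Contrast = sum over (i, j) of (i - j)^2 * p(i, j), with p normalised."""
    total = sum(sum(row) for row in matrix)
    return sum(
        (i - j) ** 2 * count
        for i, row in enumerate(matrix)
        for j, count in enumerate(row)
    ) / total


# Toy 2-level image; real leaf images would first be quantised to a few levels.
patch = [[0, 0, 1],
         [1, 1, 0]]
cooccurrence = glcm(patch, levels=2)
patch_contrast = contrast(cooccurrence)
```

Features such as this one, computed per image, form the vectors that a classifier like KNN then compares.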

Journal Article
TL;DR: In this article, a word embedding technique based on Word2Vec is proposed for analyzing and processing macro code written in Visual Basic, to understand and detect the attack vector before the documents are opened.
Abstract: Macro-based malware attacks are on the rise in recent cyber-attacks, using malicious code written in Visual Basic that can be used to target computers for various exploits. Macro malware can be obfuscated using various tools and easily evade antivirus software. To detect this macro malware, several machine learning methods have been proposed, but with inadequate datasets of benign and malicious macro code; they are not reproducible and are evaluated on unbalanced datasets. In this paper, a word embedding technique based on Word2Vec is proposed for code analysis, to analyze and process macro code written in Visual Basic and to understand and detect the attack vector before the documents are opened. The proposed technique, called Obfuscated-Word2vec, detects obfuscated keywords and obfuscated function names in the macro code and classifies them as obfuscated or benign function calls. These are later used as feature vectors to train models to extract the most relevant features from macro code and to help the classifiers detect malware more accurately as downloader, dropper malware, shellcode, PowerShell exploits, etc. Experimental results show that the proposed method is reproducible and could detect completely new macro malware by analyzing the macro code with a Random Forest classifier with 82.65 percent accuracy.

Journal Article
TL;DR: This paper proposes a model named Term Frequency-Inverse Document Frequency (TF-IDF) Summarization Tool, which implements a text analytics approach to generate a meaningful summary; TF-IDF is used to identify the topic or context of the text statistically.
Abstract: Text summarization is an important Natural Language Processing problem. Manual text summarization is a laborious and time-consuming task. Owing to the advancements in the field of Natural Language Processing, this task can be effectively moved from manual to automated text summarization. This paper proposes a model named Term Frequency-Inverse Document Frequency (TF-IDF) Summarization Tool, which implements a text analytics approach called TF-IDF to generate a meaningful summary. TF-IDF is used to identify the topic or context of the text statistically. As data today is mostly unstructured in nature, this paper aims to explore a combination of NLP techniques such as Speech Recognition and Optical Character Recognition to summarize multimedia data as well. The TF-IDF Summarization Tool is seen to produce summaries with a Jaccard similarity value of 67%, ROUGE-1 of 64.9%, ROUGE-2 of 48.2%, and ROUGE-L of 56.4% based on a self-developed dataset.
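
As an illustration of the Jaccard similarity score reported in the abstract, the sketch below compares a generated summary with a reference summary as sets of words. Lowercasing and whitespace tokenization are assumptions made for the example; the paper's tokenization and its dataset are not described in the abstract, and the sentences used here are invented.

```python
def jaccard_similarity(summary, reference):
    """|A intersect B| / |A union B| over the two texts' word sets."""
    a = set(summary.lower().split())
    b = set(reference.lower().split())
    if not a and not b:
        return 1.0  # two empty texts are treated as identical
    return len(a & b) / len(a | b)


# Hypothetical summary pair, not taken from the paper's dataset.
generated = "the cat sat"
gold = "the cat ran"
score = jaccard_similarity(generated, gold)
```

Because the measure works on sets, it ignores word order and repetition, which is one reason it is usually reported alongside n-gram metrics such as ROUGE.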

Journal ArticleDOI
TL;DR: In this article, the authors proposed a method using simulated annealing and partial least squares regression for gene selection from six open-source microarray cancer gene-expression datasets; the selected genes were used to fit support-vector machine, random-forest, voting, and multilayer-perceptron classifiers.
Abstract: Accurate characterization of the molecular nature of a tumour is important for its effective treatment. Therefore, the classification of tumours is an important research problem. The application of data science and machine learning techniques to gene-expression data has enabled computational researchers to separate gene-expression samples into different classes based on differences in gene-expression patterns. This has also facilitated the discovery of new classes and new disease biomarkers. However, gene-expression data is very high-dimensional and noisy. The number of features is high in comparison to the number of samples. The classes in the data are often imbalanced. Out of thousands of genes, only a few are relevant to the disease. Machine learning approaches for the classification of gene-expression samples need to address all these issues to obtain reliable performance. This paper proposes a method using simulated annealing and partial least squares regression for gene selection from six open-source microarray cancer gene-expression datasets. The selected subset of genes was used to fit support-vector machine, random-forest, voting, and multilayer-perceptron classifiers. A comparison with existing methods shows the superior performance of the proposed method.
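
To make the search strategy concrete, here is a hedged sketch of simulated annealing over feature subsets. It is not the paper's algorithm: the abstract does not say how partial least squares regression enters the search, so the subset quality is supplied by a caller-defined `score` function (a hypothetical placeholder that, in practice, might wrap a cross-validated model). The name `anneal_feature_subset`, the single-bit-flip move, and the geometric cooling schedule are all assumptions.

```python
import math
import random


def anneal_feature_subset(n_features, score, n_iter=500,
                          t_start=1.0, t_end=0.01, seed=0):
    """Search boolean feature masks for a high value of score(mask)."""
    rng = random.Random(seed)
    # Start from a random subset of features.
    current = [rng.random() < 0.5 for _ in range(n_features)]
    current_score = score(current)
    best, best_score = current[:], current_score
    for step in range(n_iter):
        # Geometric cooling from t_start down to t_end.
        temp = t_start * (t_end / t_start) ** (step / max(1, n_iter - 1))
        # Neighbour move: include or exclude one randomly chosen feature.
        candidate = current[:]
        flip = rng.randrange(n_features)
        candidate[flip] = not candidate[flip]
        cand_score = score(candidate)
        delta = cand_score - current_score
        # Always accept improvements; accept worse moves with
        # probability exp(delta / temp), which shrinks as temp falls.
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, current_score = candidate, cand_score
            if current_score > best_score:
                best, best_score = current[:], current_score
    return best, best_score
```

With thousands of genes and few samples, a realistic `score` would need to penalise large subsets and be estimated by cross-validation to avoid selecting noise; the toy score used for testing here does neither.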


Journal ArticleDOI
TL;DR: In this paper, the authors analyzed the response of MMC-HVDC under different DC and AC fault conditions for five-level MMC-HVDC systems, to better understand the systems under fault.
Abstract: The development of HVDC transmission technology using an MMC has been promoted because it overcomes the drawbacks of traditional VSC technology. The extension of flexible transmission to overhead lines, especially the use of MMC-based HVDC transmission, raises the issue of DC faults. Identifying the fault, clearing the DC fault, and designing a fast-acting protection system therefore become significant. This article provides insights into the monopolar MMC structure and its operational characteristics, fault analysis, and a fault protection scheme. DC line faults on HVDC lines using MMC-VSC are a major issue, and isolating the complete system is not a viable option. The pole-to-ground fault is observed to be the most common fault; it leads to substantial overcurrent in the AC grid and damages the converter valves. This article analyzes the response of MMC-HVDC under different DC and AC fault conditions for five-level MMC-HVDC systems, to better understand the systems under fault. Faults also affect the performance of the converter stations, and the voltages fluctuate under fault conditions. The impact is greater at the rectifier station than at the inverter station. Simulation is performed in PSCAD software. The correctness and effectiveness of the DC and AC fault analysis help to check the capability of locating faults on HVDC transmission lines quickly and accurately.

Journal ArticleDOI
TL;DR: In this paper, the top applicants are rated using content-based recommendation, which uses cosine similarity to find the curricula vitae (CVs) most similar to the supplied job description; the KNN algorithm is used to pick and rank large numbers of CVs based on job descriptions.
Abstract: Finding suitable applicants for a vacant job can be a difficult process, especially when there are many candidates. Manual screening of resumes can hinder the team's efforts to find the right individual at the right moment. An automated technique for screening and ranking applicants can greatly ease this laborious task. In our work, the top applicants are rated using content-based recommendation, which uses cosine similarity to find the curricula vitae (CVs) most similar to the supplied job description, and the KNN algorithm is used to pick and rank large numbers of CVs based on job descriptions. Experimental results show an average text parsing accuracy of 85% and a ranking accuracy of 92% for the proposed system.
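
As a rough illustration of the ranking idea, the sketch below scores each CV by the cosine similarity between its term-count vector and that of the job description, then returns the k most similar CVs. The abstract gives no implementation details, so the bag-of-words representation, the tokenizer, and the names `term_counts`, `cosine_similarity`, and `rank_cvs` are assumptions; returning the k nearest CVs is only one possible reading of the KNN step.

```python
import math
import re
from collections import Counter


def term_counts(text):
    # Bag-of-words vector; '+' and '#' are kept so "c++" and "c#" survive.
    return Counter(re.findall(r"[a-z0-9+#]+", text.lower()))


def cosine_similarity(a, b):
    # Dot product over shared terms divided by the product of vector norms.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_cvs(job_description, cvs, k=3):
    """Return the k (name, similarity) pairs closest to the job description."""
    job_vec = term_counts(job_description)
    scored = [(name, cosine_similarity(job_vec, term_counts(text)))
              for name, text in cvs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

For example, `rank_cvs("python machine learning engineer", {"a": "python machine learning", "b": "java web developer"}, k=1)` ranks CV `"a"` first because it shares three terms with the job description while `"b"` shares none.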

Journal ArticleDOI
TL;DR: In this paper, an extensive literature review was conducted to identify risk factors that may affect cloud computing adoption; after the risk factors were identified, linear regression algorithms were used to select the most effective features.
Abstract: Cloud computing is becoming a major backbone of today's competitive and emerging market, and corporations therefore make use of cloud computing services. Novel tools and protocols are always in demand to improve the security initiatives of cloud computing services or CRPs. To build a comprehensive risk assessment methodology, an extensive literature review was conducted to identify risk factors that may affect cloud computing adoption, and various risk factors were identified in this context. After the risk factors were identified, linear regression algorithms were used to select the most effective features. AI-ML techniques such as Decision Tree (DTC), Randomizable Filter Classifier, and k-star, evaluated with the RMSE method, were then used to analyse threats within the cloud computing (CC) environment. Experimental results showed that a 95%-5% split of the dataset gave the best result among all the partitionings tested, and that the DTC algorithm gave the best results across the experimental setups.

Journal ArticleDOI
TL;DR: In this paper, the Trusted Anonymous Lightweight Attacker Detection (TALAD) scheme is presented to identify and isolate hostile sensor nodes in a cloud-assisted WSN-IoT system.
Abstract: A Wireless Sensor Network (WSN) is a network system made up of wireless sensor node devices that are spread at random. WSNs are a critical paradigm for the evolution of the Internet of Things (IoT). Strong security measures must be taken to protect the network from security threats and malicious attacks in order to make it more efficient. To identify and isolate hostile sensor nodes in a cloud-assisted WSN-IoT system, the Trusted Anonymous Lightweight Attacker Detection (TALAD) scheme is presented. The TALAD strategy creates a routing path to the cloud through highly trusted nodes, subject to a desired path length limit. Using the binomial algebraic theorem, the node identities are replaced with bogus identities, and the original identity is hidden from the other nodes in the network. The nodes' original identities are exposed only if the forward key and the reverse key string match. The forward and reverse keys are mapped using a context-free grammar rule. According to the simulation results, TALAD successfully avoids intrusions even when a major portion of the network fails to forward packets.

Journal ArticleDOI
TL;DR: In this article, a comparative study analyzes the anomaly of class imbalance and ways to implement its solutions; the effectiveness of the algorithms varies with the dataset and the instance in which they are used.
Abstract: In today's world, many processes are carried out over the Internet to make our lives easier. On the other hand, many unauthorized and illegitimate activities that take place over it cause major trouble for the growth of the economy. One of them is fraud, which misleads people and leads to financial losses. Major frauds reported recently occur through malicious techniques targeting credit cards used for financial transactions on online platforms. Hence, it is the need of the hour to investigate this problem. Several companies have started studying it and have formulated data-driven models that apply various Machine Learning algorithms to datasets to analyse fraudulent activity. The techniques used include Support Vector Machine, Gradient Boost, Random Forest, and their combinations. In this comparative study, the anomaly of class imbalance and ways to implement its solutions are analysed to prove certain results. The effectiveness of the algorithms varies with the dataset and the instance in which they are used. The results show that all algorithms, despite all the calculations, exhibit a certain imbalance at some point in the study. The limitations have also been evaluated and highlighted to help future work. In this study, it is found that although logistic regression had higher accuracy, the plotted learning curves indicated that the majority of the algorithms underfit, while only KNN had the ability to learn. Hence, KNN is the better classifier for credit card fraud detection.
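
The class-imbalance point can be illustrated with a small sketch. It shows why accuracy alone is misleading when fraud cases are rare, and one common remedy, random oversampling of the minority class. The abstract does not say which balancing technique the study used, so oversampling here is an assumption chosen for illustration, as are the function names and the binary 0/1 label convention (1 = fraud).

```python
import random


def accuracy_and_recall(y_true, y_pred, positive=1):
    # Accuracy counts all correct predictions; recall counts only
    # how many of the actual positives (frauds) were caught.
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    true_pos = sum(t == positive and p == positive
                   for t, p in zip(y_true, y_pred))
    actual_pos = sum(t == positive for t in y_true)
    accuracy = correct / len(y_true)
    recall = true_pos / actual_pos if actual_pos else 0.0
    return accuracy, recall


def random_oversample(rows, labels, minority=1, seed=0):
    # Duplicate randomly chosen minority rows until both classes have
    # the same count (binary labels assumed).
    rng = random.Random(seed)
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    if not minority_idx:
        return list(rows), list(labels)
    majority_count = len(labels) - len(minority_idx)
    n_extra = max(0, majority_count - len(minority_idx))
    extra = [rng.choice(minority_idx) for _ in range(n_extra)]
    idx = list(range(len(labels))) + extra
    return [rows[i] for i in idx], [labels[i] for i in idx]
```

On a dataset with 98 legitimate and 2 fraudulent transactions, a model that always predicts "legitimate" reaches 98% accuracy with 0% recall, which is why imbalance-aware metrics and resampling matter. Oversampling should be applied to the training split only, so that duplicated rows do not leak into the evaluation data.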