Author

Noha Shehab

Bio: Noha Shehab is an academic researcher from Mansoura University. The author has contributed to research in the topics of big data and the k-nearest neighbors algorithm. The author has an h-index of 1 and has co-authored 3 publications receiving 3 citations. Previous affiliations of Noha Shehab include the Information Technology Institute.

Papers
Book ChapterDOI
01 Jan 2021
TL;DR: In this article, the authors discuss the importance of preprocessing big data in terms of analysis time, percentage of utilized resources, storage, efficiency of the analyzed data, and the information gained from the output.
Abstract: Big data is a trending term in industry and academia that represents the huge flood of collected data, which is very complex in nature. As a term, big data describes many data-related concepts with both technological and cultural meanings. In the big data community, big data analytics is used to discover the hidden patterns and values that give an accurate representation of the data. Big data preprocessing is considered an important step in the analysis process. It is key to the success of the analysis in terms of analysis time, percentage of utilized resources, storage, the efficiency of the analyzed data, and the information gained from the output. Preprocessing data involves dealing with concepts such as concept drift and data streams, which are considered significant challenges.

7 citations

Journal ArticleDOI
TL;DR: This study introduces a novel hybrid, cloud-based feature selection model for imbalanced data built on the k-nearest neighbor algorithm; it shows good performance in both time usage and feature weights compared with the weighted nearest neighbor.
Abstract: Recently, big data has attracted wide attention in many fields such as machine learning, pattern recognition, medicine, finance, and transportation. Data analysis is crucial for converting data into more specific information fed to decision-making systems. With diverse and complex types of datasets, knowledge discovery becomes more difficult. One solution is feature subset selection preprocessing, which reduces this complexity so that computation and analysis become convenient. Preprocessing produces a reliable and suitable source for any data-mining algorithm. Effective feature selection can improve a model’s performance and help us understand the characteristics and underlying structure of complex data. This study introduces a novel hybrid, cloud-based feature selection model for imbalanced data based on the k-nearest neighbor algorithm. The proposed model showed good performance compared with the simple weighted nearest neighbor. It combines the firefly distance metric with the Euclidean distance used in the k-nearest neighbor. The experimental results showed good insights in both time usage and feature weights compared with the weighted nearest neighbor, along with a 12% improvement in classification accuracy over the weighted nearest neighbor algorithm. Using the cloud-distributed model reduced the processing time by up to 30%, which is considered substantial compared with recent state-of-the-art methods.
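The abstract's core idea can be illustrated with a minimal weighted k-nearest-neighbor sketch. This is not the paper's model: the `weights` vector is a hypothetical stand-in for the feature weights a hybrid selection step would learn, and the firefly-distance component of the paper's combined metric is not reproduced here; only the weighted Euclidean part is shown.

```python
import math
from collections import Counter

def weighted_knn_predict(X_train, y_train, x, k=3, weights=None):
    """Classify x by majority vote among its k nearest training samples.

    `weights` is a hypothetical per-feature weight vector standing in for
    the importances a feature selection step would produce; uniform
    weights reduce this to plain Euclidean kNN.
    """
    w = weights or [1.0] * len(x)

    # Weighted Euclidean distance: a feature with weight 0 is effectively dropped.
    def dist(a):
        return math.sqrt(sum(wi * (ai - bi) ** 2 for wi, ai, bi in zip(w, a, x)))

    neighbours = sorted(zip(X_train, y_train), key=lambda p: dist(p[0]))[:k]
    return Counter(label for _, label in neighbours).most_common(1)[0][0]

X = [[0, 0], [0, 1], [5, 5], [6, 5]]
y = [0, 0, 1, 1]
print(weighted_knn_predict(X, y, [0.2, 0.5]))  # votes among the three closest points
```

Re-weighting the metric this way is what makes feature selection and kNN interact: shrinking a feature's weight toward zero removes its influence on every distance computation at once.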

4 citations

Book ChapterDOI
26 Oct 2019
TL;DR: The main objective of this paper is to survey the most recent research challenges for big data analysis and preprocessing.
Abstract: The rapid increase in Internet use has led to the presence of huge amounts of data. Traditional data technologies, techniques, and even applications cannot cope with the new data’s volume, structure, and variety of types. Big data concepts have emerged to assimilate this non-stop flood. The big data analysis process is used to extract the useful data and exclude the rest, which provides better results with minimum resource utilization, time, and cost. Feature selection is a traditional dimensionality reduction technique, and big data analytics provides modern technologies and frameworks with which feature selection can be integrated, both improving the performance of feature selection itself and helping to preprocess big data. The main objective of this paper is to survey the most recent research challenges for big data analysis and preprocessing. The analysis is carried out by acquiring data from sources, storing it, filtering it to pick out the useful records and dismiss the unwanted ones, and then extracting information. Before being analyzed, data needs preparation to remove noise, fix incomplete records, and put it into a suitable form. This is done in the preprocessing step by various models such as data reduction, cleaning, normalization, preparation, integration, and transformation.
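The cleaning and normalization steps the survey lists can be sketched in a few lines. This is a simplified illustration under assumed conventions (missing values encoded as `None`, min-max scaling to [0, 1]), not a method from the paper itself.

```python
def preprocess(rows):
    """Toy preprocessing pipeline: impute missing values (None) with the
    column mean, then min-max normalise each column to [0, 1].
    """
    cleaned = []
    for col in zip(*rows):
        present = [v for v in col if v is not None]
        mean = sum(present) / len(present)
        col = [mean if v is None else v for v in col]   # cleaning step
        lo, hi = min(col), max(col)
        span = (hi - lo) or 1.0                         # guard constant columns
        cleaned.append([(v - lo) / span for v in col])  # normalization step
    return [list(r) for r in zip(*cleaned)]

print(preprocess([[1, None], [3, 4], [None, 8]]))
```

Each column is handled independently, which is why steps like these parallelize well across the distributed frameworks the survey discusses.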

Cited by
Journal ArticleDOI
18 Oct 2021
TL;DR: This study focuses on the design of metaheuristic optimization based on big data classification in a MapReduce (MOBDC-MR) environment and demonstrates promising performance over the other existing techniques under different dimensions.
Abstract: Big data techniques are highly effective for systematically extracting and analyzing massive data and can manage data more proficiently than conventional data-handling approaches. Recently, several schemes have been developed for handling big datasets with many features. At the same time, feature selection (FS) methodologies aim to eliminate repetitive, noisy, and unwanted features that degrade classifier results. Since conventional methods have failed to attain scalability on massive data, the design of new big data classification models is essential. In this respect, this study focuses on the design of metaheuristic optimization based big data classification in a MapReduce (MOBDC-MR) environment. The MOBDC-MR technique aims to choose optimal features and effectively classify big data. In addition, the MOBDC-MR technique involves the design of a binary pigeon optimization algorithm (BPOA)-based FS technique to reduce complexity and increase accuracy. A beetle antenna search (BAS) with long short-term memory (LSTM) model is employed for big data classification. The presented MOBDC-MR technique has been realized on Hadoop with the MapReduce programming model. Its effective performance was validated using a benchmark dataset, and the results were investigated under several measures. The MOBDC-MR technique demonstrated promising performance over the other existing techniques under different dimensions.
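The MapReduce pattern underlying this kind of scalable feature selection can be sketched without Hadoop. The example below is an assumption-laden stand-in: it scores features by variance rather than by the paper's BPOA metaheuristic, but it shows the map (per-partition partial statistics) and reduce (global aggregation) structure that makes the computation distributable.

```python
from functools import reduce

def map_partition(rows):
    """Map step: per-feature partial sums (count, sum, sum of squares)."""
    n_feat = len(rows[0])
    s = [sum(r[j] for r in rows) for j in range(n_feat)]
    ss = [sum(r[j] ** 2 for r in rows) for j in range(n_feat)]
    return len(rows), s, ss

def reduce_stats(a, b):
    """Reduce step: merge two partial-statistics triples."""
    na, sa, ssa = a
    nb, sb, ssb = b
    return na + nb, [x + y for x, y in zip(sa, sb)], [x + y for x, y in zip(ssa, ssb)]

def select_by_variance(partitions, keep=2):
    """Keep the indices of the `keep` highest-variance features."""
    n, s, ss = reduce(reduce_stats, map(map_partition, partitions))
    var = [ssj / n - (sj / n) ** 2 for sj, ssj in zip(s, ss)]
    return sorted(sorted(range(len(var)), key=lambda j: var[j], reverse=True)[:keep])
```

Because the map output is a fixed-size triple per partition, the reduce step is associative and can run on any MapReduce-style framework; swapping the variance score for a metaheuristic objective changes the scoring, not the distribution pattern.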

18 citations

Journal ArticleDOI
26 Feb 2022-Sensors
TL;DR: A new comprehensive framework to precisely differentiate between malignant and benign prostate cancer is introduced and the developed diagnostic system provided consistent diagnostic performance using 10-fold and 5-fold cross-validation approaches, which confirms the reliability, generalization ability, and robustness of the developed system.
Abstract: Prostate cancer, which is also known as prostatic adenocarcinoma, is an unconstrained growth of epithelial cells in the prostate and has become one of the leading causes of cancer-related death worldwide. The survival of patients with prostate cancer relies on detection at an early, treatable stage. In this paper, we introduce a new comprehensive framework to precisely differentiate between malignant and benign prostate cancer. This framework proposes a noninvasive computer-aided diagnosis system that integrates two imaging modalities of MR (diffusion-weighted (DW) and T2-weighted (T2W)). For the first time, it utilizes the combination of functional features represented by apparent diffusion coefficient (ADC) maps estimated from DW-MRI for the whole prostate in combination with texture features with its first- and second-order representations, extracted from T2W-MRIs of the whole prostate, and shape features represented by spherical harmonics constructed for the lesion inside the prostate and integrated with PSA screening results. The dataset presented in the paper includes 80 biopsy confirmed patients, with a mean age of 65.7 years (43 benign prostatic hyperplasia, 37 prostatic carcinomas). Experiments were conducted using different well-known machine learning approaches including support vector machines (SVM), random forests (RF), decision trees (DT), and linear discriminant analysis (LDA) classification models to study the impact of different feature sets that lead to better identification of prostatic adenocarcinoma. 
Using a leave-one-out cross-validation approach, the diagnostic results obtained using the SVM classification model with the combined feature set after feature selection (88.75% accuracy, 81.08% sensitivity, 95.35% specificity, and 0.8821 AUC) indicated that the system, after integrating and reducing the different feature sets, achieved enhanced diagnostic performance compared with each individual feature set and with other machine learning classifiers. In addition, the developed diagnostic system provided consistent diagnostic performance under 10-fold and 5-fold cross-validation, which confirms its reliability, generalization ability, and robustness.
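Leave-one-out cross-validation, the evaluation protocol used above, is simple to sketch. The 1-NN classifier below is a hypothetical stand-in for the paper's SVM; any classifier with the same signature would slot in.

```python
def loocv_accuracy(X, y, predict):
    """Leave-one-out CV: train on all samples but one, test on the held-out one,
    and report the fraction of correct predictions over all n folds."""
    hits = 0
    for i in range(len(X)):
        X_tr = X[:i] + X[i + 1:]
        y_tr = y[:i] + y[i + 1:]
        hits += predict(X_tr, y_tr, X[i]) == y[i]
    return hits / len(X)

def nn_predict(X_tr, y_tr, x):
    # 1-nearest-neighbour stand-in for the paper's SVM classifier.
    d = [sum((a - b) ** 2 for a, b in zip(row, x)) for row in X_tr]
    return y_tr[d.index(min(d))]

print(loocv_accuracy([[0], [1], [10], [11]], [0, 0, 1, 1], nn_predict))
```

With only 80 patients, as in this study, LOOCV is attractive because every fold trains on 79 of the 80 samples, wasting almost no data, at the cost of n model fits.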

10 citations

Journal ArticleDOI
TL;DR: In this article, Liu et al. proposed a new concept suitable for data-driven robust optimization and designed two new methods for constructing data-driven uncertainty sets, using partial least squares (PLS) or kernel principal component analysis (KPCA) to capture the underlying uncertainties and correlations of uncertain data; the projection of the uncertain data on each principal component is then obtained.

7 citations

Journal ArticleDOI
TL;DR: In this article, Wang et al. developed a novel integrated battery data cleaning framework to systematically solve data quality problems in cloud-based vehicle battery monitoring and management, which can further boost the practical application of vehicle big data platforms and the Internet of Vehicles.

2 citations

Journal ArticleDOI
TL;DR: In this article, the authors incorporated local spatial information into the fuzzy k-plane clustering method to handle the noise present in the image and showed that the proposed FkPC_S method is superior to 10 related methods in the presence of noise.
Abstract: Human brain MRI images are complex, and the matter present in the brain exhibits non-spherical shapes. There exists uncertainty in the overlapping structure of brain tissue, i.e., a lack of distinctness in the class definition. Soft clustering methods can efficiently handle this uncertainty, and plane-based clustering methods are found to be more efficient for non-spherical data. The fuzzy k-plane clustering (FkPC) method is a soft, plane-based clustering algorithm that can handle the uncertainty in medical images, but its performance degrades in the presence of noise. In this research work, we incorporated local spatial information into the FkPC clustering method to handle the noise present in the image. The spatial regularization term included in the proposed FkPC_S method refines the membership value of a noisy pixel with the help of its immediate neighbouring pixels' information. To show the effectiveness of the proposed FkPC_S method, extensive experiments were performed on one synthetic image and two publicly available human brain MRI datasets. The performance of the proposed method was compared with 10 related methods in terms of average segmentation accuracy and Dice score. The experimental results show that the proposed FkPC_S method is superior to the 10 related methods in the presence of noise. A statistically significant difference and the superior performance of the proposed method in comparison with the other methods were also established using the Friedman test.
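The spatial-regularization idea, refining a pixel's cluster memberships using its immediate neighbours, can be sketched as a membership-smoothing pass. This is an illustrative simplification, not the FkPC_S update rule: the mixing weight `alpha` is a hypothetical parameter, and here the smoothing runs as a post-processing step rather than inside the clustering iterations.

```python
def smooth_memberships(U, alpha=0.6):
    """Mix each pixel's membership vector with the mean of its 4-neighbourhood.

    U is an H x W grid of membership vectors (one per pixel, one entry per
    cluster).  A noisy pixel surrounded by agreeing neighbours is pulled
    toward their consensus; `alpha` controls how strongly.
    """
    H, W, K = len(U), len(U[0]), len(U[0][0])
    out = [[None] * W for _ in range(H)]
    for i in range(H):
        for j in range(W):
            nbrs = [U[a][b] for a, b in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
                    if 0 <= a < H and 0 <= b < W]
            mean = [sum(v[k] for v in nbrs) / len(nbrs) for k in range(K)]
            mixed = [(1 - alpha) * U[i][j][k] + alpha * mean[k] for k in range(K)]
            total = sum(mixed)
            out[i][j] = [m / total for m in mixed]  # renormalise to sum to 1
    return out
```

On a patch where one pixel disagrees with all of its neighbours, a single pass is enough to flip its dominant membership back to the consensus cluster, which is exactly the noise-robustness the abstract claims for the regularized method.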

1 citation