Author

Nur'Aini Abdul Rashid

Bio: Nur'Aini Abdul Rashid is an academic researcher from Universiti Sains Malaysia. The author has contributed to research in topics: Parallel algorithm & Search algorithm. The author has an h-index of 11 and has co-authored 59 publications receiving 932 citations. Previous affiliations of Nur'Aini Abdul Rashid include Princess Nora bint Abdul Rahman University.


Papers
Journal Article
TL;DR: An overview of the data mining techniques that have been used to predict students' performance, and of how prediction algorithms can be used to identify the most important attributes in students' data.

558 citations

Journal Article
TL;DR: This review consolidates the reviewed papers in line with the disciplines, models, tasks and methods involved in data mining, in terms of methods, algorithms and results.

209 citations

Journal Article
TL;DR: This research embeds particle swarm optimization as a feature selection step into three renowned classifiers, namely, naive Bayes, K-nearest neighbor, and fast decision tree learner, with the objective of increasing the accuracy of the prediction model.
Abstract: Women who have recovered from breast cancer (BC) always fear its recurrence. The fact that they have endured the painstaking treatment makes recurrence their greatest fear. However, with current advancements in technology, early recurrence prediction can help patients receive treatment earlier. The availability of extensive data and advanced methods makes accurate and fast prediction possible. This research aims to compare the accuracy of a few existing data mining algorithms in predicting BC recurrence. It embeds particle swarm optimization as a feature selection step into three renowned classifiers, namely, naive Bayes, K-nearest neighbor, and fast decision tree learner, with the objective of increasing the accuracy of the prediction model.

130 citations

Journal Article
TL;DR: This research demonstrates that MSeeker improves the sensitivity and specificity of existing RNA pseudoknot structure predictions, and that it has better sensitivity than the DotKnot, FlexStem, HotKnots, pknotsRG, ILM, NUPACK and pknotsRE methods.
Abstract: The secondary structure of RNA pseudoknots has been extensively inferred and scrutinized by computational approaches. Experimental methods for determining RNA structure are time consuming and tedio...

25 citations

Proceedings Article
12 Jun 2012
TL;DR: A framework of a hybrid approach using Naïve Bayes for prediction and Genetic Algorithm for parameter optimization to solve the childhood obesity prediction problem that has a small ratio of negative samples compared to the positive samples.
Abstract: Naive Bayes is a data mining technique that has been used by many researchers for predictions in various domains. This paper presents a framework of a hybrid approach using Naive Bayes for prediction and Genetic Algorithm for parameter optimization. This framework is a solution applied to the childhood obesity prediction problem that has a small ratio of negative samples compared to the positive samples. The Naive Bayes has shown a weakness in prediction involving a zero value parameter. Therefore, in this paper we propose a solution for this weakness which is using Genetic Algorithm optimization. The study begins with a literature review of the childhood obesity problem and suitable data mining techniques for childhood obesity prediction. As a result of the review, 19 parameters were selected and the Naive Bayes technique was implemented for childhood obesity prediction. The initial experiment to identify the usability of the proposed approach has indicated a 75% improvement in accuracy.

22 citations
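
The zero-value weakness mentioned in the abstract above can be illustrated with a minimal sketch. The counts, the prior, and the function names are hypothetical and chosen only for illustration. The add-alpha (Laplace) estimate shown here is a common textbook remedy; it is not the Genetic Algorithm optimization proposed in the paper.

```python
from math import prod


def naive_bayes_score(prior, likelihoods):
    """Unnormalised class score: P(class) times the product of P(feature_i | class)."""
    return prior * prod(likelihoods)


def smoothed_likelihood(count, class_total, n_values, alpha=1.0):
    """Add-alpha (Laplace) estimate of P(value | class); strictly positive for alpha > 0."""
    return (count + alpha) / (class_total + alpha * n_values)


# Hypothetical counts: the third feature value was never observed with this class.
raw = [9 / 10, 8 / 10, 0 / 10]
collapsed = naive_bayes_score(0.5, raw)  # the single zero wipes out the whole score

# Same counts with add-one smoothing, assuming each feature has two possible values.
smoothed = [smoothed_likelihood(c, 10, 2) for c in (9, 8, 0)]
recovered = naive_bayes_score(0.5, smoothed)  # small, but no longer zero
```

Because the class score is a product, one zero-valued estimate removes the class from consideration regardless of how strongly the other features support it, which is why some form of parameter adjustment is needed.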


Cited by
Journal Article
TL;DR: FastTree uses sequence profiles of internal nodes in the tree to implement neighbor-joining, uses heuristics to quickly identify candidate joins, and then uses nearest-neighbor interchanges to reduce the length of the tree.
Abstract: Gene families are growing rapidly, but standard methods for inferring phylogenies do not scale to alignments with over 10,000 sequences. We present FastTree, a method for constructing large phylogenies and for estimating their reliability. Instead of storing a distance matrix, FastTree stores sequence profiles of internal nodes in the tree. FastTree uses these profiles to implement neighbor-joining and uses heuristics to quickly identify candidate joins. FastTree then uses nearest-neighbor interchanges to reduce the length of the tree. For an alignment with N sequences, L sites, and a different characters, a distance matrix requires O(N^2) space and O(N^2 L) time, but FastTree requires just O( NLa + N sqrt(N) ) memory and O( N sqrt(N) log(N) L a ) time. To estimate the tree's reliability, FastTree uses local bootstrapping, which gives another 100-fold speedup over a distance matrix. For example, FastTree computed a tree and support values for 158,022 distinct 16S ribosomal RNAs in 17 hours and 2.4 gigabytes of memory. Just computing pairwise Jukes-Cantor distances and storing them, without inferring a tree or bootstrapping, would require 17 hours and 50 gigabytes of memory. In simulations, FastTree was slightly more accurate than neighbor joining, BIONJ, or FastME; on genuine alignments, FastTree's topologies had higher likelihoods. FastTree is available at http://microbesonline.org/fasttree.

2,436 citations
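
The O(N^2) space figure quoted in the FastTree abstract above can be checked with a short back-of-the-envelope sketch. It assumes that only the upper triangle of the symmetric distance matrix is stored and that each distance takes 4 bytes; neither detail is stated in the abstract, and the function name is hypothetical.

```python
def distance_matrix_bytes(n_sequences, bytes_per_entry=4):
    """Bytes needed for the N*(N-1)/2 pairwise distances of a symmetric matrix.

    Assumption: one 4-byte float per pair, upper triangle only.
    """
    n_pairs = n_sequences * (n_sequences - 1) // 2
    return n_pairs * bytes_per_entry


# Number of distinct 16S ribosomal RNA sequences quoted in the abstract.
n = 158_022
matrix_gb = distance_matrix_bytes(n) / 1e9  # decimal gigabytes
print(f"{matrix_gb:.1f} GB for {n} sequences")
```

Under these assumptions the result is close to the 50 gigabytes cited for storing pairwise distances, which illustrates why avoiding the full matrix matters at this scale.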

Journal Article
TL;DR: An app architecture based on blockchain, called Healthcare Data Gateway (HGD), is proposed to enable patients to own, control and share their own data easily and securely without violating privacy, which provides a new potential way to improve the intelligence of healthcare systems while keeping patient data private.
Abstract: Healthcare data are a valuable source of healthcare intelligence. Sharing healthcare data is an essential step toward making healthcare systems smarter and improving the quality of healthcare services. Healthcare data, a personal asset of the patient, should be owned and controlled by the patient instead of being scattered across different healthcare systems, which prevents data sharing and puts patient privacy at risk. Blockchain has demonstrated in the financial field that trusted, auditable computing is possible using a decentralized network of peers accompanied by a public ledger. In this paper, we propose an app architecture based on blockchain, called Healthcare Data Gateway (HGD), that enables patients to own, control and share their own data easily and securely without violating privacy, which provides a new potential way to improve the intelligence of healthcare systems while keeping patient data private. Our proposed purpose-centric access model ensures that patients own and control their healthcare data; a simple unified Indicator-Centric Schema (ICS) makes it possible to organize all kinds of personal healthcare data practically and easily. We also point out that MPC (Secure Multi-Party Computing) is one promising solution for enabling untrusted third parties to conduct computation over patient data without violating privacy.

884 citations

Book
01 Nov 2005
TL;DR: In this article, the authors present an efficient reduction from constrained to unconstrained maximum agreement subtree for the maximum quartet consistency problem, which can be solved by using semi-definite programming.
Abstract (table of contents):
Expression: Spectral Clustering Gene Ontology Terms to Group Genes by Function; Dynamic De-Novo Prediction of microRNAs Associated with Cell Conditions: A Search Pruned by Expression; Clustering Gene Expression Series with Prior Knowledge; A Linear Time Biclustering Algorithm for Time Series Gene Expression Data; Time-Window Analysis of Developmental Gene Expression Data with Multiple Genetic Backgrounds.
Phylogeny: A Lookahead Branch-and-Bound Algorithm for the Maximum Quartet Consistency Problem; Computing the Quartet Distance Between Trees of Arbitrary Degree; Using Semi-definite Programming to Enhance Supertree Resolvability; An Efficient Reduction from Constrained to Unconstrained Maximum Agreement Subtree; Pattern Identification in Biogeography; On the Complexity of Several Haplotyping Problems; A Hidden Markov Technique for Haplotype Reconstruction; Algorithms for Imperfect Phylogeny Haplotyping (IPPH) with a Single Homoplasy or Recombination Event.
Networks: A Faster Algorithm for Detecting Network Motifs; Reaction Motifs in Metabolic Networks; Reconstructing Metabolic Networks Using Interval Analysis.
Genome Rearrangements: A 1.375-Approximation Algorithm for Sorting by Transpositions; A New Tight Upper Bound on the Transposition Distance; Perfect Sorting by Reversals Is Not Always Difficult; Minimum Recombination Histories by Branch and Bound.
Sequences: A Unifying Framework for Seed Sensitivity and Its Application to Subset Seeds; Generalized Planted (l,d)-Motif Problem with Negative Set; Alignment of Tandem Repeats with Excision, Duplication, Substitution and Indels (EDSI); The Peres-Shields Order Estimator for Fixed and Variable Length Markov Models with Applications to DNA Sequence Similarity; Multiple Structural RNA Alignment with Lagrangian Relaxation; Faster Algorithms for Optimal Multiple Sequence Alignment Based on Pairwise Comparisons; Ortholog Clustering on a Multipartite Graph; Linear Time Algorithm for Parsing RNA Secondary Structure; A Compressed Format for Collections of Phylogenetic Trees and Improved Consensus Performance.
Structure: Optimal Protein Threading by Cost-Splitting; Efficient Parameterized Algorithm for Biopolymer Structure-Sequence Alignment; Rotamer-Pair Energy Calculations Using a Trie Data Structure; Improved Maintenance of Molecular Surfaces Using Dynamic Graph Connectivity; The Main Structural Regularities of the Sandwich Proteins; Discovery of Protein Substructures in EM Maps.

492 citations

Journal Article
TL;DR: This work provides a guide to the currently available alignment-free sequence analysis tools and explains how these methods work, how they compare to alignment-based methods, and what potential they hold for researchers.
Abstract: Alignment-free sequence analyses have been applied to problems ranging from whole-genome phylogeny to the classification of protein families, identification of horizontally transferred genes, and detection of recombined sequences. The strength of these methods makes them particularly useful for next-generation sequencing data processing and analysis. However, many researchers are unclear about how these methods work, how they compare to alignment-based methods, and what their potential is for use for their research. We address these questions and provide a guide to the currently available alignment-free sequence analysis tools.

367 citations

Proceedings Article
02 Jul 2018
TL;DR: An ITiCSE working group conducted a systematic review of the introductory programming literature to explore trends, highlight advances in knowledge over the past 15 years, and indicate possible directions for future research.
Abstract: As computing becomes a mainstream discipline embedded in the school curriculum and acts as an enabler for an increasing range of academic disciplines in higher education, the literature on introductory programming is growing. Although there have been several reviews that focus on specific aspects of introductory programming, there has been no broad overview of the literature exploring recent trends across the breadth of introductory programming. This paper is the report of an ITiCSE working group that conducted a systematic review in order to gain an overview of the introductory programming literature. Partitioning the literature into papers addressing the student, teaching, the curriculum, and assessment, we explore trends, highlight advances in knowledge over the past 15 years, and indicate possible directions for future research.

282 citations