
Showing papers presented at "Computational Science and Engineering" in 2011


Journal ArticleDOI
01 Mar 2011
TL;DR: Cython is a Python language extension that allows explicit type declarations and is compiled directly to C, addressing Python's large overhead for numerical loops and the difficulty of efficiently using existing C and Fortran code, which Cython can interact with natively.
Abstract: Cython is a Python language extension that allows explicit type declarations and is compiled directly to C. As such, it addresses Python's large overhead for numerical loops and the difficulty of efficiently using existing C and Fortran code, which Cython can interact with natively.
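For illustration, here is a minimal Cython sketch in the spirit of the standard tutorial (not code from the paper): the explicit `cdef` type declarations let Cython translate the loop into plain C arithmetic instead of dynamic Python operations.

```cython
# toy_integral.pyx -- illustrative only; build with: cythonize -i toy_integral.pyx
def f(double x):
    return x ** 2 - x

def integrate_f(double a, double b, int n):
    # Explicitly typed locals compile to a plain C loop, removing
    # Python's per-iteration interpreter and boxing overhead.
    cdef double s = 0.0
    cdef double dx = (b - a) / n
    cdef int i
    for i in range(n):
        s += f(a + i * dx) * dx
    return s
```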

1,037 citations


Journal ArticleDOI
01 Mar 2011
TL;DR: Mayavi is a general purpose, open source 3D scientific visualization package that is tightly integrated with the rich ecosystem of Python scientific packages, providing a continuum of tools for developing scientific applications, ranging from interactive and script-based data visualization in Python to full-blown custom end-user applications.
Abstract: Mayavi is a general purpose, open source 3D scientific visualization package that is tightly integrated with the rich ecosystem of Python scientific packages. Mayavi provides a continuum of tools for developing scientific applications, ranging from interactive and script-based data visualization in Python to full-blown custom end-user applications.

520 citations


Journal ArticleDOI
01 Jul 2011
TL;DR: Experimental results demonstrate that the proposed galaxy-based search algorithm for principal components analysis (GbSA-PCA) is a promising tool for PCA estimation.
Abstract: In this paper, principal components analysis (PCA) is formulated as a continuous optimisation problem. Then, a novel metaheuristic inspired by nature is employed to explore the search space for the optimum solution to the PCA problem. The new metaheuristic is called the 'galaxy-based search algorithm' or 'GbSA'. The GbSA imitates the spiral arm of spiral galaxies to search its surroundings. This spiral movement is enhanced by chaos to escape from local optima. A local search algorithm is also utilised to adjust the solution obtained by the spiral movement of the GbSA. Experimental results demonstrate that the proposed GbSA-PCA is a promising tool for PCA estimation.
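The abstract gives only the flavor of the search moves, so the following Python sketch is a hypothetical reconstruction: a shrinking spiral around the incumbent solution, perturbed by a logistic-map chaotic sequence, with greedy acceptance standing in for the paper's local search step.

```python
import numpy as np

def sphere(x):
    return float(np.sum(x ** 2))

def gbsa_sketch(f, x0, iters=2000, seed=0):
    """Toy spiral search with a chaotic perturbation (illustrative only;
    the paper's exact GbSA update rules are not reproduced here)."""
    rng = np.random.default_rng(seed)
    best = np.asarray(x0, float)
    fbest = f(best)
    chaos = 0.7                                      # logistic-map state
    for t in range(iters):
        chaos = 4.0 * chaos * (1.0 - chaos)          # chaotic sequence in (0, 1)
        theta = 0.1 * t                              # advancing spiral angle
        radius = 2.0 * (1.0 - t / iters)             # shrinking spiral arm
        step = radius * np.array([np.cos(theta), np.sin(theta)])
        cand = best + step * (2.0 * chaos - 1.0) + 0.01 * rng.normal(size=2)
        if f(cand) < fbest:                          # greedy local refinement
            best, fbest = cand, f(cand)
    return best, fbest

print(gbsa_sketch(sphere, [3.0, -2.0]))              # converges toward the origin
```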

243 citations


Journal ArticleDOI
01 Jul 2011
TL;DR: The NTSP (Numerical Tours of Signal Processing) collection of Matlab/Scilab experiments guides users through the emerging jungle of advanced signal-, image-, and mesh-processing algorithms.
Abstract: The NTSP (Numerical Tours of Signal Processing) collection of Matlab/Scilab experiments guides users through the emerging jungle of advanced signal-, image-, and mesh-processing algorithms.

92 citations


Proceedings ArticleDOI
24 Aug 2011
TL;DR: Experimental results showed that the proposed video steganography algorithm can hide a same-size video in the host video without obvious distortion in the host video.
Abstract: This paper proposes a novel video steganography algorithm that can hide an uncompressed secret video stream in a host video stream of almost the same size. Each frame of the secret video is partitioned into non-uniform rectangles, and the resulting partition codes can serve as an encrypted version of the original frame. These codes are hidden in the least 4 significant bits of each frame of the host video. Experimental results showed that this algorithm can hide a same-size video in the host video without obvious distortion to the host video.
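A minimal NumPy sketch of the 4-LSB embedding step described above, assuming both videos are available as equally sized uint8 frame arrays; the paper's non-uniform rectangular partitioning and encryption stage is omitted.

```python
import numpy as np

def embed_frame(host: np.ndarray, secret: np.ndarray) -> np.ndarray:
    """Hide the 4 most significant bits of `secret` in the 4 LSBs of `host`."""
    return (host & 0xF0) | (secret >> 4)

def extract_frame(stego: np.ndarray) -> np.ndarray:
    """Recover an approximation of the secret frame (its low 4 bits are lost)."""
    return (stego & 0x0F) << 4

host = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)
secret = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)
stego = embed_frame(host, secret)
# Changing only the 4 LSBs alters each host pixel by at most 15 levels.
assert np.max(np.abs(stego.astype(int) - host.astype(int))) <= 15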

76 citations


Book ChapterDOI
23 Sep 2011
TL;DR: In this paper, the authors investigated the hybrid chaos synchronization of identical Liu systems, identical Lu systems, and non-identical Liu and Lu systems by active nonlinear control and derived sufficient conditions for hybrid synchronization of the three-dimensional chaotic systems.
Abstract: This paper investigates the hybrid chaos synchronization of identical Liu systems, identical Lu systems, and non-identical Liu and Lu systems by active nonlinear control. The Liu system (Liu et al., 2004) and the Lu system (Lu and Chen, 2002) are important models of three-dimensional chaotic systems. Hybrid synchronization of the three-dimensional chaotic systems considered in this paper is achieved through the synchronization of the first and last pairs of states and the anti-synchronization of the middle pair of states of the two systems. Sufficient conditions for the hybrid synchronization of identical Liu, identical Lu, and non-identical Liu and Lu systems are derived using active nonlinear control and Lyapunov stability theory. Since Lyapunov exponents are not needed for these calculations, active nonlinear control is an effective and convenient method for the hybrid synchronization of the chaotic systems addressed in this paper. Numerical simulations are shown to illustrate the effectiveness of the proposed synchronization schemes.
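As a hedged illustration of the scheme (not the paper's derivation), the sketch below hybrid-synchronizes two identical Liu systems: the control cancels the nonlinear drift and imposes exponentially decaying errors e' = -e. The Liu parameter values (a=10, b=40, c=2.5, k=1, h=4) are commonly cited but are an assumption here.

```python
import numpy as np
from scipy.integrate import solve_ivp

A, B, C, K, H = 10.0, 40.0, 2.5, 1.0, 4.0   # assumed Liu-system parameters

def liu(x):
    return np.array([A * (x[1] - x[0]),
                     B * x[0] - K * x[0] * x[2],
                     -C * x[2] + H * x[0] ** 2])

def coupled(t, s):
    x, y = s[:3], s[3:]
    # Hybrid errors: synchronize states 1 and 3, anti-synchronize state 2.
    e = np.array([y[0] - x[0], y[1] + x[1], y[2] - x[2]])
    fx, fy = liu(x), liu(y)
    # Active control cancels the drift so that each error obeys e' = -e.
    u = np.array([-fy[0] + fx[0] - e[0],
                  -fy[1] - fx[1] - e[1],
                  -fy[2] + fx[2] - e[2]])
    return np.concatenate([fx, fy + u])

sol = solve_ivp(coupled, (0, 10), [1, 2, 3, -4, 5, -6])
xT, yT = sol.y[:3, -1], sol.y[3:, -1]
print(yT[0] - xT[0], yT[1] + xT[1], yT[2] - xT[2])   # all errors decay to ~0
```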

64 citations


Journal ArticleDOI
01 May 2011
TL;DR: In this article, a feature-selection algorithm based on a multiobjective genetic algorithm is used to analyze and discard irrelevant coefficients, considerably reducing the number of coefficients while also improving recognition rates.
Abstract: Although it shows enormous potential as a feature extractor, 2D principal component analysis produces numerous coefficients. Using a feature-selection algorithm based on a multiobjective genetic algorithm to analyze and discard irrelevant coefficients offers a solution that considerably reduces the number of coefficients, while also improving recognition rates.
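A toy single-objective stand-in for the idea (the paper uses a multiobjective GA): evolve a bit mask over the 2D-PCA coefficients, rewarding accuracy while penalizing the number of coefficients retained. The fitness trade-off, rates, sizes, and the stand-in accuracy function are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N_COEF = 64

def fitness(mask, accuracy_fn):
    acc = accuracy_fn(mask)                 # e.g., recognition rate of a 1-NN
    return acc - 0.002 * mask.sum()         # trade accuracy against mask size

def ga_select(accuracy_fn, pop=30, gens=40, p_mut=0.02):
    P = rng.integers(0, 2, (pop, N_COEF))
    for _ in range(gens):
        f = np.array([fitness(ind, accuracy_fn) for ind in P])
        parents = P[np.argsort(f)[-pop // 2:]]               # keep best half
        mates = parents[rng.permutation(len(parents))]
        cut = rng.integers(1, N_COEF)
        kids = np.hstack([parents[:, :cut], mates[:, cut:]]) # 1-point crossover
        kids ^= rng.random(kids.shape) < p_mut               # bit-flip mutation
        P = np.vstack([parents, kids])
    return P[np.argmax([fitness(ind, accuracy_fn) for ind in P])]

# Stand-in accuracy: pretend only the first 10 coefficients matter.
best = ga_select(lambda m: m[:10].mean())
print(best.sum(), "coefficients kept")
```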

61 citations


Book ChapterDOI
23 Sep 2011
TL;DR: Active nonlinear feedback control is used to achieve the synchronization of the identical and non-identical hyperchaotic Pang and Wang systems addressed in this paper, and the results are established using Lyapunov stability theory.
Abstract: This paper investigates the global chaos synchronization of identical hyperchaotic Pang systems (Pang and Liu, 2011) and the synchronization of a non-identical hyperchaotic Pang system and Wang system (Wang and Liu, 2006). Active nonlinear feedback control is the method used to achieve the synchronization of the identical and different hyperchaotic Pang and Wang systems addressed in this paper, and our results are established using Lyapunov stability theory. Since Lyapunov exponents are not required for these calculations, the active control method is an effective and convenient way to synchronize identical and different hyperchaotic Pang and Wang systems. Numerical simulations are given to illustrate the effectiveness of the proposed synchronization schemes for the global chaos synchronization of the hyperchaotic systems addressed in this paper.

54 citations


Proceedings ArticleDOI
28 May 2011
TL;DR: A literature review of agility in scientific software projects indicates that projects adopting agile practices perceive their testing to be better than average; future work includes an in-depth case study of three large scientific software projects.
Abstract: The nature of scientific research and the development of scientific software have similarities with processes that follow the agile manifesto: responsiveness to change and collaboration are of the utmost importance. But how well do current scientific software development processes match the practices found in agile development methods, and what are the effects of using agile practices in such processes? In order to investigate this, we conduct a literature review, focusing on evaluating the agility present in a selection of scientific software projects. Both projects with intentionally agile practices and projects with a certain degree of agile elements are taken into consideration. In the agility assessment, we define and utilize an agile mapping chart. The elements of the mapping chart are based on Scrum and XP, thus covering two of the most prominent agile reference models. We compared the findings of the literature review to results of a previously conducted survey. The comparison indicates that scientific software development projects adopting agile practices perceive their testing to be better than average. No difference to average projects was perceived regarding requirements-related activities. Future work includes an in-depth case study to further investigate the existence and impact of agility in three large scientific software projects, ultimately aiming at a better understanding of the particularities involved in developing scientific software.

46 citations


Proceedings ArticleDOI
24 Aug 2011
TL;DR: The power domination number, the minimum cardinality of a power dominating set of a graph, is determined for hypercubes Qn with n = 2^k, where k is any positive integer.
Abstract: The performance of electrical networks is monitored by expensive Phasor Measurement Units (PMUs). It is economically beneficial to determine the optimal placement and the minimum number of PMUs required to effectively monitor an entire network. This problem has a graph theory model involving power dominating sets in a graph. A set S of vertices in a graph is called a power dominating set if every vertex and every edge in the graph is “observed” by S according to a set of observation rules. The power domination number of a graph is the minimum cardinality of a power dominating set of the graph. In this paper, the power domination number is determined for hypercubes Qn with n = 2^k, where k is any positive integer.
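The observation rules can be made concrete with a small brute-force checker; this Python sketch is illustrative for tiny hypercubes only, not the paper's analytical argument.

```python
from itertools import combinations

def hypercube(n):
    """Adjacency of Qn: vertices are bit strings, neighbors differ in one bit."""
    return {v: [v ^ (1 << i) for i in range(n)] for v in range(2 ** n)}

def observes_all(adj, S):
    # Rule 1 (domination): S and all neighbors of S are observed.
    obs = set(S)
    for v in S:
        obs.update(adj[v])
    # Rule 2 (propagation): an observed vertex with exactly one
    # unobserved neighbor forces that neighbor to become observed.
    changed = True
    while changed:
        changed = False
        for v in list(obs):
            unseen = [w for w in adj[v] if w not in obs]
            if len(unseen) == 1:
                obs.add(unseen[0])
                changed = True
    return len(obs) == len(adj)

def power_domination_number(adj):
    for k in range(1, len(adj) + 1):
        if any(observes_all(adj, S) for S in combinations(adj, k)):
            return k

print(power_domination_number(hypercube(3)))   # brute force on Q3
```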

33 citations


Journal ArticleDOI
01 May 2011
TL;DR: The paper discusses Org-mode, a simple, plain-text markup language for hierarchical documents that allows the intermingling of data, code, and prose.
Abstract: The paper discusses Org-mode, a simple, plain-text markup language for hierarchical documents that allows the intermingling of data, code, and prose.
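As a small hypothetical example of the format (not taken from the paper), an Org file mixes headlines, a named data table, and an executable Babel code block that reads the table:

```org
* Analysis of trial data

Plain prose lives under headlines; tables are first-class data.

#+NAME: trials
| run | value |
|-----+-------|
|   1 |  0.90 |
|   2 |  0.95 |

#+BEGIN_SRC python :var data=trials :results value
# Babel passes the named table in as a list of rows.
return sum(row[1] for row in data) / len(data)
#+END_SRC
```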

Journal ArticleDOI
24 Aug 2011
TL;DR: In this paper, a performance modeling framework based on memory bandwidth contention time and a parameterized communication model is presented to predict the performance of OpenMP, MPI and hybrid applications with weak scaling on three large-scale multicore clusters: IBM POWER4, POWER5+ and Blue Gene/P.
Abstract: In this paper, we present a performance modeling framework based on memory bandwidth contention time and a parameterized communication model to predict the performance of OpenMP, MPI and hybrid applications with weak scaling on three large-scale multicore clusters, IBM POWER4, POWER5+ and Blue Gene/P, and to analyze the performance of these MPI, OpenMP and hybrid applications. We use the STREAM memory benchmark to provide an initial performance analysis and model validation of MPI and OpenMP applications on these multicore clusters, because the measured sustained memory bandwidth provides insight into the memory bandwidth that a system should sustain on scientific applications with the same amount of workload per core. In addition to these benchmarks, we also use a weak-scaling hybrid MPI/OpenMP large-scale scientific application, the Gyrokinetic Toroidal Code (GTC) for magnetic fusion, to validate our performance model of the hybrid application on these multicore clusters. The validation results for our performance modeling method show less than a 7.77% error rate in predicting the performance of hybrid MPI/OpenMP GTC on up to 512 cores on these multicore clusters.
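A hedged back-of-envelope sketch of a bandwidth-contention term in the spirit of such models (the paper's actual framework is more detailed): when several cores on a node share the sustained bandwidth measured by STREAM, each core's effective bandwidth shrinks and its memory time stretches.

```python
def memory_time(bytes_moved_per_core, cores_per_node, stream_bw_node):
    """First-order contention estimate: cores evenly share node bandwidth."""
    effective_bw = stream_bw_node / cores_per_node   # contended per-core share
    return bytes_moved_per_core / effective_bw

# Example with assumed numbers: 8 GB moved per core, 16 cores sharing
# 40 GB/s of STREAM bandwidth -> 2.5 GB/s per core, ~3.2 s of memory time.
print(memory_time(8e9, 16, 40e9))
```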

Book ChapterDOI
01 Jan 2011
TL;DR: The rationale for the RBEM is presented together with numerical results showing exponential convergence for the simulation of a metallic pipe with both ends open; the approximation on a new domain is sought as a linear combination of the corresponding precomputed solutions on each subdomain.
Abstract: We present a reduced basis element method (RBEM) for the time-harmonic Maxwell’s equation. The RBEM is a Reduced Basis Method (RBM) with parameters describing the geometry of the computational domain, coupled with a domain decomposition method. The basic idea is the following. First, we decompose the computational domain into a series of subdomains, each of which is deformed from some reference domain. Then, we associate with each reference domain precomputed solutions to the same governing partial differential equation, but with different choices of deformations. Finally, one seeks the approximation on a new domain as a linear combination of the corresponding precomputed solutions on each subdomain. Unlike the work on RBEM for thermal fin and fluid flow problems, we do not need a mortar type method to “glue” the various local functions. This “gluing” is done “automatically” thanks to the use of a discontinuous Galerkin method. We present the rationale for the method together with numerical results showing exponential convergence for the simulation of a metallic pipe with both ends open.

Journal ArticleDOI
01 Jul 2011
TL;DR: The random selection process in the classical harmony search method is replaced by a wavelet-theory-based mutation process to improve the performance of the algorithm, and the results reveal the robustness and ability of the proposed methodology compared with other existing techniques.
Abstract: This paper presents a new evolutionary optimisation algorithm to solve the economic load dispatch (ELD) problem with operational constraints using an improved harmony search algorithm. The harmony search algorithm is a recently developed derivative-free, meta-heuristic optimisation algorithm, which draws inspiration from the musical process of searching for a perfect state of harmony. In this paper we replace the random selection process in the classical harmony search method with a wavelet-theory-based mutation process to improve the performance of the algorithm. The proposed methodology easily handles non-convex ELD problems along with different constraints such as power balance, ramp rate limits of the generators, and prohibited operating zones. Simulations were performed on various standard test systems with different numbers of generating units, and a comparative study was carried out with other existing relevant approaches. The results obtained reveal the robustness and ability of the proposed methodology compared with other existing techniques.
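One published form of wavelet mutation uses a Morlet-type wavelet whose amplitude decays as the run progresses; the sketch below assumes that form and is not necessarily the paper's exact operator.

```python
import math, random

def wavelet_mutation(value, lo, hi, t, T, g=10000.0, zeta=5.0):
    """Assumed Morlet-style wavelet mutation (illustrative, not the paper's
    exact operator). Early in the run (small t/T) the wavelet amplitude is
    large for exploration; later the dilation grows, the amplitude decays,
    and the harmony is only fine-tuned near its current value."""
    a = math.exp(-math.log(g) * (1.0 - t / T) ** zeta + math.log(g))  # dilation
    phi = random.uniform(-2.5 * a, 2.5 * a)
    sigma = (1.0 / math.sqrt(a)) * math.exp(-((phi / a) ** 2) / 2.0) \
            * math.cos(5.0 * phi / a)
    if sigma > 0:
        return value + sigma * (hi - value)   # push toward the upper bound
    return value + sigma * (value - lo)       # push toward the lower bound
```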

Journal ArticleDOI
01 Sep 2011
TL;DR: At two recent workshops, participants discussed the juxtaposition of software engineering with the development of scientific computational software.
Abstract: At two recent workshops, participants discussed the juxtaposition of software engineering with the development of scientific computational software.

Proceedings ArticleDOI
Yongming Qin, Qiuling Tang, Ye Liang, Xiuyu Yue, Xian Li
24 Aug 2011
TL;DR: The results show that the EE-LEACH-MIMO scheme can balance the network load well, save energy, and prolong the network lifetime.
Abstract: Energy efficiency is the most important design goal for wireless sensor networks (WSNs). In this paper, we propose an energy-efficient cooperative MIMO scheme, which combines the energy-efficient LEACH (EE-LEACH) protocol with cooperative MIMO; we name it the EE-LEACH-MIMO scheme. EE-LEACH is an improved LEACH algorithm in which the network is partitioned into sectors of equal angle to avoid a non-uniform distribution of cluster heads. In the EE-LEACH-MIMO scheme, the location and the residual energy of each node are considered when the cluster heads for clustering and the cooperative nodes for the MIMO system are chosen. For comparison, LEACH, EE-LEACH, a simple cooperative scheme combining LEACH and MIMO (LEACH-MIMO), and the EE-LEACH-MIMO scheme are simulated. The results show that the EE-LEACH-MIMO scheme can balance the network load well, save energy, and prolong the network lifetime.

Book ChapterDOI
01 Jan 2011
TL;DR: In this paper, the mathematical formulation for the hydroelastic analysis of very large floating structures (VLFS) is presented, and mitigation methods for reducing the structural response are discussed using some example problems.
Abstract: Pontoon-type very large floating structures (VLFS) are giant plates resting on the sea surface. As these structures have a large surface area and a relatively small depth, they behave elastically under wave action. This type of fluid-structure interaction has been termed hydroelasticity. Hydroelastic analysis must therefore be carried out for VLFS designs in order to assess the dynamic motion and stresses due to wave action. This paper presents the mathematical formulation for the hydroelastic analysis of VLFS. Hydroelastic responses and mitigation methods for reducing the structural response are discussed using some example problems.

Proceedings ArticleDOI
24 Aug 2011
TL;DR: The results show that, assuming a three-times slowdown of the statistical multiplexing layer, for an application using 1024 processors with 35% checkpoint overhead, the two-tier framework will produce sustained time and energy savings for MTBF values of less than 6 hours.
Abstract: General-purpose GPU (GPGPU) computing has produced the fastest-running supercomputers in the world. For continued sustainable progress, GPU computing at scale also needs to address two open issues: a) how to increase an application's mean time between failures (MTBF) as we increase the supercomputer's component count, and b) how to minimize unnecessary energy consumption. Since energy consumption is determined by the number of components used, we consider a high performance computing (HPC) application sustainable if it can deliver better performance and reliability at the same time when computing or communication components are added. This paper reports a two-tier semantic statistical multiplexing framework for sustainable HPC at scale. The idea is to leverage the power of statistical multiplexing to tame the nagging HPC scalability challenges. We include the theoretical model, a sustainability analysis, and computational experiments with automatic system-level containment of multiple CPU/GPU failures. Our results show that, assuming a three-times slowdown of the statistical multiplexing layer, for an application using 1024 processors with 35% checkpoint overhead, the two-tier framework will produce sustained time and energy savings for MTBF values of less than 6 hours. With 5% checkpoint overhead, a 1.5-hour MTBF would be the break-even point. These results suggest the practical feasibility of the proposed two-tier framework.
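To see why a break-even MTBF arises at all, consider a generic first-order checkpoint/restart cost model (an illustrative assumption, not the paper's two-tier model): as failures become more frequent, rework plus checkpoint cost grows and eventually exceeds the fixed slowdown of a replication-style scheme.

```python
def checkpoint_runtime(work_h, mtbf_h, interval_h, ckpt_h, restart_h=0.1):
    """Expected wall time under periodic checkpointing (first-order model)."""
    wall = work_h * (1.0 + ckpt_h / interval_h)          # checkpoint overhead
    failures = wall / mtbf_h                             # expected failure count
    rework = failures * (interval_h / 2.0 + restart_h)   # avg loss per failure
    return wall + rework

# Assumed numbers for illustration: 100 h of work, hourly checkpoints
# costing 0.35 h each (i.e., 35% overhead, echoing the abstract's figure).
for mtbf in (1.5, 6.0, 24.0):
    print(f"MTBF {mtbf:5.1f} h -> {checkpoint_runtime(100, mtbf, 1.0, 0.35):6.1f} h")
```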

Book ChapterDOI
23 Sep 2011
TL;DR: A friend recommender system for WBSNs is presented; a real-valued genetic algorithm is used to learn user preferences based on a comparison of individual features to increase recommendation effectiveness, and trust propagation is employed to alleviate the sparsity problem of collaborative filtering.
Abstract: Web-based social networks (WBSNs) are a promising new paradigm for large-scale distributed data management and collective intelligence. But the exponential growth of social networks poses new challenges and presents opportunities for recommender systems, such as the complicated nature of human-to-human interaction that comes into play when recommending people. Web-based recommender systems (RSs) are the most notable application of web personalization for dealing with the problem of information overload. In this paper, we present a friend RS for WBSNs. Our contribution is threefold. First, we have identified appropriate attributes in a user profile and suggest suitable similarity computation formulae. Second, a real-valued genetic algorithm is used to learn user preferences based on a comparison of individual features to increase recommendation effectiveness. Finally, in order to alleviate the sparsity problem of collaborative filtering, we have employed trust propagation techniques. Experimental results clearly demonstrate the effectiveness of our proposed schemes.

Proceedings ArticleDOI
24 Aug 2011
TL;DR: This work discusses GAs from an architectural perspective, offering a general analysis of GAs on multi-core CPUs and on GPUs, with solution quality considered, and describes widely-used parallel GA schemes based on Master-Slave, Island and Cellular models.
Abstract: A Genetic Algorithm (GA) is a heuristic for finding exact or approximate solutions to optimization and search problems within an acceptable time. We discuss GAs from an architectural perspective, offering a general analysis of GAs on multi-core CPUs and on GPUs, with solution quality considered. We describe widely used parallel GA schemes based on the Master-Slave, Island, and Cellular models. Then, based on the multi-core and many-core architectures, especially the thread organization, memory hierarchy, and core utilization, we analyze the execution speed and solution quality of different GA schemes theoretically. Finally, we point to the best approach for executing GAs on multi-core and many-core systems, so as to obtain the highest-quality solution in the shortest execution time. Furthermore, there are three extra contributions. First, during our analysis and evaluation, we not only focus on the execution speed of different schemes but also take the solution quality into account, so that our findings will be more useful in practice. Second, during our optimization of an Island scheme on GPUs, we find that the GPU architecture actually alters the scheme, making it become the Cellular scheme, which leads to big changes in solution quality and optimization results. Finally, we calculate the GPU speedup based on a comparison between the best scheme on a GPU and the best one on a CPU, rather than between an optimized one on the GPU and the worst one on a CPU, so that the speedup we calculate is more reasonable and a better guide to practical decisions.

Book ChapterDOI
23 Sep 2011
TL;DR: The scaled conjugate gradient algorithm, a second-order training algorithm for neural networks, provides faster training with excellent test efficiency, and a lexicon-matching algorithm resolves minor misclassification problems.
Abstract: Handwritten text and character recognition is a challenging task compared to the recognition of handwritten numerals and computer-printed text, due to its large natural variety. Neural-network-based approaches provide the most reliable performance in handwritten character and text recognition, but recognition performance depends on several important factors, such as the number of training samples, reliable features and the number of features per character, training time, and the variety of handwriting. Important features from different types of handwriting are collected and fed to the neural network for training. More features increase test efficiency, but the error curve then takes longer to converge. To reduce this training time, a proper algorithm should be chosen so that the system achieves the best training and test efficiency in the least possible time. In this paper we use the scaled conjugate gradient algorithm, a second-order training algorithm, for training the neural network. It provides faster training with excellent test efficiency. A scanned handwritten text is taken as input and character-level segmentation is performed. Some important and reliable features are extracted from each character and used as input to a neural network for training. When the error reaches a satisfactory level (10^-12), the weights are accepted for testing on a test script. Finally, a lexicon-matching algorithm resolves minor misclassification problems.

Proceedings ArticleDOI
24 Aug 2011
TL;DR: This paper proposes a new approach to parallelizing the AES-CTR algorithm by extending the size of the block that is encrypted at one time across unit-block boundaries, which leads to significant performance improvements on a general-purpose multi-core processor and on a Graphics Processing Unit (GPU).
Abstract: Data encryption and decryption are common operations in network-based application programs with security requirements. In order to keep pace with the input data rate in such applications, real-time processing of data encryption/decryption is essential. For example, in an environment where multimedia data is streamed, high-speed data encryption/decryption is crucial. In this paper, we propose a new approach to parallelizing the AES-CTR algorithm by extending the size of the block that is encrypted at one time across unit-block boundaries. The proposed approach leads to significant performance improvements on a general-purpose multi-core processor and on a Graphics Processing Unit (GPU), both of which have become popular. In particular, the performance improvement on the GPU is dramatic: close to 9 times faster than the original coarse-grained parallelization approach, mainly thanks to the "multi-core" nature of the GPU architecture.
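The property being exploited is that in CTR mode, keystream block i depends only on (key, nonce, i) and never on block i-1, so blocks can be computed independently and in any order. A toy Python sketch using PyCryptodome (assumed installed); a real implementation would farm the map out across many cores or a GPU rather than Python threads.

```python
from concurrent.futures import ThreadPoolExecutor
from Crypto.Cipher import AES   # PyCryptodome, assumed installed

KEY = b"0123456789abcdef"
NONCE = b"\x00" * 8

def keystream_block(i: int) -> bytes:
    # Block i's keystream depends only on (KEY, NONCE, i): no chaining.
    return AES.new(KEY, AES.MODE_ECB).encrypt(NONCE + i.to_bytes(8, "big"))

def ctr_xcrypt(data: bytes) -> bytes:
    blocks = [data[i:i + 16] for i in range(0, len(data), 16)]
    with ThreadPoolExecutor() as pool:   # independent blocks: any order works
        streams = pool.map(keystream_block, range(len(blocks)))
    return b"".join(bytes(a ^ b for a, b in zip(blk, ks))
                    for blk, ks in zip(blocks, streams))

msg = b"sixteen-byte-msg" * 4
assert ctr_xcrypt(ctr_xcrypt(msg)) == msg   # XOR the keystream twice = identity
```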

Book ChapterDOI
23 Sep 2011
TL;DR: The selection of a testing technique is formulated as a multi-criteria decision-making problem and an efficient solution is proposed.
Abstract: The appropriate use of an efficient testing technique at each stage of the Software Development Life Cycle (SDLC) is still in its infancy. There are a number of testing techniques for the various phases of testing, and selecting the right technique at any stage is a critical problem. The selection method should take not only subjective knowledge but also objective knowledge into account for the efficient selection of a testing technique according to the requirements. The selection of a testing technique at every stage of the SDLC depends on many factors, such as resources, schedule, and the cost of the project. We therefore formulate the selection of a testing technique as a multi-criteria decision-making problem and propose an efficient solution.
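A minimal sketch of one standard MCDM formulation, simple additive weighting, with made-up techniques, criteria scores, and weights; the paper's own criteria and method may differ.

```python
import numpy as np

techniques = ["boundary-value", "random", "mutation"]       # hypothetical
scores = np.array([[3.0, 4.0, 4.0],    # rows: techniques
                   [1.0, 5.0, 2.0],    # cols: cost, schedule fit, defect yield
                   [5.0, 2.0, 5.0]])
benefit = np.array([False, True, True])  # cost is minimized, the rest maximized
weights = np.array([0.3, 0.3, 0.4])      # subjective priorities, sum to 1

norm = scores / scores.max(axis=0)                             # benefit criteria
norm[:, ~benefit] = scores[:, ~benefit].min(axis=0) / scores[:, ~benefit]
ranking = norm @ weights                                       # weighted sum
print(techniques[int(np.argmax(ranking))])
```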

Proceedings ArticleDOI
24 Aug 2011
TL;DR: This paper proposes a novel proactive spectrum handoff approach based on time estimation (TPSH) to reduce the communication disruptions to primary users and increase channel utilization efficiency.
Abstract: Cognitive radio technology improves spectrum utilization by enabling secondary users to access a primary user's unutilized spectrum in an opportunistic manner. However, this causes disruptions to both primary and secondary communication and leads to high switching overhead. In this paper, we propose a novel proactive spectrum handoff approach based on time estimation (TPSH) to reduce the communication disruptions to primary users and increase channel utilization efficiency. Secondary users utilize past channel histories to maintain an estimation vector of each channel's remaining idle period, make predictions about future spectrum availability, and then schedule channel usage in advance. We propose a smart channel selection and switching algorithm to implement this approach. In addition, a threshold is introduced when a handoff happens to maintain a trade-off between the disruption effects on primary users and channel efficiency. Simulation results show that our approach can significantly reduce the communication disruption to primary users, by up to 32%, and improve overall channel efficiency by about 7%-18%.

Journal ArticleDOI
01 Jul 2011
TL;DR: A new study explains why the days of obtaining performance increases due to higher processor speed are mostly over, and where the authors go from here.
Abstract: A new study explains why the days of obtaining performance increases due to higher processor speed are mostly over, and where we go from here.

Journal ArticleDOI
01 Jul 2011
TL;DR: The E-Simulator project aims to reproduce a world-class shake-table test, while the Integrated Earthquake Simulation project seamlessly simulates three key earthquake processes to provide vital information about potential earthquake hazards and disasters.
Abstract: Two research projects are working toward petascale computation for earthquake engineering. The E-Simulator project aims to reproduce a world-class shake-table test, while the Integrated Earthquake Simulation (IES) project seamlessly simulates three key earthquake processes to provide vital information about potential earthquake hazards and disasters.

Book ChapterDOI
23 Sep 2011
TL;DR: A comparison of the average efficiency of the wavelet families shows that statistical features extracted from Gabor wavelets provide better efficiency than the other two methods.
Abstract: In this paper, we present an evaluation of the diagnosis of dementia using texture analysis of brain MRI with wavelets and subsequent classification by a backpropagation network. The tests were conducted on 3D brain MRI data extracted from the OASIS database. The classification is based on the following steps. First, features are extracted from the region of interest in the MRI images using wavelets, the gray-level co-occurrence matrix (GLCM), and Haralick features; the Gabor features are characterized by the histogram distribution of the wavelet coefficients. These features were segregated into three datasets: the first dataset contains the GLCM features, the second the Haralick features, and the third the Gabor-wavelet-based Haralick features. Classification was done by a backpropagation network based on the three feature vectors. The analysis shows that the average efficiency of Gabor combined with Haralick features is around 97% for all types of datasets, while the average efficiency for GLCM is 86% and that of the Haralick features is 90%. A comparison of the average efficiency of the wavelet families shows that statistical features extracted from Gabor wavelets provide better efficiency than the other two methods.
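For a concrete flavor of the GLCM step, here is a scikit-image sketch on a random 8-bit image standing in for an MRI region of interest (newer scikit-image spells the functions graycomatrix/graycoprops; older releases use "grey").

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

# Random stand-in "slice"; in the paper these would be brain MRI ROIs.
img = np.random.randint(0, 256, (64, 64), dtype=np.uint8)

glcm = graycomatrix(img, distances=[1], angles=[0, np.pi / 2],
                    levels=256, symmetric=True, normed=True)
features = {p: graycoprops(glcm, p).mean()
            for p in ("contrast", "homogeneity", "energy", "correlation")}
print(features)   # part of the feature vector fed to the classifier
```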

Proceedings ArticleDOI
24 Aug 2011
TL;DR: A novel modified kernel fuzzy c-means (NMKFCM) algorithm is proposed, based on the conventional KFCM algorithm, which incorporates a neighbor term into its objective function and has better performance on noisy images.
Abstract: Image segmentation plays an important role in image analysis. Based on the Mercer kernel, the kernel fuzzy c-means clustering algorithm (KFCM) is derived from the fuzzy c-means clustering algorithm (FCM). The KFCM algorithm can improve image clustering accuracy significantly compared with classical fuzzy c-means algorithms. In this paper, considering the advantages of KFCM, we propose a novel modified kernel fuzzy c-means (NMKFCM) algorithm, based on conventional KFCM, which incorporates a neighbor term into its objective function. The results of experiments performed on synthetic and real medical images show that the new algorithm is effective and efficient, and has better performance on noisy images.
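For context, a textbook Gaussian-kernel fuzzy c-means sketch, without the paper's added neighbor term; sigma, m, and the update scheme below are standard KFCM assumptions, not the authors' code.

```python
import numpy as np

def kfcm(X, c=2, m=2.0, sigma=1.0, iters=50, seed=0):
    """Gaussian-kernel fuzzy c-means: kernel distance d^2 = 1 - K(x, v)."""
    rng = np.random.default_rng(seed)
    V = X[rng.choice(len(X), c, replace=False)]          # initial centers
    for _ in range(iters):
        K = np.exp(-((X[:, None, :] - V[None]) ** 2).sum(-1) / sigma ** 2)
        d2 = np.maximum(1.0 - K, 1e-12)                  # kernel distances
        U = (1.0 / d2) ** (1.0 / (m - 1.0))
        U /= U.sum(axis=1, keepdims=True)                # fuzzy memberships
        W = (U ** m) * K                                 # kernel-weighted u^m
        V = (W.T @ X) / W.sum(axis=0)[:, None]           # center update
    return U, V

X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 4])
U, V = kfcm(X)
print(V)   # one center near each of the two blobs
```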

Proceedings ArticleDOI
24 Aug 2011
TL;DR: H-Star guarantees K-anonymity under the strict reciprocity condition and increases the anonymization success rate by reducing computation overhead; experiments show the effectiveness of the algorithm for spatial cloaking.
Abstract: The proliferation of position-identifying devices poses an increasing privacy threat in location-based services (LBSs). It is very difficult to avoid threats to a user's privacy when processing his/her request, because the user has to submit his/her exact location with a query to the LBS. To protect privacy in road networks, the existing method employs the XStar framework to hide the query issuer and provide attack resilience. However, it incurs a low anonymization success rate, and its computation cost is quite high. To address these issues, we propose a Hilbert-order-based star network expansion cloaking algorithm (H-Star). H-Star guarantees K-anonymity under the strict reciprocity condition and increases the anonymization success rate by reducing computation overhead. Through comprehensive experimental evaluations, we show the effectiveness of our algorithm for spatial cloaking.

Proceedings ArticleDOI
24 Aug 2011
TL;DR: This paper proposes a novel machine learning based approach where the features are extracted from packet payload instead of flow statistics, and shows that the approach can achieve high accuracy with low overhead.
Abstract: Due to the increasing unreliability of traditional port-based methods, Internet traffic classification has attracted a lot of research effort in recent years. Many previous papers have focused on using statistical characteristics as discriminators and applying machine learning techniques to classify traffic flows. In this paper, we propose a novel machine learning based approach where the features are extracted from packet payload instead of flow statistics. Specifically, every flow is represented by a feature vector, in which each item indicates the occurrence of a particular token, i.e., a common substring, in the payload. We have applied various machine learning algorithms to evaluate the idea and used different feature selection schemes to identify the critical tokens. Experimental results based on a real-world traffic data set show that the approach can achieve high accuracy with low overhead.
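A minimal sketch of payload-token features, the binary occurrence of common substrings feeding a standard classifier; the payloads, labels, and token length here are fabricated stand-ins for the paper's data set and feature selection.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import make_pipeline

# Fabricated example payloads and protocol labels (illustrative only).
payloads = ["GET /index.html HTTP/1.1", "220 smtp ready", "EHLO mail",
            "HTTP/1.1 200 OK", "250 smtp ok", "POST /api HTTP/1.1"]
labels   = ["http", "smtp", "smtp", "http", "smtp", "http"]

# binary=True makes each feature an occurrence indicator for a substring
# token, mirroring the feature vector described in the abstract.
clf = make_pipeline(
    CountVectorizer(analyzer="char", ngram_range=(4, 4), binary=True),
    BernoulliNB(),
)
clf.fit(payloads, labels)
print(clf.predict(["HEAD /x HTTP/1.1", "354 smtp go ahead"]))
```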