Showing papers in "Scientific Programming" in 2015


Journal Article

Link prediction methods and their accuracy for different social networks and network metrics

Fei Gao¹, Katarzyna Musial¹, Colin Cooper¹, Sophia Tsoka¹

01 Jan 2015-Scientific Programming

TL;DR: Correlation analysis between network metrics and the prediction accuracy of link prediction methods may form the basis of a metalearning system that, based on network characteristics, recommends the right prediction method for a given network.

Abstract: Currently, we are experiencing rapid growth in the number of social-based online systems. The availability of the vast amounts of data gathered in those systems brings new challenges when trying to analyse it. One of the intensively researched topics is the prediction of social connections between users. Although a lot of effort has been made to develop new prediction approaches, the existing methods have not been comprehensively analysed. In this paper we investigate the correlation between network metrics and the accuracy of different prediction methods. We selected six time-stamped real-world social networks and the ten most widely used link prediction methods. The results of the experiments show that the performance of some methods has a strong correlation with certain network metrics. We managed to distinguish "prediction friendly" networks, for which most of the prediction methods give good performance, as well as "prediction unfriendly" networks, for which most of the methods result in high prediction error. Correlation analysis between network metrics and the prediction accuracy of prediction methods may form the basis of a metalearning system that, based on network characteristics, is able to recommend the right prediction method for a given network.

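The abstract does not name the ten methods it compares. As a hedged illustration only, the sketch below implements four classic neighbourhood-based scores that are standard in the link prediction literature (common neighbours, Jaccard coefficient, Adamic-Adar, and preferential attachment) and ranks unconnected node pairs by one of them. All function names are hypothetical; this is not the authors' code, and the paper's evaluation protocol (time-stamped splits, accuracy metrics) is not reproduced.

```python
import math
from itertools import combinations

def neighbours(edges):
    """Build an undirected adjacency map {node: set of neighbours} from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def common_neighbours(adj, x, y):
    return len(adj[x] & adj[y])

def jaccard(adj, x, y):
    union = adj[x] | adj[y]
    return len(adj[x] & adj[y]) / len(union) if union else 0.0

def adamic_adar(adj, x, y):
    # A shared neighbour has degree >= 2, so log(degree) > 0; rarer (low-degree) ones weigh more.
    return sum(1.0 / math.log(len(adj[z])) for z in adj[x] & adj[y])

def preferential_attachment(adj, x, y):
    return len(adj[x]) * len(adj[y])

def rank_candidates(edges, score):
    """Score every currently unconnected node pair; most likely future links first."""
    adj = neighbours(edges)
    pairs = [(x, y) for x, y in combinations(sorted(adj), 2) if y not in adj[x]]
    return sorted(pairs, key=lambda p: score(adj, *p), reverse=True)
```

Swapping the `score` argument is all it takes to compare predictors on the same network, which is the kind of side-by-side comparison the paper correlates with network metrics.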

92 citations

Journal Article

Scheduling multilevel deadline-constrained scientific workflows on clouds based on cost optimization

Maciej Malawski¹, Kamil Figiela¹, Marian Bubak¹, Ewa Deelman², Jarek Nabrzyski³

AGH University of Science and Technology¹, Information Sciences Institute², University of Notre Dame³

01 Jan 2015-Scientific Programming

TL;DR: This paper presents a cost optimization model for scheduling scientific workflows on IaaS clouds such as Amazon EC2 or RackSpace and indicates how this model can be used for scenarios that require resource planning for scientific workflows and their ensembles.

Abstract: This paper presents a cost optimization model for scheduling scientific workflows on IaaS clouds such as Amazon EC2 or RackSpace. We assume multiple IaaS clouds with heterogeneous virtual machine instances, with a limited number of instances per cloud and hourly billing. Input and output data are stored on a cloud object store such as Amazon S3. Applications are scientific workflows modeled as DAGs as in the Pegasus Workflow Management System. We assume that tasks in the workflows are grouped into levels of identical tasks. Our model is specified using mathematical programming languages (AMPL and CMPL) and allows us to minimize the cost of workflow execution under deadline constraints. We present results obtained using our model and the benchmark workflows representing real scientific applications in a variety of domains. The data used for evaluation come from synthetic workflows and from general-purpose cloud benchmarks, as well as from data measured in our own experiments with Montage, an astronomical application, executed on the Amazon EC2 cloud. We indicate how this model can be used for scenarios that require resource planning for scientific workflows and their ensembles.

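The paper solves its model as a mathematical program in AMPL and CMPL. The sketch below is a much smaller, brute-force stand-in, assuming a single level of identical tasks that runs on one instance type, tasks spread as evenly as possible over the instances, and hourly billing that charges every started hour in full. The instance names, prices, and runtimes are made up for illustration.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class InstanceType:
    name: str
    price_per_hour: float   # hypothetical price
    task_hours: float       # runtime of one task of this level on this type
    max_instances: int      # per-cloud instance limit

def cheapest_plan(n_tasks, deadline_h, types):
    """Return (cost, type name, instance count) of the cheapest feasible plan, or None."""
    best = None
    for it in types:
        for k in range(1, min(it.max_instances, n_tasks) + 1):
            # Identical tasks: each instance runs at most ceil(n/k) of them back to back.
            makespan = math.ceil(n_tasks / k) * it.task_hours
            if makespan > deadline_h:
                continue  # violates the deadline constraint
            # Hourly billing: every started hour is paid in full on every instance.
            cost = k * math.ceil(makespan) * it.price_per_hour
            if best is None or cost < best[0]:
                best = (cost, it.name, k)
    return best

if __name__ == "__main__":
    types = [InstanceType("small", 0.1, 1.0, 10), InstanceType("large", 0.3, 0.25, 5)]
    print(cheapest_plan(8, 2.0, types))  # (0.6, 'large', 1)
```

Exhaustive search is only practical for toy sizes like this one; with many levels, clouds, and instance types the search space explodes, which is where a solver-based formulation pays off.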

60 citations

Journal Article

Research of improved FP-Growth algorithm in association rules mining

Yi Zeng¹, Shiqun Yin¹, Jiangyue Liu¹, Miao Zhang¹

Southwest University¹

01 Jan 2015-Scientific Programming

TL;DR: Two improved versions of the FP-Growth algorithm are worked out: the Painting-Growth algorithm and the N (not) Painting-Growth algorithm, which removes the painting steps and achieves the same result in another way.

Abstract: Association rules mining is an important technology in data mining. The FP-Growth (frequent-pattern growth) algorithm is a classical algorithm in association rules mining. However, FP-Growth needs to scan the database twice, which reduces the efficiency of the algorithm. Through the study of association rules mining and the FP-Growth algorithm, we worked out two improved versions of FP-Growth: the Painting-Growth algorithm and the N (not) Painting-Growth algorithm (which removes the painting steps and achieves the same result in another way). We compared the two improved algorithms with the FP-Growth algorithm. Experimental results show that when the data volume is more than 1050 for Painting-Growth and less than 10000 for N Painting-Growth, the performance of the two improved algorithms is better than that of FP-Growth.

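The abstract's complaint is that FP-Growth scans the database twice. For reference, a minimal sketch of the classic FP-tree construction shows where those two scans come from: the first counts item supports, and the second inserts each transaction's frequent items in descending-support order so that shared prefixes share tree nodes. The Painting-Growth variants are not described in enough detail here to reproduce, so this is the baseline only; the header table and the recursive mining step are omitted, and all names are illustrative.

```python
from collections import Counter

class Node:
    """FP-tree node: an item, how many transactions pass through it, and its children."""
    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}

def build_fp_tree(transactions, min_support):
    # Scan 1: count the support of every single item.
    support = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in support.items() if c >= min_support}
    root = Node(None, None)
    # Scan 2: insert each transaction's frequent items, most frequent first
    # (ties broken alphabetically), so common prefixes share a path.
    for t in transactions:
        items = sorted((i for i in set(t) if i in frequent),
                       key=lambda i: (-frequent[i], i))
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = node.children[item] = Node(item, node)
            child.count += 1
            node = child
    return root, frequent
```

For the four transactions {a, b}, {b, c, d}, {a, b, c}, {a, d} with minimum support 2, three of them start with `a`, so the root's `a` child ends up with count 3 and a single shared path serves all of them.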

44 citations

Journal Article

Towards Reproducibility in Scientific Workflows: An Infrastructure-Based Approach

Idafen Santana-Perez¹, María S. Pérez-Hernández

Technical University of Madrid¹

24 Feb 2015-Scientific Programming

TL;DR: A new approach is proposed that addresses the third cornerstone of experimental reproducibility: the equipment of a computational experiment, that is, the set of software and hardware components that are involved in the execution of a scientific workflow.

Abstract: It is commonly agreed that in silico scientific experiments should be executable and repeatable processes. Most of the current approaches for computational experiment conservation and reproducibility have so far focused on two of the main components of the experiment, namely, data and method. In this paper, we propose a new approach that addresses the third cornerstone of experimental reproducibility: the equipment. This work focuses on the equipment of a computational experiment, that is, the set of software and hardware components that are involved in the execution of a scientific workflow. In order to demonstrate the feasibility of our proposal, we describe a use case scenario in the Text Analytics domain and the application of our approach to it. From the original workflow, we document its execution environment, by means of a set of semantic models and a catalogue of resources, and generate an equivalent infrastructure for reexecuting it.

40 citations

Journal Article

HPC programming on Intel many-integrated-core hardware with MAGMA port to Xeon Phi

Jack Dongarra¹, Mark Gates¹, Azzam Haidar¹, Yulu Jia¹, Khairul Kabir¹, Piotr Luszczek¹, Stanimire Tomov¹

University of Tennessee¹

01 Jan 2015-Scientific Programming

TL;DR: The design and implementation of several fundamental dense linear algebra algorithms for multicore with Intel Xeon Phi coprocessors are presented and their methodology and programming techniques are incorporated into the MAGMA MIC API, which abstracts the application developer from the specifics of the Xeon Phi architecture and is therefore applicable to algorithms beyond the scope of DLA.

Abstract: This paper presents the design and implementation of several fundamental dense linear algebra (DLA) algorithms for multicore with Intel Xeon Phi coprocessors. In particular, we consider algorithms for solving linear systems. Further, we give an overview of the MAGMA MIC library, an open-source, high-performance library that incorporates the developments presented here and, more broadly, provides the DLA functionality equivalent to that of the popular LAPACK library while targeting heterogeneous architectures that feature a mix of multicore CPUs and coprocessors. The LAPACK compliance simplifies the use of the MAGMA MIC library in applications, while providing them with portably performant DLA. High performance is obtained through the use of high-performance BLAS, hardware-specific tuning, and a hybridization methodology whereby we split the algorithm into computational tasks of various granularities. Execution of those tasks is properly scheduled over the heterogeneous hardware by minimizing data movements and mapping algorithmic requirements to the architectural strengths of the various heterogeneous hardware components. Our methodology and programming techniques are incorporated into the MAGMA MIC API, which abstracts the application developer from the specifics of the Xeon Phi architecture and is therefore applicable to algorithms beyond the scope of DLA.

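The hybridization idea, splitting an algorithm into tasks of different granularity and placing each on the component that suits it, can be illustrated with a blocked right-looking Cholesky factorization. In this pure-Python sketch, the small diagonal-block factorization is tagged for the CPU and the panel solve and trailing update are tagged for the coprocessor; that split is an assumption made for illustration, no real device is involved, and this is neither MAGMA MIC code nor its API.

```python
import math

def hybrid_blocked_cholesky(A, block):
    """Right-looking blocked Cholesky (A = L L^T) of a symmetric positive definite matrix.

    Returns the lower-triangular factor L and a log recording which (hypothetical)
    device each task would run on in a CPU + coprocessor split.
    """
    n = len(A)
    L = [[float(x) for x in row] for row in A]  # work on a copy; only the lower triangle is used
    tasks = []
    for lo in range(0, n, block):
        hi = min(lo + block, n)
        # POTRF: small, latency-bound factorization of the diagonal block -> CPU.
        tasks.append(("potrf", lo, "cpu"))
        for j in range(lo, hi):
            L[j][j] = math.sqrt(L[j][j] - sum(L[j][k] ** 2 for k in range(lo, j)))
            for i in range(j + 1, hi):
                L[i][j] = (L[i][j] - sum(L[i][k] * L[j][k] for k in range(lo, j))) / L[j][j]
        if hi == n:
            break
        # TRSM: solve for the panel below the diagonal block -> coprocessor.
        tasks.append(("trsm", lo, "coprocessor"))
        for i in range(hi, n):
            for j in range(lo, hi):
                L[i][j] = (L[i][j] - sum(L[i][k] * L[j][k] for k in range(lo, j))) / L[j][j]
        # SYRK/GEMM: large, compute-bound trailing-matrix update -> coprocessor.
        tasks.append(("update", lo, "coprocessor"))
        for i in range(hi, n):
            for j in range(hi, i + 1):
                L[i][j] -= sum(L[i][k] * L[j][k] for k in range(lo, hi))
    # The strict upper triangle still holds input values; clear it.
    for i in range(n):
        for j in range(i + 1, n):
            L[i][j] = 0.0
    return L, tasks
```

The task log makes the granularity argument visible: as the matrix grows, the trailing updates dominate the flop count, which is why they are the natural candidates for the throughput-oriented device.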

38 citations

Journal Article

Adaptation of MPDATA heterogeneous stencil computation to Intel Xeon Phi coprocessor

Lukasz Szustak¹, Krzysztof Rojek¹, Tomasz Olas¹, Lukasz Kuczynski¹, Kamil Halbiniak¹, Pawel Gepner²

Częstochowa University of Technology¹, Intel²

01 Jan 2015-Scientific Programming

TL;DR: This work outlines an approach to adaptation of the 3D MPDATA algorithm to the Intel MIC architecture, and proposes the (3 + 1)D decomposition of MPDATA heterogeneous stencil computations to ease memory/communication bounds and better exploit the theoretical floating point efficiency of target computing platforms.

Abstract: The multidimensional positive definite advection transport algorithm (MPDATA) belongs to the group of nonoscillatory forward-in-time algorithms and performs a sequence of stencil computations. MPDATA is one of the major parts of the dynamic core of the EULAG geophysical model. In this work, we outline an approach to adapting the 3D MPDATA algorithm to the Intel MIC architecture. In order to utilize available computing resources, we propose the (3 + 1)D decomposition of MPDATA heterogeneous stencil computations. This approach is based on a combination of the loop tiling and fusion techniques. It allows us to ease memory/communication bounds and better exploit the theoretical floating point efficiency of target computing platforms. An important method of improving the efficiency of the (3 + 1)D decomposition is partitioning the available cores/threads into work teams, which reduces inter-cache communication overheads. This method also increases opportunities for the efficient distribution of MPDATA computation onto available resources of the Intel MIC architecture, as well as Intel CPUs. We discuss preliminary performance results obtained on two hybrid platforms, each containing two CPUs and an Intel Xeon Phi. The top-of-the-line Intel Xeon Phi 7120P gives the best performance results and executes MPDATA almost 2 times faster than two Intel Xeon E5-2697v2 CPUs.

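Loop tiling combined with fusion, the basis of the (3 + 1)D decomposition, can be shown on a toy 1D two-stage stencil (assumed here for illustration; MPDATA's real stencils are 3D and more numerous). The unfused version materialises a full-size intermediate array between the stages; the fused, tiled version computes the first stage only for one tile plus a one-element halo and consumes it immediately, so the intermediate stays small enough to remain in cache.

```python
def unfused(u):
    """Two stencil stages, each swept over the whole array."""
    n = len(u)
    # Stage 1: face values between neighbours (a full-size temporary).
    flux = [0.5 * (u[i] + u[i + 1]) for i in range(n - 1)]
    # Stage 2: difference of neighbouring face values, for interior points 1..n-2.
    return [flux[i] - flux[i - 1] for i in range(1, n - 1)]

def fused_tiled(u, tile):
    """Same result, but both stages run tile by tile on a small local buffer."""
    n = len(u)
    out = []
    for start in range(1, n - 1, tile):
        stop = min(start + tile, n - 1)
        # Stage 1 for this tile plus a one-element halo on the left (recomputed per tile).
        flux = [0.5 * (u[i] + u[i + 1]) for i in range(start - 1, stop)]
        # Stage 2 consumes the buffer immediately, while it is still cache-resident.
        out.extend(flux[k] - flux[k - 1] for k in range(1, len(flux)))
    return out
```

The price of fusion is redundant work: each tile recomputes the halo value its neighbour already produced. Balancing that recomputation against the memory traffic saved is the kind of trade-off a tiling-plus-fusion decomposition has to make.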

37 citations

Journal Article

Feature reduction based on genetic algorithm and hybrid model for opinion mining

P. Kalaivani¹, K. L. Shunmuganathan²

St. Joseph's College of Engineering¹, RMK Engineering College²

01 Jan 2015-Scientific Programming

TL;DR: The proposed improved KNN algorithm uses optimized feature selection: a genetic algorithm that incorporates information gain for feature reduction, combined with a bagging technique that improves the accuracy of sentiment classification.

Abstract: With the rapid growth of websites and web forums, a large number of product reviews is available on these sites. An opinion mining system is needed to help people evaluate the emotions, opinions, attitudes, and behavior of others, which are used to make decisions based on user preferences. In this paper, we propose an optimized feature reduction that incorporates an ensemble of machine learning approaches and uses information gain and a genetic algorithm as feature reduction techniques. We conducted comparative experiments on a multidomain review dataset and a movie review dataset in opinion mining. The effectiveness of the single classifiers (Naive Bayes, logistic regression, and support vector machine) and of the ensemble technique for opinion mining is compared on five datasets. The proposed hybrid method is evaluated, and experimental results show that information gain and the genetic algorithm with the ensemble technique perform better in terms of various measures for multidomain reviews and movie reviews. Classification algorithms are evaluated using McNemar's test to compare the level of significance of the classifiers.

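Information gain, one of the two feature reduction techniques named in the abstract, scores a term by how much knowing whether it occurs reduces the entropy of the sentiment label. The sketch below assumes binary presence features over documents represented as word sets; the genetic algorithm stage and the classifier ensemble are not shown, and the function names are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(docs, labels, word):
    """IG of the binary feature 'word occurs in the document' w.r.t. the label."""
    present = [y for d, y in zip(docs, labels) if word in d]
    absent = [y for d, y in zip(docs, labels) if word not in d]
    # Expected label entropy after splitting the documents on the feature.
    conditional = sum(len(part) / len(labels) * entropy(part)
                      for part in (present, absent) if part)
    return entropy(labels) - conditional

def top_k_features(docs, labels, k):
    """Keep the k terms with the highest information gain (ties broken alphabetically)."""
    vocab = set().union(*docs)
    return sorted(vocab, key=lambda w: (-information_gain(docs, labels, w), w))[:k]
```

On four toy reviews where "great" appears only in positive ones and "boring" only in negative ones, both words get the maximal gain of 1 bit, while a word like "plot" that appears equally in both classes gets 0 and would be dropped.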

30 citations

Journal Article

OpenCL performance evaluation on modern multicore CPUs

Joo Hwan Lee¹, Nimit Nigania¹, Hyesoon Kim¹, Kaushik Patel¹, Hyojong Kim¹

Georgia Institute of Technology¹

01 Jan 2015-Scientific Programming

TL;DR: This paper evaluates the performance of OpenCL programs on out-of-order multicore CPUs from the architectural perspective, comparing OpenCL to conventional parallel programming models for CPUs.

Abstract: Utilizing heterogeneous platforms for computation has become a general trend, making the portability issue important. OpenCL (Open Computing Language) serves this purpose by enabling portable execution on heterogeneous architectures. However, unpredictable performance variation on different platforms has become a burden for programmers who write OpenCL applications. This is especially true for conventional multicore CPUs, since the performance of general OpenCL applications on CPUs lags behind the performance of their counterparts written in the conventional parallel programming model for CPUs. In this paper, we evaluate the performance of OpenCL applications on out-of-order multicore CPUs from the architectural perspective. We evaluate OpenCL applications on various aspects, including API overhead, scheduling overhead, instruction-level parallelism, address space, data location, data locality, and vectorization, comparing OpenCL to conventional parallel programming models for CPUs. Our evaluation indicates unique performance characteristics of OpenCL applications and also provides insight into the optimization metrics for better performance on CPUs.

26 citations

Journal Article

Locality-aware task scheduling and data distribution for OpenMP programs on NUMA systems and manycore processors

Ananya Muddukrishna¹, Peter A. Jonsson, Mats Brorsson¹

Royal Institute of Technology¹

01 Jan 2015-Scientific Programming

TL;DR: The technique relieves the programmer from thinking about NUMA system/manycore processor architecture details by delegating data distribution to the runtime system, and uses task data dependence information to guide the scheduling of OpenMP tasks to reduce data stall times.

Abstract: Performance degradation due to nonuniform data access latencies has worsened on NUMA systems and can now be felt on-chip in manycore processors. Distributing data across NUMA nodes and manycore processor caches is necessary to reduce the impact of nonuniform latencies. However, techniques for distributing data are error-prone and fragile and require low-level architectural knowledge. Existing task scheduling policies favor quick load-balancing at the expense of locality and ignore NUMA node/manycore cache access latencies while scheduling. Locality-aware scheduling, in conjunction with or as a replacement for existing scheduling, is necessary to minimize NUMA effects and sustain performance. We present a data distribution and locality-aware scheduling technique for task-based OpenMP programs executing on NUMA systems and manycore processors. Our technique relieves the programmer from thinking about NUMA system/manycore processor architecture details by delegating data distribution to the runtime system, and uses task data dependence information to guide the scheduling of OpenMP tasks to reduce data stall times. We demonstrate our technique on a four-socket AMD Opteron machine with eight NUMA nodes and on the TILEPro64 processor, and find that data distribution and locality-aware task scheduling improve performance by up to 69% for scientific benchmarks compared to default policies, while providing an architecture-oblivious approach for programmers.

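A greedy rule conveys the flavour of locality-aware scheduling: place each task on the NUMA node that already holds most of the bytes it will touch, breaking ties toward the least-loaded node. This sketch assumes the runtime already knows each task's data accesses and each data block's home node; it illustrates the idea and is not the scheduling policy implemented in the paper.

```python
from collections import defaultdict

def schedule_by_locality(tasks, home_node, n_nodes):
    """Assign each task to the NUMA node that owns most of the bytes it accesses.

    tasks:     {task name: {data block: bytes accessed}}  (assumed known to the runtime)
    home_node: {data block: NUMA node the data was distributed to}
    Ties, including tasks with no recorded accesses, go to the least-loaded node.
    """
    load = [0] * n_nodes
    placement = {}
    for task, accesses in tasks.items():
        local_bytes = defaultdict(int)
        for block, nbytes in accesses.items():
            local_bytes[home_node[block]] += nbytes
        # Prefer more local bytes first, then fewer tasks already placed on the node.
        best = max(range(n_nodes), key=lambda node: (local_bytes[node], -load[node]))
        placement[task] = best
        load[best] += 1
    return placement
```

A task reading 4 KiB on node 0 and 1 KiB on node 1 lands on node 0; a pure load-balancing policy would ignore those byte counts entirely, which is the behaviour the abstract argues against.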

25 citations

Journal Article

Effective SIMD vectorization for Intel Xeon Phi coprocessors

Xinmin Tian¹, Hideki Saito¹, Serguei V. Preis¹, Eric N. Garcia¹, Sergey S. Kozhukhov¹, Matt Masten¹, Aleksei G. Cherkasov¹, Nikolay Panchenko¹

Intel¹

01 Jan 2015-Scientific Programming

TL;DR: Several effective SIMD vectorization techniques, such as less-than-full-vector loop vectorization, Intel MIC specific alignment optimization, and small matrix transpose/multiplication 2D vectorization, implemented in the Intel C/C++ and Fortran production compilers for Intel Xeon Phi coprocessors, are presented.

Abstract: Efficiently exploiting SIMD vector units is one of the most important aspects of achieving high performance for application code running on Intel Xeon Phi coprocessors. In this paper, we present several effective SIMD vectorization techniques, such as less-than-full-vector loop vectorization, Intel MIC specific alignment optimization, and small matrix transpose/multiplication 2D vectorization, implemented in the Intel C/C++ and Fortran production compilers for Intel Xeon Phi coprocessors. A set of workloads from several application domains is employed to conduct the performance study of our SIMD vectorization techniques. The performance results show that we achieved up to a 12.5x performance gain on the Intel Xeon Phi coprocessor. We also demonstrate a 2000x performance speedup from the seamless integration of SIMD vectorization and parallelization.

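Loops shorter than a full vector, or the tail of a longer loop, can be handled with a lane mask instead of a scalar remainder loop. Python cannot express SIMD instructions, so the sketch below only models the masking logic for a SAXPY-style loop, assuming 16 single-precision lanes per 512-bit vector; it does not reflect how the Intel compilers actually generate code.

```python
VECTOR_WIDTH = 16  # 512-bit vectors hold 16 single-precision lanes

def saxpy_masked(a, x, y):
    """Compute a*x[i] + y[i] in vector-width chunks; returns (result, chunks executed).

    The final partial chunk runs as one masked "vector" step instead of a scalar
    remainder loop: lanes past the end of the data are simply switched off.
    """
    n = len(x)
    out = list(y)
    chunks = 0
    for base in range(0, n, VECTOR_WIDTH):
        # Write mask: lane i is active only if base + i is a valid index.
        mask = [base + lane < n for lane in range(VECTOR_WIDTH)]
        for lane, active in enumerate(mask):
            if active:
                i = base + lane
                out[i] = a * x[i] + out[i]
        chunks += 1
    return out, chunks
```

With 35 elements this takes three chunks (16 + 16 + a masked 3), and a loop with only five iterations still runs as a single masked vector step, which is the situation a less-than-full-vector technique targets.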

24 citations

Journal Article

Extracting UML class diagrams from object-oriented Fortran: ForUML

Aziz Nanthaamornphong¹, Jeffrey C. Carver², Karla Morris³, Salvatore Filippone⁴

Prince of Songkla University¹, University of Alabama², Sandia National Laboratories³, University of Rome Tor Vergata⁴

01 Jan 2015-Scientific Programming

TL;DR: A software tool is proposed to extract unified modeling language (UML) class diagrams from Fortran code; the diagrams facilitate the developers' ability to examine the entities and their relationships in the software system.

Abstract: Many scientists who implement computational science and engineering software have adopted the object-oriented (OO) Fortran paradigm. One of the challenges faced by OO Fortran developers is the inability to obtain high-level software design descriptions of existing applications. Knowledge of the overall software design is not only valuable in the absence of documentation; it can also assist developers with different tasks during the software development process, especially maintenance and refactoring. The software engineering community commonly uses reverse engineering techniques to deal with this challenge. A number of reverse engineering-based tools have been proposed, but few of them can be applied to OO Fortran applications. In this paper, we propose a software tool to extract unified modeling language (UML) class diagrams from Fortran code. The UML class diagram facilitates the developers' ability to examine the entities and their relationships in the software system. The extracted diagrams enhance software maintenance and evolution. The experiments carried out to evaluate the proposed tool show its accuracy and a few of its limitations.

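To give a concrete sense of what extracting class diagrams from OO Fortran involves, here is a deliberately naive, regex-based sketch that finds derived types, their `extends` parents, and simple data components, then emits PlantUML class-diagram text. It is not ForUML's implementation: it ignores type-bound procedures, continuation lines, `!` inside strings, and many declaration forms that a real Fortran parser must handle. All names are hypothetical.

```python
import re

# "type :: name" or "type, <attributes> :: name", but not a component like "type(point) :: p".
TYPE_RE = re.compile(r"^\s*type\s*(?P<attrs>(?:,[^:]*)?)::\s*(?P<name>\w+)", re.I)
EXTENDS_RE = re.compile(r"extends\s*\(\s*(?P<parent>\w+)\s*\)", re.I)
END_TYPE_RE = re.compile(r"^\s*end\s*type\b", re.I)
COMPONENT_RE = re.compile(
    r"^\s*(?:real|integer|logical|character|complex|type\s*\(\s*\w+\s*\)|class\s*\(\s*\w+\s*\))"
    r"[^:]*::\s*(?P<names>.+)$", re.I)

def extract_classes(source):
    """Return {type name: {"parent": name or None, "attributes": [component names]}}."""
    classes = {}
    current = None
    for line in source.splitlines():
        line = line.split("!", 1)[0]  # drop comments (naive: ignores '!' inside strings)
        if current and END_TYPE_RE.match(line):
            current = None
        elif m := TYPE_RE.match(line):
            parent = EXTENDS_RE.search(m.group("attrs"))
            current = m.group("name")
            classes[current] = {"parent": parent.group("parent") if parent else None,
                                "attributes": []}
        elif current and (m := COMPONENT_RE.match(line)):
            # Remove array bounds like "(3,3)" so commas inside them do not split names.
            names = re.sub(r"\([^()]*\)", "", m.group("names"))
            classes[current]["attributes"] += re.findall(r"(?:^|,)\s*(\w+)", names)
    return classes

def to_plantuml(classes):
    """Render the extracted types as PlantUML class-diagram source."""
    lines = ["@startuml"]
    for name, info in classes.items():
        lines.append(f"class {name} {{")
        lines += [f"  {attr}" for attr in info["attributes"]]
        lines.append("}")
        if info["parent"]:
            lines.append(f"{info['parent']} <|-- {name}")
    lines.append("@enduml")
    return "\n".join(lines)
```

Inheritance via `extends` maps directly onto a UML generalization arrow, which is why even this crude pass recovers a useful skeleton; associations, visibility, and type-bound procedures are where a real tool has to do much more work.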

Journal Article

A community-based approach for link prediction in signed social networks

Saeed Reza Shahriary¹, Mohsen Shahriari², Rafidah Md Noor¹

Information Technology University¹, RWTH Aachen University²

01 Jan 2015-Scientific Programming

TL;DR: Community detection, which is another active area of research in social networks, is investigated in this paper, and the results indicate that, in some cases, the proposed ranking algorithms outperform previous works in terms of prediction accuracy.

Abstract: In signed social networks, relationships among nodes are either positive (friendship) or negative (hostility). One absorbing issue in signed social networks is predicting the sign of edges among people who are members of these networks. Other than edge sign prediction, one can define the importance of people or nodes in networks via ranking algorithms. Few ranking algorithms exist for signed graphs, and few studies have shown the role of ranking in the link prediction problem. Hence, we were motivated to investigate ranking algorithms available for signed graphs and their effect on the sign prediction problem. This paper contributes the use of a community detection approach for ranking algorithms in signed graphs. Therefore, community detection, which is another active area of research in social networks, is also investigated in this paper. Community detection algorithms try to find groups of nodes that share common properties such as similarity. We devised three community-based ranking algorithms that are suitable for signed graphs, and we evaluated these ranking algorithms via the sign prediction problem. These ranking algorithms were tested on three large-scale datasets: Epinions, Slashdot, and Wikipedia. We show that, in some cases, these ranking algorithms outperform previous works in terms of prediction accuracy.

Journal Article

Quasi-Optimal elimination trees for 2D grids with singularities

Anna Paszyńska¹, Maciej Paszyński², Konrad Jopek², Maciej Woźniak², Damian Goik², Piotr Gurgul², Hassan AbouEisha³, Mikhail Moshkov³, Victor M. Calo³, Andrew Lenharth⁴, Donald Nguyen⁴, Keshav Pingali⁴

Jagiellonian University¹, AGH University of Science and Technology², King Abdullah University of Science and Technology³, University of Texas at Austin⁴

01 Jan 2015-Scientific Programming

TL;DR: This work proposes a heuristic construction of the elimination trees that has cost O(Ne log(Ne)), where Ne is the number of elements in the mesh, and shows that this heuristic ordering has similar computational cost to the quasi-optimal elimination trees found with dynamic programming and outperforms state-of-the-art alternatives in the authors' numerical experiments.

Abstract: We construct quasi-optimal elimination trees for 2D finite element meshes with singularities. These trees minimize the complexity of the solution of the discrete system. The computational cost estimates of the elimination process model the execution of the multifrontal algorithms in serial and in parallel shared-memory executions. Since the meshes considered are a subspace of all possible mesh partitions, we call these minimizers quasi-optimal. We minimize the cost functionals using dynamic programming. Finding these minimizers is more computationally expensive than solving the original algebraic system. Nevertheless, from the insights provided by the analysis of the dynamic programming minima, we propose a heuristic construction of the elimination trees that has cost O(Ne log(Ne)), where Ne is the number of elements in the mesh. We show that this heuristic ordering has similar computational cost to the quasi-optimal elimination trees found with dynamic programming and outperforms state-of-the-art alternatives in our numerical experiments.

Journal Article

Parallel framework for dimensionality reduction of large-scale datasets

Sai Kiranmayee Samudrala¹, Jaroslaw Zola², Srinivas Aluru¹, Baskar Ganapathysubramanian³

Georgia Institute of Technology¹, University at Buffalo², Iowa State University³

01 Jan 2015-Scientific Programming

TL;DR: This paper identifies key components underlying the spectral dimensionality reduction techniques, and proposes their efficient parallel implementation, and shows that the resulting framework can be used to process datasets consisting of millions of points when executed on a 16,000-core cluster, which is beyond the reach of currently available methods.

Abstract: Dimensionality reduction refers to a set of mathematical techniques used to reduce the complexity of the original high-dimensional data while preserving its selected properties. Improvements in simulation strategies and experimental data collection methods are resulting in a deluge of heterogeneous and high-dimensional data, which often makes dimensionality reduction the only viable way to gain qualitative and quantitative understanding of the data. However, existing dimensionality reduction software often does not scale to datasets arising in real-life applications, which may consist of thousands of points with millions of dimensions. In this paper, we propose a parallel framework for dimensionality reduction of large-scale data. We identify key components underlying the spectral dimensionality reduction techniques and propose their efficient parallel implementation. We show that the resulting framework can be used to process datasets consisting of millions of points when executed on a 16,000-core cluster, which is beyond the reach of currently available methods. To further demonstrate the applicability of our framework, we perform dimensionality reduction of 75,000 images representing morphology evolution during manufacturing of organic solar cells in order to identify how processing parameters affect morphology evolution.

Journal Article

Fast parallel all-subgraph enumeration using multicore machines

Saeed Shahrivari¹, Saeed Jalili¹

Tarbiat Modares University¹

01 Jan 2015-Scientific Programming

TL;DR: Compared to the available solutions, Subenum can enumerate subgraphs several orders of magnitude faster and the experimental results show that the performance of Subenum scales almost linearly by using additional processor cores.

Abstract: Enumerating all subgraphs of an input graph is an important task for analyzing complex networks. Valuable information can be extracted about the characteristics of the input graph using all-subgraph enumeration. Nevertheless, the number of subgraphs grows exponentially with the size of the input graph or with the size of the subgraphs to be enumerated. Hence, all-subgraph enumeration is very time consuming when the subgraphs or the input graph are large. We propose a parallel solution named Subenum which, in contrast to available solutions, can perform much faster. Subenum enumerates subgraphs using edges instead of vertices, and this approach leads to a parallel and load-balanced enumeration algorithm that can execute efficiently on current multicore and multiprocessor machines. Also, Subenum uses a fast heuristic which can effectively accelerate nonisomorphic subgraph enumeration. Subenum can efficiently use external memory, and unlike other subgraph enumeration methods, it is not bound by the main memory limits of the machine used. Hence, Subenum can handle large input graphs and subgraph sizes that other solutions cannot handle. Several experiments were performed using real-world input graphs. Compared to the available solutions, Subenum can enumerate subgraphs several orders of magnitude faster, and the experimental results show that the performance of Subenum scales almost linearly with additional processor cores.

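Enumerating by edges rather than vertices can be sketched by running the well-known ESU enumeration scheme on the line graph, where each edge of the input becomes a node and two edges are adjacent when they share an endpoint. Every connected k-edge subgraph is then produced exactly once, from its smallest edge index, and each root edge is an independent work item, which is what makes this style of enumeration easy to split across cores. This is an assumption-laden illustration, not Subenum's algorithm: it has no isomorphism classification, no external-memory support, and runs sequentially.

```python
from collections import defaultdict

def edge_adjacency(edges):
    """Line graph: edge indices are adjacent when the edges share an endpoint."""
    by_vertex = defaultdict(set)
    for idx, (u, v) in enumerate(edges):
        by_vertex[u].add(idx)
        by_vertex[v].add(idx)
    adj = defaultdict(set)
    for incident in by_vertex.values():
        for e in incident:
            adj[e] |= incident - {e}
    return adj

def subgraphs_from_root(root, k, adj):
    """Connected k-edge subgraphs whose smallest edge index is `root` (ESU on the line graph)."""
    found = []

    def extend(sub, ext, closed_nbhd):
        if len(sub) == k:
            found.append(frozenset(sub))
            return
        ext = set(ext)
        while ext:
            w = ext.pop()
            # Exclusive neighbours of w: larger than the root, not in or adjacent to `sub`.
            new = {x for x in adj[w] if x > root and x not in closed_nbhd}
            extend(sub | {w}, ext | new, closed_nbhd | adj[w])

    extend({root}, {x for x in adj[root] if x > root}, adj[root] | {root})
    return found

def enumerate_edge_subgraphs(edges, k):
    adj = edge_adjacency(edges)
    result = []
    # Each root is an independent work item; a parallel version would farm these out to cores.
    for root in range(len(edges)):
        result.extend(subgraphs_from_root(root, k, adj))
    return result
```

On a triangle, every pair of edges shares a vertex, so there are three connected 2-edge subgraphs and one 3-edge subgraph; on a 3-edge path only the two adjacent pairs qualify.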

Journal Article

Multi-GPU support on single node using directive-based programming model

Rengan Xu¹, Xiaonan Tian¹, Sunita Chandrasekaran¹, Barbara Chapman¹

University of Houston¹

01 Jan 2015-Scientific Programming

TL;DR: This work critically analyzes the applicability of the hybrid model approach, evaluates the proposed strategy using several case studies, and demonstrates its effectiveness.

Abstract: Existing studies show that using a single GPU can lead to significant performance gains. We should be able to achieve further speedup if we use more than one GPU. Heterogeneous processors consisting of multiple CPUs and GPUs offer immense potential and are often considered a leading candidate for porting complex scientific applications. Unfortunately, programming heterogeneous systems requires more effort than programming traditional multicore systems. Directive-based programming approaches are being widely adopted since they make application code easy to use, port, and maintain. OpenMP and OpenACC are two popular models used to port applications to accelerators. However, neither of the models provides support for multiple GPUs. A plausible solution is to use a combination of OpenMP and OpenACC that forms a hybrid model; however, building this model has its own limitations due to the lack of necessary compiler support. Moreover, the model also lacks support for direct device-to-device communication. To overcome these limitations, an alternative strategy is to extend OpenACC by proposing and developing extensions that follow a task-based implementation for supporting multiple GPUs. We critically analyze the applicability of the hybrid model approach, evaluate the proposed strategy using several case studies, and demonstrate its effectiveness.

Journal Article

An evaluating method with combined assigning-weight based on maximizing variance

Liu Hong-jiu¹, Hu Yanrong¹

Changshu Institute of Technology¹

01 Jan 2015-Scientific Programming

TL;DR: Empirical study shows that the new method can lead to more reasonable weighting results and decision, and may integrate the merits of both subjective and objective weighting methods.

Abstract: This paper proposes a combined assigning-weight approach to determine attribute weights in multiattribute decision problems. The approach combines subjective and objective weights of attributes based on maximizing variance. Objective weights are determined by the rough set method and subjective weights by the Analytic Hierarchy Process. This new combination method may integrate the merits of both subjective and objective weighting methods. An empirical study shows that the new method can lead to more reasonable weighting results and decisions.

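The abstract does not spell out the optimisation, so the sketch below rests on an assumed model: combined scores are alpha*S + beta*O, where S and O are the alternatives' scores under the subjective (AHP) and objective (rough set) weights, with alpha^2 + beta^2 = 1 and alpha, beta >= 0. The pair that maximises the variance of the combined scores is found by a grid search over the quarter circle and then rescaled so that alpha + beta = 1. Both weight vectors are taken as given inputs, and the function names are hypothetical.

```python
import math
from statistics import pvariance

def scores(matrix, weights):
    """Weighted-sum score of each alternative (row) over the attributes (columns)."""
    return [sum(w * x for w, x in zip(weights, row)) for row in matrix]

def combined_weights(matrix, w_subj, w_obj, steps=1000):
    """Mix subjective and objective weights so the alternatives' scores spread out most.

    Assumed model (not taken from the paper): maximise Var(alpha*S + beta*O)
    subject to alpha^2 + beta^2 = 1, alpha, beta >= 0, then rescale to alpha + beta = 1.
    """
    s, o = scores(matrix, w_subj), scores(matrix, w_obj)
    best = None
    for i in range(steps + 1):
        theta = (math.pi / 2) * i / steps  # walk the quarter circle
        alpha, beta = math.cos(theta), math.sin(theta)
        var = pvariance([alpha * a + beta * b for a, b in zip(s, o)])
        if best is None or var > best[0]:
            best = (var, alpha, beta)
    _, alpha, beta = best
    a, b = alpha / (alpha + beta), beta / (alpha + beta)
    return [a * ws + b * wo for ws, wo in zip(w_subj, w_obj)]
```

Because scores are linear in the weights, mixing the weight vectors with (a, b) yields exactly the mixed scores, so the returned weights reproduce the variance-maximising ranking. The unit-circle constraint is what keeps the optimum from collapsing onto one weight set alone, since variance along a straight-line mix would always peak at an endpoint.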

Journal Article

High-Performance design patterns for modern Fortran

Magne Haveraaen¹, Karla Morris², Damian Rouson³, Hari Radhakrishnan, Clayton Carson³

University of Bergen¹, Sandia National Laboratories², Stanford University³

01 Jan 2015-Scientific Programming

TL;DR: Ideas for using coordinate-free numerics in modern Fortran to achieve code flexibility in the partial differential equation (PDE) domain are presented, and some programming patterns that support asynchronous evaluation of expressions composed of parallel operations on distributed data are implemented.

Abstract: This paper presents ideas for using coordinate-free numerics in modern Fortran to achieve code flexibility in the partial differential equation (PDE) domain. We also show how Fortran, over the last few decades, has changed to become a language well-suited for state-of-the-art software development. Fortran's new coarray distributed data structure, the language's class mechanism, and its side-effect-free, pure procedure capability provide the scaffolding on which we implement HPC software. These features empower compilers to organize parallel computations with efficient communication. We present some programming patterns that support asynchronous evaluation of expressions comprised of parallel operations on distributed data. We implemented these patterns using coarrays and the message passing interface (MPI). We compared the codes' complexity and performance. The MPI code is much more complex and depends on external libraries. The MPI code on Cray hardware using the Cray compiler is 1.5-2 times faster than the coarray code on the same hardware. The Intel compiler implements coarrays atop Intel's MPI library with the result apparently being 2-2.5 times slower than manually coded MPI despite exhibiting nearly linear scaling efficiency. As compilers mature and further improvements to coarrays comes in Fortran 2015, we expect this performance gap to narrow.

Journal Article•DOI•

An efficient algorithm for on-the-fly data race detection using an epoch-based technique

Ok-Kyoon Ha¹, Yong-Kee Jun¹•Institutions (1)

Gyeongsang National University¹

01 Jan 2015-Scientific Programming

TL;DR: iFT, an efficient algorithm that uses only the epochs of the access histories to detect data races during the execution of multithreaded programs, reduces the average runtime and memory overhead to 84% and 37%, respectively, of those of FastTrack.

Abstract: Data races represent the most notorious class of concurrency bugs in multithreaded programs. To detect data races precisely and efficiently during the execution of multithreaded programs, the epoch-based FastTrack technique has been employed. However, FastTrack's time and space complexities depend on the maximum parallelism of the program, because it partially maintains expensive data structures such as vector clocks. This paper presents an efficient algorithm, called iFT, that uses only the epochs of the access histories. Unlike FastTrack, our algorithm requires O(1) operations to maintain an access history and locate data races, without any switching between epochs and vector clocks. We implement this algorithm on top of the Pin binary instrumentation framework and compare it with other on-the-fly detection algorithms, including FastTrack, which uses a state-of-the-art happens-before analysis algorithm. Empirical results using the PARSEC benchmark show that iFT reduces the average runtime and memory overhead to 84% and 37%, respectively, of those of FastTrack.
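
The epoch comparison that FastTrack-style detectors rely on can be sketched as follows. This is not the iFT algorithm. It is a simplified Python illustration that assumes lock-based synchronization only. Each variable keeps just its last write epoch and last read epoch (written c@t in the FastTrack literature, here a `(thread, clock)` tuple), so every check is an O(1) comparison against the accessing thread's vector clock. Keeping a single read epoch can miss races when several threads read concurrently, which is the case FastTrack handles by falling back to a read vector clock. The class and method names are hypothetical.

```python
class EpochRaceDetector:
    """Simplified epoch-based happens-before race detector (illustrative sketch)."""

    def __init__(self, n_threads):
        # C[t] is thread t's vector clock; thread t starts at clock 1 for itself.
        self.C = [[1 if i == t else 0 for i in range(n_threads)] for t in range(n_threads)]
        self.L = {}      # lock -> copy of the releasing thread's vector clock
        self.W = {}      # variable -> (thread, clock) epoch of the last write
        self.R = {}      # variable -> (thread, clock) epoch of the last read
        self.races = []  # (kind, variable, earlier thread, current thread)

    def _epoch(self, t):
        return (t, self.C[t][t])

    def _happens_before(self, epoch, t):
        # Epoch c@u happens before thread t's current point iff c <= C_t[u].
        u, c = epoch
        return c <= self.C[t][u]

    def read(self, t, x):
        w = self.W.get(x)
        if w is not None and not self._happens_before(w, t):
            self.races.append(("write-read", x, w[0], t))
        self.R[x] = self._epoch(t)

    def write(self, t, x):
        w, r = self.W.get(x), self.R.get(x)
        if w is not None and not self._happens_before(w, t):
            self.races.append(("write-write", x, w[0], t))
        if r is not None and not self._happens_before(r, t):
            self.races.append(("read-write", x, r[0], t))
        self.W[x] = self._epoch(t)

    def release(self, t, lock):
        self.L[lock] = list(self.C[t])  # publish t's knowledge through the lock
        self.C[t][t] += 1               # start a new epoch for thread t

    def acquire(self, t, lock):
        released = self.L.get(lock)
        if released is not None:
            # Join: t now knows everything the releasing thread knew.
            self.C[t] = [max(a, b) for a, b in zip(self.C[t], released)]
```

For example, a write by thread 0, a release and acquire of the same lock, and then a write by thread 1 produce no report, whereas two unsynchronized writes to the same variable are reported as a write-write race.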

Journal Article•DOI•

Automated design space exploration with Aspen

Kyle Spafford¹, Jeffrey S. Vetter¹•Institutions (1)

Oak Ridge National Laboratory¹

01 Jan 2015-Scientific Programming

TL;DR: The Aspen (Abstract Scalable Performance Engineering Notation) language is extended with three new language constructs: user-defined resources, parameter ranges, and a collection of costs in the abstract machine model to enable automated design space exploration via a nonlinear optimization solver.

Abstract: Architects and applications scientists often use performance models to explore a multidimensional design space of architectural characteristics, algorithm designs, and application parameters. With traditional performance modeling tools, these explorations forced users to first develop a performance model and then repeatedly evaluate and analyze the model manually. These manual investigations proved laborious and error prone. More importantly, the complexity of this traditional process often forced users to simplify their investigations. To address this challenge of design space exploration, we extend our Aspen (Abstract Scalable Performance Engineering Notation) language with three new language constructs: user-defined resources, parameter ranges, and a collection of costs in the abstract machine model. Then, we use these constructs to enable automated design space exploration via a nonlinear optimization solver. We show how four interesting classes of design space exploration scenarios can be derived from Aspen models and formulated as pure nonlinear programs. The analysis tools are demonstrated using examples based on Aspen models for a three-dimensional Fast Fourier Transform, the CoMD molecular dynamics proxy application, and the DARPA Streaming Sensor Challenge Problem. Our results show that this approach can compose and solve arbitrary performance modeling questions quickly and rigorously when compared to the traditional manual approach.
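
A design space question of the kind the abstract describes (choose a parameter from a range to minimize a modeled cost) can be sketched with a toy analytic model. Everything below is an assumption for illustration: the model terms, the constants, and the use of exhaustive search over an integer range in place of a nonlinear optimization solver. None of it is Aspen syntax or Aspen output.

```python
import math


def fft_runtime(n, cores, flops_per_core=1e9, bandwidth=1e10, sync_cost=1e-4):
    """Toy runtime model (seconds) for an n-point FFT on `cores` cores.

    Hypothetical terms: compute uses the common 5*n*log2(n) flop estimate,
    memory traffic assumes one pass over 16-byte complex values, and a
    synchronization term grows with log2(cores).
    """
    compute = 5 * n * math.log2(n) / (cores * flops_per_core)
    memory = 16 * n / bandwidth
    sync = sync_cost * math.log2(cores)
    return compute + memory + sync


def best_core_count(n, core_range):
    """Search the parameter range exhaustively for the minimum predicted runtime."""
    return min(core_range, key=lambda c: fft_runtime(n, c))


best = best_core_count(2**20, range(1, 1025))
print(best, fft_runtime(2**20, best))
```

Because the compute term shrinks and the synchronization term grows with the core count, the search lands on an interior optimum rather than simply the largest core count; a real model with many interacting parameters is where a nonlinear solver pays off over enumeration.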

Journal Article•DOI•

Performance of a code migration for the simulation of supersonic ejector flow to SMP, MIC, and GPU using OpenMP, OpenMP+LEO, and OpenACC directives

Carlos Couder-Castañeda¹, Hector Barrios-Piña², Isidoro Gitler¹, M. Arroyo¹•Institutions (2)

Instituto Politécnico Nacional¹, Monterrey Institute of Technology and Higher Education²

01 Jan 2015-Scientific Programming

TL;DR: A serial source code for simulating a supersonic ejector flow is accelerated using parallelization based on OpenMP and OpenACC directives, to reduce development costs and simplify maintenance of the application given the complexity of the FORTRAN source code.

Abstract: A serial source code for simulating a supersonic ejector flow is accelerated using parallelization based on OpenMP and OpenACC directives. The purpose is to reduce development costs and simplify maintenance of the application, given the complexity of its FORTRAN source code. This research follows well-proven strategies to obtain the best performance in both OpenMP and OpenACC. OpenMP has become the programming standard for scientific multicore software, and OpenACC is a real alternative for graphics accelerators that avoids the need to program low-level kernels. The OpenMP strategies aim to reduce the number of parallel regions created, use tasks to handle boundary conditions, and apply nested control of the time loop when programming in offload mode, specifically for the Xeon Phi. In OpenACC, the strategy focuses on maintaining data regions across kernel executions. Performance and validation experiments are conducted on a 12-core Xeon CPU, a Xeon Phi 5110P, and a Tesla C2070, with the best performance obtained on the Tesla C2070. The Tesla C2070 achieved acceleration factors of 9.86X, 1.6X, and 4.5X relative to the serial version on the CPU, the 12-core Xeon CPU, and the Xeon Phi, respectively.
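
Because all three factors are measured against the same Tesla C2070 runtime, the speedups of the 12-core CPU and the Xeon Phi over the serial version can be derived by division, assuming each factor is defined as the other platform's runtime divided by the Tesla runtime. These derived values are arithmetic consequences of the reported numbers, not figures stated in the paper.

```latex
S_{A/B} = \frac{T_B}{T_A}, \qquad
S_{\text{12-core}/\text{serial}}
  = \frac{T_{\text{serial}}}{T_{\text{12-core}}}
  = \frac{T_{\text{serial}}/T_{\text{Tesla}}}{T_{\text{12-core}}/T_{\text{Tesla}}}
  = \frac{9.86}{1.6} \approx 6.2, \qquad
S_{\text{Phi}/\text{serial}} = \frac{9.86}{4.5} \approx 2.2
```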

Journal Article•DOI•

Parallelizing SLPA for scalable overlapping community detection

Konstantin Kuzmin¹, Mingming Chen¹, Boleslaw K. Szymanski²•Institutions (2)

Rensselaer Polytechnic Institute¹, Wrocław University of Technology²

01 Jan 2015-Scientific Programming

TL;DR: This paper uses the Speaker-Listener Label Propagation Algorithm (SLPA) as the basis for a parallel overlapping community detection implementation and shows that multithreading yields a significant performance gain over sequential execution while preserving the high quality of community detection.

Abstract: Communities in networks are groups of nodes whose connections to other nodes in the same community are stronger than their connections to nodes in the rest of the network. Quite often nodes participate in multiple communities; that is, communities can overlap. In this paper, we first analyze how other researchers have used high performance computing to perform efficient community detection in social, biological, and other networks. We note that detection of overlapping communities is more computationally intensive than disjoint community detection, and the former presents new challenges that algorithm designers have to face. Moreover, the running time of many existing algorithms grows superlinearly with the network size, making them unsuitable for processing large datasets. We use the Speaker-Listener Label Propagation Algorithm (SLPA) as the basis for our parallel overlapping community detection implementation. SLPA provides near linear time overlapping community detection and is well suited for parallelization. We explore the benefits of a multithreaded programming paradigm and show that it yields a significant performance gain over sequential execution while preserving the high quality of community detection. The algorithm was tested on four real-world datasets with up to 5.5 million nodes and 170 million edges. To assess the quality of community detection, at least four different metrics were used for each dataset.
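
For readers unfamiliar with SLPA, its sequential core can be sketched as follows. Every node keeps a memory of labels. In each iteration, every listener collects one label from each neighbor (sampled in proportion to that neighbor's label frequencies) and stores the most frequent one. A final threshold keeps only the labels that make up at least a fraction `threshold` of a node's memory, which is what lets a node belong to several communities. This is a single-threaded Python sketch with hypothetical names and default parameters, not the authors' multithreaded implementation.

```python
import random
from collections import Counter


def slpa(adj, iterations=20, threshold=0.1, seed=0):
    """Sequential SLPA sketch. adj maps each node to a list of its neighbours."""
    rng = random.Random(seed)
    # Each node's memory starts with its own ID as its only label.
    memory = {v: Counter({v: 1}) for v in adj}
    order = list(adj)
    for _ in range(iterations):
        rng.shuffle(order)
        for listener in order:
            if not adj[listener]:
                continue
            received = Counter()
            for speaker in adj[listener]:
                # Speaking rule: send a label with probability proportional to its frequency.
                labels = list(memory[speaker])
                weights = [memory[speaker][lab] for lab in labels]
                received[rng.choices(labels, weights=weights)[0]] += 1
            # Listening rule: keep the most frequently received label (random tie-break).
            top = max(received.values())
            winner = rng.choice(sorted(lab for lab, n in received.items() if n == top))
            memory[listener][winner] += 1
    # Post-processing: a node joins every community whose label makes up
    # at least `threshold` of its memory, so communities may overlap.
    communities = {}
    for v, mem in memory.items():
        total = sum(mem.values())
        for label, count in mem.items():
            if count / total >= threshold:
                communities.setdefault(label, set()).add(v)
    return communities
```

Labels only travel along edges, so on a graph made of two disconnected triangles every community the sketch returns stays inside one triangle; in a parallel version, the per-listener updates are the natural unit of work to distribute across threads.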

Journal Article•DOI•

From the Information Society to the Knowledge Society

K. E. Bagirova

01 Jan 2015-Scientific Programming

Journal Article•DOI•

Skillrank: towards a hybrid method to assess quality and confidence of professional skills in social networks

Jose María Alvarez-Rodríguez¹, Ricardo Colomo-Palacios², Vladimir Stantchev³•Institutions (3)

Charles III University of Madrid¹, Østfold University College², SRH University Berlin³

01 Jan 2015-Scientific Programming

TL;DR: A hybrid technique to measure the expertise of users by analyzing their profiles and activities in social networks, which takes advantage of existing data and information available on the web to produce both a ranked list of experts in a field and a confidence value for every professional skill.

Abstract: The present paper introduces a hybrid technique to measure the expertise of users by analyzing their profiles and activities in social networks. Currently, both job seekers and talent hunters are looking for new and innovative techniques to filter jobs and candidates, while candidates are trying to improve their profiles and make them more attractive. In this sense, the Skillrank approach combines existing, well-known information and expertise retrieval techniques that fit the current web and social media environment, delivering an intelligent component that integrates the user context into the analysis of skill confidence. A major outcome of this approach is that it takes advantage of existing data and information available on the web to produce both a ranked list of experts in a field and a confidence value for every professional skill. Thus, expertise and experts can be detected, verified, and ranked using a suitable trust metric. An experiment to validate the Skillrank technique based on precision and recall metrics is also presented using two different datasets: (1) an ad hoc dataset created from real data from a professional social network and (2) real data extracted from the LinkedIn API.

Journal Article•DOI•

Prefiltering strategy to improve performance of semantic Web service discovery

Samira Ghayekhloo¹, Zeki Bayram¹•Institutions (1)

Eastern Mediterranean University¹

01 Jan 2015-Scientific Programming

TL;DR: A new logical discovery framework based on semantic description of the capability of Web services and user goals using F-logic is presented, which tackles the scalability problem and improves discovery performance by adding two prefiltering stages to the discovery engine.

Abstract: Discovery of semantic Web services becomes a heavyweight task as the number of Web services or the complexity of ontologies increases. In this paper, we present a new logical discovery framework based on semantic description of the capability of Web services and user goals using F-logic. Our framework tackles the scalability problem and improves discovery performance by adding two prefiltering stages to the discovery engine. The first stage is based on ontology comparison of the user request and Web service categories. In the second stage, yet more Web services are eliminated based upon a decomposition and analysis of concept and instance attributes used in Web service capabilities and the requested capabilities of the client, resulting in a much smaller pool of Web services that need to be matched against the client request. Our prefiltering approach is evaluated using a new Web service repository, called WSMO-FL test collection. The recall rate of the filtering process is 100% by design, since no relevant Web services are ever eliminated by the two prefiltering stages, and experimental results show that the precision rate is more than 53%.
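
Why a conservative prefilter keeps recall at 100% can be seen in a small sketch. If each prefilter only tests a condition that the full matcher also requires, it can never remove a service the matcher would accept, so it affects only precision and the size of the pool sent to the expensive matcher. The data model below (a category tree, attribute sets, and a price constraint checked only by the full matcher) is hypothetical and far simpler than the paper's F-logic framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    name: str
    category: str          # hypothetical ontology category
    attributes: frozenset  # concepts/attributes used in the advertised capability
    price: int             # extra constraint checked only by the full matcher


@dataclass(frozen=True)
class Goal:
    category: str
    attributes: frozenset
    budget: int


SUBCATEGORIES = {"travel": {"flight", "hotel"}}  # hypothetical category tree


def allowed_categories(goal):
    return {goal.category} | SUBCATEGORIES.get(goal.category, set())


def category_prefilter(services, goal):
    """Stage 1 (sketch): drop services outside the goal's category subtree."""
    allowed = allowed_categories(goal)
    return [s for s in services if s.category in allowed]


def attribute_prefilter(services, goal):
    """Stage 2 (sketch): drop services lacking an attribute the goal requests."""
    return [s for s in services if goal.attributes <= s.attributes]


def full_match(s, goal):
    """Stand-in for the expensive logical matcher; it requires both prefilter conditions."""
    return (s.category in allowed_categories(goal)
            and goal.attributes <= s.attributes
            and s.price <= goal.budget)


def precision_recall(kept, relevant):
    kept, relevant = set(kept), set(relevant)
    hits = len(kept & relevant)
    precision = hits / len(kept) if kept else 1.0
    recall = hits / len(relevant) if relevant else 1.0
    return precision, recall


route = frozenset({"origin", "destination"})
services = [
    Service("A", "flight", route, 300),
    Service("B", "flight", route, 900),
    Service("C", "hotel", frozenset({"city"}), 100),
    Service("D", "banking", route, 50),
]
goal = Goal("travel", route, 500)
kept = attribute_prefilter(category_prefilter(services, goal), goal)
relevant = [s for s in services if full_match(s, goal)]
```

Here the two stages pass only services A and B to the full matcher, which accepts only A, so precision is 0.5 while recall stays at 1.0.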

Journal Article•DOI•

Performance evaluation of multithreaded Geant4 simulations using an Intel Xeon Phi cluster

Pierre Schweitzer¹, Sébastien Cipière², A. Dufaure², H. Payno², Yann Perrot², D. R. C. Hill¹, Lydia Maigne²•Institutions (2)

Centre national de la recherche scientifique¹, University of Auvergne²

01 Jan 2015-Scientific Programming

TL;DR: This study evaluates the performance of Intel Xeon Phi hardware accelerators for Geant4 simulations, especially for multithreaded applications, and presents a complete methodology to guide users in compiling their Geant4 applications for Phi processors.

Abstract: The objective of this study is to evaluate the performance of Intel Xeon Phi hardware accelerators for Geant4 simulations, especially for multithreaded applications. We present the complete methodology to guide users in compiling their Geant4 applications for Phi processors. Then, we propose a series of benchmarks to compare the performance of Xeon CPUs and Phi processors for a Geant4 example dedicated to the simulation of electron dose point kernels, the TestEm12 example. First, we compare a distributed execution of a sequential version of the Geant4 example on both architectures before evaluating the multithreaded version of the Geant4 example. Although Phi processors demonstrated their ability to accelerate computing time (by up to a factor of 3.83) when distributing sequential Geant4 simulations, we do not reach the same level of speedup with the multithreaded version of the Geant4 example.

Journal Article•DOI•

Gender relations and violence against indigenous women in Amambai, MS (2007-2013)

Ana Evanir Alves Viana¹, Tânia Regina Zimmermann¹•Institutions (1)

European Union of Medical Specialists¹

22 Jun 2015-Scientific Programming

TL;DR: In this paper, the authors conducted interviews with indigenous women in the municipality of Amambai, MS from 2007 to 2013 to understand some forms of violence perpetrated against women in indigenous villages.

Abstract: This research aims to understand some forms of violence perpetrated against women in indigenous villages in the municipality of Amambai, MS, from 2007 to 2013. Regarding gender, indigenous women are among those most severely affected by multiple forms of violence and remain viewed from a perspective of victimization. Interviews were conducted with indigenous women who were victims of domestic violence as well as with people involved in these conflict situations (the captain, the manager of Amambai's public office for women, and researchers of the subject). Academic research is therefore important to expose the silence in which these women live and to give visibility to the forms of domestic violence perpetrated against them: research from a gender perspective that takes into account these indigenous women's own perspectives on resolving this violation.

Journal Article•DOI•

Generating multibillion element unstructured meshes on distributed memory parallel machines

Seren Soner¹, Can Özturan¹•Institutions (1)

Boğaziçi University¹

01 Jan 2015-Scientific Programming

TL;DR: Test results obtained on an SGI Altix ICE X system with 8192 cores confirm that the approach does indeed enable the mesh generator to generate multibillion element meshes in a scalable way.

Abstract: We present a parallel mesh generator called PMSH that is developed as a wrapper code around the open source sequential Netgen mesh generator. Parallelization of the mesh generator is carried out in five stages: (i) generation of a coarse volume mesh; (ii) partitioning of the coarse mesh; (iii) refinement of the coarse surface mesh to produce fine surface submeshes; (iv) remeshing of each fine surface submesh to get a final fine mesh; (v) matching of partition boundary vertices followed by global vertex numbering. A new integer-based barycentric coordinate method is developed for matching distributed partition boundary vertices. This method avoids the precision-related problems of vertex matching based on floating-point coordinates. Test results obtained on an SGI Altix ICE X system with 8192 cores confirm that our approach does indeed enable us to generate multibillion element meshes in a scalable way.
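
The general idea behind integer barycentric matching can be illustrated under simplifying assumptions. Suppose each coarse surface triangle is refined uniformly `level` times, so every fine vertex has integer barycentric weights summing to 2**level with respect to the triangle's three coarse vertices, which carry global IDs. Keying a vertex by its sorted (global ID, weight) pairs, with zero weights dropped, gives the same exact integer key on every process that shares the coarse edge or vertex, with no floating-point tolerance involved. The function name and key layout are hypothetical; the abstract does not describe PMSH's actual encoding.

```python
def refined_vertex_keys(coarse_ids, level):
    """Integer keys for all vertices of a uniformly refined coarse triangle.

    coarse_ids: global IDs of the triangle's three coarse vertices.
    Each fine vertex has barycentric weights (a, b, c) with a + b + c == 2**level.
    """
    n = 2 ** level
    keys = set()
    for a in range(n + 1):
        for b in range(n + 1 - a):
            c = n - a - b
            # Drop zero weights and sort by global ID so the key does not depend
            # on which triangle (or which process) generated the vertex.
            key = tuple(sorted((gid, w) for gid, w in zip(coarse_ids, (a, b, c)) if w))
            keys.add(key)
    return keys


# Two coarse triangles owned by different partitions share the coarse edge (10, 20),
# listed in opposite orders by the two owners.
left = refined_vertex_keys((10, 20, 30), level=2)
right = refined_vertex_keys((20, 10, 40), level=2)
shared = left & right
```

With `level=2` each triangle has 15 fine vertices, and exactly the 5 lying on the shared edge (including its two endpoints) produce identical keys on both sides, so boundary vertices can be matched by exact set intersection or hashing.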

Journal Article•DOI•

Empirical analysis of high efficient remote cloud data center backup using HBase and Cassandra

Bao Rong Chang¹, Hsiu-Fen Tsai, Chia-Yen Chen¹, Cin-Long Guo¹•Institutions (1)

National University of Kaohsiung¹

01 Jan 2015-Scientific Programming

TL;DR: This paper aims to realize highly efficient remote cloud data center backup using HBase and Cassandra; to verify the efficiency of the backup, a cost-performance ratio is evaluated for several benchmark databases and the proposed ones, and the proposed HBase approach outperforms the other databases.

Abstract: HBase, a master-slave framework, and Cassandra, a peer-to-peer (P2P) framework, are the two most commonly used large-scale distributed NoSQL databases and are especially well suited to cloud computing because of their flexibility, scalability, and ease of big data processing. Because their storage structures differ, each adopts a distinct backup strategy to reduce the risk of data loss. This paper aims to realize highly efficient remote cloud data center backup using HBase and Cassandra. To verify the efficiency of the backup, we applied Thrift Java to the cloud data center to run a stress test that performs data reads/writes and remote database backup on large amounts of data. Finally, to assess remote data center backup in terms of cost-effectiveness, a cost-performance ratio has been evaluated for several benchmark databases and the proposed ones. As a result, the proposed HBase approach outperforms the other databases.

Journal Article•DOI•

Implementation of secondary index on cloud computing NoSQL database in big data environment

Bao Rong Chang¹, Hsiu-Fen Tsai, Chia-Yen Chen¹, Chien-Feng Huang¹, Hung-Ta Hsu¹•Institutions (1)

National University of Kaohsiung¹

01 Jan 2015-Scientific Programming

TL;DR: The proposed combination of HBase and Solr delivers fast query/response in a big data environment, outperforms the other databases, and provides a secondary index function with fast queries in a NoSQL database.

Abstract: This paper introduces the combination of the NoSQL database HBase and the enterprise search platform Solr to provide a secondary index function with fast queries. To verify the effectiveness and efficiency of the proposed approach, a cost-performance ratio assessment has been carried out for several competitive benchmark databases and the proposed one. As a result, our proposed approach outperforms the other databases and provides a secondary index function with fast queries in a NoSQL database. Moreover, according to the cross-sectional analysis, the proposed combination of HBase and Solr is capable of excellent query/response performance in a big data environment.
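
What a secondary index buys can be shown with a toy table whose only primary access path is the row key, as in HBase. Without an index, finding rows by another column's value requires a full scan. An inverted index from value to row keys (the role Solr plays in the paper's design) answers the same query by direct lookup. The class below is a hypothetical in-memory illustration and does not use the HBase or Solr APIs.

```python
from collections import defaultdict


class IndexedTable:
    """Toy table addressed by row key, with secondary indexes on chosen columns."""

    def __init__(self, indexed_columns):
        self.rows = {}  # row key -> {column: value}
        # column -> value -> set of row keys holding that value
        self.index = {col: defaultdict(set) for col in indexed_columns}

    def put(self, row_key, row):
        """Insert or overwrite a whole row, keeping every index consistent."""
        old = self.rows.get(row_key)
        if old is not None:
            for col, idx in self.index.items():
                if col in old:
                    idx[old[col]].discard(row_key)
        self.rows[row_key] = dict(row)
        for col, idx in self.index.items():
            if col in row:
                idx[row[col]].add(row_key)

    def scan_by(self, col, value):
        """Without an index: visit every row (cost grows with table size)."""
        return sorted(k for k, r in self.rows.items() if r.get(col) == value)

    def query_by(self, col, value):
        """With the secondary index: look up matching row keys directly."""
        return sorted(self.index[col].get(value, ()))


t = IndexedTable(["city"])
t.put("u1", {"name": "Ann", "city": "Kaohsiung"})
t.put("u2", {"name": "Bo", "city": "Taipei"})
t.put("u3", {"name": "Cy", "city": "Kaohsiung"})
t.put("u1", {"name": "Ann", "city": "Taipei"})  # overwrite moves u1 in the index
```

Both lookups return the same row keys; the trade-off is extra work and storage on every write to keep the index consistent with the table.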
