Showing papers in "Scientific Programming in 2015"


Journal ArticleDOI
TL;DR: Correlation analysis between network metrics and prediction accuracy of prediction methods may form the basis of a metalearning system where based on network characteristics it will be able to recommend the right prediction method for a given network.
Abstract: Currently, we are experiencing a rapid growth of the number of social-based online systems. The availability of the vast amounts of data gathered in those systems brings new challenges that we face when trying to analyse it. One of the intensively researched topics is the prediction of social connections between users. Although a lot of effort has been made to develop new prediction approaches, the existing methods are not comprehensively analysed. In this paper we investigate the correlation between network metrics and accuracy of different prediction methods. We selected six time-stamped real-world social networks and ten most widely used link prediction methods. The results of the experiments show that the performance of some methods has a strong correlation with certain network metrics. We managed to distinguish "prediction friendly" networks, for which most of the prediction methods give good performance, as well as "prediction unfriendly" networks, for which most of the methods result in high prediction error. Correlation analysis between network metrics and prediction accuracy of prediction methods may form the basis of a metalearning system where based on network characteristics it will be able to recommend the right prediction method for a given network.
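To make the link prediction task concrete, here is a minimal sketch of two neighbourhood-based scores. The abstract does not name the ten methods it evaluates, so common neighbours and the Jaccard coefficient are used here only as familiar examples. The toy graph and the function names are hypothetical.

```python
from itertools import combinations

def common_neighbours(adj, u, v):
    """Number of neighbours shared by u and v."""
    return len(adj[u] & adj[v])

def jaccard(adj, u, v):
    """Shared neighbours divided by the size of the combined neighbourhood."""
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0

def rank_candidate_links(adj, score):
    """Score every currently unconnected pair and return the pairs best first."""
    pairs = [(u, v) for u, v in combinations(sorted(adj), 2) if v not in adj[u]]
    return sorted(pairs, key=lambda p: score(adj, *p), reverse=True)

# Hypothetical undirected toy network stored as adjacency sets.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c", "e"},
    "e": {"d"},
}

ranking = rank_candidate_links(adj, common_neighbours)
print(ranking[0])  # the pair judged most likely to become connected
```

In an evaluation of the kind the abstract describes, a ranking like this would be compared against the links that actually appear in a later snapshot of the time-stamped network.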

92 citations


Journal ArticleDOI
TL;DR: This paper presents a cost optimization model for scheduling scientific workflows on IaaS clouds such as Amazon EC2 or RackSpace and indicates how this model can be used for scenarios that require resource planning for scientific workflow and their ensembles.
Abstract: This paper presents a cost optimization model for scheduling scientific workflows on IaaS clouds such as Amazon EC2 or RackSpace. We assume multiple IaaS clouds with heterogeneous virtual machine instances, with limited number of instances per cloud and hourly billing. Input and output data are stored on a cloud object store such as Amazon S3. Applications are scientific workflows modeled as DAGs as in the Pegasus Workflow Management System. We assume that tasks in the workflows are grouped into levels of identical tasks. Our model is specified using mathematical programming languages (AMPL and CMPL) and allows us to minimize the cost of workflow execution under deadline constraints. We present results obtained using our model and the benchmark workflows representing real scientific applications in a variety of domains. The data used for evaluation come from the synthetic workflows and from general purpose cloud benchmarks, as well as from the data measured in our own experiments with Montage, an astronomical application, executed on Amazon EC2 cloud. We indicate how this model can be used for scenarios that require resource planning for scientific workflows and their ensembles.
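The effect of hourly billing under a deadline can be illustrated with a small sketch. This is not the AMPL/CMPL model from the paper. It is a brute-force enumeration for a single level of identical tasks on one VM type, under the simplifying assumptions that tasks are spread evenly and that every busy VM is billed for the whole-hour-rounded makespan. All names and numbers are hypothetical.

```python
import math

def level_cost(n_tasks, task_hours, n_vms, price_per_hour):
    """Return (cost, makespan) for one level of identical tasks on n_vms identical VMs."""
    tasks_per_vm = math.ceil(n_tasks / n_vms)   # load of the busiest VM
    makespan = tasks_per_vm * task_hours        # hours until the level finishes
    busy_vms = min(n_vms, n_tasks)              # never pay for idle VMs
    cost = busy_vms * math.ceil(makespan) * price_per_hour  # whole hours are billed
    return cost, makespan

def cheapest_within_deadline(n_tasks, task_hours, max_vms, price_per_hour, deadline_hours):
    """Try every VM count up to the per-cloud limit and keep the cheapest feasible one."""
    best = None
    for n_vms in range(1, max_vms + 1):
        cost, makespan = level_cost(n_tasks, task_hours, n_vms, price_per_hour)
        if makespan <= deadline_hours and (best is None or cost < best[1]):
            best = (n_vms, cost, makespan)
    return best  # None if no VM count meets the deadline

# Hypothetical example: 10 half-hour tasks, at most 10 VMs, 0.10 per VM-hour, 2-hour deadline.
print(cheapest_within_deadline(10, 0.5, 10, 0.10, 2.0))
```

Because partial hours are rounded up, adding VMs does not change the cost monotonically, which is one reason such planning problems are usually handed to an optimization solver rather than solved by inspection.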

60 citations


Journal ArticleDOI
TL;DR: Two improved variants of the FP-Growth algorithm are worked out: the Painting-Growth algorithm and the N (not) Painting-Growth algorithm, which removes the painting steps and achieves the same result another way.
Abstract: Association rules mining is an important technology in data mining. The FP-Growth (frequent-pattern growth) algorithm is a classical algorithm in association rules mining. However, FP-Growth needs to scan the database twice during mining, which reduces the efficiency of the algorithm. Through the study of association rules mining and the FP-Growth algorithm, we worked out two improved algorithms: the Painting-Growth algorithm and the N (not) Painting-Growth algorithm (which removes the painting steps and uses another way to achieve the same result). We compared the two improved algorithms with the FP-Growth algorithm. Experimental results show that Painting-Growth algorithm is more than 1050 and N Painting-Growth algorithm is less than 10000 in data volume; the performance of the two improved algorithms is better than that of the FP-Growth algorithm.

44 citations


Journal ArticleDOI
TL;DR: A new approach is proposed that addresses the third cornerstone of experimental reproducibility: the equipment of a computational experiment, that is, the set of software and hardware components that are involved in the execution of a scientific workflow.
Abstract: It is commonly agreed that in silico scientific experiments should be executable and repeatable processes. Most of the current approaches for computational experiment conservation and reproducibility have focused so far on two of the main components of the experiment, namely, data and method. In this paper, we propose a new approach that addresses the third cornerstone of experimental reproducibility: the equipment. This work focuses on the equipment of a computational experiment, that is, the set of software and hardware components that are involved in the execution of a scientific workflow. In order to demonstrate the feasibility of our proposal, we describe a use case scenario on the Text Analytics domain and the application of our approach to it. From the original workflow, we document its execution environment, by means of a set of semantic models and a catalogue of resources, and generate an equivalent infrastructure for reexecuting it.

40 citations


Journal ArticleDOI
TL;DR: The design and implementation of several fundamental dense linear algebra algorithms for multicore with Intel Xeon Phi coprocessors are presented and their methodology and programming techniques are incorporated into the MAGMA MIC API, which abstracts the application developer from the specifics of the Xeon Phi architecture and is therefore applicable to algorithms beyond the scope of DLA.
Abstract: This paper presents the design and implementation of several fundamental dense linear algebra (DLA) algorithms for multicore with Intel Xeon Phi coprocessors. In particular, we consider algorithms for solving linear systems. Further, we give an overview of the MAGMA MIC library, an open source, high performance library, that incorporates the developments presented here and, more broadly, provides the DLA functionality equivalent to that of the popular LAPACK library while targeting heterogeneous architectures that feature a mix of multicore CPUs and coprocessors. The LAPACK-compliance simplifies the use of the MAGMA MIC library in applications, while providing them with portably performant DLA. High performance is obtained through the use of the high-performance BLAS, hardware-specific tuning, and a hybridization methodology whereby we split the algorithm into computational tasks of various granularities. Execution of those tasks is properly scheduled over the heterogeneous hardware by minimizing data movements and mapping algorithmic requirements to the architectural strengths of the various heterogeneous hardware components. Our methodology and programming techniques are incorporated into the MAGMA MIC API, which abstracts the application developer from the specifics of the Xeon Phi architecture and is therefore applicable to algorithms beyond the scope of DLA.

38 citations


Journal ArticleDOI
TL;DR: This work outlines an approach to adaptation of the 3D MPDATA algorithm to the Intel MIC architecture, and proposes the (3 + 1)D decomposition of MPDATA heterogeneous stencil computations to ease memory/communication bounds and better exploit the theoretical floating point efficiency of target computing platforms.
Abstract: The multidimensional positive definite advection transport algorithm (MPDATA) belongs to the group of nonoscillatory forward-in-time algorithms and performs a sequence of stencil computations. MPDATA is one of the major parts of the dynamic core of the EULAG geophysical model. In this work, we outline an approach to adaptation of the 3D MPDATA algorithm to the Intel MIC architecture. In order to utilize available computing resources, we propose the (3 + 1)D decomposition of MPDATA heterogeneous stencil computations. This approach is based on combination of the loop tiling and fusion techniques. It allows us to ease memory/communication bounds and better exploit the theoretical floating point efficiency of target computing platforms. An important method of improving the efficiency of the (3 + 1)D decomposition is partitioning of available cores/threads into work teams. It permits for reducing inter-cache communication overheads. This method also increases opportunities for the efficient distribution of MPDATA computation onto available resources of the Intel MIC architecture, as well as Intel CPUs. We discuss preliminary performance results obtained on two hybrid platforms, containing two CPUs and Intel Xeon Phi. The top-of-the-line Intel Xeon Phi 7120P gives the best performance results, and executes MPDATA almost 2 times faster than two Intel Xeon E5-2697v2 CPUs.

37 citations


Journal ArticleDOI
TL;DR: The proposed improved KNN algorithm is an optimized feature selection, genetic algorithm that incorporates the information gain for feature reduction and combined with bagging technique that improves the accuracy of sentiment classification.
Abstract: With the rapid growth of websites and web forms, a large number of product reviews are available on the sites. An opinion mining system is needed to help people evaluate the emotions, opinions, attitudes, and behavior of others, which is used to make decisions based on user preference. In this paper, we propose an optimized feature reduction that incorporates an ensemble method of machine learning approaches and uses information gain and a genetic algorithm as feature reduction techniques. We conducted comparative study experiments on a multidomain review dataset and a movie review dataset in opinion mining. The effectiveness of the single classifiers Naive Bayes, logistic regression, and support vector machine and of the ensemble technique for opinion mining is compared on five datasets. The proposed hybrid method is evaluated, and experimental results show that information gain and genetic algorithm with the ensemble technique perform better in terms of various measures for multidomain reviews and movie reviews. Classification algorithms are evaluated using McNemar's test to compare the level of significance of the classifiers.

30 citations


Journal ArticleDOI
TL;DR: This paper evaluates the performance of OpenCL programs on out-of-order multicore CPUs from the architectural perspective, comparing OpenCL to conventional parallel programming models for CPUs.
Abstract: Utilizing heterogeneous platforms for computation has become a general trend, making the portability issue important. OpenCL (Open Computing Language) serves this purpose by enabling portable execution on heterogeneous architectures. However, unpredictable performance variation on different platforms has become a burden for programmers who write OpenCL applications. This is especially true for conventional multicore CPUs, since the performance of general OpenCL applications on CPUs lags behind the performance of their counterparts written in the conventional parallel programming model for CPUs. In this paper, we evaluate the performance of OpenCL applications on out-of-order multicore CPUs from the architectural perspective. We evaluate OpenCL applications on various aspects, including API overhead, scheduling overhead, instruction-level parallelism, address space, data location, data locality, and vectorization, comparing OpenCL to conventional parallel programming models for CPUs. Our evaluation indicates unique performance characteristics of OpenCL applications and also provides insight into the optimization metrics for better performance on CPUs.

26 citations


Journal ArticleDOI
TL;DR: The technique relieves the programmer from thinking of NUMA system/manycore processor architecture details by delegating data distribution to the runtime system and uses task data dependence information to guide the scheduling of OpenMP tasks to reduce data stall times.
Abstract: Performance degradation due to nonuniform data access latencies has worsened on NUMA systems and can now be felt on-chip in manycore processors. Distributing data across NUMA nodes and manycore processor caches is necessary to reduce the impact of nonuniform latencies. However, techniques for distributing data are error-prone and fragile and require low-level architectural knowledge. Existing task scheduling policies favor quick load-balancing at the expense of locality and ignore NUMA node/manycore cache access latencies while scheduling. Locality-aware scheduling, in conjunction with or as a replacement for existing scheduling, is necessary to minimize NUMA effects and sustain performance. We present a data distribution and locality-aware scheduling technique for task-based OpenMP programs executing on NUMA systems and manycore processors. Our technique relieves the programmer from thinking of NUMA system/manycore processor architecture details by delegating data distribution to the runtime system and uses task data dependence information to guide the scheduling of OpenMP tasks to reduce data stall times. We demonstrate our technique on a four-socket AMD Opteron machine with eight NUMA nodes and on the TILEPro64 processor and identify that data distribution and locality-aware task scheduling improve performance up to 69% for scientific benchmarks compared to default policies and yet provide an architecture-oblivious approach for programmers.

25 citations


Journal ArticleDOI
TL;DR: Several effective SIMD vectorization techniques such as less-than-full-vector loop vectorization, Intel MIC specific alignment optimization, and small matrix transpose/multiplication 2D vectorization implemented in the Intel C/C++ and Fortran production compilers for Intel Xeon Phi coprocessors are presented.
Abstract: Efficiently exploiting SIMD vector units is one of the most important aspects in achieving high performance of the application code running on Intel Xeon Phi coprocessors. In this paper, we present several effective SIMD vectorization techniques such as less-than-full-vector loop vectorization, Intel MIC specific alignment optimization, and small matrix transpose/multiplication 2D vectorization implemented in the Intel C/C++ and Fortran production compilers for Intel Xeon Phi coprocessors. A set of workloads from several application domains is employed to conduct the performance study of our SIMD vectorization techniques. The performance results show that we achieved up to 12.5x performance gain on the Intel Xeon Phi coprocessor. We also demonstrate a 2000x performance speedup from the seamless integration of SIMD vectorization and parallelization.

24 citations


Journal ArticleDOI
TL;DR: A software tool to extract unified modeling language (UML) class diagrams from Fortran code facilitates the developers' ability to examine the entities and their relationships in the software system.
Abstract: Many scientists who implement computational science and engineering software have adopted the object-oriented (OO) Fortran paradigm. One of the challenges faced by OO Fortran developers is the inability to obtain high level software design descriptions of existing applications. Knowledge of the overall software design is not only valuable in the absence of documentation, it can also serve to assist developers with accomplishing different tasks during the software development process, especially maintenance and refactoring. The software engineering community commonly uses reverse engineering techniques to deal with this challenge. A number of reverse engineering-based tools have been proposed, but few of them can be applied to OO Fortran applications. In this paper, we propose a software tool to extract unified modeling language (UML) class diagrams from Fortran code. The UML class diagram facilitates the developers' ability to examine the entities and their relationships in the software system. The extracted diagrams enhance software maintenance and evolution. The experiments carried out to evaluate the proposed tool show its accuracy and a few of the limitations.

Journal ArticleDOI
TL;DR: Community detection which is another active area of research in social networks is investigated in this paper and it is indicated that, in some cases, these ranking algorithms outperform previous works because their prediction accuracies are better.
Abstract: In signed social networks, relationships among nodes are of the types positive (friendship) and negative (hostility). One absorbing issue in signed social networks is predicting sign of edges among people who are members of these networks. Other than edge sign prediction, one can define importance of people or nodes in networks via ranking algorithms. There exist few ranking algorithms for signed graphs; also few studies have shown role of ranking in link prediction problem. Hence, we were motivated to investigate ranking algorithms availed for signed graphs and their effect on sign prediction problem. This paper makes the contribution of using community detection approach for ranking algorithms in signed graphs. Therefore, community detection which is another active area of research in social networks is also investigated in this paper. Community detection algorithms try to find groups of nodes in which they share common properties like similarity. We were able to devise three community-based ranking algorithms which are suitable for signed graphs, and also we evaluated these ranking algorithms via sign prediction problem. These ranking algorithms were tested on three large-scale datasets: Epinions, Slashdot, and Wikipedia. We indicated that, in some cases, these ranking algorithms outperform previous works because their prediction accuracies are better.

Journal ArticleDOI
TL;DR: This work proposes a heuristic construction of the elimination trees that has cost O(Ne log(Ne)), where Ne is the number of elements in the mesh, and shows that this heuristic ordering has similar computational cost to the quasi-optimal elimination trees found with dynamic programming and outperforms state-of-the-art alternatives in the authors' numerical experiments.
Abstract: We construct quasi-optimal elimination trees for 2D finite element meshes with singularities. These trees minimize the complexity of the solution of the discrete system. The computational cost estimates of the elimination process model the execution of the multifrontal algorithms in serial and in parallel shared-memory executions. Since the meshes considered are a subspace of all possible mesh partitions, we call these minimizers quasi-optimal. We minimize the cost functionals using dynamic programming. Finding these minimizers is more computationally expensive than solving the original algebraic system. Nevertheless, from the insights provided by the analysis of the dynamic programming minima, we propose a heuristic construction of the elimination trees that has cost O(Ne log(Ne)), where Ne is the number of elements in the mesh. We show that this heuristic ordering has similar computational cost to the quasi-optimal elimination trees found with dynamic programming and outperforms state-of-the-art alternatives in our numerical experiments.

Journal ArticleDOI
TL;DR: This paper identifies key components underlying the spectral dimensionality reduction techniques, and proposes their efficient parallel implementation, and shows that the resulting framework can be used to process datasets consisting of millions of points when executed on a 16,000-core cluster, which is beyond the reach of currently available methods.
Abstract: Dimensionality reduction refers to a set of mathematical techniques used to reduce complexity of the original high-dimensional data, while preserving its selected properties. Improvements in simulation strategies and experimental data collection methods are resulting in a deluge of heterogeneous and high-dimensional data, which often makes dimensionality reduction the only viable way to gain qualitative and quantitative understanding of the data. However, existing dimensionality reduction software often does not scale to datasets arising in real-life applications, which may consist of thousands of points with millions of dimensions. In this paper, we propose a parallel framework for dimensionality reduction of large-scale data. We identify key components underlying the spectral dimensionality reduction techniques, and propose their efficient parallel implementation. We show that the resulting framework can be used to process datasets consisting of millions of points when executed on a 16,000-core cluster, which is beyond the reach of currently available methods. To further demonstrate applicability of our framework we perform dimensionality reduction of 75,000 images representing morphology evolution during manufacturing of organic solar cells in order to identify how processing parameters affect morphology evolution.

Journal ArticleDOI
TL;DR: Compared to the available solutions, Subenum can enumerate subgraphs several orders of magnitude faster and the experimental results show that the performance of Subenum scales almost linearly by using additional processor cores.
Abstract: Enumerating all subgraphs of an input graph is an important task for analyzing complex networks. Valuable information can be extracted about the characteristics of the input graph using all-subgraph enumeration. Nevertheless, the number of subgraphs grows exponentially with the growth of the input graph or with the size of the subgraphs to be enumerated. Hence, all-subgraph enumeration is very time consuming when the size of the subgraphs or the input graph is big. We propose a parallel solution named Subenum which, in contrast to available solutions, can perform much faster. Subenum enumerates subgraphs using edges instead of vertices, and this approach leads to a parallel and load-balanced enumeration algorithm that can have efficient execution on current multicore and multiprocessor machines. Also, Subenum uses a fast heuristic which can effectively accelerate nonisomorphism subgraph enumeration. Subenum can efficiently use external memory, and unlike other subgraph enumeration methods, it is not associated with the main memory limits of the used machine. Hence, Subenum can handle large input graphs and subgraph sizes that other solutions cannot handle. Several experiments are done using real-world input graphs. Compared to the available solutions, Subenum can enumerate subgraphs several orders of magnitude faster and the experimental results show that the performance of Subenum scales almost linearly by using additional processor cores.

Journal ArticleDOI
TL;DR: This work critically analyzes the applicability of the hybrid model approach and evaluates the proposed strategy using several case studies, demonstrating its effectiveness.
Abstract: Existing studies show that using a single GPU can lead to significant performance gains. We should be able to achieve further performance speedup if we use more than one GPU. Heterogeneous processors consisting of multiple CPUs and GPUs offer immense potential and are often considered as a leading candidate for porting complex scientific applications. Unfortunately, programming heterogeneous systems requires more effort than what is required for traditional multicore systems. Directive-based programming approaches are being widely adopted since they make it easy to use/port/maintain application code. OpenMP and OpenACC are two popular models used to port applications to accelerators. However, neither of the models provides support for multiple GPUs. A plausible solution is to use a combination of OpenMP and OpenACC that forms a hybrid model; however, building this model has its own limitations due to lack of the necessary compiler support. Moreover, the model also lacks support for direct device-to-device communication. To overcome these limitations, an alternate strategy is to extend OpenACC by proposing and developing extensions that follow a task-based implementation for supporting multiple GPUs. We critically analyze the applicability of the hybrid model approach and evaluate the proposed strategy using several case studies, demonstrating its effectiveness.

Journal ArticleDOI
TL;DR: Empirical study shows that the new method can lead to more reasonable weighting results and decision, and may integrate the merits of both subjective and objective weighting methods.
Abstract: This paper proposes a combined assigning-weight approach to determine attribute weights in the multiattribute decision problems. The approach combines subjective weights and objective weights of attributes based on maximizing variance. Objective weights are determined by rough set method and subjective weights by Analytic Hierarchy Process. This new combination method may integrate the merits of both subjective and objective weighting methods. Empirical study shows that the new method can lead to more reasonable weighting results and decision.

Journal ArticleDOI
TL;DR: Ideas for using coordinate-free numerics in modern Fortran to achieve code flexibility in the partial differential equation (PDE) domain are presented and some programming patterns that support asynchronous evaluation of expressions comprised of parallel operations on distributed data are implemented.
Abstract: This paper presents ideas for using coordinate-free numerics in modern Fortran to achieve code flexibility in the partial differential equation (PDE) domain. We also show how Fortran, over the last few decades, has changed to become a language well-suited for state-of-the-art software development. Fortran's new coarray distributed data structure, the language's class mechanism, and its side-effect-free, pure procedure capability provide the scaffolding on which we implement HPC software. These features empower compilers to organize parallel computations with efficient communication. We present some programming patterns that support asynchronous evaluation of expressions comprised of parallel operations on distributed data. We implemented these patterns using coarrays and the message passing interface (MPI). We compared the codes' complexity and performance. The MPI code is much more complex and depends on external libraries. The MPI code on Cray hardware using the Cray compiler is 1.5-2 times faster than the coarray code on the same hardware. The Intel compiler implements coarrays atop Intel's MPI library with the result apparently being 2-2.5 times slower than manually coded MPI despite exhibiting nearly linear scaling efficiency. As compilers mature and further improvements to coarrays come in Fortran 2015, we expect this performance gap to narrow.

Journal ArticleDOI
TL;DR: An efficient algorithm, called iFT, that uses only the epochs of the access histories to detect data races during the execution of multithreaded programs, and reduces the average runtime and memory overhead to 84% and 37%, respectively, of those of FastTrack.
Abstract: Data races represent the most notorious class of concurrency bugs in multithreaded programs. To detect data races precisely and efficiently during the execution of multithreaded programs, the epoch-based FastTrack technique has been employed. However, FastTrack has time and space complexities that depend on the maximum parallelism of the program to partially maintain expensive data structures, such as vector clocks. This paper presents an efficient algorithm, called iFT, that uses only the epochs of the access histories. Unlike FastTrack, our algorithm requires O(1) operations to maintain an access history and locate data races, without any switching between epochs and vector clocks. We implement this algorithm on top of the Pin binary instrumentation framework and compare it with other on-the-fly detection algorithms, including FastTrack, which uses a state-of-the-art happens-before analysis algorithm. Empirical results using the PARSEC benchmark show that iFT reduces the average runtime and memory overhead to 84% and 37%, respectively, of those of FastTrack.

Journal ArticleDOI
TL;DR: The Aspen (Abstract Scalable Performance Engineering Notation) language is extended with three new language constructs: user-defined resources, parameter ranges, and a collection of costs in the abstract machine model to enable automated design space exploration via a nonlinear optimization solver.
Abstract: Architects and applications scientists often use performance models to explore a multidimensional design space of architectural characteristics, algorithm designs, and application parameters. With traditional performance modeling tools, these explorations forced users to first develop a performance model and then repeatedly evaluate and analyze the model manually. These manual investigations proved laborious and error prone. More importantly, the complexity of this traditional process often forced users to simplify their investigations. To address this challenge of design space exploration, we extend our Aspen (Abstract Scalable Performance Engineering Notation) language with three new language constructs: user-defined resources, parameter ranges, and a collection of costs in the abstract machine model. Then, we use these constructs to enable automated design space exploration via a nonlinear optimization solver. We show how four interesting classes of design space exploration scenarios can be derived from Aspen models and formulated as pure nonlinear programs. The analysis tools are demonstrated using examples based on Aspen models for a three-dimensional Fast Fourier Transform, the CoMD molecular dynamics proxy application, and the DARPA Streaming Sensor Challenge Problem. Our results show that this approach can compose and solve arbitrary performance modeling questions quickly and rigorously when compared to the traditional manual approach.

Journal ArticleDOI
TL;DR: A serial source code for simulating a supersonic ejector flow is accelerated using parallelization based on OpenMP and OpenACC directives to reduce the development costs and to simplify the maintenance of the application due to the complexity of the FORTRAN source code.
Abstract: A serial source code for simulating a supersonic ejector flow is accelerated using parallelization based on OpenMP and OpenACC directives. The purpose is to reduce the development costs and to simplify the maintenance of the application due to the complexity of the FORTRAN source code. This research follows well-proven strategies in order to obtain the best performance in both OpenMP and OpenACC. OpenMP has become the programming standard for scientific multicore software and OpenACC is one true alternative for graphics accelerators without the need of programming low level kernels. The strategies using OpenMP are oriented towards reducing the creation of parallel regions, tasks creation to handle boundary conditions, and a nested control of the loop time for the programming in offload mode specifically for the Xeon Phi. In OpenACC, the strategy focuses on maintaining the data regions among the executions of the kernels. Experiments for performance and validation are conducted here on a 12-core Xeon CPU, Xeon Phi 5110p, and Tesla C2070, obtaining the best performance from the latter. The Tesla C2070 presented an acceleration factor of 9.86X, 1.6X, and 4.5X compared against the serial version on CPU, 12-core Xeon CPU, and Xeon Phi, respectively.

Journal ArticleDOI
TL;DR: This paper uses the Speaker-Listener Label Propagation Algorithm (SLPA) as the basis for their parallel overlapping community detection implementation and shows that it yields a significant performance gain over sequential execution while preserving the high quality of community detection.
Abstract: Communities in networks are groups of nodes whose connections to the nodes in a community are stronger than with the nodes in the rest of the network. Quite often nodes participate in multiple communities; that is, communities can overlap. In this paper, we first analyze what other researchers have done to utilize high performance computing to perform efficient community detection in social, biological, and other networks. We note that detection of overlapping communities is more computationally intensive than disjoint community detection, and the former presents new challenges that algorithm designers have to face. Moreover, the running time of many existing algorithms grows superlinearly with the network size, making them unsuitable for processing large datasets. We use the Speaker-Listener Label Propagation Algorithm (SLPA) as the basis for our parallel overlapping community detection implementation. SLPA provides near linear time overlapping community detection and is well suited for parallelization. We explore the benefits of a multithreaded programming paradigm and show that it yields a significant performance gain over sequential execution while preserving the high quality of community detection. The algorithm was tested on four real-world datasets with up to 5.5 million nodes and 170 million edges. In order to assess the quality of community detection, at least 4 different metrics were used for each of the datasets.
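For readers unfamiliar with SLPA, the following is a minimal sequential sketch of the speaker-listener idea, not the authors' multithreaded implementation. The function name `slpa`, the adjacency-dictionary input, and the default values for `iterations` and `threshold` are assumptions chosen for illustration.

```python
import random
from collections import Counter


def slpa(adjacency, iterations=20, threshold=0.1, seed=0):
    """Sequential SLPA sketch; adjacency maps node -> list of neighbors."""
    rng = random.Random(seed)
    # Each node's memory starts with its own id as its only label.
    memory = {node: Counter({node: 1}) for node in adjacency}
    nodes = list(adjacency)
    for _ in range(iterations):
        rng.shuffle(nodes)  # visit listeners in random order
        for listener in nodes:
            if not adjacency[listener]:
                continue  # isolated node: nothing to listen to
            received = Counter()
            for speaker in adjacency[listener]:
                # Speaker sends a label drawn proportionally to its memory.
                labels = list(memory[speaker].keys())
                weights = list(memory[speaker].values())
                received[rng.choices(labels, weights=weights)[0]] += 1
            # Listener stores the most popular received label (random tie-break).
            top = max(received.values())
            winners = sorted(l for l, c in received.items() if c == top)
            memory[listener][rng.choice(winners)] += 1
    # Post-processing: keep labels whose relative frequency reaches the
    # threshold; a node that keeps several labels lies in overlapping communities.
    communities = {}
    for node, counts in memory.items():
        total = sum(counts.values())
        for label, count in counts.items():
            if count / total >= threshold:
                communities.setdefault(label, set()).add(node)
    return communities


# Hypothetical toy graph: two triangles with no edge between them.
toy_graph = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}
print(slpa(toy_graph))
```

Because each listener only reads its neighbors' memories and writes its own, the inner loop is a natural candidate for the kind of multithreaded execution the abstract describes, although doing so safely requires synchronization that this sketch omits.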


Journal ArticleDOI
TL;DR: A hybrid technique to measure the expertise of users by analyzing their profiles and activities in social networks, which takes advantage of existing data and information available on the web to produce both a ranked list of experts in a field and a confidence value for every professional skill.
Abstract: The present paper introduces a hybrid technique to measure the expertise of users by analyzing their profiles and activities in social networks. Currently, both job seekers and talent hunters are looking for new and innovative techniques to filter jobs and candidates, while candidates are trying to improve their profiles and make them more attractive. In this sense, the Skillrank approach is based on the conjunction of existing and well-known information and expertise retrieval techniques that fit the existing web and social media environment, delivering an intelligent component that integrates the user context into the analysis of skill confidence. A major outcome of this approach is that it takes advantage of existing data and information available on the web to produce both a ranked list of experts in a field and a confidence value for every professional skill. Thus, expertise and experts can be detected, verified, and ranked using a suitable trust metric. An experiment to validate the Skillrank technique based on precision and recall metrics is also presented using two different datasets: (1) an ad hoc dataset created using real data from a professional social network and (2) real data extracted from the LinkedIn API.

Journal ArticleDOI
TL;DR: A new logical discovery framework based on semantic description of the capability of Web services and user goals using F-logic is presented, which tackles the scalability problem and improves discovery performance by adding two prefiltering stages to the discovery engine.
Abstract: Discovery of semantic Web services is a heavyweight task when the number of Web services or the complexity of ontologies increases. In this paper, we present a new logical discovery framework based on semantic description of the capability of Web services and user goals using F-logic. Our framework tackles the scalability problem and improves discovery performance by adding two prefiltering stages to the discovery engine. The first stage is based on ontology comparison of user request and Web service categories. In the second stage, yet more Web services are eliminated based upon a decomposition and analysis of concept and instance attributes used in Web service capabilities and the requested capabilities of the client, resulting in a much smaller pool of Web services that need to be matched against the client request. Our prefiltering approach is evaluated using a new Web service repository, called WSMO-FL test collection. The recall rate of the filtering process is 100% by design, since no relevant Web services are ever eliminated by the two prefiltering stages, and experimental results show that the precision rate is more than 53%.

Journal ArticleDOI
TL;DR: The objective of this study is to evaluate the performance of Intel Xeon Phi hardware accelerators for Geant4 simulations, especially for multithreaded applications, and presents the complete methodology to guide users in compiling their Geant4 applications on Phi processors.
Abstract: The objective of this study is to evaluate the performance of Intel Xeon Phi hardware accelerators for Geant4 simulations, especially for multithreaded applications. We present the complete methodology to guide users in compiling their Geant4 applications on Phi processors. Then, we propose a series of benchmarks to compare the performance of Xeon CPUs and Phi processors for a Geant4 example dedicated to the simulation of electron dose point kernels, the TestEm12 example. First, we compare a distributed execution of a sequential version of the Geant4 example on both architectures before evaluating the multithreaded version of the Geant4 example. Although Phi processors demonstrated their ability to reduce computing time (by up to a factor of 3.83) when distributing sequential Geant4 simulations, we do not reach the same level of speedup with the multithreaded version of the Geant4 example.

Journal ArticleDOI
TL;DR: In this paper, the authors conducted interviews with indigenous women in the municipality of Amambai, MS from 2007 to 2013 to understand some forms of violence perpetrated against women in indigenous villages.
Abstract: This research aims to understand some forms of violence perpetrated against women in indigenous villages in the municipality of Amambai, MS from 2007 to 2013. Regarding the gender issue, indigenous women are among the most severely impacted by multiple forms of violence and remain in the perspective of victimization. Interviews were conducted with indigenous women victims of domestic violence as well as with people involved in these conflict situations (the captain, the manager of Amambai's public office for women, and researchers of the subject). Thus, academic research becomes important that highlights the silence in which these women live and gives visibility to the forms of domestic violence perpetrated against them: research from a gender perspective that takes into account these indigenous women's own perspectives on resolving this violation.

Journal ArticleDOI
TL;DR: Test results obtained on an SGI Altix ICE X system with 8192 cores confirm that the approach does indeed enable the mesh generator to generate multibillion element meshes in a scalable way.
Abstract: We present a parallel mesh generator called PMSH that is developed as a wrapper code around the open source sequential Netgen mesh generator. Parallelization of the mesh generator is carried out in five stages: (i) generation of a coarse volume mesh; (ii) partitioning of the coarse mesh; (iii) refinement of coarse surface mesh to produce fine surface submeshes; (iv) remeshing of each fine surface submesh to get a final fine mesh; (v) matching of partition boundary vertices followed by global vertex numbering. A new integer based barycentric coordinate method is developed for matching distributed partition boundary vertices. This method does not have precision related problems of floating point coordinate based vertex matching. Test results obtained on an SGI Altix ICE X system with 8192 cores confirm that our approach does indeed enable us to generate multibillion element meshes in a scalable way.

Journal ArticleDOI
TL;DR: This paper aims to realize highly efficient remote cloud data center backup using HBase and Cassandra; a cost-performance ratio is evaluated for several benchmark databases and the proposed ones, and the proposed HBase approach outperforms the other databases.
Abstract: HBase, a master-slave framework, and Cassandra, a peer-to-peer (P2P) framework, are the two most commonly used large-scale distributed NoSQL databases, especially applicable to cloud computing with high flexibility and scalability and the ease of big data processing. Regarding storage structure, each structure adopts a distinct backup strategy to reduce the risk of data loss. This paper aims to realize highly efficient remote cloud data center backup using HBase and Cassandra. To verify the efficiency of the backup, Thrift Java is applied to stress-test the cloud data center by performing strict data reads/writes and remote database backups on large amounts of data. Finally, to assess the remote data center backup in terms of effectiveness and cost, a cost-performance ratio has been evaluated for several benchmark databases and the proposed ones. As a result, the proposed HBase approach outperforms the other databases.

Journal ArticleDOI
TL;DR: The proposed combination of HBase and Solr database is capable of performing an excellent query/response in a big data environment and outperforms the other databases and fulfills secondary index function with fast query in NoSQL database.
Abstract: This paper introduces the combination of NoSQL database HBase and enterprise search platform Solr so as to tackle the problem of the secondary index function with fast query. In order to verify the effectiveness and efficiency of the proposed approach, the assessment using Cost-Performance ratio has been done for several competitive benchmark databases and the proposed one. As a result, our proposed approach outperforms the other databases and fulfills secondary index function with fast query in NoSQL database. Moreover, according to the cross-sectional analysis, the proposed combination of HBase and Solr database is capable of performing an excellent query/response in a big data environment.