Journal ArticleDOI

High-performance modelling and simulation for big data applications

TL;DR: This open access book is the final compendium of case studies emanating from the 4-year COST Action IC1406 “High-Performance Modelling and Simulation for Big Data Applications” (cHiPSet), set to become a required reference for the fast-changing fields of HPC, Big Data, and Modelling & Simulation.
About: This article is published in Simulation Modelling Practice and Theory. The article was published on 2017-08-01 and is currently open access. It has received 24 citations to date. The article focuses on the topics: Chipset & Big data.

Summary (6 min read)

1 Introduction

  • This chapter presents a position survey on the overall objective and specific challenges encompassing the state of the art in forecasting cryptocurrency value by Sentiment Analysis.
  • Further possibilities are then explored, based on this new metric perspective, such as technical analysis, forecasting, and beyond.
  • While High-Performance Computing (HPC) and Cloud Computing are not sine qua non for cryptocurrencies, their use has become pervasive in their transaction verification (“mining”).
  • Then, the Conclusion section summarizes the surveyed perspectives.


  • A central challenge is turning semi-structured data first into valuable information and then into meaningful knowledge.
  • The COST Action IC1406 High-Performance Modelling and Simulation for Big Data Applications facilitates cross-pollination between the HPC community (both developers and users) and M&S disciplines for which the use of HPC facilities, technologies and methodologies is still a novel phenomenon, if present at all.
  • Such applications often require a significant amount of computational resources, with data sets scattered across multiple sources and different geographical locations.
  • Modelling has traditionally addressed complexity by raising the level of abstraction and aiming at an essential representation of the domain at hand.
  • Domain-specific considerations may put more, or even almost all, emphasis on other factors, such as usability, productivity, economic cost and time to solution.


  • Following this introductory part, the authors have a closer look at the subjects relevant to the four working groups that make up the COST Action IC1406.
  • The authors focus on Enabling Infrastructures and Middleware for Big-Data Modelling and Simulation in Sect. 3, Parallel Programming Models for Big-Data Modelling and Simulation in Sect. 4, HPC-enabled Modelling and Simulation for Life Sciences in Sect. 5, HPC-enabled Modelling and Simulation for Socio-Economical and Physical Sciences in Sect. 6, respectively.
  • Last, but not least, the authors draw some conclusions in Sect. 7.

2 Background and State of the Art

  • High-Performance Computing is currently undergoing a major change with exascale systems expected for the early 2020s.
  • Data-intensive (big data) HPC is arguably fundamental to address grand-challenge M&S problems.
  • The development of new complex HPC-enabled M&S applications requires collaborative efforts from researchers with different domain knowledge and expertise.
  • In bio-medical studies, wet-lab validation typically involves additional resource-intensive work that has to be geared towards a statistically significant distilled fragment of the computational results, suitable to confirm the bio-medical hypotheses and compatible with the available resources.
  • Big data is an emerging paradigm for data sets whose size and features are beyond the ability of current M&S tools [6].


  • M&S communities often lack suitable skills for the parallel implementation of data-intensive applications.
  • Therefore, another natural objective of their work is to intelligently transfer the heterogeneous workflows in M&S to HPC, which will boost those scientific fields that are essential for both the M&S and HPC communities [7].
  • M&S experts are to be supported in their investigations by properly-enabled HPC frameworks, which are currently sought but missing.
  • HPC architects in turn obtain access to a wealth of application domains by means of which they will better understand the specific requirements of HPC in the big data era.
  • Among others, the authors aim at the design of improved data-center oriented programming models and frameworks for HPC-enabled M&S.

3 Enabling Infrastructures and Middleware for Big-Data Modelling and Simulation

  • From the inception of the Internet, one has witnessed an explosive growth in the volume, speed and variety of electronic data created on a daily basis.
  • The so-called big data problem requires the continuous improvement of servers, storage, and the whole network infrastructure in order to enable the efficient analysis and interpretation of data through on-hand data management applications, e.g. agent-based solutions in Agent Component in Oracle Data Integrator (ODI).
  • A survey of software tools for supporting cluster, grid and cloud computing is provided in [15,17,18].
  • Job scheduling, load balancing and management play a crucial role in HPC and big data simulation [27,28].
  • Some of the best-known tools include Spark, Pig, Hive, JAQL, Sqoop, Oozie and Mahout. Apache Spark [33], a unified engine for big data processing, provides an alternative to MapReduce that enables workloads to execute in memory instead of on disk.
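As a minimal illustration of the in-memory execution model (a sketch only: it assumes a local PySpark installation, and the data set and loop are invented for illustration), the snippet below caches a dataset once and reuses it across several passes, whereas a disk-based MapReduce job would re-read its input on every pass:

```python
from pyspark import SparkContext

sc = SparkContext("local[*]", "in-memory-demo")

# Cache the dataset in memory so that repeated passes avoid re-reading it.
data = sc.parallelize(range(1_000_000)).cache()

for _ in range(10):                       # an iterative workload reusing cached data
    total = data.map(lambda x: x * 2).sum()

print(total)
sc.stop()
```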


  • Apache Storm [34] is a scalable, fast, fault-tolerant platform for distributed computing that has the advantage of handling real-time data streams from synchronous and asynchronous systems.
  • Numerous tools for big data analysis, visualisation and machine learning have been made available.
  • New software applications have been developed for browsing, visualizing, interpreting and analyzing large-scale sequencing data.
  • Synchronous and asynchronous distributed simulation has been one of the options that can improve the scalability of a simulator, both in terms of application size and execution speed, enabling large-scale systems to be simulated in real time [43,44].
  • JADE [52] is a heterogeneous multiprocessor design simulation environment that allows networks-on-chip, inter-chip networks and intra-rack networks to be simulated using optical and electrical interconnects.

4 Parallel Programming Models for Big-Data Modelling and Simulation

  • A core challenge in modelling and simulation is the need to combine software expertise and domain expertise.
  • Even starting from well-defined mathematical models, manual coding is inevitable.
  • This may impair time-to-solution, performance, and performance portability across different platforms.
  • In the domain-specific language (DSL) approach, abstractions aim to provide domain experts with programming primitives that match specific concepts in their domain.

4.1 Languages and Frameworks for Big Data Analysis

  • Boosted by the popularity of big data, new languages and frameworks for data analytics are appearing at an increasing pace.
  • Each of them introduces its own concepts and terminology and advocates a (real or alleged) superiority in terms of performance or expressiveness against its predecessors.
  • For a user approaching big data analytics (even an educated computer scientist) it is increasingly difficult to retain a clear picture of the programming model underneath these tools and the expressiveness they provide to solve some user-defined problem.

  • To provide some order in the world of big data processing, a toolkit of models to identify their common features is introduced, starting from the data layout.
  • Data-processing applications are divided into batch vs. stream processing.
  • For a complete description of the Dataflow model the authors refer back to [6,70], where the main features of mainstream languages are presented.
  • Based on the map and reduce functions, commonly used in parallel and functional programming [73], MapReduce provides a native key-value model and built-in sorting facilities.
  • Each flat-map executor emits R (i.e. the number of intermediate partitions) chunks, each containing the intermediate key-value pairs mapped to a given partition.
  • Each reduce executor then performs the reduction on a per-key basis (a minimal sketch of these phases is given below).
  • Finally, a downstream collector gathers R tokens from the reduce executors and merges them into the final result.
  • This poses severe challenges from the implementation perspective.
  • As a key feature, HDFS exposes the locality of stored data, enabling the principle of moving the computation towards the data in order to minimise communication.
  • Disk-based communication leads to performance problems when dealing with iterative computations, such as machine learning algorithms [74].
  • Instead of a fixed processing schema, Spark allows datasets to be processed by means of arbitrarily composed primitives, constructing a directed acyclic graph (DAG).
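The following self-contained Python sketch mimics the phases described above (flat-map emission into R partitions, per-key reduction, and a final merge by a downstream collector) for a toy word count; it illustrates the model, not the Hadoop implementation:

```python
from collections import defaultdict

R = 3  # number of intermediate partitions

def flat_map(chunk):
    """Flat-map phase: emit R chunks of intermediate key-value pairs."""
    partitions = [[] for _ in range(R)]
    for word in chunk.split():
        partitions[hash(word) % R].append((word, 1))
    return partitions

def reduce_partition(pairs):
    """Reduce phase: reduction on a per-key basis within one partition."""
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)

chunks = ["to be or not to be", "to see or not to see"]
emitted = [flat_map(c) for c in chunks]                          # map side
shuffled = [sum((e[r] for e in emitted), []) for r in range(R)]  # shuffle by partition
reduced = [reduce_partition(p) for p in shuffled]                # reduce side
result = {k: v for token in reduced for k, v in token.items()}   # collector merges R tokens
print(result)
```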


  • Similar to the MapReduce implementation, Spark’s execution model relies on the master-workers model: a cluster manager (e.g. YARN) manages resources and supervises the execution of the program.
  • Each of these actors represents independent data-parallel tasks, on which pipeline parallelism is exploited.
  • Currently, they include, among others, Apache Flink, Apache Spark and Google Cloud Dataflow.
  • Bounded PCollections can be processed using batch jobs, which read the entire data set once and perform processing as a finite job.
  • That graph is then executed using the appropriate distributed processing back-end, becoming an asynchronous job/process on that back-end.
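As a sketch of this execution model, the snippet below builds a small graph of transforms over a bounded PCollection and hands it to a back-end runner as a finite batch job; the choice of the Apache Beam Python SDK and the local DirectRunner is an assumption for illustration, since the survey names the Dataflow model and its back-ends rather than a specific SDK:

```python
import apache_beam as beam

# Building the pipeline only constructs the graph; the runner executes it
# when the context exits (here the local DirectRunner; Flink, Spark or
# Cloud Dataflow runners could execute the same graph).
with beam.Pipeline() as pipeline:
    (pipeline
     | "Create" >> beam.Create(["to be or not to be"])   # bounded PCollection
     | "Split"  >> beam.FlatMap(str.split)
     | "Pair"   >> beam.Map(lambda word: (word, 1))
     | "Count"  >> beam.CombinePerKey(sum)
     | "Print"  >> beam.Map(print))
```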

4.2 The Systematic Mapping Study on Parallel Programming Models for Big-Data Modelling and Simulation

  • In order to minimize the bias, given that many Action participants actively design programming models and tools, the working group refined and adopted a systematic methodology to study the state of the art, called systematic mapping study (SMS).
  • The mapping study focused on the main paradigms and properties of programming languages used in high-performance computing for big data processing.


  • Specifically, the SMS focused on domain-specific languages and explicitly excluded general-purpose languages, such as C, C++, OpenMP, Fortran, Java, Python, Scala, etc., combined with parallel exploitation libraries, such as MPI.
  • Quantitatively, in the SMS, the initial literature search resulted in 420 articles; 152 articles were retained for final review after the evaluation of initial search results by domain experts.
  • Results of the mapping study indicate, for instance, that the majority of HPC languages used in the context of big data are text-based general-purpose programming languages targeting the end-user community.
  • To evaluate the outcome of the mapping study, the authors developed a questionnaire and collected the opinions of domain experts.

5 HPC-Enabled Modelling and Simulation for Life Sciences

  • Life Sciences typically deal with and generate large amounts of data, e.g., the flux of terabytes about genes and their expression produced by state of the art sequencing and microarray equipment, or data relating to the dynamics of cell biochemistry or organ functionality.
  • The authors will consider approaches for modelling healthcare and diseases as well as problems in systems and synthetic biology.
  • Taking into account only DNA sequencing data, its rate of accumulation is much larger than that of other major generators of big data, such as astronomy, YouTube and Twitter.
  • Areas such as systems medicine, clinical informatics, systems biology and bioinformatics have large overlaps with classical fields of medicine, and extensively use biological information and computational methods to infer new knowledge towards understanding disease mechanism and diagnosis.
  • A patient’s condition is characterised by multiple, complex and interrelated conditions, disorders or diseases [87,88].


  • The medical approach to comorbidities represents an impressive computational challenge: it requires the integration of heterogeneous sources of information, the definition of deep phenotyping and marker re-modulation, and the establishment of clinical decision support systems.
  • This could be of great importance for epigenetic data, which shows alteration with ageing, inflammatory diseases, obesity, cardiovascular and neurodegenerative diseases.
  • Depending on the magnitude of mechanical stress, osteoprogenitors differentiate or transdifferentiate into osteoblast-like cells that express characteristic proteins and can form bone matrix.
  • The transition between a continuous representation and a discrete representation makes the coupling of the models across the cell-tissue scale particularly difficult.
  • Conventional homogenisation approaches, frequently used as relation models to link to component models defined at different scales, are computationally resource demanding [89–92].


  • In recent years, thanks to faster and cheaper sequencing machines, a huge amount of whole genomic sequences within the same population has become available (e.g. [99]).
  • An elastic-degenerate text (ED-text) is a sequence compactly representing a multiple alignment of several closely-related sequences: substrings that match exactly are collapsed, while those in positions where the sequences differ (by means of substitutions, insertions, and deletions of substrings) are called degenerate, and therein all possible variants observed at that location are listed [105].
  • This problem has been efficiently solved in [112] with a linear time algorithm for the case of non-elastic D-texts (a degenerate segment can only contain strings of the same size).
  • Solutions typically have exponential computational complexity.
  • WHATSHAP [113] is a framework returning exact solutions to the haplotyping problem; it moves the computational complexity from DNA fragment length to fragment overlap (i.e., coverage) and is hence of particular interest given current sequencing-technology trends towards longer fragments.
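To make the ED-text representation concrete, here is a toy Python illustration (the segments and variants are invented): solid segments hold a single string, degenerate segments list the variants observed at that position, and an empty string encodes a deletion.

```python
from itertools import product

# Toy elastic-degenerate text: solid segments (one variant) alternate with
# degenerate segments (several variants; '' encodes a deletion).
ed_text = [["AC"], ["G", "T", ""], ["CA"], ["T", "TT"]]

def expansions(ed):
    """Enumerate every plain sequence the ED-text represents (toy sizes only)."""
    return sorted("".join(parts) for parts in product(*ed))

print(expansions(ed_text))
# ['ACCAT', 'ACCATT', 'ACGCAT', 'ACGCATT', 'ACTCAT', 'ACTCATT']
```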


  • Many functional modules are linked together in a Metabolic Network for reproducing metabolic pathways and describing the entire cellular metabolism of an organism.
  • An integrated approach based on statistical, topological, and functional analysis allows for obtaining a deep knowledge on overall metabolic network robustness.
  • So, ultra-peripheral non-hub nodes can assume a fundamental role for network survival if they belong to network extreme pathways, while hub nodes can have a limited impact on networks if they can be replaced by alternative nodes and paths [115,116].
  • The same approach has been applied as a bio-inspired optimisation method to different application domains.
  • The computational analysis of complex biological systems can be hindered by three main factors:

1. modelling the system so that it can be easily understood and analysed by non-expert users is not always possible;
2. when the system is composed of hundreds or thousands of reactions and chemical species, classic CPU-based simulators may not be appropriate to efficiently derive the behaviour of the system;
3. these methods often need an amount of experimental data that is not always available.

  • The system behaviour is described in detail by a system of ordinary differential equations (ODEs), while model indetermination is resolved by selecting time-varying coefficients that maximize/minimize the objective function at each ODE integration step (a sketch is given below).
  • Some interesting applications in this context are based on the study of integrated biological data and how they are organised in complex systems.
  • The persistent challenges in the healthcare sector call for urgent review of strategies.
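The sketch below illustrates the coefficient-selection idea on a hypothetical toy system (the model, candidate coefficients and objective are invented, and SciPy's solve_ivp is assumed to be available): at every integration step the undetermined coefficient is chosen from a candidate set so as to maximize the objective, here simply the state value at the end of the step.

```python
from scipy.integrate import solve_ivp

# Toy system dx/dt = -k(t) * x with an undetermined, time-varying coefficient k.
candidates = [0.1, 0.5, 1.0]
x, t, dt = 1.0, 0.0, 0.1

def step(x0, t0, k):
    """Integrate one step of length dt with coefficient k and return the final state."""
    sol = solve_ivp(lambda s, y: -k * y, (t0, t0 + dt), [x0])
    return sol.y[0, -1]

trajectory = []
for _ in range(20):
    k_best = max(candidates, key=lambda k: step(x, t, k))  # objective: maximize x
    x, t = step(x, t, k_best), t + dt
    trajectory.append((round(t, 2), k_best, round(x, 4)))

print(trajectory[:3])
```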


  • There has also been diverse application of operations management techniques in several domains including the health sector.
  • A major classification identified resource and facility management, demand forecasting, inventory and supply chain management, and cost measurement as application groupings to prioritise [126].
  • Challenges also arise around patient workflow: admission, scheduling, and resource allocation.
  • This obviously comes with the need for adequate computing and storage capabilities.
  • The choice of model and/or simulation technique can ultimately be influenced by available computing power and storage space.
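As a toy example of how such a use-case can be simulated, the sketch below models patient flow through a single consultation room with exponential inter-arrival and service times; the rates are invented and the model is deliberately minimal, whereas real healthcare models would add admission, scheduling and resource-allocation logic.

```python
import random

random.seed(1)
ARRIVAL_MEAN, SERVICE_MEAN, N_PATIENTS = 10.0, 8.0, 1000  # minutes

t, room_free_at, waits = 0.0, 0.0, []
for _ in range(N_PATIENTS):
    t += random.expovariate(1 / ARRIVAL_MEAN)        # next patient arrives
    start = max(t, room_free_at)                     # wait if the room is busy
    waits.append(start - t)
    room_free_at = start + random.expovariate(1 / SERVICE_MEAN)

print(f"mean waiting time: {sum(waits) / len(waits):.1f} minutes")
```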

6 HPC-Enabled Modelling and Simulation for Socio-Economical and Physical Sciences

  • Many types of decisions in society are supported by modelling and simulation.
  • The authors can roughly divide the applications within the large and diverse area, which they here call socio-economical and physical sciences, into two groups.
  • In classical HPC applications, the need for HPC arises from a large-scale model or a computationally heavy software implementation that needs to make use of large-scale computational resources, and potentially also large-scale storage resources, in order to deliver timely results.
  • The opportunities for using data in new ways are endless, but as is suggested in [138], data and algorithms together can provide the whats, while the innovation and imagination of human interpreters is still needed to answer the whys.
  • Wing design is one of the essential procedures of aircraft manufacturing, and it is a compromise between many competing factors and constraints.


  • Necessary derivatives can easily be calculated by applying finite-difference methods (a minimal sketch is given after this list).
  • As a thriving application platform, HPC excels in supporting the execution of Computational Intelligence (CI) algorithms and their speedup through parallelisation.
  • The CI algorithms supported by this Action include some of the most efficient optimization algorithms for continuous optimization, as defined by the benchmark-function competition framework of the Congress on Evolutionary Computation (CEC) 2017 [143,144].
  • IoT assumes that multiple sensors can be used to monitor the real world, and this information can be stored and processed, jointly with information from soft sensors (RSS, web, etc.) [155], to, for example, assist elderly people in the street [156], develop intelligent interfaces [157] or detect anomalies in industrial environments [158].
  • Concentration of these data at a decision-making location may also allow travel time estimation, exploitation of network locality information, as well as comparison with the estimates provided by a traffic management system, which can be evaluated for effectiveness on the medium term and possibly tuned accordingly.
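A minimal sketch of the finite-difference idea referred to above; the objective function here is a made-up stand-in for an expensive aerodynamic model.

```python
def central_difference_gradient(f, x, h=1e-6):
    """Approximate the gradient of f at x (a list of floats) by central differences."""
    grad = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        grad.append((f(xp) - f(xm)) / (2 * h))
    return grad

# Hypothetical smooth objective standing in for an expensive simulation.
drag = lambda v: v[0] ** 2 + 3 * v[1] ** 2
print(central_difference_gradient(drag, [1.0, 2.0]))  # approx. [2.0, 12.0]
```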


  • Extreme value estimation plays an important role in risk management, in insurance, and in the prediction of catastrophic climate events.
  • In a later chapter, methods for extreme value estimation are surveyed.

7 Summary and Conclusion

  • HPC and M&S form two previously largely disjoint and disconnected research communities.
  • The COST Action IC1406 High-Performance Modelling and Simulation for Big Data Applications brings these two communities together to tackle the challenges of big data applications from diverse application domains.
  • With the scene set in this paper, the other papers of this volume exemplify the achievements of the COST Action.


Citations
01 Jan 2013
TL;DR: From the experience of several industrial trials on smart grid with communication infrastructures, it is expected that the traditional carbon fuel based power plants can cooperate with emerging distributed renewable energy such as wind, solar, etc, to reduce the carbon fuel consumption and consequent green house gas such as carbon dioxide emission.
Abstract: A communication infrastructure is an essential part to the success of the emerging smart grid. A scalable and pervasive communication infrastructure is crucial in both construction and operation of a smart grid. In this paper, we present the background and motivation of communication infrastructures in smart grid systems. We also summarize major requirements that smart grid communications must meet. From the experience of several industrial trials on smart grid with communication infrastructures, we expect that the traditional carbon fuel based power plants can cooperate with emerging distributed renewable energy such as wind, solar, etc, to reduce the carbon fuel consumption and consequent green house gas such as carbon dioxide emission. The consumers can minimize their expense on energy by adjusting their intelligent home appliance operations to avoid the peak hours and utilize the renewable energy instead. We further explore the challenges for a communication infrastructure as the part of a complex smart grid system. Since a smart grid system might have over millions of consumers and devices, the demand of its reliability and security is extremely critical. Through a communication infrastructure, a smart grid can improve power reliability and quality to eliminate electricity blackout. Security is a challenging issue since the on-going smart grid systems facing increasing vulnerabilities as more and more automation, remote monitoring/controlling and supervision entities are interconnected.

1,036 citations

Journal ArticleDOI
TL;DR: The central premise of the book is that the combination of the Pareto or Zipf distribution that is characteristic of Web traffic and the direct access to consumers via Web technology has opened up new business opportunities in the ''long tail''.
Abstract: The Long Tail: How Technology is turning mass markets into millions of niches. (p. 15). This passage from The Long Tail pretty much sums it all up. The Long Tail by Chris Anderson is a good and worthwhile read for information scientists, computer scientists, ecommerce researchers, and others interested in all areas of Web research. The central premise of the book is that the combination of (1) the Pareto or Zipf distribution (i.e., power law probability distribution) that is characteristic of Web traffic and (2) the direct access to consumers via Web technology has opened up new business opportunities in the "long tail". Producers and advertisers no longer have to target "the big hits" at the head of the distribution. Instead, they can target the small, niche communities or even individuals in the tail of the distribution. The long tail has been studied by Web researchers and has been noted in term usage on search engines, access times to servers, and popularity of Web sites. Anderson points out that the long tail also applies to products sold on the Web. He recounts that a sizeable percentage of Amazon sales come from books that only sell a few copies, a large number of songs from Rhapsody get downloaded only once in a month, and a significant number of movies from Netflix only get ordered occasionally. However, since the storage is in digital form for the songs and music (and Amazon outsources the storage of books) there is little additional inventory cost of these items. This phenomenon across all Web companies has led to a broadening of participation by both producers and consumers that would not have happened without the Web. The idea of the long tail is well known, of course. What Anderson has done is present it in an interesting manner and in a Web ecommerce setting. He applies it to Web businesses and then relates the multitude of other factors ongoing that permit the actual implementation of the long tail effect. Anderson also expands on prior work on the long tail by introducing an element of time, giving the distribution a three-dimensional effect. All in all, it is a nifty idea. The book is comprised of 14 chapters, plus an Introduction. Chapter 1 presents an overview of what the long tail is. Chapter 2 discusses the "head", which is the top of the tail where the …

827 citations

01 Jan 2016
Using MPI: Portable Parallel Programming with the Message Passing Interface.

593 citations

Journal Article
TL;DR: The reasons why Facebook chose Hadoop and HBase over other systems such as Apache Cassandra and Voldemort are described and the application requirements for consistency, availability, partition tolerance, data model and scalability are discussed.
Abstract: Facebook recently deployed Facebook Messages, its first ever user-facing application built on the Apache Hadoop platform. Apache HBase is a database-like layer built on Hadoop designed to support billions of messages per day. This paper describes the reasons why Facebook chose Hadoop and HBase over other systems such as Apache Cassandra and Voldemort and discusses the application's requirements for consistency, availability, partition tolerance, data model and scalability. I explore the enhancements made to Hadoop to make it a more effective realtime system, the tradeoffs we made while configuring the system, and how this solution has significant advantages over the sharded MySQL database scheme used in other applications at Facebook and many other web-scale companies. I discuss the motivations behind my design choices, the challenges that we face in day-to-day operations, and future capabilities and improvements still under development. I offer these observations on the deployment as a model for other companies who are contemplating a Hadoop-based solution over traditional sharded RDBMS deployments.

279 citations

Journal ArticleDOI
TL;DR: In an out-of-sample analysis accounting for transaction cost, it is found that combining cryptocurrencies enriches the set of ‘low’-risk cryptocurrency investment opportunities and the 1/N-portfolio outperforms single cryptocurrencies and more than 75% of mean-variance optimal portfolios.
Abstract: By the end of 2017, 27 cryptocurrencies topped a market capitalization of one billion USD. Bitcoin is still shaping market and media coverage, however, recently we faced a vibrant rise of other currencies. As a result, 2017 has also witnessed the advent of a large number of cryptocurrency-funds. In this paper, we use Markowitz' mean-variance framework in order to assess risk-return-benefits of cryptocurrency-portfolios. We relate risk and return of different portfolio strategies to single cryptocurrency investments. In an out-of-sample analysis accounting for transaction cost we find that combining cryptocurrencies in a portfolio enriches the set of 'low'-risk cryptocurrency investment opportunities.

84 citations

References
Journal ArticleDOI
TL;DR: The goals of the PDB are described, the systems in place for data deposition and access, how to obtain further information and plans for the future development of the resource are described.
Abstract: The Protein Data Bank (PDB; http://www.rcsb.org/pdb/ ) is the single worldwide archive of structural data of biological macromolecules. This paper describes the goals of the PDB, the systems in place for data deposition and access, how to obtain further information, and near-term plans for the future development of the resource.

34,239 citations

Journal ArticleDOI
Rainer Storn, Kenneth Price
TL;DR: In this article, a new heuristic approach for minimizing possibly nonlinear and non-differentiable continuous space functions is presented, which requires few control variables, is robust, easy to use, and lends itself very well to parallel computation.
Abstract: A new heuristic approach for minimizing possibly nonlinear and non-differentiable continuous space functions is presented. By means of an extensive testbed it is demonstrated that the new method converges faster and with more certainty than many other acclaimed global optimization methods. The new method requires few control variables, is robust, easy to use, and lends itself very well to parallel computation.

24,053 citations

Journal ArticleDOI
TL;DR: Upon returning to the U.S., author Singhal’s Google search revealed the following: in January 2001, the impeachment trial against President Estrada was halted by senators who supported him and the government fell without a shot being fired.

23,419 citations


"High-performance modelling and simu..." refers background or methods in this paper

  • ...energy applications [148], constrained trajectory planning [149], artificial life of full ecosystems [150] including HPC-enabled evolutionary computer vision in 2D [151,152] and 3D [151], many other well recognized real-world optimization challenges [153], or even insight to deep inner dynamics of DE over full benchmarks, requiring large HPC capacities [154]....


  • ...In Rogers’ classic work [150], the author defines information diffusion as the process in which an innovation is communicated through certain channels over time among the members of a social system....


  • ...Rogers’ theory [150] is quantified by the Bass model [33]....


  • ...Evaluation: [8-512] nodes cluster simulation...


Journal ArticleDOI
Jeffrey Dean, Sanjay Ghemawat
06 Dec 2004
TL;DR: This paper presents the implementation of MapReduce, a programming model and an associated implementation for processing and generating large data sets that runs on a large cluster of commodity machines and is highly scalable.
Abstract: MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key. Many real world tasks are expressible in this model, as shown in the paper. Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines. The run-time system takes care of the details of partitioning the input data, scheduling the program's execution across a set of machines, handling machine failures, and managing the required inter-machine communication. This allows programmers without any experience with parallel and distributed systems to easily utilize the resources of a large distributed system. Our implementation of MapReduce runs on a large cluster of commodity machines and is highly scalable: a typical MapReduce computation processes many terabytes of data on thousands of machines. Programmers find the system easy to use: hundreds of MapReduce programs have been implemented and upwards of one thousand MapReduce jobs are executed on Google's clusters every day.

20,309 citations

Journal ArticleDOI
Abstract: Social network sites (SNSs) are increasingly attracting the attention of academic and industry researchers intrigued by their affordances and reach. This special theme section of the Journal of Computer-Mediated Communication brings together scholarship on these emergent phenomena. In this introductory article, we describe features of SNSs and propose a comprehensive definition. We then present one perspective on the history of such sites, discussing key changes and developments. After briefly summarizing existing scholarship concerning SNSs, we discuss the articles in this special section and conclude with considerations for future research.

14,912 citations


"High-performance modelling and simu..." refers background or methods in this paper

  • ...The rule-of-thumb bandwidth estimator of Silverman [44],...


  • ...Evaluation: [8-512] nodes cluster simulation...


  • ...It was shown in [44], see also [41], that minimizing the square integrated error (ISE) for a specific sample is equivalent to minimizing the cross-validation function...


  • ...We then have [44] the asymptotic approximation...


  • ...Synchronous and asynchronous distributed simulation have been one of the options that could improve the scalability of a simulator both in term of application size and execution speed, enabling large scale systems to be simulated in real time [43,44]....


Frequently Asked Questions (20)
Q1. What are the contributions mentioned in the paper "High-performance modelling and simulation for big data applications" ?

In this introductory article the authors argue why joining forces between the M&S and HPC communities is both timely in the big data era and crucial for success in many application domains. Moreover, the authors provide an overview of the state of the art in the various research areas concerned. 

In the following work, some more specific implementations and experimental results could be presented, based on the guidelines, outlines, and integration possibilities presented in this chapter. Author RS also acknowledges that this work was supported by the Ministry of Education, Youth and Sports of the Czech Republic within the National Sustainability Programme Project No. LO1303 (MSMT-7778/2014), and further supported by the European Regional Development Fund under the Project CEBIA-Tech no. 


By using the method of modular analysis and unified derivatives (MAUD), the authors can unify all methods for computing total derivatives using a single equation with associated distributed-memory, sparse data-passing schemes. 

The medical approach to comorbidities represents an impressive computational challenge, mainly because of data synergies leading to the integration of heterogeneous sources of information, the definition of deep phenotyping and markers re-modulation; the establishment of clinical decision support systems. 

Due to biotechnologies limitations, sequencing (that is, giving as input the in vitro DNA and getting out an in silico text file) can only be done on a genome fragment of limited size. 

In the case of the EU project RIVR (Upgrading National Research Structures in Slovenia), supported by the European Regional Development Fund (ERDF), an important side-effect of the cHiPSet COST Action was leveraging its experts' inclusiveness to gain capacity recognition at a national ministry for co-financing HPC equipment. 

In particular, complex disease management is mostly based on electronic health records collection and analysis, which are expensive processes. 

Since most of these applications belong to domains within the life, social and physical sciences, their mainstream approaches are rooted in non-computational abstractions and they are typically not HPC-enabled. 

Classical HPC applications, where the authors build a large-scale complex model and simulate it in order to produce data as a basis for decisions, and big data applications, where the starting point is a data set that is processed and analyzed to learn the behaviour of a system, to find relevant features, and to make predictions or decisions. 

For instance, by using Next Generation Sequencing technology, cancer clones, subtypes and metastases could be appropriately traced. 


The chances of making data the driver of paths to cures for many complex diseases depend to a large extent on extracting evidence from large-scale electronic record comparison and on models of disease trajectories. 

CloudSim [54] is one of the most popular open source frameworks for modeling and simulation of cloud computing infrastructures and services. 

The growth is driven by three main factors: 1. Biomedicine is heavily interdisciplinary and e-Healthcare requires physicians, bioinformaticians, computer scientists and engineers to team up. 

The optimum framework for modelling and simulating a particular use-case depends on the availability, structure and size of data [126]. 


Other tools, such as BamView [40], have been developed specifically to visualise mapped read alignment data in the context of the reference sequence. 

Some approaches have been successful, leading to potential industrial impact and supporting experiments that generate petabytes of data, like those performed at CERN for instance. 

The computational analysis of complex biological systems can be hindered by three main factors: 2. When the system is composed of hundreds or thousands of reactions and chemical species, classic CPU-based simulators may not be appropriate to efficiently derive the behaviour of the system.