Showing papers by "Lars Lundberg" published in 2017


Proceedings ArticleDOI
01 Oct 2017
TL;DR: This study shows that Docker had lower overhead than VMware when running Cassandra, and that Cassandra's performance on the Dockerized infrastructure was as good as on the non-virtualized one.
Abstract: Today, scalable and high-available NoSQL distributed databases are largely used as Big Data platforms. Such distributed databases typically run on a virtualized infrastructure that could be implemented using Hypervisor-based virtualization or Container-based virtualization. Hypervisor-based virtualization is a mature technology but imposes overhead on CPU, memory, networking, and disk. Recently, by sharing the operating system resources and simplifying the deployment of applications, container-based virtualization is getting more popular. Container-based virtualization is lightweight in resource consumption while also providing isolation. However, disadvantages are security issues and I/O performance. As a result, today these two technologies are competing to provide virtual instances for running big data platforms. Hence, a key issue becomes the assessment of the performance of those virtualization technologies while running distributed databases. This paper presents an extensive performance comparison between VMware and Docker container, while running Apache Cassandra as workload. Apache Cassandra is a leading NoSQL distributed database when it comes to Big Data platforms. As baseline for comparisons we used the Cassandra's performance when running on a physical infrastructure. Our study shows that Docker had lower overhead compared to the VMware when running Cassandra. In fact, the Cassandra's performance on the Dockerized infrastructure was as good as on the Non-Virtualized.

25 citations


Journal ArticleDOI
TL;DR: Cassandra, CouchDB, MongoDB, PostgreSQL, and RethinkDB are evaluated as database management systems for NoSQL and SQL users.
Abstract: In this study, we evaluate the performance of SQL and NoSQL database management systems namely; Cassandra, CouchDB, MongoDB, PostgreSQL, and RethinkDB. We use a cluster of four nod ...

16 citations


Proceedings ArticleDOI
01 Jun 2017
TL;DR: A user segmentation model based on clustering of user trajectories from the Call Detail Records covering one week of activity of one region in Sweden is completed and has been compared against MOSAIC in the recommendation module of a customer relationship management system and has revealed better business options with regard to network exploitation and potential revenues.
Abstract: In business analytics some industries rely heavily on commercial geo-demographic segmentation systems (MOSAIC, ACORN, etc.), which are a universally strong predictor of user's behavior: from diabetes propensity and purchasing habits to political preferences. A segment is defined with a postcode of the client's home address. Recent research suggests that a mature competitor to geo-demographic segmentation is about to emerge: segmentation based on user mobility is reported to be a reliable proxy of social well-being of the neighborhood. In this submission, we have completed a user segmentation model based on clustering of user trajectories from the Call Detail Records covering one week of activity of one region in Sweden. The new segmentation has been compared against MOSAIC in the recommendation module of a customer relationship management system and has revealed better business options with regard to network exploitation and potential revenues. The implementation is available from the corresponding author (JS or LL) on request.

11 citations


Journal ArticleDOI
TL;DR: The main findings are: VDC allocation aiming at reducing the energy consumption, or resource usage in general, can heavily reduce the reliability of Cassandra in terms of the consistency level offered.
Abstract: Apache Cassandra is a highly scalable and available NoSQL datastore, largely used by enterprises of every size and for application areas that range from entertainment to big data analytics. Managed Cassandra service providers are emerging to hide the complexity of the installation, fine tuning and operation of Cassandra virtual data centers (VDCs). This paper addresses the problem of energy-efficient auto-scaling of Cassandra VDCs in managed Cassandra data centers. We propose three energy-aware auto-scaling algorithms: Opt, LocalOpt and LocalOpt-H. The first provides the optimal scaling decision, orchestrating horizontal and vertical scaling and optimal placement. The other two are heuristics and provide sub-optimal solutions. Both orchestrate horizontal scaling and optimal placement. LocalOpt also considers vertical scaling. In this paper, we provide an analysis of the computational complexity of the optimal and of the heuristic auto-scaling algorithms; we discuss the issues in auto-scaling Cassandra VDCs and provide best practices for using auto-scaling algorithms; and we evaluate the performance of the proposed algorithms under programmed SLA variation, unexpected surges of throughput, and failures of physical nodes. We also compare the performance of the energy-aware auto-scaling algorithms with that of two energy-blind auto-scaling algorithms, namely BestFit and BestFit-H. The main findings are as follows. VDC allocation aiming at reducing the energy consumption, or resource usage in general, can heavily reduce the reliability of Cassandra in terms of the consistency level offered. Horizontal scaling of Cassandra is very slow and makes it hard to manage surges of throughput. Vertical scaling is a valid alternative, but it is not supported by all cloud infrastructures.

10 citations


Book ChapterDOI
24 Sep 2017
TL;DR: In this paper, the authors explore the second alternative: new antennas can be installed at hot spots of user demand, which will require an investment, and/or the clientele expansion can be carried out in a planned manner to promote the exploitation of the infrastructure in less loaded geographical zones.
Abstract: The population in Sweden is growing rapidly due to immigration. In this light, the issue of infrastructure upgrades to provide telecommunication services is of importance. New antennas can be installed at hot spots of user demand, which will require an investment, and/or the clientele expansion can be carried out in a planned manner to promote the exploitation of the infrastructure in the less loaded geographical zones. In this paper, we explore the second alternative. Informally speaking, the term Infrastructure-Stressing describes a user who stays in the zones of high demand, which are prone to produce service failures, if further loaded. We have studied the Infrastructure-Stressing population in the light of their correlation with geo-demographic segments. This is motivated by the fact that specific geo-demographic segments can be targeted via marketing campaigns. Fuzzy logic is applied to create an interface between big data, numeric methods for its processing, and a manager who wants a comprehensible summary.

5 citations


Proceedings ArticleDOI
01 Aug 2017
TL;DR: In this article, the authors present a data-driven analytic strategy based on combinatorial optimization and analysis of historical data, and apply the proposed method in a case study to identify the optimal combination of geodemographic segments in the customer base.
Abstract: A major investment made by a telecom operator goes into the infrastructure and its maintenance, while business revenues depend on how efficiently it is exploited. We present a data-driven analytic strategy based on combinatorial optimization and analysis of historical data. The data cover historical mobility in one region of Sweden during a week. Applying the proposed method in a case study, we have identified the optimal combination of geodemographic segments in the customer base, developed a functionality to assess the potential of a planned marketing campaign, and investigated how many and which segments to target for customer base growth. A comprehensible summary of the conclusions is created via execution of the queries with a fuzzy logic component.

4 citations


Posted Content
TL;DR: The population in Sweden is growing rapidly due to immigration, and the issue of infrastructure upgrades to provide telecommunication services is of importance; new antennas can be installed at hot spots of user demand, and/or the clientele expansion can be carried out in a planned manner to promote the exploitation of the infrastructure in the less loaded geographical zones.
Abstract: The population in Sweden is growing rapidly due to immigration. In this light, the issue of infrastructure upgrades to provide telecommunication services is of importance. New antennas can be installed at hot spots of user demand, which will require an investment, and/or the clientele expansion can be carried out in a planned manner to promote the exploitation of the infrastructure in the less loaded geographical zones. In this paper, we explore the second alternative. Informally speaking, the term Infrastructure-Stressing describes a user who stays in the zones of high demand, which are prone to produce service failures, if further loaded. We have studied the Infrastructure-Stressing population in the light of their correlation with geo-demographic segments. This is motivated by the fact that specific geo-demographic segments can be targeted via marketing campaigns. Fuzzy logic is applied to create an interface between big data, numeric methods for processing big data and a manager.

4 citations


Proceedings ArticleDOI
01 Nov 2017
TL;DR: Initial evaluations show that cluster validation techniques can be useful tools for assessing the organizational structure using objective analysis of internal email communications, and for simulating and studying different reorganization scenarios.
Abstract: In this work, we report an ongoing study that aims to apply cluster validation measures for analyzing email communications at an organizational level of a company. This analysis can be used to evaluate the company structure and to produce further recommendations for structural improvements. Our initial evaluations, based on data in the form of email logs and organizational structure for a large European telecommunication company, show that cluster validation techniques can be useful tools for assessing the organizational structure using objective analysis of internal email communications, and for simulating and studying different reorganization scenarios.

3 citations


Posted Content
TL;DR: A data-driven analytic strategy based on combinatorial optimization and analysis of historical data that identifies the optimal combination of geodemographic segments in the customer base, developed a functionality to assess the potential of a planned marketing campaign, and investigated how many and which segments to target for customer base growth.
Abstract: A major investment made by a telecom operator goes into the infrastructure and its maintenance, while business revenues are proportional to how big and good the customer base is. We present a data-driven analytic strategy based on combinatorial optimization and analysis of historical data. The data cover historical mobility of the users in one region of Sweden during a week. Applying the proposed method to the case study, we have identified the optimal proportion of geo-demographic segments in the customer base, developed a functionality to assess the potential of a planned marketing campaign, and explored the problem of an optimal number and types of the geo-demographic segments to target through marketing campaigns. With the help of fuzzy logic, the conclusions of data analysis are automatically translated into comprehensible recommendations in a natural language.

2 citations


Journal Article
TL;DR: The use of virtualized systems is growing, and one would like to benefit from this kind of systems also for real-time applications with hard deadlines, and there are two leading providers of these systems.
Abstract: The use of virtualized systems is growing, and one would like to benefit from this kind of systems also for real-time applications with hard deadlines. There are two le ...

2 citations