Journal ArticleDOI

Big Data Analytics for Security

TL;DR: Big data is changing the landscape of security tools for network monitoring, security information and event management, and forensics; however, in the eternal arms race of attack and defense, security researchers must keep exploring novel ways to mitigate and contain sophisticated attackers.
Citations
Journal ArticleDOI
TL;DR: In this article, the authors present a state-of-the-art review offering a holistic view of the BD challenges and the BDA methods theorized/proposed/employed by organizations, to help others understand this landscape with the objective of making robust investment decisions.

1,267 citations


Cites background from "Big Data Analytics for Security"

  • ...Aggregating these data evidently goes beyond the abilities of current data integration systems (Carlson et al., 2010). According to Karacapilidis, Tzagarakis, and Christodoulou (2013), the availability of data in large volumes and diverse types of representation, smart integration of these data sources to create new knowledge – towards serving collaboration and improved decision-making – remains a key challenge. Halevy, Rajaraman, and Ordille (2006) assert that the indecision and provenance of data are also a major challenge for data aggregation and integration. Another challenge relates to aggregated data in warehouses – in line with this argument, Lebdaoui, Orhanou, and Elhajji (2014) report that to enable decision systems to efficiently respond to the real world's demands, such systems must be updated with clean operational data. • Step 4 – Data Analysis and Modelling: Once the data has been captured, stored, mined, cleaned and integrated, comes the data analysis and modelling for BD. Outdated data analysis and modelling centers around solving the intricacy of relationships between schema-enabled data. As BD is often noisy, unreliable, heterogeneous, dynamic in nature; in this context, these considerations do not apply to non-relational, schema-less databases (Shah et al., 2015). From the perspective of differing between BD and traditional data warehousing systems; Kune, Konugurthi, Agarwal, Chillarige, and Buyya (2016) report that although these two have similar goals; to deliver business value through the analysis of data, they differ in the analytics methods and the organization of the data....


Journal ArticleDOI
TL;DR: This survey takes into account the early-stage threats that may lead to a malicious insider emerging, and reviews the countermeasures from a data analytics perspective.
Abstract: Information communications technology systems are facing an increasing number of cyber security threats, the majority of which originate from insiders. As insiders reside behind the enterprise-level security defence mechanisms and often have privileged access to the network, detecting and preventing insider threats is a complex and challenging problem. In fact, many schemes and systems have been proposed to address insider threats from different perspectives, such as intent, type of threat, or available audit data source. This survey aligns these works around the three most common types of insider, namely the traitor, the masquerader, and the unintentional perpetrator, while reviewing the countermeasures from a data analytics perspective. Uniquely, this survey takes into account the early-stage threats that may lead to a malicious insider emerging. With direct and indirect threats put on the same page, all the relevant works can be categorised as host-, network-, or contextual-data-based according to the audit data source, and each work is reviewed for its capability against insider threats, how the information is extracted from the engaged data sources, and what the decision-making algorithm is. The works are also compared and contrasted. Finally, some issues are raised based on observations from the reviewed works, and new research gaps and challenges are identified.

259 citations


Cites background from "Big Data Analytics for Security"

  • ...big data, namely big volume, high velocity and variety [141]....


Journal ArticleDOI
TL;DR: A state-of-the-art survey on the integration of blockchain with 5G networks and beyond, with discussions of the potential of blockchain for enabling key 5G technologies, including cloud/edge computing, Software Defined Networks, Network Function Virtualization, Network Slicing, and D2D communications.

244 citations


Cites background from "Big Data Analytics for Security"

  • ...ltimedia data generated from ubiquitous 5G IoT devices can be exploited to enable data-related applications, for example, data analytics, data extraction empowered by artificial intelligence solutions [315]. Cloud computing services can offer high storage capabilities to cope with the expansion of quantity and diversity of digital IoT data. However, big data technologies can face various challenges, ran...


Journal ArticleDOI
TL;DR: In this paper, the authors identified a set of challenges (framework) for implementing Industry 4.0 in manufacturing industries and evaluated them using a novel multi-criteria decision-making method named Best-Worst method (BWM).

242 citations

Journal ArticleDOI
TL;DR: A comprehensive survey on state-of-the-art deep learning, IoT security, and big data technologies is conducted and a thematic taxonomy is derived from the comparative analysis of technical studies of the three aforementioned domains.

193 citations


Cites background from "Big Data Analytics for Security"

  • ...The authors of [7] discuss that enterprises collect security-related data for regulatory compliance and post hoc forensic analysis....


References
Proceedings ArticleDOI
09 Dec 2013
TL;DR: A novel system, Beehive, that attacks the problem of automatically mining and extracting knowledge from the dirty log data produced by a wide variety of security products in a large enterprise, and is able to identify malicious events and policy violations which would otherwise go undetected.
Abstract: As more and more Internet-based attacks arise, organizations are responding by deploying an assortment of security products that generate situational intelligence in the form of logs. These logs often contain high volumes of interesting and useful information about activities in the network, and are among the first data sources that information security specialists consult when they suspect that an attack has taken place. However, security products often come from a patchwork of vendors, and are inconsistently installed and administered. They generate logs whose formats differ widely and that are often incomplete, mutually contradictory, and very large in volume. Hence, although this collected information is useful, it is often dirty. We present a novel system, Beehive, that attacks the problem of automatically mining and extracting knowledge from the dirty log data produced by a wide variety of security products in a large enterprise. We improve on signature-based approaches to detecting security incidents and instead identify suspicious host behaviors that Beehive reports as potential security incidents. These incidents can then be further analyzed by incident response teams to determine whether a policy violation or attack has occurred. We have evaluated Beehive on the log data collected in a large enterprise, EMC, over a period of two weeks. We compare the incidents identified by Beehive against enterprise Security Operations Center reports, antivirus software alerts, and feedback from enterprise security specialists. We show that Beehive is able to identify malicious events and policy violations which would otherwise go undetected.
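The behavioral, signature-free approach described above can be illustrated with a toy sketch: normalize loosely structured log lines into per-host event counts, then flag hosts whose activity deviates sharply from the population. This is a minimal illustration only; the `host=`/`event=` field format, the z-score rule, and the threshold are hypothetical assumptions, not Beehive's actual design.

```python
from collections import Counter
from statistics import mean, stdev

def host_features(log_lines):
    """Count events per host from loosely structured 'host=... event=...' lines,
    tolerating malformed ("dirty") lines by skipping them."""
    counts = Counter()
    for line in log_lines:
        fields = dict(tok.split("=", 1) for tok in line.split() if "=" in tok)
        host = fields.get("host")
        if host:
            counts[host] += 1
    return counts

def flag_outliers(counts, z_threshold=2.0):
    """Flag hosts whose event volume is more than z_threshold sample
    standard deviations above the population mean."""
    values = list(counts.values())
    if len(values) < 2:
        return []
    mu, sigma = mean(values), stdev(values)
    if sigma == 0:
        return []
    return [h for h, c in counts.items() if (c - mu) / sigma > z_threshold]
```

A real deployment would replace the single volume feature with many per-host behavioral features (destinations contacted, alert types, timing), but the aggregate-then-rank shape stays the same.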

262 citations

Proceedings ArticleDOI
Tudor Dumitras1, Darren Shou1
10 Apr 2011
TL;DR: The unique characteristics of the WINE data are reviewed, the reasons why rigorous benchmarking will provide fresh insights into the security arms race are discussed, and a research agenda for this area is proposed.
Abstract: Unlike benchmarks that focus on performance or reliability evaluations, a benchmark for computer security must necessarily include sensitive code and data. Because these artifacts could damage systems or reveal personally identifiable information about the users affected by cyber attacks, publicly disseminating such a benchmark raises several scientific, ethical and legal challenges. We propose the Worldwide Intelligence Network Environment (WINE), a security-benchmarking approach based on rigorous experimental methods. WINE includes representative field data, collected worldwide from 240,000 sensors, for new empirical studies, and it will enable the validation of research on all the phases in the lifecycle of security threats. We tackle the key challenges for security benchmarking by designing a platform for repeatable experimentation on the WINE data sets and by collecting the metadata required for understanding the results. In this paper, we review the unique characteristics of the WINE data, we discuss why rigorous benchmarking will provide fresh insights on the security arms race and we propose a research agenda for this area.

102 citations

Proceedings ArticleDOI
29 Nov 2011
TL;DR: This paper proposes a distributed computing framework that leverages a host dependency model and an adapted PageRank algorithm and reports experimental results from an open-source based Hadoop cluster and highlights the performance benefits when using real network traces from an Internet operator.
Abstract: Botnets are a major threat to the current Internet. Understanding the novel generation of botnets relying on peer-to-peer networks is crucial for mitigating this threat. Nowadays, botnet traffic is mixed with a huge volume of benign traffic due to almost ubiquitous high-speed networks. Such networks can be monitored using IP flow records, but their forensic analysis forms the major computational bottleneck. We propose in this paper a distributed computing framework that leverages a host dependency model and an adapted PageRank [1] algorithm. We report experimental results from an open-source based Hadoop cluster [2] and highlight the performance benefits when using real network traces from an Internet operator.
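The paper's host dependency model and adapted PageRank are not reproduced here, but the core ranking step can be sketched with plain power-iteration PageRank over a directed graph of (src, dst) host pairs derived from flow records. The Hadoop-scale distribution is omitted and the graph construction is deliberately simplified.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Plain power-iteration PageRank over a directed host graph.

    edges: iterable of (src, dst) pairs, e.g. derived from IP flow records.
    Returns a dict mapping each host to its rank; ranks sum to 1.
    """
    edges = list(edges)
    nodes = {n for e in edges for n in e}
    out = {n: [] for n in nodes}
    for src, dst in edges:
        out[src].append(dst)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        nxt = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            targets = out[v]
            if targets:
                share = damping * rank[v] / len(targets)
                for t in targets:
                    nxt[t] += share
            else:  # dangling host: spread its rank uniformly
                share = damping * rank[v] / n
                for t in nodes:
                    nxt[t] += share
        rank = nxt
    return rank
```

In a P2P-botnet setting, the intuition is that infected hosts acquire unusual rank profiles in the dependency graph relative to ordinary clients; the paper's adaptation and the MapReduce decomposition of the iteration are what make this tractable on operator-scale traces.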

83 citations

Journal Article
Paul Giura1, Wei Wang1
01 Jan 2012-Science
TL;DR: This paper proposes a model of the APT detection problem as well as a methodology to implement it on a generic organization network, and shows that this approach can feasibly process very large data sets and is flexible enough to accommodate any context-processing algorithm, even to detect sophisticated attacks such as APT.
Abstract: Besides a large set of malware categories such as worms and Trojan horses, the Advanced Persistent Threat (APT) is another, more sophisticated and highly targeted attack emerging in the cyber threat environment. In this paper we propose a model of the APT detection problem as well as a methodology to implement it on a generic organization network. The method suggests closely monitoring the possible targets and using a large-scale distributed computing framework, such as MapReduce, to consider all possible events and to process all the possible contexts in which the attack could take place. Our results show that this approach can feasibly process very large data sets and is flexible enough to accommodate any context-processing algorithm, even to detect sophisticated attacks such as APT.
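The MapReduce formulation described above can be mimicked in-process: a map phase emits (target, event) pairs, and a reduce phase assembles each target's context and scores it. The stage-count scoring rule and the record fields below are hypothetical illustrations, not the paper's actual detection model.

```python
from collections import defaultdict

def map_events(records):
    """Map phase: emit (target, event) pairs from raw event records."""
    for rec in records:
        yield rec["target"], rec

def reduce_contexts(pairs, stage_threshold=3):
    """Reduce phase: group events into per-target contexts, then flag
    targets whose events span several distinct attack stages
    (a hypothetical scoring rule for illustration)."""
    contexts = defaultdict(list)
    for target, rec in pairs:
        contexts[target].append(rec)
    suspicious = []
    for target, events in contexts.items():
        stages = {e["stage"] for e in events}
        if len(stages) >= stage_threshold:
            suspicious.append(target)
    return suspicious
```

The point of the MapReduce shape is that the grouping key (the monitored target) lets each context be scored independently, so the reduce phase parallelizes across the cluster regardless of which context-processing algorithm is plugged in.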

54 citations