Showing papers presented at "Computational Science and Engineering in 2017"


Journal ArticleDOI
01 Mar 2017
TL;DR: The field-effect transistor is approaching physical limits to further miniaturization, and rising costs and reduced return on investment appear to be slowing development. The authors argue that this gradual "end of Moore's law" will shift research toward new devices, new integration technologies, and new computing architectures.
Abstract: The insights contained in Gordon Moore's now famous 1965 and 1975 papers have broadly guided the development of semiconductor electronics for over 50 years. However, the field-effect transistor is approaching some physical limits to further miniaturization, and the associated rising costs and reduced return on investment appear to be slowing the pace of development. Far from signaling an end to progress, this gradual "end of Moore's law" will open a new era in information technology as the focus of research and development shifts from miniaturization of long-established technologies to the coordinated introduction of new devices, new integration technologies, and new architectures for computing.

461 citations


Proceedings ArticleDOI
21 Jul 2017
TL;DR: In this article, the authors proposed a mechanism that combines data deduplication with dynamic data operations in the privacy preserving public auditing for secure cloud storage, which is highly efficient and provably secure.
Abstract: Cloud storage service has been increasing in popularity as cloud computing plays an important role in the IT domain. Users can be relieved of the burden of storage and computation, by outsourcing the large data files to the cloud servers. However, from the cloud service providers' point of view, it is wise to utilize the data deduplication techniques to reduce the costs of running large storage system and energy consumption on cloud servers. Based on the dynamic nature of data in the cloud storage system, we not only need to assure the data integrity with an auditing protocol supporting dynamic data operations for users, but also consider resorting to data deduplication techniques in the dynamic data operations for cloud service providers to achieve the goal of reducing costs. Thus, in this paper, we propose a mechanism that combines data deduplication with dynamic data operations in the privacy preserving public auditing for secure cloud storage. The analysis of security and performance shows that the proposed mechanism is highly efficient and provably secure.

418 citations


Proceedings ArticleDOI
01 Jul 2017
TL;DR: This paper provides a comprehensive survey of what Big Data is, comparing methods, its research problems, and trends; it then presents the application of Deep Learning in Big Data, its challenges, open research problems, and future trends.
Abstract: Big Data refers to extremely large data sets that can be analyzed to find patterns and trends. Deep Learning is one data-analysis technique that can help us find abstract patterns in Big Data. If we apply Deep Learning to Big Data, we can find unknown and useful patterns that were impossible to find so far. With the help of Deep Learning, AI is getting smarter. There is a hypothesis in this regard: the more data, the more abstract knowledge. A handy survey of Big Data, Deep Learning, and its application in Big Data is therefore necessary. In this paper, we provide a comprehensive survey of what Big Data is, comparing methods, its research problems, and trends. Then a survey of Deep Learning, its methods, a comparison of frameworks, and algorithms is presented. Finally, the application of Deep Learning in Big Data, its challenges, open research problems, and future trends are presented.

266 citations


Proceedings ArticleDOI
01 Jul 2017
TL;DR: An intrusion detection method using deep belief network (DBN) and probabilistic neural network (PNN) is proposed; experiments show that it performs better than the traditional PNN, PCA-PNN and unoptimized DBN-PNN.
Abstract: This paper focuses on the problems of intrusion detection using neural networks, including redundant information, large amounts of data, long training times, and a tendency to fall into local optima. An intrusion detection method using deep belief network (DBN) and probabilistic neural network (PNN) is proposed. First, the raw data are converted to low-dimensional data while retaining the essential attributes of the raw data by using the nonlinear learning ability of DBN. Second, to obtain the best learning performance, a particle swarm optimization algorithm is used to optimize the number of hidden-layer nodes per layer. Next, PNN is used to classify the low-dimensional data. Finally, the KDD CUP 1999 dataset is employed to test the performance of the method. The experimental result shows that the method performs better than the traditional PNN, PCA-PNN and unoptimized DBN-PNN.

92 citations


Proceedings ArticleDOI
01 Jul 2017
TL;DR: It is found that the host-based statistical features of network flow play an important role in predicting network intrusion, and that a support vector machine using the selected features can be a competitive classifier for network intrusion detection.
Abstract: Network intrusion detection techniques are important for protecting our systems and networks from malicious behaviors. To improve the accuracy of network intrusion detection, machine learning, feature selection and optimization methods have been used, and the results tell us that combining machine learning with feature selection can improve accuracy. In this study, we developed a new machine learning approach for predicting network intrusion based on random forest and support vector machine. Since there were many potential features for network intrusion classification, random forest was used for feature selection based on the variable importance score. We found that the host-based statistical features of network flow play an important role in predicting network intrusion. The performance of the support vector machine using the 14 selected features on the KDD 99 dataset has been evaluated by comparing it with the full set of 41 features and with popular classifiers. The results showed that the selected features can achieve a higher attack detection rate and that the approach can be a competitive classifier for network intrusion detection.

82 citations


Proceedings ArticleDOI
21 Jul 2017
TL;DR: Experimental results reveal that the automatic OSA detection model provides better classification accuracy.
Abstract: This paper introduces an OSA detection method based on a recurrent neural network. In the first step, the RR interval (the time interval from one R wave to the next R wave) is employed to extract signals from the Apnea-Electrocardiogram (ECG), and all extracted features are then used as input for the designed deep model. Then an architecture with four recurrent layers and batch normalization layers is designed and trained with the extracted features for OSA detection. Apnea-ECG datasets from physionet.org are used for training and testing our model. Experimental results reveal that our automatic OSA detection model provides better classification accuracy.

67 citations
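
The RR interval mentioned in the OSA detection abstract above is simply the gap between consecutive R-wave times. A minimal sketch, assuming the R-peak timestamps have already been detected and are given in milliseconds (the function name `rr_intervals` is hypothetical, and R-peak detection itself is not shown):

```python
def rr_intervals(r_peak_times_ms):
    """Return successive R-to-R intervals from sorted R-peak timestamps."""
    # Pair each peak with the next one and take the difference.
    return [later - earlier
            for earlier, later in zip(r_peak_times_ms, r_peak_times_ms[1:])]


# Hypothetical peak times in milliseconds.
print(rr_intervals([0, 800, 1700, 2500]))  # [800, 900, 800]
```

A sequence of such intervals is the kind of feature vector that could be fed to a recurrent model; the paper's actual preprocessing may differ.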


Proceedings ArticleDOI
01 Jul 2017
TL;DR: This paper exploits a heuristic bootstrap sampling approach combined with the ensemble learning algorithm on the large-scale insurance business data mining, and proposes an ensemble random forest algorithm which used the parallel computing capability and memory-cache mechanism optimized by Spark.
Abstract: Due to the imbalanced distribution of business data, missing of user features and many other reasons, directly using big data techniques on realistic business data tends to deviate from the business goals. It is difficult to model the insurance business data by classification algorithms like Logistic Regression and SVM etc. This paper exploits a heuristic bootstrap sampling approach combined with the ensemble learning algorithm on the large-scale insurance business data mining, and proposes an ensemble random forest algorithm which used the parallel computing capability and memory-cache mechanism optimized by Spark. We collected the insurance business data from China Life Insurance Company to analyze the potential customers using the proposed algorithm. Experiment result shows that the ensemble random forest algorithm outperformed SVM and other classification algorithms in both performance and accuracy within the imbalanced data.

66 citations


Proceedings ArticleDOI
01 Jul 2017
TL;DR: A smart system of greenhouse management based on the Internet of Things is proposed using sensor networks and web-based technologies to remotely manage the temperature, humidity and irrigation in the greenhouses.
Abstract: China is a large agricultural country with the largest population in the world. This creates a high demand for food, which is prompting the study of high quality and high-yielding crops. China's current agricultural production is sufficient to feed the nation; however, compared with developed countries, agricultural farming still lags behind, mainly because the system of growing agricultural crops is not based on maximizing output, which would include scientific sowing, irrigation and fertilization. In the past few years many seasonal fruits have been offered for sale in markets, but these crops are grown in traditional backward agricultural greenhouses and large scale changes are needed to modernize production. The reform of small-scale greenhouse agricultural production is relatively easy and could be implemented. The concept of the Agricultural Internet of Things utilizes networking technology in agricultural production. The hardware part of this agricultural IoT includes temperature, humidity and light sensors and processors with a large data processing capability; these hardware devices are connected by short-distance wireless communication technology, such as Bluetooth, WIFI or Zigbee. In fact, Zigbee technology, because of its convenient networking and low power consumption, is widely used in the agricultural internet. The sensor network is combined with well-established web technology, in the form of a wireless sensor network, to remotely control and monitor data from the sensors.
In this paper a smart system of greenhouse management based on the Internet of Things is proposed using sensor networks and web-based technologies. The system consists of sensor networks and a software control system. The sensor network consists of the master control center and various sensors using Zigbee protocols. The hardware control center communicates with a middleware system via serial network interface converters. The middleware communicates with the hardware network using an underlying interface and it also communicates with a web system using an upper interface. The top web system provides users with an interface to view and manage the hardware facilities; administrators can thus view the status of agricultural greenhouses and issue commands to the sensors through this system in order to remotely manage the temperature, humidity and irrigation in the greenhouses. The main topics covered in this paper are:
1. Research into the current development of new technologies applicable to agriculture, and a summary of the strong points of applying the Agricultural Internet of Things both at home and abroad. Some new methods of agricultural greenhouse management are also proposed.
2. An analysis of system requirements, the users' expectations of the system and the response to needs analysis, and the overall design of the system to determine its architecture.
3. The use of software engineering to ensure that the functional modules of the system, as far as possible, meet the requirements of high cohesion and low coupling between modules; the detailed design and implementation of each module are also considered.

56 citations


Book ChapterDOI
08 Apr 2017
TL;DR: In this paper, a multiscale Petrov-Galerkin finite element method for time-harmonic acoustic scattering problems with heterogeneous coefficients in the high-frequency regime is presented.
Abstract: This paper presents a multiscale Petrov-Galerkin finite element method for time-harmonic acoustic scattering problems with heterogeneous coefficients in the high-frequency regime. We show that the method is pollution-free also in the case of heterogeneous media provided that the stability bound of the continuous problem grows at most polynomially with the wave number k. By generalizing classical estimates of Melenk (Ph.D. Thesis, 1995) and Hetmaniuk (Commun. Math. Sci. 5, 2007) for homogeneous medium, we show that this assumption of polynomially wave number growth holds true for a particular class of smooth heterogeneous material coefficients. Further, we present numerical examples to verify our stability estimates and implement an example in the wider class of discontinuous coefficients to show computational applicability beyond our limited class of coefficients.

42 citations


Proceedings ArticleDOI
01 Jul 2017
TL;DR: The multiobjective optimization problem is defined and briefly summarized, representative MOEAs from three categories are introduced in detail, and some of the problems and challenges in improving MOEAs are discussed.
Abstract: Multiobjective optimization aims to simultaneously optimize two or more objectives for a problem, with multiobjective evolutionary algorithms (MOEAs) having become a popular research topic in evolutionary multiobjective optimization. We first define the multiobjective optimization problem and briefly summarize multiobjective optimization methods based on the evolutionary algorithm. Representative MOEAs from three categories are then introduced in detail, and we discuss some of the problems and challenges in improving MOEAs. Finally, future research directions for MOEAs are proposed.

41 citations
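
Multiobjective optimization, as surveyed in the entry above, rests on Pareto dominance: one solution dominates another if it is no worse in every objective and strictly better in at least one. The following is a minimal sketch of that definition, assuming every objective is minimized; the function names are hypothetical and this is not any specific MOEA from the survey.

```python
def dominates(a, b):
    """True if a is no worse than b everywhere and strictly better somewhere."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical objective vectors (cost, delay), both minimized.
candidates = [(1, 5), (2, 2), (3, 3), (5, 1)]
print(pareto_front(candidates))  # [(1, 5), (2, 2), (5, 1)]
```

This brute-force filter is quadratic in the number of points; evolutionary algorithms typically use faster non-dominated sorting, which is not shown here.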


Journal ArticleDOI
17 Aug 2017
TL;DR: Completing a full replication study of the authors’ previously published findings on bluff-body aerodynamics was harder than they thought, despite them having good reproducible-research practices, such as sharing their code and data openly.
Abstract: Completing a full replication study of the authors’ previously published findings on bluff-body aerodynamics was harder than they thought, despite them having good reproducible-research practices, such as sharing their code and data openly. Here’s what they learned from three years, four computational fluid dynamics codes, and hundreds of runs.

Proceedings ArticleDOI
21 Jul 2017
TL;DR: An intelligent big data system for integrated water and fertilizer irrigation is established based on technologies such as the Internet of Things and big data; it can forecast the water requirement of crops in different growth periods and make decisions on automatic irrigation and fertilization.
Abstract: With the advent of emerging technologies such as the Internet of Things and big data, the pace of transformation from traditional agriculture to modern agriculture will continue to accelerate. Given that traditional agriculture currently has many problems, such as low utilization of irrigation water and a backward level of management, an intelligent big data system for integrated water and fertilizer irrigation is established based on technologies such as the Internet of Things and big data. The system uses the Internet of Things and other technologies to monitor in real time and automatically collect data related to the growth of crops in the fields, and then uploads them to the Shandong Agricultural University big data central target database. The big data center intelligently stores, screens, calibrates, mines and extracts the monitoring data to establish a crop growth model based on big data, which can forecast the water requirement of crops in different growth periods, make decisions on automatic irrigation and fertilization, and finally realize timely and proper irrigation of crops.

Proceedings ArticleDOI
01 Jul 2017
TL;DR: A new malware detection and classification method based on n-grams attribute similarity that outperforms a variety of machine learning methods, including Naïve Bayes, Bayesian Networks, Support Vector Machine and C4.5 Decision Tree.
Abstract: Unknown malware has increased dramatically, but existing security software cannot identify it effectively. In this paper, we propose a new malware detection and classification method based on n-grams attribute similarity. We extract all n-grams of byte codes from training samples and select the most relevant as attributes. After calculating the average value of the attributes in malware and benign samples separately, we determine whether a test sample is malware or benign by the attribute similarity between the attributes of the test sample and the two average attributes of malware and benign. We compare our method with a variety of machine learning methods, including Naive Bayes, Bayesian Networks, Support Vector Machine and C4.5 Decision Tree. Experimental results on public (Open Malware Benchmark) and private (self-collected) datasets both reveal that our method outperforms the other four methods.
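
The pipeline in the abstract above (extract byte n-grams, select attributes, average them per class, compare a test sample to both averages) can be sketched as follows. Several details are assumptions made for illustration, not the authors' choices: attribute values are relative n-gram frequencies, attributes are selected by the largest gap between class means, and similarity is cosine similarity. All function names are hypothetical.

```python
from collections import Counter
from math import sqrt


def ngram_freqs(data: bytes, n: int = 2) -> dict:
    """Relative frequency of each byte n-gram in one sample."""
    grams = Counter(data[i:i + n] for i in range(len(data) - n + 1))
    total = sum(grams.values()) or 1
    return {gram: count / total for gram, count in grams.items()}


def mean_profile(samples, n: int = 2) -> dict:
    """Average n-gram frequencies over all samples of one class."""
    profile = Counter()
    for sample in samples:
        for gram, freq in ngram_freqs(sample, n).items():
            profile[gram] += freq / len(samples)
    return dict(profile)


def cosine(u: dict, v: dict, attrs) -> float:
    """Cosine similarity restricted to the selected attributes."""
    dot = sum(u.get(a, 0.0) * v.get(a, 0.0) for a in attrs)
    norm_u = sqrt(sum(u.get(a, 0.0) ** 2 for a in attrs))
    norm_v = sqrt(sum(v.get(a, 0.0) ** 2 for a in attrs))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def train(malware, benign, n: int = 2, k: int = 4):
    """Build class averages and pick the k most discriminating n-grams (assumed rule)."""
    mal, ben = mean_profile(malware, n), mean_profile(benign, n)
    gap = lambda g: abs(mal.get(g, 0.0) - ben.get(g, 0.0))
    attrs = sorted(set(mal) | set(ben), key=gap, reverse=True)[:k]
    return attrs, mal, ben


def classify(sample: bytes, model, n: int = 2) -> str:
    """Label a sample by whichever class average it is more similar to."""
    attrs, mal, ben = model
    freqs = ngram_freqs(sample, n)
    return "malware" if cosine(freqs, mal, attrs) > cosine(freqs, ben, attrs) else "benign"
```

With toy training data such as `train([b"\x90\x90\x90\x90\xcc"], [b"hello world"])`, a sample made only of `\x90` bytes is closer to the malware average. Real byte-code corpora would need far larger `n` and `k` values and a principled relevance measure.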

Journal ArticleDOI
25 May 2017
TL;DR: In this article, the authors examine Linux container technology for the distribution of a nontrivial scientific computing software stack and its execution on a spectrum of platforms from laptop computers through high-performance computing systems.
Abstract: Containers are an emerging technology that holds promise for improving productivity and code portability in scientific computing. The authors examine Linux container technology for the distribution of a nontrivial scientific computing software stack and its execution on a spectrum of platforms from laptop computers through high-performance computing systems. For Python code run on large parallel computers, the runtime is reduced inside a container due to faster library imports. The software distribution approach and data that the authors present will help developers and users decide on whether container technology is appropriate for them. The article also provides guidance for vendors of HPC systems that rely on proprietary libraries for performance on what they can do to make containers work seamlessly and without performance penalty.

Proceedings ArticleDOI
01 Jul 2017
TL;DR: The AC-Apriori algorithm reduces the number of scans of the transaction database while preserving the full mining effect, which reduces the runtime and improves the mining efficiency compared with the Apriori algorithm.
Abstract: The Apriori algorithm is a classic mining algorithm that can mine association rules and sequential patterns. However, when the Apriori algorithm is applied to contiguous sequential pattern mining, it is inefficient. In web log mining, the contiguous sequential pattern can better represent the semantic information of the user's access to the site due to the continuity of the user's visits to the site's pages. Contiguous sequential patterns can be used not only to predict the user's next access request, but also to improve the site topology and place advertising pages. When mining contiguous sequential patterns, the Apriori algorithm generates a large number of candidates and scans the transaction database frequently. In this paper, we present an improved algorithm based on the Apriori algorithm, which we call the AC-Apriori algorithm. The AC-Apriori algorithm reduces the number of scans of the transaction database while preserving the full mining effect, which reduces the runtime and improves the mining efficiency compared with the Apriori algorithm.
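
To make "contiguous sequential pattern" concrete: it is a run of pages visited one directly after another, and its support is the number of sessions that contain it. The sketch below is a plain counting baseline, assuming sessions are lists of page names; it is not the AC-Apriori algorithm, whose details the abstract does not give, and the function name is hypothetical.

```python
from collections import Counter


def contiguous_patterns(sessions, min_support, max_len=3):
    """Count contiguous page runs per session and keep the frequent ones."""
    support = Counter()
    for session in sessions:
        seen = set()  # count each pattern at most once per session
        for length in range(1, max_len + 1):
            for start in range(len(session) - length + 1):
                seen.add(tuple(session[start:start + length]))
        support.update(seen)
    return {pattern: count for pattern, count in support.items()
            if count >= min_support}


# Hypothetical web-log sessions.
logs = [
    ["home", "news", "sport"],
    ["home", "news", "weather"],
    ["news", "sport"],
]
frequent = contiguous_patterns(logs, min_support=2)
print(frequent[("home", "news")])  # 2
```

Note that `("home", "sport")` is never counted, because those pages are not adjacent in any session; that adjacency requirement is what separates contiguous patterns from general sequential patterns.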

Proceedings ArticleDOI
21 Jul 2017
TL;DR: This paper uses Java to implement a movie recommendation system on Ubuntu; benefiting from the MapReduce framework and an item-based recommendation algorithm, the system can achieve high efficiency and reliability on large data sets.
Abstract: Collaborative filtering algorithms are widely used in the recommendation systems of e-commerce websites. They analyze a large amount of users' historical behavior data in order to explore the users' interests and recommend appropriate products to them. In this paper, we focus on how to design a reliable and highly accurate algorithm for movie recommendation. It is worth noting that the algorithm is not limited to film recommendation, but can be applied in many other areas of e-commerce. We use Java to implement a movie recommendation system on Ubuntu. Benefiting from the MapReduce framework and the item-based recommendation algorithm, the system can handle large data sets. The experimental results show that the system can achieve high efficiency and reliability on large data sets.
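
Item-based collaborative filtering, as named in the abstract above, predicts a user's rating for an unseen item from that user's ratings of similar items. The following single-machine sketch assumes cosine similarity over co-rating users and a similarity-weighted average; it is written in Python for brevity and is not the authors' Java/MapReduce implementation. The function names and ratings are hypothetical.

```python
from math import sqrt


def item_similarity(ratings, a, b):
    """Cosine similarity between items a and b over users who rated both."""
    common = [u for u, items in ratings.items() if a in items and b in items]
    if not common:
        return 0.0
    dot = sum(ratings[u][a] * ratings[u][b] for u in common)
    norm_a = sqrt(sum(ratings[u][a] ** 2 for u in common))
    norm_b = sqrt(sum(ratings[u][b] ** 2 for u in common))
    return dot / (norm_a * norm_b)


def predict(ratings, user, item):
    """Similarity-weighted average of the user's own ratings."""
    weighted_sum = total_weight = 0.0
    for other_item, rating in ratings[user].items():
        sim = item_similarity(ratings, item, other_item)
        weighted_sum += sim * rating
        total_weight += abs(sim)
    return weighted_sum / total_weight if total_weight else 0.0


# Hypothetical user -> {movie: rating} data.
ratings = {
    "ann": {"A": 5, "B": 4},
    "bob": {"A": 4, "B": 5, "C": 2},
    "cat": {"A": 1, "C": 5},
}
print(round(predict(ratings, "ann", "C"), 2))
```

In a MapReduce setting the pairwise item similarities would typically be computed in a distributed pass and the prediction step in another; that partitioning is omitted here.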

Proceedings ArticleDOI
01 Jul 2017
TL;DR: This paper proposes a user authentication scheme based on improved challenge-response mechanism to resist replay attack in which an efficient mutual user authentication and a secure session key agreement are achieved.
Abstract: The Internet of Things (IoT) is the current technological revolution which can upgrade the current Internet environment into a more pervasive and ubiquitous world. Due to the distributed nature and the limited hardware capabilities of IoT, a lightweight authentication scheme is necessary. A number of lightweight user authentication schemes suited to the IoT environment have been proposed in recent years. However, these schemes mostly use the timestamp mechanism to resist replay attacks, which is infeasible in the IoT environment. In this paper, we propose a user authentication scheme based on an improved challenge-response mechanism to resist replay attacks, in which efficient mutual user authentication and secure session key agreement are achieved. The performance and security analysis shows that our proposed scheme has a higher security level and is still highly efficient.
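
A challenge-response exchange resists replay without synchronized clocks because each login must answer a fresh random challenge. The sketch below shows only that generic idea, assuming a pre-shared key and HMAC-SHA256; it is one-way rather than mutual, performs no session key agreement, and is not the scheme proposed in the paper. The class and function names are hypothetical.

```python
import hashlib
import hmac
import secrets


class Verifier:
    """Issues one-time challenges and checks keyed responses."""

    def __init__(self, shared_key: bytes):
        self.shared_key = shared_key
        self.pending = set()  # challenges issued but not yet answered

    def issue_challenge(self) -> bytes:
        nonce = secrets.token_bytes(16)
        self.pending.add(nonce)
        return nonce

    def verify(self, nonce: bytes, response: bytes) -> bool:
        if nonce not in self.pending:
            return False  # unknown or already used: a replayed message fails here
        self.pending.discard(nonce)
        expected = hmac.new(self.shared_key, nonce, hashlib.sha256).digest()
        return hmac.compare_digest(expected, response)


def respond(shared_key: bytes, nonce: bytes) -> bytes:
    """Device side: prove knowledge of the key for this specific challenge."""
    return hmac.new(shared_key, nonce, hashlib.sha256).digest()
```

Because each nonce is discarded after its first use, an eavesdropper who captures a valid `(nonce, response)` pair cannot submit it again.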

Proceedings ArticleDOI
01 Jul 2017
TL;DR: This paper shows how to mitigate these problems through a sentiment structure and sentiment calculation rules; the sentiment structure is obtained from dependency parsing with relationship migration and modified distance, which contributes to understanding the sentiment of short text.
Abstract: Traditional approaches to analyzing the sentiment of short text do not consider the relationship between emotion words and modifiers, and simply accumulate the sentiment of each sentence to obtain the sentiment of the short text. In this paper, we show how to mitigate these problems through a sentiment structure and sentiment calculation rules. The sentiment structure is obtained from dependency parsing with relationship migration and modified distance, which makes a good contribution to understanding the sentiment of short text. The sentiment of short text is accumulated according to the different influence of the relationships between the modifier and the emotion word, and the contribution of each sentence to the sentiment calculation of the short text. Experimental results validate the effectiveness of the approach.

Proceedings ArticleDOI
21 Jul 2017
TL;DR: Based on radio frequency identification (RFID) technology, combined with Internet and information processing technology, an intelligent refrigerator for food management is developed that records the food within the refrigerator.
Abstract: Based on radio frequency identification (RFID) technology, combined with Internet and information processing technology, an intelligent refrigerator for food management is developed that records the food within the refrigerator. These data are automatically uploaded to the intelligent refrigerator cloud data service platform, where the user can easily view them. The refrigerator can also provide recipes according to the food inside it. A wireless communication module is used to receive Internet information. A week before the end of the food's shelf life, a warning is sent to the user. When the food is past its shelf life, a usage warning is sent to the user. When the food is running low or has run out, different businesses are compared using information from the Internet to choose the best food. The wireless communication module can also be used to connect an intelligent terminal to the refrigerator so that the situation inside the refrigerator can be checked.

Proceedings ArticleDOI
01 Jul 2017
TL;DR: An improved semi-supervised algorithm is proposed to train the URL multi-classification model and it is verified by using the data set with different labeling rate, which shows the improved algorithm has good classification performance compared to the previous algorithm.
Abstract: Web attacks are increasing and the scale of malicious URLs continues to expand with the rapid development of the Internet, so the network security situation is increasingly grim. This paper studies the URL multi-classification problem; it is a continuation of reference [1] and follows its data sets and most of its feature selection methods. Firstly, different types of URL structure are analyzed, and features that clearly indicate the attack type are added to the original features. Secondly, an improved semi-supervised algorithm is proposed to train the URL multi-classification model. Finally, the effect of the multi-classification model is verified using data sets with different labeling rates, which shows that the improved algorithm has good classification performance compared to the previous algorithm.

Proceedings ArticleDOI
01 Jul 2017
TL;DR: A service discovery algorithm based on multi-stage semantic service matching is presented; it adopts layered filters that consider the various constraint parameters of IoT services, such as service category, input/output (IO), precondition/effect (PE) and quality of experience (QoE).
Abstract: In recent years, the number of services in the Internet of Things (IoT) has increased rapidly, and service discovery in IoT has become more difficult in large-scale registration. In the traditional matching method, all the matching parameters for services had to be calculated together to get better match results, which wastes a lot of computing resources and time. This paper presents a service discovery algorithm based on multi-stage semantic service matching. It adopts the method of layered filters, considering the various constraint parameters of IoT services, such as service category, input/output (IO), precondition/effect (PE) and quality of experience (QoE). It can obtain the proper matching results in a more efficient way. Firstly, we use the IoT service description language OWL-Siot to describe IoT services and requests uniformly. Then, we propose a four-layer structure model for service discovery, namely the interactive interface layer, parsing annotation layer, service matching layer and data semantic layer. We also propose a hybrid service matching degree measurement by synthetically calculating the concept logic and semantic similarity for each layer separately. Experimental results show that the method can effectively improve the performance of service discovery.
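
The efficiency argument in the abstract above is that filtering in stages lets each later check run on fewer candidates than matching every parameter at once. The sketch below illustrates only that layered structure; it assumes exact matching on hypothetical dictionary fields as a stand-in for the paper's semantic similarity measures, and it does not use OWL-Siot.

```python
def discover(services, request):
    """Filter candidates stage by stage: category, then IO, then PE, then QoE."""
    stages = [
        # Stage 1: cheap category check removes most candidates early.
        lambda s: s["category"] == request["category"],
        # Stage 2: the service needs only inputs we can supply
        # and produces every output we ask for.
        lambda s: set(s["inputs"]) <= set(request["inputs"])
        and set(request["outputs"]) <= set(s["outputs"]),
        # Stage 3: precondition/effect, reduced here to an equal effect label.
        lambda s: s["effect"] == request["effect"],
        # Stage 4: quality-of-experience threshold.
        lambda s: s["qoe"] >= request["min_qoe"],
    ]
    candidates = list(services)
    for keep in stages:
        candidates = [s for s in candidates if keep(s)]
        if not candidates:
            break  # nothing left to match, skip the remaining stages
    return candidates
```

A request for a `"sensing"` service with `min_qoe` of 0.5 would, for example, drop every actuation service at stage 1 before any input/output comparison is made.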

Book ChapterDOI
06 Feb 2017
TL;DR: This paper designs robust and efficient block preconditioners for the two-field formulation of Biot's consolidation model, where stabilized finite-element discretizations are used.
Abstract: In this paper, we design robust and efficient block preconditioners for the two-field formulation of Biot’s consolidation model, where stabilized finite-element discretizations are used. The proposed block preconditioners are based on the well-posedness of the discrete linear systems. Block diagonal (norm-equivalent) and block triangular preconditioners are developed, and we prove that these methods are robust with respect to both physical and discretization parameters. Numerical results are presented to support the theoretical results.

Proceedings ArticleDOI
01 Jul 2017
TL;DR: A mutual authentication protocol based on ECC is designed for RFID systems which can resist camouflage attacks, tracking attacks, denial of service attacks, system internal attack and so on.
Abstract: In this paper, a mutual authentication protocol based on ECC is designed for RFID systems. This protocol is described in detail and the performance of this protocol is analyzed. The results show that the protocol has many advantages, such as mutual authentication, confidentiality, anonymity, availability, forward security, scalability and so on, which can resist camouflage attacks, tracking attacks, denial of service attacks, system internal attack.

Proceedings ArticleDOI
01 Jul 2017
TL;DR: This paper introduces a password guessing method based on Long Short-Term Memory recurrent neural networks that shows a higher coverage rate than PCFG and Markov methods.
Abstract: Passwords are frequently used in data encryption and user authentication. Since people tend to choose meaningful words or numbers as their passwords, many passwords are easy to guess. This paper introduces a password guessing method based on Long Short-Term Memory recurrent neural networks. After training our LSTM neural network with 30 million passwords from the leaked Rockyou dataset, the generated 3.35 billion passwords could cover 81.52% of the remaining Rockyou dataset. Compared with PCFG and Markov methods, this method shows a higher coverage rate.

Proceedings ArticleDOI
21 Jul 2017
TL;DR: The experimental results show that this anomaly detection system of user behavior using Discrete-time Markov Chains can detect normal and abnormal user behavior precisely and effectively.
Abstract: To address the problem of internal attackers in database systems, an anomaly detection method based on user behavior is used. Using Discrete-time Markov Chains (DTMC), an anomaly detection system for user behavior is proposed, which can detect internal threats to the database system. First, we analyze SQL queries, which serve as user behavior features. Then, we use the DTMC model to extract the behavior features of a normal user and of the detected user and compare them. If the deviation of the features is beyond a threshold, the detected user's behavior is judged to be anomalous. Experiments are used to test the feasibility of the detection system. The experimental results show that this detection system can detect normal and abnormal user behavior precisely and effectively.
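
The detection steps in the abstract above (model each user's SQL activity as a Markov chain, compare the observed chain to a normal one, flag deviations beyond a threshold) can be sketched as follows. The assumptions are that each query is reduced to its statement type, that deviation is the mean absolute difference between transition probabilities, and that the threshold is 0.5; the paper's actual features and distance measure are not stated in the abstract, and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def transition_probs(sequence):
    """Estimate DTMC transition probabilities from one sequence of states."""
    counts = defaultdict(Counter)
    for current, following in zip(sequence, sequence[1:]):
        counts[current][following] += 1
    return {
        (current, following): count / sum(row.values())
        for current, row in counts.items()
        for following, count in row.items()
    }


def deviation(p, q):
    """Mean absolute difference over every transition seen in either chain."""
    keys = set(p) | set(q)
    if not keys:
        return 0.0
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys) / len(keys)


def is_anomalous(normal_sequence, observed_sequence, threshold=0.5):
    """Flag the observed user when the chains differ by more than the threshold."""
    normal = transition_probs(normal_sequence)
    observed = transition_probs(observed_sequence)
    return deviation(normal, observed) > threshold
```

For instance, a user whose history alternates between `SELECT` and `UPDATE` would be flagged if a session suddenly consists of repeated `DELETE` statements followed by `DROP`, since none of those transitions appear in the normal profile.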

Proceedings ArticleDOI
01 Jul 2017
TL;DR: With the rapid development of wireless sensor network application technology and higher requirements for the quality of the water environment, wireless sensor networks can be used for real-time monitoring of water quality.
Abstract: The quality of water resources has a direct impact on the daily life of mankind and the sustainable development of society. However, with the rapid development of national industrialization, industrial wastewater discharge and improper handling have become more and more serious. With domestic water pollution growing, an efficient water quality monitoring system is urgently needed. With the rapid development of wireless sensor network application technology, and as people put forward higher requirements for the quality of the water environment, wireless sensor networks can be used in the water environment for real-time monitoring of water quality.

Journal ArticleDOI
03 May 2017
TL;DR: The authors present their education research-practice partnership, initial findings, and highlights of a collaborative process that has furthered their work to support more equitable learning in CS.
Abstract: The computer science (CS) education field is engaging in unprecedented efforts to expand learning opportunities in K-12 CS education, but one group of students is often overlooked: those with specific learning disabilities and related attention deficit disorders. As CS education initiatives grow, K-12 teachers need research-informed guidance to make computing more accessible for students who learn differently. This article reports on the first phase of a National Science Foundation-supported exploratory research study to address this problem. The authors present their education research-practice partnership, initial findings, and highlights of a collaborative process that has furthered their work to support more equitable learning in CS.

Proceedings ArticleDOI
01 Jul 2017
TL;DR: This paper defines a unified architecture for IoT systems, in which the IoT node model, virtual things, the basic services of things, and the overall hierarchical model of services are described.
Abstract: This paper defines a unified architecture for IoT systems, in which the IoT node model, virtual things, the basic services of things, and the overall hierarchical model of services are described. In this architecture, IoT nodes must be connected directly to the Internet and provide the basic services of things. Starting from these basic services, which are compatible with SOA-based Internet application systems, the IoT application system is built from a middle layer of Internet-based services together with the basic services of things, in order to implement a complete IoT system. This study also provides a practical application for monitoring water resources in the IoT way.

Proceedings ArticleDOI
21 Jul 2017
TL;DR: An optimized algorithm is introduced that considers the size of small files when merging them into a combined file and generates a map record for each small file; it reduces the NameNode's memory and access time consumption and thus achieves better performance.
Abstract: Nowadays, the most popular way of storing data is distributed storage, and the most widespread cloud storage platform is HDFS. It has been used successfully by many notable companies because of its excellent capability. Unfortunately, HDFS was originally designed to handle large files, and it copes less well with an enormous quantity of small files. To solve this problem, an optimized algorithm is introduced that considers the size of small files when merging them into a combined file and generates a map record for each small file. Meanwhile, we apply prefetching and caching mechanisms to enhance access efficiency. The experimental results show that the optimized strategy reduces the NameNode's memory and access time consumption and thus achieves better performance.
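
The idea of size-aware merging with a per-file map record can be sketched as follows. The abstract does not specify the packing rule, so this sketch assumes a first-fit-decreasing strategy; the function name, the in-memory representation, and the record layout (combined file id, offset, length) are hypothetical, and no Hadoop API is used.

```python
# Hypothetical sketch: pack small files into combined files of at most
# block_size bytes and keep a map record for each small file.
# First-fit decreasing is an assumed packing rule, not the paper's algorithm.
def merge_small_files(files, block_size):
    """files maps name -> size in bytes; returns (combined_sizes, index)."""
    combined = []  # bytes used so far in each combined file
    index = {}     # name -> (combined_id, offset, length)
    # Place the largest files first so that smaller ones fill the gaps.
    for name, size in sorted(files.items(), key=lambda kv: (-kv[1], kv[0])):
        if size > block_size:
            raise ValueError(f"{name} does not fit in one combined file")
        for cid, used in enumerate(combined):
            if used + size <= block_size:
                break  # first combined file with enough room
        else:
            combined.append(0)  # no room anywhere: open a new combined file
            cid = len(combined) - 1
        index[name] = (cid, combined[cid], size)
        combined[cid] += size
    return combined, index


sizes, records = merge_small_files({"a": 60, "b": 50, "c": 40, "d": 30}, 100)
print(sizes)         # bytes used per combined file
print(records["c"])  # where file "c" lives: (combined id, offset, length)
```

With such an index, the NameNode would only need to track the combined files, while a lookup of a small file reduces to one map record plus a ranged read.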

Proceedings ArticleDOI
01 Jul 2017
TL;DR: This measure combines Structural Holes and Degree Centrality to measure node influence and uses Structural Holes to reflect the impact of topological connections among neighbor nodes, which improves the ability to distinguish the influence of nodes while keeping time complexity low.
Abstract: The analysis of node influence plays an important role in product marketing, public opinion analysis, disease transmission, and other fields. Researchers have proposed a variety of methods to measure node influence. With the rapid expansion of the scale of social networks, the Degree Centrality algorithm attracts much attention for its low time complexity; however, its result is not sufficiently accurate because it considers only local node information and does not reflect the impact of topological connections among neighbor nodes. To solve this problem, we propose a novel measure based on Structural Holes and Degree Centrality (SHDC). Our measure combines Structural Holes and Degree Centrality to measure node influence. It uses Degree Centrality to make a fast, coarse distinction between the influence of nodes and uses Structural Holes to reflect the impact of topological connections among neighbor nodes, which improves the ability to distinguish the influence of nodes while keeping time complexity low. Experimental results show that the SHDC algorithm measures the influence of nodes more accurately than either Degree Centrality or Structural Holes alone and that it has stronger applicability.
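
A rough sketch of how the two ingredients might be combined is given below. The abstract does not state the SHDC formula, so this is only one possible reading: nodes are ranked by degree centrality first, and ties are broken by Burt's constraint, a standard structural-hole measure (lower constraint means more structural holes). The function names and the lexicographic combination are assumptions, and the graph is treated as undirected and unweighted.

```python
# Hypothetical sketch: coarse ranking by degree centrality, refined by
# Burt's constraint. The lexicographic combination is an assumption; the
# actual SHDC formula is not given in the abstract.
def build_adjacency(edges):
    """Build an undirected adjacency map from a list of (u, v) edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree_centrality(adj):
    """Degree divided by the maximum possible degree, n - 1."""
    n = len(adj)
    return {v: (len(nbrs) / (n - 1) if n > 1 else 0.0) for v, nbrs in adj.items()}


def burt_constraint(adj, i):
    """Burt's constraint of node i for an unweighted, undirected graph."""
    nbrs = adj[i]
    if not nbrs:
        return float("inf")  # isolated node: treated as maximally constrained
    total = 0.0
    for j in nbrs:
        direct = 1.0 / len(nbrs)  # share of i's ties invested in j
        # Indirect investment in j through mutual neighbors q.
        indirect = sum(
            (1.0 / len(nbrs)) * (1.0 / len(adj[q]))
            for q in nbrs
            if q != j and j in adj[q]
        )
        total += (direct + indirect) ** 2
    return total


def rank_nodes(edges):
    """Rank by degree centrality (descending), then constraint (ascending)."""
    adj = build_adjacency(edges)
    dc = degree_centrality(adj)
    return sorted(adj, key=lambda v: (-dc[v], burt_constraint(adj, v), v))


edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("d", "e")]
print(rank_nodes(edges))
```

In this example, nodes `a`, `b`, and `d` all have degree 2, but `d` bridges otherwise unconnected neighbors and therefore has a lower constraint, so it is ranked above `a` and `b`; degree centrality alone could not separate them.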