
Showing papers by "Hai Yu published in 2020"


Proceedings ArticleDOI
27 Jun 2020
TL;DR: Watchman, a technique that continuously monitors dependency conflicts in the PyPI ecosystem, is designed and implemented based on an empirical study that identified several key factors leading to DC issues and their regressions.
Abstract: The PyPI ecosystem has indexed millions of Python libraries to allow developers to automatically download and install dependencies of their projects based on the specified version constraints. Despite the convenience brought by automation, version constraints in Python projects can easily conflict, resulting in build failures. We refer to such conflicts as Dependency Conflict (DC) issues. Although DC issues are common in Python projects, developers lack tool support to gain comprehensive knowledge for diagnosing their root causes. In this paper, we conducted an empirical study on 235 real-world DC issues. We studied the manifestation patterns and fixing strategies of these issues and found several key factors that can lead to DC issues and their regressions. Based on our findings, we designed and implemented Watchman, a technique to continuously monitor dependency conflicts in the PyPI ecosystem. In our evaluation, Watchman analyzed PyPI snapshots between 11 Jul 2019 and 16 Aug 2019 and found 117 potential DC issues. We reported these issues to the developers of the corresponding projects. So far, 63 issues have been confirmed, 38 of which have been quickly fixed by applying our suggested patches.
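A minimal sketch of the core check behind such DC issues, assuming the third-party packaging library and a hypothetical list of released versions; this only illustrates the conflict condition and is not Watchman itself:

# Given two version constraints that different dependencies place on the same
# library, test whether any released version satisfies both of them.
from packaging.specifiers import SpecifierSet
from packaging.version import Version

def find_conflict(constraint_a: str, constraint_b: str, released) -> bool:
    """Return True if no released version satisfies both constraints."""
    combined = SpecifierSet(constraint_a) & SpecifierSet(constraint_b)
    return not any(Version(v) in combined for v in released)

# Example: the project requires requests>=2.20 while a transitive dependency pins requests<2.0.
released_versions = ["1.2.3", "2.19.1", "2.20.0", "2.25.1"]  # hypothetical snapshot, not real PyPI data
if find_conflict(">=2.20", "<2.0", released_versions):
    print("Potential DC issue: no release satisfies both constraints")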

32 citations


Journal ArticleDOI
TL;DR: This paper proposes a new strategy, based on the watershed transformation, with two distinct criteria for the global and local refinement of boundary pixels, reducing the compromise between boundary adherence and compactness.
Abstract: Superpixels are widely used in computer vision applications, as they reduce the running costs of subsequent processing while preserving the original performance. In most existing algorithms, the boundary adherence and the compactness of superpixels necessarily inhibit each other, because the color/gradient information is balanced against the position constraints and the set criteria treat all pixels indiscriminately. In this paper, we present a two-phase superpixel segmentation method based on the watershed transformation. After designing a new approach for calculating the flooding priority, we propose a new strategy with two distinct criteria for the global and local refinement of boundary pixels. These criteria reduce the compromise between boundary adherence and compactness. Unlike indiscriminate standards, our method applies different treatments to pixels in different environments, preserving color homogeneity in content-rich areas while improving the regularity of superpixels in content-plain regions. The superior accuracy and computing time of the proposed method are verified in comparison experiments with several state-of-the-art methods.
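For context, a baseline compact-watershed superpixel segmentation with scikit-image is sketched below; it only illustrates the boundary-adherence vs. compactness trade-off the paper targets and is not the proposed two-phase refinement (the sample image, marker count, and compactness value are arbitrary):

# Compact watershed on a gradient image: compactness > 0 trades boundary
# adherence for more regular superpixel shapes.
from skimage import data, color, filters, segmentation, util

image = util.img_as_float(data.astronaut())      # bundled sample RGB image
gradient = filters.sobel(color.rgb2gray(image))  # flooding surface

labels = segmentation.watershed(
    gradient,
    markers=400,        # an int asks for this many regularly placed seed markers
    compactness=0.001,  # larger values -> more compact, less boundary-adherent
)
overlay = segmentation.mark_boundaries(image, labels)
print(labels.max(), "superpixels produced")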

13 citations


Journal ArticleDOI
Zhiliang Zhu, Yanjie Song, Wei Zhang, Hai Yu, Yuli Zhao
TL;DR: A novel CS-based compression-encryption framework (CS-CEF) that uses the intrinsic property of CS to provide strong plaintext sensitivity for the compression-encryption scheme at a low additional computation cost.
Abstract: In this paper, we find that compressive sensing (CS) with a chaotic measurement matrix has a strong sensitivity to the plaintext. However, because of the quantization executed after CS, the plaintext sensitivity produced by CS may be greatly weakened. Thus, we propose a novel CS-based compression-encryption framework (CS-CEF) that uses the intrinsic property of CS to provide strong plaintext sensitivity for the compression-encryption scheme at a low additional computation cost. Meanwhile, a simple and efficient substitution box (S-box) construction algorithm (SbCA) based on chaos is designed. Compared with existing S-box construction methods, the simulation results show that the proposed S-box has stronger cryptographic characteristics. Based on the above work, we develop an efficient and secure image compression-encryption scheme using the S-box (CSb-CES) under the proposed CS-CEF. The simulations and security analysis illustrate that the proposed CSb-CES achieves higher efficiency and security than several state-of-the-art CS-based compression-encryption schemes.
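A minimal numpy sketch of the "chaotic measurement matrix" ingredient, with the logistic map as a stand-in chaotic system; this is not the paper's CS-CEF, its quantization step, or its S-box construction, and all parameters below are illustrative:

import numpy as np

def logistic_sequence(x0: float, length: int, mu: float = 3.99) -> np.ndarray:
    """Iterate the logistic map x <- mu * x * (1 - x)."""
    seq = np.empty(length)
    x = x0
    for i in range(length):
        x = mu * x * (1.0 - x)
        seq[i] = x
    return seq

def chaotic_measurement_matrix(m: int, n: int, key: float) -> np.ndarray:
    """Build an m x n measurement matrix from a key-seeded chaotic sequence."""
    chaos = logistic_sequence(key, m * n)
    phi = (2.0 * chaos - 1.0).reshape(m, n)  # map (0, 1) -> (-1, 1)
    return phi / np.sqrt(m)                  # simple energy normalization

n, m = 256, 64                               # signal length, number of measurements
key = 0.3141592653                           # secret initial condition (hypothetical key)
x = np.zeros(n)
x[np.random.choice(n, 8, replace=False)] = np.random.randn(8)   # a sparse test signal
y = chaotic_measurement_matrix(m, n, key) @ x                    # key-dependent compressed measurements
print(y.shape)  # (64,)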

9 citations


Journal ArticleDOI
Wei Zhang, Shuwen Wang, Weijie Han, Hai Yu, Zhiliang Zhu
06 Jan 2020 - Entropy
TL;DR: A method to generate a random Hamiltonian path within digital images, which is equivalent to a permutation in image encryption, is designed, and an adjusted Bernoulli map is proposed to ensure the randomness of the generated paths.
Abstract: In graph theory, a Hamiltonian path is a path that visits each vertex exactly once. In this paper, we design a method to generate a random Hamiltonian path within digital images, which is equivalent to a permutation in image encryption. Building such a Hamiltonian path across bit planes can shuffle the distribution of the pixels' bits, and a similar idea can be applied to the substitution of pixel grey levels. To ensure the randomness of the generated Hamiltonian paths, an adjusted Bernoulli map is proposed. By adopting these techniques, a bit-level image encryption scheme is devised. Evaluation of the simulation results shows that the proposed scheme achieves fair performance. In addition, we pinpoint a common flaw in calculating the correlation coefficients of adjacent pixels; after the enhancement, the correlation coefficient becomes a stricter criterion for image encryption algorithms.
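As a toy illustration of "permutation by Hamiltonian path order", the sketch below uses a serpentine (boustrophedon) scan, which visits every pixel exactly once while each step moves to a 4-neighbor, and re-lays the pixels out in visiting order; the paper's adjusted Bernoulli map produces far more varied random paths, which this sketch does not attempt:

import numpy as np

def serpentine_path(h: int, w: int) -> np.ndarray:
    """Return (row, col) pixel coordinates in serpentine visiting order."""
    coords = []
    for r in range(h):
        cols = range(w) if r % 2 == 0 else range(w - 1, -1, -1)
        coords.extend((r, c) for c in cols)
    return np.array(coords)

def permute_by_path(img: np.ndarray) -> np.ndarray:
    """The k-th pixel on the path moves to raster position k."""
    h, w = img.shape
    path = serpentine_path(h, w)
    return img[path[:, 0], path[:, 1]].reshape(h, w)

img = np.arange(16, dtype=np.uint8).reshape(4, 4)   # tiny test "image"
print(permute_by_path(img))                          # odd rows appear reversed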

9 citations


Journal ArticleDOI
TL;DR: This work applies the theory of the naming game to the problem of multiple-source localization in information propagation over social networks and proposes a method that can locate the sources without knowing their number.

5 citations


Journal ArticleDOI
TL;DR: A dynamic observer deployment method is introduced that considerably reduces the number of observations and the time needed to locate the information source, and that calculates the probability of each node being the source from the information provided by the observers.
Abstract: We study the problem of locating the source of information propagation in social networks based on the network topology and a set of observations. We propose a concise and novel method to accurately locate the information source using naming game theory. This study introduces a dynamic deployment method that considerably reduces the number of observations and the time needed to locate the source. Moreover, it calculates the probability of each node being the source based on the information provided by the observers. The method can potentially be applied to various information propagation models. The simulation results reveal that the method is able to estimate the information source within a small number of hops from the true source.
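For orientation only, the sketch below shows a simple distance-versus-arrival-time baseline for observer-based source localization using networkx; it is not the naming-game method or the dynamic deployment strategy above, the graph and observer set are hypothetical, and the spread is idealized so that a node's arrival time equals its hop distance from the true source:

import networkx as nx

G = nx.barabasi_albert_graph(200, 3, seed=1)          # stand-in social network
true_source = 42
dist_from_source = nx.single_source_shortest_path_length(G, true_source)

observers = [0, 10, 50, 120, 199]                     # hypothetical observer nodes
arrival = {o: dist_from_source[o] for o in observers} # idealized arrival times

def score(candidate: int) -> float:
    """Lower is better: variance of (arrival time - distance to candidate)."""
    d = nx.single_source_shortest_path_length(G, candidate)
    residuals = [arrival[o] - d[o] for o in observers]
    mean = sum(residuals) / len(residuals)
    return sum((r - mean) ** 2 for r in residuals)

estimate = min(G.nodes, key=score)
print("estimated source:", estimate, "true source:", true_source)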

3 citations


Proceedings ArticleDOI
Kai Shi, Chenni Wu, Yuechen Wang, Hai Yu, Zhiliang Zhu
23 Oct 2020
TL;DR: This paper proposes a wind turbine condition monitoring method based on the variable importance of a random forest built from SCADA data, and applies the method to four real cases from wind farms in China.
Abstract: SCADA data lack the sensor signals, such as vibration and strain measurements, used in traditional wind turbine condition monitoring, and they are updated at a low frequency, generally one record per 10 minutes, which is also too coarse for failure prediction. Monitoring the working condition of wind turbines from SCADA data is therefore difficult. To this end, this paper proposes a wind turbine condition monitoring method based on the variable importance of a random forest built from SCADA data. First, to minimize the misjudgment caused by individual outliers, we divide the SCADA time series into segments of a fixed time period T. Second, we use the decrease-accuracy method to calculate the variable importance of a random forest, which serves as the feature vector of each segment and characterizes the turbine's condition. Third, we compare a specific turbine's variable importance with the standard feature of healthy turbines to obtain their proximity. Fourth, the monitoring baseline is determined according to the 3σ rule, and a deterioration function is applied to construct the failure probability model. To show its effectiveness, we apply the proposed method to four real cases from wind farms in China.
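A compact sketch of this pipeline on synthetic SCADA-like data, with scikit-learn's permutation importance standing in for the decrease-accuracy measure; the signal names, segment length, and generated data are hypothetical, and the 3σ threshold is computed over simulated healthy segments:

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)

def segment_importance(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Variable importance of a random forest fitted on one SCADA segment."""
    rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
    result = permutation_importance(rf, X, y, n_repeats=5, random_state=0)
    return result.importances_mean

def make_segment(n=144, degraded=False):
    """Synthetic segment: wind speed, rotor speed, nacelle temperature -> power."""
    wind = rng.uniform(3, 15, n)
    rotor = wind * (0.8 if degraded else 1.0) + rng.normal(0, 0.3, n)
    temp = rng.normal(40, 2, n)
    power = 0.5 * wind ** 3 * (0.7 if degraded else 1.0) + rng.normal(0, 20, n)
    return np.column_stack([wind, rotor, temp]), power

# Healthy baseline and 3-sigma monitoring threshold over importance vectors.
healthy = np.array([segment_importance(*make_segment()) for _ in range(20)])
baseline = healthy.mean(axis=0)
dists = np.linalg.norm(healthy - baseline, axis=1)
threshold = dists.mean() + 3 * dists.std()

# Score a new (simulated degraded) segment against the baseline.
X_new, y_new = make_segment(degraded=True)
d_new = np.linalg.norm(segment_importance(X_new, y_new) - baseline)
print("alarm" if d_new > threshold else "normal", round(d_new, 3), round(threshold, 3))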

2 citations


Posted Content
TL;DR: In this article, the authors propose Sensor, an automated testing technique that synthesizes test cases using ingredients from the project under test to trigger inconsistent behaviors of APIs that have the same signatures in conflicting library versions.
Abstract: Java projects are often built on top of various third-party libraries. If multiple versions of a library exist on the classpath, JVM will only load one version and shadow the others, which we refer to as dependency conflicts. This would give rise to semantic conflict (SC) issues, if the library APIs referenced by a project have identical method signatures but inconsistent semantics across the loaded and shadowed versions of libraries. SC issues are difficult for developers to diagnose in practice, since understanding them typically requires domain knowledge. Although adapting the existing test generation technique for dependency conflict issues, Riddle, to detect SC issues is feasible, its effectiveness is greatly compromised. This is mainly because Riddle randomly generates test inputs, while the SC issues typically require specific arguments in the tests to be exposed. To address that, we conducted an empirical study of 75 real SC issues to understand the characteristics of such specific arguments in the test cases that can capture the SC issues. Inspired by our empirical findings, we propose an automated testing technique Sensor, which synthesizes test cases using ingredients from the project under test to trigger inconsistent behaviors of the APIs with the same signatures in conflicting library versions. Our evaluation results show that Sensor is effective and useful: it achieved a precision of 0.803 and a recall of 0.760 on open-source projects and a precision of 0.821 on industrial projects; it detected 150 semantic conflict issues in 29 projects, 81.8% of which had been confirmed as real bugs.
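A toy Python analogy of a semantic conflict and of why specific arguments matter: two hypothetical versions of a function share a signature but interpret the argument differently, and only a non-trivial input exposes the divergence; Sensor itself targets Java classpath shadowing and synthesizes such inputs from the project under test:

def parse_timeout_v1(value: str) -> int:
    """Version 1 (hypothetical): timeout given in seconds."""
    return int(value)

def parse_timeout_v2(value: str) -> int:
    """Version 2 (hypothetical): same signature, but interpreted as milliseconds."""
    return int(value) * 1000

def differential_test(inputs):
    """Flag inputs on which the two versions disagree (a semantic conflict)."""
    return [x for x in inputs if parse_timeout_v1(x) != parse_timeout_v2(x)]

print(differential_test(["0", "30"]))   # ['30'] -> only a non-trivial argument exposes the conflict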