Author

Ying Zou

Bio: Ying Zou is an academic researcher from Queen's University. The author has contributed to research in topics: Web service & Business process modeling. The author has an h-index of 38 and has co-authored 170 publications receiving 3973 citations. Previous affiliations of Ying Zou include University of Waterloo & IBM.


Papers
Proceedings Article
14 May 2016
TL;DR: In the cross-project setting, the proposed connectivity-based unsupervised classifier (via spectral clustering) ranks as one of the top classifiers among five widely-used supervised classifiers (i.e., random forest, naive Bayes, logistic regression, decision tree, and logistic model tree) and five unsupervised classifiers. In the within-project setting, it ranks in the second tier, while only random forest ranks in the first tier.
Abstract: Defect prediction on projects with limited historical data has attracted great interest from both researchers and practitioners. Cross-project defect prediction has been the main area of progress by reusing classifiers from other projects. However, existing approaches require some degree of homogeneity (e.g., a similar distribution of metric values) between the training projects and the target project. Satisfying the homogeneity requirement often requires significant effort (currently a very active area of research). An unsupervised classifier does not require any training data, therefore the heterogeneity challenge is no longer an issue. In this paper, we examine two types of unsupervised classifiers: a) distance-based classifiers (e.g., k-means); and b) connectivity-based classifiers. While distance-based unsupervised classifiers have been previously used in the defect prediction literature with disappointing performance, connectivity-based classifiers have never been explored before in our community. We compare the performance of unsupervised classifiers versus supervised classifiers using data from 26 projects from three publicly available datasets (i.e., AEEEM, NASA, and PROMISE). In the cross-project setting, our proposed connectivity-based classifier (via spectral clustering) ranks as one of the top classifiers among five widely-used supervised classifiers (i.e., random forest, naive Bayes, logistic regression, decision tree, and logistic model tree) and five unsupervised classifiers (i.e., k-means, partition around medoids, fuzzy C-means, neural-gas, and spectral clustering). In the within-project setting (i.e., models are built and applied on the same project), our spectral classifier ranks in the second tier, while only random forest ranks in the first tier. Hence, connectivity-based unsupervised classifiers offer a viable solution for cross and within project defect predictions.

219 citations

Proceedings Article
31 May 2014
TL;DR: The time complexity of the proposed approach for spotting working code examples is as low as the complexity of existing code search engines on the Internet and considerably lower than the pattern-based approaches supporting free-form queries.
Abstract: Working code examples are useful resources for pragmatic reuse in software development. A working code example provides a solution to a specific programming problem. Earlier studies have shown that existing code search engines are not successful in finding working code examples. They fail in ranking high quality code examples at the top of the result set. To address this shortcoming, a variety of pattern-based solutions are proposed in the literature. However, these solutions cannot be integrated seamlessly in Internet-scale source code engines due to their high time complexity or query language restrictions. In this paper, we propose an approach for spotting working code examples which can be adopted by Internet-scale source code search engines. The time complexity of our approach is as low as the complexity of existing code search engines on the Internet and considerably lower than the pattern-based approaches supporting free-form queries. We study the performance of our approach using a representative corpus of 25,000 open source Java projects. Our findings support the feasibility of our approach for Internet-scale code search. We also found that our approach outperforms Ohloh Code search engine, previously known as Koders, in spotting working code examples.

155 citations

Patent
Jennifer L. Hawkins, Ross McKegney, Tack Tong, Qi Zhang, Ying Zou
02 Oct 2006
TL;DR: Auxiliary data operations for a role-based personalized business user workplace are generated by analyzing the workflow of a business process to specify business items, identifying and categorizing the data operations for each business item, and assigning a user role for access to the business items.
Abstract: Methods for generating auxiliary data operations for a role-based personalized business user workplace based on business processes include analyzing a workflow of a business process to specify business items as an input or output of a task in the business process; identifying data operations for each one of the business items by examining associated attributes and usage of the business item; categorizing the data operations by associating common data operations to the business items, and attaching specific data operations based on the context of the workflow and use by particular business item instances of the business item; and assigning a user role for access to the business items.

153 citations

Proceedings Article
31 May 2014
TL;DR: Context-aware rank transformations for predictors are proposed; adding context factors to the universal model improves the predictive power, and the results suggest that a universal defect prediction model may be an achievable goal.
Abstract: To predict files with defects, a suitable prediction model must be built for a software project from either itself (within-project) or other projects (cross-project). A universal defect prediction model that is built from the entire set of diverse projects would relieve the need for building models for an individual project. A universal model could also be interpreted as a basic relationship between software metrics and defects. However, the variations in the distribution of predictors pose a formidable obstacle to building a universal model. Such variations exist among projects with different context factors (e.g., size and programming language). To overcome this challenge, we propose context-aware rank transformations for predictors. We cluster projects based on the similarity of the distribution of 26 predictors, and derive the rank transformations using quantiles of predictors for a cluster. We then fit the universal model on the transformed data of 1,398 open source projects hosted on SourceForge and GoogleCode. Adding context factors to the universal model improves the predictive power. The universal model obtains prediction performance comparable to the within-project models and yields similar results when applied on five external projects (one Apache and four Eclipse projects). These results suggest that a universal defect prediction model may be an achievable goal.

140 citations

Proceedings Article
02 Jun 2012
TL;DR: It is found that with shorter release cycles, users do not experience significantly more post-release bugs and bugs are fixed faster, yet users experience these bugs earlier during software execution (the program crashes earlier).
Abstract: Nowadays, many software companies are shifting from the traditional 18-month release cycle to shorter release cycles. For example, Google Chrome and Mozilla Firefox release new versions every 6 weeks. These shorter release cycles reduce the users' waiting time for a new release and offer better marketing opportunities to companies, but it is unclear if the quality of the software product improves as well, since shorter release cycles result in shorter testing periods. In this paper, we empirically study the development process of Mozilla Firefox in 2010 and 2011, a period during which the project transitioned to a shorter release cycle. We compare crash rates, median uptime, and the proportion of post-release bugs of the versions that had a shorter release cycle with those having a traditional release cycle, to assess the relation between release cycle length and the software quality observed by the end user. We found that (1) with shorter release cycles, users do not experience significantly more post-release bugs and (2) bugs are fixed faster, yet (3) users experience these bugs earlier during software execution (the program crashes earlier).

133 citations


Cited by
01 Jan 2002

9,314 citations

01 Jan 2012

3,692 citations

Journal Article
TL;DR: An entry for Applied Linear Regression Models, appearing in the Journal of Quality Technology, Vol. 23, No. 1 (1991), pp. 76-77.
Abstract: (1991). Applied Linear Regression Models. Journal of Quality Technology: Vol. 23, No. 1, pp. 76-77.

1,811 citations

Patent
14 Jun 2016
TL;DR: Newness and distinctiveness is claimed in the features of ornamentation as shown inside the broken line circle in the accompanying representation.
Abstract: Newness and distinctiveness is claimed in the features of ornamentation as shown inside the broken line circle in the accompanying representation.

1,500 citations