Author

Ishtiaque Hussain

Bio: Ishtiaque Hussain is an academic researcher from Pennsylvania State University. The author has contributed to research in the topics of Computer science and Benchmark (computing). The author has an h-index of 4 and has co-authored 9 publications receiving 98 citations. Previous affiliations of Ishtiaque Hussain include Penn State Abington and the University of Texas at Arlington.

Papers
Proceedings ArticleDOI
11 Nov 2012
TL;DR: The results indicate with strong statistical significance that when execution time is measured in terms of the number of runs of the application on different input test data, CarFast outperforms the evaluated competitive approaches on most subject applications.
Abstract: Test coverage is an important metric of software quality, since it indicates thoroughness of testing. In industry, test coverage is often measured as statement coverage. A fundamental problem of software testing is how to achieve higher statement coverage faster, and it is a difficult problem since it requires testers to cleverly find input data that can steer execution sooner toward sections of application code that contain more statements. We created a novel fully automatic approach for aChieving higher stAtement coveRage FASTer (CarFast), which we implemented and evaluated on twelve generated Java applications whose sizes range from 300 LOC to one million LOC. We compared CarFast with several popular test case generation techniques, including pure random, adaptive random, and Directed Automated Random Testing (DART). Our results indicate with strong statistical significance that when execution time is measured in terms of the number of runs of the application on different input test data, CarFast outperforms the evaluated competitive approaches on most subject applications.

65 citations
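
To make the CarFast idea concrete: the approach steers input generation toward the uncovered branch that guards the most unexecuted statements. The sketch below illustrates only that target-selection step; the Branch record and the hard-coded counts are hypothetical stand-ins, not the actual CarFast implementation, which instruments the application and derives inputs for the chosen predicate.

import java.util.*;

/**
 * Minimal sketch of the CarFast idea: prefer flipping the uncovered
 * branch that guards the most unexecuted statements. All types below
 * are hypothetical stand-ins, not the actual CarFast API.
 */
public class CarFastSketch {

    record Branch(String id, int guardedStatements, boolean covered) {}

    /** Pick the uncovered branch guarding the largest number of statements. */
    static Branch selectTarget(List<Branch> branches) {
        return branches.stream()
                .filter(b -> !b.covered())
                .max(Comparator.comparingInt(Branch::guardedStatements))
                .orElse(null);
    }

    public static void main(String[] args) {
        List<Branch> branches = List.of(
                new Branch("if@Foo:17", 420, false),
                new Branch("if@Foo:98", 12, false),
                new Branch("while@Bar:5", 230, true));
        // CarFast would now generate an input aimed at flipping this
        // predicate, run the application, update coverage, and repeat
        // until the testing budget is exhausted.
        System.out.println("next target: " + selectTarget(branches).id());
    }
}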

Journal ArticleDOI
TL;DR: This work proposes a novel approach for generating random benchmarks for evaluating program analysis and testing tools and compilers that uses stochastic parse trees, where language grammar production rules are assigned probabilities that specify the frequencies with which instantiations of these rules will appear in the generated programs.
Abstract: Benchmarks are heavily used in different areas of computer science to evaluate algorithms and tools. In program analysis and testing, open-source and commercial programs are routinely used as benchmarks to evaluate different aspects of algorithms and tools. Unfortunately, many of these programs are written by programmers who introduce different biases, not to mention that it is very difficult to find programs that can serve as benchmarks with high reproducibility of results. We propose a novel approach for generating random benchmarks for evaluating program analysis and testing tools and compilers. Our approach uses stochastic parse trees, where language grammar production rules are assigned probabilities that specify the frequencies with which instantiations of these rules will appear in the generated programs. We implemented our tool for Java and applied it to generate a set of large benchmark programs of up to 5M lines of code each with which we evaluated different program analysis and testing tools and compilers. The generated benchmarks let us independently rediscover several issues in the evaluated tools. Copyright © 2014 John Wiley & Sons, Ltd.

13 citations
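
A minimal illustration of the stochastic parse tree idea: each nonterminal expands by sampling one of its productions according to an assigned probability. The two-rule expression grammar below is invented for illustration; the authors' tool works over the full Java grammar.

import java.util.*;

/**
 * Sketch of benchmark generation from a stochastic grammar: each
 * nonterminal expands by sampling a production according to its
 * assigned probability. The tiny expression grammar is invented;
 * the paper's tool targets the full Java grammar.
 */
public class StochasticGrammarSketch {
    static final Random RNG = new Random(42);
    // Productions per nonterminal, paired with sampling probabilities.
    static final Map<String, List<Map.Entry<String[], Double>>> RULES = Map.of(
            "EXPR", List.of(
                    Map.entry(new String[]{"EXPR", "+", "EXPR"}, 0.3),
                    Map.entry(new String[]{"NUM"}, 0.7)),
            "NUM", List.of(
                    Map.entry(new String[]{"1"}, 0.5),
                    Map.entry(new String[]{"2"}, 0.5)));

    static String generate(String symbol) {
        List<Map.Entry<String[], Double>> productions = RULES.get(symbol);
        if (productions == null) return symbol;          // terminal symbol
        double roll = RNG.nextDouble(), acc = 0;
        for (var p : productions) {                      // roulette-wheel pick
            acc += p.getValue();
            if (roll <= acc) {
                StringBuilder out = new StringBuilder();
                for (String s : p.getKey()) out.append(generate(s));
                return out.toString();
            }
        }
        throw new IllegalStateException("probabilities must sum to 1");
    }

    public static void main(String[] args) {
        // Prints a random arithmetic expression, e.g. "1+2".
        System.out.println(generate("EXPR"));
    }
}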

Proceedings ArticleDOI
01 May 2010
TL;DR: The paper motivates how dynamic symbolic techniques enable generic repair to support a wider range of correctness conditions, and presents DSDSR, a novel repair algorithm based on dynamic symbolic execution.
Abstract: Generic repair of complex data structures is a new and exciting area of research. Existing approaches can integrate with good software engineering practices such as program assertions. But in practice there is a wide variety of assertions and not all of them satisfy the style rules imposed by existing repair techniques. That is, a "badly" written assertion may render generic repair inefficient or ineffective. In this paper we build on the state of the art in generic repair and discuss how generic repair can work effectively with a wider range of correctness conditions. We motivate how dynamic symbolic techniques enable generic repair to support a wider range of correctness conditions and present DSDSR, a novel repair algorithm based on dynamic symbolic execution. We implement the algorithm for Java and report initial empirical results to demonstrate the promise of our approach for generic repair.

13 citations
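
The core loop of assertion-driven repair can be sketched as: evaluate the correctness condition (repOk), and if it fails, mutate the fields it depends on until it passes. DSDSR derives the new field values via dynamic symbolic execution and constraint solving; the hand-coded candidate fixes below are a simplified stand-in for that machinery, on an invented linked-list example.

import java.util.*;

/**
 * Illustration of assertion-driven data structure repair on a singly
 * linked list. repOk() is the correctness condition; repair() mutates
 * suspect fields until repOk() passes. DSDSR itself derives new field
 * values via dynamic symbolic execution; the fixes here are hand-coded.
 */
public class RepairSketch {
    static class Node { int value; Node next; Node(int v) { value = v; } }

    Node head;
    int size;

    /** Correctness condition: acyclic and the cached size is accurate. */
    boolean repOk() {
        Set<Node> seen = new HashSet<>();
        for (Node n = head; n != null; n = n.next)
            if (!seen.add(n)) return false;              // cycle detected
        return seen.size() == size;
    }

    boolean repair() {
        if (repOk()) return true;
        // Candidate fix 1: cut the edge that closes a cycle, if any.
        Set<Node> seen = new HashSet<>();
        for (Node n = head; n != null; n = n.next) {
            seen.add(n);
            if (n.next != null && seen.contains(n.next)) { n.next = null; break; }
        }
        // Candidate fix 2: recompute the cached size field.
        int count = 0;
        for (Node n = head; n != null; n = n.next) count++;
        size = count;
        return repOk();
    }

    public static void main(String[] args) {
        RepairSketch list = new RepairSketch();
        list.head = new Node(1);
        list.head.next = new Node(2);
        list.head.next.next = list.head;   // corruption: cycle
        list.size = 99;                    // corruption: wrong size
        System.out.println("repaired: " + list.repair());
    }
}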

Proceedings ArticleDOI
15 Jul 2012
TL;DR: This work proposes a novel approach for generating random benchmarks for evaluating program analysis and testing tools that uses stochastic parse trees, where language grammar production rules are assigned probabilities that specify the frequencies with which instantiations of these rules will appear in the generated benchmarks.
Abstract: Benchmarks are heavily used in different areas of computer science to evaluate algorithms and tools. In program analysis and testing, open-source and commercial programs are routinely used as benchmarks to evaluate different aspects of algorithms and tools. Unfortunately, many of these programs are written by programmers who introduce different biases, not to mention that it is very difficult to find programs that can serve as benchmarks with high reproducibility of results. We propose a novel approach for generating random benchmarks for evaluating program analysis and testing tools. Our approach uses stochastic parse trees, where language grammar production rules are assigned probabilities that specify the frequencies with which instantiations of these rules will appear in the generated programs. We implemented our tool for Java and applied it to generate benchmarks with which we evaluated different program analysis and testing tools. Our tool was also implemented by a major software company for C++ and used by a team of developers to generate benchmarks that enabled them to reproduce a bug in less than four hours.

9 citations

Proceedings ArticleDOI
27 Jun 2020
TL;DR: The paper captured and analyzed more than 9,000 students' comments on over 300 CS instructors at the top 20 universities in the U.S. and Canada, mining the RateMyProfessor (RMP) data to answer research questions such as: What are the common characteristics of popular CS instructors?
Abstract: Employment in Computer Science (CS), Information Technology, and Software Engineering and Development (SE) related occupations is projected to grow much faster than the average of all other occupations. Therefore, increasing student enrollment, retention, and graduation rates is becoming very important, as is the need for effective teaching in these subjects. Many universities commonly use formal, institutional Student Evaluation of Teaching (SET) systems to measure teaching effectiveness. After each semester, through SET, students provide feedback and comments on their courses and instructors. However, these evaluations are private and only a handful of people have access to them. Therefore, they cannot be used to build a common understanding of students' expectations, perspectives, and the desired characteristics of courses and instructors. On the other hand, third-party online platforms like RateMyProfessor.com (RMP) are public, solicit anonymous student feedback, and host a tremendous amount of data about instructors and their courses. These platforms are also popular among students. We mined and analyzed the RMP data to answer several research questions, e.g.: What are the common characteristics of popular CS instructors? How do they differ for SE instructors? Are there examples of special characteristics, tools, and techniques that popular CS instructors use? We captured and analyzed more than 9,000 students' comments for over 300 CS instructors at the top 20 universities in the U.S. and Canada. The paper contributes by presenting the findings for these research questions and making the data and scripts publicly available for future research.

6 citations
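
As a toy version of the kind of analysis the paper performs, the sketch below counts which traits students mention most often across instructor comments. The comments and the trait lexicon are invented; the paper's dataset and scripts cover the real RMP comments.

import java.util.*;
import java.util.stream.*;

/**
 * Toy comment-mining pass: count how often each trait keyword appears
 * across student comments. The comments and trait lexicon are invented
 * stand-ins for the paper's 9,000+ RateMyProfessor comments.
 */
public class CommentMiner {
    static final Set<String> TRAITS = Set.of(
            "caring", "funny", "clear", "helpful", "tough", "organized");

    public static void main(String[] args) {
        List<String> comments = List.of(
                "Super helpful and funny, explains everything clearly",
                "Tough grader but very organized and helpful");

        Map<String, Long> counts = comments.stream()
                .flatMap(c -> Arrays.stream(c.toLowerCase().split("\\W+")))
                .filter(TRAITS::contains)
                .collect(Collectors.groupingBy(w -> w, Collectors.counting()));

        counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .forEach(e -> System.out.println(e.getKey() + ": " + e.getValue()));
    }
}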


Cited by
Proceedings ArticleDOI
21 Aug 2017
TL;DR: Themis, as discussed by the authors, is a testing-based method for measuring whether and how much software discriminates, focusing on causality in discriminatory behavior; it generates efficient test suites to measure discrimination.
Abstract: This paper defines software fairness and discrimination and develops a testing-based method for measuring if and how much software discriminates, focusing on causality in discriminatory behavior. Evidence of software discrimination has been found in modern software systems that recommend criminal sentences, grant access to financial products, and determine who is allowed to participate in promotions. Our approach, Themis, generates efficient test suites to measure discrimination. Given a schema describing valid system inputs, Themis generates discrimination tests automatically and does not require an oracle. We evaluate Themis on 20 software systems, 12 of which come from prior work with explicit focus on avoiding discrimination. We find that (1) Themis is effective at discovering software discrimination, (2) state-of-the-art techniques for removing discrimination from algorithms fail in many situations, at times discriminating against as much as 98% of an input subdomain, (3) Themis optimizations are effective at producing efficient test suites for measuring discrimination, and (4) Themis is more efficient on systems that exhibit more discrimination. We thus demonstrate that fairness testing is a critical aspect of the software development cycle in domains with possible discrimination and provide initial tools for measuring software discrimination.

321 citations
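
The causal test Themis performs can be sketched simply: generate random inputs, flip only the protected attribute, and count how often the decision changes. The deliberately biased loan model below is an invented stand-in for a system under test, not part of Themis.

import java.util.*;
import java.util.function.*;

/**
 * Sketch of causal discrimination testing in the spirit of Themis:
 * create input pairs identical except for the protected attribute and
 * count how often the decision flips. The loan model is invented.
 */
public class FairnessTestSketch {
    record Applicant(int income, int age, String gender) {}

    public static void main(String[] args) {
        // Hypothetical (deliberately biased) system under test.
        Predicate<Applicant> approve =
                a -> a.income() > 50_000 && !a.gender().equals("F");

        Random rng = new Random(7);
        int trials = 10_000, flips = 0;
        for (int i = 0; i < trials; i++) {
            int income = rng.nextInt(120_000);
            int age = 18 + rng.nextInt(60);
            Applicant m = new Applicant(income, age, "M");
            Applicant f = new Applicant(income, age, "F");
            // Causal test: same input except for the protected attribute.
            if (approve.test(m) != approve.test(f)) flips++;
        }
        System.out.printf("causal discrimination: %.1f%%%n",
                100.0 * flips / trials);
    }
}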

Proceedings ArticleDOI
09 Nov 2015
TL;DR: Three state-of-the-art unit test generation tools for Java (Randoop, EvoSuite, and Agitar) are applied to the 357 real faults in the Defects4J dataset to investigate how well the generated test suites perform at detecting these faults.
Abstract: Rather than tediously writing unit tests manually, tools can be used to generate them automatically, sometimes even resulting in higher code coverage than manual testing. But how good are these tests at actually finding faults? To answer this question, we applied three state-of-the-art unit test generation tools for Java (Randoop, EvoSuite, and Agitar) to the 357 real faults in the Defects4J dataset and investigated how well the generated test suites perform at detecting these faults. Although the automatically generated test suites detected 55.7% of the faults overall, only 19.9% of all the individual test suites detected a fault. By studying the effectiveness and problems of the individual tools and the tests they generate, we derive insights to support the development of automated unit test generators that achieve a higher fault detection rate. These insights include 1) improving the obtained code coverage so that faulty statements are executed in the first instance, 2) improving the propagation of faulty program states to an observable output, coupled with the generation of more sensitive assertions, and 3) improving the simulation of the execution environment to detect faults that are dependent on external factors such as date and time.

193 citations
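
The fault-detection criterion behind such studies is easy to state in code: a suite generated from the fixed version detects the fault only if it passes on the fixed version and fails on the buggy one. The sketch below models a suite as a predicate over program versions; all names are hypothetical.

import java.util.function.Predicate;

/**
 * Sketch of the fault-detection check in Defects4J-style studies.
 * A test suite is modeled as a predicate: does this version pass?
 */
public class FaultDetectionSketch {
    static boolean detectsFault(Predicate<String> suitePasses,
                                String buggyVersion, String fixedVersion) {
        // Suites are generated from the fixed version, so they must pass
        // on it; the fault counts as detected only if the buggy version fails.
        return suitePasses.test(fixedVersion) && !suitePasses.test(buggyVersion);
    }

    public static void main(String[] args) {
        Predicate<String> suite = version -> version.equals("fixed");
        System.out.println(detectsFault(suite, "buggy", "fixed")); // true
    }
}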

Journal ArticleDOI
TL;DR: The study confirms that EVOSUITE can achieve good levels of branch coverage in practice, and exemplifies how the choice of software systems for an empirical study can influence the results of the experiments, which can serve to inform researchers to make more conscious choices in the selection of software system subjects.
Abstract: Research on software testing produces many innovative automated techniques, but because software testing is by necessity incomplete and approximate, any new technique faces the challenge of an empirical assessment. In the past, we have demonstrated scientific advance in automated unit test generation with the EVOSUITE tool by evaluating it on manually selected open-source projects or examples that represent a particular problem addressed by the underlying technique. However, demonstrating scientific advance is not necessarily the same as demonstrating practical value; even if EVOSUITE worked well on the software projects we selected for evaluation, it might not scale up to the complexity of real systems. Ideally, one would use large “real-world” software systems to minimize the threats to external validity when evaluating research tools. However, neither choosing such software systems nor applying research prototypes to them is a trivial task. In this article we present the results of a large experiment in unit test generation using the EVOSUITE tool on 100 randomly chosen open-source projects, the 10 most popular open-source projects according to the SourceForge Web site, seven industrial projects, and 11 automatically generated software projects. The study confirms that EVOSUITE can achieve good levels of branch coverage (on average, 71% per class) in practice. However, the study also exemplifies how the choice of software systems for an empirical study can influence the results of the experiments, which can serve to inform researchers to make more conscious choices in the selection of software system subjects. Furthermore, our experiments demonstrate how practical limitations interfere with scientific advances: branch coverage on an unbiased sample is affected by predominant environmental dependencies. The surprisingly large effect of such practical engineering problems in unit testing will hopefully lead to a larger appreciation of work in this area, thus supporting transfer of knowledge from software testing research to practice.

176 citations

Proceedings ArticleDOI
18 Mar 2013
TL;DR: R2Fix combines past fix patterns, machine learning techniques, and semantic patch generation techniques to fix bugs automatically; its correct patches could have saved an average of up to 63 days of bug diagnosis and patch generation time.
Abstract: Many bugs, even those that are known and documented in bug reports, remain in mature software for a long time due to the lack of the development resources to fix them. We propose a general approach, R2Fix, to automatically generate bug-fixing patches from free-form bug reports. R2Fix combines past fix patterns, machine learning techniques, and semantic patch generation techniques to fix bugs automatically. We evaluate R2Fix on three projects, i.e., the Linux kernel, Mozilla, and Apache, for three important types of bugs: buffer overflows, null pointer bugs, and memory leaks. R2Fix generates 57 patches correctly, 5 of which are new patches for bugs that have not been fixed by developers yet. We reported all 5 new patches to the developers; 4 have already been accepted and committed to the code repositories. The 57 correct patches generated by R2Fix could have shortened and saved up to an average of 63 days of bug diagnosis and patch generation time.

104 citations
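
The pipeline shape of R2Fix can be sketched as: classify a free-form bug report, extract the relevant identifier, and instantiate a fix template. The report text, regular expression, and null-check template below are invented for illustration; R2Fix itself learns classifiers from past fixes and uses semantic patch generation.

import java.util.regex.*;

/**
 * Sketch of a report-to-patch pipeline: match a bug-report phrase,
 * pull out the offending identifier, and fill in a fix template.
 * The regex and template are invented stand-ins for R2Fix's learned
 * classifiers and semantic patches.
 */
public class ReportToFixSketch {
    public static void main(String[] args) {
        String report = "Crash: null pointer dereference of conn in flush()";

        Matcher m = Pattern
                .compile("null pointer dereference of (\\w+)")
                .matcher(report);
        if (m.find()) {
            String var = m.group(1);
            // Instantiate a null-check fix pattern around the dereference.
            String patch = String.join("\n",
                    "+ if (" + var + " == null) {",
                    "+     return;   // avoid the reported NPE",
                    "+ }");
            System.out.println(patch);
        }
    }
}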

Proceedings ArticleDOI
18 May 2013
TL;DR: This technique is intended to keep a faulty application functional in the field while the developers work on permanent and radical fixes, and it works without interrupting the execution flow of the application and without restarting its components.
Abstract: We present a technique to make applications resilient to failures. This technique is intended to maintain a faulty application functional in the field while the developers work on permanent and radical fixes. We target field failures in applications built on reusable components. In particular, the technique exploits the intrinsic redundancy of those components by identifying workarounds consisting of alternative uses of the faulty components that avoid the failure. The technique is currently implemented for Java applications but makes few or no assumptions about the nature of the application, and works without interrupting the execution flow of the application and without restarting its components. We demonstrate and evaluate this technique on four mid-size applications and two popular libraries of reusable components affected by real and seeded faults. In these cases the technique is effective, maintaining the application fully functional with between 19% and 48% of the failure-causing faults, depending on the application. The experiments also show that the technique incurs an acceptable runtime overhead in all cases.

103 citations
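
The workaround mechanism can be sketched as a try/retry over intrinsically redundant operations: if the primary API usage fails, an equivalent alternative sequence is executed instead. The list example below and its assumed equivalence are invented; the paper derives such alternatives from real reusable components and applies them at runtime without restarting the application.

import java.util.*;

/**
 * Sketch of a workaround based on intrinsic redundancy: on failure of
 * the primary usage, retry with a known-equivalent alternative. The
 * assumed equivalence (addAll vs. element-by-element add) is invented
 * for illustration.
 */
public class RedundancyWorkaroundSketch {

    /** addAll is assumed equivalent to adding elements one by one. */
    static <T> void safeAddAll(List<T> target, List<T> source) {
        try {
            target.addAll(source);                   // primary usage
        } catch (RuntimeException failure) {
            for (T item : source) target.add(item);  // equivalent workaround
        }
    }

    public static void main(String[] args) {
        List<String> dest = new ArrayList<>();
        safeAddAll(dest, List.of("a", "b"));
        System.out.println(dest);                    // [a, b]
    }
}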