Proceedings Article

Using latent semantic analysis to identify similarities in source code to support program understanding

TLDR
The paper describes the results of applying Latent Semantic Analysis (LSA), an advanced information retrieval method, to program source code and associated documentation to assist in the understanding of a nontrivial software system, namely a version of Mosaic.
Abstract
The paper describes the results of applying Latent Semantic Analysis (LSA), an advanced information retrieval method, to program source code and associated documentation. LSA is a corpus-based statistical method for inducing and representing aspects of the meanings of words and passages (of natural language) as reflected in their usage. The method is assessed for application to the domain of software components (i.e., source code and its accompanying documentation). Here, LSA serves as the basis for clustering software components, and the resulting clusters are used to assist in understanding a nontrivial software system, namely a version of Mosaic. Applying LSA to source code and internal documentation in support of program understanding is a new application of the method and a departure from its usual application domain of natural language.
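As a rough illustration of the general idea described in the abstract, the following minimal sketch applies a generic LSA pipeline to a handful of toy "software components". It is not the authors' implementation: the tokenizer, the raw term-count weighting, the choice of `k = 2`, the cosine similarity measure, and the four sample components are all assumptions made for this example. The sketch builds a term-by-component matrix, truncates its singular value decomposition to `k` dimensions, and compares components in the reduced space.

```python
import re
import numpy as np

def tokenize(text):
    # Hypothetical tokenizer: keep alphabetic runs from code and comments.
    return [t.lower() for t in re.findall(r"[A-Za-z]+", text)]

def lsa_similarity(documents, k=2):
    """Return a cosine-similarity matrix of documents in a k-dim LSA space."""
    vocab = sorted({t for d in documents for t in tokenize(d)})
    index = {t: i for i, t in enumerate(vocab)}

    # Term-by-document matrix of raw counts (an assumed weighting scheme).
    A = np.zeros((len(vocab), len(documents)))
    for j, d in enumerate(documents):
        for t in tokenize(d):
            A[index[t], j] += 1.0

    # Truncated SVD: keep only the k largest singular values.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    k = min(k, len(s))
    docs_k = (np.diag(s[:k]) @ Vt[:k]).T  # one row per document

    # Normalise rows so that the dot product equals cosine similarity.
    norms = np.linalg.norm(docs_k, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    unit = docs_k / norms
    return unit @ unit.T

# Toy components (invented for this example, not taken from Mosaic).
components = [
    "int open_url(char *url) /* open a url connection */",
    "void fetch_url(char *url) /* fetch url over connection */",
    "void draw_widget(int x, int y) /* draw widget on screen */",
    "void paint_widget(int x) /* paint widget on screen */",
]

sim = lsa_similarity(components, k=2)
print(np.round(sim, 2))
```

In this toy setting the two URL-handling components end up closer to each other than to the widget-drawing ones, and vice versa. A similarity matrix of this kind could then be passed to any standard clustering procedure to group related components.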

