Showing papers by "Santonu Sarkar" published in 2008


Proceedings ArticleDOI
19 Feb 2008
TL;DR: Preliminary results indicate that LDA is able to identify some of the domain topics and is a satisfactory starting point for further manual refinement of topics, and a human assisted approach based on LDA for extracting domain topics from source code is proposed.
Abstract: One of the difficulties in maintaining a large software system is the absence of documented business domain topics and of a correlation between these domain topics and the source code. Without such a correlation, people without any prior application knowledge would find it hard to comprehend the functionality of the system. Latent Dirichlet Allocation (LDA), a statistical model, has emerged as a popular technique for discovering topics in large text corpora. But its applicability in extracting business domain topics from source code has not been explored so far. This paper investigates LDA in the context of comprehending large software systems and proposes a human assisted approach based on LDA for extracting domain topics from source code. This method has been applied to a number of open source and proprietary systems. Preliminary results indicate that LDA is able to identify some of the domain topics and is a satisfactory starting point for further manual refinement of topics.

188 citations
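
To make the idea in the abstract above concrete, here is a minimal sketch of running LDA over words split out of source code identifiers. It is not the authors' implementation: the collapsed Gibbs sampler, the identifier-splitting regex, the stop-word list, the hyperparameters (`alpha`, `beta`, `iters`) and the toy `FILES` data are all assumptions chosen for illustration.

```python
import random
import re

# Hypothetical stop-word list of language keywords; not taken from the paper.
STOP_WORDS = {"class", "void", "double", "string", "int", "return"}

def identifier_words(source):
    """Split camelCase and snake_case identifiers into lowercase words."""
    words = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", source)
    return [w.lower() for w in words if w.lower() not in STOP_WORDS]

def lda_gibbs(docs, num_topics, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; returns (vocab, topic-word probabilities)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    size = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * size for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assign = []
    # Random initial topic assignment for every word occurrence.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(num_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][index[w]] += 1
            topic_total[z] += 1
        assign.append(zs)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                wi, z = index[w], assign[d][n]
                # Remove the current assignment before resampling it.
                doc_topic[d][z] -= 1
                topic_word[z][wi] -= 1
                topic_total[z] -= 1
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][wi] + beta)
                    / (topic_total[k] + size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][n] = z
                doc_topic[d][z] += 1
                topic_word[z][wi] += 1
                topic_total[z] += 1
    phi = [
        [(topic_word[k][v] + beta) / (topic_total[k] + size * beta) for v in range(size)]
        for k in range(num_topics)
    ]
    return vocab, phi

def top_words(vocab, topic_row, n):
    """Return the n most probable words of one topic."""
    ranked = sorted(range(len(vocab)), key=lambda v: topic_row[v], reverse=True)
    return [vocab[v] for v in ranked[:n]]

# Toy stand-ins for source files (hypothetical).
FILES = {
    "LoanAccount.java": "class LoanAccount { double interestRate; void computeInterest() {} void repayLoan() {} }",
    "LoanOffer.java": "class LoanOffer { double loanAmount; double interestRate; }",
    "CustomerProfile.java": "class CustomerProfile { String customerName; String customerAddress; void updateAddress() {} }",
    "CustomerContact.java": "class CustomerContact { String customerPhone; String customerEmail; }",
}
DOCS = [identifier_words(text) for text in FILES.values()]

if __name__ == "__main__":
    vocab, phi = lda_gibbs(DOCS, num_topics=2)
    for k, row in enumerate(phi):
        print(f"topic {k}:", top_words(vocab, row, 4))
```

In a human assisted workflow of the kind the abstract describes, the printed word lists would be the raw material that a person then labels, merges, or discards.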


Journal ArticleDOI
TL;DR: In this paper, a second-order mathematical model in terms of machining parameters was developed for surface roughness, dimensional shift and cutting speed using response surface methodology (RSM) for wire electrical discharge machining of γ-titanium aluminide.

149 citations


Journal ArticleDOI
TL;DR: A set of metrics that characterize the quality of modularization with respect to the APIs of the modules and such object-oriented inter-module dependencies as caused by inheritance, associational relationships, state access violations, fragile base-class design, etc are provided.
Abstract: The metrics formulated to date for characterizing the modularization quality of object-oriented software have considered module and class to be synonymous concepts. But a typical class in object-oriented programming exists at too low a level of granularity in large object-oriented software consisting of millions of lines of code. A module (sometimes referred to as a superpackage) in a large object-oriented software system will typically consist of a large number of classes. Even when the access discipline encoded in each class makes for "clean" class-level partitioning of the code, the inter-module dependencies created by associations, inheritance, and method invocations may still make it difficult to maintain and extend the software. The goal of this paper is to provide a set of metrics that characterize large object-oriented software systems with regard to such dependencies. Our metrics characterize the quality of modularization with respect to the APIs of the modules, on the one hand, and, on the other, with respect to such object-oriented inter-module dependencies as caused by inheritance, associational relationships, state access violations, fragile base-class design, etc. Using a two-pronged approach, we validate the metrics by applying them to popular open-source software systems.

97 citations


Patent
17 Sep 2008
TL;DR: In this article, topics in source code can be identified using Latent Dirichlet Allocation (LDA) by identifying domain specific keywords from the source code, generating a keyword matrix, processing the keyword matrix and source code using LDA, and outputting a list of topics.
Abstract: Topics in source code can be identified using Latent Dirichlet Allocation (LDA) by receiving source code, identifying domain specific keywords from the source code, generating a keyword matrix, processing the keyword matrix and the source code using LDA, and outputting a list of topics. The list of topics is output as collections of domain specific keywords. Probabilities of domain specific keywords belonging to their respective topics can also be output. The keyword matrix comprises weighted sums of occurrences of domain specific keywords in the source code.

29 citations
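
The keyword matrix step in the abstract above (weighted sums of occurrences of domain specific keywords) can be sketched as follows. This is an illustration under stated assumptions, not the patented method: the weights in `LOCATION_WEIGHTS`, the location types, the input layout of `files`, and the pre-supplied `keywords` set are all hypothetical, since the abstract does not specify them.

```python
import re
from collections import defaultdict

# Hypothetical weights per location type; the abstract gives no values.
LOCATION_WEIGHTS = {"class_name": 3.0, "method_name": 2.0, "comment": 1.0}

def split_words(text):
    """Split camelCase identifiers and plain text into lowercase words."""
    return [w.lower() for w in re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", text)]

def keyword_matrix(files, keywords):
    """Build one row per file holding weighted keyword occurrence sums.

    files:    {file_name: {location_type: [text, ...]}} (assumed layout)
    keywords: set of domain specific keywords (assumed to be given)
    Returns (columns, matrix), where matrix[file_name] lines up with columns.
    """
    columns = sorted(keywords)
    matrix = {}
    for name, locations in files.items():
        row = defaultdict(float)
        for location, texts in locations.items():
            # Unknown location types fall back to a weight of 1.0.
            weight = LOCATION_WEIGHTS.get(location, 1.0)
            for text in texts:
                for word in split_words(text):
                    if word in keywords:
                        row[word] += weight
        matrix[name] = [row[k] for k in columns]
    return columns, matrix

# Toy input (hypothetical).
files = {
    "LoanAccount.java": {
        "class_name": ["LoanAccount"],
        "method_name": ["computeInterest", "repayLoan"],
        "comment": ["Accrues interest on the loan"],
    }
}
columns, matrix = keyword_matrix(files, {"loan", "interest", "customer"})

if __name__ == "__main__":
    print(columns)
    print(matrix["LoanAccount.java"])
```

Each row of the resulting matrix could then be passed to an LDA implementation as the weighted term counts for that file.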


Patent
15 Sep 2008
TL;DR: In this article, a system and method for improving the modularity of software source code is presented; the system comprises a user interface for receiving source code, a source code model extractor for parsing and forming a model of the source code, a database of refactoring operators, and a record of changes.
Abstract: A system and method for improving the modularity of software source code is provided. The system comprises a user interface for receiving source code; a source code model extractor for parsing and forming a model of the source code; a source code model database for storing the source code model, refactoring operators, and a record of refactoring changes; a modularity improvement analyzer for reading the source code model and modularity problem diagnosis data and generating a set of prescriptions; an optimal improvement suggestion selector for evaluating and selecting prescriptions; and a refactoring engine for receiving selected prescriptions and applying them to the source code.

21 citations


Proceedings ArticleDOI
18 Jan 2008
TL;DR: In the era of global outsourcing, maintenance and enhancement activities are performed in distributed locations and a critical success factor in such a scenario is to have a collaborative platform for managing and sharing the domain specific knowledge across distributed locations.
Abstract: In the era of global outsourcing, maintenance and enhancement activities are performed in distributed locations. In most cases, domain expertise is not available, which increases the complexity manifold. A critical success factor in such a scenario is to have a collaborative platform for managing and sharing the domain specific knowledge across distributed locations. In our ongoing research we have developed a human assisted collaborative knowledge sharing tool called CollabDev. The aim of this tool is to analyze applications in multiple languages and render various structural, architectural, and functional insights to the people involved in maintenance. The novelty of this platform lies in integrating different elements of application knowledge by linking them to source code and allowing multiple developers to collaborate on-line by using annotations for the knowledge elements. The platform also provides diagnostic information on the architecture of the source code.

19 citations


Proceedings ArticleDOI
Santonu Sarkar, A. Panayappan
01 Nov 2008
TL;DR: A type system to model the architecture of a complex enterprise IT system using Acme architecture description language is proposed and a modeling approach to capture various architectural design decisions architects perform as a part of the architecture review is reported.
Abstract: Maintenance of complex business applications is challenging for the software services industry. The maintenance team inherits the software with little design and implementation knowledge. The client-facing team gathers an ad-hoc architectural description of some sort and communicates it to the geographically distributed maintenance team through informal box and line diagrams. This information is poorly understood, and the underlying architectural constraints are never enforced. This paper proposes a type system to model the architecture of a complex enterprise IT system using the Acme architecture description language and reports a modeling approach to capture various architectural design decisions that architects make as a part of the architecture review. An initial field study to evaluate the usefulness of such modeling has been encouraging.

1 citation