Showing papers by "Santonu Sarkar" published in 2008

Proceedings Article

Mining business topics in source code using latent dirichlet allocation

Girish Maskeri¹, Santonu Sarkar¹, Kenneth Heafield²•Institutions (2)

Infosys¹, California Institute of Technology²

19 Feb 2008

TL;DR: Preliminary results indicate that LDA is able to identify some of the domain topics and is a satisfactory starting point for further manual refinement of topics, and a human assisted approach based on LDA for extracting domain topics from source code is proposed.

Abstract: One of the difficulties in maintaining a large software system is the absence of documented business domain topics and of a correlation between these domain topics and the source code. Without such a correlation, people without any prior application knowledge would find it hard to comprehend the functionality of the system. Latent Dirichlet Allocation (LDA), a statistical model, has emerged as a popular technique for discovering topics in large text document corpora. But its applicability in extracting business domain topics from source code has not been explored so far. This paper investigates LDA in the context of comprehending large software systems and proposes a human assisted approach based on LDA for extracting domain topics from source code. This method has been applied to a number of open source and proprietary systems. Preliminary results indicate that LDA is able to identify some of the domain topics and is a satisfactory starting point for further manual refinement of topics.

188 citations

Journal Article

Modeling and optimization of wire electrical discharge machining of γ-TiAl in trim cutting operation

Santonu Sarkar¹, Mukandar Sekh¹, Souren Mitra¹, Bijoy Bhattacharyya¹•Institutions (1)

Jadavpur University¹

26 Aug 2008-Journal of Materials Processing Technology

TL;DR: In this paper, a second-order mathematical model in terms of machining parameters was developed for surface roughness, dimensional shift and cutting speed using response surface methodology (RSM) for wire electrical discharge machining of γ-titanium aluminide.
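For orientation, a second-order response-surface model generally takes the form below. The symbols are generic assumptions and are not taken from the paper: $y$ is one response (for example surface roughness), $x_1, \dots, x_k$ are the $k$ machining parameters, the $\beta$ terms are fitted coefficients, and $\varepsilon$ is the residual error.

$$
y = \beta_0 + \sum_{i=1}^{k} \beta_i x_i + \sum_{i=1}^{k} \beta_{ii} x_i^2 + \sum_{i=1}^{k-1} \sum_{j=i+1}^{k} \beta_{ij} x_i x_j + \varepsilon
$$

One such model would be fitted per response, so surface roughness, dimensional shift and cutting speed would each get their own set of coefficients.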

149 citations

Journal Article

Metrics for Measuring the Quality of Modularization of Large-Scale Object-Oriented Software

Santonu Sarkar¹, Avinash C. Kak², Girish Maskeri Rama³•Institutions (3)

Accenture¹, Purdue University², Infosys³

01 Sep 2008-IEEE Transactions on Software Engineering

TL;DR: A set of metrics is provided that characterizes the quality of modularization with respect to the APIs of the modules and to object-oriented inter-module dependencies such as those caused by inheritance, associational relationships, state access violations, and fragile base-class design.

Abstract: The metrics formulated to date for characterizing the modularization quality of object-oriented software have considered module and class to be synonymous concepts. But a typical class in object oriented programming exists at too low a level of granularity in large object-oriented software consisting of millions of lines of code. A typical module (sometimes referred to as a superpackage) in a large object-oriented software system will typically consist of a large number of classes. Even when the access discipline encoded in each class makes for "clean" class-level partitioning of the code, the intermodule dependencies created by associational, inheritance-based, and method invocations may still make it difficult to maintain and extend the software. The goal of this paper is to provide a set of metrics that characterize large object-oriented software systems with regard to such dependencies. Our metrics characterize the quality of modularization with respect to the APIs of the modules, on the one hand, and, on the other, with respect to such object-oriented inter-module dependencies as caused by inheritance, associational relationships, state access violations, fragile base-class design, etc. Using a two-pronged approach, we validate the metrics by applying them to popular open-source software systems.

97 citations

Patent

Identification of topics in source code

Girish Maskeri Rama¹, Kenneth Heafield, Santonu Sarkar¹•Institutions (1)

Infosys¹

17 Sep 2008

TL;DR: Topics in source code can be identified using Latent Dirichlet Allocation (LDA) by identifying domain specific keywords from the source code, generating a keyword matrix, processing the keyword matrix and the source code using LDA, and outputting a list of topics.

Abstract: Topics in source code can be identified using Latent Dirichlet Allocation (LDA) by receiving source code, identifying domain specific keywords from the source code, generating a keyword matrix, processing the keyword matrix and the source code using LDA, and outputting a list of topics. The list of topics is output as collections of domain specific keywords. Probabilities of domain specific keywords belonging to their respective topics can also be output. The keyword matrix comprises weighted sums of occurrences of domain specific keywords in the source code.
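The keyword-matrix step described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented method: the location weights, the stop-word list, the identifier-splitting rule, and the function names `split_identifier`, `weighted_counts`, and `keyword_matrix` are all hypothetical. The sketch stops before the LDA step; the resulting matrix would be handed to an LDA implementation of the reader's choice.

```python
import re
from collections import Counter

# Hypothetical weights: the abstract only says occurrences are weighted,
# not how. Here a keyword found in an identifier counts double.
LOCATION_WEIGHTS = {"identifier": 2.0, "comment": 1.0}

# Hypothetical stop words that carry no domain meaning.
STOP_WORDS = frozenset({"get", "set", "of", "the"})


def split_identifier(text):
    """Split camelCase / snake_case text into lowercase words."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", text)
    return [p.lower() for p in parts]


def weighted_counts(tokens, stop_words=STOP_WORDS):
    """Sum location weights per keyword for one source file.

    tokens is a list of (location, text) pairs, where location is a key
    of LOCATION_WEIGHTS.
    """
    counts = Counter()
    for location, text in tokens:
        for word in split_identifier(text):
            if word not in stop_words:
                counts[word] += LOCATION_WEIGHTS[location]
    return counts


def keyword_matrix(files, stop_words=STOP_WORDS):
    """Return (vocabulary, matrix) with one row per file, one column per keyword."""
    per_file = {name: weighted_counts(toks, stop_words) for name, toks in files.items()}
    vocab = sorted(set().union(*per_file.values()))
    matrix = [[per_file[name].get(word, 0.0) for word in vocab] for name in per_file]
    return vocab, matrix


if __name__ == "__main__":
    files = {
        "Account.java": [("identifier", "getAccountBalance"),
                         ("comment", "balance of the account")],
        "Loan.java": [("identifier", "loan_interest")],
    }
    vocab, matrix = keyword_matrix(files)
    print(vocab)
    for name, row in zip(files, matrix):
        print(name, row)
```

Each row is a file and each column a candidate domain keyword, so the matrix plays the role of a document-term matrix in which a term's count is replaced by its weighted sum of occurrences.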

29 citations

Patent

System and method for improving modularity of large legacy software systems

Girish Maskeri Rama¹, Santonu Sarkar¹•Institutions (1)

Infosys¹

15 Sep 2008

TL;DR: A system and method for improving the modularity of software source code is presented. The system comprises a user interface for receiving source code; a source code model extractor for parsing and forming a model of the source code; and a source code model database for storing the model, refactoring operators, and a record of refactoring changes.

Abstract: A system and method for improving modularity of a software source code is provided. The system comprises a user interface for receiving source code; a source code model extractor for parsing and forming a model of the source code; a source code model database for storing the source code model, refactoring operators, and a record of refactoring changes; a modularity improvement analyzer for reading the source code model and modularity problem diagnosis data and generating a set of prescriptions; an optimal improvement suggestion selector for evaluating and selecting prescriptions; and a refactoring engine for receiving selected prescriptions and applying them to the source code.

21 citations

Proceedings Article

A collaborative platform for application knowledge management in software maintenance projects

Santonu Sarkar¹, Renuka Sindhgatta², Krishnakumar Pooloth•Institutions (2)

Accenture¹, IBM²

18 Jan 2008

TL;DR: In the era of global outsourcing, maintenance and enhancement activities are performed in distributed locations and a critical success factor in such a scenario is to have a collaborative platform for managing and sharing the domain specific knowledge across distributed locations.

Abstract: In the era of global outsourcing, maintenance and enhancement activities are performed in distributed locations. In most cases, the domain expertise is not available, which increases the complexity manifold. A critical success factor in such a scenario is to have a collaborative platform for managing and sharing the domain specific knowledge across distributed locations. In our ongoing research we have developed a human assisted collaborative knowledge sharing tool called CollabDev. The aim of this tool is to analyze applications in multiple languages and render various structural, architectural, and functional insights to the people involved in maintenance. The novelty of this platform lies in integrating different elements of application knowledge by linking them to source code and allowing multiple developers to collaborate online by using annotations for the knowledge elements. The platform also provides diagnostic information on the architecture of the source code.

19 citations

Proceedings Article

Formal architecture modeling of business application - software maintenance case study

Santonu Sarkar¹, A. Panayappan¹•Institutions (1)

Accenture¹

01 Nov 2008

TL;DR: A type system to model the architecture of a complex enterprise IT system using Acme architecture description language is proposed and a modeling approach to capture various architectural design decisions architects perform as a part of the architecture review is reported.

Abstract: Maintenance of complex business applications is challenging for the software services industry. The maintenance team inherits the software with little design and implementation knowledge. The client-facing team gathers an ad-hoc architectural description of some sort and communicates it to the geographically distributed maintenance team through informal box-and-line diagrams. This information is poorly understood, and the underlying architectural constraints are never enforced. This paper proposes a type system to model the architecture of a complex enterprise IT system using the Acme architecture description language and reports a modeling approach to capture the various architectural design decisions that architects make as part of the architecture review. An initial field study to evaluate the usefulness of such modeling has been encouraging.

1 citation