Awards and Honors
2008 Michael E. Conrad Outstanding Graduate Research Publication Award, CS Dept, WSU
2007 Best Paper Award at ICPC’07
2007 Graduate Research Assistant Recognition Award, WSU, Graduate School
2007 ACM SIGSOFT Award to attend ICSE2007
2006 Best Paper Award at ICPC’06
2006 ACM SIGSOFT Award to attend ICSE2006
2005 Microsoft Student Ambassador Award
2005 Graduate Recognition Award for Exceptional Teaching Service, College of Science, WSU
Research
My research interests are in the area of software engineering. The focus of my research is on supporting the evolution and maintenance of large-scale software systems. My work to date has concentrated on several topics in software evolution and maintenance, such as feature location in source code, change impact analysis, software measurement and fault prediction, traceability link recovery, software repository mining, and software visualization.
Research Program
A common theme and long-term goal of my research is to develop methods and tools to support software engineers during the development and maintenance of large-scale software systems. The main premise of my research is that an explicit representation of the unstructured information found in software systems, combined with structured analysis data, can help develop efficient tools and techniques. This combined approach directly supports various software development and maintenance activities, such as feature location, change impact analysis, design decision recovery, traceability link management, and software quality measurement. I currently focus on applications of Information Retrieval (IR) approaches and their combinations with source code analysis methods and mining software repositories (MSR) techniques to automatically extract and analyze the textual (natural language) information that is embedded in a multitude of software artifacts.
Feature location. Feature (or concept) location is the process of identifying the parts of the source code that implement a specific functionality of a software system. In order to better support feature location in source code, we investigate applications of IR techniques for extracting and representing the textual information in large software systems so that this unstructured information can be automatically combined with structural information. The current research focuses on combining IR-based analysis data with program dependencies, execution traces, and Formal Concept Analysis (FCA).
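As a rough illustration of the IR side of feature location, the sketch below ranks methods by the textual similarity between a developer query and the words found in each method's identifiers and comments. It is a minimal sketch under stated assumptions, not the tooling used in this research: it uses plain TF-IDF with cosine similarity as a simple stand-in for an IR model, and the names `tokenize`, `locate_feature`, and the toy `corpus` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split identifiers and comments into lowercase word tokens."""
    # Break camelCase first, then split on anything that is not a letter
    # (this also separates snake_case and strips comment markers).
    spaced = re.sub(r"([a-z])([A-Z])", r"\1 \2", text)
    return [t.lower() for t in re.split(r"[^A-Za-z]+", spaced) if t]


def tfidf_vectors(docs):
    """Return (list of TF-IDF weight dicts, idf table) for tokenized docs."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed inverse document frequency; one common variant among several.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = [{t: c * idf[t] for t, c in Counter(doc).items()} for doc in docs]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def locate_feature(query, methods):
    """Rank method names by similarity of their text to the query."""
    names = list(methods)
    docs = [tokenize(methods[name]) for name in names]
    vectors, idf = tfidf_vectors(docs)
    # Query terms that never occur in the corpus get weight 0.
    q = {t: c * idf.get(t, 0.0) for t, c in Counter(tokenize(query)).items()}
    scored = [(name, cosine(q, vec)) for name, vec in zip(names, vectors)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy corpus: method name -> identifiers plus comment text.
corpus = {
    "saveDocument": "saveDocument write file to disk // persist the document",
    "printPage": "printPage send page to printer // print current page",
    "openFile": "openFile read file from disk",
}
ranking = locate_feature("print page", corpus)
```

In this toy corpus only `printPage` shares vocabulary with the query, so it is ranked first. A ranked list of this kind is the textual signal that would then be combined with structural data such as program dependencies or execution traces.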
Software measurement and fault prediction. Classes in object-oriented systems, which may be written in different programming languages, contain identifiers and comments, which reflect concepts from the domain of the software system. Software cohesion and coupling ultimately affect the comprehensibility of the source code of those classes. For the source code to be easy to understand, it has to have a clear implementation logic (i.e., design) and it has to be easy to read (i.e., good language use). The unstructured (textual) information, present in identifiers and comments, can be effectively used to measure the readability of source code and ultimately its quality. Inspired by the mechanisms used to measure textual coherence in cognitive psychology and computational linguistics, we investigate how this textual information can be used for measuring the cohesion and coupling of classes in object-oriented software systems. Highly cohesive classes need to have a design that ensures a strong coupling among their methods (captured by structural metrics) as well as a coherent internal description (captured by cohesion metrics based on analysis of unstructured information). Finally, we investigate the combinations of structural and conceptual cohesion metrics to define better models for prediction of faults in classes.
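To make the idea of a text-based cohesion measure concrete, the sketch below scores a class by the average pairwise textual similarity of its methods. It is a simplified illustration, assuming raw term counts and cosine similarity; it is not the definition of any metric from this research, and `conceptual_cohesion` and the example classes are hypothetical.

```python
import math
import re
from collections import Counter
from itertools import combinations


def term_vector(text):
    """Count the lowercase word tokens in a method's identifiers and comments."""
    spaced = re.sub(r"([a-z])([A-Z])", r"\1 \2", text)  # split camelCase
    return Counter(t.lower() for t in re.split(r"[^A-Za-z]+", spaced) if t)


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def conceptual_cohesion(method_texts):
    """Average pairwise similarity among the methods of one class."""
    vectors = [term_vector(text) for text in method_texts]
    pairs = list(combinations(vectors, 2))
    if not pairs:
        return 0.0  # a class with fewer than two methods has no pairs
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)


# Hypothetical classes: one focused on a single concept, one a grab bag.
account_methods = [
    "deposit amount into account balance",
    "withdraw amount from account balance",
    "getBalance return account balance",
]
utility_methods = [
    "parseDate convert string to date",
    "sendEmail deliver message to recipient",
    "sortList order items ascending",
]
cohesive_score = conceptual_cohesion(account_methods)
mixed_score = conceptual_cohesion(utility_methods)
```

The methods of the account class share most of their vocabulary, so that class scores higher than the utility class. A score like this could sit next to a structural cohesion metric as one more input to a fault prediction model.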
Change impact analysis. During modifications of the source code, the starting point of the change is identified via feature location, and the impact set (all components or classes of the existing code that will be changed) is determined via impact analysis techniques. Determining the impact set is a significant part of the software change process. A simple software change may require modification of just a single class, while a complex change may require modifications in many locations spread over the source code. We are investigating the impact analysis activity with emphasis on combining static program analysis methods with IR techniques. Our research efforts are aimed at combining the unstructured information with program dependencies to provide efficient mechanisms for ranking components or classes during impact analysis as well as for identifying hidden dependencies (i.e., dependencies with indirect data or control flow).
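One way to picture such a ranking is sketched below: each candidate class receives a structural score from its distance to the changed class in the dependency graph and a textual score from an IR step, and the two are blended. This is only an illustrative sketch. The scoring formula, the weight `alpha`, the function names, and the example graph are all assumptions made for the example, not the method developed in this research.

```python
from collections import deque


def dependency_distances(graph, start):
    """Breadth-first distances from `start` along dependency edges."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def rank_impact_set(graph, start, text_similarity, alpha=0.5):
    """Rank candidate classes by a blend of structural and textual evidence.

    `text_similarity` maps class name -> similarity to the change request,
    assumed to come from a separate IR step. `alpha` is a hypothetical
    weight between the structural and the textual score.
    """
    dist = dependency_distances(graph, start)
    candidates = (set(dist) | set(text_similarity)) - {start}
    scores = {}
    for name in candidates:
        # Closer classes score higher; unreachable classes score 0.
        structural = 1.0 / dist[name] if name in dist else 0.0
        textual = text_similarity.get(name, 0.0)
        scores[name] = alpha * structural + (1 - alpha) * textual
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))


# Hypothetical system: the change starts in Order.
dependencies = {"Order": ["Invoice", "Customer"], "Invoice": ["TaxTable"]}
similarity = {"Invoice": 0.8, "Customer": 0.1, "TaxTable": 0.3, "AuditLog": 0.7}
impact_ranking = rank_impact_set(dependencies, "Order", similarity)
```

In this example `AuditLog` has no dependency path from `Order` but still appears in the ranking because of its textual similarity, which is the kind of candidate a purely structural analysis would miss.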
Traceability link recovery. IR methods have been widely used in software engineering to recover traceability links among software artifacts. In our research, we investigate applications of traceability links for assessing and maintaining the quality of software documentation. Our premise is that quality software documentation should accurately reflect the structure of the source code; hence, elements of documentation that link to strongly coupled elements of the source code should also be strongly related. We investigate combinations of IR methods with static analysis of source code to improve existing methods for recovering traceability links. Specifically, the research is aimed at improving existing documentation (e.g., recovering missing traceability links among sections in documentation based on coupling among related elements in source code) or writing new documentation during the evolution of the software.
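The example of recovering missing links among documentation sections can be sketched as follows: if two sections trace to code elements that are strongly coupled, and the sections are not yet linked to each other, the pair is flagged as a candidate link. This is a minimal sketch under assumed inputs. The data layout, the function `suggest_doc_links`, and the section and class names are hypothetical, and the coupling data is taken as given rather than computed.

```python
from itertools import combinations


def suggest_doc_links(trace_links, coupled_pairs, existing_doc_links):
    """Suggest links between documentation sections.

    trace_links:        section id -> set of code elements it traces to
    coupled_pairs:      pairs of code elements considered strongly coupled
    existing_doc_links: pairs of sections that already reference each other
    """
    coupled = {frozenset(pair) for pair in coupled_pairs}
    existing = {frozenset(pair) for pair in existing_doc_links}
    suggestions = []
    for doc_a, doc_b in combinations(sorted(trace_links), 2):
        if frozenset((doc_a, doc_b)) in existing:
            continue  # the sections are already linked
        # Flag the pair if any traced code elements are coupled to each other.
        if any(
            frozenset((x, y)) in coupled
            for x in trace_links[doc_a]
            for y in trace_links[doc_b]
        ):
            suggestions.append((doc_a, doc_b))
    return suggestions


# Hypothetical documentation sections and the classes they describe.
links = {
    "sec-billing": {"Invoice"},
    "sec-orders": {"Order"},
    "sec-logging": {"Logger"},
}
missing = suggest_doc_links(links, [("Order", "Invoice")], [])
```

Here the billing and orders sections describe coupled classes but do not reference each other, so that pair is the single suggestion, while the logging section is left alone.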
Mining software repositories and software visualization. MSR is an increasingly important activity during software evolution, as the extracted data can be used to support a variety of software maintenance tasks. To leverage the unstructured information mined from software systems and tailor it to the needs of the developer, the information needs to be filtered, aggregated, and presented to the users. Within our research program, we investigate new visualization techniques and tools for mining unstructured data in software repositories, with the goal of developing tools that support programmers during software maintenance tasks.
Presentations
"Combining Formal Concept Analysis with Information Retrieval for Concept Location in Source Code", at the 15th IEEE International Conference on Program Comprehension (ICPC2007), Banff, Alberta, Canada, June 26-29, 2007 [slides]
"Integrating COTS Search Engines into Eclipse: Google Desktop Case Study" at the 2nd International ICSE'07 Workshop on Incorporating COTS Software into Software Systems: Tools and Techniques (IWICSS'07), Minneapolis, MN, May 22, 2007 [slides]
"Using Traceability Links to Assess and Maintain the Quality of Software Documentation" at the ACM International Symposium on Grand Challenges in Traceability, Lexington, KY, March 23, 2007 [slides]
"The Conceptual Coupling Metrics for Object-Oriented Systems", at the 22nd IEEE International Conference on Software Maintenance (ICSM2006), Philadelphia, Pennsylvania, September 27, 2006 [slides]
"Combining Probabilistic Ranking and Latent Semantic Indexing for Feature Identification", at the 14th IEEE International Conference on Program Comprehension (ICPC2006), Athens, Greece, June 15, 2006 [slides]
"JIRiSS - an Eclipse plug-in for Source Code Exploration", at the 14th IEEE International Conference on Program Comprehension (ICPC2006), Athens, Greece, June 16, 2006 [slides]
"3D Visualization for Concept Location in Source Code" at the 28th IEEE/ACM International Conference on Software Engineering (ICSE2006), Research Demonstrations, Shanghai, China, May 26, 2006 [slides]
"JRipples: An Eclipse Plug-in for Software Evolution" at the Eclipse Technology Exchange at OOPSLA'05, Poster presentation, San Diego, CA, October 2005