September 25, Monday

4:00-5:30 pm

Schedule and position papers

Working Session:
Information Retrieval Based Approaches in Software Evolution
Organizers

Andrian Marcus
Wayne State University
Detroit, MI, USA


Andrea De Lucia
Università di Salerno
Fisciano (SA), Italy

 

Jane Huffman Hayes
University of Kentucky
Lexington, KY, USA

 

Denys Poshyvanyk
Wayne State University
Detroit, MI, USA


Contact organizers
amarcus at wayne dot edu

THEME

This working session will focus on the state of the art in the application of IR-based techniques to support software maintenance activities. The session aims to identify the main research and practical issues in the field, to determine future work directions, and to foster collaborations among the participants.

Software comprises a multitude of artifacts; some are intended to be read by the compiler, while many others are intended to be read by developers. This is especially true during software evolution, when developers have to deal with large software systems, often written by others.

The user-centric information is often expressed in natural language and is embedded in documentation and source code. This information helps developers understand much of the why and the what of a software system, just as the source code helps them understand the how. Natural language external documentation (e.g., requirements, design documents, user manuals), comments, and identifiers in the source code encode to a large degree the domain of the software and capture design decisions, change requests, developer information, etc. This unstructured information is referred to as semantic because it encodes part of the domain semantics of the software. It stands in contrast to structural information, which is expressed mainly by the source code and other data-intensive artifacts, such as analysis information.

The single developer/maintainer development model did not require capturing much of this information, as the working and long-term memory of the developer often sufficed to store it. Today, the increasing size and complexity of software requires large development groups, often geographically distributed, so storing and sharing the semantic information has become essential. Moreover, given the large amount of such information, tools are necessary for its storage, retrieval, and analysis before it is delivered to the users.

STATE OF THE ART

In the past decade, researchers have proposed information retrieval (IR) models to address these problems related to the semantic information in existing software. Early models were used to construct software libraries, while more recent work has focused on specific software maintenance or development tasks such as:

  • Traceability link recovery
  • Concept location
  • Software and web site modularization
  • Reverse engineering
  • Software reuse
  • Impact analysis
  • Quality assessment and software measurement

These IR based approaches to software engineering problems differ not only in their scope, but also in their underlying indexing mechanism, corpus construction, or data analysis method.
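To make the three dimensions named above concrete, here is a minimal sketch of one possible IR-based pipeline, applied to traceability link recovery. It is an illustration only, not the method of any particular group: it assumes a vector space model with TF-IDF weighting (other approaches use different indexing, such as latent semantic indexing or probabilistic models), and the artifact names, texts, stop-word list, and function names are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for illustration.
STOP_WORDS = {"the", "a", "an", "of", "to", "and", "is", "in", "for"}


def tokenize(text):
    """Corpus construction: turn prose and identifiers into lowercase terms."""
    # Split camelCase identifiers: "withdrawCash" -> "withdraw Cash".
    text = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", text)
    # Keep alphabetic runs only; this also splits on underscores and punctuation.
    terms = re.findall(r"[A-Za-z]+", text)
    return [t.lower() for t in terms if t.lower() not in STOP_WORDS]


def build_index(documents):
    """Indexing: map each artifact name to a TF-IDF weighted term vector."""
    counts = {name: Counter(tokenize(text)) for name, text in documents.items()}
    doc_freq = Counter(term for c in counts.values() for term in c)
    n_docs = len(documents)
    idf = {term: math.log(n_docs / df) + 1.0 for term, df in doc_freq.items()}
    index = {
        name: {term: tf * idf[term] for term, tf in c.items()}
        for name, c in counts.items()
    }
    return index, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query, index, idf):
    """Data analysis: rank artifacts by similarity to a natural language query."""
    q_counts = Counter(tokenize(query))
    # Terms that never occur in the corpus carry no weight and are dropped.
    q_vec = {t: tf * idf[t] for t, tf in q_counts.items() if t in idf}
    scores = {name: cosine(q_vec, vec) for name, vec in index.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical source artifacts and requirement text.
artifacts = {
    "AccountManager.java": "class AccountManager { void withdrawCash(int amount) {} "
                           "// withdraw cash from the account }",
    "ReportPrinter.java": "class ReportPrinter { void printMonthlyReport() {} "
                          "// print the monthly report }",
}
index, idf = build_index(artifacts)
ranking = rank("The user shall be able to withdraw cash from an account", index, idf)
for name, score in ranking:
    print(f"{score:.3f}  {name}")
```

In this toy corpus the requirement shares terms only with the first artifact, so that artifact is ranked first and becomes the candidate traceability link. Swapping the tokenizer, the weighting scheme, or the similarity measure yields a different point in the design space described above.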

GOALS

The working session has several complementary goals. First, it aims at clearly defining the state of the art in the field. As the field grows, researchers and practitioners need to agree on a common terminology, as the current work by different groups is somewhat incoherent. We need to assess how far this field has come to date and how far it can go in the future.

In addition, we want to identify which issues are already answered by research and ready for practical applications and which are still open or unaddressed. Several questions will be directly addressed during the working session and many more will be raised on the spot:

  • Can we define a general model for the application of IR methods in software evolution?
  • Do certain IR methods suit specific software maintenance problems, or can we use any of them for any task?
  • Is the field mature enough to talk about benchmarking?
  • What new applications exist for the IR-based approaches?
  • What are the major practical problems with the current state of the art: efficiency, scalability, recall and precision, etc.?
  • Are there specific problems associated with different IR methods?
  • Who among the current researchers can collaborate on future projects?
  • Is there available software produced by any research group? Can we initiate and maintain an open source effort in the area? What about shared data for replication of case studies?
  • How can we best integrate IR methods with other techniques for the analysis of natural language? What are the trade-offs?
  • How can we bridge the work of the software maintenance community and other groups from areas like requirements engineering, programming languages, etc.?
  • Is there a need for future organized meetings like this working session?

We invite you to submit a position paper/presentation on one or more of these topics and to participate in the working session to discuss and propose ideas and solutions.

WORKING SESSION ORGANIZATION

The working session will last 90 minutes and will consist of three parts. It will start with short interactive presentations given by some of the participants, which will be solicited in advance and selected by the organizers. These presentations will focus on existing approaches and techniques. Following these presentations, all participants will take part in an open brainstorming session, which will focus on identifying open issues in the field, new challenges, etc. Questions will be asked and answers provided by the participants. The final part will be devoted to recapitulating the unanswered items from the previous two parts and to building a roadmap for future events, research, and collaborations among the participants.

SUBMISSION OF POSITION STATEMENTS

We expect to have a highly interactive session with as many presenters as possible. To better organize the session, we invite all those who wish to present something specific at the working session to submit a position paper not exceeding 3 pages in length.

The submission deadline is September 4, 2006.

Authors are requested to submit a position paper (IEEE proceedings format preferred, in PDF) by email to the working session organizers at:
amarcus at wayne dot edu

SCHEDULE AND POSITION STATEMENTS

Presentation slides and discussion notes will be posted after the conference.
Click on the title for the position paper. Presenter's name in bold.