THEME
This working session will focus on the state of the art in the application of information retrieval (IR) techniques to support software maintenance activities. The session aims to identify the main research and practical issues in the field, to determine future work directions, and to foster collaborations among the participants.
Software comprises a multitude of artifacts; some of them are intended to be read by the compiler, while many others are intended to be read by developers. The latter are especially important during software evolution, when developers have to deal with large software systems, often written by others.
The user-centric information is often expressed in natural language and is embedded in documentation and source code. This information helps developers understand much of the why and what of the software system, just as the source code helps them understand the how. Natural language external documentation (e.g., requirements, design documents, user manuals, etc.), comments, and identifiers in the source code encode to a large degree the domain of the software and capture design decisions, change requests, developer information, etc. This unstructured information is referred to as semantic, as opposed to structural information, which is expressed mainly by the source code and other data-intensive artifacts, such as analysis information.
Under the single developer/maintainer development model, there was little need to capture this information, as the working and long-term memory of the developer often sufficed to hold it. Today, the increasing size and complexity of software require large development groups, often geographically distributed, so storing and sharing the semantic information is essential. Moreover, given the large amount of it, tools are necessary for its storage, retrieval, and analysis before it is delivered to the users.
STATE OF THE ART
In the past decade, researchers have proposed information retrieval (IR) models to
address the problems of managing the semantic information in existing
software. Early models were used to construct software libraries, and more
recent work has focused on specific software maintenance or development tasks such
as:
• Traceability link recovery
• Concept location
• Software and web site modularization
• Reverse engineering
• Software reuse
• Impact analysis
• Quality assessment and software measurement
These IR-based approaches to software engineering problems differ not only in their scope, but also in their underlying indexing mechanism, corpus construction, and data analysis method.
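To make these terms concrete, the following is a minimal sketch of one possible IR pipeline applied to concept location. It is an illustration under stated assumptions, not the method of any particular research group: the corpus is three hypothetical functions (one document per function, built from its identifier and comment), indexing uses a simple TF-IDF vector space model, and analysis is cosine similarity against a natural language query. All names and data are invented for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase terms; camelCase and snake_case identifiers are split."""
    # Break lower-to-upper boundaries: "saveUser" -> "save User".
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", text)
    return [t.lower() for t in re.findall(r"[A-Za-z]+", spaced)]


def build_index(corpus):
    """Corpus construction and indexing: one TF-IDF vector per document."""
    counts = {name: Counter(tokenize(text)) for name, text in corpus.items()}
    doc_freq = Counter()
    for c in counts.values():
        doc_freq.update(c.keys())
    n_docs = len(corpus)
    # Smoothed IDF so terms occurring in every document keep a small weight.
    idf = {t: math.log(n_docs / df) + 1.0 for t, df in doc_freq.items()}
    vectors = {
        name: {t: tf * idf[t] for t, tf in c.items()}
        for name, c in counts.items()
    }
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query, vectors, idf):
    """Data analysis: order documents by similarity to the query."""
    q_counts = Counter(tokenize(query))
    # Query terms unseen in the corpus carry no information and are dropped.
    q_vec = {t: tf * idf[t] for t, tf in q_counts.items() if t in idf}
    scored = [(cosine(q_vec, vec), name) for name, vec in vectors.items()]
    return [name for _, name in sorted(scored, key=lambda s: (-s[0], s[1]))]


# Hypothetical corpus: identifier plus comment for each function.
corpus = {
    "saveUserRecord": "saveUserRecord: write the user record to the database",
    "parseConfigFile": "parseConfigFile: read settings from the configuration file",
    "renderLoginPage": "renderLoginPage: draw the login form for the user",
}

vectors, idf = build_index(corpus)
ranking = rank("store user in database", vectors, idf)
print(ranking)
```

Changing any one of the three stages (for example, splitting identifiers differently, indexing with a latent semantic model instead of TF-IDF, or clustering the vectors rather than ranking them) gives a different approach from the family described above.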
GOALS
The working session has several complementary goals. First, it aims at clearly defining the state of the art in the field, briefly described above. As the field grows, researchers and practitioners need to agree on a common terminology, as the current work by different groups is somewhat incoherent. We need to assess how far this field has come to date and how far it can go in the future.
In addition, we want to identify which issues are already answered by research and ready for practical applications and which are still open or unaddressed. Several questions will be directly addressed during the working session and many more will be raised on the spot:
• How can we refine and improve the general model, presented above? Does the
model suit all current and future applications?
• Do certain IR methods suit specific software maintenance problems, or can we use any of them for any task?
• Is the field mature enough to talk about benchmarking?
• What new applications exist for the IR-based approaches?
• What are the major practical problems with the current state of the art:
efficiency, scalability, recall and precision, etc.? Are there specific
problems associated with different IR methods?
• Who among the current researchers can collaborate on future projects?
• Is there available software produced by any research group? Can we initiate and maintain an open source effort in the area?
• How can we best integrate IR methods with other techniques for the analysis of unstructured information (e.g., natural language processing)? What is the trade-off?
• How can we bridge the work of the software maintenance community and that of groups from other areas, such as requirements engineering, programming languages, etc.?
• Is there a need for future organized meetings like this working session?
We invite you to submit a position
paper about one or more of the topics mentioned above and to participate in the
working session to discuss and propose ideas and solutions.
WORKING SESSION ORGANIZATION
The working session will last 90 minutes and will consist of three parts. It will start with short interactive presentations given by some of the participants, which will be solicited in advance and selected by the organizers. These presentations will focus on existing approaches and techniques. Following these presentations, all the participants will take part in an open brainstorming session, which will focus on identifying open issues in the field, new challenges, etc. Questions will be asked and answers provided by the participants. The final part will be devoted to recapitulating the unanswered items from the previous two parts and to building a roadmap for future events, research, and collaborations among the participants.
SUBMISSION OF POSITION STATEMENTS
Interested participants are expected to submit informal abstracts or position
papers, not exceeding 3 pages in length, and then to give a short presentation
(10 min) at the working session.
The submission deadline is September 1, 2006. The official language is
English. Authors are requested to submit a PDF version of their position papers
(IEEE proceedings format) by e-mail to the working session organizers.
Please send your submissions by e-mail to the organizers at:
ir_icsm06@yahoogroups.com