THEME
This working session will focus on the state of the art in the application
of IR-based techniques to support software maintenance activities.
The session aims to identify the main research and practical issues
in the field, to determine future work directions, and to foster collaborations
among the participants.
Software comprises a multitude of artifacts; some of them
are intended to be read by the compiler, while many others are intended
to be read by developers. This is especially true during software
evolution, when developers have to deal with large software systems, often
written by others.
The user-centric information is often expressed in natural language
and is embedded in documentation and source code. This information
is very important for the developers to understand a great deal of
the why and what of the software system, as much
as the source code is useful to understand the how of the
software. Natural language external documentation (e.g., requirements,
design documents, user manual, etc.), comments, and identifiers in
the source code encode to a large degree the domain of the software
and capture design decisions, change requests, developer information,
etc. This unstructured information is referred to as semantic,
as opposed to structural information, which is expressed mainly by the
source code and other data-intensive artifacts, such as analysis information.
It is referred to as such because it encodes part of the domain semantics
of the software.
The single developer/maintainer development model did not require capturing
much of this information, as the working and long-term memory of the
developer often sufficed to store it. Today, the increasing
size and complexity of software require large development groups, often
distributed geographically, so storing and sharing the semantic information
is essential. Moreover, given the large amount of it,
tools are necessary for its storage, retrieval, and analysis before
it is delivered to the users.
STATE OF THE ART
In the past decade, researchers have proposed information retrieval (IR)
models to address problems related to the semantic information
in existing software. Early models were used to construct software
libraries, and more recent work has focused on specific software maintenance
or development tasks such as:
- Traceability link recovery
- Concept location
- Software and web site modularization
- Reverse engineering
- Software reuse
- Impact analysis
- Quality assessment and software measurement
These IR-based approaches to software engineering problems differ
not only in their scope, but also in their underlying indexing mechanism,
corpus construction, or data analysis method.
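To make these three dimensions concrete, the following is a minimal sketch, not any particular group's method. It assumes a vector space model with TF-IDF weighting and cosine similarity, applied to a traceability-style query; other approaches use different indexing, such as latent semantic indexing or probabilistic models. The file names, sample texts, and function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Corpus construction: split text into lowercase words, breaking camelCase identifiers."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", text)
    return [t.lower() for t in re.findall(r"[A-Za-z]+", spaced)]


def idf(term, doc_freq, n_docs):
    """Smoothed inverse document frequency; always positive."""
    return math.log((1 + n_docs) / (1 + doc_freq.get(term, 0))) + 1.0


def vectorize(text, doc_freq, n_docs):
    """Indexing: map a text to a sparse TF-IDF vector (term -> weight)."""
    return {t: tf * idf(t, doc_freq, n_docs) for t, tf in Counter(tokenize(text)).items()}


def cosine(a, b):
    """Data analysis: cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query, corpus):
    """Return (document name, similarity) pairs, most similar first."""
    doc_freq = Counter(t for text in corpus.values() for t in set(tokenize(text)))
    n_docs = len(corpus)
    q = vectorize(query, doc_freq, n_docs)
    scores = [(name, cosine(q, vectorize(text, doc_freq, n_docs)))
              for name, text in corpus.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical source artifacts and a hypothetical requirement used as the query.
corpus = {
    "AccountManager.java": "class AccountManager { void openAccount(Customer customer) {} "
                           "// open a new customer account }",
    "ReportPrinter.java": "class ReportPrinter { void printMonthlyReport() {} "
                          "// print the monthly report }",
}
query = "The customer shall be able to open an account"

for name, score in rank(query, corpus):
    print(f"{name}: {score:.3f}")
```

In this sketch, changing tokenize alters the corpus construction, changing vectorize alters the indexing mechanism, and changing cosine or rank alters the data analysis method, which are the axes along which the approaches above differ.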
GOALS
The working session has several complementary goals. First, it aims
at clearly defining the state of the art in the field. As the field
grows, researchers and practitioners need to agree on a common terminology,
as the current work by different groups is somewhat incoherent. We
need to assess how far this field has come to date and how far it can
go in the future.
In addition, we want to identify which issues are already answered
by research and ready for practical applications and which are still
open or unaddressed. Several questions will be directly addressed
during the working session and many more will be raised on the spot:
- Can we define a general model for the application of IR methods
in software evolution?
- Do certain IR methods suit specific software maintenance problems,
or can we use any of them for any task?
- Is the field mature enough to talk about benchmarking?
- What new applications exist for the IR-based approaches?
- What are the major practical problems with the current state of
the art: efficiency, scalability, recall and precision, etc.?
- Are there specific problems associated with different IR methods?
- Who among the current researchers can collaborate on future projects?
- Is there available software produced by any research group? Can
we initiate and maintain an open source effort in the area? What
about shared data for replication of case studies?
- How can we best integrate IR methods with other techniques for
the analysis of natural language? What is the trade-off?
- How can we bridge the work of the software maintenance community
and other groups from areas like requirements engineering, programming
languages, etc.?
- Is there a need for future organized meetings like this working
session?
We invite you to submit
a position paper/presentation on one or more of these topics and to
participate in the working session to discuss and propose ideas and
solutions.
WORKING SESSION ORGANIZATION
The working session will last 90 minutes and will consist of three
parts. It will start with short interactive presentations given by
some of the participants, which will be solicited in advance and selected
by the organizers. These presentations will focus on existing approaches
and techniques. Following these presentations, all the participants
will take part in an open brainstorming session, which will focus
on identifying open issues in the field, new challenges, etc. Questions
will be asked and answers provided by the participants. The final
part will be devoted to recapitulating the unanswered
items from the previous two parts and to building a roadmap for future
events, research, and collaborations among the participants.
SUBMISSION OF POSITION STATEMENTS
We expect to have a highly interactive session with as many presenters
as possible. In order to better organize the session, we invite all
those who wish to present something specific at the working session to submit
a position paper, not exceeding 3 pages in length.
The submission deadline is September 4, 2006.
Authors are requested to submit a position paper (IEEE proceedings
format, preferably in PDF) by e-mail to the working session organizers at:
amarcus at wayne dot edu
SCHEDULE AND POSITION STATEMENTS
Presentation slides and discussion notes will be posted after the conference.
Click on the title for the position paper. The presenter's name is in bold.