Project 2

 

Goal

The goal of this project is to develop a hierarchical scientific workflow using the Taverna system to explore the notion of “provenance”.

 

Description

Provenance management is essential for scientific workflows: it supports scientific discovery, reproducibility, result interpretation, and problem diagnosis. Such a facility is usually not necessary for business workflows. Provenance metadata captures the derivation history of a data product, including the original data sources, the intermediate data products, and the steps that were applied to produce the data product. The provenance management problem concerns the efficiency and effectiveness of the collection, representation, storage, querying, and visualization of provenance metadata.

 

In this project, you will use the Taverna system to extend the scientific workflow you implemented for project 1 so that it meets the additional requirements below. Alternatively, you may design and implement a completely new scientific workflow. In either case, the workflow must satisfy the following requirements:

1)      The workflow must contain at least 10 tasks and must be hierarchical (i.e., it must contain composite tasks).

2)      At least two of the workflow tasks must be transactions and at least two of the workflow tasks must be Web services.

It must be possible to run your workflow multiple times and then export a provenance file called provenance.txt that contains the provenance collected from all workflow runs. Ask the TA for additional papers if you need to learn more about provenance.

The format of provenance.txt is not fixed, but you must define your format clearly so that other people can understand it. RDF or XML is preferred but not mandatory.
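
As an illustration only, the following Python sketch shows one possible XML layout for provenance.txt, with one record per workflow run listing the original data sources, the steps applied with their intermediate data products, and the final data product. The element names (provenance, run, source, step, intermediate, product), the helper functions, and the sample task and file names are all hypothetical. This is not Taverna's own export format, and your format may differ.

```python
import xml.etree.ElementTree as ET


def run_to_xml(run_id, sources, steps, product):
    """Build one <run> element describing how `product` was derived.

    `steps` is a list of (task_name, intermediate_product) pairs in the
    order the tasks were applied. All element names are hypothetical.
    """
    run = ET.Element("run", id=run_id)
    for src in sources:
        ET.SubElement(run, "source").text = src
    for order, (task, output) in enumerate(steps, start=1):
        step = ET.SubElement(run, "step", order=str(order), task=task)
        ET.SubElement(step, "intermediate").text = output
    ET.SubElement(run, "product").text = product
    return run


def write_provenance(runs, path="provenance.txt"):
    """Write all run records under a single <provenance> root element."""
    root = ET.Element("provenance")
    root.extend(runs)
    ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)


# Two made-up runs of the same workflow, collected into one file.
runs = [
    run_to_xml("run-1", ["input.csv"],
               [("clean", "clean.csv"), ("lookup_service", "reply.xml")],
               "result.csv"),
    run_to_xml("run-2", ["input2.csv"],
               [("clean", "clean2.csv"), ("lookup_service", "reply2.xml")],
               "result2.csv"),
]
write_provenance(runs)
```

Keeping every run under a single root element means the file stays well-formed XML no matter how many runs are collected, and a reader can find the derivation history of any product by looking at the run that contains it.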

 

 

Submission

 

Send a zip file to the TA via the Digital Dropbox in Blackboard containing all source code and necessary files, including provenance.txt. The zip file should include “read.txt”, which explains how to compile and run your system and gives any information the TA needs in order to grade it, such as the names of your teammates. Name the zip file after yourself: for project x, if your name is “David Smith”, the file should be named “david_smith_projectx.zip”. The name (subject) of your submission in the Digital Dropbox should be “Projectx_WSU access ID”, for example: Projectx_aq1111.