CSC6710 Final project

 

Description

Scientific workflows have recently emerged as a new paradigm for scientists to formalize and structure complex scientific processes to enable and accelerate many significant scientific discoveries. On one hand, scientific workflows automate many otherwise time-consuming manual scientific activities (scientific data access, integration, transformation, and movement); on the other hand, scientific workflows support the reproducibility of scientific results via tracking and archiving provenance – data that captures the derivation history of a data product, including the original data sources, intermediate data products, and the steps that were applied to produce the data product. Efficient storage and querying of provenance data is essential for the success of scientific workflow technology [1].

 

In this project, you are expected to use a relational database (Oracle) to store and query provenance data. Provenance will be in OPM format [2]. You can use existing provenance datasets from the Provenance Challenge Series [3] or your own simulated datasets to populate your database. You are required to implement a user-friendly provenance database system such that a user can insert multiple provenance datasets into the database and perform provenance queries. In this project, you are expected to implement 15 non-trivial provenance queries (see the Provenance Challenge Series [3] for provenance query examples). Your system should be developed in Java.
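
To make the idea of a provenance query concrete, the following is a minimal in-memory sketch in Java of a lineage ("derivation history") query over OPM-style causal edges, where each edge points from an effect to one of its causes. The class name LineageSketch, its methods, and the sample node names are assumptions made for illustration only; they are not part of OPM, the Provenance Challenge datasets, or the required design, and a real solution would store the edges in the database rather than in memory.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: an OPM-style causal graph kept in memory.
final class LineageSketch {
    // Maps each effect node to the list of its direct causes.
    private final Map<String, List<String>> causes = new HashMap<>();

    // Records one causal edge (effect depends on cause), e.g. a
    // "used" or "wasGeneratedBy" edge.
    void addEdge(String effect, String cause) {
        causes.computeIfAbsent(effect, k -> new ArrayList<>()).add(cause);
    }

    // Returns every direct or indirect cause of the given node,
    // in breadth-first discovery order. Cycles are tolerated because
    // each node is enqueued at most once.
    Set<String> ancestors(String start) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(start);
        while (!queue.isEmpty()) {
            String node = queue.remove();
            for (String cause : causes.getOrDefault(node, List.of())) {
                if (seen.add(cause)) {
                    queue.add(cause);
                }
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        // Made-up sample data, not taken from any real dataset.
        LineageSketch g = new LineageSketch();
        g.addEdge("atlas.gif", "convert");   // artifact wasGeneratedBy process
        g.addEdge("convert", "atlas.img");   // process used artifact
        g.addEdge("atlas.img", "softmean");  // artifact wasGeneratedBy process
        g.addEdge("softmean", "scan1.img");  // process used artifact
        // prints [convert, atlas.img, softmean, scan1.img]
        System.out.println(g.ancestors("atlas.gif"));
    }
}
```

A query of this shape ("what did this data product depend on?") is one typical kind of provenance query; in the actual system the traversal would be expressed in SQL against the stored edges.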

 

Part 1:

Draw an ER diagram to model OPM-compliant provenance information, and then translate the ER diagram into a relational schema. Implement a simple database system that supports the insertion of one provenance dataset and one provenance query. In your project report for part 1, label your queries Q1, Q2, …, and explain what each query is supposed to return and how it is implemented in SQL.
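
As a rough illustration of what the step from a relational schema to a first query might look like, the sketch below defines one possible two-table layout and one recursive lineage query, executed through standard JDBC. Everything named here is an assumption and not a prescribed design: the tables opm_node and opm_edge, their columns, the class OpmSchemaSketch, and the choice of Oracle's CONNECT BY syntax for the traversal. Your own ER diagram may well lead to a different schema (for example, one table per OPM edge type). Running it requires an open connection to an Oracle database, which is not shown.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: one possible relational layout for OPM graphs.
final class OpmSchemaSketch {
    // One row per OPM node (artifact, process, or agent) in a dataset.
    static final String CREATE_NODE =
        "CREATE TABLE opm_node ("
      + " graph_id VARCHAR2(64) NOT NULL,"
      + " node_id VARCHAR2(64) NOT NULL,"
      + " node_type VARCHAR2(10) NOT NULL"
      + "   CHECK (node_type IN ('ARTIFACT', 'PROCESS', 'AGENT')),"
      + " label VARCHAR2(255),"
      + " PRIMARY KEY (graph_id, node_id))";

    // One row per causal edge, pointing from effect to cause.
    // edge_type would hold values such as 'used' or 'wasGeneratedBy'.
    static final String CREATE_EDGE =
        "CREATE TABLE opm_edge ("
      + " graph_id VARCHAR2(64) NOT NULL,"
      + " edge_type VARCHAR2(20) NOT NULL,"
      + " effect_id VARCHAR2(64) NOT NULL,"
      + " cause_id VARCHAR2(64) NOT NULL,"
      + " role_name VARCHAR2(64),"
      + " FOREIGN KEY (graph_id, effect_id)"
      + "   REFERENCES opm_node (graph_id, node_id),"
      + " FOREIGN KEY (graph_id, cause_id)"
      + "   REFERENCES opm_node (graph_id, node_id))";

    // Example Q1: all direct and indirect causes of one node in one graph.
    static final String Q1_ANCESTORS =
        "SELECT DISTINCT cause_id FROM opm_edge"
      + " START WITH graph_id = ? AND effect_id = ?"
      + " CONNECT BY NOCYCLE PRIOR cause_id = effect_id"
      + " AND PRIOR graph_id = graph_id";

    // Creates both tables; assumes they do not exist yet.
    static void createSchema(Connection conn) throws SQLException {
        try (Statement st = conn.createStatement()) {
            st.executeUpdate(CREATE_NODE);
            st.executeUpdate(CREATE_EDGE);
        }
    }

    // Runs Q1 with bound parameters and collects the cause ids.
    static List<String> ancestors(Connection conn, String graphId, String nodeId)
            throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(Q1_ANCESTORS)) {
            ps.setString(1, graphId);
            ps.setString(2, nodeId);
            List<String> result = new ArrayList<>();
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    result.add(rs.getString(1));
                }
            }
            return result;
        }
    }
}
```

Keeping a graph_id column on every row is one simple way to let several provenance datasets share the same tables, and binding values through PreparedStatement avoids building SQL from user input.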

 

Part 2:

Complete the project as described above: support user-friendly storage and querying of provenance, with at least 15 non-trivial provenance queries. In your final project report, label your queries Q1, Q2, …, and explain what each query is supposed to return and how it is implemented in SQL.

 

Submission

Send a zip file containing all source code and necessary files to the TA via the Digital Dropbox in Blackboard. The zip file must include a project report named “report.doc” that describes the developed system in detail, including pseudocode for all algorithms and a detailed description and implementation of each query. The zip file should also include “read.txt”, which explains how to compile and run your system and gives any information the TA needs in order to grade, such as the names of your teammates. Name the zip file after yourself; for example, for part 1, if your name is “David Smith”, the file should be named “david_smith_part1.zip”. The name (subject) of your submission in the Digital Dropbox should be “Project_part1_WSU access ID”, for example: Project_part1_aq1111.