CSC6710 Final project
Scientific workflows have recently emerged as a new paradigm for
scientists to formalize and structure complex scientific processes to enable
and accelerate many significant scientific discoveries. On one hand, scientific
workflows automate many otherwise time-consuming manual scientific activities
(scientific data access, integration, transformation, and movement); on the other
hand, scientific workflows support the reproducibility of scientific results
via tracking and archiving provenance – data that captures the derivation
history of a data product, including the original data sources, intermediate
data products, and the steps that were applied to produce the data product. Efficient
storage and querying of provenance data are essential for the success of
scientific workflow technology [1].
In this project, you are expected to use a relational database (Oracle)
to store and query provenance data. Provenance will be in the Open Provenance
Model (OPM) format [2]. You can populate your database with existing provenance
datasets from the Provenance Challenge Series [3] or with your own
simulated datasets. You are required to implement a user-friendly
provenance database system in which a user can insert multiple provenance
datasets into the database and run provenance queries against them. You
are expected to implement at least 15 non-trivial provenance queries (see the
Provenance Challenge Series [3] for example provenance queries). Your system
should be developed in Java.
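To make "provenance query" concrete, the following is a minimal in-memory sketch of a lineage query: given a data product, collect everything it was derived from. It is not the required Oracle-backed implementation; the class LineageSketch, its methods, and the node names are all hypothetical, and in the actual system the same traversal would be expressed in SQL over your own schema.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** In-memory stand-in for an OPM graph; every name here is hypothetical. */
class LineageSketch {
    // Maps each node (the effect) to the nodes it directly depends on (its causes).
    private final Map<String, List<String>> causes = new HashMap<>();

    /** Records one dependency edge, e.g. "artifact wasGeneratedBy process". */
    void addEdge(String effect, String cause) {
        causes.computeIfAbsent(effect, k -> new ArrayList<>()).add(cause);
    }

    /** Returns every node reachable from start by following edges toward causes. */
    Set<String> lineageOf(String start) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> frontier = new ArrayDeque<>();
        frontier.push(start);
        while (!frontier.isEmpty()) {
            String node = frontier.pop();
            List<String> direct = causes.getOrDefault(node, Collections.<String>emptyList());
            for (String cause : direct) {
                // Set.add returns false for nodes already visited, so each node is expanded once.
                if (seen.add(cause)) {
                    frontier.push(cause);
                }
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        // Made-up example data: two processing steps between a raw file and a result.
        LineageSketch g = new LineageSketch();
        g.addEdge("result.csv", "analyze");   // wasGeneratedBy
        g.addEdge("analyze", "clean.csv");    // used
        g.addEdge("clean.csv", "cleanStep");  // wasGeneratedBy
        g.addEdge("cleanStep", "raw.csv");    // used
        System.out.println(g.lineageOf("result.csv"));
    }
}
```

The traversal is a plain graph search over "effect depends on cause" edges, which is the shape shared by the OPM dependencies used, wasGeneratedBy, wasTriggeredBy, wasDerivedFrom, and wasControlledBy.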
Part 1:
Draw an ER diagram to model OPM-compliant provenance information, and
then translate the ER diagram into a relational schema. Implement a simple
database system that supports the insertion of one provenance dataset and one
provenance query. In your project report for Part 1, label your
queries Q1, Q2, etc., and explain what each
query is supposed to return and how it is implemented in SQL.
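As one possible starting point for the ER-to-relational translation, the sketch below holds a small schema and an example Q1 as SQL strings in Java. The table names (opm_graph, opm_node, opm_edge), the column names, and the decision to keep all node types in one table are assumptions made for illustration, not a prescribed design; your own ER diagram may well lead to a different schema.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical starting schema; all table and column names are assumptions. */
class OpmSchemaSketch {

    /** DDL statements in dependency order (referenced tables come first). */
    static List<String> ddl() {
        return Arrays.asList(
            // One row per inserted provenance dataset.
            "CREATE TABLE opm_graph ("
                + " graph_id NUMBER PRIMARY KEY,"
                + " graph_name VARCHAR2(200) NOT NULL)",
            // OPM nodes: artifacts, processes, and agents share one table here.
            "CREATE TABLE opm_node ("
                + " node_id NUMBER PRIMARY KEY,"
                + " graph_id NUMBER NOT NULL REFERENCES opm_graph(graph_id),"
                + " node_type VARCHAR2(10) NOT NULL"
                + " CHECK (node_type IN ('ARTIFACT', 'PROCESS', 'AGENT')),"
                + " node_label VARCHAR2(400))",
            // OPM dependencies: each edge points from an effect to its cause.
            "CREATE TABLE opm_edge ("
                + " edge_id NUMBER PRIMARY KEY,"
                + " graph_id NUMBER NOT NULL REFERENCES opm_graph(graph_id),"
                + " edge_type VARCHAR2(20) NOT NULL"
                + " CHECK (edge_type IN ('USED', 'WAS_GENERATED_BY',"
                + " 'WAS_CONTROLLED_BY', 'WAS_TRIGGERED_BY', 'WAS_DERIVED_FROM')),"
                + " effect_id NUMBER NOT NULL REFERENCES opm_node(node_id),"
                + " cause_id NUMBER NOT NULL REFERENCES opm_node(node_id))"
        );
    }

    /** Example Q1: labels of the artifacts that a given process used. */
    static String q1ArtifactsUsedByProcess() {
        // The '?' is a JDBC PreparedStatement placeholder for the process node_id.
        return "SELECT a.node_label FROM opm_edge e"
            + " JOIN opm_node a ON a.node_id = e.cause_id"
            + " WHERE e.edge_type = 'USED' AND e.effect_id = ?";
    }

    public static void main(String[] args) {
        for (String statement : ddl()) {
            System.out.println(statement + ";");
        }
        System.out.println(q1ArtifactsUsedByProcess() + ";");
    }
}
```

In the report, each query would be described in the same way as the comment on Q1: what it returns, followed by its SQL text.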
Part 2:
Complete the project as described above: support
user-friendly storage and querying of provenance, with at least 15 non-trivial
provenance queries. In your final project report, label your queries
Q1, Q2, etc., and explain what each query is
supposed to return and how it is implemented in SQL.
Submission
Send a zip file to the TA via Digital Dropbox in Blackboard containing all source code and
necessary files. Include a project report file called “report.doc” that
describes the developed system in
detail, including pseudocode for all algorithms and
a detailed description and the implementation of each query. The zip file should
also include “read.txt”, which explains how to
compile and run your system and gives any information the TA needs in
order to grade, such as the names of your teammates. Name the zip file after your
own name. For example, for Part 1, if your name is “David Smith”, your file
should be named “david_smith_part1.zip”. The name (subject)
of your submission in Digital Dropbox should be
“Project_part1_” followed by your WSU access ID, for example: Project_part1_aq1111.
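The naming convention can be checked mechanically. The helper below is only a sketch: the class SubmissionNames and its methods are hypothetical, and it assumes that Part 2 follows the same pattern as the Part 1 examples, which the handout does not state explicitly.

```java
import java.util.Locale;

/** Hypothetical helper that builds the submission names described above. */
class SubmissionNames {

    /** Lower-cases the full name and joins its words with underscores. */
    static String zipName(String fullName, int part) {
        String slug = fullName.trim().toLowerCase(Locale.ROOT).replaceAll("\\s+", "_");
        return slug + "_part" + part + ".zip";
    }

    /** Builds the Digital Dropbox subject line from a WSU access ID. */
    static String subject(String accessId, int part) {
        return "Project_part" + part + "_" + accessId;
    }

    public static void main(String[] args) {
        System.out.println(zipName("David Smith", 1)); // david_smith_part1.zip
        System.out.println(subject("aq1111", 1));      // Project_part1_aq1111
    }
}
```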