CSC 7991 Introduction to Data Mining
Course #: CSC 7991
Prerequisite: graduate status in a biological science or computer science, or approval of the instructor.
Day: TTh
Room: 306 State Hall
Hours: 3:00 p.m. – 4:20 p.m.
Instructor: Sorin Draghici
Office: 408 State Hall
Office hours: Tue 6:00 p.m. – 7:00 p.m., or by appointment.
Telephone: 577-5484
Email: sod@cs.wayne.edu
Web page: http://www.cs.wayne.edu/~sod
The syllabus and any course announcements are posted on this web page.
The course focuses on the analysis of microarray data. Its goal is to present the main data mining techniques available in a way that is useful to the biological scientist. The central figure in the intended audience is the researcher or practitioner with a background in the biological sciences who needs to use computational tools to analyze data. At the same time, the course is intended for computer scientists who would like to use their background to solve problems at the border with biology and medicine. The course explains the specific challenges that such problems pose, as well as the adaptations that classical algorithms need to undergo in order to provide good results in this particular field.
Final exam: 25 April
Revision for final exam: 18 April
Midterm exam: 7 March
Elementary calculus and basic algebra.
Introduction 

Elements of statistics 
Measures of central tendency, measures of variability, the normal distribution, some statistical tests (t-test, Mann-Whitney, etc.).
Data preparation 
Preprocessing, flip-dye experiments, background correction, etc.
Normalization 
Divide/subtract mean, replicates, thresholding, ratios, log transform. 
Data analysis and Data mining 
Why, what, how. 
Basic tools 
Histograms, scatterplots, time series 
Selection of differentially regulated genes 
Fold change, unusual ratio, maximum likelihood, confidence analysis, SAM. 
Exploratory analysis 
PCA, similarity measures 
Midterm 

Clustering 
k-means, hierarchical, top-down, bottom-up, SOFM, how and when to do what.
Advanced unsupervised tools 
Cluster confidence and significance analysis 
Supervised learning 
Issues in supervised learning: training and validation, curse of dimensionality.
Supervised techniques 
Neural networks, gene shaving 
Other techniques 
Bayesian techniques, etc. 
Other techniques & Revision 

Attendance: Attending all lectures is essential; the assignments, exams, quizzes, etc. will be based primarily (though not exclusively) on the material presented in these lectures. Assignment due dates, explanations and clarifications will also be presented during lecture and lab sessions. If you miss a lecture or lab session, it is your responsibility to obtain the information covered in that session.
Health Safety: Please report to the instructor any health condition which may create a classroom emergency (e.g. seizure disorders, diabetes, heart conditions, etc.).
Computer lab: The computer lab, equipped with PCs, is available to you for study and homework during the hours posted on the lab’s door.
If you have a PC and appropriate software at home, you are encouraged to work at home. However, it is your responsibility to make sure that your homework is fully compatible with the equipment in the undergraduate lab and to transfer your homework onto the equipment in the lab so that it is available for assessment on the due date.
Assignments, quizzes, examinations and final project: There will be a number of assignments, each due at the beginning of the class session on the due date. Late submissions (but not later than one week) will carry a deduction of 10% of the marks for each day they are late. If you must, you can turn in late homework to the secretary in the Department of Computer Science main office (431 State Hall, open weekdays from 9am to 5pm). No assignment will be accepted more than 9 calendar days past its due date. Since each assignment is an integral part of the course, the instructor reserves the right to give a failing grade to anyone who turns in 50% or less of the homework.
There will be a number of unannounced quizzes during the regular lecture hours. The examinations will be closed book, closed notes and closed neighbors.
Since the two exams cover different parts of the course material, you must pass both exams in order to pass the course. If you suspect that you will be unable to attend an exam because of a valid and verifiable excuse, you must give me prior notice, at least one full day before the exam. There will be NO makeup examinations.
Be aware that this course, like any other course, requires a certain amount of work. In this course specifically, some of the work has to be done on a computer. Simply attending the lectures is not sufficient to obtain a passing grade.
Final grade: Each homework/exam/quiz/lab/term project is worth 100 points.
The final grade will be calculated as follows:
Average of homework: 10%
Quizzes: 10%
Project: 30%
Midterm exam: 25%
Final: 25%
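As an illustration only, the weighting above can be written as a weighted average. The sketch below uses the weights from this syllabus; the function name, the dictionary keys and the sample scores are hypothetical, and it assumes each component has already been reduced to a single score out of 100. It does not model the separate requirement to pass both exams.

```python
# Weights taken from the list above; key names are illustrative.
WEIGHTS = {
    "homework": 0.10,  # average of all homework scores
    "quizzes": 0.10,
    "project": 0.30,
    "midterm": 0.25,
    "final": 0.25,
}


def final_percentage(scores):
    """Return the weighted final percentage.

    scores maps each component name in WEIGHTS to a score out of 100.
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical student: 9 + 8 + 25.5 + 17.5 + 18.75 = 78.75
example = {"homework": 90, "quizzes": 80, "project": 85,
           "midterm": 70, "final": 75}
print(round(final_percentage(example), 2))
```

Because the project carries 30%, it moves the final percentage more than either exam alone.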
The homework might involve collecting and reading research papers related to the topic, writing short essays, providing feedback on the lecture notes, etc.
The project will involve analyzing a real-world dataset. You are encouraged to use your own data. If you do not currently work with microarrays, you will be able to choose a data set made available by the instructor. The report for the project will be written in the form of a research paper. The submission of the report for publication is strongly encouraged but not compulsory. At the end of the semester, you will give a 15-20 min presentation of the project work.
The final letter grade will be determined approximately as follows:
A: 95-100%
A-: 90-94.99%
B+: 85-89.99%
B: 80-84.99%
B-: 75-79.99%
C+: 70-74.99%
C: 66-69.99%
C-: 62-65.99%
D+: 58-61.99%
D: 54-57.99%
D-: 50-53.99%
E: less than 50%
A grade of Incomplete (I) will not be given except in very exceptional circumstances.
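For illustration, the letter-grade cutoffs above can be read as a simple lookup from a final percentage. This is a sketch, not the official grading procedure: the syllabus says the mapping is approximate, the function and variable names are hypothetical, and the code assumes the second grade in each repeated pair is the minus grade (A-, B-, C-, D-).

```python
# (lower bound in percent, letter), highest first.
# Minus grades are an assumption about the intended scale.
CUTOFFS = [
    (95, "A"), (90, "A-"),
    (85, "B+"), (80, "B"), (75, "B-"),
    (70, "C+"), (66, "C"), (62, "C-"),
    (58, "D+"), (54, "D"), (50, "D-"),
]


def letter_grade(percentage):
    """Return the letter for a final percentage between 0 and 100."""
    for lower_bound, letter in CUTOFFS:
        if percentage >= lower_bound:
            return letter
    return "E"  # less than 50%


print(letter_grade(78.75))
```

Listing the cutoffs from highest to lowest lets the first matching lower bound decide the grade, so no upper bounds are needed.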
Student Responsibilities and Academic Honesty: We expect you, as a college student committed to seeking a higher education, to be a very responsible person. At the least, please:
· Do your best to understand the material covered in the class and ask questions when you do not understand.
· Be aware of the homework assignments, deadlines and late assignment policy.
· Turn in your assignments in neat, readable and easily accessible form.
· Obtain notes and handouts from your classmates if you miss a class for unavoidable circumstances.
Also, we expect all of you to maintain the highest level of academic honesty. We expect each of you to do your work (assignments, lab exercises, quizzes, exams) yourself, and we strongly encourage you to discuss with the instructor any problems you might have with the course work. Remember, you are here to gather more knowledge and become a more educated person, not to collect grades.
In fairness to all, if we find two or more assignments which appear to be copied from each other, we will split the points evenly among all those involved (no matter who copied from whom). Repeated incidents will be met with severe disciplinary action, including expulsion from the CS program.
Please behave decently in the classroom. If you have any questions or problems regarding the topic being discussed, feel free to ask your instructor at any time. Don’t be shy: no question is too simple and many others might share your puzzlement. Please refrain from discussing other issues among yourselves during the class. You might be disturbing your colleagues who have the right to attend the lecture in a noise‑free environment.
CSC Questionnaire
Instructions: Please complete the following and return to the instructor.
Name: __________________________________________________
(Last) (First) (Middle)
Student ID number: _____________________
Telephone: ___________________ (Home) Can I leave a message?________
(Office) Can I leave a message?________
Email:
What level are you? __________________
Major:_____________________________

Have you ever used microarrays? YES ____ NO ____
Do you plan to use microarrays in the next year? YES ____ NO ____
Have you ever taken any course on statistics? YES ____ NO ____
Why are you interested in this course:
__________________________________________________________________________________
___________________________________________________________________________________
___________________________________________________________________________________
___________________________________________________________________________________
___________________________________________________________________________________
________________________________________________________________________
Please circle the number that best represents your response to each item using the following scale:
1 Strongly disagree
2 Disagree
3 Disagree somewhat
4 Neutral
5 Agree somewhat
6 Agree
7 Strongly agree
1. At the beginning of the course the overall class plan was clearly presented.
1 2 3 4 5 6 7
2. At the beginning of the course, my responsibilities as a student were made clear.
1 2 3 4 5 6 7
3. The grading procedures were clearly explained at the start of the course.
1 2 3 4 5 6 7