Tutorial 4 (afternoon): Applied Data Mining in Clinical Research
John Holmes
Overview
This tutorial will introduce attendees to the practical application of data mining.
Using a well-known data mining life cycle as a conceptual framework, attendees will experience first-hand,
through demonstration and direct participation, the techniques of mining clinical data.
These techniques include data preparation; data description and visualization; mining association, classification,
and prediction rules; and clustering.
The capstone of the tutorial will be using the results of data mining to inform traditional statistical analysis.
Detailed Content
This tutorial proposes to illustrate, via demonstration and hands-on experience,
the application of data mining methodologies to a clinical database.
A knowledge discovery life cycle model [1] will be employed as the conceptual framework for the tutorial.
The goal of this tutorial is to give attendees practical experience in mining a database to support clinical research
and, ultimately, to assist with statistical analysis.
The selected database for this tutorial will be the Pima Indians Diabetes Database [2].
This database was selected because it is well known in the machine learning community,
which has produced a rich literature applying various data mining paradigms to it.
In addition, this database offers a variety of attribute types and substantial complexity,
even though it contains only nine variables and 768 records. Finally, it is freely available and in the public domain.
It is an excellent choice for demonstration and laboratory purposes.
The Weka data mining software package [3] will be used for demonstration in the tutorial.
Weka is freely available as open-source software and runs on even modestly equipped computers within a
Java Runtime Environment (JRE). Weka and the JRE will be distributed to attendees on CD-ROM free of charge.
Attendees will be encouraged to bring laptops to the tutorial, and they will be given the opportunity to install and use Weka during the session.
Those who do not bring laptops will still benefit from the detailed demonstrations in the tutorial.
The tutorial will cover: Introduction to Weka and the demonstration database;
Data preparation and reduction; Data description and visualization; Association rule mining; Clustering;
Classification and prediction rule mining; Interpreting and applying the results to analysis; and Summary and conclusion.
[1] Han J, Kamber M: Data Mining: Concepts and Techniques. San Francisco: Morgan Kaufmann Publishers, 2001.
[2] Hettich S, Bay SD: The UCI KDD Archive [http://kdd.ics.uci.edu]. Irvine, CA: University of California, Department of Information and Computer Science.
[3] Witten IH, Frank E: Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. San Francisco: Morgan Kaufmann Publishers, 2001.
Intended Audience
Clinical and basic science researchers will benefit most from this tutorial.
The content level is 50% intermediate, 50% advanced.
Prerequisite Knowledge
Prior exposure to the basic methodologies of data mining.
Important Dates
June 30, 2005: Deadline for tutorial registration
July 23, 2005: Tutorial
July 25-27, 2005: AIME 05 Scientific Sessions
Presenter
John H. Holmes, PhD, is an internationally recognized expert in applying evolutionary computation methods to knowledge discovery
in biomedical databases. He is a regular contributor to GECCO (the Genetic and Evolutionary Computation Conference)
and a reviewer for Artificial Intelligence in Medicine, Evolutionary Computation, and IEEE Transactions on Evolutionary Computation,
as well as several clinical journals, specializing in KDD issues and applications. Dr. Holmes has also co-authored a chapter on using
learning classifier systems in knowledge discovery (Bull L (ed): Applications of Learning Classifier Systems. Berlin: Springer, 2004, pp. 15-67).
He has given numerous tutorials and lectures on biomedical KDD at venues such as the Fall Symposium of the American Medical Informatics
Association, Medinfo, the Drug Information Association, and the Centers for Disease Control and Prevention, as well as at universities in
the United States and Canada. His tutorials have been well attended; most recently, his KDD tutorial at Medinfo 2004 drew more than
50 attendees, one of the largest tutorial audiences at that meeting.