Seminar Kosice - Vienna

Overview

Before the Seminar
The Seminar
After the Seminar
- Some Photos from a great event!

Short Description

General: Seminar held in English in cooperation with the Technical University of Kosice, in Kosice, Slovakia
Credits: counts as "Seminar aus Informatik" or "Seminar aus Artificial Intelligence", both held in English
Content: Methods of data analysis, neural networks, machine learning
Date: The seminar will be held on 2 consecutive days at TU Kosice. The exact date is still to be agreed upon.
Language: English
Number of Participants: max. 6 students from TU Wien
Preliminary Meeting: Wed., March 15, 2000, 13:00, at the Department, Favoritenstr. 9-11, 2nd floor
Registration: by e-mail to rauber@ifs.tuwien.ac.at

Overview

General: English-language Seminar in cooperation with the Technical University of Kosice in Kosice, Slovakia
Credits: "Seminar aus Informatik" or "Seminar aus Artificial Intelligence"
Content: Intelligent Data Analysis, Data Mining, Neural Networks, Machine Learning
Date: The Seminar will be held on 2 consecutive days in the summer term 2000. The exact date will be agreed upon.
Language: English
Number of Participants: max. 6 Students from the Vienna University of Technology
First Meeting: Wed., March 15, 2000, 13:00, at the Department, Favoritenstr. 9-11, 2nd floor
Registration: send e-mail to rauber@ifs.tuwien.ac.at

Some Details

General Information

The seminar will be organized as a kind of Student Workshop with participants from the Vienna University of Technology, Austria, and the Technical University of Kosice, Slovakia. It is a cooperation between the Department of Software Technology (IfS) at VUT Vienna and the Department of Artificial Intelligence at TUKE, Kosice. The main goal of this seminar is to bring together students who are interested in the field of data mining, to discuss and exchange ideas and experiences. We will analyze and compare a set of data mining techniques based on some reference data sets. The individual results of the various approaches will be presented at this seminar, followed by a comparison of these results. Thus, every participant will gain a good overview of the strengths, weaknesses and applicability of the various approaches. Apart from that, we will definitely also have time for some 'social program' alongside the seminar itself, as one of the central ideas of this seminar is to get people together and have fun while doing some reasonable and interesting work :)

Preliminary Schedule

Nov. 1999 - January 2000: preliminary registration for the seminar by sending an e-mail to rauber@ifs.tuwien.ac.at. Please note that the number of participants is limited to a group of 6 students from the Vienna University of Technology.
Beginning of March 2000: First meeting, discussion of various issues concerning the seminar, presentation of the reference data sets, presentation of a set of data mining techniques that will be analyzed in the course of the seminar. Each participant may then select one or two of the proposed data mining technologies she or he wants to analyze.
March/April 2000: The selected methods will be studied in some detail, and the reference data sets will be analyzed using these specific techniques. Programs for this analysis will be provided.
End of April 2000: By the end of April, a first report describing the methods used and the results obtained will be handed in. We will also discuss these findings in an internal meeting.
May/June 2000: The final report will be written and disseminated to all seminar participants. We will then meet at the Technical University in Kosice for 2 days to present the results, discuss the findings, and most probably also have lots of fun :-)

Location

The Technical University of Kosice is located in Kosice, in the eastern part of Slovakia, in the wider region of the famous High Tatras Mountains. These pleasant surroundings will provide an ideal and inspiring setting for the seminar as well as for some leisure time activities...

Funding

We are currently trying to obtain some funding for this seminar, in order to be able to cover (at least part of) the travel and accommodation costs. Details will be discussed during our first meeting at the beginning of March.

Questions, Registration, Miscellaneous

If you are interested in participating in this seminar, or if you have any further questions, just drop me an e-mail at rauber@ifs.tuwien.ac.at. We are currently starting to prepare the details of this seminar, so lots of things still have to be defined or will be defined according to the interests of all participants. I will keep you informed of all news by e-mail.

List of Participants

Kosice:
Vienna:

Experiments Data

We will use 3 different data sets for our experiments, each of which has different characteristics. Thus, we should be able to analyze the strengths and weaknesses of the various approaches with respect to different types of data to be analyzed. The 3 datasets are as follows:

Animals Data Set
a toy example, mainly used to test the various algorithms: small, easy to handle, and intuitively interpretable.
16 animals, described by 13 attributes
SW-MIS Data
data describing characteristics of software modules (size, complexity measures, ...), medium-sized data set with low dimensionality
about 420 vectors, described by 13 attributes
TIME Magazine Article Data
Articles from TIME Magazine from the 1960s, medium-sized data set with very high dimensionality
420 vectors (articles) described by 4382 attributes

Each of these data sets should be used in three different forms:

"as is"
in the raw form as it is provided
normalized by attribute:
the values of each attribute have different ranges, which might pose a problem for some data mining methods. Thus, the values of each attribute should be normalized to the range [0,1] so that all attributes have comparable value ranges.
vector length normalized to 1 (norm of the vector)
Since the attributes are present to differing extents, the length (norm) of each data vector will fall into a different range. For some data mining methods it might thus be advisable to normalize each vector to length 1, to make the vectors more directly comparable.
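
The two normalized forms described above can be sketched in a few lines of Python. This is only an illustration, not the program that will be provided for the seminar: the function names and the toy data are made up for this example, and the handling of constant attributes and all-zero vectors is an assumption, since the text does not specify it.

```python
import math


def normalize_by_attribute(vectors):
    """Rescale every attribute (column) to the range [0, 1] (min-max scaling)."""
    columns = list(zip(*vectors))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    scaled = []
    for vec in vectors:
        row = []
        for value, low, high in zip(vec, lows, highs):
            span = high - low
            # Assumption: an attribute with a single constant value is mapped to 0.0.
            row.append((value - low) / span if span else 0.0)
        scaled.append(row)
    return scaled


def normalize_to_unit_length(vectors):
    """Divide every data vector by its Euclidean norm so that it has length 1."""
    scaled = []
    for vec in vectors:
        norm = math.sqrt(sum(v * v for v in vec))
        # Assumption: an all-zero vector cannot be rescaled and is left unchanged.
        scaled.append([v / norm for v in vec] if norm else list(vec))
    return scaled


# Hypothetical toy data: 3 vectors with 2 attributes of very different ranges.
toy = [[1.0, 200.0], [3.0, 400.0], [5.0, 1000.0]]
by_attribute = normalize_by_attribute(toy)
unit_length = normalize_to_unit_length(toy)
```

Note that the first form works column by column (per attribute), while the second works row by row (per data vector), so the two can give quite different inputs to the same data mining method.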

Goal of the Seminar

The goal is to analyse and compare various techniques for unsupervised data analysis. Some of these techniques might be more suitable for certain kinds of data than others, or produce results that are more easily interpreted in some applications than in others. The goal is to test different Data Mining techniques using the 3 different data sets presented above, and to analyze:

How easy was the system to use?
(data preprocessing, number of parameters to be configured, simplicity of the system, do we understand what the system does, are the results easily interpretable, ...)
How stable was the system?
(parameter sensitivity - did the results vary a lot when you slightly changed some of the parameters, time required to perform analysis, ...)
Which clusters were found in the system?
Are they comparable with the clusters found by the other systems?
Did the system perform equally well with the three different data sets, or did it work better with one set or the other?
Would you use this system again? If so, for which tasks?
further comments....
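
For the question of whether the clusters found by two systems are comparable, one possible starting point is a pair-counting measure such as the Rand index. The seminar does not prescribe this measure; it is used here only as an example. The sketch assumes that each system assigns exactly one cluster label to every data vector, and the function name and label lists are hypothetical.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of vector pairs on which two clusterings agree.

    A pair counts as an agreement if both clusterings put the two vectors
    into the same cluster, or both put them into different clusters.
    The cluster names themselves do not matter.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same vectors")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        return 1.0  # fewer than two vectors: nothing to disagree on
    agreements = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agreements / len(pairs)


# Hypothetical cluster assignments of 5 vectors by two different systems.
system_one = [0, 0, 1, 1, 2]
system_two = [1, 1, 0, 0, 0]
score = rand_index(system_one, system_two)
```

A value of 1.0 means that both systems group the vectors identically (up to renaming of the clusters). Lower values point to pairs of vectors that are worth examining by hand.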

Methods Used

A set of different methods will be used for analysis, namely

Kosice:
- AGLO
- AutoClass
- k-NN
- EM - Expectation Maximization
Vienna:
- SOM - Self-Organizing Map
- GH-SOM - Growing Hierarchical Self-Organizing Map
- GTM - Generative Topographic Mapping
- ART2 - Adaptive Resonance Theory

The Paper

Each participant shall write a paper to be presented at our Workshop meeting in summer. Basically, the paper shall comprise the following:

Title
without comment ;-)
Abstract
a short abstract of about 250 words describing the gist of your paper: what is the problem, how are you trying to solve it, and what are the results.
Introduction
A somewhat more detailed description of the problem: many data mining techniques exist, one has to decide which to use -> comparison of various techniques, ...
Related Work
a short review of some work others have done in this field: data mining tools and applications, various methods, ...
The Method
a description of the method you are using: what does it do, how does it do it - the technical stuff
Experiments
description of the experimental set-up: data sets used, transformation/normalization
description of the results obtained
brief comparison of results with those of other methods
Conclusion
a short summary and outlook
References
a short list of references: papers about the method used, manuals, etc.

The length of the paper shall be between 6 and at most 12 pages. Style files for LaTeX and MS Word will be provided.

Date and Program of the Workshop

Tentative program - the exact details will be decided upon soon...

May 26 - 28, 2000 with the following program:
May 26 - arrival in Kosice, accommodation, trip downtown, common dinner
May 27:
- 9.00 - 9.15 opening of the workshop by J. Paralic and A. Rauber
- 9.15 - 10.15 first block of presentations (approx. 10 min for presentation + 5 min for questions)
- 10.15 - 10.30 coffee break
- 10.30 - 11.30 second block of presentations
- 11.30 - 12.00 two group presentations (comparison of results achieved at particular sites)
- 12.00 - 14.00 common lunch
- 14.00 - 16.00 discussion and comparison of all results
- evening: common social program and/or free program, decision about tomorrow's program
May 28: social program: trip to the hills surrounding Kosice
departure in the evening, return by night train
May 29: arrival in Vienna at around 08:00 in the morning.

Some Photos

Participants

Proceedings

Kosice

Seminar
