Tutorials
Three tutorials, two half-day (3 hours) and one
full day (6 hours), will be held at the 8th
International Conference on Discovery Science
(DS-05), October 8-11, 2005, at the Marina
Mandarin Hotel in Singapore.
The tutorials will be held on the first day of
the conference, Saturday, October 8th,
2005.
| 1st coffee break: | 10.30-11.00 |
| lunch break: | 12.30-14.00 |
| 2nd coffee break: | 15.30-16.00 |
T1: Bioinformatics in Practice
(morning, 3h, slides)
Bioinformatics has attracted considerable attention in the life
sciences and related industries in the past decade due to the
accumulation of huge amounts of biomedical data and the pressing need
to turn such data into useful knowledge. The knowledge gained can lead
to improved understanding, improved drug targets, improved diagnostics,
and improved treatment plans. This tutorial will introduce several
types of bioinformatics challenges, as well as solutions to instances
of these challenges, including DNA feature recognition, protein
function inference, whole genome alignment, phylogenetic networks,
peptide sequencing, disease treatment optimization, and mining errors
in bio databases.
Presenters:
Wing-Kin Sung is an assistant professor in the Department of
Computer Science, National University of Singapore (NUS) and a senior
group leader at the Genome Institute of Singapore (GIS). His research
interests are in algorithms and their applications to bioinformatics.
He has over 10 years of experience in bioinformatics research. Prior to
joining NUS, Wing-Kin worked as a Post-Doctoral Fellow at Yale
University and as a Senior Technology Officer in the E-Business
Technology Institute at the University of Hong Kong. He received both
his B.Sc. and his Ph.D. degrees from the Department of Computer Science
at the University of Hong Kong.
Limsoon Wong has recently joined the National University of
Singapore as a professor at the School of Computing and Faculty of
Medicine. Prior to his present appointment, he served the A*STAR
Institute for Infocomm Research for 17 years, rising to the position
of Deputy Executive Director. He is currently working mostly on
knowledge discovery technologies and is especially interested in their
application to biomedicine. Prior to that, he did significant
research in database query language theory and finite model theory, as
well as significant development work in broad-scale data integration
systems. Limsoon has written about 100 research papers, a few of which
are among the best cited of their respective fields. He serves on the
editorial boards of several journals, and is a scientific advisor to a
number of companies. He received his BSc(Eng) from Imperial College
London and his PhD from the University of Pennsylvania.
T2: Computational Scientific Discovery
(afternoon, 3h, slides, hand-outs)
Computational scientific discovery focuses on applying computational
methods to automate scientific activities, such as formulating laws
or models from observational data. It is based on a view of scientific
activities as problem solving tasks, which can be approached by
heuristic search through a space of possible problem solutions.
While early research on computational discovery focused on
reconstructing discoveries from the history of science (e.g.,
rediscovering Kepler's Laws), recent efforts focus on individual
scientific activities and have produced a number of novel scientific
models and laws. Many of these have led to publications in the relevant
scientific literature; they include qualitative laws of metallic
behavior, conjectures in graph theory, qualitative models of
chemical reactions, and temporal quantitative models of ecological
behavior. Much of the work in computational scientific discovery has
put emphasis on standard formalisms used to communicate among
scientists, including numeric equations, structural models, and
reaction pathways. In this sense, computational scientific discovery
is complementary to mining scientific data. The latter focuses
on building predictive models and employs formalisms such as
decision trees, rule sets, and probabilistic dependencies, rather
than producing knowledge in any standard scientific notation.
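As a rough illustration of discovery as search through a space of
candidate laws, the sketch below looks for a relation of the form
a^p / T^q = constant between the semi-major axis a and the orbital
period T of six planets. It is not the method of any particular
discovery system: the function names, the rounded planetary data, the
restriction to exponents 1 to 3, and the use of plain enumeration in
place of a guided heuristic search are all assumptions made for
brevity.

```python
from itertools import product
from statistics import mean, pstdev

# Rounded semi-major axis (AU) and orbital period (years); illustrative data.
PLANETS = {
    "Mercury": (0.387, 0.241),
    "Venus": (0.723, 0.615),
    "Earth": (1.000, 1.000),
    "Mars": (1.524, 1.881),
    "Jupiter": (5.203, 11.862),
    "Saturn": (9.537, 29.457),
}


def variation(p, q):
    """Coefficient of variation of a**p / T**q across the planets."""
    values = [a**p / t**q for a, t in PLANETS.values()]
    return pstdev(values) / mean(values)


def find_law(max_exponent=3):
    """Return the exponent pair (p, q) whose ratio is most nearly constant."""
    candidates = product(range(1, max_exponent + 1), repeat=2)
    return min(candidates, key=lambda pq: variation(*pq))


if __name__ == "__main__":
    p, q = find_law()
    print(f"a^{p} / T^{q} is the most nearly constant candidate")
```

The result is an equation in ordinary scientific notation rather than
a decision tree or rule set, which is the distinction the text draws.
With a larger candidate space, exhaustive enumeration would give way to
heuristics that decide which terms to combine next.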
The tutorial will provide an introduction to computational methods
for discovery of scientific models, laws, and knowledge and give an
overview of recent advances in this area. The primary focus will be
on discovery in scientific and engineering disciplines, where
communication of knowledge is often a central concern.
Presenter:
Ljupco Todorovski is a researcher at the Jozef Stefan Institute and
an assistant professor at the University of Ljubljana, Slovenia. He has held
visiting researcher positions at the University of Porto, Portugal,
Osaka University, Japan, and Stanford University, USA. His research
interests are in the field of machine learning and especially
computational discovery of scientific laws and models from
observational and measurement data. Most of his research is focused on
integration of background knowledge in the process of induction by
transforming the knowledge into inductive constraints.
T3: Network Data Mining and Visualisation
(full-day, 6h)
Unexpectedly, it turned out on Oct. 6th that
neither presenter will be able to attend the conference, although
both tried until the very last minute. They offered to hold the
tutorial via a Web conference, but we decided that it would be more
appropriate to cancel it. We apologize for this inconvenience.
(Johannes Fürnkranz, tutorial chair)
Network data mining is the art and science of discovering network
models within myriads of individual data items. It also utilises
special algorithms that aid visualisation of "emergent" patterns and
trends in the linkage. These techniques complement conventional
predictive data mining methods, which typically examine a
collection of input attributes in order to build a model that estimates
the value of one or more outputs. Network data mining
developments are a result of the shift away from the typical social
network analysis of small graphs and the properties of individual
vertices or edges within such graphs, to the discovery of very large-scale
networks and investigation of their statistical properties. This
new wave in network research has been driven mainly by the
increased ability to collect large amounts of data that explicitly
represents network structures. Recent studies approached networks
with millions or even billions of vertices. The human eye is an
analytic tool of remarkable power and the visual analysis of
networks of tens or hundreds of vertices is an excellent way to gain
an understanding of their structure. With a network of a million or a
billion vertices the role of the eye in network data mining is
complemented by statistical methods for quantifying large networks
(citing Mark Newman from the Santa Fe Institute, statistical
methods in modern network analysis answer the question, "How
can I tell what this network looks like, when I can't actually look at
it?").
This tutorial will present the current state of the art in network data
mining and the major systems currently available for network data
mining. It will look at the principles that govern the topology and
evolution of networks that emerge from real-world data, and at the
statistical properties of different types of networks, ranging from
path lengths and degree distributions to network clustering
coefficients. It will discuss the current ways to measure the
properties that characterise the structure and behaviour of networks,
and the models of networks that can help in understanding the meaning
of these properties - how they came to be as they are, and how they
interact with one another.
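The three statistics named above can be computed directly on a small
example. The following is a minimal sketch in plain Python, not taken
from the tutorial material: the five-vertex graph and all function
names are invented for illustration, and the path-length average
assumes the graph is connected.

```python
from collections import Counter, deque
from itertools import combinations

# Hypothetical undirected toy network: a triangle A-B-C with a tail C-D-E.
EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")]


def build_graph(edges):
    """Build an adjacency-set representation of an undirected graph."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def degree_distribution(graph):
    """Map each degree to the number of vertices having that degree."""
    return Counter(len(neighbours) for neighbours in graph.values())


def average_clustering(graph):
    """Mean local clustering coefficient; vertices of degree < 2 count as 0."""
    total = 0.0
    for neighbours in graph.values():
        k = len(neighbours)
        if k < 2:
            continue
        links = sum(1 for u, v in combinations(neighbours, 2) if v in graph[u])
        total += 2 * links / (k * (k - 1))
    return total / len(graph)


def average_path_length(graph):
    """Mean shortest-path length over reachable pairs, via BFS from each vertex."""
    total, pairs = 0, 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbour in graph[node]:
                if neighbour not in dist:
                    dist[neighbour] = dist[node] + 1
                    queue.append(neighbour)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


if __name__ == "__main__":
    g = build_graph(EDGES)
    print(degree_distribution(g))
    print(average_clustering(g))
    print(average_path_length(g))
```

Running a breadth-first search from every vertex is only practical for
small graphs. For networks with millions of vertices, such quantities
are usually estimated by sampling, which is where the statistical
methods mentioned earlier come in.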
The tutorial will also present a human-centered network data
mining methodology and one of the tools that supports this
methodology - NetMap Analytics. This part of the tutorial will be
organised around several case studies, including fraud discovery,
organisational inter-relationships, and analysis of world internet
traffic. The tutorial will also present visualisation
techniques for discovering network models in unstructured data.
Presenters:
Simeon J. Simoff is currently an associate professor in information
technology and computing science and Head of e-Markets Research at the
University of Technology Sydney. He is also director of the Institute
of Analytic Professionals of Australia. Professor Simoff is known for
the unique blend of interdisciplinary scholarship, which integrates
the areas of data mining, design computing, virtual worlds and digital
media. This work has resulted in 9 co-authored or co-edited books,
more than 150 research papers and numerous cross-disciplinary courses
in information technology and computing. He has initiated and
co-chaired several conferences and workshop series in the area of data
mining, including the Australasian Data Mining Conference (AusDM), the
Visual Data Mining workshops at ECML/PKDD and ICDM, and the Multimedia
Data Mining workshops at ACM SIGKDD conferences. He is an associate
editor of the ASCE International Journal of Computing in Civil
Engineering.
John Galloway is Chief Scientist at NetMap Analytics, a Sydney-based
technology company. He is also Adjunct Professor of Business
Intelligence at the Faculty of Business, UTS (University of
Technology, Sydney) and Director of the Complex Systems Research
Centre at UTS. Professor Galloway's research has been informed
initially by General Systems Theory and cybernetics, and then by Complex
Systems Science. He founded NetMap Analytics in 1991, pioneering and
developing the NetMap technology, recognised worldwide as the premier
tool for leading edge visualisation and analysis in the areas of
fraud, crime, consumer behaviour and network analysis. By applying
algorithms he developed for "difficult to solve" problems (for which a
specific question is difficult to ask, and the problem tends to be
non-linear and complex in type and often on-going and costly for an
organisation), he developed the notion that a different method of
analysis could be used to complement the methods of regular
statistics, Structured Query Language (SQL) database querying and, more
recently, neural networks. This would provide a completely different
level of analysis and understanding/knowledge about patterns and
trends in data - what he terms a bottom-up or "emergent" approach.
Since late 2002, he has relinquished daily management tasks in order
to revisit and further develop the scientific basis of the NetMap
technology.