Tutorials

Three tutorials, two half-day (3 hours) and one full day (6 hours), will be held at the 8th International Conference on Discovery Science (DS-05), October 8-11, 2005, at the Marina Mandarin Hotel in Singapore.

The tutorials will be held on the first day of the conference, Saturday, October 8th, 2005.

LAST MINUTE NOTE: Tutorial T3 is cancelled!

No. Time Tutorial Title Presenters Slides
T1 09.00-12.30h Bioinformatics in Practice Wing-Kin Sung, Limsoon Wong ppt
T2 14.00-17.30h Computational Scientific Discovery Ljupco Todorovski pdf 4up/pdf
T3 09.00-17.30h Network Data Mining and Visualisation Simeon J. Simoff, John Galloway cancelled!

1st coffee break: 10.30-11.00
lunch break: 12.30-14.00
2nd coffee break: 15.30-16.00

T1: Bioinformatics in Practice

(morning, 3h, slides)

Bioinformatics has attracted considerable attention in the life sciences and related industries in the past decade, due to the accumulation of huge amounts of biomedical data and the pressing need to turn such data into useful knowledge. The knowledge gained can lead to improved understanding, improved drug targets, improved diagnostics, and improved treatment plans. This tutorial will introduce several types of bioinformatics challenges, as well as solutions to instances of these challenges, including DNA feature recognition, protein function inference, whole genome alignment, phylogenetic networks, peptide sequencing, disease treatment optimization, and mining errors in bio databases.

Presenters:

Wing-Kin Sung is an assistant professor in the Department of Computer Science, National University of Singapore (NUS), and a senior group leader at the Genome Institute of Singapore (GIS). His research interests are in algorithms and their applications to bioinformatics. He has over 10 years of experience in bioinformatics research. Prior to joining NUS, Wing-Kin worked as a Post-Doctoral Fellow at Yale University and as a Senior Technology Officer in the E-business Technology Institute at the University of Hong Kong. He received both his B.Sc. and his Ph.D. degrees from the Department of Computer Science at the University of Hong Kong.

Limsoon Wong has recently joined the National University of Singapore as a professor at the School of Computing and Faculty of Medicine. Prior to his present appointment, he served the A*STAR Institute for Infocomm Research for 17 years, rising to the position of Deputy Executive Director. He is currently working mostly on knowledge discovery technologies and is especially interested in their application to biomedicine. Before that, he did significant research in database query language theory and finite model theory, as well as significant development work in broad-scale data integration systems. Limsoon has written about 100 research papers, a few of which are among the best cited in their respective fields. He serves on the editorial boards of several journals and is a scientific advisor to a number of companies. He received his BSc(Eng) from Imperial College London and his PhD from the University of Pennsylvania.

T2: Computational Scientific Discovery

(afternoon, 3h, slides, hand-outs)

Computational scientific discovery focuses on applying computational methods to automate scientific activities, such as formulating laws or models from observational data. It is based on a view of scientific activities as problem-solving tasks, which can be approached by heuristic search through a space of possible problem solutions. While early research on computational discovery focused on reconstructing discoveries from the history of science (e.g., rediscovering Kepler's Laws), recent efforts focus on individual scientific activities and have produced a number of novel scientific models and laws. Many of these have led to publications in the relevant scientific literature, and they include qualitative laws of metallic behavior, conjectures in graph theory, qualitative models of chemical reactions, and temporal quantitative models of ecological behavior. Much of the work in computational scientific discovery has put emphasis on the standard formalisms that scientists use to communicate, including numeric equations, structural models, and reaction pathways. In this sense, computational scientific discovery is complementary to mining scientific data. The latter focuses on building predictive models and employs formalisms such as decision trees, rule sets, and probabilistic dependencies, rather than producing knowledge in any standard scientific notation. The tutorial will provide an introduction to computational methods for the discovery of scientific models, laws, and knowledge, and give an overview of recent advances in this area. The primary focus will be on discovery in scientific and engineering disciplines, where communication of knowledge is often a central concern.

Presenter:

Ljupco Todorovski is a researcher at the Jozef Stefan Institute and an assistant professor at the University of Ljubljana, Slovenia. He has held visiting researcher positions at the University of Porto, Portugal, Osaka University, Japan, and Stanford University, USA. His research interests are in the field of machine learning, especially the computational discovery of scientific laws and models from observational and measurement data. Most of his research focuses on integrating background knowledge into the process of induction by transforming that knowledge into inductive constraints.

T3: Network Data Mining and Visualisation

(full-day, 6h)

Unexpectedly, it turned out on Oct. 6th that neither presenter will be able to attend the conference, although they tried until the very last minute. They offered to hold the tutorial via a Web conference, but we decided that it would be more appropriate to cancel it. We apologize for this inconvenience. (Johannes Fürnkranz, tutorial chair)

Network data mining is the art and science of discovering network models within myriads of individual data items. It also utilises special algorithms that aid visualisation of "emergent" patterns and trends in the linkage. These techniques complement conventional predictive data mining methods, which typically examine a collection of input attributes in order to build a model that estimates the value of one or more outputs. Network data mining developments are a result of the shift away from the typical social network analysis of small graphs, and of the properties of individual vertices or edges within such graphs, towards the discovery of very large-scale networks and the investigation of their statistical properties. This new wave in network research has been driven mainly by the increased ability to collect large amounts of data that explicitly represent network structures. Recent studies have approached networks with millions or even billions of vertices. The human eye is an analytic tool of remarkable power, and the visual analysis of networks of tens or hundreds of vertices is an excellent way to gain an understanding of their structure. With a network of a million or a billion vertices, the role of the eye in network data mining is complemented by statistical methods for quantifying large networks (citing Mark Newman from the Santa Fe Institute, statistical methods in modern network analysis answer the question, "How can I tell what this network looks like, when I can't actually look at it?").

This tutorial will present the current state of the art in network data mining and the major systems currently available for it. It will look at the principles that govern the topology and evolution of networks that emerge from real-world data, and at the statistical properties of different types of networks, ranging from path lengths and degree distributions to network clustering coefficients. It will discuss the current ways to measure the properties that characterise the structure and behaviour of networks, and the models of networks that can help in understanding the meaning of these properties - how they came to be as they are, and how they interact with one another.

The tutorial will also present a human-centered network data mining methodology and one of the tools that supports this methodology - NetMap Analytics. This part of the tutorial will be organised around several case studies, including the discovery of fraud, organisational inter-relationships, and the analysis of world internet traffic. The tutorial will also present visualisation techniques for discovering network models in unstructured data.

Presenters:

Simeon J. Simoff is currently an associate professor in information technology and computing science and Head of e-Markets Research at the University of Technology Sydney. He is also director of the Institute of Analytic Professionals of Australia. Professor Simoff is known for a unique blend of interdisciplinary scholarship, which integrates the areas of data mining, design computing, virtual worlds and digital media. This work has resulted in 9 co-authored or co-edited books, more than 150 research papers and numerous cross-disciplinary courses in information technology and computing. He has initiated and co-chaired several conference and workshop series in the area of data mining, including the Australasian Data Mining Conference (AusDM), the Visual Data Mining workshops at ECML/PKDD and ICDM, and the Multimedia Data Mining workshops at ACM SIGKDD conferences. He is an associate editor of the ASCE International Journal of Computing in Civil Engineering.

John Galloway is Chief Scientist at NetMap Analytics, a Sydney-based technology company. He is also Adjunct Professor of Business Intelligence at the Faculty of Business, UTS (University of Technology, Sydney), and Director of the Complex Systems Research Centre at UTS. Professor Galloway's research has been informed initially by General Systems Theory and cybernetics, and then by Complex Systems Science. He founded NetMap Analytics in 1991, pioneering and developing the NetMap technology, recognised worldwide as the premier tool for leading-edge visualisation and analysis in the areas of fraud, crime, consumer behaviour and network analysis. By applying algorithms he developed for "difficult to solve" problems (for which a specific question is difficult to ask, and the problem tends to be non-linear and complex in type and often ongoing and costly for an organisation), he developed the notion that a different method of analysis could be used to complement the methods of regular statistics, structured query language (SQL) database querying and, more recently, neural networks. This would provide a completely different level of analysis and understanding of patterns and trends in data - what he terms a bottom-up or "emergent" approach. Since late 2002, he has relinquished daily management tasks in order to revisit and further develop the scientific basis of the NetMap technology.