# From Local Patterns to Global Models (LeGo-09)

## ECML/PKDD-09 Workshop

7 September 2009, Bled, Slovenia

## Background

Over the last decade, the field of local pattern discovery has grown rapidly, and a range of techniques is available for producing extensive collections of patterns. Because of the exhaustive nature of most such techniques, the pattern collections provide a fairly complete picture of the information content of the database. However, in many cases this is where the process stops. The so-called local patterns represent fragmented knowledge, and often it is not clear how the pieces of the puzzle can be combined into a global model. Because a useful global model, such as a classifier or regression model, is often the expected result of a Data Mining process, the question of how to turn large collections of patterns into global models deserves attention.

In this workshop, we will deal with the question of how to convert local patterns into an actionable global model, for example a classifier or regression model. Global modeling in this setting entails combining patterns effectively and dealing with possible redundancy or conflicts between the reported patterns. In our view, a common ground of all the local pattern mining techniques is that they can be considered to be feature construction techniques that follow different objectives (or constraints).
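
The feature-construction view above can be illustrated with a minimal sketch. It assumes the local patterns are itemsets and that a pattern "covers" a transaction when all of its items occur in it. The toy data, the pattern list, and the helper name `patterns_to_features` are hypothetical and serve only as illustration.

```python
# Hypothetical toy data: each transaction is a set of items.
transactions = [
    {"a", "b", "c"},
    {"a", "c"},
    {"b", "d"},
    {"a", "b", "d"},
]

# Hypothetical local patterns (itemsets), e.g. the output of some pattern miner.
patterns = [frozenset({"a"}), frozenset({"a", "b"}), frozenset({"b", "d"})]


def patterns_to_features(transactions, patterns):
    """Return one binary row per transaction: 1 if the pattern covers it, else 0."""
    return [[int(p <= t) for p in patterns] for t in transactions]


# Each column is now a binary feature that a global model could consume.
features = patterns_to_features(transactions, patterns)
for row in features:
    print(row)
```

The resulting matrix could be passed to any standard classifier or regression learner; which patterns to keep as columns is exactly the selection problem this workshop addresses.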

## Workshop Goals

This workshop is a follow-up to the LeGo-08 workshop that was held at last year's ECML/PKDD conference. Its goal is to bring people who work on various aspects of this subject into a fruitful discussion about the state of the art and the remaining open problems, and about commonalities and differences in their respective work. Some research questions that we consider particularly relevant are:

- **Metrics for Pattern Set Selection:** Evaluation metrics for local and global models have been investigated in quite some depth. However, for the Pattern Set Discovery task, which does not evaluate isolated patterns but pattern sets, it is still quite unclear what types of metrics and constraints can be defined and what effects they will have.
- **Efficient Pattern Set Discovery:** As an exponential number of subsets exist, exhaustive methods will only work for small pattern collections. How can this process be sped up, and how can good approximations be achieved?
- **Propagation of Constraints:** How can global constraints be propagated back to local constraints? What type of local patterns must be found in order to guarantee a high performance on the global modeling task? Which local constraints optimize which global constraints?
- **General-Purpose Constraints:** A key advantage of the modular approach could be that local patterns may be mined independently and can be re-used for several Global Modeling tasks. Are there general local constraints, such as on frequency or entropy, that give a reasonable performance on a wide variety of Global Modeling tasks?
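
To make the efficiency question concrete, the sketch below contrasts exhaustive search over all size-k pattern subsets with a greedy approximation. It assumes patterns are given as binary feature columns and uses joint entropy as the set-level quality measure. That measure is only one possible choice, picked here for illustration, and the function names and toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log2


def joint_entropy(rows, cols):
    """Entropy (in bits) of the joint distribution of the chosen feature columns."""
    counts = Counter(tuple(row[c] for c in cols) for row in rows)
    n = len(rows)
    return -sum((k / n) * log2(k / n) for k in counts.values())


def exhaustive_select(rows, k):
    """Score every size-k subset; feasible only for small pattern collections."""
    n_cols = len(rows[0])
    return max(combinations(range(n_cols), k),
               key=lambda cols: joint_entropy(rows, cols))


def greedy_select(rows, k):
    """Approximation: repeatedly add the pattern that yields the highest set score."""
    chosen = []
    remaining = list(range(len(rows[0])))
    for _ in range(k):
        best = max(remaining, key=lambda c: joint_entropy(rows, chosen + [c]))
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Hypothetical binary pattern matrix: column 1 duplicates column 0 (redundant).
rows = [
    [1, 1, 0],
    [1, 1, 1],
    [0, 0, 0],
    [0, 0, 1],
]

print(exhaustive_select(rows, 2))
print(greedy_select(rows, 2))
```

On this toy matrix both strategies avoid pairing the two redundant columns. In general the greedy variant evaluates far fewer candidate sets but carries no guarantee of finding the best subset.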

## Workshop Format

The workshop will be held as a one-day session with paper presentations. Submitted papers will be reviewed by the programme committee. We will also admit position statements and interesting incomplete or immature work into the program. We hope these will stimulate discussion and further the main goal of the workshop, namely to increase awareness of the different approaches to this general problem.

Presentation times for the individual papers will be allotted based on the reviews and on the length, significance, and expected interest of the contribution (we foresee 10-, 20-, and 30-minute presentations). Papers will be arranged into a hopefully coherent program, with papers that address the same or similar types of problems placed in the same session. Each session is planned to end with a discussion on all presented papers (in addition to the usual question periods after each presentation). Discussions will be stimulated by designated session chairs whose expertise is in the session area.

## Topics of Interest

The workshop calls for papers related to global modeling using local patterns. The following is an (incomplete) list of suitable topics. We would like to stress that these topics should be considered in the context of the theme of the workshop. In particular, papers concerning either global modeling or local pattern discovery in isolation cannot be accepted for inclusion in the workshop program.

- Associative Classification
- Combination Strategies
- Compression-based Pattern Selection
- Constraint-based Pattern Set Mining
- Ensembles of Patterns
- Feature Construction and Selection
- Global Modeling with Patterns
- Iterative Local Pattern Discovery
- KDD Process-models for Building Global Models from Local Patterns
- Parallel Universes
- Pattern Ordering
- Patterns and Information Theory
- Pattern Set Selection
- Pattern Teams
- Propositionalisation
- Quality Measures for Pattern Sets
- Resolution of Conflicting Predictions
- Subgroup Discovery

## Program

The program below contains download links to individual papers. You can also download the entire workshop proceedings in one PDF file.

| Time | Session |
| --- | --- |
| 09:00 - 09:15 | Welcome and overview |
| 09:15 - 10:00 | Invited Presentation: Pattern-Based Classification: A Unifying Perspective (B. Bringmann, S. Nijssen, and A. Zimmermann) [Slides] |
| 10:00 - 10:30 | Local Constraint-Based Mining and Set Constraint Programming for Pattern Discovery (M. Khiari, P. Boizumault, B. Cremilleux) [Slides] |
| 10:30 - 11:00 | Coffee break |
| 11:00 - 11:30 | Invited Presentation: Scoring Pattern Sets Using Statistical Models (N. Tatti, H. Heikinheimo) |
| 11:30 - 12:00 | Incorporating Exceptions: Efficient Mining of e-Relevant Subgroup Patterns (F. Lemmerich, M. Atzmüller) |
| 12:00 - 12:20 | Towards Understanding Spammers - Discovering Local Patterns for Concept Description (M. Atzmüller, F. Lemmerich, B. Krause, A. Hotho) |
| 12:20 - 12:40 | An Enhanced Incremental Prototype Classifier using Subspace Representation Scheme (Y. Xu, F. Shen, J. Zhao, O. Hasegawa) |
| 12:40 - 13:40 | Lunch |
| 13:40 - 14:10 | Invited Presentation: Subgroup Discovery (N. Lavrac, P. Kralj Novak) |
| 14:10 - 14:30 | Player Modeling for Intelligent Difficulty Adjustment (O. Missura, T. Gärtner) |
| 14:30 - 15:00 | Feature Set-based Consistency Sampling in Bagging Ensembles (J. Blaszczynski, R. Slowinski, J. Stefanowski) |
| 15:00 - 15:20 | Coffee break |
| 15:20 - 15:40 | Building Classifiers from Pattern Teams (A. Knobbe, J. Valkonet) [Slides] |
| 15:40 - 16:00 | A Study of Probability Estimation Techniques for Rule Learning (J. Sulzmann, J. Fürnkranz) [Slides] |
| 16:00 - 16:20 | Levelwise Cluster Mining under a Maximum SSE Constraint (J. De Knijf, B. Goethals, A. Prado) [Slides] |
| 16:50 - 17:00 | LeGo Discussion |

## Organizers

- Arno Knobbe (LIACS, Leiden University, and Kiminkii)
- Johannes Fürnkranz (TU Darmstadt)

## Programme Committee

(may still be expanded)

- Bruno Crémilleux (Université de Caen)
- Ad Feelders (Universiteit Utrecht)
- Henrik Grosskreutz (Fraunhofer IAIS, Bonn)
- Szymon Jaroszewicz (National Institute of Telecommunications, Warsaw)
- Alipio Jorge (University of Porto)
- Arne Koopman (Universiteit Utrecht)
- Petra Kralj Novak (Jožef Stefan Institute)
- Siegfried Nijssen (K.U. Leuven)
- Sang-Hyeun Park (TU Darmstadt)
- Martin Scholz (HP Research)
- Jan-Nikolas Sulzmann (TU Darmstadt)
- Celine Vens (Katholieke Universiteit Leuven)
- Jilles Vreeken (Universiteit Utrecht)
- Bernd Wiswedel (Universität Konstanz)
- Stefan Wrobel (Fraunhofer IAIS)