This site contains datasets, applications, tools and other resources publicly provided by the KE Group.

The following list gives a short description of the available resources:


  • EUR-Lex text collection
    The EUR-Lex text collection provides a large multilabel classification benchmark with up to 4000 different classes.
  • Datasets for Graded Multilabel Classification
    The well-known BeLaE dataset, plus two new datasets from medical text classification and movie ratings.
  • Incident-Related Twitter Datasets
    These datasets comprise labeled tweets from 10 major cities in the English-speaking world. The tweets were selected and labeled for the domain of incident detection.
  • Medical Concept Embeddings
    Concept vector representations learned from a large labeled background corpus. These were used for computing the semantic similarity between terms from the medical domain.


  • UI² Ontology
    The UI² Ontology is a formal ontology for describing user interfaces, their components, and the possible interactions with them.


  • Computer Poker Bots and the TUD poker framework
    A small repository of (older) computer poker bots, plus our framework for developing and comparing bots and for playing against them via a GUI.
  • Attachment Checker
    A Thunderbird plugin that learns to warn you when you forget to attach a file to your message.
  • Classification GUI
    A graphical user interface that lets users intuitively assign concepts from an ontology to a set of documents in order to quickly and easily build a (multilabel) classification dataset.
  • Peewit
    A light-weight meta-framework for machine learning experiments.
  • FeGeLOD
    A tool for generating machine-learning features from Linked Open Data.
  • Explain-a-LOD
    A tool for generating possible explanations for statistics based on Linked Open Data.
  • SeCo
    A framework for Separate-and-Conquer Rule Learning.
  • Perceptrovement
    A highly modular framework for the efficient Perceptron algorithm, containing a large collection of effective extensions.
  • MoB4LOD
    A framework for creating customized browser applications for Linked Open Data.
  • JFreeWebSearch
    A free (i.e., no registration or API key required) Java library for performing web searches.
  • Ontology Matching Tools
    The KE group has developed a variety of ontology matching tools.
  • Graded Multilabel Classification, Code and Data
    The code and data used for our paper on pairwise graded multilabel classification. In this setting, a label is not merely present or absent but can take one of several grades, e.g., star ratings.
  • P³oodle
    A browser extension/add-on for personalized privacy-protected web search.
  • AiTextML
    Learns continuous vector representations jointly for words, documents, and labels, using both corpora of labeled documents and textual descriptions of the labels. This also enables zero-shot learning, i.e., predicting labels for which no documents were observed during training.
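Perceptrovement builds on the classic Perceptron algorithm. For readers unfamiliar with it, the following is a minimal sketch of the basic binary perceptron with its mistake-driven update rule: the weights change only when an example is misclassified. It is not Perceptrovement's code or API; the function names `train_perceptron` and `predict` are hypothetical, and labels are assumed to be +1 or -1.

```python
from typing import List, Sequence


def train_perceptron(
    X: Sequence[Sequence[float]], y: Sequence[int], epochs: int = 100
) -> List[float]:
    """Train a basic binary perceptron (hypothetical helper, labels in {+1, -1}).

    Returns a weight list whose last element is the bias term.
    """
    n_features = len(X[0])
    w = [0.0] * (n_features + 1)  # last entry holds the bias
    for _ in range(epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            activation = sum(wj * xj for wj, xj in zip(w, xi)) + w[-1]
            # Misclassified (or exactly on the boundary): move the
            # hyperplane toward the correct side for this example.
            if yi * activation <= 0:
                for j in range(n_features):
                    w[j] += yi * xi[j]
                w[-1] += yi
                mistakes += 1
        if mistakes == 0:  # a full pass without errors: converged
            break
    return w


def predict(w: Sequence[float], x: Sequence[float]) -> int:
    """Return +1 or -1 depending on which side of the hyperplane x lies."""
    activation = sum(wj * xj for wj, xj in zip(w, x)) + w[-1]
    return 1 if activation > 0 else -1


if __name__ == "__main__":
    # Logical AND is linearly separable, so the perceptron converges.
    data = [[0, 0], [0, 1], [1, 0], [1, 1]]
    labels = [-1, -1, -1, 1]
    weights = train_perceptron(data, labels)
    print([predict(weights, x) for x in data])
```

Extensions of the kind a framework like Perceptrovement collects (e.g., averaging or voting over weight vectors, kernels, or multiclass variants) would modify this basic loop; the sketch shows only the unextended algorithm.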




Knowledge Engineering Group

Department of Computer Science (Fachbereich Informatik)
TU Darmstadt

S2|02 D203
Hochschulstrasse 10

D-64289 Darmstadt

Phone: +49 6151 16-21811
Fax: +49 6151 16-21812
