Resources
This site contains datasets, applications, tools and other resources publicly provided by the KE Group.
The following list gives a short description of the available resources:
Datasets
-
- EUR-Lex text collection
- The EUR-Lex text collection provides a large multilabel classification benchmark with up to 4000 different classes.
-
- Datasets for Graded Multilabel Classification
- The known BeLaE Dataset and two new datasets from medical text classification and movie ratings.
-
- Incident-Related Twitter Datasets
- These datasets comprise labeled tweets from 10 major cities in the English-speaking world. The tweets were selected and labeled for the domain of incident detection.
-
- Medical Concept Embeddings
- Concept vector representations learned from a large labeled background corpus. These were used for computing the semantic similarity between terms from the medical domain.
Ontologies
-
- UI² Ontology
- The UI² Ontology is a formal ontology for describing user interfaces, their components, and the possible interactions with them.
Software
-
- Computer Poker Bots and the TUD poker framework
- A small repository of (old) Computer Poker Bots and our framework for developing and comparing bots and for playing against them via a GUI.
-
- Attachment Checker
- A Thunderbird plugin that learns to warn you when you forget to attach a file to your message.
-
- Classification GUI
- A graphical user interface for intuitively assigning concepts from an ontology to a set of documents, making it quick and easy to build a (multilabel) classification dataset.
-
- Peewit
- A light-weight meta-framework for machine learning experiments.
-
- FeGeLOD
- A tool for generating machine-learning features from Linked Open Data.
-
- Explain-a-LOD
- A tool for generating possible explanations for statistics based on Linked Open Data.
-
- SeCo
- A framework for Separate-and-Conquer Rule Learning.
-
- Perceptrovement
- A highly modular framework for the efficient Perceptron algorithm, including a large collection of effective extensions.
-
- MoB4LOD
- A framework for creating customized browser applications for Linked Open Data.
-
- JFreeWebSearch
- A free (i.e., no registration or API key required) Java library for performing web searches.
-
- Ontology Matching Tools
- The KE group has developed a variety of ontology matching tools.
-
- Graded Multilabel Classification, Code and Data
- The code and data used for our paper on pairwise graded multilabel classification. In this setting, a label is not just present or absent but can take one of several grades, e.g. a star rating.
-
- P³oodle
- A browser extension/add-on for personalized, privacy-protected web search.
-
- AiTextML
- Learns continuous vector representations jointly for words, documents, and labels from corpora of labeled documents together with textual descriptions of the labels. This also enables zero-shot learning, i.e., predicting labels for which no documents were observed during training.
Computing
-
- Students Pool
- Students who are active in our group may use our infrastructure, including our pool of six Linux-based computers in room D205.
-
- Get to know our research computing cluster.