Code and Data for Graded Multilabel Classification

General Information and Citation

On this page you will find data and the code for the various approaches from 

C. Brinker, E. Loza Mencía and J. Fürnkranz, "Graded Multilabel Classification by Pairwise Comparisons", in Proceedings of the 14th International Conference on Data Mining (ICDM-2014), 2014 [bibtex]

and the experiments described there. A longer version with more detailed and complete descriptions can be found in

C. Brinker, E. Loza Mencía and J. Fürnkranz, "Graded multilabel classification by pairwise comparisons", TU Darmstadt, Tech. Rep. TUD-KE-2014-01, 2014. [pdf] [bibtex]


The code uses an outdated version of the LPCforSOS framework, which requires Java 1.6 and Weka 3.7. The code was developed using the Eclipse IDE. Because the approaches use the framework in incompatible ways, each approach was developed in its own branch of the framework as a separate Eclipse project. Each approach is contained in a separate folder in the archive file:

  • BinaryRelevance
  • FrankHall
  • FullCLR
  • HorizontalCLR
  • JoinedCLR

NOTE: We obtained the code of the IBLR-ML algorithm directly from its authors, so you may request it from them as well or use the version contained in MULAN.


The following datasets were used in the evaluation:

  • BeLaE: This dataset consists of 1930 instances, each representing a graduate student. Each instance has 50 attributes. Two attributes, age and sex, characterize the student; the remaining 48 attributes are the students' answers to questions about the importance of certain properties of their future jobs. Each answer has a grade from `1' (completely unimportant) to `5' (very important). In view of the lack of a more comprehensive and informative characterization of the students, Cheng et al. decided to use a subset of the answers as additional attributes for characterizing the students. Following the same setup, we generated 50 datasets by randomly choosing a subset of n questions as target labels. The remaining 50-n attributes were used as features of the instances. We generated two kinds of datasets, for n=5 and n=10, respectively.
  • movies: We collected a dataset from a German TV program guide that rates movies by assigning grades to the categories `fun', `action', `sex', `suspense' and `sophistication' rather than giving an overall rating. Each category has grades from `0' to `3'. In total, we had data for 1967 movies. To characterize them, we extracted the associated summary texts from IMDb. Furthermore, we added the English title, the year, director's name, actors' names, characters' names, writers' names, runtime, country of origin, and language as text to the summary. The text was tokenized and stemmed with the Porter algorithm, and common English stopwords were filtered out. We then computed the TF-IDF values of the tokens on the respective training data of the 10-fold cross validation.
  • medical: The medical dataset consists of 1953 free-text radiology reports. They were collected for the CMC's 2007 Medical Natural Language Center (homepage not online anymore), and three expert companies were asked to annotate them with a set of ICD-9-CM disease/diagnosis classification codes. In the original dataset for the multilabel classification competition, a document was assigned to a code if at least two of the annotators agreed on that code. In contrast, we generated a GMLC dataset by using the level of agreement as the grade of assignment. The texts were processed as for the movies dataset, but we used the absolute term frequency instead of TF-IDF.
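The step from annotator agreement to grades in the medical dataset can be sketched as follows. This is a minimal illustration, not the code used for the paper: it assumes that the grade of a code is simply the number of annotators who assigned it (0 to 3). The class name, the method names and the example codes are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical helper: turns per-annotator code sets into graded labels. */
class AgreementGrades {

    /** Assumed grading: grade = number of annotators who assigned the code (grade 0 codes are absent). */
    static Map<String, Integer> grades(List<Set<String>> annotations) {
        Map<String, Integer> result = new TreeMap<String, Integer>();
        for (Set<String> codes : annotations) {
            for (String code : codes) {
                Integer old = result.get(code);
                result.put(code, old == null ? 1 : old + 1);
            }
        }
        return result;
    }

    /** The original multilabel rule: a code is assigned if at least two annotators agree. */
    static Set<String> majorityLabels(Map<String, Integer> grades) {
        Set<String> labels = new LinkedHashSet<String>();
        for (Map.Entry<String, Integer> e : grades.entrySet()) {
            if (e.getValue() >= 2) {
                labels.add(e.getKey());
            }
        }
        return labels;
    }

    public static void main(String[] args) {
        // Illustrative annotations for one report; the codes are examples only.
        List<Set<String>> annotations = new ArrayList<Set<String>>();
        annotations.add(new HashSet<String>(Arrays.asList("786.2", "780.6")));
        annotations.add(new HashSet<String>(Arrays.asList("786.2")));
        annotations.add(new HashSet<String>(Arrays.asList("786.2", "780.6", "486")));
        Map<String, Integer> g = grades(annotations);
        System.out.println(g);                 // prints {486=1, 780.6=2, 786.2=3}
        System.out.println(majorityLabels(g)); // prints [780.6, 786.2]
    }
}
```

The graded view keeps the code that only one annotator chose, which the majority rule of the original competition data discards.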

The following four different kinds of experiments were performed:

  • 10-fold cross validation on each of the 50 generated datasets from BeLaE with 5 labels
  • 10-fold cross validation on each of the 50 generated datasets from BeLaE with 10 labels
  • 10-fold cross validation on the medical dataset
  • 10-fold cross validation on the movies dataset (TF-IDF was computed separately on the training data of each split)
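
Computing TF-IDF separately on the training data of each split means that document frequencies must not see the test documents. A minimal sketch of that idea is given below. The class name is hypothetical, and the weighting tf * ln(N/df) is an assumption, since this page does not state the exact TF-IDF variant that was used.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: TF-IDF weights whose document frequencies come from the training fold only. */
class FoldTfIdf {
    private final Map<String, Double> idf = new HashMap<String, Double>();

    /** Learns IDF values from the tokenized training documents of one fold. */
    FoldTfIdf(List<List<String>> trainingDocs) {
        Map<String, Integer> df = new HashMap<String, Integer>();
        for (List<String> doc : trainingDocs) {
            for (String token : new HashSet<String>(doc)) { // count each token once per document
                Integer old = df.get(token);
                df.put(token, old == null ? 1 : old + 1);
            }
        }
        for (Map.Entry<String, Integer> e : df.entrySet()) {
            // Assumed weighting: idf = ln(N / df).
            idf.put(e.getKey(), Math.log((double) trainingDocs.size() / e.getValue()));
        }
    }

    /** Weights a training or test document; tokens unseen in the training fold are dropped. */
    Map<String, Double> transform(List<String> doc) {
        Map<String, Double> weights = new HashMap<String, Double>();
        for (String token : doc) {
            Double w = idf.get(token);
            if (w == null) {
                continue;
            }
            Double old = weights.get(token);
            weights.put(token, (old == null ? 0.0 : old) + w); // adds idf once per occurrence: tf * idf
        }
        return weights;
    }

    public static void main(String[] args) {
        List<List<String>> train = new ArrayList<List<String>>();
        train.add(Arrays.asList("good", "movi"));
        train.add(Arrays.asList("bad", "movi"));
        FoldTfIdf model = new FoldTfIdf(train);
        // The test document is weighted with the statistics of the training fold.
        System.out.println(model.transform(Arrays.asList("good", "good", "new")));
    }
}
```

In a 10-fold cross validation, one such model would be built per fold from the nine training parts and then applied to the held-out part.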

The data is available in a variation of the ARFF format, extended to support grade information. The label attribute is specified in the following format:

@ATTRIBUTE class [ML_Graded|(labelA),(labelB)(,...) × 0<1<2<(...maxGrade)|2]

where (label*) denotes the label names, (maxGrade) is the highest possible grade, and everything not in () is fixed across all datasets. In the @data section, each row has the following format:

value1, value2 ... lastvalue, {labelA0,labelB0<labelC1<labelD2...}

assuming labelA and labelB have grade 0, labelC has grade 1, labelD has grade 2, etc. In the sparse format, the label information is appended after the feature values, e.g.

{index1 value1, index2 value2, ...}, {labelA0,labelB0<labelC1<labelD2...}
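
Reading the label information of a data row could look roughly like the sketch below. It is not the parser of the LPCforSOS framework: the class and method names are hypothetical, and it assumes that grades are single digits (true for the grade ranges described on this page) and that label names contain no '{', ',' or '<'.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical parser for the graded label set at the end of a data row. */
class GradedLabelParser {

    /** Returns the trailing "{...}" label part of a dense or sparse data row. */
    static String labelPart(String row) {
        int start = row.lastIndexOf('{');
        if (start < 0) {
            throw new IllegalArgumentException("No label set found in: " + row);
        }
        return row.substring(start).trim();
    }

    /**
     * Parses a string such as "{labelA0,labelB0<labelC1<labelD2}" into label -> grade.
     * Assumption: the last character of each token is its (single-digit) grade.
     */
    static Map<String, Integer> parse(String labelSet) {
        String s = labelSet.trim();
        if (s.length() < 2 || s.charAt(0) != '{' || s.charAt(s.length() - 1) != '}') {
            throw new IllegalArgumentException("Expected {...} but got: " + labelSet);
        }
        Map<String, Integer> grades = new LinkedHashMap<String, Integer>();
        String body = s.substring(1, s.length() - 1).trim();
        if (body.length() == 0) {
            return grades;
        }
        // '<' separates grade groups, ',' separates labels within one group.
        for (String token : body.split("[<,]")) {
            String t = token.trim();
            if (t.length() < 2 || !Character.isDigit(t.charAt(t.length() - 1))) {
                throw new IllegalArgumentException("Malformed label token: '" + t + "'");
            }
            int grade = Character.digit(t.charAt(t.length() - 1), 10);
            grades.put(t.substring(0, t.length() - 1), grade);
        }
        return grades;
    }

    public static void main(String[] args) {
        String row = "0.5, 1.0, 3.2, {labelA0,labelB0<labelC1<labelD2}";
        System.out.println(parse(labelPart(row))); // prints {labelA=0, labelB=0, labelC=1, labelD=2}
    }
}
```

Because the label set is always the last braced group, the same `labelPart` call works for the sparse format, where the feature values are braced as well.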


For each of the experiments there exists a separate main-class in each of the projects:


They can be found in each project under "<project-folder>/src/main/java/LPCforSOS/evaluation/".

The main method of each of these classes takes three arguments:

  1. Prediction method: 'v' - voting, 'w' - weighted voting, 't' - voting with weighted voting as tie breaking strategy
  2. path of the directory containing the dataset(s)
  3. path and name of the output file for the results (results will be appended)
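
The argument convention above can be illustrated with a small stand-alone class. This is a hypothetical sketch and not one of the actual main classes of the projects; all names in it are made up for illustration. It validates the three arguments and appends to the result file instead of overwriting it.

```java
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;

/** Hypothetical illustration of the three-argument convention. */
class ExperimentArgs {
    enum Prediction { VOTING, WEIGHTED_VOTING, VOTING_WITH_WEIGHTED_TIE_BREAK }

    final Prediction prediction;
    final File dataDir;
    final File resultFile;

    ExperimentArgs(String[] args) {
        if (args.length != 3) {
            throw new IllegalArgumentException("usage: <v|w|t> <dataset-dir> <result-file>");
        }
        this.prediction = parsePrediction(args[0]);
        this.dataDir = new File(args[1]);
        this.resultFile = new File(args[2]);
    }

    /** Maps the one-letter code of the first argument to a prediction method. */
    static Prediction parsePrediction(String code) {
        if ("v".equals(code)) return Prediction.VOTING;
        if ("w".equals(code)) return Prediction.WEIGHTED_VOTING;
        if ("t".equals(code)) return Prediction.VOTING_WITH_WEIGHTED_TIE_BREAK;
        throw new IllegalArgumentException("Unknown prediction method: " + code);
    }

    /** Appends one line to the result file, keeping earlier results. */
    void appendResult(String line) throws IOException {
        FileWriter out = new FileWriter(resultFile, true); // true = append mode
        try {
            out.write(line + System.getProperty("line.separator"));
        } finally {
            out.close();
        }
    }

    public static void main(String[] args) {
        ExperimentArgs parsed = new ExperimentArgs(args);
        System.out.println(parsed.prediction + " on " + parsed.dataDir + " -> " + parsed.resultFile);
    }
}
```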

NOTE: Weighted voting and tie breaking are not implemented for all of the approaches. If you select a method that is not implemented for an approach, the program terminates immediately.

NOTE: Some of the experiments take a long time. The code is not optimized for short runtimes, and the experiments can take several hours. If you want to speed them up, note that the slowest part is the training of the many base classifiers, which can easily be parallelized using threads and/or clusters.
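
One way to parallelize the training with threads is a fixed thread pool from java.util.concurrent, as sketched below. The sketch is not part of the provided code: the class name is hypothetical, and each Callable stands in for the training of one base classifier. It assumes that the training tasks are independent and that each task works on its own classifier object.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch: runs independent base-classifier training tasks on a thread pool. */
class ParallelTraining {

    /** Runs all tasks and returns their results in the order in which the tasks were given. */
    static <T> List<T> trainAll(List<Callable<T>> tasks, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<T> models = new ArrayList<T>();
            // invokeAll blocks until every task has finished and preserves the task order.
            for (Future<T> future : pool.invokeAll(tasks)) {
                models.add(future.get());
            }
            return models;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        } catch (ExecutionException e) {
            throw new RuntimeException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Callable<String>> tasks = new ArrayList<Callable<String>>();
        for (int i = 0; i < 4; i++) {
            final int id = i;
            tasks.add(new Callable<String>() {
                public String call() {
                    return "model-" + id; // stand-in for training one base classifier
                }
            });
        }
        System.out.println(trainAll(tasks, Runtime.getRuntime().availableProcessors()));
        // prints [model-0, model-1, model-2, model-3]
    }
}
```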

Terms of Use

The software provided by the authors on this site is freely available. For external software or data that may be included in the distributables like libraries or datasets, please contact the original authors for their terms of use. Nevertheless, we would be glad if you would cite this site or our paper if you use the provided software or data.
