Medical Concept Embeddings via Labeled Background Corpora

Resources of the publication: embeddings, software and external sources

This page contains the resources used in and resulting from

Eneldo Loza Mencía, Gerard de Melo and Jinseok Nam, Medical Concept Embeddings via Labeled Background Corpora, in: Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016), 2016 [bibtex]

AiTextML

The vector representations

Example Code

Will come soon!

Software

The embeddings were learned with AiTextML, a software package written by Jinseok Nam; see also the corresponding publication. The source code and installation instructions are available on the project site on GitHub.

Other Resources

Assessed Pairs of Medical Concepts

Around 500-600 pairs of medical concepts were assessed by human experts regarding their similarity (UMNSRS_similarity.csv) and relatedness (UMNSRS_relatedness.csv). These ratings are available as the Medical Residents Similarity and Relatedness Set datasets. In addition, the Medical Coders Set (MayoSRS.terms) provides 101 pairs. All datasets were made available by the University of Minnesota.
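
A common way to use such rated pairs is to correlate the human scores with the cosine similarity of the corresponding concept vectors. The following is a minimal sketch of that idea, not the evaluation code of the paper. The toy vectors, the toy scores and the (term1, term2, score) tuple layout are assumptions made for illustration; the actual column layout of UMNSRS_similarity.csv and MayoSRS.terms has to be checked in the downloaded files.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def evaluate(pairs, embeddings):
    """Correlate human scores with cosine similarities.

    pairs: iterable of (term1, term2, human_score) tuples (assumed layout).
    Pairs with a term missing from `embeddings` are skipped.
    Returns (spearman_rho, number_of_pairs_used).
    """
    human, model = [], []
    for term1, term2, score in pairs:
        if term1 in embeddings and term2 in embeddings:
            human.append(score)
            model.append(cosine(embeddings[term1], embeddings[term2]))
    return spearman(human, model), len(human)


# Toy data for illustration only; not taken from any of the datasets.
toy_embeddings = {
    "aspirin": [1.0, 0.0],
    "ibuprofen": [0.9, 0.1],
    "headache": [0.6, 0.4],
    "fracture": [0.0, 1.0],
}
toy_pairs = [
    ("aspirin", "ibuprofen", 4.0),
    ("aspirin", "headache", 3.0),
    ("aspirin", "fracture", 1.0),
    ("aspirin", "unknown_term", 2.0),  # skipped: no vector available
]

rho, n_used = evaluate(toy_pairs, toy_embeddings)
print(rho, n_used)
```

Reporting the number of pairs actually used alongside the correlation is helpful, because terms without a vector silently reduce the evaluated set.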

Embeddings trained from PubMed

Pretrained word embeddings, trained on abstracts and full documents from PubMed and on Wikipedia, were obtained from the Natural Language Processing Laboratory.

Web-Interface for computing path-based proximity

A web-interface to the UMLS::Similarity software package for obtaining similarity and relatedness measures between biomedical terms is available.

UMLS Ontology

The Unified Medical Language System ontology is available through a web interface or you can download it from the web site. However, you will need a (usually free) account. The UMLS ontology also includes a mapping to the MeSH ontology.

MeSH 2015 ontology

The concepts used are from the Medical Subject Headings (MeSH) ontology. You can download the descriptors from http://www.nlm.nih.gov/mesh/filelist.html. Please note that we used the 2015 version of MeSH in our experiments.
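
Once a descriptor file has been downloaded, the descriptor identifiers and names can be read with a few lines of Python. The sketch below assumes the XML distribution of the descriptors, with DescriptorRecord, DescriptorUI and DescriptorName/String elements; the inline sample is hand-written for illustration and heavily shortened, so the element names should be verified against the file you actually download.

```python
import xml.etree.ElementTree as ET

# Hand-written, shortened sample in the assumed MeSH descriptor XML layout.
SAMPLE_XML = """<DescriptorRecordSet>
  <DescriptorRecord>
    <DescriptorUI>D000001</DescriptorUI>
    <DescriptorName><String>Calcimycin</String></DescriptorName>
  </DescriptorRecord>
  <DescriptorRecord>
    <DescriptorUI>D000002</DescriptorUI>
    <DescriptorName><String>Temefos</String></DescriptorName>
  </DescriptorRecord>
</DescriptorRecordSet>"""


def read_descriptors(xml_text):
    """Return a dict mapping descriptor ID to descriptor name."""
    root = ET.fromstring(xml_text)
    descriptors = {}
    for record in root.iter("DescriptorRecord"):
        ui = record.findtext("DescriptorUI")
        name = record.findtext("DescriptorName/String")
        if ui and name:  # ignore incomplete records
            descriptors[ui] = name
    return descriptors


descriptors = read_descriptors(SAMPLE_XML)
for ui, name in sorted(descriptors.items()):
    print(ui, name)
```

For the full descriptor file, which is large, streaming with ET.iterparse on the file path instead of ET.fromstring keeps memory use low.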

BioASQ background corpus

The BioASQ dataset is a subset of the PubMed database of biomedical publications and can be downloaded from the competition site (Task 3a) after registration.

Terms of Use

The data provided by the authors on this site is freely available. For external software (including AiTextML) or data that may be included in the distributables, such as libraries or datasets, please contact the original authors for their terms of use. Nevertheless, we would be glad if you cited this site or our paper when you use the provided software or data.

 
