Summary
Objectives:
We propose an interlingua-based indexing approach to address the particular challenges
that arise in the design and implementation of cross-language document retrieval systems
for the medical domain.
Methods:
Documents, as well as queries, are mapped to a language-independent conceptual layer
on which retrieval operations are performed. We contrast this approach with the direct
translation of German queries to English ones which, subsequently, are matched against
English documents.
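The interlingua idea can be illustrated with a minimal sketch. It assumes a tiny hand-made lexicon that maps English and German terms to shared concept identifiers; the lexicon entries, the identifiers (such as C_HEART), and the function names to_concepts, score, and retrieve are hypothetical and are not the resources or the ranking method used in the study. Matching is done by simple concept overlap, chosen here only for brevity.

```python
from collections import Counter

# Hypothetical toy lexicon: surface terms (English or German) -> concept IDs.
# Entries and identifiers are illustrative assumptions only.
CONCEPTS = {
    "heart": "C_HEART", "herz": "C_HEART",
    "infarction": "C_INFARCT", "infarkt": "C_INFARCT",
    "kidney": "C_KIDNEY", "niere": "C_KIDNEY",
    "stone": "C_STONE", "stein": "C_STONE",
}

def to_concepts(text):
    """Map whitespace-separated tokens to concept IDs, dropping unknown tokens."""
    return Counter(CONCEPTS[t] for t in text.lower().split() if t in CONCEPTS)

def score(query, doc):
    """Count concept occurrences shared by query and document."""
    q, d = to_concepts(query), to_concepts(doc)
    return sum(min(q[c], d[c]) for c in q)

def retrieve(query, docs):
    """Return indices of documents with a nonzero score, best match first."""
    scored = [(score(query, d), i) for i, d in enumerate(docs)]
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [i for s, i in ranked if s > 0]

# English documents, German queries: both sides meet on the concept layer.
docs = ["acute infarction of the heart", "kidney stone treatment"]
print(retrieve("Herz Infarkt", docs))
print(retrieve("Niere Stein", docs))
```

Because both queries and documents are reduced to concept identifiers before matching, no query translation step is needed at retrieval time; in the direct-translation alternative, the German query would first be rendered in English and then matched against the English text.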
Results:
We evaluate both approaches, interlingua-based and direct translation, on a large
medical document collection, the OHSUMED corpus. Interlingua-based document retrieval
using German queries on English texts shows a substantial benefit, reaching
93% of the (monolingual) English baseline.
Conclusions:
Most state-of-the-art cross-language information retrieval systems translate user
queries to the language(s) of the target documents. In contrast to this approach,
translating both documents and user queries into a language-independent, concept-like
representation format proves more beneficial for cross-language retrieval performance.
Keywords
Information storage and retrieval - cross-language information retrieval - search
engine - OHSUMED