Methods Inf Med 2013; 52(05): 411-421
DOI: 10.3414/ME12-01-0101
Original Articles
Schattauer GmbH

Automated Selection of Relevant Information for Notification of Incident Cancer Cases within a Multisource Cancer Registry

V. Jouhet
1   Registre général des cancers de Poitou-Charentes, Faculté de médecine, Centre Hospitalier Universitaire de Poitiers, Université de Poitiers, Poitiers, France
2   CHU de Bordeaux, Pole de santé publique, Service d’information medicale, Bordeaux, France
G. Defossez
1   Registre général des cancers de Poitou-Charentes, Faculté de médecine, Centre Hospitalier Universitaire de Poitiers, Université de Poitiers, Poitiers, France
CRISAP, CoRIM
P. Ingrand
1   Registre général des cancers de Poitou-Charentes, Faculté de médecine, Centre Hospitalier Universitaire de Poitiers, Université de Poitiers, Poitiers, France
5   INSERM, CIC 802, Poitiers, France

Publication history

received: 25 October 2012

accepted: 27 March 2013

Publication date:
20 January 2018 (online)

Summary

Objective: The aim of this study was to develop and evaluate an algorithm that selects relevant records for the notification of incident cancer cases, based on the individual data available in a multi-source information system.

Methods: This work was conducted on data for the year 2008 in the general cancer registry of the Poitou-Charentes region (France). The selection algorithm ranks information according to its level of relevance, independently for tumour topography and tumour morphology. The selected data are combined to form composite records. These records are then grouped in accordance with the notification rules of the International Agency for Research on Cancer for multiple primary cancers. The evaluation, based on recall, precision and F-measure, compared cases validated manually by the registry’s physicians with tumours notified with and without record selection.
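The selection and combination steps described in the Methods can be sketched roughly as follows. This is a minimal illustration only, not the registry's implementation: the relevance ranking of the sources, the `Record` structure, the function names and the example codes are all assumptions made here, since the abstract does not specify them. The grouping step based on the International Agency for Research on Cancer rules is not covered.

```python
from dataclasses import dataclass
from typing import Optional

# ASSUMPTION: hypothetical relevance ranks (lower = more relevant).
# The abstract does not state the ordering actually used by the registry.
TOPOGRAPHY_RANK = {"pathology": 0, "cancer_centre": 1,
                   "hospital_discharge": 2, "insurance": 3}
MORPHOLOGY_RANK = {"pathology": 0, "cancer_centre": 1,
                   "hospital_discharge": 2, "insurance": 3}


@dataclass(frozen=True)
class Record:
    """One source record for a patient (hypothetical structure)."""
    source: str
    topography: Optional[str]
    morphology: Optional[str]


def select_most_relevant(records, field, rank):
    """Return the values of `field` found in the best-ranked source that has any."""
    candidates = [r for r in records if getattr(r, field) is not None]
    if not candidates:
        return set()
    best = min(rank[r.source] for r in candidates)
    return {getattr(r, field) for r in candidates if rank[r.source] == best}


def composite_records(records):
    """Select topography and morphology independently, then combine them."""
    topos = select_most_relevant(records, "topography", TOPOGRAPHY_RANK)
    morphs = select_most_relevant(records, "morphology", MORPHOLOGY_RANK)
    # A missing dimension is kept as None rather than dropping the record.
    return {(t, m) for t in (topos or {None}) for m in (morphs or {None})}


# Illustrative (invented) records for a single patient.
patient = [
    Record("hospital_discharge", "C50.9", None),
    Record("pathology", "C50.4", "8500/3"),
]
print(composite_records(patient))
```

In this toy case the less specific hospital discharge topography is discarded because a better-ranked source provides one, which is the kind of noise reduction the selection step is meant to achieve.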

Results: The analysis covered 12,346 validated tumours in 11,971 individuals. The data used were hospital discharge data (104,474 records), pathology data (21,851 records), healthcare insurance data (7,508 records) and cancer care centre data (686 records). The selection algorithm improved notification performance for tumour topography (F-measure 0.926 with vs. 0.857 without selection) and tumour morphology (F-measure 0.805 with vs. 0.750 without selection).
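For readers unfamiliar with the metrics reported above, the following sketch shows how precision, recall and F-measure could be computed by comparing notified tumours with manually validated ones. It assumes the standard balanced F-measure (harmonic mean of precision and recall) and represents each tumour as a hypothetical (patient, topography) pair; the data are invented for illustration and are not taken from the study.

```python
def precision_recall_f(validated: set, notified: set):
    """Compare notified tumours against the manually validated reference set."""
    true_positives = len(validated & notified)
    precision = true_positives / len(notified) if notified else 0.0
    recall = true_positives / len(validated) if validated else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # Balanced F-measure: harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Invented toy data: (patient id, topography code) pairs.
validated = {("p1", "C50"), ("p2", "C18"), ("p3", "C34")}
notified = {("p1", "C50"), ("p2", "C18"), ("p2", "C20"), ("p4", "C61")}

precision, recall, f_measure = precision_recall_f(validated, notified)
print(f"precision={precision:.3f} recall={recall:.3f} F={f_measure:.3f}")
```

Spurious notifications, such as the extra codes for p2 and p4 here, lower precision, while missed validated tumours, such as p3, lower recall.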

Conclusion: These results show that selecting information according to its origin is effective in reducing the noise generated by imprecise coding. Further research is needed to solve the semantic problems related to the integration of heterogeneous data and the use of unstructured information.