Methods Inf Med 2013; 52(05): 411-421
DOI: 10.3414/ME12-01-0101
Original Articles
Schattauer GmbH

Automated Selection of Relevant Information for Notification of Incident Cancer Cases within a Multisource Cancer Registry

V. Jouhet
1   Registre général des cancers de Poitou-Charentes, Faculté de médecine, Centre Hospitalier Universitaire de Poitiers, Université de Poitiers, Poitiers, France
2   CHU de Bordeaux, Pole de santé publique, Service d’information medicale, Bordeaux, France
G. Defossez
1   Registre général des cancers de Poitou-Charentes, Faculté de médecine, Centre Hospitalier Universitaire de Poitiers, Université de Poitiers, Poitiers, France
P. Ingrand
1   Registre général des cancers de Poitou-Charentes, Faculté de médecine, Centre Hospitalier Universitaire de Poitiers, Université de Poitiers, Poitiers, France
5   INSERM, CIC 802, Poitiers, France

Publication History

received: 25 October 2012

accepted: 27 March 2013

Publication Date:
20 January 2018 (online)


Objective: The aim of this study was to develop and evaluate an algorithm for selecting relevant records for the notification of incident cancer cases, based on the individual data available in a multi-source information system.

Methods: This work was conducted on data for the year 2008 in the general cancer registry of the Poitou-Charentes region (France). The selection algorithm ranks information according to its level of relevance, independently for tumour topography and tumour morphology. The selected data are combined to form composite records. These records are then grouped in accordance with the notification rules of the International Agency for Research on Cancer for multiple primary cancers. The evaluation, based on recall, precision and F-measure, compared cases validated manually by the registry’s physicians with tumours notified with and without record selection.
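The selection step described above can be illustrated with a minimal sketch. It assumes that each data source carries a relevance rank and that the most relevant non-missing code is kept, separately for topography and morphology. The rank values, the `SourceRecord` type and the function names are hypothetical; the abstract does not state the actual ordering or implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical relevance ranks (lower = more relevant). The abstract does not
# give the real ordering; two separate tables are used only to show that
# topography and morphology are ranked independently.
TOPOGRAPHY_RANK: Dict[str, int] = {
    "pathology": 0, "cancer_care_centre": 1,
    "hospital_discharge": 2, "insurance": 3,
}
MORPHOLOGY_RANK: Dict[str, int] = {
    "pathology": 0, "cancer_care_centre": 1,
    "hospital_discharge": 2, "insurance": 3,
}


@dataclass(frozen=True)
class SourceRecord:
    source: str                 # name of the originating data source
    topography: Optional[str]   # ICD-O topography code, if coded
    morphology: Optional[str]   # ICD-O morphology code, if coded


def select_code(records: List[SourceRecord], field: str,
                rank: Dict[str, int]) -> Optional[str]:
    """Return the code for `field` from the most relevant source that has one."""
    candidates = [r for r in records if getattr(r, field) is not None]
    if not candidates:
        return None
    best = min(candidates, key=lambda r: rank[r.source])
    return getattr(best, field)


def composite_record(records: List[SourceRecord]) -> Tuple[Optional[str], Optional[str]]:
    """Combine independently selected topography and morphology codes."""
    return (select_code(records, "topography", TOPOGRAPHY_RANK),
            select_code(records, "morphology", MORPHOLOGY_RANK))


# Illustrative records for one patient (codes chosen for the example only).
patient = [
    SourceRecord("hospital_discharge", "C50.9", None),
    SourceRecord("pathology", "C50.4", "8500/3"),
]
print(composite_record(patient))  # ('C50.4', '8500/3')
```

Selecting each axis on its own means a composite record can take its topography from one source and its morphology from another when the most relevant source is missing one of the two codes.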

Results: The analysis involved 12,346 validated tumours among 11,971 individuals. The data used were hospital discharge data (104,474 records), pathology data (21,851 records), healthcare insurance data (7,508 records) and cancer care centre data (686 records). The selection algorithm improved notification performance for tumour topography (F-measure 0.926 with vs. 0.857 without selection) and tumour morphology (F-measure 0.805 with vs. 0.750 without selection).
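For readers unfamiliar with the reported metric, the sketch below shows how precision, recall and the balanced F-measure (the harmonic mean of precision and recall) are commonly computed from counts of correct, spurious and missed notifications. The counts used here are made up for illustration and are not taken from the study; whether the authors used exactly this balanced form is an assumption.

```python
from typing import Tuple


def precision_recall_f(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Compute precision, recall and balanced F-measure from raw counts.

    tp: notified tumours that match a manually validated case
    fp: notified tumours with no matching validated case
    fn: validated cases that were not notified
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical counts, for illustration only.
p, r, f = precision_recall_f(tp=8, fp=2, fn=2)
print(f"precision={p:.3f} recall={r:.3f} F={f:.3f}")
```

Because the F-measure is a harmonic mean, it stays low unless both precision and recall are high, which makes it a reasonable single figure for comparing notification with and without record selection.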

Conclusion: These results show that selecting information according to its origin is effective in reducing the noise generated by imprecise coding. Further research is needed to solve the semantic problems related to integrating heterogeneous data and using unstructured information.
