Methods Inf Med 1996; 35(04/05): 309-316
DOI: 10.1055/s-0038-1634680
Original Article
Schattauer GmbH

Machine Learning of Motor Vehicle Accident Categories from Narrative Data

M. R. Lehto
1   School of Industrial Engineering, Purdue University, West Lafayette, IN, USA
G. S. Sorock
2   Liberty Mutual Research Center for Safety and Health, Hopkinton, MA, USA

Publication History

Publication Date: 20 February 2018 (online)

Abstract:

Bayesian inference was evaluated as a machine learning technique for identifying pre-crash activity and crash type from accident narratives describing 3,686 motor vehicle crashes. It was hypothesized that a Bayesian model could learn from a computer search for 63 keywords related to accident categories. Learning was defined as the ability to accurately classify narratives that could not previously be classified because they did not contain the original keywords. When narratives contained keywords, the results obtained using both the Bayesian model and the keyword search corresponded closely to expert ratings (P(detection) ≥ 0.9 and P(false positive) ≤ 0.05). For narratives not containing keywords, as the threshold used by the Bayesian model was raised from p > 0.5 to p > 0.9, the overall probability of detecting a category assigned by the expert fell from 67% to 12%, and the false-positive rate fell correspondingly from 32% to 3%. These latter results demonstrated that the Bayesian system learned from the results of the keyword searches.
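The approach summarized in the abstract can be illustrated with a minimal sketch. The abstract does not specify the form of the Bayesian model, so this example assumes a naive Bayes classifier with add-one smoothing. The two keywords, the category names, the toy narratives, and all function names are hypothetical and invented for illustration; they are not the 63 keywords or the data used in the study. Keyword search supplies the training labels, and the model then scores a narrative that contains no keyword, assigning a category only when the posterior exceeds a threshold.

```python
import math
from collections import Counter, defaultdict

# Hypothetical keyword-to-category map (the study's 63 keywords are not listed in the abstract).
KEYWORDS = {"rear-ended": "rear_end", "backing": "backing"}

def tokenize(text):
    return text.lower().split()

def keyword_label(text):
    """Return the category of the first keyword found in the narrative, else None."""
    tokens = set(tokenize(text))
    for keyword, category in KEYWORDS.items():
        if keyword in tokens:
            return category
    return None

def train(narratives):
    """Count words per category, using keyword-search results as training labels."""
    word_counts = defaultdict(Counter)
    cat_counts = Counter()
    for text in narratives:
        category = keyword_label(text)
        if category is None:
            continue  # narratives without keywords give no training signal
        cat_counts[category] += 1
        # Assumption: keywords are excluded so the model learns from co-occurring words.
        word_counts[category].update(t for t in tokenize(text) if t not in KEYWORDS)
    return word_counts, cat_counts

def posterior(text, word_counts, cat_counts):
    """Naive Bayes posterior over categories with add-one smoothing."""
    vocab = {w for counts in word_counts.values() for w in counts}
    total = sum(cat_counts.values())
    log_scores = {}
    for category, n in cat_counts.items():
        denom = sum(word_counts[category].values()) + len(vocab)
        score = math.log(n / total)
        for token in tokenize(text):
            if token in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[category][token] + 1) / denom)
        log_scores[category] = score
    peak = max(log_scores.values())
    weights = {c: math.exp(s - peak) for c, s in log_scores.items()}
    norm = sum(weights.values())
    return {c: w / norm for c, w in weights.items()}

def classify(text, model, threshold=0.5):
    """Return the most probable category if its posterior exceeds the threshold, else None."""
    post = posterior(text, *model)
    category, p = max(post.items(), key=lambda kv: kv[1])
    return category if p > threshold else None

# Invented toy narratives, for illustration only.
narratives = [
    "claimant rear-ended insured stopped at light",
    "insured rear-ended while stopped in traffic",
    "insured backing out of driveway struck parked car",
    "claimant backing from parking space hit insured",
]
model = train(narratives)
unseen = "other vehicle hit insured stopped at light"  # contains no keyword
print(keyword_label(unseen), classify(unseen, model, threshold=0.5))
```

Raising the `threshold` argument makes the classifier more conservative: fewer narratives receive a category, which lowers both the detection rate and the false-positive rate, the same trade-off the abstract reports between p > 0.5 and p > 0.9.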

 