Summary
Objectives
: This paper discusses and develops a document image recognition, keyword extraction
and automatic XML generation system for searching analogous cases in paper-based documents.
We propose a document structure recognition method and an automatic XML generation
method for tabular-form discharge summary documents, and we develop a prototype system
based on these methods. Evaluation experiments using actual documents are conducted to
assess the effectiveness of the developed system.
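As an illustration of the search side of such a system, the following Python sketch ranks XML discharge summaries by how many query keywords each one contains. The `<summary>`/`<item name=...>` schema, the case IDs and the function names are hypothetical, and plain substring counting stands in for whatever retrieval the developed system actually uses.

```python
import xml.etree.ElementTree as ET


def extract_text(xml_string: str) -> str:
    """Concatenate all text content of one XML summary document."""
    root = ET.fromstring(xml_string)
    return " ".join(t.strip() for t in root.itertext() if t.strip())


def search_analogous_cases(docs: dict[str, str], keywords: list[str]) -> list[tuple[str, int]]:
    """Rank documents by how many distinct query keywords they contain.

    `docs` maps a (hypothetical) case ID to its XML string. Documents that
    match no keyword are omitted; ties are broken by case ID.
    """
    results = []
    for case_id, xml_string in docs.items():
        text = extract_text(xml_string).lower()
        hits = sum(1 for kw in keywords if kw.lower() in text)
        if hits:
            results.append((case_id, hits))
    results.sort(key=lambda r: (-r[1], r[0]))
    return results


if __name__ == "__main__":
    # Hypothetical example documents using the assumed schema.
    docs = {
        "case-001": "<summary><item name='Diagnosis'>pneumonia</item>"
                    "<item name='Treatment'>antibiotics</item></summary>",
        "case-002": "<summary><item name='Diagnosis'>gastric cancer</item>"
                    "<item name='Treatment'>gastrectomy</item></summary>",
        "case-003": "<summary><item name='Diagnosis'>aspiration pneumonia</item>"
                    "<item name='Treatment'>antibiotics, rehabilitation</item></summary>",
    }
    print(search_analogous_cases(docs, ["pneumonia", "rehabilitation"]))
    # prints [('case-003', 2), ('case-001', 1)]
```

Substring matching needs no word segmentation, so it also works for languages written without spaces, such as Japanese; a production search would more likely use an index over the extracted keywords.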
Methods
: The developed system works as follows. First, paper-based summary documents are
scanned at 300 dpi. Pre-processing then reduces noise and corrects the tilt (skew) of the
image, and the table structure is identified. Characters in the table cells are recognized
and converted to text data by an OCR engine. Finally, XML documents are automatically
generated from the recognition results.
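The table-identification and XML-generation steps can be sketched as follows, assuming a binarised, already deskewed image given as rows of 0/1 pixels (1 = black). Ruling lines are located with horizontal and vertical projection profiles, a common technique that is not necessarily the one used in the developed system. The 0.8 black-pixel ratio, the two-column (item name, value) layout and the `recognize` callback standing in for the OCR engine are all hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Callable

Box = tuple[int, int, int, int]  # (top, bottom, left, right); bottom/right exclusive


def find_rulings(profile: list[int], length: int, ratio: float = 0.8) -> list[int]:
    """Return centre positions of ruling lines from a projection profile.

    A row/column is part of a ruling line when at least `ratio` (hypothetical
    threshold) of its pixels are black; adjacent such rows/columns are merged.
    """
    lines, run = [], []
    for i, count in enumerate(profile):
        if count >= ratio * length:
            run.append(i)
        elif run:
            lines.append(sum(run) // len(run))
            run = []
    if run:
        lines.append(sum(run) // len(run))
    return lines


def detect_cells(image: list[list[int]]) -> list[list[Box]]:
    """Split a binarised, deskewed table image into a grid of cell boxes."""
    height, width = len(image), len(image[0])
    row_profile = [sum(row) for row in image]  # black pixels per row
    col_profile = [sum(image[y][x] for y in range(height)) for x in range(width)]
    ys = find_rulings(row_profile, width)
    xs = find_rulings(col_profile, height)
    # Each cell lies strictly between two consecutive ruling lines.
    return [[(ys[r] + 1, ys[r + 1], xs[c] + 1, xs[c + 1]) for c in range(len(xs) - 1)]
            for r in range(len(ys) - 1)]


def cells_to_xml(cells: list[list[Box]], recognize: Callable[[Box], str]) -> str:
    """Emit XML, assuming (hypothetically) that the first cell of each row is the
    item name and the remaining cells hold its value. `recognize` stands in for
    the OCR engine and returns the text found in one cell box."""
    root = ET.Element("summary")
    for row in cells:
        texts = [recognize(box) for box in row]
        item = ET.SubElement(root, "item", name=texts[0])
        item.text = " ".join(texts[1:])
    return ET.tostring(root, encoding="unicode")
```

For a 7 x 7 test image with ruling lines at rows and columns 0, 3 and 6, `detect_cells` yields a 2 x 2 grid of boxes, and `cells_to_xml` turns it into one `<item>` per table row. Merged or spanning cells, broken ruling lines and residual skew are not handled by this sketch.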
Results
: The experiments used patient discharge summary documents archived at Mie University
Hospital. The results show that XML documents can be generated automatically when
standard tabular-form documents are input into the developed system. In this experiment,
generating one XML document took about 20 seconds on a general-purpose personal computer.
The developed system is also compared with a commercial product to assess its
effectiveness. The experimental results further show that table structure recognition is
highly accurate and suitable for practical use.
Conclusions
: This paper demonstrated the effectiveness of the proposed method for recognizing
tabular-form document images and generating XML documents from them.
Keywords
Document structure recognition - analogous case search - tabular form documents -
patient discharges - computer-assisted image processing