Digital Library

Title:      EMPIRICAL EVALUATION OF CRF-BASED BIBLIOGRAPHY EXTRACTION FROM RESEARCH PAPERS
Author(s):      Manabu Ohta, Ryohei Inoue, Atsuhiro Takasu
ISBN:      978-972-8939-68-7
Editors:      Miguel Baptista Nunes, Pedro Isaías and Philip Powell
Year:      2012
Edition:      Single
Keywords:      Bibliography Extraction, Conditional Random Fields (CRF), Error Detection, OCR, Digital Library
Type:      Full Paper
First Page:      18
Last Page:      26
Language:      English
Paper Abstract:      We proposed an automatic bibliography extraction method for research papers scanned with OCR markup. The method uses conditional random fields (CRF) to label serially OCRed text lines in the article title page as appropriate bibliographic element names. Although we achieved good extraction accuracies for some Japanese academic journals, extraction errors are inevitable. Therefore, this paper proposes three confidence measures for bibliography labeling to detect such extraction errors. This paper also reports an empirical evaluation of CRF-based page analysis for research papers on the basis not only of labeling accuracy but also of labeling error detection. We applied the three confidence measures to labeling three academic journals published in Japan. The experiments showed that the proposed confidence measures reasonably indicated the labeling accuracies and could be used for error detection.
   
