Digital Library

Title:      WPPS: A NOVEL AND COMPREHENSIVE FRAMEWORK FOR WEB PAGE UNDERSTANDING AND INFORMATION EXTRACTION
Author(s):      Ruslan R. Fayzrakhmanov
ISBN:      978-989-8533-09-8
Editors:      Bebo White and Pedro Isaías
Year:      2012
Edition:      Single
Keywords:      Web information extraction, web page understanding, ontological models, object oriented paradigm, declarative approach, bridged adapter
Type:      Full Paper
First Page:      19
Last Page:      26
Language:      English
Paper Abstract:      In this paper, we present WPPS, a new, highly configurable Java-based framework for developing efficient and robust methods for web page understanding and information extraction. We also introduce the representation of a web page as a unified ontological model (UOM) that describes its different aspects, such as layout, visual features, interface, DOM tree, and logical structure, together with their features and relations. The API provided for developing new methods makes it possible to combine a declarative approach, represented by a set of inference rules and SPARQL queries, with an object-oriented approach. The latter is realised by providing the level of abstraction needed to work with ontological concepts as Java classes. This abstraction is implemented via the “bridged adapter” software design pattern, which is introduced in this paper. We illustrate the framework with an example scenario involving a web page navigation menu. The framework and the UOM have demonstrated their efficiency in the ABBA and TAMCROW projects.
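The idea of working with ontological concepts as Java classes can be illustrated with a minimal sketch. It is not the WPPS API and does not reproduce the "bridged adapter" pattern as defined in the paper. Every name in it (`Triple`, `TripleStore`, `NavigationMenu`, `UomSketch`, and the `type`, `hasItem`, and `label` predicates) is hypothetical, and a small in-memory list of triples stands in for the ontological model and for SPARQL-style pattern matching.

```java
import java.util.ArrayList;
import java.util.List;

/** One RDF-like statement. Plain strings stand in for ontology resources (assumption). */
final class Triple {
    final String subject;
    final String predicate;
    final String object;

    Triple(String subject, String predicate, String object) {
        this.subject = subject;
        this.predicate = predicate;
        this.object = object;
    }
}

/** Hypothetical in-memory stand-in for an ontological model of a web page. */
final class TripleStore {
    private final List<Triple> triples = new ArrayList<>();

    void add(String subject, String predicate, String object) {
        triples.add(new Triple(subject, predicate, object));
    }

    /** Declarative-style lookup: objects of all triples matching (subject, predicate, ?). */
    List<String> objects(String subject, String predicate) {
        List<String> result = new ArrayList<>();
        for (Triple t : triples) {
            if (t.subject.equals(subject) && t.predicate.equals(predicate)) {
                result.add(t.object);
            }
        }
        return result;
    }

    /** Declarative-style lookup: subjects of all triples matching (?, predicate, object). */
    List<String> subjects(String predicate, String object) {
        List<String> result = new ArrayList<>();
        for (Triple t : triples) {
            if (t.predicate.equals(predicate) && t.object.equals(object)) {
                result.add(t.subject);
            }
        }
        return result;
    }
}

/** Object-oriented view of an individual typed as a navigation menu (hypothetical concept). */
final class NavigationMenu {
    private final TripleStore store;
    private final String id;

    NavigationMenu(TripleStore store, String id) {
        this.store = store;
        this.id = id;
    }

    /** Follows hasItem, then label, hiding the triple-level access from the caller. */
    List<String> itemLabels() {
        List<String> labels = new ArrayList<>();
        for (String item : store.objects(id, "hasItem")) {
            labels.addAll(store.objects(item, "label"));
        }
        return labels;
    }
}

final class UomSketch {
    /** Builds a tiny illustrative model: one menu with two labelled items. */
    static TripleStore sampleStore() {
        TripleStore store = new TripleStore();
        store.add("menu1", "type", "NavigationMenu");
        store.add("menu1", "hasItem", "item1");
        store.add("menu1", "hasItem", "item2");
        store.add("item1", "label", "Home");
        store.add("item2", "label", "About");
        return store;
    }

    public static void main(String[] args) {
        TripleStore store = sampleStore();
        // Declarative step: select every individual typed as a navigation menu.
        for (String id : store.subjects("type", "NavigationMenu")) {
            // Object-oriented step: work with the individual through a Java class.
            System.out.println(id + " -> " + new NavigationMenu(store, id).itemLabels());
        }
    }
}
```

In this sketch the pattern lookup selects individuals and the wrapper class gives them typed methods. A real implementation would more plausibly run SPARQL queries against an RDF or OWL model than scan a list.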
   
