Title:      AN APPROACH FOR EXTRACTING WEB FORM LABELS BASED ON DISTANCE ANALYSIS OF HTML COMPONENTS
Author(s):      Leonardo Bres dos Santos, Carina F. Dorneles, Ronaldo dos S. Mello
ISBN:      978-989-8533-09-8
Editors:      Bebo White and Pedro Isaías
Year:      2012
Edition:      Single
Keywords:      Web Forms, Web Data, Deep Web, Information Extraction, Label Extraction.
Type:      Full Paper
First Page:      27
Last Page:      34
Language:      English
Paper Abstract:      The volume of the Deep Web, or hidden databases, continues to grow, as does interest in discovering and extracting the data and schemata of hidden Web databases. This interest is driven by applications that aim to provide unified search over several Web forms or over the hidden content of Web databases. In this context, this paper presents an approach for detecting Web forms and extracting their labels. To detect a Web form, we propose an algorithm that analyzes HTML tags and determines whether a Web page contains a form. We also developed a label extraction algorithm that uses the distance between a form field and the page labels to determine which label belongs to which field. Preliminary experiments demonstrate the effectiveness of the developed algorithms.
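
The abstract does not say how the distance between a field and a label is measured, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes that "distance" can be approximated by position in the HTML parse stream. The names `FormScanner` and `extract_labels` are hypothetical and chosen for illustration.

```python
from html.parser import HTMLParser

FIELD_TAGS = {"input", "select", "textarea"}
# Assumption: these input types never carry a visible label of their own.
UNLABELED_TYPES = {"hidden", "submit", "button"}


class FormScanner(HTMLParser):
    """Records form fields and text snippets with their parse-stream position."""

    def __init__(self):
        super().__init__()
        self.position = 0      # counter advanced on every start tag and text node
        self.in_form = False
        self.has_form = False  # form detection: True once a <form> tag is seen
        self.fields = []       # (position, field name)
        self.texts = []        # (position, candidate label text)

    def handle_starttag(self, tag, attrs):
        self.position += 1
        if tag == "form":
            self.in_form = True
            self.has_form = True
        elif tag in FIELD_TAGS and self.in_form:
            attributes = dict(attrs)
            if attributes.get("type") not in UNLABELED_TYPES:
                self.fields.append((self.position, attributes.get("name", "")))

    def handle_endtag(self, tag):
        if tag == "form":
            self.in_form = False

    def handle_data(self, data):
        text = data.strip()
        if text and self.in_form:
            self.position += 1
            self.texts.append((self.position, text))


def extract_labels(html):
    """Return {field name: nearest text}, or None if the page has no form."""
    scanner = FormScanner()
    scanner.feed(html)
    scanner.close()
    if not scanner.has_form:
        return None
    labels = {}
    for field_position, name in scanner.fields:
        if scanner.texts:
            # Pick the text node with the smallest positional distance to the field.
            nearest = min(scanner.texts, key=lambda t: abs(t[0] - field_position))
            labels[name] = nearest[1]
    return labels


page = (
    '<form><p>Title: <input type="text" name="title"></p>'
    '<p>Author: <input type="text" name="author"></p>'
    '<input type="submit" value="Go"></form>'
)
result = extract_labels(page)
```

A distance computed on rendered page coordinates would be a natural alternative to the parse-order distance assumed here. The matching step would stay the same, with only the distance function changing.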
   
