Digital Library

Title:      AUTOMATIC IMPROVEMENT OF TERMS USED IN FOCUSED CRAWLING PROCESSES ON WEB PAGES
Author(s):      Gilson Faria Costa, Guilherme Tavares de Assis and Marcos Vinicius Oliveira Souza
ISBN:      978-989-8533-69-2
Editors:      Pedro Isaías and Hans Weghorn
Year:      2017
Edition:      Single
Keywords:      Automatic Improvement of Terms, Web Crawling, Focused Crawling
Type:      Full Paper
First Page:      71
Last Page:      78
Language:      English
Paper Abstract:      The great popularity and, especially, the rapid growth of the Web have led to the proposal and analysis of new techniques that help users locate the information they need effectively, in a satisfactory time, and without much difficulty. Traditional crawlers cannot identify relevant subspaces of the Web related to a specific theme; focused crawlers, however, can solve this problem effectively and efficiently. A focused crawling process usually requires, as an input parameter, a well-defined set of terms that expresses the desired topic of interest; depending on this set of terms, the effectiveness of the crawling process may not be satisfactory. In this work, we propose two strategies for automatically improving the set of terms needed to perform focused crawling processes based on a genre-aware approach. In our experiments, these strategies improved precision and F1 by up to 88.9% and 32.1%, respectively, in crawling processes that took poorly defined sets of terms as input.
   
