Digital Library

Title:      TRASH ARTICLE DETECTION USING CATEGORIZATION TECHNIQUES
Author(s):      Christos Bouras, Vassilis Poulopoulos, George Tsichritzis
ISBN:      978-972-8924-97-3
Editors:      Hans Weghorn and Pedro Isaías
Year:      2009
Edition:      V I, 2
Keywords:      Trash articles, categorization, news articles, trash detection
Type:      Full Paper
First Page:      51
Last Page:      58
Language:      English
Paper Abstract:      We explore techniques for detecting news articles that contain invalid information, with the help of text categorization technology. The volume of information on the World Wide Web is large enough to distract users who are trying to find useful information. To cope with this volume of data, many text categorization methodologies have been presented. One major problem is that many articles fetched by a crawler, stored in a back-end database, and finally given as input to a categorization subsystem may not contain valid information for the user (trashy articles). This may cause users to lose trust in the system. In this paper, we analyze the special properties of the categorization of trashy news articles that allow us to detect them, and we propose a specific methodology for trash detection. Finally, we evaluate the proposed algorithm on a news categorization system and show the overall benefit of a trash detection mechanism to the system.
   
