Digital Library

Title:      DEVELOPMENT OF ALGORITHMS FOR WEB SPAM DETECTION BASED ON STRUCTURE AND LINK ANALYSIS
Author(s):      Michael Hilberer
ISSN:      1645-7641
Editors:      Pedro Isaías
Year:      2005
Edition:      V III, 2
Keywords:      Search Engine, Web Spam, Graph Theory, Link Structure, Link Analysis, Data Mining, SEO, Statistical Learning.
Type:      Journal Paper
First Page:      11
Last Page:      24
Language:      English
Paper Abstract:      Among the biggest threats to current search engines are spammers who manipulate ranking algorithms to boost the position of their pages on search engine result pages. Various proposals address this problem, mostly by altering the ranking criteria or algorithms, or even by manual human intervention [27]. Most of these solutions, however, dramatically increase the required processing resources and have to be applied after the already complex ranking procedures. Spam detection algorithms are critical to the business of search engines: they allow providers to detect weaknesses in their ranking algorithms and to improve their search results. Search engines generally do not publish their spam detection and ranking algorithms, but judging from their result pages, spam detection and elimination still need to improve. It appears that some search engines either do not use such techniques at all or rely on filters to ban web spam after building their index. We believe that analyzing the link structure of the web makes it possible to detect web spam more efficiently. We use directed graphs, in which every webpage is a vertex and every link a directed edge, to identify common attributes and characteristics of such link structures. Based on our empirical research, we believe that our algorithms can distinguish between the “naturally grown” web and clusters of inflated webpages used to manipulate search engine results by exploiting ranking algorithms based on link popularity.
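
The graph model named in the abstract (every webpage a vertex, every link a directed edge) can be illustrated with a minimal Python sketch. The abstract does not describe the paper's actual algorithms, so everything here is an assumption made for illustration: the function names `build_graph` and `link_features`, and the choice of in-degree, out-degree, and a simple "reciprocity" ratio as per-page attributes. Densely reciprocal linking is only one plausible signal of an inflated cluster, not necessarily the one the author uses.

```python
from collections import defaultdict


def build_graph(links):
    """Build a directed graph from (source, target) page pairs.

    Returns two mappings: page -> set of pages it links to,
    and page -> set of pages linking to it.
    """
    out_links = defaultdict(set)
    in_links = defaultdict(set)
    for src, dst in links:
        if src == dst:
            continue  # assumption: self-links carry no structural signal
        out_links[src].add(dst)
        in_links[dst].add(src)
    return out_links, in_links


def link_features(links):
    """Compute hypothetical per-page link attributes (illustrative only)."""
    out_links, in_links = build_graph(links)
    pages = set(out_links) | set(in_links)
    features = {}
    for page in pages:
        outs = out_links.get(page, set())
        ins = in_links.get(page, set())
        # Share of inbound links that the page links back to.
        reciprocity = len(outs & ins) / len(ins) if ins else 0.0
        features[page] = {
            "in_degree": len(ins),
            "out_degree": len(outs),
            "reciprocity": reciprocity,
        }
    return features


if __name__ == "__main__":
    # Pages a, b, c link to each other pairwise; d links only to a.
    example = [("a", "b"), ("b", "a"), ("a", "c"), ("c", "a"),
               ("b", "c"), ("c", "b"), ("d", "a")]
    for page, attrs in sorted(link_features(example).items()):
        print(page, attrs)
```

In this toy input, pages b and c link back to every page that links to them, while d receives no links at all. Any threshold for flagging such a cluster would have to be derived from real crawl data rather than from this sketch.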
   
