Type of Publication
Book Chapter
Date:
12/2015
Featured in:
Monfort, V., Krempels, KH. (eds) Web Information Systems and Technologies
Authors:
Nuno Gonçalves and António Videira
Automatic classification of webpages has many applications in industry, including digital marketing, search engines and content filtering. Traditionally, this classification has relied only on the textual information of webpages, which includes the HTML code, tags, title and, more recently, the URL. The aim of this paper is to show that, for some subjective variables that are nevertheless very important to these applications, the visual appearance of webpages as rendered by the browser carries very rich information for the classification task. The variables studied are aesthetic value (whether a page is beautiful or ugly) and design recency (whether a page looks old-fashioned or modern). We show that automatic classifiers relying only on the visual look and feel can achieve very high accuracy. By using several low-level and mid-level features and studying several criteria for selection and classification, our classifiers improve on the state of the art. Finally, we applied this framework to classify webpages by topic (content-aware) and to determine whether a page is a blog or not (function-aware).
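As a rough illustration of what classifying a page from a low-level visual feature can look like, the sketch below computes a coarse RGB colour histogram from the pixels of a rendered page and assigns the label of the nearest class centroid. It is not the method of the chapter: the choice of feature, the nearest-centroid classifier, the function names (`color_histogram`, `train`, `classify`) and the toy pixel data are all assumptions made for the example, and the toy labels carry no claim about what modern or old-fashioned pages actually look like.

```python
from collections import Counter
from math import dist


def color_histogram(pixels, bins=4):
    """Normalized coarse RGB histogram (an assumed low-level feature).

    `pixels` is a non-empty list of (r, g, b) tuples with values in 0..255.
    """
    step = 256 // bins
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = len(pixels)
    # Flatten to a fixed-length vector with bins**3 entries.
    return [
        counts[(i, j, k)] / total
        for i in range(bins)
        for j in range(bins)
        for k in range(bins)
    ]


def centroid(vectors):
    """Component-wise mean of equally sized feature vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def train(labelled_pages):
    """Map each label to the centroid of its pages' histograms."""
    by_label = {}
    for pixels, label in labelled_pages:
        by_label.setdefault(label, []).append(color_histogram(pixels))
    return {label: centroid(vecs) for label, vecs in by_label.items()}


def classify(model, pixels):
    """Return the label whose centroid is closest in Euclidean distance."""
    features = color_histogram(pixels)
    return min(model, key=lambda label: dist(model[label], features))


# Invented toy "screenshots": flat pixel lists, purely for illustration.
page_a = [(240, 240, 240)] * 90 + [(30, 30, 30)] * 10
page_b = [(255, 0, 255)] * 50 + [(0, 255, 0)] * 50
model = train([(page_a, "modern"), (page_b, "old-fashioned")])

query = [(235, 235, 235)] * 80 + [(20, 20, 20)] * 20
print(classify(model, query))
```

In practice the pixel lists would come from browser screenshots, and a stronger classifier with richer features would replace the nearest-centroid rule; the sketch only shows the overall shape of a pipeline that uses no textual information.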
Institute of Systems and Robotics, Department of Electrical and Computers Engineering, University of Coimbra