Type of Publication

Conference Papers

Date:

4/2014

Status

Published

Automatic Web Page Classification using Visual Content

Featured in:

10th International Conference on Web Information Systems and Technologies, Barcelona, Spain

Authors:

António Videira and Nuno Gonçalves

Abstract

There is a constantly growing need for automatic classification techniques with higher accuracy. To classify and process web pages automatically, current systems use the text content of those pages, and little work has been done on using their visual content. Our work therefore focuses on classifying web pages using only their visual content. First, a descriptor is constructed by extracting different features from each page: simple color and edge histograms, Gabor features and Tamura features. Two feature selection methods, one based on the chi-square criterion and the other on Principal Component Analysis (PCA), are then applied to this descriptor to select the most discriminative attributes. A second approach uses the Bag of Words (BoW) model, treating the SIFT local features extracted from each image as words, which allows a dictionary to be constructed. We then classify web pages according to their aesthetic value, their recency and their type of content. The machine learning methods used in this work are Naïve Bayes, Support Vector Machine, Decision Tree and AdaBoost, and different tests are performed to evaluate the performance of each classifier. Finally, we demonstrate that the visual appearance of a web page carries rich information that is not exploited by current web crawlers, which rely only on text content.
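Three of the steps above can be sketched in a few lines of dependency-free Python. This is not the authors' implementation. It is a minimal illustration assuming a page screenshot is available as a list of (r, g, b) pixel tuples. It shows a quantized color histogram used as a descriptor, chi-square scoring to keep the most discriminative descriptor attributes, and a Bag of Words histogram that assigns local descriptors to their nearest codeword. The paper uses SIFT descriptors; the sketch uses short toy vectors. The chi-square score follows one common formulation for non-negative features, the same one used by scikit-learn's `chi2`. All function names, the bin count and the toy codebook are hypothetical. In practice the codebook would typically be learned by clustering the training descriptors, for example with k-means.

```python
from collections import Counter


def color_histogram(pixels, bins_per_channel=4):
    """Normalized joint RGB histogram (hypothetical helper).

    Assumes `pixels` is a non-empty list of (r, g, b) tuples with values in 0..255.
    Each channel is quantized into `bins_per_channel` levels, which gives
    bins_per_channel ** 3 bins in total.
    """
    def quantize(v):
        return v * bins_per_channel // 256  # maps 0..255 to 0..bins_per_channel-1

    counts = Counter(
        (quantize(r) * bins_per_channel + quantize(g)) * bins_per_channel + quantize(b)
        for r, g, b in pixels
    )
    n = len(pixels)
    return [counts.get(i, 0) / n for i in range(bins_per_channel ** 3)]


def chi2_scores(X, y):
    """Chi-square score of each non-negative feature against the class labels.

    One common formulation (assumed here, not taken from the paper): compare the
    observed per-class sum of a feature with the sum expected if the feature were
    independent of the class.
    """
    labels = list(y)
    classes = sorted(set(labels))
    n = len(labels)
    class_prob = {c: labels.count(c) / n for c in classes}
    scores = []
    for j in range(len(X[0])):
        total = sum(row[j] for row in X)
        score = 0.0
        for c in classes:
            observed = sum(row[j] for row, label in zip(X, labels) if label == c)
            expected = class_prob[c] * total
            if expected > 0:  # features that are always zero get a score of 0
                score += (observed - expected) ** 2 / expected
        scores.append(score)
    return scores


def select_top_k(scores, k):
    """Indices of the k highest-scoring features (ties keep their original order)."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


def bow_histogram(descriptors, codebook):
    """Bag of Words: count nearest codewords (squared Euclidean distance), then normalize."""
    hist = [0] * len(codebook)
    for d in descriptors:
        nearest = min(
            range(len(codebook)),
            key=lambda k: sum((a - b) ** 2 for a, b in zip(d, codebook[k])),
        )
        hist[nearest] += 1
    total = sum(hist) or 1
    return [h / total for h in hist]


if __name__ == "__main__":
    # Toy "screenshots": two reddish pages (label 0) and two bluish pages (label 1).
    pages = [
        [(250, 10, 10)] * 90 + [(255, 255, 255)] * 10,
        [(240, 20, 30)] * 80 + [(255, 255, 255)] * 20,
        [(10, 10, 250)] * 85 + [(255, 255, 255)] * 15,
        [(30, 20, 240)] * 70 + [(255, 255, 255)] * 30,
    ]
    labels = [0, 0, 1, 1]
    X = [color_histogram(p, bins_per_channel=2) for p in pages]
    print("most discriminative bins:", select_top_k(chi2_scores(X, labels), 2))
```

In the toy run, the red and blue bins separate the two classes. The white bin is shared by every page, so it should score lower. The selected columns of `X` (or a BoW histogram) would then be passed to a classifier such as Naïve Bayes or an SVM.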

Citation

António Videira and Nuno Gonçalves (2014). Automatic Web Page Classification Using Visual Content. In WEBIST 2014, Vol. 2, pp. 193-204.
