Type of Publication

Thesis

Date:

9 / 2013

Status

Published

Web Page Classification using Visual Features

Featured in:

MD Thesis

Authors:

António Videira

Abstract

As the number of Internet users grows, the number of websites grows proportionally, and web page classification has become a major research topic in the last few years. There is a constantly increasing demand for automatic classification techniques with greater accuracy. To classify and process web pages automatically, current systems use the textual content of those pages, which includes both the displayed content and the underlying HTML code. However, until now, little work has been done on using the visual content of a web page to perform classification. This thesis therefore focuses on classifying web pages using their visual content. Web pages can present different and varied visual information depending on their specific topic. In this work we build a classification system that automatically analyzes the visual appearance of a web page as it appears to the user. First, a descriptor is constructed by extracting different features from each page. The features used are simple color and edge histograms, Gabor features and texture features. Then two feature selection methods, one based on the Chi-Square criterion and the other on Principal Component Analysis, are applied to that descriptor to select the top attributes. Another approach uses the Bag of Words (BoW) model to treat the SIFT local features extracted from each image as words, which allows a dictionary to be constructed. New images can then be described by extracting their local features and matching them with the closest features in the dictionary. We then classify web pages based on their aesthetic value, their recency and the type of website. The machine learning methods used in this work are Naïve Bayes, Support Vector Machine, Decision Tree and AdaBoost. Different tests are performed to evaluate the performance of each classifier in each experiment. By investigating our approach in detail, we are able to draw general conclusions about whether or not the visual content should be ignored when performing web page classification. The main advantage of our approach is the good accuracy in each experiment.
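
The Bag of Words step mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation. It assumes a dictionary of visual words already exists (the abstract does not say how it is built) and uses toy 2-D vectors in place of real SIFT descriptors. The function names `nearest_word` and `bow_histogram` are hypothetical.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def nearest_word(descriptor, dictionary):
    """Return the index of the dictionary entry closest to the descriptor."""
    return min(range(len(dictionary)), key=lambda i: dist(descriptor, dictionary[i]))


def bow_histogram(descriptors, dictionary):
    """Describe an image as a normalized histogram of visual-word counts."""
    counts = [0] * len(dictionary)
    for descriptor in descriptors:
        counts[nearest_word(descriptor, dictionary)] += 1
    total = sum(counts)
    if total == 0:
        # No local features were found: return an all-zero histogram.
        return [0.0] * len(dictionary)
    return [count / total for count in counts]


# Toy data (assumption): three visual words and four local descriptors.
dictionary = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
descriptors = [(0.1, -0.1), (0.9, 1.2), (4.8, 5.1), (5.2, 4.9)]

histogram = bow_histogram(descriptors, dictionary)
print(histogram)
```

The resulting fixed-length histogram is the kind of vector that could be passed to any of the classifiers listed in the abstract, regardless of how many local features each page image produced.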

Citation

António Videira (2013), Web Page Classification using Visual Features. MD thesis (in Portuguese). University of Coimbra, 2013.

