Type of Publication

Thesis

Date:

9/2013

Status

Published

Web Page Classification using Visual Features

Featured in:

MD Thesis

Authors:

António Videira

Abstract

As the number of Internet users grows, the number of websites grows in proportion, and web page classification has therefore become a major research topic in recent years. There is a steadily increasing need for automatic classification techniques with higher accuracy. To classify and process web pages automatically, current systems use the textual content of those pages, which includes both the displayed text and the underlying HTML code. Until now, however, little work has used the visual content of a web page to perform classification. This thesis therefore focuses on classifying web pages by their visual content, since pages can present very different visual information depending on their topic.

In this work we build a classification system that automatically analyzes the visual appearance of a web page as it is shown to the user. First, a descriptor is constructed by extracting several features from each page: simple color and edge histograms, Gabor features, and texture features. Two feature selection methods, one based on the chi-square criterion and the other on Principal Component Analysis (PCA), are then applied to this descriptor to select the top attributes. A second approach uses the Bag of Words (BoW) model: the SIFT local features extracted from each image are treated as words, from which a dictionary is constructed. A new image can then be described by extracting its local features and matching each one to the closest feature in the dictionary.

We then classify web pages by their aesthetic value, their recency, and their type of website. The machine learning methods used are Naïve Bayes, Support Vector Machine, Decision Tree, and AdaBoost, and several tests are performed to evaluate the performance of each classifier in each experiment. By examining our approach in detail, we draw general conclusions about whether the visual content should be ignored when performing web page classification. The main advantage of our approach is the good accuracy it achieves in each experiment.
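The abstract lists a simple color histogram as one component of the page descriptor but does not say how it is binned. A minimal sketch, assuming the page screenshot is available as a list of (R, G, B) pixels with 8-bit channels and that each channel is quantized into a hypothetical `bins_per_channel` levels, producing a normalized joint histogram:

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each assumed to be in 0..255


def color_histogram(pixels: Sequence[Pixel], bins_per_channel: int = 4) -> List[float]:
    """Joint RGB histogram of a screenshot, normalized to sum to 1.

    Hypothetical helper: the thesis does not state its binning scheme.
    """
    if not pixels:
        raise ValueError("image has no pixels")
    hist = [0.0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        # Map each 8-bit channel to a bin in 0..bins_per_channel-1.
        rb = r * bins_per_channel // 256
        gb = g * bins_per_channel // 256
        bb = b * bins_per_channel // 256
        # Flatten the (rb, gb, bb) triple into a single histogram index.
        hist[(rb * bins_per_channel + gb) * bins_per_channel + bb] += 1
    return [count / len(pixels) for count in hist]
```

Because the histogram is normalized by the pixel count, screenshots of different resolutions yield descriptors of the same length and scale.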
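The chi-square feature selection step can be sketched as follows. This assumes non-negative features (such as histogram bins) and uses the same formulation as scikit-learn's `chi2` scorer: for each feature, the per-class sums of that feature are compared with the sums expected if the feature were independent of the class label. Unlike scikit-learn, an all-zero feature simply scores 0 here. The function names and the `k` parameter are hypothetical and not taken from the thesis.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def chi2_scores(X: Sequence[Sequence[float]], y: Sequence[Hashable]) -> List[float]:
    """Chi-square statistic of each non-negative feature with respect to the labels."""
    n = len(X)
    class_counts = Counter(y)
    scores: List[float] = []
    for j in range(len(X[0])):
        feature_total = sum(row[j] for row in X)
        score = 0.0
        for label, count in class_counts.items():
            # Observed: how much of feature j falls inside this class.
            observed = sum(row[j] for row, lab in zip(X, y) if lab == label)
            # Expected if feature j were independent of the class.
            expected = feature_total * count / n
            if expected > 0:  # an all-zero feature carries no information
                score += (observed - expected) ** 2 / expected
        scores.append(score)
    return scores


def select_top_k(X: Sequence[Sequence[float]], y: Sequence[Hashable], k: int) -> List[int]:
    """Indices of the k highest-scoring features, in their original order."""
    scores = chi2_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])
```

The returned indices would then be used to keep only those columns of each page descriptor before training a classifier.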
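The Bag of Words approach builds a dictionary of "visual words" from local descriptors and then describes each image by how many of its descriptors fall closest to each word. The abstract does not say how the dictionary is built; a common choice, assumed here, is k-means clustering. The sketch below uses short toy vectors in place of 128-dimensional SIFT descriptors (which in practice would come from a SIFT extractor, not shown), and all function names are hypothetical.

```python
import random
from typing import List, Sequence

Vector = Sequence[float]


def _sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(words: Sequence[Vector], v: Vector) -> int:
    """Index of the dictionary word closest to descriptor v."""
    return min(range(len(words)), key=lambda i: _sq_dist(words[i], v))


def build_dictionary(descriptors: Sequence[Vector], k: int,
                     iters: int = 20, seed: int = 0) -> List[List[float]]:
    """Plain k-means (Lloyd's algorithm); the centroids are the visual words."""
    rng = random.Random(seed)
    centroids = [list(d) for d in rng.sample(list(descriptors), k)]
    for _ in range(iters):
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        for d in descriptors:
            clusters[nearest(centroids, d)].append(d)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def bow_histogram(descriptors: Sequence[Vector], dictionary: Sequence[Vector]) -> List[float]:
    """Fraction of an image's descriptors assigned to each visual word."""
    hist = [0.0] * len(dictionary)
    for d in descriptors:
        hist[nearest(dictionary, d)] += 1
    total = len(descriptors)
    return [h / total for h in hist] if total else hist
```

The resulting histogram has one bin per visual word, so every page gets a fixed-length vector regardless of how many local features were detected, which is what lets a standard classifier consume it.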

Citation
António Videira (2013), Web Page Classification using Visual Features. MD thesis (in Portuguese). University of Coimbra, 2013.

Institute of Systems and Robotics, Department of Electrical and Computer Engineering, University of Coimbra