Type of Publication

Thesis

Date:

7/2015

Status

Published

Android app for Automatic Web Page Classification: Analysis of Text and Visual Features

Featured in:

MD Thesis

Authors:

Diego Ugalde

Abstract

The Internet keeps growing every day, and with it the number of web pages. As a result, web pages of many different categories can be found, such as News, Sports or Business. This has led researchers to an innovative concept: web page classification, which consists of assigning each web page to one or more category labels. In recent years, some research has used text and visual content extracted from web pages to perform this classification. However, to the best of our knowledge, classifying web pages within an Android app has not yet been investigated. Consequently, this thesis focuses on the development of an Android app that is able to classify web pages. First, text and visual features are extracted from each web page. Four types of visual features were extracted from each web page to build a visual feature vector of 160 attributes. For the text features, a text feature vector of 160 attributes was also built for each web page, using a bag-of-words of 160 words set up from the HTML code after it had been extracted and filtered. Together, the two give a full vector of 320 attributes for each web page. A binary classification was then performed to distinguish web pages for adults from web pages for kids. Good results were obtained, especially with the AdaBoost classifier using both text and visual features, which achieved an accuracy of 94.44%.
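The abstract does not specify how the 160-word vocabulary was chosen or how word occurrences were encoded. The Python sketch below shows one plausible reading, assuming that the vocabulary consists of the most frequent non-stopword tokens in the visible text of a training corpus and that each text attribute is a raw word count. The helper names (`tokenize`, `build_vocabulary`, `text_vector`, `full_vector`) and the small stopword list are hypothetical, not taken from the thesis. The 160-attribute visual vector is treated as a given input, because the four visual feature types are not described here.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical stopword list; the thesis does not say which words were filtered.
STOPWORDS = {"the", "and", "for", "with", "you", "are", "this", "that", "from"}


class _TextExtractor(HTMLParser):
    """Collects visible text from HTML, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def tokenize(html):
    """Lowercased alphabetic tokens longer than 2 characters, minus stopwords."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.parts).lower()
    return [w for w in re.findall(r"[a-z]+", text)
            if len(w) > 2 and w not in STOPWORDS]


def build_vocabulary(pages, size=160):
    """Return the `size` most frequent tokens across all pages (ties broken alphabetically)."""
    counts = Counter()
    for html in pages:
        counts.update(tokenize(html))
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:size]]


def text_vector(html, vocabulary):
    """One attribute per vocabulary word: its raw count in the page (assumed encoding)."""
    counts = Counter(tokenize(html))
    return [counts[word] for word in vocabulary]


def full_vector(visual, text, dim=160):
    """Concatenate the visual and text vectors (160 + 160 = 320 attributes)."""
    if len(visual) != dim or len(text) != dim:
        raise ValueError(f"expected two vectors of length {dim}")
    return list(visual) + list(text)
```

On a small corpus the vocabulary can come out shorter than 160 words; in that case `full_vector` raises an error instead of silently padding, which is a choice of this sketch. The resulting 320-attribute vectors would then be passed to a binary classifier such as AdaBoost, as in the abstract.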

Citation
Diego Ugalde (2015), Android app for Automatic Web Page Classification: Analysis of Text and Visual Features. MD thesis. University of Coimbra, 2015.

Related Content

Content type: Thesis Presentation

Link: here

Upload Date: 2024-10-13T13:40

Institute of Systems and Robotics, Department of Electrical and Computer Engineering, University of Coimbra