Type of Publication

Thesis

Date:

9/2022

Status

Published

Reducing Overconfident Predictions in Multimodality Perception for Autonomous Driving

Featured in:

PhD Thesis

Authors:

Gledson Melotti

Abstract

In recent years, machine learning techniques have been widely used to solve problems in perception systems for autonomous driving and advanced driver-assistance systems, such as road user detection, traffic signal recognition, road detection, multiple object tracking, lane detection, and scene understanding. Consequently, a large number of techniques have been developed to cope with problems in the sensory perception field. Currently, deep networks are the state of the art for object recognition, with softmax and sigmoid functions as prediction layers. Such layers often produce overconfident predictions rather than proper probabilistic scores, which can harm the decision-making of “critical” perception systems applied in autonomous driving and robotics. Given this, we propose a probabilistic approach based on distributions calculated from the logit layer scores of pre-trained networks, which are then used to constitute new decision layers based on Maximum Likelihood (ML) and Maximum a-Posteriori (MAP) inference. We demonstrate that these ML and MAP functions are more suitable for probabilistic interpretation than softmax- and sigmoid-based predictions for object recognition: our approach shows promising performance compared to the usual softmax and sigmoid functions, with the benefit of enabling interpretable probabilistic predictions. Another advantage of the approach introduced in this thesis is that the ML and MAP functions can be applied to existing trained networks, since the approach uses the output of the logit layer of pre-trained networks. Thus, there is no need to carry out a new training phase, since the ML and MAP functions are used in the test/prediction phase. To validate our methodology, we explored distinct sensor modalities using RGB images and LiDAR data (3D point clouds, range-view and reflectance-view) from the KITTI dataset. The range-view and reflectance-view modalities were obtained by projecting the range/reflectance data onto the 2D image plane and then upsampling the projected points. The results achieved by the proposed approach are presented for the individual modalities and for early and late fusion strategies.
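As a rough illustration of the idea of replacing softmax with ML and MAP decision layers built on logit distributions, the sketch below fits one density per class to the logit vectors of a labelled set and then scores new logits by normalized likelihood (ML) or by likelihood times class prior (MAP). It is not the thesis implementation: the abstract only says "distributions", so the diagonal Gaussian model, the use of class frequencies as priors, and all function names (`fit_logit_gaussians`, `ml_predict`, `map_predict`) are assumptions made here for illustration.

```python
import numpy as np


def fit_logit_gaussians(logits, labels, n_classes, eps=1e-6):
    """Fit one diagonal Gaussian per class to that class's logit vectors.

    Assumption: a Gaussian density is used only for illustration; any
    per-class density estimate of the logits could take its place.
    """
    n_dims = logits.shape[1]
    means = np.zeros((n_classes, n_dims))
    variances = np.zeros((n_classes, n_dims))
    priors = np.zeros(n_classes)
    for c in range(n_classes):
        z = logits[labels == c]           # logits of samples whose true class is c
        means[c] = z.mean(axis=0)
        variances[c] = z.var(axis=0) + eps  # eps avoids division by zero
        priors[c] = len(z) / len(logits)    # class frequency as prior (assumption)
    return means, variances, priors


def log_likelihoods(logits, means, variances):
    """Return log p(z | class) with shape (n_samples, n_classes)."""
    diff = logits[:, None, :] - means[None, :, :]
    var = variances[None, :, :]
    return -0.5 * np.sum(diff ** 2 / var + np.log(2.0 * np.pi * var), axis=2)


def normalize_log(log_scores):
    """Turn per-class log scores into scores that sum to 1 per sample."""
    shifted = log_scores - log_scores.max(axis=1, keepdims=True)  # numerical stability
    scores = np.exp(shifted)
    return scores / scores.sum(axis=1, keepdims=True)


def ml_predict(logits, means, variances):
    """ML layer: normalized class-conditional likelihoods of the logits."""
    return normalize_log(log_likelihoods(logits, means, variances))


def map_predict(logits, means, variances, priors):
    """MAP layer: likelihood times prior, normalized over classes."""
    return normalize_log(log_likelihoods(logits, means, variances) + np.log(priors))
```

Because both functions operate only on logits that the pre-trained network already produces, a sketch like this can be attached at prediction time without retraining, which matches the property described in the abstract. Every class must have at least one sample in the fitting set, otherwise its mean and prior are undefined.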

Citation
Gledson Melotti (2022). Reducing Overconfident Predictions in Multimodality Perception for Autonomous Driving. PhD Thesis. University of Coimbra, 2022

Institute of Systems and Robotics Department of Electrical and Computers Engineering University of Coimbra