Publications related to the VIS Team
Probabilistic Approach for Road-Users Detection
Object detection in autonomous driving applications implies the detection and tracking of semantic objects that are commonly native to urban driving environments, such as pedestrians and vehicles. One of the major challenges in state-of-the-art deep-learning-based object detection is false positives that occur with overconfident scores. This is highly undesirable in autonomous driving and other critical robotic-perception domains because of safety concerns. This paper proposes an approach to alleviate the problem of overconfident predictions by introducing a novel probabilistic layer to deep object detection networks at test time. The suggested approach avoids the traditional Sigmoid or Softmax prediction layer, which often produces overconfident predictions. It is demonstrated that the proposed technique reduces overconfidence in the false positives without degrading the performance on the true positives. The approach is validated on 2D-KITTI object detection with YOLOv4 and SECOND (a LiDAR-based detector). The proposed approach enables interpretable probabilistic predictions without requiring the network to be re-trained, and is therefore very practical.
- Author(s): Gledson Melotti, Weihao Lu, Pedro Conde, Dezong Zhao, Alireza Asvadi, Nuno Gonçalves, Cristiano Premebida
- Featured In: IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS
- Publication Type: Journal Articles
- DOI: 10.1109/TITS.2023.3268578
- Year: 2023
MorDeephy: Face Morphing Detection Via Fused Classification (preprint)
Face morphing attack detection (MAD) is one of the most challenging tasks in the field of face recognition today. In this work, we introduce a novel deep learning strategy for single-image face morphing detection, which combines the discrimination of morphed face images with a sophisticated face recognition task in a complex classification scheme. It is directed at learning deep facial features that carry information about their own authenticity. Our work also introduces several additional contributions: a public and easy-to-use face morphing detection benchmark and the results of our wild dataset filtering strategy. Our method, which we call MorDeephy, achieved state-of-the-art performance and demonstrated a prominent ability to generalise the task of morphing detection to unseen scenarios.
Towards understanding the character of quality sampling in deep learning face recognition
Face recognition has become one of the most important modalities of biometrics in recent years. It widely utilises deep learning computer vision tools and adopts large collections of unconstrained face images of celebrities for training. This choice of data is driven by its public availability, whereas existing document-compliant face image collections are hardly accessible due to security and privacy issues. Such inconsistency between the training data and the deployment scenario may lead to a drop in performance in biometric systems that are developed specifically for dealing with ID document compliant images. To mitigate this problem, we propose to regularise the training of the deep face recognition network with a specific sample mining strategy, which penalises the samples by their estimated quality. In addition to several quality metrics considered in recent work, we also expand our deep learning strategy to other sophisticated quality estimation methods and perform experiments to better understand the nature of quality sampling. Namely, we seek the penalising manner (sampling character) that better satisfies the purpose of adapting deep learning face recognition for images of ID and travel documents. Extensive experiments demonstrate the efficiency of the approach for ID document compliant face images.
Pseudo RGB-D Face Recognition
In the last decade, the advances and popularity of low-cost RGB-D sensors have enabled us to acquire depth information of objects. Consequently, researchers began to solve face recognition problems by capturing RGB-D face images using these sensors. It is still not easy to acquire the depth of human faces because of limitations imposed by privacy policies, and RGB face images remain more common. Therefore, obtaining the depth map directly from the corresponding RGB image could help improve the performance of subsequent face processing tasks such as face recognition. Intelligent creatures can use a large amount of experience to obtain three-dimensional spatial information from two-dimensional plane scenes alone. Machine learning methodology addresses such problems by teaching computers to generate correct answers through training. To replace depth sensors with generated pseudo depth maps, in this paper we propose a pseudo RGB-D face recognition framework and provide data-driven ways to generate the depth maps from 2D face images. Specifically, we design and implement a generative adversarial network model named “D+GAN” to perform multi-conditional image-to-image translation with face attributes. In this way, we validate pseudo RGB-D face recognition with experiments on various datasets. In combination with image fusion technologies, especially the Non-subsampled Shearlet Transform, the accuracy of face recognition has been significantly improved.
Reducing Overconfidence Predictions in Autonomous Driving Perception
In state-of-the-art deep learning for object recognition, Softmax and Sigmoid layers are most commonly employed as the predictor outputs. Such layers often produce overconfident predictions rather than proper probabilistic scores, which can harm the decision-making of ‘critical’ perception systems applied in autonomous driving and robotics. Given this, we propose a probabilistic approach based on distributions calculated from the Logit layer scores of pre-trained networks, which are then used to constitute new decision layers based on Maximum Likelihood (ML) and Maximum a-Posteriori (MAP) inference. We demonstrate that these layers, hereafter called ML and MAP layers, are more suitable for probabilistic interpretations than Softmax and Sigmoid-based predictions for object recognition. We explore distinct sensor modalities via RGB images and LiDAR (RV: range-view) data from the KITTI and Lyft Level-5 datasets, where our approach shows promising performance compared to the usual Softmax and Sigmoid layers, with the benefit of enabling interpretable probabilistic predictions. Another advantage of the approach introduced in this paper is that the ML and MAP layers can be implemented in existing trained networks; that is, the approach benefits from the output of the Logit layer of pre-trained networks. Thus, there is no need to carry out a new training phase, since the ML and MAP layers are used in the test/prediction phase. Classification results are presented using reliability diagrams, while detection results are illustrated using precision-recall curves.
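As a rough illustration of the idea in the abstract above, the following Python sketch fits per-class distributions to logit vectors and uses them for ML and MAP scoring in place of Softmax. It is not the authors' implementation: the choice of independent Gaussians per logit unit, the function names, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def fit_logit_gaussians(logits, labels, n_classes):
    """Fit a diagonal Gaussian to the logit vectors of each class.

    Assumption: Gaussian class-conditional densities; the abstract only
    says "distributions calculated out of the Logit layer scores".
    """
    means = np.stack([logits[labels == k].mean(axis=0) for k in range(n_classes)])
    stds = np.stack([logits[labels == k].std(axis=0) + 1e-6 for k in range(n_classes)])
    return means, stds

def log_likelihoods(z, means, stds):
    """Return log p(z | class k) for every class k, for one logit vector z."""
    per_dim = -0.5 * ((z - means) / stds) ** 2 - np.log(stds) - 0.5 * np.log(2 * np.pi)
    return per_dim.sum(axis=1)

def _normalise(log_scores):
    # Subtract the maximum before exponentiating for numerical stability.
    w = np.exp(log_scores - log_scores.max())
    return w / w.sum()

def ml_scores(z, means, stds):
    """ML layer sketch: class likelihoods normalised to sum to one."""
    return _normalise(log_likelihoods(z, means, stds))

def map_scores(z, means, stds, priors):
    """MAP layer sketch: likelihood times class prior, normalised."""
    return _normalise(log_likelihoods(z, means, stds) + np.log(priors))

# Synthetic two-class logits standing in for the output of a pre-trained network.
rng = np.random.default_rng(0)
logits = np.vstack([
    rng.normal([4.0, -4.0], 1.0, size=(200, 2)),
    rng.normal([-4.0, 4.0], 1.0, size=(200, 2)),
])
labels = np.array([0] * 200 + [1] * 200)

means, stds = fit_logit_gaussians(logits, labels, n_classes=2)
z_test = np.array([4.0, -4.0])
print(ml_scores(z_test, means, stds))
print(map_scores(z_test, means, stds, priors=np.array([0.5, 0.5])))
```

Because the distributions are fitted on logits from an already trained network, a scheme of this kind leaves the network weights untouched, which matches the abstract's point that no new training phase is needed.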
CodeFace: a deep learning printer-proof steganography for Face Portraits.
Identity Documents (IDs) containing a facial portrait constitute a prominent form of personal identification. Photograph substitution in official documents (a genuine photo replaced by a non-genuine photo) and originally fraudulent documents with an arbitrary photograph are well-known attacks, but unfortunately they remain effective ways of misleading national authorities in in-person identification processes. Therefore, in order to confirm that the identity document holds a validated photo, this paper addresses a novel face image steganography technique that encodes secret messages in facial portraits and then decodes these hidden messages from physically printed facial photos of Identity Documents (IDs) and Machine-Readable Travel Documents (MRTDs). The encoded face image looks like the original image to the naked eye. Our architecture is called CodeFace. CodeFace comprises a deep neural network that learns an encoding and decoding algorithm that robustly accounts for several types of image perturbations caused by image compression, digital transfer, printer devices, environmental lighting and digital cameras. The appearance of the encoded facial photo is preserved by minimizing the distance between the facial features of the encoded and original facial images, and also through a new network architecture that improves data restoration for small images. Extensive experiments were performed with real printed documents and smartphone cameras. The results obtained demonstrate high robustness in the decoding of hidden messages in physical polycarbonate and PVC cards, as well as the stability of the method for encoding messages up to a size of 120 bits.
Card3DFace—An Application to Enhance 3D Visual Validation in ID Cards and Travel Documents
The identification of a person is a natural way to gain access to information or places. A face image is an essential element of visual validation. In this paper, we present the Card3DFace application, which captures a single-shot image of a person’s face. After reconstructing the 3D model of the head, the application generates several images from different perspectives, which, when printed on a card with a layer of lenticular lenses, produce a 3D visualization effect of the face. The image acquisition is achieved with a regular consumer 3D camera, either using plenoptic, stereo or time-of-flight technologies. This procedure aims to assist and improve the human visual recognition of ID cards and travel documents through an affordable and fast process while simultaneously increasing their security level. The whole system pipeline is analyzed and detailed in this paper. The results of the experiments performed with polycarbonate ID cards show that this end-to-end system is able to produce cards with realistic 3D visualization effects for humans.
Towards Facial Biometrics for ID Document Validation in Mobile Devices
Various modern security systems follow a tendency to simplify the usage of existing biometric recognition solutions and embed them into ubiquitous portable devices. In this work, we continue the investigation and development of our method for securing identification documents. The original facial biometric template, which is extracted from the trusted frontal face image, is stored on the identification document in a secured personalized machine-readable code. Such a document is protected from face photo manipulation and may be validated with an offline mobile application. We apply automatic methods of compressing the developed face descriptors to make the biometric validation system more suitable for mobile applications. As an additional contribution, we introduce several print-capture datasets that may be used for training and evaluating similar systems for mobile identification and travel document validation.
Bio-Inspired Modality Fusion for Active Speaker Detection
Human beings have developed fantastic abilities to integrate information from various sensory sources exploring their inherent complementarity. Perceptual capabilities are therefore heightened, enabling, for instance, the well-known "cocktail party" and McGurk effects, i.e., speech disambiguation from a panoply of sound signals. This fusion ability is also key in refining the perception of sound source location, as in distinguishing whose voice is being heard in a group conversation. Furthermore, neuroscience has successfully identified the superior colliculus region in the brain as the one responsible for this modality fusion, with a handful of biological models having been proposed to approach its underlying neurophysiological process. Deriving inspiration from one of these models, this paper presents a methodology for effectively fusing correlated auditory and visual information for active speaker detection. Such an ability can have a wide range of applications, from teleconferencing systems to social robotics. The detection approach initially routes auditory and visual information through two specialized neural network structures. The resulting embeddings are fused via a novel layer based on the superior colliculus, whose topological structure emulates spatial neuron cross-mapping of unimodal perceptual fields. The validation process employed two publicly available datasets, with achieved results confirming and greatly surpassing initial expectations.
Deep Facial Diagnosis: Deep Transfer Learning From Face Recognition to Facial Diagnosis
The relationship between face and disease has been discussed for thousands of years, which gave rise to facial diagnosis. The objective here is to explore the possibility of identifying diseases from uncontrolled 2D face images using deep learning techniques. In this paper, we propose using deep transfer learning from face recognition to perform computer-aided facial diagnosis of various diseases. In the experiments, we perform computer-aided facial diagnosis of a single disease (beta-thalassemia) and of multiple diseases (beta-thalassemia, hyperthyroidism, Down syndrome, and leprosy) with a relatively small dataset. The overall top-1 accuracy achieved by deep transfer learning from face recognition can exceed 90%, which outperforms both traditional machine learning methods and clinicians in the experiments. In practice, collecting disease-specific face images is complex, expensive and time-consuming, and imposes ethical limitations due to the treatment of personal data. Therefore, the datasets used in facial diagnosis research are private and generally small compared with those of other machine learning application areas. The success of deep transfer learning applications in facial diagnosis with a small dataset could provide a low-cost and noninvasive way of screening for and detecting disease.
- Author(s): Bo Jin, Leandro Cruz, Nuno Gonçalves
- Featured In: IEEE Access, vol. 8, pp. 123649-123661
- Publication Type: Journal Articles
- DOI: 10.1109/ACCESS.2020.3005687
- Year: 2020