Publications related to the VIS Team
Bio-Inspired Modality Fusion for Active Speaker Detection
Human beings have developed fantastic abilities to integrate information from various sensory sources exploring their inherent complementarity. Perceptual capabilities are therefore heightened, enabling, for instance, the well-known "cocktail party" and McGurk effects, i.e., speech disambiguation from a panoply of sound signals. This fusion ability is also key in refining the perception of sound source location, as in distinguishing whose voice is being heard in a group conversation. Furthermore, neuroscience has successfully identified the superior colliculus region in the brain as the one responsible for this modality fusion, with a handful of biological models having been proposed to approach its underlying neurophysiological process. Deriving inspiration from one of these models, this paper presents a methodology for effectively fusing correlated auditory and visual information for active speaker detection. Such an ability can have a wide range of applications, from teleconferencing systems to social robotics. The detection approach initially routes auditory and visual information through two specialized neural network structures. The resulting embeddings are fused via a novel layer based on the superior colliculus, whose topological structure emulates spatial neuron cross-mapping of unimodal perceptual fields. The validation process employed two publicly available datasets, with achieved results confirming and greatly surpassing initial expectations.
Deep Facial Diagnosis: Deep Transfer Learning From Face Recognition to Facial Diagnosis
The relationship between face and disease has been discussed from thousands years ago, which leads to the occurrence of facial diagnosis. The objective here is to explore the possibility of identifying diseases from uncontrolled 2D face images by deep learning techniques. In this paper, we propose using deep transfer learning from face recognition to perform the computer-aided facial diagnosis on various diseases. In the experiments, we perform the computer-aided facial diagnosis on single (beta-thalassemia) and multiple diseases (beta-thalassemia, hyperthyroidism, Down syndrome, and leprosy) with a relatively small dataset. The overall top-1 accuracy by deep transfer learning from face recognition can reach over 90% which outperforms the performance of both traditional machine learning methods and clinicians in the experiments. In practical, collecting disease-specific face images is complex, expensive and time consuming, and imposes ethical limitations due to personal data treatment. Therefore, the datasets of facial diagnosis related researches are private and generally small comparing with the ones of other machine learning application areas. The success of deep transfer learning applications in the facial diagnosis with a small dataset could provide a low-cost and noninvasive way for disease screening and detection.
- Author(s): Bo Jin, Leandro Cruz, Nuno Gonçalves
- Featured In: IEEE Access, vol. 8, pp. 123649-123661
- Publication Type: Journal Articles
- DOI: 10.1109/ACCESS.2020.3005687
- Year: 2020
- View File