Visual saliency computing aims to imitate the human visual attention mechanism to identify the most prominent or unique areas or objects from a visual scene. It is one of the basic low-level image processing techniques and can be applied to many ...
Road intersections play a vital role in road network construction, autonomous driving, and intelligent transportation systems. Most methods detect road intersections using only geometrical features, without spatio-temporal features, leading to insufficient ...
Temporal Sentence Grounding in Videos (TSGV), i.e., grounding a natural language sentence that indicates complex human activities in a long, untrimmed video sequence, has received unprecedented attention over the last few years. Although each newly ...
This technical report describes the overview of our approach to the "Watch and Buy: Multimodal Product Identification Challenge". Specifically, we tackle this problem with a three-stage framework, i.e., product detection, retrieval and classification. ...
Disentangling factors has proven to be crucial for building interpretable AI systems. Disentangled generative models would have explanatory input variables to increase the trustworthiness and robustness. Previous works apply a progressive ...
An adversarial example (AE) aims to fool a convolutional neural network by introducing small perturbations into the input image. The proposed work uses the magnitude and phase of the Fourier spectrum and the entropy of the image to defend against AEs. We ...
The routing-by-agreement mechanism in capsule networks (CapsNets) is used to build visual hierarchical relationships with a characteristic of assigning parts to wholes. The connections between capsules of different layers become sparser with more ...
Systematic error, which is not determined by chance, often refers to the inaccuracy (involving either the observation or measurement process) inherent to a system. In this paper, we exhibit some long-neglected but frequently occurring adversarial examples ...
This paper presents an urban footpath image dataset captured through crowdsourcing using the Mapillary service (mobile application) and demonstrates its use for data analytics applications by employing object detection and image segmentation. The study ...
The smart city concept has now become one of the key enablers in urban city management. The adoption and permeation of ICT and AI-driven techniques have enabled the authorities to resolve poor urban planning issues with improved delivery of citizen ...
Deep learning has achieved significant success in multimedia fields involving computer vision, natural language processing, and acoustics. However, research in adversarial learning also shows that deep learning models are highly vulnerable to adversarial examples. ...
Emotion detection based on facial expressions is an important procedure in high-risk tasks such as criminal investigation or lie detection. To reduce the impact of the inconsistency in the duration of macro- and micro-expressions, we propose an ...
Micro-expressions describe unconscious facial movements which reflect a person's psychological state even when there is an attempt to conceal it. Often used in psychological and forensic applications, their manual recognition requires professional ...
We present Wav2Lip-Emotion, a video-to-video translation architecture that modifies facial expressions of emotion in videos of speakers. Previous work modifies emotion in images, uses a single image to produce a video with animated emotion, or puppets ...
Recent progress in deep learning-based image generation has made it easier to create convincing fake videos called deepfakes. While the benefits of such technology are undeniable, it can also be used as realistic fake news support for mass disinformation. ...
With the prevalence of deep learning technology, especially generative adversarial networks (GANs), the generation of photo-realistic facial images has made huge progress. Image generation techniques have many good applications such as data augmentation, ...
Real-world visual recognition is far more complex than object recognition; there is stuff without distinctive shape or appearance, and the same object appearing in different contexts calls for different actions. While we need context-aware visual ...
In this paper, we extensively present our solutions for the MuSe-Stress sub-challenge and the MuSe-Physio sub-challenge of Multimodal Sentiment Challenge (MuSe) 2021. The goal of the MuSe-Stress sub-challenge is to predict the level of emotional arousal and ...
With the proliferation of user-generated videos on websites, it becomes particularly important to achieve automatic perception and understanding of human emotion/sentiment from these videos. In this paper, we present our solutions to the MuSe-...
Automatic estimation of emotional state has wide application in human-computer interaction. In this paper, we present our solutions for the MuSe-Stress and MuSe-Physio sub-challenges of Multimodal Sentiment Analysis (MuSe 2021). The goal of these two ...