Human gaze has been widely explored as a supervisory signal for fine-grained recognition and zero-shot learning (ZSL). However, existing models merely align machine attention with human gaze via auxiliary losses, leaving the causal role of gaze in human decision-making unexplored. The result is models that correlate with gaze but do not reason through it. We propose Causal-Gaze Zero-Shot Learning (CG-ZSL), the first causal framework that integrates human gaze as an explicit mediator in the visual-semantic reasoning pathway. We formalize the ZSL pipeline as a Structural Causal Model (SCM) in which human gaze acts as an intermediate variable linking visual attributes to class-level semantic embeddings. From this formulation we derive counterfactual attention invariance, which disentangles causal attribute regions from dataset-specific biases. We further introduce a Causal Attention Intervention (CAI) module and a Gaze-Mediated Semantic Alignment (GMSA) mechanism that together enforce bidirectional causal consistency between gaze, attributes, and predictions. Experiments on the CUB, SUN, and AWA2 datasets show significant improvements over state-of-the-art ZSL and GZSL models, especially under distribution shift and domain generalization. Unlike prior attention-alignment systems, CG-ZSL produces human-interpretable, causally grounded explanations and maintains performance under counterfactual perturbations.
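The gaze-as-mediator SCM can be illustrated with a toy sketch. All mechanisms and variable names below are our own illustrative assumptions, not the paper's implementation: attributes A cause gaze G, gaze mediates the prediction Y, and an intervention do(G = g) severs the attribute-to-gaze link, which is the kind of counterfactual perturbation the invariance property concerns.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear mechanisms (illustrative assumptions, not the paper's model).
def f_gaze(attributes, noise):
    # Gaze concentrates on attribute regions.
    return 0.8 * attributes + noise

def f_predict(gaze):
    # The class score depends only on gaze-mediated evidence.
    return 1.5 * gaze

def sample(n, do_gaze=None):
    """Sample from the SCM; `do_gaze` implements the intervention do(G = g)."""
    A = rng.normal(size=n)
    if do_gaze is None:
        G = f_gaze(A, rng.normal(scale=0.1, size=n))  # observational regime
    else:
        G = np.full(n, do_gaze)                       # interventional regime
    Y = f_predict(G)
    return A, G, Y

# Under do(G = g) the prediction no longer varies with A, showing that
# gaze is the sole mediator on the attribute-to-prediction pathway.
_, _, y_obs = sample(1000)
_, _, y_do = sample(1000, do_gaze=0.5)
print(bool(np.allclose(y_do, 1.5 * 0.5)))  # → True
```

In this sketch, fixing G makes Y constant regardless of A; in the full model, the analogous check is that predictions remain stable when non-causal (dataset-bias) regions are perturbed while gaze-mediated attribute evidence is held fixed.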