Multi-Scale FourierMIL for Hierarchical Frequency-Domain Multiple Instance Learning in Whole-Slide Image Classification

Jun Li1, Xiao Bai1, Jin Zheng1
1School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing, China
DOI: https://doi.org/10.71448/bcds2561-1
Published: 30/03/2025
Cite this article as: Jun Li, Xiao Bai, Jin Zheng. Multi-Scale FourierMIL for Hierarchical Frequency-Domain Multiple Instance Learning in Whole-Slide Image Classification. Bulletin of Computer and Data Sciences, Volume 6 Issue 1. Page: 1-15.

Abstract

Whole-slide images (WSIs) in computational pathology contain morphological patterns across multiple spatial scales, yet most multiple instance learning (MIL) methods operate on a single resolution. Recent work shows that frequency-domain token mixing via the Fourier transform can improve both accuracy and efficiency over self-attention for WSI classification, but existing frequency-based MIL models still reason at only one scale. In this paper, we propose Multi-Scale FourierMIL (MS-FourierMIL), a hierarchical frequency-domain MIL framework that integrates patch tokens from multiple spatial resolutions. For each scale, we extract patch embeddings with a frozen feature extractor and apply a scale-specific Fourier token mixer based on a learnable all-pass filter, then perform cross-scale frequency fusion over pooled scale representations to capture interactions between coarse tissue architecture and fine cellular detail. A simple adaptive padding scheme stabilizes Fourier transforms for variable bag sizes while preserving token statistics, and a final class token conditioned on all scales produces slide-level predictions. On benchmark WSI classification tasks, MS-FourierMIL outperforms single-scale Fourier-based MIL, transformer-based MIL, and graph-based MIL baselines, particularly when lesion sizes and tissue context are highly variable, while maintaining competitive computational cost. Qualitative analyses of scale-specific attribution maps show that MS-FourierMIL focuses on global patterns at low resolution and localized tumor regions at high resolution, aligning with pathologists’ multi-scale reasoning and highlighting frequency-domain multi-scale MIL as a promising strategy for accurate and efficient WSI analysis.

Keywords: whole-slide images, computational pathology, multiple instance learning, frequency-domain token mixing, multi-scale modeling
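
As a rough illustration of the per-scale Fourier token mixing and padding described in the abstract, the following sketch pads a bag of patch embeddings to a fixed length and mixes tokens along the bag axis with a unit-magnitude (all-pass) frequency filter. It is a minimal sketch under stated assumptions, not the authors' implementation: the mean-padding rule, the function names (`dft`, `pad_with_mean`, `all_pass_mix`), the plain list-of-lists representation, and the choice to keep only the real part of the output are all hypothetical. A naive DFT is used so the example has no dependencies; a practical version would use an FFT library.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform of a list of complex numbers."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def pad_with_mean(tokens, target_len):
    """Assumed padding rule: append the per-channel bag mean up to target_len.

    Appending copies of the mean leaves the per-channel mean unchanged.
    """
    dim = len(tokens[0])
    mean = [sum(tok[d] for tok in tokens) / len(tokens) for d in range(dim)]
    return tokens + [list(mean) for _ in range(target_len - len(tokens))]


def all_pass_mix(tokens, phases):
    """Mix tokens along the bag axis with the filter H[k] = exp(i * phases[k]).

    |H[k]| = 1 for every k, so only the phase of each frequency is changed.
    The phases stand in for learnable parameters (one per frequency bin).
    """
    n, dim = len(tokens), len(tokens[0])
    mixed = [[0.0] * dim for _ in range(n)]
    for d in range(dim):
        spectrum = dft([complex(tok[d]) for tok in tokens])
        filtered = [s * cmath.exp(1j * p) for s, p in zip(spectrum, phases)]
        channel = dft(filtered, inverse=True)
        for t in range(n):
            # Assumption: keep the real part so the output stays real-valued.
            mixed[t][d] = channel[t].real
    return mixed


# Toy bag: 3 patch tokens with 2 channels, padded to length 4.
bag = [[1.0, 0.0], [2.0, 1.0], [4.0, 5.0]]
padded = pad_with_mean(bag, 4)

# A linear phase ramp is one particular all-pass filter: a circular shift by one token.
shift_phases = [-2 * math.pi * k / 4 for k in range(4)]
shifted = all_pass_mix(padded, shift_phases)
```

With all phases set to zero the mixer is the identity, and with the linear phase ramp above it circularly shifts the bag by one token. Arbitrary phases mix information across all tokens; in that case the inverse transform is generally complex, which is why the sketch has to choose how to map it back to real values.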




Copyright © 2025 Jun Li, Xiao Bai, Jin Zheng. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
