Whole-slide images (WSIs) in computational pathology contain morphological patterns across multiple spatial scales, yet most multiple instance learning (MIL) methods operate on a single resolution. Recent work shows that frequency-domain token mixing via the Fourier transform can improve both accuracy and efficiency over self-attention for WSI classification, but existing frequency-based MIL models still reason at only one scale. In this paper, we propose Multi-Scale FourierMIL (MS-FourierMIL), a hierarchical frequency-domain MIL framework that integrates patch tokens from multiple spatial resolutions. For each scale, we extract patch embeddings with a frozen feature extractor and apply a scale-specific Fourier token mixer based on a learnable all-pass filter. We then perform cross-scale frequency fusion over pooled scale representations to capture interactions between coarse tissue architecture and fine cellular detail. A simple adaptive padding scheme stabilizes Fourier transforms for variable bag sizes while preserving token statistics, and a final class token conditioned on all scales produces slide-level predictions. On benchmark WSI classification tasks, MS-FourierMIL outperforms single-scale Fourier-based MIL, transformer-based MIL, and graph-based MIL baselines at competitive computational cost, with the largest gains when lesion sizes and tissue context are highly variable. Qualitative analyses of scale-specific attribution maps show that MS-FourierMIL focuses on global patterns at low resolution and on localized tumor regions at high resolution, in line with pathologists’ multi-scale reasoning. These results highlight frequency-domain multi-scale MIL as a promising strategy for accurate and efficient WSI analysis.
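To make the pipeline described above concrete, the following is a minimal NumPy sketch of its three ingredients: padding a variable-size bag, phase-only (all-pass) Fourier token mixing per scale, and frequency-domain fusion across pooled scale representations. It is an illustration under stated assumptions, not the MS-FourierMIL implementation. The function names (`pad_bag`, `all_pass_mix`, `fuse_scales`), the mean-token padding to the next power of two, the max pooling, and the real-valued `scale_gain` weights are all hypothetical choices made for this example. The parameters `theta` and `scale_gain` stand in for weights that would be learned during training.

```python
import numpy as np


def pad_bag(tokens):
    """Pad an (n, d) bag to the next power-of-two length with its mean token.

    Assumption: padding with the mean token is one simple way to keep the
    per-feature mean of the bag unchanged.
    """
    n = tokens.shape[0]
    target = 1 << (n - 1).bit_length()  # smallest power of two >= n
    if target == n:
        return tokens
    filler = np.repeat(tokens.mean(axis=0, keepdims=True), target - n, axis=0)
    return np.concatenate([tokens, filler], axis=0)


def all_pass_mix(tokens, theta):
    """Mix tokens along the bag axis with a unit-magnitude (phase-only) filter.

    theta has shape (n // 2 + 1, d): one phase per rFFT bin and feature.
    """
    n = tokens.shape[0]
    spectrum = np.fft.rfft(tokens, axis=0)
    phase = theta.copy()
    phase[0] = 0.0           # keep the DC bin real
    if n % 2 == 0:
        phase[-1] = 0.0      # keep the Nyquist bin real
    return np.fft.irfft(spectrum * np.exp(1j * phase), n=n, axis=0)


def fuse_scales(bags, thetas, scale_gain):
    """Pool each mixed scale, then mix the pooled vectors across scales."""
    pooled = np.stack([
        all_pass_mix(pad_bag(bag), theta).max(axis=0)  # max pooling (assumed)
        for bag, theta in zip(bags, thetas)
    ])                                                  # (num_scales, d)
    spectrum = np.fft.fft(pooled, axis=0) * scale_gain[:, None]
    fused = np.fft.ifft(spectrum, axis=0).real
    return fused.mean(axis=0)                           # slide-level embedding


# Toy usage: two scales with different bag sizes and 4-dimensional embeddings.
rng = np.random.default_rng(0)
bags = [rng.normal(size=(5, 4)), rng.normal(size=(19, 4))]   # padded to 8, 32
thetas = [rng.normal(size=(8 // 2 + 1, 4)), rng.normal(size=(32 // 2 + 1, 4))]
slide = fuse_scales(bags, thetas, scale_gain=np.ones(2))
```

Because the filter has unit magnitude in every frequency bin, it redistributes information across tokens without changing the total energy of the bag. The sketch uses max pooling rather than mean pooling because the mean of a bag depends only on its DC component, which a phase-only filter leaves untouched, so mean pooling would discard the effect of the mixer.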