|
Title:
|
MID-LEVEL TRANSFORMER FUSION OF LOCAL AND GLOBAL FEATURES FOR REMOTE SENSING SCENE CLASSIFICATION |
|
Author(s):
|
Vian Abdulmajeed, Khaled Jouini and Ouajdi Korbaa |
|
ISBN:
|
978-989-8704-71-9 |
|
Editors:
|
Paula Miranda and Pedro Isaías |
|
Year:
|
2025 |
|
Edition:
|
Single |
|
Keywords:
|
Remote Sensing Scene Classification, Transformer-Based Fusion, Swin Transformer, CBAM, Channel Attention, Spatial
Attention |
|
Type:
|
Full Paper |
|
First Page:
|
75 |
|
Last Page:
|
82 |
|
Language:
|
English |
|
Paper Abstract:
|
Accurate remote sensing scene classification requires the effective integration of fine-grained local details with broader
global context. This paper proposes a novel mid-level fusion framework that leverages a Transformer Encoder to
dynamically fuse these complementary features. Our dual-branch architecture extracts local details using EfficientNet-B0,
enhanced with a Convolutional Block Attention Module (CBAM), while capturing global context through a Swin Tiny
Transformer. By integrating these features at an intermediate stage, the model learns complex interactions between local
and global representations. The proposed approach attains 98.67% accuracy on EuroSAT and 96.06% on RESISC45,
outperforming several recent methods while maintaining a practical model size.
Ablation studies validate the contributions of the Transformer Encoder and CBAM, demonstrating their synergistic effect
on feature refinement. |
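The dual-branch fusion described in the abstract can be illustrated with a minimal PyTorch sketch. It is not the authors' implementation. The class names (`CBAM`, `FusionHead`), the token width of 256, the number of heads and encoder layers, and the mean-pooling classifier are all hypothetical choices made for illustration. The backbones are replaced by random tensors shaped like the final feature maps of EfficientNet-B0 (1280 channels, 7x7) and Swin-Tiny (49 tokens of width 768) for a 224x224 input; the stage at which the paper actually taps each branch is not stated in the abstract, so these shapes are assumptions. Positional or branch-type embeddings are omitted for brevity.

```python
import torch
import torch.nn as nn


class CBAM(nn.Module):
    """Convolutional Block Attention Module: channel attention, then spatial attention."""

    def __init__(self, channels: int, reduction: int = 16, kernel_size: int = 7):
        super().__init__()
        # Shared MLP (as 1x1 convs) applied to avg- and max-pooled channel descriptors.
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )
        # Spatial attention over the concatenated channel-wise mean and max maps.
        self.spatial = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        avg = self.mlp(x.mean(dim=(2, 3), keepdim=True))
        mx = self.mlp(x.amax(dim=(2, 3), keepdim=True))
        x = x * torch.sigmoid(avg + mx)  # channel attention
        pooled = torch.cat(
            [x.mean(dim=1, keepdim=True), x.amax(dim=1, keepdim=True)], dim=1
        )
        return x * torch.sigmoid(self.spatial(pooled))  # spatial attention


class FusionHead(nn.Module):
    """Hypothetical mid-level fusion: local and global tokens share one Transformer Encoder."""

    def __init__(self, local_ch: int = 1280, global_ch: int = 768, dim: int = 256,
                 num_classes: int = 10, heads: int = 4, layers: int = 2):
        super().__init__()
        self.cbam = CBAM(local_ch)
        # Project both branches to a common token width (an assumption of this sketch).
        self.local_proj = nn.Linear(local_ch, dim)
        self.global_proj = nn.Linear(global_ch, dim)
        enc_layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, dim_feedforward=4 * dim, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=layers)
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, local_map: torch.Tensor, global_tokens: torch.Tensor) -> torch.Tensor:
        # local_map: (B, C_local, H, W) from the CNN branch.
        # global_tokens: (B, N, C_global) from the Swin branch.
        local_tokens = self.cbam(local_map).flatten(2).transpose(1, 2)  # (B, H*W, C_local)
        tokens = torch.cat(
            [self.local_proj(local_tokens), self.global_proj(global_tokens)], dim=1
        )
        fused = self.encoder(tokens)  # self-attention mixes local and global tokens
        return self.classifier(fused.mean(dim=1))  # mean-pool tokens, then classify


# Stand-in features for a batch of two images; EuroSAT has 10 classes.
model = FusionHead(num_classes=10).eval()
with torch.no_grad():
    logits = model(torch.randn(2, 1280, 7, 7), torch.randn(2, 49, 768))
print(logits.shape)  # torch.Size([2, 10])
```

Concatenating the two token sets before the encoder lets every local token attend to every global token and vice versa, which is one plausible reading of "learns complex interactions between local and global representations". For RESISC45, `num_classes` would be 45.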