
Shelby Hall Graduate Research Forum Posters
Description
Accurately diagnosing appendicitis remains a significant challenge in emergency medicine due to its varied clinical presentation and overlap with other abdominal conditions. Existing diagnostic models often rely on unimodal data, such as computed tomography (CT) images or clinical notes, limiting their ability to capture the full complexity of patient information. This study proposes a novel multimodal framework, the Unified Adaptive Cross-Attention Multimodal Framework (u-ACM), which integrates CT images and clinical notes through hybrid fusion strategies to improve diagnostic accuracy for appendicitis. The u-ACM leverages adaptive contextual filtering to dynamically remove irrelevant features, cross-attention mechanisms to align features between modalities, and a multi-level fusion approach that combines early, middle (cross-attention), and late fusion strategies. Integrating complementary information from both modalities is expected to yield robust predictions by capturing both anatomical insights from CT images and contextual details from clinical notes, while the adaptive contextual filtering and cross-attention mechanisms are expected to improve feature alignment between modalities and produce stronger multimodal representations. The u-ACM will be evaluated against unimodal baselines (image-only and text-only models) and existing multimodal methods such as CARZero on a dataset of paired CT images and clinical notes. The results are expected to demonstrate that the u-ACM outperforms the unimodal approaches and CARZero across all evaluation metrics, including accuracy, precision, recall, F1-score, and area under the curve (AUC).
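To make the alignment step concrete, the following is a minimal PyTorch sketch of adaptive contextual filtering (modeled here as a learned sigmoid gate) feeding bidirectional cross-attention between CT-image tokens and clinical-note tokens. The module names, dimensions, and gating formulation are illustrative assumptions, not the poster's actual implementation.

import torch
import torch.nn as nn

class AdaptiveContextualFilter(nn.Module):
    # Assumed form: a learned per-channel gate that suppresses irrelevant features.
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(dim, dim), nn.Sigmoid())

    def forward(self, x):
        # x: (batch, tokens, dim); gate values near 0 filter a feature channel out.
        return x * self.gate(x)

class CrossAttentionAlignment(nn.Module):
    # Aligns CT-image tokens with clinical-note tokens via cross-attention.
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.filter_img = AdaptiveContextualFilter(dim)
        self.filter_txt = AdaptiveContextualFilter(dim)
        self.img_to_txt = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.txt_to_img = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, img, txt):
        img, txt = self.filter_img(img), self.filter_txt(txt)
        img_aligned, _ = self.img_to_txt(query=img, key=txt, value=txt)  # image queries notes
        txt_aligned, _ = self.txt_to_img(query=txt, key=img, value=img)  # notes query image
        return img_aligned, txt_aligned

# Example: 16 CT patch tokens and 32 note tokens, embedding dim 256.
img_aligned, txt_aligned = CrossAttentionAlignment()(
    torch.randn(2, 16, 256), torch.randn(2, 32, 256))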
This study will make several contributions to the field of multimodal learning in medical diagnostics: (1) a novel multimodal framework for integrating heterogeneous data sources, (2) an adaptive approach for filtering irrelevant features before training, (3) a dynamic cross-attention mechanism for learning relationships between modalities, and (4) a multi-level fusion strategy that combines low-level, mid-level, and high-level features.
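To illustrate contribution (4), the sketch below shows one way the three fusion levels could be combined in a single prediction head: pooled embeddings are concatenated (early), tokens attend across modalities (middle), and per-modality logits are averaged with a joint logit (late). The class name, the shared attention weights, and the simple averaging scheme are assumptions for illustration only.

import torch
import torch.nn as nn

class MultiLevelFusionHead(nn.Module):
    # Hypothetical head combining early, middle, and late fusion in one module.
    def __init__(self, dim=256, heads=4, num_classes=2):
        super().__init__()
        self.early_proj = nn.Linear(2 * dim, dim)                         # early: pooled concat
        self.cross = nn.MultiheadAttention(dim, heads, batch_first=True)  # middle: cross-attention
        self.img_head = nn.Linear(dim, num_classes)                       # late: per-modality logits
        self.txt_head = nn.Linear(dim, num_classes)
        self.joint_head = nn.Linear(2 * dim, num_classes)

    def forward(self, img, txt):
        # Early fusion: concatenate mean-pooled unimodal embeddings.
        early = self.early_proj(torch.cat([img.mean(1), txt.mean(1)], dim=-1))
        # Middle fusion: each modality attends to the other (weights shared for brevity).
        img_a, _ = self.cross(img, txt, txt)
        txt_a, _ = self.cross(txt, img, img)
        mid_img, mid_txt = img_a.mean(1), txt_a.mean(1)
        # Late fusion: average the unimodal logits with a joint logit that
        # sees both the early and middle representations.
        joint = self.joint_head(torch.cat([early, mid_img + mid_txt], dim=-1))
        return (self.img_head(mid_img) + self.txt_head(mid_txt) + joint) / 3

# Example forward pass over a batch of two cases.
logits = MultiLevelFusionHead()(torch.randn(2, 16, 256), torch.randn(2, 32, 256))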
Publication Date
3-2025
Department
Computer Science
City
Mobile
Disciplines
Bioimaging and Biomedical Optics | Biomedical Devices and Instrumentation | Other Analytical, Diagnostic and Therapeutic Techniques and Equipment | Other Biomedical Engineering and Bioengineering | Other Medicine and Health Sciences
Recommended Citation
Sunuwar, Mahesh, "Unified Adaptive Cross-Attention Multimodal Network" (2025). Shelby Hall Graduate Research Forum Posters. 15.
https://jagworks.southalabama.edu/southalabama-shgrf-posters/15
