Aims Video capsule endoscopy is a minimally invasive tool for examining the small intestine,
               but its adoption is limited by labor-intensive manual interpretation and by battery
               constraints. This study seeks to improve the efficiency of capsule endoscopy through edge
               Artificial Intelligence (AI), which processes data on the device itself, enabling
               real-time diagnostics at the point of care. The Galar dataset, a comprehensive, multi-center,
               multi-label video capsule dataset, supports the development of automated detection
               and anatomical localization and helps bridge a gap, as no edge AI solutions currently exist
               for capsule endoscopy. The project aims to facilitate broader clinical use and potential
               improvements in patient outcomes.
            
               Methods The Galar dataset, compiled across Germany, contains over 3.5 million frames from
               80 capsule videos annotated for technical, anatomical, and pathological features.
               Five expert annotators labeled the frames using the Computer Vision Annotation Tool (CVAT),
               and annotation accuracy was validated by training a ResNet-50 model. Building on this dataset,
               our collaborative research develops edge AI approaches that improve model performance.
               First, Convolutional Neural Networks (CNNs) were combined with Hidden Markov Models
               (HMMs) for time-series analysis, enabling accurate localization within the gastrointestinal
               tract with a low-parameter model (approximately 1 million parameters) suitable for
               low-power devices. Second, ensemble models combining image classifiers and
               autoencoders were used to improve detection accuracy while limiting computational load.
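
               As an illustration of how an HMM can smooth per-frame CNN outputs over time, the
               following is a minimal sketch rather than the models used in this study. It assumes
               a left-to-right HMM in which the capsule either stays in its current section or
               advances to the next one, treats the CNN's per-frame probabilities as emission
               scores, and decodes the most likely section sequence with the Viterbi algorithm.
               The section names, the `stay_prob` value, and the function name `viterbi` are
               hypothetical.

```python
import math

# Hypothetical section labels ordered along the GI tract (illustrative only).
SECTIONS = ["stomach", "small_intestine", "colon"]


def viterbi(frame_probs, stay_prob=0.99):
    """Return the most likely section sequence for per-frame CNN probabilities.

    Assumption: a left-to-right HMM. The capsule stays in its current section
    with probability `stay_prob` or advances to the next one; it never moves
    backwards. CNN probabilities are used directly as emission scores.
    """
    n = len(SECTIONS)
    neg_inf = float("-inf")

    def log(p):
        return math.log(p) if p > 0 else neg_inf

    # log_trans[i][j]: log-probability of moving from section i to section j.
    log_trans = [[neg_inf] * n for _ in range(n)]
    for i in range(n):
        if i == n - 1:
            log_trans[i][i] = 0.0  # the last section is absorbing
        else:
            log_trans[i][i] = log(stay_prob)
            log_trans[i][i + 1] = log(1.0 - stay_prob)

    # Assumption: every recording starts in the first section.
    score = [neg_inf] * n
    score[0] = log(frame_probs[0][0])
    backptr = []

    for probs in frame_probs[1:]:
        new_score, ptr = [], []
        for j in range(n):
            # Best predecessor section for being in section j at this frame.
            best_i = max(range(n), key=lambda i: score[i] + log_trans[i][j])
            new_score.append(score[best_i] + log_trans[best_i][j] + log(probs[j]))
            ptr.append(best_i)
        score = new_score
        backptr.append(ptr)

    # Backtrack from the best final section to recover the full path.
    state = max(range(n), key=lambda j: score[j])
    path = [state]
    for ptr in reversed(backptr):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return [SECTIONS[s] for s in path]


if __name__ == "__main__":
    # Toy CNN outputs: the second frame is a misleading outlier.
    frames = [
        [0.90, 0.05, 0.05],
        [0.30, 0.60, 0.10],
        [0.90, 0.05, 0.05],
        [0.10, 0.80, 0.10],
        [0.10, 0.10, 0.80],
    ]
    print(viterbi(frames, stay_prob=0.9))
```

               Because backward transitions have zero probability, the decoded sequence can never
               return to an earlier section, which is one way a small temporal model can correct
               isolated misclassifications without enlarging the CNN itself.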
            
               Results The Galar dataset stands as one of the most comprehensive resources in capsule endoscopy,
               with 29 anatomical and pathological labels essential for gastrointestinal diagnostics.
               The collaborative edge AI approach, integrating CNNs with HMMs, achieved 93% accuracy
               on the dataset, demonstrating precise anatomical localization. Ensemble models also
               distinguished features within the dataset, achieving an
               AUC of 76% for anomaly detection. These low-complexity models align with capsule
               endoscopy's energy constraints, enabling real-time, autonomous evaluations on resource-limited
               devices. Compared with traditional CNN models, this edge AI approach achieves similar,
               if not improved, results with fewer resources.
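
               To make the anomaly-detection metric concrete, the sketch below shows one simple
               way to fuse a classifier probability with an autoencoder reconstruction error and
               to score the result with AUC. It is an assumption-laden illustration, not the
               ensemble evaluated here: the weighted-average fusion, the function names
               `fuse_scores` and `roc_auc`, and the toy numbers are hypothetical, and AUC is taken
               to mean the area under the ROC curve.

```python
def fuse_scores(clf_probs, recon_errors, weight=0.5):
    """Combine classifier probabilities with autoencoder reconstruction errors.

    Hypothetical fusion rule: min-max normalise the reconstruction errors to
    [0, 1], then take a weighted average with the classifier probabilities.
    """
    lo, hi = min(recon_errors), max(recon_errors)
    span = (hi - lo) or 1.0  # avoid division by zero when all errors are equal
    return [
        weight * p + (1.0 - weight) * (e - lo) / span
        for p, e in zip(clf_probs, recon_errors)
    ]


def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen anomalous frame (label 1)
    receives a higher score than a randomly chosen normal frame (label 0),
    with ties counted as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one frame of each class")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy values only; they are not taken from the dataset.
    clf_probs = [0.10, 0.60, 0.40, 0.90]
    recon_errors = [0.02, 0.03, 0.08, 0.10]
    labels = [0, 0, 1, 1]
    fused = fuse_scores(clf_probs, recon_errors)
    print(roc_auc(fused, labels))
```

               The pairwise form is quadratic in the number of frames, so it suits small examples;
               a rank-based computation would be preferable for millions of frames.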
            
               Conclusions As one of the largest annotated public datasets for video capsule endoscopy, the
               Galar dataset is a valuable resource for AI advancements in gastrointestinal imaging.
               Findings indicate that low-parameter, energy-efficient models enabled by edge AI can
               enhance diagnostic efficiency in capsule endoscopy, lessen clinician workload, and
               enable real-time anomaly detection directly on the capsule. These models have the
               potential to perform preliminary analyses on-device, transmitting only relevant images
               or real-time capsule localization data and thereby conserving battery life. This advancement
               positions video capsule endoscopy as a more efficient, accessible diagnostic tool.