The Endoscopy Tutor

"With advancements in technology, we can delve deeper into the mysteries of the human body, enhancing our understanding and improving outcomes in endoscopy. Together, we're pioneering a new era of precision and insight in digestive health."

The Upper Endoscopy Tutor

This is a proof-of-concept study. I am currently annotating a selection of my upper-endoscopy videos in Roboflow, using semantic segmentation to mask the key upper-gastrointestinal landmarks. I plan to fine-tune a pretrained YOLOv11 model on these annotations.
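As a rough illustration of the training step, the sketch below assumes the Ultralytics Python package and a dataset exported from Roboflow in YOLO format. The weights file `yolo11n-seg.pt`, the dataset path, the epoch count, and the image size are placeholder assumptions rather than settings chosen for this project, and the helper names `build_train_args` and `fine_tune` are hypothetical.

```python
def build_train_args(data_yaml, epochs=50, imgsz=640):
    """Collect the keyword arguments passed to model.train().

    The defaults are illustrative assumptions, not tuned values.
    """
    return {"data": data_yaml, "epochs": epochs, "imgsz": imgsz}


def fine_tune(data_yaml, weights="yolo11n-seg.pt"):
    """Fine-tune pretrained segmentation weights on the exported annotations."""
    # Imported here so build_train_args() works without the package installed.
    from ultralytics import YOLO

    model = YOLO(weights)  # loads the pretrained segmentation weights
    return model.train(**build_train_args(data_yaml))


# Example call, with a hypothetical path to the exported dataset file:
# fine_tune("upper_gi/data.yaml")
```

Starting from the smallest segmentation checkpoint keeps early experiments fast. A larger variant can be swapped in through the `weights` argument once the annotation set has grown.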

Once the model performs satisfactorily, I intend to integrate a small multimodal language model (possibly MedGemma-4B) to automatically recognize which upper-GI segment is being visualized and to provide real-time guidance to the endoscopist.
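One practical issue for real-time use is that per-frame predictions tend to flicker. The sketch below shows one possible way to stabilize them with a majority vote over recent frames before anything is passed on to a language model or shown to the endoscopist. The class name `SegmentSmoother`, the window size, the vote threshold, and the example labels are all hypothetical and not part of the current project.

```python
from collections import Counter, deque


class SegmentSmoother:
    """Majority vote over the most recent per-frame segment labels."""

    def __init__(self, window=15, min_share=0.6):
        self.labels = deque(maxlen=window)  # oldest frames drop off automatically
        self.min_share = min_share          # fraction of the window that must agree

    def update(self, label):
        """Add one frame's label (None if nothing was detected).

        Returns the stable label, or None when no label is dominant enough.
        """
        self.labels.append(label)
        votes = Counter(l for l in self.labels if l is not None)
        if not votes:
            return None
        top, count = votes.most_common(1)[0]
        return top if count / len(self.labels) >= self.min_share else None


# Illustrative use with made-up labels:
smoother = SegmentSmoother(window=3)
for frame_label in ["esophagus", "esophagus", "stomach", "stomach", "stomach"]:
    stable = smoother.update(frame_label)
```

A short window reacts quickly when the scope moves to a new segment, while a longer one suppresses more noise. The right trade-off would have to be found on real procedure video.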

If successful, this project could serve as a foundation for more advanced applications in therapeutic endoscopy. While artificial intelligence in gastroenterology currently focuses mainly on the detection and diagnosis of lower-GI lesions, this study aims to expand its use to procedural guidance in upper-GI endoscopy.