Embodied artificial intelligence in ophthalmology
