Embodied artificial intelligence in ophthalmology
