Content Generation of Virtual Reality Media Based on Generative Artificial Intelligence

Publishing Journal, 2026, Vol. 34, Issue 2: 50-58.

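(Faculty of Creative Industries, City University Malaysia, Kuala Lumpur, 46100) (School of Computer Science and Technology / School of Artificial Intelligence, China University of Mining and Technology, Xuzhou, 221116)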

Received: 2025-02-10 Revised: 2026-01-12 Online: 2026-05-08 Published: 2026-05-08
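About the authors: Cao Weihang, 2023-cohort doctoral student at the Faculty of Creative Industries, City University Malaysia, and lecturer; Cao Tianjie, Doctor of Engineering, professor and doctoral supervisor at the School of Computer Science and Technology / School of Artificial Intelligence, China University of Mining and Technology.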
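Funding: This article is an outcome of the 2023 Jiangsu Province Higher Education Teaching Reform Research Project (Key Project) "Research on AI-Assisted Management Methods for Classroom Quality of University Teachers" (2023JSJG178).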

Abstract:

This article explores the characteristics and mechanisms of content generation for virtual reality media (VRM) driven by generative artificial intelligence (GenAI). It identifies four features of GenAI-empowered VRM: first, automated generation improves content creation efficiency; second, it supports diverse sensory experiences; third, it draws on user data to adapt virtual environments to individual users; and fourth, it supports real-time, natural interaction between the virtual and physical worlds. Realizing these features in engineering practice raises three core requirements: efficient collaborative generation of cross-modal data, dynamic consistency between virtual environments and the physical world, and optimization of immersion through multimodal information fusion. To address these requirements systematically, the article proposes a four-element content generation mechanism model centered on "cross-modal transformation - virtual-real mapping - physical simulation - multimodal information fusion". Finally, it notes that VRM content generation still faces multidimensional challenges in large-scale industrial application, including technology, ethics, regulation, and user experience.

Keywords: generative artificial intelligence; virtual reality media; modality alignment; digital twins; multimodal information fusion

CLC number: G237

Cao Weihang, Cao Tianjie. Content Generation of Virtual Reality Media Based on Generative Artificial Intelligence[J]. Publishing Journal, 2026, 34(2): 50-58.
