NingBGM: Video Background Music Generation via Holistic Scene Understanding with Multi-Agent Collaboration
Project page, demo results, and open-source implementation for multimodal video background music generation with collaborative scene understanding.
Existing video-to-music generation methods rely primarily on a single visual modality, often failing to capture the rich multimodal information present in real-world scenarios. Consequently, the generated music frequently lacks semantic and emotional alignment with the scene. To address this, we propose NingBGM, a video background music generation framework based on multi-agent collaboration. The framework introduces a holistic scene representation that fuses multi-source information to model temporal dynamics, objects, semantics, emotions, and narrative logic, serving as a guiding input for music generation. NingBGM organizes agents into three specialized teams for requirements analysis, scene understanding, and music generation. Each team follows a collaborative mechanism of supervisor assignment, expert execution, and verifier validation, simulating professional creative workflows. Furthermore, a structured agent role definition and task design module allows general agents to rapidly transform into domain experts, translating high-level user intentions into executable micro-instructions. We also construct and open-source CSD-200, the first multimodal holistic scene test set for video-to-music tasks, filling a gap in fine-grained evaluation benchmarks. Experiments demonstrate that NingBGM outperforms baselines across multiple key metrics, validating the effectiveness of the multi-agent collaborative paradigm. Ablation studies further confirm that constructing holistic scenes through multimodal fusion is crucial for enhancing audio-visual synchronization.
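The supervisor-assignment / expert-execution / verifier-validation loop described above can be sketched as a minimal pipeline. All class names, role names, and the toy expert/verifier functions below are illustrative assumptions for exposition, not NingBGM's actual implementation or API.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical sketch: each team runs a supervisor -> expert -> verifier loop,
# and the three teams (requirements analysis, scene understanding, music
# generation) are chained in sequence. Names are assumptions, not real APIs.

@dataclass
class Task:
    instruction: str                 # micro-instruction assigned by the supervisor
    result: Optional[str] = None     # expert's output
    approved: bool = False           # verifier's decision

class Team:
    """One specialized team: supervisor assigns, expert executes, verifier validates."""
    def __init__(self, name: str,
                 expert_fn: Callable[[str], str],
                 verify_fn: Callable[[str], bool],
                 max_rounds: int = 3):
        self.name = name
        self.expert_fn = expert_fn   # expert execution step
        self.verify_fn = verify_fn   # verifier validation step
        self.max_rounds = max_rounds # retry budget if validation fails

    def run(self, instruction: str) -> Task:
        task = Task(instruction)     # supervisor assignment
        for _ in range(self.max_rounds):
            task.result = self.expert_fn(task.instruction)
            if self.verify_fn(task.result):
                task.approved = True
                break
        return task

def pipeline(video_desc: str, teams: List[Team]) -> str:
    """Chain teams so each team's validated output feeds the next team."""
    payload = video_desc
    for team in teams:
        payload = team.run(payload).result
    return payload
```

With placeholder expert and verifier functions, chaining three teams passes the intermediate representation forward, e.g. a requirements summary into scene understanding, then into music generation.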
Disclaimer
The following notes are provided for academic research and technical demonstration purposes only. If you have any questions about content sources, copyright, or compliance, please contact us.
- Some of the displayed materials come from the public web or public datasets.
- The related content is used solely for academic research and technical demonstration.
- No individual or organization may use the content of this page directly for commercial purposes.
- If any copyright, privacy, or compliance issues arise, please contact us and we will promptly address them or take down the relevant content.