
MMLab@HKU Shines at CVPR 2025: In Dialogue with the World's Top AI Researchers

Xiaozhi (小智) · AI News · June 12, 2025





CVPR 2025 Is About to Open in Nashville: MMLab Takes Part with Frontier Research

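As one of the most influential international conferences in computer vision, CVPR (IEEE/CVF Conference on Computer Vision and Pattern Recognition) brings together each year the latest breakthroughs from leading universities, research institutes, and industry around the world. CVPR 2025 takes place from June 11 to 15 in Nashville, USA. MMLab@HKU is presenting 24 papers spanning image generation, video understanding, embodied intelligence, 3D reconstruction, multimodal fusion, and other active research areas. Everyone is welcome to come and talk with the authors in person!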

Event page: https://mmlab.hk/event/cvpr2025

Three International Competitions: Advancing Research Through Competition

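At CVPR 2025, MMLab is the initiator and organizer of three international competitions covering open-world autonomous driving, interactive robot intelligence, and other popular directions. Through carefully designed tasks and evaluation protocols, the team has built an arena for researchers worldwide that focuses on real-world challenges and on the ability to turn technology into practice. We hope this sparks new ideas and helps expand the boundaries of visual intelligence.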

• Autonomous Grand Challenge 2025
• End-to-End Autonomous Driving through V2X Cooperation
• RoboTwin Dual-Arm Collaboration Challenge

Six In-Depth Events: Unlocking the Technical Keys to Deploying AI

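Beyond the international competitions, MMLab is also hosting six workshops and tutorials at CVPR 2025, covering autonomous driving, multimodal learning, world models, cooperative perception, data-driven methods, and other current topics.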

• Embodied Intelligence for Autonomous Systems on the Horizon
• Workshop on Autonomous Driving
• Distillation of Foundation Models for Autonomous Driving
• Multi-Agent Embodied Intelligent Systems Meet Generative-AI Era: Opportunities, Challenges and Futures
• Robotics 101: An Odyssey from A Vision Perspective
• The 1st Workshop on Benchmarking World Models

Technology Trends: A Roundup of Frontier AI Research

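With generative AI and multimodal perception advancing rapidly, the papers below show progress in cross-modal understanding, scene generation, human-computer interaction, and robot intelligence. Techniques such as text-driven video synthesis, image safety evaluation, high-precision 3D Gaussian modeling, and policy learning for robot manipulation all aim to improve the generality and efficiency of models and their ability to adapt to the real world. Whether you care about safer and more trustworthy generative systems, smarter robot brains, or higher-quality visual generation models, these projects are at the forefront of technical innovation and are worth following!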

TokenHSI: Unified Synthesis of Physical Human-Scene Interactions through Task Tokenization [Oral]

• Unified synthesis of physical human-scene interactions through task tokenization
• arXiv: https://arxiv.org/abs/2503.19901
• Github: https://github.com/liangpan99/TokenHSI

Parallelized Autoregressive Visual Generation [Highlight]

• PAR, a parallel autoregressive generation model designed around the dependencies between visual tokens
• arXiv: https://arxiv.org/abs/2412.15119
• Project Page: https://yuqingwang1029.github.io/PAR-project/

RoboTwin: Dual-Arm Robot Benchmark with Generative Digital Twins [Highlight]

• A benchmark suite and data synthesizer for dual-arm robots
• arXiv: https://arxiv.org/abs/2504.13059
• Github: https://github.com/TianxingChen/RoboTwin

HMAR: Efficient Hierarchical Masked AutoRegressive Image Generation

• HMAR, an efficient, high-quality image generation model that combines multi-scale autoregression with masked reconstruction
• arXiv: https://arxiv.org/html/2506.04421v1
• Project Page: https://research.nvidia.com/labs/dir/hmar/

MBQ: Modality-Balanced Quantization for Large Vision-Language Models

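• MBQ, a quantization method for vision-language models that balances the difference in sensitivity between the vision and language modalities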
• arXiv: https://arxiv.org/abs/2412.19509
• Github: https://github.com/thu-nics/MBQ

MIDI: Multi-Instance Diffusion for Single Image to 3D Scene Generation

• MIDI-3D, extending 3D object generation models to compositional 3D scene generation
• arXiv: https://arxiv.org/abs/2412.03558
• Github: https://github.com/VAST-AI-Research/MIDI-3D

T2ISafety: Benchmark for Assessing Fairness, Toxicity, and Privacy in Image Generation

• T2ISafety, a benchmark for evaluating the safety of text-to-image models
• arXiv: https://arxiv.org/abs/2501.12612
• Github: https://github.com/adwardlee/t2i_safety

T2V-CompBench: A Comprehensive Benchmark for Compositional Text-to-video Generation

• T2V-CompBench, a benchmark for evaluating the compositional generation ability of text-to-video models
• arXiv: https://arxiv.org/abs/2407.14505
• Project Page: https://t2v-compbench-2025.github.io/

CompGS: Unleashing 2D Compositionality for Compositional Text-to-3D via Dynamically Optimizing 3D Gaussians

• Efficient compositional text-to-3D content generation with 3D Gaussian Splatting
• arXiv: https://arxiv.org/abs/2410.20723
• Project Page: https://chongjiange.github.io/compgs.html

DRiVE: Diffusion-based Rigging Empowers Generation of Versatile and Expressive Characters

• Diffusion-driven generation of diverse 3D characters with automatic rigging
• arXiv: https://arxiv.org/abs/2411.17423
• Github: https://github.com/yisuanwang/DRiVE

DexHandDiff: Interaction-aware Diffusion Planning for Adaptive Dexterous Manipulation

• An interaction-aware diffusion policy for adaptive dexterous-hand manipulation
• arXiv: https://arxiv.org/abs/2411.18562
• Project Page: https://dexdiffuser.github.io/

Distilling Monocular Foundation Model for Fine-grained Depth Completion

• Distills knowledge from a monocular foundation model to densify sparse depth
• arXiv: https://arxiv.org/abs/2503.16970
• Github: https://github.com/Sharpiless/DMD3C

FlashGS: Efficient 3D Gaussian Splatting for Large-Scale and High-Resolution Rendering

• Efficient 3D Gaussian Splatting for large-scale, high-resolution rendering
• arXiv: https://arxiv.org/abs/2408.07967
• Github: https://github.com/InternLandMark/FlashGS

Forensics-Bench: A Comprehensive Forgery Detection Benchmark Suite for Large Vision Language Models

• A new comprehensive benchmark for the forgery detection ability of large vision-language models
• arXiv: https://arxiv.org/pdf/2503.15024
• Github: https://github.com/Forensics-Bench/Forensics-Bench
• Project Page: https://forensics-bench.github.io/
• Dataset: https://huggingface.co/datasets/Forensics-bench/Forensics-bench

G3Flow: Generative 3D Semantic Flow for Pose-aware and Generalizable Object Manipulation

• Generatively enhanced 3D representations for robot manipulation
• arXiv: https://arxiv.org/abs/2411.18369
• Github: https://github.com/TianxingChen/G3Flow

GraphMimic: Graph-to-Graphs Generative Modeling from Videos for Policy Learning

• Pretrains on videos to predict future graph structures from the current graph structure

Horizon-GS: Unified 3D Gaussian Splatting for Large-Scale Aerial-to-Ground Scenes

• Unified aerial-to-ground 3D Gaussian Splatting for reconstructing and rendering large scenes
• arXiv: https://arxiv.org/abs/2412.01745
• Github: https://github.com/OpenRobotLab/HorizonGS

Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation

• Decouples visual encoding to unify multimodal understanding and generation
• arXiv: https://arxiv.org/abs/2410.13848
• Github: https://github.com/deepseek-ai/Janus

JiSAM: Alleviate Labeling Burden and Corner Case Problems in Autonomous Driving via Minimal Real-World Data

• Uses CARLA simulation data to reduce the need for real-world data annotation in autonomous driving perception
• arXiv: https://arxiv.org/abs/2503.08422
• Github: https://github.com/Runjian-Chen/JiSAM

MangaNinja: Line Art Colorization with Precise Reference Following

• Precise, controllable line art colorization
• arXiv: https://arxiv.org/abs/2501.08332
• Github: https://github.com/ali-vilab/MangaNinjia

NADER: Neural Architecture Design via Multi-Agent Collaboration

• Neural architecture design through multi-agent collaboration
• arXiv: https://arxiv.org/abs/2412.19206

OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation

• The first comprehensive benchmark for open-ended interleaved image-text generation
• arXiv: https://arxiv.org/abs/2411.18499
• Project Page: https://opening-benchmark.github.io/

RoboBrain: A Unified Brain Model for Robotic Manipulation from Abstract to Concrete

• A multimodal large language model (MLLM) for robotic manipulation
• arXiv: https://arxiv.org/abs/2502.21257
• Github: https://github.com/FlagOpen/RoboBrain

ScaMo: Exploring the Scaling Law in Autoregressive Motion Generation Model

• Explores scaling laws for autoregressive motion generation models
• arXiv: https://arxiv.org/abs/2412.14559
• Github: https://github.com/shunlinlu/ScaMo_code

EdgeTAM: On-Device Track Anything Model

• Compresses the video segmentation foundation model SAM 2 for on-device deployment while preserving its accuracy
• arXiv: https://arxiv.org/abs/2501.07256

