| 1 |
LIBERO-PRO: Towards Robust and Fair Evaluation of Vision-Language-Action Models Beyond Memorization |
Proposes LIBERO-PRO to address unfair evaluation of existing VLA models |
vision-language-action VLA |
✅ |
|
| 2 |
Bridge Thinking and Acting: Unleashing Physical Potential of VLM with Generalizable Action Expert |
Proposes a framework built on a generalizable action expert to improve VLM action execution in the physical world |
vision-language-action VLA |
|
|
| 3 |
Exploring Instruction Data Quality for Explainable Image Quality Assessment |
Proposes IQA-Select, a clustering-based data selection method for explainable image quality assessment that improves data efficiency. |
large language model multimodal |
|
|
| 4 |
Harnessing Synthetic Preference Data for Enhancing Temporal Understanding of Video-LLMs |
TimeWarp: uses synthetic preference data to enhance the temporal understanding of video large language models |
large language model |
✅ |
|
| 5 |
Talking Tennis: Language Feedback from 3D Biomechanical Action Recognition |
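Proposes a language feedback framework for tennis actions based on 3D biomechanical action recognition, to improve training effectiveness. |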
large language model |
|
|
| 6 |
DHQA-4D: Perceptual Quality Assessment of Dynamic 4D Digital Human |
Proposes the DHQA-4D dataset and the DynaMesh-Rater model for perceptual quality assessment of dynamic 4D digital humans |
multimodal |
|
|
| 7 |
The Overlooked Value of Test-time Reference Sets in Visual Place Recognition |
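Proposes a reference-set fine-tuning method that improves the generalization of visual place recognition in cross-domain scenarios |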
foundation model |
|
|