| 1 |
All You Need for Object Detection: From Pixels, Points, and Prompts to Next-Gen Fusion and Multimodal LLMs/VLMs in Autonomous Vehicles |
A survey of next-generation multimodal object detection techniques that integrate LLMs/VLMs for autonomous driving |
large language model multimodal |
|
|
| 2 |
OracleAgent: A Multimodal Reasoning Agent for Oracle Bone Script Research |
OracleAgent: a multimodal reasoning agent system for oracle bone script research |
large language model multimodal |
|
|
| 3 |
AD-SAM: Fine-Tuning the Segment Anything Vision Foundation Model for Autonomous Driving Perception |
AD-SAM: fine-tunes the SAM vision foundation model for autonomous driving perception |
foundation model |
|
|
| 4 |
ProstNFound+: A Prospective Study using Medical Foundation Models for Prostate Cancer Detection |
ProstNFound+: a prospective study using medical foundation models for prostate cancer detection in micro-ultrasound |
foundation model |
|
|
| 5 |
SpinalSAM-R1: A Vision-Language Multimodal Interactive System for Spine CT Segmentation |
SpinalSAM-R1: a vision-language multimodal interactive system for spine CT segmentation |
multimodal |
✅ |
|
| 6 |
MoME: Mixture of Visual Language Medical Experts for Medical Imaging Segmentation |
Proposes MoME, a vision-language mixture-of-experts model for medical image segmentation |
large language model foundation model |
|
|
| 7 |
WOD-E2E: Waymo Open Dataset for End-to-End Driving in Challenging Long-tail Scenarios |
WOD-E2E: a Waymo Open Dataset for long-tail scenarios in end-to-end driving |
large language model multimodal |
|
|
| 8 |
Semantic Frame Aggregation-based Transformer for Live Video Comment Generation |
Proposes SFAT, a semantic frame aggregation-based Transformer model for generating live video comments |
multimodal |
|
|
| 9 |
OmniX: From Unified Panoramic Generation and Perception to Graphics-Ready 3D Scenes |
OmniX: uses panoramic generation and perception to produce 3D scenes ready for graphics rendering |
multimodal |
|
|
| 10 |
SteerVLM: Robust Model Control through Lightweight Activation Steering for Vision Language Models |
Proposes SteerVLM to strengthen control over vision-language models |
multimodal |
|
|
| 11 |
Representation-Level Counterfactual Calibration for Debiased Zero-Shot Recognition |
Proposes a representation-level counterfactual calibration method to address context bias in zero-shot recognition |
multimodal |
|
|
| 12 |
Which Way Does Time Flow? A Psychophysics-Grounded Evaluation for Vision-Language Models |
Proposes the AoT-PsyPhyBENCH benchmark to evaluate how well vision-language models understand the temporal direction of video |
multimodal |
|
|
| 13 |
ConceptScope: Characterizing Dataset Bias via Disentangled Visual Concepts |
ConceptScope: quantifies and identifies dataset bias via disentangled visual concept representations |
foundation model |
|
|