| # | Title | Summary | Keywords | |
|---|-------|---------|----------|---|
| 1 | Decipher the Modality Gap in Multimodal Contrastive Learning: From Convergent Representations to Pairwise Alignment | Theoretically analyzes the modality gap in multimodal contrastive learning, identifies dimensional collapse as the root cause, and proposes an alignment scheme. | contrastive learning, multimodal | |
| 2 | SPEC-RL: Accelerating On-Policy Reinforcement Learning via Speculative Rollouts | SPEC-RL: accelerates on-policy reinforcement learning via speculative rollouts, improving the efficiency of RL for LLM reasoning. | reinforcement learning, PPO, large language model | ✅ |
| 3 | General Exploratory Bonus for Optimistic Exploration in RLHF | Proposes a General Exploratory Bonus (GEB) to address the bias problem in optimistic exploration for RLHF. | reinforcement learning, RLHF, large language model | |
| 4 | Causally-Enhanced Reinforcement Policy Optimization | Proposes Causally-Enhanced Policy Optimization (CE-PO) to improve the causal consistency and robustness of LLM reasoning. | PPO, reward shaping, large language model | |
| 5 | Knowledge distillation through geometry-aware representational alignment | Proposes a geometry-aware representational-alignment approach to knowledge distillation, improving language model performance. | distillation, instruction following | |
| 6 | CrystalGym: A New Benchmark for Materials Discovery Using Reinforcement Learning | Introduces CrystalGym, a new benchmark environment for materials discovery with reinforcement learning. | reinforcement learning, large language model | |
| 7 | Solve Smart, Not Often: Policy Learning for Costly MILP Re-solving | Proposes the POC framework, using policy learning to optimize cost-sensitive MILP re-solving. | reinforcement learning, policy learning | |
| 8 | Unleashing Flow Policies with Distributional Critics | Proposes Distributional Flow Critics (DFC) to strengthen flow policies in offline reinforcement learning. | reinforcement learning, flow matching, multimodal | |
| 9 | Factor Decorrelation Enhanced Data Removal from Deep Predictive Models | Proposes a factor-decorrelation-enhanced data removal method, improving deep predictive models under distribution shift. | predictive model | |
| 10 | LOTFormer: Doubly-Stochastic Linear Attention via Low-Rank Optimal Transport | LOTFormer: doubly-stochastic linear attention via low-rank optimal transport, improving the efficiency of long-sequence modeling. | linear attention | |
| 11 | Flow Matching for Robust Simulation-Based Inference under Model Misspecification | Proposes the FMCPE framework, using flow matching to improve the robustness of simulation-based inference under model misspecification. | flow matching | |
| 12 | LLM Interpretability with Identifiable Temporal-Instantaneous Representation | Proposes an LLM interpretability framework with identifiable temporal-instantaneous representations, improving the discovery of concept relationships. | representation learning, large language model | |
| 13 | Two-Scale Latent Dynamics for Recurrent-Depth Transformers | Proposes recurrent-depth Transformers based on two-scale latent dynamics, improving computational efficiency and performance. | latent dynamics | |
| 14 | Towards Monotonic Improvement in In-Context Reinforcement Learning | Proposes a context-value-guided ICRL method, addressing the challenge of monotonic improvement in in-context reinforcement learning. | reinforcement learning | ✅ |
| 15 | Learning without Global Backpropagation via Synergistic Information Distillation | Proposes the Synergistic Information Distillation (SID) framework, addressing the scalability bottleneck of backpropagation in deep learning. | distillation | ✅ |
| 16 | C$^2$GSPG: Confidence-calibrated Group Sequence Policy Gradient towards Self-aware Reasoning | Proposes C$^2$GSPG to address overconfidence in RL-trained reasoning models, improving self-aware reasoning. | reinforcement learning, large language model | ✅ |
| 17 | Impute-MACFM: Imputation based on Mask-Aware Flow Matching | Proposes Impute-MACFM, which uses mask-aware flow matching for more robust and efficient tabular data imputation, particularly for longitudinal data. | flow matching | |
| 18 | Tracing the Representation Geometry of Language Models from Pretraining to Post-training | Proposes a method for tracing representation geometry to reveal the complex capabilities of language models. | DPO, large language model | |
| 19 | From Noise to Laws: Regularized Time-Series Forecasting via Denoised Dynamic Graphs | PRISM: regularizes time-series forecasting with denoised dynamic graphs, enabling stable long-horizon prediction. | MAE, physically plausible | |
| 20 | Trust Region Reward Optimization and Proximal Inverse Reward Optimization Algorithm | Proposes the Trust Region Reward Optimization (TRRO) framework, addressing the instability of jointly learning the reward and policy in inverse reinforcement learning. | reinforcement learning, inverse reinforcement learning | |