| 1 |
Rethinking Large Language Model Distillation: A Constrained Markov Decision Process Perspective |
Proposes a large language model distillation method based on a constrained Markov decision process formulation.
reinforcement learning, distillation, large language model
|
|
| 2 |
Aurora: Towards Universal Generative Multimodal Time Series Forecasting |
Aurora: a foundation model for universal generative multimodal time series forecasting.
flow matching, distillation, foundation model
|
|
| 3 |
Learning the Neighborhood: Contrast-Free Multimodal Self-Supervised Molecular Graph Pretraining |
C-FREE: a contrast-free multimodal self-supervised molecular graph pretraining method that fuses 2D topology with 3D structural information.
representation learning, multimodal
|
|
| 4 |
SpinGPT: A Large-Language-Model Approach to Playing Poker Correctly |
SpinGPT: a large-language-model approach to playing Texas Hold'em poker.
reinforcement learning, large language model
|
|
| 5 |
Enriching Knowledge Distillation with Intra-Class Contrastive Learning |
Proposes a knowledge distillation method based on intra-class contrastive learning that makes soft labels more informative.
contrastive learning, distillation
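
Since the entry above only names the technique, here is a minimal reference sketch of the standard soft-label knowledge distillation loss (temperature-scaled KL term plus cross-entropy). It is a generic PyTorch baseline, not the paper's intra-class contrastive variant; the temperature and weighting values are illustrative.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Standard soft-label knowledge distillation loss:
    KL between temperature-softened teacher/student distributions
    plus ordinary cross-entropy on the hard labels."""
    soft_teacher = F.softmax(teacher_logits / T, dim=-1)
    log_soft_student = F.log_softmax(student_logits / T, dim=-1)
    # T^2 rescaling keeps the soft-label gradients comparable to the hard-label term.
    distill = F.kl_div(log_soft_student, soft_teacher, reduction="batchmean") * T * T
    ce = F.cross_entropy(student_logits, labels)
    return alpha * distill + (1 - alpha) * ce

# Purely illustrative usage with random tensors.
student_logits = torch.randn(8, 10)
teacher_logits = torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))
print(kd_loss(student_logits, teacher_logits, labels))
```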
|
|
| 6 |
Reinforcement Learning with Discrete Diffusion Policies for Combinatorial Action Spaces |
Proposes a reinforcement learning method with discrete diffusion policies to handle combinatorial action spaces.
reinforcement learning, diffusion policy
|
|
| 7 |
Adaptive Margin RLHF via Preference over Preferences |
Proposes DPO-PoP, which uses preference-over-preference information to adaptively adjust margins and improve RLHF performance.
reinforcement learning, RLHF, DPO
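
For reference, a minimal sketch of a DPO loss with an additive per-example margin, which is the general idea the summary points at. How DPO-PoP actually derives the margin from preferences over preferences is not described here, so `margin` is left as a free input; all names are illustrative.

```python
import torch
import torch.nn.functional as F

def dpo_loss_with_margin(policy_chosen_logps, policy_rejected_logps,
                         ref_chosen_logps, ref_rejected_logps,
                         margin, beta=0.1):
    """Margin-augmented DPO loss: -log sigmoid(beta * (r_chosen - r_rejected) - margin).
    `margin` is a per-example tensor (or scalar); how DPO-PoP sets it adaptively
    is not shown here."""
    chosen_ratio = policy_chosen_logps - ref_chosen_logps
    rejected_ratio = policy_rejected_logps - ref_rejected_logps
    logits = beta * (chosen_ratio - rejected_ratio) - margin
    return -F.logsigmoid(logits).mean()

# Illustrative call with dummy log-probabilities and a fixed margin of 0.5.
b = torch.randn(4), torch.randn(4), torch.randn(4), torch.randn(4)
print(dpo_loss_with_margin(*b, margin=torch.full((4,), 0.5)))
```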
|
|
| 8 |
Learn the Ropes, Then Trust the Wins: Self-imitation with Progressive Exploration for Agentic Reinforcement Learning |
SPEAR: an agentic reinforcement learning method based on self-imitation and progressive exploration.
reinforcement learning, imitation learning, reward shaping
|
|
| 9 |
Linear Causal Representation Learning by Topological Ordering, Pruning, and Disentanglement |
Proposes a linear causal representation learning method based on topological ordering, pruning, and disentanglement.
representation learning, large language model
|
|
| 10 |
Context and Diversity Matter: The Emergence of In-Context Learning in World Models |
Proposes the in-context environment learning (ICEL) framework to improve world model adaptation to unseen environments.
world model, embodied AI
|
|
| 11 |
Universal Inverse Distillation for Matching Models with Real-Data Supervision (No GANs) |
Proposes RealUID: a universal GAN-free inverse distillation framework for matching models that uses real data to accelerate generation.
flow matching, distillation
|
|
| 12 |
In-Context Learning can Perform Continual Learning Like Humans |
Proposes in-context continual learning (ICCL), enabling human-like long-term memory and cross-task knowledge accumulation.
Mamba, linear attention, large language model
|
|
| 13 |
Adaptive Dual-Mode Distillation with Incentive Schemes for Scalable, Heterogeneous Federated Learning on Non-IID Data |
Proposes adaptive dual-mode distillation with incentive schemes to make heterogeneous federated learning scalable on non-IID data.
distillation
|
|
| 14 |
RLP: Reinforcement as a Pretraining Objective |
Proposes RLP, which uses reinforcement learning as a pretraining objective to strengthen model reasoning.
reinforcement learning, chain-of-thought
|
|
| 15 |
A Theoretical Analysis of Discrete Flow Matching Generative Models |
Provides a theoretical analysis of discrete flow matching generative models and proves their convergence.
flow matching |
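
As background for the flow matching entries in this list, a minimal sketch of the standard continuous conditional flow matching training step with a linear interpolation path. The paper above analyzes the discrete variant, which this sketch does not cover; the `TinyVelocityNet` model and its `(x, t)` signature are assumptions made for illustration.

```python
import torch
from torch import nn

class TinyVelocityNet(nn.Module):
    """Toy velocity field v_theta(x_t, t); the (x, t) signature is an assumption."""
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, 64), nn.ReLU(), nn.Linear(64, dim))

    def forward(self, x, t):
        return self.net(torch.cat([x, t], dim=-1))

def cfm_training_step(velocity_model, x1, optimizer):
    """One step of the continuous conditional flow matching objective:
    x_t = (1 - t) * x0 + t * x1, regression target = x1 - x0."""
    x0 = torch.randn_like(x1)                   # noise sample
    t = torch.rand(x1.shape[0], 1)              # uniform time in [0, 1]
    xt = (1 - t) * x0 + t * x1
    target = x1 - x0
    loss = ((velocity_model(xt, t) - target) ** 2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Illustrative usage on random 2-D "data".
model = TinyVelocityNet(dim=2)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
print(cfm_training_step(model, torch.randn(32, 2), opt))
```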
|
|
| 16 |
Effective Policy Learning for Multi-Agent Online Coordination Beyond Submodular Objectives |
Proposes the MA-SPL and MA-MPL algorithms for multi-agent online coordination.
policy learning |
|
|
| 17 |
EPO: Entropy-regularized Policy Optimization for LLM Agents Reinforcement Learning |
Proposes the EPO algorithm to address exploration-exploitation collapse for LLM agents in multi-turn, sparse-reward reinforcement learning.
reinforcement learning |
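
The one-line summary does not detail EPO itself, so as a reference point here is a generic entropy-regularized policy-gradient loss, i.e., the standard way an entropy bonus is added to keep exploration alive; it is not EPO's algorithm, and all names are illustrative.

```python
import torch

def entropy_regularized_pg_loss(log_probs, entropies, advantages, entropy_coef=0.01):
    """Generic entropy-regularized policy-gradient loss (minimized):
    -E[advantage * log pi(a|s)] minus an entropy bonus."""
    pg_term = -(advantages.detach() * log_probs).mean()
    return pg_term - entropy_coef * entropies.mean()

# Illustrative call with dummy per-step quantities.
print(entropy_regularized_pg_loss(torch.randn(16), torch.rand(16), torch.randn(16)))
```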
|
|
| 18 |
From Parameters to Behavior: Unsupervised Compression of the Policy Space |
Proposes an unsupervised method for compressing the policy space to make deep reinforcement learning more efficient.
reinforcement learning, deep reinforcement learning, DRL
|
|
| 19 |
Meta-Awareness Enhances Reasoning Models: Self-Alignment Reinforcement Learning |
Proposes MASA, a self-alignment reinforcement learning method that improves the meta-awareness and generalization of reasoning models.
reinforcement learning |
|
|
| 20 |
Fairness-Aware Reinforcement Learning (FAReL): A Framework for Transparent and Balanced Sequential Decision-Making |
Proposes the FAReL framework to balance performance and fairness in reinforcement learning, with applications to hiring and fraud detection.
reinforcement learning |
|
|
| 21 |
Overclocking Electrostatic Generative Models |
Proposes inverse Poisson flow matching to accelerate electrostatic generative models.
flow matching, distillation
|
|
| 22 |
Triple-BERT: Do We Really Need MARL for Order Dispatch on Ride-Sharing Platforms? |
Triple-BERT: a single-agent reinforcement learning method for order dispatch on ride-sharing platforms that outperforms multi-agent reinforcement learning approaches.
reinforcement learning, TD3
|