| 15 |
Towards a Unified View of Large Language Model Post-Training |
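Unifies the view of large language model post-training and proposes HPT, a hybrid post-training algorithm that improves mathematical reasoning. |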
reinforcement learning large language model |
|
|
| 16 |
RL's Razor: Why Online Reinforcement Learning Forgets Less |
Reveals RL's "Occam's razor": online reinforcement learning better preserves prior knowledge during fine-tuning. |
reinforcement learning large language model foundation model |
|
|
| 17 |
Rethinking the long-range dependency in Mamba/SSM and transformer models |
Analyzes, from a theoretical perspective, the long-range dependency modeling capabilities of Mamba/SSM and Transformer models. |
Mamba SSM |
|
|
| 18 |
Meta-Inverse Reinforcement Learning for Mean Field Games via Probabilistic Context Variables |
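Proposes a meta-inverse reinforcement learning method based on probabilistic context variables to infer the reward functions of heterogeneous agents in mean field games. |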
reinforcement learning inverse reinforcement learning |
|
|
| 19 |
Wavelet Fourier Diffuser: Frequency-Aware Diffusion Model for Reinforcement Learning |
Proposes Wavelet Fourier Diffuser to address trajectory frequency shift in offline reinforcement learning. |
reinforcement learning offline reinforcement learning |
|
|
| 20 |
Data-Augmented Quantization-Aware Knowledge Distillation |
Proposes a data-augmentation-aware quantization knowledge distillation method that improves the accuracy of low-bit models. |
distillation |
|
|
| 21 |
Connections between reinforcement learning with feedback, test-time scaling, and diffusion guidance: An anthology |
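Reveals the intrinsic connections among reinforcement learning, test-time scaling, and diffusion guidance, and proposes a resampling-based alignment method. |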
reinforcement learning |
|
|
| 22 |
Resource-Aware Neural Network Pruning Using Graph-based Reinforcement Learning |
Proposes a resource-aware neural network pruning method based on graph reinforcement learning that improves pruning efficiency. |
reinforcement learning |
|
|
| 23 |
What Fundamental Structure in Reward Functions Enables Efficient Sparse-Reward Learning? |
Proposes PAMC to address efficiency problems in sparse-reward learning. |
reinforcement learning offline RL dreamer |
|
|
| 24 |
Parking Availability Prediction via Fusing Multi-Source Data with A Self-Supervised Learning Enhanced Spatio-Temporal Inverted Transformer |
Proposes SST-iTransformer, which fuses multi-source data with self-supervised learning to accurately predict parking availability. |
representation learning MAE |
|
|