| 21 |
Causal Policy Learning in Reinforcement Learning: Backdoor-Adjusted Soft Actor-Critic |
Proposes DoSAC to address hidden confounding in reinforcement learning |
reinforcement learning policy learning SAC |
|
|
| 22 |
Dissecting Long-Chain-of-Thought Reasoning Models: An Empirical Study |
Systematically analyzes long chain-of-thought reasoning models to improve reasoning capability and efficiency |
reinforcement learning chain-of-thought |
✅ |
|
| 23 |
Aligning Multimodal Representations through an Information Bottleneck |
Proposes a new method based on the information bottleneck principle to address multimodal representation alignment |
representation learning multimodal |
|
|
| 24 |
TabFlex: Scaling Tabular Learning to Millions with Linear Attention |
Proposes TabFlex to address the efficiency of large-scale tabular learning |
linear attention large language model |
|
|
| 25 |
Agentomics-ML: Autonomous Machine Learning Experimentation Agent for Genomic and Transcriptomic Data |
Proposes Agentomics-ML to address automated modeling of biological data |
predictive model large language model multimodal |
✅ |
|
| 26 |
Mixture-of-Experts Meets In-Context Reinforcement Learning |
Proposes the T2MIR framework to address multimodality and task heterogeneity in ICRL |
reinforcement learning contrastive learning |
✅ |
|
| 27 |
StatsMerging: Statistics-Guided Model Merging via Task-Specific Teacher Distillation |
Proposes StatsMerging to address label dependence in model merging |
distillation |
|
|
| 28 |
Two-dimensional Taxonomy for N-ary Knowledge Representation Learning Methods |
Proposes a two-dimensional taxonomy to address the complexity of n-ary knowledge representation learning |
representation learning |
|
|
| 29 |
Improving Data Efficiency for LLM Reinforcement Fine-tuning Through Difficulty-targeted Online Data Selection and Rollout Replay |
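Proposes difficulty-targeted online data selection and rollout replay to improve the data efficiency of LLM reinforcement fine-tuning |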
reinforcement learning large language model |
✅ |
|
| 30 |
Mitigating Degree Bias Adaptively with Hard-to-Learn Nodes in Graph Contrastive Learning |
Proposes the HAR loss to adaptively mitigate degree bias in graph contrastive learning |
contrastive learning |
|
|
| 31 |
TreeRPO: Tree Relative Policy Optimization |
Proposes TreeRPO to optimize reward signals during reasoning |
reinforcement learning large language model |
✅ |
|
| 32 |
UnHiPPO: Uncertainty-aware Initialization for State Space Models |
Proposes UnHiPPO to address noise in state space models |
state space model |
|
|
| 33 |
When Maximum Entropy Misleads Policy Optimization |
Analyzes how maximum entropy reinforcement learning can be misleading in control tasks |
reinforcement learning reward design |
|
|
| 34 |
Learning long range dependencies through time reversal symmetry breaking |
Proposes the RHEL algorithm to address learning long-range dependencies |
SSM state space model |
|
|