| 31 | Structured Agent Distillation for Large Language Model | Proposes structured agent distillation to address large language model compression. | imitation learning, distillation, large language model | |
| 32 | Modality-Balancing Preference Optimization of Large Multimodal Models by Adversarial Negative Mining | Proposes modality-balancing preference optimization to address modality imbalance. | preference learning, large language model, multimodal | |
| 33 | InfiFPO: Implicit Model Fusion via Preference Optimization in Large Language Models | Proposes InfiFPO to address preference alignment in large language model fusion. | DPO, direct preference optimization, large language model | |
| 34 | FlowQ: Energy-Guided Flow Policies for Offline Reinforcement Learning | Proposes FlowQ to address the guidance problem in offline reinforcement learning. | reinforcement learning, offline reinforcement learning, flow matching | |
| 35 | Time to Embed: Unlocking Foundation Models for Time Series with Channel Descriptions | Proposes CHARM to address the limitations of time-series modeling. | representation learning, foundation model | |
| 36 | Energy-Efficient Deep Reinforcement Learning with Spiking Transformers | Proposes a Spike-Transformer reinforcement learning algorithm to reduce energy consumption. | reinforcement learning, deep reinforcement learning | |
| 37 | AAPO: Enhancing the Reasoning Capabilities of LLMs with Advantage Momentum | Proposes AAPO to address the inefficiency of existing RL methods at improving reasoning capabilities. | reinforcement learning, PPO, large language model | |
| 38 | Imitation Learning via Focused Satisficing | Proposes a focused-satisficing imitation learning method to improve the acceptability of learned behavior. | reinforcement learning, deep reinforcement learning, imitation learning | |
| 39 | The Evolution of Alpha in Finance: Harnessing Human Insight and LLM Agents | Proposes a five-stage taxonomy to advance intelligent investment systems in finance. | representation learning, large language model, multimodal | |
| 40 | Interpretable Reinforcement Learning for Load Balancing using Kolmogorov-Arnold Networks | Proposes Kolmogorov-Arnold networks for interpretable reinforcement learning in load balancing. | reinforcement learning, PPO | |
| 41 | Preference Learning with Lie Detectors can Induce Honesty or Evasion | Uses preference learning with lie detectors to improve the honesty of AI systems. | preference learning, DPO | |
| 42 | Sample and Computationally Efficient Continuous-Time Reinforcement Learning with General Function Approximation | Proposes an efficient continuous-time reinforcement learning algorithm to address sample and computational efficiency. | reinforcement learning | |
| 43 | Text embedding models can be great data engineers | Proposes ADEPT to automate data engineering pipelines. | predictive model, TAMP | |
| 44 | TinyV: Reducing False Negatives in Verification Improves RL for LLM Reasoning | Proposes TinyV to address false negatives in verifiers. | reinforcement learning, large language model | ✅ |
| 45 | Performance Optimization of Energy-Harvesting Underlay Cognitive Radio Networks Using Reinforcement Learning | Uses reinforcement learning to optimize the performance of energy-harvesting underlay cognitive radio networks. | reinforcement learning | |
| 46 | KIPPO: Koopman-Inspired Proximal Policy Optimization | Proposes KIPPO to address policy optimization in environments with complex dynamics. | reinforcement learning, policy learning, PPO | |
| 47 | Bellman operator convergence enhancements in reinforcement learning algorithms | Proposes Bellman-operator enhancements to improve the convergence of reinforcement learning algorithms. | reinforcement learning | |
| 48 | Personalised Insulin Adjustment with Reinforcement Learning: An In-Silico Validation for People with Diabetes on Intensive Insulin Treatment | Proposes an adaptive basal-bolus dose advisory system to optimize insulin adjustment for people with diabetes. | reinforcement learning | |
| 49 | FlowTSE: Target Speaker Extraction with Flow Matching | Proposes FlowTSE to address target speaker extraction. | flow matching | |
| 50 | Self Distillation via Iterative Constructive Perturbations | Proposes a cyclic optimization framework to improve the generalization of deep learning models. | distillation | |
| 51 | From Reasoning to Code: GRPO Optimization for Underrepresented Languages | Proposes GRPO-based optimization to address code generation for underrepresented programming languages. | reinforcement learning, large language model | |
| 52 | Riemannian Flow Matching for Brain Connectivity Matrices via Pullback Geometry | Proposes DiffeoCFM to address the generation of brain connectivity matrices. | flow matching | ✅ |
| 53 | When to retrain a machine learning model | Proposes an uncertainty-based retraining method to cope with evolving data. | reinforcement learning, offline reinforcement learning | |