| 49 |
Large Language Model Guided Incentive Aware Reward Design for Cooperative Multi-Agent Reinforcement Learning |
Proposes an LLM-guided, incentive-aware reward design framework to improve cooperative multi-agent reinforcement learning |
reinforcement learning reward design large language model |
|
|
| 50 |
The limits of bio-molecular modeling with large language models : a cross-scale evaluation |
BioMol-LLM-Bench: a systematic cross-scale evaluation of large language model capabilities and limitations in bio-molecular modeling |
Mamba large language model chain-of-thought |
|
|
| 51 |
Delayed Homomorphic Reinforcement Learning for Environments with Delayed Feedback |
Proposes Delayed Homomorphic Reinforcement Learning (DHRL), a framework for reinforcement learning in environments with delayed feedback. |
reinforcement learning policy learning OMOMO |
|
|
| 52 |
SODA: Semi On-Policy Black-Box Distillation for Large Language Models |
SODA: a semi on-policy black-box distillation method for large language models, improving efficiency and stability. |
distillation large language model |
|
|
| 53 |
Automated Attention Pattern Discovery at Scale in Large Language Models |
Proposes AP-MAE, which improves large language model performance through attention-pattern analysis and intervention |
masked autoencoder MAE large language model |
|
|
| 54 |
APPA: Adaptive Preference Pluralistic Alignment for Fair Federated RLHF of LLMs |
APPA: adaptive preference pluralistic alignment for fair federated RLHF of LLMs |
reinforcement learning PPO RLHF |
|
|
| 55 |
Not All Tokens Matter: Towards Efficient LLM Reasoning via Token Significance in Reinforcement Learning |
Proposes a reinforcement learning method based on token significance, improving LLM reasoning efficiency and accuracy |
reinforcement learning large language model chain-of-thought |
|
|
| 56 |
Realistic Market Impact Modeling for Reinforcement Learning Trading Environments |
Proposes the MACE environment, addressing the lack of realistic market-impact cost modeling in reinforcement learning trading |
reinforcement learning DRL PPO |
|
|
| 57 |
Provable Multi-Task Reinforcement Learning: A Representation Learning Framework with Low Rank Rewards |
Proposes a representation learning framework for multi-task reinforcement learning based on low-rank reward matrices, improving learning efficiency. |
reinforcement learning representation learning |
|
|
| 58 |
Empowering Power Outage Prediction with Spatially Aware Hybrid Graph Neural Networks and Contrastive Learning |
Proposes the SA-HGNN model combined with contrastive learning, improving the accuracy of power outage prediction under extreme weather. |
predictive model contrastive learning spatial relationship |
|
|
| 59 |
Co-Evolving Latent Action World Models |
Proposes CoLA-World, which co-evolves latent action world models to improve video simulation and visual planning. |
world model world models |
|
|
| 60 |
Making Bias Non-Predictive: Training Robust LLM Reasoning via Reinforcement Learning |
Proposes the EIT framework, using reinforcement learning to make LLM reasoning robust against cognitive biases |
reinforcement learning reward design large language model |
|
|
| 61 |
One Model for All: Multi-Objective Controllable Language Models |
Proposes Multi-Objective Control (MOC), training a single LLM to produce personalized outputs controlled by user preferences. |
reinforcement learning RLHF HuMoR |
|
|
| 62 |
DP-OPD: Differentially Private On-Policy Distillation for Language Models |
Proposes DP-OPD to resolve the tension between privacy protection and compression efficiency in language models |
distillation large language model |
|
|
| 63 |
Stratifying Reinforcement Learning with Signal Temporal Logic |
Proposes a reinforcement learning framework based on stratified signal temporal logic, improving task-planning capability. |
reinforcement learning deep reinforcement learning DRL |
|
|
| 64 |
Causal Process Models: Reframing Dynamic Causal Graph Discovery as a Reinforcement Learning Problem |
Proposes Causal Process Models, reframing dynamic causal graph discovery as a reinforcement learning problem |
reinforcement learning world model world models |
|
|
| 65 |
Apriel-1.5-OpenReasoner: RL Post-Training for General-Purpose and Efficient Reasoning |
Introduces Apriel-1.5-OpenReasoner, which improves general-purpose reasoning ability and efficiency via reinforcement learning post-training. |
reinforcement learning instruction following chain-of-thought |
|
|
| 66 |
Adversarial Robustness of Deep State Space Models for Forecasting |
Proposes adversarially robust deep state space models for time-series forecasting, improving prediction accuracy under malicious perturbations. |
SSM state space model |
|
|
| 67 |
Geometric Limits of Knowledge Distillation: A Minimum-Width Theorem via Superposition Theory |
Proposes a theory of geometric limits to explain performance saturation in knowledge distillation |
distillation |
|
|
| 68 |
Correcting Source Mismatch in Flow Matching with Radial-Angular Transport |
Proposes Radial-Angular Flow Matching (RAFM) to correct source-distribution mismatch in flow matching |
flow matching |
|
|
| 69 |
Boosted Distributional Reinforcement Learning: Analysis and Healthcare Applications |
Proposes the BDRL algorithm, improving distributional reinforcement learning to address consistency across heterogeneous populations in healthcare decision-making. |
reinforcement learning |
|
|
| 70 |
Isokinetic Flow Matching for Pathwise Straightening of Generative Flows |
Proposes Isokinetic Flow Matching, which significantly improves the fast-sampling efficiency of generative flows via dynamic regularization. |
flow matching |
|
|
| 71 |
Anticipatory Reinforcement Learning: From Generative Path-Laws to Distributional Value Functions |
Proposes the ARL framework, using generative path-laws and distributional value functions to tackle reinforcement learning in non-Markovian decision processes. |
reinforcement learning |
|
|
| 72 |
Selecting Decision-Relevant Concepts in Reinforcement Learning |
Proposes an automatic concept-selection algorithm to improve reinforcement learning decision-making |
reinforcement learning |
|
|
| 73 |
Explainable Autonomous Cyber Defense using Adversarial Multi-Agent Reinforcement Learning |
Proposes a Causal Multi-Agent Decision Framework to address ambiguity in cyber defense |
reinforcement learning |
|
|
| 74 |
Generative modeling of granular flow on inclined planes using conditional flow matching |
Proposes a generative model based on conditional flow matching for reconstructing the internal kinematics of granular flow on inclined planes. |
flow matching |
|
|
| 75 |
SLSREC: Self-Supervised Contrastive Learning for Adaptive Fusion of Long- and Short-Term User Interests |
SLSRec: self-supervised contrastive learning for fusing long- and short-term user interests, improving session-based recommendation |
contrastive learning |
|
|
| 76 |
EventFlow: Forecasting Temporal Point Processes with Flow Matching |
EventFlow: forecasting temporal point processes with flow matching, significantly reducing prediction error. |
flow matching |
|
|
| 77 |
An Information-Theoretic Analysis of OOD Generalization in Meta-Reinforcement Learning |
An information-theoretic analysis of, and bounds for, OOD generalization in meta-reinforcement learning |
reinforcement learning |
|
|
| 78 |
Personalized Federated Distillation Assisted Vehicle Edge Caching Strategy |
Proposes a personalized federated distillation assisted vehicle edge caching strategy, reducing communication overhead. |
distillation |
|
|
| 79 |
Kinetic-Mamba: Mamba-Assisted Predictions of Stiff Chemical Kinetics |
Kinetic-Mamba: Mamba-assisted prediction of stiff chemical kinetics, improving combustion simulation accuracy. |
Mamba |
|
|
| 80 |
NePPO: Near-Potential Policy Optimization for General-Sum Multi-Agent Reinforcement Learning |
NePPO: near-potential policy optimization for general-sum multi-agent reinforcement learning |
reinforcement learning |
|
|
| 81 |
Audio-to-Image Bird Species Retrieval without Audio-Image Pairs via Text Distillation |
Proposes a text-distillation-based audio-to-image bird species retrieval method that requires no paired data. |
distillation |
|
|
| 82 |
Learning from Imperfect Demonstrations via Temporal Behavior Tree-Guided Trajectory Repair |
Proposes a temporal behavior tree guided trajectory repair method to improve robot learning |
reinforcement learning policy learning |
|
|
| 83 |
Restless Bandits with Individual Penalty Constraints: A New Near-Optimal Index Policy and How to Learn It |
Proposes a new policy for restless bandits with individual penalty constraints, addressing resource allocation in dynamic wireless networks |
reinforcement learning deep reinforcement learning |
|
|
| 84 |
Cog-DRIFT: Exploration on Adaptively Reformulated Instances Enables Learning from Hard Reasoning Problems |
Cog-DRIFT enables LLMs to learn from hard reasoning problems by adaptively reformulating instances. |
reinforcement learning curriculum learning |
|
|
| 85 |
Hierarchical Contrastive Learning for Multimodal Data |
Proposes a Hierarchical Contrastive Learning (HCL) framework to model complex inter-modality relationships in multimodal data representation. |
representation learning contrastive learning multimodal |
|
|
| 86 |
Hidden in the Multiplicative Interaction: Uncovering Fragility in Multimodal Contrastive Learning |
Proposes Gated Symile, addressing modality unreliability in multimodal contrastive learning and improving retrieval accuracy. |
contrastive learning multimodal |
|
|
| 87 |
Toward Consistent World Models with Multi-Token Prediction and Latent Semantic Enhancement |
Proposes LSE-MTP, improving world-model consistency via multi-token prediction and latent semantic enhancement |
world model world models large language model |
|
|
| 88 |
AttnDiff: Attention-based Differential Fingerprinting for Large Language Models |
AttnDiff: attention-based differential fingerprinting for identifying derivation relationships among large language models |
PPO DPO large language model |
|
|
| 89 |
A Mixture of Experts Foundation Model for Scanning Electron Microscopy Image Analysis |
Proposes a mixture-of-experts foundation model for scanning electron microscopy image analysis, improving generalization and automation. |
representation learning foundation model |
|
|
| 90 |
The UNDO Flip-Flop: A Controlled Probe for Reversible Semantic State Management in State Space Model |
Proposes the UNDO Flip-Flop task for evaluating reversible semantic state management in state space models. |
Mamba SSM state space model |
|
|
| 91 |
Optimal-Transport-Guided Functional Flow Matching for Turbulent Field Generation in Hilbert Space |
Proposes an optimal-transport-guided functional flow matching method for turbulent field generation in Hilbert space. |
flow matching spatiotemporal |
|
|
| 92 |
Value Mirror Descent for Reinforcement Learning |
Proposes Value Mirror Descent to improve value iteration in reinforcement learning |
reinforcement learning |
|
|
| 93 |
Graph Topology Information Enhanced Heterogeneous Graph Representation Learning |
Proposes the ToGRL framework, enhancing heterogeneous graph representations via topology learning to improve downstream task performance |
representation learning |
|
|
| 94 |
Top-K Retrieval with Fixed-Size Linear-Attention Completion: Backbone- and KV-Format-Preserving Attention for KV-Cache Read Reduction |
Proposes Top-K retrieval with fixed-size linear-attention completion, reducing KV-cache reads and improving long-text generation efficiency. |
linear attention |
|
|
| 95 |
Jeffreys Flow: Robust Boltzmann Generators for Rare Event Sampling via Parallel Tempering Distillation |
Proposes Jeffreys Flow, addressing mode collapse in Boltzmann generators via parallel tempering distillation |
distillation |
|
|