| 1 |
GRASP: Grounded CoT Reasoning with Dual-Stage Optimization for Multimodal Sarcasm Target Identification |
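Proposes the GRASP framework, which addresses multimodal sarcasm target identification through dual-stage optimization and Grounded CoT reasoning. |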
multimodal visual grounding chain-of-thought |
|
|
| 2 |
Hierarchical Alignment: Enforcing Hierarchical Instruction-Following in LLMs through Logical Consistency |
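Proposes Neuro-Symbolic Hierarchical Alignment (NSHA), which strengthens hierarchical instruction following in LLMs through logical consistency. |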
large language model instruction following |
|
|
| 3 |
Large Language Models Generate Harmful Content Using a Distinct, Unified Mechanism |
Uses weight pruning to reveal a unified mechanism by which large language models generate harmful content. |
large language model |
|
|
| 4 |
CONDESION-BENCH: Conditional Decision-Making of Large Language Models in Compositional Action Space |
Proposes CONDESION-BENCH, which evaluates the conditional decision-making ability of large language models in compositional action spaces. |
large language model |
|
|
| 5 |
Breaking Block Boundaries: Anchor-based History-stable Decoding for Diffusion Large Language Models |
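Proposes Anchor-based History-stable Decoding (AHD), which addresses the block-constraint problem of semi-autoregressive decoding in diffusion large language models. |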
large language model |
|
|
| 6 |
Many-Tier Instruction Hierarchy in LLM Agents |
Proposes ManyIH to resolve multi-tier instruction conflicts in LLM agents, and builds a corresponding evaluation benchmark. |
large language model instruction following |
|
|
| 7 |
Automated Instruction Revision (AIR): A Structured Comparison of Task Adaptation Strategies for LLM |
Proposes AIR, a rule-induction-based method for automated instruction revision in LLMs, aimed at downstream task adaptation. |
large language model |
|
|
| 8 |
BERT-as-a-Judge: A Robust Alternative to Lexical Methods for Efficient Reference-Based LLM Evaluation |
Proposes BERT-as-a-Judge for efficient, robust reference-based LLM evaluation. |
large language model |
|
|
| 9 |
Persona-E$^2$: A Human-Grounded Dataset for Personality-Shaped Emotional Responses to Textual Events |
Proposes the Persona-E$^2$ dataset for studying how personality shapes emotional responses to textual events. |
large language model |
|
|
| 10 |
MuTSE: A Human-in-the-Loop Multi-use Text Simplification Evaluator |
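MuTSE: a human-in-the-loop, multi-use text simplification evaluator that addresses the difficulty of evaluating LLM text simplification. |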
large language model |
|
|
| 11 |
Across the Levels of Analysis: Explaining Predictive Processing in Humans Requires More Than Machine-Estimated Probabilities |
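Critically analyzes the role of language models in explaining human language processing, emphasizing the need for analysis at multiple levels. |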
large language model |
|
|
| 12 |
Task-Aware LLM Routing with Multi-Level Task-Profile-Guided Data Synthesis for Cold-Start Scenarios |
Proposes TRouter, which addresses LLM routing in cold-start scenarios through data synthesis guided by multi-level task profiles. |
large language model |
|
|
| 13 |
TaxPraBen: A Scalable Benchmark for Structured Evaluation of LLMs in Chinese Real-World Tax Practice |
TaxPraBen: builds a benchmark for structured evaluation of LLMs in Chinese tax practice. |
large language model |
|
|