| 1 |
SocioEval: A Template-Based Framework for Evaluating Socioeconomic Status Bias in Foundation Models |
SocioEval: a template-based framework for evaluating socioeconomic status bias in foundation models |
large language model foundation model |
|
|
| 2 |
Evaluating the Formal Reasoning Capabilities of Large Language Models through Chomsky Hierarchy |
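Proposes ChomskyBench, which systematically evaluates the formal reasoning capabilities of large language models through the Chomsky hierarchy. |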
large language model |
|
|
| 3 |
Social Meaning in Large Language Models: Structure, Magnitude, and Pragmatic Prompting |
Proposes the ESR and CDS metrics and uses pragmatic prompting to improve LLM social reasoning |
large language model |
|
|
| 4 |
Beyond the Parameters: A Technical Survey of Contextual Enrichment in Large Language Models: From In-Context Prompting to Causal Retrieval-Augmented Generation |
Surveys contextual enrichment techniques for LLMs, from in-context prompting to causal retrieval-augmented generation |
large language model |
|
|
| 5 |
When Modalities Remember: Continual Learning for Multimodal Knowledge Graphs |
Proposes the MRCKG model to address catastrophic forgetting in continual multimodal knowledge graph reasoning. |
multimodal |
|
|
| 6 |
BAS: A Decision-Theoretic Approach to Evaluating Large Language Model Confidence |
Proposes the Behavioral Alignment Score (BAS) for evaluating LLM confidence, optimizing decisions and avoiding overconfidence. |
large language model |
|
|
| 7 |
Debating Truth: Debate-driven Claim Verification with Multiple Large Language Model Agents |
Proposes the DebateCV framework, which uses multi-agent debate-driven claim verification to improve verification accuracy on complex claims. |
large language model |
|
|
| 8 |
Quick on the Uptake: Eliciting Implicit Intents from Human Demonstrations for Personalized Mobile-Use Agents |
Proposes the IFRAgent framework, which enhances personalized mobile-use agents by learning explicit and implicit intents |
large language model multimodal |
|
|
| 9 |
VeriOS: Query-Driven Proactive Human-Agent-GUI Interaction for Trustworthy OS Agents |
Proposes VeriOS, which improves the reliability of OS agents in untrustworthy environments through query-driven human-agent interaction |
large language model multimodal |
|
|
| 10 |
Borderless Long Speech Synthesis |
Proposes the Borderless long speech synthesis framework for agent-driven, borderless speech generation. |
instruction following chain-of-thought |
|
|
| 11 |
Failing to Falsify: Evaluating and Mitigating Confirmation Bias in Language Models |
Reveals confirmation bias in large language models and proposes intervention strategies to improve rule discovery |
large language model |
|
|
| 12 |
LLM Analysis of 150+ years of German Parliamentary Debates on Migration Reveals Shift from Post-War Solidarity to Anti-Solidarity in the Last Decade |
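Uses LLMs to analyze more than 150 years of German parliamentary debates, revealing a shift from post-war solidarity to anti-solidarity |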
large language model |
|
|
| 13 |
Too Polite to Disagree: Understanding Sycophancy Propagation in Multi-Agent Systems |
Uses prior knowledge to reduce sycophancy in multi-agent systems, improving discussion accuracy |
large language model |
|
|
| 14 |
Council Mode: Mitigating Hallucination and Bias in LLMs via Multi-Agent Consensus |
Proposes Council Mode, which mitigates hallucination and bias in LLMs through a multi-agent consensus mechanism. |
large language model |
|
|
| 15 |
How Annotation Trains Annotators: Competence Development in Social Influence Recognition |
Studies how the annotation process affects annotator competence, improving data quality for social influence recognition tasks. |
large language model |
|
|
| 16 |
LogicPoison: Logical Attacks on Graph Retrieval-Augmented Generation |
Proposes LogicPoison, an attack that targets the logical connections in graph retrieval-augmented generation systems. |
large language model |
|
|
| 17 |
Valence-Arousal Subspace in LLMs: Circular Emotion Geometry and Multi-Behavioral Control |
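Proposes an emotion control method based on the valence-arousal subspace in LLM representation space, enabling multi-behavioral control. |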
large language model |
|
|
| 18 |
Human Psychometric Questionnaires Mischaracterize LLM Psychology: Evidence from Generation Behavior |
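Reveals the limitations of human psychometric questionnaires in characterizing LLM psychology and proposes a psychometric approach based on generation behavior. |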
large language model |
|
|
| 19 |
What Is The Political Content in LLMs' Pre- and Post-Training Data? |
Analyzes political leanings in LLM training data, revealing how data bias affects models' political stances |
large language model |
|
|
| 20 |
Escaping the BLEU Trap: A Signal-Grounded Framework with Decoupled Semantic Guidance for EEG-to-Text Decoding |
Proposes the SemKey framework, which advances EEG-to-text decoding through decoupled semantic guidance. |
large language model |
|
|
| 21 |
SWAY: A Counterfactual Computational Linguistic Approach to Measuring and Mitigating Sycophancy |
Proposes the SWAY metric and a counterfactual CoT mitigation strategy to address sycophancy in large language models |
large language model |
|
|
| 22 |
Beyond Precision: Importance-Aware Recall for Factuality Evaluation in Long-Form LLM Generation |
Proposes an importance-aware recall metric for evaluating the factuality of long-form generation |
large language model |
|
|
| 23 |
Detecting and Correcting Reference Hallucinations in Commercial LLMs and Deep Research Agents |
Systematically detects and corrects reference hallucinations in commercial LLMs and deep research agents |
large language model |
|
|
| 24 |
Measuring What Cannot Be Surveyed: LLMs as Instruments for Latent Cognitive Variables in Labor Economics |
Uses large language models to measure latent cognitive variables that are hard to survey, with applications in labor economics. |
large language model |
|
|
| 25 |
BibTeX Citation Hallucinations in Scientific Publishing Agents: Evaluation and Mitigation |
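Addresses BibTeX citation hallucinations in scientific publishing agents, proposing an evaluation benchmark and the clibib mitigation approach |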
large language model |
|
|
| 26 |
Be Careful When Fine-tuning On Open-Source LLMs: Your Fine-tuning Data Could Be Secretly Stolen! |
Reveals a data leakage risk in fine-tuning open-source LLMs: attackers can extract fine-tuning data through a backdoor |
large language model |
|
|
| 27 |
IslamicMMLU: A Benchmark for Evaluating LLMs on Islamic Knowledge |
IslamicMMLU: builds a benchmark of Islamic knowledge to evaluate large language model performance |
large language model |
|
|