| 1 |
VaccineRAG: Boosting Multimodal Large Language Models' Immunity to Harmful RAG Samples |
Proposes VaccineRAG to counter the negative impact of harmful RAG samples on LLMs. |
large language model multimodal chain-of-thought |
|
|
| 2 |
IDEAlign: Comparing Large Language Models to Human Experts in Open-ended Interpretive Annotations |
IDEAlign: uses an "odd-one-out" paradigm to evaluate how closely LLMs align with human experts on open-ended interpretive annotation tasks. |
large language model |
|
|
| 3 |
Clustering Discourses: Racial Biases in Short Stories about Women Generated by Large Language Models |
Reveals racial biases toward Black and White women in short stories generated by LLaMA 3.2-3B. |
large language model |
|
|
| 4 |
Scaling behavior of large language models in emotional safety classification across sizes and tasks |
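Studies how LLM performance on emotional safety classification scales with model size, and explores the potential of lightweight models for mental-health applications. |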
large language model |
|
|
| 5 |
Comparative Study of Pre-Trained BERT and Large Language Models for Code-Mixed Named Entity Recognition |
Compares the performance of pre-trained BERT models and large language models on code-mixed named entity recognition. |
large language model |
|
|
| 6 |
An Ensemble Classification Approach in A Multi-Layered Large Language Model Framework for Disease Prediction |
Proposes an ensemble method within a multi-layered LLM framework to improve disease-prediction accuracy on Arabic social media. |
large language model |
|
|
| 7 |
E-THER: A Multimodal Dataset for Empathic AI -- Towards Emotional Mismatch Awareness |
Introduces E-THER, a multimodal dataset for improving AI's ability to detect mismatches between verbal and visual emotional cues. |
multimodal |
|
|
| 8 |
DeepSeek performs better than other Large Language Models in Dental Cases |
DeepSeek outperforms other large language models at analyzing dental cases. |
large language model |
|
|
| 9 |
Behavioral Fingerprinting of Large Language Models |
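Proposes a behavioral fingerprinting framework for large language models that reveals differences in their alignment strategies. |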
large language model |
✅ |
|
| 10 |
DRAssist: Dispute Resolution Assistance using Large Language Models |
Proposes DRAssist, which uses large language models to assist with dispute resolution. |
large language model |
|
|
| 11 |
Extracting OPQRST in Electronic Health Records using Large Language Models with Reasoning |
Uses large language models with reasoning capabilities to extract OPQRST information from electronic health records. |
large language model |
|
|
| 12 |
FActBench: A Benchmark for Fine-grained Automatic Evaluation of LLM-Generated Text in the Medical Domain |
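Builds FActBench, a benchmark for automatically evaluating LLM-generated text in the medical domain, improving the accuracy of factuality assessment. |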
large language model chain-of-thought |
|
|
| 13 |
How Instruction-Tuning Imparts Length Control: A Cross-Lingual Mechanistic Analysis |
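Investigates how instruction tuning gives large language models the ability to control output length, through a cross-lingual mechanistic analysis. |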
large language model foundation model |
|
|
| 14 |
PalmX 2025: The First Shared Task on Benchmarking LLMs on Arabic and Islamic Culture |
PalmX 2025: the first shared task for benchmarking large language models on Arabic and Islamic culture. |
large language model |
|
|
| 15 |
MoSEs: Uncertainty-Aware AI-Generated Text Detection via Mixture of Stylistics Experts with Conditional Thresholds |
MoSEs: uncertainty-aware detection of AI-generated text using a mixture of stylistics experts with conditional thresholds. |
large language model |
✅ |
|
| 16 |
SpecEval: Evaluating Model Adherence to Behavior Specifications |
SpecEval: evaluates how well large models adhere to their behavior specifications, finding compliance gaps of up to 20%. |
foundation model |
|
|
| 17 |
LLMs and their Limited Theory of Mind: Evaluating Mental State Annotations in Situated Dialogue |
Proposes a two-step LLM-based framework for evaluating divergences in shared mental models within team dialogue. |
large language model |
|
|
| 18 |
Towards Fundamental Language Models: Does Linguistic Competence Scale with Model Size? |
Proposes a Fundamental Language Model paradigm and explores strategies for decoupling linguistic competence from model size. |
large language model |
|
|
| 19 |
Avoidance Decoding for Diverse Multi-Branch Story Generation |
Proposes Avoidance Decoding to address the lack of diversity and the repetitiveness of LLM story generation. |
large language model |
|
|
| 20 |
AMBEDKAR-A Multi-level Bias Elimination through a Decoding Approach with Knowledge Augmentation for Robust Constitutional Alignment of Language Models |
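Proposes the AMBEDKAR framework, which uses knowledge-augmented decoding to eliminate India-specific social biases in LLMs and improve constitutional alignment. |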
large language model |
|
|
| 21 |
JudgeAgent: Knowledge-wise and Dynamic LLM Evaluation with Agent-as-Interviewer |
Proposes JudgeAgent, which uses an Agent-as-Interviewer approach for knowledge-driven, dynamic LLM evaluation. |
large language model |
✅ |
|
| 22 |
Better by Comparison: Retrieval-Augmented Contrastive Reasoning for Automatic Prompt Optimization |
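Proposes Contrastive Reasoning Prompt Optimization (CRPO), which improves LLM prompt quality through retrieval-augmented contrastive reasoning. |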
large language model |
|
|
| 23 |
Attributes as Textual Genes: Leveraging LLMs as Genetic Algorithm Simulators for Conditional Synthetic Data Generation |
Proposes Genetic Prompt, which uses an LLM as a genetic-algorithm simulator for conditional synthetic data generation. |
large language model |
|
|
| 24 |
Context Engineering for Trustworthiness: Rescorla Wagner Steering Under Mixed and Inappropriate Contexts |
Proposes RW-Steering, which uses context engineering to improve LLM trustworthiness under mixed and inappropriate contexts. |
large language model |
|
|