| 1 |
Sample-efficient Integration of New Modalities into Large Language Models |
Proposes SEMI, a method for sample-efficient integration of new modalities into large language models |
large language model foundation model multimodal |
|
|
| 2 |
Facts Fade Fast: Evaluating Memorization of Outdated Medical Knowledge in Large Language Models |
Proposes the MedRevQA and MedChangeQA datasets to evaluate how large language models memorize outdated medical knowledge |
large language model |
|
|
| 3 |
RTQA : Recursive Thinking for Complex Temporal Knowledge Graph Question Answering with Large Language Models |
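Proposes the RTQA framework, which uses recursive reasoning with large language models to tackle complex temporal knowledge graph question answering. |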
large language model |
✅ |
|
| 4 |
SPFT-SQL: Enhancing Large Language Model for Text-to-SQL Parsing by Self-Play Fine-Tuning |
SPFT-SQL: enhances large language models on Text-to-SQL parsing through self-play fine-tuning |
large language model |
|
|
| 5 |
Quantized Large Language Models in Biomedical Natural Language Processing: Evaluation and Recommendation |
Quantized LLMs for biomedical NLP: evaluation and recommendations, with lower deployment cost |
large language model |
|
|
| 6 |
A Comprehensive Survey on Trustworthiness in Reasoning with Large Language Models |
Survey: a comprehensive assessment of trustworthiness in the reasoning of large language models |
large language model |
✅ |
|
| 7 |
Spoken in Jest, Detected in Earnest: A Systematic Review of Sarcasm Recognition -- Multimodal Fusion, Challenges, and Future Prospects |
Reviews speech sarcasm recognition: multimodal fusion, challenges, and future prospects |
multimodal |
|
|
| 8 |
CANDY: Benchmarking LLMs' Limitations and Assistive Potential in Chinese Misinformation Fact-Checking |
CANDY: evaluates the limitations and assistive potential of large language models in Chinese misinformation fact-checking |
large language model chain-of-thought |
✅ |
|
| 9 |
Chain or tree? Re-evaluating complex reasoning from the perspective of a matrix of thought |
Proposes the Matrix of Thought (MoT) framework to improve the efficiency and accuracy of LLMs on complex reasoning tasks |
large language model chain-of-thought |
✅ |
|
| 10 |
Inverse IFEval: Can LLMs Unlearn Stubborn Training Conventions to Follow Real Instructions? |
Proposes the Inverse IFEval benchmark to evaluate whether LLMs can overcome training biases and follow unconventional instructions |
large language model instruction following |
|
|
| 11 |
Towards an AI Musician: Synthesizing Sheet Music Problems for Musical Reasoning |
Proposes SSMR-Bench: synthesizes sheet music reasoning problems to advance AI musician capabilities |
large language model multimodal |
|
|
| 12 |
MobileRAG: Enhancing Mobile Agent with Retrieval-Augmented Generation |
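MobileRAG: improves mobile agent performance through retrieval-augmented generation, addressing task errors, environment interaction, and missing memory. |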
large language model |
✅ |
|
| 13 |
Decoding the Poetic Language of Emotion in Korean Modern Poetry: Insights from a Human-Labeled Dataset and AI Modeling |
Proposes the KPoEM dataset to address emotion analysis of modern Korean poetry |
large language model |
|
|
| 14 |
ODKE+: Ontology-Guided Open-Domain Knowledge Extraction with LLMs |
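ODKE+: an open-domain knowledge extraction system guided by LLMs and an ontology, enabling large-scale, high-precision knowledge graph construction. |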
large language model |
|
|
| 15 |
Cross-Layer Attention Probing for Fine-Grained Hallucination Detection |
Proposes Cross-Layer Attention Probing (CLAP) for fine-grained detection of hallucinations in large language models. |
large language model |
|
|
| 16 |
MAGneT: Coordinated Multi-Agent Generation of Synthetic Multi-Turn Mental Health Counseling Sessions |
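MAGneT: coordinated multi-agent generation of synthetic multi-turn mental health counseling dialogues, addressing the scarcity of high-quality data. |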
large language model |
|
|
| 17 |
On Robustness and Reliability of Benchmark-Based Evaluation of LLMs |
Evaluates the robustness and reliability of LLM benchmarks: examines how linguistic variation affects model performance |
large language model |
|
|
| 18 |
VoxRole: A Comprehensive Benchmark for Evaluating Speech-Based Role-Playing Agents |
Proposes VoxRole: a comprehensive benchmark for evaluating speech-based role-playing agents |
large language model |
|
|
| 19 |
SelfAug: Mitigating Catastrophic Forgetting in Retrieval-Augmented Generation via Distribution Self-Alignment |
Proposes SelfAug, which mitigates catastrophic forgetting in RAG through distribution self-alignment |
large language model |
✅ |
|
| 20 |
SiLVERScore: Semantically-Aware Embeddings for Sign Language Generation Evaluation |
Proposes SiLVERScore, a semantically aware, embedding-based method for evaluating sign language generation. |
multimodal |
|
|
| 21 |
OleSpeech-IV: A Large-Scale Multispeaker and Multilingual Conversational Speech Dataset with Diverse Topics |
OleSpeech-IV: a large-scale, multispeaker, multilingual conversational speech dataset with diverse topics |
TAMP |
|
|
| 22 |
Why Language Models Hallucinate |
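Reveals the root cause of language model hallucination: biases in training and evaluation procedures lead models to guess rather than admit uncertainty |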
large language model |
|
|
| 23 |
Explicit and Implicit Data Augmentation for Social Event Detection |
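Proposes the SED-Aug framework, which combines explicit text augmentation with implicit feature augmentation to improve social event detection. |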
large language model |
✅ |
|
| 24 |
Towards Stable and Personalised Profiles for Lexical Alignment in Spoken Human-Agent Dialogue |
Builds stable, personalised lexical profiles as a foundation for lexical alignment in spoken human-agent dialogue |
large language model |
|
|
| 25 |
Iti-Validator: A Guardrail Framework for Validating and Correcting LLM-Generated Itineraries |
Iti-Validator: a guardrail framework for validating and correcting LLM-generated itineraries |
large language model |
|
|
| 26 |
False Sense of Security: Why Probing-based Malicious Input Detection Fails to Generalize |
Reveals the root cause of the poor generalization of probing-based malicious input detection |
large language model |
✅ |
|
| 27 |
Drivel-ology: Challenging LLMs with Interpreting Nonsense with Depth |
Drivelology: builds a multilingual dataset of "nonsense with depth" to challenge the pragmatic understanding of LLMs |
large language model |
|
|
| 28 |
Evaluating the Robustness of Retrieval-Augmented Generation to Adversarial Evidence in the Health Domain |
Evaluates the robustness of retrieval-augmented generation to adversarial evidence in the health domain |
large language model |
✅ |
|