| 1 |
HSSBench: Benchmarking Humanities and Social Sciences Ability for Multimodal Large Language Models |
Proposes HSSBench to evaluate the humanities and social sciences abilities of multimodal large language models |
large language model multimodal |
|
|
| 2 |
Seeing What Tastes Good: Revisiting Multimodal Distributional Semantics in the Billion Parameter Era |
Examines how multimodal distributional semantics performs in the billion-parameter era |
foundation model multimodal |
|
|
| 3 |
DRE: An Effective Dual-Refined Method for Integrating Small and Large Language Models in Open-Domain Dialogue Evaluation |
Proposes the DRE method to better integrate small and large language models for dialogue evaluation |
large language model |
|
|
| 4 |
MELABenchv1: Benchmarking Large Language Models against Smaller Fine-Tuned Models for Low-Resource Maltese NLP |
Proposes MELABenchv1 to compare large language models with smaller fine-tuned models on low-resource Maltese NLP |
large language model |
|
|
| 5 |
Enhancing Decision-Making of Large Language Models via Actor-Critic |
Proposes the LAC framework to address the limited decision-making ability of large language models |
large language model |
|
|
| 6 |
Voice Activity Projection Model with Multimodal Encoders |
Proposes a voice activity projection model with multimodal encoders to improve human-machine interaction |
multimodal |
✅ |
|
| 7 |
RadialRouter: Structured Representation for Efficient and Robust Large Language Models Routing |
Proposes RadialRouter to address inefficient routing among large language models |
large language model |
|
|
| 8 |
Relationship Detection on Tabular Data Using Statistical Analysis and Large Language Models |
Proposes a hybrid method for detecting relationships in tabular data |
large language model |
|
|
| 9 |
A Novel Data Augmentation Approach for Automatic Speaking Assessment on Opinion Expressions |
Proposes a novel data augmentation method to address data scarcity in automatic speaking assessment |
large language model multimodal |
|
|
| 10 |
More or Less Wrong: A Benchmark for Directional Bias in LLM Comparative Reasoning |
Proposes the MathComp benchmark to address directional bias in LLM comparative reasoning |
large language model chain-of-thought |
|
|
| 11 |
SQLens: An End-to-End Framework for Error Detection and Correction in Text-to-SQL |
Proposes SQLens to detect and correct semantic errors in text-to-SQL |
large language model |
|
|
| 12 |
Understanding and Meeting Practitioner Needs When Measuring Representational Harms Caused by LLM-Based Systems |
Proposes improvements to tools for measuring representational harms caused by LLM-based systems |
large language model |
|
|
| 13 |
Watermarking Degrades Alignment in Language Models: Analysis and Mitigation |
Proposes an alignment-recovery method that counters the effects of watermarking to improve language model performance |
large language model |
|
|
| 14 |
Zero-Shot Open-Schema Entity Structure Discovery |
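Proposes a zero-shot open-schema entity structure discovery method to address the shortcomings of existing extraction approaches |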
large language model |
|
|
| 15 |
SmoothRot: Combining Channel-Wise Scaling and Rotation for Quantization-Friendly LLMs |
Proposes SmoothRot to address activation outliers in large language model quantization |
large language model |
✅ |
|
| 16 |
GEM: Empowering LLM for both Embedding Generation and Language Understanding |
Proposes GEM to resolve the tension between embedding generation and language understanding in LLMs |
large language model |
|
|
| 17 |
Long or short CoT? Investigating Instance-level Switch of Large Reasoning Models |
Proposes SwitchCoT to address the choice between long and short CoT strategies |
chain-of-thought |
|
|
| 18 |
SkipGPT: Dynamic Layer Pruning Reinvented with Token Awareness and Module Decoupling |
Proposes SkipGPT to address dynamic layer pruning in large language models |
large language model |
✅ |
|
| 19 |
Establishing Trustworthy LLM Evaluation via Shortcut Neuron Analysis |
Proposes a trustworthy LLM evaluation method based on shortcut neuron analysis |
large language model |
✅ |
|
| 20 |
Homogeneous Keys, Heterogeneous Values: Exploiting Local KV Cache Asymmetry for Long-Context LLMs |
Proposes AsymKV to handle KV cache asymmetry in long-context LLMs |
large language model |
✅ |
|
| 21 |
Rectified Sparse Attention |
Proposes Rectified Sparse Attention to improve the efficiency of long-sequence generation |
large language model |
|
|
| 22 |
Controlling Difficulty of Generated Text for AI-Assisted Language Learning |
Proposes a controllable text generation method to support beginner language learners |
large language model |
|
|
| 23 |
High Accuracy, Less Talk (HALT): Reliable LLMs through Capability-Aligned Finetuning |
Proposes the HALT method to address hallucination in large language models |
large language model |
|
|
| 24 |
Lacuna Inc. at SemEval-2025 Task 4: LoRA-Enhanced Influence-Based Unlearning for LLMs |
Proposes the LIBU algorithm to address unlearning in large language models |
large language model |
|
|
| 25 |
Unveiling and Eliminating the Shortcut Learning for Locate-Then-Edit Knowledge Editing via Both Subject and Relation Awareness |
Proposes a two-stage optimization method to address shortcut learning in knowledge editing |
large language model |
|
|
| 26 |
Around the World in 24 Hours: Probing LLM Knowledge of Time and Place |
Proposes the GeoTemp dataset to evaluate the temporal and spatial reasoning abilities of language models |
chain-of-thought |
|
|
| 27 |
From Real to Synthetic: Synthesizing Millions of Diversified and Complicated User Instructions with Attributed Grounding |
Proposes an attribute-guided synthesis method for generating diverse user instructions |
large language model |
✅ |
|
| 28 |
Magic Mushroom: A Customizable Benchmark for Fine-grained Analysis of Retrieval Noise Erosion in RAG Systems |
Proposes the Magic Mushroom benchmark to address retrieval noise in RAG systems |
large language model |
|
|
| 29 |
EuroGEST: Investigating gender stereotypes in multilingual language models |
Proposes EuroGEST to evaluate gender stereotypes in multilingual language models |
large language model |
|
|