| 1 |
TCM-Ladder: A Benchmark for Multimodal Question Answering on Traditional Chinese Medicine |
Proposes TCM-Ladder to address the evaluation of multimodal question answering in Traditional Chinese Medicine |
large language model multimodal |
|
|
| 2 |
A Closer Look at Bias and Chain-of-Thought Faithfulness of Large (Vision) Language Models |
Proposes a new evaluation pipeline to address bias and reasoning faithfulness in large vision-language models |
large language model chain-of-thought |
|
|
| 3 |
SocialMaze: A Benchmark for Evaluating Social Reasoning in Large Language Models |
Proposes the SocialMaze benchmark to evaluate the social reasoning abilities of large language models |
large language model chain-of-thought |
✅ |
|
| 4 |
ARC: Argument Representation and Coverage Analysis for Zero-Shot Long Document Summarization with Instruction Following LLMs |
Proposes the ARC framework to improve argument coverage analysis for zero-shot long-document summarization |
large language model instruction following |
|
|
| 5 |
Large Language Model Meets Constraint Propagation |
Proposes GenCP to address insufficient constraint enforcement in large language models |
large language model |
|
|
| 6 |
FLAT-LLM: Fine-grained Low-rank Activation Space Transformation for Large Language Model Compression |
Proposes FLAT-LLM to address efficiency and accuracy issues in large language model compression |
large language model |
|
|
| 7 |
Retrieval Augmented Generation based Large Language Models for Causality Mining |
Proposes a dynamic prompting scheme based on retrieval-augmented generation to improve causality mining performance |
large language model |
|
|
| 8 |
Don't Take the Premise for Granted: Evaluating the Premise Critique Ability of Large Language Models |
Proposes the Premise Critique Bench to improve the premise critique ability of large language models |
large language model |
✅ |
|
| 9 |
Gaussian mixture models as a proxy for interacting language models |
Proposes interacting Gaussian mixture models as a substitute for complex language models |
large language model |
|
|
| 10 |
Diversity of Transformer Layers: One Aspect of Parameter Scaling Laws |
Proposes an analysis of inter-layer diversity to optimize Transformer parameter scaling |
large language model |
|
|
| 11 |
Is Your Model Fairly Certain? Uncertainty-Aware Fairness Evaluation for LLMs |
Proposes UCerF to address uncertainty in the fairness evaluation of large language models |
large language model |
|
|
| 12 |
SwingArena: Competitive Programming Arena for Long-context GitHub Issue Solving |
Proposes SwingArena to address the challenge of evaluating long-context GitHub issue solving |
large language model |
|
|
| 13 |
Probing Association Biases in LLM Moderation Over-Sensitivity |
Proposes topic association analysis to address over-sensitivity in LLM content moderation |
large language model |
|
|
| 14 |
One Task Vector is not Enough: A Large-Scale Study for In-Context Learning |
Proposes the QuiteAFew dataset to improve task-vector performance in in-context learning |
large language model |
|
|
| 15 |
Bounded Rationality for LLMs: Satisficing Alignment at Inference-Time |
Proposes the SITAlign framework to address the alignment problem in large language models |
large language model |
|
|
| 16 |
LLMs are Better Than You Think: Label-Guided In-Context Learning for Named Entity Recognition |
Proposes the DEER method to improve named entity recognition performance |
large language model |
|
|
| 17 |
Can LLMs Reason Abstractly Over Math Word Problems Without CoT? Disentangling Abstract Formulation From Arithmetic Computation |
Proposes a disentangled evaluation method to improve the reasoning ability of large language models on math problems |
large language model |
|
|
| 18 |
ToolHaystack: Stress-Testing Tool-Augmented Language Models in Realistic Long-Term Interactions |
Proposes ToolHaystack to address the lack of tool-use evaluation in long-term interactions |
large language model |
|
|
| 19 |
AutoSchemaKG: Autonomous Knowledge Graph Construction through Dynamic Schema Induction from Web-Scale Corpora |
Proposes AutoSchemaKG to enable autonomous knowledge graph construction |
large language model |
|
|