| 1 |
Leveraging Large Language Models for Bengali Math Word Problem Solving with Chain of Thought Reasoning |
Proposes the SOMADHAN dataset for solving Bengali math word problems |
large language model chain-of-thought |
|
|
| 2 |
Evaluating and Steering Modality Preferences in Multimodal Large Language Model |
Proposes the MC² benchmark to evaluate and steer modality preferences in multimodal large language models |
large language model multimodal |
|
|
| 3 |
Unveiling Instruction-Specific Neurons & Experts: An Analytical Framework for LLM's Instruction-Following Capabilities |
Proposes the SPARCOM framework to reveal the sparse neurons behind LLM instruction-following capabilities |
large language model instruction following |
|
|
| 4 |
Rethinking Information Synthesis in Multimodal Question Answering: A Multi-Agent Perspective |
Proposes the MAMMQA framework to address information synthesis in multimodal question answering |
large language model multimodal |
|
|
| 5 |
Explaining Large Language Models with gSMILE |
Proposes gSMILE to address the interpretability of large language models |
large language model |
|
|
| 6 |
LayerIF: Estimating Layer Quality for Large Language Models using Influence Functions |
Proposes LayerIF to estimate layer quality in large language models |
large language model |
|
|
| 7 |
Test-Time Learning for Large Language Models |
Proposes a test-time learning method to improve the adaptation of large language models to specific domains |
large language model |
|
|
| 8 |
Rethinking the Outlier Distribution in Large Language Models: An In-depth Study |
Studies the outlier distribution in large language models in depth to improve quantization performance |
large language model |
|
|
| 9 |
How does Misinformation Affect Large Language Model Behaviors and Preferences? |
Proposes MisBench to evaluate how large language models respond to misinformation |
large language model |
✅ |
|
| 10 |
RelationalFactQA: A Benchmark for Evaluating Tabular Fact Retrieval from Large Language Models |
Proposes RelationalFactQA to evaluate the tabular fact retrieval ability of large language models |
large language model |
|
|
| 11 |
Who Reasons in the Large Language Models? |
Proposes Stethoscope for Networks to reveal the reasoning capabilities of large language models |
large language model |
|
|
| 12 |
Multi-objective Large Language Model Alignment with Hierarchical Experts |
Proposes the HoE method for multi-objective large language model alignment |
large language model |
|
|
| 13 |
Automatic Transmission for LLM Tiers: Optimizing Cost and Accuracy in Large Language Models |
Proposes the LLM Automatic Transmission framework to optimize the cost and accuracy of large language models |
large language model |
|
|
| 14 |
DenseLoRA: Dense Low-Rank Adaptation of Large Language Models |
Proposes DenseLoRA to improve the parameter efficiency of large language models |
large language model |
✅ |
|
| 15 |
DLP: Dynamic Layerwise Pruning in Large Language Models |
Proposes a dynamic layerwise pruning method to improve the inference efficiency of large language models |
large language model |
✅ |
|
| 16 |
CogniBench: A Legal-inspired Framework and Dataset for Assessing Cognitive Faithfulness of Large Language Models |
Proposes the CogniBench framework to assess the cognitive faithfulness of large language models |
large language model |
✅ |
|
| 17 |
From prosthetic memory to prosthetic denial: Auditing whether large language models are prone to mass atrocity denialism |
Audits the influence of large language models on historical memory and their risk of denialism |
large language model |
|
|
| 18 |
Rethinking Data Mixture for Large Language Models: A Comprehensive Survey and New Perspectives |
Proposes data mixture methods to improve large language model training outcomes |
large language model |
|
|
| 19 |
DecisionFlow: Advancing Large Language Model as Principled Decision Maker |
Proposes DecisionFlow to address the lack of transparency in language model decision making |
large language model |
✅ |
|
| 20 |
Leveraging large language models and traditional machine learning ensembles for ADHD detection from narrative transcripts |
Proposes an ADHD detection method that ensembles large language models with traditional machine learning |
large language model |
|
|
| 21 |
Assessment of L2 Oral Proficiency using Speech Large Language Models |
Uses multimodal large language models to assess L2 oral proficiency for automatic scoring |
large language model |
|
|
| 22 |
RPM: Reasoning-Level Personalization for Black-Box Large Language Models |
Proposes the RPM framework for personalizing black-box large language models |
large language model |
|
|
| 23 |
Uncertainty Unveiled: Can Exposure to More In-context Examples Mitigate Uncertainty for Large Language Models? |
Proposes reducing the uncertainty of large language models by adding more in-context examples |
large language model |
|
|
| 24 |
Research Community Perspectives on "Intelligence" and Large Language Models |
Surveys researchers' views on "intelligence" and large language models |
large language model |
|
|
| 25 |
On VLMs for Diverse Tasks in Multimodal Meme Classification |
Proposes multimodal models to improve meme classification performance |
multimodal |
|
|
| 26 |
Automated Privacy Information Annotation in Large Language Model Interactions |
Builds an automated privacy information annotation system to address privacy leakage in LLM interactions |
large language model |
|
|
| 27 |
STEER-BENCH: A Benchmark for Evaluating the Steerability of Large Language Models |
Proposes STEER-BENCH to evaluate the steerability of large language models |
large language model |
|
|
| 28 |
SV-TrustEval-C: Evaluating Structure and Semantic Reasoning in Large Language Models for Source Code Vulnerability Analysis |
Proposes SV-TrustEval-C to address the evaluation of LLMs on source code vulnerability analysis |
large language model |
|
|
| 29 |
Towards Pretraining Robust ASR Foundation Model with Acoustic-Aware Data Augmentation |
Proposes acoustic-aware data augmentation to improve the robustness of ASR models |
foundation model |
|
|
| 30 |
Self-Route: Automatic Mode Switching via Capability Estimation for Efficient Reasoning |
Proposes Self-Route to reduce the resource consumption of reasoning models |
large language model chain-of-thought |
|
|
| 31 |
AutoJudger: An Agent-Driven Framework for Efficient Benchmarking of MLLMs |
Proposes AutoJudger to address the high cost of evaluating multimodal large language models |
large language model multimodal |
|
|
| 32 |
Visual Cues Enhance Predictive Turn-Taking for Two-Party Human Interaction |
Proposes MM-VAP for predictive turn-taking in two-party human interaction |
multimodal |
|
|
| 33 |
Predicting Implicit Arguments in Procedural Video Instructions |
Proposes the Implicit-VidSRL dataset for predicting implicit arguments |
multimodal |
|
|
| 34 |
LLMPR: A Novel LLM-Driven Transfer Learning based Petition Ranking Model |
Proposes LLMPR to address the case backlog in the Indian judiciary |
large language model |
|
|
| 35 |
R2R: Efficiently Navigating Divergent Reasoning Paths with Small-Large Model Token Routing |
Proposes R2R to efficiently navigate language model reasoning paths |
large language model |
✅ |
|
| 36 |
Factual Self-Awareness in Language Models: Representation, Robustness, and Scaling |
Proposes factual self-awareness in language models to improve the accuracy of generated content |
large language model |
|
|
| 37 |
Exploring the Hidden Capacity of LLMs for One-Step Text Generation |
Explores the hidden capacity of LLMs for one-step text generation |
large language model |
|
|
| 38 |
A Lightweight Multi-Expert Generative Language Model System for Engineering Information and Knowledge Extraction |
Proposes a Small Language Graph to address the computational resource demands of engineering information extraction |
large language model |
|
|
| 39 |
SpecExtend: A Drop-in Enhancement for Speculative Decoding of Long Sequences |
Proposes SpecExtend to address performance degradation in long-sequence inference |
large language model |
✅ |
|
| 40 |
CodeMirage: A Multi-Lingual Benchmark for Detecting AI-Generated and Paraphrased Source Code from Production-Level LLMs |
Proposes CodeMirage as a benchmark for detecting AI-generated code |
large language model |
|
|
| 41 |
REAL-Prover: Retrieval Augmented Lean Prover for Mathematical Reasoning |
Proposes REAL-Prover for theorem proving in advanced mathematics |
large language model |
|
|
| 42 |
Calibrating LLMs for Text-to-SQL Parsing by Leveraging Sub-clause Frequencies |
Proposes a calibration method based on sub-clause frequencies to improve the reliability of text-to-SQL parsing |
large language model |
|
|
| 43 |
RefTool: Enhancing Model Reasoning with Reference-Guided Tool Creation |
Proposes RefTool to address inadequate tool creation |
large language model |
|
|
| 44 |
Improving Research Idea Generation Through Data: An Empirical Investigation in Social Science |
Uses data to help LLMs generate higher-quality research ideas |
large language model |
|
|
| 45 |
Evaluating LLM Adaptation to Sociodemographic Factors: User Profile vs. Dialogue History |
Proposes a framework to evaluate LLM adaptation to sociodemographic factors |
large language model |
|
|
| 46 |
Pretrained LLMs Learn Multiple Types of Uncertainty |
Studies how large language models capture multiple types of uncertainty in order to improve task accuracy |
large language model |
|
|
| 47 |
BLUCK: A Benchmark Dataset for Bengali Linguistic Understanding and Cultural Knowledge |
Proposes the BLUCK dataset to evaluate Bengali linguistic understanding and cultural knowledge |
large language model |
|
|
| 48 |
MARS-Bench: A Multi-turn Athletic Real-world Scenario Benchmark for Dialogue Evaluation |
Proposes MARS-Bench to address the robustness of LLMs in complex dialogues |
large language model |
|
|
| 49 |
LLM-Driven E-Commerce Marketing Content Optimization: Balancing Creativity and Conversion |
Proposes an LLM-based framework for optimizing e-commerce marketing content to raise conversion rates |
multimodal |
|
|
| 50 |
Trans-EnV: A Framework for Evaluating the Linguistic Robustness of LLMs Against English Varieties |
Proposes the Trans-EnV framework to evaluate the linguistic robustness of LLMs against English varieties |
large language model |
✅ |
|
| 51 |
Calibrating LLM Confidence by Probing Perturbed Representation Stability |
Proposes CCPS to address confidence calibration in large language models |
large language model |
|
|
| 52 |
Do We Know What LLMs Don't Know? A Study of Consistency in Knowledge Probing |
Proposes a new method for identifying consistency problems in probing the knowledge gaps of large language models |
large language model |
|
|
| 53 |
MAKIEval: A Multilingual Automatic WiKidata-based Framework for Cultural Awareness Evaluation for LLMs |
Proposes the MAKIEval framework to evaluate the cultural awareness of LLMs |
large language model |
|
|
| 54 |
Are Language Models Consequentialist or Deontological Moral Reasoners? |
Proposes a moral reasoning taxonomy to analyze the ethical decisions of language models |
large language model |
✅ |
|
| 55 |
Do LLMs Need to Think in One Language? Correlation between Latent Language and Task Performance |
Examines how latent language consistency affects LLM task performance |
large language model |
|
|
| 56 |
PEDANTIC: A Dataset for the Automatic Examination of Definiteness in Patent Claims |
Proposes the PEDANTIC dataset for examining indefiniteness in patent claims |
large language model |
|
|
| 57 |
Charting the Landscape of African NLP: Mapping Progress and Shaping the Road Ahead |
Analyzes 884 research papers on NLP for African languages to promote inclusive development |
large language model |
|
|
| 58 |
rStar-Coder: Scaling Competitive Code Reasoning with a Large-Scale Verified Dataset |
Proposes rStar-Coder to address the scarcity of large-scale code reasoning datasets |
large language model |
✅ |
|
| 59 |
Towards Objective Fine-tuning: How LLMs' Prior Knowledge Causes Potential Poor Calibration? |
Proposes the CogCalib framework to address poor calibration in LLMs |
large language model |
|
|
| 60 |
MSA at SemEval-2025 Task 3: High Quality Weak Labeling and LLM Ensemble Verification for Multilingual Hallucination Detection |
Proposes a multilingual hallucination detection method to address the reliability of LLM-generated text |
large language model |
|
|
| 61 |
Concealment of Intent: A Game-Theoretic Analysis |
Proposes intent-concealing adversarial prompts to address safety issues in large language models |
large language model |
|
|
| 62 |
CHIMERA: A Knowledge Base of Scientific Idea Recombinations for Research Analysis and Ideation |
Proposes the CHIMERA knowledge base to support scientific idea recombination and research analysis |
large language model |
✅ |
|
| 63 |
Beyond Templates: Dynamic Adaptation of Reasoning Demonstrations via Feasibility-Aware Exploration |
Proposes the DART framework to address the limited reasoning ability of small language models |
large language model |
|
|
| 64 |
Long Context Scaling: Divide and Conquer via Multi-Agent Question-driven Collaboration |
Proposes the XpandA framework to address information loss in long-text processing |
large language model |
|
|
| 65 |
POLAR: A Benchmark for Multilingual, Multicultural, and Multi-Event Online Polarization |
Proposes the POLAR dataset to address online polarization |
large language model |
|
|