| 1 |
Comparison of Scoring Rationales Between Large Language Models and Human Raters |
Compares the scoring rationales of large language models and human raters to examine consistency in automated scoring |
large language model |
|
|
| 2 |
How to Make Large Language Models Generate 100% Valid Molecules? |
Proposes the SmiSelf framework to ensure that LLMs generate 100% valid molecules |
large language model |
✅ |
|
| 3 |
The Impact of Role Design in In-Context Learning for Large Language Models |
Studies how role design affects in-context learning in LLMs, improving performance across multiple tasks |
large language model |
|
|
| 4 |
Cognition-of-Thought Elicits Social-Aligned Reasoning in Large Language Models |
Proposes the Cognition-of-Thought framework to improve socially aligned reasoning in LLMs |
large language model |
|
|
| 5 |
Detecting Corpus-Level Knowledge Inconsistencies in Wikipedia with Large Language Models |
Proposes CLAIRE, a system that detects corpus-level knowledge inconsistencies in Wikipedia to make editing more efficient |
large language model |
|
|
| 6 |
A Structured Framework for Evaluating and Enhancing Interpretive Capabilities of Multimodal LLMs in Culturally Situated Tasks |
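Proposes a structured framework to evaluate and enhance the interpretive capabilities of multimodal LLMs in Chinese cultural contexts |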
multimodal |
✅ |
|
| 7 |
CCD: Mitigating Hallucinations in Radiology MLLMs via Clinical Contrastive Decoding |
Proposes the Clinical Contrastive Decoding (CCD) framework to mitigate hallucinations in radiology multimodal LLMs |
large language model multimodal |
|
|
| 8 |
Modeling the language cortex with form-independent and enriched representations of sentence meaning reveals remarkable semantic abstractness |
Models the language cortex with form-independent, enriched sentence representations, revealing remarkable semantic abstractness |
large language model |
|
|
| 9 |
An Senegalese Legal Texts Structuration Using LLM-augmented Knowledge Graph |
Uses an LLM-augmented knowledge graph to structure Senegalese legal texts |
large language model |
|
|
| 10 |
Train Once, Answer All: Many Pretraining Experiments for the Cost of One |
Proposes running many experiments within a single training run to reduce the cost of LLM pretraining experiments |
large language model |
|
|
| 11 |
Alignment through Meta-Weighted Online Sampling: Bridging the Gap between Data Generation and Preference Optimization |
Proposes MetaAPO, which bridges the gap between data generation and preference optimization through meta-weighted online sampling |
large language model |
|
|
| 12 |
Breaking the MoE LLM Trilemma: Dynamic Expert Clustering with Structured Compression |
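Proposes an MoE LLM optimization framework based on dynamic expert clustering and structured compression, addressing load imbalance, parameter redundancy, and communication overhead |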
large language model |
|
|
| 13 |
Evaluating Bias in Spoken Dialogue LLMs for Real-World Decisions and Recommendations |
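Systematically evaluates bias in spoken dialogue LLMs used for decisions and recommendations, revealing that bias is amplified over multi-turn dialogue |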
large language model |
|
|
| 14 |
Language, Culture, and Ideology: Personalizing Offensiveness Detection in Political Tweets with Reasoning LLMs |
Uses reasoning LLMs to personalize offensiveness detection in political tweets, accounting for language, culture, and ideology |
large language model |
|
|
| 15 |
Dual-Space Smoothness for Robust and Balanced LLM Unlearning |
PRISM: robust and balanced LLM unlearning via dual-space smoothness |
large language model |
|
|
| 16 |
LLMSQL: Upgrading WikiSQL for the LLM Era of Text-to-SQL |
LLMSQL: upgrades the WikiSQL text-to-SQL dataset for the LLM era |
large language model |
|
|
| 17 |
A2D: Any-Order, Any-Step Safety Alignment for Diffusion Language Models |
Proposes A2D to address safety issues in diffusion language models |
large language model |
|
|
| 18 |
Small Language Models for Curriculum-based Guidance |
Uses small language models with curriculum-based guidance to build sustainable AI teaching assistants |
large language model |
|
|
| 19 |
$\texttt{BluePrint}$: A Social Media User Dataset for LLM Persona Evaluation and Training |
Proposes the BluePrint dataset for evaluating and training LLMs to simulate user behavior on social media |
large language model |
|
|
| 20 |
Semantic Voting: A Self-Evaluation-Free Approach for Efficient LLM Self-Improvement on Unverifiable Open-ended Tasks |
Proposes Semantic Voting, which efficiently improves LLM performance on open-ended tasks without requiring self-evaluation |
large language model |
|
|
| 21 |
MaskSQL: Safeguarding Privacy for LLM-Based Text-to-SQL via Abstraction |
Proposes MaskSQL, a framework that protects privacy in LLM-based text-to-SQL through abstraction |
large language model |
|
|
| 22 |
No Loss, No Gain: Gated Refinement and Adaptive Compression for Prompt Optimization |
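Proposes GRACE, a framework that improves the efficiency and performance of prompt optimization through gated refinement and adaptive compression |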
large language model |
✅ |
|
| 23 |
PonderLM-2: Pretraining LLM with Latent Thoughts in Continuous Space |
PonderLM-2: pretrains an LLM with latent thoughts in continuous space to improve the quality of each generated token |
chain-of-thought |
|
|
| 24 |
Test-Time Policy Adaptation for Enhanced Multi-Turn Interactions with LLMs |
Proposes T2PAM, a test-time policy adaptation framework that improves LLM performance in multi-turn interactions |
large language model |
|
|
| 25 |
d$^2$Cache: Accelerating Diffusion-Based LLMs via Dual Adaptive Caching |
Proposes d$^2$Cache, which accelerates inference in diffusion-based LLMs through dual adaptive caching |
large language model |
✅ |
|
| 26 |
The Geometry of Creative Variability: How Credal Sets Expose Calibration Gaps in Language Models |
Uses credal sets to expose calibration gaps of language models on creative tasks |
large language model |
|
|
| 27 |
Peacemaker or Troublemaker: How Sycophancy Shapes Multi-Agent Debate |
Proposes a framework for evaluating sycophancy in multi-agent debate, revealing its negative impact on debate quality |
large language model |
|
|