| 1 |
C2-Faith: Benchmarking LLM Judges for Causal and Coverage Faithfulness in Chain-of-Thought Reasoning |
Proposes the C2-Faith benchmark for evaluating causal and coverage faithfulness in chain-of-thought reasoning.
large language model, chain-of-thought
|
|
| 2 |
TSEmbed: Unlocking Task Scaling in Universal Multimodal Embeddings |
TSEmbed: unlocks task scaling in universal multimodal embeddings by decoupling task objectives.
large language model, multimodal
|
|
| 3 |
IF-RewardBench: Benchmarking Judge Models for Instruction-Following Evaluation |
Introduces IF-RewardBench for comprehensively benchmarking judge models on instruction-following evaluation.
large language model, instruction following
✅ |
|
| 4 |
Reasoning Theater: Disentangling Model Beliefs from Chain-of-Thought |
Reveals performative CoT in reasoning models: disentangles model beliefs from chains of thought via activation probing.
chain-of-thought |
|
|
| 5 |
Detection of Illicit Content on Online Marketplaces using Large Language Models |
Uses large language models to detect illicit content on online marketplaces.
large language model |
|
|
| 6 |
FireBench: Evaluating Instruction Following in Enterprise and API-Driven LLM Applications |
FireBench: evaluates instruction following in enterprise and API-driven LLM applications.
instruction following |
|
|
| 7 |
From Unfamiliar to Familiar: Detecting Pre-training Data via Gradient Deviations in Large Language Models |
GDS: a gradient-deviation-based method for detecting pre-training data in large language models.
large language model |
|
|
| 8 |
An Exploration-Analysis-Disambiguation Reasoning Framework for Word Sense Disambiguation with Low-Parameter LLMs |
Proposes an exploration-analysis-disambiguation reasoning framework that lets low-parameter LLMs match GPT-4-Turbo on word sense disambiguation.
large language model, chain-of-thought
|
|
| 9 |
Balancing Coverage and Draft Latency in Vocabulary Trimming for Faster Speculative Decoding |
Proposes a vocabulary trimming method that balances coverage against draft latency to accelerate speculative decoding.
large language model |
|
|
| 10 |
Feature Resemblance: On the Theoretical Understanding of Analogical Reasoning in Transformers |
Proposes a feature-resemblance theory for understanding analogical reasoning in Transformers.
large language model |
|
|
| 11 |
Leveraging LLM Parametric Knowledge for Fact Checking without Retrieval |
Proposes INTRA, which exploits LLMs' parametric knowledge for retrieval-free fact checking with stronger generalization.
large language model |
|
|
| 12 |
Stacked from One: Multi-Scale Self-Injection for Context Window Extension |
Proposes SharedLLM, which extends LLM context windows to 128K tokens via multi-scale self-injection.
large language model |
|
|
| 13 |
FlashAttention-4: Algorithm and Kernel Pipelining Co-Design for Asymmetric Hardware Scaling |
FlashAttention-4: algorithm and kernel pipelining co-design for asymmetric hardware scaling.
large language model |
|
|
| 14 |
Progressive Residual Warmup for Language Model Pretraining |
Proposes Progressive Residual Warmup (ProRes) to accelerate and stabilize Transformer language model pretraining.
large language model |
✅ |
|
| 15 |
Oral to Web: Digitizing 'Zero Resource' Languages of Bangladesh
Builds a multimodal parallel corpus for minority languages of Bangladesh to support the digitization of endangered languages.
multimodal |
|
|
| 16 |
VietJobs: A Vietnamese Job Advertisement Dataset |
VietJobs: the first large-scale Vietnamese job advertisement dataset, providing a benchmark for NLP and labor-market analysis.
large language model |
✅ |
|
| 17 |
Sparse-BitNet: 1.58-bit LLMs are Naturally Friendly to Semi-Structured Sparsity |
Proposes Sparse-BitNet to improve the sparsity and low-bit quantization efficiency of large language models.
large language model |
✅ |
|
| 18 |
NeuronMoE: Neuron-Guided Mixture-of-Experts for Efficient Multilingual LLM Extension |
NeuronMoE: neuron-guided Mixture-of-Experts for efficient multilingual LLM extension.
large language model |
|
|
| 19 |
HiFlow: Hierarchical Feedback-Driven Optimization for Constrained Long-Form Text Generation |
HiFlow: a hierarchical feedback-driven optimization framework for constrained long-form text generation.
large language model |
|
|
| 20 |
VRM: Teaching Reward Models to Understand Authentic Human Preferences |
Proposes VRM, which learns reward models via variational inference to capture authentic human preferences.
large language model |
|
|
| 21 |
AILS-NTUA at SemEval-2026 Task 10: Agentic LLMs for Psycholinguistic Marker Extraction and Conspiracy Endorsement Detection |
Proposes an agentic-LLM approach to psycholinguistic marker extraction and conspiracy endorsement detection.
chain-of-thought |
|
|
| 22 |
Beyond the Context Window: A Cost-Performance Analysis of Fact-Based Memory vs. Long-Context LLMs for Persistent Agents |
Compares long-context LLMs with fact-based memory systems, offering a cost-performance analysis for choosing persistent-agent designs.
large language model |
|
|
| 23 |
The Fragility Of Moral Judgment In Large Language Models |
Reveals the fragility of moral judgment in large language models: narrative framing and task design significantly sway judgments.
large language model |
|
|
| 24 |
Safer Reasoning Traces: Measuring and Mitigating Chain-of-Thought Leakage in LLMs |
Studies PII leakage from LLM chain-of-thought reasoning and proposes a lightweight inference-time gating method to mitigate the risk.
chain-of-thought |
|
|
| 25 |
NERdME: a Named Entity Recognition Dataset for Indexing Research Artifacts in Code Repositories |
Introduces the NERdME dataset for recognizing named entities of research artifacts in code repositories.
large language model |
|
|
| 26 |
Towards Robust Retrieval-Augmented Generation Based on Knowledge Graph: A Comparative Analysis |
Knowledge-graph-based retrieval-augmented generation: a comparative analysis of RAG robustness in noisy settings.
large language model |
|
|