| 1 |
Position: Vector Prompt Interfaces Should Be Exposed to Enable Customization of Large Language Models |
Argues that vector prompt interfaces should be exposed to enable customization of large language models |
large language model |
|
|
| 2 |
Traces of Social Competence in Large Language Models |
Evaluates the social-cognitive abilities of large language models using a modified False Belief Test |
large language model |
|
|
| 3 |
Benchmarking Motivational Interviewing Competence of Large Language Models |
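Evaluates the motivational interviewing competence of large language models and validates their potential for use in psychological counseling |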
large language model |
|
|
| 4 |
A Neural Topic Method Using a Large-Language-Model-in-the-Loop for Business Research |
LX Topic: a neural topic model that incorporates large language models to improve the quality of text analysis in business research |
large language model |
|
|
| 5 |
Monitoring Emergent Reward Hacking During Generation via Internal Activations |
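Proposes a method based on internal activations for monitoring reward hacking, used to detect model alignment problems during generation |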
large language model, chain-of-thought |
|
|
| 6 |
CONCUR: Benchmarking LLMs for Concurrent Code Generation |
CONCUR: a new benchmark for evaluating the concurrent code generation ability of LLMs |
large language model |
|
|
| 7 |
Bielik-Q2-Sharp: A Comparative Study of Extreme 2-bit Quantization Methods for a Polish 11B Language Model |
Bielik-Q2-Sharp: a comparative study of extreme 2-bit quantization methods for a Polish 11B language model |
large language model |
|
|
| 8 |
Rethinking Role-Playing Evaluation: Anonymous Benchmarking and a Systematic Study of Personality Effects |
Proposes an anonymous evaluation method and studies how personality enhancement affects the performance of role-playing agents |
large language model |
|
|
| 9 |
CzechTopic: A Benchmark for Zero-Shot Topic Localization in Historical Czech Documents |
CzechTopic: a benchmark for zero-shot topic localization in historical Czech documents |
large language model |
✅ |
|
| 10 |
T2S-Bench & Structure-of-Thought: Benchmarking and Prompting Comprehensive Text-to-Structure Reasoning |
Proposes the T2S-Bench benchmark and the SoT prompting method to improve LLM performance on text-to-structure reasoning tasks |
large language model |
✅ |
|
| 11 |
The Company You Keep: How LLMs Respond to Dark Triad Traits |
Studies the mechanisms by which large language models respond to Dark Triad traits |
large language model |
|
|
| 12 |
Retrieval or Representation? Reassessing Benchmark Gaps in Multilingual and Visually Rich RAG |
Reassesses benchmark gaps in multilingual and visual RAG: document representation matters more than the retrieval method |
multimodal |
|
|
| 13 |
When Do Language Models Endorse Limitations on Human Rights Principles? |
Evaluates the tendencies and biases of large language models toward limitations on human rights principles |
large language model |
|
|
| 14 |
FINEST: Improving LLM Responses to Sensitive Topics Through Fine-Grained Evaluation |
FINEST: improving the quality of LLM responses to sensitive topics through fine-grained evaluation |
large language model |
|
|
| 15 |
Assessing the Effectiveness of LLMs in Delivering Cognitive Behavioral Therapy |
Evaluates the effectiveness of large language models in cognitive behavioral therapy |
large language model |
|
|
| 16 |
ErrorLLM: Modeling SQL Errors for Text-to-SQL Refinement |
ErrorLLM: improving Text-to-SQL generation by modeling SQL errors |
large language model |
|
|