| 1 |
Multimodal Modeling of CRISPR-Cas12 Activity Using Foundation Models and Chromatin Accessibility Data |
Uses foundation models and chromatin accessibility data to improve CRISPR-Cas12 gRNA activity prediction |
foundation model multimodal |
|
|
| 2 |
Contemporary AI foundation models increase biological weapons risk |
Proposes a new framework for assessing the impact of AI models on biological weapons risk |
large language model foundation model |
|
|
| 3 |
Mirage-1: Augmenting and Updating GUI Agent with Hierarchical Multimodal Skills |
Proposes a hierarchical multimodal skills module to address GUI agents' knowledge gaps |
large language model multimodal |
✅ |
|
| 4 |
LLM-as-a-Fuzzy-Judge: Fine-Tuning Large Language Models as a Clinical Evaluation Judge with Fuzzy Logic |
Proposes LLM-as-a-Fuzzy-Judge to automate clinical evaluation |
large language model |
✅ |
|
| 5 |
Formalising Software Requirements using Large Language Models |
Presents the VERIFAI project to address traceability and verification of software requirements |
large language model |
|
|
| 6 |
TeleMath: A Benchmark for Large Language Models in Telecom Mathematical Problem Solving |
Proposes the TeleMath benchmark to evaluate large language models on telecom mathematical problem solving |
large language model |
|
|
| 7 |
SoK: Evaluating Jailbreak Guardrails for Large Language Models |
Proposes a multidimensional taxonomy for evaluating guardrail mechanisms for large language models |
large language model |
✅ |
|
| 8 |
Intelligent Automation for FDI Facilitation: Optimizing Tariff Exemption Processes with OCR And Large Language Models |
Proposes an intelligent automation framework to streamline tariff exemption processes for foreign investment |
large language model |
|
|
| 9 |
Augmenting Large Language Models with Static Code Analysis for Automated Code Quality Improvements |
Augments large language models with static code analysis to improve code quality automatically |
large language model |
|
|
| 10 |
WGSR-Bench: Wargame-based Game-theoretic Strategic Reasoning Benchmark for Large Language Models |
Proposes WGSR-Bench to address the evaluation of strategic reasoning |
large language model |
|
|
| 11 |
Breaking Bad Molecules: Are MLLMs Ready for Structure-Level Molecular Detoxification? |
Proposes the ToxiMol benchmark to address molecular toxicity repair |
large language model multimodal |
|
|
| 12 |
Scientists' First Exam: Probing Cognitive Abilities of MLLM via Perception, Understanding, and Reasoning |
Proposes the Scientists' First Exam benchmark to evaluate the cognitive abilities of multimodal large language models |
large language model multimodal |
|
|
| 13 |
DiMo-GUI: Advancing Test-time Scaling in GUI Grounding via Modality-Aware Visual Reasoning |
Proposes DiMo-GUI for grounding natural-language queries in GUIs |
visual grounding |
|
|
| 14 |
OPT-BENCH: Evaluating LLM Agent on Large-Scale Search Spaces Optimization Problems |
Proposes OPT-BENCH to evaluate LLM agents on optimization problems with large-scale search spaces |
large language model |
✅ |
|
| 15 |
LLM Embedding-based Attribution (LEA): Quantifying Source Contributions to Generative Model's Response for Vulnerability Analysis |
Proposes LEA to quantify source contributions to a generative model's responses |
large language model |
|
|
| 16 |
Invocable APIs derived from NL2SQL datasets for LLM Tool-Calling Evaluation |
Proposes an NL2API dataset generation method to evaluate LLM tool-calling ability |
large language model |
|
|
| 17 |
SWE-Factory: Your Automated Factory for Issue Resolution Training Data and Evaluation Benchmarks |
Proposes SWE-Factory to ease the construction of GitHub issue resolution datasets |
large language model |
✅ |
|
| 18 |
GenPlanX. Generation of Plans and Execution |
Proposes GenPlanX to address the understanding of planning tasks described in natural language |
large language model |
|
|
| 19 |
Precise Zero-Shot Pointwise Ranking with LLMs through Post-Aggregated Global Context Information |
Proposes a globally consistent comparative pointwise ranking method to improve zero-shot document ranking |
large language model |
|
|
| 20 |
LLM-Driven Personalized Answer Generation and Evaluation |
Uses large language models to generate personalized answers and improve the online learning experience |
large language model |
|
|
| 21 |
What Users Value and Critique: Large-Scale Analysis of User Feedback on AI-Powered Mobile Apps |
Proposes a large-scale user feedback analysis method to improve the experience of AI-powered mobile apps |
large language model |
|
|
| 22 |
Automated Validation of Textual Constraints Against AutomationML via LLMs and SHACL |
Proposes automated validation of textual constraints to address AutomationML modeling issues |
large language model |
|
|
| 23 |
Beyond Formal Semantics for Capabilities and Skills: Model Context Protocol in Manufacturing |
Proposes using the Model Context Protocol to simplify capability and skill modeling in manufacturing |
large language model |
|
|
| 24 |
Primender Sequence: A Novel Mathematical Construct for Testing Symbolic Inference and AI Reasoning |
Proposes the Primender sequence to evaluate the symbolic reasoning ability of large language models |
large language model |
|
|
| 25 |
StepProof: Step-by-step verification of natural language mathematical proofs |
Proposes StepProof for step-by-step verification of natural language mathematical proofs |
large language model |
|
|
| 26 |
LogiPlan: A Structured Benchmark for Logical Planning and Relational Reasoning in LLMs |
Proposes LogiPlan to evaluate the logical planning abilities of large language models |
large language model |
|
|
| 27 |
SOFT: Selective Data Obfuscation for Protecting LLM Fine-tuning against Membership Inference Attacks |
Proposes SOFT to defend LLM fine-tuning against membership inference attacks |
large language model |
|
|
| 28 |
PAL: Probing Audio Encoders via LLMs - Audio Information Transfer into LLMs |
Proposes a lightweight audio-LLM integration method to make audio information transfer more efficient |
large language model |
✅ |
|
| 29 |
Reasoning RAG via System 1 or System 2: A Survey on Reasoning Agentic Retrieval-Augmented Generation for Industry Challenges |
Presents Reasoning Agentic RAG to address complex reasoning and dynamic retrieval |
large language model |
✅ |
|
| 30 |
Towards Understanding Bias in Synthetic Data for Evaluation |
Examines bias in synthetic data to improve the evaluation of information retrieval systems |
large language model |
✅ |
|
| 31 |
Discrete Audio Tokens: More Than a Survey! |
Presents discrete audio tokens to improve the efficiency and performance of audio processing |
large language model |
✅ |
|