Submitted by Dhaval Patel | 41 | Beyond Static Leaderboards: Predictive Validity for the Evaluation of LLM Agents | IBM | 1
Submitted by Dhaval Patel | 12 | Evaluating Temporal Semantic Caching and Workflow Optimization in Agentic Plan-Execute Pipelines | IBM | 2
Submitted by Dhaval Patel | 1 | SPIN: Structural LLM Planning via Iterative Navigation for Industrial Tasks | IBM | 2
Submitted by Dhaval Patel | 7 | Code-Guided Reasoning for Small Language Models: Evaluating Executable MCQA Scaffolds | IBM | 1
Submitted by Dhaval Patel | 7 | DiagnosticIQ: A Benchmark for LLM-Based Industrial Maintenance Action Recommendation from Symbolic Rules | IBM | 2
Submitted by Leo Y | 57 | From Static Templates to Dynamic Runtime Graphs: A Survey of Workflow Optimization for LLM Agents | IBM | 71 | 2
Submitted by Avihu Dekel | 24 | NLE: Non-autoregressive LLM-based ASR by Transcript Editing | IBM | 2
Submitted by Zhangchen Xu | 29 | TOUCAN: Synthesizing 1.5M Tool-Agentic Data from Real-World MCP Environments | IBM | 253 | 3