Voozh

AI & ML interests

Enterprise AI and ML, Foundation Models, Responsible AI

Recent Activity

DhavalPatel submitted a paper 10 days ago

Beyond Static Leaderboards: Predictive Validity for the Evaluation of LLM Agents

DhavalPatel submitted a paper about 1 month ago

Evaluating Temporal Semantic Caching and Workflow Optimization in Agentic Plan-Execute Pipelines

DhavalPatel submitted a paper about 1 month ago

Code-Guided Reasoning for Small Language Models: Evaluating Executable MCQA Scaffolds

Papers

Submitted by

Dhaval Patel

Beyond Static Leaderboards: Predictive Validity for the Evaluation of LLM Agents

IBM

Submitted by

Dhaval Patel

Evaluating Temporal Semantic Caching and Workflow Optimization in Agentic Plan-Execute Pipelines

IBM

Submitted by

Dhaval Patel

SPIN: Structural LLM Planning via Iterative Navigation for Industrial Tasks

IBM

Submitted by

Dhaval Patel

Code-Guided Reasoning for Small Language Models: Evaluating Executable MCQA Scaffolds

IBM

Submitted by

Dhaval Patel

DiagnosticIQ: A Benchmark for LLM-Based Industrial Maintenance Action Recommendation from Symbolic Rules

IBM

Submitted by

Leo Y

From Static Templates to Dynamic Runtime Graphs: A Survey of Workflow Optimization for LLM Agents

IBM

71 2

Submitted by

Avihu Dekel

NLE: Non-autoregressive LLM-based ASR by Transcript Editing

IBM

Submitted by

Zhangchen Xu

TOUCAN: Synthesizing 1.5M Tool-Agentic Data from Real-World MCP Environments

IBM

253 3

URL: https://huggingface.co/ibm/papers