Popular repositories
- [ICML 2024] LLMCompiler: An LLM Compiler for Parallel Function Calling · Python · 1.8k · 129
- [ICML 2024] SqueezeLLM: Dense-and-Sparse Quantization · Python · 715 · 50
- [EMNLP 2024 Demo] TinyAgent: Function Calling at the Edge! · Python · 479 · 70
- [NeurIPS 2024] KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization · Python · 419 · 39
- [ACL 2024] LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement · Python · 195 · 15
- [ACL 2025] Squeezed Attention: Accelerating Long Prompt LLM Inference · Python · 58 · 8
Repositories
Showing 10 of 16 repositories
-
-
Jupyter Notebook
2
0
1
0
-
Arbitrage
Public
Arbitrage: Efficient Reasoning via Advantage-Aware Speculation
-
MultipoleAttention
Public
[NeurIPS 2025] Multipole Attention for Efficient Long Context Reasoning
-
CDLM
Public
CDLM: Consistency Diffusion Language Models for Faster Sampling
Python
36
MIT
0
0
0
-
plan-and-act
Public
[ICML 2025] Improving Planning of Agents for Long-Horizon Tasks
Python
29
MIT
4
1
0
-
sciml-agent
Public
SciMLAgents: Write the Solver, Not the Solution
-
ETS
Public
ETS: Efficient Tree Search for Inference-Time Scaling
-
SqueezedAttention
Public
[ACL 2025] Squeezed Attention: Accelerating Long Prompt LLM Inference
-
Tool2Vec
Public
Efficient and Scalable Estimation of Tool Representations in Vector Space
Python
29
MIT
4
3
0