URL: https://huggingface.co/papers/2606.23050

Paper page - Unlimited OCR Works


arxiv:2606.23050

Unlimited OCR Works

Published on Jun 22
· Submitted by taesiri on Jun 23
ยท ๐Ÿ‘ baidu
BAIDU
Authors:

Abstract

Unlimited OCR introduces Reference Sliding Window Attention (R-SWA) to keep memory consumption from growing during long-sequence OCR, enabling efficient transcription of multiple pages in a single forward pass.

Recently, end-to-end OCR models, exemplified by DeepSeek OCR, have once again thrust OCR into the spotlight. A widely held view is that using a large language model (LLM) as the decoder lets the model leverage the prior distribution of language, leading to improved OCR performance. The downside, however, is equally evident: as the output sequence lengthens, the accumulated KV cache drives up memory consumption and progressively slows down generation. This stands in stark contrast to humans, whose efficiency does not decline during long-horizon copying tasks. In this technical report, we propose Unlimited OCR, a model designed to emulate the working memory humans use when parsing documents. Taking DeepSeek OCR as the baseline, we replace all attention layers in the decoder with our proposed Reference Sliding Window Attention (R-SWA), which reduces attention computation costs while keeping the KV cache at a constant size throughout the entire decoding process. By combining the high compression rate of DeepSeek OCR's encoder with this constant-size KV cache, Unlimited OCR can transcribe dozens of pages of documents in a single forward pass within a standard maximum length of 32K. More importantly, R-SWA is a general-purpose attention mechanism for parsing: beyond OCR, it is equally applicable to tasks such as ASR and translation. Code and model weights are publicly available at http://github.com/baidu/Unlimited-OCR.
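The abstract does not spell out how R-SWA is built internally. As a rough illustration of how a decoder can keep its KV cache at a constant size, the sketch below assumes that "reference" means a fixed block of encoder (visual) tokens that every decoding step may attend to, combined with a sliding window over only the most recent W generated tokens, stored in a ring buffer. The class `RefSlidingWindowCache`, the helper `attend`, and the sizes (R = 64 reference tokens, W = 8, head dimension 16) are hypothetical choices for illustration, not the authors' implementation; it uses a single attention head and NumPy.

```python
import numpy as np


class RefSlidingWindowCache:
    """Hypothetical fixed-size KV cache (an assumption, not the paper's code):
    a frozen block of reference keys/values (assumed to be encoder visual tokens)
    plus a ring buffer holding only the most recent `window` decoder keys/values."""

    def __init__(self, ref_k, ref_v, window):
        self.ref_k, self.ref_v = ref_k, ref_v   # (R, d) reference tokens, never evicted
        d = ref_k.shape[1]
        self.win_k = np.zeros((window, d))      # ring buffer for recent decoder keys
        self.win_v = np.zeros((window, d))      # ring buffer for recent decoder values
        self.window = window
        self.count = 0                          # total decoder tokens appended so far

    def append(self, k, v):
        # Once the buffer is full, this overwrites the oldest entry in place.
        slot = self.count % self.window
        self.win_k[slot], self.win_v[slot] = k, v
        self.count += 1

    def keys_values(self):
        # Before the buffer fills, only slots 0..count-1 hold real entries.
        n = min(self.count, self.window)
        # Slot order does not change the attention output: softmax attention is
        # permutation-invariant over keys (assuming positional encoding, e.g. RoPE,
        # was already applied to each k before caching).
        k = np.concatenate([self.ref_k, self.win_k[:n]])
        v = np.concatenate([self.ref_v, self.win_v[:n]])
        return k, v

    def num_cached(self):
        return self.ref_k.shape[0] + min(self.count, self.window)


def attend(q, k, v):
    """Single-head scaled dot-product attention for one query vector."""
    scores = k @ q / np.sqrt(q.shape[0])
    weights = np.exp(scores - scores.max())     # numerically stable softmax
    weights /= weights.sum()
    return weights @ v


# Simulated decoding loop with random projections (illustrative sizes only).
rng = np.random.default_rng(0)
d, R, W = 16, 64, 8
cache = RefSlidingWindowCache(rng.normal(size=(R, d)), rng.normal(size=(R, d)), W)
sizes = []
for step in range(100):
    q, k, v = rng.normal(size=(3, d))           # this step's query, key, value
    cache.append(k, v)                          # causal: the token attends to itself
    out = attend(q, *cache.keys_values())
    sizes.append(cache.num_cached())

print(sizes[0], sizes[-1], max(sizes))          # 65 72 72
```

After the first few steps the number of cached entries stops growing at R + W = 72, whereas a standard causal KV cache would hold R + t entries after t steps (164 after the 100 steps here). Eviction via `count % window` overwrites the oldest window slot in place, so the cache never reallocates memory during decoding.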

Community


Pixels are my body, and tokens are my blood.

I have parsed over a thousand pages.

Unknown to KV cache.

Nor known to context limits.

So, as I pray:

Unlimited OCR Works.

Get this paper in your agent:

hf papers read 2606.23050

Models citing this paper 5


Datasets citing this paper 0

No dataset linking this paper


Spaces citing this paper 9


Collections including this paper 6