VOOZH about

URL: https://huggingface.co/prithivMLmods/DREX-062225-exp

⇱ prithivMLmods/DREX-062225-exp · Hugging Face


👁 DREX-062225-exp.png

DREX-062225-exp

The DREX-062225-exp (Document Retrieval and Extraction eXpert) model is a specialized fine-tuned version of docscopeOCR-7B-050425-exp, optimized for Document Retrieval, Content Extraction, and Analysis Recognition. Built on top of the Qwen2.5-VL architecture, this model enhances document comprehension capabilities with focused training on the Opendoc2-Analysis-Recognition dataset for superior document analysis and information extraction tasks.

DREX: Document Retrieval and Extraction eXpert [ experimental ]

Key Enhancements

  • Advanced Document Retrieval: Specialized capabilities for locating and retrieving specific information from complex document structures and layouts.

  • Enhanced Content Extraction: Optimized for extracting structured data, key information, and relevant content from diverse document types including reports, forms, and technical documentation.

  • Superior Analysis Recognition: Fine-tuned recognition abilities for document analysis tasks, pattern identification, and contextual understanding of document hierarchies.

  • Inherited OCR Excellence: Maintains all advanced OCR capabilities from the base docscopeOCR model including mathematical LaTeX formatting and multi-language support.

  • Document-Centric Understanding: Specialized training for understanding document relationships, cross-references, and contextual dependencies within complex document sets.


Markdown (.MD) - Inference

👁 1.png


👁 2.png


Quick Start with Transformers

from transformers import Qwen2_5_VLForConditionalGeneration, AutoTokenizer, AutoProcessor
from qwen_vl_utils import process_vision_info

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
 "prithivMLmods/DREX-062225-exp", torch_dtype="auto", device_map="auto"
)

processor = AutoProcessor.from_pretrained("prithivMLmods/DREX-062225-exp")

messages = [
 {
 "role": "user",
 "content": [
 {
 "type": "image",
 "image": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg",
 },
 {"type": "text", "text": "Extract and analyze the key information from this document."},
 ],
 }
]

text = processor.apply_chat_template(
 messages, tokenize=False, add_generation_prompt=True
)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
 text=[text],
 images=image_inputs,
 videos=video_inputs,
 padding=True,
 return_tensors="pt",
)
inputs = inputs.to("cuda")

generated_ids = model.generate(**inputs, max_new_tokens=128)
generated_ids_trimmed = [
 out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
 generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text)

Training Details

Parameter Value
Dataset Opendoc2-Analysis-Recognition
Dataset Size 6,910 samples
Base Model docscopeOCR-7B-050425-exp
Model Architecture Qwen2_5_VLForConditionalGeneration
Hardware 2 × A40 (19 vCPUs)
Total Disk 280,000 MB
Training Time 3,407 seconds (~0.95 hours)
Warmup Steps 250
Precision bfloat16

This model builds upon the robust foundation of docscopeOCR-7B-050425-exp with specialized training for document retrieval and extraction tasks.

Intended Use

This model is specifically designed for:

  • Document Retrieval: Efficiently locating specific information within large document collections and complex layouts.
  • Content Extraction: Precise extraction of structured data, tables, forms, and key information from various document types.
  • Analysis Recognition: Advanced recognition and analysis of document patterns, structures, and contextual relationships.
  • Enterprise Document Processing: Automated processing of business documents, reports, contracts, and administrative forms.
  • Research Document Analysis: Academic paper analysis, citation extraction, and research document comprehension.
  • Regulatory Compliance: Processing of compliance documents, regulatory filings, and standardized reporting formats.

Limitations

  • Inherits computational requirements from the base docscopeOCR model, requiring substantial resources for optimal performance.
  • Performance may vary on document types significantly different from the Opendoc2-Analysis-Recognition training dataset.
  • May show reduced accuracy on extremely specialized or domain-specific document formats not covered in training.
  • Long document processing requires adequate memory allocation and may not be suitable for real-time streaming applications.
  • Optimal performance depends on proper visual token configuration and input preprocessing.

References

Downloads last month
9
Safetensors
Model size
8B params
Tensor type
BF16
·

Model tree for prithivMLmods/DREX-062225-exp

Finetuned
(3)
this model
Quantizations
2 models

Dataset used to train prithivMLmods/DREX-062225-exp

Collections including prithivMLmods/DREX-062225-exp

Papers for prithivMLmods/DREX-062225-exp