Azure Document Intelligence OCR Labeling Limitations and Their Impact on Custom Model Training Accuracy, Even After Fine-Tuning

Zeno Walker 0 Reputation points
  1. How can Azure Document Intelligence achieve reliable model accuracy when the training labels depend entirely on its internal OCR output, which may already be incorrect or incomplete for some credit card images?
  2. If OCR fails to detect some PAN digits (especially embossed or blurry middle sections), does “Draw Region” help the model learn the missing values, or does it only map existing OCR tokens?
  3. Why is there no option to manually edit or override OCR-detected values for credit card fields before training or fine-tuning custom extraction models?
  4. What is the recommended Microsoft approach for handling blurry or embossed credit card images where the internal OCR models cannot correctly detect PAN values?
  5. If incorrect OCR results are used during labeling and fine-tuning, will prediction quality always remain dependent on the internal OCR accuracy, even after model training?

  1. SRILAKSHMI C 19,110 Reputation points Microsoft External Staff Moderator

    Hello @Zeno Walker,

    Thank you for reaching out to Microsoft Q&A.

    The behavior you are observing is related to the architectural relationship between the OCR layer and the custom extraction layer in Document Intelligence. I will address each of your questions individually below.

    1. How can Document Intelligence achieve reliable accuracy when training labels depend on internal OCR output?

    Document Intelligence custom extraction models are built on top of Microsoft’s OCR engine. The overall pipeline works in two major stages:

    1. OCR/Text Detection

    The service first performs OCR to identify:

    • Text content
    • Tokens/words
    • Character groupings
    • Bounding boxes/coordinates
    • Reading order

    2. Neural Extraction Model

    The custom model then learns:

    • Spatial relationships between fields
    • Layout patterns
    • Neighboring text context
    • Relative positioning
    • Repeated document structures

    This means the extraction model is not simply memorizing raw text values. Instead, it learns the structural and contextual relationship between detected OCR tokens and the labels assigned during training.

    However, OCR remains the foundational layer.

    If OCR:

    • Misses characters,
    • Produces incomplete tokens,
    • Misreads embossed digits,
    • Or fails to detect text entirely,

    then the downstream extraction model is constrained by those OCR results.

    For example:

    If the actual PAN is

    1234 5678 9012 3456
    

    but OCR detects (with XXXX standing for digits it could not read correctly)

    1234 XXXX 9012 3456
    

    then the extraction model only receives the detected tokens as training input.

    In practice, custom training improves:

    • Field association accuracy,
    • Layout understanding,
    • Extraction consistency across templates,

    but it cannot fully compensate for text that the OCR engine never detected.
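
    One way to catch this early is to inspect the OCR output for each training image before labeling. The sketch below assumes the OCR words for the card-number area have already been collected into plain (text, confidence) pairs. The function name `check_pan_ocr`, the 16-digit default, and the 0.8 confidence threshold are illustrative assumptions, not part of the service.

```python
import re
from typing import List, Tuple


def check_pan_ocr(words: List[Tuple[str, float]],
                  expected_digits: int = 16,
                  min_confidence: float = 0.8) -> dict:
    """Summarize how much of a card number the OCR layer actually captured.

    `words` holds (text, confidence) pairs from the card-number area only.
    """
    # Keep only tokens made up entirely of digits (typical PAN groups).
    digit_words = [(t, c) for t, c in words if re.fullmatch(r"\d+", t)]
    digits = "".join(t for t, _ in digit_words)
    # Digit tokens the OCR engine itself was unsure about.
    low_conf = [t for t, c in digit_words if c < min_confidence]
    return {
        "digits_found": len(digits),
        "complete": len(digits) == expected_digits,
        "low_confidence_tokens": low_conf,
    }


# Hypothetical OCR result in which the second digit group was never detected.
ocr_words = [("1234", 0.98), ("9012", 0.61), ("3456", 0.97), ("VALID", 0.99)]
report = check_pan_ocr(ocr_words)
print(report)
```

    Images whose report is not complete are candidates for re-capture or preprocessing rather than labeling, since the missing digits cannot be labeled later.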

    2. Does “Draw Region” help the model learn missing values?

    No. Draw Region does not create or recover missing OCR text.

    The Draw Region functionality only:

    • Associates existing OCR tokens with a field label,
    • Defines the spatial region corresponding to the label,
    • Helps the model understand where the field is located.

    It does not:

    • Inject new text,
    • Override OCR output,
    • Correct recognition errors,
    • Or invent missing digits.

    For example:

    If OCR detected:

    1234 9012 3456
    

    and the actual card number is:

    1234 5678 9012 3456
    

    then drawing a larger region around the card number area will not teach the model the missing “5678” digits, because those tokens do not exist in the OCR layer.

    The model can only learn from OCR content that has already been recognized.
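
    The token-based behavior can be pictured with a small conceptual model. This is only a sketch of the idea, not the service's actual implementation: the `OcrWord` class, the `words_in_region` helper, and the coordinates are all hypothetical. A drawn region selects OCR words that already exist, so enlarging it cannot produce a word that OCR never emitted.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (left, top, right, bottom)


@dataclass
class OcrWord:
    text: str
    box: Box


def words_in_region(words: List[OcrWord], region: Box) -> List[str]:
    """Return the text of OCR words whose center lies inside the drawn region."""
    left, top, right, bottom = region
    selected = []
    for w in words:
        cx = (w.box[0] + w.box[2]) / 2
        cy = (w.box[1] + w.box[3]) / 2
        if left <= cx <= right and top <= cy <= bottom:
            selected.append(w.text)
    return selected


# OCR missed the second group ("5678"), so no word exists for it.
words = [
    OcrWord("1234", (10, 50, 50, 60)),
    OcrWord("9012", (110, 50, 150, 60)),
    OcrWord("3456", (160, 50, 200, 60)),
]

tight = words_in_region(words, (5, 45, 205, 65))   # region hugging the number
wide = words_in_region(words, (0, 0, 300, 200))    # much larger region
print(tight)
print(wide)
```

    Both regions select the same three words. The gap where “5678” should be contributes nothing, whatever the region size.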

    3. Why is there no manual OCR correction capability in the labeling experience?

    Currently, the labeling workflow in Document Intelligence is token-based.

    Internally, labels are linked directly to:

    • OCR token IDs,
    • Bounding boxes,
    • Spatial coordinates,
    • Page positioning metadata.

    Because of this architecture, arbitrary manual editing of OCR text would break the alignment between:

    • The image region,
    • OCR token coordinates,
    • And training annotations.

    As a result, the current labeling UI is designed for:

    • Selecting OCR-detected text,
    • Associating fields,
    • And training extraction relationships,

    rather than functioning as a manual OCR correction editor.

    At this time, there is no supported feature in the Document Intelligence Studio UI to manually replace or override OCR-recognized text prior to training.

    4. What is the recommended Microsoft approach for blurry or embossed credit card images?

    For difficult document types such as embossed credit cards, Microsoft generally recommends improving OCR quality before model training and extraction.

    This is because embossed cards are inherently challenging for OCR systems due to:

    • Reflections,
    • Lighting variability,
    • Lack of contrast,
    • Shadows,
    • Blur,
    • Compression artifacts.

    Recommended approaches include:

    Improve capture quality

    Best results are typically achieved using:

    • High-resolution images (300 DPI or greater),
    • Proper focus,
    • Controlled lighting,
    • Reduced glare/reflection,
    • Minimal compression,
    • Straight alignment.

    Apply image preprocessing

    Preprocessing can significantly improve OCR performance.

    Common techniques include:

    • Deblurring,
    • Sharpening,
    • Contrast enhancement,
    • Grayscale conversion,
    • Noise reduction,
    • Edge enhancement,
    • Deskewing.
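
    As an illustration of one of these techniques, the sketch below applies a linear contrast stretch to a grayscale image represented as nested lists of 0-255 values. It is written in pure Python only to stay self-contained; a real pipeline would more likely use an imaging library, and the function name `stretch_contrast` is an assumption for this example.

```python
from typing import List


def stretch_contrast(pixels: List[List[int]]) -> List[List[int]]:
    """Linearly rescale grayscale values so the darkest pixel maps to 0
    and the brightest maps to 255."""
    flat = [p for row in pixels for p in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # Uniform image: there is no contrast to stretch.
        return [row[:] for row in pixels]
    scale = 255 / (hi - lo)
    return [[round((p - lo) * scale) for p in row] for row in pixels]


# A low-contrast patch, such as embossed digits under flat lighting.
low_contrast = [
    [100, 110, 120],
    [105, 115, 125],
    [110, 120, 130],
]
enhanced = stretch_contrast(low_contrast)
print(enhanced)
```

    The values originally spanning 100 to 130 now span the full 0 to 255 range, which tends to make faint character edges easier for an OCR engine to separate from the background.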

    Use text-embedded PDFs when available

    If the source documents are digitally generated PDFs with embedded text, extraction quality improves substantially because OCR dependency is reduced.

    Use specialized OCR preprocessing pipelines

    For embossed credit card scenarios specifically, some customers implement custom image enhancement, specialized OCR engines, or preprocessing pipelines before passing documents into Document Intelligence.

    This is especially helpful when the built-in OCR engine struggles with embossed or reflective characters.

    Use validation and human review workflows

    For highly sensitive fields such as PAN values, Microsoft generally recommends supplementing extraction with:

    • Confidence thresholds,
    • Business rule validation,
    • Human review for low-confidence predictions.
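
    A minimal sketch of how these three checks might be combined for a PAN field follows. The Luhn checksum is the standard check-digit scheme for payment card numbers and serves here as the business rule. The function names, the 0.9 threshold, and the "accept" / "human_review" labels are assumptions for illustration.

```python
def luhn_valid(pan: str) -> bool:
    """Return True if `pan` passes the Luhn checksum used by payment cards."""
    cleaned = pan.replace(" ", "").replace("-", "")
    if not (cleaned.isascii() and cleaned.isdigit()):
        return False
    if not 13 <= len(cleaned) <= 19:
        return False
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for i, ch in enumerate(reversed(cleaned)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def route_pan(value: str, confidence: float, threshold: float = 0.9) -> str:
    """Decide whether an extracted PAN can be accepted automatically."""
    if confidence < threshold:
        return "human_review"
    if not luhn_valid(value):
        return "human_review"
    return "accept"


print(route_pan("4111 1111 1111 1111", 0.97))  # well-known test number
print(route_pan("4111 1111 1111 1111", 0.55))  # low extraction confidence
print(route_pan("1234 XXXX 9012 3456", 0.99))  # unreadable digits
```

    A checksum catches most single-digit misreads but not every error, so it complements the confidence threshold and human review rather than replacing them.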

    5. Will prediction quality always remain dependent on OCR accuracy even after fine-tuning?

    To a significant extent, yes.

    The custom extraction model can learn:

    • Document structure,
    • Field positioning,
    • Typography patterns,
    • Contextual relationships,
    • Consistent extraction behavior.

    This can sometimes reduce the impact of minor OCR inconsistencies.

    However, the extraction model still requires OCR tokens as anchors for learning.

    If OCR completely misses digits, produces incorrect characters, or fails to detect a text region, then the model cannot reliably reconstruct the missing information.

    In other words:

    Fine-tuning improves extraction intelligence, but it does not replace the OCR engine itself.

    The overall prediction quality will always be strongly influenced by the quality of the underlying OCR results.

    Document Intelligence custom extraction models operate as a layered system:

    1. OCR detects text and structure
    2. The custom model learns extraction behavior from those OCR results

    Therefore, OCR quality is foundational and custom training enhances extraction logic, but training cannot fully recover information that OCR never recognized.

    For high-accuracy credit card extraction scenarios, the recommended architecture is typically:

    1. Image quality optimization
    2. Image preprocessing/enhancement
    3. OCR quality validation
    4. Document Intelligence extraction
    5. Post-processing validation
    6. Human review for low-confidence cases

    I hope this helps. Let me know if you have any further queries.

    If this answers your question, please click Accept Answer and select Yes for “Was this answer helpful”.

    Thank you!


