If OCR (Text Searchable) Performs Inadequately

This section explains how the OCR (Text Searchable) function works when you create searchable PDF/XPS/OOXML files, and describes the file format suited to the OCR (Text Searchable) function.
Refer to the following information if OCR (optical character recognition) does not produce a proper result.

OCR (Text Searchable) Function for Creating Searchable PDF/XPS/OOXML Files

When you create searchable PDF/XPS/OOXML files, OCR performance differs depending on the selected file format. See the table below for details.
IMPORTANT
Even if you perform OCR with the language used in the originals, a proper result may not be obtained, depending on the text and file format of the originals.
Item
File Format
PDF/XPS/PowerPoint
Word
Recognition Language
PDF/XPS/PowerPoint:
Characters are recognized as one of the following languages or language groups, according to the language selected in [Switch Language/Keyboard] in [Preferences] (Settings/Registration). *1 *2
Word:
Press [Change] to select the language used in the originals from the following languages or language groups. Characters are recognized according to the selected language.
Asian Languages
Text in the following languages is recognized:
Japanese, Chinese (Simplified), Chinese (Traditional), Korean
European Languages
Text in the following languages or language groups is recognized:
Languages
English, French, Italian, German, Spanish, Dutch, Portuguese, Albanian, Catalan, Danish, Finnish, Icelandic, Norwegian, Swedish, Croatian, Czech, Hungarian, Polish, Slovak, Estonian, Latvian, Lithuanian, Russian, Greek, Turkish
Language Groups
Western European (ISO), Central European (ISO), Baltic (ISO) *3
Character Recognition for Asian Languages
Recognition Character Type
Japanese: Alphanumeric characters, Kana characters, Kanji characters (JIS first level, JIS second level (partly)), Symbols
Chinese (Simplified): Alphanumeric characters, Chinese characters, Symbols (GB2312-80)
Chinese (Traditional): Alphanumeric characters, Chinese characters, Symbols (Big5)
Korean: Alphanumeric characters, Kanji characters, Korean Hangul characters, Symbols (KSC5601)
Recognition Font
Multi font supported (Ming-cho type is recommended)
Italic type cannot be recognized
Converted Font
PDF/XPS/PowerPoint: -
Word:
When Japanese is selected:
Asian text: MS Mincho
European text: Century
When Chinese (Simplified) is selected:
Asian text: SimSun
European text: Calibri
When Chinese (Traditional) is selected:
Asian text: PMingLiU
European text: Calibri
Character Recognition for European Languages
Recognition Character Type
Alphanumeric characters, Special characters of the recognized language*4, Symbols
Recognition Font
Multi font supported (Times, Century, and Arial are recommended)
Italic type can be recognized
Converted Font
PDF/XPS/PowerPoint: -
Word:
Displayed in Calibri
Italic type cannot be converted
*1 The languages displayed in the language list in [Switch Language/Keyboard] in [Preferences] (Settings/Registration) may differ.
*2 If you select English, French, Italian, German, Spanish, Thai, or Vietnamese in [Switch Language/Keyboard] in [Preferences] (Settings/Registration), the language is recognized as Western European (ISO).
*3 Each language group consists of the following languages. If you select a language group, text written in languages of the language group is recognized.
Western European (ISO):
English, French, Italian, German, Spanish, Dutch, Portuguese, Albanian, Catalan, Danish, Finnish, Icelandic, Norwegian, Swedish
Central European (ISO):
Croatian, Czech, Hungarian, Polish, Slovak
Baltic (ISO):
Estonian, Latvian, Lithuanian
*4 If you select Greek, the following special characters can be recognized. If you select another language, the special characters for that language can be recognized. Some special characters cannot be recognized, depending on the language.
Α, Β, Γ, Δ, Ε, Ζ, Η, Θ, Ι, Κ, Λ, Μ, Ν, Ξ, Ο, Π, Ρ, Σ, Τ, Υ, Φ, Χ, Ψ, Ω, α, β, γ, δ, ε, ζ, η, θ, ι, κ, λ, μ, ν, ξ, ο, π, ρ, σ, τ, υ, φ, χ, ψ, ω
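
The language groups listed in footnote *3 can be read as a simple lookup table. The following Python sketch is for illustration only: the names `LANGUAGE_GROUPS` and `group_for` are hypothetical and are not part of the machine's software. It assumes that a language not listed in any group (such as Russian, Greek, or Turkish) belongs to no group.

```python
# Illustrative lookup built from footnote *3.
# LANGUAGE_GROUPS and group_for are hypothetical names, not a machine API.
LANGUAGE_GROUPS = {
    "Western European (ISO)": [
        "English", "French", "Italian", "German", "Spanish", "Dutch",
        "Portuguese", "Albanian", "Catalan", "Danish", "Finnish",
        "Icelandic", "Norwegian", "Swedish",
    ],
    "Central European (ISO)": [
        "Croatian", "Czech", "Hungarian", "Polish", "Slovak",
    ],
    "Baltic (ISO)": ["Estonian", "Latvian", "Lithuanian"],
}


def group_for(language):
    """Return the language group that contains `language`, or None."""
    for group, members in LANGUAGE_GROUPS.items():
        if language in members:
            return group
    return None


print(group_for("Polish"))   # Central European (ISO)
print(group_for("Russian"))  # None
```

Selecting a group on the machine recognizes text in any member language, so a lookup like this may help when deciding which group covers a mixed-language original.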

File Format for Creating Searchable PDF/XPS/OOXML Files

By using a proper file format for OCR, you can improve the accuracy of the OCR result.
If you cannot obtain a proper OCR result, confirm that the file format of the scanned original is appropriate for OCR.
IMPORTANT
If you use originals which contain a large amount of text per page, OCR may not perform properly.
When you select Word format, OCR may not perform properly even if you use originals in the recommended file format.
Depending on the background color, character style, character size, and character slant, some characters may be replaced incorrectly or may be missing in the OCR result.
Paragraphs, breaks, and tables in the original may not be recognized.
A part of an image, such as graphics, photos, or seal imprints, may be recognized and replaced with text.
Item
Details
Format of Original
Printed documents, Text documents (documents that consist of text, figures, images, and tables, with no slanted characters)
Format of Text
Horizontal writing, Vertical writing
Documents which contain both horizontal and vertical writing can be recognized.
Only horizontal writing can be recognized for European languages and Korean.
Document without complex columns
Character Size
8 to 40 point
Format of Table (only for Word documents)
Tables that meet the following conditions
Square tables with solid lines
The number of rows is 32 or fewer
The number of columns is 32 or fewer
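
The numeric conditions in the table above (character size and table dimensions) can be checked before scanning. The following Python sketch is for illustration only: the function `check_original` and its parameters are hypothetical and do not correspond to any function of the machine. It assumes that you already know the character sizes and table dimensions of the original.

```python
# Limits taken from the table above; all names here are hypothetical.
MIN_POINT, MAX_POINT = 8, 40   # Character Size: 8 to 40 point
MAX_ROWS = 32                  # Format of Table (only for Word documents)
MAX_COLUMNS = 32


def check_original(char_sizes_pt, table_shapes=(), word_output=False):
    """Return warnings for values outside the recommended conditions.

    char_sizes_pt: character sizes in points found in the original.
    table_shapes:  (rows, columns) pairs, one per table in the original.
    word_output:   True if the Word file format is selected; the table
                   conditions apply only in that case.
    """
    warnings = []
    for size in char_sizes_pt:
        if not MIN_POINT <= size <= MAX_POINT:
            warnings.append(f"character size {size} pt is outside 8 to 40 pt")
    if word_output:
        for rows, columns in table_shapes:
            if rows > MAX_ROWS or columns > MAX_COLUMNS:
                warnings.append(
                    f"table {rows}x{columns} exceeds 32 rows or 32 columns"
                )
    return warnings


for message in check_original([6, 12], [(40, 3)], word_output=True):
    print(message)
```

An empty list means only that the numeric conditions are met. OCR may still not perform properly for the reasons listed under IMPORTANT.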