If OCR (Text Searchable) Performs Inadequately

This section explains how the OCR (Text Searchable) function works when creating searchable PDF/XPS/OOXML files, and which file formats are suitable for the OCR (Text Searchable) function.
Refer to the following instructions if OCR (optical character recognition) does not produce the expected result.

OCR (Text Searchable) Function for Creating Searchable PDF/XPS/OOXML Files

When creating searchable PDF/XPS/OOXML files, the performance of OCR differs, depending on the selected file format. See the table below for details.
IMPORTANT
Even if you perform OCR in the language used in the originals, you may not obtain a proper result, depending on the text and file format of the originals.
Recognition Language
Characters are recognized as one of the following languages or language groups, according to the language selected in [Language/Keyboard Switch] in [Preferences] (Settings/Registration).*1 *2
Asian Languages
Text in the following languages is recognized:
Japanese, Chinese (Simplified), Chinese (Traditional), Korean
European Languages
Text in the following languages or language groups is recognized:
Languages
English, French, Italian, German, Spanish, Dutch, Portuguese, Albanian, Catalan, Danish, Finnish, Icelandic, Norwegian, Swedish, Croatian, Czech, Hungarian, Polish, Slovak, Estonian, Latvian, Lithuanian, Russian, Greek, Turkish
Language Groups
Western European (ISO), Central European (ISO), Baltic (ISO) *3
Character Recognition for Asian Languages
Recognition Character Type
Japanese: Alphanumeric characters, Kana characters, Kanji characters (JIS first level, JIS second level (partly)), Symbols
Chinese (Simplified): Alphanumeric characters, Chinese characters, Symbols (GB2312-80)
Chinese (Traditional): Alphanumeric characters, Chinese characters, Symbols (Big5)
Korean: Alphanumeric characters, Kanji characters, Korean Hangul characters, Symbols (KSC5601)
Recognition Font
Multi font supported (Ming-cho type is recommended)
Italic type cannot be recognized
Character Recognition for European Languages
Recognition Character Type
Alphanumeric characters, Special characters of the recognized language*4, Symbols
Recognition Font
Multi font supported (Times, Century, and Arial are recommended)
Italic type can be recognized
*1 The languages displayed in the language list in [Language/Keyboard Switch] in [Preferences] (Settings/Registration) may differ.
*2 If you select English, French, Italian, German, Spanish, Thai, or Vietnamese in [Language/Keyboard Switch] in [Preferences] (Settings/Registration), the language is recognized as Western European (ISO).
*3 Each language group consists of the following languages. If you select a language group, text written in languages of the language group is recognized.
Western European (ISO):
English, French, Italian, German, Spanish, Dutch, Portuguese, Albanian, Catalan, Danish, Finnish, Icelandic, Norwegian, Swedish
Central European (ISO):
Croatian, Czech, Hungarian, Polish, Slovak
Baltic (ISO):
Estonian, Latvian, Lithuanian
*4 If you select Greek, the following special characters can be recognized. If you select another language, the special characters for that language can be recognized. Some special characters cannot be recognized, depending on the language.
Α, Β, Γ, Δ, Ε, Ζ, Η, Θ, Ι, Κ, Λ, Μ, Ν, Ξ, Ο, Π, Ρ, Σ, Τ, Υ, Φ, Χ, Ψ, Ω, α, β, γ, δ, ε, ζ, η, θ, ι, κ, λ, μ, ν, ξ, ο, π, ρ, σ, τ, υ, φ, χ, ψ, ω
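
The language groups in footnote *3 and the display-language rule in footnote *2 can be read as simple lookups. The following Python sketch is illustrative only: the names LANGUAGE_GROUPS, RECOGNIZED_AS_WESTERN_EUROPEAN, and group_for are hypothetical and are not part of the machine's software. It only restates the lists given above.

```python
# Hypothetical lookup tables restating footnotes *2 and *3 of this section.
LANGUAGE_GROUPS = {
    "Western European (ISO)": [
        "English", "French", "Italian", "German", "Spanish", "Dutch",
        "Portuguese", "Albanian", "Catalan", "Danish", "Finnish",
        "Icelandic", "Norwegian", "Swedish",
    ],
    "Central European (ISO)": [
        "Croatian", "Czech", "Hungarian", "Polish", "Slovak",
    ],
    "Baltic (ISO)": ["Estonian", "Latvian", "Lithuanian"],
}

# Footnote *2: these display languages are recognized as Western European (ISO).
RECOGNIZED_AS_WESTERN_EUROPEAN = {
    "English", "French", "Italian", "German", "Spanish", "Thai", "Vietnamese",
}


def group_for(language):
    """Return the language group that lists the given language, or None."""
    for group, languages in LANGUAGE_GROUPS.items():
        if language in languages:
            return group
    return None


if __name__ == "__main__":
    for name in ("Polish", "Latvian", "Swedish", "Russian"):
        print(name, "->", group_for(name))
```

Russian, Greek, and Turkish appear in the list of recognized languages but not in any of the three language groups, so the sketch returns None for them.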

File Format for Creating Searchable PDF/XPS/OOXML Files

By using a proper file format for OCR, you can improve the accuracy of the OCR result.
If you cannot obtain a proper OCR result, confirm that the file format of the scanned original is appropriate for OCR.
IMPORTANT
If you use originals that contain a large amount of text per page, OCR may not perform properly.
Item: Details
Format of Original: Printed documents, Text documents (documents that consist of text, figures, images, and tables, with no character slant)
Format of Text: Horizontal writing, Vertical writing
  Documents which contain both horizontal and vertical writing can be recognized.
  Only horizontal writing can be recognized for European languages and Korean.
  Documents without complex columns
Character Size: 8 to 40 point
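
As a rough pre-scan checklist, the conditions above can be expressed as a few tests. The Python sketch below is a minimal illustration under stated assumptions: PageProfile, its fields, and suitability_issues are hypothetical names, and the values must be supplied by the person checking the original. The machine does not expose such an interface.

```python
from dataclasses import dataclass

# Limits taken from the table above.
MIN_POINT, MAX_POINT = 8, 40

# Vertical writing is recognized only for these languages; European
# languages and Korean are horizontal only.
VERTICAL_OK = {"Japanese", "Chinese (Simplified)", "Chinese (Traditional)"}


@dataclass
class PageProfile:
    """Hypothetical description of one original page, filled in by hand."""
    language: str
    has_vertical_text: bool
    smallest_point: float
    largest_point: float
    has_complex_columns: bool = False


def suitability_issues(page):
    """Return a list of reasons why the page may not suit OCR."""
    issues = []
    if page.smallest_point < MIN_POINT or page.largest_point > MAX_POINT:
        issues.append("character size outside 8 to 40 point")
    if page.has_vertical_text and page.language not in VERTICAL_OK:
        issues.append("vertical writing is not recognized for this language")
    if page.has_complex_columns:
        issues.append("complex column layout")
    return issues


if __name__ == "__main__":
    page = PageProfile("Korean", has_vertical_text=True,
                       smallest_point=6, largest_point=12)
    for issue in suitability_issues(page):
        print(issue)
```

An empty list means only that none of the listed conditions is violated. It does not guarantee a proper OCR result.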