If OCR (Text Searchable) Performs Inadequately
This section explains how the OCR (Text Searchable) function works when creating searchable PDF/OOXML files, and which file formats are suitable for the OCR (Text Searchable) function.
Refer to the following information if OCR (Text Searchable) does not produce the expected result.
OCR (Text Searchable)
The characters that can be processed with OCR are listed in the following table. If OCR processing does not function properly, check whether the characters in the original are of a supported type.
IMPORTANT
|
Even if you perform OCR with the language used in the originals selected, the expected result may not be obtained, depending on the text and file format of the originals.
|
Item
|
File Format
|
PDF/PowerPoint
|
Word
|
Recognition Language
|
Characters are recognized as one of the following languages or language groups, according to the language selected in [Switch Language/Keyboard] in [Preferences] (Settings/Registration). *1 *2
|
Press [Change] to select the language used in the originals from the following languages or language groups. Characters are recognized according to the selected language.
|
Asian Languages
|
Text in the following languages is recognized:
Japanese, Chinese (Simplified), Chinese (Traditional), Korean
|
European Languages
|
Text in the following languages or language groups is recognized:
Languages
English, French, Italian, German, Spanish, Dutch, Portuguese, Albanian, Catalan, Danish, Finnish, Icelandic, Norwegian, Swedish, Croatian, Czech, Hungarian, Polish, Slovak, Estonian, Latvian, Lithuanian, Russian, Greek, Turkish
Language Groups
Western European (ISO), Central European (ISO), Baltic (ISO) *3
|
Character Recognition for Asian Languages
|
Recognition Character Type
|
Japanese: Alphanumeric characters, Kana characters, Kanji characters (JIS first level, JIS second level (partly)), Symbols
Chinese (Simplified): Alphanumeric characters, Chinese characters, Symbols (GB2312-80)
Chinese (Traditional): Alphanumeric characters, Chinese characters, Symbols (Big5)
Korean: Alphanumeric characters, Kanji characters, Korean Hangul characters, Symbols (KSC5601)
|
Recognition Font
|
Multiple fonts are supported (Ming-cho type is recommended).
Italic type cannot be recognized.
|
Converted Font
|
-
|
When Japanese is selected:
Asian text: MS Mincho
European text: Century
When Chinese (Simplified) is selected:
Asian text: SimSun
European text: Calibri
When Chinese (Traditional) is selected:
Asian text: PMingLiU
European text: Calibri
|
Character Recognition for European Languages
|
Recognition Character Type
|
Alphanumeric characters, Special characters of the recognized language*4, Symbols
|
Recognition Font
|
Multiple fonts are supported (Times, Century, and Arial are recommended).
Italic type can be recognized.
|
Converted Font
|
-
|
Displayed in Calibri
Italic type cannot be converted
|
*1 The languages displayed in the language list in [Switch Language/Keyboard] in [Preferences] (Settings/Registration) may differ.
*2 If you select English, French, Italian, German, Spanish, Thai, or Vietnamese in [Switch Language/Keyboard] in [Preferences] (Settings/Registration), text is recognized as Western European (ISO).
*3 Each language group consists of the following languages. If you select a language group, text written in any language of that group is recognized.
Western European (ISO): English, French, Italian, German, Spanish, Dutch, Portuguese, Albanian, Catalan, Danish, Finnish, Icelandic, Norwegian, Swedish
Central European (ISO): Croatian, Czech, Hungarian, Polish, Slovak
Baltic (ISO): Estonian, Latvian, Lithuanian
*4 If you select Greek, the following special characters can be recognized. If you select another language, the special characters for that language can be recognized. Depending on the language, some special characters cannot be recognized.
Α, Β, Γ, Δ, Ε, Ζ, Η, Θ, Ι, Κ, Λ, Μ, Ν, Ξ, Ο, Π, Ρ, Σ, Τ, Υ, Φ, Χ, Ψ, Ω, α, β, γ, δ, ε, ζ, η, θ, ι, κ, λ, μ, ν, ξ, ο, π, ρ, σ, τ, υ, φ, χ, ψ, ω
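The language rules in footnotes *2, *3, and *4 can be sketched as a small lookup. The following Python sketch is for illustration only and is not part of the machine's software. All names in it (LANGUAGE_GROUPS, groups_covering, recognition_group_for_display, greek_not_listed) are hypothetical, and it assumes that Greek characters outside the list in footnote *4 should simply be flagged for checking, since the footnote does not state how they are handled.

```python
# Illustrative only: tables transcribed from footnotes *2, *3, and *4.
# Every name in this block is hypothetical.

LANGUAGE_GROUPS = {
    "Western European (ISO)": [
        "English", "French", "Italian", "German", "Spanish", "Dutch",
        "Portuguese", "Albanian", "Catalan", "Danish", "Finnish",
        "Icelandic", "Norwegian", "Swedish",
    ],
    "Central European (ISO)": ["Croatian", "Czech", "Hungarian", "Polish", "Slovak"],
    "Baltic (ISO)": ["Estonian", "Latvian", "Lithuanian"],
}

# Footnote *2: these display languages are recognized as Western European (ISO).
DISPLAY_AS_WESTERN = {
    "English", "French", "Italian", "German", "Spanish", "Thai", "Vietnamese",
}

# Footnote *4: the 24 uppercase and 24 lowercase Greek letters listed there.
# U+03A2 is unassigned and U+03C2 (final sigma) is not in the list.
GREEK_LISTED = (
    {chr(c) for c in range(0x0391, 0x03AA) if c != 0x03A2}
    | {chr(c) for c in range(0x03B1, 0x03CA) if c != 0x03C2}
)


def groups_covering(language):
    """Return the language groups from footnote *3 that include `language`."""
    return [group for group, langs in LANGUAGE_GROUPS.items() if language in langs]


def recognition_group_for_display(display_language):
    """Return the group named in footnote *2, or None if the footnote is silent."""
    if display_language in DISPLAY_AS_WESTERN:
        return "Western European (ISO)"
    return None


def greek_not_listed(text):
    """Return Greek-block characters in `text` that footnote *4 does not list."""
    return sorted(
        {ch for ch in text if "\u0370" <= ch <= "\u03ff" and ch not in GREEK_LISTED}
    )
```

For example, groups_covering("Polish") returns the Central European (ISO) group, while groups_covering("Russian") returns an empty list because Russian is selectable only as an individual language.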
Original Formats
Using a file format that is suitable for OCR improves the accuracy of the OCR result.
If you cannot obtain a proper OCR result, confirm that the file format of the scanned original is appropriate for OCR.
IMPORTANT
|
If you use originals which contain a large amount of text per page, OCR may not perform properly.
When you select Word format, OCR may not perform properly even if you use originals in the recommended file format.
Depending on the background colour, character style, character size, and character slant, some characters may be recognized incorrectly or may be missing from the OCR result.
Paragraphs, breaks, and tables in the original may not be recognized.
Part of an image may be recognized as text and replaced with it.
|
Item
|
Details
|
Format of Original
|
Printed documents, text documents (documents that consist of text, figures, images, and tables, with no slanted characters)
|
Format of Text
|
Horizontal writing, Vertical writing
Documents which contain both horizontal and vertical writing can be recognized.
Only horizontal writing can be recognized for European languages and Korean.
Documents without complex columns
|
Character Size
|
8 to 40 points
|
Format of Table
(only for Word documents)
|
Tables that meet the following conditions:
Square tables with solid lines
32 rows or fewer
32 columns or fewer
|
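The conditions in the table above can be expressed as a simple pre-scan checklist. The following Python sketch is for illustration only. The Original class, its fields, and check_original are hypothetical, and the sketch assumes that you already know the character sizes, writing direction, and table dimensions of the original. It covers only the numeric and writing-direction conditions, not line style or column layout.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Limits transcribed from the table above.
MIN_POINT, MAX_POINT = 8, 40
MAX_ROWS, MAX_COLS = 32, 32

# Vertical writing is not recognized for European languages or Korean,
# so only these selections can handle it.
VERTICAL_CAPABLE = {"Japanese", "Chinese (Simplified)", "Chinese (Traditional)"}


@dataclass
class Original:
    """Hypothetical description of a scanned original."""
    language: str                       # recognition language selected
    point_sizes: List[float]            # character sizes used in the original
    vertical_text: bool = False         # True if any text is written vertically
    tables: List[Tuple[int, int]] = field(default_factory=list)  # (rows, columns)
    output_format: str = "PDF"          # "PDF", "PowerPoint", or "Word"


def check_original(doc: Original) -> List[str]:
    """Return one warning per condition in the table that the original violates."""
    warnings = []

    out_of_range = [s for s in doc.point_sizes if not MIN_POINT <= s <= MAX_POINT]
    if out_of_range:
        warnings.append(
            f"character sizes outside {MIN_POINT} to {MAX_POINT} points: {out_of_range}"
        )

    if doc.vertical_text and doc.language not in VERTICAL_CAPABLE:
        warnings.append(f"vertical writing is not recognized for {doc.language}")

    # The table conditions apply only to Word output.
    if doc.output_format == "Word":
        for rows, cols in doc.tables:
            if rows > MAX_ROWS or cols > MAX_COLS:
                warnings.append(
                    f"table of {rows} rows x {cols} columns exceeds {MAX_ROWS} x {MAX_COLS}"
                )

    return warnings
```

An empty list means that none of the checked conditions is violated. As the IMPORTANT notes above state, OCR may still not perform properly for such originals.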