This article explains how to create scans that can be successfully read by screen readers and other assistive technology tools.
What makes a scan accessible?
A scan is an image of a page. The textual elements within that image aren't really text; they are simply patterns of light and dark pixels. The first step in making them accessible is using an AI process called optical character recognition (OCR) to convert these pixel patterns into editable and searchable text. The output of an OCR conversion is only as good as the input: if the OCR process can’t correctly identify and interpret characters, the text it generates will be nonsense.
The final step in making a scan accessible is adding "tags" or metadata to this text that indicate language, headings, reading order, and other structural elements that assistive technologies such as text-to-speech, Braille conversion, or highlighting and annotation tools need to correctly interpret and represent the text. Adobe Acrobat has an AI-based "auto-tagging" process that can do this for you, but only if the scanned document is clean and structured in a predictable way.
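To illustrate why OCR output is only as good as the input, here is a toy sketch in Python. It is not how Acrobat or any real OCR engine works; the 3x5 pixel templates, the `recognize` function, and the `max_mismatches` threshold are all invented for illustration. It matches a small pixel grid against stored letter shapes and shows how a stray mark on the page can change what is read.

```python
# Toy template matcher. Real OCR engines are far more sophisticated;
# every name and value here is a hypothetical, illustrative choice.
TEMPLATES = {
    "I": ("111", "010", "010", "010", "111"),
    "L": ("100", "100", "100", "100", "111"),
    "T": ("111", "010", "010", "010", "010"),
}


def count_mismatches(glyph, template):
    """Count the pixels that differ between two same-sized grids."""
    return sum(
        pixel != expected
        for glyph_row, template_row in zip(glyph, template)
        for pixel, expected in zip(glyph_row, template_row)
    )


def recognize(glyph, max_mismatches=1):
    """Return the closest letter, or '?' if no template is close enough."""
    best_letter = min(
        TEMPLATES,
        key=lambda letter: count_mismatches(glyph, TEMPLATES[letter]),
    )
    if count_mismatches(glyph, TEMPLATES[best_letter]) > max_mismatches:
        return "?"
    return best_letter


clean_t = ("111", "010", "010", "010", "010")
underlined_t = ("111", "010", "010", "010", "111")   # a line drawn under the T
highlighted_t = ("111", "111", "111", "111", "111")  # dark highlighter over the T

print(recognize(clean_t))        # T
print(recognize(underlined_t))   # I
print(recognize(highlighted_t))  # ?
```

In this sketch the clean letter is read correctly, the underlined one is silently misread as a different letter, and the highlighted one cannot be matched at all.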
Making scans that OCR and auto-tag correctly
- Start with a clean original: Highlighting, underlining, and page damage can prevent the OCR and auto-tagging processes from correctly recognizing or interpreting text.
- Avoid marginalia and underlining: Margin notes and underlining can also confuse OCR and auto-tagging software, producing extraneous characters and interfering with word, paragraph, and page structure predictions. Erase them or find a clean copy of the text to scan.
- Keep the pages straight: Scan with all pages oriented in the same direction and as close to horizontal or vertical as possible. Most OCR tools can correct for slight skewing, but highly tilted text will not be interpreted correctly.
- Don’t block the text: Avoid cutting off text or blocking it with your hands, bookmarks, etc. OCR tools can't infer what is missing, and may misinterpret what they can see due to the lack of context.
- For best results, scan one page at a time:
  - Scanning documents “two-up” (that is, with two facing pages in a book or journal scanned at the same time) often creates shadows and distortions that prevent parts of the text from being recognized or correctly interpreted.
  - Auto-tagging tools can usually recognize two facing pages as separate blocks of text to be read sequentially if the page layouts are simple, but pages with multiple columns of text, sidebars, or notes will need to be scanned one at a time.
Use Adobe Acrobat to OCR and add tags
The Canon multifunction printer/scanners OCR text automatically, so a document scanned on one of them already contains recognized text. You must still open the file in Adobe Acrobat Pro (available on all college-owned computers) to auto-tag it and then check the reading order.
- Open Acrobat Pro, then open your file.
- Click the Tools menu and choose Action Wizard.
- Click Make Accessible in the Actions list.
- Click Start.
- Acrobat will run through a series of steps, pausing when your input or approval is needed.
- As a final step, it will run an Accessibility Checker.
- Right-click on any issues it finds and choose Fix to fix them.
- Check contrast and reading order manually.
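For the manual contrast check, one common yardstick is the WCAG 2.x contrast ratio, which asks for at least 4.5:1 for normal-size text at level AA. The Python sketch below computes that ratio for two RGB colors. The function names are illustrative, and it assumes you have sampled the text and page colors yourself (for example with a color picker), which is not part of the Acrobat steps above.

```python
def relative_luminance(rgb):
    """WCAG 2.x relative luminance of an sRGB color given as 0-255 integers."""
    def linearize(channel):
        c = channel / 255
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (linearize(channel) for channel in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(color_a, color_b):
    """Contrast ratio between two colors, from 1.0 (identical) upward."""
    lighter, darker = sorted(
        (relative_luminance(color_a), relative_luminance(color_b)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


def passes_aa_normal_text(text_rgb, page_rgb):
    """True if the pair meets the 4.5:1 level AA threshold for normal text."""
    return contrast_ratio(text_rgb, page_rgb) >= 4.5


white = (255, 255, 255)
print(round(contrast_ratio((0, 0, 0), white), 2))   # 21.0
print(passes_aa_normal_text((119, 119, 119), white))  # False
```

Faded or gray print on a white page can fall just below the threshold, as the second call shows, so a low result suggests rescanning from a darker original or adjusting the scanner's contrast settings.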
For more detailed guidance, see Adobe Acrobat: Make PDFs Accessible.