Exploring language technologies to provide support to WCAG 2.0 and E2R guidelines
Lourdes Moreno (*), Paloma Martínez, Isabel Segura-Bedmar and Ricardo Revert. Grupo LaBDA, Departamento de Informática, Universidad Carlos III de Madrid (*) [email protected]
Vilanova i la Geltrú (Universitat Politècnica de Catalunya), September 2015
Reference ACM Digital Library: http://dl.acm.org/citation.cfm?id=2829927&CFID=573822944&CFTOKEN=54544041
Contents
• Motivation and introduction
• EASY-TO-READ (E2R) Guidelines
• WCAG 2.0: readability and understandability
• Natural language processing (NLP) approaches for text simplification
• Proof of Concept: Lexical Simplification of Drug Package Leaflets
• Conclusions
LaBDA, Universidad Carlos III de Madrid
MOTIVATION
• Some citizens face accessibility barriers when texts contain:
long sentences
unusual words
complex linguistic structures
…
• Environment: web content
• Readability and understanding should be considered when texts are created
INTRODUCTION Target groups
• People with cognitive or learning disabilities
• Also:
Prelingually deaf people
Older people (individual cognitive abilities such as attention span and memory)
People with low literacy
Immigrants (different native language)
People with aphasia, dyslexia, autism
INTRODUCTION Initiatives
• Easy-to-Read (E2R)
Inclusion Europe 2009
Guidelines of IFLA 2010
• Web Content Accessibility Guidelines (WCAG) 2.0
Regulatory framework
Hard Success criteria
Conformance level AA
EASY-TO-READ (E2R) Guidelines
• In general terms these guidelines are:
Use the simplest and most common words
Avoid long words
Avoid abbreviations
Use the same term to refer to the same concept
Use short sentences
Avoid complex sentences with dependent clauses
Use active language and avoid passive voice
EASY-TO-READ (E2R) Guidelines What can be done?
• To make online texts more accessible and readable
• Complex words or phrases are replaced with more commonly used words
• These adaptations are carried out with the use of text simplification techniques:
www.noticiasfacil.es www.e-include.info/ simple.wikipedia.org/
www.simplext.es/
• Manual process? In some cases it is not feasible
• Supporting technology is needed
EASY-TO-READ (E2R) Guidelines
• These E2R guidelines address only text content.
• In addition: page structure, presentation, …
=> For this reason, accessibility requirements of WCAG 2.0 must be taken into account
WCAG 2.0: READABILITY AND UNDERSTANDABILITY
Understandability vs. readability
“a text could be highly readable, since the syntax is extremely simple, but extremely hard to understand because of the lexicon used”
Readability evaluates the structure of sentences (it concerns syntax and consequently requires syntactic simplification approaches)
Understandability captures the lexical aspects, so lexical simplification approaches are required
WCAG success criteria concerning text
• 3.1 (Readable: Make text content readable and understandable)
Readability - 3.1.5 (Reading Level)
Understandability - 3.1.3 (Unusual Words) and 3.1.4 (Abbreviations)
Code (conformance level): Description
1.1.1 Non-text Content (Level A): All non-text content that is presented to the user has a text alternative that serves the equivalent purpose.
2.4.2 Page Titled (Level A): Web pages have titles that describe topic or purpose.
2.4.4 Link Purpose (In Context) (Level A) (text type): The purpose of each link can be determined from the link text alone or from the link text together with its programmatically determined link context.
2.4.6 Headings and Labels (Level AA): Headings and labels describe topic or purpose.
2.4.9 Link Purpose (Link Only) (Level AAA) (text type): A mechanism is available to allow the purpose of each link to be identified from link text alone, except where the purpose of the link would be ambiguous to users in general.
2.4.10 Section Headings (Level AAA): Section headings are used to organize the content.
3.1.1 Language of Page (Level A): The default human language of each Web page can be programmatically determined.
3.1.2 Language of Parts (Level AA): The human language of each passage or phrase in the content can be programmatically determined.
3.1.3 Unusual Words (Level AAA): A mechanism is available for identifying specific definitions of words or phrases used in an unusual or restricted way, including idioms and jargon.
3.1.4 Abbreviations (Level AAA): A mechanism for identifying the expanded form or meaning of abbreviations is available.
3.1.5 Reading Level (Level AAA): When text requires reading ability more advanced than the lower secondary education level after removal of proper names and titles, supplemental content, or a version that does not require reading ability more advanced than the lower secondary education level, is available.
WCAG 2.0: READABILITY AND UNDERSTANDABILITY Additional accessibility requirements
• The WCAG 2.0 document does not give specific guidelines on these matters, as it does for visual or auditory accessibility
• A set of additional WCAG 2.0 success criteria has been identified regarding presentation, navigation, structure, cognitive aspects of user tasks, …
• Some of these additional success criteria are:
1.4.8 (Visual Presentation)
2.2.3 (No Timing)
2.4.5 (Multiple Ways)
3.2.3 (Consistent Navigation)
3.2.4 (Consistent Identification)
3.3.1 (Error Identification)
3.3.2 (Labels or Instructions)
3.3.5 (Help)
WCAG 2.0: READABILITY AND UNDERSTANDABILITY Discussion and conclusions
• There is no correspondence between the concepts in the E2R guidelines and the success criteria of WCAG 2.0
=> Professionals who work on WCAG conformance do not know how to meet the E2R requirements
• Beyond the WCAG 2.0 criteria on text, further accessibility features should be considered
• WCAG 2.0 support is not enough
• Technology supporting the authorship of texts is required
WCAG 2.0: READABILITY AND UNDERSTANDABILITY Discussion and conclusions
• Proposal:
NLP approaches that use E2R and WCAG 2.0 resources can provide semi-automatic support
Different NLP strategies are used to simplify texts depending on whether the target is understandability or readability
Natural language processing (NLP)
• The discipline devoted to developing technology that understands natural language
• Applications:
Machine translation
Information retrieval
Information extraction from unstructured data
Summarization
Question answering
….
NLP APPROACHES FOR TEXT SIMPLIFICATION Support to accessibility
• NLP processes are applied with the objective of transforming a text into an equivalent one that is more accessible to people with any kind of cognitive disability
• Three NLP processes that could be applied to text simplification tasks are described:
Language detection
Abbreviations detection
Topic detection
NLP APPROACHES FOR TEXT SIMPLIFICATION Language detection
• Language detection consists of identifying the language of a text
• It is helpful, for example, when screen readers are used
• Approaches:
Check for language-specific characters or strings (e.g. Dutch if the string “ik” appears, German if “ich” or “ß” is used, Polish if “czy” or “ń”, “Ł”, “ź” appear in words)
Use n-gram frequency distributions. All languages have words that occur more frequently than others (Zipf's law)
• If two texts in the same language are compared, they should have similar n-gram frequency distributions
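The n-gram approach can be sketched as follows. This is a minimal illustration, not the system described in the talk: the three training sentences, the choice of character trigrams and the cosine similarity measure are all assumptions made for the example, and real detectors are trained on far larger samples.

```python
from collections import Counter
from math import sqrt

# Tiny illustrative training samples (assumptions, far too small for real use).
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and the cat sleeps "
          "on the mat while the children are reading a book in the garden",
    "es": "el rápido zorro marrón salta sobre el perro perezoso y el gato duerme "
          "en la alfombra mientras los niños leen un libro en el jardín",
    "de": "der schnelle braune fuchs springt über den faulen hund und die katze "
          "schläft auf der matte während die kinder im garten ein buch lesen",
}


def ngram_profile(text, n=3):
    """Count overlapping character n-grams of the lower-cased, space-padded text."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine_similarity(p, q):
    """Cosine similarity between two n-gram count profiles."""
    dot = sum(count * q[gram] for gram, count in p.items())  # missing grams count as 0
    norm = sqrt(sum(c * c for c in p.values())) * sqrt(sum(c * c for c in q.values()))
    return dot / norm if norm else 0.0


PROFILES = {lang: ngram_profile(sample) for lang, sample in SAMPLES.items()}


def detect_language(text):
    """Return the language whose n-gram profile is most similar to the text."""
    profile = ngram_profile(text)
    return max(PROFILES, key=lambda lang: cosine_similarity(profile, PROFILES[lang]))
```

A result such as `detect_language("los niños duermen en el jardín")` could then be used to set the language attribute that WCAG 3.1.1 and 3.1.2 require.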
NLP APPROACHES FOR TEXT SIMPLIFICATION Abbreviations
• Approaches to recognize abbreviations and their expansions:
Pattern-matching methods based on rules and heuristics to detect uppercase alphanumeric strings
• To identify the patterns "long form (short form)" or "short form (long form)"
If a sequence of words co-occurs frequently with an abbreviation and does not occur with other nearby words => it is an “abbreviation-definition” relationship.
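A minimal sketch of the pattern-matching approach for the "long form (short form)" case is given below. The regular expression and the initial-letter matching rule are assumptions chosen for illustration; they miss short forms such as E2R, where the characters are not simply the initials of the preceding words.

```python
import re

# An uppercase alphanumeric string of 2-10 characters inside parentheses.
SHORT_FORM = re.compile(r"\(([A-Z][A-Z0-9]{1,9})\)")


def find_abbreviations(text):
    """Map each short form to the preceding words whose initials spell it."""
    pairs = {}
    for match in SHORT_FORM.finditer(text):
        short = match.group(1)
        # Words that appear before the opening parenthesis.
        words = re.findall(r"[A-Za-z0-9]+", text[:match.start()])
        candidate = words[-len(short):]
        initials = "".join(word[0].upper() for word in candidate)
        # Accept only if there is one word per character and the initials agree.
        if len(candidate) == len(short) and initials == short:
            pairs[short] = " ".join(candidate)
    return pairs
```

The detected pairs could feed a mechanism for WCAG 3.1.4, for example by generating a glossary of expansions.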
NLP APPROACHES FOR TEXT SIMPLIFICATION Text summarization or topic detection
• Goal: to obtain a set of sentences that reflects the content
• This technique offers accessibility support to editors of web content to create:
Titles of paragraphs and sections that faithfully represent the content
• Approach:
Automatic sentence extraction: a sentence is considered relevant if it contains many important words
The importance of a word is calculated with a measure that relies on how frequent the word is in a document and in how many documents of a collection the word appears.
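The word-importance measure described above corresponds to a TF-IDF style weighting. The sketch below is one possible reading of it under stated assumptions: the smoothed inverse document frequency `log((1 + N) / (1 + df))`, the tokenizer and the sentence splitter are illustrative choices, not details taken from the talk.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def tf_idf(document, collection):
    """Weight each word by its frequency in the document and its rarity in the collection."""
    tokens = tokenize(document)
    term_counts = Counter(tokens)
    token_sets = [set(tokenize(doc)) for doc in collection]
    weights = {}
    for word, count in term_counts.items():
        doc_freq = sum(1 for token_set in token_sets if word in token_set)
        idf = math.log((1 + len(collection)) / (1 + doc_freq))  # smoothed IDF (assumption)
        weights[word] = (count / len(tokens)) * idf
    return weights


def top_sentences(document, collection, k=1):
    """Return the k sentences whose words carry the highest total weight."""
    weights = tf_idf(document, collection)
    sentences = re.split(r"(?<=[.!?])\s+", document.strip())
    ranked = sorted(
        sentences,
        key=lambda s: sum(weights.get(token, 0.0) for token in tokenize(s)),
        reverse=True,
    )
    return ranked[:k]
```

Summing the weights favours longer sentences; dividing by sentence length is a common alternative when short titles are wanted.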
NLP APPROACHES FOR TEXT SIMPLIFICATION Text Simplification
• It is essential in several types of texts: News, Government and administrative information, laws and rights, etc.
• There are three subtasks of text simplification
1 Syntactic simplification, which splits complex sentences into simpler ones
2 Lexical simplification, whose objective is to replace complex vocabulary with common vocabulary
3 Clarification, which provides definitions and explanations.
These tasks are not completely automatic; in some cases the output has to be reviewed manually.
NLP APPROACHES FOR TEXT SIMPLIFICATION Text Simplification
Lexical simplification:
• Replacing complex words and utterances with easier words or phrases, taking the context into account.
• Heuristic: complex words have a low frequency
• Frequency-based proposals give better results than more sophisticated systems [SemEval 2012]
• Resource: lexical resources such as WordNet are used to extract synonyms as candidates to replace a complex or difficult word.
NLP APPROACHES FOR TEXT SIMPLIFICATION Text Simplification
Lexical simplification
• Complexity measures: frequency of words in texts as well as the length of sentences
Gunning fog index
Flesch-Kincaid
These indexes have to be validated by end users
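Both indexes can be computed from word, sentence and syllable counts. The sketch below uses the standard published formulas, which are calibrated for English; the vowel-group syllable counter is a rough heuristic introduced here for illustration, so the scores are approximate.

```python
import re


def count_syllables(word):
    """Rough heuristic: one syllable per group of consecutive vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def text_stats(text):
    """Return sentence count, word count and the syllable count of each word."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    return len(sentences), len(words), [count_syllables(w) for w in words]


def flesch_kincaid_grade(text):
    """Flesch-Kincaid grade level: higher values mean harder text."""
    n_sentences, n_words, syllables = text_stats(text)
    return 0.39 * n_words / n_sentences + 11.8 * sum(syllables) / n_words - 15.59


def gunning_fog(text):
    """Gunning fog index, counting words of three or more syllables as complex."""
    n_sentences, n_words, syllables = text_stats(text)
    complex_words = sum(1 for s in syllables if s >= 3)
    return 0.4 * (n_words / n_sentences + 100 * complex_words / n_words)
```

Such scores could serve as a first automatic check for WCAG 3.1.5, subject to the validation with users noted above.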
NLP APPROACHES FOR TEXT SIMPLIFICATION
WCAG 2.0 success criterion: NLP approach
2.4.2 (Page Titled), 2.4.6 (Headings and Labels), 2.4.10 (Section Headings): Text summarization
3.1.4 (Abbreviations): Abbreviation detection
3.1.3 (Unusual Words): Dictionaries with definitions
3.1.5 (Reading Level): Syntactic simplification
PROOF OF CONCEPT Lexical Simplification of Drug Package Leaflets
• The package leaflet is the main written source of information for patients
• It provides information about the drug: its appearance, actions, side effects, drug interactions, contraindications and special warnings
• It is difficult for patients to understand:
The vocabulary is specific and technical
Long paragraphs, especially those containing lists of side effects
Small font size (9 points)
• Problems: Patient misunderstanding could be a potential source of medication errors and adverse drug reactions.
PROOF OF CONCEPT Lexical Simplification of Drug Package Leaflets
• Goal of the system:
Provide information in a form that is easy and clear to read.
• Medical terms (in particular, drug effects) are translated into lay terms, which patients can understand.
PROOF OF CONCEPT Lexical Simplification of Drug Package Leaflets
FIRST Module:
Named Entity Recognition (NER)
• Detects the mentions of drug effects
• Uses MedDRA (a multilingual medical terminology dictionary of events associated with drugs)
• MeaningCloud integrates MedDRA into GATE
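The prototype relies on MeaningCloud and GATE for this step. As a rough stand-in for readers without those tools, the sketch below shows plain dictionary (gazetteer) lookup with regular expressions; the term list is a hypothetical excerpt, not the MedDRA dictionary, and the function name is invented for this example.

```python
import re

# Hypothetical gazetteer excerpt; the real system looks terms up in MedDRA.
EFFECT_TERMS = ["diarrea", "indigestión", "náuseas", "vómitos", "dolor", "dolor abdominal"]


def find_effect_mentions(text, terms=EFFECT_TERMS):
    """Return (start, end, surface form) for each dictionary term found in the text."""
    # Longest terms first so that "dolor abdominal" wins over "dolor".
    alternatives = "|".join(re.escape(t) for t in sorted(terms, key=len, reverse=True))
    pattern = re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)
    return [(m.start(), m.end(), m.group(0)) for m in pattern.finditer(text)]
```

The character offsets let a later module replace each mention in place without re-scanning the leaflet.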
PROOF OF CONCEPT Lexical Simplification of Drug Package Leaflets
SECOND module:
Lexical Simplifier
• Identifies the effects whose names are considered complex, in order to replace them with a simpler synonym
• Two different strategies: preferred term substitution and most frequent term substitution.
PROOF OF CONCEPT Lexical Simplification of Drug Package Leaflets
SECOND module. Lexical Simplifier
• Preferred Term Substitution
MedDRA defines sets of synonyms and provides a preferred term for each set
• Cefalalgia (cephalalgia) would be replaced by cefalea (headache)
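Preferred term substitution amounts to a lookup from each synonym to the preferred term of its set. In the sketch below the mapping is a hand-written, hypothetical excerpt built from the examples on these slides; it is not read from MedDRA, and the function name is an assumption.

```python
import re

# Hypothetical excerpt: synonym -> preferred term of its synonym set.
PREFERRED_TERM = {
    "cefalalgia": "cefalea",
    "indigestión": "dispepsia",
}


def substitute_preferred(text, preferred=PREFERRED_TERM):
    """Replace every listed synonym in the text with its preferred term."""
    alternatives = "|".join(re.escape(t) for t in sorted(preferred, key=len, reverse=True))
    pattern = re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)
    # The replacement is always lower-case; original capitalisation is not preserved.
    return pattern.sub(lambda m: preferred[m.group(0).lower()], text)
```

Note that the preferred term is not always the simpler one for a lay reader, which is why a second, frequency-based strategy is also considered.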
PROOF OF CONCEPT Lexical Simplification of Drug Package Leaflets
SECOND module. Lexical Simplifier
• Most Frequent Term Substitution
Corpus of MedlinePlus website documents (1,536 documents)
• 939 drug package leaflets
• 597 general health-related articles about diseases, effects and diagnoses
Elasticsearch is used to index the MedlinePlus documents
Hypothesis: complex terms should be less frequent than simpler terms in the corpus
1) The frequency of each effect in the corpus is calculated
2) An effect is replaced by its synonym with the highest frequency in the corpus (unless the effect is itself the most frequent)
PROOF OF CONCEPT Lexical Simplification of Drug Package Leaflets
SECOND module. Lexical Simplifier
Frequency of MedDRA synonyms in the MedlinePlus corpus (term: occurrences)
catarro (nasopharyngitis): 12
resfriado (cold): 48
resfriado común (common cold): 7
síntomas de resfriado (cold symptoms): 6
The complex term is replaced by resfriado (cold)
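The selection rule can be sketched with the counts above. In the prototype the frequencies come from the Elasticsearch index of the MedlinePlus corpus; here they are hard-coded in a dictionary, and the function name is an assumption made for the example.

```python
# Counts reported above for one MedDRA synonym set in the MedlinePlus corpus.
CORPUS_FREQUENCY = {
    "catarro": 12,
    "resfriado": 48,
    "resfriado común": 7,
    "síntomas de resfriado": 6,
}


def most_frequent_synonym(term, synonyms, frequency):
    """Return the most frequent synonym, keeping the term itself on ties."""
    # The original term comes first so that max() prefers it when counts are equal.
    candidates = [term] + [s for s in synonyms if s != term]
    return max(candidates, key=lambda t: frequency.get(t, 0))
```

For example, `most_frequent_synonym("catarro", list(CORPUS_FREQUENCY), CORPUS_FREQUENCY)` selects `resfriado`, while a term that is already the most frequent in its set is left unchanged.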
PROOF OF CONCEPT Lexical Simplification of Drug Package Leaflets
SECOND module. Lexical Simplifier
Original: Muy frecuentes: diarrea e indigestión. Frecuentes: náuseas, vómitos, dolor abdominal. Poco frecuentes: hemorragia. Raros: perforación gástrica, flatulencia, estreñimiento
PT (preferred term substitution): Muy frecuentes: diarrea e dispepsia. Frecuentes: náuseas, vómitos, dolor abdominal. Poco frecuentes: hemorragia. Raros: perforación gástrica, flatulencia, estreñimiento
Freq (most frequent term substitution): Muy frecuentes: diarrea e pirosis. Frecuentes: náuseas, vómitos, dolor abdominal. Poco frecuentes: sangrado. Raros: perforación gástrica, gases, estreñimiento
CONCLUSIONS
• For some people, it is difficult to infer the meaning of an unusual word or phrase from context
• Long sentences and complex linguistic structures can cause barriers in access to the text content as indicated in WCAG and E2R guidelines
However, these guidelines do not provide precise methods or (semi-)automatic support for addressing these accessibility issues of text readability and understandability
• NLP approaches that use E2R and WCAG 2.0 resources can provide semi-automatic support
Proof of concept: a prototype to simplify drug package leaflets that implements a component for lexical simplification
CONCLUSIONS Work in progress
• New approaches to offer support: abbreviations, summaries, definitions of unusual words, etc.
• Evaluations by users (and, in addition, by experts)
• Taking into account other important issues such as:
Presentation elements
Page structure
Navigation structures
REFERENCE Lourdes Moreno, Paloma Martínez, Isabel Segura-Bedmar, and Ricardo Revert. 2015. Exploring language technologies to provide support to WCAG 2.0 and E2R guidelines. In Proceedings of the XVI International Conference on Human Computer Interaction (Interacción '15). ACM, New York, NY, USA, Article 57, 8 pages. DOI=http://dx.doi.org/10.1145/2829875.2829927