This is a summary of a more detailed article; see that article for details, links, and examples.

BackNem 2.0 is a Java batch processor that backtranslates to print braille transcribed according to the rules of either the Nemeth or the EBAE system. BackNem combines hand-written Java code with lexers and parsers generated automatically from hand-written ANTLR grammars whose actions are specified in Java.

The input to BackNem is an electronic braille file with the braille cells specified using North American ASCII Braille. The output of BackNem is a webpage or XHTML file with text represented as XHTML and mathematics represented as MathML.
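As an illustration of the input encoding only, the following minimal sketch converts North American ASCII Braille characters to the corresponding Unicode braille patterns. The class and method names are hypothetical and are not taken from BackNem; folding lowercase letters to uppercase is an assumption about how input files might be normalized.

```java
final class AsciiBraille {
    // Index i holds the ASCII Braille character for the cell whose dot bits equal i
    // (dot 1 = bit 0, ..., dot 6 = bit 5), which is also the order of the
    // Unicode braille patterns starting at U+2800.
    private static final String CELLS_BY_DOT_BITS =
        " A1B'K2L@CIF/MSP\"E3H9O6R^DJG>NTQ,*5<-U8V.%[$+X!&;:4\\0Z7(_?W]#Y)=";

    /** Converts ASCII Braille text to Unicode braille patterns; line breaks pass through. */
    static String toUnicode(String asciiBraille) {
        StringBuilder out = new StringBuilder(asciiBraille.length());
        for (char c : asciiBraille.toCharArray()) {
            if (c == '\n' || c == '\r') {
                out.append(c);
                continue;
            }
            // Assumption: lowercase letters are treated as their uppercase equivalents.
            int bits = CELLS_BY_DOT_BITS.indexOf(Character.toUpperCase(c));
            if (bits < 0) {
                throw new IllegalArgumentException("Not an ASCII Braille character: " + c);
            }
            out.append((char) (0x2800 + bits));
        }
        return out.toString();
    }
}
```

For example, AsciiBraille.toUnicode("#1+2") yields the four cells U+283C, U+2802, U+282C, and U+2806.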

The context-sensitive nature of the braille rules is well known to be the major challenge for backtranslation. BackNem deals with this issue by addressing the different possibilities in a top-down manner. ANTLR's semantic and syntactic predicates are used to facilitate context-sensitive lexical analysis.

The first processing step in BackNem detects spatial arithmetic, page numbers, and embedded Computer Braille Code (CBC). These three constructs are then handled separately from other items.

The second processing step distinguishes the remaining braille items as either text or math and completes the back-translation of text items.

Distinguishing the nature of an item requires up to three phases depending on the item. Once an item is characterized as math, it is collected for later processing. Meanwhile, text items are carried forward as necessary to complete their back-translation.

Phases II and III are extensions of processes used in the BackLit backtranslator for EBAE. Processes specific to distinguishing text items from math items are, of course, not needed for EBAE.

Phase I of the second step uses document context, syntax analysis and simplified lexical analysis to attempt to characterize items. Certain items, including matrices, can be characterized as definitely math at this point while others can be characterized as definitely text. Phase I is specific to back-translating Nemeth and is not used for EBAE.

Phase II of the second step applies a detailed context-sensitive lexical analysis to text items and remaining uncharacterized items. This analysis is used to isolate the main portion of the item from any leading and trailing punctuation marks and/or indicators and to support the backtranslation of text items.
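The idea of isolating the main portion of an item can be sketched as follows. This is a deliberately naive approximation, not BackNem's method: it strips fixed sets of edge characters, whereas the analysis described above is context-sensitive. The class name and both character sets are assumptions chosen for illustration (EBAE capital indicator and opening quote on the left; common EBAE punctuation cells on the right).

```java
final class ItemSplitter {
    // Assumed leading cells: "," (capital indicator) and "8" (opening quotation mark).
    private static final String LEADING = ",8";
    // Assumed trailing cells: comma, semicolon, colon, period, exclamation mark,
    // question mark, and closing quotation mark in ASCII Braille.
    private static final String TRAILING = "1234680";

    /** Returns {leading, main, trailing} for one whitespace-delimited braille item. */
    static String[] split(String item) {
        int start = 0;
        int end = item.length();
        while (start < end && LEADING.indexOf(item.charAt(start)) >= 0) {
            start++;
        }
        while (end > start && TRAILING.indexOf(item.charAt(end - 1)) >= 0) {
            end--;
        }
        return new String[] {
            item.substring(0, start),   // leading punctuation and indicators
            item.substring(start, end), // main portion
            item.substring(end)         // trailing punctuation
        };
    }
}
```

With these assumed sets, ItemSplitter.split("8,YES40") returns the parts "8,", "YES", and "40". Note that the cell "8" serves as an opening quotation mark at the start of an item and as a question mark at the end, which is one small instance of why a fixed character set is not sufficient in general.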

Certain symbols detected during the lexical analysis will result in the immediate characterization of an uncharacterized item as math.

Otherwise, the main portion of each text item and of each remaining uncharacterized item is looked up in the exceptions table for braille words. If the lookup succeeds, an uncharacterized item is characterized as text, and the back-translation found in the table is accepted for the main portion of the item, whether the item was previously or newly characterized as text.

Phase III of the second step completes the context-sensitive lexical analysis and back-translation of remaining items under the assumption that the item is text.

The back-translation of the main portion of any remaining uncharacterized items is compared with a table of English words and, if necessary, a table of acceptable misspellings. An item where the main portion is not in either of these two tables is determined not to have been text and is characterized as math. (This approach will result in the mischaracterization as math of any misspelled words not in the misspellings table; these items are usually caught later by the error checking in the math backtranslator.)
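The table-driven decisions of Phases II and III for an uncharacterized item can be sketched roughly as below. All names, the table contents, and the stand-in back-translation function are hypothetical; the sketch omits the symbols that trigger immediate characterization as math and the handling of items already known to be text.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.function.Function;

final class TextOrMath {
    private final Map<String, String> exceptions;      // braille word -> print word
    private final Set<String> englishWords;            // lowercase print words
    private final Set<String> misspellings;            // acceptable misspellings, lowercase
    private final Function<String, String> ruleBased;  // stand-in for the Phase III analysis

    TextOrMath(Map<String, String> exceptions, Set<String> englishWords,
               Set<String> misspellings, Function<String, String> ruleBased) {
        this.exceptions = exceptions;
        this.englishWords = englishWords;
        this.misspellings = misspellings;
        this.ruleBased = ruleBased;
    }

    /** Returns the print form if the item is judged to be text, or empty if it is math. */
    Optional<String> classify(String mainPortion) {
        // Phase II: a hit in the exceptions table settles the item as text.
        String fromTable = exceptions.get(mainPortion);
        if (fromTable != null) {
            return Optional.of(fromTable);
        }
        // Phase III: back-translate as if the item were text, then check the result.
        String candidate = ruleBased.apply(mainPortion);
        String key = candidate.toLowerCase(Locale.ROOT);
        if (englishWords.contains(key) || misspellings.contains(key)) {
            return Optional.of(candidate);
        }
        return Optional.empty(); // in neither table: characterize as math
    }
}
```

A usage example with made-up tables: given exceptions Map.of("BRL", "braille"), words Set.of("cat"), misspellings Set.of("teh"), and a stand-in function that simply lowercases its input, classify("BRL"), classify("CAT"), and classify("TEH") each return a print form, while classify("X+Y") returns an empty result and the item would be collected as math.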

The third step backtranslates the previously collected sequences of mathematical items.

This step, which makes heavy use of ANTLR-based analysis, handles both inline and displayed math including matrices and related constructs.

The third step starts with a context-sensitive lexical analysis that tokenizes Nemeth braille sequences into individual Nemeth mathematical symbols. (A Nemeth mathematical symbol is a braille cell or sequence of braille cells used either to represent an ink print mathematical character or item or to indicate a planar construct.)
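A hand-written sketch of context-sensitive tokenizing is shown below. It is not the ANTLR-generated lexer that BackNem uses, and the class name, token names, and tiny symbol subset are assumptions. It illustrates two points: a symbol may span more than one cell (the equals sign ".K"), and the same cell may tokenize differently depending on context (the cell "#" is treated as the closing fraction indicator inside a simple fraction and as the numeric indicator elsewhere).

```java
import java.util.ArrayList;
import java.util.List;

final class NemethLexerSketch {
    /** Tokenizes a small subset of Nemeth written in uppercase ASCII Braille. */
    static List<String> tokenize(String ascii) {
        List<String> tokens = new ArrayList<>();
        int fractionDepth = 0; // context carried along while scanning
        int i = 0;
        while (i < ascii.length()) {
            char c = ascii.charAt(i);
            if (c == ' ') {
                i++;
                continue;
            }
            if (ascii.startsWith(".K", i)) { // two-cell symbol: equals sign
                tokens.add("EQUALS");
                i += 2;
                continue;
            }
            if (c == '?') {
                fractionDepth++;
                tokens.add("FRAC_OPEN");
            } else if (c == '#') {
                // Context decides: this plays the role of a semantic predicate.
                if (fractionDepth > 0) {
                    fractionDepth--;
                    tokens.add("FRAC_CLOSE");
                } else {
                    tokens.add("NUMERIC_INDICATOR");
                }
            } else if (c == '/') {
                tokens.add("FRAC_LINE");
            } else if (c == '+') {
                tokens.add("PLUS");
            } else if (c == '-') {
                tokens.add("MINUS");
            } else if (c >= '0' && c <= '9') {
                tokens.add("DIGIT_" + c); // Nemeth digits are the ASCII Braille digit cells
            } else if (c >= 'A' && c <= 'Z') {
                tokens.add("LETTER_" + c);
            } else {
                throw new IllegalArgumentException("Unsupported cell in this sketch: " + c);
            }
            i++;
        }
        return tokens;
    }
}
```

For instance, tokenize("?1/2#") produces FRAC_OPEN, DIGIT_1, FRAC_LINE, DIGIT_2, FRAC_CLOSE, whereas the "#" cells in tokenize("#1+2 .K #3") both come out as NUMERIC_INDICATOR.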

Two parsers are used in sequence to complete the backtranslation of mathematical sequences. The first parser refines the stream of tokens produced by the previous lexical analysis and the second parser generates the MathML output.

The actions of the first parser include resolving ambiguities remaining after the lexical analysis, redefining certain tokens, inserting extra tokens where Nemeth uses implicit conventions for spatial layouts, and backtranslating those Nemeth mathematical symbols that represent ink print symbols, identifiers, and numbers to the corresponding MathML token elements.

The second parser assembles the refined stream of tokens generated by the first parser into MathML expressions and backtranslates Nemeth layout indicators to the appropriate MathML layout schemata.
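The division of labor between the two parsers can be illustrated with a much-simplified two-pass sketch. The token names, class name, and methods are hypothetical, and the real parsers are ANTLR-based and far more extensive. The first pass drops numeric indicators, merges digit runs, and converts symbols to MathML token elements while leaving layout indicators in place; the second pass assembles the result and maps the simple-fraction indicators to an mfrac element.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class TwoPassMathSketch {
    /** Pass 1: refine the token stream and emit MathML token elements. */
    static List<String> refine(List<String> tokens) {
        List<String> out = new ArrayList<>();
        StringBuilder digits = new StringBuilder();
        for (String t : tokens) {
            if (t.startsWith("DIGIT_")) {
                digits.append(t.substring("DIGIT_".length()));
                continue;
            }
            flushNumber(digits, out);
            switch (t) {
                case "NUMERIC_INDICATOR": break; // carries no print content
                case "PLUS":   out.add("<mo>+</mo>"); break;
                case "MINUS":  out.add("<mo>-</mo>"); break;
                case "EQUALS": out.add("<mo>=</mo>"); break;
                default:
                    if (t.startsWith("LETTER_")) {
                        String letter = t.substring("LETTER_".length());
                        out.add("<mi>" + letter.toLowerCase(Locale.ROOT) + "</mi>");
                    } else {
                        out.add(t); // layout indicators are left for pass 2
                    }
            }
        }
        flushNumber(digits, out);
        return out;
    }

    private static void flushNumber(StringBuilder digits, List<String> out) {
        if (digits.length() > 0) {
            out.add("<mn>" + digits + "</mn>");
            digits.setLength(0);
        }
    }

    /** Pass 2: assemble refined tokens into a MathML expression. */
    static String assemble(List<String> refined) {
        int[] pos = {0};
        String body = row(refined, pos);
        if (pos[0] < refined.size()) {
            throw new IllegalArgumentException("Unexpected " + refined.get(pos[0]));
        }
        return "<math xmlns=\"http://www.w3.org/1998/Math/MathML\">" + body + "</math>";
    }

    // Reads items up to a fraction line, a fraction close, or the end of input.
    private static String row(List<String> t, int[] pos) {
        StringBuilder sb = new StringBuilder("<mrow>");
        while (pos[0] < t.size()) {
            String tok = t.get(pos[0]);
            if (tok.equals("FRAC_LINE") || tok.equals("FRAC_CLOSE")) break;
            pos[0]++;
            if (tok.equals("FRAC_OPEN")) {
                String numerator = row(t, pos);
                expect(t, pos, "FRAC_LINE");
                String denominator = row(t, pos);
                expect(t, pos, "FRAC_CLOSE");
                sb.append("<mfrac>").append(numerator).append(denominator).append("</mfrac>");
            } else {
                sb.append(tok);
            }
        }
        return sb.append("</mrow>").toString();
    }

    private static void expect(List<String> t, int[] pos, String want) {
        if (pos[0] >= t.size() || !t.get(pos[0]).equals(want)) {
            throw new IllegalArgumentException("Expected " + want);
        }
        pos[0]++;
    }
}
```

Keeping the passes separate means the assembling pass only has to deal with finished token elements and layout indicators, which is the same separation of concerns the two-parser design describes.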

Article history:

- First posted: 5/23/2007.