org.foray.hyphen
Class PatternTree

java.lang.Object
  extended by org.foray.hyphen.PatternTree
All Implemented Interfaces:
Serializable, PatternConsumer

public class PatternTree
extends Object
implements PatternConsumer, Serializable

An implementation of the Knuth/Liang hyphenation scheme that is part of TeX. This scheme has three data components: classes, exceptions, and patterns. More information about these components can be found in Appendix H of "The TeXbook", which is also volume A of "Computers & Typesetting".

The pattern strings are usually short (from 2 to 5 characters), but each char in the tree is stored in a node, so keeping a low memory footprint is important. Hyphenation is also a frequently used task, which makes speed an important design consideration as well.

This implementation uses a TernaryTreeMap to map the pattern strings to an index into the inter-letter pattern values. The ternary tree provides a combination of low memory footprint and speed that is well suited to storing hyphenation information. The current ternary tree implementation is limited to 65,536 nodes, but this has not been a practical limitation yet. A natural language typically requires from 5000 to 15000 hyphenation patterns. The original author of this class wrote: "In my tests the english patterns took 7694 nodes and the german patterns 10055 nodes, so we are well within the 65,000 node limitation."
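
The following is a minimal sketch of how Liang-style patterns are generally applied to a word, not the code of this class. It assumes single-digit inter-letter values, stores patterns in a plain HashMap instead of a TernaryTreeMap, and ignores classes, exceptions, and the minimum before/after limits. The class name LiangSketch and its methods are hypothetical. For every slot between letters, the highest value from any matching pattern wins, and an odd value permits a break.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration only; not part of org.foray.hyphen. */
final class LiangSketch {
    /** Maps pattern characters (digits removed) to one digit per slot. */
    private final Map<String, String> patterns = new HashMap<>();

    /** Splits a raw pattern such as ".ab4i" into ".abi" and "00040". */
    void addPattern(String rawPattern) {
        StringBuilder chars = new StringBuilder();
        StringBuilder values = new StringBuilder();
        boolean slotFilled = false;
        for (int i = 0; i < rawPattern.length(); i++) {
            char c = rawPattern.charAt(i);
            if (c >= '0' && c <= '9') {
                values.append(c);
                slotFilled = true;
            } else {
                if (!slotFilled) {
                    values.append('0'); // implicit value for an unmarked slot
                }
                chars.append(c);
                slotFilled = false;
            }
        }
        if (!slotFilled) {
            values.append('0'); // slot after the last character
        }
        patterns.put(chars.toString(), values.toString());
    }

    /** Returns the offsets in word before which a break is permitted. */
    List<Integer> breakOffsets(String word) {
        String dotted = "." + word + "."; // '.' marks the word boundaries
        int[] slots = new int[dotted.length() + 1];
        for (int start = 0; start < dotted.length(); start++) {
            for (int end = start + 1; end <= dotted.length(); end++) {
                String values = patterns.get(dotted.substring(start, end));
                if (values == null) {
                    continue;
                }
                for (int k = 0; k < values.length(); k++) {
                    int value = values.charAt(k) - '0';
                    slots[start + k] = Math.max(slots[start + k], value);
                }
            }
        }
        List<Integer> breaks = new ArrayList<>();
        for (int offset = 1; offset < word.length(); offset++) {
            if (slots[offset + 1] % 2 == 1) { // odd value permits a break
                breaks.add(offset);
            }
        }
        return breaks;
    }
}
```

With only the pattern "a1b" loaded, the word "abc" may break after "a". Adding "a2bc" raises that slot to an even value and suppresses the break.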

See Also:
Serialized Form

Nested Class Summary
static class PatternTree.Source
          Enumeration of the possible sources of this tree.
 
Constructor Summary
PatternTree()
          Constructor.
 
Method Summary
 void addClass(String chargroup)
          Add a Liang-style character class.
 void addException(String hyphenatedWord, int qtyMorphExceptions)
          Add a Liang-style hyphenation exception.
 void addMorphException(String exceptionWord, String pre, String post, String no)
          Add a morphing hyphenation break to an exception word.
 void addPattern(String rawPattern)
          Add a Liang-style hyphenation pattern.
static String getInterletterValues(String pattern)
          Extract the inter-letter values from a given pattern, returning them as a String.
static String getPatternChars(String pattern)
          Extract the character sequence from a Liang-style hyphenation pattern.
 PatternTree.Source getSource()
          Returns the source of this tree.
 void setHyphenChar(char hyphenChar)
          Sets the character that should be interpreted as the hyphenation character in exceptions.
 void setMinAfter(byte minAfter)
          Sets the minimum number of characters that should be left on a line before a hyphenation break.
 void setMinBefore(byte minBefore)
          Sets the minimum number of characters that should be at the beginning of a line after a hyphenation break.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

PatternTree

public PatternTree()
Constructor.

Method Detail

addClass

public void addClass(String chargroup)
Description copied from interface: PatternConsumer
Add a Liang-style character class. Character classes define the valid word characters for hyphenation. If a word contains a character not defined in any of the classes, it is not hyphenated. Character classes also define a way to normalize the characters in order to compare them with the stored patterns. Hyphenation is generally a case-insensitive operation, that is, the Strings "CE", "Ce", "cE", and "ce" should all be treated as equivalent for purposes of making hyphenation decisions. Therefore, pattern files usually use only lower case characters. To mark two characters as equivalent, they are included in the same character class, with the normalized character recorded first in that class. So, for the example above, a class "cC" and another "eE" would ensure that those characters were treated in a case-insensitive way.

Specified by:
addClass in interface PatternConsumer
Parameters:
chargroup - The character class to add.
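
As an illustration of the normalization described above, here is a minimal sketch, assuming each class is a plain String whose first character is the normalized form. The class CharClassSketch and its normalize method are hypothetical and are not part of this API.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration only; not part of org.foray.hyphen. */
final class CharClassSketch {
    /** Maps every character of every class to its normalized character. */
    private final Map<Character, Character> normalized = new HashMap<>();

    /** Registers a class such as "cC"; the first character is the normalized form. */
    void addClass(String chargroup) {
        if (chargroup.isEmpty()) {
            return;
        }
        char canonical = chargroup.charAt(0);
        for (int i = 0; i < chargroup.length(); i++) {
            normalized.put(chargroup.charAt(i), canonical);
        }
    }

    /** Returns the normalized word, or null if any character is in no class. */
    String normalize(String word) {
        StringBuilder sb = new StringBuilder(word.length());
        for (int i = 0; i < word.length(); i++) {
            Character c = normalized.get(word.charAt(i));
            if (c == null) {
                return null; // unknown character: the word is not hyphenated
            }
            sb.append(c.charValue());
        }
        return sb.toString();
    }
}
```

After registering "cC" and "eE", the strings "CE", "Ce", "cE", and "ce" all normalize to "ce", while a word containing any other character yields null.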

addException

public void addException(String hyphenatedWord,
                         int qtyMorphExceptions)
Description copied from interface: PatternConsumer
Add a Liang-style hyphenation exception. An exception supersedes the result obtained by the algorithm. It is useful in cases where the algorithm gives a bad result, or in which the user otherwise wants to provide his own hyphenation.

Specified by:
addException in interface PatternConsumer
Parameters:
hyphenatedWord - The raw word for which the exception is being created. For example, the English pattern dictionary distributed with TeX includes the exception "oblig-a-tory", which is the text expected here.
qtyMorphExceptions - The number of morph exceptions that will be added to this exception word.
See Also:
PatternConsumer.addMorphException(String, String, String, String)
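
To show what a hyphenated exception word such as "oblig-a-tory" encodes, here is a minimal sketch that separates the plain word from its break positions. It assumes '-' is the hyphenation character and says nothing about how this class stores exceptions internally. The class ExceptionSketch and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; not part of org.foray.hyphen. */
final class ExceptionSketch {
    /** Returns the word with every hyphenation character removed. */
    static String plainWord(String hyphenatedWord, char hyphenChar) {
        StringBuilder sb = new StringBuilder(hyphenatedWord.length());
        for (int i = 0; i < hyphenatedWord.length(); i++) {
            char c = hyphenatedWord.charAt(i);
            if (c != hyphenChar) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Returns the offsets in the plain word before which a break is allowed. */
    static List<Integer> breakOffsets(String hyphenatedWord, char hyphenChar) {
        List<Integer> offsets = new ArrayList<>();
        int lettersSeen = 0;
        for (int i = 0; i < hyphenatedWord.length(); i++) {
            if (hyphenatedWord.charAt(i) == hyphenChar) {
                offsets.add(lettersSeen); // break falls after the letters seen so far
            } else {
                lettersSeen++;
            }
        }
        return offsets;
    }
}
```

For "oblig-a-tory" the plain word is "obligatory" and the break offsets are 5 and 6.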

addMorphException

public void addMorphException(String exceptionWord,
                              String pre,
                              String post,
                              String no)
                       throws org.axsl.hyphen.HyphenationException
Description copied from interface: PatternConsumer
Add a morphing hyphenation break to an exception word. A morphing break opportunity is one that changes the spelling of the word if it is selected. A morph exception can only be added to a word that is already recorded as an exception, and for which the "qtyMorphExceptions" value passed to PatternConsumer.addException(String, int) is greater than 0.

Specified by:
addMorphException in interface PatternConsumer
Parameters:
exceptionWord - The raw word for which the exception is being created. This must be the same word that was used in PatternConsumer.addException(String, int).
pre - The "pre" portion of the special exception.
post - The "post" portion of the special exception.
no - The "no" portion of the special exception.
Throws:
org.axsl.hyphen.HyphenationException - If exceptionWord is not found in the exception words.

addPattern

public void addPattern(String rawPattern)
Description copied from interface: PatternConsumer
Add a Liang-style hyphenation pattern.

Specified by:
addPattern in interface PatternConsumer
Parameters:
rawPattern - The raw Liang-style pattern to be added, for example ".ab4i".

getPatternChars

public static String getPatternChars(String pattern)
Extract the character sequence from a Liang-style hyphenation pattern.

Parameters:
pattern - The raw pattern to be parsed.
Returns:
The pattern characters, that is, the pattern with the inter-letter values removed. For example, if pattern is the Liang pattern ".ab4i", the return value should be ".abi".
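
The documented behavior can be sketched as follows, assuming inter-letter values are the ASCII digits '0' to '9'. This is an illustration of the contract, not the actual method body, and the class PatternCharsSketch is hypothetical.

```java
/** Hypothetical illustration only; not part of org.foray.hyphen. */
final class PatternCharsSketch {
    /** Returns the pattern with all digit characters removed. */
    static String patternChars(String pattern) {
        StringBuilder sb = new StringBuilder(pattern.length());
        for (int i = 0; i < pattern.length(); i++) {
            char c = pattern.charAt(i);
            if (c < '0' || c > '9') {
                sb.append(c); // keep letters and the '.' boundary marker
            }
        }
        return sb.toString();
    }
}
```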

getInterletterValues

public static String getInterletterValues(String pattern)
Extract the inter-letter values from a given pattern, returning them as a String. Values for the slot before the first character and after the last character should be included.

Parameters:
pattern - The pattern whose inter-letter values should be extracted.
Returns:
The inter-letter values indicating the desirability and priority of hyphenating at a given point within the pattern. It should contain only digit characters (i.e. '0' to '9'). The size of the return String should be one greater than the size of getPatternChars(String) for the same pattern. This is because values are included for the slot immediately before the first character in pattern, and for the slot immediately after the last character in pattern. For example, for a pattern of ".ab4i", which could be translated into a fully-expanded pattern of "0.0a0b4i0", the return value should be "00040".
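
The expansion described above can be sketched as follows, assuming at most one digit per slot and ASCII digits only. This illustrates the contract rather than the actual method body, and the class InterletterSketch is hypothetical.

```java
/** Hypothetical illustration only; not part of org.foray.hyphen. */
final class InterletterSketch {
    /** Returns one digit per slot, inserting '0' where the pattern has none. */
    static String interletterValues(String pattern) {
        StringBuilder values = new StringBuilder();
        boolean slotFilled = false; // true once the current slot has a digit
        for (int i = 0; i < pattern.length(); i++) {
            char c = pattern.charAt(i);
            if (c >= '0' && c <= '9') {
                values.append(c);
                slotFilled = true;
            } else {
                if (!slotFilled) {
                    values.append('0'); // unmarked slot before this character
                }
                slotFilled = false; // a new slot starts after this character
            }
        }
        if (!slotFilled) {
            values.append('0'); // slot after the last character
        }
        return values.toString();
    }
}
```

For ".ab4i" this yields "00040", which is one character longer than the four pattern characters ".abi".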

setHyphenChar

public void setHyphenChar(char hyphenChar)
Description copied from interface: PatternConsumer
Sets the character that should be interpreted as the hyphenation character in exceptions.

Specified by:
setHyphenChar in interface PatternConsumer
Parameters:
hyphenChar - The hyphenation character to set.
See Also:
PatternConsumer.addException(String, int)

setMinAfter

public void setMinAfter(byte minAfter)
Description copied from interface: PatternConsumer
Sets the minimum number of characters that should be left on a line before a hyphenation break. This is different from the similar concept that might be specified by a client application at document-processing time, which is discretionary and for aesthetic purposes. Instead, this value is needed at parse-time to ensure that inaccurate hyphenation breaks are not generated.

Specified by:
setMinAfter in interface PatternConsumer
Parameters:
minAfter - The minimum number of characters that should be left on a line before a hyphenation break.

setMinBefore

public void setMinBefore(byte minBefore)
Description copied from interface: PatternConsumer
Sets the minimum number of characters that should be at the beginning of a line after a hyphenation break. This is different from the similar concept that might be specified by a client application at document-processing time, which is discretionary and for aesthetic purposes. Instead, this value is needed to ensure that inaccurate hyphenation breaks are not generated.

Specified by:
setMinBefore in interface PatternConsumer
Parameters:
minBefore - The minimum number of characters that should be at the beginning of a line after a hyphenation break.

getSource

public PatternTree.Source getSource()
Returns the source of this tree.

Returns:
The source of this tree.


Copyright © 2017. All rights reserved.