org.foray.hyphen
Class PatGen

java.lang.Object
  extended by org.foray.hyphen.PatGen

public class PatGen
extends Object

Hyphenation pattern generator based on Liang's patgen.web. Specifically, it is based on version 2.3 (October 24, 1996) of patgen.web, which was obtained from the CTAN archives. The original documentation has been edited where needed, and TeX-specific material has been removed. Most author and versioning information has been removed or placed in the credits section of the source code. Otherwise, the intent is simply to be a Java version of the original program.

The Ubuntu man page for TeX's patgen is useful for understanding the original API.

What follows is the header documentation from the original program.

Introduction

This program takes a list of hyphenated words and generates a set of patterns that can be used by the TeX 82 hyphenation algorithm.
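As an illustration of that input, the sketch below splits one hyphenated dictionary entry such as "hy-phen-a-tion" into its letters and the positions of its hyphens. `HyphenatedWord` is a hypothetical helper, not part of the PatGen API, and it assumes an entry contains only letters and `-`; any other markers the real patgen dictionary format may allow are not handled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the PatGen API: splits "hy-phen-a-tion"
// into its letters and the positions of its hyphens.
final class HyphenatedWord {
    final String letters;       // the word with the hyphens removed
    final List<Integer> breaks; // for each hyphen, the number of letters before it

    private HyphenatedWord(String letters, List<Integer> breaks) {
        this.letters = letters;
        this.breaks = breaks;
    }

    // Assumes the entry holds only letters and '-'; other markers are not handled.
    static HyphenatedWord parse(String entry) {
        StringBuilder letters = new StringBuilder();
        List<Integer> breaks = new ArrayList<>();
        for (char c : entry.toCharArray()) {
            if (c == '-') {
                breaks.add(letters.length()); // hyphen follows this many letters
            } else {
                letters.append(c);
            }
        }
        return new HyphenatedWord(letters.toString(), breaks);
    }

    public static void main(String[] args) {
        HyphenatedWord w = parse("hy-phen-a-tion");
        System.out.println(w.letters + " " + w.breaks); // prints "hyphenation [2, 6, 7]"
    }
}
```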

The patterns consist of strings of letters and digits, where a digit indicates a "hyphenation value" for some intercharacter position. For example, the pattern "3t2ion" specifies that if the string "tion" occurs in a word, we should assign a hyphenation value of 3 to the position immediately before the "t", and a value of 2 to the position between the "t" and the "i".
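A minimal sketch of how such a pattern string might be split into its letters and one value per intercharacter position, assuming single-digit values and treating every non-digit character as a letter. `HyphenPattern` is a hypothetical name used only for illustration.

```java
import java.util.Arrays;

// Hypothetical helper: splits a pattern such as "3t2ion" into its letters ("tion")
// and one value per intercharacter position ({3, 2, 0, 0, 0}).
final class HyphenPattern {
    final String letters;
    // values[i] is the value of the gap before letters.charAt(i);
    // values[letters.length()] is the gap after the last letter; 0 means unspecified.
    final int[] values;

    private HyphenPattern(String letters, int[] values) {
        this.letters = letters;
        this.values = values;
    }

    static HyphenPattern parse(String text) {
        StringBuilder letters = new StringBuilder();
        int[] values = new int[text.length() + 1]; // oversized; trimmed below
        for (char c : text.toCharArray()) {
            if (c >= '0' && c <= '9') {
                values[letters.length()] = c - '0'; // digit applies to the current gap
            } else {
                letters.append(c);
            }
        }
        // A pattern of n letters has n + 1 gaps.
        return new HyphenPattern(letters.toString(), Arrays.copyOf(values, letters.length() + 1));
    }

    public static void main(String[] args) {
        HyphenPattern p = parse("3t2ion");
        System.out.println(p.letters + " " + Arrays.toString(p.values)); // prints "tion [3, 2, 0, 0, 0]"
    }
}
```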

To hyphenate a word, we find all patterns that match within the word and determine the hyphenation values for each intercharacter position. If more than one pattern applies to a given position, we take the maximum of the values specified (i.e., the higher value takes priority). If the resulting hyphenation value is odd, this position is a feasible breakpoint; if the value is even or if no value has been specified, we are not allowed to break at this position.
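The matching rule above can be sketched as follows. `LiangHyphenator` and `breakPoints` are hypothetical names; the sketch uses naive substring search rather than the pattern trie the real program uses, and it ignores TeX's "." word-boundary markers and minimum fragment lengths.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of Liang-style pattern matching; not the FOray API.
final class LiangHyphenator {
    // Returns, for each allowed break, the number of letters before it.
    static List<Integer> breakPoints(String word, List<String> patterns) {
        int[] values = new int[word.length() + 1]; // one value per gap; 0 = unspecified
        for (String pattern : patterns) {
            // Separate the pattern into its letters and the digit for each gap.
            StringBuilder letters = new StringBuilder();
            int[] digits = new int[pattern.length() + 1];
            for (char c : pattern.toCharArray()) {
                if (c >= '0' && c <= '9') {
                    digits[letters.length()] = c - '0';
                } else {
                    letters.append(c);
                }
            }
            String key = letters.toString();
            if (key.isEmpty()) {
                continue; // a pattern with no letters matches nothing useful
            }
            // At every occurrence, keep the maximum value for each covered gap.
            for (int start = word.indexOf(key); start >= 0; start = word.indexOf(key, start + 1)) {
                for (int i = 0; i <= key.length(); i++) {
                    values[start + i] = Math.max(values[start + i], digits[i]);
                }
            }
        }
        // Odd values are feasible breakpoints; the gaps outside the word are skipped.
        List<Integer> breaks = new ArrayList<>();
        for (int gap = 1; gap < word.length(); gap++) {
            if (values[gap] % 2 == 1) {
                breaks.add(gap);
            }
        }
        return breaks;
    }

    public static void main(String[] args) {
        System.out.println(breakPoints("station", Arrays.asList("3t2ion")));        // prints "[3]"
        System.out.println(breakPoints("station", Arrays.asList("3t2ion", "a4t"))); // prints "[]"
    }
}
```

With only "3t2ion", "station" gets the value 3 before the "t" of "tion" and can be broken as "sta-tion". Adding "a4t" raises that same position to 4, an even value, so the break is suppressed.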

See Also:
TUG page re Liang thesis "Word Hy-phen-a-tion by Com-put-er"

Constructor Summary
PatGen(String dictionary, String patterns, String translate, String patout)
          Constructor.
 
Method Summary
static void main(String[] args)
          Command-line interface for the PatGen class.
 void process()
          This is where PATGEN actually starts.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

PatGen

public PatGen(String dictionary,
              String patterns,
              String translate,
              String patout)
       throws org.axsl.hyphen.HyphenationException
Constructor.

Parameters:
dictionary - The dictionary input stream to be read.
patterns - The patterns input stream to be read (may be null).
translate - The translate input stream to be read (may be null).
patout - The output stream to which the patterns should be written.
Throws:
org.axsl.hyphen.HyphenationException - If dictionary or patout is null.
See Also:
"patgen.web, line 93"
Method Detail

process

public void process()
             throws IOException,
                    org.axsl.hyphen.HyphenationException
This is where PATGEN actually starts. We initialize the pattern trie, get hyphenation level and pattern length limits from the terminal, and generate patterns.

Throws:
org.axsl.hyphen.HyphenationException - For errors during generation, usually overflows of data structures.
IOException - For errors reading the various input files or writing the output.
See Also:
"patgen.web, line 1867"

main

public static void main(String[] args)
Command-line interface for the PatGen class.

Parameters:
args - The command-line arguments. There are four possible options:
  • -dictionary [file] path to the input dictionary (word-list)
  • -patterns [file] path to the existing patterns file (optional)
  • -translate [file] path to the translate file (optional)
  • -patout [file] path to the output patterns file
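One way the four options might be collected is sketched below. `PatGenArgs` is a hypothetical helper, not the actual implementation of `main`. It treats each option as a name followed by a path and requires `-dictionary` and `-patout`, since the constructor rejects a null dictionary or output.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of parsing the four documented options; not FOray's code.
final class PatGenArgs {
    static Map<String, String> parse(String[] args) {
        Map<String, String> options = new HashMap<>();
        for (int i = 0; i < args.length; i++) {
            String name = args[i];
            boolean known = name.equals("-dictionary") || name.equals("-patterns")
                    || name.equals("-translate") || name.equals("-patout");
            if (!known || i + 1 >= args.length) {
                throw new IllegalArgumentException("Bad or incomplete option: " + name);
            }
            options.put(name.substring(1), args[++i]); // e.g. "dictionary" -> path
        }
        // The constructor rejects a null dictionary or output, so require both here.
        if (!options.containsKey("dictionary") || !options.containsKey("patout")) {
            throw new IllegalArgumentException("-dictionary and -patout are required");
        }
        return options;
    }
}
```

The resulting map could then be passed to the documented constructor, for example `new PatGen(opts.get("dictionary"), opts.get("patterns"), opts.get("translate"), opts.get("patout"))`. Because `Map.get` returns null for absent keys, omitted optional files are simply passed as null.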


Copyright © 2017. All rights reserved.