FOray Modules: FOrayText



FOrayText encapsulates the numerous text-related tasks that are needed by any publishing system. These tasks can roughly be broken into the following categories:

  • Text conversions
  • Hyphenation
  • Line-breaking
  • Faux Small Caps

Although built to meet the needs of the XSL-FO processing model, FOrayText is designed to be independent of that model and should be suitable for use with other systems. It has no dependencies on other FOray modules, but it does require an aXSL-compliant font system.

Text Conversions

Here is a list of the text-conversion tasks needed by an XSL-FO system, in the order in which they effectively must be applied (an implementation need only produce the equivalent result):

  1. white-space-treatment (XSL-FO Standard 1.0 Section 7.15.8). This is handled in org.foray.fotree.FOText.
  2. linefeed-treatment (XSL-FO Standard 1.0 Section 7.15.7). This is handled in org.foray.fotree.FOText.
  3. text-transform (XSL-FO Standard 1.0 Section 7.16.6). This is handled in org.foray.fotree.FOText.

    The result after the above three conversions is the text in the refined FO Tree.

  4. white-space-collapse (XSL-FO Standard 1.0 Section 7.15.12).

    After applying white-space-collapse, the resulting text is the text that should be used in area tree construction. This is handled in org.foray.fotree.FOText.

  5. faux small-caps. This is not an XSL-FO requirement, but rather a common use-case, requiring the conversion of lowercase text to uppercase. Sizing issues related to this are handled in the line-breaking logic. Actual conversion is handled in org.foray.area.TextArea.
  6. ligatures and other font-specified conversions. These are not currently supported. When supported, the sizing issues will be handled in the line-breaking logic, and the actual conversion handled in org.foray.area.TextArea.
  7. after soft line-breaking has been performed, white-space-treatment must again be considered for cases where white-space is at the beginning or end of a line. This is handled in org.foray.area.TextArea.
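The effect of the ordering in the list above can be illustrated with a minimal sketch. The class and method names here are hypothetical and are not part of FOray. The sketch assumes white-space-treatment="preserve" (so step 1 is a no-op), treats only U+0020 as white space, and models a single option value for each of the other properties.

```java
import java.util.Locale;

/** Illustrative only: simplified versions of steps 2 through 4, applied in list order. */
final class TextConversionSketch {

    /** Simplified linefeed-treatment="treat-as-space": each U+000A becomes a space. */
    static String linefeedsAsSpaces(String text) {
        return text.replace('\n', ' ');
    }

    /** Simplified text-transform="uppercase". */
    static String toUpper(String text) {
        return text.toUpperCase(Locale.ROOT);
    }

    /** Simplified white-space-collapse="true": keeps only the first space of each run of spaces. */
    static String collapseSpaces(String text) {
        StringBuilder out = new StringBuilder(text.length());
        boolean previousWasSpace = false;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean isSpace = c == ' ';
            if (!(isSpace && previousWasSpace)) {
                out.append(c);
            }
            previousWasSpace = isSpace;
        }
        return out.toString();
    }

    /** Applies linefeed-treatment, then text-transform, then white-space-collapse. */
    static String refine(String raw) {
        return collapseSpaces(toUpper(linefeedsAsSpaces(raw)));
    }
}
```

With this sketch, refine("a  b\n c") yields "A B C". Collapsing before converting linefeeds would instead leave two adjacent spaces between "b" and "c", which is why the order (or an equivalent effect) matters.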

Conversion issues are complicated by the fact that some conversions may need to consider their context, and the context may not be entirely within the current chunk of text. The conversions that are context-sensitive are the “ignore-if” options of white-space-treatment, and the “capitalize” option of text-transform. The XSL-FO Standard is not as clear as one would hope in determining what context should be considered, but two threads on the XSL-FO mailing list may shed some light:


Line-Breaking

FOrayText is designed to allow multiple line-breaking strategies (LBS) to be employed within the same system. It might be fair to consider the line-breaking portion of FOrayText to be part of the layout system. Like the task of breaking lines into pages, the process of breaking line content into lines can be either eager or patient. An eager LBS can be used by a patient pagination system, but the opposite is not true. A patient LBS cannot be used by an eager pagination system, because the eager pagination system must demand the results from each piece of input that it provides to the LBS before it can continue processing the next piece.

The following features are used by all line-breaking strategies, regardless of whether the client pagination system is eager or patient:

  • Input is passed in a common way, using interfaces that describe textual content and non-textual content. The client system must implement these interfaces.
  • The LBS requests new lines in a common way, using an interface designed for that purpose.
  • Output is handled in a common way. The LineBreakHandler interface is invoked by the LBS, allowing the client to take the output and do something with it, presumably creating LineArea objects.
  • The client always initiates the conversation.

However, other elements of the process differ between eager and patient systems. The main difference is that, when an eager pagination system is in control, the LBS must be prepared, when asking for a new line, to receive “null” as the answer, and to pass an appropriate response back to the client application. The client application is then responsible for storing whatever state information it needs, including the LBS instance itself (which will also contain some state information), getting the new line (on the new page), and continuing processing.
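The null-handling conversation described above might look something like the following sketch. Everything in it is hypothetical: the LineSource interface, the Status enum, and the Breaker class are invented for illustration and do not reflect the actual FOrayText interfaces. Widths are measured in characters and a simple greedy fill is assumed.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: an eager breaker that can be suspended when no line is available. */
final class EagerBreakSketch {

    /** Hypothetical client callback: the width of the next line, or null if the page is full. */
    interface LineSource {
        Integer nextLineWidth();
    }

    /** Hypothetical response passed back to the client. */
    enum Status { FINISHED, NEEDS_NEW_PAGE }

    /** Hypothetical greedy breaker; it remembers its position between calls to resume(). */
    static final class Breaker {
        private final String[] words;
        private int next = 0; // state that must survive a page break
        final List<String> lines = new ArrayList<>();

        Breaker(String text) {
            this.words = text.trim().split("\\s+");
        }

        Status resume(LineSource source) {
            while (next < words.length) {
                Integer width = source.nextLineWidth();
                if (width == null) {
                    // No line available: hand control back so the client can start a new page.
                    return Status.NEEDS_NEW_PAGE;
                }
                // Each line takes at least one word, even if that word overflows the line.
                StringBuilder line = new StringBuilder(words[next++]);
                while (next < words.length
                        && line.length() + 1 + words[next].length() <= width) {
                    line.append(' ').append(words[next++]);
                }
                lines.add(line.toString());
            }
            return Status.FINISHED;
        }
    }
}
```

In this sketch the client keeps the same Breaker instance across the page break and simply calls resume() again with a source for the new page. Creating a fresh Breaker at that point would lose the position within the block.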

Note that the same LineBreaker object must be used for an entire block. When using a patient LineBreaker, this will not be an issue, but when using an eager one, care must be taken. Eager line-breaking must keep track of its state within a given block. (To-do: Address whether a LineBreaker can be reset and reused.)

Faux Small-Caps

For fonts that do not have small-cap glyphs, users sometimes want to imitate small-caps by converting lowercase text to uppercase, but applying a smaller font-size to such converted text. Although decidedly poor typography, this kludge is popular in practice.

In addition to the conversion of lowercase text to uppercase text, faux small caps also presents a line-breaking issue. If any text is actually converted, the corresponding change in font-size means that the traits of the converted text are now different from the traits of the text that was not converted. This loss of integrity means that the input chunk of text must eventually be treated as being broken into smaller pieces. The question then is whether this should be done concurrently with line-breaking or not. If it is done upstream from line-breaking, line-breaking doesn't even need to know about it. Otherwise, line-breaking needs to know whether it should output the smaller pieces or the larger piece intact. Either way, it must size the text using the converted text at the smaller point size. Also, note that all whitespace should be treated as uppercase text, since that is the true size of the text being processed.
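The splitting and sizing described above can be sketched as follows. This is not FOray code: the class, the Run type, and the width metric (every glyph half an em wide) are assumptions made purely for illustration, and the reduced size is whatever the caller supplies. Sizes are in millipoints, and whitespace stays at the full size because it is not lowercase.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Illustrative only: splits text into full-size and reduced-size runs, then sizes them. */
final class FauxSmallCapsSketch {

    /** A piece of converted text and whether it is set at the reduced font size. */
    static final class Run {
        final String text;
        final boolean reduced;

        Run(String text, boolean reduced) {
            this.text = text;
            this.reduced = reduced;
        }
    }

    /** Lowercase runs are uppercased and marked reduced; everything else is left as-is. */
    static List<Run> split(String text) {
        List<Run> runs = new ArrayList<>();
        int start = 0;
        for (int i = 1; i <= text.length(); i++) {
            boolean lower = Character.isLowerCase(text.charAt(start));
            if (i == text.length() || Character.isLowerCase(text.charAt(i)) != lower) {
                String piece = text.substring(start, i);
                runs.add(new Run(lower ? piece.toUpperCase(Locale.ROOT) : piece, lower));
                start = i;
            }
        }
        return runs;
    }

    /** Sizes the converted text; placeholder metric assumes each glyph is half an em wide. */
    static int width(List<Run> runs, int fullSize, int reducedSize) {
        int total = 0;
        for (Run run : runs) {
            int size = run.reduced ? reducedSize : fullSize;
            total += run.text.length() * (size / 2);
        }
        return total;
    }
}
```

For example, split("Ab c") produces four runs: "A" at full size, "B" reduced, the space at full size, and "C" reduced. Whether a line-breaker outputs those runs individually or only uses them for measurement is the design question raised above.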