java.lang.Object
  org.foray.hyphen.util.NaturalLanguage
public final class NaturalLanguage
Manages various aspects of a natural language, specifically which grapheme clusters are valid in that language. NOTE: There may be a better way to do this, but I have not found it yet. Java has the "Locale" class, which gives access to certain resources. However, its locale support seems to be JVM-specific and cannot be extended by adding new locales. The ICU4J libraries from IBM (parts of which are included in Java 5, parts in Java 6) provide some similar capabilities, but they do not seem to be documented well enough for us to use. Writing this class seems easier than trying to work out either of those alternatives.
Constructor Summary | |
---|---|
NaturalLanguage() Private Constructor. |
Method Summary | |
---|---|
void | addCluster(int[] codepoints) Add a new Grapheme Cluster to this language. |
void | addRange(int start, int end) Add a range of Unicode codepoints to this language. |
boolean | isIncluded(int codepoint) Indicates whether a specific Unicode codepoint is valid as a grapheme in this language. |
boolean | isIncluded(int[] codepoints, int start, int end) Indicates whether a given sequence of characters is a valid grapheme cluster in this language. |
int | validateText(CharSequence theChars) Validates the content of a sequence of chars to determine whether they are valid in this language. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public NaturalLanguage()
Method Detail |
---|
public void addRange(int start, int end)
Parameters:
start - The first codepoint in the range to be added.
end - The last codepoint in the range to be added.

public void addCluster(int[] codepoints)
Parameters:
codepoints - The sequence of Unicode codepoints that define the Grapheme Cluster.

public boolean isIncluded(int codepoint)
Parameters:
codepoint - The Unicode codepoint to be tested.
Returns:
True if codepoint is valid in this language.

public boolean isIncluded(int[] codepoints, int start, int end)
Parameters:
codepoints - The sequence of codepoints to be tested. This sequence must already be normalized to the canonical decomposed sequence and order.
start - The index of the first character that is being tested.
end - The index of the last character that is being tested.
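
The three-argument isIncluded expects codepoints that are already in canonical decomposed form. As a minimal sketch of how a caller might prepare such an array, the following uses the JDK's java.text.Normalizer (available since Java 6) together with CharSequence.codePoints() (Java 8). The class name NfdCodepoints and the helper toNfdCodepoints are hypothetical and are not part of the Foray API.

```java
import java.text.Normalizer;
import java.util.Arrays;

// Hypothetical helper class; not part of org.foray.hyphen.util.
final class NfdCodepoints {

    /** Returns the code points of the text after canonical decomposition (NFD). */
    static int[] toNfdCodepoints(CharSequence text) {
        // NFD applies canonical decomposition and canonical ordering of combining marks.
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        // codePoints() yields one int per Unicode code point, not per UTF-16 char.
        return decomposed.codePoints().toArray();
    }

    public static void main(String[] args) {
        // U+00E9 (e with acute) decomposes to U+0065 followed by U+0301.
        int[] cps = toNfdCodepoints("\u00E9");
        System.out.println(Arrays.toString(cps)); // prints [101, 769]
    }
}
```

An array produced this way could then presumably be passed to isIncluded(codepoints, start, end). Going by the parameter descriptions above, end is the index of the last character tested, so testing the whole array would use 0 and codepoints.length - 1.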

public int validateText(CharSequence theChars)
Parameters:
theChars - The String or other CharSequence that contains the text to be validated. This text does not need to already be normalized.
Returns:
The index into theChars of the first cluster that is not valid in this language, or -1 if all clusters are valid.
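
To illustrate the contract described above (ranges added with addRange, membership tested with isIncluded, and validateText returning an index or -1), here is a minimal self-contained sketch. RangeLanguageSketch is a hypothetical stand-in, not the Foray implementation. It assumes inclusive ranges, checks single codepoints only, and skips both normalization and multi-codepoint grapheme clusters, all of which the real class may handle differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for illustration; not org.foray.hyphen.util.NaturalLanguage.
final class RangeLanguageSketch {

    // Each entry is {start, end}; both ends are assumed to be inclusive.
    private final List<int[]> ranges = new ArrayList<>();

    void addRange(int start, int end) {
        if (start > end) {
            throw new IllegalArgumentException("start must not exceed end");
        }
        ranges.add(new int[] {start, end});
    }

    boolean isIncluded(int codepoint) {
        for (int[] range : ranges) {
            if (codepoint >= range[0] && codepoint <= range[1]) {
                return true;
            }
        }
        return false;
    }

    /** Returns the char index of the first codepoint outside all ranges, or -1. */
    int validateText(CharSequence theChars) {
        int i = 0;
        while (i < theChars.length()) {
            int cp = Character.codePointAt(theChars, i);
            if (!isIncluded(cp)) {
                return i;
            }
            // Advance by 2 chars for supplementary codepoints, otherwise by 1.
            i += Character.charCount(cp);
        }
        return -1;
    }

    public static void main(String[] args) {
        RangeLanguageSketch basicLatin = new RangeLanguageSketch();
        basicLatin.addRange('A', 'Z');
        basicLatin.addRange('a', 'z');
        System.out.println(basicLatin.validateText("hello"));      // prints -1
        System.out.println(basicLatin.validateText("na\u00EFve")); // prints 2
    }
}
```

A linear scan over a list of ranges keeps the sketch short. A language with many ranges would likely benefit from a sorted structure with binary search.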