org.foray.hyphen.util
Class ValidateChars

java.lang.Object
  extended by org.xml.sax.helpers.DefaultHandler
      extended by org.xml.sax.ext.DefaultHandler2
          extended by org.foray.hyphen.util.ValidateChars
All Implemented Interfaces:
ContentHandler, DTDHandler, EntityResolver, ErrorHandler, DeclHandler, EntityResolver2, LexicalHandler

public class ValidateChars
extends DefaultHandler2

Command-line application that reads an XML file and checks its content against a predefined set of characters that are legitimate in a given language, reporting on anomalies. The purpose here is to find words that are misspelled or that are not encoded properly, so that they can be fixed in preparation for creating a word list.
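
The checking approach described above can be illustrated with a minimal, self-contained sketch. It is not the FOray implementation: the class name CharCheckSketch, the representation of the permitted character set as a plain String, and the anomaly message format are all assumptions made for illustration. It uses only JDK SAX classes, treats whitespace as always permitted, and inspects UTF-16 code units rather than full code points.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

/** Hypothetical sketch of a character-checking SAX handler; not the FOray code. */
public class CharCheckSketch extends DefaultHandler2 {

    /** Assumption: the permitted characters are supplied as a plain string. */
    private final String allowed;
    private final List<String> anomalies = new ArrayList<>();
    private Locator locator;

    public CharCheckSketch(String allowed) {
        this.allowed = allowed;
    }

    @Override
    public void setDocumentLocator(Locator locator) {
        // Kept so that anomalies can be reported with a line number.
        this.locator = locator;
    }

    @Override
    public void characters(char[] buffer, int offset, int length) {
        // Only the slice [offset, offset + length) belongs to this event.
        for (int i = offset; i < offset + length; i++) {
            char c = buffer[i];
            if (Character.isWhitespace(c) || allowed.indexOf(c) >= 0) {
                continue;
            }
            int line = (locator == null) ? -1 : locator.getLineNumber();
            anomalies.add(String.format("U+%04X near line %d", (int) c, line));
        }
    }

    public List<String> getAnomalies() {
        return anomalies;
    }

    /** Parses the XML text and returns one entry per disallowed character. */
    public static List<String> check(String xml, String allowed) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        CharCheckSketch handler = new CharCheckSketch(allowed);
        reader.setContentHandler(handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return handler.getAnomalies();
    }
}
```

For example, CharCheckSketch.check("<w>abz</w>", "abc") would report a single anomaly for the letter z, while a document containing only permitted characters yields an empty list.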


Field Summary
static byte STATUS_FILE_NOT_FOUND
          Command-line return status constant indicating that a file was not found.
static byte STATUS_PARSING_ERROR
          Command-line return status constant indicating that there was a parsing error.
static byte STATUS_WRONG_QTY_ARGUMENTS
          Command-line return status constant indicating that the number of arguments is wrong.
 
Constructor Summary
ValidateChars(HyphenationServer4a server, InputSource input, String catalog, String languageCode)
          Constructor.
 
Method Summary
 void characters(char[] buffer, int offset, int length)
           
 XMLReader createParser()
          Creates a SAX parser.
 void endDocument()
           
 void endElement(String uri, String local, String qName)
           
 org.apache.commons.logging.Log getLogger()
          Returns the logger.
static void main(String[] args)
          Command-line interface for validating the characters in an XML document.
 void setDocumentLocator(Locator locator)
           
 void start()
          Instantiates the parser and starts parsing the input.
 void startDocument()
           
 void startElement(String uri, String local, String qName, Attributes attributes)
           
 
Methods inherited from class org.xml.sax.ext.DefaultHandler2
attributeDecl, comment, elementDecl, endCDATA, endDTD, endEntity, externalEntityDecl, getExternalSubset, internalEntityDecl, resolveEntity, resolveEntity, startCDATA, startDTD, startEntity
 
Methods inherited from class org.xml.sax.helpers.DefaultHandler
endPrefixMapping, error, fatalError, ignorableWhitespace, notationDecl, processingInstruction, skippedEntity, startPrefixMapping, unparsedEntityDecl, warning
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

STATUS_WRONG_QTY_ARGUMENTS

public static final byte STATUS_WRONG_QTY_ARGUMENTS
Command-line return status constant indicating that the number of arguments is wrong.

See Also:
Constant Field Values

STATUS_FILE_NOT_FOUND

public static final byte STATUS_FILE_NOT_FOUND
Command-line return status constant indicating that a file was not found.

See Also:
Constant Field Values

STATUS_PARSING_ERROR

public static final byte STATUS_PARSING_ERROR
Command-line return status constant indicating that there was a parsing error.

See Also:
Constant Field Values

Constructor Detail

ValidateChars

public ValidateChars(HyphenationServer4a server,
                     InputSource input,
                     String catalog,
                     String languageCode)
Constructor.

Parameters:
server - The server used to find natural language resources.
input - The input source encapsulating the document to be validated.
catalog - The location of a catalog file that should be used to find DTDs.
languageCode - The valid ISO-639 language code for the language against which this document will be tested.

Method Detail

start

public void start()
           throws IOException,
                  SAXException,
                  ParserConfigurationException
Instantiates the parser and starts parsing the input.

Throws:
IOException - For I/O Errors.
SAXException - For parsing errors.
ParserConfigurationException - For errors configuring parser.

createParser

public XMLReader createParser()
                       throws SAXException,
                              ParserConfigurationException
Creates a SAX parser.

Returns:
The created SAX parser.
Throws:
SAXException - For error creating parser.
ParserConfigurationException - For error configuring parser.
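
As a rough illustration of how a method with this signature might obtain a SAX parser, the sketch below uses the standard JAXP SAXParserFactory, whose calls throw exactly these two exception types. The class name ParserFactorySketch and the chosen factory settings (namespace-aware, non-validating) are assumptions; the actual configuration used by ValidateChars is not documented here.

```java
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;

/** Hypothetical sketch; the real createParser() may configure the parser differently. */
public class ParserFactorySketch {

    public static XMLReader createParser()
            throws SAXException, ParserConfigurationException {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        // Assumed settings, chosen only for illustration.
        factory.setNamespaceAware(true);
        factory.setValidating(false);
        // newSAXParser() can throw ParserConfigurationException or SAXException;
        // getXMLReader() can throw SAXException.
        return factory.newSAXParser().getXMLReader();
    }
}
```

The returned XMLReader would then be given a content handler with setContentHandler before parse is called on the input source.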

setDocumentLocator

public void setDocumentLocator(Locator locator)
Specified by:
setDocumentLocator in interface ContentHandler
Overrides:
setDocumentLocator in class org.xml.sax.helpers.DefaultHandler

startDocument

public void startDocument()
Specified by:
startDocument in interface ContentHandler
Overrides:
startDocument in class org.xml.sax.helpers.DefaultHandler

endDocument

public void endDocument()
Specified by:
endDocument in interface ContentHandler
Overrides:
endDocument in class org.xml.sax.helpers.DefaultHandler

startElement

public void startElement(String uri,
                         String local,
                         String qName,
                         Attributes attributes)
Specified by:
startElement in interface ContentHandler
Overrides:
startElement in class org.xml.sax.helpers.DefaultHandler

endElement

public void endElement(String uri,
                       String local,
                       String qName)
Specified by:
endElement in interface ContentHandler
Overrides:
endElement in class org.xml.sax.helpers.DefaultHandler

characters

public void characters(char[] buffer,
                       int offset,
                       int length)
Specified by:
characters in interface ContentHandler
Overrides:
characters in class org.xml.sax.helpers.DefaultHandler

getLogger

public org.apache.commons.logging.Log getLogger()
Returns the logger.

Returns:
The logger.

main

public static void main(String[] args)
Command-line interface for validating the characters in an XML document.

Parameters:
args - Command-line arguments. Argument 1 is the location of the input file. Argument 2 is the ISO-639 language code for the language used to validate this file. Argument 3 is the URL of the directory containing the natural language input files. Argument 4 is the optional location of an OASIS-compliant catalog file that can be used to locate local DTDs.
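
The argument layout above (three required arguments plus an optional fourth) could be checked along the following lines. This is a sketch under stated assumptions, not the actual main method: the class name ArgsSketch, the STATUS_OK constant, and the numeric values of both status constants are hypothetical, since the real values appear only under Constant Field Values.

```java
/** Hypothetical sketch of argument-count checking; not the FOray code. */
public class ArgsSketch {

    // Hypothetical values; the real constants are defined in ValidateChars.
    public static final byte STATUS_OK = 0;
    public static final byte STATUS_WRONG_QTY_ARGUMENTS = 1;

    // Input file, language code, and resource URL are required.
    public static final int MIN_ARGS = 3;
    // The catalog location is the optional fourth argument.
    public static final int MAX_ARGS = 4;

    /** Returns a status code based only on how many arguments were supplied. */
    public static byte checkArgs(String[] args) {
        if (args.length < MIN_ARGS || args.length > MAX_ARGS) {
            return STATUS_WRONG_QTY_ARGUMENTS;
        }
        return STATUS_OK;
    }

    /** Returns the catalog location if it was supplied, otherwise null. */
    public static String catalogOrNull(String[] args) {
        return (args.length == MAX_ARGS) ? args[MAX_ARGS - 1] : null;
    }
}
```

A caller would typically pass the resulting status to System.exit so that scripts invoking the tool can distinguish a usage error from a missing file or a parsing failure.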


Copyright © 2017. All rights reserved.