net.sf.saxon.regex

Class JDK15RegexTranslator

public class JDK15RegexTranslator extends RegexTranslator

This class translates XML Schema regex syntax into JDK 1.5 regex syntax. It differs from the JDK 1.4 translator because JDK 1.5 handles non-BMP characters (wide characters) in places where JDK 1.4 does not, for example in a range such as [X-Y]. Since JDK 1.5 handles these characters natively, much of the code from the 1.4 translator, including most of the complexity of handling non-BMP characters, has been removed.

Author: James Clark, Thai Open Source Software Center Ltd. See statement at end of file. Modified by Michael Kay (a) to integrate the code into Saxon, and (b) to support XPath additions to the XML Schema regex syntax.
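As an illustration of the native non-BMP support described above, the following sketch uses only java.util.regex (not the Saxon translator) to match a range [X-Y] whose endpoints lie outside the BMP. The class name NonBmpRangeDemo and its helper methods are hypothetical names chosen for this example, and the code assumes the range endpoints are not regex metacharacters.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of Saxon.
class NonBmpRangeDemo {

    // Builds the character class [lo-hi] from two code points.
    // Assumption: neither endpoint is a regex metacharacter such as ']' or '\'.
    static Pattern rangePattern(int lo, int hi) {
        String cls = "[" + new String(Character.toChars(lo))
                + "-" + new String(Character.toChars(hi)) + "]";
        return Pattern.compile(cls);
    }

    // Returns true if the single code point falls inside [lo-hi].
    static boolean inRange(int lo, int hi, int codePoint) {
        String s = new String(Character.toChars(codePoint));
        return rangePattern(lo, hi).matcher(s).matches();
    }

    public static void main(String[] args) {
        // Each endpoint is a surrogate pair in the Java string,
        // but the regex engine treats it as one code point.
        System.out.println(inRange(0x10000, 0x1003F, 0x10020));
        System.out.println(inRange(0x10000, 0x1003F, 0x10040));
    }
}
```

The range is written with literal supplementary characters; java.util.regex from JDK 1.5 onwards compares code points rather than individual UTF-16 code units, which is why no special translation of the range is needed.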
Nested Class Summary
static class JDK15RegexTranslator.BackReference
abstract static class JDK15RegexTranslator.CharClass
static class JDK15RegexTranslator.CharRange
static class JDK15RegexTranslator.Complement
static class JDK15RegexTranslator.Empty
static class JDK15RegexTranslator.Property
abstract static class JDK15RegexTranslator.SimpleCharClass
static class JDK15RegexTranslator.SingleChar
static class JDK15RegexTranslator.Subtraction
static class JDK15RegexTranslator.Union
Field Summary
static JDK15RegexTranslator.CharClass[] categoryCharClasses
Translates XML Schema and XPath regexes into java.util.regex regexes.
static JDK15RegexTranslator.CharClass[] specialBlockCharClasses
CharClass for each block name in specialBlockNames.
static JDK15RegexTranslator.CharClass[] subCategoryCharClasses
Method Summary
static void main(String[] args)
Main method for testing.
static String translate(CharSequence regExp, int xmlVersion, boolean xpath, boolean ignoreWhitespace, boolean caseBlind)
Translates a regular expression in the syntax of XML Schemas Part 2 into a regular expression in the syntax of java.util.regex.Pattern.
protected boolean translateAtom()

Field Detail

categoryCharClasses

public static final JDK15RegexTranslator.CharClass[] categoryCharClasses
Translates XML Schema and XPath regexes into java.util.regex regexes.

See Also: java.util.regex.Pattern, XML Schema Part 2

specialBlockCharClasses

public static final JDK15RegexTranslator.CharClass[] specialBlockCharClasses
CharClass for each block name in specialBlockNames.

subCategoryCharClasses

public static final JDK15RegexTranslator.CharClass[] subCategoryCharClasses

Method Detail

main

public static void main(String[] args)
Main method for testing. Outputs to System.err the Java translation of a supplied regular expression.

Parameters:
args - command line arguments: args[0] is a regular expression; args[1] = xpath to invoke the XPath rules

Throws: RegexSyntaxException

translate

public static String translate(CharSequence regExp, int xmlVersion, boolean xpath, boolean ignoreWhitespace, boolean caseBlind)
Translates a regular expression in the syntax of XML Schemas Part 2 into a regular expression in the syntax of java.util.regex.Pattern. The translation assumes that the string to be matched against the regex uses surrogate pairs correctly. If the string comes from XML content, a conforming XML parser will automatically check this; if the string comes from elsewhere, it may be necessary to check surrogate usage before matching.

Parameters:
xmlVersion - set to XML10 for XML 1.0 or XML11 for XML 1.1
regExp - a String containing a regular expression in the syntax of XML Schemas Part 2
xpath - a boolean indicating whether the XPath 2.0 F+O extensions to the schema regex syntax are permitted
ignoreWhitespace - true if whitespace is to be ignored ('x' flag)
caseBlind - true if case is to be ignored ('i' flag)

Returns: a JDK 1.5 regular expression

Throws: RegexSyntaxException if regExp is not a regular expression in the syntax of XML Schemas Part 2, or XPath 2.0, as appropriate

See Also: java.util.regex.Pattern, XML Schema Part 2
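
The description of translate notes that input not taken from parsed XML may need its surrogate usage checked before matching. One possible check is sketched below; the class SurrogateCheck and the method hasWellFormedSurrogates are hypothetical names and are not part of Saxon. The sketch assumes "correct" means that every high surrogate is immediately followed by a low surrogate and that no low surrogate appears on its own.

```java
// Hypothetical helper, not part of Saxon.
class SurrogateCheck {

    // Returns true if every surrogate code unit in s belongs to a
    // well-formed high/low pair.
    static boolean hasWellFormedSurrogates(CharSequence s) {
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c)) {
                // A high surrogate must be followed by a low surrogate.
                if (i + 1 >= s.length() || !Character.isLowSurrogate(s.charAt(i + 1))) {
                    return false;
                }
                i += 2; // skip the whole pair
            } else if (Character.isLowSurrogate(c)) {
                // A low surrogate with no preceding high surrogate.
                return false;
            } else {
                i++;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String paired = new String(Character.toChars(0x10000));
        String lone = String.valueOf((char) 0xD800);
        System.out.println(hasWellFormedSurrogates(paired));
        System.out.println(hasWellFormedSurrogates(lone));
    }
}
```

A caller could run such a check on the input string and reject or repair it before applying the pattern compiled from the result of translate.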

translateAtom

protected boolean translateAtom()