Class Characters.Filter

  • Enclosing class:
    Characters

    public static class Characters.Filter
    extends Character.Subset
    Subsets of Unicode characters identified by their general category. The categories are identified by constants defined in the Character class, such as LOWERCASE_LETTER, UPPERCASE_LETTER, DECIMAL_DIGIT_NUMBER and SPACE_SEPARATOR.

    An instance of this class can be obtained from an enumeration of character types using the forTypes(byte[]) method, or using one of the constants predefined in this class. Then, Unicode characters can be tested for inclusion in the subset by calling the contains(int) method.
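
    As a rough illustration of the idea, the following standalone sketch builds a subset from general-category constants and tests code points against it using only java.lang.Character. The class CategorySubsetSketch and its bit-mask representation are assumptions made for this example. They are not the SIS implementation; only the method names forTypes and contains mirror the ones described above.

```java
// Hypothetical stand-in for a category-based character subset.
// It relies only on java.lang.Character and does not use Apache SIS.
final class CategorySubsetSketch {
    /** One bit per general category value returned by Character.getType(int). */
    private final long typeMask;

    private CategorySubsetSketch(long typeMask) {
        this.typeMask = typeMask;
    }

    /** Builds a subset from Character type constants such as LOWERCASE_LETTER. */
    static CategorySubsetSketch forTypes(byte... types) {
        long mask = 0;
        for (byte type : types) {
            mask |= 1L << type;
        }
        return new CategorySubsetSketch(mask);
    }

    /** Returns true if the general category of the code point is in the subset. */
    boolean contains(int codePoint) {
        return (typeMask & (1L << Character.getType(codePoint))) != 0;
    }

    public static void main(String[] args) {
        CategorySubsetSketch subset = forTypes(
                Character.LOWERCASE_LETTER,
                Character.DECIMAL_DIGIT_NUMBER);
        System.out.println(subset.contains('a'));   // lowercase letter
        System.out.println(subset.contains('7'));   // decimal digit
        System.out.println(subset.contains('A'));   // uppercase letter, not in the subset
    }
}
```

    With the real class, the equivalent calls would be made on Characters.Filter rather than on this sketch.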

    Relationship with international standards
    ISO 19162:2015 §B.5.2 recommends ignoring spaces, case and the following characters when comparing two identified object names: “_” (underscore), “-” (minus sign), “/” (solidus), “(” (left parenthesis) and “)” (right parenthesis). The same specification also limits the set of valid characters in a name to the following (§6.3.1):
    A-Z a-z 0-9 _ [ ] ( ) { } < = > . , : ; + - (space) % & ' " * ^ / \ ? | °
    Note: SIS does not enforce this restriction in its programmatic API, but may perform some character substitutions at Well Known Text (WKT) formatting time.
    If we take only the characters in the above list which are valid in a Unicode identifier and remove the characters that ISO 19162 recommends to ignore, the only characters left are letters and digits.
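
    The reasoning above can be checked with a short standalone sketch. It assumes that "valid in a Unicode identifier" corresponds to Character.isUnicodeIdentifierPart(int); the class name NameCharactersSketch and its helper methods are hypothetical and are not part of SIS.

```java
// Hypothetical check of the claim above, using only java.lang.Character.
final class NameCharactersSketch {
    /** Characters that ISO 19162 recommends ignoring in name comparisons. */
    static final String IGNORED = "_-/() ";

    /** Builds the list of characters allowed in a name, as quoted above. */
    static String allowedCharacters() {
        StringBuilder sb = new StringBuilder();
        for (char c = 'A'; c <= 'Z'; c++) sb.append(c);
        for (char c = 'a'; c <= 'z'; c++) sb.append(c);
        for (char c = '0'; c <= '9'; c++) sb.append(c);
        // Remaining symbols of the list; \u00B0 is the degree sign.
        sb.append("_[](){}<=>.,:;+- %&'\"*^/\\?|\u00B0");
        return sb.toString();
    }

    /** Keeps identifier characters, then drops the ones recommended to ignore. */
    static String remainingCharacters() {
        StringBuilder sb = new StringBuilder();
        for (char c : allowedCharacters().toCharArray()) {
            if (Character.isUnicodeIdentifierPart(c) && IGNORED.indexOf(c) < 0) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Returns true if every remaining character is a letter or a digit. */
    static boolean onlyLettersAndDigits() {
        for (char c : remainingCharacters().toCharArray()) {
            if (!Character.isLetterOrDigit(c)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(remainingCharacters());
        System.out.println(onlyLettersAndDigits());
    }
}
```

    In this sketch the underscore is the only non-alphanumeric character of the list that passes the identifier test, and it is removed by the ignore rule.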
    Since:
    0.3
    See Also:
    Character.Subset, Character.getType(int), WKT 2 specification §B.5

    Defined in the sis-utility module