Class Characters.Filter
Object
Character.Subset
Filter
- Enclosing class:
- Characters
Subsets of Unicode characters identified by their general category.
 The categories are identified by constants defined in the 
Character class, like
 LOWERCASE_LETTER,
 UPPERCASE_LETTER,
 DECIMAL_DIGIT_NUMBER and
 SPACE_SEPARATOR.
 An instance of this class can be obtained from an enumeration of character types
 using the forTypes(byte[]) method, or using one of the constants predefined
 in this class. Then, Unicode characters can be tested for inclusion in the subset by
 calling the contains(int) method.
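The idea of a subset defined by general categories can be illustrated with the JDK alone. The following is a minimal sketch, not the SIS implementation: `TypeMaskFilter` is a hypothetical class name, and storing the categories as a bit mask is an assumption made for the example. Only `Character.getType(int)` and the `Character` category constants come from the standard library.

```java
/**
 * Hypothetical stand-in for a category-based character subset.
 * This is NOT the SIS class; it only illustrates the principle.
 */
final class TypeMaskFilter {
    /** Bit n is set when general category n (a Character constant) is included. */
    private final long typeMask;

    private TypeMaskFilter(long typeMask) {
        this.typeMask = typeMask;
    }

    /** Builds the union of all characters having one of the given general categories. */
    static TypeMaskFilter forTypes(byte... types) {
        long mask = 0;
        for (byte type : types) {
            mask |= 1L << type;
        }
        return new TypeMaskFilter(mask);
    }

    /** Tests whether the general category of the given code point is in the mask. */
    boolean contains(int codePoint) {
        return (typeMask & (1L << Character.getType(codePoint))) != 0;
    }

    public static void main(String[] args) {
        TypeMaskFilter letters = forTypes(Character.LOWERCASE_LETTER,
                                          Character.UPPERCASE_LETTER);
        System.out.println(letters.contains('a'));  // true
        System.out.println(letters.contains('7'));  // false
        System.out.println(letters.contains(' '));  // false
    }
}
```

With the real class, the equivalent calls would be the forTypes(byte...) and contains(int) methods documented on this page.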
Relationship with international standards
ISO 19162:2015 §B.5.2 recommends ignoring spaces, case and the following characters when comparing two identified object names: “_” (underscore), “-” (minus sign), “/” (solidus),
 “(” (left parenthesis) and “)” (right parenthesis).
 The same specification also limits the set of valid characters in a name to the following (§6.3.1):
 A-Z a-z 0-9 _ [ ] ( ) { } < = > . , : ; + - (space) % & ' " * ^ / \ ? | °
Note: SIS does not enforce this restriction in its programmatic API,
 but may perform some character substitutions at Well Known Text (WKT) formatting time.
 If we take only the characters in the above list that are valid in a Unicode identifier and remove the characters that ISO 19162 recommends ignoring, the only characters
 left are letters and digits.
- Since:
- 0.3
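The comparison rule described above (ignore spaces, case and punctuation, keep only letters and digits) can be sketched with plain JDK calls. This is an illustration under stated assumptions, not the way SIS implements it: `NameComparison`, `simplify` and `sameName` are hypothetical names, the sample names are made up for the example, and `Character.isLetterOrDigit(int)` stands in for the LETTERS_AND_DIGITS filter.

```java
/** Hypothetical helper illustrating a lenient comparison of object names. */
final class NameComparison {
    /** Keeps only the letters and digits of the name, folded to lower case. */
    static String simplify(String name) {
        StringBuilder sb = new StringBuilder(name.length());
        for (int i = 0; i < name.length(); ) {
            int cp = name.codePointAt(i);
            i += Character.charCount(cp);
            if (Character.isLetterOrDigit(cp)) {
                sb.appendCodePoint(Character.toLowerCase(cp));
            }
        }
        return sb.toString();
    }

    /** Compares two names, ignoring case and everything that is not a letter or digit. */
    static boolean sameName(String a, String b) {
        return simplify(a).equals(simplify(b));
    }

    public static void main(String[] args) {
        // Both names reduce to "wgs84utmzone31n".
        System.out.println(sameName("WGS 84 / UTM zone 31N", "wgs84_utm-zone(31n)"));  // true
        System.out.println(sameName("NAD27", "NAD83"));                                // false
    }
}
```

Building an intermediate string keeps the sketch short; a production comparison would more likely walk both sequences in parallel and skip filtered characters without allocating.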
- 
Field Summary
Fields (Modifier and Type, Field, Description):
- static final Characters.Filter LETTERS_AND_DIGITS: The subset of all characters for which Character.isLetterOrDigit(int) returns true.
- static final Characters.Filter UNICODE_IDENTIFIER: The subset of all characters for which Character.isUnicodeIdentifierPart(int) returns true, excluding ignorable characters.
- 
Method Summary
Methods (Modifier and Type, Method, Description):
- boolean contains(int codePoint): Returns true if this subset contains the given Unicode character.
- static Characters.Filter forTypes(byte... types): Returns a subset representing the union of all Unicode characters of the given types.
Methods inherited from class Character.Subset: equals, hashCode, toString
- 
Field Details
-
LETTERS_AND_DIGITS
The subset of all characters for which Character.isLetterOrDigit(int) returns true. This subset includes the following general categories: Character.LOWERCASE_LETTER, UPPERCASE_LETTER, TITLECASE_LETTER, MODIFIER_LETTER, OTHER_LETTER and DECIMAL_DIGIT_NUMBER.
 SIS uses this filter when comparing two identified object names. See the Relationship with international standards section in this class javadoc for more information.
 
- 
UNICODE_IDENTIFIER
The subset of all characters for which Character.isUnicodeIdentifierPart(int) returns true, excluding ignorable characters. This subset includes all the LETTERS_AND_DIGITS categories with the addition of the following ones: Character.LETTER_NUMBER, CONNECTOR_PUNCTUATION, NON_SPACING_MARK and COMBINING_SPACING_MARK.
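The difference between the two predefined subsets can be explored with the JDK predicates named in their descriptions. The sketch below is only an approximation of UNICODE_IDENTIFIER built from `Character.isUnicodeIdentifierPart(int)` and `Character.isIdentifierIgnorable(int)`; the class and method names are hypothetical, and the real filter may differ for some code points.

```java
/** Hypothetical demo comparing the two predicates on a few sample code points. */
final class IdentifierPartDemo {
    /** Approximates UNICODE_IDENTIFIER: identifier parts minus ignorable characters. */
    static boolean isNonIgnorableIdentifierPart(int codePoint) {
        return Character.isUnicodeIdentifierPart(codePoint)
                && !Character.isIdentifierIgnorable(codePoint);
    }

    public static void main(String[] args) {
        int[] samples = {
            'A',     // UPPERCASE_LETTER
            '_',     // CONNECTOR_PUNCTUATION
            0x2163,  // ROMAN NUMERAL FOUR, LETTER_NUMBER
            0x0301,  // COMBINING ACUTE ACCENT, NON_SPACING_MARK
            0x0000,  // NULL, an ignorable character
            '-'      // DASH_PUNCTUATION
        };
        for (int cp : samples) {
            System.out.printf("U+%04X letterOrDigit=%b identifierPart=%b%n",
                    cp,
                    Character.isLetterOrDigit(cp),
                    isNonIgnorableIdentifierPart(cp));
        }
    }
}
```

In this sample only 'A' satisfies both predicates; the underscore, the Roman numeral and the combining accent satisfy only the identifier predicate, while the NULL character and the hyphen satisfy neither.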
 
- 
- 
Method Details
-
contains
public boolean contains(int codePoint)
Returns true if this subset contains the given Unicode character.
- Parameters:
- codePoint - the Unicode character, as a code point value.
- Returns:
- true if this subset contains the given character.
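Because the argument is a code point rather than a char, callers should iterate over text by code point so that supplementary characters are tested as a whole. The sketch below shows the difference using only JDK methods; `CodePointScan` and its methods are hypothetical names, and `Character.isLetterOrDigit` stands in for a filter's contains(int) test.

```java
/** Hypothetical demo: scanning by code point versus scanning by char. */
final class CodePointScan {
    /** Counts letters and digits by code point, which handles surrogate pairs. */
    static long countByCodePoint(String text) {
        return text.codePoints().filter(Character::isLetterOrDigit).count();
    }

    /** Counts letters and digits char by char, which misses supplementary characters. */
    static long countByChar(String text) {
        long n = 0;
        for (int i = 0; i < text.length(); i++) {
            if (Character.isLetterOrDigit(text.charAt(i))) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        // 'A', then U+10400 (DESERET CAPITAL LETTER LONG I, stored as two chars), then '1'.
        String text = new StringBuilder()
                .append('A').appendCodePoint(0x10400).append('1').toString();
        System.out.println(countByCodePoint(text));  // 3
        System.out.println(countByChar(text));       // 2
    }
}
```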
 
- 
forTypes
Returns a subset representing the union of all Unicode characters of the given types.
- Parameters:
- types - the character types, as Character constants.
- Returns:
- the subset of Unicode characters of the given type.