Object
Character.Subset
Filter
- Enclosing class:
Characters
Subsets of Unicode characters identified by their general category.
The categories are identified by constants defined in the Character class, like LOWERCASE_LETTER, UPPERCASE_LETTER, DECIMAL_DIGIT_NUMBER and SPACE_SEPARATOR.
An instance of this class can be obtained from an enumeration of character types using the forTypes(byte[]) method, or by using one of the constants predefined in this class. Unicode characters can then be tested for inclusion in the subset by calling the contains(int) method.
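The following is a minimal sketch of the idea described above, using only java.lang.Character. The class name TypeSubset and its bit-mask representation are assumptions made for illustration; they are not the SIS implementation, and only the forTypes/contains method names are borrowed from this page.

```java
/**
 * Hypothetical stand-in for a category-based character subset.
 * Not the SIS class: it only mirrors the forTypes/contains idea.
 */
final class TypeSubset {
    /** One bit set for each accepted Character.getType(int) value. */
    private final long mask;

    private TypeSubset(long mask) {
        this.mask = mask;
    }

    /** Builds a subset from general category constants of java.lang.Character. */
    static TypeSubset forTypes(byte... types) {
        long mask = 0;
        for (byte type : types) {
            mask |= 1L << type;   // category constants are small non-negative values
        }
        return new TypeSubset(mask);
    }

    /** Tests whether the general category of the code point is an accepted type. */
    boolean contains(int codePoint) {
        return (mask & (1L << Character.getType(codePoint))) != 0;
    }

    public static void main(String[] args) {
        TypeSubset subset = TypeSubset.forTypes(
                Character.LOWERCASE_LETTER, Character.DECIMAL_DIGIT_NUMBER);
        System.out.println(subset.contains('a'));   // true
        System.out.println(subset.contains('A'));   // false
        System.out.println(subset.contains('7'));   // true
    }
}
```

With the real class, the equivalent calls would presumably be Characters.Filter.forTypes(...) followed by contains(int), as documented in the method details.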
Relationship with international standards
ISO 19162:2015 §B.5.2 recommends ignoring spaces, case and the following characters when comparing two identified object names: “_” (underscore), “-” (minus sign), “/” (solidus), “(” (left parenthesis) and “)” (right parenthesis).
The same specification also limits the set of valid characters in a name to the following (§6.3.1):
A-Z a-z 0-9 _ [ ] ( ) { } < = > . , : ; + - (space) % & ' " * ^ / \ ? | °
Note: SIS does not enforce this restriction in its programmatic API,
but may perform some character substitutions at Well Known Text (WKT) formatting time.
If we take only the characters in the above list which are valid in a Unicode identifier and remove the characters that ISO 19162 recommends ignoring, the only characters left are letters and digits.
- Since:
- 0.3
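As an illustration of the relationship described above, the sketch below compares two names by keeping only letters and digits and ignoring case. The NameComparison class and its method names are hypothetical, and the approach is a simplification: it drops every character that is not a letter or digit, which is broader than the exact list of characters that ISO 19162 recommends ignoring.

```java
/** Hypothetical helper, not part of SIS: compares names on letters and digits only. */
final class NameComparison {
    /** Keeps only letter and digit code points, folded to lower case. */
    static String simplify(String name) {
        StringBuilder sb = new StringBuilder(name.length());
        name.codePoints()
            .filter(Character::isLetterOrDigit)   // drops spaces, '_', '-', '/', '(' and ')'
            .map(Character::toLowerCase)          // ignores case
            .forEach(sb::appendCodePoint);
        return sb.toString();
    }

    /** Returns true if the two names are equal after simplification. */
    static boolean sameName(String a, String b) {
        return simplify(a).equals(simplify(b));
    }

    public static void main(String[] args) {
        System.out.println(sameName("WGS 84 / UTM zone 32N", "wgs84_utm-zone(32n)"));   // true
        System.out.println(sameName("WGS 84", "WGS 72"));                               // false
    }
}
```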
Field Summary
Modifier and Type | Field | Description
static final Characters.Filter | LETTERS_AND_DIGITS | The subset of all characters for which Character.isLetterOrDigit(int) returns true.
static final Characters.Filter | UNICODE_IDENTIFIER | The subset of all characters for which Character.isUnicodeIdentifierPart(int) returns true, excluding ignorable characters.
Method Summary
Modifier and Type | Method | Description
boolean | contains(int codePoint) | Returns true if this subset contains the given Unicode character.
static Characters.Filter | forTypes(byte... types) | Returns a subset representing the union of all Unicode characters of the given types.
Methods inherited from class Character.Subset
equals, hashCode, toString
Field Details
LETTERS_AND_DIGITS
The subset of all characters for which Character.isLetterOrDigit(int) returns true. This subset includes the following general categories: Character.LOWERCASE_LETTER, UPPERCASE_LETTER, TITLECASE_LETTER, MODIFIER_LETTER, OTHER_LETTER and DECIMAL_DIGIT_NUMBER.
UNICODE_IDENTIFIER
The subset of all characters for which Character.isUnicodeIdentifierPart(int) returns true, excluding ignorable characters. This subset includes all the LETTERS_AND_DIGITS categories with the addition of the following ones: Character.LETTER_NUMBER, CONNECTOR_PUNCTUATION, NON_SPACING_MARK and COMBINING_SPACING_MARK.
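To make the difference between the two predefined subsets concrete, the sketch below tests code points against the category lists given above, using Character.getType(int). The CategorySketch class and its method names are hypothetical, and the code is only an approximation of the documented behavior, not the SIS implementation.

```java
/** Hypothetical sketch of the two category lists documented above; not the SIS code. */
final class CategorySketch {
    /** Categories listed for LETTERS_AND_DIGITS. */
    static boolean isLetterOrDigitCategory(int codePoint) {
        switch (Character.getType(codePoint)) {
            case Character.LOWERCASE_LETTER:
            case Character.UPPERCASE_LETTER:
            case Character.TITLECASE_LETTER:
            case Character.MODIFIER_LETTER:
            case Character.OTHER_LETTER:
            case Character.DECIMAL_DIGIT_NUMBER:
                return true;
            default:
                return false;
        }
    }

    /** Categories listed for UNICODE_IDENTIFIER: the above plus four more. */
    static boolean isIdentifierCategory(int codePoint) {
        if (isLetterOrDigitCategory(codePoint)) {
            return true;
        }
        switch (Character.getType(codePoint)) {
            case Character.LETTER_NUMBER:
            case Character.CONNECTOR_PUNCTUATION:
            case Character.NON_SPACING_MARK:
            case Character.COMBINING_SPACING_MARK:
                return true;
            default:
                return false;
        }
    }

    public static void main(String[] args) {
        // '_' is CONNECTOR_PUNCTUATION: an identifier part, but not a letter or digit.
        System.out.println(isLetterOrDigitCategory('_'));   // false
        System.out.println(isIdentifierCategory('_'));      // true
        // U+0000 is a CONTROL character, ignorable in identifiers, so it is in neither list.
        System.out.println(isIdentifierCategory(0));        // false
    }
}
```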
Method Details
contains
public boolean contains(int codePoint)
Returns true if this subset contains the given Unicode character.
- Parameters:
codePoint - the Unicode character, as a code point value.
- Returns:
true if this subset contains the given character.
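Because the argument is a code point rather than a char, text should be scanned by code points so that supplementary characters are tested as a whole. The sketch below shows this with a generic IntPredicate standing in for the subset test; the CodePointScan class is hypothetical and Character.isLetterOrDigit(int) is used in place of a Filter instance.

```java
import java.util.function.IntPredicate;

/** Hypothetical helper showing code point iteration; not part of SIS. */
final class CodePointScan {
    /** Returns true if every code point of the text satisfies the given test. */
    static boolean allMatch(CharSequence text, IntPredicate subset) {
        return text.codePoints().allMatch(subset);
    }

    public static void main(String[] args) {
        // "A" followed by U+10400 (DESERET CAPITAL LETTER LONG I), stored as a surrogate pair.
        String text = "A" + new String(Character.toChars(0x10400));

        // Scanning by code points sees two letters.
        System.out.println(allMatch(text, Character::isLetterOrDigit));          // true

        // Scanning by char values sees the two surrogates, which are not letters.
        System.out.println(text.chars().allMatch(Character::isLetterOrDigit));   // false
    }
}
```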
forTypes
public static Characters.Filter forTypes(byte... types)
Returns a subset representing the union of all Unicode characters of the given types.
- Parameters:
types - the character types, as Character constants.
- Returns:
the subset of Unicode characters of the given types.