CharSequence instances. Some methods defined in this class duplicate the functionalities already provided in the standard String class, but work on a generic CharSequence instance instead of String.
Unicode support
Every method defined in this class works on code points instead of characters when appropriate. Consequently, those methods should behave correctly with characters outside the Basic Multilingual Plane (BMP).
Policy on space characters
Java defines two methods for testing if a character is a white space: Character.isWhitespace(int) and Character.isSpaceChar(int).
Those two methods differ in the way they handle no-break spaces, tabulations and line feeds. The general policy in the SIS library is:
- Use isWhitespace(…) when separating entities (words, numbers, tokens, etc.) in a list. Using that method, characters separated by a no-break space are considered as part of the same entity.
- Use isSpaceChar(…) when parsing a single entity, for example a single word. Using this method, no-break spaces are considered as part of the entity while line feeds or tabulations are entity boundaries.
For example, numbers formatted in the French locale use no-break spaces as group separators. When parsing a list of such numbers, isWhitespace(…) is appropriate for skipping spaces between the numbers. But if there are spaces to skip inside a single number, then isSpaceChar(…) is a good choice for accepting no-break spaces and for stopping the parse operation at tabulations or line feed characters. A tabulation or line feed between two characters is very likely to separate two distinct values.
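The policy above can be illustrated with the standard Character methods alone. The following is a minimal sketch, not SIS code: the class SpacePolicySketch and its methods splitEntities and parseGroupedNumber are hypothetical names chosen for this example. It splits a list of numbers using U+00A0 no-break spaces as group separators with isWhitespace(…), then removes the space characters inside each number with isSpaceChar(…).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper illustrating the space policy described above; not part of the SIS API. */
final class SpacePolicySketch {
    /** Splits a list of entities on Character.isWhitespace, so no-break spaces stay inside an entity. */
    static List<String> splitEntities(String text) {
        List<String> entities = new ArrayList<>();
        int i = 0, n = text.length();
        while (i < n) {
            // Skip the separators between two entities (spaces, tabulations, line feeds).
            while (i < n && Character.isWhitespace(text.codePointAt(i))) {
                i += Character.charCount(text.codePointAt(i));
            }
            int start = i;
            // Consume the entity itself; U+00A0 is not a whitespace according to isWhitespace(…).
            while (i < n && !Character.isWhitespace(text.codePointAt(i))) {
                i += Character.charCount(text.codePointAt(i));
            }
            if (i > start) {
                entities.add(text.substring(start, i));
            }
        }
        return entities;
    }

    /** Parses a single number, ignoring the space characters (including no-break spaces) inside it. */
    static long parseGroupedNumber(String entity) {
        StringBuilder digits = new StringBuilder();
        entity.codePoints()
              .filter(cp -> !Character.isSpaceChar(cp))   // drops U+00A0 and U+0020
              .forEach(digits::appendCodePoint);
        return Long.parseLong(digits.toString());
    }
}
```

With the input "1\u00A0000\t2\u00A0500\n42", the tabulation and line feed act as entity boundaries while the no-break spaces stay inside "1 000" and "2 500".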
In practice, the Format implementations in the SIS library typically use isSpaceChar(…) while most of the rest of the SIS library, including this CharSequences class, consistently uses isWhitespace(…).
Note that the String.trim() method doesn't follow any of those policies and should generally be avoided. That trim() method removes every character with a code point less than or equal to U+0020 (the space and the C0 control characters), without distinguishing whether those characters are spaces or not, and ignores all other Unicode spaces.
The trimWhitespaces(String) method defined in this class, or String.strip() since JDK 11, can be used as an alternative.
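The difference between String.trim() and a trim based on Character.isWhitespace(int) can be observed with the standard String.strip() method (JDK 11 or later). The class TrimVersusStrip and its sample string are hypothetical, chosen for this sketch: the sample starts with the ESC character of an ANSI escape sequence and ends with an EM SPACE (U+2003).

```java
/** Hypothetical comparison of String.trim() and String.strip(); requires JDK 11 or later. */
final class TrimVersusStrip {
    // U+001B (ESC) starts an ANSI escape sequence; U+2003 (EM SPACE) is a Unicode space.
    static final String SAMPLE = "\u001B[1m bold\u2003";

    /** trim() removes every char <= U+0020: the ESC is lost but the EM SPACE is kept. */
    static String withTrim(String s) {
        return s.trim();
    }

    /** strip() removes only Character.isWhitespace(int) code points: the ESC is kept, the EM SPACE removed. */
    static String withStrip(String s) {
        return s.strip();
    }
}
```

trim() returns "[1m bold" followed by the EM SPACE, which breaks the escape sequence, while strip() keeps the ESC character and removes the trailing Unicode space.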
Handling of null values
Most methods in this class accept a null CharSequence argument. In such cases the method return value is either a null CharSequence, an empty array, or a 0 or false primitive value computed as if the input were an empty string.
- Since:
0.3
-
Field Summary
-
Method Summary
Modifier and Type | Method | Description
static CharSequence | camelCaseToAcronym(CharSequence text) | Creates an acronym from the given text.
static CharSequence | camelCaseToSentence(CharSequence identifier) | Given a string in camel case (typically an identifier), returns a string formatted like an English sentence.
static CharSequence | camelCaseToWords(CharSequence identifier, boolean toLowerCase) | Given a string in camel case, returns a string with the same words separated by spaces.
static int | codePointCount(CharSequence text) | Returns the number of Unicode code points in the given character sequence, or 0 if null.
static int | codePointCount(CharSequence text, int fromIndex, int toIndex) | Returns the number of Unicode code points in the given character sub-sequence, or 0 if null.
static CharSequence | commonPrefix(CharSequence s1, CharSequence s2) | Returns the longest sequence of characters which is found at the beginning of the two given texts.
static CharSequence | commonSuffix(CharSequence s1, CharSequence s2) | Returns the longest sequence of characters which is found at the end of the two given texts.
static CharSequence | commonWords(CharSequence s1, CharSequence s2) | Returns the words found at the beginning and end of both texts.
static void | copyChars(CharSequence src, int srcOffset, char[] dst, int dstOffset, int length) | Copies a sequence of characters into the given char[] array.
static int | count(CharSequence text, char toSearch) | Counts the number of occurrences of the given character in the given character sequence.
static int | count(CharSequence text, String toSearch) | Returns the number of occurrences of the toSearch string in the given text.
static boolean | endsWith(CharSequence text, CharSequence suffix, boolean ignoreCase) | Returns true if the given character sequence ends with the given suffix.
static boolean | equals(CharSequence s1, CharSequence s2) | Returns true if the two given texts are equal.
static boolean | equalsFiltered(CharSequence s1, CharSequence s2, Characters.Filter filter, boolean ignoreCase) | Returns true if the given texts are equal, optionally ignoring case and filtered-out characters.
static boolean | | Returns true if the two given texts are equal, ignoring case.
static int | indexOf(CharSequence text, int toSearch, int fromIndex, int toIndex) | Returns the index within the given character sequence of the first occurrence of the specified character, starting the search at the specified index.
static int | indexOf(CharSequence text, CharSequence toSearch, int fromIndex, int toIndex) | Returns the index within the given string of the first occurrence of the specified part, starting at the specified index.
static int | indexOfLineStart(CharSequence text, int numLines, int fromIndex) | Returns the index of the first character after the given number of lines.
static boolean | isAcronymForWords(CharSequence acronym, CharSequence words) | Returns true if the first string is likely to be an acronym of the second string.
static boolean | isUnicodeIdentifier(CharSequence identifier) | Returns true if the given identifier is a legal Unicode identifier.
static boolean | isUpperCase(CharSequence text) | Returns true if the given text is non-null, contains at least one upper-case character and no lower-case character.
static int | lastIndexOf(CharSequence text, int toSearch, int fromIndex, int toIndex) | Returns the index within the given character sequence of the last occurrence of the specified character, searching backward in the given index range.
static int | length(CharSequence text) | Returns the length of the given character sequence, or 0 if null.
static byte[] | parseBytes(CharSequence values, char separator, int radix) |
static double[] | parseDoubles(CharSequence values, char separator) |
static float[] | parseFloats(CharSequence values, char separator) |
static int[] | parseInts(CharSequence values, char separator, int radix) |
static long[] | parseLongs(CharSequence values, char separator, int radix) |
static short[] | parseShorts(CharSequence values, char separator, int radix) |
static boolean | regionMatches(CharSequence text, int fromIndex, CharSequence part) | Returns true if the given text at the given offset contains the given part, in a case-sensitive comparison.
static boolean | regionMatches(CharSequence text, int fromIndex, CharSequence part, boolean ignoreCase) | Returns true if the given text at the given offset contains the given part, optionally in a case-insensitive way.
static CharSequence | replace(CharSequence text, CharSequence toSearch, CharSequence replaceBy) | Replaces all occurrences of a given string in the given character sequence.
static CharSequence | shortSentence(CharSequence text, int maxLength) | Makes sure that the text string is not longer than maxLength characters.
static int | skipLeadingWhitespaces(CharSequence text, int fromIndex, int toIndex) | Returns the index of the first non-white character in the given range.
static int | skipTrailingWhitespaces(CharSequence text, int fromIndex, int toIndex) | Returns the index after the last non-white character in the given range.
static CharSequence | spaces(int length) | Returns a character sequence of the specified length filled with white spaces.
static CharSequence[] | split(CharSequence text, char separator) | Splits a text around the given character.
static CharSequence[] | splitOnEOL(CharSequence text) | Splits a text around the End Of Line (EOL) characters.
static boolean | startsWith(CharSequence text, CharSequence prefix, boolean ignoreCase) | Returns true if the given character sequence starts with the given prefix.
static CharSequence | toASCII(CharSequence text) | Replaces some Unicode characters by ASCII characters on a "best effort basis".
static CharSequence | token(CharSequence text, int fromIndex) | Returns the token starting at the given offset in the given text.
static CharSequence | trimFractionalPart(CharSequence value) | Trims the fractional part of the given formatted number, provided that it doesn't change the value.
static CharSequence | trimWhitespaces(CharSequence text) | Returns a text with leading and trailing whitespace characters omitted.
static CharSequence | trimWhitespaces(CharSequence text, int lower, int upper) | Returns a sub-sequence with leading and trailing whitespace characters omitted.
static String | trimWhitespaces(String text) | Deprecated, for removal: This API element is subject to removal in a future version.
static CharSequence | upperCaseToSentence(CharSequence identifier) | Given a string in upper case (typically a Java constant), returns a string formatted like an English sentence.
-
Field Details
-
EMPTY_ARRAY
An array of zero length. This constant plays a role equivalent to Collections.EMPTY_LIST.
Method Details
-
spaces
Returns a character sequence of the specified length filled with white spaces.
Use case
This method is typically invoked for performing right-alignment of text on the console or other devices using a monospaced font. Callers compute a value for the length argument as (desired width - used width). Since the used width may be greater than expected, this method handles negative length values as if the value was zero.
- Parameters:
length - the string length. Negative values are clamped to 0.
- Returns:
a string of the given length filled with white spaces.
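The right-alignment use case can be sketched as follows, assuming a monospaced output where one char occupies one column. RightAlignSketch, its spaces method and rightAlign are hypothetical stand-ins written for this example, not the SIS implementation; String.repeat requires JDK 11.

```java
/** Hypothetical stand-in for spaces(int), used here for right-alignment; not the SIS implementation. */
final class RightAlignSketch {
    /** Returns length spaces; negative lengths are clamped to 0 as documented above. */
    static String spaces(int length) {
        return " ".repeat(Math.max(0, length));   // String.repeat requires JDK 11
    }

    /** Right-aligns text in a column of the given width, assuming a monospaced font. */
    static String rightAlign(String text, int width) {
        // text.length() counts UTF-16 chars, which is good enough for this BMP-only sketch.
        return spaces(width - text.length()) + text;
    }
}
```

Because negative lengths are clamped, a value wider than the column is printed as-is instead of raising an exception.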
-
length
Returns the length of the given character sequence, or 0 if null.
- Parameters:
text - the character sequence from which to get the length, or null.
- Returns:
the length of the character sequence, or 0 if the argument is null.
-
codePointCount
Returns the number of Unicode code points in the given character sequence, or 0 if null. Unpaired surrogates within the text count as one code point each.
- Parameters:
text - the character sequence from which to get the count, or null.
- Returns:
the number of Unicode code points, or 0 if the argument is null.
-
codePointCount
Returns the number of Unicode code points in the given character sub-sequence, or 0 if null. Unpaired surrogates within the text count as one code point each.
This method performs the same work as the standard Character.codePointCount(CharSequence, int, int) method, except that it tries to delegate to the optimized methods from the String, StringBuilder, StringBuffer or CharBuffer classes if possible.
- Parameters:
text - the character sequence from which to get the count, or null.
fromIndex - the index from which to start the computation.
toIndex - the index after the last character to take into account.
- Returns:
the number of Unicode code points, or 0 if the argument is null.
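The delegation strategy described above can be sketched with standard JDK methods only. CodePointCountSketch is a hypothetical name and the code covers only String and StringBuilder before falling back to Character.codePointCount; it is not the SIS implementation.

```java
/** Hypothetical sketch of a null-tolerant code point count; not the SIS implementation. */
final class CodePointCountSketch {
    static int codePointCount(CharSequence text, int fromIndex, int toIndex) {
        if (text == null) {
            return 0;                                   // null counts as an empty text
        }
        if (text instanceof String) {
            // Delegate to the String method when the concrete type allows it.
            return ((String) text).codePointCount(fromIndex, toIndex);
        }
        if (text instanceof StringBuilder) {
            return ((StringBuilder) text).codePointCount(fromIndex, toIndex);
        }
        // Generic fallback for any other CharSequence (CharBuffer, custom types, ...).
        return Character.codePointCount(text, fromIndex, toIndex);
    }
}
```

For "a\uD83D\uDE00b" (a letter, an emoji outside the BMP, another letter) the length is 4 chars but the count is 3 code points; an unpaired surrogate such as "\uD800" counts as one code point.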
-
count
Returns the number of occurrences of the toSearch string in the given text. The search is case-sensitive.
- Parameters:
text - the character sequence in which to count occurrences, or null.
toSearch - the string to search in the given text. It shall contain at least one character.
- Returns:
the number of occurrences of toSearch in text, or 0 if text was null or empty.
- Throws:
NullPointerException - if the toSearch argument is null.
IllegalArgumentException - if the toSearch argument is empty.
-
count
Counts the number of occurrences of the given character in the given character sequence.
- Parameters:
text - the character sequence in which to count occurrences, or null.
toSearch - the character to count.
- Returns:
the number of occurrences of the given character, or 0 if the text is null.
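The contracts of the two count overloads (null text gives 0, null toSearch gives NullPointerException, empty toSearch gives IllegalArgumentException) can be sketched as below. CountSketch is a hypothetical name; counting non-overlapping occurrences of a string is an assumption of this sketch, since the description above does not say how overlapping matches are handled.

```java
/** Hypothetical sketch of the two count overloads; not the SIS implementation. */
final class CountSketch {
    /** Counts occurrences of a char; returns 0 if text is null. */
    static int count(CharSequence text, char toSearch) {
        int n = 0;
        if (text != null) {
            for (int i = 0; i < text.length(); i++) {
                if (text.charAt(i) == toSearch) n++;
            }
        }
        return n;
    }

    /** Counts non-overlapping occurrences (an assumption of this sketch) of toSearch in text. */
    static int count(CharSequence text, String toSearch) {
        if (toSearch.isEmpty()) {                       // NullPointerException here if toSearch is null
            throw new IllegalArgumentException("toSearch must not be empty");
        }
        if (text == null) {
            return 0;
        }
        String s = text.toString();
        int n = 0;
        for (int i = s.indexOf(toSearch); i >= 0; i = s.indexOf(toSearch, i + toSearch.length())) {
            n++;
        }
        return n;
    }
}
```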
-
indexOf
Returns the index within the given string of the first occurrence of the specified part, starting at the specified index. This method is equivalent to the following method call, except that this method works on arbitrary CharSequence objects instead of Strings only, and that the upper limit can be specified:
return text.indexOf(toSearch, fromIndex);
There is no restriction on the value of fromIndex. If negative or greater than toIndex, then the behavior of this method is as if the search started from 0 or toIndex respectively. This is consistent with the String.indexOf(String, int) behavior.
- Parameters:
text - the string in which to perform the search, or null.
toSearch - the substring for which to search.
fromIndex - the index from which to start the search.
toIndex - the index after the last character where to perform the search.
- Returns:
the index within the text of the first occurrence of the specified part, starting at the specified index, or -1 if no occurrence has been found or if the text argument is null.
- Throws:
NullPointerException - if the toSearch argument is null.
IllegalArgumentException - if the toSearch argument is empty.
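A bounded search over arbitrary CharSequence objects, following the fromIndex rules stated above, might look like this. SubSequenceIndexSketch and matchesAt are hypothetical names; clamping toIndex to the text length is an assumption of this sketch, not something the description specifies.

```java
/** Hypothetical sketch of indexOf(CharSequence, CharSequence, int, int); not the SIS implementation. */
final class SubSequenceIndexSketch {
    static int indexOf(CharSequence text, CharSequence toSearch, int fromIndex, int toIndex) {
        int length = toSearch.length();                        // NullPointerException if toSearch is null
        if (length == 0) {
            throw new IllegalArgumentException("toSearch must not be empty");
        }
        if (text == null) {
            return -1;
        }
        // Assumption of this sketch: toIndex is clamped to the text length.
        int last = Math.min(toIndex, text.length()) - length;  // last start position that still fits
        // A negative fromIndex behaves like 0; a fromIndex beyond the range finds nothing.
        for (int i = Math.max(fromIndex, 0); i <= last; i++) {
            if (matchesAt(text, i, toSearch)) {
                return i;
            }
        }
        return -1;
    }

    /** Returns true if toSearch appears in text at the given offset (case-sensitive). */
    private static boolean matchesAt(CharSequence text, int offset, CharSequence toSearch) {
        for (int j = 0; j < toSearch.length(); j++) {
            if (text.charAt(offset + j) != toSearch.charAt(j)) {
                return false;
            }
        }
        return true;
    }
}
```

Note that an occurrence must end at or before toIndex: searching "world" in "hello world" with toIndex = 10 returns -1 because the final 'd' is outside the range.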
-
indexOf
Returns the index within the given character sequence of the first occurrence of the specified character, starting the search at the specified index. If the character is not found, then this method returns -1.
There is no restriction on the value of fromIndex. If negative or greater than toIndex, then the behavior of this method is as if the search started from 0 or toIndex respectively. This is consistent with the behavior documented in String.indexOf(int, int).
- Parameters:
text - the character sequence in which to perform the search, or null.
toSearch - the Unicode code point of the character to search.
fromIndex - the index to start the search from.
toIndex - the index after the last character where to perform the search.
- Returns:
the index of the first occurrence of the given character in the specified sub-sequence, or -1 if no occurrence has been found or if the text argument is null.
-
lastIndexOf
Returns the index within the given character sequence of the last occurrence of the specified character, searching backward in the given index range. If the character is not found, then this method returns -1.
There is no restriction on the value of toIndex. If greater than the text length or less than fromIndex, then the behavior of this method is as if the search started from length or fromIndex respectively. This is consistent with the behavior documented in String.lastIndexOf(int, int).
- Parameters:
text - the character sequence in which to perform the search, or null.
toSearch - the Unicode code point of the character to search.
fromIndex - the index of the first character in the range where to perform the search.
toIndex - the index after the last character in the range where to perform the search.
- Returns:
the index of the last occurrence of the given character in the specified sub-sequence, or -1 if no occurrence has been found or if the text argument is null.
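Because toSearch is a code point rather than a char, both searches have to step over surrogate pairs. The sketch below shows one way to do that with Character.codePointAt and Character.codePointBefore; CodePointSearchSketch is a hypothetical name, and clamping toIndex to the text length in indexOf is an assumption of this sketch.

```java
/** Hypothetical code-point-aware sketches of indexOf and lastIndexOf; not the SIS implementation. */
final class CodePointSearchSketch {
    /** First index of the code point in text[fromIndex, toIndex), or -1. */
    static int indexOf(CharSequence text, int toSearch, int fromIndex, int toIndex) {
        if (text == null) {
            return -1;
        }
        toIndex = Math.min(toIndex, text.length());            // assumption: clamp to the text length
        int i = Math.max(fromIndex, 0);                        // negative fromIndex behaves like 0
        while (i < toIndex) {
            int c = Character.codePointAt(text, i);            // reads a full surrogate pair if present
            if (c == toSearch) {
                return i;
            }
            i += Character.charCount(c);
        }
        return -1;
    }

    /** Last index of the code point in text[fromIndex, toIndex), searching backward, or -1. */
    static int lastIndexOf(CharSequence text, int toSearch, int fromIndex, int toIndex) {
        if (text == null) {
            return -1;
        }
        int i = Math.min(toIndex, text.length());              // toIndex beyond the text behaves like length
        fromIndex = Math.max(fromIndex, 0);
        while (i > fromIndex) {
            int c = Character.codePointBefore(text, i);        // reads a full surrogate pair if present
            i -= Character.charCount(c);
            if (c == toSearch && i >= fromIndex) {
                return i;
            }
        }
        return -1;
    }
}
```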
-
indexOfLineStart
Returns the index of the first character after the given number of lines. This method counts the number of occurrences of '\n', '\r' or "\r\n" starting from the given position. When numLines occurrences have been found, the index of the first character after the last occurrence is returned.
If the numLines argument is positive, this method searches forward. If negative, this method searches backward. If 0, this method returns the beginning of the current line.
If this method reaches the end of text while searching forward, then text.length() is returned. If this method reaches the beginning of text while searching backward, then 0 is returned.
- Parameters:
text - the string in which to skip a determined amount of lines.
numLines - the number of lines to skip. Can be positive, zero or negative.
fromIndex - index at which to start the search, from 0 to text.length() inclusive.
- Returns:
index of the first character after the last skipped line.
- Throws:
NullPointerException - if the text argument is null.
IndexOutOfBoundsException - if fromIndex is out of bounds.
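The forward case (positive numLines) can be sketched as a simple scan that treats "\r\n" as a single end of line. LineStartSketch and skipLinesForward are hypothetical names; this sketch deliberately rejects zero and negative values, so it does not cover the backward search described above.

```java
/** Hypothetical forward-only sketch of indexOfLineStart for numLines > 0; not the SIS implementation. */
final class LineStartSketch {
    static int skipLinesForward(CharSequence text, int numLines, int fromIndex) {
        int length = text.length();                            // NullPointerException if text is null
        if (fromIndex < 0 || fromIndex > length) {
            throw new IndexOutOfBoundsException("fromIndex: " + fromIndex);
        }
        if (numLines <= 0) {
            throw new IllegalArgumentException("This sketch only handles positive numLines.");
        }
        int i = fromIndex;
        while (numLines > 0 && i < length) {
            char c = text.charAt(i++);
            if (c == '\r') {
                if (i < length && text.charAt(i) == '\n') {
                    i++;                                       // "\r\n" is a single end of line
                }
                numLines--;
            } else if (c == '\n') {
                numLines--;
            }
        }
        return i;                                              // text.length() if the end was reached
    }
}
```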
-
skipLeadingWhitespaces
Returns the index of the first non-white character in the given range. If the given range contains only space characters, then this method returns the index of the first character after the given range, which is always equal to or greater than toIndex. Note that this character may not exist if toIndex is equal to the text length.
Special cases:
- If fromIndex is greater than toIndex, then this method unconditionally returns fromIndex.
- If the given range contains only space characters and the character at toIndex-1 is the high surrogate of a valid supplementary code point, then this method returns toIndex+1, which is the index of the next code point.
- If fromIndex is negative or toIndex is greater than the text length, then the behavior of this method is undefined.
Space characters are identified by the Character.isWhitespace(int) method.
- Parameters:
text - the string in which to perform the search (cannot be null).
fromIndex - the index from which to start the search (cannot be negative).
toIndex - the index after the last character where to perform the search.
- Returns:
the index within the text of the first occurrence of a non-space character, starting at the specified index, or a value equal to or greater than toIndex if none.
- Throws:
NullPointerException - if the text argument is null.
-
skipTrailingWhitespaces
Returns the index after the last non-white character in the given range. If the given range contains only space characters, then this method returns the index of the first character in the given range, which is always equal to or lower than fromIndex.
Special cases:
- If fromIndex is greater than toIndex, then this method unconditionally returns toIndex.
- If the given range contains only space characters and the character at fromIndex is the low surrogate of a valid supplementary code point, then this method returns fromIndex-1, which is the index of the code point.
- If fromIndex is negative or toIndex is greater than the text length, then the behavior of this method is undefined.
Space characters are identified by the Character.isWhitespace(int) method.
- Parameters:
text - the string in which to perform the search (cannot be null).
fromIndex - the index from which to start the search (cannot be negative).
toIndex - the index after the last character where to perform the search.
- Returns:
the index within the text of the last occurrence of a non-space character, starting at the specified index, or a value equal to or lower than fromIndex if none.
- Throws:
NullPointerException - if the text argument is null.
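Both skip methods can be sketched as code-point loops over Character.isWhitespace(int), which naturally produce the special cases listed above (fromIndex greater than toIndex returns fromIndex or toIndex unchanged). WhitespaceSkipSketch is a hypothetical name; this is not the SIS implementation.

```java
/** Hypothetical sketches of the two skip methods; not the SIS implementation. */
final class WhitespaceSkipSketch {
    /** Index of the first non-whitespace code point in [fromIndex, toIndex), or a value >= toIndex if none. */
    static int skipLeadingWhitespaces(CharSequence text, int fromIndex, int toIndex) {
        while (fromIndex < toIndex) {
            int c = Character.codePointAt(text, fromIndex);    // may read a low surrogate at toIndex
            if (!Character.isWhitespace(c)) break;
            fromIndex += Character.charCount(c);
        }
        return fromIndex;
    }

    /** Index after the last non-whitespace code point in [fromIndex, toIndex), or a value <= fromIndex if none. */
    static int skipTrailingWhitespaces(CharSequence text, int fromIndex, int toIndex) {
        while (toIndex > fromIndex) {
            int c = Character.codePointBefore(text, toIndex);  // may read a high surrogate before fromIndex
            if (!Character.isWhitespace(c)) break;
            toIndex -= Character.charCount(c);
        }
        return toIndex;
    }
}
```

Consistent with the policy on space characters, a leading no-break space (U+00A0) is not skipped.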
-
split
Splits a text around the given character. The array returned by this method contains all subsequences of the given text that are terminated by the given character or by the end of the text. The subsequences in the array are in the order in which they occur in the given text. If the character is not found in the input, then the resulting array has just one element, which is the whole given text.
This method is similar to the standard String.split(String) method except for the following:
- It accepts generic character sequences.
- It accepts a null argument, in which case an empty array is returned.
- The separator is a simple character instead of a regular expression.
- If the separator argument is '\n' or '\r', then this method splits around any of the "\r", "\n" or "\r\n" character sequences.
- The leading and trailing spaces of each subsequence are trimmed.
- Parameters:
text - the text to split, or null.
separator - the delimiting character (typically the comma).
- Returns:
the array of subsequences computed by splitting the given text around the given character, or an empty array if text was null.
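The differences from String.split(String) listed above can be sketched in a few lines. SplitSketch and its private trim helper are hypothetical names, not the SIS implementation; how trailing empty subsequences are handled is not specified above, and this sketch simply keeps them.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of split(CharSequence, char) following the rules listed above; not the SIS code. */
final class SplitSketch {
    static CharSequence[] split(CharSequence text, char separator) {
        if (text == null) {
            return new CharSequence[0];                        // null input gives an empty array
        }
        boolean eol = (separator == '\n' || separator == '\r'); // split on any of "\r", "\n", "\r\n"
        List<CharSequence> parts = new ArrayList<>();
        int start = 0, n = text.length();
        for (int i = 0; i < n; i++) {
            char c = text.charAt(i);
            boolean isSeparator = eol ? (c == '\n' || c == '\r') : (c == separator);
            if (isSeparator) {
                parts.add(trim(text, start, i));
                if (eol && c == '\r' && i + 1 < n && text.charAt(i + 1) == '\n') {
                    i++;                                       // "\r\n" counts as a single separator
                }
                start = i + 1;
            }
        }
        parts.add(trim(text, start, n));                       // last subsequence ends with the text
        return parts.toArray(new CharSequence[0]);
    }

    /** Removes leading and trailing Character.isWhitespace characters from text[lower, upper). */
    private static CharSequence trim(CharSequence text, int lower, int upper) {
        while (lower < upper && Character.isWhitespace(text.charAt(lower))) lower++;
        while (upper > lower && Character.isWhitespace(text.charAt(upper - 1))) upper--;
        return text.subSequence(lower, upper);
    }
}
```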
-
splitOnEOL
Splits a text around the End Of Line (EOL) characters. EOL characters can be any of the "\r", "\n" or "\r\n" sequences. Each element in the returned array will be a single line. If the given text is already a single line, then this method returns a singleton containing only the given text.
Notes:
- Unlike split(toSplit, '\n'), this method does not remove whitespaces.
- This method does not check for the Unicode line separator and paragraph separator.
Performance note
Before Java 7 update 6, this method was usually cheap because all string instances created by String.substring(int,int) shared the same char[] internal array. Since that update (and therefore in Java 8 and later), the String implementation copies the data into new arrays. Consequently, it is better to work with indices rather than this method for splitting large Strings. However, this method is still useful for other CharSequence implementations providing an efficient subSequence(int,int) method.
- Parameters:
text - the multi-line text from which to get the individual lines, or null.
- Returns:
the lines in the text, or an empty array if the given text was null.
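The performance note recommends working with indices for large Strings. One way to do that is to record the start and end of each line instead of creating substrings, as in this sketch. LineBoundsSketch and lineBounds are hypothetical names; the EOL rules follow the description above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical index-based alternative to splitOnEOL: records line bounds instead of creating substrings. */
final class LineBoundsSketch {
    /** Returns {start, end} pairs, one per line; EOL is any of "\r", "\n" or "\r\n". */
    static List<int[]> lineBounds(CharSequence text) {
        List<int[]> bounds = new ArrayList<>();
        int start = 0, n = text.length();
        for (int i = 0; i < n; i++) {
            char c = text.charAt(i);
            if (c == '\r' || c == '\n') {
                bounds.add(new int[] {start, i});              // line content excludes the EOL
                if (c == '\r' && i + 1 < n && text.charAt(i + 1) == '\n') {
                    i++;                                       // skip the '\n' of "\r\n"
                }
                start = i + 1;
            }
        }
        bounds.add(new int[] {start, n});                      // last line (possibly empty)
        return bounds;
    }
}
```

A caller can then scan each line in place, or call substring(bounds[0], bounds[1]) only for the lines it actually needs.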
-
parseDoubles
public static double[] parseDoubles(CharSequence values, char separator) throws NumberFormatException
Splits the given text around the given character, then parses each item as a double. Empty sub-sequences are parsed as Double.NaN.
- Parameters:
values - the text containing the values to parse, or null.
separator - the delimiting character (typically the comma).
- Returns:
the array of numbers parsed from the given text, or an empty array if values was null.
- Throws:
NumberFormatException - if at least one number cannot be parsed.
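The parsing rules above (null gives an empty array, empty items give NaN, invalid items throw NumberFormatException) can be sketched with the standard library. ParseDoublesSketch is a hypothetical name; it uses a regular-expression split and String.strip() (JDK 11) instead of the split method of this class, so it is not the SIS implementation.

```java
import java.util.regex.Pattern;

/** Hypothetical sketch of parseDoubles(CharSequence, char); not the SIS implementation. */
final class ParseDoublesSketch {
    static double[] parseDoubles(CharSequence values, char separator) {
        if (values == null) {
            return new double[0];                              // null gives an empty array
        }
        // Limit -1 keeps trailing empty items, which are then parsed as NaN.
        String[] items = values.toString().split(Pattern.quote(String.valueOf(separator)), -1);
        double[] result = new double[items.length];
        for (int i = 0; i < items.length; i++) {
            String item = items[i].strip();                    // JDK 11; trims isWhitespace characters
            // Empty items become NaN; Double.parseDouble throws NumberFormatException otherwise.
            result[i] = item.isEmpty() ? Double.NaN : Double.parseDouble(item);
        }
        return result;
    }
}
```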
-
parseFloats
Splits the given text around the given character, then parses each item as a float. Empty sub-sequences are parsed as Float.NaN.
- Parameters:
values - the text containing the values to parse, or null.
separator - the delimiting character (typically the comma).
- Returns:
the array of numbers parsed from the given text, or an empty array if values was null.
- Throws:
NumberFormatException - if at least one number cannot be parsed.
-
parseLongs
public static long[] parseLongs(CharSequence values, char separator, int radix) throws NumberFormatException
- Parameters:
values - the text containing the values to parse, or null.
separator - the delimiting character (typically the comma).
radix - the radix to be used for parsing. This is usually 10.
- Returns:
the array of numbers parsed from the given text, or an empty array if values was null.
- Throws:
NumberFormatException - if at least one number cannot be parsed.
-
parseInts
public static int[] parseInts(CharSequence values, char separator, int radix) throws NumberFormatException
- Parameters:
values - the text containing the values to parse, or null.
separator - the delimiting character (typically the comma).
radix - the radix to be used for parsing. This is usually 10.
- Returns:
the array of numbers parsed from the given text, or an empty array if values was null.
- Throws:
NumberFormatException - if at least one number cannot be parsed.
-
parseShorts
public static short[] parseShorts(CharSequence values, char separator, int radix) throws NumberFormatException
- Parameters:
values - the text containing the values to parse, or null.
separator - the delimiting character (typically the comma).
radix - the radix to be used for parsing. This is usually 10.
- Returns:
the array of numbers parsed from the given text, or an empty array if values was null.
- Throws:
NumberFormatException - if at least one number cannot be parsed.
-
parseBytes
public static byte[] parseBytes(CharSequence values, char separator, int radix) throws NumberFormatException
- Parameters:
values - the text containing the values to parse, or null.
separator - the delimiting character (typically the comma).
radix - the radix to be used for parsing. This is usually 10.
- Returns:
the array of numbers parsed from the given text, or an empty array if values was null.
- Throws:
NumberFormatException - if at least one number cannot be parsed.
-
toASCII
Replaces some Unicode characters by ASCII characters on a "best effort basis". For example, the “é” character is replaced by “e” (without accent), the “″” symbol for seconds of angle is replaced by straight double quotes “"”, and combined characters like ㎏, ㎎, ㎝, ㎞, ㎢, ㎦, ㎖, ㎧, ㎩, ㎐, etc. are replaced by the corresponding sequences of characters.
Note: the replacement of Greek letters is a more complex task than what this method can do, since it depends on the context. For example, if the Greek letters are abbreviations for coordinate system axes like φ and λ, then the replacements depend on the enclosing coordinate system. See Transliterator for more information.
- Parameters:
text - the text to scan for Unicode characters to replace by ASCII characters, or null.
- Returns:
the given text with substitutions applied, or text if no replacement has been applied, or null if the given text was null.
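A comparable "best effort" replacement can be approximated with Unicode compatibility decomposition (NFKD) from java.text.Normalizer, followed by removal of combining marks. This is a different technique than SIS may use internally; AsciiSketch is a hypothetical name, and the explicit mapping of the prime symbols is needed because NFKD would otherwise decompose the double prime into two single primes.

```java
import java.text.Normalizer;

/** Hypothetical NFKD-based approximation of toASCII; not the SIS implementation. */
final class AsciiSketch {
    static String toASCII(String text) {
        if (text == null) {
            return null;
        }
        // Map angle symbols first: DOUBLE PRIME (U+2033) -> '"', PRIME (U+2032) -> '\''.
        String s = text.replace('\u2033', '"').replace('\u2032', '\'');
        // Compatibility decomposition: "é" -> "e" + U+0301, "㎏" (U+338F) -> "kg".
        s = Normalizer.normalize(s, Normalizer.Form.NFKD);
        // Drop the combining marks (accents) left by the decomposition.
        return s.replaceAll("\\p{M}+", "");
    }
}
```

Greek letters such as φ and λ are left unchanged by this sketch, in line with the note above.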
-
trimWhitespaces
Deprecated, for removal: This API element is subject to removal in a future version. Replaced by String.strip() in JDK 11.
Returns a string with leading and trailing whitespace characters omitted. This method is similar in purpose to String.trim(), except that the latter considers every ISO control code below 32 to be a whitespace. That String.trim() behavior has the side effect of removing the leading ESC character of ANSI escape sequences (a.k.a. X3.64), and of ignoring Unicode spaces. This trimWhitespaces(…) method is built on the more accurate Character.isWhitespace(int) method instead.
This method performs the same work as trimWhitespaces(CharSequence), but is overloaded for the String type because of its frequent use.
- Parameters:
text - the text from which to remove leading and trailing whitespaces, or null.
- Returns:
a string with leading and trailing whitespaces removed, or null if the given text was null.
-
trimWhitespaces
Returns a text with leading and trailing whitespace characters omitted. Space characters are identified by the Character.isWhitespace(int) method.
This method is the generic version of trimWhitespaces(String).
- Parameters:
text - the text from which to remove leading and trailing whitespaces, or null.
- Returns:
a character sequence with leading and trailing whitespaces removed, or null if the given text was null.
-
trimWhitespaces
Returns a sub-sequence with leading and trailing whitespace characters omitted. Space characters are identified by the Character.isWhitespace(int) method.
Invoking this method is functionally equivalent to the following code snippet, except that the subSequence method is invoked only once instead of two times:
text = trimWhitespaces(text.subSequence(lower, upper));
- Parameters:
text - the text from which to remove leading and trailing white spaces.
lower - index of the first character to consider for inclusion in the sub-sequence.
upper - index after the last character to consider for inclusion in the sub-sequence.
- Returns:
a character sequence with leading and trailing white spaces removed, or null if the text argument is null.
- Throws:
IndexOutOfBoundsException - if lower or upper is out of bounds.
-
trimFractionalPart
Trims the fractional part of the given formatted number, provided that it doesn't change the value. This method assumes that the number is formatted in the US locale, typically by the Double.toString(double) method.
More specifically, if the given value ends with a '.' character followed by a sequence of '0' characters, then those characters are omitted. Otherwise this method returns the text unchanged. This is an "all or nothing" method: either the fractional part is completely removed, or it is left unchanged.
Examples
This method returns "4" if the given value is "4.", "4.0" or "4.00", but returns "4.10" unchanged (including the trailing '0' character) if the input is "4.10".
Use case
This method is useful before parsing a number if that number should preferably be parsed as an integer before attempting to parse it as a floating point number.
- Parameters:
value - the value to trim if possible, or null.
- Returns:
the value without the trailing ".0" part (if any), or null if the given text was null.
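The "all or nothing" rule above amounts to scanning back over trailing '0' characters and checking whether a '.' comes next. TrimFractionSketch is a hypothetical name for this sketch, not the SIS implementation.

```java
/** Hypothetical sketch of trimFractionalPart(CharSequence); not the SIS implementation. */
final class TrimFractionSketch {
    static CharSequence trimFractionalPart(CharSequence value) {
        if (value == null) {
            return null;
        }
        int i = value.length();
        while (i > 0 && value.charAt(i - 1) == '0') {
            i--;                                               // skip the trailing '0' characters
        }
        if (i > 0 && value.charAt(i - 1) == '.') {
            return value.subSequence(0, i - 1);                // ".", ".0", ".00"... removed entirely
        }
        return value;                                          // all or nothing: left unchanged
    }
}
```

Integers such as "100" are returned unchanged, because their trailing zeros are not preceded by a '.' character.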
-
shortSentence
Makes sure that the text string is not longer than maxLength characters. If text is not longer, then it is returned unchanged. Otherwise this method returns a copy of text with some characters substituted by the "(…)" string.
If the text needs to be shortened, then this method tries to apply the above-cited substitution between two words. For example, the following text:
"This sentence given as an example is way too long to be included in a short name."
may be shortened to something like this:
"This sentence given (…) in a short name."
- Parameters:
text - the sentence to reduce if it is too long, or null.
maxLength - the maximum length allowed for text.
- Returns:
a sentence not longer than maxLength, or null if the given text was null.
-
upperCaseToSentence
Given a string in upper case (typically a Java constant), returns a string formatted like an English sentence. This heuristic method performs the following steps:
- Replace all occurrences of '_' by spaces.
- Convert all letters except the first one to lower case letters using Character.toLowerCase(int). Note that this method does not use the String.toLowerCase() method. Consequently, the system locale is ignored. This method behaves as if the conversion were done in the root locale.
Note that those heuristic rules may be modified in future SIS versions, depending on the practical experience gained.
- Parameters:
identifier - the name of a Java constant, or null.
- Returns:
the identifier like an English sentence, or null if the given identifier argument was null.
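The two steps above can be sketched as a single pass over code points. UpperCaseToSentenceSketch is a hypothetical name; since the heuristic rules may change in future SIS versions, this sketch only mirrors the steps as currently described.

```java
/** Hypothetical sketch of upperCaseToSentence(CharSequence); not the SIS implementation. */
final class UpperCaseToSentenceSketch {
    static String upperCaseToSentence(CharSequence identifier) {
        if (identifier == null) {
            return null;
        }
        StringBuilder buffer = new StringBuilder(identifier.length());
        boolean first = true;
        for (int i = 0; i < identifier.length(); ) {
            int c = Character.codePointAt(identifier, i);
            i += Character.charCount(c);
            if (c == '_') {
                c = ' ';                                       // '_' becomes a space
            } else if (!first) {
                c = Character.toLowerCase(c);                  // locale-independent lower case
            }
            buffer.appendCodePoint(c);
            first = false;
        }
        return buffer.toString();
    }
}
```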
-
camelCaseToSentence
Given a string in camel case (typically an identifier), returns a string formatted like an English sentence. This heuristic method performs the following steps:
- Invoke camelCaseToWords(CharSequence, boolean), which separates the words on the basis of character case. For example, "transferFunctionType" becomes "transfer function type". This works fine for ISO 19115 identifiers.
- Next replace all occurrences of '_' by spaces in order to take into account another common naming convention, which uses '_' as a word separator. This convention is used by netCDF attributes like "project_name".
- Finally ensure that the first character is upper-case.
Exception to the above rules
If the given identifier contains only upper-case letters, digits and the '_' character, then the identifier is returned "as is" except for the '_' characters which are replaced by '-'. This works well for producing identifiers like "UTF-8" or "ISO-LATIN-1" for instance.
Note that those heuristic rules may be modified in future SIS versions, depending on the practical experience gained.
- Parameters:
identifier - an identifier with no space, words begin with an upper-case character, or null.
- Returns:
the identifier with spaces inserted after what looks like words, or null if the given identifier argument was null.
-
camelCaseToWords
Given a string in camel case, returns a string with the same words separated by spaces. A word begins with an upper-case character following a lower-case character. For example, if the given string is "PixelInterleavedSampleModel", then this method returns "Pixel Interleaved Sample Model" or "Pixel interleaved sample model" depending on the value of the toLowerCase argument.
If toLowerCase is false, then this method inserts spaces but does not change the case of characters. If toLowerCase is true, then this method changes to lower case the first character after each space inserted by this method (note that this intentionally excludes the very first character in the given string), except if the second character of the word is upper case, in which case the word is assumed to be an acronym.
The given string is usually a programmatic identifier like a class name or a method name.
- Parameters:
identifier - an identifier with no space, words begin with an upper-case character.
toLowerCase - true for changing the first character of words to lower case, except for the first word and acronyms.
- Returns:
the identifier with spaces inserted after what looks like words, or null if the given identifier argument was null.
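The word-splitting rule above (a word starts at an upper-case character that follows a lower-case one) and the acronym exception can be sketched as follows. CamelCaseSketch is a hypothetical name; it is an illustration of the documented rule, not the SIS code.

```java
/** Hypothetical sketch of camelCaseToWords(CharSequence, boolean); not the SIS implementation. */
final class CamelCaseSketch {
    static String camelCaseToWords(CharSequence identifier, boolean toLowerCase) {
        if (identifier == null) {
            return null;
        }
        StringBuilder buffer = new StringBuilder();
        int previous = 0;                                      // no previous character yet
        for (int i = 0; i < identifier.length(); ) {
            int c = Character.codePointAt(identifier, i);
            i += Character.charCount(c);
            int out = c;
            // A word begins with an upper-case character following a lower-case character.
            if (Character.isUpperCase(c) && Character.isLowerCase(previous)) {
                buffer.append(' ');
                // If the second character of the word is upper case, assume an acronym and keep it.
                boolean acronym = i < identifier.length()
                        && Character.isUpperCase(Character.codePointAt(identifier, i));
                if (toLowerCase && !acronym) {
                    out = Character.toLowerCase(c);
                }
            }
            buffer.appendCodePoint(out);
            previous = c;
        }
        return buffer.toString();
    }
}
```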
-
camelCaseToAcronym
Creates an acronym from the given text. This method returns a string containing the first character of each word, where the words are separated by the camel case convention, the '_' character, or any character which is not a Unicode identifier part (including spaces).
An exception to the above rule happens if the given text is a Unicode identifier without the '_' character, and every character is upper case. In such a case the text is returned unchanged on the assumption that it is already an acronym.
Examples: given "northEast", this method returns "NE". Given "Open Geospatial Consortium", this method returns "OGC".
- Parameters:
text - the text for which to create an acronym, or null.
- Returns:
the acronym, or null if the given text was null.
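Below is a minimal sketch of the acronym rules, resting on two assumptions. First, the letters taken from each word are converted to upper case, as the "northEast" example suggests. Second, "every character is upper case" is read as "at least one upper-case letter and no lower-case letter". AcronymSketch is a hypothetical class and not part of SIS.

```java
/**
 * Hypothetical sketch of the camelCaseToAcronym rules; not the Apache SIS implementation.
 * Assumption: acronym letters are converted to upper case ("northEast" gives "NE").
 */
final class AcronymSketch {
    static String camelCaseToAcronym(CharSequence text) {
        if (text == null) {
            return null;
        }
        String s = text.toString();
        // Exception: an identifier without '_' whose letters are all upper case is already an acronym.
        if (s.indexOf('_') < 0 && isUnicodeIdentifier(s)
                && s.codePoints().noneMatch(Character::isLowerCase)
                && s.codePoints().anyMatch(Character::isUpperCase)) {
            return s;
        }
        StringBuilder acronym = new StringBuilder();
        boolean wordStart = true;               // true at the beginning and after a separator
        int previous = -1;
        for (int i = 0; i < s.length(); ) {
            int c = s.codePointAt(i);
            i += Character.charCount(c);
            if (c == '_' || !Character.isUnicodeIdentifierPart(c)) {
                wordStart = true;               // '_', spaces and punctuation separate words
            } else {
                boolean camel = previous >= 0 && Character.isLowerCase(previous) && Character.isUpperCase(c);
                if (wordStart || camel) {
                    acronym.appendCodePoint(Character.toUpperCase(c));
                }
                wordStart = false;
            }
            previous = c;
        }
        return acronym.toString();
    }

    private static boolean isUnicodeIdentifier(String s) {
        return !s.isEmpty()
                && Character.isUnicodeIdentifierStart(s.codePointAt(0))
                && s.codePoints().skip(1).allMatch(Character::isUnicodeIdentifierPart);
    }
}
```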
-
isAcronymForWords
Returns true if the first string is likely to be an acronym of the second string. An acronym is a sequence of letters or digits built from at least one character of each word in the words string. More than one character from the same word may appear in the acronym, but they must always be the first consecutive characters. The comparison is case-insensitive. If any of the given arguments is null, this method returns false.
Example
Given the "Open Geospatial Consortium" words, the following strings are recognized as acronyms: "OGC", "ogc", "O.G.C.", "OpGeoCon".
- Parameters:
acronym - a possible acronym of the sequence of words, or null.
words - the sequence of words, or null.
- Returns:
true if the first string is an acronym of the second one.
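The acronym test can be written as a small backtracking search. For each word, the search tries to consume the first one, two or more characters of that word from the acronym, then recurses on the next word. AcronymMatchSketch is hypothetical and makes two assumptions: words are separated by runs of characters that are neither letters nor digits, and non-letter characters in the acronym (such as the dots in "O.G.C.") are ignored. The real method may also recognize camel-case words.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical backtracking sketch of isAcronymForWords; not the Apache SIS implementation.
 * Assumptions: words are separated by runs of non-letter, non-digit characters,
 * and characters other than letters and digits in the acronym are ignored.
 */
final class AcronymMatchSketch {
    static boolean isAcronymForWords(CharSequence acronym, CharSequence words) {
        if (acronym == null || words == null) {
            return false;
        }
        StringBuilder letters = new StringBuilder();
        acronym.codePoints().filter(Character::isLetterOrDigit).forEach(letters::appendCodePoint);
        List<String> list = new ArrayList<>();
        for (String w : words.toString().split("[^\\p{L}\\p{Nd}]+")) {
            if (!w.isEmpty()) {
                list.add(w.toLowerCase(Locale.ROOT));
            }
        }
        String a = letters.toString().toLowerCase(Locale.ROOT);
        return !a.isEmpty() && matches(a, 0, list, 0);
    }

    /** Tries to consume the acronym from index ai using the words from index wi onward. */
    private static boolean matches(String a, int ai, List<String> words, int wi) {
        if (wi == words.size()) {
            return ai == a.length();            // every word used and every acronym letter consumed
        }
        String w = words.get(wi);
        // Take the first k characters of the current word (k >= 1), then recurse on the next word.
        for (int k = 1; k <= w.length() && ai + k <= a.length(); k++) {
            if (a.charAt(ai + k - 1) != w.charAt(k - 1)) {
                break;                          // characters from a word must be its first consecutive ones
            }
            if (matches(a, ai + k, words, wi + 1)) {
                return true;
            }
        }
        return false;
    }
}
```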
-
isUnicodeIdentifier
Returns true if the given identifier is a legal Unicode identifier. This method returns true if the identifier length is greater than zero, the first character is a Unicode identifier start and all remaining characters (if any) are Unicode identifier parts.
Relationship with legal XML identifiers
Most legal Unicode identifiers are also legal XML identifiers, but the converse is not true. The most noticeable differences are the ‘:’, ‘-’ and ‘.’ characters, which are legal in XML identifiers but not in Unicode.
Characters legal in one set but not in the other:
Not legal in Unicode            | Not legal in XML
: (colon)                       | µ (micro sign)
- (hyphen or minus)             | ª (feminine ordinal indicator)
. (dot)                         | º (masculine ordinal indicator)
· (middle dot)                  | ⁔ (inverted undertie)
Many punctuation, symbols, etc. | Identifier ignorable characters.
The ‘_’ (underscore) character is legal according to both Unicode and XML, while spaces, ‘!’, ‘#’, ‘*’, ‘/’, ‘?’ and most other punctuation characters are not.
Usage in Apache SIS
In its handling of identifiers, Apache SIS favors Unicode identifiers without ignorable characters, since those identifiers are legal XML identifiers except for the above-cited rarely used characters. As a side effect, this policy excludes ‘:’, ‘-’ and ‘.’, which would normally be legal in XML identifiers. But since those characters could easily be confused with namespace separators, this exclusion is considered desirable.
- Parameters:
identifier - the character sequence to test, or null.
- Returns:
true if the given character sequence is a legal Unicode identifier.
- See Also:
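The definition above maps almost directly onto a loop over code points. This is a hypothetical sketch (IdentifierSketch is not a SIS class) built on the standard Character.isUnicodeIdentifierStart(int) and Character.isUnicodeIdentifierPart(int) methods.

```java
/** Hypothetical sketch of isUnicodeIdentifier; not the Apache SIS code. */
final class IdentifierSketch {
    static boolean isUnicodeIdentifier(CharSequence identifier) {
        if (identifier == null || identifier.length() == 0) {
            return false;                         // null or empty: not an identifier
        }
        int first = Character.codePointAt(identifier, 0);
        if (!Character.isUnicodeIdentifierStart(first)) {
            return false;
        }
        // All remaining code points (if any) must be identifier parts.
        for (int i = Character.charCount(first); i < identifier.length(); ) {
            int c = Character.codePointAt(identifier, i);
            if (!Character.isUnicodeIdentifierPart(c)) {
                return false;
            }
            i += Character.charCount(c);
        }
        return true;
    }
}
```

With this definition, "max_value" and "WGS84" are accepted. "UTF-8" and "gml:id" are rejected, which matches the hyphen and colon rows of the table above.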
-
isUpperCase
Returns true if the given text is non-null, contains at least one upper-case character and no lower-case character. Spaces and punctuation are ignored.
- Parameters:
text - the character sequence to test (may be null).
- Returns:
true if non-null, contains at least one upper-case character and no lower-case character.
- Since:
- 0.7
- See Also:
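Here is a short sketch of the isUpperCase contract. UpperCaseSketch is hypothetical. Code points that are neither upper case nor lower case, such as spaces, digits and punctuation, are skipped.

```java
/** Hypothetical sketch of isUpperCase; not the Apache SIS code. */
final class UpperCaseSketch {
    static boolean isUpperCase(CharSequence text) {
        if (text == null) {
            return false;
        }
        boolean hasUpper = false;
        for (int i = 0; i < text.length(); ) {
            int c = Character.codePointAt(text, i);
            if (Character.isLowerCase(c)) {
                return false;                     // any lower-case letter disqualifies the text
            }
            if (Character.isUpperCase(c)) {
                hasUpper = true;                  // spaces, digits and punctuation are neither
            }
            i += Character.charCount(c);
        }
        return hasUpper;
    }
}
```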
-
equalsFiltered
public static boolean equalsFiltered(CharSequence s1, CharSequence s2, Characters.Filter filter, boolean ignoreCase)
Returns true if the given texts are equal, optionally ignoring case and filtered-out characters. This method is sometimes used for comparing identifiers in a lenient way.
Example: the following call compares the two strings ignoring case and any characters which are not letters or digits. In particular, spaces and punctuation characters like '_' and '-' are ignored:
assert equalsFiltered("WGS84", "WGS_84", Characters.Filter.LETTERS_AND_DIGITS, true);
- Parameters:
s1 - the first character sequence to compare, or null.
s2 - the second character sequence to compare, or null.
filter - the subset of characters to compare, or null for comparing all characters.
ignoreCase - true for ignoring case, or false for requiring an exact match.
- Returns:
true if both arguments are null or if the two given texts are equal, optionally ignoring case and filtered-out characters.
-
equalsIgnoreCase
Returns true if the two given texts are equal, ignoring case. This method is similar to String.equalsIgnoreCase(String), except that it works on arbitrary character sequences and compares code points instead of characters.
- Parameters:
s1 - the first string to compare, or null.
s2 - the second string to compare, or null.
- Returns:
true if the two given texts are equal, ignoring case, or if both arguments are null.
- See Also:
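Below is a sketch of the code-point comparison. IgnoreCaseSketch is hypothetical. As in String.equalsIgnoreCase, two code points are treated as equal if they are identical or if their lower-case or upper-case forms match. Comparing code points rather than chars lets a supplementary character, such as the Deseret capital letter U+10400, match its lower-case form U+10428.

```java
/** Hypothetical sketch of equalsIgnoreCase on code points; not the Apache SIS code. */
final class IgnoreCaseSketch {
    static boolean equalsIgnoreCase(CharSequence s1, CharSequence s2) {
        if (s1 == s2) return true;                // includes the case where both are null
        if (s1 == null || s2 == null) return false;
        int i1 = 0, i2 = 0;
        while (i1 < s1.length() && i2 < s2.length()) {
            int c1 = Character.codePointAt(s1, i1);
            int c2 = Character.codePointAt(s2, i2);
            if (c1 != c2 && Character.toLowerCase(c1) != Character.toLowerCase(c2)
                         && Character.toUpperCase(c1) != Character.toUpperCase(c2)) {
                return false;
            }
            i1 += Character.charCount(c1);
            i2 += Character.charCount(c2);
        }
        return i1 == s1.length() && i2 == s2.length();   // both must end together
    }
}
```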
-
equals
Returns true if the two given texts are equal. This method delegates to String.contentEquals(CharSequence) if possible. This method never invokes CharSequence.toString(), in order to avoid a potentially large copy of data.
- Parameters:
s1 - the first string to compare, or null.
s2 - the second string to compare, or null.
- Returns:
true if the two given texts are equal, or if both arguments are null.
- See Also:
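One way the delegation to String.contentEquals(CharSequence) might look is sketched below, with a char-by-char fallback when neither argument is a String. ContentEqualsSketch is hypothetical. Its point is that no toString() copy is ever made.

```java
/** Hypothetical sketch of equals without toString() copies; not the Apache SIS code. */
final class ContentEqualsSketch {
    static boolean equals(CharSequence s1, CharSequence s2) {
        if (s1 == s2) return true;                // includes the case where both are null
        if (s1 == null || s2 == null) return false;
        if (s1 instanceof String) return ((String) s1).contentEquals(s2);
        if (s2 instanceof String) return ((String) s2).contentEquals(s1);
        // Neither argument is a String: compare lengths, then chars, without copying.
        int length = s1.length();
        if (s2.length() != length) return false;
        for (int i = 0; i < length; i++) {
            if (s1.charAt(i) != s2.charAt(i)) return false;
        }
        return true;
    }
}
```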
-
regionMatches
Returns true if the given text at the given offset contains the given part, in a case-sensitive comparison. This method is equivalent to the following code, except that this method works on arbitrary CharSequence objects instead of String objects only:
return text.regionMatches(fromIndex, part, 0, part.length());
This method does not throw IndexOutOfBoundsException. Instead, if fromIndex < 0 or fromIndex + part.length() > text.length(), then this method returns false.
- Parameters:
text - the character sequence for which to test for the presence of part.
fromIndex - the offset in text where to test for the presence of part.
part - the part which may be present in text.
- Returns:
true if text contains part at the given fromIndex.
- Throws:
NullPointerException - if any of the arguments is null.
- See Also:
-
regionMatches
public static boolean regionMatches(CharSequence text, int fromIndex, CharSequence part, boolean ignoreCase)
Returns true if the given text at the given offset contains the given part, optionally in a case-insensitive way. This method is equivalent to the following code, except that this method works on arbitrary CharSequence objects instead of String objects only:
return text.regionMatches(ignoreCase, fromIndex, part, 0, part.length());
This method does not throw IndexOutOfBoundsException. Instead, if fromIndex < 0 or fromIndex + part.length() > text.length(), then this method returns false.
- Parameters:
text - the character sequence for which to test for the presence of part.
fromIndex - the offset in text where to test for the presence of part.
part - the part which may be present in text.
ignoreCase - true if the case should be ignored.
- Returns:
true if text contains part at the given fromIndex.
- Throws:
NullPointerException - if any of the arguments is null.
- Since:
- 0.4
- See Also:
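The hypothetical RegionMatchesSketch below shows the bounds check that takes the place of IndexOutOfBoundsException, for both the case-sensitive and the case-insensitive overloads. It compares char values, as String.regionMatches does. The SIS method may treat supplementary characters differently.

```java
/** Hypothetical char-based sketch of the two regionMatches overloads; not the Apache SIS code. */
final class RegionMatchesSketch {
    static boolean regionMatches(CharSequence text, int fromIndex, CharSequence part) {
        return regionMatches(text, fromIndex, part, false);
    }

    static boolean regionMatches(CharSequence text, int fromIndex, CharSequence part, boolean ignoreCase) {
        int textLength = text.length();           // NullPointerException if text is null
        int partLength = part.length();           // NullPointerException if part is null
        // Out-of-range offsets give false instead of IndexOutOfBoundsException.
        // Written as a subtraction to avoid int overflow in fromIndex + partLength.
        if (fromIndex < 0 || fromIndex > textLength - partLength) {
            return false;
        }
        for (int i = 0; i < partLength; i++) {
            char a = text.charAt(fromIndex + i);
            char b = part.charAt(i);
            if (a != b && !(ignoreCase && (Character.toLowerCase(a) == Character.toLowerCase(b)
                                        || Character.toUpperCase(a) == Character.toUpperCase(b)))) {
                return false;
            }
        }
        return true;
    }
}
```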
-
startsWith
Returns true if the given character sequence starts with the given prefix.
- Parameters:
text - the character sequence to test.
prefix - the expected prefix.
ignoreCase - true if the case should be ignored.
- Returns:
true if the given sequence starts with the given prefix.
- Throws:
NullPointerException - if any of the arguments is null.
-
endsWith
Returns true if the given character sequence ends with the given suffix.
- Parameters:
text - the character sequence to test.
suffix - the expected suffix.
ignoreCase - true if the case should be ignored.
- Returns:
true if the given sequence ends with the given suffix.
- Throws:
NullPointerException - if any of the arguments is null.
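Both startsWith and endsWith reduce to a region comparison, at offset 0 or at text.length() - part.length() respectively. Below is a hypothetical char-based sketch (PrefixSuffixSketch is not a SIS class).

```java
/** Hypothetical char-based sketch of startsWith and endsWith; not the Apache SIS code. */
final class PrefixSuffixSketch {
    static boolean startsWith(CharSequence text, CharSequence prefix, boolean ignoreCase) {
        // prefix.length() and text.length() throw NullPointerException for null arguments.
        return prefix.length() <= text.length() && matchesAt(text, 0, prefix, ignoreCase);
    }

    static boolean endsWith(CharSequence text, CharSequence suffix, boolean ignoreCase) {
        int offset = text.length() - suffix.length();
        return offset >= 0 && matchesAt(text, offset, suffix, ignoreCase);
    }

    /** Compares part with text starting at offset; the caller guarantees the range is valid. */
    private static boolean matchesAt(CharSequence text, int offset, CharSequence part, boolean ignoreCase) {
        for (int i = 0; i < part.length(); i++) {
            char a = text.charAt(offset + i);
            char b = part.charAt(i);
            if (a != b && !(ignoreCase && (Character.toLowerCase(a) == Character.toLowerCase(b)
                                        || Character.toUpperCase(a) == Character.toUpperCase(b)))) {
                return false;
            }
        }
        return true;
    }
}
```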
-
commonPrefix
Returns the longest sequence of characters which is found at the beginning of the two given texts. If one of those texts is null, then the other text is returned. If there is no common prefix, then this method returns an empty string.
- Parameters:
s1 - the first text, or null.
s2 - the second text, or null.
- Returns:
the common prefix of both texts (may be empty), or null if both texts are null.
-
commonSuffix
Returns the longest sequence of characters which is found at the end of the two given texts. If one of those texts is null, then the other text is returned. If there is no common suffix, then this method returns an empty string.
- Parameters:
s1 - the first text, or null.
s2 - the second text, or null.
- Returns:
the common suffix of both texts (may be empty), or null if both texts are null.
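Here is a sketch of commonPrefix and commonSuffix, including the null handling described above. CommonAffixSketch is hypothetical. As an extra precaution it avoids splitting a surrogate pair, which may or may not reflect what SIS does.

```java
/** Hypothetical char-based sketch of commonPrefix and commonSuffix; not the Apache SIS code. */
final class CommonAffixSketch {
    static CharSequence commonPrefix(CharSequence s1, CharSequence s2) {
        if (s1 == null) return s2;                // also covers "both null": returns null
        if (s2 == null) return s1;
        int n = Math.min(s1.length(), s2.length());
        int i = 0;
        while (i < n && s1.charAt(i) == s2.charAt(i)) i++;
        // Extra precaution (assumption): do not end the prefix in the middle of a surrogate pair.
        if (i > 0 && Character.isHighSurrogate(s1.charAt(i - 1))) i--;
        return s1.subSequence(0, i);
    }

    static CharSequence commonSuffix(CharSequence s1, CharSequence s2) {
        if (s1 == null) return s2;
        if (s2 == null) return s1;
        int i1 = s1.length(), i2 = s2.length();
        while (i1 > 0 && i2 > 0 && s1.charAt(i1 - 1) == s2.charAt(i2 - 1)) {
            i1--;
            i2--;
        }
        // Same precaution at the other end: do not start the suffix on a low surrogate.
        if (i1 < s1.length() && Character.isLowSurrogate(s1.charAt(i1))) i1++;
        return s1.subSequence(i1, s1.length());
    }
}
```

For "eastward_velocity" and "northward_velocity", the common suffix is "ward_velocity". That result cuts a word, which is why commonWords shortens such results to word boundaries.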
-
commonWords
Returns the words found at the beginning and end of both texts. The returned string is the concatenation of the common prefix with the common suffix, with the prefix and suffix possibly shortened to avoid cutting in the middle of a word.
The purpose of this method is to create a global identifier from a list of component identifiers. The latter are often eastward and northward components of a vector, in which case this method provides an identifier for the vector as a whole.
If one of the given texts is null, then the other text is returned. If there are no common words, then this method returns an empty string.
Example
Given the inputs "baroclinic_eastward_velocity" and "baroclinic_northward_velocity", this method returns "baroclinic_velocity". Note that the "ward" characters are a common suffix of both texts but are nevertheless omitted because they would cut a word.
Possible future evolution
The current implementation searches only for a common prefix and a common suffix, ignoring any common words that may appear in the middle of the strings.
A character is considered the beginning of a word if it is a letter or digit which is not preceded by another letter or digit (such as the leading "s" and the "c" in "snake_case"), or if it is an upper-case letter preceded by a lower-case letter or by no letter (such as both "C" in "CamelCase").
- Parameters:
s1 - the first text, or null.
s2 - the second text, or null.
- Returns:
the common words of both texts (may be empty), or null if both texts are null.
- Since:
- 1.1
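Below is a hypothetical sketch of commonWords that uses the word-start rule quoted above. The common prefix is shortened until it ends at a word boundary in both texts. The common suffix may not overlap the prefix, and it is shortened until it begins at a word start in both texts. CommonWordsSketch works on char values and returns a String. It illustrates the idea and does not claim to reproduce SIS behavior exactly.

```java
/**
 * Hypothetical sketch of commonWords; not the Apache SIS implementation.
 * Works on char values (no surrogate handling) and returns a String.
 */
final class CommonWordsSketch {
    static String commonWords(CharSequence s1, CharSequence s2) {
        if (s1 == null) {
            return (s2 == null) ? null : s2.toString();
        }
        if (s2 == null) {
            return s1.toString();
        }
        final int n = Math.min(s1.length(), s2.length());
        // 1. Common prefix, shortened until the cut does not split a word in either text.
        int p = 0;
        while (p < n && s1.charAt(p) == s2.charAt(p)) p++;
        while (p > 0 && !(isCutAllowed(s1, p) && isCutAllowed(s2, p))) p--;
        // 2. Common suffix of length k, not allowed to overlap the prefix.
        int k = 0;
        while (k < n - p && s1.charAt(s1.length() - 1 - k) == s2.charAt(s2.length() - 1 - k)) k++;
        // Shortened until it begins at a word start in both texts (this also drops leading separators).
        while (k > 0 && !(isWordStart(s1, s1.length() - k) && isWordStart(s2, s2.length() - k))) k--;
        return s1.subSequence(0, p).toString() + s1.subSequence(s1.length() - k, s1.length());
    }

    /** Word start: a letter or digit not preceded by a letter or digit,
     *  or an upper-case letter preceded by a lower-case letter or by no letter. */
    private static boolean isWordStart(CharSequence s, int i) {
        char c = s.charAt(i);
        if (!Character.isLetterOrDigit(c)) return false;
        if (i == 0) return true;
        char before = s.charAt(i - 1);
        if (!Character.isLetterOrDigit(before)) return true;
        return Character.isUpperCase(c) && (Character.isLowerCase(before) || !Character.isLetter(before));
    }

    /** True if cutting between index p-1 and p does not split a word. */
    private static boolean isCutAllowed(CharSequence s, int p) {
        return p == 0 || p == s.length()
                || !Character.isLetterOrDigit(s.charAt(p - 1))
                || !Character.isLetterOrDigit(s.charAt(p))
                || isWordStart(s, p);
    }
}
```

With these rules, "eastwardVelocity" and "northwardVelocity" give "Velocity", because the camel-case "V" counts as a word start.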
-
token
Returns the token starting at the given offset in the given text. For the purpose of this method, a "token" is any sequence of consecutive characters of the same type, as defined below.
Let c be the first non-blank character located at an index equal to or greater than the given offset. Then the characters that are considered of the same type are:
- If c is a Unicode identifier start, then any following characters that are Unicode identifier parts.
- Otherwise, any character for which Character.getType(int) returns the same value as for c.
- Parameters:
text - the text for which to get the token.
fromIndex - index of the first character to consider in the given text.
- Returns:
a sub-sequence of text starting at the given offset, or an empty string if there is no non-blank character at or after the given offset.
- Throws:
NullPointerException - if the text argument is null.
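The token rules can be sketched as follows. TokenSketch is hypothetical and makes two assumptions. "Blank" is taken to mean Character.isWhitespace(int), in line with the class policy on space characters. The returned token starts at c rather than at fromIndex.

```java
/**
 * Hypothetical sketch of token; not the Apache SIS code.
 * Assumptions: "blank" means Character.isWhitespace(int), and the token starts at c.
 */
final class TokenSketch {
    static CharSequence token(CharSequence text, int fromIndex) {
        final int length = text.length();         // NullPointerException if text is null
        int start = fromIndex;
        while (start < length) {                  // skip blanks to find c
            int c = Character.codePointAt(text, start);
            if (!Character.isWhitespace(c)) break;
            start += Character.charCount(c);
        }
        if (start >= length) {
            return "";                            // no non-blank character at or after fromIndex
        }
        final int c = Character.codePointAt(text, start);
        final boolean identifier = Character.isUnicodeIdentifierStart(c);
        final int type = Character.getType(c);
        int end = start + Character.charCount(c);
        while (end < length) {
            int d = Character.codePointAt(text, end);
            boolean sameType = identifier ? Character.isUnicodeIdentifierPart(d)
                                          : Character.getType(d) == type;
            if (!sameType) break;
            end += Character.charCount(d);
        }
        return text.subSequence(start, end);
    }
}
```

For example, in "x = 3.14" the token at index 4 is "3" and not "3.14", because the dot has a different Character.getType value than the digits.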
-
replace
public static CharSequence replace(CharSequence text, CharSequence toSearch, CharSequence replaceBy)
Replaces all occurrences of a given string in the given character sequence. If no occurrence of toSearch is found in the given text, or if toSearch is equal to replaceBy, then this method returns the text unchanged. Otherwise this method returns a new character sequence with all occurrences replaced by replaceBy.
This method is similar to String.replace(CharSequence, CharSequence) except that it accepts arbitrary CharSequence objects. As of Java 10, another difference is that this method does not create a new String if toSearch is equal to replaceBy.
- Parameters:
text - the character sequence in which to perform the replacements, or null.
toSearch - the string to replace.
replaceBy - the replacement for the searched string.
- Returns:
the given text with replacements applied, or text if no replacement has been applied, or null if the given text was null.
- Since:
- 0.4
- See Also:
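The replacement loop is sketched below, including the two cases where the original text instance is returned. ReplaceSketch is hypothetical. It returns text unchanged for an empty toSearch, which is an assumption; String.replace would instead insert replaceBy between every character.

```java
/**
 * Hypothetical sketch of replace; not the Apache SIS implementation.
 * Assumption: an empty toSearch leaves the text unchanged.
 */
final class ReplaceSketch {
    static CharSequence replace(CharSequence text, CharSequence toSearch, CharSequence replaceBy) {
        if (text == null) {
            return null;
        }
        String search = toSearch.toString();
        String replacement = replaceBy.toString();
        if (search.isEmpty() || search.equals(replacement)) {
            return text;                          // nothing to do: same instance returned
        }
        String s = text.toString();
        int i = s.indexOf(search);
        if (i < 0) {
            return text;                          // no occurrence: same instance returned
        }
        StringBuilder b = new StringBuilder(s.length());
        int from = 0;
        do {
            b.append(s, from, i).append(replacement);
            from = i + search.length();
            i = s.indexOf(search, from);
        } while (i >= 0);
        return b.append(s, from, s.length());
    }
}
```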
-
copyChars
public static void copyChars(CharSequence src, int srcOffset, char[] dst, int dstOffset, int length)
Copies a sequence of characters into the given char[] array.
- Parameters:
src - the character sequence from which to copy characters.
srcOffset - index of the first character from src to copy.
dst - the array where to copy the characters.
dstOffset - index where to write the first character in dst.
length - number of characters to copy.
- See Also:
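One possible sketch: use a bulk copy through String.getChars or StringBuilder.getChars when the source type allows it, and fall back to a charAt loop otherwise. CopyCharsSketch is hypothetical, and the choice of fast paths is an assumption.

```java
/**
 * Hypothetical sketch of copyChars; not the Apache SIS code.
 * Uses bulk copies for String and StringBuilder, and a charAt loop otherwise.
 */
final class CopyCharsSketch {
    static void copyChars(CharSequence src, int srcOffset, char[] dst, int dstOffset, int length) {
        if (src instanceof String) {
            ((String) src).getChars(srcOffset, srcOffset + length, dst, dstOffset);
        } else if (src instanceof StringBuilder) {
            ((StringBuilder) src).getChars(srcOffset, srcOffset + length, dst, dstOffset);
        } else {
            // Generic fallback for any other CharSequence implementation.
            for (int i = 0; i < length; i++) {
                dst[dstOffset + i] = src.charAt(srcOffset + i);
            }
        }
    }
}
```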
-
String.strip()
in JDK 11.