CharSequences

public final class CharSequences extends Static

Static methods working with CharSequence instances. Some methods defined in this class duplicate functionality already provided by the standard String class, but work on a generic CharSequence instance instead of a String.

Unicode support

Every method defined in this class works on code points instead of characters when appropriate. Consequently, those methods should behave correctly with characters outside the Basic Multilingual Plane (BMP).

Policy on space characters

Java defines two methods for testing if a character is a white space: Character.isWhitespace(int) and Character.isSpaceChar(int). Those two methods differ in the way they handle no-break spaces, tabulations and line feeds. The general policy in the SIS library is:

- Use isWhitespace(…) when separating entities (words, numbers, tokens, etc.) in a list. Using that method, characters separated by a no-break space are considered as part of the same entity.
- Use isSpaceChar(…) when parsing a single entity, for example a single word. Using this method, no-break spaces are considered as part of the entity while line feeds or tabulations are entity boundaries.

For example, numbers formatted in the French locale use no-break spaces as group separators. When parsing a list of numbers, ordinary spaces around the numbers may need to be ignored, but no-break spaces shall be considered as part of the numbers. Consequently, isWhitespace(…) is appropriate for skipping spaces between the numbers. But if there are spaces to skip inside a single number, then isSpaceChar(…) is a good choice for accepting no-break spaces and for stopping the parse operation at tabulation or line feed characters. A tabulation or line feed between two characters is very likely to separate two distinct values. In practice, the Format implementations in the SIS library typically use isSpaceChar(…), while most of the rest of the SIS library, including this CharSequences class, consistently uses isWhitespace(…).
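The policy above can be illustrated with the JDK alone. The following is a minimal sketch, not SIS code: the class SpacePolicyDemo and its splitEntities method are hypothetical names introduced for this example. It separates the entities of a list with Character.isWhitespace(int), so that numbers grouped with no-break spaces stay in one piece.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical demo class; it only uses java.lang.Character, not the SIS library. */
final class SpacePolicyDemo {
    /** Splits a list into entities, using isWhitespace(int) to recognize separators. */
    static List<String> splitEntities(CharSequence list) {
        List<String> entities = new ArrayList<>();
        final int length = list.length();
        int i = 0;
        while (i < length) {
            int start = i;
            // Advance to the next boundary. A no-break space is not a boundary.
            while (i < length) {
                int c = Character.codePointAt(list, i);
                if (Character.isWhitespace(c)) break;
                i += Character.charCount(c);
            }
            if (i > start) {
                entities.add(list.subSequence(start, i).toString());
            }
            // Skip the separators (ordinary spaces, tabulations, line feeds).
            while (i < length) {
                int c = Character.codePointAt(list, i);
                if (!Character.isWhitespace(c)) break;
                i += Character.charCount(c);
            }
        }
        return entities;
    }

    public static void main(String[] args) {
        // Two French-style numbers and one plain number, separated by a space and a tabulation.
        List<String> numbers = splitEntities("1\u00A0234 5\u00A0678\t9");
        System.out.println(numbers.size());   // 3 entities
        // isSpaceChar accepts the no-break space but rejects the tabulation.
        System.out.println(Character.isSpaceChar('\u00A0') + " " + Character.isSpaceChar('\t'));
    }
}
```

Inside each entity, a parser following the same policy would then use Character.isSpaceChar(int) to accept the no-break space as a group separator.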

Handling of null values

Most methods in this class accept a null CharSequence argument. In such cases, the return value is either a null CharSequence, an empty array, or a 0 or false primitive value computed as if the input were an empty string.

Since:

0.3

Field Summary

Fields

| Modifier and Type | Field | Description |
| --- | --- | --- |
| static final String[] | EMPTY_ARRAY | An array of zero-length. |

Method Summary

| Modifier and Type | Method | Description |
| --- | --- | --- |
| static CharSequence | camelCaseToAcronym(CharSequence text) | Creates an acronym from the given text. |
| static CharSequence | camelCaseToSentence(CharSequence identifier) | Given a string in camel case (typically an identifier), returns a string formatted like an English sentence. |
| static CharSequence | camelCaseToWords(CharSequence identifier, boolean toLowerCase) | Given a string in camel case, returns a string with the same words separated by spaces. |
| static int | codePointCount(CharSequence text) | Returns the number of Unicode code points in the given character sequence, or 0 if null. |
| static int | codePointCount(CharSequence text, int fromIndex, int toIndex) | Returns the number of Unicode code points in the given character sub-sequence, or 0 if null. |
| static CharSequence | commonPrefix(CharSequence s1, CharSequence s2) | Returns the longest sequence of characters which is found at the beginning of the two given texts. |
| static CharSequence | commonSuffix(CharSequence s1, CharSequence s2) | Returns the longest sequence of characters which is found at the end of the two given texts. |
| static CharSequence | commonWords(CharSequence s1, CharSequence s2) | Returns the words found at the beginning and end of both texts. |
| static void | copyChars(CharSequence src, int srcOffset, char[] dst, int dstOffset, int length) | Copies a sequence of characters into the given char[] array. |
| static int | count(CharSequence text, char toSearch) | Counts the number of occurrences of the given character in the given character sequence. |
| static int | count(CharSequence text, String toSearch) | Returns the number of occurrences of the toSearch string in the given text. |
| static boolean | endsWith(CharSequence text, CharSequence suffix, boolean ignoreCase) | Returns true if the given character sequence ends with the given suffix. |
| static boolean | equals(CharSequence s1, CharSequence s2) | Returns true if the two given texts are equal. |
| static boolean | equalsFiltered(CharSequence s1, CharSequence s2, Characters.Filter filter, boolean ignoreCase) | Returns true if the given texts are equal, optionally ignoring case and filtered-out characters. |
| static boolean | equalsIgnoreCase(CharSequence s1, CharSequence s2) | Returns true if the two given texts are equal, ignoring case. |
| static int | indexOf(CharSequence text, int toSearch, int fromIndex, int toIndex) | Returns the index within the given character sequence of the first occurrence of the specified character, starting the search at the specified index. |
| static int | indexOf(CharSequence text, CharSequence toSearch, int fromIndex, int toIndex) | Returns the index within the given text of the first occurrence of the specified part, starting at the specified index. |
| static int | indexOfLineStart(CharSequence text, int numLines, int fromIndex) | Returns the index of the first character after the given number of lines. |
| static boolean | isAcronymForWords(CharSequence acronym, CharSequence words) | Returns true if the first string is likely to be an acronym of the second string. |
| static boolean | isUnicodeIdentifier(CharSequence identifier) | Returns true if the given identifier is a legal Unicode identifier. |
| static boolean | isUpperCase(CharSequence text) | Returns true if the given text is non-null, contains at least one upper-case character and no lower-case character. |
| static int | lastIndexOf(CharSequence text, int toSearch, int fromIndex, int toIndex) | Returns the index within the given character sequence of the last occurrence of the specified character, searching backward in the given index range. |
| static int | length(CharSequence text) | Returns the length of the given character sequence, or 0 if null. |
| static byte[] | parseBytes(CharSequence values, char separator, int radix) | Splits the given text around the given character, then parses each item as a byte. |
| static double[] | parseDoubles(CharSequence values, char separator) | Splits the given text around the given character, then parses each item as a double. |
| static float[] | parseFloats(CharSequence values, char separator) | Splits the given text around the given character, then parses each item as a float. |
| static int[] | parseInts(CharSequence values, char separator, int radix) | Splits the given text around the given character, then parses each item as an int. |
| static long[] | parseLongs(CharSequence values, char separator, int radix) | Splits the given text around the given character, then parses each item as a long. |
| static short[] | parseShorts(CharSequence values, char separator, int radix) | Splits the given text around the given character, then parses each item as a short. |
| static boolean | regionMatches(CharSequence text, int fromIndex, CharSequence part) | Returns true if the given text at the given offset contains the given part, in a case-sensitive comparison. |
| static boolean | regionMatches(CharSequence text, int fromIndex, CharSequence part, boolean ignoreCase) | Returns true if the given text at the given offset contains the given part, optionally in a case-insensitive way. |
| static CharSequence | replace(CharSequence text, CharSequence toSearch, CharSequence replaceBy) | Replaces all occurrences of a given string in the given character sequence. |
| static CharSequence | shortSentence(CharSequence text, int maxLength) | Makes sure that the text string is not longer than maxLength characters. |
| static int | skipLeadingWhitespaces(CharSequence text, int fromIndex, int toIndex) | Returns the index of the first non-white character in the given range. |
| static int | skipTrailingWhitespaces(CharSequence text, int fromIndex, int toIndex) | Returns the index after the last non-white character in the given range. |
| static CharSequence | spaces(int length) | Returns a character sequence of the specified length filled with white spaces. |
| static CharSequence[] | split(CharSequence text, char separator) | Splits a text around the given character. |
| static CharSequence[] | splitOnEOL(CharSequence text) | Splits a text around the End Of Line (EOL) characters. |
| static boolean | startsWith(CharSequence text, CharSequence prefix, boolean ignoreCase) | Returns true if the given character sequence starts with the given prefix. |
| static CharSequence | toASCII(CharSequence text) | Replaces some Unicode characters by ASCII characters on a "best effort basis". |
| static CharSequence | token(CharSequence text, int fromIndex) | Returns the token starting at the given offset in the given text. |
| static CharSequence | trimFractionalPart(CharSequence value) | Trims the fractional part of the given formatted number, provided that it doesn't change the value. |
| static CharSequence | trimIgnorables(CharSequence text) | Returns a text with ignorable characters in Unicode identifiers removed. |
| static CharSequence | trimWhitespaces(CharSequence text) | Returns a text with leading and trailing whitespace characters omitted. |
| static CharSequence | trimWhitespaces(CharSequence text, int lower, int upper) | Returns a sub-sequence with leading and trailing whitespace characters omitted. |
| static CharSequence | upperCaseToSentence(CharSequence identifier) | Given a string in upper case (typically a Java constant), returns a string formatted like an English sentence. |

Methods inherited from class Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Details
- EMPTY_ARRAY
  
  public static final String[] EMPTY_ARRAY
  
  An array of zero-length. This constant plays a role equivalent to Collections.EMPTY_LIST.

Method Details

spaces

public static CharSequence spaces(int length)

Returns a character sequence of the specified length filled with white spaces.
Use case
This method is typically invoked for performing right-alignment of text on the console or another device using a monospaced font. Callers compute a value for the length argument by (desired width - used width). Since the used width value may be greater than expected, this method handles negative length values as if the value were zero.

Parameters:

length - the string length. Negative values are clamped to 0.

Returns:

a string of length length filled with white spaces.
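The use case above can be sketched with the JDK only. In this example, RightAlignDemo, its spaces method and rightAlign are hypothetical stand-ins written for illustration (they are not the SIS implementation), and String.repeat requires Java 11 or later.

```java
/** Hypothetical demo of the right-alignment use case, using only the JDK. */
final class RightAlignDemo {
    /** Stand-in for spaces(int): negative lengths are handled as if they were zero. */
    static String spaces(int length) {
        return " ".repeat(Math.max(0, length));
    }

    /** Pads the text on the left so that it ends at the given column width. */
    static String rightAlign(String text, int width) {
        // (desired width - used width) may be negative if the text is wider than expected.
        return spaces(width - text.length()) + text;
    }

    public static void main(String[] args) {
        System.out.println('[' + rightAlign("42", 6) + ']');        // [    42]
        System.out.println('[' + rightAlign("1234567", 6) + ']');   // [1234567]
    }
}
```

Because negative lengths are clamped, the caller does not need to test whether the text already exceeds the column width.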
length

public static int length(CharSequence text)

Returns the length of the given character sequence, or 0 if null.

Parameters:

text - the character sequence from which to get the length, or null.

Returns:

the length of the character sequence, or 0 if the argument is null.
codePointCount
public static int codePointCount(CharSequence text)

Returns the number of Unicode code points in the given character sequence, or 0 if null. Unpaired surrogates within the text count as one code point each.
Parameters:

text - the character sequence from which to get the count, or null.

Returns:

the number of Unicode code points, or 0 if the argument is null.

See Also:
codePointCount(CharSequence, int, int)
codePointCount
public static int codePointCount(CharSequence text, int fromIndex, int toIndex)

Returns the number of Unicode code points in the given character sub-sequence, or 0 if null. Unpaired surrogates within the text count as one code point each.
This method performs the same work as the standard Character.codePointCount(CharSequence, int, int) method, except that it tries to delegate to the optimized methods from the String, StringBuilder, StringBuffer or CharBuffer classes if possible.
Parameters:

text - the character sequence from which to get the count, or null.

fromIndex - the index from which to start the computation.

toIndex - the index after the last character to take into account.

Returns:

the number of Unicode code points, or 0 if the argument is null.

See Also:
Character.codePointCount(CharSequence, int, int)

String.codePointCount(int, int)

StringBuilder.codePointCount(int, int)
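As an illustration of the difference between characters and code points, the sketch below relies on the standard Character.codePointCount(CharSequence, int, int) method mentioned above. The CodePointCountDemo class and its null-tolerant wrapper are hypothetical names for this example, not the SIS code.

```java
/** Hypothetical demo: counts code points with the JDK and returns 0 for null. */
final class CodePointCountDemo {
    /** Null-tolerant wrapper around Character.codePointCount(CharSequence, int, int). */
    static int codePointCount(CharSequence text) {
        return (text == null) ? 0 : Character.codePointCount(text, 0, text.length());
    }

    public static void main(String[] args) {
        // 'a', then U+1F600 encoded as a surrogate pair, then 'b'.
        String text = "a\uD83D\uDE00b";
        System.out.println(text.length());              // 4 char values
        System.out.println(codePointCount(text));       // 3 code points
        System.out.println(codePointCount("\uD83D"));   // 1: an unpaired surrogate counts as one
        System.out.println(codePointCount(null));       // 0
    }
}
```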
count

public static int count(CharSequence text, String toSearch)

Returns the number of occurrences of the toSearch string in the given text. The search is case-sensitive.

Parameters:

text - the character sequence in which to count occurrences, or null.

toSearch - the string to search in the given text. It shall contain at least one character.

Returns:

the number of occurrences of toSearch in text, or 0 if text was null or empty.

Throws:

NullPointerException - if the toSearch argument is null.

IllegalArgumentException - if the toSearch argument is empty.
count

public static int count(CharSequence text, char toSearch)

Counts the number of occurrences of the given character in the given character sequence.

Parameters:

text - the character sequence in which to count occurrences, or null.

toSearch - the character to count.

Returns:

the number of occurrences of the given character, or 0 if the text is null.
indexOf
public static int indexOf(CharSequence text, CharSequence toSearch, int fromIndex, int toIndex)
Returns the index within the given text of the first occurrence of the specified part, starting at the specified index. This method is equivalent to the following method call, except that this method works on arbitrary CharSequence objects instead of Strings only, and that the upper limit can be specified:
return text.indexOf(toSearch, fromIndex);
There is no restriction on the value of fromIndex. If negative or greater than toIndex, then the behavior of this method is as if the search started from 0 or toIndex respectively. This is consistent with the String.indexOf(String, int) behavior.
Parameters:

text - the string in which to perform the search.

toSearch - the substring for which to search.

fromIndex - the index from which to start the search.

toIndex - the index after the last character where to perform the search.

Returns:

the index within the text of the first occurrence of the specified part, starting at the specified index, or -1 if no occurrence has been found or if the text argument is null.

Throws:

NullPointerException - if the toSearch argument is null.

IllegalArgumentException - if the toSearch argument is empty.

See Also:
String.indexOf(String, int)

StringBuilder.indexOf(String, int)

StringBuffer.indexOf(String, int)
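The contract described above can be sketched as a plain loop over a CharSequence. This is an illustrative approximation, not the SIS implementation: the IndexOfDemo class is hypothetical, and the sketch assumes that a match must lie entirely before toIndex, which the text above does not state explicitly.

```java
import java.util.Objects;

/** Hypothetical sketch of a bounded indexOf working on any CharSequence. */
final class IndexOfDemo {
    static int indexOf(CharSequence text, CharSequence toSearch, int fromIndex, int toIndex) {
        Objects.requireNonNull(toSearch, "toSearch");
        final int length = toSearch.length();
        if (length == 0) {
            throw new IllegalArgumentException("toSearch shall not be empty.");
        }
        if (text == null) {
            return -1;
        }
        // Assumption: the whole match must end before toIndex (clamped to the text length).
        final int lastStart = Math.min(toIndex, text.length()) - length;
        for (int i = Math.max(fromIndex, 0); i <= lastStart; i++) {
            int j = 0;
            while (j < length && text.charAt(i + j) == toSearch.charAt(j)) {
                j++;
            }
            if (j == length) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        CharSequence text = new StringBuilder("abcabc");
        System.out.println(indexOf(text, "bc", 0, 6));    // 1
        System.out.println(indexOf(text, "bc", 2, 6));    // 4
        System.out.println(indexOf(text, "bc", 2, 5));    // -1: no complete match before index 5
    }
}
```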
indexOf
public static int indexOf(CharSequence text, int toSearch, int fromIndex, int toIndex)

Returns the index within the given character sequence of the first occurrence of the specified character, starting the search at the specified index. If the character is not found, then this method returns -1.
There is no restriction on the value of fromIndex. If negative or greater than toIndex, then the behavior of this method is as if the search started from 0 or toIndex respectively. This is consistent with the behavior documented in String.indexOf(int, int).
Parameters:

text - the character sequence in which to perform the search, or null.

toSearch - the Unicode code point of the character to search.

fromIndex - the index to start the search from.

toIndex - the index after the last character where to perform the search.

Returns:

the index of the first occurrence of the given character in the specified sub-sequence, or -1 if no occurrence has been found or if the text argument is null.

See Also:
String.indexOf(int, int)
lastIndexOf
public static int lastIndexOf(CharSequence text, int toSearch, int fromIndex, int toIndex)

Returns the index within the given character sequence of the last occurrence of the specified character, searching backward in the given index range. If the character is not found, then this method returns -1.
There is no restriction on the value of toIndex. If greater than the text length or less than fromIndex, then the behavior of this method is as if the search started from length or fromIndex respectively. This is consistent with the behavior documented in String.lastIndexOf(int, int).
Parameters:

text - the character sequence in which to perform the search, or null.

toSearch - the Unicode code point of the character to search.

fromIndex - the index of the first character in the range where to perform the search.

toIndex - the index after the last character in the range where to perform the search.

Returns:

the index of the last occurrence of the given character in the specified sub-sequence, or -1 if no occurrence has been found or if the text argument is null.

See Also:
String.lastIndexOf(int, int)
indexOfLineStart

public static int indexOfLineStart(CharSequence text, int numLines, int fromIndex)

Returns the index of the first character after the given number of lines. This method counts the number of occurrences of '\n', '\r' or "\r\n" starting from the given position. When numLines occurrences have been found, the index of the first character after the last occurrence is returned.
If the numLines argument is positive, this method searches forward. If negative, this method searches backward. If 0, this method returns the beginning of the current line.

If this method reaches the end of text while searching forward, then text.length() is returned. If this method reaches the beginning of text while searching backward, then 0 is returned.

Parameters:

text - the string in which to skip a determined number of lines.

numLines - the number of lines to skip. Can be positive, zero or negative.

fromIndex - index at which to start the search, from 0 to text.length() inclusive.

Returns:

index of the first character after the last skipped line.

Throws:

NullPointerException - if the text argument is null.

IndexOutOfBoundsException - if fromIndex is out of bounds.
skipLeadingWhitespaces
public static int skipLeadingWhitespaces(CharSequence text, int fromIndex, int toIndex)
Returns the index of the first non-white character in the given range. If the given range contains only space characters, then this method returns the index of the first character after the given range, which is always equal to or greater than toIndex. Note that this character may not exist if toIndex is equal to the text length.
Special cases:
- If fromIndex is greater than toIndex, then this method unconditionally returns fromIndex.
- If the given range contains only space characters and the character at toIndex-1 is the high surrogate of a valid supplementary code point, then this method returns toIndex+1, which is the index of the next code point.
- If fromIndex is negative or toIndex is greater than the text length, then the behavior of this method is undefined.
Space characters are identified by the Character.isWhitespace(int) method.
Parameters:

text - the string in which to perform the search (cannot be null).

fromIndex - the index from which to start the search (cannot be negative).

toIndex - the index after the last character where to perform the search.

Returns:

the index within the text of the first occurrence of a non-space character, starting at the specified index, or a value equal to or greater than toIndex if none.

Throws:

NullPointerException - if the text argument is null.

See Also:
skipTrailingWhitespaces(CharSequence, int, int)

trimWhitespaces(CharSequence)

String.stripLeading()
skipTrailingWhitespaces
public static int skipTrailingWhitespaces(CharSequence text, int fromIndex, int toIndex)
Returns the index after the last non-white character in the given range. If the given range contains only space characters, then this method returns the index of the first character in the given range, which is always equal to or lower than fromIndex.
Special cases:
- If fromIndex is greater than toIndex, then this method unconditionally returns toIndex.
- If the given range contains only space characters and the character at fromIndex is the low surrogate of a valid supplementary code point, then this method returns fromIndex-1, which is the index of the code point.
- If fromIndex is negative or toIndex is greater than the text length, then the behavior of this method is undefined.
Space characters are identified by the Character.isWhitespace(int) method.
Parameters:

text - the string in which to perform the search (cannot be null).

fromIndex - the index from which to start the search (cannot be negative).

toIndex - the index after the last character where to perform the search.

Returns:

the index after the last non-space character in the given range, or a value equal to or lower than fromIndex if none.

Throws:

NullPointerException - if the text argument is null.

See Also:
skipLeadingWhitespaces(CharSequence, int, int)

trimWhitespaces(CharSequence)

String.stripTrailing()
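The two skipping methods can be approximated with code point iteration from the JDK. The following is a minimal sketch under that assumption. SkipWhitespacesDemo and its method names are hypothetical, and the actual SIS code may differ.

```java
/** Hypothetical sketch of whitespace skipping based on Character.isWhitespace(int). */
final class SkipWhitespacesDemo {
    /** Moves fromIndex forward while the code point at that index is a whitespace. */
    static int skipLeading(CharSequence text, int fromIndex, int toIndex) {
        while (fromIndex < toIndex) {
            int c = Character.codePointAt(text, fromIndex);
            if (!Character.isWhitespace(c)) break;
            fromIndex += Character.charCount(c);
        }
        return fromIndex;
    }

    /** Moves toIndex backward while the code point before that index is a whitespace. */
    static int skipTrailing(CharSequence text, int fromIndex, int toIndex) {
        while (toIndex > fromIndex) {
            int c = Character.codePointBefore(text, toIndex);
            if (!Character.isWhitespace(c)) break;
            toIndex -= Character.charCount(c);
        }
        return toIndex;
    }

    public static void main(String[] args) {
        String text = "  \tvalue \n";
        int start = skipLeading(text, 0, text.length());
        int end   = skipTrailing(text, start, text.length());
        System.out.println(start + " " + end);             // 3 8
        System.out.println(text.substring(start, end));    // value
    }
}
```

Working with the two indices avoids creating an intermediate sub-sequence until the caller actually needs one.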
split
public static CharSequence[] split(CharSequence text, char separator)
Splits a text around the given character. The array returned by this method contains all subsequences of the given text that are terminated by the given character or by the end of the text. The subsequences in the array are in the order in which they occur in the given text. If the character is not found in the input, then the resulting array has just one element, which is the whole given text.
This method is similar to the standard String.split(String) method except for the following:
- It accepts generic character sequences.
- It accepts null argument, in which case an empty array is returned.
- The separator is a simple character instead of a regular expression.
- If the separator argument is '\n' or '\r', then this method splits around any of "\r", "\n" or "\r\n" characters sequences.
- The leading and trailing spaces of each subsequence are trimmed.
Parameters:

text - the text to split, or null.

separator - the delimiting character (typically the comma).

Returns:

the array of subsequences computed by splitting the given text around the given character, or an empty array if text was null.

See Also:
String.split(String)
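The differences with String.split(String) listed above can be sketched as follows. This is an approximation written for illustration: SplitDemo is a hypothetical class, it converts each item to a String for trimming (String.strip requires Java 11), and it does not reproduce the special handling of '\n' and '\r' separators.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a character-based split that trims each item. */
final class SplitDemo {
    static CharSequence[] split(CharSequence text, char separator) {
        if (text == null) {
            return new CharSequence[0];        // null input gives an empty array
        }
        List<CharSequence> parts = new ArrayList<>();
        int start = 0;
        for (int i = 0; i <= text.length(); i++) {
            // An item ends at a separator or at the end of the text.
            if (i == text.length() || text.charAt(i) == separator) {
                parts.add(text.subSequence(start, i).toString().strip());
                start = i + 1;
            }
        }
        return parts.toArray(new CharSequence[0]);
    }

    public static void main(String[] args) {
        for (CharSequence item : split(new StringBuilder("a, b ,c"), ',')) {
            System.out.println('[' + item.toString() + ']');   // [a] then [b] then [c]
        }
        // For comparison, String.split keeps the spaces around each item.
        System.out.println('[' + "a, b ,c".split(",")[1] + ']');   // [ b ]
    }
}
```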
splitOnEOL
public static CharSequence[] splitOnEOL(CharSequence text)
Splits a text around the End Of Line (EOL) characters. EOL characters can be any of "\r", "\n" or "\r\n" sequences. Each element in the returned array will be a single line. If the given text is already a single line, then this method returns a singleton containing only the given text.
Notes:
- Unlike split(text, '\n'), this method does not remove whitespaces.
- This method does not check for Unicode line separator and paragraph separator.
Performance note
Prior to Java 8, this method was usually cheap because all string instances created by String.substring(int,int) shared the same char[] internal array. However, since Java 8, the String implementation copies the data into new arrays. Consequently, it is better to use indices rather than this method for splitting large Strings. However, this method is still useful for other CharSequence implementations providing an efficient subSequence(int,int) method.
Parameters:

text - the multi-line text from which to get the individual lines, or null.

Returns:

the lines in the text, or an empty array if the given text was null.

See Also:
indexOfLineStart(CharSequence, int, int)
parseDoubles

public static double[] parseDoubles(CharSequence values, char separator) throws NumberFormatException

Splits the given text around the given character, then parses each item as a double. Empty sub-sequences are parsed as Double.NaN.

Parameters:

values - the text containing the values to parse, or null.

separator - the delimiting character (typically the comma).

Returns:

the array of numbers parsed from the given text, or an empty array if values was null.

Throws:

NumberFormatException - if at least one number cannot be parsed.
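The behavior described above (split, trim, then parse, with empty items mapped to NaN) can be sketched with JDK calls only. ParseDoublesDemo is a hypothetical class for this example and is not the SIS implementation. It goes through String.split with a quoted separator, and String.strip requires Java 11.

```java
import java.util.regex.Pattern;

/** Hypothetical sketch: split around a character, then parse each item as a double. */
final class ParseDoublesDemo {
    static double[] parseDoubles(CharSequence values, char separator) {
        if (values == null) {
            return new double[0];
        }
        // Quote the separator because String.split expects a regular expression.
        // The -1 limit keeps trailing empty items instead of discarding them.
        String[] items = values.toString().split(Pattern.quote(String.valueOf(separator)), -1);
        double[] parsed = new double[items.length];
        for (int i = 0; i < items.length; i++) {
            String item = items[i].strip();
            // Empty items become NaN. Other invalid items throw NumberFormatException.
            parsed[i] = item.isEmpty() ? Double.NaN : Double.parseDouble(item);
        }
        return parsed;
    }

    public static void main(String[] args) {
        double[] values = parseDoubles("1.5, ,3", ',');
        System.out.println(values.length);             // 3
        System.out.println(Double.isNaN(values[1]));   // true
    }
}
```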
parseFloats

public static float[] parseFloats(CharSequence values, char separator) throws NumberFormatException

Splits the given text around the given character, then parses each item as a float. Empty sub-sequences are parsed as Float.NaN.

Parameters:

values - the text containing the values to parse, or null.

separator - the delimiting character (typically the comma).

Returns:

the array of numbers parsed from the given text, or an empty array if values was null.

Throws:

NumberFormatException - if at least one number cannot be parsed.
parseLongs

public static long[] parseLongs(CharSequence values, char separator, int radix) throws NumberFormatException

Splits the given text around the given character, then parses each item as a long.

Parameters:

values - the text containing the values to parse, or null.

separator - the delimiting character (typically the comma).

radix - the radix to be used for parsing. This is usually 10.

Returns:

the array of numbers parsed from the given text, or an empty array if values was null.

Throws:

NumberFormatException - if at least one number cannot be parsed.
parseInts

public static int[] parseInts(CharSequence values, char separator, int radix) throws NumberFormatException

Splits the given text around the given character, then parses each item as an int.

Parameters:

values - the text containing the values to parse, or null.

separator - the delimiting character (typically the comma).

radix - the radix to be used for parsing. This is usually 10.

Returns:

the array of numbers parsed from the given text, or an empty array if values was null.

Throws:

NumberFormatException - if at least one number cannot be parsed.
parseShorts

public static short[] parseShorts(CharSequence values, char separator, int radix) throws NumberFormatException

Splits the given text around the given character, then parses each item as a short.

Parameters:

values - the text containing the values to parse, or null.

separator - the delimiting character (typically the comma).

radix - the radix to be used for parsing. This is usually 10.

Returns:

the array of numbers parsed from the given text, or an empty array if values was null.

Throws:

NumberFormatException - if at least one number cannot be parsed.
parseBytes

public static byte[] parseBytes(CharSequence values, char separator, int radix) throws NumberFormatException

Splits the given text around the given character, then parses each item as a byte.

Parameters:

values - the text containing the values to parse, or null.

separator - the delimiting character (typically the comma).

radix - the radix to be used for parsing. This is usually 10.

Returns:

the array of numbers parsed from the given text, or an empty array if values was null.

Throws:

NumberFormatException - if at least one number cannot be parsed.
toASCII
public static CharSequence toASCII(CharSequence text)

Replaces some Unicode characters by ASCII characters on a "best effort basis". For example, the “ é ” character is replaced by “ e ” (without accent), the “ ″ ” symbol for minutes of angle is replaced by straight double quotes “ " ”, and combined characters like ㎏, ㎎, ㎝, ㎞, ㎢, ㎦, ㎖, ㎧, ㎩, ㎐, etc. are replaced by the corresponding sequences of characters.
Note: the replacement of Greek letters is a more complex task than what this method can do, since it depends on the context. For example if the Greek letters are abbreviations for coordinate system axes like φ and λ, then the replacements depend on the enclosing coordinate system. See Transliterator for more information.
Parameters:

text - the text to scan for Unicode characters to replace by ASCII characters, or null.

Returns:

the given text with substitutions applied, or text if no replacement has been applied, or null if the given text was null.

See Also:
StringBuilders.toASCII(StringBuilder)

Transliterator.filter(String)

Normalizer
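For comparison, the standard Normalizer class listed above offers a related but different technique: compatibility decomposition (NFKD) followed by removal of combining marks. The sketch below shows that JDK-only approach. It is not what toASCII does internally, and AsciiFoldingDemo with its fold method are hypothetical names. It handles accented letters and squared unit symbols, but it does not produce every substitution described above (for example, it does not turn the ″ symbol into a straight double quote).

```java
import java.text.Normalizer;

/** Hypothetical demo of accent removal with java.text.Normalizer (not the SIS algorithm). */
final class AsciiFoldingDemo {
    static String fold(String text) {
        if (text == null) {
            return null;
        }
        // NFKD splits "é" into "e" + combining acute accent, and "㎏" into "kg".
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFKD);
        // Remove the combining marks left by the decomposition.
        return decomposed.replaceAll("\\p{M}", "");
    }

    public static void main(String[] args) {
        System.out.println(fold("caf\u00E9"));    // cafe
        System.out.println(fold("5 \u338F"));     // 5 kg
    }
}
```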
trimWhitespaces
public static CharSequence trimWhitespaces(CharSequence text)

Returns a text with leading and trailing whitespace characters omitted. Space characters are identified by the Character.isWhitespace(int) method.
This method is the generalized version of String.strip().
Parameters:

text - the text from which to remove leading and trailing whitespaces, or null.

Returns:

a character sequence with leading and trailing whitespaces removed, or null if the given text was null.

See Also:
skipLeadingWhitespaces(CharSequence, int, int)

skipTrailingWhitespaces(CharSequence, int, int)

String.strip()
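Since this method follows the same whitespace definition as String.strip(), the JDK method can be used to observe the expected behavior on plain strings. The StripDemo class below is a hypothetical example and requires Java 11 for String.strip(). It contrasts strip() with the older trim(), and shows that a no-break space is not removed.

```java
/** Hypothetical demo of the whitespace definition shared with String.strip(). */
final class StripDemo {
    /** EM SPACE, space, "value", tabulation, EM SPACE. */
    static final String PADDED = "\u2003 value\t\u2003";

    public static void main(String[] args) {
        // strip() relies on Character.isWhitespace(int), which accepts EM SPACE.
        System.out.println(PADDED.strip().length());            // 5
        // trim() only removes characters up to U+0020, so EM SPACE stops it.
        System.out.println(PADDED.trim().length());             // 9
        // A no-break space is not a whitespace for isWhitespace(int), so it is kept.
        System.out.println("\u00A0value".strip().length());     // 6
    }
}
```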
trimWhitespaces
public static CharSequence trimWhitespaces(CharSequence text, int lower, int upper)
Returns a sub-sequence with leading and trailing whitespace characters omitted. Space characters are identified by the Character.isWhitespace(int) method.
Invoking this method is functionally equivalent to the following code snippet, except that the subSequence method is invoked only once instead of two times:
text = trimWhitespaces(text.subSequence(lower, upper));
Parameters:

text - the text from which to remove leading and trailing white spaces.

lower - index of the first character to consider for inclusion in the sub-sequence.

upper - index after the last character to consider for inclusion in the sub-sequence.

Returns:

a character sequence with leading and trailing white spaces removed, or null if the text argument is null.

Throws:

IndexOutOfBoundsException - if lower or upper is out of bounds.
trimIgnorables
public static CharSequence trimIgnorables(CharSequence text)
Returns a text with ignorable characters in Unicode identifiers removed. While valid in identifiers, those ignorable characters are often not displayed. An example of an ignorable character is the zero-width space.
Relationship with XML
Unlike Unicode identifiers, ignorable characters are invalid in XML identifiers. This restriction avoids, for example, homograph attacks in domain names. So this method can be used for converting a Unicode identifier to an XML identifier, except for the characters listed below. Those characters are non-ignorable (so not removed by this method), but nevertheless invalid in XML identifiers.
- µ (U+00B5) — micro
- ª (U+00AA) — feminine ordinal indicator
- º (U+00BA) — masculine ordinal indicator
- ⁔ (U+2054) — inverted undertie
Parameters:

text - the text from which to remove ignorable characters, or null.

Returns:

text with ignorable characters removed, or null if the given text was null.

Since:

1.5

See Also:
Character.isIdentifierIgnorable(int)
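The filtering can be sketched with the Character.isIdentifierIgnorable(int) method referenced above. TrimIgnorablesDemo is a hypothetical class written for this example. The actual SIS method may avoid copying when there is nothing to remove.

```java
/** Hypothetical sketch: removes the code points that are ignorable in identifiers. */
final class TrimIgnorablesDemo {
    static CharSequence trimIgnorables(CharSequence text) {
        if (text == null) {
            return null;
        }
        StringBuilder buffer = new StringBuilder(text.length());
        for (int i = 0; i < text.length();) {
            int c = Character.codePointAt(text, i);
            i += Character.charCount(c);
            if (!Character.isIdentifierIgnorable(c)) {
                buffer.appendCodePoint(c);
            }
        }
        return buffer;
    }

    public static void main(String[] args) {
        // The zero-width space (U+200B) is ignorable and is removed.
        System.out.println(trimIgnorables("ab\u200Bc"));           // abc
        // The micro sign (U+00B5) is not ignorable, so it is kept.
        System.out.println(trimIgnorables("\u00B5m").length());    // 2
    }
}
```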
trimFractionalPart
public static CharSequence trimFractionalPart(CharSequence value)

Trims the fractional part of the given formatted number, provided that it doesn't change the value. This method assumes that the number is formatted in the US locale, typically by the Double.toString(double) method.
More specifically, if the given value ends with a '.' character followed by a sequence of '0' characters, then those characters are omitted. Otherwise this method returns the text unchanged. This is an all-or-nothing method: either the fractional part is completely removed, or it is left unchanged.

Examples
This method returns "4" if the given value is "4.", "4.0" or "4.00", but returns "4.10" unchanged (including the trailing '0' character) if the input is "4.10".
Use case
This method is useful before parsing a number if that number should preferably be parsed as an integer before attempting to parse it as a floating point number.
Parameters:

value - the value to trim if possible, or null.

Returns:

the value without the trailing ".0" part (if any), or null if the given text was null.

See Also:
StringBuilders.trimFractionalPart(StringBuilder)
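The all-or-nothing rule and the use case above can be sketched as follows. This is an illustrative approximation using only the JDK. TrimFractionalPartDemo and its parse helper are hypothetical names, not SIS code.

```java
/** Hypothetical sketch of the all-or-nothing trimming rule and of its use case. */
final class TrimFractionalPartDemo {
    static CharSequence trimFractionalPart(CharSequence value) {
        if (value == null) {
            return null;
        }
        // Scan backward over trailing zeros. Trim only if they are preceded by '.'.
        for (int i = value.length(); --i >= 0;) {
            char c = value.charAt(i);
            if (c == '.') return value.subSequence(0, i);
            if (c != '0') break;
        }
        return value;
    }

    /** Tries to parse as an integer first, then falls back to a floating point number. */
    static Number parse(String text) {
        try {
            return Long.valueOf(trimFractionalPart(text).toString());
        } catch (NumberFormatException e) {
            return Double.valueOf(text);
        }
    }

    public static void main(String[] args) {
        System.out.println(trimFractionalPart("4.00"));   // 4
        System.out.println(trimFractionalPart("4.10"));   // 4.10
        System.out.println(parse("4.0").getClass().getSimpleName());    // Long
        System.out.println(parse("4.10").getClass().getSimpleName());   // Double
    }
}
```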
shortSentence

public static CharSequence shortSentence(CharSequence text, int maxLength)

Makes sure that the text string is not longer than maxLength characters. If text is not longer, then it is returned unchanged. Otherwise this method returns a copy of text with some characters substituted by the "(…)" string.
If the text needs to be shortened, then this method tries to apply the above-cited substitution between two words. For example, the following text:

"This sentence given as an example is way too long to be included in a short name."
May be shortened to something like this:
"This sentence given (…) in a short name."

Parameters:

text - the sentence to reduce if it is too long, or null.

maxLength - the maximum length allowed for text.

Returns:

a sentence not longer than maxLength, or null if the given text was null.
upperCaseToSentence
public static CharSequence upperCaseToSentence(CharSequence identifier)
Given a string in upper case (typically a Java constant), returns a string formatted like an English sentence. This heuristic method performs the following steps:
1. Replaces all occurrences of '_' by spaces.
2. Converts all letters except the first one to lower case using Character.toLowerCase(int). Note that this method does not use the String.toLowerCase() method. Consequently, the system locale is ignored. This method behaves as if the conversion were done in the root locale.
Note that those heuristic rules may be modified in future SIS versions, depending on the practical experience gained.
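The two steps above can be sketched as follows. This is an illustration using only the JDK, not the SIS code; the class name `UpperCaseToSentenceSketch` is hypothetical, and edge cases such as a leading '_' are not given any special treatment here.

```java
final class UpperCaseToSentenceSketch {
    static CharSequence upperCaseToSentence(CharSequence identifier) {
        if (identifier == null) {
            return null;
        }
        StringBuilder buffer = new StringBuilder(identifier.length());
        for (int i = 0; i < identifier.length();) {
            int c = Character.codePointAt(identifier, i);
            i += Character.charCount(c);
            if (c == '_') {
                buffer.append(' ');                 // Step 1: underscores become spaces.
            } else if (buffer.length() == 0) {
                buffer.appendCodePoint(c);          // The first code point keeps its case.
            } else {
                // Step 2: locale-independent conversion, one code point at a time.
                buffer.appendCodePoint(Character.toLowerCase(c));
            }
        }
        return buffer;
    }

    public static void main(String[] args) {
        System.out.println(upperCaseToSentence("PIXEL_INTERLEAVED"));   // Pixel interleaved
    }
}
```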
Parameters:

identifier - the name of a Java constant, or null.

Returns:

the identifier like an English sentence, or null if the given identifier argument was null.
camelCaseToSentence
public static CharSequence camelCaseToSentence(CharSequence identifier)
Given a string in camel case (typically an identifier), returns a string formatted like an English sentence. This heuristic method performs the following steps:
1. Invokes camelCaseToWords(CharSequence, boolean), which separates the words on the basis of character case. For example, "transferFunctionType" becomes "transfer function type". This works fine for ISO 19115 identifiers.
2. Next, replaces all occurrences of '_' by spaces in order to take into account another common naming convention, which uses '_' as a word separator. This convention is used by netCDF attributes like "project_name".
3. Finally ensure that the first character is upper-case.
Exception to the above rules
If the given identifier contains only upper-case letters, digits and the '_' character, then the identifier is returned "as is" except for the '_' characters which are replaced by '-'. This works well for identifiers like "UTF-8" or "ISO-LATIN-1" for instance.
Note that those heuristic rules may be modified in future SIS versions, depending on the practical experience gained.
Parameters:

identifier - an identifier with no space, words begin with an upper-case character, or null.

Returns:

the identifier with spaces inserted after what looks like words, or null if the given identifier argument was null.
camelCaseToWords

public static CharSequence camelCaseToWords(CharSequence identifier, boolean toLowerCase)

Given a string in camel case, returns a string with the same words separated by spaces. A word begins with an upper-case character following a lower-case character. For example, if the given string is "PixelInterleavedSampleModel", then this method returns "Pixel Interleaved Sample Model" or "Pixel interleaved sample model" depending on the value of the toLowerCase argument.
If toLowerCase is false, then this method inserts spaces but does not change the case of characters. If toLowerCase is true, then this method changes to lower case the first character after each space inserted by this method (note that this intentionally excludes the very first character in the given string), except if the second character of the word is upper case, in which case the word is assumed to be an acronym.

The given string is usually a programmatic identifier like a class name or a method name.
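The word-splitting rule described above can be sketched as follows. This is a simplified illustration using only the JDK, not the SIS implementation; the class name `CamelCaseToWordsSketch` is hypothetical and the real method may handle more cases (digits, consecutive acronyms) differently.

```java
final class CamelCaseToWordsSketch {
    static CharSequence camelCaseToWords(CharSequence identifier, boolean toLowerCase) {
        if (identifier == null) {
            return null;
        }
        int length = identifier.length();
        StringBuilder buffer = new StringBuilder(length + 8);
        int previous = 0;                           // No previous code point yet.
        for (int i = 0; i < length;) {
            int c = Character.codePointAt(identifier, i);
            int next = i + Character.charCount(c);
            int out = c;
            // A word begins with an upper-case character following a lower-case character.
            if (Character.isUpperCase(c) && Character.isLowerCase(previous)) {
                buffer.append(' ');
                // If the second character of the word is upper case, assume an acronym.
                boolean acronym = next < length
                        && Character.isUpperCase(Character.codePointAt(identifier, next));
                if (toLowerCase && !acronym) {
                    out = Character.toLowerCase(c);
                }
            }
            buffer.appendCodePoint(out);
            previous = c;
            i = next;
        }
        return buffer;
    }

    public static void main(String[] args) {
        System.out.println(camelCaseToWords("PixelInterleavedSampleModel", false));
        System.out.println(camelCaseToWords("PixelInterleavedSampleModel", true));
    }
}
```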

Parameters:

identifier - an identifier with no space, words begin with an upper-case character.

toLowerCase - true for changing the first character of words to lower case, except for the first word and acronyms.

Returns:

the identifier with spaces inserted after what looks like words, or null if the given identifier argument was null.
camelCaseToAcronym

public static CharSequence camelCaseToAcronym(CharSequence text)

Creates an acronym from the given text. This method returns a string containing the first character of each word, where the words are separated by the camel case convention, the '_' character, or any character which is not a Unicode identifier part (including spaces).
An exception to the above rule happens if the given text is a Unicode identifier without the '_' character, and all characters are upper case. In such a case the text is returned unchanged on the assumption that it is already an acronym.

Examples: given "northEast", this method returns "NE". Given "Open Geospatial Consortium", this method returns "OGC".

Parameters:

text - the text for which to create an acronym, or null.

Returns:

the acronym, or null if the given text was null.
isAcronymForWords

public static boolean isAcronymForWords(CharSequence acronym, CharSequence words)

Returns true if the first string is likely to be an acronym of the second string. An acronym is a sequence of letters or digits built from at least one character of each word in the words string. More than one character from the same word may appear in the acronym, but they must always be the first consecutive characters. The comparison is case-insensitive. If any of the given arguments is null, this method returns false.
If a word contains digits, then the digits shall either be all absent or all present in the acronym. An acronym with only the first digits is not considered as a match because it changes the numerical value.

Example
Given the "Open Geospatial Consortium" words, the following strings are recognized as acronyms: "OGC", "ogc", "O.G.C.", "OpGeoCon".

Parameters:

acronym - a possible acronym of the sequence of words, or null.

words - the sequence of words, or null.

Returns:

true if the first string is an acronym of the second one.

isUnicodeIdentifier

public static boolean isUnicodeIdentifier(CharSequence identifier)

Returns true if the given identifier is a legal Unicode identifier. This method returns true if the identifier length is greater than zero, the first character is a Unicode identifier start and all remaining characters (if any) are Unicode identifier parts.
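The test described above maps directly onto two JDK methods, Character.isUnicodeIdentifierStart(int) and Character.isUnicodeIdentifierPart(int). The following is a minimal sketch of that check, not the SIS code; the class name `UnicodeIdentifierSketch` is hypothetical.

```java
final class UnicodeIdentifierSketch {
    static boolean isUnicodeIdentifier(CharSequence identifier) {
        if (identifier == null || identifier.length() == 0) {
            return false;
        }
        // The first code point must be an identifier start.
        int c = Character.codePointAt(identifier, 0);
        if (!Character.isUnicodeIdentifierStart(c)) {
            return false;
        }
        // All remaining code points (if any) must be identifier parts.
        for (int i = Character.charCount(c); i < identifier.length(); i += Character.charCount(c)) {
            c = Character.codePointAt(identifier, i);
            if (!Character.isUnicodeIdentifierPart(c)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isUnicodeIdentifier("project_name"));    // true
        System.out.println(isUnicodeIdentifier("gml:id"));          // false
    }
}
```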

Relationship with legal XML identifiers

Most legal Unicode identifiers are also legal XML identifiers, but the converse is not true. The most noticeable differences are the ‘:’, ‘-’ and ‘.’ characters, which are legal in XML identifiers but not in Unicode.

Characters legal in one set but not in the other
Not legal in Unicode		Not legal in XML
`:`	(colon)	`µ`	(micro sign)
`-`	(hyphen or minus)	`ª`	(feminine ordinal indicator)
`.`	(dot)	`º`	(masculine ordinal indicator)
`·`	(middle dot)	`⁔`	(inverted undertie)
Many punctuation, symbols, etc.		Identifier ignorable characters.

Note that the ‘_’ (underscore) character is legal according to both Unicode and XML, while spaces, ‘!’, ‘#’, ‘*’, ‘/’, ‘?’ and most other punctuation characters are not.

Usage in Apache SIS

In its handling of identifiers, Apache SIS favors Unicode identifiers without ignorable characters since those identifiers are legal XML identifiers except for the above-cited rarely used characters. As a side effect, this policy excludes ‘:’, ‘-’ and ‘.’ which would normally be legal XML identifiers. But since those characters could easily be confused with namespace separators, this exclusion is considered desirable.

Parameters:

identifier - the character sequence to test, or null.

Returns:

true if the given character sequence is a legal Unicode identifier.

See Also:

isUpperCase
public static boolean isUpperCase(CharSequence text)

Returns true if the given text is non-null, contains at least one upper-case character and no lower-case character. Space and punctuation are ignored.
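A check with this contract can be sketched with the JDK alone, as below. It is an illustration of the stated rule rather than the SIS implementation, and the class name `IsUpperCaseSketch` is hypothetical.

```java
final class IsUpperCaseSketch {
    static boolean isUpperCase(CharSequence text) {
        if (text == null) {
            return false;
        }
        boolean hasUpperCase = false;
        for (int i = 0; i < text.length();) {
            int c = Character.codePointAt(text, i);
            i += Character.charCount(c);
            if (Character.isLowerCase(c)) {
                return false;                       // One lower-case character is enough to fail.
            }
            // Spaces, digits and punctuation are neither upper nor lower case: they are ignored.
            hasUpperCase |= Character.isUpperCase(c);
        }
        return hasUpperCase;
    }

    public static void main(String[] args) {
        System.out.println(isUpperCase("UTF-8"));   // true
        System.out.println(isUpperCase("Utf-8"));   // false
        System.out.println(isUpperCase("123"));     // false
    }
}
```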
Parameters:

text - the character sequence to test (may be null).

Returns:

true if non-null, contains at least one upper-case character and no lower-case character.

Since:

0.7

See Also:
String.toUpperCase()
equalsFiltered
public static boolean equalsFiltered(CharSequence s1, CharSequence s2, Characters.Filter filter, boolean ignoreCase)
Returns true if the given texts are equal, optionally ignoring case and filtered-out characters. This method is sometimes used for comparing identifiers in a lenient way.
Example: the following call compares the two strings, ignoring case and any characters which are not letters or digits. In particular, spaces and punctuation characters like '_' and '-' are ignored:
assert equalsFiltered("WGS84", "WGS_84", Characters.Filter.LETTERS_AND_DIGITS, true) == true;
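For readers without SIS on the classpath, the lenient comparison can be approximated with the JDK alone. In the sketch below, a `java.util.function.IntPredicate` stands in for Characters.Filter (an assumption made for this example only), and the class name `EqualsFilteredSketch` is hypothetical. It is not the SIS implementation.

```java
import java.util.function.IntPredicate;

final class EqualsFilteredSketch {
    static boolean equalsFiltered(CharSequence s1, CharSequence s2, IntPredicate filter, boolean ignoreCase) {
        if (s1 == s2) {
            return true;                            // Includes the case where both are null.
        }
        if (s1 == null || s2 == null) {
            return false;
        }
        int i1 = 0, i2 = 0;
        while (true) {
            i1 = skipFilteredOut(s1, i1, filter);
            i2 = skipFilteredOut(s2, i2, filter);
            if (i1 >= s1.length() || i2 >= s2.length()) {
                // Equal only if both sequences are exhausted at the same time.
                return i1 >= s1.length() && i2 >= s2.length();
            }
            int c1 = Character.codePointAt(s1, i1);
            int c2 = Character.codePointAt(s2, i2);
            if (c1 != c2 && !(ignoreCase && sameIgnoringCase(c1, c2))) {
                return false;
            }
            i1 += Character.charCount(c1);
            i2 += Character.charCount(c2);
        }
    }

    /** Returns the index of the first code point at or after i that the filter accepts. */
    private static int skipFilteredOut(CharSequence s, int i, IntPredicate filter) {
        while (filter != null && i < s.length()) {
            int c = Character.codePointAt(s, i);
            if (filter.test(c)) {
                break;
            }
            i += Character.charCount(c);
        }
        return i;
    }

    /** Same two-step comparison as the one documented for String.equalsIgnoreCase. */
    private static boolean sameIgnoringCase(int c1, int c2) {
        c1 = Character.toUpperCase(c1);
        c2 = Character.toUpperCase(c2);
        return c1 == c2 || Character.toLowerCase(c1) == Character.toLowerCase(c2);
    }

    public static void main(String[] args) {
        System.out.println(equalsFiltered("WGS84", "WGS_84", Character::isLetterOrDigit, true));    // true
        System.out.println(equalsFiltered("WGS84", "WGS_84", null, true));                          // false
    }
}
```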
Parameters:

s1 - the first character sequence to compare, or null.

s2 - the second character sequence to compare, or null.

filter - the subset of characters to compare, or null for comparing all characters.

ignoreCase - true for ignoring cases, or false for requiring exact match.

Returns:

true if both arguments are null or if the two given texts are equal, optionally ignoring case and filtered-out characters.
equalsIgnoreCase
public static boolean equalsIgnoreCase(CharSequence s1, CharSequence s2)

Returns true if the two given texts are equal, ignoring case. This method is similar to String.equalsIgnoreCase(String), except it works on arbitrary character sequences and compares code points instead of characters.
Parameters:

s1 - the first string to compare, or null.

s2 - the second string to compare, or null.

Returns:

true if the two given texts are equal, ignoring case, or if both arguments are null.

See Also:
String.equalsIgnoreCase(String)
equals
public static boolean equals(CharSequence s1, CharSequence s2)

Returns true if the two given texts are equal. This method delegates to String.contentEquals(CharSequence) if possible. This method never invokes CharSequence.toString(), in order to avoid a potentially large copy of data.
Parameters:

s1 - the first string to compare, or null.

s2 - the second string to compare, or null.

Returns:

true if the two given texts are equal, or if both arguments are null.

See Also:
String.contentEquals(CharSequence)
regionMatches
public static boolean regionMatches(CharSequence text, int fromIndex, CharSequence part)
Returns true if the given text at the given offset contains the given part, in a case-sensitive comparison. This method is equivalent to the following code, except that this method works on arbitrary CharSequence objects instead of Strings only:
return text.regionMatches(fromIndex, part, 0, part.length());
This method does not throw IndexOutOfBoundsException. Instead, if fromIndex < 0 or fromIndex + part.length() > text.length(), then this method returns false.
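The bounds behavior described above can be sketched as follows, using only the CharSequence interface. This is an illustration rather than the SIS code; the class name `RegionMatchesSketch` is hypothetical.

```java
final class RegionMatchesSketch {
    static boolean regionMatches(CharSequence text, int fromIndex, CharSequence part) {
        // Both lengths are read first, so a null argument always causes a NullPointerException.
        int textLength = text.length();
        int partLength = part.length();
        // Out-of-bounds regions are reported as "no match" rather than as an exception.
        // The subtraction form avoids an integer overflow in fromIndex + partLength.
        if (fromIndex < 0 || partLength > textLength - fromIndex) {
            return false;
        }
        for (int i = 0; i < partLength; i++) {
            if (text.charAt(fromIndex + i) != part.charAt(i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        CharSequence text = new StringBuilder("Hello world");
        System.out.println(regionMatches(text, 6, "world"));        // true
        System.out.println(regionMatches(text, 9, "world"));        // false, no exception
    }
}
```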
Parameters:

text - the character sequence to test for the presence of part.

fromIndex - the offset in text where to test for the presence of part.

part - the part which may be present in text.

Returns:

true if text contains part at the given offset.

Throws:

NullPointerException - if any of the arguments is null.

See Also:
String.regionMatches(int, String, int, int)
regionMatches
public static boolean regionMatches(CharSequence text, int fromIndex, CharSequence part, boolean ignoreCase)
Returns true if the given text at the given offset contains the given part, optionally in a case-insensitive way. This method is equivalent to the following code, except that this method works on arbitrary CharSequence objects instead of Strings only:
return text.regionMatches(ignoreCase, fromIndex, part, 0, part.length());
This method does not throw IndexOutOfBoundsException. Instead, if fromIndex < 0 or fromIndex + part.length() > text.length(), then this method returns false.
Parameters:

text - the character sequence to test for the presence of part.

fromIndex - the offset in text where to test for the presence of part.

part - the part which may be present in text.

ignoreCase - true if the case should be ignored.

Returns:

true if text contains part at the given offset.

Throws:

NullPointerException - if any of the arguments is null.

Since:

0.4

See Also:
String.regionMatches(boolean, int, String, int, int)
startsWith

public static boolean startsWith(CharSequence text, CharSequence prefix, boolean ignoreCase)

Returns true if the given character sequence starts with the given prefix.

Parameters:

text - the character sequence to test.

prefix - the expected prefix.

ignoreCase - true if the case should be ignored.

Returns:

true if the given sequence starts with the given prefix.

Throws:

NullPointerException - if any of the arguments is null.
endsWith

public static boolean endsWith(CharSequence text, CharSequence suffix, boolean ignoreCase)

Returns true if the given character sequence ends with the given suffix.

Parameters:

text - the character sequence to test.

suffix - the expected suffix.

ignoreCase - true if the case should be ignored.

Returns:

true if the given sequence ends with the given suffix.

Throws:

NullPointerException - if any of the arguments is null.
commonPrefix

public static CharSequence commonPrefix(CharSequence s1, CharSequence s2)

Returns the longest sequence of characters which is found at the beginning of the two given texts. If one of those texts is null, then the other text is returned. If there is no common prefix, then this method returns an empty string.
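The contract above, including the handling of null arguments, can be sketched as follows. This is an illustration using only the JDK, not the SIS implementation. The class name `CommonPrefixSketch` is hypothetical, and the surrogate-pair adjustment is a choice made for this sketch.

```java
final class CommonPrefixSketch {
    static CharSequence commonPrefix(CharSequence s1, CharSequence s2) {
        if (s1 == null) return s2;                  // Also returns null if both are null.
        if (s2 == null) return s1;
        int limit = Math.min(s1.length(), s2.length());
        int i = 0;
        while (i < limit && s1.charAt(i) == s2.charAt(i)) {
            i++;
        }
        // Do not cut a supplementary character between its two surrogates.
        if (i > 0 && Character.isHighSurrogate(s1.charAt(i - 1))) {
            i--;
        }
        return s1.subSequence(0, i);
    }

    public static void main(String[] args) {
        System.out.println(commonPrefix("baroclinic_eastward_velocity",
                                        "baroclinic_northward_velocity"));     // baroclinic_
    }
}
```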

Parameters:

s1 - the first text, or null.

s2 - the second text, or null.

Returns:

the common prefix of both texts (may be empty), or null if both texts are null.
commonSuffix

public static CharSequence commonSuffix(CharSequence s1, CharSequence s2)

Returns the longest sequence of characters which is found at the end of the two given texts. If one of those texts is null, then the other text is returned. If there is no common suffix, then this method returns an empty string.

Parameters:

s1 - the first text, or null.

s2 - the second text, or null.

Returns:

the common suffix of both texts (may be empty), or null if both texts are null.
commonWords
public static CharSequence commonWords(CharSequence s1, CharSequence s2)
Returns the words found at the beginning and end of both texts. The returned string is the concatenation of the common prefix with the common suffix, with the prefix and suffix possibly shortened to avoid cutting in the middle of a word.
The purpose of this method is to create a global identifier from a list of component identifiers. The latter are often eastward and northward components of a vector, in which case this method provides an identifier for the vector as a whole.

If one of the given texts is null, then the other text is returned. If there are no common words, then this method returns an empty string.

Example
Given the following inputs:
- "baroclinic_eastward_velocity"
- "baroclinic_northward_velocity"
This method returns "baroclinic_velocity". Note that the "ward" characters are a common suffix of both texts but nevertheless omitted because they cut a word.
Possible future evolution
The current implementation searches only for a common prefix and a common suffix, ignoring any common words that may appear in the middle of the strings. A character is considered the beginning of a word if it is a letter or digit which is not preceded by another letter or digit (as the leading "s" and "c" in "snake_case"), or if it is an upper-case letter preceded by a lower-case letter or no letter (as both "C" in "CamelCase").
Parameters:

s1 - the first text, or null.

s2 - the second text, or null.

Returns:

the common words of both texts (may be empty), or null if both texts are null.

Since:

1.1
token
public static CharSequence token(CharSequence text, int fromIndex)
Returns the token starting at the given offset in the given text. For the purpose of this method, a "token" is any sequence of consecutive characters of the same type, as defined below.
Let c be the first non-blank character located at an index equal to or greater than the given offset. Then the characters that are considered of the same type are:
- If c is a Unicode identifier start, then any following characters that are Unicode identifier parts.
- If c is a dash punctuation or a connector punctuation, then all following punctuation characters of the same type followed by all characters that are Unicode identifier parts.
- Otherwise any character for which Character.getType(int) returns the same value as for c.
Parameters:

text - the text for which to get the token.

fromIndex - index of the first character to consider in the given text.

Returns:

a sub-sequence of text starting at the given offset, or an empty string if there is no non-blank character at or after the given offset.

Throws:

NullPointerException - if the text argument is null.
replace
public static CharSequence replace(CharSequence text, CharSequence toSearch, CharSequence replaceBy)

Replaces all occurrences of a given string in the given character sequence. If no occurrence of toSearch is found in the given text or if toSearch is equal to replaceBy, then this method returns the text unchanged. Otherwise this method returns a new character sequence with all occurrences replaced by replaceBy.
This method is similar to String.replace(CharSequence, CharSequence) except that it accepts arbitrary CharSequence objects. As of Java 10, another difference is that this method does not create a new String if toSearch is equal to replaceBy.
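The "return the same instance when nothing changes" behavior can be sketched as follows. This is an illustration using only the JDK, not the SIS implementation. The class name `ReplaceSketch` is hypothetical, and rejecting an empty toSearch with an IllegalArgumentException is an assumption of this sketch.

```java
final class ReplaceSketch {
    static CharSequence replace(CharSequence text, CharSequence toSearch, CharSequence replaceBy) {
        if (toSearch.length() == 0) {
            // Assumption of this sketch: an empty search string is rejected.
            throw new IllegalArgumentException("toSearch shall not be empty.");
        }
        if (text == null || toSearch.toString().contentEquals(replaceBy)) {
            return text;                            // Nothing to do: no new object is created.
        }
        int length = text.length();
        int searchLength = toSearch.length();
        StringBuilder buffer = null;                // Created only on the first occurrence found.
        int copied = 0;                             // Index after the last character copied.
        for (int i = 0; i <= length - searchLength;) {
            if (matches(text, i, toSearch)) {
                if (buffer == null) {
                    buffer = new StringBuilder(length);
                }
                buffer.append(text, copied, i).append(replaceBy);
                i += searchLength;
                copied = i;
            } else {
                i++;
            }
        }
        return (buffer == null) ? text : buffer.append(text, copied, length);
    }

    /** Returns true if part is found in text at the given offset (bounds checked by the caller). */
    private static boolean matches(CharSequence text, int offset, CharSequence part) {
        for (int i = 0; i < part.length(); i++) {
            if (text.charAt(offset + i) != part.charAt(i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        CharSequence text = new StringBuilder("project_name_long");
        System.out.println(replace(text, "_", " "));        // project name long
        System.out.println(replace(text, "-", " ") == text); // true
    }
}
```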
Parameters:

text - the character sequence in which to perform the replacements, or null.

toSearch - the string to replace.

replaceBy - the replacement for the searched string.

Returns:

the given text with replacements applied, or text if no replacement has been applied, or null if the given text was null

Since:

0.4

See Also:
String.replace(char, char)

StringBuilders.replace(StringBuilder, String, String)

String.replace(CharSequence, CharSequence)
copyChars
public static void copyChars(CharSequence src, int srcOffset, char[] dst, int dstOffset, int length)

Copies a sequence of characters into the given char[] array.
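One plausible way to implement such a copy is to delegate to the bulk getChars(…) method when the sequence type provides one, and to fall back on a character-by-character loop otherwise. The sketch below illustrates that idea with the JDK alone. It is not the SIS implementation, and the class name `CopyCharsSketch` is hypothetical.

```java
final class CopyCharsSketch {
    static void copyChars(CharSequence src, int srcOffset, char[] dst, int dstOffset, int length) {
        if (src instanceof String) {
            ((String) src).getChars(srcOffset, srcOffset + length, dst, dstOffset);
        } else if (src instanceof StringBuilder) {
            ((StringBuilder) src).getChars(srcOffset, srcOffset + length, dst, dstOffset);
        } else if (src instanceof StringBuffer) {
            ((StringBuffer) src).getChars(srcOffset, srcOffset + length, dst, dstOffset);
        } else {
            // Generic fallback, valid for any CharSequence implementation.
            for (int i = 0; i < length; i++) {
                dst[dstOffset + i] = src.charAt(srcOffset + i);
            }
        }
    }

    public static void main(String[] args) {
        char[] buffer = new char[5];
        copyChars(new StringBuilder("Hello world"), 6, buffer, 0, 5);
        System.out.println(new String(buffer));     // world
    }
}
```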
Parameters:

src - the character sequence from which to copy characters.

srcOffset - index of the first character from src to copy.

dst - the array where to copy the characters.

dstOffset - index where to write the first character in dst.

length - number of characters to copy.

See Also:
String.getChars(int, int, char[], int)

StringBuilder.getChars(int, int, char[], int)

StringBuffer.getChars(int, int, char[], int)

CharBuffer.get(char[], int, int)

Segment.array
