Class CharSequences

Object
Static
CharSequences

public final class CharSequences extends Static
Static methods working with CharSequence instances. Some methods defined in this class duplicate functionality already provided by the standard String class, but work on a generic CharSequence instance instead of String.

Unicode support

Every method defined in this class works on code points instead of characters when appropriate. Consequently, those methods should behave correctly with characters outside the Basic Multilingual Plane (BMP).

Policy on space characters

Java defines two methods for testing if a character is a white space: Character​.is­Whitespace(int) and Character​.is­Space­Char(int). Those two methods differ in the way they handle no-break spaces, tabulations and line feeds. The general policy in the SIS library is:
  • Use is­Whitespace(…) when separating entities (words, numbers, tokens, etc.) in a list. Using that method, characters separated by a no-break space are considered as part of the same entity.
  • Use is­Space­Char(…) when parsing a single entity, for example a single word. Using this method, no-break spaces are considered as part of the entity while line feeds or tabulations are entity boundaries.
For example, numbers formatted in the French locale use no-break spaces as group separators. When parsing a list of numbers, ordinary spaces around the numbers may need to be ignored, but no-break spaces shall be considered as part of the numbers. Consequently, isWhitespace(…) is appropriate for skipping spaces between the numbers. But if there are spaces to skip inside a single number, then isSpaceChar(…) is a good choice for accepting no-break spaces and for stopping the parse operation at tabulation or line feed characters. A tabulation or line feed between two characters is very likely to separate two distinct values. In practice, the Format implementations in the SIS library typically use isSpaceChar(…) while most of the rest of the SIS library, including this CharSequences class, consistently uses isWhitespace(…).
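
The difference between the two JDK tests can be checked directly. The following sketch uses only java.lang.Character; the SpacePolicyDemo class and its classify helper are hypothetical names introduced for illustration and are not part of SIS.

```java
/** Shows how the two JDK space tests classify a few characters. */
final class SpacePolicyDemo {
    /** Returns "W" if isWhitespace accepts the code point (else "-"), then "S" if isSpaceChar accepts it (else "-"). */
    static String classify(int codePoint) {
        return (Character.isWhitespace(codePoint) ? "W" : "-")
             + (Character.isSpaceChar(codePoint)  ? "S" : "-");
    }

    public static void main(String[] args) {
        System.out.println("ordinary space : " + classify(' '));       // WS
        System.out.println("no-break space : " + classify('\u00A0'));  // -S
        System.out.println("tabulation     : " + classify('\t'));      // W-
        System.out.println("line feed      : " + classify('\n'));      // W-
    }
}
```

The no-break space is rejected by isWhitespace(…) and therefore stays attached to its entity, while tabulations and line feeds are rejected by isSpaceChar(…) and therefore end the parsing of a single entity.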

Note that the String.trim() method does not follow either of those policies and should generally be avoided. That trim() method removes all ISO control characters without regard to whether they are spaces or not, and ignores all Unicode spaces. The trimWhitespaces(String) method defined in this class can be used as an alternative.
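
As an illustration of that difference, the sketch below compares String.trim() with a trimming loop built on Character.isWhitespace(int). The TrimDemo class and its trimWhitespaces helper are written here for illustration only, assuming the policy described above; they are not the SIS implementation.

```java
/** Compares String.trim() with a trim based on Character.isWhitespace(int). */
final class TrimDemo {
    /** Hypothetical helper: removes leading and trailing whitespace, one code point at a time. */
    static String trimWhitespaces(String text) {
        if (text == null) return null;
        int lower = 0, upper = text.length();
        while (lower < upper) {
            int c = text.codePointAt(lower);
            if (!Character.isWhitespace(c)) break;
            lower += Character.charCount(c);
        }
        while (upper > lower) {
            int c = text.codePointBefore(upper);
            if (!Character.isWhitespace(c)) break;
            upper -= Character.charCount(c);
        }
        return text.substring(lower, upper);
    }

    public static void main(String[] args) {
        String emSpaces = "\u2003text\u2003";   // EM SPACE on both sides
        String escape   = "\u001B[1mbold";      // begins with the ESC control character
        System.out.println(emSpaces.trim().length());            // 6: the em spaces are kept
        System.out.println(trimWhitespaces(emSpaces).length());  // 4: the em spaces are removed
        System.out.println(escape.trim().length());              // 7: ESC has been removed
        System.out.println(trimWhitespaces(escape).length());    // 8: ESC is preserved
    }
}
```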

Handling of null values

Most methods in this class accept a null Char­Sequence argument. In such cases the method return value is either a null Char­Sequence, an empty array, or a 0 or false primitive type calculated as if the input was an empty string.
Since:
0.3
  • Field Details

  • Method Details

    • spaces

      public static CharSequence spaces(int length)
      Returns a character sequence of the specified length filled with white spaces.

      Use case

      This method is typically invoked for performing right-alignment of text on the console or other device using a monospaced font. Callers compute a value for the length argument by (desired width - used width). Since the used width value may be greater than expected, this method handles negative length values as if the value was zero.
      Parameters:
      length - the string length. Negative values are clamped to 0.
      Returns:
      a string of length length filled with white spaces.
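
      The right-alignment use case can be sketched with the JDK alone. The RightAlignDemo class and its rightAlign helper are hypothetical; the sketch uses String.repeat(int) (Java 11 or later), which rejects negative counts, so the clamping to zero is done explicitly.

```java
/** Right-aligns text in a fixed-width column, clamping negative padding to zero. */
final class RightAlignDemo {
    /** Hypothetical helper: pads on the left up to the given width; longer text is returned as-is. */
    static String rightAlign(String text, int width) {
        int padding = Math.max(0, width - text.length());   // (desired width - used width), never negative
        return " ".repeat(padding) + text;
    }

    public static void main(String[] args) {
        System.out.println('[' + rightAlign("42", 6) + ']');        // [    42]
        System.out.println('[' + rightAlign("1234567", 6) + ']');   // [1234567]
    }
}
```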
    • length

      public static int length(CharSequence text)
      Returns the length of the given character sequence, or 0 if null.
      Parameters:
      text - the character sequence from which to get the length, or null.
      Returns:
      the length of the character sequence, or 0 if the argument is null.
    • codePointCount

      public static int codePointCount(CharSequence text)
      Returns the number of Unicode code points in the given character sequence, or 0 if null. Unpaired surrogates within the text count as one code point each.
      Parameters:
      text - the character sequence from which to get the count, or null.
      Returns:
      the number of Unicode code points, or 0 if the argument is null.
    • codePointCount

      public static int codePointCount(CharSequence text, int fromIndex, int toIndex)
      Returns the number of Unicode code points in the given character sub-sequence, or 0 if null. Unpaired surrogates within the text count as one code point each.

      This method performs the same work as the standard Character.codePointCount(CharSequence, int, int) method, except that it tries to delegate to the optimized methods from the String, StringBuilder, StringBuffer or CharBuffer classes if possible.

      Parameters:
      text - the character sequence from which to get the count, or null.
      from­Index - the index from which to start the computation.
      toIndex - the index after the last character to take into account.
      Returns:
      the number of Unicode code points, or 0 if the argument is null.
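
      The counting rule can be illustrated with the standard Character.codePointCount(CharSequence, int, int) method mentioned above. The CodePointCountDemo class is a hypothetical null-safe wrapper written for this example; it does not attempt the delegation to optimized implementations.

```java
/** Counts code points with the JDK, treating null as an empty text. */
final class CodePointCountDemo {
    /** Hypothetical null-safe wrapper around Character.codePointCount. */
    static int codePointCount(CharSequence text) {
        return (text == null) ? 0 : Character.codePointCount(text, 0, text.length());
    }

    public static void main(String[] args) {
        String supplementary = "a\uD83D\uDE00b";   // 4 chars: 'a', one surrogate pair, 'b'
        String unpaired      = "x\uD800";          // 2 chars: 'x' and a lone high surrogate
        System.out.println(supplementary.length() + " chars, "
                + codePointCount(supplementary) + " code points");   // 4 chars, 3 code points
        System.out.println(codePointCount(unpaired));                // 2: the lone surrogate counts as one
        System.out.println(codePointCount(null));                    // 0
    }
}
```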
    • count

      public static int count(CharSequence text, String toSearch)
      Returns the number of occurrences of the to­Search string in the given text. The search is case-sensitive.
      Parameters:
      text - the character sequence in which to count occurrences, or null.
      to­Search - the string to search in the given text. It shall contain at least one character.
      Returns:
      the number of occurrences of to­Search in text, or 0 if text was null or empty.
      Throws:
      Null­Pointer­Exception - if the to­Search argument is null.
      Illegal­Argument­Exception - if the to­Search argument is empty.
    • count

      public static int count(CharSequence text, char toSearch)
      Counts the number of occurrences of the given character in the given character sequence.
      Parameters:
      text - the character sequence in which to count occurrences, or null.
      to­Search - the character to count.
      Returns:
      the number of occurrences of the given character, or 0 if the text is null.
    • indexOf

      public static int indexOf(CharSequence text, CharSequence toSearch, int fromIndex, int toIndex)
      Returns the index within the given string of the first occurrence of the specified substring, starting at the specified index. This method is equivalent to the following method call, except that this method works on arbitrary CharSequence objects instead of Strings only, and that the upper limit can be specified:
      return text.indexOf(toSearch, fromIndex);
      
      There is no restriction on the value of from­Index. If negative or greater than to­Index, then the behavior of this method is as if the search started from 0 or to­Index respectively. This is consistent with the String​.index­Of(String, int) behavior.
      Parameters:
      text - the string in which to perform the search.
      to­Search - the substring for which to search.
      from­Index - the index from which to start the search.
      to­Index - the index after the last character where to perform the search.
      Returns:
      the index within the text of the first occurrence of the specified substring, starting at the specified index, or -1 if no occurrence has been found or if the text argument is null.
      Throws:
      Null­Pointer­Exception - if the to­Search argument is null.
      Illegal­Argument­Exception - if the to­Search argument is empty.
    • indexOf

      public static int indexOf(CharSequence text, int toSearch, int fromIndex, int toIndex)
      Returns the index within the given character sequence of the first occurrence of the specified character, starting the search at the specified index. If the character is not found, then this method returns -1.

      There is no restriction on the value of from­Index. If negative or greater than to­Index, then the behavior of this method is as if the search started from 0 or to­Index respectively. This is consistent with the behavior documented in String​.index­Of(int, int).

      Parameters:
      text - the character sequence in which to perform the search, or null.
      to­Search - the Unicode code point of the character to search.
      from­Index - the index to start the search from.
      to­Index - the index after the last character where to perform the search.
      Returns:
      the index of the first occurrence of the given character in the specified sub-sequence, or -1 if no occurrence has been found or if the text argument is null.
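
      The index clamping and the null handling described above can be sketched as follows. This is a minimal illustration written against the JDK only, assuming the documented behavior; the CodePointSearchDemo class is hypothetical and is not the SIS source code.

```java
/** Searches a code point in a bounded range of any CharSequence. */
final class CodePointSearchDemo {
    /** Hypothetical sketch: returns the index of toSearch in [fromIndex, toIndex), or -1 if absent. */
    static int indexOf(CharSequence text, int toSearch, int fromIndex, int toIndex) {
        if (text == null) return -1;
        int i   = Math.max(0, fromIndex);               // negative start behaves like 0
        int end = Math.min(text.length(), toIndex);     // never read past the end of the text
        while (i < end) {
            int c = Character.codePointAt(text, i);
            if (c == toSearch) return i;
            i += Character.charCount(c);                // skip both halves of a surrogate pair
        }
        return -1;                                      // also reached when fromIndex >= toIndex
    }

    public static void main(String[] args) {
        StringBuilder text = new StringBuilder("a\uD83D\uDE00b,c");
        System.out.println(indexOf(text, ',', 0, text.length()));   // 4
        System.out.println(indexOf(text, ',', 0, 4));               // -1: index 4 is outside the range
    }
}
```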
    • lastIndexOf

      public static int lastIndexOf(CharSequence text, int toSearch, int fromIndex, int toIndex)
      Returns the index within the given character sequence of the last occurrence of the specified character, searching backward in the given index range. If the character is not found, then this method returns -1.

      There is no restriction on the value of to­Index. If greater than the text length or less than from­Index, then the behavior of this method is as if the search started from length or from­Index respectively. This is consistent with the behavior documented in String​.last­Index­Of(int, int).

      Parameters:
      text - the character sequence in which to perform the search, or null.
      to­Search - the Unicode code point of the character to search.
      from­Index - the index of the first character in the range where to perform the search.
      to­Index - the index after the last character in the range where to perform the search.
      Returns:
      the index of the last occurrence of the given character in the specified sub-sequence, or -1 if no occurrence has been found or if the text argument is null.
    • indexOfLineStart

      public static int indexOfLineStart(CharSequence text, int numLines, int fromIndex)
      Returns the index of the first character after the given number of lines. This method counts the number of occurrences of '\n', '\r' or "\r\n" starting from the given position. When numLines occurrences have been found, the index of the first character after the last occurrence is returned.

      If the num­Lines argument is positive, this method searches forward. If negative, this method searches backward. If 0, this method returns the beginning of the current line.

      If this method reaches the end of text while searching forward, then text​.length() is returned. If this method reaches the beginning of text while searching backward, then 0 is returned.

      Parameters:
      text - the string in which to skip a determined amount of lines.
      num­Lines - the number of lines to skip. Can be positive, zero or negative.
      from­Index - index at which to start the search, from 0 to text​.length() inclusive.
      Returns:
      index of the first character after the last skipped line.
      Throws:
      Null­Pointer­Exception - if the text argument is null.
      Index­Out­Of­Bounds­Exception - if from­Index is out of bounds.
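
      The forward search can be sketched as below. This simplified illustration handles only strictly positive numLines values; the backward search and the numLines = 0 case are deliberately left out. The LineSkipDemo class and its skipLinesForward method are hypothetical names, not SIS API.

```java
/** Skips a given number of line terminators, searching forward only. */
final class LineSkipDemo {
    /** Hypothetical sketch for numLines > 0: treats "\r\n" as a single line terminator. */
    static int skipLinesForward(CharSequence text, int numLines, int fromIndex) {
        int i = fromIndex;
        final int length = text.length();
        while (numLines > 0 && i < length) {
            char c = text.charAt(i++);
            if (c == '\r') {
                if (i < length && text.charAt(i) == '\n') i++;   // consume the '\n' of "\r\n"
                numLines--;
            } else if (c == '\n') {
                numLines--;
            }
        }
        return i;   // equals text.length() if the end was reached first
    }

    public static void main(String[] args) {
        String text = "a\nb\r\nc";
        System.out.println(skipLinesForward(text, 1, 0));   // 2: start of "b"
        System.out.println(skipLinesForward(text, 2, 0));   // 5: start of "c"
        System.out.println(skipLinesForward(text, 3, 0));   // 6: end of text
    }
}
```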
    • skipLeadingWhitespaces

      public static int skipLeadingWhitespaces(CharSequence text, int fromIndex, int toIndex)
      Returns the index of the first non-white character in the given range. If the given range contains only space characters, then this method returns the index of the first character after the given range, which is always equal to or greater than toIndex. Note that this character may not exist if toIndex is equal to the text length.

      Special cases:

      • If from­Index is greater than to­Index, then this method unconditionally returns from­Index.
      • If the given range contains only space characters and the character at to­Index-1 is the high surrogate of a valid supplementary code point, then this method returns to­Index+1, which is the index of the next code point.
      • If from­Index is negative or to­Index is greater than the text length, then the behavior of this method is undefined.
      Space characters are identified by the Character​.is­Whitespace(int) method.
      Parameters:
      text - the string in which to perform the search (cannot be null).
      from­Index - the index from which to start the search (cannot be negative).
      to­Index - the index after the last character where to perform the search.
      Returns:
      the index within the text of the first occurrence of a non-space character, starting at the specified index, or a value equal to or greater than toIndex if none.
      Throws:
      Null­Pointer­Exception - if the text argument is null.
    • skipTrailingWhitespaces

      public static int skipTrailingWhitespaces(CharSequence text, int fromIndex, int toIndex)
      Returns the index after the last non-white character in the given range. If the given range contains only space characters, then this method returns the index of the first character in the given range, which is always equal to or lower than fromIndex.

      Special cases:

      • If fromIndex is greater than toIndex, then this method unconditionally returns toIndex.
      • If the given range contains only space characters and the character at from­Index is the low surrogate of a valid supplementary code point, then this method returns from­Index-1, which is the index of the code point.
      • If from­Index is negative or to­Index is greater than the text length, then the behavior of this method is undefined.
      Space characters are identified by the Character​.is­Whitespace(int) method.
      Parameters:
      text - the string in which to perform the search (cannot be null).
      from­Index - the index from which to start the search (cannot be negative).
      to­Index - the index after the last character where to perform the search.
      Returns:
      the index within the text after the last occurrence of a non-space character in the given range, or a value equal to or lower than fromIndex if none.
      Throws:
      Null­Pointer­Exception - if the text argument is null.
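
      The two skip operations are often used together for trimming a range without creating intermediate objects. The sketch below is one possible way to write them with the JDK, assuming the behavior documented for skipLeadingWhitespaces and skipTrailingWhitespaces; the WhitespaceSkipDemo class is hypothetical.

```java
/** Finds the bounds of the non-blank content inside a range of text. */
final class WhitespaceSkipDemo {
    /** Hypothetical sketch: moves fromIndex forward over whitespace code points. */
    static int skipLeading(CharSequence text, int fromIndex, int toIndex) {
        while (fromIndex < toIndex) {
            int c = Character.codePointAt(text, fromIndex);
            if (!Character.isWhitespace(c)) break;
            fromIndex += Character.charCount(c);
        }
        return fromIndex;
    }

    /** Hypothetical sketch: moves toIndex backward over whitespace code points. */
    static int skipTrailing(CharSequence text, int fromIndex, int toIndex) {
        while (toIndex > fromIndex) {
            int c = Character.codePointBefore(text, toIndex);
            if (!Character.isWhitespace(c)) break;
            toIndex -= Character.charCount(c);
        }
        return toIndex;
    }

    public static void main(String[] args) {
        String text = "  abc  ";
        int lower = skipLeading(text, 0, text.length());
        int upper = skipTrailing(text, lower, text.length());
        System.out.println(text.substring(lower, upper));   // abc
    }
}
```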
    • split

      public static CharSequence[] split(CharSequence text, char separator)
      Splits a text around the given character. The array returned by this method contains all subsequences of the given text that are terminated by the given character or by the end of the text. The subsequences in the array are in the order in which they occur in the given text. If the character is not found in the input, then the resulting array has just one element, which is the whole given text.

      This method is similar to the standard String​.split(String) method except for the following:

      • It accepts generic character sequences.
      • It accepts null argument, in which case an empty array is returned.
      • The separator is a simple character instead of a regular expression.
      • If the separator argument is '\n' or '\r', then this method splits around any of the "\r", "\n" or "\r\n" character sequences.
      • The leading and trailing spaces of each subsequence are trimmed.
      Parameters:
      text - the text to split, or null.
      separator - the delimiting character (typically the comma).
      Returns:
      the array of subsequences computed by splitting the given text around the given character, or an empty array if text was null.
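
      The split-then-trim behavior can be approximated with the JDK as sketched below. This illustration is a simplification: it returns a list of String rather than an array of CharSequence, and it does not implement the special handling of '\n' and '\r' separators. The SplitDemo class and its splitAndTrim method are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/** Splits a text around a single character and trims each item. */
final class SplitDemo {
    /** Hypothetical sketch: a null text gives an empty list, as for the empty array documented above. */
    static List<String> splitAndTrim(CharSequence text, char separator) {
        List<String> items = new ArrayList<>();
        if (text == null) return items;
        String s = text.toString();
        int start = 0;
        int end;
        while ((end = s.indexOf(separator, start)) >= 0) {
            items.add(s.substring(start, end).strip());   // strip() relies on Character.isWhitespace
            start = end + 1;
        }
        items.add(s.substring(start).strip());            // last item, terminated by the end of text
        return items;
    }

    public static void main(String[] args) {
        System.out.println(splitAndTrim(" a , b,,c ", ','));   // [a, b, , c]
    }
}
```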
    • splitOnEOL

      public static CharSequence[] splitOnEOL(CharSequence text)
      Splits a text around the End Of Line (EOL) characters. EOL characters can be any of "\r", "\n" or "\r\n" sequences. Each element in the returned array will be a single line. If the given text is already a single line, then this method returns a singleton containing only the given text.

      Notes:

      Performance note

      Prior to Java 8, this method was usually cheap because all string instances created by String.substring(int,int) shared the same char[] internal array. However, since Java 8, the new String implementation copies the data in new arrays. Consequently, it is better to work with indices rather than this method for splitting large Strings. However, this method is still useful for other CharSequence implementations providing an efficient subSequence(int,int) method.
      Parameters:
      text - the multi-line text from which to get the individual lines, or null.
      Returns:
      the lines in the text, or an empty array if the given text was null.
    • parseDoubles

      public static double[] parseDoubles(CharSequence values, char separator) throws NumberFormatException
      Splits the given text around the given character, then parses each item as a double. Empty sub-sequences are parsed as Double​.Na­N.
      Parameters:
      values - the text containing the values to parse, or null.
      separator - the delimiting character (typically the comma).
      Returns:
      the array of numbers parsed from the given text, or an empty array if values was null.
      Throws:
      Number­Format­Exception - if at least one number cannot be parsed.
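
      A rough JDK-only equivalent is sketched below, assuming that items are trimmed before parsing as described for split(CharSequence, char). The ParseDemo class is hypothetical and is not the SIS implementation.

```java
import java.util.regex.Pattern;

/** Parses a separated list of numbers, mapping empty items to NaN. */
final class ParseDemo {
    /** Hypothetical sketch: throws NumberFormatException if a non-empty item is not a number. */
    static double[] parseDoubles(CharSequence values, char separator) {
        if (values == null) return new double[0];
        // Quote the separator since String.split expects a regular expression;
        // the -1 limit keeps trailing empty items.
        String[] items = values.toString().split(Pattern.quote(String.valueOf(separator)), -1);
        double[] result = new double[items.length];
        for (int i = 0; i < items.length; i++) {
            String item = items[i].strip();
            result[i] = item.isEmpty() ? Double.NaN : Double.parseDouble(item);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(parseDoubles("1.5, ,3", ',')));   // [1.5, NaN, 3.0]
    }
}
```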
    • parseFloats

      public static float[] parseFloats(CharSequence values, char separator) throws NumberFormatException
      Splits the given text around the given character, then parses each item as a float. Empty sub-sequences are parsed as Float​.Na­N.
      Parameters:
      values - the text containing the values to parse, or null.
      separator - the delimiting character (typically the comma).
      Returns:
      the array of numbers parsed from the given text, or an empty array if values was null.
      Throws:
      Number­Format­Exception - if at least one number cannot be parsed.
    • parseLongs

      public static long[] parseLongs(CharSequence values, char separator, int radix) throws NumberFormatException
      Splits the given text around the given character, then parses each item as a long.
      Parameters:
      values - the text containing the values to parse, or null.
      separator - the delimiting character (typically the comma).
      radix - the radix to be used for parsing. This is usually 10.
      Returns:
      the array of numbers parsed from the given text, or an empty array if values was null.
      Throws:
      Number­Format­Exception - if at least one number cannot be parsed.
    • parseInts

      public static int[] parseInts(CharSequence values, char separator, int radix) throws NumberFormatException
      Splits the given text around the given character, then parses each item as an int.
      Parameters:
      values - the text containing the values to parse, or null.
      separator - the delimiting character (typically the comma).
      radix - the radix to be used for parsing. This is usually 10.
      Returns:
      the array of numbers parsed from the given text, or an empty array if values was null.
      Throws:
      Number­Format­Exception - if at least one number cannot be parsed.
    • parseShorts

      public static short[] parseShorts(CharSequence values, char separator, int radix) throws NumberFormatException
      Splits the given text around the given character, then parses each item as a short.
      Parameters:
      values - the text containing the values to parse, or null.
      separator - the delimiting character (typically the comma).
      radix - the radix to be used for parsing. This is usually 10.
      Returns:
      the array of numbers parsed from the given text, or an empty array if values was null.
      Throws:
      Number­Format­Exception - if at least one number cannot be parsed.
    • parseBytes

      public static byte[] parseBytes(CharSequence values, char separator, int radix) throws NumberFormatException
      Splits the given text around the given character, then parses each item as a byte.
      Parameters:
      values - the text containing the values to parse, or null.
      separator - the delimiting character (typically the comma).
      radix - the radix to be used for parsing. This is usually 10.
      Returns:
      the array of numbers parsed from the given text, or an empty array if values was null.
      Throws:
      Number­Format­Exception - if at least one number cannot be parsed.
    • toASCII

      public static CharSequence toASCII(CharSequence text)
      Replaces some Unicode characters by ASCII characters on a "best effort basis". For example, the “ é ” character is replaced by “ e ” (without accent), the “ ″ ” symbol for minutes of angle is replaced by straight double quotes “ " ”, and combined characters like ㎏, ㎎, ㎝, ㎞, ㎢, ㎦, ㎖, ㎧, ㎩, ㎐, etc. are replaced by the corresponding sequences of characters.
      Note: the replacement of Greek letters is a more complex task than what this method can do, since it depends on the context. For example if the Greek letters are abbreviations for coordinate system axes like φ and λ, then the replacements depend on the enclosing coordinate system. See Transliterator for more information.
      Parameters:
      text - the text to scan for Unicode characters to replace by ASCII characters, or null.
      Returns:
      the given text with substitutions applied, or text if no replacement has been applied, or null if the given text was null.
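
      For comparison, a related but different technique is available in the JDK: Unicode compatibility decomposition (NFKD) followed by removal of combining marks. The sketch below uses java.text.Normalizer and is not what toASCII does internally; in particular it gives no guarantee that the result is pure ASCII, and it does not apply replacements such as the straight double quote for the minutes symbol. The AsciiFoldingDemo class is a hypothetical name.

```java
import java.text.Normalizer;

/** Removes accents and expands compatibility characters using Unicode normalization. */
final class AsciiFoldingDemo {
    /** Hypothetical sketch: NFKD decomposition, then removal of all combining marks. */
    static String fold(String text) {
        if (text == null) return null;
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFKD);
        return decomposed.replaceAll("\\p{M}", "");   // \p{M} matches Unicode mark characters
    }

    public static void main(String[] args) {
        System.out.println(fold("caf\u00E9"));   // cafe
        System.out.println(fold("\u339E"));      // km (from the single SQUARE KM character)
    }
}
```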
    • trimWhitespaces

      @Deprecated(since="1.4", forRemoval=true) public static String trimWhitespaces(String text)
      Deprecated, for removal: This API element is subject to removal in a future version.
      Replaced by String​.strip() in JDK 11.
      Returns a string with leading and trailing whitespace characters omitted. This method is similar in purpose to String.trim(), except that the latter considers every ISO control code below 32 to be a whitespace. That String.trim() behavior has the side effect of removing the heading of ANSI escape sequences (a.k.a. X3.64), and of ignoring Unicode spaces. This trimWhitespaces(…) method is built on the more accurate Character.isWhitespace(int) method instead.

      This method performs the same work as trimWhitespaces(CharSequence), but is overloaded for the String type because of its frequent use.

      Parameters:
      text - the text from which to remove leading and trailing whitespaces, or null.
      Returns:
      a string with leading and trailing whitespaces removed, or null if the given text was null.
    • trimWhitespaces

      public static CharSequence trimWhitespaces(CharSequence text)
      Returns a text with leading and trailing whitespace characters omitted. Space characters are identified by the Character​.is­Whitespace(int) method.

      This method is the generic version of trim­Whitespaces(String).

      Parameters:
      text - the text from which to remove leading and trailing whitespaces, or null.
      Returns:
      a character sequence with leading and trailing whitespaces removed, or null if the given text was null.
    • trimWhitespaces

      public static CharSequence trimWhitespaces(CharSequence text, int lower, int upper)
      Returns a sub-sequence with leading and trailing whitespace characters omitted. Space characters are identified by the Character​.is­Whitespace(int) method.

      Invoking this method is functionally equivalent to the following code snippet, except that the sub­Sequence method is invoked only once instead of two times:

      text = trimWhitespaces(text.subSequence(lower, upper));
      
      Parameters:
      text - the text from which to remove leading and trailing white spaces.
      lower - index of the first character to consider for inclusion in the sub-sequence.
      upper - index after the last character to consider for inclusion in the sub-sequence.
      Returns:
      a character sequence with leading and trailing white spaces removed, or null if the text argument is null.
      Throws:
      Index­Out­Of­Bounds­Exception - if lower or upper is out of bounds.
    • trimFractionalPart

      public static CharSequence trimFractionalPart(CharSequence value)
      Trims the fractional part of the given formatted number, provided that it doesn't change the value. This method assumes that the number is formatted in the US locale, typically by the Double​.to­String(double) method.

      More specifically, if the given value ends with a '.' character followed by a sequence of '0' characters, then those characters are omitted. Otherwise this method returns the text unchanged. This is an "all or nothing" method: either the fractional part is completely removed, or it is left unchanged.

      Examples

      This method returns "4" if the given value is "4.", "4.0" or "4.00", but returns "4.10" unchanged (including the trailing '0' character) if the input is "4.10".

      Use case

      This method is useful before parsing a number if that number should preferably be parsed as an integer before attempting to parse it as a floating point number.
      Parameters:
      value - the value to trim if possible, or null.
      Returns:
      the value without the trailing ".0" part (if any), or null if the given text was null.
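
      The "all or nothing" rule can be sketched in a few lines. The following is an illustration written from the description above, not the SIS source; the FractionTrimDemo class and its trimTrailingZeroFraction method are hypothetical names.

```java
/** Removes a fractional part made only of zeros, leaving any other text unchanged. */
final class FractionTrimDemo {
    /** Hypothetical sketch: scans backward over '0' characters until a '.' or another character. */
    static CharSequence trimTrailingZeroFraction(CharSequence value) {
        if (value == null) return null;
        int i = value.length();
        while (i > 0) {
            char c = value.charAt(--i);
            if (c == '.') return value.subSequence(0, i);   // only zeros after the dot: drop them all
            if (c != '0') break;                            // significant digit found: keep everything
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(trimTrailingZeroFraction("4.00"));   // 4
        System.out.println(trimTrailingZeroFraction("4.10"));   // 4.10
    }
}
```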
    • shortSentence

      public static CharSequence shortSentence(CharSequence text, int maxLength)
      Makes sure that the text string is not longer than max­Length characters. If text is not longer, then it is returned unchanged. Otherwise this method returns a copy of text with some characters substituted by the "(…)" string.

      If the text needs to be shortened, then this method tries to apply the above-cited substitution between two words. For example, the following text:

      "This sentence given as an example is way too long to be included in a short name."
      May be shortened to something like this:
      "This sentence given (…) in a short name."
      Parameters:
      text - the sentence to reduce if it is too long, or null.
      max­Length - the maximum length allowed for text.
      Returns:
      a sentence not longer than max­Length, or null if the given text was null.
    • upperCaseToSentence

      public static CharSequence upperCaseToSentence(CharSequence identifier)
      Given a string in upper case (typically a Java constant), returns a string formatted like an English sentence. This heuristic method performs the following steps:
      1. Replaces all occurrences of '_' by spaces.
      2. Converts all letters except the first one to lower case letters using Character.toLowerCase(int). Note that this method does not use the String.toLowerCase() method. Consequently, the system locale is ignored. This method behaves as if the conversion were done in the root locale.

      Note that those heuristic rules may be modified in future SIS versions, depending on the practical experience gained.

      Parameters:
      identifier - the name of a Java constant, or null.
      Returns:
      the identifier like an English sentence, or null if the given identifier argument was null.
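
      The two steps listed above can be sketched as follows, assuming nothing beyond what is described. The UpperCaseDemo class is hypothetical and, since the heuristic rules may change, this is only an approximation of the current behavior.

```java
/** Converts a constant-like name such as "HALF_DOWN" to "Half down". */
final class UpperCaseDemo {
    /** Hypothetical sketch: locale-independent conversion, one code point at a time. */
    static String upperCaseToSentence(String identifier) {
        if (identifier == null) return null;
        StringBuilder sb = new StringBuilder(identifier.length());
        for (int i = 0; i < identifier.length();) {
            int c = identifier.codePointAt(i);
            int n = Character.charCount(c);        // computed before c is modified
            if (c == '_') {
                c = ' ';                           // step 1: underscores become spaces
            } else if (i != 0) {
                c = Character.toLowerCase(c);      // step 2: all letters but the first one
            }
            sb.appendCodePoint(c);
            i += n;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(upperCaseToSentence("COORDINATE_SYSTEM"));   // Coordinate system
    }
}
```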
    • camelCaseToSentence

      public static CharSequence camelCaseToSentence(CharSequence identifier)
      Given a string in camel case (typically an identifier), returns a string formatted like an English sentence. This heuristic method performs the following steps:
      1. Invoke camelCaseToWords(CharSequence, boolean), which separates the words on the basis of character case. For example, "transferFunctionType" becomes "transfer function type". This works fine for ISO 19115 identifiers.
      2. Next replace all occurrences of '_' by spaces in order to take into account another common naming convention, which uses '_' as a word separator. This convention is used by netCDF attributes like "project_name".
      3. Finally ensure that the first character is upper-case.

      Exception to the above rules

      If the given identifier contains only upper-case letters, digits and the '_' character, then the identifier is returned "as is" except for the '_' characters which are replaced by '-'. This works well for identifiers like "UTF-8" or "ISO-LATIN-1" for instance.

      Note that those heuristic rules may be modified in future SIS versions, depending on the practical experience gained.

      Parameters:
      identifier - an identifier with no space, words begin with an upper-case character, or null.
      Returns:
      the identifier with spaces inserted after what looks like words, or null if the given identifier argument was null.
    • camelCaseToWords

      public static CharSequence camelCaseToWords(CharSequence identifier, boolean toLowerCase)
      Given a string in camel case, returns a string with the same words separated by spaces. A word begins with an upper-case character following a lower-case character. For example if the given string is "PixelInterleavedSampleModel", then this method returns "Pixel Interleaved Sample Model" or "Pixel interleaved sample model" depending on the value of the toLowerCase argument.

      If toLowerCase is false, then this method inserts spaces but does not change the case of characters. If toLowerCase is true, then this method changes to lower case the first character after each space inserted by this method (note that this intentionally excludes the very first character in the given string), except if the second character is upper case, in which case the word is assumed to be an acronym.

      The given string is usually a programmatic identifier like a class name or a method name.

      Parameters:
      identifier - an identifier with no space, words begin with an upper-case character.
      to­Lower­Case - true for changing the first character of words to lower case, except for the first word and acronyms.
      Returns:
      the identifier with spaces inserted after what looks like words, or null if the given identifier argument was null.
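
      The word separation rule can be sketched as below. This is a simplified illustration of the description above, working on char values (so limited to the Basic Multilingual Plane) and ignoring any additional heuristics the real method may apply. The CamelCaseDemo class is hypothetical.

```java
/** Inserts spaces at lower-case to upper-case transitions. */
final class CamelCaseDemo {
    /** Hypothetical sketch: a word starts at an upper-case char that follows a lower-case char. */
    static String camelCaseToWords(String identifier, boolean toLowerCase) {
        StringBuilder sb = new StringBuilder(identifier.length() + 8);
        for (int i = 0; i < identifier.length(); i++) {
            char c = identifier.charAt(i);
            if (i > 0 && Character.isUpperCase(c) && Character.isLowerCase(identifier.charAt(i - 1))) {
                sb.append(' ');
                // A second upper-case character suggests an acronym, which is left unchanged.
                boolean acronym = i + 1 < identifier.length()
                        && Character.isUpperCase(identifier.charAt(i + 1));
                if (toLowerCase && !acronym) {
                    c = Character.toLowerCase(c);
                }
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(camelCaseToWords("PixelInterleavedSampleModel", false));
        System.out.println(camelCaseToWords("PixelInterleavedSampleModel", true));
    }
}
```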
    • camelCaseToAcronym

      public static CharSequence camelCaseToAcronym(CharSequence text)
      Creates an acronym from the given text. This method returns a string containing the first character of each word, where the words are separated by the camel case convention, the '_' character, or any character which is not a Unicode identifier part (including spaces).

      An exception to the above rule happens if the given text is a Unicode identifier without the '_' character, and all characters are upper case. In such case the text is returned unchanged on the assumption that it is already an acronym.

      Examples: given "north­East", this method returns "NE". Given "Open Geospatial Consortium", this method returns "OGC".

      Parameters:
      text - the text for which to create an acronym, or null.
      Returns:
      the acronym, or null if the given text was null.
    • isAcronymForWords

      public static boolean isAcronymForWords(CharSequence acronym, CharSequence words)
      Returns true if the first string is likely to be an acronym of the second string. An acronym is a sequence of letters or digits built from at least one character of each word in the words string. More than one character from the same word may appear in the acronym, but they must always be the first consecutive characters. The comparison is case-insensitive. If any of the given arguments is null, this method returns false.

      Example

      Given the "Open Geospatial Consortium" words, the following strings are recognized as acronyms: "OGC", "ogc", "O​.G​.C.", "Op­Geo­Con".
      Parameters:
      acronym - a possible acronym of the sequence of words, or null.
      words - the sequence of words, or null.
      Returns:
      true if the first string is an acronym of the second one.
    • isUnicodeIdentifier

      public static boolean isUnicodeIdentifier(CharSequence identifier)
      Returns true if the given identifier is a legal Unicode identifier. This method returns true if the identifier length is greater than zero, the first character is a Unicode identifier start and all remaining characters (if any) are Unicode identifier parts. Most legal Unicode identifiers are also legal XML identifiers, but the converse is not true. The most noticeable differences are the ‘:’, ‘-’ and ‘.’ characters, which are legal in XML identifiers but not in Unicode.
      Characters legal in one set but not in the other
      Not legal in Unicode               Not legal in XML
      : (colon)                          µ (micro sign)
      - (hyphen or minus)                ª (feminine ordinal indicator)
      . (dot)                            º (masculine ordinal indicator)
      · (middle dot)                     ⁔ (inverted undertie)
      Many punctuation, symbols, etc.    Identifier ignorable characters.
      Note that the ‘_’ (underscore) character is legal according to both Unicode and XML, while spaces, ‘!’, ‘#’, ‘*’, ‘/’, ‘?’ and most other punctuation characters are not.

      Usage in Apache SIS

      In its handling of identifiers, Apache SIS favors Unicode identifiers without ignorable characters, since those identifiers are legal XML identifiers except for the rarely used characters cited above. As a side effect, this policy excludes ‘:’, ‘-’ and ‘.’, which would normally be legal in XML identifiers. But since those characters could easily be confused with namespace separators, this exclusion is considered desirable.
      Parameters:
      identifier - the character sequence to test, or null.
      Returns:
      true if the given character sequence is a legal Unicode identifier.
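
      The start/part rule described above can be sketched with the standard Character methods. This is only an illustration, not the SIS implementation: the class name UnicodeIdentifierSketch is hypothetical, and treating a null argument like an empty one is an assumption.

```java
final class UnicodeIdentifierSketch {
    private UnicodeIdentifierSketch() {
    }

    /** Returns true if the first code point is an identifier start and all others are identifier parts. */
    static boolean isUnicodeIdentifier(CharSequence identifier) {
        if (identifier == null || identifier.length() == 0) {
            return false;               // Assumption: null is handled like an empty sequence.
        }
        int c = Character.codePointAt(identifier, 0);
        if (!Character.isUnicodeIdentifierStart(c)) {
            return false;
        }
        // Iterate over code points (not chars) so that supplementary characters are handled.
        for (int i = Character.charCount(c); i < identifier.length(); i += Character.charCount(c)) {
            c = Character.codePointAt(identifier, i);
            if (!Character.isUnicodeIdentifierPart(c)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isUnicodeIdentifier("WGS_84"));   // true
        System.out.println(isUnicodeIdentifier("gml:id"));   // false: ':' is legal in XML only
    }
}
```

      Note that Character.isUnicodeIdentifierPart(int) also accepts identifier ignorable characters, so a caller wanting the stricter policy described in the next paragraph would need an additional check.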
    • isUpperCase

      public static boolean isUpperCase(CharSequence text)
      Returns true if the given text is non-null, contains at least one upper-case character and no lower-case character. Space and punctuation are ignored.
      Parameters:
      text - the character sequence to test (may be null).
      Returns:
      true if non-null, contains at least one upper-case character and no lower-case character.
      Since:
      0.7
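
      The condition stated above (at least one upper-case character, no lower-case character, everything else ignored) could be written as follows. This is a minimal sketch under that reading of the contract; the class name UpperCaseSketch is hypothetical.

```java
final class UpperCaseSketch {
    private UpperCaseSketch() {
    }

    /** Returns true if the text has at least one upper-case code point and no lower-case one. */
    static boolean isUpperCase(CharSequence text) {
        if (text == null) {
            return false;
        }
        boolean hasUpperCase = false;
        for (int i = 0; i < text.length();) {
            final int c = Character.codePointAt(text, i);
            if (Character.isLowerCase(c)) {
                return false;           // A single lower-case character is enough to fail.
            }
            if (Character.isUpperCase(c)) {
                hasUpperCase = true;
            }
            // Spaces, digits and punctuation are neither upper nor lower case: ignored.
            i += Character.charCount(c);
        }
        return hasUpperCase;
    }

    public static void main(String[] args) {
        System.out.println(isUpperCase("EPSG:4326"));   // true
        System.out.println(isUpperCase("Epsg"));        // false
        System.out.println(isUpperCase("4326"));        // false: no upper-case character
    }
}
```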
    • equalsFiltered

      public static boolean equalsFiltered(CharSequence s1, CharSequence s2, Characters.Filter filter, boolean ignoreCase)
      Returns true if the given texts are equal, optionally ignoring case and filtered-out characters. This method is sometimes used for comparing identifiers in a lenient way.

      Example: the following call compares the two strings, ignoring case and any characters which are not letters or digits. In particular, spaces and punctuation characters like '_' and '-' are ignored:

      assert equalsFiltered("WGS84", "WGS_84", Characters.Filter.LETTERS_AND_DIGITS, true) == true;
      
      Parameters:
      s1 - the first character sequence to compare, or null.
      s2 - the second character sequence to compare, or null.
      filter - the subset of characters to compare, or null for comparing all characters.
      ignoreCase - true for ignoring case, or false for requiring an exact match.
      Returns:
      true if both arguments are null or if the two given texts are equal, optionally ignoring case and filtered-out characters.
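
      A filtered comparison of this kind can be sketched by advancing two indices and skipping, on each side, the code points rejected by the filter. The sketch below is not the SIS implementation: it uses java.util.function.IntPredicate as a hypothetical stand-in for Characters.Filter, and the class name EqualsFilteredSketch is also hypothetical.

```java
import java.util.function.IntPredicate;

final class EqualsFilteredSketch {
    private EqualsFilteredSketch() {
    }

    /** Compares the code points accepted by the filter (all code points if the filter is null). */
    static boolean equalsFiltered(CharSequence s1, CharSequence s2, IntPredicate filter, boolean ignoreCase) {
        if (s1 == s2) {
            return true;                // Covers the case where both arguments are null.
        }
        if (s1 == null || s2 == null) {
            return false;
        }
        int i1 = 0, i2 = 0;
        while (true) {
            i1 = skipFiltered(s1, i1, filter);
            i2 = skipFiltered(s2, i2, filter);
            final boolean end1 = (i1 >= s1.length());
            final boolean end2 = (i2 >= s2.length());
            if (end1 || end2) {
                return end1 && end2;    // Equal only if both sequences are exhausted together.
            }
            final int c1 = Character.codePointAt(s1, i1);
            final int c2 = Character.codePointAt(s2, i2);
            if (c1 != c2 && !(ignoreCase && sameIgnoringCase(c1, c2))) {
                return false;
            }
            i1 += Character.charCount(c1);
            i2 += Character.charCount(c2);
        }
    }

    /** Returns the index of the next code point accepted by the filter, or the length if none. */
    private static int skipFiltered(CharSequence s, int i, IntPredicate filter) {
        if (filter != null) {
            while (i < s.length()) {
                final int c = Character.codePointAt(s, i);
                if (filter.test(c)) {
                    break;
                }
                i += Character.charCount(c);
            }
        }
        return i;
    }

    private static boolean sameIgnoringCase(int c1, int c2) {
        return Character.toUpperCase(c1) == Character.toUpperCase(c2)
            || Character.toLowerCase(c1) == Character.toLowerCase(c2);
    }

    public static void main(String[] args) {
        System.out.println(equalsFiltered("WGS84", "WGS_84", Character::isLetterOrDigit, true));   // true
        System.out.println(equalsFiltered("WGS84", "WGS_84", null, true));                         // false
    }
}
```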
    • equalsIgnoreCase

      public static boolean equalsIgnoreCase(CharSequence s1, CharSequence s2)
      Returns true if the two given texts are equal, ignoring case. This method is similar to String​.equals­Ignore­Case(String), except it works on arbitrary character sequences and compares code points instead of characters.
      Parameters:
      s1 - the first string to compare, or null.
      s2 - the second string to compare, or null.
      Returns:
      true if the two given texts are equal, ignoring case, or if both arguments are null.
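
      To illustrate what comparing code points instead of characters means, the following sketch walks both sequences one code point at a time, so that letters outside the BMP (stored as surrogate pairs) are case-folded as a whole. The class name EqualsIgnoreCaseSketch is hypothetical and the case-folding test is a simplification, not necessarily the one used by SIS.

```java
final class EqualsIgnoreCaseSketch {
    private EqualsIgnoreCaseSketch() {
    }

    /** Case-insensitive comparison performed code point by code point. */
    static boolean equalsIgnoreCase(CharSequence s1, CharSequence s2) {
        if (s1 == s2) {
            return true;                // Covers the case where both arguments are null.
        }
        if (s1 == null || s2 == null) {
            return false;
        }
        int i1 = 0, i2 = 0;
        while (i1 < s1.length() && i2 < s2.length()) {
            final int c1 = Character.codePointAt(s1, i1);
            final int c2 = Character.codePointAt(s2, i2);
            if (c1 != c2
                    && Character.toUpperCase(c1) != Character.toUpperCase(c2)
                    && Character.toLowerCase(c1) != Character.toLowerCase(c2))
            {
                return false;
            }
            i1 += Character.charCount(c1);
            i2 += Character.charCount(c2);
        }
        // Equal only if both sequences have been fully consumed.
        return i1 >= s1.length() && i2 >= s2.length();
    }

    public static void main(String[] args) {
        // U+10400 and U+10428 are the upper-case and lower-case forms of a Deseret letter.
        System.out.println(equalsIgnoreCase("\uD801\uDC00", "\uD801\uDC28"));   // true
        System.out.println(equalsIgnoreCase("epsg", "EPSG"));                   // true
    }
}
```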
    • equals

      public static boolean equals(CharSequence s1, CharSequence s2)
      Returns true if the two given texts are equal. This method delegates to String.contentEquals(CharSequence) if possible. This method never invokes CharSequence.toString(), in order to avoid a potentially large copy of data.
      Parameters:
      s1 - the first string to compare, or null.
      s2 - the second string to compare, or null.
      Returns:
      true if the two given texts are equal, or if both arguments are null.
    • regionMatches

      public static boolean regionMatches(CharSequence text, int fromIndex, CharSequence part)
      Returns true if the given text at the given offset contains the given part, in a case-sensitive comparison. This method is equivalent to the following code, except that this method works on arbitrary Char­Sequence objects instead of Strings only:
      return text.regionMatches(fromIndex, part, 0, part.length());
      
      This method does not throw IndexOutOfBoundsException. Instead, if fromIndex < 0 or fromIndex + part.length() > text.length(), then this method returns false.
      Parameters:
      text - the character sequence in which to test for the presence of part.
      from­Index - the offset in text where to test for the presence of part.
      part - the part which may be present in text.
      Returns:
      true if text contains part at the given offset.
      Throws:
      Null­Pointer­Exception - if any of the arguments is null.
    • regionMatches

      public static boolean regionMatches(CharSequence text, int fromIndex, CharSequence part, boolean ignoreCase)
      Returns true if the given text at the given offset contains the given part, optionally in a case-insensitive way. This method is equivalent to the following code, except that this method works on arbitrary Char­Sequence objects instead of Strings only:
      return text.regionMatches(ignoreCase, fromIndex, part, 0, part.length());
      
      This method does not throw IndexOutOfBoundsException. Instead, if fromIndex < 0 or fromIndex + part.length() > text.length(), then this method returns false.
      Parameters:
      text - the character sequence in which to test for the presence of part.
      from­Index - the offset in text where to test for the presence of part.
      part - the part which may be present in text.
      ignore­Case - true if the case should be ignored.
      Returns:
      true if text contains part at the given offset.
      Throws:
      Null­Pointer­Exception - if any of the arguments is null.
      Since:
      0.4
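
      The bounds policy described above (returning false instead of throwing an exception) can be sketched as below. This is an illustration only: the class name RegionMatchesSketch is hypothetical, and the comparison is done char by char in the manner of String.regionMatches, which is a simplification.

```java
final class RegionMatchesSketch {
    private RegionMatchesSketch() {
    }

    /** Returns true if text contains part at fromIndex; out-of-range indices give false. */
    static boolean regionMatches(CharSequence text, int fromIndex, CharSequence part, boolean ignoreCase) {
        final int length = part.length();
        // Written as a subtraction so that fromIndex + length cannot overflow.
        if (fromIndex < 0 || fromIndex > text.length() - length) {
            return false;
        }
        for (int i = 0; i < length; i++) {
            final char c1 = text.charAt(fromIndex + i);
            final char c2 = part.charAt(i);
            if (c1 != c2) {
                if (!ignoreCase) {
                    return false;
                }
                if (Character.toUpperCase(c1) != Character.toUpperCase(c2)
                        && Character.toLowerCase(c1) != Character.toLowerCase(c2))
                {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        CharSequence text = new StringBuilder("EPSG:4326");
        System.out.println(regionMatches(text, 5, "4326", false));    // true
        System.out.println(regionMatches(text, 0, "epsg", true));     // true
        System.out.println(regionMatches(text, 7, "4326", false));    // false, no exception
    }
}
```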
    • startsWith

      public static boolean startsWith(CharSequence text, CharSequence prefix, boolean ignoreCase)
      Returns true if the given character sequence starts with the given prefix.
      Parameters:
      text - the character sequence to test.
      prefix - the expected prefix.
      ignore­Case - true if the case should be ignored.
      Returns:
      true if the given sequence starts with the given prefix.
      Throws:
      Null­Pointer­Exception - if any of the arguments is null.
    • endsWith

      public static boolean endsWith(CharSequence text, CharSequence suffix, boolean ignoreCase)
      Returns true if the given character sequence ends with the given suffix.
      Parameters:
      text - the character sequence to test.
      suffix - the expected suffix.
      ignore­Case - true if the case should be ignored.
      Returns:
      true if the given sequence ends with the given suffix.
      Throws:
      Null­Pointer­Exception - if any of the arguments is null.
    • commonPrefix

      public static CharSequence commonPrefix(CharSequence s1, CharSequence s2)
      Returns the longest sequence of characters which is found at the beginning of the two given texts. If one of those texts is null, then the other text is returned. If there is no common prefix, then this method returns an empty string.
      Parameters:
      s1 - the first text, or null.
      s2 - the second text, or null.
      Returns:
      the common prefix of both texts (may be empty), or null if both texts are null.
    • commonSuffix

      public static CharSequence commonSuffix(CharSequence s1, CharSequence s2)
      Returns the longest sequence of characters which is found at the end of the two given texts. If one of those texts is null, then the other text is returned. If there is no common suffix, then this method returns an empty string.
      Parameters:
      s1 - the first text, or null.
      s2 - the second text, or null.
      Returns:
      the common suffix of both texts (may be empty), or null if both texts are null.
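
      The null handling and the empty result documented for commonPrefix and commonSuffix can be sketched as follows. The class name CommonAffixSketch is hypothetical, and the comparison is done char by char for brevity, so this simplified version could split a surrogate pair.

```java
final class CommonAffixSketch {
    private CommonAffixSketch() {
    }

    /** Longest sequence found at the beginning of both texts. */
    static CharSequence commonPrefix(CharSequence s1, CharSequence s2) {
        if (s1 == null) return s2;      // Also returns null if both texts are null.
        if (s2 == null) return s1;
        final int max = Math.min(s1.length(), s2.length());
        int n = 0;
        while (n < max && s1.charAt(n) == s2.charAt(n)) {
            n++;
        }
        return s1.subSequence(0, n);
    }

    /** Longest sequence found at the end of both texts. */
    static CharSequence commonSuffix(CharSequence s1, CharSequence s2) {
        if (s1 == null) return s2;      // Also returns null if both texts are null.
        if (s2 == null) return s1;
        final int l1 = s1.length();
        final int l2 = s2.length();
        final int max = Math.min(l1, l2);
        int n = 0;
        while (n < max && s1.charAt(l1 - 1 - n) == s2.charAt(l2 - 1 - n)) {
            n++;
        }
        return s1.subSequence(l1 - n, l1);
    }

    public static void main(String[] args) {
        String east  = "baroclinic_eastward_velocity";
        String north = "baroclinic_northward_velocity";
        System.out.println(commonPrefix(east, north));   // baroclinic_
        System.out.println(commonSuffix(east, north));   // ward_velocity
    }
}
```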
    • commonWords

      public static CharSequence commonWords(CharSequence s1, CharSequence s2)
      Returns the words found at the beginning and end of both texts. The returned string is the concatenation of the common prefix with the common suffix, with the prefix and suffix possibly shortened to avoid cutting in the middle of a word.

      The purpose of this method is to create a global identifier from a list of component identifiers. The latter are often eastward and northward components of a vector, in which case this method provides an identifier for the vector as a whole.

      If one of the given texts is null, then the other text is returned. If there are no common words, then this method returns an empty string.

      Example

      Given the following inputs:
      • "baroclinic_eastward_velocity"
      • "baroclinic_northward_velocity"
      This method returns "baroclinic_velocity". Note that the "ward" characters are a common suffix of both texts but nevertheless omitted because they cut a word.

      Possible future evolution

      The current implementation searches only for a common prefix and a common suffix, ignoring any common words that may appear in the middle of the strings. A character is considered the beginning of a word if it is a letter or digit which is not preceded by another letter or digit (such as the leading "s" and "c" in "snake_case"), or if it is an upper-case letter preceded by a lower-case letter or by no letter (such as both "C" in "CamelCase").
      Parameters:
      s1 - the first text, or null.
      s2 - the second text, or null.
      Returns:
      the common words of both texts (may be empty), or null if both texts are null.
      Since:
      1.1
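
      The prefix and suffix shortening described above can be sketched as follows. This is a simplified illustration, not the SIS implementation: the class name CommonWordsSketch is hypothetical, a word is assumed to be a run of letters or digits (the camel-case rule is ignored), the comparison is char based, and dropping the separators at the start of the suffix is an assumption made so that the documented example is reproduced.

```java
final class CommonWordsSketch {
    private CommonWordsSketch() {
    }

    /** Concatenates the common prefix and common suffix, each trimmed to word boundaries. */
    static CharSequence commonWords(CharSequence s1, CharSequence s2) {
        if (s1 == null) return s2;      // Also returns null if both texts are null.
        if (s2 == null) return s1;
        final int l1 = s1.length();
        final int l2 = s2.length();
        final int min = Math.min(l1, l2);

        // Length of the common prefix.
        int p = 0;
        while (p < min && s1.charAt(p) == s2.charAt(p)) {
            p++;
        }
        // If the prefix stops in the middle of a word, go back to the start of that word.
        if (isWordChar(s1, p - 1) && (isWordChar(s1, p) || isWordChar(s2, p))) {
            while (p > 0 && isWordChar(s1, p - 1)) {
                p--;
            }
        }
        // Length of the common suffix, not overlapping the prefix.
        int s = 0;
        while (s < min - p && s1.charAt(l1 - 1 - s) == s2.charAt(l2 - 1 - s)) {
            s++;
        }
        // If the suffix starts in the middle of a word, drop that partial word.
        if (s > 0 && isWordChar(s1, l1 - s) && (isWordChar(s1, l1 - s - 1) || isWordChar(s2, l2 - s - 1))) {
            while (s > 0 && isWordChar(s1, l1 - s)) {
                s--;
            }
        }
        // Assumption: the suffix shall begin at a word start, so leading separators are dropped.
        while (s > 0 && !isWordChar(s1, l1 - s)) {
            s--;
        }
        return new StringBuilder(p + s).append(s1, 0, p).append(s1, l1 - s, l1).toString();
    }

    /** Returns true if the index is valid and the character at that index is a letter or digit. */
    private static boolean isWordChar(CharSequence text, int index) {
        return index >= 0 && index < text.length() && Character.isLetterOrDigit(text.charAt(index));
    }

    public static void main(String[] args) {
        // Prints "baroclinic_velocity".
        System.out.println(commonWords("baroclinic_eastward_velocity", "baroclinic_northward_velocity"));
    }
}
```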
    • token

      public static CharSequence token(CharSequence text, int fromIndex)
      Returns the token starting at the given offset in the given text. For the purpose of this method, a "token" is any sequence of consecutive characters of the same type, as defined below.

      Let c be the first non-blank character located at an index equal to or greater than the given offset. Then the characters that are considered of the same type are:

      Parameters:
      text - the text for which to get the token.
      from­Index - index of the first character to consider in the given text.
      Returns:
      a sub-sequence of text starting at the given offset, or an empty string if there is no non-blank character at or after the given offset.
      Throws:
      Null­Pointer­Exception - if the text argument is null.
    • replace

      public static CharSequence replace(CharSequence text, CharSequence toSearch, CharSequence replaceBy)
      Replaces all occurrences of a given string in the given character sequence. If no occurrence of to­Search is found in the given text or if to­Search is equal to replace­By, then this method returns the text unchanged. Otherwise this method returns a new character sequence with all occurrences replaced by replace­By.

      This method is similar to String.replace(CharSequence, CharSequence) except that it accepts arbitrary CharSequence objects. As of Java 10, another difference is that this method does not create a new String if toSearch is equal to replaceBy.

      Parameters:
      text - the character sequence in which to perform the replacements, or null.
      to­Search - the string to replace.
      replace­By - the replacement for the searched string.
      Returns:
      the given text with replacements applied, or text if no replacement has been applied, or null if the given text was null.
      Since:
      0.4
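
      The contract above (same instance returned when nothing changes, new sequence otherwise) can be sketched as follows. This is an illustration rather than the SIS implementation: the class name ReplaceSketch is hypothetical, and returning the text unchanged for an empty toSearch is an assumption of this sketch.

```java
final class ReplaceSketch {
    private ReplaceSketch() {
    }

    /** Replaces all occurrences of toSearch, returning the same instance if there is nothing to do. */
    static CharSequence replace(CharSequence text, CharSequence toSearch, CharSequence replaceBy) {
        if (text == null) {
            return null;
        }
        final int length = text.length();
        final int n = toSearch.length();
        if (n == 0) {
            return text;                // Assumption: an empty search string matches nothing.
        }
        if (n == replaceBy.length() && matchesAt(replaceBy, 0, toSearch)) {
            return text;                // toSearch is equal to replaceBy.
        }
        StringBuilder buffer = null;    // Created only when a first occurrence is found.
        int copied = 0;                 // Index after the last character copied to the buffer.
        int i = 0;
        while (i <= length - n) {
            if (matchesAt(text, i, toSearch)) {
                if (buffer == null) {
                    buffer = new StringBuilder(length);
                }
                buffer.append(text, copied, i).append(replaceBy);
                i += n;
                copied = i;
            } else {
                i++;
            }
        }
        return (buffer == null) ? text : buffer.append(text, copied, length);
    }

    /** Returns true if part is found in text at the given index. The caller ensures it fits. */
    private static boolean matchesAt(CharSequence text, int index, CharSequence part) {
        for (int j = 0; j < part.length(); j++) {
            if (text.charAt(index + j) != part.charAt(j)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        CharSequence text = new StringBuilder("north-east-up");
        System.out.println(replace(text, "-", "_"));            // north_east_up
        System.out.println(replace(text, "west", "W") == text); // true: same instance returned
    }
}
```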
    • copyChars

      public static void copyChars(CharSequence src, int srcOffset, char[] dst, int dstOffset, int length)
      Copies a sequence of characters into the given char[] array.
      Parameters:
      src - the character sequence from which to copy characters.
      src­Offset - index of the first character from src to copy.
      dst - the array where to copy the characters.
      dst­Offset - index where to write the first character in dst.
      length - number of characters to copy.