Class Transliterator

  • All Implemented Interfaces:
    Serializable

    public abstract class Transliterator
    extends Object
    implements Serializable
    Controls the replacement of characters, abbreviations and names between the objects in memory and their WKT representations. The mapping is not necessarily one-to-one; for example, the replacement of a Unicode character by an ASCII character may not be reversible. The mapping may also depend on the element to transliterate; for example, some Greek letters like φ, λ and θ are mapped differently when they are used as mathematical symbols in axis abbreviations rather than in text. Some mappings may also apply to words instead of characters, when the word comes from a controlled vocabulary.
    Permitted characters in Well Known Text
    The ISO 19162 standard restricts Well Known Text to the following characters in all quoted texts except in REMARKS["…"] elements:
    A-Z a-z 0-9 _ [ ] ( ) { } < = > . , : ; + - (space) % & ' " * ^ / \ ? | °
    They are ASCII codes 32 to 125 inclusive except ! (33), # (35), $ (36), @ (64) and ` (96), plus the addition of ° (176) despite being formally outside the ASCII character set. The only exception to this rule is the text inside REMARKS["…"] elements, where all Unicode characters are allowed.

    The filter(String) method is responsible for replacing or removing characters outside the above-cited set of permitted characters.
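
    As a rough illustration of the character set listed above, the following self-contained Java sketch tests whether a code point or a whole text stays within it. The class WktCharacterSketch and its two methods are hypothetical names introduced for this example; they are not part of the library, whose own check is the Characters.isValidWKT(int) method cited under See Also.

```java
/** Hypothetical helper, not part of the library: tests the character set listed above. */
final class WktCharacterSketch {
    private WktCharacterSketch() {
    }

    /** Returns true if the code point is one of the characters permitted in quoted WKT text. */
    static boolean isPermittedChar(int c) {
        if (c == 0x00B0) {
            return true;                // Degree sign, permitted although outside ASCII.
        }
        if (c < 32 || c > 125) {
            return false;               // Control characters, '~', DEL and other non-ASCII.
        }
        switch (c) {
            case '!': case '#': case '$': case '@': case '`':
                return false;           // The five excluded ASCII characters.
            default:
                return true;
        }
    }

    /** Returns true if every code point of the given text is permitted. */
    static boolean isPermittedText(CharSequence text) {
        return text.codePoints().allMatch(WktCharacterSketch::isPermittedChar);
    }
}
```

    The sketch does not model the REMARKS["…"] exception; a caller would simply skip the test for that element.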

    Application to mathematical symbols
    For Greek letters used as mathematical symbols in coordinate axis abbreviations, the ISO 19162 standard recommends:
    • (P, L) as the transliteration of the Greek letters (phi, lambda), or (B, L) from German “Breite” and “Länge” used in academic texts worldwide, or (lat, long).
    • (U) for (θ) in polar coordinate systems.
    • (U, V) for (Ω, θ) in spherical coordinate systems.
    Note: at least two conventions exist about the meaning of (r, θ, φ) in a spherical coordinate system (see Wikipedia or MathWorld for more information). When using the mathematics convention, θ is the azimuthal angle in the equatorial plane (roughly equivalent to longitude λ) while φ is an angle measured from a pole (also known as colatitude). But when using the physics convention, the meanings of θ and φ are interchanged. Furthermore, some other conventions may measure the φ angle from the equatorial plane, like latitude, instead of from the pole. This class does not need to care about the meaning of those angles. The only recommendation is that φ is mapped to U and θ is mapped to V, regardless of their meaning.
    The toLatinAbbreviation(…) and toUnicodeAbbreviation(…) methods are responsible for doing the transliteration at formatting and parsing time, respectively.
    Replacement of names
    The longitude and latitude axis names are explicitly fixed by ISO 19111:2007 to "Geodetic longitude" and "Geodetic latitude". But ISO 19162:2015 §7.5.3(ii) says that the "Geodetic" part in those names shall be omitted at WKT formatting time. The toShortAxisName(…) and toLongAxisName(…) methods are responsible for doing the transliteration at formatting and parsing time, respectively.
    Since:
    0.6
    See Also:
    Characters.isValidWKT(int), WKT 2 specification §7.5.3, Serialized Form

    Defined in the sis-referencing module

    • Field Detail

      • DEFAULT

        public static final Transliterator DEFAULT
        A transliterator compliant with ISO 19162 on a "best effort" basis. All methods perform the default implementation documented in this Transliterator class.
      • IDENTITY

        public static final Transliterator IDENTITY
        A transliterator that does not perform any replacement. All methods let names, abbreviations and Unicode characters pass through unchanged.
    • Constructor Detail

      • Transliterator

        protected Transliterator()
        For sub-class constructors.
    • Method Detail

      • filter

        public String filter(String text)
        Returns a character sequence with the non-ASCII characters replaced or removed. For example, this method replaces “ç” by “c” in “Triangulation française”. This operation is usually not reversible; there is no converse method.

        Implementations shall not care about opening or closing quotes. The quotes will be doubled by the caller if needed after this method has been invoked.

        The default implementation invokes CharSequences.toASCII(CharSequence), replaces line feeds and tabulations by single spaces, then removes control characters.

        Parameters:
        text - the text to format without non-ASCII characters.
        Returns:
        the text to write in Well Known Text.
        See Also:
        Characters.isValidWKT(int)
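
        The three steps of the default implementation described for filter(String) can be approximated with the standard library alone. This is a minimal sketch under stated assumptions, not the library's code: the class FilterSketch is a hypothetical name, Unicode decomposition with java.text.Normalizer is only a stand-in for CharSequences.toASCII(CharSequence), and collapsing a run of line breaks and tabulations into one space is a choice made here.

```java
import java.text.Normalizer;

/** Hypothetical stand-in for the default filter(String) behavior; not the library's code. */
final class FilterSketch {
    private FilterSketch() {
    }

    static String filter(String text) {
        // Step 1 (assumption): approximate the ASCII conversion by decomposing accented
        // letters and dropping the combining marks. Characters without a decomposition
        // (for example "ß") are left untouched by this sketch.
        String s = Normalizer.normalize(text, Normalizer.Form.NFD).replaceAll("\\p{M}+", "");

        // Step 2: replace line breaks and tabulations by a single space.
        s = s.replaceAll("[\\t\\n\\r]+", " ");

        // Step 3: remove the remaining control characters.
        return s.replaceAll("\\p{Cntrl}", "");
    }
}
```

        Quotes are deliberately left alone, since doubling them is the caller's responsibility.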
      • toShortAxisName

        public String toShortAxisName(CoordinateSystem cs,
                                      AxisDirection direction,
                                      String name)
        Returns the axis name to format in WKT, or null if none. This method performs the mapping between the names of axes in memory (designated by "long axis names" in this class) and the names to format in the WKT (designated by "short axis names").
        Note: the "long axis names" are defined by ISO 19111 — referencing by coordinates while the "short axis names" are defined by ISO 19162 — Well-known text representation of coordinate reference systems.
        This method can return null if the name should be omitted. ISO 19162 recommends omitting the axis name when it is already given through the mandatory axis direction.

        The default implementation performs at least the following replacements:

        • Replace “Geodetic latitude” (case insensitive) by “Latitude”.
        • Replace “Geodetic longitude” (case insensitive) by “Longitude”.
        • Return null if the axis direction is AxisDirection.GEOCENTRIC_X, GEOCENTRIC_Y or GEOCENTRIC_Z and the name is the same as the axis direction (ignoring case).
        Parameters:
        cs - the enclosing coordinate system, or null if unknown.
        direction - the direction of the axis to format.
        name - the axis name, possibly replaced by this method.
        Returns:
        the axis name to format, or null if the name shall be omitted.
        See Also:
        DefaultCoordinateSystemAxis.formatTo(Formatter)
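
        The replacements listed for toShortAxisName can be sketched as follows. Everything in this block is hypothetical: the class ShortAxisNameSketch, the small Direction enum standing in for AxisDirection, and the way the name is compared to the direction (letters only, ignoring case) are assumptions made for the example, and the coordinate system argument is left out because the listed rules do not use it.

```java
import java.util.Locale;

/** Hypothetical illustration of the listed replacements; not the library's code. */
final class ShortAxisNameSketch {
    /** Stand-in for the few AxisDirection values needed by this example. */
    enum Direction { NORTH, EAST, UP, GEOCENTRIC_X, GEOCENTRIC_Y, GEOCENTRIC_Z }

    private ShortAxisNameSketch() {
    }

    /** Returns the name to format, or null if it only repeats the geocentric direction. */
    static String toShortAxisName(Direction direction, String name) {
        if (name.equalsIgnoreCase("Geodetic latitude")) {
            return "Latitude";
        }
        if (name.equalsIgnoreCase("Geodetic longitude")) {
            return "Longitude";
        }
        switch (direction) {
            case GEOCENTRIC_X: case GEOCENTRIC_Y: case GEOCENTRIC_Z:
                // Assumption: "Geocentric X" and GEOCENTRIC_X count as the same name.
                if (lettersOnly(name).equals(lettersOnly(direction.name()))) {
                    return null;
                }
                break;
            default:
                break;
        }
        return name;
    }

    /** Keeps ASCII letters only and converts them to lower case. */
    private static String lettersOnly(String s) {
        return s.replaceAll("[^A-Za-z]", "").toLowerCase(Locale.ROOT);
    }
}
```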
      • toLongAxisName

        public String toLongAxisName(String csType,
                                     AxisDirection direction,
                                     String name)
        Returns the axis name to use in memory for an axis parsed from a WKT. Since this method is invoked before the CoordinateSystem instance is created, most coordinate system characteristics are known only as String. In particular the csType argument, if non-null, should be one of the following values:
        "affine", "Cartesian" (note the upper-case "C"), "cylindrical", "ellipsoidal", "linear", "parametric", "polar", "spherical", "temporal" or "vertical"
        This method is the converse of toShortAxisName(CoordinateSystem, AxisDirection, String). The default implementation performs at least the following replacements:
        • Replace “Lat” or “Latitude” (case insensitive) by “Geodetic latitude” or “Spherical latitude”, depending on whether the axis is part of an ellipsoidal or spherical CS respectively.
        • Replace “Lon”, “Long” or “Longitude” (case insensitive) by “Geodetic longitude” or “Spherical longitude”, depending on whether the axis is part of an ellipsoidal or spherical CS respectively.
        • Return “Geocentric X”, “Geocentric Y” and “Geocentric Z” for Axis­Direction​.GEOCENTRIC_X, GEOCENTRIC_Y and GEOCENTRIC_Z respectively in a Cartesian CS, if the given axis name is only an abbreviation.
        • Use unique camel-case names for axis names defined by ISO 19111 and ISO 19162. For example, this method replaces “ellipsoidal height” by “Ellipsoidal height”.
        Rationale: Axis names are not really free text. They are specified by ISO 19111 and ISO 19162. SIS does not put restrictions on axis names, but we nevertheless try to use a unique name when we recognize it.
        Parameters:
        csType - the type of the coordinate system, or null if unknown.
        direction - the parsed axis direction.
        name - the parsed axis name, possibly replaced by this method.
        Returns:
        the axis name to use. Cannot be null.
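
        The latitude and longitude replacements listed for toLongAxisName depend only on the csType string and the parsed name, so they can be sketched without any library type. The class LongAxisNameSketch is a hypothetical name, the geocentric and camel-case rules are not covered, and returning the name unchanged for a null or other csType is an assumption of this sketch.

```java
/** Hypothetical illustration of the latitude/longitude rules; not the library's code. */
final class LongAxisNameSketch {
    private LongAxisNameSketch() {
    }

    static String toLongAxisName(String csType, String name) {
        final String prefix;
        if ("ellipsoidal".equals(csType)) {
            prefix = "Geodetic";
        } else if ("spherical".equals(csType)) {
            prefix = "Spherical";
        } else {
            return name;            // Assumption: no replacement for other or unknown CS types.
        }
        if (name.equalsIgnoreCase("Lat") || name.equalsIgnoreCase("Latitude")) {
            return prefix + " latitude";
        }
        if (name.equalsIgnoreCase("Lon") || name.equalsIgnoreCase("Long")
                || name.equalsIgnoreCase("Longitude")) {
            return prefix + " longitude";
        }
        return name;
    }
}
```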
      • toLatinAbbreviation

        public String toLatinAbbreviation(CoordinateSystem cs,
                                          AxisDirection direction,
                                          String abbreviation)
        Returns the axis abbreviation to format in WKT, or null if none. The given abbreviation may contain Greek letters, in particular φ, λ and θ. This toLatinAbbreviation(…) method is responsible for replacing Greek letters by Latin letters for ISO 19162 compliance, if desired.

        The default implementation performs at least the following mapping:

        Note that while this method may return a string of any length, ISO 19162 requires abbreviations to be a single Latin character.
        Parameters:
        cs - the enclosing coordinate system, or null if unknown.
        direction - the direction of the axis to format.
        abbreviation - the axis abbreviation, possibly replaced by this method.
        Returns:
        the axis abbreviation to format.
        See Also:
        DefaultCoordinateSystemAxis.formatTo(Formatter)
      • toUnicodeAbbreviation

        public String toUnicodeAbbreviation(String csType,
                                            AxisDirection direction,
                                            String abbreviation)
        Returns the axis abbreviation to use in memory for an axis parsed from a WKT. Since this method is invoked before the CoordinateSystem instance is created, most coordinate system characteristics are known only as String. In particular the csType argument, if non-null, should be one of the following values:
        "affine", "Cartesian" (note the upper-case "C"), "cylindrical", "ellipsoidal", "linear", "parametric", "polar", "spherical", "temporal" or "vertical"
        This method is the converse of toLatinAbbreviation(CoordinateSystem, AxisDirection, String). The default implementation performs at least the following mapping:
        • P or L → λ if csType is "ellipsoidal".
        • B → φ if csType is "ellipsoidal".
        • U → Ω if csType is "spherical", regardless of coordinate system convention.
        • V → θ if csType is "spherical", regardless of coordinate system convention.
        • U → θ if csType is "polar".
        Parameters:
        csType - the type of the coordinate system, or null if unknown.
        direction - the parsed axis direction.
        abbreviation - the parsed axis abbreviation, possibly replaced by this method.
        Returns:
        the axis abbreviation to use. Cannot be null.
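
        Because the mapping listed for toUnicodeAbbreviation is driven by the csType string, it can be illustrated with a small lookup. The class UnicodeAbbreviationSketch is a hypothetical name, only the B, L, U and V rows are covered, and returning the abbreviation unchanged when csType is null or no rule applies is an assumption of this sketch rather than documented behavior.

```java
/** Hypothetical illustration of the B, L, U and V rows; not the library's code. */
final class UnicodeAbbreviationSketch {
    private UnicodeAbbreviationSketch() {
    }

    static String toUnicodeAbbreviation(String csType, String abbreviation) {
        if (csType == null) {
            return abbreviation;            // Assumption: nothing to map without a CS type.
        }
        switch (csType) {
            case "ellipsoidal":
                if (abbreviation.equals("L")) return "\u03BB";      // λ
                if (abbreviation.equals("B")) return "\u03C6";      // φ
                break;
            case "spherical":
                if (abbreviation.equals("U")) return "\u03A9";      // Ω
                if (abbreviation.equals("V")) return "\u03B8";      // θ
                break;
            case "polar":
                if (abbreviation.equals("U")) return "\u03B8";      // θ
                break;
            default:
                break;
        }
        return abbreviation;
    }
}
```

        Note that the same Latin letter U yields a different Greek letter in a spherical and in a polar coordinate system, which is why the method needs the csType argument.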