Introduction to Apache SIS®

Martin Desruisseaux
Partially translated by Christina Hough

This work is licensed under the Apache 2 license.


1. Standards and norms

A geospatial information community is a collection of systems or individuals capable of exchanging their geospatial data through the use of common standards, allowing them to communicate with one another. As there are many ways to represent geospatial information, each community tends to structure this information in light of its areas of interest. This diversity complicates the task of Spatial Information System (SIS) users by confronting them with an apparently chaotic variety of data formats and structures. The characteristics of these structures vary according to the observed phenomenon and measurement methods, as well as the habits of the organizations producing the data. Such variety is an obstacle to studies that require heterogeneous combinations of data, especially when the data originate in communities that are traditionally distinct. For example, a researcher studying cholera might be interested in populations of shrimp as a propagation vector of the disease. But as doctors and oceanographers may not be used to sharing their work, the participants in such a study may be limited by the effort required to convert the data.

We cannot impose a uniform format on all data collections, as the diversity of formats is tied to factors such as the constraints imposed by the measuring apparatus and the statistical distribution of values. A more flexible solution is to ensure the interoperability of data through a common application programming interface (API). This API is not necessarily defined in a programming language; the current trend is rather to define conventions that use existing web protocols, which can then be translated into various programming languages. But for this approach to be viable, the API must be generally accepted by independent developers. In other words, the API must come as close as possible to an industry standard.

For example, one task that benefits from successful standardization is access to relational databases. The industry has established a common language, the SQL standard, that the creators of Java have embedded in the standard JDBC programming interfaces. Today, these interfaces are implemented by many software programs, both free and commercial. Like database access, methods of accessing geographic information have been standardized. In this case, however, the efforts are more recent, and their integration in software (especially in older programs) is incomplete and not always consistent. At the time of writing, no product to our knowledge implements all of the specifications in their entirety. However, many implementations cover a fairly large spectrum. One of these is the Apache SIS® library described in this document.

Apache SIS is characterized by a sustained effort to comply with standards. In general, complying with standards demands a greater effort than would be required for an isolated development, but rewards us with a double advantage: not only does it improve the interoperability of our data with that of external projects, it also points towards a robust way of elaborating the conceptual model reflected in the API. Indeed, the groups of experts who conceived the standards anticipated difficulties that sometimes escape the engineer at the beginning of a project, but which are likely to surface before the end.

1.1. Sources of conceptual models used by Apache SIS

Most standards used by Apache SIS have been devised by the Open Geospatial Consortium (OGC), sometimes in collaboration with the International Organization for Standardization (ISO). Some ISO standards themselves become European standards via the INSPIRE Directive. These standards offer two key features:

These standards are made available to the international community for free, as specifications (PDF files) or as schemas (XSD files). Standardization organizations do not create software; to obtain an implementation of these specifications, users must choose one of the compliant products available on the market, or develop their own solutions. Such voluntary compliance with these specifications allows independent communities to exchange geographic information more easily.

OGC standardization process

The work of the OGC is done by email, by teleconference, and at in-person meetings. The OGC organizes four meetings per year, each lasting five days and hosted by a member organization that sponsors the event (companies, universities, research centres, etc.). The host continent alternates between Europe and North America, with a growing presence in Asia since 2011. These meetings are usually attended by 50 to 100 participants from among the hundreds of members of the OGC. Some participants are present at almost all the meetings, forming the pillars of the organization. The meetings of the OGC offer opportunities for exchange among members from diverse backgrounds.

The creation of an OGC standard begins with a gathering of organizations or individuals with a common interest in an issue. A working group is proposed as a Domain Working Group (DWG) or as a Standard Working Group (SWG). DWGs are open to all members of the OGC, while SWGs require that their participants enter into an agreement not to hinder the distribution of the standard through intellectual property claims.

Standard Working Group (SWG) procedures

In order to be accepted, a standardization project must be supported by a minimum number of members belonging to distinct organizations. These founding members draft a charter defining the objectives of the SWG, which must be approved by the Technical Committee of the OGC. Each founding member is endowed with the right to vote, with a limit of one voting member per organization. Each new member that wishes to join the SWG after its creation is granted the role of observer, and receives on request the right to vote after several months of observation.

An SWG may contain several dozen members, but the volunteers performing the bulk of the work are usually fewer. Their proposals are submitted to the entire membership of the group, which may accept them by unanimous consent. Any objection must be debated, and an alternative proposed. SWGs usually try to debate an issue until a consensus emerges rather than move ahead despite negative votes, even if those opposed are in a minority. The decisions of the group are then integrated into the specifications by a member who assumes the role of editor.

As far as possible, the working group must structure the specifications as a core around which various extensions might be built. A series of tests must accompany the standard, allowing implementations to be classified by the level of test passed. There must be at least one reference implementation that passes all the tests in order to demonstrate that the standard is usable.

When the standard is considered ready, the SWG votes on a motion proposing its submission to a vote by the higher authorities of the OGC. This process takes several months. There is a faster process for approving de facto standards, but it is applied sparingly.

The Architecture Board (OAB) and the Technical Committee (TC)

All proposals for standards are first examined by the OGC Architecture Board (OAB). This board ensures that the standard conforms to the requirements of the OGC in form, modularization, and integration with other standards. If the OAB approves it, the standard is next submitted to a vote by the members of the Technical Committee (TC). This committee consists of the principal members of the OGC, and only they can grant final approval. If approved, the standard is made publicly available for comments during a period of several months. At the end of this period, the SWG must examine and respond to each comment. Any resulting modifications of the standard are submitted to the OAB, then the standard is published in its final form. This publication is announced in a press release by the OGC.

Certain members of the OGC and the TC also act as liaisons with the International Organization for Standardization (ISO). Cooperation between the two organizations goes two ways: the OGC adopts the ISO standards as a foundation on which to develop new standards, and certain OGC standards become ISO standards.

Procedure for the submission of proposals for modification

All users, whether or not they are members of the Open Geospatial Consortium, may propose modifications to OGC standards. A list of current proposals for changes, along with a form for submitting new proposals, is available online. Each proposal is reviewed by the SWG.

Some working groups use other parallel systems for submissions, for example GitHub merge requests, hosted outside of the structures of the OGC.

Besides these formal standardization organizations, there are organizations that are not officially dedicated to the creation of standards, but whose work has largely been adopted as de facto standards. In particular, the EPSG database offers numeric codes which allow the easy identification of a Coordinate Reference System (CRS) among several thousand. This database is offered by petroleum companies that have an interest in ensuring their explorations are conducted in the correct place, even when using maps produced by another party. Other examples of de facto standards include GeoTIFF for data distributed on a grid (such as images), and Shapefile for vector data (such as geometric shapes).

OGC standards are specified in several dozen documents. Each document outlines a service, for example the transformation of coordinates. The function of each service is described by a collection of object classes and their interactions. These elements are illustrated by UML (Unified Modeling Language) diagrams in specifications described as “abstract”. Abstract specifications do not refer to any specific computer language. Their concepts may be applied more or less directly to a programming language, a database or an XML schema. There is always an element of arbitrariness in the way an abstract specification is applied, given that adjustments are often necessary to take into account the constraints or conventions of the target language. Certain data structures exist in only a few languages: for example, unions exist in C/C++ but not in Java.

Historical note

At the turn of the millennium, the abstract specifications were explicitly concretized in implementation specifications. The term “implementation” is used here in the sense of all types of interfaces (Java or others) derived from UML diagrams, and not implementations in the Java sense. Such specifications existed for SQL, CORBA, COM, and Java languages. As these languages are capable of executing procedures, the specifications of this period define not only data structures, but also operations that apply to these structures.

Thereafter, enthusiasm for “Web 2.0” increased interest in XML over other languages. Older implementation specifications were deprecated, and XSD schemas became the main concretization of abstract specifications. Even the way abstract specifications are designed has evolved: they are less likely to define operations, and so what remains is closer to descriptions of database schemas. Some operations that were defined in older standards now appear, in another form, in web service specifications. Finally, the term “implementation specification” has been deprecated, to be subsumed under the term “OGC standard.” But despite their deprecation, old implementation specifications remain useful to programs in Java, because:

  • Their simpler models, applied to the same concepts, are helpful in understanding new specifications.
  • They sometimes define easy ways to perform common tasks, where the newer specifications limit themselves to general cases.
  • As operations are more often omitted from the newer specifications, the old ones remain a useful supplement when defining APIs.

The Apache SIS project is based on the most recent specifications, drawing from the archives of the OGC to complete certain abstract standards or make them more usable. Some old definitions are preserved as “convenience methods”, not always bringing new functionality, but facilitating the practical use of a library.

The following table lists the main norms used by the project. Many norms are published both as ISO standards and as OGC standards, and their corresponding names are listed next to one another in the first two columns. The “implementation specifications” section lists specifications that bring few new concepts compared to abstract specifications, but detail how to represent those concepts in specific environments like XML documents. Standards that are deprecated but still partially used appear struck through. Finally, GeoAPI packages will be introduced in upcoming chapters.

Main Standards Related to the Apache SIS project
ISO Norm | OGC Norm | Title | GeoAPI package | Apache SIS package

Abstract Specifications
ISO 19103 |  | Conceptual schema language | org.opengis.util | org.apache.sis.util.iso
ISO 19115-1 | Topic 11 | Metadata | org.opengis.metadata | org.apache.sis.metadata.iso
ISO 19115-2 |  | Metadata: extensions for imagery and gridded data | org.opengis.metadata | org.apache.sis.metadata.iso
ISO 19111 | Topic 2 | Spatial referencing by coordinates | org.opengis.referencing | org.apache.sis.referencing
ISO 19111-2 |  | Referencing: extension for parametric values | org.opengis.referencing | org.apache.sis.referencing
ISO 19108 |  | Temporal Schema | org.opengis.temporal |
ISO 19107 | Topic 1 | Feature geometry | org.opengis.geometry | org.apache.sis.geometry
ISO 19101 | Topic 5 | Features | org.opengis.feature | org.apache.sis.feature
ISO 19123 | Topic 6 | Schema for coverage geometry and functions | org.opengis.coverage | org.apache.sis.coverage
ISO 19156 | Topic 20 | Observations and measurements | org.opengis.observation |

Implementation Specifications
ISO 19139 |  | Metadata XML schema implementation |  | org.apache.sis.xml
ISO 19136 | OGC 07-036 | Geography Markup Language (GML) Encoding Standard |  | org.apache.sis.xml
ISO 19162 | OGC 12-063 | Well-known text representation of coordinate reference systems |  | org.apache.sis.io.wkt
ISO 13249 |  | SQL spatial |  |
 | OGC 01-009 | Coordinate Transformation Services | org.opengis.referencing | org.apache.sis.referencing
 | OGC 01-004 | Grid Coverage | org.opengis.coverage | org.apache.sis.coverage
 | SLD | Styled Layer Descriptor | org.opengis.style |

Web Services
ISO 19128 | WMS | Web Map Service |  |
 | WMTS | Web Map Tile Service |  |
ISO 19142 | WFS | Web Feature Service |  |
 | WCS | Web Coverage Service |  |
 | WPS | Web Processing Service |  |
 | OpenLS | Location Services |  |
 | SWE | Sensor Web Enablement |  |
 | SOS | Sensor Observation Service |  |

1.2. From conceptual models to Java interfaces: GeoAPI

The GeoAPI project offers a set of Java interfaces for geospatial applications. In a series of org.opengis.* packages, GeoAPI defines structures representing metadata, coordinate reference systems and operations that perform cartographic projections. In a part that is not yet standardized — called pending — GeoAPI defines structures that represent geo-referenced images, geometries, filters that can be applied to queries, and other features. These interfaces closely follow the specifications of the OGC, while interpreting and adapting them to meet the needs of Java developers — for example, conforming with naming conventions. These interfaces benefit both client applications and libraries:

GeoAPI project history

In 2001, the Open GIS Consortium (the former name of the Open Geospatial Consortium) published OGC implementation specification 01-009: Coordinate Transformation Services. This specification, developed by the Computer Aided Development Corporation (Cadcorp), was accompanied by COM, CORBA, and Java interfaces. At this time, the wave of web services had not yet eclipsed classical programming interfaces. The interfaces of the OGC did anticipate a networked world, but invested rather — in the case of Java — in RMI (Remote Method Invocation) technology. As the GeoAPI project did not yet exist, we retroactively designate these historical interfaces “GeoAPI 0.1”. These interfaces already used the package name org.opengis, which would be adopted by GeoAPI.

In 2002, developers of free projects launched a call for the creation of a geospatial API. The initial proposal attracted the interest of at least five free projects. The project was created using SourceForge, which has since hosted the source code in a Subversion repository. It was then that the project assumed the name “GeoAPI”, and used the interfaces of the OGC specification 01-009 as a starting point.

A few months later, the OGC launched the GO-1: Geographic Objects project, which pursued goals similar to those of GeoAPI. In the meantime, the OGC abandoned some of its specifications in favor of ISO standards. GeoAPI and GO-1 worked jointly to rework the GeoAPI interfaces and base them on the new ISO norms. Their first iteration, GeoAPI 1.0, served as a starting point for the first draft of the OGC specification 03-064 by the GO-1 working group. The final version of this specification became an OGC standard in 2005, and GeoAPI 2.0 was published at that time.

The GO-1 project was largely supported by a company called Polexis. Its acquisition by Sys Technology, and the change in priorities under the new owners, brought a halt to the GO-1 project, which in turn slowed development on GeoAPI. In order to resume development, a new working group entitled “GeoAPI 3.0” was created at the OGC. This group took a narrower focus compared to GeoAPI 2.0, concentrating on the most stable interfaces, and putting the others — such as geometries — in a module entitled “pending”, for future consideration. GeoAPI 3.0 became an OGC standard in 2011. This version was the first to be deployed in the Maven central repository.

GeoAPI interfaces are sometimes generated from other files provided by the OGC, such as XSD files. But there is always a manual revision, and often modifications compared to the automatically generated Java files. Deviations from the standards are documented in each affected class and method. Each mention of a deviation is also collected on a single page in order to provide an overview. Since these deviations blur the relationships between the standards and certain Java interfaces, the correspondence between these languages is explained by @UML annotations and property files described in the following section.

From OGC specifications to Java interfaces

It is possible to automatically generate Java interfaces from OGC standards using existing tools. One of the most commonly used approaches is to transform XSD schemas into Java interfaces using the xjc command-line utility. As this utility is included in most Java distributions (it is one of the JAXB tools), this approach is favoured by many projects found on the Internet. Other approaches use tools integrated into the Eclipse development environment, which are based on UML schemas rather than XSD ones.

A similar approach was attempted in the early days of the GeoAPI project, but was quickly abandoned. We favor a manual approach for the following reasons:

  • Some XSD schemas are much more verbose than the original UML schemas. Converting from XSD schemas introduces — at least in the case of metadata — almost double the number of interfaces actually defined by the standard, without adding any new features. XSD schemas also define attributes specific to XML documents (id, uuid, xlink:href, etc.), that do not exist in the original UML diagrams, and which we do not necessarily wish to expose in a Java API. Converting from UML schemas avoids this problem, but tools capable of performing this operation are less common.

    Example: XSD metadata schemas insert a <gmd:CI_Citation> element inside a <gmd:citation>, a <gmd:CI_OnlineResource> element inside a <gmd:onlineResource>, and so on for the hundreds of classes defined by the ISO 19115 standard. This redundancy is certainly not necessary in a Java program.

  • OGC standards use different naming conventions than Java. In particular, the names of almost all OGC classes begin with a two-letter prefix, such as MD_Identifier. These prefixes fulfill the same role as package names in Java. GeoAPI adapts this practice by using interface names without prefixes and placing these interfaces in packages corresponding to the prefixes, but with more descriptive names. Occasionally we also change the names; for example, to avoid acronyms, or to conform to an established convention such as JavaBeans.

    Example: The OGC class MD_Identifier becomes the Identifier interface in the org.opengis.metadata package. The OGC class SC_CRS becomes the CoordinateReferenceSystem interface, and the usesDatum association becomes a getDatum() method, rather than the “getUsesDatum()” that would result from an automatic conversion tool. We do not allow programs to blindly apply rules that ignore the conventions of the community whose schemas we translate.

  • The standards may contain structures that do not have a direct equivalent in Java, such as unions similar to what we would find in C/C++. The strategy used to obtain an equivalent feature in Java depends on the context: multiple inheritance of interfaces, modification of the hierarchy, or simply omitting the union. These decisions are made case-by-case based on a needs analysis.

    Example: The ISO 19111 standard defines different types of coordinate systems, such as spherical, cylindrical, polar or Cartesian. It then defines several subsets of these types of coordinate systems. These subsets, represented by unions, serve to specify that a class may only be associated with a particular type of coordinate system. For example, the union named CS_ImageCS, which may be associated with an image, can only contain CS_CartesianCS and CS_AffineCS. In this case, we get the desired effect in Java through a modification of the class hierarchy: we define the CartesianCS interface as a specialization of AffineCS, which is semantically correct. But it is not possible to apply a similar strategy to other unions without violating the semantics.

  • Several specifications overlap. GeoAPI performs the work of integration by replacing some duplicate structures with references to equivalent structures from the standards that best represent them.

    Example: The ISO 19115:2003 standard, which defines metadata structures, also attempts to describe a few structures representing coordinate reference systems (CRS). Yet these are also the focus of another standard: ISO 19111. At the same time, ISO 19111:2007 states in section 3 that it reuses all of the elements of ISO 19115:2003 except MD_CRS and its components. GeoAPI interfaces reduce the redundancy by applying the exclusion recommended by ISO 19111 to the entire project.

  • The complexity of some standards has increased for historical reasons rather than technical ones, related to the standardization process. GeoAPI reduces the technical debt by designing interfaces with each element in its proper place, regardless of the chronological order in which the standards were published.

    Example: The ISO 19115-2 standard is an extension of the ISO 19115-1 standard, adding image metadata structures. These metadata were defined in a separate standard because they were not yet ready when the first part of the standard was published. As it was not possible for administrative reasons to add attributes to already-published classes, the new attributes were added in a subclass bearing almost the same name. Thus, ISO 19115-2 defines the class MI_Band, which extends the class MD_Band from ISO 19115-1 by adding attributes that would have appeared directly in the parent class if they had been ready on time. In GeoAPI, we have chosen to “repair” these anomalies by merging these two classes into a single interface.
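The CS_ImageCS union discussed in the list above can be sketched in a few lines of Java. The interfaces below are simplified stand-ins written for this illustration, not the actual GeoAPI declarations, and the ImageGrid holder is a hypothetical class that only plays the role of the type associated with the union.

```java
// Simplified stand-ins for illustration only; not the real GeoAPI interfaces.
interface CoordinateSystem {
    int getDimension();
}

// Axes of an affine system are straight but not necessarily orthogonal.
interface AffineCS extends CoordinateSystem {
}

// A Cartesian system is an affine system with orthogonal axes,
// so declaring it as a subtype of AffineCS is semantically correct.
interface CartesianCS extends AffineCS {
}

// Not part of the union: unrelated to AffineCS in the hierarchy.
interface SphericalCS extends CoordinateSystem {
}

// Hypothetical holder playing the role of the type associated with CS_ImageCS.
// Accepting AffineCS admits affine and Cartesian systems, and nothing else.
final class ImageGrid {
    private final AffineCS cs;

    ImageGrid(AffineCS cs) {
        this.cs = cs;
    }

    AffineCS getCoordinateSystem() {
        return cs;
    }
}

final class UnionDemo {
    public static void main(String[] args) {
        CartesianCS cartesian = () -> 2;             // lambda implements getDimension()
        ImageGrid grid = new ImageGrid(cartesian);   // accepted: a CartesianCS is an AffineCS
        System.out.println(grid.getCoordinateSystem().getDimension());
        // Passing a SphericalCS to the ImageGrid constructor would not compile.
    }
}
```

The constraint expressed by the union is thus enforced by the compiler through an ordinary parameter type, with no union construct required.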

GeoAPI is composed of many modules. The geoapi and geoapi-pending modules provide interfaces derived from UML schemas of international standards. The conceptual model will be explained in detail in the chapters describing Apache SIS implementation. However, we can get an overview of its content by consulting the page listing the mapping between GeoAPI methods and the standards where they come from.

GeoAPI modules

The GeoAPI project consists of a standardized part (geoapi) and an experimental part (geoapi-pending). As these two parts are mutually exclusive, users must take care not to mix them in the same project. This separation is guaranteed for all projects that depend only on the Maven central repository (including the final versions of Apache SIS), as the geoapi-pending module is never deployed on this central repository. By contrast, certain SIS development branches may depend on geoapi-pending.

GeoAPI modules are:

  • geoapi — includes interfaces covered by the GeoAPI standard of the OGC. The final versions of Apache SIS depend on this module.

  • geoapi-pending — contains a copy of all interfaces in the geoapi module (not a dependence) with additions that have not yet been approved as an OGC standard. Some additions appear in interfaces normally defined by the geoapi module, hence the need to copy them. Apache SIS's development branches jdk6, jdk7 and jdk8 depend on this module, but this dependence becomes a dependence on the geoapi standard module when the branches are merged to the trunk.

  • geoapi-conformance — includes a JUnit test suite that developers may use to test their implementations.

  • geoapi-examples — includes examples of relatively simple implementations. These examples are placed in the public domain in order to encourage users to copy and adapt them to their needs if Apache SIS services are unsuitable.

  • geoapi-proj4 — contains a partial implementation of org.opengis.referencing packages as adaptors based on the C/C++ Proj.4 library. This module may be used as an alternative to the sis-referencing module for certain functions.

  • geoapi-netcdf — contains a partial implementation of org.opengis.referencing and org.opengis.coverage packages as adaptors based on the UCAR NetCDF library. The series of tests in this module was developed in such a way as to be reusable for other projects. Apache SIS uses them to test its own sis-netcdf module.

  • geoapi-openoffice — contains an add-in for the OpenOffice.org office suite.

1.2.1. Explicit mapping given by @UML annotations

For each class, method and constant defined by an OGC or ISO standard, GeoAPI indicates its provenance using annotations defined in the org.opengis.annotation package. In particular, the @UML annotation indicates the standard, the name of the element in that standard, and its obligation. For example, in the following code snippet, the first @UML annotation indicates that the Java interface that follows (ProjectedCRS) is defined using the SC_ProjectedCRS type of the ISO 19111 standard. The second @UML annotation, this time applied to the getCoordinateSystem() method, indicates that this method is defined using the coordinateSystem association of the ISO 19111 standard, and that this association is mandatory, which means in Java that the method is not allowed to return a null value.

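package org.opengis.referencing.crs;

import org.opengis.annotation.UML;
import org.opengis.referencing.cs.CartesianCS;

import static org.opengis.annotation.Obligation.*;      // provides MANDATORY
import static org.opengis.annotation.Specification.*;   // provides ISO_19111

/**
 * A 2D coordinate reference system used to approximate the shape of the earth on a planar surface.
 */
@UML(specification=ISO_19111, identifier="SC_ProjectedCRS")
public interface ProjectedCRS extends GeneralDerivedCRS {
    /**
     * Returns the coordinate system, which must be Cartesian.
     */
    @UML(obligation=MANDATORY, specification=ISO_19111, identifier="coordinateSystem")
    CartesianCS getCoordinateSystem();
}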

Java reflection methods allow access to this information during the execution of an application. This is useful for displaying UML identifiers for users familiar with OGC standards, or for writing elements in an XML document. Class org.apache.sis.util.iso.Types provides static convenience methods like getStandardName(Class) for such operations. For example the following code will display “Standard name of type org.opengis.referencing.crs.ProjectedCRS is SC_ProjectedCRS”:

Class<?> type = ProjectedCRS.class;
System.out.println("Standard name of type " + type.getName() + " is " + Types.getStandardName(type));

The Types.forStandardName(String) convenience method performs the reverse operation. Applications that want to perform those operations without SIS convenience methods can follow the indications provided in a separate chapter.
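As a rough illustration of what such a lookup involves, the sketch below reads an identifier through plain Java reflection. To stay runnable without GeoAPI on the classpath, it declares its own reduced UML annotation and a small annotated interface; both are stand-ins assumed for this example. With GeoAPI available, one would presumably use org.opengis.annotation.UML and the real interfaces instead. The UmlLookup class and its standardName method are hypothetical names.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.AnnotatedElement;
import java.lang.reflect.Method;

// Local stand-in for org.opengis.annotation.UML, reduced to a single element
// (an assumption made so that this sketch compiles on its own).
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.TYPE, ElementType.METHOD})
@interface UML {
    String identifier();
}

// Hypothetical annotated interface mirroring the ProjectedCRS excerpt.
@UML(identifier = "SC_ProjectedCRS")
interface ProjectedCRS {
    @UML(identifier = "coordinateSystem")
    Object getCoordinateSystem();
}

final class UmlLookup {
    /** Returns the UML identifier of a class or method, or null if it carries no @UML annotation. */
    static String standardName(AnnotatedElement element) {
        UML uml = element.getAnnotation(UML.class);
        return (uml != null) ? uml.identifier() : null;
    }

    public static void main(String[] args) {
        System.out.println(standardName(ProjectedCRS.class));
        for (Method method : ProjectedCRS.class.getMethods()) {
            System.out.println(method.getName() + " maps to " + standardName(method));
        }
    }
}
```

The annotation must have runtime retention for this to work; an annotation retained only in source or class files would not be visible through reflection.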

1.2.2. Implicit mapping to standard JDK

Some classes and methods have neither a @UML annotation nor an entry in the class-index.properties file. They are either extensions of GeoAPI, or types defined in other libraries such as the standard JDK. In the latter case, the mapping to ISO standards is implicit. The following table describes this mapping for ISO 19103 types. Java's primitive types are preferred when applicable, but where necessary their wrappers are used in order to allow null values.

Mapping between ISO 19103 and JDK types
ISO type | JDK type | Remarks

Numbers
Integer | int | Sometimes java.lang.Integer for optional attributes.
Integer (in some cases) | long | Sometimes java.lang.Long for optional attributes.
Real | double | Sometimes java.lang.Double for optional attributes.
Decimal | java.math.BigDecimal |
Number | java.lang.Number |

Texts
FreeText | (no equivalent) | See org.opengis.util.InternationalString below.
CharacterString | java.lang.String | Often org.opengis.util.InternationalString (see below).
LocalisedCharacterString | java.lang.String |
Sequence<Character> | java.lang.CharSequence |
Character | char |

Dates and times
Date | java.util.Date |
Time | java.util.Date |
DateTime | java.util.Date |

Collections
Collection | java.util.Collection |
Bag | java.util.Collection | A Bag is similar to a Set without being restricted by uniqueness.
Set | java.util.Set |
Sequence | java.util.List |
Dictionary | java.util.Map |
KeyValuePair | java.util.Map.Entry |

Enumerations
Enumeration | java.lang.Enum |
CodeList | (no equivalent) | See org.opengis.util.CodeList below.

Various
Boolean | boolean | Sometimes java.lang.Boolean for optional attributes.
Any | java.lang.Object |

The nearest equivalent for CharacterString is the String class, but GeoAPI often uses the InternationalString interface, allowing the client to choose the language. For example, it is useful on a server that simultaneously provides pages in multiple languages. By returning translations when objects are used rather than at the time of their creation, we allow the SIS library to provide the same instances of Metadata or Coverage (for example) for the same data, regardless of the client's language. Translations may be made on the fly with the help of the application's ResourceBundle, or may be provided directly with the data (as in the case of Metadata).
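The idea of resolving the language when a text is used, rather than when the object is created, can be sketched as follows. LocalizedText is a hypothetical class written for this illustration; it is neither the GeoAPI InternationalString interface nor an Apache SIS implementation, although its toString(Locale) method mirrors the one that InternationalString declares. The fallback on a default text is an assumption of this sketch.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in illustrating the idea behind InternationalString.
final class LocalizedText {
    private final String defaultText;
    private final Map<String, String> byLanguage = new HashMap<>();

    LocalizedText(String defaultText) {
        this.defaultText = defaultText;
    }

    /** Registers a translation and returns this object to allow call chaining. */
    LocalizedText add(Locale locale, String text) {
        byLanguage.put(locale.getLanguage(), text);
        return this;
    }

    /** Resolves the text at the time of use, falling back on the default text. */
    String toString(Locale locale) {
        return byLanguage.getOrDefault(locale.getLanguage(), defaultText);
    }

    @Override
    public String toString() {
        return defaultText;
    }
}

final class LocalizedTextDemo {
    public static void main(String[] args) {
        // A single shared instance serves clients in several languages.
        LocalizedText title = new LocalizedText("Terrain elevation")
                .add(Locale.FRENCH, "Altitude du terrain");
        System.out.println(title.toString(Locale.ENGLISH));
        System.out.println(title.toString(Locale.CANADA_FRENCH));
    }
}
```

Because the object itself holds no client language, the same instance can be cached and handed to every client of a server.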

An Enumeration corresponds to an Enum in Java. Both define all authorized values, without allowing the user to add any. A CodeList is similar to an enumeration, except that users may add their own items. Standard JDK does not offer this possibility. GeoAPI defines an abstract CodeList class that reproduces some of the functionality of Enum while being extensible. Extensions are made available by the valueOf(String) static method, which, in contrast to Enum, creates new instances if the name provided does not correspond to the name of an existing instance.

MediumName cdRom  = MediumName.CD_ROM;
MediumName usbKey = MediumName.valueOf("USB_KEY"); // There is no constraint on this value.
assert MediumName.valueOf("CD_ROM")  == cdRom  : "valueOf must return existing constants.";
assert MediumName.valueOf("USB_KEY") == usbKey : "valueOf must cache the previously requested values.";
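One way to obtain this behaviour is to keep a registry of instances keyed by name. The sketch below is a minimal, hypothetical MediumCode class written for this illustration; it is not the GeoAPI CodeList class, whose actual implementation may differ.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical minimal code list; a sketch of the idea only.
final class MediumCode {
    // Registry of all known instances. It must be declared before the constants that use it.
    private static final Map<String, MediumCode> VALUES = new LinkedHashMap<>();

    // Predefined items, comparable to enum constants.
    static final MediumCode CD_ROM = valueOf("CD_ROM");
    static final MediumCode DVD    = valueOf("DVD");

    private final String name;

    private MediumCode(String name) {
        this.name = name;
    }

    /** Returns the existing instance for the given name, creating it on first request. */
    static synchronized MediumCode valueOf(String name) {
        return VALUES.computeIfAbsent(name, MediumCode::new);
    }

    String name() {
        return name;
    }
}

final class MediumCodeDemo {
    public static void main(String[] args) {
        MediumCode usbKey = MediumCode.valueOf("USB_KEY");   // item added by the user
        System.out.println(MediumCode.valueOf("CD_ROM") == MediumCode.CD_ROM);
        System.out.println(MediumCode.valueOf("USB_KEY") == usbKey);
    }
}
```

Since every name maps to a single instance, items can be compared with the == operator, as with enum constants.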

1.2.3. Implementations provided by Apache SIS

Apache SIS implements most GeoAPI interfaces by a class with the same name as the interface, prefixed by “Abstract”, “Default” or “General”. Apache SIS classes prefixed by “Default” can be instantiated directly by a new DefaultXXX(…) statement or by a call to the createXXX(…) method of a factory.

Example: to represent a projected coordinate reference system (Mercator, Lambert, etc):
  • org.opengis.referencing.crs.ProjectedCRS is the GeoAPI interface derived from ISO 19111 standard, and
  • org.apache.sis.referencing.crs.DefaultProjectedCRS is the implementation provided by Apache SIS.
An instance can be created by:
  • ProjectedCRS crs = new DefaultProjectedCRS(…), or
  • ProjectedCRS crs = CRSFactory.createProjectedCRS(…).
Both approaches expect the same arguments (omitted in this example for brevity).

In the default Apache SIS configuration, using CRSFactory.createXXX(…) or new DefaultXXX(…) is almost the same, except that a Factory may return existing instances instead of creating new ones, and that the exceptions thrown for invalid arguments are of different types. In more advanced configurations, using a Factory reduces the direct dependencies on Apache SIS and allows inversion of control.

The “General” prefix is sometimes used instead of “Default” to indicate that alternative implementations are available for some specific cases. For example, the Envelope interface is implemented by at least two Apache SIS classes: GeneralEnvelope and Envelope2D. The first implementation can represent envelopes with any number of dimensions, while the second implementation is specialized for two-dimensional envelopes.

Apache SIS classes prefixed by “Abstract” should not, in principle, be instantiated. Users should instantiate a non-abstract subclass instead. But many SIS classes are only conceptually abstract, without the Java abstract keyword in their class definition. Such classes can be instantiated by a new AbstractXXX(…) statement (but not by a Factory) despite being conceptually abstract. However, such instantiations should be done only as a last resort, when it is not possible to determine the exact subtype.

1.3. Conventions used in this guide

Standards sometimes favour the application of certain generic terms to particular contexts, which may differ from the context in which other communities use these terms. For example, the terms domain and range may apply to arbitrary functions in order to designate a set of possible values of inputs and outputs respectively. But the functions to which they are applied by certain ISO standards are not the same as the functions to which they are applied by other libraries. For example, ISO 19123 applies these terms to CV_Coverage objects, seen as functions in which the domain is the set of spatio-temporal coordinates encompassed by the data, and the range is the set of values encompassed. But UCAR's NetCDF library applies these terms instead to the function of converting pixel indices (its domain) to spatial-temporal coordinates (its range). Thus the UCAR library's range may be the domain of ISO 19123.

The Apache SIS library prefers as much as possible to use terms in the sense of OGC and ISO norms. Particular care must be taken, however, with the interfaces between SIS and certain other external libraries, in order to reduce the risk of confusion.

1.3.1. Code colors

The elements defined in a computer language, such as classes and methods in Java or elements in an XML document, appear in monospaced font. In order to facilitate an understanding of the relationships between Apache SIS and the standards, these elements are also represented using the following colour codes:

Text in gray boxes is for information purposes only and can be ignored.

2. Spatial reference systems

To locate a point on Earth, one can use identifiers like a city name or a postal address (an approach known as spatial reference systems by identifiers) or use numerical values valid in a given coordinate system, like latitudes and longitudes (an approach known as spatial reference systems by coordinates). Each reference system implies approximations like the choice of a figure of the Earth (geoid, ellipsoid, etc.) used as an approximation of the Earth's shape, the choice of geometric properties (angles, distances, etc.) to be preserved when a map is shown on a plane surface, and a loss of precision when coordinates are transformed to systems using a different datum.

A very common misconception is that one can avoid this complexity by using a single coordinate reference system (typically WGS84) as a universal system for all data. The next chapters will explain why the reality is not so simple. Whether a universal reference system can suit an application's needs depends on the desired positional accuracy and the kind of calculations to be performed with the data. Unless otherwise specified, Apache SIS aims to represent coordinates on Earth with an accuracy of one centimetre or better. But the accuracy can be altered by various situations:

“Early binding” versus “late binding” implementations

Because of the ubiquity of WGS84, it is tempting to use that system as a hub or pivot system for all coordinate transformations. Using a “universal” system as a pivot simplifies the design of coordinate transformation libraries. For example, a transformation from datum A to datum B can be done by first transforming from A to WGS84, then from WGS84 to B. With such an approach, a coordinate transformation library only needs to associate each GeodeticDatum instance with the transformation parameters from that datum to WGS84. This approach was encouraged in version 1 of the WKT format, since that format specified a TOWGS84[…] element (removed in WKT version 2) precisely for that purpose. This approach is known in EPSG guidance notes as “early binding”, since information about coordinate transformations is associated early with geodetic object definitions, usually right at GeographicCRS creation time. While EPSG acknowledges that this approach is commonly used, it is not a recommended strategy for the following reasons:

  • More than one transformation may exist from datum A to datum B, where each transformation is designed for a different geographic area.
  • Some operations are designed specifically for transformations from A to B and do not have the same accuracy as an operation using WGS84 as an intermediate step.
  • WGS84 itself has been updated many times, which makes it a kind of moving target (admittedly a slow one) for coordinate transformation libraries.
  • Different systems could be used as the pivot system, for example the Galileo Reference Frame (GTRF) created for the European GPS competitor.

Example: the EPSG geodetic dataset defines about 50 transformations from NAD27 to NAD83. In an early binding approach, the same geographic CRS (namely “NAD27”) in the WKT 1 format would need to be defined with a TOWGS84[-8, 160, 176] element for coordinates in the USA or with a TOWGS84[-10, 158, 187] element for coordinates in Canada. Different parameter values exist for other regions like Cuba, so it is not possible to represent such diversity with a single TOWGS84[…] element associated with a CRS. But even when restricting CRS usage to the domain of validity of its single TOWGS84[…] element, those transformations are still approximate, with an accuracy of 10 metres in the USA case. More accurate transformations exist in the form of NADCON grid shift files, but those transformations are from NAD27 to NAD83 (which move together on the same continental plate), not to WGS84 (which moves independently). The difference was often ignored when NAD83 and WGS84 were considered practically equivalent, but that assumption is subject to more caution today.

EPSG instead recommends the “late binding” approach, in which coordinate transformation methods and parameters are defined for “A to B” pairs of systems (possibly complemented by a domain of validity) rather than associated with standalone datums. Apache SIS is a “late binding” implementation, although some remnants of the “early binding” approach still exist in the form of the DefaultGeodeticDatum.getBursaWolfParameters() property. The latter is used only if SIS fails to apply the late binding approach for the given reference systems.

The sis-referencing module provides a set of classes implementing different specializations of the ReferenceSystem interface, together with the required components. Those implementations store spatial reference system descriptions, together with metadata like their domain of validity. However, those objects do not perform any operation on coordinate values. Coordinate conversions or transformations are performed by another family of types, with CoordinateOperation as the root interface. Those types will be discussed in another section.

2.1. Components of a reference system by coordinates

Spatial reference systems by coordinates provide necessary information for mapping numerical coordinate values to real-world locations. In Apache SIS, most information is contained (directly or indirectly) in classes with a name ending in CRS, the abbreviation of Coordinate Reference System. Those objects contain:

Those systems are described by the ISO 19111 standard (Referencing by Coordinates), which for the most part replaces the older OGC 01-009 standard (Coordinate Transformation Services). Those standards are complemented by two other standards defining exchange formats: ISO 19136 for the Geography Markup Language (GML), an XML format which is quite detailed but verbose, and ISO 19162 for the Well-Known Text (WKT), a text format easier for humans to read.

2.1.1. Geoid and ellipsoid

Since the real topographic surface is difficult to represent mathematically, it is not used directly. A slightly more convenient surface is the geoid, a surface on which the gravity potential has the same value everywhere (an equipotential surface). This surface is perpendicular to the direction of a plumb line at all points. The geoid surface would be equivalent to the mean sea level if all oceans were at rest, without winds or permanent currents like the Gulf Stream.

While much smoother than the topographic surface, the geoid surface still has hollows and bumps caused by the uneven distribution of mass inside the Earth. For more convenient mathematical operations, the geoid surface is approximated by an ellipsoid. This “figure of the Earth” is represented in GeoAPI by the Ellipsoid interface, a fundamental component in coordinate reference systems of kind GeographicCRS and ProjectedCRS. Dozens of ellipsoids are commonly used for datum definitions. Some of them provide a very good approximation for a particular geographic area, at the expense of the rest of the world for which the datum was not designed. Others are compromises applicable to the whole world.

Example: the EPSG geodetic dataset defines, among others, the “WGS 84”, “Clarke 1866”, “Clarke 1880”, “GRS 1980” and “GRS 1980 Authalic Sphere” ellipsoids (the last one being a sphere with the same surface area as the GRS 1980 ellipsoid). Ellipsoids may be used in various places of the world or may be defined for a very specific region. For example, in the USA at the beginning of the 20th century, the state of Michigan used an ellipsoid based on the “Clarke 1866” ellipsoid but with axis lengths expanded by 800 feet. This modification aimed to take into account the average height of the state above mean sea level.
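
An ellipsoid of revolution is fully described by two numbers, from which the other usual parameters are derived. The following sketch is a hypothetical stand-alone class, not the GeoAPI Ellipsoid interface; it uses the standard definitions of flattening and eccentricity, and the axis lengths used in the usage note are those of the “Clarke 1866” and “WGS 84” ellipsoids.

```java
/** Hypothetical illustration of the parameters derived from the two axis lengths. */
final class EllipsoidSketch {
    /** Axis lengths in metres. */
    final double semiMajor, semiMinor;

    EllipsoidSketch(double semiMajor, double semiMinor) {
        this.semiMajor = semiMajor;
        this.semiMinor = semiMinor;
    }

    /** Flattening f = (a - b) / a. */
    double flattening() {
        return (semiMajor - semiMinor) / semiMajor;
    }

    /** Inverse flattening 1/f = a / (a - b), the form used in most datum definitions. */
    double inverseFlattening() {
        return semiMajor / (semiMajor - semiMinor);
    }

    /** Eccentricity e = sqrt(1 - b²/a²). */
    double eccentricity() {
        double ratio = semiMinor / semiMajor;
        return Math.sqrt(1 - ratio * ratio);
    }
}
```

For example, new EllipsoidSketch(6378206.4, 6356583.8) describes “Clarke 1866” and new EllipsoidSketch(6378137.0, 6356752.314245) describes “WGS 84”; their eccentricities differ only from the fourth decimal onward, which gives an idea of how similar those figures of the Earth are.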

2.1.2. Geodetic datum

To define a geodetic system in a country, a national authority selects an ellipsoid closely matching the country's surface. Differences between that ellipsoid and the geoid’s hollows and bumps are usually less than 100 metres. Parameters that relate an Ellipsoid to the Earth's surface (for example the position of the ellipsoid center) are represented by instances of GeodeticDatum. Many GeodeticDatum definitions can use the same Ellipsoid, but with different orientations or center positions.

Before the satellite era, geodetic measurements were performed exclusively from the Earth's surface. Consequently, two islands or continents not within line of sight of each other were not geodetically related. So the North American Datum 1983 (NAD83) and the European Datum 1950 (ED50) are independent: their ellipsoids have different sizes and are centered at different positions. The same geographic coordinate will map to different locations on Earth depending on whether the coordinate uses one reference system or the other.

The invention of GPS led to the creation of a world geodetic system named WGS84. Its ellipsoid is unique and centered at the Earth's centre of gravity. GPS provides at any moment the absolute position of the receiver in that world geodetic system. But since WGS84 is a world-wide system, it may differ significantly from local systems. For example, the difference between WGS84 and the European system ED50 is about 150 metres, and the average difference between WGS84 and the Réunion 1947 system is 1.5 kilometres. Consequently we should not blindly use GPS coordinates on a map, as transformations to the local system may be required. Those transformations are represented in GeoAPI by instances of the Transformation interface.

The ubiquity of WGS84 tends to reduce the need for Transformation operations with recent data, but does not eliminate it. The Earth moves under the effect of plate tectonics, and new systems are defined every year to take that fact into account. For example, while NAD83 was originally defined as practically equivalent to WGS84, there is now (as of 2016) a difference of 1.5 metres. The Japanese Geodetic Datum 2000 was also defined as practically equivalent to WGS84, but the Japanese Geodetic Datum 2011 now differs. Even the WGS84 datum, which was a terrestrial model realization at a specific time, has been revised because of improvements in instrument accuracy. Today, at least six WGS84 versions exist. Furthermore, many borders were legally defined in legacy datums, for example NAD27 in the USA. Updating data to the new datum would imply transforming some straight lines or simple geometric shapes into more irregular shapes, if the shapes are large enough.

2.1.3. Coordinate systems

TODO

2.1.3.1. Axis order

The axis order is specified by the authority (typically a national agency) defining the Coordinate Reference System (CRS). The order depends on the CRS type and the country defining the CRS. In the case of geographic CRS, the (latitude, longitude) axis order has been widely used by geographers and pilots for centuries. However, software developers tend to consistently use the (x, y) order for every kind of CRS. Those different practices resulted in contradictory definitions of axis order for almost every CRS of kind GeographicCRS, for some ProjectedCRS in the southern hemisphere (South Africa, Australia, etc.) and for some polar projections, among others.

Recent OGC standards mandate the use of axis order as defined by the authority. Older OGC standards used the (x, y) axis order instead, ignoring any authority specification. Many software packages still use the old (x, y) axis order, maybe because such uniformity makes CRS implementation and usage apparently easier. Apache SIS supports both conventions with the following approach: by default, SIS creates CRS with axis order as defined by the authority. Those CRS are created by calls to the CRS.forCode(String) method, and the actual axis order can be verified after the CRS creation with System.out.println(crs). But if the (x, y) axis order is wanted for compatibility with older OGC specifications or other software, then a CRS forced to longitude-first axis order can be created by a call to the following method:

CoordinateReferenceSystem crs = …;               // CRS obtained by any means.
crs = AbstractCRS.castOrCopy(crs).forConvention(AxesConvention.RIGHT_HANDED);

Among the legacy OGC standards that used the non-conforming axis order, an influential one is version 1 of the Well Known Text (WKT) format specification. According to that widely used format, WKT 1 definitions without explicit AXIS[…] elements shall default to (longitude, latitude) or (x, y) axis order. In version 2 of the WKT format, AXIS[…] elements are no longer optional and should contain an explicit ORDER[…] sub-element to make the intended order even more obvious. But if AXIS[…] elements are nevertheless missing in a WKT 2 definition, Apache SIS defaults to (latitude, longitude) order. So in summary:

To avoid ambiguities, users are encouraged to always provide explicit AXIS[…] elements in their WKT. The WKT format will be presented in more detail in the next sections.
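
Mathematically, changing the axis order of a two-dimensional coordinate is a very simple affine transform. The following sketch uses plain arrays rather than the Apache SIS API; the AxisSwapSketch class and its members are hypothetical names. It shows why a (latitude, longitude) coordinate silently read as (longitude, latitude) designates a different place: the swap matrix exchanges the two values without any other change, so nothing in the numbers themselves reveals the mistake.

```java
/** Hypothetical illustration of axis swapping expressed as an affine transform. */
final class AxisSwapSketch {
    /** 3×3 affine matrix exchanging the two coordinate values. */
    static final double[][] SWAP = {
        {0, 1, 0},
        {1, 0, 0},
        {0, 0, 1}
    };

    /** Applies a 3×3 affine matrix to a two-dimensional coordinate. */
    static double[] apply(double[][] m, double[] pt) {
        return new double[] {
            m[0][0] * pt[0] + m[0][1] * pt[1] + m[0][2],
            m[1][0] * pt[0] + m[1][1] * pt[1] + m[1][2]
        };
    }
}
```

Applying SWAP to the (latitude, longitude) pair {40, 14} gives the (longitude, latitude) pair {14, 40}, and applying it a second time restores the original order.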

2.1.4. Geographic reference systems

TODO

2.1.4.1. Well-Known Text format

TODO

2.1.5. Map projections

Map projections represent a curved surface (the Earth) on a plane surface (a map or a computer screen) with some control over deformations: one can preserve either the angles or the areas, but not both at the same time. The geometric properties to preserve depend on the feature to represent and the work to do on that feature. For example, countries elongated along the East-West axis often use a Lambert projection, while countries elongated along the North-South axis prefer a Transverse Mercator projection.

TODO

2.1.5.1. Well-Known Text format

TODO

2.1.6. Vertical and temporal dimensions

TODO

2.1.6.1. Well-Known Text format

TODO

2.2. Fetching a spatial reference system

TODO

2.2.1. Looking up CRS defined by authorities

TODO

2.2.2. Reading definitions in GML or WKT format

TODO

2.2.3. Constructing programmatically

TODO

2.2.4. Adding new CRS definitions

TODO

2.3. Coordinate operations

Given a source coordinate reference system (CRS) in which existing coordinate values are expressed, and a target coordinate reference system in which coordinate values are desired, Apache SIS can provide a coordinate operation performing the conversion or transformation work. The search for coordinate operations may use a third argument, optional but recommended, which is the geographic area of the data to transform. That latter argument is recommended because coordinate operations are often valid only in some geographic area (typically a particular country or state), and many transformations may exist for the same pair of source and target CRS but with different domains of validity. Different coordinate operations may also be different compromises between accuracy and domain of validity, and specifying a smaller area of interest may allow Apache SIS to select a more accurate operation.

Example: the EPSG geodetic dataset (as of version 7.9) defines 77 coordinate operations from the North American Datum 1927 (EPSG:4267) coordinate reference system to the World Geodetic System 1984 (EPSG:4326) CRS. There is one operation valid only for coordinate transformations in Québec, another operation valid for coordinate transformations in Texas west of 100°W, another operation for the same state but east of 100°W, etc. If the user did not specify any geographic area of interest, then Apache SIS defaults to the coordinate operation which is valid in the largest area. In this example, the “largest area” criterion results in the selection of a coordinate operation valid for Canada, not the USA.

The easiest way to obtain a coordinate operation from the information above is to use the org.apache.sis.referencing.CRS convenience class:

CoordinateOperation cop = CRS.findOperation(sourceCRS, targetCRS, areaOfInterest);

Among the information provided by a CoordinateOperation object, the following items are of special interest:

If the coordinate operation is an instance of Transformation, then the instance selected by SIS may be one among many possibilities depending on the area of interest. Furthermore its accuracy is certainly less than the centimetric accuracy that we can expect from a Conversion. Consequently verifying the domain of validity and the positional accuracy declared in the transformation metadata is of particular importance.

2.3.1. Executing an operation on coordinate values

The CoordinateOperation object introduced in the above section provides high-level information (source and target CRS, domain of validity, positional accuracy, operation parameters, etc.). The actual mathematical work is performed by a separate object obtained by a call to CoordinateOperation.getMathTransform(). Unlike CoordinateOperation instances, MathTransform instances do not carry any metadata. They are a kind of black box which knows nothing about the source and target CRS (actually the same MathTransform can be used for different pairs of CRS if the mathematical work is the same), the domain of validity or the accuracy. Furthermore, a MathTransform may be implemented in a very different way from what the CoordinateOperation describes. In particular, many conceptually different coordinate operations (e.g. longitude rotations, changes of units of measurement, conversions between two Mercator projections on the same datum, etc.) are implemented by MathTransform as affine transforms and concatenated for efficiency, even if CoordinateOperation reports them as a chain of Mercator and other operations. The “conceptual versus real chain of coordinate operations” section explains the differences in more detail.

The following Java code performs a map projection from geographic coordinates on the World Geodetic System 1984 (WGS84) datum to coordinates in the WGS 84 / UTM zone 33N coordinate reference system. To keep the example simple, this code uses predefined constants given by the CommonCRS convenience class, but more advanced applications will typically use EPSG codes instead. Note that all geographic coordinates below express latitude before longitude.

import org.opengis.geometry.DirectPosition;
import org.opengis.referencing.crs.CoordinateReferenceSystem;
import org.opengis.referencing.operation.CoordinateOperation;
import org.opengis.referencing.operation.TransformException;
import org.opengis.util.FactoryException;
import org.apache.sis.referencing.CRS;
import org.apache.sis.referencing.CommonCRS;
import org.apache.sis.geometry.DirectPosition2D;

public class MyApp {
    public static void main(String[] args) throws FactoryException, TransformException {
        CoordinateReferenceSystem sourceCRS = CommonCRS.WGS84.geographic();
        CoordinateReferenceSystem targetCRS = CommonCRS.WGS84.UTM(40, 14);  // Get whatever zone is valid for 14°E.
        CoordinateOperation operation = CRS.findOperation(sourceCRS, targetCRS, null);

        // The above lines are costly and should be performed only once before projecting many points.
        // In this example, the operation that we got is valid for coordinates in geographic area from
        // 12°E to 18°E (UTM zone 33) and 0°N to 84°N.

        DirectPosition ptSrc = new DirectPosition2D(40, 14);           // 40°N 14°E
        DirectPosition ptDst = operation.getMathTransform().transform(ptSrc, null);

        System.out.println("Source: " + ptSrc);
        System.out.println("Target: " + ptDst);
    }
}

2.3.2. Partial derivatives of coordinate operations

The previous section shows how to project a coordinate from one reference system to another. There is another, less known, operation which does not compute the projected coordinates of a given point, but instead the derivative of the projection function at that point. This operation was defined in an older Open Geospatial specification, OGC 01-009, now considered obsolete but still useful. Let P be a map projection converting degrees of latitude and longitude (φ, λ) into projected coordinates (x, y) in metres. The formula below represents the map projection result as a column matrix (the reason will become clearer soon):

The partial derivative of the map projection at this point can be represented by a Jacobian matrix:
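
Following the definitions of this section, with latitude φ before longitude λ (both in degrees) and projected coordinates in metres, the projection result and its Jacobian matrix can be written as below. The symbol J_P for the Jacobian is a notation chosen for this illustration.

```latex
P(\varphi, \lambda) =
\begin{bmatrix}
  x(\varphi, \lambda) \\
  y(\varphi, \lambda)
\end{bmatrix}
\qquad
J_P(\varphi, \lambda) =
\begin{bmatrix}
  \dfrac{\partial x(\varphi, \lambda)}{\partial \varphi} & \dfrac{\partial x(\varphi, \lambda)}{\partial \lambda} \\[2ex]
  \dfrac{\partial y(\varphi, \lambda)}{\partial \varphi} & \dfrac{\partial y(\varphi, \lambda)}{\partial \lambda}
\end{bmatrix}
```

The first column holds the derivatives with respect to latitude and the second column those with respect to longitude.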

The remaining equations in this section abridge ∂x(φ, λ) as ∂x and ∂y(φ, λ) as ∂y, but the reader should keep in mind that each of those derivative values depends on the (φ, λ) coordinate given at Jacobian matrix calculation time. The first matrix column tells us that if we apply a small displacement of ∂φ degrees of latitude from the (φ, λ) position, in other words if we move to the (φ + ∂φ, λ) geographic position, then the projected coordinate will be displaced by (∂x, ∂y) metres; in other words it will become (x + ∂x, y + ∂y). Similarly, the last matrix column gives us the displacement of the projected coordinate if we apply a small displacement of ∂λ degrees of longitude to the source geographic coordinate. We can visualize such displacements in a figure like the one below. This figure shows the derivative at two points, P1 and P2, to emphasize that the result changes from point to point. In that figure, vectors U and V stand for the first and second columns respectively of the Jacobian matrices.
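
The interpretation of the matrix columns as small displacements can be checked numerically. The sketch below does not use the Apache SIS API: it assumes a Mercator projection on a sphere (a simplification chosen because its formulas are short) and estimates each matrix column by moving the source point by a small step in latitude, then in longitude. The class name, the sphere radius and the step size are arbitrary choices made for this illustration.

```java
/** Hypothetical sketch: Jacobian of a spherical Mercator projection by finite differences. */
final class JacobianSketch {
    /** Sphere radius in metres (an arbitrary value chosen for this illustration). */
    static final double RADIUS = 6371000;

    /** Projects a (φ, λ) coordinate in degrees to (x, y) in metres. */
    static double[] project(double phi, double lambda) {
        double x = RADIUS * Math.toRadians(lambda);
        double y = RADIUS * Math.log(Math.tan(Math.PI / 4 + Math.toRadians(phi) / 2));
        return new double[] {x, y};
    }

    /**
     * Returns {{∂x/∂φ, ∂x/∂λ}, {∂y/∂φ, ∂y/∂λ}} in metres per degree,
     * estimated by central differences around the given point.
     */
    static double[][] jacobian(double phi, double lambda) {
        final double h = 1E-4;                       // Step in degrees.
        double[] north = project(phi + h, lambda);   // Small displacement in latitude.
        double[] south = project(phi - h, lambda);
        double[] east  = project(phi, lambda + h);   // Small displacement in longitude.
        double[] west  = project(phi, lambda - h);
        return new double[][] {
            {(north[0] - south[0]) / (2 * h), (east[0] - west[0]) / (2 * h)},
            {(north[1] - south[1]) / (2 * h), (east[1] - west[1]) / (2 * h)}
        };
    }
}
```

For this particular projection the first column has a zero x component, since moving north does not change the easting, and its y component grows with latitude as 1/cos φ. This is one concrete case of the derivative changing from point to point.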

TODO

2.3.2.1. Transform derivatives applied to envelopes

TODO

TODO

TODO

2.3.2.2. Transform derivatives applied to rasters

TODO

TODO

TODO

Intersection of derivatives

TODO

2.3.2.3. Getting the derivative at a point

TODO Example:

AbstractMathTransform projection = ...;         // An Apache SIS map projection.
double[] sourcePoint = {longitude, latitude};   // The geographic coordinate to project.
double[] targetPoint = new double[2];           // Where to store the projection result.
Matrix   derivative  = projection.transform(sourcePoint, 0, targetPoint, 0, true);

TODO

@Override
public Matrix derivative(DirectPosition p) throws TransformException {
    Matrix jac = inverse().derivative(transform(p));
    return Matrices.inverse(jac);
}
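
The derivative(DirectPosition) method above relies on the fact that the Jacobian of a transform is the matrix inverse of the Jacobian of the inverse transform, evaluated at the transformed point. The following stand-alone sketch checks that identity on a Mercator projection on a unit sphere, with coordinates in radians and in (longitude, latitude) order. It is an illustration with hypothetical names, not Apache SIS code.

```java
/** Hypothetical sketch: forward derivative computed from the derivative of the inverse. */
final class InverseDerivativeSketch {
    /** Forward projection of (λ, φ) in radians on a unit sphere. */
    static double[] transform(double lambda, double phi) {
        return new double[] {lambda, Math.log(Math.tan(Math.PI / 4 + phi / 2))};
    }

    /** Jacobian of the inverse projection at (x, y): λ = x and ∂φ/∂y = 1/cosh(y). */
    static double[][] inverseDerivative(double x, double y) {
        return new double[][] {
            {1, 0},
            {0, 1 / Math.cosh(y)}
        };
    }

    /** Inverse of a 2×2 matrix. */
    static double[][] invert(double[][] m) {
        double det = m[0][0] * m[1][1] - m[0][1] * m[1][0];
        return new double[][] {
            { m[1][1] / det, -m[0][1] / det},
            {-m[1][0] / det,  m[0][0] / det}
        };
    }

    /** Same strategy as the method above: invert the Jacobian of the inverse transform. */
    static double[][] derivative(double lambda, double phi) {
        double[] p = transform(lambda, phi);
        return invert(inverseDerivative(p[0], p[1]));
    }
}
```

The result can be compared with the analytic forward derivative of this projection, which is 1 for ∂x/∂λ and 1/cos φ for ∂y/∂φ.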

2.3.3. Conceptual versus real chain of coordinate operations

Coordinate operations may include many steps, each with its own set of parameters. For example, transformations from one datum (e.g. NAD27) to another datum (e.g. WGS84) can be approximated by an affine transform (translation, rotation and scale) applied to the geocentric coordinates. This implies that the coordinates must be converted from the geographic to the geocentric domain before the affine transform, then back to the geographic domain after the affine transform. The result is a three-step process illustrated in the “Conceptual chain of operations” column of the example below. However, because that operation chain is very common, the EPSG geodetic dataset provides a shortcut named “Geocentric translation in geographic domain”. With this operation, the conversion steps between geographic and geocentric CRS are implicit. Consequently the datum shift as specified by EPSG appears as if it were a single operation, but this is not the real operation executed by Apache SIS.

Example: transformation of geographic coordinates from NAD27 to WGS84 in Canada can be approximated by the EPSG:1172 coordinate operation. This single EPSG operation is actually a chain of three operations in which two steps are implicit. The operation as specified by EPSG is shown in the first column below. The same operation with the two hidden steps made explicit is shown in the second column. The last column shows the same operation as implemented by Apache SIS under the hood, which contains additional operations discussed below. For all columns, input coordinates of the first step and output coordinates of the last step are (latitude, longitude) coordinates in degrees.

Operation specified by EPSG:
  1. Geocentric translation in geographic domain
    • X-axis translation = -10 m
    • Y-axis translation = 158 m
    • Z-axis translation = 187 m
Conversions between geographic and geocentric domains are implicit. The semi-major and semi-minor axis lengths required for those conversions are inferred from the source and target datum.
Conceptual chain of operations:
  1. Geographic to geocentric conversion
    • Source semi-major = 6378206.4 m
    • Source semi-minor = 6356583.8 m
  2. Geocentric translation
    • X-axis translation = -10 m
    • Y-axis translation = 158 m
    • Z-axis translation = 187 m
  3. Geocentric to geographic conversion
    • Target semi-major = 6378137.0 m
    • Target semi-minor ≈ 6356752.3 m
Axis order and units are implicitly defined by the source and target CRS. It is the implementation's responsibility to perform any needed unit conversions and/or axis swapping.
Operations actually performed by Apache SIS:
  1. Affine parametric conversion
    • Scale factors (λ and φ) = 0
    • Shear factors (λ and φ) = π/180
  2. Ellipsoid (radians domain) to centric conversion
    • Eccentricity ≈ 0.08227
  3. Affine parametric transformation
    • Scale factors (X, Y and Z) ≈ 1.00001088
    • X-axis translation ≈ -1.568 E-6
    • Y-axis translation ≈ 24.772 E-6
    • Z-axis translation ≈ 29.319 E-6
  4. Centric to ellipsoid (radians domain) conversion
    • Eccentricity ≈ 0.08182
  5. Affine parametric conversion
    • Scale factors (λ and φ) = 0
    • Shear factors (λ and φ) = 180/π
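
The conceptual chain of three operations can be sketched in plain Java with the parameter values listed above. This is an illustration under stated assumptions, not the Apache SIS implementation: the class and method names are hypothetical, the ellipsoidal height is assumed to be zero, the geocentric to geographic step uses a simple fixed-point iteration, and the WGS 84 semi-minor axis is written with more digits than the rounded value above. The last method shows that dividing the source semi-major axis and the translations by the target semi-major axis gives values consistent with the normalized parameters of step 3 in the last list.

```java
/** Hypothetical sketch of the conceptual NAD27 to WGS 84 chain (EPSG:1172 parameters). */
final class GeocentricTranslationSketch {
    static final double SRC_A = 6378206.4, SRC_B = 6356583.8;        // Clarke 1866
    static final double TGT_A = 6378137.0, TGT_B = 6356752.314245;   // WGS 84
    static final double TX = -10, TY = 158, TZ = 187;                // Metres

    /** Step 1: geographic (degrees, height 0) to geocentric (metres). */
    static double[] toGeocentric(double lat, double lon, double a, double b) {
        double phi = Math.toRadians(lat), lambda = Math.toRadians(lon);
        double e2 = 1 - (b * b) / (a * a);
        double nu = a / Math.sqrt(1 - e2 * Math.sin(phi) * Math.sin(phi));
        return new double[] {
            nu * Math.cos(phi) * Math.cos(lambda),
            nu * Math.cos(phi) * Math.sin(lambda),
            nu * (1 - e2) * Math.sin(phi)
        };
    }

    /** Step 3: geocentric (metres) to geographic (degrees); the height is discarded. */
    static double[] toGeographic(double[] xyz, double a, double b) {
        double e2 = 1 - (b * b) / (a * a);
        double p = Math.hypot(xyz[0], xyz[1]);
        double phi = Math.atan2(xyz[2], p * (1 - e2));      // Initial approximation.
        for (int i = 0; i < 10; i++) {                      // Fixed-point iteration.
            double sin = Math.sin(phi);
            double nu = a / Math.sqrt(1 - e2 * sin * sin);
            phi = Math.atan2(xyz[2] + e2 * nu * sin, p);
        }
        return new double[] {Math.toDegrees(phi), Math.toDegrees(Math.atan2(xyz[1], xyz[0]))};
    }

    /** Steps 1 to 3, with the geocentric translation (step 2) in the middle. */
    static double[] nad27ToWgs84(double lat, double lon) {
        double[] xyz = toGeocentric(lat, lon, SRC_A, SRC_B);
        xyz[0] += TX;
        xyz[1] += TY;
        xyz[2] += TZ;
        return toGeographic(xyz, TGT_A, TGT_B);
    }

    /** Scale factor and translations relative to a target semi-major axis length of 1. */
    static double[] normalizedParameters() {
        return new double[] {SRC_A / TGT_A, TX / TGT_A, TY / TGT_A, TZ / TGT_A};
    }
}
```

Input and output coordinates are in (latitude, longitude) order, as in the conceptual chain.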

The operation chain actually performed by Apache SIS is very different from the conceptual operation chain because the coordinate systems are not the same. Except for the first and last ones, all Apache SIS steps work on right-handed coordinate systems (as opposed to the left-handed coordinate system obtained when latitude is before longitude), with angular units in radians (instead of degrees) and linear units relative to an ellipsoid of semi-major axis length of 1 (instead of the Earth’s size). Working in those coordinate systems requires additional steps for unit conversions and axis swapping at the beginning and at the end of the chain. Apache SIS uses affine parametric conversions for this purpose, which allow axis swapping and unit conversions to be combined in a single step (see affine transform for more information). The reason why Apache SIS splits conceptual operations into such fine-grained operations is to allow more efficient concatenation of operation steps. This approach often allows cancellation of two consecutive affine transforms, for example a conversion from radians to degrees (e.g. after a geocentric to ellipsoid conversion) immediately followed by a conversion from degrees to radians (e.g. before a map projection). Another example is the Affine parametric transformation step above, which combines both the geocentric translation step and a scale factor implied by the ellipsoid change.
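
The cancellation of two consecutive affine transforms can be illustrated with ordinary matrix multiplication. The sketch below is a hypothetical stand-alone class, not Apache SIS code. Its two matrices have scale factors of 0 and shear factors of π/180 or 180/π, so each one swaps the two axes and converts the angular unit in a single step, and their product is the identity matrix.

```java
/** Hypothetical sketch: two consecutive affine steps that cancel each other. */
final class AffineConcatenationSketch {
    /** Axis swapping combined with a conversion from degrees to radians. */
    static final double[][] TO_RADIANS = {
        {0, Math.PI / 180, 0},
        {Math.PI / 180, 0, 0},
        {0, 0, 1}
    };

    /** Axis swapping combined with a conversion from radians to degrees. */
    static final double[][] TO_DEGREES = {
        {0, 180 / Math.PI, 0},
        {180 / Math.PI, 0, 0},
        {0, 0, 1}
    };

    /** Returns a × b, the matrix that applies b first and then a. */
    static double[][] multiply(double[][] a, double[][] b) {
        double[][] product = new double[3][3];
        for (int i = 0; i < 3; i++) {
            for (int j = 0; j < 3; j++) {
                for (int k = 0; k < 3; k++) {
                    product[i][j] += a[i][k] * b[k][j];
                }
            }
        }
        return product;
    }

    /** Tells whether the matrix is the identity, within the given tolerance. */
    static boolean isIdentity(double[][] m, double tolerance) {
        for (int i = 0; i < 3; i++) {
            for (int j = 0; j < 3; j++) {
                double expected = (i == j) ? 1 : 0;
                if (Math.abs(m[i][j] - expected) > tolerance) {
                    return false;
                }
            }
        }
        return true;
    }
}
```

A concatenation that detects an identity product, for example with isIdentity(multiply(TO_RADIANS, TO_DEGREES), 1E-12), can drop both steps instead of executing them for every coordinate.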

All those operation chains can be viewed in Well Known Text (WKT) or pseudo-WKT format. The simplest operation chain, as specified by the authority, is given directly by the String representation of the CoordinateOperation instance. This WKT 2 representation contains not only a description of operations with their parameter values, but also additional information about the context in which the operation applies (the source and target CRS) together with some metadata like the accuracy and domain of validity. Some operation steps and parameters may be omitted if they can be inferred from the context.

Example: the WKT 2 representation below is for the same coordinate operation as the one used in the previous example. This representation can be obtained by a call to System.out.println(cop) where cop is a CoordinateOperation instance. Some characteristics of this representation are:

  • The SourceCRS and TargetCRS elements determine axis order and units. For this reason, axis swapping and unit conversions do not need to be represented in this WKT.

  • The “Geocentric translation in geographic domain” operation implies conversions between geographic and geocentric coordinate reference systems. Ellipsoid semi-axis lengths are inferred from above SourceCRS and TargetCRS elements, so they do not need to be specified in this WKT.

  • The operation accuracy (20 metres) is much coarser than the numerical floating-point precision. This kind of metadata could hardly be guessed from the mathematical function alone.

CoordinateOperation["NAD27 to WGS 84 (3)",
  SourceCRS[full CRS definition required here but omitted for brevity],
  TargetCRS[full CRS definition required here but omitted for brevity],
  Method["Geocentric translations (geog2D domain)"],
    Parameter["X-axis translation", -10.0, Unit["metre", 1]],
    Parameter["Y-axis translation", 158.0, Unit["metre", 1]],
    Parameter["Z-axis translation", 187.0, Unit["metre", 1]],
  OperationAccuracy[20.0],
  Area["Canada - onshore and offshore"],
  BBox[40.04, -141.01, 86.46, -47.74],
  Id["EPSG", 1172, "8.9"]]

An operation chain closer to what Apache SIS really performs is given by the String representation of the MathTransform instance. In this WKT 1 representation, contextual information and metadata are lost; a MathTransform is like a mathematical function with no knowledge about the meaning of the coordinates on which it operates. Since contextual information is lost, implicit operations and parameters become explicit. This representation is useful for debugging since any axis swapping operation (for example) becomes visible. Apache SIS constructs this representation from the data structure in memory, but converts it into a form more convenient for humans, for example by converting radians to degrees.

Example: the WKT 1 representation below is for the same coordinate operation as the one used in the previous example. This representation can be obtained by a call to System.out.println(cop.getMathTransform()) where cop is a CoordinateOperation instance. Some characteristics of this representation are:

  • Since there is no longer (by design) any information about the source and target CRS, axis swapping (if needed) and unit conversions must be performed explicitly. This is the task of the first and last affine operations in this WKT.

  • The “Geocentric translation” operation is no longer applied in the geographic domain, but in the geocentric domain. Consequently, conversions between geographic and geocentric coordinate reference systems must be made explicit. Those explicit steps are also necessary for specifying the ellipsoid semi-axis lengths, since they can no longer be inferred from the source and target CRS.

  • Conversions between geographic and geocentric coordinates are three-dimensional. Consequently operations for increasing and reducing the number of dimensions are inserted. By default the ellipsoidal height before conversion is set to zero.

Concat_MT[
  Param_MT["Affine parametric transformation",
    Parameter[parameters performing axis swapping omitted for brevity]],
  Inverse_MT[Param_MT["Geographic3D to 2D conversion"]],
  Param_MT["Geographic/geocentric conversions",
    Parameter["semi_major", 6378206.4],
    Parameter["semi_minor", 6356583.8]],
  Param_MT["Geocentric translations (geocentric domain)",
    Parameter["X-axis translation", -10.0],
    Parameter["Y-axis translation", 158.0],
    Parameter["Z-axis translation", 187.0]],
  Param_MT["Geocentric_To_Ellipsoid",
    Parameter["semi_major", 6378137.0],
    Parameter["semi_minor", 6356752.314245179]],
  Param_MT["Geographic3D to 2D conversion"],
  Param_MT["Affine parametric transformation",
    Parameter[parameters performing axis swapping omitted for brevity]]]

Finally, the raw operation chain can be viewed by a call to AbstractMathTransform.toString(Convention.INTERNAL). This pseudo-WKT representation shows exactly what Apache SIS does, but is rarely used because it is difficult to read. It may occasionally be useful for advanced debugging.

3. Geometries

This chapter introduces a few aspects of ISO 19107 standard (Spatial schema) and the Apache SIS classes that implement them.

3.1. Base classes

Each geometric object is considered an infinite set of points. As a set, their most fundamental operations are of the same nature as the standard operations of Java collections. We may therefore see a geometry as a kind of java.util.Set in which the elements are points, except that the number of elements contained in the set is infinite (with the exception of geometries representing a simple point). To better represent this concept, the ISO standard and GeoAPI define a TransfiniteSet interface which we could see as a Set of infinite size. Although a parent relationship exists conceptually between these interfaces, GeoAPI does not define TransfiniteSet as a sub-interface of java.util.Set, as the definition of certain methods such as size() and iterator() would be problematic. However, we find very similar methods such as contains(…) and intersects(…).

All geometries are specializations of TransfiniteSet. The parent class of those geometries is called GM_Object in ISO 19107 standard. GeoAPI interfaces use the Geometry name instead, as the omission of the GM_ prefix (as prescribed in GeoAPI convention) would leave a name too similar to Java's Object class.

3.1.1. Points and direct positions

ISO 19107 defines two types of structures to represent a point: GM_Point and DirectPosition. The first type is a true geometry and may therefore be relatively cumbersome, depending on the implementation. The second type is not formally considered to be a geometry; it extends neither GM_Object nor TransfiniteSet. It barely defines any operations besides the storing of a sequence of numbers representing a coordinate. It may therefore be a more lightweight object.

In order to allow the API to work equally with these two types of positions, ISO 19107 defines Position as a union of DirectPosition and GM_Point. It is a union in the sense of C/C++. For the Java language, GeoAPI obtains the same effect by defining Position as the parent interface of DirectPosition and Point. In practice, the great majority of Apache SIS's API works on DirectPositions, or occasionally on Positions when it seems useful to also allow geometric points.
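
The use of a parent interface as a substitute for a union can be sketched as follows. The types below are simplified, hypothetical stand-ins written for this illustration; they are not the GeoAPI interfaces themselves, which define more methods.

```java
final class PositionSketch {
    /** Common parent playing the role of the "union" of the two kinds of position. */
    interface Position {
        DirectPosition getDirectPosition();
    }

    /** Lightweight holder of coordinate values; not a geometry. */
    static final class DirectPosition implements Position {
        private final double[] coordinates;

        DirectPosition(double... coordinates) {
            this.coordinates = coordinates.clone();
        }

        double getCoordinate(int dimension) {
            return coordinates[dimension];
        }

        @Override public DirectPosition getDirectPosition() {
            return this;
        }
    }

    /** A true geometry (potentially heavier in a real implementation) located at a direct position. */
    static final class Point implements Position {
        private final DirectPosition position;

        Point(DirectPosition position) {
            this.position = position;
        }

        @Override public DirectPosition getDirectPosition() {
            return position;
        }
    }

    /** Accepts either kind of position thanks to the common parent interface. */
    static double firstCoordinate(Position position) {
        return position.getDirectPosition().getCoordinate(0);
    }
}
```

A method declared with a Position argument, like firstCoordinate(…) above, can then be invoked with a DirectPosition or with a Point without any conversion by the caller.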

3.1.2. Envelopes

Envelopes store minimal and maximal coordinate values of a geometry. Envelopes are not geometries themselves; they are not infinite sets of points (TransfiniteSet). There is no guarantee that all the positions contained within the limits of an envelope are geographically valid. Envelopes must be seen as information about the extreme values that the coordinates of a geometry might take, as if each dimension were independent of the others, nothing more. Nevertheless, we speak of envelopes as rectangles, cubes or hyper-cubes (depending on the number of dimensions) in order to facilitate discussion, while bearing in mind their non-geometric nature.

Example: We could test whether a position is within the limits of an envelope. A positive result does not guarantee that the position is within the geometry delimited by the envelope, but a negative result guarantees that it is outside the geometry. We can perform intersection tests in the same way. On the other hand, it makes little sense to apply a rotation to an envelope, as the result may be very different from that which we would obtain by performing a rotation on the original geometry, and then recalculating its envelope.
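
The asymmetry between positive and negative results can be sketched with a disc and its envelope. The class and method names below are hypothetical and the code does not use the Apache SIS envelope classes.

```java
final class EnvelopeVersusGeometry {
    /** The "real" geometry: a disc of the given radius centered on the origin. */
    static boolean discContains(double radius, double x, double y) {
        return x*x + y*y <= radius*radius;
    }

    /** The envelope of that disc: each dimension is tested independently of the other. */
    static boolean envelopeContains(double radius, double x, double y) {
        return Math.abs(x) <= radius && Math.abs(y) <= radius;
    }

    public static void main(String[] args) {
        // Near a corner of the envelope: inside the envelope but outside the disc.
        System.out.println(envelopeContains(1, 0.9, 0.9));   // true
        System.out.println(discContains(1, 0.9, 0.9));       // false

        // Outside the envelope: necessarily outside the disc too.
        System.out.println(envelopeContains(1, 1.5, 0));     // false
        System.out.println(discContains(1, 1.5, 0));         // false
    }
}
```

The envelope test is therefore useful as a cheap first filter: only the positions that pass it need the more expensive test against the geometry itself.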

An envelope might be represented by two positions corresponding to two opposite corners of a rectangle, cube or hyper-cube. For the first corner, we often take the one whose ordinates all have the minimal value (lowerCorner), and for the second corner the one whose ordinates all have the maximal value (upperCorner). When displayed using a conventional system of coordinates (with y axis values running upwards), these two positions appear respectively in the lower left corner and the upper right corner of a rectangle. Care must be taken with different coordinate systems, however, which may vary the positions of these corners on the screen. The expressions lower corner and upper corner should thus be understood in the mathematical rather than the visual sense.

3.1.2.1. Envelopes that cross the antimeridian

Minimums and maximums are the values most often assigned to lowerCorner and upperCorner. But the situation becomes complicated when an envelope crosses the antimeridian (-180° or 180° longitude). For example, an envelope 10° in size may begin at 175° longitude and end at -175°. In this case, the longitude value assigned to lowerCorner is greater than that assigned to upperCorner. Apache SIS therefore uses a slightly different definition of these two corners:

If the envelope does not cross the antimeridian, these two definitions are equivalent to the selection of minimal and maximal values respectively. This is the case in the green rectangle in the figure below. When the envelope crosses the antimeridian, the lowerCorner and the upperCorner appear again at the bottom and top of the rectangle (assuming a standard system of coordinates), so their names remain appropriate from a visual standpoint. However, the left and right positions are switched. This case is illustrated by the red rectangle in the figure below.

Envelope example with and without anti-meridian spanning.

The notions of inclusion and intersection are, however, interpreted slightly differently in these two cases. In the usual case where we do not cross the antimeridian, the green rectangle covers a region of inclusion. The regions excluded from this rectangle continue on to infinity in all directions. In other words, the region of inclusion is not repeated every 360°. But in the case of the red rectangle, the information provided by the envelope actually covers a region of exclusion between the two edges of the rectangle. The region of inclusion extends to infinity to the left and right. We could stipulate that all longitudes below -180° or above 180° are considered excluded, but this would be an arbitrary decision that would not be an exact counterpart to the usual case (green rectangle). A developer may wish to use these values, for example, in a mosaic where the map of the world is repeated several times horizontally and each repetition is considered distinct. If developers wish to perform operations as though the regions of inclusion or exclusion were repeated every 360°, they themselves will have to bring the longitudinal values between -180° and 180° in advance. All the add(…), contains(…), intersect(…), etc. functions of all the envelopes defined in the org.apache.sis.geometry package perform their calculations according to this convention.
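
The convention described above can be sketched for the longitude dimension alone. This is a minimal illustration written for this guide, not the code of the org.apache.sis.geometry classes; the LongitudeRangeSketch name and its contains(…) signature are hypothetical.

```java
final class LongitudeRangeSketch {
    /**
     * Tests whether a longitude is inside a range where {@code lower} may be greater
     * than {@code upper} when the range crosses the antimeridian. No wraparound is
     * applied to the tested value: 370° is not considered the same as 10°.
     */
    static boolean contains(double lower, double upper, double longitude) {
        if (lower <= upper) {
            // Usual case: the two bounds delimit a region of inclusion.
            return longitude >= lower && longitude <= upper;
        }
        // Antimeridian crossing: the values between upper and lower form a region of
        // exclusion, and the region of inclusion extends to infinity on both sides.
        return longitude >= lower || longitude <= upper;
    }

    public static void main(String[] args) {
        System.out.println(contains(175, -175,  178));   // true
        System.out.println(contains(175, -175, -178));   // true
        System.out.println(contains(175, -175,    0));   // false: inside the region of exclusion
        System.out.println(contains(-10,   10,  370));   // false: inclusion is not repeated every 360°
    }
}
```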

In order for functions such as add(…) to work correctly, all objects involved must use the same coordinate reference system, including the same range of values. Thus an envelope that expresses longitudes in the range [-180 … +180]° is not compatible with an envelope that expresses longitudes in the range [0 … 360]°. The conversions, if necessary, are up to the user (the Envelopes class provides convenience methods to do this). Moreover, the envelope's coordinates must be included within the system of coordinates, unless the developer explicitly decides to consider (for example) 300° longitude as a position distinct from -60°. The GeneralEnvelope class provides a normalize() method to bring coordinates within the desired limits, sometimes at the cost of lower values being higher than upper values.
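
The effect of bringing longitudes back within the [-180 … 180]° limits can be sketched with plain arithmetic. The code below only illustrates the idea on individual values; it is not the algorithm of GeneralEnvelope.normalize(), and the LongitudeNormalization class is hypothetical.

```java
final class LongitudeNormalization {
    /** Brings a longitude in degrees into the [-180 … 180) range by adding or removing whole turns. */
    static double normalize(double longitude) {
        return longitude - 360 * Math.floor((longitude + 180) / 360);
    }

    public static void main(String[] args) {
        System.out.println(normalize(300));    // -60.0
        // An envelope from 170° to 190° becomes an envelope from 170° to -170°:
        // after normalization, the lower value is greater than the upper value.
        System.out.println(normalize(170));    // 170.0
        System.out.println(normalize(190));    // -170.0
    }
}
```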

4. Data coverages

Images, or rasters, are a particular case of a data structure called a coverage. We could think of this as a “coverage of data.” The title of the ISO 19123 standard that describes them, “Coverage Geometry and Functions”, nicely summarizes the two essential elements of coverages:

The characteristics of the spatial domain are defined by ISO 19123 standard, while the characteristics of range are not included in the standard. The standard simply mentions that ranges may be finite or infinite, and are not necessarily numerical. For example, the values returned by a coverage may come from an enumeration (“this is a forest,” “this is a lake,” etc.). However, the standard defines two important types of coverage which have an impact on the types of authorized ranges: discrete coverages and continuous coverages. Stated simply, continuous coverages are functions that can use interpolation methods. Thus, since interpolations are only possible with numeric values, the ranges of non-numeric values may only be used with coverages of the CV_DiscreteCoverage type.

5. Utility classes and methods

This chapter describes aspects of Apache SIS that apply to the entire library. Most of these utilities are not specific to spatial information systems.

5.1. Comparison modes of objects

There are various opinions on how to implement the Object.equals(Object) method of standard Java. According to some, it should be possible to compare different implementations of the same interface or base class. But to follow this policy, each interface or base class’s javadoc must define the algorithms that all implementations shall use for their equals(Object) and hashCode() methods. This approach is common in java.util.Collection and its child interfaces. Transferring this approach to certain GeoAPI interfaces, however, would be a difficult task, and would probably not be followed in many implementations. Moreover, it comes at the expense of being able to take into account supplementary attributes in the child interfaces, unless this possibility has been specified in the parent interface. This constraint arises from the following points of the equals(Object) and hashCode() method contracts:

  • A.equals(B) implies B.equals(A) (symmetry);

  • A.equals(B) and B.equals(C) implies A.equals(C) (transitivity);

  • A.equals(B) implies A.hashCode() == B.hashCode().

For example, these three constraints are violated if A (and possibly C) can contain attributes which B ignores. To bypass this problem, an alternative approach is to require that the objects compared by the Object.equals(Object) method be of the same class; in other words, A.getClass() == B.getClass(). This approach is sometimes regarded as contrary to the principles of object oriented programming. In practice, for relatively complex applications, the importance accorded to these principles depends on the context in which the objects are compared: if the objects are added to a HashSet or used as keys in a HashMap, we would need a stricter adherence to the equals(Object) and hashCode() contract. But if developers are comparing the objects themselves, for example to check whether the relevant information has changed, then the constraints of symmetry, transitivity or coherence with the hash values may be of little interest. More permissive comparisons may be desirable, sometimes going so far as to tolerate minor discrepancies in numerical values.

In order to allow developers a certain amount of flexibility, many classes in the SIS library implement the org.apache.sis.util.LenientComparable interface, which defines an equals(Object, ComparisonMode) method. The principal modes of comparison are:

The default mode, used in all equals(Object) methods in SIS, is STRICT. This mode is chosen for a safe operation — particularly with HashMap — without the need to rigorously define equals(Object) and hashCode() operations in every interface. With this mode, the order of objects (A.equals(B) or B.equals(A)) is unimportant. It is, however, the only mode that offers this guarantee. In the expression A.equals(B), the BY_CONTRACT mode (and so by extension all other modes that depend on it) only compares the properties known to A, regardless of whether B knows more.
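
The difference between a strict and a more permissive comparison can be sketched without the SIS library. All types below are hypothetical: the Mode enumeration only imitates the idea of comparison modes (the STRICT and BY_CONTRACT names come from the text above, while APPROXIMATE and its tolerance are assumptions made for this sketch), and Measure is an arbitrary interface invented for the example.

```java
final class ComparisonSketch {
    /** Hypothetical comparison modes, for illustration only. */
    enum Mode { STRICT, BY_CONTRACT, APPROXIMATE }

    /** The properties that every implementation knows about. */
    interface Measure {
        String name();
        double value();
    }

    static class SimpleMeasure implements Measure {
        private final String name;
        private final double value;

        SimpleMeasure(String name, double value) {
            this.name  = name;
            this.value = value;
        }

        @Override public String name()  { return name; }
        @Override public double value() { return value; }

        /** Compares this object with the given one, with a strictness depending on the mode. */
        public boolean equals(Object other, Mode mode) {
            if (mode == Mode.STRICT) {
                // Same class required: symmetric, and safe for use in HashSet or HashMap.
                if (other == null || other.getClass() != getClass()) return false;
            } else if (!(other instanceof Measure)) {
                return false;
            }
            // Only the properties known to this class are compared below.
            Measure that = (Measure) other;
            boolean sameValue = (mode == Mode.APPROXIMATE)
                    ? Math.abs(value - that.value()) <= 1E-9      // Arbitrary tolerance.
                    : Double.doubleToLongBits(value) == Double.doubleToLongBits(that.value());
            return name.equals(that.name()) && sameValue;
        }

        @Override public boolean equals(Object other) { return equals(other, Mode.STRICT); }
        @Override public int hashCode() { return name.hashCode() + 31 * Double.hashCode(value); }
    }

    /** Sub-class with a supplementary property unknown to SimpleMeasure. */
    static final class MeasureWithUnit extends SimpleMeasure {
        private final String unit;

        MeasureWithUnit(String name, double value, String unit) {
            super(name, value);
            this.unit = unit;
        }

        @Override public boolean equals(Object other, Mode mode) {
            return super.equals(other, mode)
                    && (other instanceof MeasureWithUnit)
                    && unit.equals(((MeasureWithUnit) other).unit);
        }
    }
}
```

With A an instance of SimpleMeasure and B an instance of MeasureWithUnit having the same name and value, A.equals(B, Mode.BY_CONTRACT) is true in this sketch while B.equals(A, Mode.BY_CONTRACT) is false, because only B knows about the unit. In strict mode both comparisons return false, so the order of the objects does not matter.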

5.2. Object converters

There is sometimes a need to convert instances from a source Java type to a target Java type while those types are unknown at compile time. Various projects (Apache Commons Convert, Spring, etc.) have created their own interface for performing object conversions between types known only at runtime. Details vary, but such interfaces typically look like the following:

interface ObjectConverter<S,T> {   // Some projects use only "Converter" as interface name.
    T apply(S object);             // Another method name commonly found in other projects is "convert".
}

Like other projects, Apache SIS also defines its own ObjectConverter interface. The main difference between SIS converter interface and the interfaces found in other projects is that SIS converters provide some information about their mathematical properties. An Apache SIS converter can have zero, one or many of the following properties:

Injective
A function is injective if no pair of S values can produce the same T value.

Example: the Integer → String conversion performed by Integer.toString() is an injective function because if two Integer values are not equal, then it is guaranteed that their conversions will result in different String values. However the String → Integer conversion performed by Integer.valueOf(String) is not an injective function because many distinct String values can be converted to the same Integer value. For example converting the "42", "+42" and "0042" character strings all result in the same 42 integer value.

Surjective
A function is surjective if each value of T can be created from at least one value of S.

Example: the String → Integer conversion performed by Integer.valueOf(String) is a surjective function because every Integer value can be created from at least one String value. However the Integer → String conversion performed by Integer.toString() is not a surjective function because it cannot produce all possible String values. For example there is no way to produce the "ABC" value with the Integer.toString() method.

Bijective
A function is bijective if there is a one-to-one relationship between S and T values.

Note: the bijective property is defined here for clarity, but actually does not have an explicit item in Apache SIS FunctionProperty enumeration. It is not necessary since a function that is both injective and surjective is necessarily bijective.

Order preserving
A function is order preserving if any sequence of increasing S values is mapped to a sequence of increasing T values.

Example: conversions from Integer to Long preserve the natural ordering of elements. However conversions from Integer to String do not preserve natural ordering, because some sequences of increasing integer values are ordered differently when their string representations are sorted by lexicographic order. For example 1, 2, 10 become "1", "10", "2".

Order reversing
A function is order reversing if any sequence of increasing S values is mapped to a sequence of decreasing T values.

Example: a conversion that reverses the sign of numbers.

The above information may seem unnecessary when values are converted without taking into account the context in which the values appear. But when the value to convert is part of a bigger object, then the above information can affect the way the converted value will be used. For example conversion of a [min … max] range is straightforward when the converter is order preserving. But if the converter is order reversing, then the minimum and maximum values need to be interchanged. For example if the converter reverses the sign of values, then the converted range is [-max … -min]. If the converter is neither order preserving nor order reversing, then range conversion is not allowed at all (because it does not contain the same set of values) even if the minimum and maximum values could be converted individually.
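
The range conversion rules above can be sketched as follows. The Property enumeration and the convertRange(…) method are hypothetical names used for this illustration only; they are not the Apache SIS API.

```java
import java.util.EnumSet;
import java.util.Set;
import java.util.function.DoubleUnaryOperator;

final class RangeConversionSketch {
    /** Hypothetical subset of function properties, for illustration only. */
    enum Property { ORDER_PRESERVING, ORDER_REVERSING }

    /**
     * Converts the [min … max] range with the given converter.
     * Returns the converted {minimum, maximum} pair, or throws an exception
     * if the converter declares no ordering property.
     */
    static double[] convertRange(double min, double max,
                                 DoubleUnaryOperator converter, Set<Property> properties)
    {
        double a = converter.applyAsDouble(min);
        double b = converter.applyAsDouble(max);
        if (properties.contains(Property.ORDER_PRESERVING)) {
            return new double[] {a, b};
        }
        if (properties.contains(Property.ORDER_REVERSING)) {
            return new double[] {b, a};      // Minimum and maximum are interchanged.
        }
        throw new IllegalArgumentException("Converter is neither order preserving nor order reversing.");
    }

    public static void main(String[] args) {
        double[] range = convertRange(2, 5, x -> -x, EnumSet.of(Property.ORDER_REVERSING));
        System.out.println("[" + range[0] + " … " + range[1] + "]");     // [-5.0 … -2.0]
    }
}
```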

5.3. Internationalization

In an architecture where a program executed on a server provides its data to multiple clients, the server’s locale conventions are not necessarily the same as those of the clients. Conventions may differ in language, but also in the way they write numeric values (even between two countries that speak the same language) as well as in time zone. To produce messages that conform to the client’s conventions, SIS uses two approaches, distinguished by their level of granularity: at the level of the messages themselves, or at the level of the objects that create the messages. The approach used also determines whether it is possible to share the same instance of an object for all languages.

5.3.1. Distinct character sequences for each locale

Some classes are only designed to function according to one locale convention at a time. This is of course true for the standard implementations of java.text.Format, as they are entirely dedicated to the work of internationalization. But it is also the case for other less obvious classes like javax.imageio.ImageReader/ImageWriter and for Exception subclasses. When one of these classes is implemented by SIS, we identify it by implementing the org.apache.sis.util.Localized interface. The getLocale() method of this interface can determine the locale conventions by which the instance produces its message.

Some sub-classes of Exception defined by SIS also implement the Localized interface. For these exceptions, the error message may be produced according to two locale conventions, for either the administrator or the client respectively: getMessage() returns the exception message according to the system default conventions, while getLocalizedMessage() returns the exception message according to the locale conventions specified by getLocale(). This Locale will be determined by the Localized object that threw the exception.

Example: Consider an environment in which the default language is English and in which an AngleFormat object is created to read angles according to French conventions. If a ParseException is thrown when using this formatter, getMessage() returns the error message in English, while getLocalizedMessage() returns the error message in French.

Not all exceptions defined by SIS implement the Localized interface. Only those most likely to be shown to the user are localized in this way. ParseException instances are good candidates because they often occur due to an incorrect entry by the client. By contrast, NullPointerException instances are generally caused by a programming error; they may be localized in the system default language, but that is usually all.
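
The distinction between the two messages can be sketched with a self-contained exception class. The LocalizedParseException class and its messages are hypothetical, and the "system default" language is hard-coded to English as an assumption of this sketch; the SIS exceptions obtain their texts from resource bundles instead.

```java
import java.util.Locale;
import java.util.Map;

final class LocalizedExceptionSketch {
    /** Hypothetical exception that remembers the locale of the object that threw it. */
    static final class LocalizedParseException extends Exception {
        private static final Map<String, String> MESSAGES = Map.of(
                "en", "Can not parse the angle.",
                "fr", "Impossible de lire l'angle.");

        private final Locale locale;

        LocalizedParseException(Locale locale) {
            this.locale = locale;
        }

        /** The locale used by getLocalizedMessage(). */
        public Locale getLocale() {
            return locale;
        }

        /** Message for the administrator, in the default language (assumed English here). */
        @Override public String getMessage() {
            return MESSAGES.get("en");
        }

        /** Message for the client, in the locale of the object that threw the exception. */
        @Override public String getLocalizedMessage() {
            return MESSAGES.getOrDefault(locale.getLanguage(), getMessage());
        }
    }

    public static void main(String[] args) {
        Exception e = new LocalizedParseException(Locale.FRENCH);
        System.out.println(e.getMessage());            // English text
        System.out.println(e.getLocalizedMessage());   // French text
    }
}
```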

The utility class org.apache.sis.util.Exceptions provides convenience methods to get messages according to the conventions of a given locale, when this information is available.

5.3.2. Single instance for all supported locales

The API conventions defined by SIS or inherited from GeoAPI favour the use of the InternationalString type when the value of a String type would likely be localized. This approach allows us to defer the internationalization process to the time when a character sequence is requested, rather than the time when the object that contains them is created. This is particularly useful for immutable classes used for creating unique instances independently of locale conventions.

Example: SIS includes only one instance of the OperationMethod type representing the Mercator projection, regardless of the client’s language. But its getName() method (indirectly) provides an instance of InternationalString, so that toString(Locale.ENGLISH) returns Mercator projection while toString(Locale.FRENCH) returns Projection de Mercator.
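
The principle of deferring localization to the time when the text is requested can be sketched as follows. The InternationalText class below is a hypothetical, much simplified stand-in for InternationalString, written only for this illustration.

```java
import java.util.Locale;
import java.util.Map;

final class InternationalTextSketch {
    /** Hypothetical text whose language is chosen when the string is requested, not when the object is created. */
    static final class InternationalText {
        private final String fallback;
        private final Map<String, String> textByLanguage;

        InternationalText(String fallback, Map<String, String> textByLanguage) {
            this.fallback = fallback;
            this.textByLanguage = Map.copyOf(textByLanguage);
        }

        /** Returns the text in the language of the given locale, or the fallback text if there is none. */
        String toString(Locale locale) {
            return textByLanguage.getOrDefault(locale.getLanguage(), fallback);
        }

        @Override public String toString() {
            return fallback;
        }
    }

    /** A single immutable instance shared by all clients, whatever their language. */
    static final InternationalText MERCATOR = new InternationalText("Mercator projection",
            Map.of("en", "Mercator projection",
                   "fr", "Projection de Mercator"));

    public static void main(String[] args) {
        System.out.println(MERCATOR.toString(Locale.ENGLISH));   // Mercator projection
        System.out.println(MERCATOR.toString(Locale.FRENCH));    // Projection de Mercator
    }
}
```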

When defining spatial objects independently of locale conventions, we reduce the risk of computational overload. For example, it is easier to detect that two maps use the same cartographic projection if the latter is represented by the same instance of CoordinateOperation, even if the projection has a different name depending on the country. Moreover, certain types of CoordinateOperation may require coordinate transformation matrices, so sharing a single instance becomes even more preferable in order to reduce memory consumption.

5.3.3. Locale.ROOT convention

All SIS methods receiving or returning the value of a Locale type accept the Locale.ROOT value. This value is interpreted as specifying not to localize the text. The notion of a non-localized text is somewhat misleading, as it is always necessary to choose a formatting convention. This convention however, though very close to English, is usually slightly different. For example:

5.3.4. Treatment of characters

In Java, sequences of characters use UTF-16 encoding. There is a direct correspondence between the values of the char type and the great majority of characters, which facilitates the use of sequences so long as these characters are sufficient. However, certain Unicode characters cannot be represented by a single char. These supplementary characters include certain ideograms, but also road and geographical symbols in the 1F680 to 1F700 range. Support for these supplementary characters requires slightly more complex interactions than the classic case, where we may assume a direct correspondence. Thus, instead of the loop on the left below, international applications must generally use the loop on the right:

SIS supports supplementary characters by using the loop on the right where necessary, but the loop on the left is occasionally used when it is known that the characters searched for are not supplementary characters, even if some may be present in the sequence in which we are searching.
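
The two loop styles discussed above can be sketched as follows. The first method iterates over char values and the second one over Unicode code points; the CodePointLoops class is hypothetical and the loops only count their iterations.

```java
final class CodePointLoops {
    /** Loop over char values: a supplementary character is seen as two separate chars. */
    static int countWithCharLoop(String text) {
        int count = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            count++;                             // A real application would process c here.
        }
        return count;
    }

    /** Loop over code points: a supplementary character is seen as a single value. */
    static int countWithCodePointLoop(String text) {
        int count = 0;
        for (int i = 0; i < text.length();) {
            int c = text.codePointAt(i);
            i += Character.charCount(c);         // Advances by 1 or 2 char values.
            count++;                             // A real application would process c here.
        }
        return count;
    }

    public static void main(String[] args) {
        // "A", then the U+1F680 supplementary character, then "B".
        String text = new StringBuilder("A").appendCodePoint(0x1F680).append('B').toString();
        System.out.println(countWithCharLoop(text));        // 4
        System.out.println(countWithCodePointLoop(text));   // 3
    }
}
```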

5.3.4.1. Blank spaces interpretation

Standard Java provides two methods for determining whether a character is a blank space: Character​.isWhitespace(…) and Character​.isSpaceChar(…). These two methods differ in their interpretations of non-breaking spaces, tabs and line breaks. The first method conforms to the interpretation currently used in languages such as Java, C/C++ and XML, which considers tabs and line breaks to be blank spaces, while non-breaking spaces are read as not blank. The second method — which conforms strictly to the Unicode definition — makes the opposite interpretation.

SIS uses each of these methods in different contexts. isWhitespace(…) is used to separate the elements of a list (numbers, dates, words, etc.), while isSpaceChar(…) is used to ignore blank spaces inside a single element.

Example: Take a list of numbers represented according to French conventions. Each number may contain non-breaking spaces as thousands separators, while the different numbers in the list may be separated by ordinary spaces, tabs or line breaks. When analyzing a number, we want to consider the non-breaking spaces as being part of the number, whereas a tab or a line break most likely indicates a separation between this number and the next. We would thus use isSpaceChar(…). Conversely, when separating the numbers in the list, we want to consider tabs and line breaks as separators, but not non-breaking spaces. We would thus use isWhitespace(…). The role of ordinary spaces, to which either case might apply, should be decided beforehand.

In practice, this distinction is reflected in the use of isSpaceChar(…) in the implementations of java.text.Format, and the use of isWhitespace(…) in nearly all the rest of the SIS library.
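
The two roles can be sketched with the list of numbers of the previous example. The BlankSpaceSketch class and its methods are hypothetical; only Character.isWhitespace(…) and Character.isSpaceChar(…) are standard Java methods. In this sketch ordinary spaces act as separators between elements, which is one of the two possible decisions mentioned above.

```java
import java.util.ArrayList;
import java.util.List;

final class BlankSpaceSketch {
    static final char NO_BREAK_SPACE = 0x00A0;

    /** Splits a list around the characters accepted by isWhitespace: no-break spaces stay inside the elements. */
    static List<String> splitElements(String list) {
        List<String> elements = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < list.length();) {
            int c = list.codePointAt(i);
            i += Character.charCount(c);
            if (!Character.isWhitespace(c)) {
                current.appendCodePoint(c);
            } else if (current.length() != 0) {
                elements.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() != 0) {
            elements.add(current.toString());
        }
        return elements;
    }

    /** Removes the characters accepted by isSpaceChar inside a single element, such as thousands separators. */
    static String removeSpaces(String element) {
        StringBuilder buffer = new StringBuilder();
        element.codePoints()
               .filter(c -> !Character.isSpaceChar(c))
               .forEach(buffer::appendCodePoint);
        return buffer.toString();
    }

    public static void main(String[] args) {
        String list = "1" + NO_BREAK_SPACE + "234\t5" + NO_BREAK_SPACE + "678\n9";
        for (String element : splitElements(list)) {
            System.out.println(removeSpaces(element));       // 1234, then 5678, then 9
        }
    }
}
```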

6. Representing objects in XML

Objects defined by OGC/ISO standards must be able to communicate with remote machines via the Internet, using different software written in different languages. Some of the better known formats include WKT (Well-Known Text) and WKB (Well-Known Binary). But the most exhaustive and most often referenced format is XML, to the point where the representation of ISO objects in this format is itself sometimes the entire focus of an international standard. Thus, metadata classes are described in the ISO 19115-1 standard (an abstract specification), while the representation of these classes in XML is described in the ISO 19115-3 and 19139 standards.

Different OGC/ISO standards do not always use the same strategy to express objects in XML. ISO 19115-3 standard in particular uses a more verbose approach than other standards, and will be the subject of its own section. But most XML formats define supplementary types and attributes that are not part of the original abstract specifications. These supplementary attributes are usually specific to XML and may not appear in the API of Apache SIS. However, some of these attributes, such as id, uuid and xlink:href, remain accessible in the form of key-value pairs.

XML documents may use any prefixes, but the following prefixes are commonly used. They therefore appear by default in documents produced by Apache SIS. These prefixes are defined in the org.apache.sis.xml.Namespaces class.

Common XML namespace prefixes
Prefix   Namespace
gco      http://www.isotc211.org/2005/gco
gfc      http://www.isotc211.org/2005/gfc
gmd      http://www.isotc211.org/2005/gmd
gmi      http://www.isotc211.org/2005/gmi
gmx      http://www.isotc211.org/2005/gmx
gml      http://www.opengis.net/gml/3.2
xlink    http://www.w3.org/1999/xlink

6.1. Representing metadata according to ISO 19115-3

For each metadata class, there is an XML type with the same name as in the abstract specification (for example, gmd:MD_Metadata and gmd:CI_Citation). All of these types may be used as the root of an XML document. It is therefore possible to write a document representing a complete MD_Metadata object, or to write a document representing only a CI_Citation object.

ISO 19115-3 standard arranges the content of these objects in an unusual way: for each property whose value type is itself another class of ISO 19115, the value is wrapped in an element that represents its type, rather than being written directly. For example, in an object of the CI_Citation type, the value of the citedResponsibleParty property is incorporated into a CI_Responsibility element. This practice doubles the depth of the hierarchy, and introduces duplication at all levels for each value, as in the following example:

<MD_Metadata>
  <identificationInfo>
    <MD_DataIdentification>
      <citation>
        <CI_Citation>
          <citedResponsibleParty>
            <CI_Responsibility>
              <party>
                <CI_Party>
                  <contactInfo>
                    <CI_Contact>
                      <onlineResource>
                        <CI_OnlineResource>
                          <linkage>
                            <URL>http://www.opengeospatial.org</URL>
                          </linkage>
                        </CI_OnlineResource>
                      </onlineResource>
                    </CI_Contact>
                  </contactInfo>
                </CI_Party>
              </party>
            </CI_Responsibility>
          </citedResponsibleParty>
        </CI_Citation>
      </citation>
    </MD_DataIdentification>
  </identificationInfo>
</MD_Metadata>

The preceding example, like all documents that conform to ISO 19115-3, consists of a systematic alternation of two types of XML elements:

  1. It begins with the name of the property, which always begins with a lower-case letter (ignoring prefixes). In Java APIs, each property corresponds to a method in its enclosing class. In the example above, gmd:identificationInfo (a property of MD_Metadata class) corresponds to the Metadata​.getIdentificationInfo() method.

  2. The value type is included under each property, unless it has been replaced with a reference (the following sub-section will elaborate on this subject). The value type is an XML element whose name always begins with an upper-case letter, ignoring prefixes. In the example above we had MD_DataIdentification, which corresponds to the DataIdentification Java interface. It is this XML element that contains the child properties.
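
The alternation described above can be checked mechanically with the XML parser of standard Java. The sketch below is not part of Apache SIS: the AlternationSketch class is hypothetical, it relies only on the lower-case / upper-case naming rule stated above, and it ignores prefixes and references.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

final class AlternationSketch {
    /** Returns true if the given element and all its descendants alternate between types and properties. */
    static boolean alternates(Element element, boolean expectType) {
        String name = element.getTagName();
        name = name.substring(name.indexOf(':') + 1);            // Ignore the prefix, if any.
        if (Character.isUpperCase(name.charAt(0)) != expectType) {
            return false;
        }
        for (Node child = element.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child instanceof Element && !alternates((Element) child, !expectType)) {
                return false;
            }
        }
        return true;
    }

    /** Parses the given document and verifies the alternation, starting with a type at the root. */
    static boolean alternates(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            return alternates(root, true);
        } catch (Exception e) {
            throw new IllegalArgumentException("Can not parse the XML document.", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(alternates(
                "<MD_Metadata><identificationInfo><MD_DataIdentification><citation><CI_Citation/>" +
                "</citation></MD_DataIdentification></identificationInfo></MD_Metadata>"));    // true
    }
}
```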

In order to reduce the complexity of the libraries, GeoAPI and Apache SIS only expose publicly a single unified view of these two types of elements. The public API basically corresponds to the second group. However, when writing an XML document, elements of the first group must be temporarily recreated. The corresponding classes are defined in internal SIS packages. These classes may be ignored, unless the developer wishes to implement his or her own classes whose instances must be written by JAXB.

6.1.1. Identification of already-defined instances

The parent element may contain an id or uuid attribute. If one of these attributes is present, the child element may be completely omitted; it will be replaced at the time of reading by the element referenced by the attribute. In the following example, the part on the left defines an element associated with the identifier “my_id,” while the part on the right references this element:

The decision of which attribute to use depends on the scope of the referenced item:

In the SIS library, all objects that can be identified in an XML document implement the org.apache.sis.xml.IdentifiedObject interface. Each instance of this interface provides a view of its identifiers in the form of a Map<Citation,String>, in which the Citation key indicates the type of identifier and the value is the identifier itself. Some constants representing different types of identifiers are listed in IdentifierSpace, including ID, UUID and HREF. Each of these keys may be associated with a different type of value (usually String, UUID or URI) depending on the key. For example, the following code defines a value for the uuid attribute:

import org.apache.sis.metadata.iso.DefaultMetadata;
import org.apache.sis.xml.IdentifierSpace;
import java.util.UUID;

public class MyClass {
    public void myMethod() {
        UUID identifier = UUID.randomUUID();
        DefaultMetadata metadata = new DefaultMetadata();
        metadata.getIdentifierMap().putSpecialized(IdentifierSpace.UUID, identifier);
    }
}

Although this mechanism has been defined in order to better support the representation of XML attributes of the gco:ObjectIdentification group, it also conveniently allows other types of identifiers to be manipulated. For example, the ISBN and ISSN attributes of Citation may be manipulated in this way. The methods of the IdentifiedObject interface therefore provide a single location where all types of identifiers (not only XML) associated with an object may be manipulated.

6.1.2. Representing missing values

When a property is not defined, the corresponding GeoAPI method usually returns null. However, things become complicated when the missing property is a value considered mandatory by the ISO 19115 standard. ISO 19115-3 allows for the omission of mandatory properties so long as the reason for the missing value is indicated. The reason may be that the property is not applicable (inapplicable), that the value probably exists but is not known (unknown), that the value cannot exist (missing), that the value cannot be revealed (withheld), etc. The transmission of this information requires the use of a non-null object, even when the value is missing. SIS will then return an object that, besides implementing the desired GeoAPI interface, also implements the org.apache.sis.xml.NilObject interface. This interface flags the instances where all methods return an empty collection, an empty array, null, NaN, 0 or false, in this preference order, as permitted by the return types of the methods. Each instance that implements NilObject provides a getNilReason() method indicating why the object is nil.

In the following example, the left side shows a CI_Citation element containing a CI_Series element, while on the right side the series is unknown. If the CI_Series element had been completely omitted, then the Citation.getSeries() method would return null in Java. But when a nilReason is present, the SIS implementation of getSeries() returns instead an object that implements both the Series and NilObject interfaces, and in which the getNilReason() method returns the constant UNKNOWN.
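The rule that a nil object returns an empty collection, an empty array, null, NaN, 0 or false depending on the return type can be illustrated with a dynamic proxy. The sketch below uses only the JDK. The Nil and Series interfaces are hypothetical, much reduced stand-ins for NilObject and for a GeoAPI metadata interface, and the nil reason is simplified to a String. It is not presented as the way Apache SIS creates its nil objects.

```java
import java.lang.reflect.Array;
import java.lang.reflect.Proxy;
import java.util.Collection;
import java.util.Collections;

final class NilObjectSketch {
    /** Hypothetical stand-in for org.apache.sis.xml.NilObject (reason simplified to a String). */
    public interface Nil {
        String getNilReason();
    }

    /** Hypothetical stand-in for a metadata interface such as Series. */
    public interface Series {
        String getName();
        Collection<String> getIssues();
        double getVolume();
    }

    /** Creates an object implementing both {@code type} and {@code Nil}, with "empty" property values. */
    static <T> T createNil(Class<T> type, String reason) {
        Object proxy = Proxy.newProxyInstance(
                NilObjectSketch.class.getClassLoader(),
                new Class<?>[] {type, Nil.class},
                (instance, method, args) -> {
                    switch (method.getName()) {
                        case "getNilReason": return reason;
                        case "toString":     return type.getSimpleName() + "[nil: " + reason + ']';
                        case "hashCode":     return System.identityHashCode(instance);
                        case "equals":       return instance == args[0];
                    }
                    // Choose the "empty" value permitted by the return type.
                    Class<?> r = method.getReturnType();
                    if (Collection.class.equals(r)) return Collections.emptyList();
                    if (r.isArray())                return Array.newInstance(r.getComponentType(), 0);
                    if (r == double.class)          return Double.NaN;
                    if (r == float.class)           return Float.NaN;
                    if (r == long.class)            return 0L;
                    if (r == int.class)             return 0;
                    if (r == boolean.class)         return false;
                    return null;    // Objects; other primitive types are omitted in this sketch.
                });
        return type.cast(proxy);
    }
}
```

With this sketch, createNil(Series.class, "unknown") gives a Series whose getName() returns null and whose getVolume() returns NaN, while an instanceof Nil check lets the caller retrieve the reason.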

7. Annexes

7.1. Reduce direct dependency to Apache SIS

Previous chapters used Apache SIS static methods for convenience. In some cases, usage of those convenience methods can be replaced by Java code using only GeoAPI methods. Such replacements may be desirable for applications that want to reduce their direct dependency on Apache SIS, for example in order to ease migrations between SIS and other GeoAPI implementations. However this may require that applications write their own convenience methods. The following sections provide some tips for easing this task.

7.1.1. Mapping given by @UML annotations

For each class, method and constant defined by an OGC or ISO standard, GeoAPI indicates its provenance using annotations defined in the org.opengis.annotation package. This mapping is described in the chapter about GeoAPI. Java reflection methods allow access to this information during the execution of an application. Class org.apache.sis.util.iso.Types provides static convenience methods like getStandardName(Class), but one can avoid those methods. The following example displays the standard name for the method getTitle() from the Citation interface:

Class<?> type   = Citation.class;
Method   method = type.getMethod("getTitle", (Class<?>[]) null);
UML      annot  = method.getAnnotation(UML.class);
String   id     = annot.identifier();
System.out.println("The standard name for the " + method.getName() + " method is " + id);

The reverse operation (getting the Java class and method from a standard name) is a bit more complicated. It requires reading the class-index.properties file provided in the org.opengis.annotation package. The following example reads the file just before searching for the name of the interface corresponding to CI_Citation. Users are encouraged to read this file only once and then keep its contents in their application's cache.

Properties isoToGeoAPI = new Properties();
try (InputStream in = UML.class.getResourceAsStream("class-index.properties")) {
    isoToGeoAPI.load(in);
}
String isoName = "CI_Citation";
String geoName = isoToGeoAPI.getProperty(isoName);
Class<?>  type = Class.forName(geoName);
System.out.println("The GeoAPI interface for ISO type " + isoName + " is " + type);

The org.apache.sis.util.iso.Types convenience method for the above task is forStandardName(String).
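The advice to read the index only once can be followed with a small holder object that is created at application start-up and then queried as often as needed. The following is a minimal sketch using only the JDK. The StandardNameIndex class and its method names are hypothetical, and the stream is passed by the caller so that the sketch does not depend on GeoAPI being on the classpath.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Optional;
import java.util.Properties;

/** Hypothetical cache of the "ISO name to Java type name" index, loaded once. */
final class StandardNameIndex {
    private final Properties isoToJava = new Properties();

    private StandardNameIndex() {
    }

    /** Reads the whole index from the given stream. The caller remains responsible for closing it. */
    static StandardNameIndex load(InputStream in) {
        StandardNameIndex index = new StandardNameIndex();
        try {
            index.isoToJava.load(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return index;
    }

    /** Returns the Java type name for the given ISO name, or an empty value if the name is not listed. */
    Optional<String> javaTypeName(String isoName) {
        return Optional.ofNullable(isoToJava.getProperty(isoName));
    }
}
```

In an application, the stream would be the one obtained from UML.class.getResourceAsStream("class-index.properties") as in the listing above, and the returned name would be given to Class.forName(String).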

7.1.2. Fetching implementations of GeoAPI interfaces

GeoAPI defines factories (Factory) that can create implementations of interfaces. For example, DatumFactory provides methods that can create instances which implement interfaces of the org.opengis.referencing.datum package. A Factory must be implemented by a geospatial library, and declared as a service as defined by the java.util.ServiceLoader class. The ServiceLoader javadoc explains this procedure. In brief, the library must create a file in the META-INF/services/ directory, with a name corresponding to the fully qualified name of the factory interface (in the preceding example, org.opengis.referencing.datum.DatumFactory). On one line, this text file must include the fully qualified name of the class that implements this interface. This class may be hidden from users, as they do not need to know of its existence.

If the library has correctly declared its factories as services, users may obtain them by using ServiceLoader, as in the example below. This example only takes the first factory located; if there is more than one factory (for example when multiple libraries coexist), then the choice is left to the user.

import org.opengis.referencing.datum.GeodeticDatum;
import org.opengis.referencing.datum.DatumFactory;
import java.util.ServiceLoader;

public class MyApplication {
    public void createMyDatum() {
        ServiceLoader<DatumFactory> loader = ServiceLoader.load(DatumFactory.class);
        DatumFactory  factory = loader.iterator().next();    // First factory found on the classpath.
        GeodeticDatum myDatum = factory.createGeodeticDatum(…);
    }
}
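When several libraries may coexist, or when none may be present, a slightly more defensive lookup can iterate over all declared providers and keep the first one accepted by a caller-supplied criterion. The sketch below uses only java.util.ServiceLoader. The FactoryFinder class and the DemoFactory interface are hypothetical names introduced for illustration; in a real application the service type would be a GeoAPI factory interface such as DatumFactory.

```java
import java.util.Optional;
import java.util.ServiceLoader;
import java.util.function.Predicate;

final class FactoryFinder {
    /** Hypothetical service interface, standing in for a GeoAPI factory type. */
    public interface DemoFactory {
    }

    private FactoryFinder() {
    }

    /**
     * Returns the first provider of the given service accepted by the filter,
     * or an empty value if no provider is declared or none is accepted.
     */
    static <T> Optional<T> find(Class<T> service, Predicate<? super T> filter) {
        for (T provider : ServiceLoader.load(service)) {
            if (filter.test(provider)) {
                return Optional.of(provider);
            }
        }
        return Optional.empty();
    }
}
```

Compared with calling iterator().next() directly, this form does not throw NoSuchElementException when no implementation is declared, which leaves the caller free to report the missing library in its own way.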

7.1.2.1. Defining custom implementations

Implementing GeoAPI oneself in order to meet very specific needs is not difficult. A developer might concentrate on a handful of interfaces among the hundreds available, while keeping other interfaces as extension points to eventually implement as needed.

The conceptual model that the interfaces represent is complex. But this complexity may be reduced by combining certain interfaces. For example, many libraries, even well-known ones, do not distinguish between a Coordinate System (CS) and a Coordinate Reference System (CRS). A developer who also wishes not to make this distinction may implement these two interfaces with the same class. The resulting implementation may have a simpler class hierarchy than that of GeoAPI interfaces. The geoapi-examples module, discussed later, provides such combinations. The following table lists a few possible combinations:

Main Interface             Auxiliary Interface       Use
CoordinateReferenceSystem  CoordinateSystem          Description of a spatial reference system (CRS).
GeodeticDatum              Ellipsoid                 Description of the geodetic datum.
CoordinateOperation        MathTransform             Coordinate transformation operations.
IdentifiedObject           ReferenceIdentifier       An object (usually a CRS) that we can identify by a code.
Citation                   InternationalString       Bibliographic reference consisting of a simple title.
GeographicBoundingBox      Extent                    Spatial area in degrees of longitude and latitude.
ParameterValue             ParameterDescriptor       Description of a parameter (name, type) associated with its value.
ParameterValueGroup        ParameterDescriptorGroup  Description of a set of parameters associated with their values.
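As an illustration of the Citation and InternationalString row, the sketch below implements both roles with a single class, so that a citation consisting of a simple title returns itself as its title. The two interfaces declared here are hypothetical and heavily reduced stand-ins for the GeoAPI ones (which declare many more methods); only the combination technique is the point.

```java
import java.util.Locale;

final class CombinedInterfacesSketch {
    /** Hypothetical, reduced stand-in for org.opengis.util.InternationalString. */
    public interface InternationalString extends CharSequence {
        String toString(Locale locale);
    }

    /** Hypothetical, reduced stand-in for org.opengis.metadata.citation.Citation. */
    public interface Citation {
        InternationalString getTitle();
    }

    /** A citation made of a single, non-localized title: one class plays both roles. */
    public static final class SimpleCitation implements Citation, InternationalString {
        private final String title;

        public SimpleCitation(String title) {
            this.title = title;
        }

        @Override public InternationalString getTitle()  {return this;}     // The citation is its own title.
        @Override public String toString(Locale locale)  {return title;}    // Same text for all locales.
        @Override public String toString()               {return title;}
        @Override public int    length()                 {return title.length();}
        @Override public char   charAt(int index)        {return title.charAt(index);}
        @Override public CharSequence subSequence(int start, int end) {
            return title.subSequence(start, end);
        }
    }
}
```

The price of this simplification is that the title cannot be localized and that the citation cannot carry any other property; this is acceptable only for applications with very simple needs.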

The geoapi-examples module provides examples of simple implementations. Many of these classes implement more than one interface at a time in order to provide a simpler conceptual model. The javadoc for this module lists key packages and classes along with the combinations performed. This module illustrates not only how GeoAPI might be implemented, but also how the implementation might be tested using geoapi-conformance.

Although its primary goal is to serve as a source of inspiration for implementors, geoapi-examples was also designed so as to be usable by applications with very simple needs. As all the examples are in the public domain, developers are invited to freely adapt copies of these classes as necessary. However, if changes are made outside the framework of the GeoAPI project, fair use demands that modified copies be placed in a package with a different name than org.opengis.

For somewhat more involved needs, developers are invited to examine the geoapi-proj4 and geoapi-netcdf modules. These two modules provide examples of adaptors that allow some of the features of external libraries (Proj.4 and NetCDF) to be used through GeoAPI interfaces. The advantage of using these interfaces is to provide a unified model to operate two very different APIs, while retaining the ability to switch easily to another library if desired.

7.2. Test suites

In addition to its own tests, Apache SIS uses tests defined by GeoAPI. One advantage is that those tests provide an external source for the definition of expected results (for example the numerical values of coordinates obtained after a map projection). Such an external source reduces the risk that some tests are actually anti-regression tests instead of correctness tests. Those tests can also be used by projects other than Apache SIS.

The geoapi-conformance module provides validators, a JUnit test suite, and report generators in the form of HTML pages. This module may be used with any GeoAPI implementation. For developers of a geospatial library, it offers the following advantages:

7.2.1. Instance validations

GeoAPI can validate an instance of its interfaces by checking that certain constraints are observed. Many constraints can not be expressed in the method signature. Those constraints are usually described textually in the abstract specifications or in the javadoc.

Example: A coordinate conversion or transformation (CC_CoordinateOperation) may require a sequence of several steps. In such a sequence of operations (CC_ConcatenatedOperation), for each step (CC_SingleOperation) the number of output dimensions must equal the number of input dimensions of the next operation. Expressed in Java, this constraint stipulates that for every index i with 0 < i < n, where n is the number of operations, we have coordOperation[i].sourceDimensions == coordOperation[i-1].targetDimensions.
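The constraint in the above example can be written as a short loop. The following sketch uses only the JDK; the Step class is a hypothetical stand-in that models nothing but the number of source and target dimensions of a single operation, and it is not the check performed by the GeoAPI validators.

```java
import java.util.List;

final class DimensionChainCheck {
    /** Hypothetical stand-in for one step of a concatenated operation. */
    public static final class Step {
        final int sourceDimensions;
        final int targetDimensions;

        public Step(int sourceDimensions, int targetDimensions) {
            this.sourceDimensions = sourceDimensions;
            this.targetDimensions = targetDimensions;
        }
    }

    private DimensionChainCheck() {
    }

    /**
     * Returns true if, for every index 0 < i < n, the number of input dimensions
     * of step i equals the number of output dimensions of step i-1.
     */
    static boolean isConsistent(List<Step> steps) {
        for (int i = 1; i < steps.size(); i++) {
            if (steps.get(i).sourceDimensions != steps.get(i - 1).targetDimensions) {
                return false;
            }
        }
        return true;
    }
}
```

For instance a 3→2 step followed by a 2→2 step is consistent, while a 3→2 step followed by a 3→3 step is not.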

The easiest way to perform these verifications is to call the static methods validate(…) of the org.opengis.test.Validators class. As all these Validators methods bear the same name, it is enough to write “validate(value)” and then let the compiler choose the most appropriate method for the type of object given in argument. If the object type is not known at compile time, the dispatch(Object) method can be invoked for redirecting the work to the most appropriate validate(…) method.

All validate(…) functions follow a chain of dependencies, meaning that they will also validate each component of the object to be validated. For example, the validation of a GeographicCRS implies the validation of its component GeodeticDatum, which itself implies the validation of its component Ellipsoid, and so on. Thus it is unnecessary to validate the components explicitly, unless the developer wishes to isolate the test for a particular item known to cause problems.

By default, validations are as strict as possible. It is always possible to relax certain rules. The most common is to tolerate the absence of attributes that would normally be mandatory. This rule and a few others may be modified globally for all tests executed by the currently running JVM, as in the following example:

import org.opengis.metadata.Metadata;
import org.opengis.test.Validators;
import org.junit.Test;

public class MyTest {
    /*
     * Tolerate the absence of mandatory attributes in metadata and citation packages.
     * This modification applies to all tests executed by the currently running JVM.
     * If there are multiple test classes, this initialization may be performed
     * in a parent class to all test classes.
     */
    static {
        Validators.DEFAULT.metadata.requireMandatoryAttributes = false;
        Validators.DEFAULT.citation.requireMandatoryAttributes = false;
    }

    @Test
    public void testMyMetadata() {
        Metadata myObject = …; // Create an object here.
        Validators.validate(myObject);
    }
}

Rules may also be modified for a particular test suite without affecting the default configuration of the running JVM. This approach requires the creation of a new instance of the validator whose configuration we wish to modify.

import org.opengis.metadata.Metadata;
import org.opengis.test.ValidatorContainer;
import org.junit.Test;

public class MyTest {
    private final ValidatorContainer validators;

    public MyTest() {
        validators = new ValidatorContainer();
        validators.metadata.requireMandatoryAttributes = false;
        validators.citation.requireMandatoryAttributes = false;
    }

    @Test
    public void testMyMetadata() {
        Metadata myObject = …; // Create an object here.
        validators.validate(myObject);
    }
}

7.2.2. Executing pre-defined tests

JUnit tests are defined in the org.opengis.test sub-packages. All test classes bear a name ending in "Test". GeoAPI also provides an org.opengis.test.TestSuite class including all test classes defined in the geoapi-conformance module, but Apache SIS does not use it. Instead, Apache SIS inherits GeoAPI’s *Test classes on a case-by-case basis, in the appropriate modules. The example below shows a customized GeoAPI test. The javadoc of the parent test class documents in detail the tests performed. In this example, only one test is modified and all the others are inherited as they are (it is not necessary to repeat them in the sub-class). However, this example adds a supplemental verification, annotated with @After, which will be executed after each test.

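import org.junit.*;
import org.junit.runner.RunWith;
import org.junit.runners.JUnit4;
import org.opengis.util.FactoryException;
import org.opengis.referencing.operation.MathTransform2D;
import org.opengis.referencing.operation.TransformException;
import org.opengis.test.referencing.ParameterizedTransformTest;
import static org.junit.Assert.*;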

@RunWith(JUnit4.class)
public class MyTest extends ParameterizedTransformTest {
    /**
     * Specify our own coordinate transformation factory for the GeoAPI tests.
     * GeoAPI will test the objects created by this factory.
     */
    public MyTest() {
        super(new MyMathTransformFactory());
    }

    /**
     * Changes the behaviour of a test. This example relaxes the requirements of this test a little,
     * by accepting errors of up to 10 centimetres, rather than the default value of 1 cm.
     * This change only applies to this method, and does not affect the other inherited tests.
     */
    @Test
    @Override
    public void testLambertAzimuthalEqualArea() throws FactoryException, TransformException {
        tolerance = 0.1; // 10 cm tolerance.
        super.testLambertAzimuthalEqualArea();
    }

    /**
     * Supplemental verification performed after each test, inherited or not.
     * In this example, we are verifying that the transformation tested
     * works correctly in two-dimensional spaces.
     */
    @After
    public void ensureAllTransformAreMath2D() {
        assertTrue(transform instanceof MathTransform2D);
    }
}

7.3. Design notes

The following sections explain the rationale behind some implementation choices made in Apache SIS.

7.3.1. Affine transform

Among the many kinds of operations performed by GIS software on spatial coordinates, affine transforms are both relatively simple and very common. Affine transforms can represent any combination of scales, shears, flips, rotations and translations, which are linear operations. Affine transforms can not handle non-linear operations like map projections, but the affine transform capabilities nevertheless cover many other cases:

Affine transforms can be concatenated efficiently. No matter how many affine transforms are chained, the result can be represented by a single affine transform. This property is more easily seen when affine transforms are represented by matrices: in order to concatenate those operations, we only need to multiply those matrices. The “pixel to geographic coordinate conversions” case below gives an example.

Example: given an image with pixel coordinates represented by (x,y) tuples and given the following assumptions:

  • There is no shear, no rotation and no flip.
  • All pixels have the same width in degrees of longitude.
  • All pixels have the same height in degrees of latitude.
  • Pixel indices are positive integers starting at (0,0) inclusive.

Then conversions from pixel coordinates (x,y) to geographic coordinates (λ,φ) can be represented by the following equations, where Nx is the image width and Ny the image height in number of pixels:

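\[
\lambda = S_\lambda\,x + T_\lambda
\quad\text{where}\quad
S_\lambda = \frac{\lambda_{\max} - \lambda_{\min}}{N_x}
\quad\text{and}\quad
T_\lambda = \lambda_{\min}
\]
\[
\varphi = S_\varphi\,y + T_\varphi
\quad\text{where}\quad
S_\varphi = \frac{\varphi_{\max} - \varphi_{\min}}{N_y}
\quad\text{and}\quad
T_\varphi = \varphi_{\min}
\]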

The above equations can be represented in matrix form as below:

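\[
\begin{bmatrix} \lambda \\ \varphi \\ 1 \end{bmatrix}
=
\begin{bmatrix}
S_\lambda & 0 & T_\lambda \\
0 & S_\varphi & T_\varphi \\
0 & 0 & 1
\end{bmatrix}
\times
\begin{bmatrix} x \\ y \\ 1 \end{bmatrix}
\]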

In this particular case, the scale factors S are the pixel size in degrees and the translation terms T are the geographic coordinates of an image corner (not necessarily the lower-left corner if some axes have been flipped). This straightforward interpretation holds because of the above-cited assumptions, but matrix coefficients become more complex if the image has shear or rotation or if pixel coordinates do not start at (0,0). However it is not necessary to use more complex equations for supporting more generic cases. The following example starts with an “initial conversion” matrix where the S and T terms are set to the most straightforward values. Then the y axis direction is reversed for matching the most common convention in image coordinate systems (change 1), and axes are swapped resulting in latitude before longitude (change 2). Note that when affine transform concatenations are written as matrix multiplications, operations are ordered from right to left: A×B×C is equivalent to first applying operation C, then operation B and finally operation A.

A key principle is that there is no need to write Java code dedicated to the above kinds of axis changes. Those operations, and many others, can be handled by matrix algebra. This approach makes it easier to write generic code and improves performance. Apache SIS follows this principle by using affine transforms for every operation that can be performed by such a transform. For instance there is no code dedicated to changing the order of ordinate values in a coordinate.
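The chain "initial conversion, y axis reversal, axis swapping" can be sketched with the Java2D AffineTransform class, which is enough for the two-dimensional case. This is only an illustration under stated assumptions: the class and method names are hypothetical, the y reversal is assumed to map row y to row (Ny − 1) − y, and nothing here is meant to describe how Apache SIS builds its own matrices.

```java
import java.awt.geom.AffineTransform;

final class PixelToGeographicSketch {
    private PixelToGeographicSketch() {
    }

    /**
     * Builds a conversion from pixel coordinates (x,y), with y increasing downward,
     * to geographic coordinates in (latitude, longitude) order.
     */
    static AffineTransform pixelToGeographic(int nx, int ny,
            double lonMin, double lonMax, double latMin, double latMax)
    {
        double sLon = (lonMax - lonMin) / nx;       // Pixel width in degrees of longitude.
        double sLat = (latMax - latMin) / ny;       // Pixel height in degrees of latitude.

        // Constructor argument order is (m00, m10, m01, m11, m02, m12).
        // Initial conversion: λ = Sλ·x + Tλ and φ = Sφ·y + Tφ.
        AffineTransform initial = new AffineTransform(sLon, 0, 0, sLat, lonMin, latMin);

        // Change 1: reverse the y axis direction (assumed form: y' = (ny - 1) - y).
        AffineTransform flipY = new AffineTransform(1, 0, 0, -1, 0, ny - 1);

        // Change 2: swap the axes, giving latitude before longitude.
        AffineTransform swap = new AffineTransform(0, 1, 1, 0, 0, 0);

        AffineTransform result = new AffineTransform(initial);
        result.concatenate(flipY);          // result = initial × flipY  (flipY is applied first).
        result.preConcatenate(swap);        // result = swap × initial × flipY  (swap is applied last).
        return result;
    }
}
```

The three steps end up in a single AffineTransform, so transforming a point costs the same as with the initial conversion alone. The concatenate and preConcatenate calls mirror the right-to-left reading of matrix products described above.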

7.3.1.1. Integration with graphical libraries

Almost all graphical libraries support some kind of coordinate operations, usually as affine transforms or a slight generalization like perspective transforms. Each library defines its own API. Some examples are listed below:

Affine transform implementations in graphical libraries
Library                      Transform implementation              Dimensions
Java2D                       java.awt.geom.AffineTransform         2
Java3D                       javax.media.j3d.Transform3D           3
JavaFX                       javafx.scene.transform.Affine         2 or 3
Java Advanced Imaging (JAI)  javax.media.jai.PerspectiveTransform  2
Android                      android.graphics.Matrix               2

However in many cases, affine or perspective transforms are the only kind of coordinate operations supported by the graphical library. Apache SIS needs to handle a wider range of operations, in which affine transforms are only special cases. In particular most map projections and datum shifts can not be represented by affine transforms. SIS also needs to support an arbitrary number of dimensions, while the above-cited APIs restrict the use to a fixed number of dimensions. For those reasons SIS can not directly use the above-cited APIs. Instead, SIS uses the more abstract org.opengis.referencing.transform.MathTransform interface. But in the special case where the transform is actually affine, SIS may try to use an existing implementation, in particular Java2D. The following Java code can be used in situations where the Java2D object is desired:

MathTransform mt = ...;    // Any math transform created by Apache SIS.
if (mt instanceof AffineTransform) {
    AffineTransform at = (AffineTransform) mt;
    // Use Java2D API from here.
}

Apache SIS uses Java2D on a best effort basis only. The above cast is not guaranteed to succeed, even when the MathTransform meets the requirements allowing Java2D usage.

7.3.2. Specificities of a matrix library for GIS

GIS make extensive use of matrices for displaying maps or for transforming coordinates. There are many excellent open source or commercial matrix libraries available. However, GIS have some specific needs that differ a little bit from the goals of many existing libraries. Matrix operations like those described in the affine transform chapter appear in almost all coordinate operations applied by Apache SIS. But the analysis of those operations reveals some patterns:

As a consequence of the above points, accuracy of a matrix library is more important than performance for a GIS like Apache SIS. Paradoxically, a good way to improve performance is to invest more CPU time for more accurate matrix operations when preparing (not executing) the coordinate operation, because it increases the chances to correctly detect which operations cancel each other. This investment can save execution time at the place where it matters most: in the code looping over millions of coordinates to transform.

However matrix libraries are often designed for high performance with large matrices, sometimes containing thousands of rows and columns. Those libraries can efficiently resolve systems of linear equations with hundreds of unknown variables. Those libraries resolve difficult problems, but not of the same kind as the problems that Apache SIS needs to solve. For that reason, and also for another reason described in the next paragraphs, Apache SIS uses its own matrix implementation. This implementation addresses the accuracy issue by using “double-double” arithmetic (a technique for simulating the accuracy of approximately 120 bits wide floating point numbers) at the cost of performance in a phase (transform preparation) where performance is not considered critical.
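The building block of “double-double” arithmetic is an error-free addition: the sum of two double values is returned together with the rounding error that the ordinary addition discarded, so that the pair (sum, error) carries more precision than a single double. The sketch below shows the classical “two-sum” algorithm with a hypothetical class name. It is a general illustration of the technique, not a copy of the Apache SIS implementation.

```java
final class DoubleDoubleSketch {
    private DoubleDoubleSketch() {
    }

    /**
     * Returns {sum, error} where sum is the ordinary rounded value of a + b and
     * error is the part lost by rounding, so that sum + error represents a + b
     * exactly (barring overflow).
     */
    static double[] twoSum(double a, double b) {
        double sum      = a + b;
        double bVirtual = sum - a;              // The part of b that actually entered the sum.
        double aVirtual = sum - bVirtual;       // The part of a that actually entered the sum.
        double error    = (a - aVirtual) + (b - bVirtual);
        return new double[] {sum, error};
    }
}
```

For example twoSum(1, 1E-20) gives a sum of 1 with an error term of 1E-20: the small value, which a plain double addition would lose entirely, is kept in the second element and can be carried through subsequent operations.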

7.3.2.1. What to do with non-square matrices (and why)

Apache SIS often needs to invert matrices, in order to perform a coordinate operation in the reverse direction. Matrix inversions are typically performed on square matrices, but SIS also needs to invert non-square matrices. Depending on whether we have more rows than columns:

To illustrate the issues caused by direct use of libraries designed for linear algebra, consider a (φ₁, λ₁, h) → (φ₂, λ₂) conversion from three-dimensional points to two-dimensional points on a surface. The φ terms are latitudes, the λ terms are longitudes and (φ₂, λ₂) may differ from (φ₁, λ₁) if the h axis is not perpendicular to the surface.

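\[
\begin{bmatrix} \varphi_2 \\ \lambda_2 \\ 1 \end{bmatrix}
=
\begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1
\end{bmatrix}
\times
\begin{bmatrix} \varphi_1 \\ \lambda_1 \\ h \\ 1 \end{bmatrix}
\]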

For linear algebra libraries, the above non-square matrix represents an under-determined system of equations and may be considered unresolvable. Indeed the above coordinate operation can not be inverted as a (φ₂, λ₂) → (φ₁, λ₁, h) operation because we do not know which value to assign to h. Ignoring h implies that we can not assign values to (φ₁, λ₁) either, since those values may depend on h. However in the GIS case, the ellipsoidal h axis is perpendicular to the ellipsoid surface on which the geodetic latitudes and longitudes (φ, λ) are represented (note that this statement is not true for geocentric latitudes and longitudes). This perpendicularity makes φ₁ and λ₁ independent of h. In such cases, we can still do some processing.

Apache SIS proceeds by checking if h values are independent of φ and λ values. We identify such cases by checking which matrix coefficients are zero. If SIS can identify independent dimensions, it can temporarily exclude those dimensions and invert the matrix using only the remaining dimensions. If SIS does not find a sufficient number of independent dimensions, an exception is thrown. But if a matrix inversion has been possible, then we need to decide which value to assign to the dimensions that SIS temporarily excluded. By default, SIS assigns the NaN (Not-a-Number) value to those dimensions. However in the particular case of ellipsoidal height h in a (φ₂, λ₂) → (φ₁, λ₁, h) operation, the zero value may also be appropriate on the assumption that the coordinates are usually close to the ellipsoid surface. In any case, the choice of coefficients that Apache SIS sets to zero or NaN is based on the assumption that the matrix represents a coordinate operation; this is not something that can be done with arbitrary matrices.
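The procedure described above can be sketched for the case of a matrix with more columns than rows. This is a simplified illustration using only the JDK, with hypothetical class and method names. It assumes that the last row of the matrix is [0 … 0 1], treats a source dimension as independent only when its whole column is zero, and always uses NaN for the excluded dimensions. The actual Apache SIS implementation is not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;

final class NonSquareInverse {
    private NonSquareInverse() {
    }

    /** Inverts a square matrix by Gauss-Jordan elimination with partial pivoting. */
    static double[][] invertSquare(double[][] a) {
        int n = a.length;
        double[][] w = new double[n][2 * n];                // Augmented matrix [a | identity].
        for (int i = 0; i < n; i++) {
            System.arraycopy(a[i], 0, w[i], 0, n);
            w[i][n + i] = 1;
        }
        for (int col = 0; col < n; col++) {
            int pivot = col;
            for (int r = col + 1; r < n; r++) {
                if (Math.abs(w[r][col]) > Math.abs(w[pivot][col])) pivot = r;
            }
            if (w[pivot][col] == 0) throw new ArithmeticException("Singular matrix");
            double[] t = w[col]; w[col] = w[pivot]; w[pivot] = t;
            double d = w[col][col];
            for (int j = 0; j < 2 * n; j++) w[col][j] /= d;
            for (int r = 0; r < n; r++) {
                double f = w[r][col];
                if (r != col && f != 0) {
                    for (int j = 0; j < 2 * n; j++) w[r][j] -= f * w[col][j];
                }
            }
        }
        double[][] inv = new double[n][n];
        for (int i = 0; i < n; i++) System.arraycopy(w[i], n, inv[i], 0, n);
        return inv;
    }

    /** Inverts a matrix having more columns (source dimensions) than rows (target dimensions). */
    static double[][] invert(double[][] m) {
        int numRows = m.length, numCols = m[0].length;
        List<Integer> kept = new ArrayList<>();
        for (int j = 0; j < numCols - 1; j++) {
            boolean allZero = true;                         // True if dimension j affects no output.
            for (int i = 0; i < numRows - 1; i++) allZero &= (m[i][j] == 0);
            if (!allZero) kept.add(j);
        }
        kept.add(numCols - 1);                              // The translation column is always kept.
        if (kept.size() != numRows) {
            throw new ArithmeticException("Not enough independent dimensions");
        }
        double[][] square = new double[numRows][numRows];
        for (int i = 0; i < numRows; i++) {
            for (int k = 0; k < numRows; k++) square[i][k] = m[i][kept.get(k)];
        }
        double[][] inv = invertSquare(square);
        double[][] result = new double[numCols][numRows];
        for (int j = 0; j < numCols - 1; j++) {
            result[j][numRows - 1] = Double.NaN;            // Default for excluded dimensions.
        }
        for (int k = 0; k < numRows; k++) {
            result[kept.get(k)] = inv[k];                   // Overwrites the rows of kept dimensions.
        }
        return result;
    }
}
```

Applied to the 3×4 matrix of the (φ₁, λ₁, h) → (φ₂, λ₂) example, this sketch produces a 4×3 matrix whose h row is [0 0 NaN]: the inverse operation preserves φ and λ and yields NaN for h.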

The above-described approach allows Apache SIS to resolve some under-determined equation systems commonly found in GIS. In our example using NaN as the default value, the h ordinate stays unknown (we do not create information from nothing), but at least the (φ, λ) coordinates are preserved. The opposite problem, that of over-determined equation systems, is more subtle. An approach commonly applied by linear algebra libraries is to resolve over-determined systems by the least squares method. Such a method applied to our example would compute a (φ₂, λ₂, h) → (φ₁, λ₁) operation that seems the best compromise for various φ₂, λ₂ and h values, while being (except in special cases) an exact solution for none of them. Furthermore linear combinations between those three variables may be an issue because of heterogeneous units of measurement, for instance with h in metres and (φ, λ) in degrees. Apache SIS rather proceeds in the same way as for under-determined systems: by requiring that some dimensions are independent from other dimensions, otherwise the matrix is considered non-invertible. Consequently in the case of over-determined systems, SIS may refuse to perform some matrix inversions that linear algebra libraries can do, but in case of success the resulting coordinate operation is guaranteed to be exact (ignoring rounding errors).

7.3.2.2. Apache SIS matrix library

In summary, Apache SIS provides its own matrix library for the following reasons:

This library is provided in the org.apache.sis.referencing.operation.matrix package of the sis-referencing module.