Understanding the GDAL vector data model
2023-06-16
GDAL has an overview of its vector data model, which is generally pretty good. However, grasping it from prose was still kind of hard for me, so I quickly decided to create some diagrams to help me understand everything.
The basic data model is summarized in the following diagram:
Very quickly: A file (e.g., a Shapefile) is represented as a Dataset. A Dataset contains some Layers, which themselves contain Features. A Feature is something concrete on the map; thus, it has a geometry. It also has a user-defined list of fields for whatever attributes a feature might have.
Here I describe these concepts again in slightly more detail:
Geometry: A concrete geometry, e.g., a Point or a MultiLineString. It has a spatial reference, but usually this is shared across the whole dataset.
Feature: In the simple case, contains exactly one geometry. It exists as a first layer of abstraction over a geometry, so that it can also carry non-geographic fields. For example, a feature that encodes a tree will have the location of that tree stored as a Point in its geometry and the tree type in the fields of the feature.
Layer: A layer is a list of related features. According to ESRI, one can think of a layer as a legend item on a paper map. For example, all points showing cities will go into a 'cities' layer. A layer has a name and, just like a Dataset, it can have metadata (this depends on the actual format).
Dataset: A dataset typically represents a file, though it can also represent data in, for example, a PostGIS database. A dataset has a name (usually the filename) and maybe metadata (this depends on the file format). Finally, a dataset contains zero or more layers.
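To make the containment concrete, here is a minimal sketch in plain Python. It is not GDAL's actual API: the class and attribute names (wkt, spatial_ref, fields, and so on) are my own illustrative assumptions, and the geometry is stored as a simple WKT string just to keep things short.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class Geometry:
    # Hypothetical simplification: the shape is kept as a WKT string.
    wkt: str
    spatial_ref: str = "EPSG:4326"


@dataclass
class Feature:
    # One geometry plus arbitrary non-geographic fields.
    geometry: Geometry
    fields: dict[str, Any] = field(default_factory=dict)


@dataclass
class Layer:
    # A named list of related features, with optional metadata.
    name: str
    features: list[Feature] = field(default_factory=list)
    metadata: dict[str, str] = field(default_factory=dict)


@dataclass
class Dataset:
    # Typically a file; holds zero or more layers.
    name: str
    layers: list[Layer] = field(default_factory=list)
    metadata: dict[str, str] = field(default_factory=dict)


# The tree example: location in the geometry, tree type in the fields.
tree = Feature(Geometry("POINT (13.4 52.5)"), {"tree_type": "oak"})
trees = Layer("trees", [tree])
dataset = Dataset("park.shp", [trees])

# Walk the hierarchy from the dataset down to each feature.
for layer in dataset.layers:
    for feature in layer.features:
        print(layer.name, feature.geometry.wkt, feature.fields)
```

Reading the loop from the outside in mirrors the model: dataset, then layer, then feature, then geometry and fields.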
Concretely, some of these classes are abstract, while others aren't. Dataset, for example, is implemented by its actual Driver, which is file-format-specific.
Geometry is also abstract. It is implemented by many different concrete geometry types. These are shown in the following picture, which I took from the QGIS documentation:
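In code, the abstract/concrete split for geometries could look roughly like the sketch below. Again, this is plain Python for illustration and not how GDAL implements it: the to_wkt method and the two subclasses are assumptions I picked to show the idea of one abstract base with several concrete geometry types.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class Geometry(ABC):
    """Abstract base: cannot be instantiated directly."""

    @abstractmethod
    def to_wkt(self) -> str:
        """Return a WKT-like text representation."""


class Point(Geometry):
    def __init__(self, x: float, y: float) -> None:
        self.x = x
        self.y = y

    def to_wkt(self) -> str:
        return f"POINT ({self.x} {self.y})"


class LineString(Geometry):
    def __init__(self, points: list[tuple[float, float]]) -> None:
        self.points = points

    def to_wkt(self) -> str:
        coords = ", ".join(f"{x} {y}" for x, y in self.points)
        return f"LINESTRING ({coords})"


# Code that works with "a Geometry" does not care which concrete type it gets.
shapes: list[Geometry] = [Point(1, 2), LineString([(0, 0), (1, 1)])]
for shape in shapes:
    print(shape.to_wkt())
```

Trying to call Geometry() directly raises a TypeError, which is the Python way of saying the class is abstract.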