How DuckDB Spatial Handles Coordinate Systems
DuckDB Spatial decouples geometric primitives from coordinate reference system (CRS) metadata to keep columnar execution lean. PostGIS stores an SRID in the header of every serialized geometry. DuckDB's GEOMETRY type, by contrast, holds only coordinates, in the extension's own compact binary format (convertible to and from WKB), with no inline projection tag. The CRS of a geometry column is therefore an application-level concern: it must be tracked in a separate column, in a column or table comment, or in a metadata side-table. When no CRS is tracked, spatial functions operate on the raw coordinate values in whatever frame the data happens to use. This is a frequent source of silent misalignment in downstream joins. As documented in the DuckDB Spatial Architecture & Fundamentals specification, this design reduces serialization overhead. The cost is that the pipeline, not the engine, is responsible for carrying CRS metadata from ingestion through every query.
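A column comment is one lightweight way to track the CRS. DuckDB stores comments in the catalog and exposes them through duckdb_columns(). The sketch below assumes a hypothetical parcels table. The 'EPSG:4326' string is a naming convention chosen for illustration; DuckDB does not interpret or enforce it.

```sql
INSTALL spatial;
LOAD spatial;

-- Hypothetical table: the geometry column carries no CRS of its own
CREATE OR REPLACE TABLE parcels (id INTEGER, geom GEOMETRY);
INSERT INTO parcels VALUES (1, ST_Point(4.8924, 52.3731));

-- Record the CRS out-of-band as a column comment (free text, not enforced)
COMMENT ON COLUMN parcels.geom IS 'EPSG:4326';

-- Read it back from the catalog before transforming or joining
SELECT column_name, comment AS crs
FROM duckdb_columns()
WHERE table_name = 'parcels' AND column_name = 'geom';
```

Because the comment is plain text, storing it as authority:code keeps it directly usable as an ST_Transform argument.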
Ingestion Pathways & CRS Metadata Handling
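GeoJSON: RFC 7946 fixes the CRS to WGS 84 longitude/latitude in decimal degrees (OGC:CRS84, commonly labeled EPSG:4326), with coordinates in [longitude, latitude] order. DuckDB's ST_Read() and ST_GeomFromGeoJSON() parse the coordinates directly, so X holds longitude and Y holds latitude. No CRS tag survives into the GEOMETRY column because the type stores none. Record the known CRS in application metadata. When treating this data as EPSG:4326 in ST_Transform, pass true as the fourth (always_xy) argument, because EPSG:4326 is officially defined latitude-first.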
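The sketch below checks how GeoJSON coordinates land in a GEOMETRY value. It parses a single hypothetical point with ST_GeomFromGeoJSON and reads it back. Longitude comes out as X and latitude as Y, and nothing in the stored value identifies the CRS. The geojson_pts table name is illustrative.

```sql
INSTALL spatial;
LOAD spatial;

-- RFC 7946 order is [longitude, latitude]; this point is in central Amsterdam
CREATE OR REPLACE TABLE geojson_pts AS
SELECT ST_GeomFromGeoJSON('{"type":"Point","coordinates":[4.8924,52.3731]}') AS geom;

-- X is longitude and Y is latitude; the value itself records no CRS
SELECT ST_X(geom) AS lon, ST_Y(geom) AS lat FROM geojson_pts;
```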
GeoParquet: The extension reads the geo entry in the Parquet footer's key-value metadata. Per the GeoParquet specification, each geometry column's crs field holds a PROJJSON object. If the field is absent, the CRS defaults to OGC:CRS84; an explicit null means the CRS is unknown. DuckDB uses the geo metadata to decode the geometry column. It does not attach the CRS to the geometry values or apply it to ST_Transform automatically, so read the CRS yourself and pass the source CRS explicitly. Malformed PROJJSON in the footer may not be reported, so inspect the footer with parquet_kv_metadata() before ingestion:
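-- Inspect the GeoParquet 'geo' footer entry to confirm the embedded CRS.
-- key and value are BLOBs; decode() renders them as text.
-- Reading from s3:// requires the httpfs extension and credentials.
SELECT decode(key) AS key, decode(value) AS geo_metadata
FROM parquet_kv_metadata('s3://bucket/parcels.parquet')
WHERE decode(key) = 'geo';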
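The footer JSON can be mapped to a source-CRS string before any ST_Transform call. The sketch below is a minimal version under stated assumptions:

- The geometry column is named geometry. A robust version would read primary_column instead.
- The PROJJSON carries an "id" member with authority and code. A custom CRS without one would need different handling.
- The geo_footer and resolved_crs table names are hypothetical.

```sql
-- Hypothetical 'geo' footer value, trimmed to the fields used below. In a real
-- pipeline, substitute decode(value) from the parquet_kv_metadata() query.
CREATE OR REPLACE TABLE geo_footer AS
SELECT '{"version": "1.1.0", "primary_column": "geometry",
  "columns": {"geometry": {"encoding": "WKB",
    "crs": {"type": "ProjectedCRS", "name": "WGS 84 / UTM zone 33N",
            "id": {"authority": "EPSG", "code": 32633}}}}}' AS geo_json;

-- Map the crs field to a string intended for ST_Transform:
--   key absent -> OGC:CRS84 (the GeoParquet default)
--   JSON null  -> NULL (CRS explicitly unknown; stop and investigate)
--   PROJJSON   -> authority:code from its "id" member (assumed present)
CREATE OR REPLACE TABLE resolved_crs AS
SELECT CASE
           WHEN json_type(geo_json, '$.columns.geometry.crs') IS NULL THEN 'OGC:CRS84'
           WHEN json_type(geo_json, '$.columns.geometry.crs') = 'NULL' THEN NULL
           ELSE json_extract_string(geo_json, '$.columns.geometry.crs.id.authority')
                || ':' || json_extract_string(geo_json, '$.columns.geometry.crs.id.code')
       END AS source_crs
FROM geo_footer;

SELECT source_crs FROM resolved_crs;
```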
Transformation Pipeline & Memory/IO Boundaries
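Coordinate transformations execute through the PROJ library embedded in the extension. ST_Transform(geom, 'EPSG:source', 'EPSG:target') is a scalar function applied to one vector of rows at a time, so its own working set is modest. Peak memory is driven mainly by vertex counts and by blocking operators in the plan, such as sorts, joins, or CREATE TABLE AS materialization. On large datasets (for example, beyond roughly 10M rows of complex polygons), this can exceed the default memory_limit of 80% of system RAM. To avoid out-of-memory termination:

- set an explicit memory limit;
- give DuckDB a fast spill directory;
- batch transformations on very large tables.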
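SET memory_limit = '12GB';
SET temp_directory = '/mnt/nvme/duckdb_spill';
-- Fewer threads lower peak memory at the cost of throughput
SET threads = 4;
-- Not preserving insertion order lets COPY stream with less buffering
SET preserve_insertion_order = false;
-- Transform in id-range batches if the full table is too large.
-- The fourth argument (always_xy = true) keeps the output in lon/lat order;
-- without it, EPSG:4326 output is latitude-first.
COPY (
    SELECT id, ST_Transform(geom, 'EPSG:32633', 'EPSG:4326', true) AS geom_4326
    FROM large_table
    WHERE id BETWEEN 1 AND 5000000
) TO '/output/batch_1.parquet' (FORMAT PARQUET);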
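The fourth ST_Transform argument, always_xy, controls output axis order. EPSG:4326 is officially defined latitude-first, so without always_xy the output X holds latitude. The sketch below transforms one UTM zone 33N point both ways. The point (easting 500000, northing 0) sits on the zone's 15°E central meridian at the equator, so the expected answer is easy to check. The axis_check table name is hypothetical.

```sql
INSTALL spatial;
LOAD spatial;

-- EPSG:32633 (UTM 33N): easting 500000 m, northing 0 m -> 15 degrees E, 0 degrees N
CREATE OR REPLACE TABLE axis_check AS
SELECT
    ST_Transform(ST_Point(500000, 0), 'EPSG:32633', 'EPSG:4326')       AS authority_order,
    ST_Transform(ST_Point(500000, 0), 'EPSG:32633', 'EPSG:4326', true) AS xy_order;

-- authority_order: X = latitude (about 0), Y = longitude (about 15)
-- xy_order:        X = longitude (about 15), Y = latitude (about 0)
SELECT ST_X(authority_order), ST_Y(authority_order),
       ST_X(xy_order), ST_Y(xy_order)
FROM axis_check;
```

Mixing the two conventions within one pipeline produces layers that look plausible but are mirrored across the diagonal. Picking one convention and applying it to every ST_Transform call avoids this.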
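Datum-shift grid files (such as us_noaa_conus.tif) are not part of PROJ's core database. Depending on the extension build and network settings, they may be downloaded on first use and cached locally (for example in ~/.duckdb/proj), or they may be unavailable entirely. In air-gapped environments, pre-stage the proj-data package and set PROJ_DATA (PROJ_LIB on PROJ versions before 9.1) to its path. Then run a test transformation to confirm the grids are actually used. When a grid is missing, PROJ may fall back to a less accurate transformation without raising an error.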
Spatial Indexing Internals & CRS-Agnostic Execution
DuckDB's R-tree index (created with CREATE INDEX ... USING RTREE) stores the bounding boxes of the geometries in whatever CRS the column holds. The index is strictly CRS-agnostic: it does not account for projection distortion, geodesic curvature, or units. Predicates such as ST_Intersects and ST_DWithin likewise evaluate on raw planar coordinates. An ST_DWithin distance is therefore measured in the column's units (degrees for EPSG:4326, projected meters for EPSG:3857). Joining layers in different CRSs yields false negatives or wildly mismatched bounding boxes. The R-tree is only used to accelerate filters that compare the indexed column against a constant geometry. Always normalize geometries to a common CRS before building the index and before running spatial joins.
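The normalize-then-index order can be sketched as follows. The sketch assumes hypothetical lon/lat input points and EPSG:3857 as the working CRS; table and index names are illustrative. The query filters against a constant envelope expressed in meters, the same units as the indexed column. An envelope given in degrees would silently match only points within a few meters of the origin.

```sql
INSTALL spatial;
LOAD spatial;

-- Hypothetical lon/lat points (for example, parsed from GeoJSON)
CREATE OR REPLACE TABLE pts_lonlat AS
SELECT * FROM (VALUES (1, ST_Point(0, 0)), (2, ST_Point(1, 1)), (3, ST_Point(10, 10)))
    AS t(id, geom);

-- Normalize to one working CRS first (always_xy = true: input is lon/lat)
CREATE OR REPLACE TABLE pts_3857 AS
SELECT id, ST_Transform(geom, 'EPSG:4326', 'EPSG:3857', true) AS geom
FROM pts_lonlat;

-- Build the R-tree over the normalized coordinates
CREATE INDEX pts_3857_rtree ON pts_3857 USING RTREE (geom);

-- Filter against a constant envelope in the same CRS (meters);
-- this is the query shape the R-tree can accelerate
SELECT id
FROM pts_3857
WHERE ST_Intersects(geom, ST_MakeEnvelope(-200000, -200000, 200000, 200000))
ORDER BY id;
```

With these three points, ids 1 and 2 fall inside the ±200 km envelope. Id 3 (10°E, 10°N, roughly 1,113 km east of the origin) does not.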
CRS Drift Troubleshooting & Incident Resolution
When spatial joins return zero matches even though the layers should overlap, check CRS alignment with diagnostic queries.
-- DuckDB stores no CRS in the GEOMETRY type; the catalog can only show
-- column types and any CRS recorded out-of-band in a column comment
SELECT column_name, data_type, comment
FROM duckdb_columns()
WHERE table_name = 'target_table';
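A quick way to confirm a suspected mismatch is to compare the extents of the two layers being joined. The sketch below uses two hypothetical single-point layers: layer_a stored in lon/lat and layer_b in EPSG:3857. It reports each layer's extent and flags which one fits the geographic range. All table names are illustrative.

```sql
INSTALL spatial;
LOAD spatial;

-- Hypothetical layers: the same location stored in degrees and in meters
CREATE OR REPLACE TABLE layer_a AS
SELECT ST_Point(4.89, 52.37) AS geom;
CREATE OR REPLACE TABLE layer_b AS
SELECT ST_Transform(ST_Point(4.89, 52.37), 'EPSG:4326', 'EPSG:3857', true) AS geom;

-- Extents that differ by orders of magnitude point to a CRS mismatch
-- rather than genuinely disjoint data
CREATE OR REPLACE TABLE layer_extents AS
SELECT layer,
       min(ST_XMin(geom)) AS xmin, min(ST_YMin(geom)) AS ymin,
       max(ST_XMax(geom)) AS xmax, max(ST_YMax(geom)) AS ymax,
       bool_and(ST_XMin(geom) >= -180 AND ST_XMax(geom) <= 180
            AND ST_YMin(geom) >= -90 AND ST_YMax(geom) <= 90) AS fits_lonlat_range
FROM (SELECT 'layer_a' AS layer, geom FROM layer_a
      UNION ALL
      SELECT 'layer_b' AS layer, geom FROM layer_b)
GROUP BY layer
ORDER BY layer;

SELECT * FROM layer_extents;
```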
-- If a layer's CRS is known only out-of-band (here, lon/lat from GeoJSON),
-- transform it into the working CRS and materialize the result.
-- always_xy = true (fourth argument) stops lon/lat input being read as lat/lon.
CREATE OR REPLACE TABLE target_norm AS
SELECT * EXCLUDE (geometry),
       ST_Transform(geometry, 'EPSG:4326', 'EPSG:3857', true) AS geometry
FROM target_table;
-- Validate the overall extent after transformation; EPSG:3857 coordinates
-- should fall within about ±20,037,508 m
SELECT min(ST_XMin(geometry)) AS xmin, min(ST_YMin(geometry)) AS ymin,
       max(ST_XMax(geometry)) AS xmax, max(ST_YMax(geometry)) AS ymax
FROM target_norm;
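DuckDB Spatial has no ST_SRID function, because there is no inline SRID to read. Detect likely CRS mismatches from coordinate ranges instead: lon/lat data (EPSG:4326 in x/y order) must fall within ±180 in X and ±90 in Y. The check is necessary rather than sufficient, since projected coordinates close to their origin can also pass it. Reject clearly mis-projected rows at ingestion. Test all four bounds, so that a geometry that starts in range but extends beyond it is also caught: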
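-- Guard: delete rows whose bounding box falls outside valid lon/lat bounds
-- (NULL geometries evaluate to NULL here and are left in place)
DELETE FROM target_table
WHERE NOT (ST_XMin(geometry) >= -180 AND ST_XMax(geometry) <= 180
       AND ST_YMin(geometry) >= -90 AND ST_YMax(geometry) <= 90);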
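Deleting rows discards evidence that may help identify the real source CRS. A gentler variant copies out-of-range rows to a quarantine table and removes them in the same transaction. The sketch below assumes a hypothetical staging table keyed by id; the quarantine table name is also illustrative.

```sql
INSTALL spatial;
LOAD spatial;

-- Hypothetical staging table: row 2 holds UTM-like meters, not lon/lat
CREATE OR REPLACE TABLE staging AS
SELECT * FROM (VALUES (1, ST_Point(4.89, 52.37)),
                      (2, ST_Point(500000, 5800000)))
    AS t(id, geometry);

BEGIN TRANSACTION;

-- Keep out-of-range rows for later inspection...
CREATE OR REPLACE TABLE staging_quarantine AS
SELECT * FROM staging
WHERE NOT (ST_XMin(geometry) >= -180 AND ST_XMax(geometry) <= 180
       AND ST_YMin(geometry) >= -90 AND ST_YMax(geometry) <= 90);

-- ...then remove exactly those rows from staging
DELETE FROM staging
WHERE id IN (SELECT id FROM staging_quarantine);

COMMIT;
```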
For data with an unknown CRS, identify the source CRS first, then call ST_Transform with explicit source and target strings. If only the transformation parameters are known, a manual affine correction with ST_Affine is an option, provided your version of the extension includes it.
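A common special case is data whose axes were swapped on ingestion, with latitude stored as X and longitude as Y. Such rows can slip past the ±180/±90 range check, as the hypothetical Amsterdam point below does. ST_FlipCoordinates swaps X and Y back. The swapped and fixed table names are illustrative.

```sql
INSTALL spatial;
LOAD spatial;

-- Hypothetical row ingested as (lat, lon): X = 52.37, Y = 4.89.
-- It passes the lon/lat range check despite being wrong.
CREATE OR REPLACE TABLE swapped AS
SELECT ST_Point(52.37, 4.89) AS geometry;

-- Swap X and Y so X holds longitude again
CREATE OR REPLACE TABLE fixed AS
SELECT ST_FlipCoordinates(geometry) AS geometry FROM swapped;

SELECT ST_X(geometry) AS lon, ST_Y(geometry) AS lat FROM fixed;
```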
Enterprise Deployment & Access Control
In multi-tenant environments, isolate spatial workloads using dedicated DuckDB instances or separate database files. DuckDB has no GRANT or row-level security. Instead, publish curated, already-validated data into its own database file and have consumers attach that file read-only. A view stored in that file would need its source table in the same file, so the curated layer is materialized here:
-- Producer session: publish only the transformed, validated geometry
ATTACH 'analytics.duckdb' AS analytics;
CREATE OR REPLACE TABLE analytics.parcels_4326 AS
SELECT id, geom_4326
FROM staging_transformed
WHERE ST_XMin(geom_4326) >= -180 AND ST_XMax(geom_4326) <= 180
  AND ST_YMin(geom_4326) >= -90 AND ST_YMax(geom_4326) <= 90;
DETACH analytics;
-- Consumer session: share via read-only attachment
ATTACH 'analytics.duckdb' AS analytics (READ_ONLY);
For high-throughput ingestion, validate the GeoParquet geo metadata before loading. This catches PROJJSON compliance failures before the data reaches the execution engine. Because DuckDB keeps no per-row CRS, enforce projection consistency with coordinate-range filters in the curated layer, rather than repeating CRS checks in every downstream query.