This vignette shows how to use ddbs_join() to perform
fast spatial join operations on large data sets with three different
approaches:
1 In-memory: pass sf objects and get an
sf result (DuckDB runs under the hood, no persistent
DataBase). 2. Connected: pass table names stored in an
existing DuckDB connection and get an sf result. 3.
Write-to-DB: same as (2), but write the result to a new
DuckDB table.
Let’s see a few examples. First, let’s load a few libraries and our sample data:
library(duckspatial)
# library(mapview)
library(sf)
# polygons
countries_sf <- sf::st_read(
system.file("spatial/countries.geojson", package = "duckspatial"),
quiet = TRUE
)
# random points
set.seed(42)
n <- 10000
points_sf <- data.frame(
id = 1:n,
x = runif(n, min = -180, max = 180),
y = runif(n, min = -90, max = 90)
) |>
sf::st_as_sf(coords = c("x","y"), crs = 4326)sf, return sfThe simplest way to perform fast spatial join. You simply pass two
sf objects, and ddbs_join() spins up a
temporary DuckDB, runs the join, and returns an sf.
sfIn the second and third approaches, we make use of a connection to an
existing DuckDB database. So let’s create a fresh DuckDB connection
using the ddbs_create_conn() function, which automatically
install and load DuckDB spatial extension to the connection.
Now, in this second approach you need first to write your layers to DuckDB, and perform the spatial join by referencing their table names. Like this:
# write data to DuckDB
ddbs_write_vector(conn, points_sf, "points", overwrite = TRUE)
ddbs_write_vector(conn, countries_sf, "countries", overwrite = TRUE)
# spatial join inside DuckDB; result returned as sf
out_sf2 <- ddbs_join(
conn,
x = "points",
y = "countries",
join = "within"
)The output of approaches 1 and 2 is an sf object loaded
to your memory. In this third approach, ddbs_join() writes
a new table in the DuckDB database. You simply need to the
name of the new table.
ddbs_join(
conn = conn,
x = "points",
y = "countries",
join = "within",
name = "points_in_countries",
overwrite = TRUE
)
# use the result in SQL (or read back as sf later)
# DBI::dbReadTable(conn, "points_in_countries") |>
# sf::st_as_sf(wkt = 'geometry') |>
# head()A spatial predicate is really just a function that evaluates some
spatial relation between two geometries and returns true or false, e.g.,
“does a contain b” or “is a within distance x of b”. The
join argument accepts the spatial predicates:
"ST_Intersects": Whether a intersects b"ST_Contains": Whether a contains b"ST_ContainsProperly": Whether a contains b without b
touching a’s boundary"ST_Within": Whether a is within b"ST_Overlaps": Whether a overlaps b"ST_Touches": Whether a touches b"ST_Equals": Whether a is equal to b"ST_Crosses": Whether a crosses b"ST_Covers": Whether a covers b"ST_CoveredBy": Whether a is covered by b"ST_DWithin": x) Whether a is within distance x of
b