The {duckspatial} package provides fast and memory-efficient functions to analyze and manipulate large spatial vector datasets in R. It allows R users to benefit directly from the analytical power of DuckDB and its spatial extension, while remaining fully compatible with R’s spatial ecosystem, especially {sf}.
At its core, {duckspatial} bridges two worlds:
This design makes {duckspatial} especially well suited for:
You can install {duckspatial} directly from CRAN with:
Or you can install the development version from GitHub with:
A central design principle of {duckspatial} is that the same spatial operation can be used in different ways, depending on how your data is stored and how you want to manage memory and performance. Most functions in {duckspatial} support four complementary workflows:
sf → Output sfsf → Output DuckDB tablesfLet’s see a few examples to illustrate these workflows with a few sample data sets.
library(duckspatial)
library(sf)
# polygons
countries_sf <- sf::st_read(
system.file("spatial/countries.geojson", package = "duckspatial"),
quiet = TRUE
)
# create random points
set.seed(42)
n <- 10000
points_sf <- data.frame(
id = 1:n,
x = runif(n, min = -180, max = 180),
y = runif(n, min = -90, max = 90)
) |>
sf::st_as_sf(coords = c("x","y"), crs = 4326)sf input → sf outputThe simplest way to perform fast spatial operations. Here you pass
sf objects as inputs, and under the hood {duckspatial}:
In this example, we use ddbs_join() (which is equivalent
to sf::st_join) to determine which country is intersected
by each point.
result_sf <- ddbs_join(
x = points_sf,
y = countries_sf,
join = "intersects"
)
head(result_sf)
#> Simple feature collection with 6 features and 7 fields
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: -72.75607 ymin: -50.15994 xmax: -67.79479 ymax: -43.01591
#> Geodetic CRS: WGS 84
#> id CNTR_ID NAME_ENGL ISO3_CODE CNTR_NAME FID date
#> 1 708 AR Argentina ARG Argentina AR 2021-01-01
#> 2 3041 AR Argentina ARG Argentina AR 2021-01-01
#> 3 9309 AR Argentina ARG Argentina AR 2021-01-01
#> 4 2446 AR Argentina ARG Argentina AR 2021-01-01
#> 5 8456 AR Argentina ARG Argentina AR 2021-01-01
#> 6 2707 AR Argentina ARG Argentina AR 2021-01-01
#> geometry
#> 1 POINT (-72.75607 -50.15994)
#> 2 POINT (-68.77194 -48.12303)
#> 3 POINT (-70.82253 -46.23146)
#> 4 POINT (-71.6444 -44.25323)
#> 5 POINT (-71.18679 -44.60882)
#> 6 POINT (-67.79479 -43.01591)The next workflows use a DuckDB connection, which makes these
workflows much more efficient for working with large spatial data sets
in general. To create a DuckDB connection, we use the
ddbs_create_conn() function, which automatically creates
the database connection and installs / loads DuckDB’s spatial extension
in a single call.
sf input → DuckDB tableThis workflow is ideal when you start in R, but want to persist
results efficiently in the database without load the results to memory.
You pass sf objects as input, and {duckspatial} writes the
output directly to DuckDB.
The only difference is that here you also pass the name
of the table that should be written with the output to DuckDB and the
database conn where the table should saved.
ddbs_join(
conn = conn,
x = points_sf,
y = countries_sf,
join = "intersects",
name = "points_in_countries_tbl"
)And if you want to fetch the table to memory, the
ddbs_read_vector() allows you to read a table and return it
as a sf object.
tbl <- ddbs_read_vector(
conn = conn,
name = "points_in_countries_tbl"
)
head(tbl)
#> Simple feature collection with 6 features and 7 fields
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: -72.75607 ymin: -50.15994 xmax: -67.79479 ymax: -43.01591
#> Geodetic CRS: WGS 84
#> id CNTR_ID NAME_ENGL ISO3_CODE CNTR_NAME FID date
#> 1 708 AR Argentina ARG Argentina AR 2021-01-01
#> 2 3041 AR Argentina ARG Argentina AR 2021-01-01
#> 3 9309 AR Argentina ARG Argentina AR 2021-01-01
#> 4 2446 AR Argentina ARG Argentina AR 2021-01-01
#> 5 8456 AR Argentina ARG Argentina AR 2021-01-01
#> 6 2707 AR Argentina ARG Argentina AR 2021-01-01
#> geometry
#> 1 POINT (-72.75607 -50.15994)
#> 2 POINT (-68.77194 -48.12303)
#> 3 POINT (-70.82253 -46.23146)
#> 4 POINT (-71.6444 -44.25323)
#> 5 POINT (-71.18679 -44.60882)
#> 6 POINT (-67.79479 -43.01591)sf outputIn this workflow, your spatial data lives inside DuckDB as tables but
you want to return the output of sf objects to memory.
You can easily write sf objects as tables to duckdb with
the ddbs_write_vector() function by passing the
sf data and indicating the name of the table
to be written in the database.
# write `sf` objects as tables to duckdb
duckspatial::ddbs_write_vector(
conn = conn,
data = countries_sf,
name = "countries"
)
duckspatial::ddbs_write_vector(
conn = conn,
data = points_sf,
name = "points"
)To perform a spatial operation, you pass the table names and you get
an sf object back.
This is the fastest and most scalable workflow. The entire computation happens inside DuckDB, and the result is written to a new database table.
ddbs_join(
conn = conn,
x = "points",
y = "countries",
join = "intersects",
name = "points_in_countries_tbl",
overwrite = TRUE
)
# and read the table to memory as sf
# tbl <- ddbs_read_vector(
# conn = conn,
# name = "points_in_countries_tbl"
# )