The BigDataPE package provides a simple and secure interface for accessing datasets published by the Big Data PE platform from the Government of the State of Pernambuco, Brazil.
In this vignette we walk through the full package workflow using the Dengue, Zika and Chikungunya dataset — a record of notified cases in health units (public and private) in Recife, shared by the State Health Department of Pernambuco (Secretaria Estadual de Saude de Pernambuco).
| Field | Value |
|---|---|
| Classification | Broad (public) |
| Source institution | State Health Department of Pernambuco |
| Ingestion type | SQL |
| Records | 1 009 |
| LGPD level | None |
| Last updated | 2024-05-24 |
Access note: Although this dataset is public, you must submit an access request on the Big Data PE platform and be connected to the PE Conectado network or a VPN to query the data through the API. Without this connection, requests will time out or return a service-unavailable error.
Load the package and the helper libraries used in the analysis:
The first step is to store the authentication token associated with the dataset. This token is provided by the Big Data PE platform after your access request is approved.
bdpe_store_token("dengue", "your-token-here")
#> ✔ Token stored in environment variable: `BigDataPE_dengue`You can verify that the token was stored correctly:
bdpe_fetch_data() is the most direct way to query the
API. Let’s fetch the first 100 records:
data <- bdpe_fetch_data("dengue", limit = 100, offset = 0)
glimpse(data)
#> Rows: 100
#> Columns: 126
#> $ nu_notificacao <chr> "3517726", "3613049", "3507055", ...
#> $ tp_notificacao <chr> "2", "2", "2", ...
#> $ co_cid <chr> "A90", "A90", "A90", ...
#> $ dt_notificacao <chr> "2020-07-07", "2020-06-27", "2020-01-03", ...
#> $ ds_semana_notificacao <chr> "202028", "202026", "202001", ...
#> $ notificacao_ano <chr> "2020", "2020", "2020", ...
#> $ co_municipio_notificacao <chr> "261160", "260790", "261160", ...
#> $ tp_sexo <chr> "M", "M", "M", ...
#> $ febre <chr> "1", "1", "1", ...
#> $ mialgia <chr> "1", "2", "1", ...
#> $ cefaleia <chr> "2", "1", "1", ...
#> $ ...The dataset contains 126 columns with detailed information on each notification, including demographics, symptoms, laboratory results, warning signs, severity indicators and case outcome.
The API supports additional filters through the query
parameter. For example, to fetch only notifications from the year
2020:
dengue_2020 <- bdpe_fetch_data(
"dengue",
limit = 50,
offset = 0,
query = list(notificacao_ano = "2020")
)
nrow(dengue_2020)
#> [1] 50You can also filter by municipality of residence. The IBGE code for
Recife is 261160:
dengue_recife <- bdpe_fetch_data(
"dengue",
limit = 100,
offset = 0,
query = list(co_municipio_residencia = "261160")
)
nrow(dengue_recife)
#> [1] 100Filters can be combined. To fetch female cases in Recife:
When the data volume is large, use bdpe_fetch_chunks()
which paginates automatically. To fetch all 1,009 records in blocks of
500:
With the data in hand we can run quick exploratory summaries.
Symptom fields use "1" for yes and "2" for
no. We can compute the proportion of each symptom:
symptoms <- c("febre", "mialgia", "cefaleia", "exantema", "vomito",
"nausea", "dor_costas", "conjutivite", "artrite",
"artralgia", "dor_retro")
all_data |>
summarise(across(all_of(symptoms), ~ mean(.x == "1", na.rm = TRUE))) |>
tidyr::pivot_longer(everything(),
names_to = "symptom",
values_to = "proportion") |>
arrange(desc(proportion))
#> # A tibble: 11 x 2
#> symptom proportion
#> <chr> <dbl>
#> 1 febre 0.914
#> 2 cefaleia 0.531
#> 3 mialgia 0.512
#> 4 artralgia 0.194
#> 5 exantema 0.181
#> 6 dor_costas 0.155
#> 7 vomito 0.150
#> 8 nausea 0.139
#> 9 dor_retro 0.125
#> 10 conjutivite 0.053
#> 11 artrite 0.046all_data |>
count(st_ocorreu_hospitalizacao) |>
mutate(description = case_match(
st_ocorreu_hospitalizacao,
"1" ~ "Yes",
"2" ~ "No",
"9" ~ "Unknown",
"" ~ "Not reported"
))
#> # A tibble: 4 x 3
#> st_ocorreu_hospitalizacao n description
#> <chr> <int> <chr>
#> 1 330 Not reported
#> 2 1 51 Yes
#> 3 2 565 No
#> 4 9 63 Unknown| Variable | Description |
|---|---|
nu_notificacao |
Notification number |
co_cid |
ICD-10 code (A90 = Dengue) |
dt_notificacao |
Notification date |
notificacao_ano |
Notification year |
co_municipio_notificacao |
IBGE code of the notifying municipality |
dt_diagnostico_sintoma |
Date of first symptoms |
tp_sexo |
Sex (M, F, I = Indeterminate) |
tp_raca_cor |
Race/colour (1–5, 9 = Unknown) |
no_bairro_residencia |
Neighbourhood of residence |
febre … dor_retro |
Symptoms (1 = Yes, 2 = No) |
diabetes … auto_imune |
Pre-existing conditions (1 = Yes, 2 = No) |
st_ocorreu_hospitalizacao |
Hospitalisation occurred (1/2/9) |
tp_classificacao_final |
Final case classification |
tp_evolucao_caso |
Outcome (1 = Recovery, 2 = Death, …) |