DSIR

library(DSIR)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(ggplot2)

DSIR is a small R package for global health data work. It bundles WHO Member State metadata, lightweight clients for the WHO Global Health Observatory (GHO) and UN SDG APIs, and reusable WHO-style ggplot2 and flextable themes. It is aimed at health professionals, WHO staff, and global health researchers who repeat the same data tasks day after day.

This vignette walks through the typical workflow: looking up countries, fetching data from GHO and SDG, cleaning the raw response, and producing publication-style charts and tables.

WHO Member State metadata

The who_countries tibble lists all 194 WHO Member States with their ISO3, ISO2, UN M49 codes, official names, short names, and WHO region. For Western Pacific countries, an extra column is_pic identifies the 14 Pacific Island Countries.

who_countries
#> # A tibble: 194 × 7
#>    iso3  iso2  m49_code name_official       name_short         who_region is_pic
#>    <chr> <chr> <chr>    <chr>               <chr>              <chr>      <lgl> 
#>  1 AFG   AF    004      Afghanistan         Afghanistan        EMR        FALSE 
#>  2 ALB   AL    008      Albania             Albania            EUR        FALSE 
#>  3 DZA   DZ    012      Algeria             Algeria            AFR        FALSE 
#>  4 AND   AD    020      Andorra             Andorra            EUR        FALSE 
#>  5 AGO   AO    024      Angola              Angola             AFR        FALSE 
#>  6 ATG   AG    028      Antigua and Barbuda Antigua and Barbu… AMR        FALSE 
#>  7 ARG   AR    032      Argentina           Argentina          AMR        FALSE 
#>  8 ARM   AM    051      Armenia             Armenia            EUR        FALSE 
#>  9 AUS   AU    036      Australia           Australia          WPR        FALSE 
#> 10 AUT   AT    040      Austria             Austria            EUR        FALSE 
#> # ℹ 184 more rows

For convenience, DSIR offers pre-defined vectors of ISO3 codes for each WHO region.

wpro_cty
#>  [1] "AUS" "BRN" "CHN" "COK" "FJI" "FSM" "IDN" "JPN" "KHM" "KIR" "KOR" "LAO"
#> [13] "MHL" "MNG" "MYS" "NIU" "NRU" "NZL" "PHL" "PLW" "PNG" "SGP" "SLB" "TON"
#> [25] "TUV" "VNM" "VUT" "WSM"
length(wpro_cty)   # 28 Member States in WPR (since May 2025)
#> [1] 28

The is_pic flag is useful because Pacific Island Countries are often analysed as a group, given their distinct demographic and geographic profiles.

who_countries |>
  filter(is_pic) |>
  select(iso3, name_short)
#> # A tibble: 14 × 2
#>    iso3  name_short      
#>    <chr> <chr>           
#>  1 COK   Cook Islands    
#>  2 FJI   Fiji            
#>  3 KIR   Kiribati        
#>  4 MHL   Marshall Islands
#>  5 FSM   Micronesia      
#>  6 NRU   Nauru           
#>  7 NIU   Niue            
#>  8 PLW   Palau           
#>  9 PNG   Papua New Guinea
#> 10 WSM   Samoa           
#> 11 SLB   Solomon Islands 
#> 12 TON   Tonga           
#> 13 TUV   Tuvalu          
#> 14 VUT   Vanuatu

When you have a vector of ISO3 codes and need to know which WHO region each belongs to, iso3_to_region() provides the lookup. It is vectorised and returns NA for codes that do not match a WHO Member State.

iso3_to_region(c("PHL", "FRA", "ZAF", "USA", "XYZ"))
#> [1] "WPR" "EUR" "AFR" "AMR" NA

This is convenient when joining external datasets (which often arrive keyed only by ISO3) to the WHO regional structure.
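A vectorised lookup of this kind can be sketched in base R with match(), which returns NA for keys it cannot find. The four-row table below is a hypothetical stand-in for who_countries, and region_lookup() is an illustrative name; DSIR's own implementation may differ.

```r
# Hypothetical stand-in for who_countries (four rows only)
lookup <- data.frame(
  iso3       = c("PHL", "FRA", "ZAF", "USA"),
  who_region = c("WPR", "EUR", "AFR", "AMR"),
  stringsAsFactors = FALSE
)

# Illustrative helper: match() gives the row index of each code, or NA
region_lookup <- function(codes, table = lookup) {
  table$who_region[match(codes, table$iso3)]
}

# An external dataset keyed only by ISO3 (made-up values)
external <- data.frame(iso3 = c("PHL", "XYZ", "FRA"), cases = c(10, 3, 7))
external$who_region <- region_lookup(external$iso3)
external
```

Rows whose code is not in the lookup table keep an NA region, so they are easy to spot and review after the join.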

The companion helper iso3_to_m49() converts ISO3 codes to UN M49 numeric codes — useful because the WHO GHO API is keyed by ISO3 ("PHL") while the UN SDG API is keyed by M49 ("608"). The M49 values are returned as three-character zero-padded strings, exactly as stored in who_countries$m49_code.

iso3_to_m49(c("PHL", "FRA", "JPN"))
#> [1] "608" "250" "392"

# Case-insensitive; non-Member areas return NA
iso3_to_m49(c("phl", "PRI"))
#> [1] "608" NA

In practice you can usually skip the explicit conversion: sdg_data() and sdg_coverage() accept ISO3 codes for their area argument and do the lookup internally (see the SDG section below).
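M49 codes sometimes arrive as integers or unpadded strings (the raw geoAreaCode column in the SDG section shows "36" for Australia), and in that form they will not match who_countries$m49_code. A minimal base R sketch of re-padding them follows; pad_m49() is a hypothetical helper, not a DSIR function.

```r
pad_m49 <- function(code) {
  # Coerce to integer first so "36", 36 and "036" all normalise the same way,
  # then left-pad with zeros to three characters
  sprintf("%03d", as.integer(code))
}

pad_m49(c("36", "608", "4"))
```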

Checking availability before fetching

GHO has thousands of indicators, but any single indicator may not cover the countries or years you need. Before issuing a full download with gho_data(), three lightweight helpers let you ask the server what is available without transferring any observations.

gho_has_data() is a quick yes / no for a given indicator and filter — useful when screening a list of candidate indicators.

# Does WHO have life-expectancy data for France?
gho_has_data("WHOSIS_000001", area = "FRA")
#> Assuming `spatial_type` = "country" since `area` was given.
#> ℹ Pass `spatial_type` explicitly to silence this message.
#> Fetching:
#> <https://ghoapi.azureedge.net/api/WHOSIS_000001?$filter=SpatialDimType%20eq%20%27COUNTRY%27%20and%20SpatialDim%20in%20%28%27FRA%27%29&$top=1&$select=Id>
#> [1] TRUE

# Bulk-screen several indicators at once
inds <- c("WHOSIS_000001", "NCDMORT3070", "MDG_0000000026")
vapply(inds, gho_has_data, logical(1), area = "PHL")
#> Assuming `spatial_type` = "country" since `area` was given.
#> ℹ Pass `spatial_type` explicitly to silence this message.
#> Fetching:
#> <https://ghoapi.azureedge.net/api/WHOSIS_000001?$filter=SpatialDimType%20eq%20%27COUNTRY%27%20and%20SpatialDim%20in%20%28%27PHL%27%29&$top=1&$select=Id>
#> Assuming `spatial_type` = "country" since `area` was given.
#> ℹ Pass `spatial_type` explicitly to silence this message.
#> Fetching:
#> <https://ghoapi.azureedge.net/api/NCDMORT3070?$filter=SpatialDimType%20eq%20%27COUNTRY%27%20and%20SpatialDim%20in%20%28%27PHL%27%29&$top=1&$select=Id>
#> Assuming `spatial_type` = "country" since `area` was given.
#> ℹ Pass `spatial_type` explicitly to silence this message.
#> Fetching:
#> <https://ghoapi.azureedge.net/api/MDG_0000000026?$filter=SpatialDimType%20eq%20%27COUNTRY%27%20and%20SpatialDim%20in%20%28%27PHL%27%29&$top=1&$select=Id>
#>  WHOSIS_000001    NCDMORT3070 MDG_0000000026 
#>           TRUE           TRUE           TRUE

gho_has_data() returns TRUE, FALSE, or NA. NA signals a failed request, which includes a non-existent indicator code (GHO returns HTTP 404 in that case).

gho_count() returns the number of rows the same filter would produce, which is useful for sizing a download.

gho_count("WHOSIS_000001", area = wpro_cty)
#> Assuming `spatial_type` = "country" since `area` was given.
#> ℹ Pass `spatial_type` explicitly to silence this message.
#> Fetching:
#> <https://ghoapi.azureedge.net/api/WHOSIS_000001?$filter=SpatialDimType%20eq%20%27COUNTRY%27%20and%20SpatialDim%20in%20%28%27AUS%27%2C%27BRN%27%2C%27CHN%27%2C%27COK%27%2C%27FJI%27%2C%27FSM%27%2C%27IDN%27%2C%27JPN%27%2C%27KHM%27%2C%27KIR%27%2C%27KOR%27%2C%27LAO%27%2C%27MHL%27%2C%27MNG%27%2C%27MYS%27%2C%27NIU%27%2C%27NRU%27%2C%27NZL%27%2C%27PHL%27%2C%27PLW%27%2C%27PNG%27%2C%27SGP%27%2C%27SLB%27%2C%27TON%27%2C%27TUV%27%2C%27VNM%27%2C%27VUT%27%2C%27WSM%27%29&$top=0&$count=true>
#> [1] 1452

gho_coverage() summarises year coverage and observation counts per country. The payload is small because only SpatialDim and TimeDim are requested from the server.

gho_coverage("WHOSIS_000001", area = c("FRA", "DEU", "JPN"))
#> Fetching:
#> <https://ghoapi.azureedge.net/api/WHOSIS_000001?$filter=SpatialDimType%20eq%20%27COUNTRY%27%20and%20SpatialDim%20in%20%28%27FRA%27%2C%27DEU%27%2C%27JPN%27%29&$select=SpatialDim,TimeDim>
#> # A tibble: 3 × 4
#>   location year_min year_max n_obs
#>   <chr>       <int>    <int> <int>
#> 1 DEU          2000     2021    66
#> 2 FRA          2000     2021    66
#> 3 JPN          2000     2021    66
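The request URLs printed in this section follow the OData query pattern: a $filter on SpatialDimType and SpatialDim, plus $top, $count, or $select to limit what comes back. The sketch below rebuilds the gho_has_data() URL in base R to show how the pieces fit together. build_gho_url() is a hypothetical helper written for illustration; DSIR's internals may assemble the request differently.

```r
build_gho_url <- function(indicator, area, extra = "$top=1&$select=Id") {
  # Quote each ISO3 code and join the codes into an OData `in (...)` list
  area_list <- paste0("'", area, "'", collapse = ",")
  filter <- paste0(
    "SpatialDimType eq 'COUNTRY' and SpatialDim in (", area_list, ")"
  )
  # reserved = TRUE percent-encodes spaces, quotes, commas and parentheses
  paste0(
    "https://ghoapi.azureedge.net/api/", indicator,
    "?$filter=", utils::URLencode(filter, reserved = TRUE),
    "&", extra
  )
}

build_gho_url("WHOSIS_000001", "FRA")
```

Swapping the extra argument for "$top=0&$count=true" or "$select=SpatialDim,TimeDim" gives the shapes printed for gho_count() and gho_coverage().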

Fetching indicator data from GHO

To fetch indicators from GHO, the typical workflow is three steps: search for the indicator code, fetch the data, then clean the response. The area argument accepts a long ISO3 vector, so a whole region can be pulled in one call.

Step 1: Search for an indicator

gho_indicators("UHC") |> head()
#> Fetching:
#> <https://ghoapi.azureedge.net/api/Indicator?$filter=contains%28tolower%28IndicatorName%29%2C%27uhc%27%29>
#> # A tibble: 6 × 3
#>   IndicatorCode       IndicatorName                                     Language
#>   <chr>               <chr>                                             <chr>   
#> 1 HSS_UHCLEGISLATION  Countries that have passed legislation on Univer… EN      
#> 2 GOE_Q002            strategy refers to the use of ICT to support UHC  EN      
#> 3 GOE_Q004            National eHealth strategy refers to objectives o… EN      
#> 4 GOE_Q070            Strategy includes objectives as to how telehealt… EN      
#> 5 UHC_DATA_AVAIL_CODE Data availability for UHC index of essential ser… EN      
#> 6 UHC_SCI_CAPACITY    UHC Service Coverage sub-index on service capaci… EN

Pick an IndicatorCode from the result — this is the value you pass to gho_data() in the next step.

Step 2: Fetch the data

uhc <- gho_data(
  indicator    = "UHC_INDEX_REPORTED",
  spatial_type = "country",
  area         = wpro_cty,
  year_from    = 2015
)
#> Fetching:
#> <https://ghoapi.azureedge.net/api/UHC_INDEX_REPORTED?$filter=SpatialDimType%20eq%20%27COUNTRY%27%20and%20SpatialDim%20in%20%28%27AUS%27%2C%27BRN%27%2C%27CHN%27%2C%27COK%27%2C%27FJI%27%2C%27FSM%27%2C%27IDN%27%2C%27JPN%27%2C%27KHM%27%2C%27KIR%27%2C%27KOR%27%2C%27LAO%27%2C%27MHL%27%2C%27MNG%27%2C%27MYS%27%2C%27NIU%27%2C%27NRU%27%2C%27NZL%27%2C%27PHL%27%2C%27PLW%27%2C%27PNG%27%2C%27SGP%27%2C%27SLB%27%2C%27TON%27%2C%27TUV%27%2C%27VNM%27%2C%27VUT%27%2C%27WSM%27%29%20and%20TimeDim%20ge%202015>

uhc |> glimpse()
#> Rows: 252
#> Columns: 25
#> $ Id                 <int> 9203621, 9208141, 9222972, 9225884, 9233499, 932452…
#> $ IndicatorCode      <chr> "UHC_INDEX_REPORTED", "UHC_INDEX_REPORTED", "UHC_IN…
#> $ SpatialDimType     <chr> "COUNTRY", "COUNTRY", "COUNTRY", "COUNTRY", "COUNTR…
#> $ SpatialDim         <chr> "CHN", "SLB", "PLW", "FSM", "FSM", "KOR", "BRN", "W…
#> $ TimeDimType        <chr> "YEAR", "YEAR", "YEAR", "YEAR", "YEAR", "YEAR", "YE…
#> $ ParentLocationCode <chr> "WPR", "WPR", "WPR", "WPR", "WPR", "WPR", "WPR", "W…
#> $ ParentLocation     <chr> "Western Pacific", "Western Pacific", "Western Paci…
#> $ Dim1Type           <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
#> $ TimeDim            <int> 2016, 2021, 2018, 2015, 2022, 2022, 2018, 2023, 201…
#> $ Dim1               <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
#> $ Dim2Type           <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
#> $ Dim2               <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
#> $ Dim3Type           <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
#> $ Dim3               <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
#> $ DataSourceDimType  <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
#> $ DataSourceDim      <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
#> $ Value              <chr> "80", "46", "74", "64", "65", "88", "83", "62", "70…
#> $ NumericValue       <dbl> 80, 46, 74, 64, 65, 88, 83, 62, 70, 89, 65, 84, 66,…
#> $ Low                <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
#> $ High               <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
#> $ Comments           <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
#> $ Date               <chr> "2025-12-05T11:39:13.277+01:00", "2025-12-05T11:39:…
#> $ TimeDimensionValue <chr> "2016", "2021", "2018", "2015", "2022", "2022", "20…
#> $ TimeDimensionBegin <chr> "2016-01-01T00:00:00+01:00", "2021-01-01T00:00:00+0…
#> $ TimeDimensionEnd   <chr> "2016-12-31T00:00:00+01:00", "2021-12-31T00:00:00+0…

Because area accepts long ISO3 vectors, this single call covers all 28 WPR Member States. The 252 rows are consistent with 28 countries × 9 years (2015 to 2023).

Step 3: Clean the raw response

gho_clean() produces the unified DSIR cleaned-indicator schema, the same 15-column shape as sdg_clean(). The columns are source ("gho"), id, indicator, location, iso3, location_name, year, value, value_num, low, high, series (NA for GHO), and the three optional GHO dimensions dim1–dim3. Columns absent from the raw response are filled with typed NA.

uhc_clean <- gho_clean(uhc)
#> Fetching: <https://ghoapi.azureedge.net/api/Indicator>
uhc_clean
#> # A tibble: 252 × 15
#>    source id        indicator location iso3  location_name  year value value_num
#>    <chr>  <chr>     <chr>     <chr>    <chr> <chr>         <int> <chr>     <dbl>
#>  1 gho    UHC_INDE… UHC Serv… AUS      AUS   Australia      2015 89           89
#>  2 gho    UHC_INDE… UHC Serv… AUS      AUS   Australia      2016 89           89
#>  3 gho    UHC_INDE… UHC Serv… AUS      AUS   Australia      2017 89           89
#>  4 gho    UHC_INDE… UHC Serv… AUS      AUS   Australia      2018 89           89
#>  5 gho    UHC_INDE… UHC Serv… AUS      AUS   Australia      2019 89           89
#>  6 gho    UHC_INDE… UHC Serv… AUS      AUS   Australia      2020 89           89
#>  7 gho    UHC_INDE… UHC Serv… AUS      AUS   Australia      2021 89           89
#>  8 gho    UHC_INDE… UHC Serv… AUS      AUS   Australia      2022 89           89
#>  9 gho    UHC_INDE… UHC Serv… AUS      AUS   Australia      2023 89           89
#> 10 gho    UHC_INDE… UHC Serv… BRN      BRN   Brunei Darus…  2015 84           84
#> # ℹ 242 more rows
#> # ℹ 6 more variables: low <dbl>, high <dbl>, series <chr>, dim1 <chr>,
#> #   dim2 <chr>, dim3 <chr>
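Filling absent columns with typed NA matters because it keeps column types stable when cleaned frames are later stacked. A minimal base R sketch of the idea, assuming a hypothetical three-column excerpt of the schema; fill_typed_na() is an illustrative name, not DSIR's implementation.

```r
# Hypothetical excerpt of a target schema: column name -> typed NA
schema <- list(
  year      = NA_integer_,
  value_num = NA_real_,
  series    = NA_character_
)

fill_typed_na <- function(df, schema) {
  for (col in setdiff(names(schema), names(df))) {
    # rep() preserves the NA's type, so the new column is integer,
    # double or character rather than the default logical NA
    df[[col]] <- rep(schema[[col]], nrow(df))
  }
  df[names(schema)]  # return the columns in schema order
}

raw <- data.frame(year = c(2015L, 2016L), value_num = c(89, 84))
cleaned <- fill_typed_na(raw, schema)
str(cleaned)
```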

Aggregating indicators with geomean()

Some health indicators are constructed as the geometric mean of component values rather than the arithmetic mean. The UHC Service Coverage Index, for example, aggregates 14 tracer indicators using nested geometric means. DSIR provides geomean() for this:

# Unweighted geometric mean
geomean(c(0.6, 0.8, 0.95))
#> [1] 0.7697002

# With optional weights — useful when tracers have different 
# methodological importance
geomean(c(0.6, 0.8, 0.95), w = c(2, 1, 1))
#> [1] 0.7232343

geomean() handles missing values, zeros, and negative values sensibly — see ?geomean for details. It is a small helper, but it removes a common source of bugs when re-implementing index calculations from indicator components.
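The underlying calculation is the exponential of the weighted mean of the logs. The base R sketch below assumes weights are normalised by their sum and that all inputs are strictly positive; geomean_sketch() is a hypothetical name and skips the missing-value, zero, and negative handling that geomean() provides.

```r
geomean_sketch <- function(x, w = rep(1, length(x))) {
  # Sketch only: no NA, zero or negative handling
  stopifnot(length(x) == length(w), all(x > 0))
  # Weighted mean of the logs, mapped back to the original scale
  exp(sum(w * log(x)) / sum(w))
}

geomean_sketch(c(0.6, 0.8, 0.95))
geomean_sketch(c(0.6, 0.8, 0.95), w = c(2, 1, 1))
```

Under those assumptions the two calls give about 0.7697 and 0.7232, in line with the geomean() results printed above.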

Plotting with theme_dsi() and theme_dsi_facet()

DSIR provides two paired ggplot2 themes tuned for WHO-style charts — clean panels, modest grids, and a consistent accent colour. Use them as drop-in replacements for theme_minimal() and theme_bw() respectively whenever a chart is heading into a WHO deliverable.

The rule of thumb is simple: single-panel plots use theme_dsi(), faceted plots use theme_dsi_facet(). The two share typography, title treatment, and legend handling, but differ in how they frame the data — the facet variant adds panel borders, light strip backgrounds, and breathing room between panels, all of which would look heavy on a single-panel chart.

Single panel: `theme_dsi()`

theme_dsi() keeps the chart chrome minimal — a half-frame axis, light grid lines, and the WHO-blue accent on the axis line. By default the grid runs in both directions; pass grid = "y" for the minimalist horizontal-only look.

uhc_clean |>
  filter(iso3 %in% c("AUS", "CHN", "PHL", "FJI")) |>
  left_join(who_countries, by = "iso3") |>
  ggplot(aes(x = year, y = value_num, group = iso3, color = name_short)) +
  geom_line(linewidth = .8) +
  geom_point(size = 1.8) +
  theme_dsi() +
  labs(
    title    = "UHC Service Coverage Index, selected WPR Member States",
    subtitle = "2015 onwards",
    x = NULL, y = "SCI", color = NULL
  )

For bar charts, pair theme_dsi() with scale_y_dsi_col() (or scale_x_dsi_col() when value is mapped to x) — these are thin wrappers around scale_*_continuous() that remove the lower axis expansion, so columns sit flush with the axis instead of floating above it.

uhc_clean |>
  filter(year == max(year)) |>
  left_join(who_countries, by = "iso3") |>
  arrange(desc(value_num)) |>
  head(10) |>
  ggplot(aes(reorder(name_short, value_num), value_num)) +
  geom_col(fill = "#0093D5") +
  coord_flip() +
  scale_y_dsi_col() +
  theme_dsi(grid = "x") +
  labs(
    title    = "UHC Service Coverage Index, top 10 WPR Member States",
    subtitle = "Latest available year",
    x = NULL, y = "SCI"
  )

Faceted: `theme_dsi_facet()`

When the same chart is split across many small panels, the half-frame look becomes visually noisy — the accent-blue axis line repeats across every facet. theme_dsi_facet() switches to a full panel border, adds a light grey strip background to clearly mark each facet’s label, and introduces panel spacing so adjacent panels don’t run together.

uhc_clean |>
  left_join(who_countries, by = "iso3") |>
  filter(is_pic) |>
  ggplot(aes(x = year, y = value_num)) +
  geom_line(color = "#0093D5", linewidth = 0.8) +
  geom_point(color = "#0093D5", size = 1.5) +
  facet_wrap(~ name_short, ncol = 4) +
  theme_dsi_facet() +
  labs(
    title    = "UHC Service Coverage Index, Pacific Island Countries",
    subtitle = "Each panel shows one country's trajectory",
    x = NULL, y = "SCI"
  )

The strip_fill argument lets you change the strip background colour for emphasis — for example, a light-blue tone derived from the WHO accent for a deliverable where the strips themselves carry meaning:

uhc_clean |>
  left_join(who_countries, by = "iso3") |>
  filter(is_pic) |>
  ggplot(aes(x = year, y = value_num)) +
  geom_line(color = "#0093D5", linewidth = 0.8) +
  facet_wrap(~ name_short, ncol = 4) +
  theme_dsi_facet(strip_fill = "#E5F4FB") +
  labs(title = "UHC SCI, PIC — with custom strip colour",
       x = NULL, y = "SCI")

Tables with dsi_flextable_defaults()

dsi_flextable_defaults() sets WHO-style defaults for flextable globally — booktabs theme, bold headers, modest padding. Call it once near the top of your report and every subsequent flextable() picks up the formatting.

library(flextable)
dsi_flextable_defaults(font_family = "Georgia")

uhc_clean |>
  filter(year == max(year)) |>
  left_join(who_countries, by = "iso3") |>
  select(name_short, value_num) |>
  arrange(desc(value_num)) |>
  flextable() |>
  set_table_properties("autofit", width = .6) |>
  set_caption("UHC SCI in WPR, latest year")

UHC SCI in WPR, latest year
name_short	value_num
Australia	89
New Zealand	89
Republic of Korea	88
Singapore	88
Japan	86
China	85
Brunei Darussalam	84
Malaysia	80
Cook Islands	75
Palau	75
Tonga	71
Viet Nam	71
Mongolia	70
Fiji	69
Philippines	69
Indonesia	67
Niue	67
Marshall Islands	66
Micronesia	65
Tuvalu	65
Lao PDR	64
Cambodia	62
Nauru	62
Samoa	62
Vanuatu	52
Kiribati	51
Solomon Islands	47
Papua New Guinea	32

Working with SDG indicators

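sdg_data() and sdg_clean() follow the same fetch-then-tidy pattern as their GHO counterparts. The main differences are that indicator codes use the dotted SDG format (e.g. "3.4.1") and that the raw sdg_data() response keeps value, lowerBound, and upperBound as character, because the SDG API returns non-numeric entries ("<0.1", aggregate notes) for some rows. sdg_clean() retains the original string in value and adds the numeric value_num, low, and high columns. If you work with the raw response instead, coerce with as.numeric() only when you are ready to drop the non-numeric entries.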
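What that coercion does to non-numeric entries can be seen on a small made-up vector. The values below are illustrative, not taken from the API.

```r
raw_value <- c("9.8", "<0.1", "13", NA)

# as.numeric() turns anything it cannot parse into NA (with a warning)
value_num <- suppressWarnings(as.numeric(raw_value))

# Flag entries that were present but not numeric, so they can be reviewed
# before being dropped
non_numeric <- !is.na(raw_value) & is.na(value_num)

data.frame(raw_value, value_num, non_numeric)
```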

sdg_indicators() accepts an optional search argument with the same behaviour as gho_indicators() — multiple keywords are AND-ed together and matched case-insensitively against the indicator description. The filter runs client-side because the UN SDG indicator list is short (~250 rows) and the endpoint is not OData.

# All indicators that mention both mortality and cancer
sdg_indicators("mortality cancer")
#> Fetching:
#> <https://unstats.un.org/sdgs/UNSDGAPIV5/v1/sdg/Indicator/List>
#> # A tibble: 1 × 7
#>   goal  target code  description                              tier  uri   series
#>   <chr> <chr>  <chr> <chr>                                    <chr> <chr> <list>
#> 1 3     3.4    3.4.1 Mortality rate attributed to cardiovasc… 1     /v1/… <df>

# Same AND logic, with terms passed as a character vector (allows whitespace inside a term)
sdg_indicators(c("maternal", "mortality"))
#> Fetching:
#> <https://unstats.un.org/sdgs/UNSDGAPIV5/v1/sdg/Indicator/List>
#> # A tibble: 1 × 7
#>   goal  target code  description              tier  uri                   series
#>   <chr> <chr>  <chr> <chr>                    <chr> <chr>                 <list>
#> 1 3     3.1    3.1.1 Maternal mortality ratio 1     /v1/sdg/Indicator/3.… <df>
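The AND-ed, case-insensitive matching described above can be sketched client-side in a few lines of base R. match_all_terms() is a hypothetical helper and the three descriptions are illustrative strings, not the real SDG indicator list.

```r
match_all_terms <- function(description, search) {
  # A single string is split on whitespace; a vector is used as given,
  # which lets an individual term contain spaces
  terms <- if (length(search) == 1) {
    strsplit(trimws(search), "\\s+")[[1]]
  } else {
    search
  }
  desc <- tolower(description)
  # One logical vector per term, combined with AND
  hits <- lapply(tolower(terms), function(term) grepl(term, desc, fixed = TRUE))
  Reduce(`&`, hits)
}

descriptions <- c(
  "Maternal mortality ratio",
  "Mortality rate attributed to cancer",
  "Neonatal mortality rate"
)

descriptions[match_all_terms(descriptions, "mortality cancer")]
descriptions[match_all_terms(descriptions, c("maternal", "mortality"))]
```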

The area argument of sdg_data() and sdg_coverage() accepts either ISO3 codes (converted internally via iso3_to_m49()) or UN M49 numeric codes — so DSIR’s regional vectors (wpro_cty, afro_cty, etc.) work directly, the same way they do with the GHO client. Do not mix the two formats in a single call.

# ISO3 — regional vector passed straight through
sdg <- sdg_data(
  indicator = "3.4.1",
  area      = wpro_cty
)
#> Fetching:
#> <https://unstats.un.org/sdgs/UNSDGAPIV5/v1/sdg/Indicator/Data?indicator=3.4.1&pageSize=1000&areaCode=036&areaCode=096&areaCode=156&areaCode=184&areaCode=242&areaCode=583&areaCode=360&areaCode=392&areaCode=116&areaCode=296&areaCode=410&areaCode=418&areaCode=584&areaCode=496&areaCode=458&areaCode=570&areaCode=520&areaCode=554&areaCode=608&areaCode=585&areaCode=598&areaCode=702&areaCode=090&areaCode=776&areaCode=798&areaCode=704&areaCode=548&areaCode=882&page=1>
sdg |> glimpse()
#> Rows: 462
#> Columns: 21
#> $ goal              <list> "3", "3", "3", "3", "3", "3", "3", "3", "3", "3", "…
#> $ target            <list> "3.4", "3.4", "3.4", "3.4", "3.4", "3.4", "3.4", "3…
#> $ indicator         <list> "3.4.1", "3.4.1", "3.4.1", "3.4.1", "3.4.1", "3.4.1…
#> $ series            <chr> "SH_DTH_NCOM", "SH_DTH_NCOM", "SH_DTH_NCOM", "SH_DTH…
#> $ seriesDescription <chr> "Mortality rate attributed to cardiovascular disease…
#> $ seriesCount       <chr> "4326", "4326", "4326", "4326", "4326", "4326", "432…
#> $ geoAreaCode       <chr> "36", "36", "36", "36", "36", "36", "36", "36", "36"…
#> $ geoAreaName       <chr> "Australia", "Australia", "Australia", "Australia", …
#> $ timePeriodStart   <int> 2000, 2000, 2000, 2005, 2005, 2005, 2010, 2010, 2010…
#> $ value             <chr> "9.8", "13", "16", "11.4", "8.7", "14", "12.1", "9.9…
#> $ valueType         <chr> "Float", "Float", "Float", "Float", "Float", "Float"…
#> $ time_detail       <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
#> $ timeCoverage      <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
#> $ upperBound        <chr> "11.2", "14.7", "18", "12.9", "10", "15.8", "13.8", …
#> $ lowerBound        <chr> "8.4", "11.3", "14.1", "9.8", "7.5", "12.2", "10.4",…
#> $ basePeriod        <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
#> $ source            <chr> "Global Health Estimates 2021: Deaths by Cause, Age,…
#> $ geoInfoUrl        <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
#> $ footnotes         <list> "Data was previously disseminated with a different …
#> $ attributes        <df[,1]> <data.frame[26 x 1]>
#> $ dimensions        <df[,1]> <data.frame[26 x 1]>

# M49 also works (e.g. when copy-pasting codes from sdg_areas())
sdg_data("3.4.1", area = c("608", "250"))
#> Fetching:
#> <https://unstats.un.org/sdgs/UNSDGAPIV5/v1/sdg/Indicator/Data?indicator=3.4.1&pageSize=1000&areaCode=608&areaCode=250&page=1>
#> # A tibble: 42 × 21
#>    goal      target indicator series   seriesDescription seriesCount geoAreaCode
#>    <list>    <list> <list>    <chr>    <chr>             <chr>       <chr>      
#>  1 <chr [1]> <chr>  <chr [1]> SH_DTH_… Mortality rate a… 4326        250        
#>  2 <chr [1]> <chr>  <chr [1]> SH_DTH_… Mortality rate a… 4326        250        
#>  3 <chr [1]> <chr>  <chr [1]> SH_DTH_… Mortality rate a… 4326        250        
#>  4 <chr [1]> <chr>  <chr [1]> SH_DTH_… Mortality rate a… 4326        250        
#>  5 <chr [1]> <chr>  <chr [1]> SH_DTH_… Mortality rate a… 4326        250        
#>  6 <chr [1]> <chr>  <chr [1]> SH_DTH_… Mortality rate a… 4326        250        
#>  7 <chr [1]> <chr>  <chr [1]> SH_DTH_… Mortality rate a… 4326        250        
#>  8 <chr [1]> <chr>  <chr [1]> SH_DTH_… Mortality rate a… 4326        250        
#>  9 <chr [1]> <chr>  <chr [1]> SH_DTH_… Mortality rate a… 4326        250        
#> 10 <chr [1]> <chr>  <chr [1]> SH_DTH_… Mortality rate a… 4326        250        
#> # ℹ 32 more rows
#> # ℹ 14 more variables: geoAreaName <chr>, timePeriodStart <int>, value <chr>,
#> #   valueType <chr>, time_detail <lgl>, timeCoverage <lgl>, upperBound <chr>,
#> #   lowerBound <chr>, basePeriod <lgl>, source <chr>, geoInfoUrl <lgl>,
#> #   footnotes <list>, attributes <df[,1]>, dimensions <df[,1]>

sdg_clean(sdg)
#> # A tibble: 462 × 15
#>    source id    indicator     location iso3  location_name  year value value_num
#>    <chr>  <chr> <chr>         <chr>    <chr> <chr>         <int> <chr>     <dbl>
#>  1 sdg    3.4.1 Mortality ra… 116      KHM   Cambodia       2000 28.1       28.1
#>  2 sdg    3.4.1 Mortality ra… 116      KHM   Cambodia       2000 25.4       25.4
#>  3 sdg    3.4.1 Mortality ra… 116      KHM   Cambodia       2000 31.8       31.8
#>  4 sdg    3.4.1 Mortality ra… 116      KHM   Cambodia       2005 22.5       22.5
#>  5 sdg    3.4.1 Mortality ra… 116      KHM   Cambodia       2005 25.6       25.6
#>  6 sdg    3.4.1 Mortality ra… 116      KHM   Cambodia       2005 29.7       29.7
#>  7 sdg    3.4.1 Mortality ra… 116      KHM   Cambodia       2010 24.4       24.4
#>  8 sdg    3.4.1 Mortality ra… 116      KHM   Cambodia       2010 20.9       20.9
#>  9 sdg    3.4.1 Mortality ra… 116      KHM   Cambodia       2010 29.1       29.1
#> 10 sdg    3.4.1 Mortality ra… 116      KHM   Cambodia       2015 28.3       28.3
#> # ℹ 452 more rows
#> # ℹ 6 more variables: low <dbl>, high <dbl>, series <chr>, dim1 <chr>,
#> #   dim2 <chr>, dim3 <chr>

sdg_clean() produces the same 15-column schema as gho_clean(), so the two outputs can be combined directly with bind_indicators(). SDG rows populate the series column (and, for Member States, the iso3 column via m49_to_iso3()), while the GHO-only dim1–dim3 columns are left as NA.

Combining GHO and SDG with bind_indicators()

When an analysis pulls indicators from both sources, bind_indicators() stacks any number of cleaned tibbles into one. The source column ("gho" / "sdg") lets you filter or facet by origin without remembering which frame came from which API.

# Two indicators on the same topic from different APIs:
#   GHO NCDMORT3070 (probability of premature NCD mortality)
#   SDG 3.4.1       (mortality rate from NCDs)
gho_ncd <- gho_data("NCDMORT3070", area = wpro_cty) |> gho_clean()
#> Assuming `spatial_type` = "country" since `area` was given.
#> ℹ Pass `spatial_type` explicitly to silence this message.
#> Fetching:
#> <https://ghoapi.azureedge.net/api/NCDMORT3070?$filter=SpatialDimType%20eq%20%27COUNTRY%27%20and%20SpatialDim%20in%20%28%27AUS%27%2C%27BRN%27%2C%27CHN%27%2C%27COK%27%2C%27FJI%27%2C%27FSM%27%2C%27IDN%27%2C%27JPN%27%2C%27KHM%27%2C%27KIR%27%2C%27KOR%27%2C%27LAO%27%2C%27MHL%27%2C%27MNG%27%2C%27MYS%27%2C%27NIU%27%2C%27NRU%27%2C%27NZL%27%2C%27PHL%27%2C%27PLW%27%2C%27PNG%27%2C%27SGP%27%2C%27SLB%27%2C%27TON%27%2C%27TUV%27%2C%27VNM%27%2C%27VUT%27%2C%27WSM%27%29>
sdg_ncd <- sdg_data("3.4.1",        area = wpro_cty) |> sdg_clean()
#> Fetching:
#> <https://unstats.un.org/sdgs/UNSDGAPIV5/v1/sdg/Indicator/Data?indicator=3.4.1&pageSize=1000&areaCode=036&areaCode=096&areaCode=156&areaCode=184&areaCode=242&areaCode=583&areaCode=360&areaCode=392&areaCode=116&areaCode=296&areaCode=410&areaCode=418&areaCode=584&areaCode=496&areaCode=458&areaCode=570&areaCode=520&areaCode=554&areaCode=608&areaCode=585&areaCode=598&areaCode=702&areaCode=090&areaCode=776&areaCode=798&areaCode=704&areaCode=548&areaCode=882&page=1>
bind_indicators(gho_ncd, sdg_ncd) |> glimpse()
#> Rows: 1,914
#> Columns: 15
#> $ source        <chr> "gho", "gho", "gho", "gho", "gho", "gho", "gho", "gho", …
#> $ id            <chr> "NCDMORT3070", "NCDMORT3070", "NCDMORT3070", "NCDMORT307…
#> $ indicator     <chr> "Probability (%) of dying between age 30 and exact age 7…
#> $ location      <chr> "AUS", "AUS", "AUS", "AUS", "AUS", "AUS", "AUS", "AUS", …
#> $ iso3          <chr> "AUS", "AUS", "AUS", "AUS", "AUS", "AUS", "AUS", "AUS", …
#> $ location_name <chr> "Australia", "Australia", "Australia", "Australia", "Aus…
#> $ year          <int> 2000, 2000, 2000, 2001, 2001, 2001, 2002, 2002, 2002, 20…
#> $ value         <chr> "13.0 [11.3-14.7]", "16.0 [14.1-18.0]", "9.8 [8.4-11.2]"…
#> $ value_num     <dbl> 13.0, 16.0, 9.8, 9.6, 15.6, 12.6, 15.0, 9.6, 12.3, 9.1, …
#> $ low           <dbl> 11.3, 14.1, 8.4, 8.2, 13.7, 11.0, 13.1, 8.2, 10.7, 7.8, …
#> $ high          <dbl> 14.7, 18.0, 11.2, 11.0, 17.5, 14.3, 16.8, 10.9, 13.9, 10…
#> $ series        <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
#> $ dim1          <chr> "SEX_BTSX", "SEX_MLE", "SEX_FMLE", "SEX_FMLE", "SEX_MLE"…
#> $ dim2          <chr> "AGEGROUP_YEARS30-69", "AGEGROUP_YEARS30-69", "AGEGROUP_…
#> $ dim3          <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …

Exploring series with sdg_coverage()

A single SDG indicator often contains several series — for example different vaccines, sex strata, or causes of death — each with its own country and year coverage. Indicator "3.b.1" (vaccine coverage) is a clear case: it is published as four separate series (DTP3, MCV2, PCV3, HPV), and the year coverage of the newer vaccines is much shorter than that of DTP3.

sdg_coverage() summarises the year range and observation count per (location, series), so you can inspect what series exist and how each is covered before deciding which one to analyse.

sdg_coverage("3.b.1", area = c("156", "608"))
#> Fetching:
#> <https://unstats.un.org/sdgs/UNSDGAPIV5/v1/sdg/Indicator/Data?indicator=3.b.1&pageSize=1000&areaCode=156&areaCode=608&page=1>
#> # A tibble: 8 × 5
#>   location series      year_min year_max n_obs
#>   <chr>    <chr>          <int>    <int> <int>
#> 1 156      SH_ACS_DTP3     2000     2024    25
#> 2 156      SH_ACS_HPV      2010     2024    15
#> 3 156      SH_ACS_MCV2     2000     2024    25
#> 4 156      SH_ACS_PCV3     2008     2024    17
#> 5 608      SH_ACS_DTP3     2000     2024    25
#> 6 608      SH_ACS_HPV      2010     2024    15
#> 7 608      SH_ACS_MCV2     2000     2024    25
#> 8 608      SH_ACS_PCV3     2008     2024    17

Note that DSIR intentionally does not provide SDG analogues of gho_has_data() and gho_count(). SDG data is generally complete enough that those screening helpers add little value — the more useful pre-analysis question for SDG is “which series are available?”, which is what sdg_coverage() answers.

Where to next

Source code lives at https://github.com/shanlong-who/DSIR.
Bug reports, feature requests, and pull requests are all welcome — please file them on the GitHub issue tracker.