Introduction to CodelistGenerator

Creating a code list for dementia

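For this example we will generate a candidate codelist for dementia, looking only for codes in the condition domain. Let’s first load the libraries used below:

library(DBI)
library(dplyr)
library(CodelistGenerator)
library(DT) # datatable()
library(knitr) # kable()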

Connect to the OMOP CDM vocabularies

CodelistGenerator works with a cdm_reference to the vocabulary tables of the OMOP CDM, created with the CDMConnector package.

# example with postgres database connection details
db <- DBI::dbConnect(RPostgres::Postgres(),
  dbname = Sys.getenv("server"),
  port = Sys.getenv("port"),
  host = Sys.getenv("host"),
  user = Sys.getenv("user"),
  password = Sys.getenv("password")
)

# create cdm reference
cdm <- CDMConnector::cdm_from_con(
  con = db,
  cdm_schema = Sys.getenv("vocabulary_schema")
)
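
Because Sys.getenv() returns an empty string for an unset variable, a missing setting tends to surface only later as a confusing connection error. A minimal sketch of a pre-flight check, assuming the same environment variable names as above; checkEnvVars is a hypothetical helper written for this example and is not part of CodelistGenerator or CDMConnector:

```r
# Hypothetical helper: stop with a clear message if any of the required
# environment variables is unset (Sys.getenv() returns "" for those).
checkEnvVars <- function(vars) {
  values <- Sys.getenv(vars, unset = "")
  unsetVars <- vars[values == ""]
  if (length(unsetVars) > 0) {
    stop("Unset environment variables: ", paste(unsetVars, collapse = ", "))
  }
  invisible(TRUE)
}

# Variable names taken from the connection code above
required <- c("server", "port", "host", "user", "password", "vocabulary_schema")
# checkEnvVars(required) # uncomment to run before DBI::dbConnect()
```

Running the check before connecting turns a silent empty setting into an explicit error that names the variable.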

Check version of the vocabularies

The results from CodelistGenerator are specific to a particular version of the OMOP CDM vocabularies. We can check which version is in use as follows:

getVocabVersion(cdm = cdm)
#> [1] "v5.0 13-JUL-21"
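
In the OMOP CDM the release string is recorded in the vocabulary table, on the row whose vocabulary_id is "None". A minimal sketch of that lookup on a toy data frame; the table contents are made up for illustration, and this is not necessarily how getVocabVersion is implemented:

```r
# Toy stand-in for the OMOP vocabulary table (illustrative values only)
vocabulary <- data.frame(
  vocabulary_id = c("None", "SNOMED", "ICD10CM"),
  vocabulary_version = c("v5.0 13-JUL-21", "toy SNOMED release", "toy ICD10CM release"),
  stringsAsFactors = FALSE
)

# The overall vocabulary release is stored on the "None" row
vocabVersion <- vocabulary$vocabulary_version[vocabulary$vocabulary_id == "None"]
vocabVersion
```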

A code list from “Dementia” (4182210) and its descendants

The simplest approach to identifying potential codes is to take a high-level code and include all its descendants.

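# schema holding the vocabulary tables (same setting as used for the cdm reference)
vocabularyDatabaseSchema <- Sys.getenv("vocabulary_schema")

codesFromDescendants <- tbl(
  db,
  sql(paste0(
    "SELECT * FROM ",
    vocabularyDatabaseSchema,
    ".concept_ancestor"
  ))
) %>%
  # concept IDs are integers, so compare against a number
  filter(ancestor_concept_id == 4182210) %>%
  select("descendant_concept_id") %>%
  rename("concept_id" = "descendant_concept_id") %>%
  # attach names and metadata from the concept table
  left_join(
    tbl(db, sql(paste0(
      "SELECT * FROM ",
      vocabularyDatabaseSchema,
      ".concept"
    ))),
    by = "concept_id"
  ) %>%
  select(
    "concept_id", "concept_name",
    "domain_id", "vocabulary_id"
  ) %>%
  collect()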
datatable(codesFromDescendants,
  rownames = FALSE,
  options = list(
    pageLength = 10,
    lengthMenu = c(10, 20, 50)
  )
)
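
To make the logic of this query concrete, here is a minimal sketch on toy data frames. The concept IDs are made up, and the toy concept_ancestor table follows the OMOP convention that each concept is also listed as its own ancestor:

```r
# Toy concept_ancestor table: concept 1 is the ancestor of 2 and 3;
# concept 4 sits outside that hierarchy (illustrative IDs only)
conceptAncestor <- data.frame(
  ancestor_concept_id   = c(1, 1, 1, 2, 3, 4),
  descendant_concept_id = c(1, 2, 3, 2, 3, 4)
)

# Toy concept table with made-up IDs
concept <- data.frame(
  concept_id = c(1, 2, 3, 4),
  concept_name = c(
    "Dementia", "Vascular dementia",
    "Senile dementia", "Wandering due to dementia"
  ),
  stringsAsFactors = FALSE
)

# All descendants of concept 1 (including concept 1 itself)
descendantIds <- conceptAncestor$descendant_concept_id[
  conceptAncestor$ancestor_concept_id == 1
]

# Look up their names in the concept table
toyDescendants <- concept[concept$concept_id %in% descendantIds, ]
toyDescendants$concept_name
```

In this toy setup concept 4 mentions dementia in its name but is never returned, because it has no ancestor row linking it to concept 1.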

This picks up most of the relevant codes. However, it misses codes that are not descendants of 4182210. For example, “Wandering due to dementia” (37312577; https://athena.ohdsi.org/search-terms/terms/37312577) and “Anxiety due to dementia” (37312031; https://athena.ohdsi.org/search-terms/terms/37312031) are not picked up.

Generating a candidate code list using CodelistGenerator

To capture such terms as well, we can use CodelistGenerator.

First, let’s do a simple search for a single keyword of “dementia”, including descendants of the identified codes.

dementiaCodes1 <- getCandidateCodes(
  cdm = cdm,
  keywords = "dementia",
  domains = "Condition",
  includeDescendants = TRUE
)
datatable(dementiaCodes1,
  rownames = FALSE,
  options = list(
    pageLength = 10,
    lengthMenu = c(10, 20, 50)
  )
)
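
The idea behind a keyword search can be sketched in base R on a toy concept table. This only illustrates the general approach (case-insensitive matching on concept names within one domain) and is not the getCandidateCodes implementation, which offers further options such as includeDescendants; the IDs and rows below are made up:

```r
# Toy concept table (illustrative IDs and rows only)
toyConcept <- data.frame(
  concept_id = c(1, 2, 3, 4, 5),
  concept_name = c(
    "Dementia", "Vascular dementia", "Wandering due to dementia",
    "Asthma", "Dementia screening"
  ),
  domain_id = c("Condition", "Condition", "Condition", "Condition", "Procedure"),
  stringsAsFactors = FALSE
)

keyword <- "dementia"

# Case-insensitive match on the name, restricted to the Condition domain
isMatch <- grepl(keyword, tolower(toyConcept$concept_name), fixed = TRUE) &
  toyConcept$domain_id == "Condition"

toyCandidates <- toyConcept[isMatch, ]
toyCandidates$concept_id
```

Unlike the descendants-only approach, a name-based search does not depend on where a concept sits in the hierarchy, which is why it can find terms outside the descendants of a single code.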

Comparing code lists

What is the difference between this code list and the one from 4182210 and its descendants?

codeComparison <- compareCodelists(
  codesFromDescendants,
  dementiaCodes1
)
kable(codeComparison %>%
  group_by(codelist) %>%
  tally())
#> |codelist        |   n|
#> |:---------------|---:|
#> |Both            | 139|
#> |Only codelist 2 |  37|
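
The labels in this table come down to set membership. A minimal base R sketch of the same kind of comparison on two toy vectors of concept IDs (made-up values, and not the compareCodelists implementation):

```r
# Two toy codelists of concept IDs (illustrative values only)
codelist1 <- c(1, 2, 3)
codelist2 <- c(1, 2, 3, 4, 5)

# Label every ID by the list(s) it appears in
allIds <- union(codelist1, codelist2)
membership <- ifelse(
  allIds %in% codelist1 & allIds %in% codelist2, "Both",
  ifelse(allIds %in% codelist1, "Only codelist 1", "Only codelist 2")
)

toyComparison <- data.frame(
  concept_id = allIds,
  codelist = membership,
  stringsAsFactors = FALSE
)

# Count IDs per label
table(toyComparison$codelist)
```

As in the tally above, the absence of an "Only codelist 1" group means every code in the first list also appears in the second.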

What are these extra codes picked up by CodelistGenerator?

datatable(
  codeComparison %>%
    filter(codelist == "Only codelist 2"),
  rownames = FALSE,
  options = list(
    pageLength = 10,
    lengthMenu = c(10, 20, 50)
  )
)

Review mappings from non-standard vocabularies

Perhaps we want to see which ICD10CM or Read codes map to our candidate code list. We can get these by running:

icdMappings <- getMappings(
  cdm = cdm,
  candidateCodelist = dementiaCodes1,
  nonStandardVocabularies = "ICD10CM"
)
datatable(icdMappings,
  rownames = FALSE,
  options = list(
    pageLength = 10,
    lengthMenu = c(10, 20, 50)
  )
)
readMappings <- getMappings(
  cdm = cdm,
  candidateCodelist = dementiaCodes1,
  nonStandardVocabularies = "Read"
)
datatable(readMappings,
  rownames = FALSE,
  options = list(
    pageLength = 10,
    lengthMenu = c(10, 20, 50)
  )
)
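
In the OMOP vocabularies, non-standard source codes are linked to standard concepts through "Maps to" rows in the concept_relationship table. A minimal sketch of that lookup on toy data frames; all IDs and names are made up, and this illustrates the idea rather than the getMappings implementation:

```r
# Toy concept table: two standard concepts and three source codes
toyVocabConcept <- data.frame(
  concept_id = c(1, 2, 101, 102, 103),
  concept_name = c(
    "Dementia", "Asthma",
    "Toy ICD code A", "Toy ICD code B", "Toy Read code"
  ),
  vocabulary_id = c("SNOMED", "SNOMED", "ICD10CM", "ICD10CM", "Read"),
  stringsAsFactors = FALSE
)

# Toy concept_relationship table: source code (1) maps to standard concept (2)
conceptRelationship <- data.frame(
  concept_id_1 = c(101, 102, 103),
  concept_id_2 = c(1, 2, 1),
  relationship_id = "Maps to",
  stringsAsFactors = FALSE
)

candidateIds <- 1 # standard concepts in the toy candidate codelist
nonStandardVocabulary <- "ICD10CM"

# Relationships that point at one of the candidate concepts
mapsTo <- conceptRelationship[
  conceptRelationship$relationship_id == "Maps to" &
    conceptRelationship$concept_id_2 %in% candidateIds,
]

# Keep only source concepts from the requested vocabulary
sourceConcepts <- toyVocabConcept[
  toyVocabConcept$concept_id %in% mapsTo$concept_id_1 &
    toyVocabConcept$vocabulary_id == nonStandardVocabulary,
]
sourceConcepts$concept_id
```

Changing nonStandardVocabulary to "Read" in this toy example would return concept 103 instead, mirroring the two calls above.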