DRAFT: This is a work-in-progress!

Totalnum / Patient Counting in i2b2

Counts whisper softly,
concepts bloom into numbers,
truth hides in the tree.

ETL is wild,
but totalnum shows the gaps,
missingness exposed.

Researchers rejoice,
“<10” protects the secret,
yet trends still emerge.

Fast counts now arrive,
SQL flexes its muscles,
coffee still is hot.

Figure: i2b2 Totalnum Workflow
Ontology terms and patient data feed into the totalnum counting process, producing per-concept patient counts (`c_totalnum`). These counts are written back into the ontology for query optimization and visualization. Obfuscated versions are stored in `totalnum_report`, which can then be aggregated across sites to generate privacy-preserving network-level statistics for feasibility studies and data quality assessment.

Overview

The totalnum feature in i2b2 enables sites to calculate and store the number of patients associated with every concept in their ontology. Originally created as a query optimization tool, it has since evolved into a critical component for query development and data quality assessment.

In practice, each i2b2 ontology concept (e.g., a diagnosis code, a lab test, a demographic value) can be associated with the total number of unique patients who have at least one record matching that concept or any of its descendants. These counts can be displayed in the i2b2 web client, helping researchers gauge cohort feasibility before running queries.

Beyond individual sites, federated networks such as ENACT can aggregate these patient counts across institutions. This aggregation supports:

Feasibility studies – researchers can quickly see if a condition is represented across sites.
Data quality checks – missing or unexpected counts can signal mapping errors, ETL problems, or local coding differences.
Ontology usage insights – counts highlight which parts of the ontology are actively populated and which may be underused.
Privacy-preserving comparisons – obfuscation algorithms ensure sensitive patient counts are not directly exposed across sites.

With the introduction of Fast Totalnum in i2b2 v1.8, counts can now be generated 5–10× faster, making it feasible to refresh them regularly as part of ETL workflows.

Technical FAQ

What is `c_totalnum`?

c_totalnum is a column in i2b2 ontology tables that stores the number of distinct patients for each ontology item. It can be populated by running the totalnum stored procedures.

Why should I generate totalnum counts?

Query building: Users see approximate patient counts next to ontology items in the i2b2 web client.
Query optimization: The i2b2 server can use counts to execute queries more efficiently.
Data quality: Sites can identify suspicious gaps (e.g., missing common diagnoses) or outliers.
Network participation: ENACT and other networks use aggregated, obfuscated counts for benchmarking and feasibility analyses.

Who can view patient counts in the web client?

Only users with the DATA_AGG permission or higher can view counts in the ontology tree. This protects sensitive information while supporting legitimate research use.

How are counts computed?

The process involves:

Mapping ontology codes → Each ontology concept is linked to one or more codes in the fact or dimension tables.
Traversing the hierarchy → Counts include all child codes under a given concept.
Counting unique patients → Distinct patients are counted from observation, visit, or patient dimension tables.
Storing results → The counts are written back into ontology tables (c_totalnum) and optionally exported to totalnum_report for obfuscated sharing.

What is “Fast Totalnum”?

Introduced in i2b2 v1.8.
Uses a transitive closure table (precomputed ancestor–descendant pairs) to speed up hierarchical counting.
Typically reduces runtime from >10 hours to 1–3 hours, depending on database size.
Available for SQL Server, Oracle, and PostgreSQL.

What are the prerequisites for running totalnum?

Stored procedures must be installed in the metadata schema.
Procedures must have read access to the CRC schema (observation_fact, visit_dimension, patient_dimension).
Ontology basecodes (c_basecode) must match the codes in your fact tables—mapping through concept_dimension or adapter files is not supported.
Permissions: the database user must be able to both read CRC tables and update ontology tables.

How often should I refresh counts?

Best practice is to refresh after every ETL load or major ontology update. This keeps the patient counts aligned with current data and ensures network-level benchmarking is valid.

How does obfuscation work in networks like ENACT/SHRINE?

To protect privacy, counts are transformed before sharing:

Add Gaussian noise (σ = 6.5, capped at ±10).
Round to the nearest 5.
Return <10 if the result is below 10.

This makes individual site counts non-identifiable while preserving useful aggregate trends.

Figure: Counts appear in the ontology tree.

Figure: Data Quality Explorer network statistics.

How do I run Fast Totalnum?

SQL Server (T-SQL)

EXEC FastTotalnumPrep;       -- Run once (when ontology changes)
EXEC FastTotalnumCount;      -- Run patient counting
EXEC FastTotalnumOutput;     -- Write results to ontology + report tables

Oracle (PL/SQL)

BEGIN FastTotalnumPrep; END;
BEGIN FastTotalnumCount; END;
BEGIN FastTotalnumOutput; END;

Postgres

SELECT fasttotalnumprep();   -- Run once (ontology changes)
CALL fasttotalnumcount();    -- Counting step
CALL fasttotalnumoutput();   -- Output results

Can I use totalnum with OMOP data?

Yes. i2b2 v1.8 supports OMOP integration. Use the OMOP-specific totalnum preparation script (e.g., totalnum_fast_prep_OMOP.sql) and configure ontology mappings to OMOP tables such as drug_exposure.

Where are results stored?

Ontology tables (c_totalnum) → for query optimization and web client display.
totalnum_report table → obfuscated counts, keyed for cross-site aggregation.

DRAFT: This is a work-in-progress!

Totalnum / Patient Counting in i2b2

Overview

Technical FAQ

What is `c_totalnum`?

Why should I generate totalnum counts?

Who can view patient counts in the web client?

How are counts computed?

What is “Fast Totalnum”?

What are the prerequisites for running totalnum?

How often should I refresh counts?

How does obfuscation work in networks like ENACT/SHRINE?

How do I run Fast Totalnum?

Can I use totalnum with OMOP data?

Where are results stored?

Pages

Recently updated

Ontologies 102 - Patient Counts ("totalnum")

DRAFT: This is a work-in-progress!

Totalnum / Patient Counting in i2b2

Overview

Technical FAQ

What is c_totalnum?

Why should I generate totalnum counts?

Who can view patient counts in the web client?

How are counts computed?

What is “Fast Totalnum”?

What are the prerequisites for running totalnum?

How often should I refresh counts?

How does obfuscation work in networks like ENACT/SHRINE?

How do I run Fast Totalnum?

Can I use totalnum with OMOP data?

Where are results stored?

Pages

Recently updated

What is `c_totalnum`?