To create a high‐quality electronic health record (EHR)–derived mortality dataset for retrospective and prospective real‐world evidence generation.
Oncology EHR data, supplemented with external commercial and Social Security Death Index data, benchmarked to the National Death Index (NDI).
We developed a recent, linkable, high‐quality mortality variable amalgamated from multiple data sources to supplement EHR data, and benchmarked it against the NDI, the most complete source of U.S. mortality data. We report here the data quality of version 2.0 of the mortality variable.
For advanced non‐small‐cell lung cancer, sensitivity of mortality information improved from 66 percent in EHR structured data to 91 percent in the composite dataset, with high date agreement compared to the NDI. For advanced melanoma, metastatic colorectal cancer, and metastatic breast cancer, sensitivity of the final variable was 85 to 88 percent. Kaplan–Meier survival analyses showed that improving mortality data completeness minimized overestimation of survival relative to NDI‐based estimates.
For EHR‐derived data to yield reliable real‐world evidence, it needs to be of known and sufficiently high quality. Considering the impact of mortality data completeness on survival endpoints, we highlight the importance of data quality assessment and advocate benchmarking to the NDI.
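As an illustration of the two benchmarking metrics named above, the following minimal Python sketch computes sensitivity and date agreement for a candidate mortality source against a gold-standard benchmark. It is not the study's implementation: the function names, the patient identifiers, the toy dates, and the 15-day agreement window are all hypothetical and chosen only to make the definitions concrete.

```python
from datetime import date

def sensitivity(candidate, benchmark):
    """Share of benchmark (gold-standard) deaths that the candidate source also records."""
    benchmark_deaths = [pid for pid, d in benchmark.items() if d is not None]
    if not benchmark_deaths:
        raise ValueError("benchmark contains no deaths")
    captured = sum(1 for pid in benchmark_deaths if candidate.get(pid) is not None)
    return captured / len(benchmark_deaths)

def date_agreement(candidate, benchmark, window_days=0):
    """Among deaths recorded in both sources, share whose dates differ by at most window_days."""
    both = [
        pid for pid, d in benchmark.items()
        if d is not None and candidate.get(pid) is not None
    ]
    if not both:
        raise ValueError("no deaths recorded in both sources")
    agree = sum(
        1 for pid in both
        if abs((candidate[pid] - benchmark[pid]).days) <= window_days
    )
    return agree / len(both)

# Hypothetical toy data: patient id -> date of death, or None if no death is recorded.
benchmark = {
    "p1": date(2015, 3, 1),
    "p2": date(2015, 6, 10),
    "p3": date(2016, 1, 5),
    "p4": None,
}
candidate = {
    "p1": date(2015, 3, 1),    # captured, exact date match
    "p2": date(2015, 6, 20),   # captured, date off by 10 days
    "p3": None,                # death missed by the candidate source
    "p4": None,
}

sens = sensitivity(candidate, benchmark)
exact = date_agreement(candidate, benchmark)
within_15 = date_agreement(candidate, benchmark, window_days=15)  # window is an assumption
print(f"sensitivity={sens:.2f}, exact date agreement={exact:.2f}, within 15 days={within_15:.2f}")
```

In this toy example the candidate source misses one of three benchmark deaths. A missed death leaves that patient censored rather than counted as an event in a Kaplan–Meier analysis, which is how incomplete mortality capture leads to overestimated survival.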
Data Sources/Study Setting