Electronic health records often include diseases and procedures recorded as ICD codes. When building models based on health records, it is helpful to convert ICD codes to CCS codes, because CCS codes are better summaries of concrete disease entities and ICD codes are too fine-grained for most applications. There are over 14,400 different codes in the ICD-10 base classification; some expanded ICD editions include over 70,000 codes. In contrast, there are about 270 CCS codes.
What are ICD Codes?
ICD stands for “The International Statistical Classification of Diseases and Related Health Problems.” It was originally designed as a comprehensive healthcare classification system and is now commonly used for billing purposes, which is why you will often see ICD codes in medical records. The ICD system has been through various revisions. You are most likely to see codes from ICD-9 and ICD-10. If you want to look at all the ICD codes in one place, you can download the full list (e.g., for ICD-9).
Here are some illustrative examples of how incredibly fine-grained ICD codes can be (taken from ICD-9):
“Food poisoning” includes:
0050 | Staphylococcal food poisoning |
0051 | Botulism food poisoning |
0052 | Food poisoning due to Clostridium perfringens (C. welchii) |
0053 | Food poisoning due to other Clostridia |
0054 | Food poisoning due to Vibrio parahaemolyticus |
00581 | Food poisoning due to Vibrio vulnificus |
00589 | Other bacterial food poisoning |
0059 | Food poisoning, unspecified |
A few of the codes referring to erythematous (red) conditions of the skin:
69550 | Exfoliation due to erythematous condition involving less than 10 percent of body surface |
69551 | Exfoliation due to erythematous condition involving 10-19 percent of body surface |
69552 | Exfoliation due to erythematous condition involving 20-29 percent of body surface |
69553 | Exfoliation due to erythematous condition involving 30-39 percent of body surface |
69554 | Exfoliation due to erythematous condition involving 40-49 percent of body surface |
69555 | Exfoliation due to erythematous condition involving 50-59 percent of body surface |
69556 | Exfoliation due to erythematous condition involving 60-69 percent of body surface |
69557 | Exfoliation due to erythematous condition involving 70-79 percent of body surface |
69558 | Exfoliation due to erythematous condition involving 80-89 percent of body surface |
69559 | Exfoliation due to erythematous condition involving 90 percent or more of body surface |
Some bad encounters with nature…
E9050 | Venomous snakes and lizards causing poisoning and toxic reactions |
E9051 | Venomous spiders causing poisoning and toxic reactions |
E9052 | Scorpion sting causing poisoning and toxic reactions |
E9053 | Sting of hornets, wasps, and bees causing poisoning and toxic reactions |
E9054 | Centipede and venomous millipede (tropical) bite causing poisoning and toxic reactions |
E9055 | Other venomous arthropods causing poisoning and toxic reactions |
E9056 | Venomous marine animals and plants causing poisoning and toxic reactions |
E9057 | Poisoning and toxic reactions caused by other plants |
E9058 | Poisoning and toxic reactions caused by other specified animals and plants |
E9059 | Poisoning and toxic reactions caused by unspecified animals and plants |
E9060 | Dog bite |
E9061 | Rat bite |
E9062 | Bite of nonvenomous snakes and lizards |
E9063 | Bite of other animal except arthropod |
E9064 | Bite of nonvenomous arthropod |
E9065 | Bite by unspecified animal |
E9068 | Other specified injury caused by animal |
E9069 | Unspecified injury caused by animal |
E907 | Accident due to lightning |
E9080 | Hurricane |
E9081 | Tornado |
E9082 | Floods |
E9083 | Blizzard (snow) (ice) |
E9084 | Dust storm |
E9088 | Other cataclysmic storms |
E9089 | Unspecified cataclysmic storms, and floods resulting from storms |
E9090 | Earthquakes |
E9091 | Volcanic eruptions |
E9092 | Avalanche, landslide, or mudslide |
E9093 | Collapse of dam or man-made structure |
E9094 | Tidalwave caused by earthquake |
E9098 | Other cataclysmic earth surface movements and eruptions |
E9099 | Unspecified cataclysmic earth surface movements and eruptions |
How about some diabetes-related codes?
Basically, ICD codes can get quite specific, splitting a single health-related concept (e.g. “diabetes” or “food poisoning”) into many subcategories.
Why are ICD codes like this?
Because, sadly, they were not designed for downstream machine learning tasks. Using ICD codes to “label diseases” in a machine learning model ends up being somewhat analogous to using a face data set labeled with each individual person’s name (“Lee”, “Rowan”) rather than with physical characteristics (“with glasses”, “female”…).
The solution: CCS codes
Luckily, you do not have to create a mapping from “super detailed ICD code” to “medical concept” because experts have already created one for you, in the form of CCS codes. Each CCS code corresponds to a high-level medical concept and is paired with the corresponding ICD codes. CCS codes were originally developed as part of the Healthcare Cost and Utilization Project (HCUP), sponsored by the Agency for Healthcare Research and Quality (AHRQ). The mapping between CCS codes and ICD codes is completely free to download. There are CCS codes for diagnoses as well as for procedures. CCS codes have varying degrees of specificity: single-level CCS codes have categories like “Tuberculosis,” “Septicemia,” “E Codes: Motor vehicle traffic (MVT),” and “Other inflammatory condition of skin.” Multi-level CCS codes provide even higher-level groupings, e.g. “Tuberculosis” and “Septicemia” are grouped together (with some other conditions) under “Infectious and parasitic diseases.”
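As a minimal sketch of how such a mapping might be applied in Python, the snippet below looks up a CCS category for each ICD-9 code. The table `ICD9_TO_CCS` is a hand-typed toy stand-in for the downloadable HCUP file, and the names `normalize` and `icd9_to_ccs` are made up for illustration. Treat the pairings shown as examples to verify against the official mapping, not as a substitute for it.

```python
# Toy stand-in for the HCUP mapping file (illustrative only;
# load and verify the real download before relying on any pairing).
ICD9_TO_CCS = {
    "01190": "Tuberculosis",
    "0389": "Septicemia",
}


def normalize(code):
    """Strip whitespace and dots and upper-case, so '038.9' matches '0389'."""
    return code.strip().replace(".", "").upper()


def icd9_to_ccs(code, default="UNMAPPED"):
    """Return the CCS category for an ICD-9 code, or `default` if it is absent."""
    return ICD9_TO_CCS.get(normalize(code), default)


codes = ["038.9", "01190", "E9060"]
print([icd9_to_ccs(c) for c in codes])
```

Returning an explicit "UNMAPPED" marker rather than silently dropping unknown codes makes it easier to spot formatting mismatches (dots, padding, case) between your data and the mapping file.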
In general, even using the most granular CCS codes is a vast improvement over the wild granularity of ICD codes. The only data science circumstance in which I can imagine using raw ICD codes is if the project focuses on very specific subcategories of a single disease. In pretty much any other case with a large EHR data set, e.g. future diagnosis prediction, a model will benefit from use of CCS codes rather than ICD codes.
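To illustrate why this helps a model, here is a small sketch that turns one patient's list of ICD-9 codes into a binary feature vector over CCS categories. Everything here is an assumption made for the example: the four-entry `ICD9_TO_CCS` table is a toy stand-in for the real mapping, and `ccs_features` is a hypothetical helper name.

```python
# Toy mapping (illustrative only; replace with the real HCUP table).
ICD9_TO_CCS = {
    "01100": "Tuberculosis",
    "01190": "Tuberculosis",
    "0380": "Septicemia",
    "0389": "Septicemia",
}

# Fixed, sorted column order so every patient gets the same feature layout.
CCS_CATEGORIES = sorted(set(ICD9_TO_CCS.values()))


def ccs_features(icd_codes):
    """Return a 0/1 vector with one entry per CCS category."""
    present = {ICD9_TO_CCS[c] for c in icd_codes if c in ICD9_TO_CCS}
    return [1 if category in present else 0 for category in CCS_CATEGORIES]


patient_codes = ["0380", "0389"]  # two distinct ICD-9 codes
print(CCS_CATEGORIES)
print(ccs_features(patient_codes))  # both codes land in the same CCS column
```

With raw ICD codes, this patient would activate two separate, sparsely populated columns; with CCS categories, both records contribute to a single, denser feature.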
Tip: ICD-9 differs from ICD-10
If you have a data set that includes both ICD-9 and ICD-10 codes, you have to pay attention to the indicator that says whether each code is from ICD-9 or ICD-10. It may appear that the two systems never produce the same code, but they do: a subset of codes exists in both ICD-9 and ICD-10 with identical characters while representing totally unrelated medical entities. Thus, be cautious when translating ICD-9 or ICD-10 to CCS, and ensure that you use the “translation” appropriate for the ICD version.
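One way to guard against this, sketched below under stated assumptions, is to key every lookup on the pair (ICD version, code) instead of on the code alone. The code "X100" and the category names are deliberately invented placeholders, not real ICD or CCS entries, and `to_ccs` is a hypothetical function name.

```python
# Separate mapping table per ICD version.
# "X100" and the category labels are invented placeholders, not real entries.
CCS_BY_VERSION = {
    9: {"X100": "placeholder category A"},
    10: {"X100": "placeholder category B"},
}


def to_ccs(code, icd_version):
    """Look up a CCS category using the table that matches the ICD version."""
    if icd_version not in CCS_BY_VERSION:
        raise ValueError(f"unknown ICD version: {icd_version!r}")
    return CCS_BY_VERSION[icd_version].get(code)  # None if the code is unmapped


# The same code string resolves differently depending on the version flag.
print(to_ccs("X100", 9))
print(to_ccs("X100", 10))
```

Raising an error for a missing or unexpected version flag is a deliberate choice here: guessing the version from the code's appearance is exactly the mistake the overlap makes possible.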
About the Featured Image
What’s with the flaming water ski guy in the featured image? As it turns out, the creators of the ICD system exercised some creativity when coming up with ICD codes. Here are some of the best ICD codes, including “burn due to water-skis on fire, subsequent encounter”:
(Yes, these are real ICD codes! Originally the list is from here, but I also double-checked them by searching the codes on the ICD website.)
The featured image is composed of this water-skier and this fire.