- by Jimmy Fisher
- Nov 02, 2024
Concatenating the 2018-2022 files yields a single dataframe of 15,859,795 U.S. death records, all of which include our selected variables.
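As a rough illustration of that concatenation step, here is a minimal pandas sketch. It is not the notebook's actual code: the `combine_years` helper, the `SELECTED_COLUMNS` list, the `sex` and `year` column names, and the tiny in-memory frames are all hypothetical stand-ins for the real yearly NCHS files.

```python
import pandas as pd

# Hypothetical subset of variables kept from every yearly file.
SELECTED_COLUMNS = ["sex", "age_recode_27", "race_recode_40"]


def combine_years(frames_by_year):
    """Stack yearly dataframes into one, tagging each row with its year."""
    parts = []
    for year, frame in sorted(frames_by_year.items()):
        part = frame[SELECTED_COLUMNS].copy()  # keep only the selected variables
        part["year"] = year                    # remember which file the row came from
        parts.append(part)
    return pd.concat(parts, ignore_index=True)


# Toy stand-ins for the real yearly files (values are made up).
frames = {
    2018: pd.DataFrame({"sex": ["M", "F"], "age_recode_27": [20, 21],
                        "race_recode_40": [1, 2], "unused": [0, 0]}),
    2019: pd.DataFrame({"sex": ["F"], "age_recode_27": [22],
                        "race_recode_40": [1], "unused": [0]}),
}
deaths = combine_years(frames)
print(len(deaths), list(deaths.columns))
```

Selecting the columns before concatenating keeps memory use down, which matters once each yearly file holds millions of rows.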
As part of my initial exploratory data analysis of this giant NCHS U.S. mortality file, I built a Dash app to demonstrate how intermediate visualizations can make data easier to explore for programmers working on projects. I mapped encoding values onto the hover-overs to make each viz easier to understand, and added in some axis controls. The purpose of this viz is to explore the dataset as a data scientist, so I'm not going to spend a lot of time polishing it, but here is a quick walkthrough.
There are 4 visualizations controlled by the top global filters:
You can select a value from the 39, 113, or 358 cause recodes and also toggle on/off male and female records.
At the top-left is a line graph showing deaths by sex and year for the selected cause (in this case, "Malnutrition" from the 113 Cause Recode). The top-right shows deaths for the selected year (2022) on a scatter/jitter plot, with Age Recode 27 and Race Recode 40 selected for my axes. The hover-over box you see within the top-right viz shows the raw values for race_recode_40 and age_recode_27 from our data, but also the description (desc) of that value for the selected variable, such as "Samoan" and "75-79 years old".
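One way to get both the raw value and its description into a hover box is to add a description column next to each coded column. The sketch below assumes that approach. The `add_descriptions` helper is hypothetical, and the numeric codes in the two maps are placeholders rather than verified NCHS codes.

```python
import pandas as pd

# Illustrative maps only: the numeric keys are placeholders, not verified NCHS codes.
AGE_RECODE_27_DESC = {21: "75-79 years old", 23: "85-89 years old"}
RACE_RECODE_40_DESC = {12: "Samoan"}


def add_descriptions(df, column, code_map):
    """Return a copy of df with a '<column>_desc' column holding readable labels."""
    out = df.copy()
    # Codes missing from the map get a visible fallback instead of NaN.
    out[f"{column}_desc"] = out[column].map(code_map).fillna("Unknown code")
    return out


# Made-up records, including codes that are absent from the maps.
records = pd.DataFrame({"age_recode_27": [21, 23, 99],
                        "race_recode_40": [12, 12, 5]})
labeled = add_descriptions(records, "age_recode_27", AGE_RECODE_27_DESC)
labeled = add_descriptions(labeled, "race_recode_40", RACE_RECODE_40_DESC)
print(labeled)
```

The resulting `_desc` columns could then be passed to a Plotly Express figure through its `hover_data` argument, so the hover box shows the code and its meaning side by side.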
The first of the lower two visualizations is a stacked bar chart at the lower left, showing the distribution of deaths for the selected cause recode by year and age group. While this tile defaults to "All Years" (2018-2022), I selected 2022 and hovered over one segment, spawning a hover-over box telling us that 1,375 men 85-89 years of age died of malnutrition in the United States in 2022.
The fourth and final viz is a horizontal stacked bar chart with a toggle to show either where people were when they died or the manner of their death (e.g., natural, suicide, homicide). In the screenshot below for malnutrition, hovering over the orange segment of Place of Death 4 gives an interpretable detail: 5,520 women died of malnutrition at home in 2022.
The entire Jupyter Notebook is available on GitHub and the Dash App visualizer is available for your exploration through the following links:
NOTE: At the top of the Dash app code, you'll notice that after loading the cause recode maps from .csv files, I created mappings for race, age, education, manner of death, and place of death, so that I could reference them in each visualization's dataframe. The general layout of Dash apps is covered in THIS great article, and you can find some sophisticated examples HERE, but it helps to read the code in blocks. First, below the variable maps, you'll find an app layout that defines the global filters and the visualization locations (2 in the top row and 2 in the bottom row). Following this layout, each visualization is built in turn, beginning with @app.callback, which specifies the outputs and inputs before the function that builds the figure. Take your time and, if you have access to an LLM, one great way to better understand the code is to copy-paste a section into it and ask for a detailed explanation.
For those interested in building visualizations such as these professionally, keep in mind that additional refinements to this approach would likely be necessary for laypersons to more easily navigate it, such as a single year selector affecting all tiles. The explicit purpose of this dashboard is to help data scientists with exploratory data analysis prior to conducting statistical tests and building AI/ML models.