@bgourlie
Last active December 2, 2017 21:54
Interpreting DUSCMPUB-formatted mortality data

Interpreting mortality data in DUSCMPUB format

This gist explains how to interpret the DUSCMPUB-formatted mortality data found here in conjunction with the reference PDF.

The reference PDF describes each field with three attributes: what the data is, where it sits within the row, and how many columns it occupies:

  • data item: A specific datapoint, for example, a value representing the highest completed level of education.
  • tape location: The column within the line where a data item begins. Each character in a line occupies one column, columns are counted starting from 1, and each line has 472 columns.
  • size: The number of columns used to represent a particular data item.

The reference PDF contains one or more tables for each data item describing how it is to be interpreted.
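
A tape location and a size translate directly into a string slice. The sketch below is illustrative only: `extract_field` is a hypothetical helper name, the row is synthetic rather than real data, and it assumes tape locations are counted from 1, which is consistent with the example in this gist.

```python
def extract_field(line: str, tape_location: int, size: int) -> str:
    """Return the raw characters of one data item.

    Assumes tape locations are 1-based (column 1 is the first character),
    while Python string indices are 0-based.
    """
    start = tape_location - 1
    return line[start:start + size]


# Synthetic row (not real data): a "1" at column 20, padded to 472 columns.
row = (" " * 19 + "1").ljust(472)

print(extract_field(row, 20, 1))
```

Returning the raw string rather than an integer keeps blank fields and leading zeros intact, which matters because several data items use blanks or codes such as 00 as meaningful values.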

Example

Page 3 of the reference PDF says that the data item resident status is at tape location 20 with a size of 1. In figure 1 below, the value at column 20 is 1:

                   1                                          3101  M1084 422210  1M1                2015U7BN                                    I500230 067   22 0211I500 61L031                                                                                                                                                                   02 I500 L031                                                                                            01  11                                 100 601`

Figure 1: An example row of data in DUSCMPUB format.

To understand what a 1 means for resident status, we consult table 1 below:

1 ... RESIDENTS
      State and County of Occurrence and Residence are the same.
2 ... INTRASTATE NONRESIDENTS
      State of Occurrence and Residence are the same, but County is
      different.
3 ... INTERSTATE NONRESIDENTS
      State of Occurrence and Residence are different, but both are in the U.S.
4 ... FOREIGN RESIDENTS
      State of Occurrence is one of the 50 States or the District of Columbia,
      but Place of Residence is outside of the U.S.

Table 1: A definition of the numerical values used to represent resident status, taken from page 3 of the reference PDF.
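
Table 1 can be expressed as a simple lookup. This is a minimal sketch: the names `RESIDENT_STATUS` and `resident_status` are hypothetical, and the fallback for unrecognized codes is an assumption rather than something the reference PDF specifies.

```python
# Codes and labels copied from table 1.
RESIDENT_STATUS = {
    "1": "RESIDENTS",
    "2": "INTRASTATE NONRESIDENTS",
    "3": "INTERSTATE NONRESIDENTS",
    "4": "FOREIGN RESIDENTS",
}


def resident_status(line: str) -> str:
    """Decode the resident status item (tape location 20, size 1)."""
    code = line[19:20]  # column 20, counted from 1, is index 19
    # Assumption: report unknown codes instead of raising an error.
    return RESIDENT_STATUS.get(code, f"unrecognized code {code!r}")


# Synthetic row (not real data) with a "1" at column 20.
print(resident_status(" " * 19 + "1"))
```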

Some data items have an "internal format." For example, education data starts at tape location 61 and has a size of 4, but consists of 3 distinct values, which are broken down in table 2 below:

61-62: Education (1989 revision)
   63: Education (2003 revision)
   64: Education Reporting flag

Table 2: A breakdown of the columns used to represent education data, taken from page 5 of the reference PDF.

Per table 2, column 64 (the education reporting flag) indicates which revision of the education data is present:

0 ... 1989 revision of education item on certificate
1 ... 2003 revision of education item on certificate
2 ... no education item on certificate

Table 3: A definition of values used to represent the version of education data present, taken from page 5 of the reference PDF.

In other words:

  • If the value at column 64 is 0, the relevant value is at columns 61 and 62
  • If the value at column 64 is 1, the relevant value is at column 63
  • If the value at column 64 is 2, there is no education data present
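
The three cases above amount to a small branch on the reporting flag. The following is a sketch under stated assumptions: `select_education` is a hypothetical name, the rows are synthetic, and raising an error on any other flag value is a design choice of this example, not behavior described in the reference PDF.

```python
from typing import Optional, Tuple


def select_education(line: str) -> Optional[Tuple[str, str]]:
    """Return (revision, raw code) for the education item, or None.

    Columns are counted from 1, so column 64 is index 63.
    """
    flag = line[63:64]  # column 64: education reporting flag
    if flag == "0":
        return ("1989", line[60:62])  # columns 61-62
    if flag == "1":
        return ("2003", line[62:63])  # column 63
    if flag == "2":
        return None  # no education item on certificate
    # Assumption: treat anything else as malformed input.
    raise ValueError(f"unexpected education reporting flag {flag!r}")


# Synthetic rows (not real data): columns 61-64 hold "  31" and "12 0".
print(select_education(" " * 60 + "  31"))
print(select_education(" " * 60 + "12 0"))
```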

We reference table 4 to interpret that value according to which revision of the education data is present:

Education (1989 revision)
00 ... No formal education
01-08 ... Years of elementary school
09 ... 1 year of high school
10 ... 2 years of high school
11 ... 3 years of high school
12 ... 4 years of high school
13 ... 1 year of college
14 ... 2 years of college
15 ... 3 years of college
16 ... 4 years of college
17 ... 5 or more years of college
99 ... Not stated

Education (2003 revision)
Field is blank for registration areas that are using the 1989 revision format of the item.
1 ... 8th grade or less
2 ... 9 - 12th grade, no diploma
3 ... high school graduate or GED completed
4 ... some college credit, but no degree
5 ... Associate degree
6 ... Bachelor’s degree
7 ... Master’s degree
8 ... Doctorate or professional degree
9 ... Unknown

Table 4: A definition of values used to represent level-of-education broken down by format, taken from page 5 of the PDF.
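
Table 4 can likewise be turned into per-revision lookups. This is only a sketch: the names `EDUCATION_1989`, `EDUCATION_2003`, and `education_label` are hypothetical, the 01-08 range is expanded into one entry per year with wording chosen for this example, and the fallback text for unknown codes is an assumption.

```python
# 1989 revision: two-character codes from table 4.
EDUCATION_1989 = {"00": "No formal education", "99": "Not stated"}
for years in range(1, 9):  # expand the 01-08 range (wording is this sketch's)
    unit = "year" if years == 1 else "years"
    EDUCATION_1989[f"{years:02d}"] = f"{years} {unit} of elementary school"
for years in range(1, 5):  # codes 09-12 and 13-16
    unit = "year" if years == 1 else "years"
    EDUCATION_1989[f"{8 + years:02d}"] = f"{years} {unit} of high school"
    EDUCATION_1989[f"{12 + years:02d}"] = f"{years} {unit} of college"
EDUCATION_1989["17"] = "5 or more years of college"

# 2003 revision: one-character codes from table 4.
EDUCATION_2003 = {
    "1": "8th grade or less",
    "2": "9 - 12th grade, no diploma",
    "3": "high school graduate or GED completed",
    "4": "some college credit, but no degree",
    "5": "Associate degree",
    "6": "Bachelor’s degree",
    "7": "Master’s degree",
    "8": "Doctorate or professional degree",
    "9": "Unknown",
}


def education_label(revision: str, code: str) -> str:
    """Map a raw education code to its label for the given revision."""
    table = EDUCATION_1989 if revision == "1989" else EDUCATION_2003
    # Assumption: report unknown codes instead of raising an error.
    return table.get(code, f"unrecognized code {code!r}")


print(education_label("2003", "3"))
print(education_label("1989", "12"))
```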

Using these references, we can interpret the first two data items in figure 1. Column 20 is 1, so the record describes a RESIDENT. Column 64 is 1, so the education value follows the 2003 revision and is found at column 63; that value is 3, meaning high school graduate or GED completed. Every other data item in a row of DUSCMPUB-formatted data can be interpreted the same way using the reference PDF.
