Title: | A Toolbox for Using the CPS’s Voting and Registration Supplement |
Version: | 0.1.0 |
Description: | Provides automated methods for downloading, recoding, and merging selected years of the Current Population Survey's Voting and Registration Supplement, a large N national survey about registration, voting, and non-voting in United States federal elections. Provides documentation for appropriate use of sample weights to generate statistical estimates, drawing from Hur & Achen (2013) <doi:10.1093/poq/nft042> and McDonald (2018) http://www.electproject.org/home/voter-turnout/voter-turnout-data. |
URL: | https://github.com/Reed-EVIC/cpsvote |
BugReports: | https://github.com/Reed-EVIC/cpsvote/issues |
License: | MIT + file LICENSE |
Encoding: | UTF-8 |
LazyData: | true |
Depends: | R (≥ 3.6.0) |
Suggests: | knitr, rmarkdown, survey, srvyr, here, scales, ggplot2, usmap |
VignetteBuilder: | knitr |
RoxygenNote: | 7.1.1 |
Imports: | magrittr, readr, dplyr, stringr, forcats, rlang |
NeedsCompilation: | no |
Packaged: | 2020-10-27 16:14:53 UTC; jaylee |
Author: | Jay Lee [aut, cre], Paul Gronke [aut], Canyon Foot [ctb] |
Maintainer: | Jay Lee <jaylee@reed.edu> |
Repository: | CRAN |
Date/Publication: | 2020-11-05 16:00:02 UTC |
Pipe operator
Description
See magrittr::%>% for details.
Usage
lhs %>% rhs
A sample of the raw 2016 CPS dataset
Description
This is a 10,000 row sample of the data that comes out of cps_read(years = 2016).
Usage
cps_2016_10k
Format
A tibble with 10,000 rows and 17 columns:
- FILE
Which default file the case came from
- YEAR
Year of interview
- STATE
State postal abbreviation
- AGE
Person's age as of the end of survey week; topcoded at 80 and 85
- SEX
Binary sex
- EDUCATION
Highest level of school completed or degree received
- RACE
Race
- HISPANIC
Hispanic status
- WEIGHT
Original CPS survey weight
- VRS_VOTE
Whether respondent voted in the election; self-reported
- VRS_REG
Whether respondent was registered to vote in the election; self-reported
- VRS_REG_WHYNOT
Reason for not being registered to vote
- VRS_VOTE_WHYNOT
Reason for not voting
- VRS_VOTEMODE_2004toPRESENT
Whether respondent voted by mail
- VRS_VOTEWHEN_2004toPRESENT
Whether respondent voted on election day or before
- VRS_REG_METHOD
Method of registration
- VRS_RESIDENCE
Duration of time living at current address
A sample of the full CPS dataset
Description
This is a 10,000 row sample of the data that comes out of cpsvote::cps_load_basic.
Usage
cps_allyears_10k
Format
A tibble with 10,000 rows and 25 columns:
- FILE
Which default file the case came from
- YEAR
Year of interview
- STATE
State postal abbreviation
- AGE
Person's age as of the end of survey week; topcoded at 90 until 2002, 80 in 2004, and 80/85 after
- SEX
Binary sex
- EDUCATION
Highest level of school completed or degree received
- RACE
Race
- HISPANIC
Hispanic status
- WEIGHT
Original CPS survey weight
- VRS_VOTE
Whether respondent voted in the election; self-reported
- VRS_REG
Whether respondent was registered to vote in the election; self-reported
- VRS_VOTE_TIME
What time of day respondent voted
- VRS_RESIDENCE
Duration of time living at current address
- VRS_VOTE_WHYNOT
Reason for not voting
- VRS_VOTEMETHOD_1996to2002
Method of voting, pre-2004
- VRS_REG_SINCE95
Whether respondent had registered to vote since 1995
- VRS_REG_DMV
Whether respondent registered at the DMV
- VRS_REG_METHOD
Method of registration
- VRS_REG_WHYNOT
Reason for not being registered to vote
- VRS_VOTEMODE_2004toPRESENT
Whether respondent voted by mail, 2004 on
- VRS_VOTEWHEN_2004toPRESENT
Whether respondent voted on election day or before, 2004 on
- VRS_VOTEMETHOD_CON
A consolidation of VRS_VOTEMETHOD_1996to2002, VRS_VOTEMODE_2004toPRESENT, and VRS_VOTEWHEN_2004toPRESENT
- cps_turnout
Recode of VRS_VOTE for CPS turnout calculation
- hurachen_turnout
Recode of VRS_VOTE for adjusted Hur & Achen turnout calculation
- turnout_weight
Adjusted weight for calculating voter turnout (per Hur & Achen)
Sample column specifications for reading CPS data
Description
Because the CPS is a fixed-width file that changes data locations (and variable names) across years, to correctly read the data you have to specify which start/end positions correspond to which column names in each year. This is one such specification. To add extra data or change column names, see the Vignette.
Usage
cps_cols
Format
A data frame with 204 rows and 8 columns:
- year
year
- cps_name
original column name as given by the CPS
- new_name
a new name, which tries to describe the variable and join sensibly across multiple years
- start_pos
which character of a line the variable starts with
- end_pos
which character of a line the variable ends with
- col_type
whether the column is character, numeric, or a factor
- description
the question text/description from the CPS
- notes
any notes for question administration or analysis
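As a rough illustration of how a start/end specification like this drives fixed-width parsing, the sketch below cuts two invented records using base R. The positions, records, and object names (spec, lines, parsed) are made up for the example and are not real CPS layout values; the package itself may read the files differently.

```r
# Toy column specification in the spirit of cps_cols
# (positions are invented, not real CPS layout values).
spec <- data.frame(
  new_name  = c("YEAR", "STATE", "AGE"),
  start_pos = c(1, 5, 7),
  end_pos   = c(4, 6, 8),
  stringsAsFactors = FALSE
)

# Two invented fixed-width records.
lines <- c("2016OR34", "2016WA80")

# Cut every line at the start/end positions listed for each column.
pieces <- lapply(seq_len(nrow(spec)), function(i) {
  substr(lines, spec$start_pos[i], spec$end_pos[i])
})
parsed <- as.data.frame(setNames(pieces, spec$new_name),
                        stringsAsFactors = FALSE)

# Everything is read as character; convert numeric columns afterwards.
parsed$AGE <- as.numeric(parsed$AGE)
print(parsed)
```

Because the positions move between survey years, a specification like this needs one set of rows per year.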
Download CPS microdata
Description
Download CPS microdata
Usage
cps_download_data(
path = "cps_data",
years = seq(1994, 2018, 2),
overwrite = FALSE
)
Arguments
path |
A file path (relative or absolute) where the downloads should go. |
years |
Which years of data to download. Defaults to all even-numbered years from 1994 to 2018. |
overwrite |
Logical, whether to write over existing files or not. Defaults to FALSE. |
Details
File names will be written in the style "cps_nov2018.zip", with the appropriate years.
The Voting and Registration Supplement is only conducted in even-numbered years (since 1964), so any odd-numbered entry in years will be skipped.
Currently the package only supports downloads from 1994 onwards, so any entry in years before 1994 will be skipped.
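The skipping rule and file naming described above can be sketched in base R as follows. This is only an illustration of the documented behaviour, not the package's internal code; the object names (requested, supported, file_names) are hypothetical.

```r
# Years a user might ask for, including unsupported ones.
requested <- c(1992, 1994, 2015, 2016, 2018)

# Keep only even-numbered years from 1994 onwards.
supported <- requested[requested %% 2 == 0 & requested >= 1994]

# Build file names in the documented "cps_nov2018.zip" style.
file_names <- paste0("cps_nov", supported, ".zip")
print(file.path("cps_data", file_names))
```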
Examples
## Not run:
cps_download_data(path = "cps_data", years = 2016, overwrite = TRUE)
## End(Not run)
Download CPS technical documentation
Description
Download CPS technical documentation
Usage
cps_download_docs(
path = "cps_docs",
years = seq(1994, 2018, 2),
overwrite = FALSE
)
Arguments
path |
A file path (relative or absolute) where the downloads should go. |
years |
Which years of documentation to download. Defaults to all even-numbered years from 1994 to 2018. |
overwrite |
Logical, whether to write over existing files or not. Defaults to FALSE. |
Details
File names will be written in the style "cps_nov2018.pdf", with the appropriate years.
The Voting and Registration Supplement is only conducted in even-numbered years (since 1964), so any odd-numbered entry in years will be skipped.
Currently the package only supports downloads from 1994 onwards, so any entry in years before 1994 will be skipped.
Examples
## Not run:
cps_download_docs(path = "cps_docs", years = 2016, overwrite = TRUE)
## End(Not run)
Sample factor specifications for reading CPS data
Description
Because the CPS changes factor levels across years, to correctly read the data you have to specify which numeric codes correspond to which character values in each year. This is one such specification. To add extra data, see the Vignette.
Usage
cps_factors
Format
A data frame with 5 columns:
- year
year
- cps_name
original column name as given by the CPS
- new_name
a new name, which tries to describe the variable and join sensibly across multiple years
- code
the numeric code contained in the raw CPS data
- value
the character value corresponding to each numeric code
Details
These match the exact specifications from the CPS, including NA codes and any typos that occur (e.g., "Hipsanic" is common in older years).
Apply factor levels to raw CPS data
Description
The CPS publishes their data in a numeric format, with a separate
PDF codebook (not machine readable) describing factor values. This function
labels the raw numeric CPS data according to a supplied factor key. Codes
that appear in a given year and are not included in factors will be
recoded as NA.
Usage
cps_label(
data,
factors = cpsvote::cps_factors,
names_col = "new_name",
na_vals = c("-1", "BLANK", "NOT IN UNIVERSE"),
expand_year = TRUE,
rescale_weight = TRUE,
toupper = TRUE
)
Arguments
data |
The raw CPS data that factors should be applied to |
factors |
A data frame containing the label codes to be applied |
names_col |
Which column of |
na_vals |
Which character values should be considered "missing" across the dataset and be set to NA after labelling |
expand_year |
Whether to change the two-digit year listed in earlier surveys (94, 96) into a four-digit year (1994, 1996) |
rescale_weight |
Whether to rescale the weight, dividing by 10,000. The CPS describes the given weight as having "four implied decimals", so this rescaling adjusts the weight to produce sensible population totals. |
toupper |
Whether to convert all factor levels to uppercase |
Value
CPS data with factor labels in place of the raw numeric data
Examples
cps_label(cps_2016_10k)
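To make the labelling steps concrete, here is a minimal base R sketch of the same idea: look up numeric codes in a key, turn unmatched codes and designated "missing" labels into NA, and rescale the weight by 10,000. The key, the raw values, and the order of operations are assumptions made for illustration; they are not taken from the package source or from real CPS codes.

```r
# Invented factor key for one variable in one year (not real CPS codes).
key <- data.frame(
  code  = c(-1, 1, 2),
  value = c("Not in Universe", "Yes", "No"),
  stringsAsFactors = FALSE
)

# Invented raw data: numeric codes plus a weight with four implied decimals.
raw <- data.frame(
  VRS_VOTE = c(1, 2, -1, 9),
  WEIGHT   = c(15000000, 23456789, 0, 9876543)
)

# Look up each code in the key; codes absent from the key (here 9) become NA.
labels <- toupper(key$value[match(raw$VRS_VOTE, key$code)])

# Treat the designated "missing" labels as NA.
na_vals <- c("-1", "BLANK", "NOT IN UNIVERSE")
labels[labels %in% na_vals] <- NA

labelled <- data.frame(
  VRS_VOTE = factor(labels),
  WEIGHT   = raw$WEIGHT / 10000  # apply the four implied decimals
)
print(labelled)
```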
Load some basic/default CPS data into the environment
Description
This function is a quick starter to working with the CPS, using all of the
defaults that are baked into this package. Because the data is so large, it
made more sense to ship a "basic" CPS data set as a function rather than as a
package data object (which would have been over 10 MB). This function will
take you from nothing to having some basic CPS data in your environment, with
the option to save this data locally for future ease. A sample of the data
that comes out of this function is provided as cpsvote::cps_allyears_10k.
Usage
cps_load_basic(years = seq(1994, 2018, 2), datadir = "cps_data", outdir = NULL)
Arguments
years |
Which years should be read |
datadir |
The location where the CPS zip files live (or should be downloaded to) |
outdir |
The location where the final data file should be saved to |
Examples
## Not run: cps_load_basic(years = 2016, outdir = "data")
Read in CPS data
Description
Load multiple years of data from the Current Population Survey.
This function will also download the data for you, if it is not present in
the given dir.
Usage
cps_read(
years = seq(1994, 2018, 2),
dir = "cps_data",
cols = cpsvote::cps_cols,
names_col = "new_name",
join_dfs = TRUE
)
Arguments
years |
Which years to read in. This function will read data from files
in |
dir |
The folder where the CPS data files live. These files should follow a naming scheme that contains the 4-digit year of the results in question, and have a ".zip" or ".gz" extension. |
cols |
Which columns to read. This must be a data frame, with required
columns |
names_col |
The column in |
join_dfs |
Whether to combine all of the years into a single data frame,
or leave them as a list of data frames. Defaults to TRUE. |
Value
a data frame, or list of data frames
Examples
## Not run: cps_read(years = 2016, names_col = "new_name")
Load a single CPS file
Description
Read one year of data from the Current Population Survey
Usage
cps_read_year(
file,
cols = cpsvote::cps_cols,
names_col = "new_name",
year = as.numeric(stringr::str_extract(file, "\\d{4}"))
)
Arguments
file |
Where the fixed-width or zip/gz file for this year's data lives |
cols |
Which columns to read. This must be a data frame, with required
columns |
names_col |
The column in |
year |
Which year is being read; defaults to 4-digit year in file name |
Value
a data frame, with dimensions depending on the year and columns specified
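The default for year takes the first run of four digits in the file path. A base R approximation of that extraction is sketched below; the file paths are invented, and regmatches() is used here instead of stringr::str_extract() only to keep the example free of dependencies (unlike str_extract(), it drops non-matching elements rather than returning NA).

```r
# Invented file paths following the "cps_nov2016.zip" naming style.
files <- c("cps_data/cps_nov2016.zip", "cps_data/cps_nov1994.zip")

# Take the first run of four digits in each path as the survey year.
years <- as.numeric(regmatches(files, regexpr("[0-9]{4}", files)))
print(years)

# A directory name containing four digits is matched first, which is a
# case where passing `year` explicitly would be the safer choice.
tricky <- "archive_2020/cps_nov2016.zip"
tricky_year <- as.numeric(regmatches(tricky, regexpr("[0-9]{4}", tricky)))
print(tricky_year)
```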
Recode the voting variable for turnout calculations
Description
When the CPS calculates voter turnout, it treats the values "Don't know",
"Refused", and "No response" as non-voters; that is, it lumps these in
with "No". With increased levels of survey non-response in recent years, this
has artificially deflated turnout estimates when compared to
measures of voter turnout from state election offices. This function adds two
recodes of the original voting variable: one that applies the CPS recoding,
where multiple categories map to "No", and one that follows the guidelines
from Hur & Achen (2013) of setting these categories to NA. See the Vignette
for more information on this process.
Usage
cps_recode_vote(
data,
vote_col = "VRS_VOTE",
items = c("DON'T KNOW", "REFUSED", "NO RESPONSE")
)
Arguments
data |
the input data set |
vote_col |
which column contains the voting variable |
items |
which items should be "No" in the CPS coding and NA in the Hur & Achen coding |
Value
data with two columns attached, cps_turnout and hurachen_turnout,
voting variables recoded according to the process above
Examples
cps_recode_vote(cps_refactor(cps_label(cps_2016_10k)))
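The difference between the two recodes can be seen on a small invented vector of responses. The sketch below uses base R only and is an illustration of the rule described above, not the package's implementation; the unweighted shares are shown purely to make the deflation visible.

```r
# Invented responses, using the default `items` from the usage above.
vote  <- c("YES", "NO", "DON'T KNOW", "REFUSED", "NO RESPONSE", NA)
items <- c("DON'T KNOW", "REFUSED", "NO RESPONSE")

# CPS-style recode: the listed items are counted as non-voters.
cps_turnout <- ifelse(vote %in% items, "NO", vote)

# Hur & Achen-style recode: the listed items are set to missing.
hurachen_turnout <- ifelse(vote %in% items, NA_character_, vote)

# Unweighted share of "YES" among non-missing answers under each recode.
cps_share      <- mean(cps_turnout == "YES", na.rm = TRUE)
hurachen_share <- mean(hurachen_turnout == "YES", na.rm = TRUE)
print(c(cps = cps_share, hurachen = hurachen_share))
```

With these six toy responses, the CPS-style share is one voter out of five answers, while the Hur & Achen-style share is one voter out of two.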
Combine factor levels across years
Description
The response sets in certain CPS questions change between years. This function
consolidates several of these response sets across years (and fixes typos
from the CPS documentation), specifically race, Hispanic status, duration of
residency, reason for not voting, and method of registration. Additionally,
this creates a new column VRS_VOTEMETHOD_CON, which consolidates multiple
expressions of vote method across years (By Mail, Early, and Election Day)
into one variable.
Usage
cps_refactor(data, move_levels = TRUE)
Arguments
data |
A dataset containing already-labelled CPS data |
move_levels |
Whether to move the levels "OTHER", "DON'T KNOW", and "REFUSED" to the end of each factor's level set |
Details
While consolidating response sets across multiple surveys can be
fraught with peril, this function attempts to combine disparate levels for
race and other CPS variables across multiple years. Some of these are
relatively straightforward typo fixes ("NON-HIPSANIC" should clearly match
"NON-HISPANIC"), but others involve differing degrees of subjectivity.
Take this function with a grain of salt, as it depends on some exact variable
names you may or may not be using, and recode variables as needed for your
own uses. To explore exactly how these variables were recoded, you can run
table(data$RACE, cps_refactor(data)$RACE) in the console, substituting
your column of interest for RACE.
Examples
cps_refactor(cps_label(cps_2016_10k))
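What move_levels does to a single factor can be approximated in base R as shown below. The factor values are invented and the code is a sketch of the idea, not the package's implementation; the closing table() call mirrors the cross-tabulation check suggested in the Details.

```r
# Invented factor whose levels start out in alphabetical order.
reason <- factor(
  c("ILLNESS", "OTHER", "TOO BUSY", "DON'T KNOW", "REFUSED", "ILLNESS"),
  levels = c("DON'T KNOW", "ILLNESS", "OTHER", "REFUSED", "TOO BUSY")
)

# Levels that should sit at the end of the level set.
tail_levels <- c("OTHER", "DON'T KNOW", "REFUSED")

# Substantive levels first, then whichever tail levels are present.
new_order <- c(setdiff(levels(reason), tail_levels),
               intersect(tail_levels, levels(reason)))
reordered <- factor(reason, levels = new_order)

# Only the level order changes; the values themselves stay the same.
print(levels(reordered))
print(table(original = reason, reordered = reordered))
```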
Calculations to reweight properly for voter turnout
Description
While the U.S. Census Bureau provides one weight with the CPS, a modified
weight is needed to properly calculate voter turnout. This data set provides
those calculations, according to Hur and Achen (2013). The comparison data
comes from Dr. Michael McDonald's estimates of voter turnout among the
voting-eligible population (VEP). It can be joined with CPS data to
calculate the new weights needed for analysis, using the function
cps_reweight_turnout.
Usage
cps_reweight
Format
A tibble with 1,326 rows and 6 columns:
- YEAR
year
- STATE
state
- response
indicator of turnout in recent election
- vep_turnout
proportion of turnout indicator, calculated by McDonald
- cps_turnout
proportion of turnout indicator, calculated by CPS
- reweight
the factor by which to scale original CPS weights
Source
Turnout data from http://www.electproject.org/home/voter-turnout/voter-turnout-data
Apply weight correction for voter turnout
Description
This function applies the turnout correction recommended by Hur & Achen
(2013). The data set containing the scaling factor is cpsvote::cps_reweight.
Usage
cps_reweight_turnout(data)
Arguments
data |
the input data set, containing columns |
Examples
cps_reweight_turnout(cps_recode_vote(cps_refactor(cps_label(cps_2016_10k))))
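The arithmetic behind this kind of correction can be sketched with toy numbers. The example assumes that the scaling factor is the ratio vep_turnout / cps_turnout for each response, and that it is matched to respondents by YEAR, STATE, and their recoded vote; both are assumptions for illustration, and all figures are invented rather than taken from cps_reweight.

```r
# Four invented respondents in one state-year, each with the same weight.
cps <- data.frame(
  YEAR = 2016, STATE = "OR",
  hurachen_turnout = c("YES", "YES", "YES", "NO"),
  WEIGHT = c(1000, 1000, 1000, 1000),
  stringsAsFactors = FALSE
)

# Invented key: the survey shows 75% turnout, the benchmark shows 60%.
key <- data.frame(
  YEAR = 2016, STATE = "OR",
  response    = c("YES", "NO"),
  vep_turnout = c(0.60, 0.40),
  cps_turnout = c(0.75, 0.25),
  stringsAsFactors = FALSE
)
# Assumed form of the scaling factor (a ratio of the two proportions).
key$reweight <- key$vep_turnout / key$cps_turnout

# Match each respondent to the factor for their year, state, and response.
idx <- match(paste(cps$YEAR, cps$STATE, cps$hurachen_turnout),
             paste(key$YEAR, key$STATE, key$response))
cps$turnout_weight <- cps$WEIGHT * key$reweight[idx]

# Weighted turnout now reproduces the benchmark proportion.
weighted_turnout <- sum(cps$turnout_weight[cps$hurachen_turnout == "YES"]) /
  sum(cps$turnout_weight)
print(weighted_turnout)
```

In this toy case voters are scaled down and non-voters scaled up, so the weighted share of "YES" moves from 0.75 to the benchmark value.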
Vectorized na_if
Description
A version of na_if that replaces every element of x found in y with NA.
Usage
na_ifin(x, y)
Arguments
x |
the vector to be checked |
y |
the values which should be replaced with NA |
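The behaviour described by these arguments can be approximated in a couple of lines of base R. The function below, na_if_in, is a hypothetical stand-in written for illustration and is not the package's own na_ifin.

```r
# Hypothetical stand-in: set every element of x that appears in y to NA.
na_if_in <- function(x, y) {
  x[x %in% y] <- NA
  x
}

# Invented values, with two of them treated as missing.
result <- na_if_in(c("YES", "BLANK", "NO", "-1"), c("-1", "BLANK"))
print(result)
```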