Introducing educationdata

Kyle Ueyama

2021-05-26

The educationdata package allows the user to retrieve data from the Urban Institute’s Education Data API as a data.frame for analysis. The package contains one major function, get_education_data, which retrieves data from a specified API endpoint and returns it to the user as a data.frame.

NOTE: By downloading and using this programming package, you agree to abide by the Data Policy and Terms of Use of the Education Data Portal. For more information, see https://educationdata.urban.org/documentation/#terms

Usage

The get_education_data function will return a data.frame from a call to the Education Data API.

library(educationdata)
get_education_data(level, source, topic, by, filters, add_labels, csv)

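where:

- level: the API data level to query (for example, 'schools' or 'college-university')
- source: the API data source to query (for example, 'ccd' or 'ipeds')
- topic: the API data topic to query (for example, 'enrollment')
- by: an optional list of variables to break the data out by (for example, list('race', 'sex')); this argument is deprecated in favor of subtopic
- filters: an optional list of variable values used to subset the query (for example, list(year = 2008))
- add_labels: if TRUE, map integer codes to their factor labels
- csv: if TRUE, download the full .csv data and subset it locally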

This simple example will obtain ‘college-university’ level data from the ‘ipeds’ source for the ‘student-faculty-ratio’ topic:

library(educationdata)

df <- get_education_data(
  level = 'college-university',
  source = 'ipeds',
  topic = 'student-faculty-ratio'
)

head(df)
#>   unitid year fips student_faculty_ratio
#> 1 100654 2009    1                    14
#> 2 100663 2009    1                    17
#> 3 100690 2009    1                    10
#> 4 100706 2009    1                    17
#> 5 100724 2009    1                    17
#> 6 100751 2009    1                    20
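Because get_education_data returns an ordinary data.frame, base R functions apply to it directly. As a minimal sketch that runs without contacting the API, the six rows printed above are re-entered by hand; with a live call you would use df itself, and the summaries would cover every row rather than these six. The name ratios is only for illustration.

```r
# The six rows printed by head(df) above, re-entered by hand so that
# this sketch runs offline
ratios <- data.frame(
  unitid = c(100654, 100663, 100690, 100706, 100724, 100751),
  year = 2009,
  fips = 1,
  student_faculty_ratio = c(14, 17, 10, 17, 17, 20)
)

# Summary statistics for the ratio column
summary(ratios$student_faculty_ratio)

# Institution(s) with the highest ratio among these rows
ratios[ratios$student_faculty_ratio == max(ratios$student_faculty_ratio), ]
```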

A somewhat more complex example obtains ‘schools’ level data from the ‘ccd’ source for the ‘enrollment’ topic, broken out by ‘race’ and ‘sex’. The API query is subset with filters for ‘year’ 2008, ‘grade’ 9 through 12, and the ‘ncessch’ (school ID) code 340606000122. Finally, the add_labels flag maps integer codes to their factor labels (‘race’ and ‘sex’ in this instance).

library(educationdata)

df <- get_education_data(level = 'schools', 
                         source = 'ccd', 
                         topic = 'enrollment', 
                         by = list('race', 'sex'),
                         filters = list(year = 2008,
                                        grade = 9:12,
                                        ncessch = '340606000122'),
                         add_labels = TRUE)
#> Warning in get_education_data(level = "schools", source = "ccd", topic = "enrollment", : The `by` argument has been deprecated in favor of `subtopic`.
#> Please update your script to use `subtopic` instead.

head(df)
#>   year      ncessch ncessch_num grade                             race    sex
#> 1 2008 340606000122 3.40606e+11     9                            Black   Male
#> 2 2008 340606000122 3.40606e+11     9                         Hispanic   Male
#> 3 2008 340606000122 3.40606e+11     9 American Indian or Alaska Native Female
#> 4 2008 340606000122 3.40606e+11     9 American Indian or Alaska Native   Male
#> 5 2008 340606000122 3.40606e+11     9                            Black Female
#> 6 2008 340606000122 3.40606e+11     9                            Asian Female
#>   enrollment       fips   leaid
#> 1         41 New Jersey 3406060
#> 2         39 New Jersey 3406060
#> 3          0 New Jersey 3406060
#> 4          0 New Jersey 3406060
#> 5         46 New Jersey 3406060
#> 6         32 New Jersey 3406060
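With add_labels = TRUE, race and sex arrive as labeled factors, so the result can be summarized and subset by name. A minimal offline sketch: the six printed rows are re-entered by hand (only some of their columns), so the totals below describe those six rows, not the full result. The name enroll is only for illustration.

```r
# Six rows from the head(df) output above, re-entered by hand
enroll <- data.frame(
  ncessch = '340606000122',
  grade = 9,
  race = factor(c('Black', 'Hispanic',
                  'American Indian or Alaska Native',
                  'American Indian or Alaska Native',
                  'Black', 'Asian')),
  sex = factor(c('Male', 'Male', 'Female', 'Male', 'Female', 'Female')),
  enrollment = c(41, 39, 0, 0, 46, 32)
)

# Total enrollment by sex across the displayed race groups
by_sex <- aggregate(enrollment ~ sex, data = enroll, FUN = sum)
by_sex

# Labeled factors let rows be selected by label rather than by code
enroll[enroll$race == 'Black', c('race', 'sex', 'enrollment')]
```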

Available Endpoints

| Level | Source | Topic | By | Main Filters | Years Available |
| --- | --- | --- | --- | --- | --- |
| college-university | fsa | 90-10-revenue-percentages | NA | year | 2014–2017 |
| college-university | fsa | campus-based-volume | NA | year | 2001–2017 |
| college-university | fsa | financial-responsibility | NA | year | 2006–2016 |
| college-university | fsa | grants | NA | year | 1999–2018 |
| college-university | fsa | loans | NA | year | 1999–2018 |
| college-university | ipeds | academic-libraries | NA | year | 2013–2019 |
| college-university | ipeds | academic-year-room-board-other | NA | year | 1999–2020 |
| college-university | ipeds | academic-year-tuition-prof-program | NA | year | 1986–2008, 2010–2020 |
| college-university | ipeds | academic-year-tuition | NA | year | 1986–2020 |
| college-university | ipeds | admissions-enrollment | NA | year | 2001–2019 |
| college-university | ipeds | admissions-requirements | NA | year | 1990–2019 |
| college-university | ipeds | completers | NA | year | 2011–2019 |
| college-university | ipeds | completions-cip-2 | NA | year | 1991–2019 |
| college-university | ipeds | completions-cip-6 | NA | year | 1983–2019 |
| college-university | ipeds | directory | NA | year | 1980, 1984–2020 |
| college-university | ipeds | enrollment-full-time-equivalent | NA | year, level_of_study | 1997–2018 |
| college-university | ipeds | enrollment-headcount | NA | year, level_of_study | 1996–2018 |
| college-university | ipeds | fall-enrollment | residence | year | 1986, 1988, 1992, 1994, 1996, 1998, 2000–2020 |
| college-university | ipeds | fall-enrollment | age, sex | year, level_of_study | 1991, 1993, 1995, 1997, 1999–2020 |
| college-university | ipeds | fall-enrollment | race, sex | year, level_of_study | 1986–2020 |
| college-university | ipeds | fall-retention | NA | year | 2003–2020 |
| college-university | ipeds | finance | NA | year | 1979, 1983–2017 |
| college-university | ipeds | grad-rates-200pct | NA | year | 2007–2017 |
| college-university | ipeds | grad-rates-pell | NA | year | 2015–2017 |
| college-university | ipeds | grad-rates | NA | year | 1996–2017 |
| college-university | ipeds | institutional-characteristics | NA | year | 1980, 1984–2020 |
| college-university | ipeds | outcome-measures | NA | year | 2015–2018 |
| college-university | ipeds | program-year-room-board-other | NA | year | 1999–2020 |
| college-university | ipeds | program-year-tuition-cip | NA | year | 1987–2020 |
| college-university | ipeds | salaries-instructional-staff | NA | year | 1980, 1984, 1985, 1987, 1989–1999, 2001–2018 |
| college-university | ipeds | salaries-noninstructional-staff | NA | year | 2012–2018 |
| college-university | ipeds | sfa-all-undergraduates | NA | year | 2007–2017 |
| college-university | ipeds | sfa-by-living-arrangement | NA | year | 2008–2017 |
| college-university | ipeds | sfa-by-tuition-type | NA | year | 1999–2017 |
| college-university | ipeds | sfa-ftft | NA | year | 1999–2017 |
| college-university | ipeds | sfa-grants-and-net-price | NA | year | 2008–2017 |
| college-university | ipeds | student-faculty-ratio | NA | year | 2009–2020 |
| college-university | nacubo | endowments | NA | year | 2012–2018 |
| college-university | nccs | 990-forms | NA | year | 1993–2016 |
| college-university | nhgis | census-1990 | NA | year | 1980, 1984–2017 |
| college-university | nhgis | census-2000 | NA | year | 1980, 1984–2017 |
| college-university | nhgis | census-2010 | NA | year | 1980, 1984–2017 |
| college-university | scorecard | default | NA | year | 1996–2017 |
| college-university | scorecard | earnings | NA | year | 2003–2014 |
| college-university | scorecard | institutional-characteristics | NA | year | 1996–2017 |
| college-university | scorecard | repayment | NA | year | 2007–2016 |
| college-university | scorecard | student-characteristics | aid-applicants | year | 1997–2016 |
| college-university | scorecard | student-characteristics | home-neighborhood | year | 1997–2016 |
| school-districts | ccd | directory | NA | year | 1986–2020 |
| school-districts | ccd | enrollment | NA | year, grade | 1986–2020 |
| school-districts | ccd | enrollment | race | year, grade | 1986–2020 |
| school-districts | ccd | enrollment | race, sex | year, grade | 1986–2020 |
| school-districts | ccd | enrollment | sex | year, grade | 1986–2020 |
| school-districts | ccd | finance | NA | year | 1991, 1994–2018 |
| school-districts | edfacts | assessments | NA | year, grade_edfacts | 2009–2018 |
| school-districts | edfacts | assessments | race | year, grade_edfacts | 2009–2018 |
| school-districts | edfacts | assessments | sex | year, grade_edfacts | 2009–2018 |
| school-districts | edfacts | assessments | special-populations | year, grade_edfacts | 2009–2018 |
| school-districts | edfacts | grad-rates | NA | year | 2010–2018 |
| school-districts | saipe | NA | NA | year | 1995, 1997, 1999–2018 |
| schools | ccd | directory | NA | year | 1986–2020 |
| schools | ccd | enrollment | NA | year, grade | 1986–2020 |
| schools | ccd | enrollment | race | year, grade | 1986–2020 |
| schools | ccd | enrollment | race, sex | year, grade | 1986–2020 |
| schools | ccd | enrollment | sex | year, grade | 1986–2020 |
| schools | crdc | algebra1 | disability, sex | year | 2011, 2013, 2015, 2017 |
| schools | crdc | algebra1 | lep, sex | year | 2011, 2013, 2015, 2017 |
| schools | crdc | algebra1 | race, sex | year | 2011, 2013, 2015, 2017 |
| schools | crdc | ap-exams | disability, sex | year | 2011, 2013, 2015, 2017 |
| schools | crdc | ap-exams | lep, sex | year | 2011, 2013, 2015, 2017 |
| schools | crdc | ap-exams | race, sex | year | 2011, 2013, 2015, 2017 |
| schools | crdc | ap-ib-enrollment | disability, sex | year | 2011, 2013, 2015, 2017 |
| schools | crdc | ap-ib-enrollment | lep, sex | year | 2011, 2013, 2015, 2017 |
| schools | crdc | ap-ib-enrollment | race, sex | year | 2011, 2013, 2015, 2017 |
| schools | crdc | chronic-absenteeism | disability, sex | year | 2013, 2015 |
| schools | crdc | chronic-absenteeism | lep, sex | year | 2013, 2015 |
| schools | crdc | chronic-absenteeism | race, sex | year | 2013, 2015 |
| schools | crdc | credit-recovery | NA | year | 2015, 2017 |
| schools | crdc | directory | NA | year | 2011, 2013, 2015, 2017 |
| schools | crdc | discipline-instances | NA | year | 2015, 2017 |
| schools | crdc | discipline | disability, lep, sex | year | 2011, 2013, 2015, 2017 |
| schools | crdc | discipline | disability, race, sex | year | 2011, 2013, 2015, 2017 |
| schools | crdc | discipline | disability, sex | year | 2011, 2013, 2015, 2017 |
| schools | crdc | dual-enrollment | disability, sex | year | 2013, 2015, 2017 |
| schools | crdc | dual-enrollment | lep, sex | year | 2013, 2015, 2017 |
| schools | crdc | dual-enrollment | race, sex | year | 2013, 2015, 2017 |
| schools | crdc | enrollment | disability, sex | year | 2011, 2013, 2015, 2017 |
| schools | crdc | enrollment | lep, sex | year | 2011, 2013, 2015, 2017 |
| schools | crdc | enrollment | race, sex | year | 2011, 2013, 2015, 2017 |
| schools | crdc | harassment-or-bullying | allegations | year | 2013, 2015, 2017 |
| schools | crdc | harassment-or-bullying | disability, sex | year | 2011, 2013, 2015, 2017 |
| schools | crdc | harassment-or-bullying | lep, sex | year | 2011, 2013, 2015, 2017 |
| schools | crdc | harassment-or-bullying | race, sex | year | 2011, 2013, 2015, 2017 |
| schools | crdc | math-and-science | disability, sex | year | 2011, 2013, 2015, 2017 |
| schools | crdc | math-and-science | lep, sex | year | 2011, 2013, 2015, 2017 |
| schools | crdc | math-and-science | race, sex | year | 2011, 2013, 2015, 2017 |
| schools | crdc | offenses | NA | year | 2015, 2017 |
| schools | crdc | offerings | NA | year | 2011, 2013, 2015, 2017 |
| schools | crdc | restraint-and-seclusion | disability, lep, sex | year | 2011, 2013, 2015, 2017 |
| schools | crdc | restraint-and-seclusion | disability, race, sex | year | 2011, 2013, 2015, 2017 |
| schools | crdc | restraint-and-seclusion | disability, sex | year | 2011, 2013, 2015, 2017 |
| schools | crdc | restraint-and-seclusion | instances | year | 2013, 2015, 2017 |
| schools | crdc | retention | disability, sex | year, grade | 2011, 2013, 2015, 2017 |
| schools | crdc | retention | lep, sex | year, grade | 2011, 2013, 2015, 2017 |
| schools | crdc | retention | race, sex | year, grade | 2011, 2013, 2015, 2017 |
| schools | crdc | sat-act-participation | disability, sex | year | 2011, 2013, 2015, 2017 |
| schools | crdc | sat-act-participation | lep, sex | year | 2011, 2013, 2015, 2017 |
| schools | crdc | sat-act-participation | race, sex | year | 2011, 2013, 2015, 2017 |
| schools | crdc | school-finance | NA | year | 2011, 2013, 2015, 2017 |
| schools | crdc | suspensions-days | disability, sex | year | 2015, 2017 |
| schools | crdc | suspensions-days | lep, sex | year | 2015, 2017 |
| schools | crdc | suspensions-days | race, sex | year | 2015, 2017 |
| schools | crdc | teachers-staff | NA | year | 2011, 2013, 2015, 2017 |
| schools | edfacts | assessments | NA | year, grade_edfacts | 2009–2018 |
| schools | edfacts | assessments | race | year, grade_edfacts | 2009–2018 |
| schools | edfacts | assessments | sex | year, grade_edfacts | 2009–2018 |
| schools | edfacts | assessments | special-populations | year, grade_edfacts | 2009–2018 |
| schools | edfacts | grad-rates | NA | year | 2010–2018 |
| schools | meps | NA | NA | year | 2013–2018 |
| schools | nhgis | census-1990 | NA | year | 1986–2020 |
| schools | nhgis | census-2000 | NA | year | 1986–2020 |
| schools | nhgis | census-2010 | NA | year | 1986–2020 |
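With this many endpoints, it can help to search the table programmatically. The sketch below copies six crdc rows from the table by hand and defines find_endpoints(), a hypothetical helper (not part of educationdata) that lists the endpoints which can be broken out by a given variable and cover a given year. It only understands years written as comma-separated lists, not ranges such as 1986–2020.

```r
# Six crdc rows from the endpoint table above, entered by hand
endpoints <- data.frame(
  level = 'schools',
  source = 'crdc',
  topic = rep(c('chronic-absenteeism', 'suspensions-days'), each = 3),
  by = rep(c('disability, sex', 'lep, sex', 'race, sex'), times = 2),
  years = rep(c('2013, 2015', '2015, 2017'), each = 3),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep rows whose By column contains by_var and
# whose Years column lists the requested year
find_endpoints <- function(tbl, by_var, year) {
  split_cell <- function(cell) strsplit(cell, ', ', fixed = TRUE)
  has_by <- vapply(split_cell(tbl$by),
                   function(v) by_var %in% v, logical(1))
  has_year <- vapply(split_cell(tbl$years),
                     function(y) as.character(year) %in% y, logical(1))
  tbl[has_by & has_year, c('topic', 'by', 'years')]
}

# Which of these endpoints can be broken out by race in 2017?
find_endpoints(endpoints, 'race', 2017)
```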

Main Filters

Because of the way the API is set up, the variables listed under Main Filters are often the fastest way to subset an API call.

In addition to year, the other main filters for certain endpoints accept the following values:

Grade

| Filter Argument | Grade |
| --- | --- |
| grade = 'grade-pk' | Pre-K |
| grade = 'grade-k' | Kindergarten |
| grade = 'grade-1' | Grade 1 |
| grade = 'grade-2' | Grade 2 |
| grade = 'grade-3' | Grade 3 |
| grade = 'grade-4' | Grade 4 |
| grade = 'grade-5' | Grade 5 |
| grade = 'grade-6' | Grade 6 |
| grade = 'grade-7' | Grade 7 |
| grade = 'grade-8' | Grade 8 |
| grade = 'grade-9' | Grade 9 |
| grade = 'grade-10' | Grade 10 |
| grade = 'grade-11' | Grade 11 |
| grade = 'grade-12' | Grade 12 |
| grade = 'grade-13' | Grade 13 |
| grade = 'grade-14' | Adult Education |
| grade = 'grade-15' | Ungraded |
| grade = 'grade-16' | K-12 |
| grade = 'grade-20' | Grades 7 and 8 |
| grade = 'grade-21' | Grades 9 and 10 |
| grade = 'grade-22' | Grades 11 and 12 |
| grade = 'grade-99' | Total |
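If you work with several grade filter values, the table above can be kept in code as a named vector. A minimal sketch: grade_labels and label_grade() are hypothetical names, not part of educationdata, and values missing from the table come back as NA.

```r
# Filter values and labels from the Grade table above
grade_labels <- c(
  'grade-pk' = 'Pre-K',
  'grade-k' = 'Kindergarten',
  setNames(paste('Grade', 1:13), paste0('grade-', 1:13)),
  'grade-14' = 'Adult Education',
  'grade-15' = 'Ungraded',
  'grade-16' = 'K-12',
  'grade-20' = 'Grades 7 and 8',
  'grade-21' = 'Grades 9 and 10',
  'grade-22' = 'Grades 11 and 12',
  'grade-99' = 'Total'
)

# Hypothetical helper: look up the label for one or more filter values
label_grade <- function(x) unname(grade_labels[x])

label_grade(c('grade-k', 'grade-9', 'grade-99'))
```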

Level of Study

| Filter Argument | Level of Study |
| --- | --- |
| level_of_study = 'undergraduate' | Undergraduate |
| level_of_study = 'graduate' | Graduate |
| level_of_study = 'first-professional' | First Professional |
| level_of_study = 'post-baccalaureate' | Post-baccalaureate |
| level_of_study = '99' | Total |
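Endpoints such as ipeds enrollment-full-time-equivalent list year and level_of_study as main filters. The sketch below builds such a filters list and checks level_of_study against the values in the table before anything is sent to the API. check_level_of_study() is a hypothetical helper, not part of educationdata.

```r
# Accepted level_of_study values, from the table above
valid_levels <- c('undergraduate', 'graduate', 'first-professional',
                  'post-baccalaureate', '99')

# Hypothetical helper: stop early if a value is not listed in the table
check_level_of_study <- function(x) {
  bad <- setdiff(as.character(x), valid_levels)
  if (length(bad) > 0) {
    stop('Unrecognized level_of_study value(s): ',
         paste(bad, collapse = ', '))
  }
  invisible(TRUE)
}

# A filters list that could be passed to get_education_data()
filters <- list(year = 2015:2017,
                level_of_study = c('undergraduate', 'graduate'))
check_level_of_study(filters$level_of_study)
```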

Examples

Let’s build up some examples using the following set of endpoints.

| Level | Source | Topic | By | Main Filters | Years Available |
| --- | --- | --- | --- | --- | --- |
| schools | ccd | enrollment | NA | year, grade | 1986–2020 |
| schools | ccd | enrollment | race | year, grade | 1986–2020 |
| schools | ccd | enrollment | race, sex | year, grade | 1986–2020 |
| schools | ccd | enrollment | sex | year, grade | 1986–2020 |
| schools | crdc | enrollment | disability, sex | year | 2011, 2013, 2015, 2017 |
| schools | crdc | enrollment | lep, sex | year | 2011, 2013, 2015, 2017 |
| schools | crdc | enrollment | race, sex | year | 2011, 2013, 2015, 2017 |

The following will return a data.frame across all years and grades:

library(educationdata)
df <- get_education_data(level = 'schools', 
                         source = 'ccd', 
                         topic = 'enrollment')

Note that this endpoint can also be broken out by certain variables: for the ccd source, race, sex, or both (see the By column above).

These variables can be added to the by argument:

df <- get_education_data(level = 'schools', 
                         source = 'ccd', 
                         topic = 'enrollment', 
                         by = list('race', 'sex'))

You may also filter the results of an API call. Here, year and grade are main filters, so they provide the most time-efficient subsets, and both accept vectors of values:

df <- get_education_data(level = 'schools', 
                         source = 'ccd', 
                         topic = 'enrollment', 
                         by = list('race', 'sex'),
                         filters = list(year = 1988:1990,
                                        grade = 6:8))
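Vectorized main filters multiply: the call above covers 3 years × 3 grades = 9 year-grade combinations. The sketch below enumerates them with base R's expand.grid(). The path-style labels are illustrative only, an assumption about how such a request could be split into slices, and not the package's internal code.

```r
# Main-filter values from the call above
main_filters <- list(year = 1988:1990, grade = 6:8)

# One row per year/grade combination; expand.grid varies the first
# column (year) fastest
combos <- expand.grid(main_filters)
nrow(combos)

# Illustrative path-style labels, one per combination
paths <- paste0(combos$year, '/grade-', combos$grade)
paths[1:3]
```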

Additional variables can also be passed to filters to subset further:

df <- get_education_data(level = 'schools', 
                         source = 'ccd', 
                         topic = 'enrollment', 
                         by = list('race', 'sex'),
                         filters = list(year = 1988:1990,
                                        grade = 6:8,
                                        ncessch = '010000200277'))
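The ncessch value is passed as a character string. This school ID is 12 digits long and begins with a zero, so converting it to a number silently drops information; the ncessch_num column in the earlier output (printed as 3.40606e+11) shows how a numeric ID displays. The sketch below shows the loss and one way to recover the padded string.

```r
# A 12-digit school ID that starts with a zero
id <- '010000200277'

# Converting to a number drops the leading zero
as_num <- as.numeric(id)
as.character(as_num)

# Zero-padding back to 12 digits recovers the original string
sprintf('%012.0f', as_num)
```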

The add_labels flag maps coded variables to factors, using their labels from the API:

df <- get_education_data(level = 'schools', 
                         source = 'ccd', 
                         topic = 'enrollment', 
                         by = list('race', 'sex'),
                         filters = list(year = 1988:1990,
                                        grade = 6:8,
                                        ncessch = '010000200277'),
                         add_labels = TRUE)

Finally, setting the csv flag downloads the full .csv data for the endpoint. In general, csv = TRUE is much faster when retrieving the full data frame (or a large subset of it) and much slower when retrieving a small subset, especially one with many filters. In this example, the full csv data must be downloaded and then subset to a single school’s rows for 1988 through 1990 and grades 6 through 8.

df <- get_education_data(level = 'schools', 
                         source = 'ccd', 
                         topic = 'enrollment', 
                         by = list('race', 'sex'),
                         filters = list(year = 1988:1990,
                                        grade = 6:8,
                                        ncessch = '010000200277'),
                         add_labels = TRUE,
                         csv = TRUE)