
Using rscorecard to download data from the College Scorecard API requires two steps:

  1. Setting your API key
  2. Making a request

1. Setting your API key

If you don’t already have one, request your (free) API key from https://api.data.gov/signup. It should only take a few moments to register and receive your key.

Once you have your key, you can store it using sc_key(), which saves it in the DATAGOV_API_KEY environment variable, where it persists until the R session is closed. If you don't pass a key to sc_get() directly, sc_get() searches your R environment for DATAGOV_API_KEY and, if it finds it, completes the data request.

# NB: You must use a real key, of course... 
sc_key("xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx")
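To confirm that a key is available before calling sc_get(), you can inspect the environment variable with base R. This is a minimal sketch that assumes, as described above, that sc_key() simply stores the key in DATAGOV_API_KEY. It uses base R's Sys.setenv() as a stand-in for sc_key() so that it runs without rscorecard or a real key.

```r
# Stand-in for sc_key() (assumption: sc_key() sets DATAGOV_API_KEY for this session)
Sys.setenv(DATAGOV_API_KEY = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx")

# TRUE if a non-empty key is visible to this R session
key_is_set <- nzchar(Sys.getenv("DATAGOV_API_KEY"))

# Fail early with a helpful message instead of at request time
if (!key_is_set) {
  stop("DATAGOV_API_KEY is not set; call sc_key() or add it to .Renviron")
}
key_is_set
```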

If you want a more permanent solution, you can add the following line (with your actual key, of course) to your .Renviron file. See this appendix for more information.

# NB: You must use a real key, of course... 
DATAGOV_API_KEY=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
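If you'd rather add the line from within R, the sketch below appends it to a file and then loads that file into the current session with base R's readRenviron(), so you don't need to restart R. To avoid touching your real settings, it writes to a temporary file. To edit your actual user-level file you would point it at path.expand("~/.Renviron") instead (R also looks for a .Renviron in the current working directory at startup).

```r
# Temporary stand-in for ~/.Renviron so this sketch leaves your real file alone
renviron <- tempfile(fileext = ".Renviron")

# Append the key line (use your real key); .Renviron lines take the form NAME=value
cat("DATAGOV_API_KEY=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx\n",
    file = renviron, append = TRUE)

# Load the file's variables into the current session without restarting R
readRenviron(renviron)
Sys.getenv("DATAGOV_API_KEY")
```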

2. Making a request

Each request requires the following four commands piped together using |>:

  1. sc_init()
  2. sc_filter()
  3. sc_select()
  4. sc_get()

The command chain must begin with sc_init() and end with sc_get(). All other commands can come in any order.

The request below should return a tibble with the name, IPEDS ID, state, and degree-seeking undergraduate enrollment of all primarily baccalaureate colleges in the Mid East region that are located in rural areas:

df <- sc_init() |> 
    sc_filter(region == 2, ccbasic == c(21,22,23), locale == 41:43) |> 
    sc_select(unitid, instnm, stabbr, ugds) |> 
    sc_get()
#> Request complete!
df
#> # A tibble: 4 × 5
#>   unitid instnm                                  stabbr  ugds year  
#>    <int> <chr>                                   <chr>  <int> <chr> 
#> 1 191676 Houghton University                     NY       745 latest
#> 2 194392 Paul Smiths College of Arts and Science NY       668 latest
#> 3 196051 SUNY Morrisville                        NY      1776 latest
#> 4 197230 Wells College                           NY       349 latest
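In sc_filter(), a vector on the right-hand side of == (as in ccbasic == c(21,22,23)) selects rows matching any of the listed values, and locale == 41:43 works the same way because 41:43 is R shorthand for c(41, 42, 43). That differs from ordinary R, where == compares element by element and recycles the shorter vector. The offline sketch below uses a small hypothetical data frame (made-up rows, not Scorecard data) to show the base-R equivalent using %in%, and why a plain == gives the wrong answer outside sc_filter().

```r
# Hypothetical toy rows (not Scorecard data) with the same filter columns
toy <- data.frame(
  unitid  = 1:6,
  region  = c(2, 2, 2, 2, 5, 2),
  ccbasic = c(21, 21, 23, 21, 21, 15),
  locale  = c(41, 42, 43, 11, 41, 42)
)

# Membership test: base-R equivalent of the "any of these values" matching
keep <- toy$region == 2 & toy$ccbasic %in% c(21, 22, 23) & toy$locale %in% 41:43
toy$unitid[keep]
#> [1] 1 2 3

# Pitfall: in base R, == recycles c(21, 22, 23) against the rows one by one,
# so row 2 (ccbasic 21 compared with 22) is wrongly dropped
wrong <- toy$region == 2 & toy$ccbasic == c(21, 22, 23) & toy$locale %in% 41:43
toy$unitid[wrong]
#> [1] 1 3
```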

Because we didn’t include a specific year, the latest data are returned. We could have specifically asked for the latest data using sc_year("latest"):

df <- sc_init() |> 
    sc_filter(region == 2, ccbasic == c(21,22,23), locale == 41:43) |> 
    sc_select(unitid, instnm, stabbr, ugds) |>
    sc_year("latest") |> 
    sc_get()
#> Request complete!
df
#> # A tibble: 4 × 5
#>   unitid instnm                                  stabbr  ugds year  
#>    <int> <chr>                                   <chr>  <int> <chr> 
#> 1 191676 Houghton University                     NY       745 latest
#> 2 194392 Paul Smiths College of Arts and Science NY       668 latest
#> 3 196051 SUNY Morrisville                        NY      1776 latest
#> 4 197230 Wells College                           NY       349 latest

For a prior year’s data, change the value in sc_year():

df <- sc_init() |> 
    sc_filter(region == 2, ccbasic == c(21,22,23), locale == 41:43) |> 
    sc_select(unitid, instnm, stabbr, ugds) |>
    sc_year(2005) |> 
    sc_get()
#> Request complete!
df
#> # A tibble: 4 × 5
#>   unitid instnm                                  stabbr  ugds  year
#>    <int> <chr>                                   <chr>  <int> <dbl>
#> 1 191676 Houghton University                     NY      1368  2005
#> 2 194392 Paul Smiths College of Arts and Science NY       841  2005
#> 3 196051 SUNY Morrisville                        NY      2964  2005
#> 4 197230 Wells College                           NY       407  2005
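As the outputs above show, the year column is character ("latest") when no year or sc_year("latest") is used, but numeric when a specific year is requested. Before stacking results from different requests, convert year to a common type; dplyr::bind_rows(), for example, will not combine a character column with a numeric one. The base-R sketch below rebuilds two small data frames using values copied from the outputs above (a stand-in for the tibbles sc_get() returns).

```r
# Values copied from the "latest" and 2005 outputs above (two institutions only)
latest <- data.frame(unitid = c(191676L, 197230L), ugds = c(745L, 349L),
                     year = "latest")
y2005  <- data.frame(unitid = c(191676L, 197230L), ugds = c(1368L, 407L),
                     year = 2005)

# year is character in one and numeric in the other; align the types first
y2005$year <- as.character(y2005$year)
both <- rbind(latest, y2005)

# Order so each institution's rows sit together
both <- both[order(both$unitid, both$year), ]
both
```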

Field of study data

In the fall of 2019, the College Scorecard released field-of-study-level data elements (at the 4-digit CIP code level). These data elements can be requested alongside institution-level data:

df <- sc_init() |> 
    sc_filter(region == 2, ccbasic == c(21,22,23), locale == 41:43) |> 
    sc_select(unitid, instnm, stabbr, ugds, cipcode, cipdesc, debt_mdn) |>
    sc_year("latest") |> 
    sc_get()
#> Request complete!
## filter to show only those with non-NA values for median debt
df |> dplyr::filter(!is.na(debt_mdn))
#> # A tibble: 179 × 8
#>    unitid instnm              stabbr  ugds cipcode cipdesc        debt_mdn year 
#>     <int> <chr>               <chr>  <int> <chr>   <chr>             <int> <chr>
#>  1 191676 Houghton University NY       745 0105    Agricultural …    20313 late…
#>  2 191676 Houghton University NY       745 0501    Area Studies.     20313 late…
#>  3 191676 Houghton University NY       745 0901    Communication…    20313 late…
#>  4 191676 Houghton University NY       745 1101    Computer and …    20313 late…
#>  5 191676 Houghton University NY       745 1104    Information S…    20313 late…
#>  6 191676 Houghton University NY       745 1312    Teacher Educa…    20313 late…
#>  7 191676 Houghton University NY       745 1313    Teacher Educa…    20313 late…
#>  8 191676 Houghton University NY       745 1313    Teacher Educa…    20313 late…
#>  9 191676 Houghton University NY       745 1314    Teaching Engl…    20313 late…
#> 10 191676 Houghton University NY       745 1412    Engineering P…    20313 late…
#> # ℹ 169 more rows
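Each row of this result is an institution-by-field-of-study combination, so institution-level columns such as ugds repeat on every row (note the 745 on each Houghton row above). The base-R sketch below, built from a few rows copied from that output as a stand-in for df, shows one way to get back to one row per institution and to count field-of-study rows per institution.

```r
# A few rows copied from the output above (stand-in for df)
fos <- data.frame(
  unitid  = rep(191676L, 4),
  instnm  = "Houghton University",
  ugds    = 745L,
  cipcode = c("0105", "0501", "0901", "1101")
)

# One row per institution: keep only institution-level columns, then deduplicate
inst <- unique(fos[c("unitid", "instnm", "ugds")])

# Number of field-of-study rows per institution
n_fields <- table(fos$unitid)
inst
n_fields
```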

Important note:

How data are mapped to years isn’t consistent across data elements. From the technical documentation for institution-level data:

The data contain diverse measures of institutional performance constructed both with an eye towards the type of information that would be most useful to prospective students, as well as towards how the measures might promote accountability for institutions. The measures require different definitions of cohorts. Users of the data should be aware of this, particularly when constructing analyses of the relationship between different measures. Moreover, reporting inaccuracies in some data elements used for cohort definitions are also important. (p. 37)

That is, even when two data elements share the same reporting year (e.g., sc_year(2016)), the years in which they were actually measured may not align. The same holds true when trying to align institution-level data with field-of-study-level data (see the technical documentation for field-of-study-level data for more information).

In short, rscorecard returns whatever the API call returns; it is up to the user to make sure the returned data elements align with expectations and project needs.

More information and examples

For more information about each command, see Commands.

For more examples, see More examples.