Using rscorecard to download data from the College Scorecard API requires two steps:
- Setting your API key
- Making a request
1. Setting your API key
If you don’t already have one, request your (free) API key from https://api.data.gov/signup. It should only take a few moments to register and receive your key.
Once you have your key, you can store it using sc_key(). In the absence of a key value argument, sc_get() will search your R environment for DATAGOV_API_KEY and, if it is found, complete the data request. The sc_key() command stores your key in DATAGOV_API_KEY, where it persists until the R session is closed.
# NB: You must use a real key, of course...
sc_key("xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx")
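To confirm that a key is available before making a request, you can inspect the environment variable directly. The following is a minimal sketch using only base R; has_scorecard_key() is a hypothetical helper written for illustration and is not part of rscorecard.

```r
# Hypothetical helper (not part of rscorecard): TRUE if DATAGOV_API_KEY
# is set to a non-empty string in the current R session.
has_scorecard_key <- function() {
  nzchar(Sys.getenv("DATAGOV_API_KEY"))
}

if (!has_scorecard_key()) {
  message("No key found: call sc_key() or add DATAGOV_API_KEY to .Renviron")
}
```

A check like this only tells you that some value is stored, not that the key is valid.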
If you want a more permanent solution, you can add the following line (with your actual key, of course) to your .Renviron file. See this appendix for more information.
# NB: You must use a real key, of course...
DATAGOV_API_KEY=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
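R reads .Renviron at startup, so a newly added line normally takes effect in the next session. If you would rather not restart, base R's readRenviron() can load the file into the current session. The sketch below uses a temporary file standing in for .Renviron and a placeholder key, both of which are assumptions made for illustration; in practice you would likely point readRenviron() at "~/.Renviron".

```r
# Placeholder only: substitute your real key.
placeholder <- strrep("x", 40)

# A temporary file stands in for ~/.Renviron so the sketch leaves it untouched.
renviron <- tempfile(fileext = ".Renviron")
writeLines(paste0("DATAGOV_API_KEY=", placeholder), renviron)

# Load the variables defined in the file into the current session.
# Note: this overwrites any DATAGOV_API_KEY already set in the session.
loaded <- readRenviron(renviron)

# The key is now visible to anything that reads the environment variable.
key <- Sys.getenv("DATAGOV_API_KEY")
```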
2. Simple request
Each request requires the following four commands piped together using |>:
- sc_init()
- sc_filter()
- sc_select()
- sc_get()
The command chain must begin with sc_init() and end with sc_get(). All other commands can come in any order.
The request below should return a tibble with the name, IPEDS ID, state, and degree-seeking undergraduate enrollment of all primarily baccalaureate colleges in the Mid East region located in rural areas:
df <- sc_init() |>
  sc_filter(region == 2, ccbasic == c(21, 22, 23), locale == 41:43) |>
  sc_select(unitid, instnm, stabbr, ugds) |>
  sc_get()
#> Request complete!
df
#> # A tibble: 4 × 5
#> unitid instnm stabbr ugds year
#> <int> <chr> <chr> <int> <chr>
#> 1 191676 Houghton University NY 745 latest
#> 2 194392 Paul Smiths College of Arts and Science NY 668 latest
#> 3 196051 SUNY Morrisville NY 1776 latest
#> 4 197230 Wells College NY 349 latest
Because we didn’t include a specific year, the latest data are returned. We could have specifically asked for the latest data using sc_year("latest"):
df <- sc_init() |>
  sc_filter(region == 2, ccbasic == c(21, 22, 23), locale == 41:43) |>
  sc_select(unitid, instnm, stabbr, ugds) |>
  sc_year("latest") |>
  sc_get()
#> Request complete!
df
#> # A tibble: 4 × 5
#> unitid instnm stabbr ugds year
#> <int> <chr> <chr> <int> <chr>
#> 1 191676 Houghton University NY 745 latest
#> 2 194392 Paul Smiths College of Arts and Science NY 668 latest
#> 3 196051 SUNY Morrisville NY 1776 latest
#> 4 197230 Wells College NY 349 latest
For a prior year’s data, change the value in sc_year():
df <- sc_init() |>
  sc_filter(region == 2, ccbasic == c(21, 22, 23), locale == 41:43) |>
  sc_select(unitid, instnm, stabbr, ugds) |>
  sc_year(2005) |>
  sc_get()
#> Request complete!
df
#> # A tibble: 4 × 5
#> unitid instnm stabbr ugds year
#> <int> <chr> <chr> <int> <dbl>
#> 1 191676 Houghton University NY 1368 2005
#> 2 194392 Paul Smiths College of Arts and Science NY 841 2005
#> 3 196051 SUNY Morrisville NY 2964 2005
#> 4 197230 Wells College NY 407 2005
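Since each call to sc_get() returns a year column, requests for several years can be stacked into one long table. The sketch below is one possible way to do that, not a feature of rscorecard: get_rural_ba() is a hypothetical wrapper around the request shown above, the choice of years is arbitrary, and dplyr::bind_rows() is assumed to be available for combining the results. The request itself only runs when a key is found in the environment.

```r
# Hypothetical helper (not part of rscorecard): the same request for one year.
get_rural_ba <- function(yr) {
  rscorecard::sc_init() |>
    rscorecard::sc_filter(region == 2, ccbasic == c(21, 22, 23), locale == 41:43) |>
    rscorecard::sc_select(unitid, instnm, stabbr, ugds) |>
    rscorecard::sc_year(yr) |>
    rscorecard::sc_get()
}

# Years chosen for illustration only.
years <- c(2005, 2016)

# Only contact the API when a key is available in the environment.
if (nzchar(Sys.getenv("DATAGOV_API_KEY"))) {
  # One tibble per year, stacked row-wise; the year column tells them apart.
  panel <- dplyr::bind_rows(lapply(years, get_rural_ba))
}
```

Using numeric years throughout keeps the year column a single type; mixing in "latest" would produce a character column for that request.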
Field of study data
In the fall of 2019, the College Scorecard released field of study-level data elements (at the 4-digit CIP code level). These data elements can be requested alongside institution-level data:
df <- sc_init() |>
  sc_filter(region == 2, ccbasic == c(21, 22, 23), locale == 41:43) |>
  sc_select(unitid, instnm, stabbr, ugds, cipcode, cipdesc, debt_mdn) |>
  sc_year("latest") |>
  sc_get()
#> Request complete!
## filter to show only those with non-NA values for median debt
df |> dplyr::filter(!is.na(debt_mdn))
#> # A tibble: 179 × 8
#> unitid instnm stabbr ugds cipcode cipdesc debt_mdn year
#> <int> <chr> <chr> <int> <chr> <chr> <int> <chr>
#> 1 191676 Houghton University NY 745 0105 Agricultural … 20313 late…
#> 2 191676 Houghton University NY 745 0501 Area Studies. 20313 late…
#> 3 191676 Houghton University NY 745 0901 Communication… 20313 late…
#> 4 191676 Houghton University NY 745 1101 Computer and … 20313 late…
#> 5 191676 Houghton University NY 745 1104 Information S… 20313 late…
#> 6 191676 Houghton University NY 745 1312 Teacher Educa… 20313 late…
#> 7 191676 Houghton University NY 745 1313 Teacher Educa… 20313 late…
#> 8 191676 Houghton University NY 745 1313 Teacher Educa… 20313 late…
#> 9 191676 Houghton University NY 745 1314 Teaching Engl… 20313 late…
#> 10 191676 Houghton University NY 745 1412 Engineering P… 20313 late…
#> # ℹ 169 more rows
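Judging from the output above, a combined request returns one row per institution and field of study, with institution-level values such as ugds repeated on every row. The sketch below illustrates one way to recover an institution-level table; it uses a small toy data frame copied from the first three rows of that output (with cipdesc omitted) rather than a live request.

```r
# Toy data copied from the first three rows of the output above.
fos <- data.frame(
  unitid   = c(191676L, 191676L, 191676L),
  instnm   = "Houghton University",
  stabbr   = "NY",
  ugds     = 745L,
  cipcode  = c("0105", "0501", "0901"),
  debt_mdn = 20313L
)

# Institution-level columns repeat on every field of study row,
# so dropping duplicates leaves one row per institution.
inst <- unique(fos[, c("unitid", "instnm", "stabbr", "ugds")])

# Summing ugds over the raw rows would count the same enrollment repeatedly.
naive_total   <- sum(fos$ugds)
correct_total <- sum(inst$ugds)
```

Collapsing first, then summarizing, avoids inflating institution-level totals by the number of fields of study each institution reports.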
Important note:
The mapping scheme of data across years isn’t consistent across data elements. From the technical documentation for institution-level data:
The data contain diverse measures of institutional performance constructed both with an eye towards the type of information that would be most useful to prospective students, as well as towards how the measures might promote accountability for institutions. The measures require different definitions of cohorts. Users of the data should be aware of this, particularly when constructing analyses of the relationship between different measures. Moreover, reporting inaccuracies in some data elements used for cohort definitions are also important. (p. 37)
That is, while the reporting year (e.g., sc_year(2016)) may be the same, the measurement year may not directly align. The same holds true when trying to align institution-level data with field of study-level data (see the technical documentation for field of study-level data for more information).
The upshot is that rscorecard will return data based on what the API call returns, but the user should take care to ensure that returned data elements align with expectations and project needs.
More information and examples
For more information about each command, see Commands.
For more examples, see More examples.