A generated data set containing data on 1200 imaginary K-12 students in Wisconsin, nested within 6 schools in 3 districts. In adapting the data from the source, Sam switched the school and district variables (in the source there had been multiple districts per school) and made other minor changes, including dropping columns that were unclear or did not seem relevant (e.g., variables like "luck" that were used to calculate the reading and math scores).
Format
A data frame with 2700 rows and 26 variables:
- student_id
numeric: student's unique ID #
- grade
numeric: grade level
- district
numeric: district code
- school
numeric: school code
- white
numeric: is the student white?
- black
numeric: is the student black?
- hisp
numeric: is the student Hispanic?
- indian
numeric: is the student American Indian (Native American)?
- asian
numeric: is the student Asian?
- econ
numeric: is the student economically disadvantaged?
- female
numeric: is the student female?
- ell
numeric: is the student an English Language Learner?
- disab
numeric: does the student have a learning disability?
- year
numeric: school year
- attday
numeric: days attended
- readSS
numeric: student's reading standardized test score
- mathSS
numeric: student's math standardized test score
- proflvl
factor: student's proficiency level
- race
factor: student's single-category race
...
Source
https://github.com/jknowles/r_tutorial_ed/, posted under a Creative Commons license. The script used to generate the data set, which is not well documented, is at https://github.com/jknowles/r_tutorial_ed/blob/master/data/simulate_data.R