New Zealand School data

Some people have the naive idea that the national standards data and the decile data will not be put together. Given that at some point in time all data will be available, there is no technical reason to not have merged the data, which I have done in this post for an early release.

Stuff made data for the National Standards for a bit over 1,000 schools available here as an Excel File. I cleaned it up a bit to read it in R, saved it to csv format, of which you can get a copy here. Decile data is available from the Ministry of Education, which I transformed to csv and uploaded here. We can now merge the datasets using R:

options(stringsAsFactors = FALSE)
deciles <- read.csv('DecileChanges20072008.csv')
standards <- read.csv('SchoolReport_data_distributable.csv')
standards <- merge(standards, deciles, by = '')

But there are some name discrepancies in the merge, which we can check using:

standards$name.discrepancy <- factor(ifelse(standards$ != standards$, 1, 0))
tocheck <- subset(standards[, c(1, 2, 64, 77)], name.discrepancy == 1)
#                                   name.discrepancy
#9          82              Aidanfield Christian School           Canterbury Christian College                1
#33        266                   Waipa Christian School                Bethel Christian School                1
#73        429                        Excellere College                 Kamo Christian College                1
#143      1245 Christ the King Catholic School (Owairak    Christ The King School (Mt Roskill)                1
#150      1256            Cornwall Park District School                   Cornwall Park School                1
#211      1360       Marist Catholic School (Herne Bay)              Marist School (Herne Bay)                1
#270      1500     St Leo's Catholic School (Devonport)             St Leos School (Devonport)                1
#297      1588 St Francis Xavier Catholic School (Whang   St Francis Xavier School (Whangarei)                1
#305      1650                  Drummond Primary School Central Southland Rural Primary School                1
#314      1678                   Te Kura o Waikaremoana                 Te Kura O Waikaremoana                1
#335      1744              Horahora School (Cambridge)                        Horahora School                1
#389      1991                  Tauranga Primary School                        Tauranga School                1
#511      2424                Parkland School (P North)                        Parkland School                1
#587      2663                 Reignier Catholic School             Reignier School (Taradale)                1
#627      2841        Fergusson Intermediate (Trentham)                 Fergusson Intermediate                1
#771      3213               Parklands School (Motueka)                       Parklands School                1
#956      3801               Pine Hill School (Dunedin)                       Pine Hill School                1

There are four schools that have very different names in the dataset: 82, 266, 429 and 1650. They may just have changed names or something could be wrong. At this point one may choose to drop them from any analysis; up to you. We then just rename the second variable to and save the file to csv (available here standards.csv for your enjoyment).

names(standards)[2] <- ''
write.csv(standards, 'standards.csv', 
          row.names = FALSE, quote = FALSE)

Now we should try to merge any other economic and social information available in Statistics New Zealand.

Note 2012-09-26: I have updated data and added a few plots in this post.

Leave a Reply