Evolving notes, images and sounds by Luis Apiolaza


Scraping pages and downloading files using R

I have written a few posts discussing descriptive analyses of the evaluation of National Standards for New Zealand primary schools. The data for roughly half of the schools was made available by the media, but the full version of the dataset is provided on a school-by-school basis. In the page for a given school there may be a link to a PDF file with the information on standards sent by the school to the Ministry of Education.

A word of caution: the sample may have an effect

This week I’ve tried to (i) stay mostly in the descriptive statistics realm and (ii) surround any simple(istic) models with caveats, pointing out that they are very preliminary. We are working with a sample of ~1,000 schools that did reply to Fairfax’s request, while there are a number of schools that either ignored the request or told Fairfax to go and F themselves. Why am I saying this? If one goes and gets a simple table of the number of schools by type and decile there is something quite interesting: different types of schools are represented in the sample in different percentages, and there is the possibility of bias in the reporting to Fairfax due to potentially low performance (references to datasets correspond to the ones I used in this post):

summary(standards$school.type)
#         Composite (Year 1-10)          Composite (Year 1-15)        Contributing (Year 1-6)
#                             1                             29                            403
#       Full Primary (Year 1-8)    Intermediate (year 7 and 8) Restricted Composite (Yr 7-10)
#                           458                             62                              1
#         Secondary (Year 7-15)
#                            56

Now let’s compare this number with the school directory:

summary(factor(directory$school.type))
#         Composite (Year 1-10)          Composite (Year 1-15)        Contributing (Year 1-6)
#                             4                            149                            775
#         Correspondence School        Full Primary (Year 1-8)    Intermediate (year 7 and 8)
#                             1                           1101                            122
#Restricted Composite (Yr 7-10)         Secondary (Year 11-15)          Secondary (Year 7-10)
#                             4                              2                              2
#         Secondary (Year 7-15)          Secondary (Year 9-15)                 Special School
#                           100                            238                             39
#              Teen Parent Unit
#                            20
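
To make the comparison concrete, the two tables can be lined up as a coverage percentage per school type. This is only a sketch: the counts are typed in by hand from the two summary() outputs above, and the names sample_n, directory_n and coverage are mine, not objects from the original analysis.

```r
# Counts copied by hand from summary(standards$school.type)
sample_n <- c(
  "Composite (Year 1-10)"          = 1,
  "Composite (Year 1-15)"          = 29,
  "Contributing (Year 1-6)"        = 403,
  "Full Primary (Year 1-8)"        = 458,
  "Intermediate (year 7 and 8)"    = 62,
  "Restricted Composite (Yr 7-10)" = 1,
  "Secondary (Year 7-15)"          = 56
)

# Counts copied by hand from summary(factor(directory$school.type))
directory_n <- c(
  "Composite (Year 1-10)"          = 4,
  "Composite (Year 1-15)"          = 149,
  "Contributing (Year 1-6)"        = 775,
  "Correspondence School"          = 1,
  "Full Primary (Year 1-8)"        = 1101,
  "Intermediate (year 7 and 8)"    = 122,
  "Restricted Composite (Yr 7-10)" = 4,
  "Secondary (Year 11-15)"         = 2,
  "Secondary (Year 7-10)"          = 2,
  "Secondary (Year 7-15)"          = 100,
  "Secondary (Year 9-15)"          = 238,
  "Special School"                 = 39,
  "Teen Parent Unit"               = 20
)

# Align the sample counts to the directory's school types;
# types that never appear in the sample get a count of 0
in_sample <- sample_n[names(directory_n)]
in_sample[is.na(in_sample)] <- 0

coverage <- data.frame(
  school.type  = names(directory_n),
  in.sample    = as.vector(in_sample),
  in.directory = as.vector(directory_n)
)

# Percentage of directory schools of each type present in the sample
coverage$percent <- round(100 * coverage$in.sample / coverage$in.directory, 1)
print(coverage)
```

With the real data frames one could presumably build the same table from table(standards$school.type) and table(directory$school.type) instead of typing the counts in.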

As a proportion we are missing more secondary schools. We can use the following code to get an idea of how similar the school types are, because the small number of different composite schools is a pain.

# qplot() comes from the ggplot2 package
library(ggplot2)

# Performance of Contributing (Year 1-6) and
# Full Primary (Year 1-8) looks pretty much the
# same. Composites could be safely merged
qplot(school.type, reading.OK,
      data = standards, geom = 'jitter')

qplot(school.type, writing.OK, 
      data = standards, geom = 'jitter')

qplot(school.type, math.OK, 
      data = standards, geom = 'jitter')

# Merging school types and plotting them colored
# by decile
standards$school.type.4 <- standards$school.type
levels(standards$school.type.4) <- c('Composite', 'Composite', 'Primary',
                                     'Primary', 'Intermediate',
                                     'Composite', 'Secondary')

qplot(school.type.4, reading.OK, colour = decile,
      data = standards, geom = 'jitter')
Figure: Representation of different school types and deciles is uneven. Participation in the sample differs across school types; this plot shows performance in mathematics.
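
The merge above relies on the order of the factor levels, so it would silently mislabel schools if a level were added or dropped. A less fragile variant, sketched here on a small made-up factor rather than the real standards data, names the old levels explicitly through the list form of levels<-. The toy vectors school.type and decile are hypothetical stand-ins for the columns of the same name.

```r
# Toy stand-in for standards$school.type (hypothetical rows, real level names)
school.type <- factor(c(
  "Contributing (Year 1-6)", "Full Primary (Year 1-8)",
  "Composite (Year 1-15)", "Secondary (Year 7-15)",
  "Intermediate (year 7 and 8)", "Restricted Composite (Yr 7-10)",
  "Composite (Year 1-10)", "Full Primary (Year 1-8)"
))

# Hypothetical deciles, one per toy school
decile <- c(3, 7, 5, 9, 1, 4, 6, 10)

# Merge levels by name: each list element maps a new level
# to the old levels it absorbs, regardless of their order
school.type.4 <- school.type
levels(school.type.4) <- list(
  Composite    = c("Composite (Year 1-10)", "Composite (Year 1-15)",
                   "Restricted Composite (Yr 7-10)"),
  Primary      = c("Contributing (Year 1-6)", "Full Primary (Year 1-8)"),
  Intermediate = "Intermediate (year 7 and 8)",
  Secondary    = "Secondary (Year 7-15)"
)

# Number of schools by merged type and decile
print(table(school.type.4, decile))
```

The same table() call on the real data would give the count of schools by type and decile mentioned earlier in the post.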


I’m using jittering rather than box and whisker plots to (i) depict all the schools and (ii) get an idea of the different participation of school types in the dataset. Sigh. Another caveat to add to the discussion.

P.S. 2012-09-27 16:15. Originally I mentioned in this post the lack of secondary schools (Year 9-15) but, well, they are not supposed to be here, because National Standards apply to years 1 to 8 (Thanks to Michael MacAskill for pointing out my error.)

Updating and expanding New Zealand school data

In two previous posts I put together a data set and presented some exploratory data analysis on school achievement for national standards. After those posts I exchanged emails with a few people about the sources of data and Jeremy Greenbrook-Held pointed out Education Counts as a good source of additional variables, including number of teachers per school and proportions for different ethnic groups.

The code below calls three files: Directory-Schools-Current.csv, teacher-numbers.csv and SchoolReport_data_distributable.csv, which you can download from the links.

New Zealand school performance: beyond the headlines

I like the idea of having data on school performance, not to directly rank schools—hard, to say the least, at this stage—but because we can start having a look at the factors influencing test results. I imagine the opportunity in the not so distant future to run hierarchical models combining Ministry of Education data with Census/Statistics New Zealand data.

At the same time, there is the temptation to come up with very simple analyses that would make appealing newspaper headlines. I’ll read the data and create a headline and then I’ll move to something that, personally, seems more important. In my previous post I combined the national standards for around 1,000 schools with decile information to create the standards.csv file.


© 2024 Palimpsest
