Irish Census 2016 & Privacy

I’ve been looking at the 2016 census results over the last few years, and a great many values are suppressed for Small Areas. The CSO suppresses or aggregates results depending on the number of people living in a Small Area: if the population is small enough that individuals could be identified, the data is suppressed. It is legally required to do this under s33 of the Statistics Act, 1993.

I’ve been looking at a selection of variables, and after reading this piece on the Traveller accommodation crisis by RTÉ I decided to map the percentage of Travellers per Small Area. I have all this data in a PostGIS database, but I’ll quickly run through how to do it without PostGIS. I downloaded the Small Areas shapefile (generalised to 50m) and the CSV of all of the Small Area values from the CSO here. Instead of using a spreadsheet or QGIS to manually delete the 802 fields I didn’t need, I used the pandas library; the Python code below took 0.3 seconds to run. It opens the relevant CSV, selects only the columns that I need and then strips the first 7 characters from the ‘GEOGID’ string, as these are not needed for the join I’ll do in QGIS later.

import time

import pandas as pd

start = time.time()

# Read only the columns needed for the map.
df = pd.read_csv('SAPS2016_SA2017.csv', usecols=['GUID', 'GEOGID', 'GEOGDESC', 'T1_1AGETT', 'T2_2WIT'])

# Make sure GEOGID is a string, then drop its first 7 characters.
df['GEOGID'] = df['GEOGID'].astype(str).str[7:]

df.to_csv('SAPS2016_SA2017_New_GEOGID.csv')
end = time.time()
print(end - start)
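The percentage itself can be worked out from the two count columns. Below is a minimal sketch of that step using only the standard library. It assumes that T1_1AGETT holds the total population of each Small Area and T2_2WIT the number of White Irish Travellers, which is worth checking against the CSO glossary. The function name and the sample rows are made up for illustration.

```python
import csv
import io


def traveller_percentages(csv_file):
    """Return {GEOGID: percentage of Travellers} for each row of a SAPS-style CSV.

    Assumes T1_1AGETT is the total population and T2_2WIT the Traveller count.
    """
    result = {}
    for row in csv.DictReader(csv_file):
        total = int(row["T1_1AGETT"])
        travellers = int(row["T2_2WIT"])
        # Guard against empty Small Areas to avoid dividing by zero.
        result[row["GEOGID"]] = 100.0 * travellers / total if total else 0.0
    return result


# Made-up sample rows, not real census values.
sample = io.StringIO(
    "GEOGID,T1_1AGETT,T2_2WIT\n"
    "017001001,200,10\n"
    "017001002,0,0\n"
)
percentages = traveller_percentages(sample)
```

The same calculation could be done as a new column in pandas or in the QGIS field calculator after the join.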

I then opened the shapefile in QGIS, imported the CSV and joined the two. The result was exported to a GeoPackage, and I used GDAL’s ogr2ogr utility to convert it to GeoJSON for upload to Carto.

ogr2ogr command to convert to GeoJSON

Below is the resultant map, with some formatting of the headings to make it more legible. You can make it full screen using the button on the left. What struck me was how, with a small amount of work, it was very easy to accurately visualise where one of the most vulnerable groups in society lives. Obviously this information is useful to local government, state agencies, NGOs and so forth, but I question whether this data should be available to the general public, even when aggregated to the Small Area geography.

 

Ireland-Census 2016

 

I’ve had to work recently on an older Linux-based machine, and as such most of my usual routes to edit and display data aren’t available to me. I needed to perform a join between the Small Areas geometry and the Small Areas table, both of which are available from the CSO’s website here. Even though the CSV only has ~18,000 rows, the field calculator in QGIS 3.2 Bonn couldn’t cope and kept crashing.

Python to the rescue. I downloaded the Geany IDE, which I find to be nice and lightweight for older computers. I needed to remove the first 7 characters from the ‘GEOGID’ field, as all of the values in this column start with ‘SA2017_’. The following is a quick few lines in Python 2 that do this using Python’s built-in csv module. For reference, on this very average laptop from 2011 it took 3 seconds to run.

import csv

# Python 2: the csv module expects files opened in binary mode.
with open('SAPS2016_SA2017.csv', 'rb') as input_file, open('output.csv', 'wb') as output_file:
    reader, writer = csv.reader(input_file), csv.writer(output_file)
    # Copy the header row and add a name for the new column.
    first_row = next(reader)
    first_row.append("Strp_GeogID")
    writer.writerow(first_row)
    for row in reader:
        # GEOGID is the second column; drop its 'SA2017_' prefix.
        row.append(row[1][7:])
        writer.writerow(row)
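For anyone running Python 3, the same idea can be sketched as follows. This is an adaptation rather than the script I ran: the function name and the prefix_len parameter are made up for illustration, and the files are opened in text mode with newline='' as the csv module documentation recommends for Python 3.

```python
import csv


def add_stripped_geogid(in_path, out_path, prefix_len=7):
    """Copy a CSV, appending a column with GEOGID minus its first prefix_len characters."""
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader, writer = csv.reader(src), csv.writer(dst)
        # Copy the header row and add a name for the new column.
        header = next(reader)
        writer.writerow(header + ["Strp_GeogID"])
        for row in reader:
            # GEOGID is assumed to be the second column, as in the script above.
            writer.writerow(row + [row[1][prefix_len:]])
```

It would be called as add_stripped_geogid('SAPS2016_SA2017.csv', 'output.csv').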

Highest Number of Persons Born in the UK, Living in Ireland-Census 2016

I was reading an article online the other day about Brexit and got thinking about all the Irish people (myself included, at least for the next two weeks) who live in the UK. I’ve never heard much said about the people from the UK who live in the Republic.

Unsurprisingly, the border counties contain the highest percentage of residents who were born in the UK. So the question now is: if we exclude the border counties (Donegal, Leitrim, Cavan, Monaghan and Louth), where in Ireland has the highest percentage of residents who were born in the UK?

I used the Small Area spatial unit for this analysis. The answer is Templenoe, County Kerry, which is 6km to the west of Kenmare. I can’t say for sure, but I would guess that part of the reason is the presence of the Ring of Kerry Golf Course. Twenty of the sixty-nine people who live there (about 29%) were born in the UK.

UK Born - Census 2016

Percentage Persons Born in the UK

Commute to Work-Ireland

I was reading ‘Project Ireland 2040-National Planning Framework’ and it got me thinking about what percentage of people in each ED commute for an hour or more to work. This is exactly the type of unsustainable living that needs to be avoided by promoting as much infill development as possible in existing urban centres. Below is a map I created that shows the commuting times people face. It is important to bear in mind that the stark red colour equates to a maximum of only 34% of people commuting for an hour or more; that is still just over one third, which is significant. Although not designed for the purpose, the map gives a good indication of the functional urban area of the major cities (especially Dublin).

Commuting Time Ireland-Census 2016

 

 

Ireland, A Country in Motion: Methodology

I promised late last year that I’d do a blog post explaining how I created the ‘Ireland in Motion’ commuting map. Well, this is that post!

The first thing to say is that, until the 2016 census results came out, it wasn’t possible (as a member of the public) to create this type of map, as the Central Statistics Office (CSO) just didn’t release the data. Then, as now, to access the full Place of Work, School or College data (POWSCAR) you must attend a training programme and sign up to be an ‘Officer of Statistics’. The deciding factor for me was that you have to be resident in Ireland, which I currently am not. You also have to be a ‘bona fide’ researcher.

So, imagine my delight when I found out that they were releasing an aggregated, anonymised dataset for the entire country! The data is aggregated at electoral division (ED) and county level. The POWSCAR website where the data can be downloaded is located here. There are two important caveats with this data: EDs where fewer than 10 persons commuted have been excluded, and records where no place of work, school or college could be geocoded have been removed. Below is an extract from the CSO’s website showing the fields available.

RESIDENCE_ED_GUID        Geographic Unique Identification (GUID) code for origin Electoral Division (ED)
RESIDENCE_CSOED          CSO ED code for origin ED
RESIDENCE_CSOED_LABEL    Name of origin CSO ED
RESIDENCE_COUNTY         County code for origin county
RESIDENCE_COUNTY_LABEL   Name of origin county
POWSC_ED_GUID            GUID for destination ED
POWSC_CSOED              CSO ED code for destination ED
POWSC_CSOED_LABEL        Name of destination ED
POWSC_COUNTY             County code for destination county
POWSC_COUNTY_LABEL       Name of destination county
COUNT                    Number of persons commuting

Once extracted, the downloaded zip file was a 42MB CSV file. CSVs are an ideal format because they are supported by a huge number of programs. I knew that for the type of map I was going to create I wanted straight lines between the centroids of the EDs. The basic methodology I followed was as follows:

  1. Download the CSV, inspect and clean the data (remove any extraneous records).
  2. Download the ungeneralised shapefile of the EDs (available here).
  3. Use QGIS to create the polygon centroids of each ED.
  4. Use the VLOOKUP and CONCATENATE functions in Excel to create well-known text (WKT) linestrings for the commutes between EDs.
  5. Use Python to parse the CSV file and repeat each row by the number (count) of commutes between the two EDs, so that each row represents one commute.
  6. Load the CSV into QGIS and save as a shapefile.
  7. Use FME to load the shapefile into a PostGIS database.
  8. Connect the database to QGIS and create the map.

Detailed Methodology:

1 CSV:

The original number of commutes in the CSV was 2,750,239. The following records were removed:

A. The destination was within the same ED (478,884)

B. There was no fixed place of work (174,628)

C. Work/school from home (114,189)

D. Commute to Northern Ireland (9,336)

E. Commute overseas(!) (3,531)

 

This left a grand total of 1,969,671 commutes to be mapped.
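As a quick sanity check, the figures above can be added up in a few lines of Python. The dictionary keys are just shorthand labels for categories A to E; the numbers are the ones quoted above.

```python
total_commutes = 2_750_239

# Records removed, by category (A to E above).
removed = {
    "same ED": 478_884,
    "no fixed place of work": 174_628,
    "work/school from home": 114_189,
    "Northern Ireland": 9_336,
    "overseas": 3_531,
}

remaining = total_commutes - sum(removed.values())
print(remaining)  # 1969671
```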

2 Download Ungeneralised Shapefile:

The ungeneralised shapefile was downloaded from here.

3 Use QGIS to Create the Polygon Centroids:

The centroid of each polygon was quickly calculated in QGIS.

4 Vlookup and Concatenate in Excel:

The attribute table of the centroids was exported to Excel, and the VLOOKUP and CONCATENATE functions were used to create the linestrings for individual commutes, as shown below:

1001,1002,1,-6.92771,52.83721,-6.93919,52.83783,"LINESTRING (-6.92771 52.83721, -6.93919 52.83783)"
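I did this step in Excel, but the same lookup-and-concatenate logic can be sketched in Python. The sketch below assumes that 1001 and 1002 in the sample row are the origin and destination ED codes and that the first coordinate pair belongs to the origin. The function name and the centroids dictionary are hypothetical stand-ins for the VLOOKUP table.

```python
def commute_linestring(origin, destination, centroids):
    """Build a WKT LINESTRING between the centroids of two EDs."""
    x1, y1 = centroids[origin]
    x2, y2 = centroids[destination]
    return f"LINESTRING ({x1} {y1}, {x2} {y2})"


# Centroid lookup keyed by ED code; this plays the role of the VLOOKUP table.
# Coordinates are kept as strings, as they would arrive from a CSV export.
centroids = {
    "1001": ("-6.92771", "52.83721"),
    "1002": ("-6.93919", "52.83783"),
}

wkt = commute_linestring("1001", "1002", centroids)
print(wkt)  # LINESTRING (-6.92771 52.83721, -6.93919 52.83783)
```

Keeping the coordinates as strings avoids any surprises from float formatting when the WKT is written back out.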

5 Python:

A simple Python script was used to repeat each linestring by its count, so that each individual commute would be represented by a separate line on the map.
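I no longer have the exact script to hand, so the following is a minimal reconstruction of the idea rather than the original code. It assumes the count sits in the third column, as in the sample row above; the function name and the count_index parameter are made up for illustration.

```python
import csv
import io


def expand_by_count(reader, writer, count_index=2):
    """Write each input row `count` times, so one output row equals one commute."""
    for row in reader:
        for _ in range(int(row[count_index])):
            writer.writerow(row)


# Illustrative input: origin, destination, count, WKT (no header row).
source = io.StringIO(
    '1001,1002,3,"LINESTRING (-6.92771 52.83721, -6.93919 52.83783)"\n'
)
target = io.StringIO()
expand_by_count(csv.reader(source), csv.writer(target))
expanded_rows = target.getvalue().splitlines()
```

With real files, the two StringIO objects would be replaced by files opened with newline='', and a header row (if present) would need to be copied across before the loop.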

6 QGIS-Load CSV:

The CSV file was quickly and easily loaded into QGIS and exported as a shapefile. A better method would probably have been to use FME to load the CSV directly into PostGIS, and that’s something I will bear in mind for the future.

7 FME Shapefile:

FME 2017 was used to load the shapefile into PostGIS, and a simple reprojection was used to get the data into Irish Transverse Mercator (EPSG:2157).

8 Connect PostGIS to QGIS:

A PostGIS layer can be added in a few clicks from within QGIS. The advantage of using PostGIS is that it will load the 1.97 million lines a lot faster than a shapefile would. Shapefiles have their uses (they are widely supported, for example), but they are an archaic format that will hopefully go the way of the dodo; this is already happening with the support for GeoPackage in QGIS 3.

 

The above is a quick overview of how I carried out the data processing for the map. It’s remarkable that almost all the software used to create the map was open source. I’d be curious to try to do it with entirely open-source tools (replacing FME with OGR and Excel with LibreOffice Calc), but as I have a home use licence for FME and Office 2016 I decided to use those.

Irish Census 2016

The CSO released the results and geometry for the Small Areas (the largest scale available) on the 20th of July 2017 (available here). I downloaded all the data and used FME 2017 (with my shiny new home use licence) to join the geometry and CSVs and write the result to PostGIS, which was then brought into QGIS; this was definitely the quickest way to display the data. I used Google Fonts and Adobe Kuler for the fonts and colour scheme. Below is a quick map I put together of my home county.