Most Remote Building in Ireland

Introduction

I read a few excellent blog posts over the last number of years regarding how to calculate remoteness of buildings from each other.

The two that stand out for me are Topi Tjukanov’s Searching for Isolation with GIS and Simon Wrigley’s Finding the most remote buildings in Britain, which offer excellent methodologies for finding the most remote building in Flanders and Britain respectively. I have always been curious as to the most remote building* in Ireland (as measured from another building). I have been waiting (almost with bated breath) for Microsoft to release a building footprints dataset for Ireland, which they did earlier last year.

*Note – I thought long and hard and for this purpose, I am only interested in the most remote building from another building. For the purposes of this analysis, the building cannot be a ruin and must be weathertight. 

Data Download and Processing

I downloaded the data for Ireland (using the provided scripts). As GeoJSONs aren’t an appropriate format to work with, I used GDAL’s ogrmerge.py script to merge them all into one layer, reproject it to Irish Transverse Mercator (EPSG:2157) and output it as a single GeoPackage file. I then uploaded the GeoPackage to a PostGIS database (using GDAL) so that I could work with it more easily.

Below is an overview of the building footprints — it is obvious that there are two substantial areas missing around Dublin and Limerick. The dataset contains 2,270,112 buildings and the quality in places is dubious due to how it was created. I found a number of instances of silage pits and rocks etc. being mistaken for buildings. It’s a great effort by Microsoft to compile and, frankly, it’s the only dataset available as Tailte Éireann (the newly created public body with responsibility for mapping, property registration and valuation) has not released anything akin to the UK Ordnance Survey’s OpenMap Local Buildings product.

Building Footprints of Ireland from Microsoft

Missing Data

There are two large areas clearly missing from the above dataset, Greater Dublin and Greater Limerick. To get a sense of the problem, I downloaded population data from WorldPop, digitised the missing areas and used Zonal Statistics to calculate that the missing areas contain approximately 206,335 people in Greater Limerick and 2,117,635 for Greater Dublin (as shown below).

Missing Building Footprints - Greater Limerick

Missing Building Footprints - Greater Dublin

The two polygons shown above contain approximately 46% of the population of the country. Ordinarily, missing buildings over an area of the country that includes nearly half of its population would be a terrible outcome. The inverse held true for this analysis though; the more populated an area is, the more buildings it contains and the less likely it is to contain the most remote building in the country from another building. That’s not to say that the most remote building in the country isn’t located in these areas, just that it’s unlikely.

There are two obvious areas within both these polygons that could contain the most remote building, the islands in the Shannon Estuary and the Wicklow Mountains National Park. To give me some idea that the most remote building wouldn’t be in these areas, I manually (rather quickly) used the MapGenie basemap on Geohive to look for buildings. In the missing Limerick polygon, there were only two candidate areas that might contain the most remote building, one on Feenish and one on Inishtubbrid (neither of which was a greater distance to another building than the building identified at the end of this analysis).  For this area I also looked at the Slieve Bearnagh range which didn’t produce any building greater than 500 metres from another building.

For the Wicklow Mountains, I examined the MapGenie basemap in detail but no immediate contenders were apparent.

Shannon Estuary Islands – Contenders for Most Remote Building

Analysis

I am indebted to Topi and Simon, and it’s Simon’s methodology I’m following here (almost to the letter). To reduce the computing power and time required, I did two things: I converted the building footprints in PostGIS to points (using ST_Centroid), and I used MMQGIS to create a 3km hex grid in QGIS that I imported into PostGIS using the below command. I then clipped the hex grid to the outline of Ireland and created spatial indexes for all tables.

ogr2ogr \
  -f "PostgreSQL" PG:"dbname='Ireland_Buildings' host=localhost port=xxxx user='postgres'" \
  -nlt PROMOTE_TO_MULTI \
  "Ireland_Admin_Outline.shp"

I adapted Simon’s very helpful building count SQL query and mapped the number of buildings per hexagon (shown below).

Building Count - Ireland

To reduce processing time, Simon counted the building footprints for each hexagon and only focused on the hexagons that had ≤1 building per grid cell. Because of the nature of the geography of both countries (essentially, the remoteness afforded by the Scottish Highlands), this exact approach wouldn’t work for Ireland. After some careful experimentation, I chose ≤5 as the number to use.
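As a rough illustration of the counting step, here is a minimal Python sketch. It is not Simon's SQL: it bins points into square cells rather than hexagons (a simplification on my part), and the function name and sample coordinates are hypothetical. Cells with no buildings at all never appear in the result, so a real run would need to join back to the full grid.

```python
from collections import Counter


def sparse_cells(points, cell_size=3000.0, max_buildings=5):
    """Count points per square cell and keep cells at or under the threshold.

    points: iterable of (x, y) in a projected CRS with metre units.
    Square cells stand in for the 3km hexagons used in the post.
    """
    counts = Counter(
        (int(x // cell_size), int(y // cell_size)) for x, y in points
    )
    return {cell: n for cell, n in counts.items() if n <= max_buildings}


# Hypothetical centroids in metres, not real buildings
sample = (
    [(100.0 * i, 50.0) for i in range(6)]      # six points in cell (0, 0)
    + [(3500.0, 10.0), (4000.0, 20.0)]         # two points in cell (1, 0)
    + [(10.0, 3100.0)]                         # one point in cell (0, 1)
)
result = sparse_cells(sample)
print(result)
```

Changing max_buildings is all that is needed to try the different thresholds described above.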

 

Less than or equal to 5 buildings per hexagon

The results accord well with areas of the country that one would consider remote as they don’t have the best quality agricultural land (such as West Connaught). It is important to state here that I ran several iterations of the above with various numbers of buildings and then followed Simon’s step of merging the resultant grid cells and buffering the cluster by 1km in order to find the most remote buildings. This took a considerable amount of time and I think I deleted about 50 contenders through this process.

Method: K-Nearest Neighbours

Simon did some excellent work creating Voronoi polygons to give a visual indication of the most remote building. If I’m being honest, I skipped this step and went straight to a nearest neighbour analysis. I’m not going to reinvent the wheel here so please have a read of Simon’s excellent methodology for the detailed SQL code.

I ran the nearest neighbour analysis (limiting the results to the top 10) and methodically went through all of them to see whether the first entry (being the most remote building listed) was correct and, 🥁, it was.
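For readers who want the gist without the SQL, the ranking can be sketched in plain Python. This is a brute-force stand-in, assuming centroids in a metre-based projected CRS; the function name and the sample points are hypothetical, and PostGIS with a spatial index is what makes this practical for over two million buildings.

```python
import math


def rank_by_isolation(points, top_n=10):
    """Return (distance_to_nearest_neighbour, name) pairs, most isolated first.

    points: list of (name, x, y) in a projected CRS with metre units.
    Brute force, so only suitable for small candidate sets.
    """
    ranked = []
    for i, (name, x, y) in enumerate(points):
        nearest = min(
            math.hypot(x - ox, y - oy)
            for j, (_, ox, oy) in enumerate(points)
            if j != i
        )
        ranked.append((nearest, name))
    ranked.sort(reverse=True)
    return ranked[:top_n]


# Hypothetical centroids in metres, not real buildings
sample = [
    ("A", 0.0, 0.0),
    ("B", 100.0, 0.0),
    ("C", 0.0, 150.0),
    ("D", 5000.0, 5000.0),
]
top = rank_by_isolation(sample, top_n=2)
print(top)
```

Each entry in the result still needs the manual check described above, since a false positive in the footprints would rank just as highly as a real building.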

Bearing in mind that Ireland isn’t a particularly large country with vast remote areas etc. I wasn’t expecting the result to be a huge distance. I had discounted a lot of buildings along the way that were clearly not weathertight (such as the below) and, through that process, I knew the final result wouldn’t be numerically impressive.

False Positive – Clearly not weathertight

Most Remote Building

I think it’s crucial here to reiterate the main caveats to this answer, those are:

  1. The dataset was created using machine learning (limitations being a large number of false positives and potentially missing buildings); and
  2. There are areas around Dublin and Limerick missing that may contain a more remote building.

All that being said, the most remote building is:

Black Head Lighthouse, Burren, Co. Clare - 2,270 metres to Nearest Building

Black Head Lighthouse, Co. Clare

It’s quite interesting that the lighthouse is in the Burren, Co. Clare, which is a karst landscape that is >500km² in area. I’ve inspected it on StreetView and it appears to meet the main criteria (i.e. weathertight). To ensure that there weren’t any nasty surprises, I again examined the MapGenie basemap on Geohive and, thankfully, there were no missing buildings in the vicinity. A very interesting structure to be the most remote (you can read more about it, including a detailed history, on the Commissioners of Irish Lights website).

Black Head Lighthouse by Yair Haklai is licensed under the Creative Commons Attribution-Share Alike 4.0 International license.

CAVEAT

This was the most remote building based on the data available and the methodology used. I will rerun the analysis if and when the entire country becomes available. If you’ve gotten this far, thanks for reading.

North or South of the River

This post is in response to a chat I had a few days ago with someone who was wondering whether there are more people north or south of the river in Perth.

To answer this I’ve taken the ABS’ definition of Greater Perth (being the Greater Capital City Statistical Area [GCCSA]). The overall population of the Perth GCCSA is 2,116,647 (Census ’21). The greater city area is separated by both the Swan and Avon Rivers. I downloaded the polyline of the rivers from https://overpass-turbo.eu/ and then created polygons of both the north and south of the river to quickly select the SA1 areas within each. I then quickly ran a Select by Location to get the figures for the number of people that live north and south of the river.

Perth - SA1 Units with the Greater Capital City Statistical Area

Once I had gotten the SA1 units north of the river it was simply a case of repeating the process for the SA1 areas south of the river.

SA1 Areas North of the River

SA1 Areas South of the River

 

The results are as follows:

North of the River –

Area = 1,653.8km² | Population = 962,257 | Percentage of Perth GCCSA Population: 45% | Population Density = 581.9 Persons/km²

South of the River –

Area = 4,762km² | Population = 1,153,941 | Percentage of Perth GCCSA Population: 55% | Population Density = 242.2 Persons/km²

The above data is out to the tune of 179 people due to what I presume is data suppression by the ABS in the very sparsely populated SA1 areas. An interesting takeaway is that the population density south of the river is only 41.6% of that north of the river.

Put another way, if the southern part of the City had the same density as the north, then there would be 2.784 million people south of the river—really puts it in perspective.

Titanic Wreck Site

A quick map that shows the location of the Titanic wreck site with graticules included:

 

Titanic Wreck Location with Graticules Included

Camino de Santiago

I was talking to someone recently about the Camino de Santiago and whether it plays a role in sustaining the population of the provinces it traverses. It’s a rainy Sunday here so I sat down with a cup of coffee and decided to create a population density map using data from: OSM, WorldPop, Natural Earth and Peter Rukavina’s website ruk.ca.

It proved quite difficult to get GIS data on the route of the Camino through Spain. I first downloaded the Spain.osm.pbf file from Geofabrik and filtered it using Osmium. However, no matter what combination of tags and names etc. I used, I couldn’t get more than a few thousand disjointed lines that were not fit for purpose. I then found Peter’s website, where he had the route as a GeoJSON that I could easily use. I then spent some time in QGIS making the below map. I purposefully didn’t use the minimum and maximum numbers in the legend as I didn’t think they would add much value; instead I used low to high.

I also haven’t marked that it’s technically the Camino Francés as it’s the most popular route.

To answer the population question, I buffered the route by 5km and counted the number of people that live within this area. It worked out at 1,316,141, or 2.77% of Spain’s population. I was surprised; I thought the figure would be significantly larger.

Camino de Santiago Population Density

Over the Horizon

I recently moved to the south-west of Western Australia, close to a town called Busselton. The town is famous for, among other things, the longest timber-piled jetty in the Southern Hemisphere (at 1,841m). The first time I walked the jetty, I stood at the end and gazed out, wondering: if I could follow my gaze in a straight line, where would it make landfall?

Busselton Jetty

The End of the Jetty

THEORY

To answer the question accurately there are two concepts worth noting. The first is that of a rhumb line — if I digitised a line (in a projected coordinate system) representing the jetty and extended that angle, it would result in a rhumb line which at the intersect point would be off to the order of a couple of hundred kilometres.

Rhumb Line

Alvesgaspar – CC BY-SA 2.5 (https://creativecommons.org/licenses/by-sa/2.5) via Wikimedia Commons

The shortest distance between any two points on the surface of the earth is the minor arc of a great circle. So if I was standing where the signpost is shown above, that is in the centre of the end of the jetty, I would need to create the minor arc of a great circle line. This would be, as the great circle wiki article states, ‘analogous to “straight lines” in Euclidean geometry’.

Minor Arc of Great Circle

CheCheDaWaff – CC BY-SA 4.0 (https://creativecommons.org/licenses/by-sa/4.0) via Wikimedia Commons

CREATION OF THE LINE

The process I followed to create the line was as follows:

  1. Create two points at the end of the jetty to represent the angle looking out.
  2. Create a line between the two points.
  3. Calculate the angle of the line and extend it for 1/4 the circumference of the earth at the equator.

The below GIF visualises the process:

Creation of Line

If you’d like to carry out the same analysis yourself, below is the SQL code. I’d like to thank Darrell Fuhriman for his help.

-- Assume two points, calculate the angle between them, then extend it.
-- point 1 (lat, lon) = -33.6307920509579, 115.338797474435
-- point 2 (lat, lon) = -33.6290969607648, 115.338316140276
SELECT
ST_Segmentize( -- break the line into segments so it looks better when re-projecting for display
    ST_MakeLine(
        ST_SetSRID(ST_Point(115.338797474435, -33.6307920509579), 4326) -- starting point (ST_MakeLine works on geometry, not geography)
        ,ST_Transform(
            ST_Project( -- find a point a long way away to use as the second point in the line
                ST_Point(115.338797474435, -33.6307920509579)::geography -- starting point again
                ,10000000 -- roughly 1/4 of the circumference of the earth at the equator, in metres (should get us far enough)
                ,ST_Azimuth(ST_Point(115.338797474435, -33.6307920509579)::geography, ST_Point(115.338316140276, -33.6290969607648)::geography) -- azimuth (in radians) between my two points
            )::geometry,
        4326)
    )::geography
,20000 -- break it into segments of 20km
);
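The same idea can be sketched outside the database. The Python below is only an approximation of the query above: it assumes a spherical earth (mean radius 6,371,008.8m) where PostGIS works on the spheroid, so the results will differ slightly, and the function names are my own.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean earth radius; spherical approximation


def initial_bearing(lat1, lon1, lat2, lon2):
    """Initial great-circle bearing from point 1 to point 2, in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    y = math.sin(dlon) * math.cos(phi2)
    x = (math.cos(phi1) * math.sin(phi2)
         - math.sin(phi1) * math.cos(phi2) * math.cos(dlon))
    return math.degrees(math.atan2(y, x)) % 360.0


def destination(lat, lon, bearing_deg, distance_m):
    """Point reached by travelling distance_m along a great circle."""
    phi1, lam1 = math.radians(lat), math.radians(lon)
    theta = math.radians(bearing_deg)
    delta = distance_m / EARTH_RADIUS_M  # angular distance
    sin_phi2 = (math.sin(phi1) * math.cos(delta)
                + math.cos(phi1) * math.sin(delta) * math.cos(theta))
    phi2 = math.asin(max(-1.0, min(1.0, sin_phi2)))
    lam2 = lam1 + math.atan2(
        math.sin(theta) * math.sin(delta) * math.cos(phi1),
        math.cos(delta) - math.sin(phi1) * math.sin(phi2),
    )
    lon2 = (math.degrees(lam2) + 540.0) % 360.0 - 180.0  # wrap to [-180, 180)
    return math.degrees(phi2), lon2


# The two jetty points from the SQL comments (lat, lon)
start_lat, start_lon = -33.6307920509579, 115.338797474435
end_lat, end_lon = -33.6290969607648, 115.338316140276

bearing = initial_bearing(start_lat, start_lon, end_lat, end_lon)
# One vertex every 20km out to 10,000km, mirroring ST_Segmentize
line = [destination(start_lat, start_lon, bearing, d)
        for d in range(0, 10_000_001, 20_000)]
print(round(bearing, 1), len(line))
```

The vertices in line can be written out as a GeoJSON LineString and overlaid on a coastline layer to find the intersections.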

RESULT

An imaginary line from the end of the jetty travels 802km before intersecting with Tamala, WA. It then travels for another 2,127km before meeting the district of Ayah in the Kebumen Regency in the province of Central Java, Indonesia. Where it meets land is almost equidistant between the towns of Kebasen and Kebumen (the latter had a population of 131,750 based on the 2020 census [source]).

Busselton Jetty Line

Jetty Line on a Globe

If you’re taken by the above and you ever want to travel to where it touches land in Indonesia, the coordinates are: -7.7628936, 109.4018789.

A Long Way Home – Lion

I recently finished reading Saroo Brierley’s book A Long Way Home, which was adapted into the feature film Lion. In it he recounts his extraordinary life story, which begins when he gets separated from his brother at a train station whose name began with ‘B’, a few hours from home in rural India, in 1987. He was five years old and his brother Guddu had taken him along to make some extra money sweeping out trains. Guddu left him on a bench in the train station to get some sleep while he went to work. The young Saroo woke a little later but could find no trace of Guddu. In his panic he hopped on a train to look for him, the doors shut and (after what he believes was 12-15 hours) he ended up in Kolkata (then Calcutta). There, against all odds, he survived on the streets and was eventually adopted by (from the sounds of it) a wonderful couple in Tasmania. In his 20s he began trying to find his mother and siblings in India, using Google Earth to pinpoint both the town where he boarded the train and, in turn, his hometown.

Howrah Junction Station, Kolkata

His task was an unenviable one: India is the second most populous country in the world with 1,367,139,484 people (17.5% of the world’s population, Wikipedia). Below is a population density map of India to give some context as to how the population is distributed. These figures are present-day; the population of India in 1987, when Saroo became lost, was 819,800,000.

Population Density of India

The whole story piqued my interest and I wondered if we were to try the search today using the benefit of open data and open-source software could we make it more efficient. I would like to heavily caveat what is to follow by stating that I am using software and data that would not have been as extensive, complete or fully-featured when Saroo began his search in 2011.

I kept very careful notes whilst reading the book, and below is an exhaustive list of the criteria he had to work with, drawn from his recollections as a five-year-old. The criteria relate both to his hometown and to the town beginning with ‘B’ where he got on the train that eventually took him to Kolkata.

Initial Search Criteria

From the initial criteria above it became clear that a number of the criteria would not be usable in replicating the search; such as that it wasn’t in the colder north of India (too subjective) or that they lived side by side with Muslims (a common occurrence in India). Below is a refined list of criteria that I am going to use in order to try and replicate the search.

Usable Criteria for Search

Saroo’s methodology started with tracing his steps backwards from Kolkata. He knew he got on the train at a station that sounded like ‘Berampur’ and he thought he was on the train for approximately 12-15 hours. Based on this, he consulted Indian friends of his at college about how to start searching. One friend in particular, Amreen, whose father worked for the Indian Railway in New Delhi, proved helpful. Her father made an educated guess that trains in India in the 1980s travelled at between 70kph and 80kph. Based on this, Saroo calculated that he would have travelled about 1,000km in that time. He started searching methodically outwards from Kolkata to try and find the station that began with ‘B’ and, hopefully, then his hometown, which was about an hour away from it.
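To see where a figure of roughly 1,000km comes from, the speed and time estimates above can simply be multiplied out. This is my own back-of-the-envelope check rather than a calculation taken from the book:

```python
# Estimates quoted above: train speed in kph and time on the train in hours
speeds_kph = (70, 80)
hours_on_train = (12, 15)

min_km = min(speeds_kph) * min(hours_on_train)  # slowest train, shortest trip
max_km = max(speeds_kph) * max(hours_on_train)  # fastest train, longest trip

print(f"Plausible distance travelled: {min_km}km to {max_km}km")
```

The range works out at 840km to 1,200km, so 1,000km sits comfortably in the middle of it.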

If I was to try and recreate the search for Saroo’s hometown, I would need access to as much free geographical data as possible, so I turned to OpenStreetMap and specifically the downloads available from Geofabrik. I downloaded the Protocolbuffer Binary Format (PBF) file for the whole of India. The first items I was interested in were the railway lines and railway stations of India. QGIS can load PBF files natively but the whole of India is a bit of a stretch for it regardless of the computing power available.

BASH Osmium Commands

I used the Osmium tool to extract every railway station and line in India and then GDAL to convert them to the GeoPackage format. The first analysis I undertook was to follow Saroo’s approach and ascertain how many railway stations are within 1,000km of Howrah railway station. To give some context, OSM lists 7,979 railway stations and 108,000km of railway line. The Wikipedia page lists 68,155km; the discrepancy may be accounted for by the inclusion of every siding and historic railway line etc. in the OSM data. The below map shows every railway station and line from OSM.

Railway Stations and Railway Lines

To give an idea of what Saroo’s methodology involved, drawing a 1,000km buffer around Kolkata and working outwards left him with 2,905 stations to search. Below is a map of all of these.
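A distance filter of this kind is easy to sketch in Python with the haversine formula. The coordinates used for Howrah below are approximate, and the station names and positions are hypothetical placeholders rather than OSM data; I did the real selection in GIS software.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean earth radius; spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Approximate location of Howrah station (assumption, not surveyed)
HOWRAH = (22.583, 88.342)

# Hypothetical stations: (name, lat, lon)
stations = [
    ("Hypothetical station A", 22.583, 89.342),  # about one degree east
    ("Hypothetical station B", 12.0, 77.0),      # far to the south-west
]

nearby = [
    name for name, lat, lon in stations
    if haversine_km(HOWRAH[0], HOWRAH[1], lat, lon) <= 1000.0
]
print(nearby)
```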

Every Railway Station within 1,000km of Kolkata

I decided not to use Saroo’s methodology of using a buffer distance from Kolkata. I had the entire OSM database for India at my disposal so I decided that the first and easiest step to undertake was to find all the railway stations that began with ‘B’ and contained ‘p’, ‘u’ and ‘r’.  I used QGIS’s inbuilt python functionality, PyQGIS. I wrote a small script that would use the regular expression module to find stations that matched the above criteria.

import re
import time

from qgis.core import QgsField
from qgis.PyQt.QtCore import QVariant
from qgis.utils import iface

start = time.time()
# Start message
print("Program is starting...")

layer = iface.activeLayer()
prov = layer.dataProvider()

# Add the output field only if it does not already exist
field_name = "Criteria_Test"
if layer.fields().indexOf(field_name) == -1:
    prov.addAttributes([QgsField(field_name, QVariant.String)])
    layer.updateFields()
# Look the field index up rather than hard-coding it
field_idx = layer.fields().indexOf(field_name)

# Name starts with 'b' and then contains 'p', 'u' and 'r' in that order
pattern = re.compile(r'^b.+p.*u.*r', re.IGNORECASE)

# Start layer editing
layer.startEditing()

for feat in layer.getFeatures():
    station_name = str(feat['name'])
    if pattern.search(station_name):
        layer.changeAttributeValue(feat.id(), field_idx, "Meets_Criteria")
    else:
        layer.changeAttributeValue(feat.id(), field_idx, "No_Match")

# Save the edits and end the edit session
layer.commitChanges()

end = time.time()

print("Finished running - the program took " + str(round(end - start, 2)) + " seconds")

The above script took 15 seconds to run and it narrowed down the search field from 7,779 to 91 as shown below. I don’t think it is too much of a stretch to take the various ways that he thought it might have been spelled (as below), take the letters common to those spellings and narrow down the search that way.

  1. Burampour
  2. Birampur
  3. Berampur
  4. Bramapour
  5. Berampur
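As a quick sanity check of the common-letters idea, the candidate spellings can be tested against a regular expression outside QGIS. The pattern below is my own simplified expression (starts with ‘b’, then ‘p’, ‘u’ and ‘r’ in that order) and is an assumption rather than a copy of the one in the script; the two counter-examples are simply names that should not match.

```python
import re

# Assumed pattern: starts with 'b', followed later by 'p', 'u', 'r' in order
pattern = re.compile(r'^b.+p.*u.*r', re.IGNORECASE)

spellings = ["Burampour", "Birampur", "Berampur", "Bramapour"]
counter_examples = ["Bhopal", "Kanpur"]

matches = {name: bool(pattern.search(name)) for name in spellings}
rejects = {name: bool(pattern.search(name)) for name in counter_examples}

print(matches)
print(rejects)
```

All four distinct spellings match, while a name without a ‘u’ after the ‘p’, or one that does not start with ‘B’, is rejected.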

Before and After 'B' Stations

Now that we’ve narrowed down the number of possible ‘B’-towns to 91, the next step is to try and use the hometown criteria to find the correct location. We know that Saroo’s hometown had a water tower, river, bridge, dam and a fountain in the park near the station. He believed the town’s name began with ‘G’ and sounded like ‘Ginestlay’. I could have proceeded with the other criteria for the ‘B’-town; however, I felt that with a dam, bridge, river, fountain and water tower there were enough unique entities that I could try at this stage to find the hometown without dedicating any further resources or time to the ‘B’-town.

The assumption that Saroo was working on, from his friend Amreen’s father, was that trains travelled at between 70kph and 80kph in the mid-80s. As processing power wouldn’t be an issue for what I was trying to do, I decided to use the upper limit of 80kph for the speed of the trains. As the PBF file for the whole of India wouldn’t load correctly in QGIS, I extracted all of the bridges, dams, rivers and water towers. Below are maps showing the distribution of each from OSM.

Rivers of India

Dams of India

Bridges of India

Water Towers of India

I omitted pedestrian over and underpasses at this stage as they were too prevalent to help narrow down the location. I now needed to buffer each of the possible ‘B’-towns by the distance the train would have travelled. Saroo knew that Guddu and he travelled for about an hour that night. At 80kph that is 80km; I added a small bit of ‘fat’ and used 100km as the buffer distance. I then dissolved the buffers together and clipped them to the coastline. Saroo thought that he had been on the train for 12-15 hours. I was thinking that it would be useful to do a buffer from Kolkata of perhaps 6 hours and omit everything inside this buffer, as he knew he had travelled a farther distance.

A quick GIF of this process is shown below.

GIF of Process

Below is the final search area with every dam, river, bridge, water tower and train station within 100km of the ‘B’-town stations.

Final Search Area

For the final analysis I wanted to find only the stations that had a water tower, dam, river and bridge within 2km of the station. I picked 2km because I think that would be a reasonable distance from a town centre for 5-year-old Saroo to walk regularly enough to remember these features distinctly. I didn’t want to include the ‘B’-town stations themselves, so I simply clipped out the data within 10km of each ‘B’-town; as Saroo said that Guddu and he were on the train for about an hour, 10km seems like a safe distance to clip.

Example of Final Search Area

 

2km Buffers of Each Station

The next step was to find each station that had our relevant criteria within the buffer area. For this I turned to PostGIS (I tried running the SQL query in QGIS but it crashed every time). For the first iteration I decided to use only bridges, water towers and dams, as my logic was that these would prove more conclusive than pedestrian over and underpasses and rivers, which may have proved too common. Plus, the likelihood of a bridge being for a river was much greater than it being for a ravine, gorge etc. There were 3,223 stations within the 100km buffer of each ‘B’-town station as shown below.

Stations within Buffer Search Area

The SQL query that I ran is below. It ran in about a second and returned each station that had a bridge, a water tower and a dam within 2km. It narrowed down the number of possible stations from 3,223 to 24 as shown below.

CREATE VIEW Relevant_Stations
AS
SELECT stations_buffered.geom
     , stations_buffered.name
     , stations_buffered.fid
  FROM stations_buffered
 INNER
  JOIN merged_bridge_river_dam
    ON ST_Intersects(stations_buffered.geom, merged_bridge_river_dam.geom)
   AND merged_bridge_river_dam.Feature_Ty IN ('Bridge','DAM','W_Tower')
 GROUP
    BY stations_buffered.geom
     , stations_buffered.name
     , stations_buffered.fid
HAVING COUNT(DISTINCT merged_bridge_river_dam.Feature_Ty) = 3;

The 24 Stations that have a ‘bridge’, ‘water tower’ and ‘dam’ within 2km.
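The HAVING COUNT(DISTINCT ...) = 3 clause is what enforces ‘all three feature types’. The same logic can be sketched in Python, assuming the spatial join has already produced (station, feature type) pairs; the function name and the sample pairs are hypothetical, while the feature type labels are the ones used in the query.

```python
from collections import defaultdict

REQUIRED = {"Bridge", "DAM", "W_Tower"}


def stations_with_all_features(pairs, required=REQUIRED):
    """Return stations whose 2km buffer intersects every required feature type.

    pairs: iterable of (station_name, feature_type), one per intersection.
    """
    found = defaultdict(set)
    for station, feature_type in pairs:
        if feature_type in required:
            found[station].add(feature_type)
    return sorted(s for s, types in found.items() if types == set(required))


# Hypothetical join output, not real OSM data
sample_pairs = [
    ("Station 1", "Bridge"), ("Station 1", "DAM"), ("Station 1", "W_Tower"),
    ("Station 2", "Bridge"), ("Station 2", "Bridge"), ("Station 2", "DAM"),
    ("Station 3", "W_Tower"), ("Station 3", "River"),
]
keepers = stations_with_all_features(sample_pairs)
print(keepers)
```

Using a set per station means that several bridges near one station still count only once, which is the job DISTINCT does in the SQL.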

An example of one of the 24 stations that match the criteria is shown below.

Example of Station Matching Criteria

Saroo thought that the place he was from sounded like ‘Ginestlay’ so the next step was to find all the OSM tags for ‘place’ that started with ‘G’. I used the below Osmium command to extract all of these place names:

osmium tags-filter india-latest.osm.pbf place=* -o Placenames.osm.pbf

This extracted 193,057 place names for the whole of India. To ensure that I didn’t miss anything, I converted the polygons and lines to centroids and merged everything together (this took about 2 minutes in QGIS). I then filtered this to every place that began with ‘G’ and used the ‘Select by Location’ tool to narrow down the list from 24 to 12 stations. I have included images of these 12 stations below. Some of these obviously were not where Saroo was from as they formed parts of large cities (such as Lucknow) but I’ve left them in for the sake of showing the full process.

Candidate Area 1

Candidate Area 2

Candidate Area 3

Candidate Area 4

Candidate Area 5

Candidate Area 6

Candidate Area 7

Candidate Area 8

Candidate Area 9

Candidate Area 10

Candidate Area 11

Candidate Area 12

Conclusion

The station that we were looking for, ‘Khandwa Junction’, with the neighbourhood of ‘Ganesh Talai’, is candidate area 3 above. There are a few observations that I’d like to make regarding the above work. Firstly, I tried to stay as faithful as possible to the criteria that Saroo had to work with. Obviously, I had read the book and I knew the answer. I hope that the above doesn’t feel reverse-engineered. I can honestly say that I didn’t look at any of the OSM data for his hometown prior to starting this post.

I’m conscious of the fact that there is a high likelihood Saroo’s own search, along with the media coverage of the movie, is the reason the OSM data for Khandwa Junction exists at all. However, I will state that all of the above steps are very easily customisable and the criteria easy to change. The parameters could easily be changed to test another theory, such as using only a ‘B’ and ‘R’ for the ‘B’-town station perhaps. If we didn’t find the answer we were looking for on the first pass, we could have used a DEM to find where train lines crossed a gorge outside of a ‘B’-town station. Or we could have used Python to find all the horseshoe-shaped roads outside ‘B’-town stations. Finally, we could have used the data for pedestrian overpasses and underpasses and rerun the above.

What I can say with certainty is that with the OSM data available today it was very straightforward to reduce the number of stations to be searched from 2,905 (those within 1,000km of Kolkata) to 12. This would then involve only searching 0.41% of the stations that Saroo originally had to sift through. Google Earth was the best tool available to Saroo at the time but there’s no universe in which it wasn’t a brute-force, sledgehammer-to-crack-a-nut tool. I hope with the above work I have shown that it is easy to greatly reduce the number of stations to search through using some decent logic and the power of open-source GIS software and data. I also hope that if anyone else is in a bind similar to Saroo’s, the above might in some way help to demonstrate how to search for the right answer and make it home.

Population Below the Line

We all know that the majority of Australia’s population lives in the eastern states. There’s a question I’ve been thinking about for a while: if you drew a line from just north of Brisbane to just west of Adelaide, what percentage of Australia’s population (including islands and Tasmania) lives below that line? Well, I had a little time on this rainy Sunday so I decided to find out. I wrangled some data out of the ABS’ Table Builder and joined it with the GeoPackage of the SA2 geography (I chose this as I felt it provided the right spatial granularity without being too fine) and voilà, the answer to the question very few people asked: 82.3%.
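I did the selection in GIS software, but the underlying test is a simple one and can be sketched in Python: a cross product tells you which side of the line a point falls on. The sketch below treats coordinates as planar, and the line, centroids and populations are all made-up placeholders rather than ABS data.

```python
def is_below(line_west, line_east, point):
    """True if point lies below the line running from line_west to line_east.

    All arguments are (x, y) tuples; line_west must have the smaller x.
    A negative cross product means the point is to the right of the
    west-to-east direction, i.e. below the line.
    """
    (wx, wy), (ex, ey), (px, py) = line_west, line_east, point
    cross = (ex - wx) * (py - wy) - (ey - wy) * (px - wx)
    return cross < 0


def share_below(line_west, line_east, areas):
    """Percentage of total population whose centroid is below the line."""
    total = sum(pop for _, _, pop in areas)
    below = sum(pop for _, centroid, pop in areas
                if is_below(line_west, line_east, centroid))
    return 100.0 * below / total


# Hypothetical line and areas: (name, centroid, population)
west, east = (0.0, 0.0), (10.0, 10.0)
areas = [
    ("Area a", (5.0, 1.0), 700),   # below the line
    ("Area b", (1.0, 5.0), 300),   # above the line
]
share = share_below(west, east, areas)
print(share)
```

Classifying by centroid is itself a design choice: an area that straddles the line is counted wholly on one side.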

Population Below the Line

 

Unpopulated Areas

I was hiking at the weekend and it got me thinking about the unpopulated areas of Ireland. I’ve seen maps made for the 2011 census showing the square kilometres that have no usual resident population but I hadn’t seen one for the 2016 census so I put together the below. I purposefully omitted Northern Ireland because the data is nine years old. If anybody would like to replicate the below, just leave a comment and I can do a YouTube tutorial or a post on here on how I put it together.

 

The Unpopulated Areas of Ireland

Airbnbs in Ireland

I was reading this Guardian article the other day where they produced maps showing the number of Airbnb listings per 100 dwellings. I thought it was really interesting and I hadn’t seen Airbnb data mapped like that before. I had a few hours to spare yesterday so I set about replicating their method for Ireland. I used the 2016 census electoral divisions (to get the household numbers) and data for Ireland from Inside Airbnb. I think this data is questionable at best: from the reading I’ve undertaken, it seems to still list properties that were briefly on Airbnb a number of years ago and have long since been removed. However, it is the only data available so I went with it.

Below is the map, it was made with a combination of Bash, GDAL, QGIS, LibreOffice Calc and Illustrator.

Airbnbs per 100 Dwellings in Ireland

Irish Census 2016 & Privacy

I’ve been looking at the 2016 census results over the last few years and there is a great deal of suppression of values for relevant Small Areas. The CSO suppress results or aggregate them depending on the number of people living in a Small Area. If the population is too small and could lead to individuals being identified, the data is suppressed. They are legally required to undertake this exercise under s33 of the Statistics Act, 1993.

I’ve been looking at a selection of variables and, after reading this piece on the traveller accommodation crisis by RTÉ, I decided to map the percentage of travellers per Small Area. I have all this data in a PostGIS database but I’ll quickly run through how to do it without having to use PostGIS. I downloaded the Small Areas shapefile (generalised to 50m) and the CSV of all of the Small Area values from the CSO here. Instead of using a spreadsheet or QGIS to manually delete the 802 fields I didn’t need, I used the pandas library; the Python code below took 0.3 seconds to run. It opens the relevant CSV, selects only the columns that I need and then strips the first 7 characters from the ‘GEOGID’ string as these are not needed for the join I’ll do in QGIS later.

import time

import pandas as pd

start = time.time()

# Read only the columns needed for the map
df = pd.read_csv('SAPS2016_SA2017.csv', usecols=['GUID', 'GEOGID', 'GEOGDESC', 'T1_1AGETT', 'T2_2WIT'])

# Make sure GEOGID is a string, then strip its first 7 characters (not needed for the join)
df['GEOGID'] = df['GEOGID'].astype(str).str[7:]

df.to_csv('SAPS2016_SA2017_New_GEOGID.csv')

end = time.time()
print(end - start)
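The percentage itself can then be derived from the two count columns. The sketch below uses only the standard library so it runs anywhere, and it rests on assumptions: that T1_1AGETT is the total population and T2_2WIT the traveller count for each Small Area, and that GEOGID carries a 7-character prefix such as ‘SA2017_’. The sample rows are invented.

```python
import csv
import io

# Invented sample rows in the same column layout as the filtered CSV
SAMPLE = """GUID,GEOGID,GEOGDESC,T1_1AGETT,T2_2WIT
guid-1,SA2017_000000001,Example Small Area 1,250,10
guid-2,SA2017_000000002,Example Small Area 2,300,0
"""


def traveller_share(csv_text):
    """Return the trimmed GEOGID and percentage of travellers per row."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        total = int(row["T1_1AGETT"])      # assumed: total population
        travellers = int(row["T2_2WIT"])   # assumed: traveller count
        pct = round(100 * travellers / total, 2) if total else None
        rows.append({"GEOGID": row["GEOGID"][7:], "pct_travellers": pct})
    return rows


result = traveller_share(SAMPLE)
print(result)
```

Rows with a total of zero are given None rather than a percentage, which keeps them distinguishable from areas with a genuine 0%.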

I then opened the shapefile in QGIS, imported the CSV and joined them. This was subsequently exported to a GeoPackage and I used GDAL’s ogr2ogr utility to convert it to a GeoJSON in order to upload it to Carto.

ogr2ogr command to convert to GeoJSON

Below is the resultant map with some formatting of headings undertaken to make it more legible. You can make it full screen using the button on the left. What struck me about this was how, with a small amount of work, it was very easy to visualise accurately the resident locations of one of the most vulnerable groups in society. Obviously this information is useful to local governments, state government agencies, NGOs and so forth, but I question whether this data should be available to the general public, regardless of it being aggregated to the Small Area geography.