I made a quick map during the week of the birthplaces of Irish Taoisigh. The map below gives a quick overview, though I thought a cartogram might also be useful, so that is below too. No surprise to see Dublin has the most. I know we’re a young country but I found it interesting how many counties haven’t had a Taoiseach.
I recently moved to the south-west of Western Australia, close to a town called Busselton. The town is famous for, among other things, the longest timber-piled jetty in the Southern Hemisphere (at 1,841m). The first time I walked the jetty I stood at the end and gazed out wondering if I could follow my gaze in a straight line where would it make landfall?
The End of the Jetty
To answer the question accurately there are two concepts worth noting. The first is that of a rhumb line: if I digitised a line (in a projected coordinate system) representing the jetty and simply extended it at that angle, the result would be a rhumb line, which by the time it met land would be off by a couple of hundred kilometres.
The second is that of a great circle. The shortest distance between any two points on the surface of the earth is the minor arc of a great circle. So if I was standing where the signpost is shown above, that is in the centre of the end of the jetty, I would need to create the minor arc of a great circle. This would be, as the Wikipedia article on great circles states, ‘analogous to “straight lines” in Euclidean geometry’.
The process I followed to create the line was as follows:
Create two points at the end of the jetty to represent the angle looking out.
Create a line between the two points.
Calculate the angle of the line and extend it for 1/4 the circumference of the earth at the equator.
The below GIF visualises the process:
Creation of Line.
If you’d like to carry out the same analysis yourself, the SQL code is below. I’d like to thank Darrell Fuhriman for his help.
-- assume two points, calculate the angle, then extend it.
-- point 1 = -33.6307920509579, 115.338797474435
-- point 2 = -33.6290969607648, 115.338316140276
SELECT ST_Segmentize( -- break it up into segments so it looks better when re-projecting for display
    ST_MakeLine(
        ST_SetSRID(ST_Point(115.338797474435, -33.6307920509579), 4326) -- starting point (ST_MakeLine takes geometry, not geography)
        ,ST_Project( -- find a point a long way away to use as the second point in the line
            ST_Point(115.338797474435, -33.6307920509579)::geography -- starting point again
            ,10000000 -- 1/4 circumference of earth at the equator (should get us far enough)
            ,ST_Azimuth(ST_Point(115.338797474435, -33.6307920509579)::geography, ST_Point(115.338316140276, -33.6290969607648)::geography) -- angle between my two points
        )::geometry -- cast back to geometry so ST_MakeLine accepts it
    )::geography -- segmentize as geography so the segments follow the great circle
    ,20000 -- break it into segments of 20km
) AS jetty_line;
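For anyone without PostGIS to hand, the same three steps (take the angle between the two points, project a far-away point, break the line into 20km pieces) can be sketched in plain Python. This is only a rough sketch under the assumption of a spherical earth, so it will differ slightly from the spheroid-based geography functions above; the function names are my own and not part of any library.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean earth radius; spherical approximation (assumption)


def initial_bearing(lat1, lon1, lat2, lon2):
    """Initial great-circle bearing from point 1 to point 2, degrees clockwise from north."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    y = math.sin(dlon) * math.cos(phi2)
    x = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlon)
    return math.degrees(math.atan2(y, x)) % 360.0


def destination(lat, lon, bearing_deg, distance_m):
    """Point reached after travelling distance_m along a great circle from (lat, lon)."""
    phi1 = math.radians(lat)
    theta = math.radians(bearing_deg)
    delta = distance_m / EARTH_RADIUS_M  # angular distance in radians
    sin_phi2 = math.sin(phi1) * math.cos(delta) + math.cos(phi1) * math.sin(delta) * math.cos(theta)
    phi2 = math.asin(max(-1.0, min(1.0, sin_phi2)))
    dlon = math.atan2(
        math.sin(theta) * math.sin(delta) * math.cos(phi1),
        math.cos(delta) - math.sin(phi1) * math.sin(phi2),
    )
    lon2 = (lon + math.degrees(dlon) + 540.0) % 360.0 - 180.0  # normalise to [-180, 180)
    return math.degrees(phi2), lon2


# the two jetty points from the SQL comments, as (lat, lon)
p1 = (-33.6307920509579, 115.338797474435)
p2 = (-33.6290969607648, 115.338316140276)

bearing = initial_bearing(*p1, *p2)
# one vertex every 20km out to 10,000km, mirroring the 20km segments in the SQL
line = [destination(*p1, bearing, d) for d in range(0, 10_000_001, 20_000)]
print(round(bearing, 1), len(line))
```

The list of vertices could then be written out as a line in whatever format suits; the point is only that each vertex sits on the great circle rather than on a straight line in projected space.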
An imaginary line from the end of the jetty travels 802km before intersecting with Tamala, WA. It then travels for another 2,127km before meeting the district of Ayah in the Kebumen Regency in the province of Central Java, Indonesia. Where it meets land is almost equidistant between the towns of Kebasen and Kebumen (the latter had a population of 131,750 based on the 2020 census [source]).
Busselton Jetty Line
If you’re taken by the above and you ever want to travel to where it touches land in Indonesia, the coordinates are: -7.7628936, 109.4018789.
For Halloween this year I wanted to create a spooky, atmospheric map. I settled on mapping the castles of Europe including Bran Castle. I know that Bran Castle doesn’t actually have any historical links to Bram Stoker but I thought it would be nice to include given its reputation. I came across a great website called https://download.osmdata.xyz/. It allowed me to easily download a geopackage of all the historic tags from OSM.
Now the elephant in the room: the actual castle data. I filtered the data by historic=castle. I had tried filtering it by categories such as castle_type but there just wasn’t enough tagged to make a nice map. People commenting on Reddit have been at pains to point out how inaccurate the map is and by and large they are correct. It’s the best that could be made with the data available. I usually wouldn’t publish something where I knew the data wasn’t up to scratch; however, as this was only meant to be a fun Halloween map I thought an exception could be made!
Anyway, I hope you enjoy it (above data caveat aside) as much as I enjoyed making it. Happy Halloween!
I was having a chat at work recently about the place names in Australia that end in ‘up’. It comes from a dialect of the Noongar Aboriginal language of Australia and means ‘place of’. Below are two quick maps I put together. You can clearly see the concentration in the Noongar region of South-West Western Australia.
Place Names Ending in ‘Up’ – Australia
Place Names Ending in ‘UP’ – South West, Western Australia
I’ve been writing a lot about population density at the moment and it got me thinking about density in Ireland. We all know we’re pitiful compared with other European countries and we know from census 2016 that we’re sitting at 70 persons per km². What I’m curious about is what’s the densest square kilometre in each county?
It is important to note that the densest square kilometre in each county below is the densest cell from a predefined 1km² grid for the entire country. That can be manifestly different from the true densest square kilometre in each county, but it’s the best data I have access to as a member of the public. If you want to read more about this issue, it’s called the modifiable areal unit problem (MAUP). A simple image to explain this is shown below. From a predefined grid that’s draped over the country we can see that cell 3 would be the most dense 1km²; however, if a cell was placed where cell 5 is, we can see that it would be the densest 1km² by a large amount. That, in a nutshell, is the MAUP.
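To make the grid issue concrete, here is a small Python sketch. The points are made-up dwellings (not census data) clustered around a grid corner, and `densest_cell` is a hypothetical helper of my own; it simply counts points per 1km cell for a given grid origin.

```python
from collections import Counter
import math


def densest_cell(points, cell_size=1.0, origin=(0.0, 0.0)):
    """Return ((col, row), count) for the fullest cell of a square grid anchored at origin."""
    ox, oy = origin
    counts = Counter(
        (math.floor((x - ox) / cell_size), math.floor((y - oy) / cell_size))
        for x, y in points
    )
    return counts.most_common(1)[0]


# hypothetical dwellings (x, y in km), five of them huddled around the grid corner at (1, 1)
points = [(0.9, 0.9), (1.1, 0.9), (0.9, 1.1), (1.1, 1.1), (0.95, 1.05), (2.5, 2.5), (2.6, 2.4)]

fixed = densest_cell(points)                        # grid anchored at (0, 0)
shifted = densest_cell(points, origin=(0.5, 0.5))   # same grid slid by half a cell

print("fixed grid:", fixed)
print("shifted grid:", shifted)
```

With the fixed grid the cluster is split across four cells, so no cell holds more than two points; slide the grid by half a kilometre and one cell captures all five. Nothing about the dwellings changed, only where the cell boundaries fell.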
I used the ’16 census data to generate the below maps to answer the question for each county. What clearly stands out for me is that for a lot of counties it’s housing estates that make up the densest square kilometre. Hopefully with Project Ireland 2040 now in place we can start to do better and go towards sustainable densities throughout the country.
Hyperlinks to maps for each of the other counties:
I realised with the above images that I forgot to include part of the label that showed where in each county the location was. Below are the same maps as above but they now include the location and the lat, long for each square kilometre. Maybe both versions can be used for a very nerdy quiz??
I was looking at this web-map this morning that shows the number of Irish living in the UK. I quickly put together an equivalent for the number of UK born individuals resident in Ireland as reported in the 2016 census.
At some stage in the second half of 2020 the Central Statistics Office released the population for each townland in Ireland for both the 2011 and 2016 censuses. They had a category both for data that has been suppressed (‘-1’) and for townlands that were unpopulated (‘0’). I think it’s a safe assumption that if you read (or have stumbled upon) my blog, you too might be interested in maps of unpopulated places. Below is the static map I posted on Twitter and Reddit and (by popular demand – read as two random people on Reddit) I’ve also included a slippy map that you can examine at your leisure.
Recently I finished reading Saroo Brierley’s book, A Long Way Home, which was adapted into the feature film Lion. In it he recounts his extraordinary life story of getting separated from his brother at a train station whose name began with ‘B’, a few hours from home in rural India, in 1987. He was five years old and his brother Guddu had taken him the few hours from home to make some extra money sweeping out trains. Guddu leaves him on a bench in the train station to get some sleep while he goes to work. The young Saroo wakes a little later but can find no trace of Guddu. In his panic he hops on a train looking for Guddu, the doors shut and he (over what he believes to be 12-15 hours) ends up in Kolkata (then Calcutta). Here, against all odds, he survives on the streets and eventually ends up being adopted by (from the sounds of it) a wonderful couple in Tasmania. In his 20s he begins trying to find his mother and siblings in India by using Google Earth to pinpoint both the town where he boarded the train and, in turn, his hometown.
Howrah Junction Station, Kolkata
His task was an unenviable one. India is the second most populous country in the world with 1,367,139,484 people (17.5% of the world’s population, Wikipedia). Below is a population density map of India to give some context as to how the population is distributed. These figures are from the present day; the population of India in 1987, when Saroo became lost, was 819,800,000.
Population Density of India
The whole story piqued my interest and I wondered if we were to try the search today using the benefit of open data and open-source software could we make it more efficient. I would like to heavily caveat what is to follow by stating that I am using software and data that would not have been as extensive, complete or fully-featured when Saroo began his search in 2011.
I kept very careful notes whilst I was reading the book and below is an exhaustive list of the criteria that he followed, drawn from the recollections of his five-year-old self. The criteria relate both to his hometown and to the town beginning with ‘B’ where he got on the train that eventually took him to Kolkata.
Initial Search Criteria
From the initial criteria above it became clear that a number of the criteria would not be usable in replicating the search; such as that it wasn’t in the colder north of India (too subjective) or that they lived side by side with Muslims (a common occurrence in India). Below is a refined list of criteria that I am going to use in order to try and replicate the search.
Usable Criteria for Search
Saroo’s methodology started with tracing his steps backwards from Kolkata. He knew he got on the train at a station that sounded like ‘Berampur’ and he thought he was on the train for approximately 12-15 hours. Based on this, he consulted Indian friends of his at college about how to start searching. One friend in particular, Amreen, whose father worked for the Indian Railway in New Delhi, proved helpful. Her father made an educated guess that trains in India in the 1980s travelled at between 70kph and 80kph. Based on this Saroo calculated that he would have travelled roughly 1,000km in that time. He started searching methodically outwards from Kolkata to try and find the station that began with ‘B’ and hopefully, then, his hometown that was about an hour away from this.
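As a quick sanity check on that 1,000km figure, the arithmetic can be laid out in a few lines of Python. The only inputs are the 12-15 hours and 70-80kph quoted above; the function name is my own.

```python
def distance_range_km(hours=(12, 15), speed_kph=(70, 80)):
    """Shortest and longest distance implied by a range of travel times and speeds."""
    low = min(hours) * min(speed_kph)
    high = max(hours) * max(speed_kph)
    return low, high


low, high = distance_range_km()
midpoint = (low + high) / 2
print(f"{low}km to {high}km, midpoint {midpoint:.0f}km")
```

So the estimates bracket somewhere between 840km and 1,200km, and 1,000km sits close to the middle of that range.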
If I was to try and recreate the search for Saroo’s hometown, I would need access to as much free geographical data as possible so I turned to OpenStreetMap and specifically the downloads available from Geofabrik. I downloaded the Protocolbuffer Binary Format (PBF) file of the entire of India. The first items I was interested in were the railway lines and railway stations of India. QGIS can load the PBF files natively but the entire of India is a bit of a stretch for it regardless of computing power available.
BASH Osmium Commands
I used the Osmium tool to extract every railway station and line in India and then GDAL to convert them to the geopackage format. The first analysis I undertook was to follow Saroo’s approach and ascertain how many railway stations are within 1,000km of Howrah railway station. To give some context, OSM lists 7,979 railway stations and 108,000km of railway line. The Wikipedia page lists 68,155km; the discrepancy may be accounted for by the inclusion of every siding, historic railway line etc. in the OSM data. The below map shows every railway station and line from OSM.
Railway Stations and Railway Lines
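For readers who want to try the extraction step, here is a hedged Python sketch that builds the kind of `osmium tags-filter` and `ogr2ogr` command lines this step involves. The file names and the exact tag filters (`railway=station`, `railway=rail`) are my assumptions, not necessarily what was used for the maps here, and by default the sketch only prints the commands rather than running them.

```python
import shlex
import shutil
import subprocess


def build_commands(pbf="india-latest.osm.pbf"):
    """Command lines to pull railway features out of a PBF and convert them to GeoPackage.

    File names and tag filters are illustrative assumptions.
    """
    return [
        ["osmium", "tags-filter", pbf, "nw/railway=station", "-o", "stations.osm.pbf", "--overwrite"],
        ["osmium", "tags-filter", pbf, "w/railway=rail", "-o", "lines.osm.pbf", "--overwrite"],
        ["ogr2ogr", "-f", "GPKG", "stations.gpkg", "stations.osm.pbf"],
        ["ogr2ogr", "-f", "GPKG", "lines.gpkg", "lines.osm.pbf"],
    ]


def run(commands, dry_run=True):
    """Print each command; only execute when dry_run is False and the tool is installed."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run and shutil.which(cmd[0]):
            subprocess.run(cmd, check=True)


run(build_commands())  # dry run: prints the four commands without executing anything
```

Filtering before converting keeps the intermediate files small, which matters when the source file covers a whole country.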
To give an idea of the scale of Saroo’s methodology, drawing a 1,000km buffer around Kolkata and working outwards left him with 2,905 stations to search. Below is a map of all of these.
Every Railway Station within 1,000km of Kolkata
I decided not to use Saroo’s methodology of using a buffer distance from Kolkata. I had the entire OSM database for India at my disposal so I decided that the first and easiest step to undertake was to find all the railway stations that began with ‘B’ and contained ‘p’, ‘u’ and ‘r’. I used QGIS’s inbuilt Python functionality, PyQGIS. I wrote a small script that uses the regular expression module to find stations that match the above criteria.
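import re
import time

from qgis.core import QgsField
from qgis.PyQt.QtCore import QVariant

start = time.time()
# Start Message
print("Program is Starting...")
layer = iface.activeLayer()  # iface is provided by the QGIS Python console
prov = layer.dataProvider()
# add the output field if it is not there yet (assumed: the body of this check was lost)
if prov.fieldNameIndex("Relevant_Rail_Stations") == -1:
    prov.addAttributes([QgsField("Relevant_Rail_Stations", QVariant.String)])
    layer.updateFields()
field_index = layer.fields().indexOf("Relevant_Rail_Stations")  # index 11 in my layer
# note: the trailing [r]* makes the final 'r' optional
pattern = '^b.*.[p].*[u].*[r]*$'
Regex_String = re.compile(pattern, re.IGNORECASE)  # compile once, outside the loop
# starting layer editing
layer.startEditing()
features = layer.getFeatures()
for feat in features:
    Regex_Stations_Search = feat['name']
    Regex_Match = Regex_String.search(str(Regex_Stations_Search))
    if Regex_Match:
        layer.changeAttributeValue(feat.id(), field_index, "Meets_Criteria")
    else:
        layer.changeAttributeValue(feat.id(), field_index, "No_Match")
layer.commitChanges()
end = time.time()
print("Finished running - the program took " + str((round((end - start), 2))) + " seconds")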
The above script took 15 seconds to run and it narrowed down the search field from 7,779 to 91 as shown below. I don’t think it is too much of a stretch to take the various ways that he thought it might have been spelled (as below) and to then take the common letters from those spellings and narrow down the search that way.
Now that we’ve narrowed down the number of possible ‘B’-towns to 91, the next step is to try and use the hometown criteria to find the correct location. We know that Saroo’s hometown had a water tower, river, bridge, dam and a fountain in the park near the station. He believed the town’s name began with ‘G’ and sounded like ‘Ginestlay’. I could proceed with the other criteria for the ‘B’-town; however, I felt that with a dam, bridge, river, fountain and water tower there were enough unique entities that I could try at this stage to find the hometown without dedicating any further resources or time to the ‘B’-town.
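The pattern can be tried outside QGIS on a few names. The sketch below uses a short hand-typed list (illustrative only, not the OSM extract) and also includes a stricter variant, which is my own assumption rather than anything used above, to show how the two differ: in the original pattern the final `[r]*` allows zero occurrences, so a name with no ‘r’ after the ‘u’ still matches.

```python
import re

# the pattern from the script above
AUTHOR_PATTERN = re.compile(r'^b.*.[p].*[u].*[r]*$', re.IGNORECASE)
# hypothetical stricter variant: 'b', then 'p', 'u' and 'r' in that order, all required
STRICT_PATTERN = re.compile(r'^b.+p.*u.*r', re.IGNORECASE)

# illustrative names typed in by hand; None stands in for a station with no name tag
names = ["Burhanpur", "Berhampur", "Brahmapur", "Bapu", "Bhopal", "Khandwa", None]


def matches(pattern, candidates):
    """Names (skipping missing ones) that the compiled pattern matches."""
    return [n for n in candidates if n is not None and pattern.search(n)]


author_hits = matches(AUTHOR_PATTERN, names)
strict_hits = matches(STRICT_PATTERN, names)
print("original pattern:", author_hits)
print("stricter variant:", strict_hits)
```

Whether the looser behaviour is a problem depends on the goal: for a first pass it errs on the side of keeping candidates, which is arguably the safer mistake when working from a child’s recollection of a name.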
The assumption that Saroo was working on from his friend Amreen’s father was that trains travelled between 70kph-80kph in the mid-80s. As processing power wouldn’t be an issue for what I was trying to do I decided to use the upper limit of 80kph for the speed of the trains. As the PBF file for the entire of India wouldn’t load correctly in QGIS I extracted all of the bridges, dams, rivers and water towers. Below are maps showing the distribution of each from OSM.
Rivers of India
Dams of India
Bridges of India
Water Towers of India
I omitted pedestrian over and underpasses at this stage as they were too prevalent to help narrow down the location. I now needed to buffer each of the possible ‘B’-towns by the distance the train would have travelled. Saroo knew that, that night, he and Guddu had travelled for about an hour. I added a small bit of ‘fat’ to my buffer so I used 100km as the buffer distance. I then dissolved the buffers together and clipped them to the coastline. Saroo thought that he had been on the train for 12-15 hours. I was thinking that it would be useful to do a buffer from Kolkata of perhaps 6 hours and omit everything inside this buffer as he knew he travelled a farther distance.
A quick GIF of this process is shown below.
GIF of Process
Below is the final search area with every dam, river, bridge, water tower and train station within 100km of the ‘B’-town stations.
Final Search Area
For the final analysis I wanted to find only the stations that had a water tower, dam, river and bridge within 2km of the station. I picked 2km because I think that would be a reasonable distance for these features to be from a town centre for five-year-old Saroo to walk to regularly enough to remember them distinctly. I didn’t want to include the ‘B’-town stations themselves so I simply clipped out the data for a distance of 10km from each ‘B’-town. Saroo said that he and Guddu were on the train for about an hour, so 10km seems like a safe distance to clip.
Example of Final Search Area
2km Buffers of Each Station
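The buffer-and-clip logic can also be expressed as a simple distance test. The sketch below swaps the dissolved buffers for a haversine distance check (a different technique from the GIS buffers described above, used here only because it needs no GIS library): keep a station if it lies within 100km of some ‘B’-town but more than 10km from every one. The coordinates and names are invented for illustration.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0088):
    """Great-circle distance in km on a spherical earth."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def candidate_stations(stations, b_towns, inner_km=10.0, outer_km=100.0):
    """Stations farther than inner_km from every B-town yet within outer_km of at least one."""
    keep = []
    for name, lat, lon in stations:
        nearest = min(haversine_km(lat, lon, b_lat, b_lon) for _, b_lat, b_lon in b_towns)
        if inner_km < nearest <= outer_km:
            keep.append(name)
    return keep


# hypothetical data: one 'B'-town and three stations due north of it
b_towns = [("b_town", 21.0, 76.0)]
stations = [("too_close", 21.05, 76.0), ("in_ring", 21.5, 76.0), ("too_far", 23.0, 76.0)]

print(candidate_stations(stations, b_towns))
```

Because only the nearest ‘B’-town matters, this is equivalent to taking the union of the 100km buffers and cutting out the union of the 10km ones.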
The next step was to find each station that had our relevant criteria within the buffer area. For this I turned to PostGIS (I tried running the SQL query in QGIS but it crashed every time). I decided for the first iteration to use only bridges, water towers and dams, as my logic was that these would prove more conclusive than pedestrian over and underpasses and rivers, which may have proved too common. Plus, the likelihood of a bridge being for a river was much greater than it being for a ravine, gorge etc. There were 3,223 stations within the 100km buffer of each ‘B’-town station as shown below.
Stations within Buffer Search Area
The SQL query that I ran is below. It ran in about a second and returned each station that had a bridge, a water tower and a dam within 2km. It narrowed down the number of possible stations from 3,223 to 24 as shown below.
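CREATE VIEW Relevant_Stations AS
SELECT stations_buffered.id, stations_buffered.geom -- "id" is an assumed key column; the original column list was lost
FROM stations_buffered
JOIN merged_bridge_river_dam
ON ST_Intersects(stations_buffered.geom, merged_bridge_river_dam.geom)
AND merged_bridge_river_dam.Feature_Ty IN ('Bridge','DAM','W_Tower')
GROUP BY stations_buffered.id, stations_buffered.geom
HAVING COUNT(DISTINCT merged_bridge_river_dam.Feature_Ty) = 3;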
The 24 Stations that have a ‘bridge’, ‘water tower’ and ‘dam’ within 2km.
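The key trick in the query is counting distinct feature types per station and keeping only those that reach three. Here is the same idea as a small Python sketch; the station ids and pairs are made up, and the input is assumed to be the result of the spatial join, i.e. one (station, feature type) pair per feature that intersects a station’s 2km buffer.

```python
from collections import defaultdict

REQUIRED = {"Bridge", "DAM", "W_Tower"}  # the three feature types named in the query


def stations_with_all(pairs, required=REQUIRED):
    """Station ids whose buffers contain at least one feature of every required type."""
    seen = defaultdict(set)
    for station_id, feature_type in pairs:
        if feature_type in required:
            seen[station_id].add(feature_type)
    return sorted(s for s, types in seen.items() if types == set(required))


# hypothetical join output: (station id, type of a feature inside its 2km buffer)
pairs = [
    (1, "Bridge"), (1, "Bridge"), (1, "DAM"), (1, "W_Tower"),  # has all three
    (2, "Bridge"), (2, "W_Tower"),                             # no dam
    (3, "DAM"), (3, "River"),                                  # river is ignored
]

print(stations_with_all(pairs))
```

Using a set per station is what `COUNT(DISTINCT ...)` does in the SQL: two bridges still count as one type.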
An example of one of the 24 stations that match the criteria is shown below.
Example of Station Matching Criteria
Saroo thought that the place he was from sounded like ‘Ginestlay’ so the next step was to find all the OSM tags for ‘place’ that started with ‘G’. I used the below Osmium command to extract all of these place names:
This extracted 193,057 place names for the entire of India. To ensure that I didn’t miss anything I converted the polygons and lines to centroids and merged everything together (this took about 2 minutes in QGIS). I then filtered this by every place that began with ‘G’ and used the ‘Select by Location’ tool to narrow down the list from 24 to 12 stations. I have included images of these 12 stations below. Some of these obviously were not where Saroo was from as they formed parts of large cities (such as Lucknow) but I’ve left them in for the sake of showing the full process.
Candidate Area 1
Candidate Area 2
Candidate Area 3
Candidate Area 4
Candidate Area 5
Candidate Area 6
Candidate Area 7
Candidate Area 8
Candidate Area 9
Candidate Area 10
Candidate Area 11
Candidate Area 12
The station that we were looking for, ‘Khandwa Junction’ with the neighbourhood of ‘Ganesh Talai’, is candidate area 3 above. There are a few observations that I’d like to make regarding the above work. Firstly, I tried to stay as faithful as possible to the criteria that Saroo had to work with. Obviously, I had read the book and I knew the answer. I hope that the above doesn’t feel reverse-engineered. I can honestly say that I didn’t look at any of the OSM data for his hometown prior to starting this post.
I’m conscious of the fact that there is a high likelihood Saroo’s own search along with the media coverage of the movie are the reason the OSM data for Khandwa Junction exists at all. However, I will state that all of the above steps are very easily customisable and the criteria easy to change. The parameters could easily be changed to test another theory, such as using only a ‘B’ and ‘R’ for the ‘B’-town station perhaps. If we didn’t find the answer we were looking for on the first pass we could have used a DEM to find where train lines crossed a gorge outside of a ‘B’-town station. Or we could have used Python to find all the horseshoe-shaped roads outside ‘B’-town stations. Finally, we could have used the data for pedestrian overpasses and underpasses and rerun the above.
What I can say with certainty is that with the OSM data available today it was very straightforward to reduce the number of stations to be searched from 2,905 (those within 1,000km of Kolkata) to 12. This would then involve only searching 0.41% of the stations that Saroo originally had to sift through. Google Earth was the best tool available to Saroo at the time but there’s no universe in which it wasn’t a brute-force, sledgehammer-to-crack-a-nut tool. I hope with the above work I have shown that it is easy to greatly reduce the number of stations to search through using some decent logic and the power of open-source GIS software and data. I also hope that if anyone else is in a bind similar to Saroo’s that the above might in some way help to demonstrate how to search for the right answer and make it home.
We all know that the majority of Australia’s population lives in the eastern states. There’s a question I’ve been thinking about for a while: if you drew a line from just north of Brisbane to just west of Adelaide, what percentage of Australia’s population (including islands and Tasmania) lives below that line? Well, I had a little time on this rainy Sunday so I decided to find out. I wrangled some data out of the ABS’ Table Builder and joined it with the geopackage of the SA2 geography (I chose this as I felt it provided the right spatial granularity without being too fine) and voilà, the answer to the question very few people asked: 82.3%.
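The ‘which side of the line’ test is easy to sketch in Python. Everything specific in the example is an assumption for illustration: the line endpoints are rough guesses at ‘just north of Brisbane’ and ‘just west of Adelaide’, the city coordinates are approximate, and the population weights are placeholders rather than ABS figures, so the printed percentage is not the 82.3% above. The sketch also treats longitude and latitude as flat x and y, which is a simplification.

```python
# assumed line endpoints as (lon, lat); not the exact line used for the map
LINE_START = (153.0, -26.0)  # roughly just north of Brisbane
LINE_END = (137.5, -34.9)    # roughly just west of Adelaide


def is_below_line(lon, lat, a=LINE_START, b=LINE_END):
    """True when the point falls on the south-east side of the line from a to b."""
    cross = (b[0] - a[0]) * (lat - a[1]) - (b[1] - a[1]) * (lon - a[0])
    return cross > 0


def share_below(regions):
    """Percentage of total population in regions whose point lies below the line."""
    total = sum(pop for _, _, pop in regions)
    below = sum(pop for lon, lat, pop in regions if is_below_line(lon, lat))
    return 100.0 * below / total


# (lon, lat, weight): approximate city locations with placeholder weights, not real counts
regions = [
    (151.21, -33.87, 3),  # Sydney
    (144.96, -37.81, 3),  # Melbourne
    (115.86, -31.95, 1),  # Perth
    (130.84, -12.46, 1),  # Darwin
]

print(share_below(regions))
```

With real data, each SA2 would contribute its centroid and population in place of the placeholder rows.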
I was reading Alasdair Rae’s street types blog post yesterday and I thought about doing something similar for Ireland. I didn’t have much time today so I said I’d put something quick together for Cork City. Alasdair was lucky that he could use the amazing open data from the UK’s Ordnance Survey. For Ireland, we don’t have anything similar from our national mapping agency but fear not because it’s OpenStreetMap to the rescue. In my humble opinion individuals often spend far too much time trying to get a usable output from overpass-turbo when there’s an easier way. What I did was download the .osm.pbf file for Ireland from Geofabrik. Windows Subsystem for Linux has been a lifesaver since it came along, most of the heavy lifting I can now do through it. The steps I followed are as follows:
1. Download the .osm.pbf file using wget.
2. Install the osmium-tool.
3. Filter out just the ‘highways’ from the dataset and save it to a new .osm.pbf file.
4. Convert the .osm.pbf file to a new shapefile (I don’t normally use shapefile but converting to a geopackage was causing some issues in this instance).
I could have made this more efficient again by combining some of these actions (such as saving as a new shapefile and reprojecting) but I did it this way to make the process easier to explain.
Next I brought the dataset into QGIS and extracted an area just for Cork City. The main issue I then had was how to ascertain what road types occur most frequently. I exported the Cork City extract as a CSV and imported it into Excel. I then used a generic formula from the folks at exceljet to extract the last word from the ‘name’ column. I then counted the frequency of the extracted word in a pivot table to get an idea of the types I’d like to use.
Frequency of Road Types
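The Excel step (take the last word of each name, then count) could equally be done in a few lines of Python. A minimal sketch, using a handful of street names typed in by hand rather than the actual OSM extract:

```python
from collections import Counter


def last_word_counts(names):
    """Count the final word of each non-empty name, e.g. 'Patrick Street' -> 'Street'."""
    words = (name.split()[-1].title() for name in names if name and name.strip())
    return Counter(words)


# hand-typed sample; None stands in for a road with no name tag
names = [
    "Patrick Street",
    "Oliver Plunkett Street",
    "Western Road",
    "Blarney Street",
    None,
    "Model Farm Road",
    "Grand Parade",
]

counts = last_word_counts(names)
print(counts.most_common())
```

Reading the ‘name’ column from the exported CSV with the `csv` module and passing it to the same function would reproduce the pivot table.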
The last thing I did was to create a new field using an expression to find the various road types in the ‘name’ column from the OSM data. The expression I used is the below:
CASE
WHEN "name" LIKE '%Road%' then 1
WHEN "name" LIKE '%Park%' then 2
WHEN "name" LIKE '%Street%' then 3
WHEN "name" LIKE '%Avenue%' then 4
WHEN "name" LIKE '%Court%' then 5
WHEN "name" LIKE '%Drive%' then 6
WHEN "name" LIKE '%Hill%' then 7
WHEN "name" LIKE '%Lawn%' then 8
WHEN "name" LIKE '%Estate%' then 9
WHEN "name" LIKE '%Place%' then 10
WHEN "name" LIKE '%Lane%' then 11
END
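One thing worth knowing about an expression like this is that the first matching branch wins and that the match is on any substring, not just the last word. A small Python sketch of the same logic (my own mirror of the expression, not part of QGIS) makes that visible:

```python
# same order as the expression above; the position in the list gives the code
ROAD_TYPES = ["Road", "Park", "Street", "Avenue", "Court", "Drive",
              "Hill", "Lawn", "Estate", "Place", "Lane"]


def road_type_code(name):
    """Code of the first road type found anywhere in the name, or None if nothing matches."""
    if not name:
        return None
    for code, word in enumerate(ROAD_TYPES, start=1):
        if word in name:  # substring match, like LIKE '%word%'
            return code
    return None


for example in ["Western Road", "Parklands Drive", "Grand Parade"]:
    print(example, "->", road_type_code(example))
```

So a name such as ‘Parklands Drive’ would be coded as a ‘Park’ rather than a ‘Drive’. Matching on the last word only, or reordering the branches, would avoid that if it matters for the map.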
I then symbolised the data and exported it. The final product is below. Obviously this was a pretty quick and dirty way to do it and if I create an atlas for every major town in Ireland I will use PostGIS which will allow me to count the frequencies and create the new field with ease.