How to Fail at GDELT (with maps)
Mona Chalabi is at it again with GDELT in Nigeria! Once again, I have to give her props for digging with both hands into one of the richest data sources in social science today, and once again her crude handling of the data has led her far beyond the realms of reasonable inference into wild conjecture.
Prompted by a reader, Chalabi decided to build a time-series visualization of GDELT on a map. This is a popular treatment for such data, but inference from such things, as for all things GDELT (and perhaps all things social science), is a slippery business. Let’s go through this piece by piece.
The 25,247 kidnappings that GDELT has recorded taking place in Nigeria since 1982 seem to have occurred across the country.
First thing: Why 1982? GDELT goes back to 1979. If there’s some compelling reason here, I don’t see it, but this is the most minor of Chalabi’s infringements.
Second and far more importantly, GDELT doesn’t count kidnappings. GDELT counts reports. If two different newspapers report the same thing two different ways, then GDELT will record them as two events. This can be a subtle difference: “Boko Haram Kidnaps Girls” and “Abubakar Shekau kidnaps Girls”.
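To make the reports-versus-events gap concrete, here is a minimal sketch. The records are hand-made for illustration and are not real GDELT rows; the field names (`date`, `actor`, `place`, `source`) and the helper names are assumptions of mine, not GDELT's schema.

```python
# Hypothetical, hand-made records: NOT real GDELT rows, and the field
# names are assumptions chosen for this illustration only.
reports = [
    {"date": "DAY-1", "actor": "BOKO HARAM", "place": "Chibok",
     "source": "paper A", "headline": "Boko Haram Kidnaps Girls"},
    {"date": "DAY-1", "actor": "ABUBAKAR SHEKAU", "place": "Chibok",
     "source": "paper B", "headline": "Abubakar Shekau kidnaps Girls"},
    {"date": "DAY-1", "actor": "BOKO HARAM", "place": "Chibok",
     "source": "paper C", "headline": "Boko Haram Kidnaps Girls"},
]


def count_records(rows):
    """Count every row: this is what a naive tally of 'kidnappings' does."""
    return len(rows)


def count_distinct(rows, keys):
    """Count rows that differ on the given keys."""
    return len({tuple(row[k] for k in keys) for row in rows})


n_records = count_records(reports)
# Identical codings collapse, but a differently named actor survives as its own "event".
n_by_actor = count_distinct(reports, ("date", "actor", "place"))
# Collapsing on date and place alone recovers the single underlying incident here.
n_by_place = count_distinct(reports, ("date", "place"))

print(n_records, n_by_actor, n_by_place)
```

In this toy case one incident shows up as three records, or as two "events" once identical codings are merged. Collapsing on date and place is itself crude, since it would also merge genuinely separate incidents that share a day and a town, so it is a sanity check rather than a fix.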
This point is so critical that it effectively invalidates any inference Chalabi makes beyond it. Nevertheless, let’s trudge ahead:
The rapid acceleration in kidnappings in the past decade is obvious, but deciphering regional patterns is harder — especially for those of you who, like me, don’t have a detailed knowledge of Nigerian geography.
Someone who has never touched GDELT before can be forgiven for thinking this. Chalabi cannot: Last time Chalabi used GDELT, she was kindly informed by “the helpful people at GDELT” (read: Kalev Leetaru) that GDELT grows exponentially. Let me say that again slowly: GDELT. GROWS. EXPONENTIALLY.
It says so right in the Paper! This means that anytime you see growth in any phenomenon as recorded by GDELT, it’s probably a product of the fact that GDELT itself grew during that time. I see no evidence of normalization to account for this fact. She’s made this error twice now, and the second time after she was explicitly corrected. Accepting this and moving on…
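For the record, the kind of normalization I mean can be sketched in a few lines. The yearly totals below are made up to show the arithmetic and are not actual GDELT counts; dividing by all records for the same country and year is one plausible choice of denominator, not the only one.

```python
# Made-up yearly totals for illustration only; NOT actual GDELT counts.
kidnap_records = {2010: 100, 2011: 200, 2012: 400, 2013: 800}
all_records = {2010: 10_000, 2011: 20_000, 2012: 40_000, 2013: 80_000}


def normalized_share(numerator, denominator):
    """Share of each year's records that fall in the category of interest.

    Years with a missing or zero denominator are skipped rather than guessed.
    """
    return {
        year: numerator[year] / denominator[year]
        for year in sorted(numerator)
        if denominator.get(year)
    }


shares = normalized_share(kidnap_records, all_records)
for year, share in shares.items():
    print(year, kidnap_records[year], f"{share:.2%}")
```

In this toy series the raw count doubles every year while the share stays flat at 1%: the "rapid acceleration" is entirely the database growing. Real data will not be this tidy, but the raw count alone cannot tell the two stories apart.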
I looked at population data and calculated the number of kidnappings per 100,000 residents in each of the country’s 37 states.
This is a somewhat crude calculation.
We’re counting all geolocated kidnappings in the GDELT database since 1982 and dividing that by each state’s current population. Official kidnapping statistics for Nigeria aren’t available, and our numbers do provide a good relative picture; we can see where kidnappings in Nigeria are most prevalent.
Just a reminder that this is incorrect: She’s counting reports of kidnapping, many of which will have significant overlap with one another.
The kidnapping rate is the highest — 120 for every 100,000 people — in the Federal Capital Territory, which includes Abuja, Nigeria’s capital.
I absolutely don’t believe this, and you shouldn’t either. I don’t know this for certain, but it looks a lot like the Federal Capital Territory sits very near the centroid of the nation.
If that’s the case, it is the default location for reports in Nigeria without more specific geolocation information. Reports of kidnappings attributed to this spot almost certainly happened elsewhere within Nigeria, but there’s no way to know where.
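One way to test the centroid suspicion is to check whether the records attributed to a state pile up on a single coordinate pair. The sketch below assumes each record carries `lat` and `lon` fields (hypothetical names) and uses placeholder coordinates, not Nigeria's actual centroid; the real field names should be checked against the GDELT codebook.

```python
from collections import Counter


def dominant_point(rows):
    """Return the most common (lat, lon) pair and its share of all rows.

    Returns None for an empty input rather than inventing a point.
    """
    if not rows:
        return None
    counts = Counter((row["lat"], row["lon"]) for row in rows)
    point, n = counts.most_common(1)[0]
    return point, n / len(rows)


# Placeholder records: the coordinates are made up, NOT real GDELT values.
state_rows = [
    {"lat": 10.0, "lon": 8.0},
    {"lat": 10.0, "lon": 8.0},
    {"lat": 10.0, "lon": 8.0},
    {"lat": 9.1, "lon": 7.4},
]

point, share = dominant_point(state_rows)
print(point, share)
```

If most of a state's records sit on one exact point, that point is far more likely to be a geocoder's fallback than a hotbed of crime, and those records should be dropped or reported separately before computing any per-capita rate.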
Three states in the south also stand out (Rivers, Delta and Bayelsa) with unusually high numbers of kidnappings relative to their population size. One possible explanation is the region’s oil wealth, otherwise known as the curse of the black gold. The United Nations news service has also highlighted how oil extraction in the south of Nigeria has been accompanied by violence and criminality.
Great! Let’s invoke some controversial literature in developmental economics because it seems to fit our narrative. Alternatively, it could just be because crime is more prevalent in urban areas. Better yet, reporting coverage is better in urban areas.
A kidnapping of one girl in the north might be covered by the local rag (if at all). A kidnapping in one of the major urban centers might get picked up by the biggest regional or national paper, from which everyone else will begin reporting the same.
One other state that was well above the median kidnapping rate (of five kidnappings per 100,000 people) was Borno in the northeast. That’s where the militant group Boko Haram, which is responsible for the recent mass kidnapping, is based. When we filtered the results to look at the first four months of 2014, Borno had the highest kidnapping rate in Nigeria.
That couldn’t possibly have anything to do with the thousands of international news outlets covering the single kidnapping of 300 girls, could it?
Girls were taken from the Government Girls Secondary School in Chibok, which is in Borno state. At the substate level, Chibok has rapidly become more vulnerable to kidnappings; GDELT has recorded 649 kidnappings there in the first four months of this year. GDELT recorded one in 2013, and none before that.
There’s no support for this conclusion. The very fact that the “trend” she describes moves from 0 to 1 to 649 over the course of three years only suggests that there’s some fundamental change in the coverage between these years. The 649 number also comes from left field: I’ve heard reports placing the number of kidnapped at around 300.
So Chalabi’s assumption is that, in addition to the 300 all the news is about, there are around 350 other people kidnapped from Borno alone. Wrong! There were 649 reports of kidnappings in her dataset! We can make absolutely no inference about the number of kidnappings. The error Chalabi made in the first sentence has infected every claim she makes in the piece.
I wasn’t the only one annoyed. Twitter was enraged. Erin Simpson in particular had a lot to say. She even put a bounty on a storified version of her rant, which I sadly lost by about five minutes.