Tag: Non-fiction

Computers Are Not Your Friends: the Iowa Caucus, the Shadow App, and the End of Faith

When Alexandria Ocasio-Cortez said AI could be racist, it almost burned down the internet. Smug dudes in baseball caps were hooting and hollering and falling over themselves to laugh at the ridiculous idea that a computer system could hold human values. AOC was right. It’s one of the predominant issues in AI right now—an artificial intelligence is built from human datasets, and the selection of those datasets is done by a human, and that means there’s a chance to program human biases into an AI. 

In 2018, researchers at MIT created a “psychopath AI” called Norman, named after Norman Bates. They exclusively fed Norman horrifying data: car crashes, dead bodies, mutilation and destruction. Norman came out fucked up. Not everything is as dramatic as turning an AI into a serial killer, but we’re seeing similar issues everywhere: facial recognition cameras—predominantly trained on datasets of white men—continue to not recognise black women. There’s something we need to acknowledge if we’re going to have healthy democracies: technology is not impartial. It is made by people and used by people, and it is as capable of bias as those same people. 

We’re currently seeing this at the disastrous Iowa Caucus—the Shadow App that delivered miscounts was made by a secretive company that took funding from Hillary Clinton, Joe Biden, and Pete Buttigieg, and a number of its staff are former Hillary staffers. In one particular caucus, Shadow took Bernie Sanders’ 116 votes, compared them to Buttigieg’s 73 votes, and came out with the same number of delegates. There’s not enough there to say it was intentional, but there’s more than enough to spur conspiracy theories, to destabilise trust in our institutions—to make millions of people around the world shrug and say “eh, fuck it, what’s the point?” and never show up to vote. 
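For what it's worth, lopsided vote counts can produce identical delegate counts through rounding alone. Here is a minimal sketch, assuming the caucus-style allocation formula (supporters × precinct delegates ÷ total attendees, rounded to the nearest whole delegate). It also assumes, purely for illustration, a two-delegate precinct where only these two groups are in the room. The precinct size and the `award_delegates` helper are hypothetical and are not details of the caucus described above.

```python
def award_delegates(supporters: int, total_attendees: int, precinct_delegates: int) -> int:
    """Proportional share of the precinct's delegates, rounded to the nearest whole number."""
    share = supporters * precinct_delegates / total_attendees
    # int(share + 0.5) rounds halves up; Python's built-in round() rounds halves to even.
    return int(share + 0.5)


# Hypothetical precinct: 2 delegates on offer, and only these two groups counted.
votes = {"Sanders": 116, "Buttigieg": 73}
total = sum(votes.values())

delegates = {name: award_delegates(count, total, 2) for name, count in votes.items()}
print(delegates)
```

With so few delegates to hand out, a 43-vote gap can vanish in the rounding. That says nothing about whether any particular count was right, only that the arithmetic needs checking before the motive does.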

I don’t think the Shadow team called a Ratfucking Meeting and drew up plans to Ratfuck Bernie; I think the Shadow team worked from unconscious biases that would level out the playing field, because their guy is a frontrunner but not the frontrunner, and they wanted to see their guy win. Because they’re people, and people have biases, and the machines they make often carry those biases, even when the people making them don’t know they’re doing it.

We tilt towards people we like. Hell, I’m doing it right now: I like Bernie, and I’ve sat down and tried to be professional and make sure everything in this article is as objective as possible, but extricating the self is hard. I think I’ve succeeded, but if the internet has got a surplus of anything, it’s folks who are ready to loudly disagree. The least I can do is say: I’m a leftist, and that probably changes the data I give you, whether I know I’m doing it or not.  

Maybe somebody on the Shadow team did fuck up, honestly, without bias. That’s where the tech world seems to be leaning on this whole circus. Shadow tried to do a very complex job using limited funding and an extremely short timeframe (two months and $60,000 is nothing in Silicon Valley terms, especially to find a solution to electronic-fucking-voting), and they may well have just dropped the ball. If you want a fun experiment, bring up electronic voting with a group of policymakers, then with a group of engineers. The general consensus from techies is that we’re just not there yet and we can’t guarantee safe or reliable systems, but politicians all over the world are rushing to implement it anyway. The issues with Shadow seem pretty clear-cut, but it’s based on a relatively small dataset, and they might’ve just not considered that. They just didn’t scale their tool correctly, and God knows it wouldn’t be the first time a startup failed to scale down effectively. We wind up with the same problem: we trust our tech too much. We trust it like it’s a fortress and not a matchstick palisade. 

In the UK, a group with strong ties to the LibDems launched a ‘tactical voting site’ that leant heavily LibDem, recommending them as the tactical vote even in strong Labour constituencies. GetVoting claimed impartiality, claimed to be just the data, but it ended up making wildly misleading claims during one of the most crucial elections in recent history. In the end, the LibDems split votes across the UK. Did GetVoting do it on purpose? I think there’s a stronger case there than with Shadow: the results are further from reality, the funding links are tighter. It doesn’t matter: in the end, nobody won.

We often talk about datasets and AIs and applications as though they spring into existence fully-formed from cracks in the earth; we live in an age of perfect miracles, and we trust them with our lives. 

That trust is killing us. 

Sometimes it’s malice, sometimes it’s incompetence, sometimes it’s something more gentle and strange and human that’s hard to put a name to. The end result is the same. Technology can be liberating and empowering, but that same power is dangerous if mishandled, and right now we’re a bunch of drivers who refuse to admit that we’ve blown a tyre; refuse to admit that it’s possible for tyres to blow; drivers who are careening down State Highway 1 with our dicks in our hands screaming that our car can drive all the way to heaven.  

The 2019 NZ Budget Leak: what actually happened

EDIT: This piece has gone much bigger than expected. I’m blown away. I was editing during the day to add clarifications onto the end, but I’ve gone back and worked them into the body of the text.

The Treasury data breach has been a shitshow. I don’t think I’ve ever seen a bigger disconnect between the experts and the pundits, and I don’t say that lightly. I’m not a security guy, for what it’s worth: I’m a writer at a tech firm, but I’m fascinated by security and over the last few days I’ve been talking to people who actually know their stuff. Almost unanimously they’re calling this a breach. Almost unanimously, the pundits are off shouting that it’s “not a hack!”.

Right from the start, I’m setting a rule: we’re not going to talk about “hacking”. It means totally different things to the IT sector (anything from coding at all to randomly kludged spaghetti code that really shouldn’t work) and the public (a man in a trenchcoat saying “I’m in!”), and most InfoSec types shy away from it anyway. I’m not going to bore you with the whole hacking vs cracking debate, but we’re going to call this thing what it is: a data breach.

So what happened?*¹ Start with a web server. Its job is to display web content. Every time you go online, you’re accessing content from web servers. Simple enough?

Then there’s a staging server. It serves as a testing environment. Content intended for the public but not yet released goes on the staging server to make sure it runs smoothly for when the time comes to make it public. Some staging server content never goes live: it either didn’t work as expected or it wasn’t meant to be there, or something changed and it got pulled.

Treasury cloned their web server onto the staging server, then added the budget to it for testing. The problem is, they also cloned the index configuration: the instructions the search uses to store search data for later use. Both web and staging server stored their search information in the same place, and SOLR (the program running the search function) wasn’t properly instructed to keep the staging server’s documents out of the public results. That gave anybody using the search bar on the web server access to the search information about documents on the staging server, though not to the staging documents themselves.
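If that’s hard to picture, here is a toy model of two sites writing to one search index. It is a sketch under loud assumptions, not Treasury’s actual SOLR or Drupal configuration: the `Document` and `SharedSearchIndex` classes, the `only_site` filter, the 60-character snippet length and the document text are all made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Document:
    site: str   # "web" or "staging"
    title: str
    body: str


class SharedSearchIndex:
    """Toy stand-in for a single search index that both sites write to."""

    def __init__(self, snippet_chars: int = 60) -> None:
        self.snippet_chars = snippet_chars
        self.entries: List[Dict[str, str]] = []

    def add(self, doc: Document) -> None:
        # The index keeps the title and a short snippet, never the full body.
        self.entries.append({
            "site": doc.site,
            "title": doc.title,
            "snippet": doc.body[: self.snippet_chars],
        })

    def search(self, query: str, only_site: Optional[str] = None) -> List[Dict[str, str]]:
        hits = [e for e in self.entries if query.lower() in e["title"].lower()]
        if only_site is not None:
            # The instruction that was missing: only return the asking site's documents.
            hits = [e for e in hits if e["site"] == only_site]
        return hits


index = SharedSearchIndex()
index.add(Document("web", "Budget 2018: Summary", "Placeholder text for a page that is already public."))
index.add(Document("staging", "Budget 2019: Summary",
                   "Unreleased text that is longer than the snippet length, so the tail never reaches the index."))

leaky = index.search("Budget 2019")                       # public search, no site filter
filtered = index.search("Budget 2019", only_site="web")   # same search, restricted to the web site

print(leaky)
print(filtered)
```

The unfiltered search hands back the staging title and the start of its text; the filtered one hands back nothing. In neither case does the full staging document come out, only what the index stored about it.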

To illustrate, run a search on the Spinoff today: each result shows you the title and the first few lines of the article. Using the exploit on the Treasury’s site, somebody pulled snippets of the budget like that from the staging server. Critically, to do this, you would need to know the title of the section. You search for a specific heading on the web server, and it comes up with the title and the first 4-5 lines. It was, all things considered, a pretty small hole:

  1. It required the attacker to know the content was on the staging server
  2. It required the attacker to know the specific wording on the staging server
  3. Even then, it only gave them snippets

So what happened? Well, a leak. The actual leak. The budget didn’t leak: the budget’s search index leaked. That’s essentially a table of contents. The budget ToC being out in the open covered points 1 and 2 above: the fact the budget was ready to go public (thus, probably on the staging server) and a list of searchable titles and subtitles.

“Leak” is a strong word, too: it used the same headings as the 2018 budget. I’m still a little fuzzy on whether the actual index leaked (as in, got sent to the wrong place/got left out somewhere irresponsible/got made public too early) or whether somebody just heard it was the same as last year’s via the Thorndon grapevine and started punching in queries.

What about #3? Well, that’s why there were 2000 searches. They pulled 2000 snippets and put the budget together like a jigsaw. It’s not “just a search”: it’s using a leaked search index to perform 2000 searches, to take advantage of an exploit that pulled small pieces of content from a staging server, then stitching that content together in post. It’s not something Johnny Q Public could do by accident. It’s not an “open door” at all. That’s also why National got some details wrong: they didn’t have a complete picture. They had a very good outline, though. All the titles and subtitles, and the first few lines after each.
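The jigsaw can be sketched in a few lines. This is a toy model, not a reconstruction of what was actually run: the section names, the line counts, and the `search_snippet` and `stitch` helpers are all hypothetical, and the four-line cutoff stands in for “the first few lines”.

```python
from typing import Dict, List, Optional

# Hypothetical staging content: heading -> lines of text (all invented).
staging_sections: Dict[str, List[str]] = {
    "Summary": ["s1", "s2", "s3", "s4", "s5", "s6"],
    "Health": ["h1", "h2", "h3", "h4", "h5", "h6", "h7"],
    "Education": ["e1", "e2"],
}

SNIPPET_LINES = 4  # stand-in for "the first 4-5 lines"


def search_snippet(heading: str) -> Optional[List[str]]:
    """Return only the first few lines under an exact heading, or None if nothing matches."""
    lines = staging_sections.get(heading)
    return None if lines is None else lines[:SNIPPET_LINES]


def stitch(headings: List[str]) -> Dict[str, List[str]]:
    """One search per guessed heading; keep whatever snippets come back."""
    recovered = {}
    for heading in headings:
        snippet = search_snippet(heading)
        if snippet is not None:
            recovered[heading] = snippet
    return recovered


# Headings guessed from last year's document: "Transport" misses, "Education" is never tried.
guessed_headings = ["Summary", "Health", "Transport"]

recovered = stitch(guessed_headings)
total_lines = sum(len(lines) for lines in staging_sections.values())
recovered_lines = sum(len(lines) for lines in recovered.values())
print(f"{recovered_lines} of {total_lines} lines recovered")
```

Every heading costs a search, every hit returns only the top of its section, and a wrong or missing guess returns nothing. You end up with a good outline full of holes, which fits both the volume of searches and the details that came out wrong.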

It’s all a bit rubbish but, to quote InfoSec luminary Adam Boileau, “it’s not rubbish if it works”.

Metaphors about the door being unlocked do us no favours, unless we really want pundits to be better-equipped to twist the actual events. Whether or not it’s a “hack” doesn’t really matter: it’s an intentional attempt to gain access to private data. It utilised an exploit to pull content that wasn’t meant to be public. It’s a breach. More than that, there are established protocols for what happens if somebody finds an exploit in government software. These rules were written by the National Party in 2014, and National failed to follow them. Their failure to follow protocol merits investigation: they let the particular use of an exploit go undetected for their own political gain. Even if the content was delivered to them anonymously by a no-good samaritan, they bear at least partial responsibility for this because they went public instead of reporting it.

Where did the Treasury fuck up?

  • They should’ve considered their SOLR configuration when they cloned their data to the staging server.
  • They probably shouldn’t have cloned their web server to begin with—making a staging server from scratch with the same dependencies might have been a pain in the ass (I’m honestly not sure: I don’t know what their dependencies look like) but it would’ve been a lot safer.
  • They could’ve been jazzier about this year’s subtitles.

Where did the National Party fuck up?

  • They identified an exploit but—instead of following CERT protocol—they used it for their own personal gain.

I’m not gonna lie, it’s bad. Somebody dropped the ball, and somebody else put a knife into it.

Still, I do not believe Simon Bridges has committed a crime, nor has he committed Breach of Confidence. He has violated his CERT obligations, which at worst means he’ll get a strongly-worded nonbinding letter from MBIE telling him not to do it again. He did a bad thing, but not all bad things result in him being removed from Parliament in a paddy wagon. To quote one of my anonymous sources: “he’s an asshole, not a criminal.”

It’s still ridiculous that pundits are calling for heads to roll. At the end of the day, it wasn’t a big deal. Grant Robertson shrugged and moved on. The Treasury were right: what harm could somebody actually do by using that exploit? Release a half-complete version of the document a day early?

By the by, it’s not dodgy or extreme that anybody called it a ‘hack’. If there’s a problem with the word, it’s not that it doesn’t mean this, it’s that it does mean this, because it’s a vague word that means wildly different things to different people. Not all hacking is a man in a trenchcoat typing into a green/black Linux CLI then saying “I’m in!” It’s not rubbish if it works. Makhlouf and Robertson could’ve maybe been more precise with their language, but that’s not a crime either.

And then, of course, the pundits got to it. Either the Treasury were little angels who did no wrong, or they were cringing fools who dropped a box of printed budgets off at the top of Lambton Quay. What we actually have here is a pattern pretty typical of data breaches: a small screwup like improper SOLR config gave an attacker access to data they shouldn’t have had. I’m sure somebody is going to shout at me that it wasn’t a small mistake, but unless they can explain how to correctly configure Apache SOLR in a Drupal installation so it doesn’t allow partial read access to cloned data in a staging server then they can fuck right off with their piety and condescension. It’s a screwup for sure, but the people talking about “open doors” need to pull their heads in.

What’s really happening is that the pundits smell blood in the water, and they don’t care what actually happened—they just want an excuse to sink their teeth in.

Same old NZPol, I guess.

If you like what you’re reading, stick around and check out some of my fiction, or follow me @understatesmen on Twitter.

*¹ most of this is coming through various DMs and actually talking to people. I am willing to admit I might’ve muddied the details, though I’ve done my best and at the very least—talking to actual experts and having a tech background—I’m doing a better job than the lukewarm tech reckons of blokes who struggle to operate a washing machine.

Credit for assistance to Sana Oshika, and the others who preferred to go unnamed.