Last fall I was at the AI conference hosted by the CPG in Bangkok, sitting next to Erik Vermeulen of Tilburg University and Philips Electronics, when a speaker brought up Nick Bostrom’s paperclip maximizer argument. In short, Bostrom has suggested that if we build a really smart machine without carefully ensuring that its values align with ours, it could run amok even while pursuing seemingly harmless goals. What if, in this case, an AI designed to maximize the manufacture of paperclips began devising ways of turning everything in the world, including humanity, into paperclips (while simultaneously acting to prevent anything that would thwart its goal, such as being turned off)? To Bostrom’s credit, his real interest aligns with an important issue: he argues that we need top-level friendly-to-human-beings programming in AI. The paperclip maximizer, however, is a really poor point of entry to that position. I have a feeling that he did not realize quite how widespread it would become in academic conversations about AI. Alas, the conference presentation was neither the first nor the most recent time that I’ve been confronted with the paperclip maximizer, which is routinely used to describe the potential horrors of an AI future (I just ran across it again this week).
Erik and I immediately realized that the Paperclip Singularity may not offer a particularly valuable analysis, but there could be a future graphic novel in it. Here is my concept art:
The real problem isn’t, as Bostrom seems to think, that we might get value alignment wrong. The problem, as Charlie Stross illustrates in his fantastically imaginative book Accelerando, is that we might get value alignment right. The majority of AI funding comes from corporate overlords and the military. In a world where the machines know how to violently dominate people and squeeze out short-term profits (while ignoring deferred costs), we’ll be in real trouble.
Compared to that, being turned into a paperclip starts sounding pretty good.
Update: In January 2023, some truly awful words of Bostrom’s are getting around. His apparent attitude about humanity and race strikes me as one more important reason that getting value alignment “right” would be a tragedy. Not only do we want to avoid having the military-industrial complex rule our approach to AI, but we desperately need to avoid having racism, sexism, and other forms of human bias and oppression infect our hypothetical AI co-workers and companions. Compared to having racist, eugenicist robot overlords, becoming a paperclip sounds even better! Let’s build a better world by building better ethics into AI and by creating justice among human beings.
Update 2: I should have updated this ages ago. I’m pretty sure that the paperclip maximizer hypothesis actually comes from Eliezer Yudkowsky. I looked again at Bostrom’s paper and found zero citations to that effect. However, Bostrom knows Yudkowsky, so he presumably knows that the idea is Yudkowsky’s (just as he knows the simulation hypothesis actually belongs to Hans Moravec). He just doesn’t say so. Maybe I’m wrong about the provenance here, but I doubt it.