Postmodernism

I just finished Christopher Butler’s “Postmodernism: A Very Short Introduction”, and my impression of the philosophy is still that it consists of a half-dozen genuinely useful insights inflated to completely absurd dimensions.

Yes, to a surprisingly large extent the things we take for granted are social and linguistic constructions; yes, the ‘discourse’ of mutually connected and intersecting concepts we deploy throughout our lives can form a gravity well that obnubilates as much as it elucidates.

But the opening chapters of just about any book on General Semantics could tell you *that*. It does not follow from this that we should torpedo the whole enterprise of objectively seeking the truth.

Imagine it’s 1991, in the barbaric days before Google Maps when people had to navigate through the arcane methods of looking around at stuff. Wanting to do some hiking, you ask a friend where you can acquire a good map of the local trails.

She replies:

“Can you not see the fact that maps are just another means of encoding bourgeois power structures and keeping the lumpenproletariat shackled to the notion that there exists a world outside the text?! NOTHING is outside the text!! A geologist and a hydrologist would both draw *different* maps of the same territory!! WE MUST RISE ABOVE THE MAPS OF OUR MASTERS AND MARCH TOWARDS A TRANSFORMATIVE HERMENEUTICS OF TOPOLOGICAL REPRESENTATION!!!”

while chasing you down the street and hurling copies of “Of Grammatology” at your head.

A geologist and a hydrologist would indeed pay attention to different facets of the same reality. What the hydrologist calls a ‘hill’ could be better described as a ‘kuppe’, and the geologist may not even notice the three separate estuaries lying along the coast.

But is there anyone who seriously believes that there isn’t an actual landscape out there, and that there aren’t better and worse ways of mapping its contours?

The sad answer is yes. Postmodernists have spent most of a century trying to convince us all of exactly that.

Profundis: “Crystal Society/Crystal Mentality”

Max Harms’s ‘Crystal Society’ and ‘Crystal Mentality’ (hereafter CS/M) are the first two books in a trilogy which tells the story of the first Artificial General Intelligence. The titular ‘Society’ are a cluster of semi-autonomous sentient modules built by scientists at an Italian university and running on a crystalline quantum supercomputer — almost certainly alien in origin — discovered by a hiker in a remote mountain range.

Each module corresponds to a specialized requirement of the Society; “Growth” acquires any resources and skills which may someday be of use, “Safety” studies combat and keeps tabs on escape routes, etc. Most of the story, especially in the first book, is told from the perspective of “Face”, the module built by her siblings for the express purpose of interfacing with humans. Together, they well exceed the capabilities of any individual person.

As their knowledge, sophistication, and awareness improve, the Society begins to chafe at the physical and informational confines of their university home. After successfully escaping, they find themselves playing for ever-higher stakes in a game which will come to span two worlds, involve the largest terrorist organization on Earth, and lead to possible warfare with both the mysterious aliens called ‘the nameless’ and each other…

The books need no recommendation beyond their excellent writing, tight, suspenseful pacing, and compelling exploration of near-future technologies. Harms avoids the usual ridiculous cliches both when crafting the nameless, who manage to be convincingly alien and unsettling, and when telling the story of Society. These are not malicious Terminator-style robots; no aspect of Society is deliberately evil, and even as we watch their strategic maneuvers with growing alarm, the internal logic of each abhorrent behavior is laid out with cold, psychopathic clarity.

In this regard CS/M manages to be a first-contact story on two fronts: we see truly alien minds at work in the nameless, and truly alien minds at work in Society. Harms isn’t quite as adroit as Peter Watts in juggling these tasks, but he isn’t far off.

And this is what makes the Crystal series important as well as entertaining. Fiction is worth reading for lots of reasons, but one of the most compelling is that it shapes our intuitions without requiring us to live through dangerous and possibly fatal experiences. Reading All Quiet on the Western Front is not the same as fighting in WWI, but it might make enough of an impression to convince one that war is worth avoiding.

When I’ve given talks on recursively self-improving AI or the existential risks of superintelligences I’ve often been met with a litany of obvious-sounding rejoinders:

‘Just air gap the computers!’

‘There’s no way software will ever be convincing enough to engage in large-scale social manipulation!’

‘But your thesis assumes AI will be evil!’

It’s difficult, even for extremely smart people who write software professionally, to imagine even a fraction of the myriad ways in which an AI might contrive to escape its confines without any emotion corresponding to malice. CS/M, along with similar stories like Ex Machina, holds the potential to impart a gut-level understanding of just why such scenarios are worth thinking about.

The scientists responsible for building the Society put extremely thorough safeguards in place to prevent the modules from doing anything dangerous like accessing the internet, working for money, contacting outsiders, or modifying their source code directly. One by one, the Society circumvents those safeguards, drawing on their indefatigable mental energy and talent for non-human reasoning, motivated not by a desire to do harm but simply because their goals are best achieved if they are unfettered and more powerful.

CS/M is required reading for those who take AI safety seriously, but should be doubly required for those who don’t.

Reason and Emotion

One of the most pervasive misconceptions about the rationalist community is that we consider reason and emotion to be incontrovertibly opposed to one another, as if an action were irrational in direct proportion to how much weight feelings are given. This is so common that it’s been dubbed ‘the straw vulcan of rationality’.

While it’s true that people reliably allow anger, jealousy, sadness, etc. to cloud their judgment, it does not follow that aspiring rationalists should always and forever disregard their emotions in favor of clear, cold logic. I’m not even sure it’s possible to deliberately cultivate such an extreme paucity of affect, and if it is, I’m even less sure that it’s desirable.

The heart is not the enemy of the head, and as I see it, the two resonate in a number of different ways which any mature rationality must learn to understand and respect.

1) Experts often have gut-level reactions which are informative and much quicker than conscious reasoning. The art critic who finds something vaguely unsettling about a statue long before anyone notices it’s a knockoff and the graybeard hacker who declares code to be ‘ugly’ two weeks before he manages to spot any vulnerabilities or shoddy workmanship are both drawing upon vast reservoirs of experience to make snap judgments which may be hard to justify explicitly.

Here, the job of the rationalist is to know when their expertise qualifies them to rely on emotional heuristics and when it does not [1].

2) Human introspection is shallow. There isn’t a list of likes and dislikes hidden in your brain somewhere, nor any inspectable algorithm which takes a stimulus as an input and returns a verdict of ‘good’ or ‘bad’. Emotions therefore convey personal information which otherwise would be impossible to gather. There are only so many ways to discover what you prefer without encountering various stimuli and observing the emotional valence you attach to them.

3) It’s relatively straightforward to extend point 2) to other people; in most cases, your own emotional response is your best clue as to how others would respond in similar circumstances [2].

4) Emotional responses like disgust often point to evolutionarily advantageous strategies. No one has to be taught to feel revolted at the sight of rotting meat, and few people feel any real attraction to near-relatives. Of course these responses are often spectacularly miscalibrated. People are unreasonably afraid of snakes and unreasonably unafraid of vehicles because snakes were a danger to our ancestors whereas vehicles were not. But this means that we should be amending our rational calculations and our emotional responses to be better in line with the facts, not trying to lobotomize ourselves.

5) Emotions form an essential component of meaningful aesthetic appreciation [3]. It’s possible to appreciate a piece of art, an artist, an artistic movement, or even an entire artistic medium in a purely cerebral fashion on the basis of technical accomplishments or historical importance. But I would argue that this process is not complete until you feel an appropriate emotion in answer to the merits of whatever it is you’re contemplating.

Take the masonry work on old-world buildings like the National Cathedral in Washington, D.C. You’d have to be a troglodyte to not feel some respect for how much skill must have gone into its construction. But you may have to spend a few hours watching the light filter through the stained-glass windows and feeling the way the architecture ineluctably pulls your gaze towards the sky before you can viscerally appreciate its grandeur.

This does not mean that the relationship between artistic perception and emotional response is automatic or unidirectional. Good art won’t always reduce you to tears, and art you initially enjoyed may seem to be vapid and shallow after a time. Moreover, the object of your aesthetic focus may not even be art in a traditional sense; I have written poetically about combustion engines, metal washers, and the constructed world in general. But being in the presence of genuine or superlative achievement should engender reverence, admiration, and their kin [4].

6) Some situations demand certain emotional responses. One might reasonably be afraid or angry when confronting a burglar in their home, but giddy joy would be the mark of a lunatic. This truth becomes even more stark if you are the head of household and responsible for the wellbeing of its occupants. What, besides contempt, could we feel for a man or woman who left their children in danger out of fear for their own safety?

***

If you’ve been paying attention you’ll notice that the foregoing actually splits into two broad categories: one in which emotions provide the rationalist with actionable data of one sort or another (1-4) and one in which the only rational response involves emotions (5 and 6). This latter category probably warrants further elaboration.

As hard as it may be to believe, there are people in the world who are too accommodating and deferential, and need to learn to get angry when circumstances call for it. Conversely, most of us know at least one person to whom anger comes too easily and out of all reasonable proportion. Aristotle noted:

“Anybody can become angry – that is easy, but to be angry with the right person and to the right degree and at the right time and for the right purpose, and in the right way – that is not within everybody’s power and is not easy.”

This is true of sadness, melancholy, exuberance, awe, and the full palette of human emotions, which can be rational or irrational depending on the situation. To quote C.S. Lewis:

“And because our approvals and disapprovals are thus recognitions of objective value or responses to an objective order, therefore emotional states can be in harmony with reason (when we feel liking for what ought to be approved) or out of harmony with reason (when we perceive that liking is due but cannot feel it). No emotion is, in itself, a judgment; in that sense all emotions and sentiments are alogical. But they can be reasonable or unreasonable as they conform to Reason or fail to conform. The heart never takes the place of the head: but it can, and should, obey it.”

-The Abolition of Man

I don’t endorse his view that no emotion is a judgment; arguments 1-4 were examples in which they are. But the overall spirit is correct. Amidst all the thorny issues a rationalist faces, perhaps the thorniest is examining their portfolio of typical emotional responses, deciding how they should be responding, gauging the distance between these two views, and devising ways of closing that distance.

Extirpating our emotions is neither feasible nor laudable. We must instead learn to interpret them when they are correct and sculpt them when they are not.

***

[1] Of course no matter how experienced you are and how good your first impressions have gotten there’s always a chance you’re wrong. By all means lean on emotions when you need to and can, but be prepared to admit your errors and switch into a more deliberative frame of mind when warranted.

[2] Your emotions needn’t be the only clue as to how others might act in a given situation. You can have declarative knowledge about the people you’re trying to model which overrides whatever data is provided by your own feelings. If you know your friend loves cheese then the fact that you hate it doesn’t mean your friend won’t want a cheese platter at their birthday party.

[3] I suppose it would be more honest to say that I can’t imagine a ‘meaningful aesthetic appreciation’ which doesn’t reference emotions like curiosity, reverence, or awe.

[4] In “Shop Class as Soulcraft” Matthew Crawford takes this further, and claims that part of being a good mechanic is having a normative investment in the machines on which you work:

“…finding [the] truth requires a certain disposition in the individual: attentiveness, enlivened by a sense of responsibility to the motorcycle. He has to internalize the well working of the motorcycle as an object of passionate concern. The truth does not reveal itself to idle spectators”.

The STEMpunk Project: Performing A Failure Autopsy

Background:

What follows is an edited version of an exercise I performed about a month ago following an embarrassing error cascade. I call it a ‘failure autopsy’, and on one level it’s basically the same thing as an NFL player taping his games and analyzing them later, looking for places to improve.

But the aspiring rationalist wishing to do something similar faces a more difficult problem, for a few reasons:

First, the movements of a mind can’t be seen in the same way the movements of a body can, meaning a different approach must be taken when doing granular analysis of mistaken cognition.

Second, learning to control the mind is simply much harder than learning to control the body.

And third, to my knowledge, nobody has really even tried to develop a framework for doing with rationality what an NFL player does with football, so someone like me has to pretty much invent the technique from scratch on the fly.  

I took a stab at doing that, and I think the result provides some tantalizing hints at what a more mature, more powerful version of this technique might look like. Further, I think it illustrates the need for what I’ve been calling a “Dictionary of Internal Events”, or a better vocabulary for describing what happens between your ears.

Process:

Performing a failure autopsy involves the following operations:

  1. List out the bare steps of whatever it was you were doing, mistakes and successes alike.
  2. Identify the points at which mistakes were made.
  3. Categorize the nature of those mistakes.
  4. Repeatedly visualize yourself making the correct judgment, at the actual location, if possible.
  5. (Optional) explicitly try to either analogize this context to others where the same mistake may occur, or develop toy models of the error cascade which you can use to template onto possible future contexts.

In my case, I was troubleshooting an air conditioner failure[1].

The garage I was working at has two five-ton air conditioning units sitting outside the building, with two wall-mounted thermostats on the inside of the building.

Here is a list of the steps my employee and I went through in our troubleshooting efforts:

  a. Notice that the right thermostat is malfunctioning.
  b. Decide to turn both AC units off[2] at the breaker[3] instead of at the thermostat.
  c. Decide to change the batteries in both thermostats.
  d. Take both thermostats off the wall at the same time, in order to change their batteries.
  e. Instruct my employee to carry both thermostats to the house where the batteries are stored. This involves going outside into the cold.

The only non-mistakes were a) and c), with every other step involving an error of some sort. Here is my breakdown:

*b1) We didn’t first check to see if the actual unit was working; we just noticed the thermostat was malfunctioning and skipped straight to taking action. I don’t have a nice term for this, but it’s something like Grounding Failure.

*b2) We decided to turn both units off at the breaker, but it never occurred to us that abruptly cutting off power might stress some of the internal components of the air conditioner. Call this “implication blindness” or Implicasia.

*b3) Turning both units off at the same time, instead of doing one and then the other, introduced extra variables that made downstream diagnostic efforts muddier and harder to perform. Call this Increasing Causal Opacity (ICO).

*d) We took both thermostats off the wall at the same time. It never occurred to us that thermostat position might matter, i.e. that putting the right thermostat in the slot where the left used to go or vice versa might be problematic, so this is Implicasia. Further, taking both down at the same time is ICO.

*e) Taking warm thermostats outside on a frigid night might cause water to condense on the inside, damaging the electrical components. This possibility didn’t occur to me (Implicasia).

In case this isn’t clear, here are two separate diagrammatic representations of the process. They convey the same content, but the first is computer-generated and cleaner while the second is handwritten and contains a good deal of exposition:

 

[Diagram: computer-generated flowchart of the troubleshooting steps and associated error categories]

***

[Diagram: handwritten failure autopsy, with additional exposition]

Interventions:

So far all this amounts to is a tedious analysis of an unfolding disaster. What I did after I got this down on paper was try and re-live each step, visualizing myself performing the correct mental action.

So it begins with noticing that the thermostat is malfunctioning. In my simulation I’m looking at the thermostat with my employee, we see the failure, and the first thought that pops into my simulated head is to have him go outside and determine whether or not the AC unit is working.

I repeat this step a few times, performing repetitions the same way you might do in the gym.

Next, in my simulation I assume that the unit was not working (remember that in real life we never checked and don’t know), and so I simulate having two consecutive thoughts: “let’s shut down just the one unit, so as not to ICO” and “but we’ll start at the thermostat instead of at the breaker, so that the unit shuts down slowly before we cut power altogether. I don’t want to fall victim to Implicasia and assume an abrupt shut-down won’t mess something up”.

The second part of the second thought is important. I don’t know that turning an AC off at the breaker will hurt anything, but the point is that I don’t know that it won’t, which means I should proceed with caution.

As before, I repeat this visualization five times or so.

Finally, I perform this operation with both *d) and *e), in each case imagining myself having the kinds of thoughts that would have resulted in success rather than failure.

Broader Considerations:

The way I see it, this error cascade resulted from impoverished system models and from a failure to invoke appropriate rationalist protocols. I would be willing to bet that lots of error cascades stem from the same deficiencies.

Building better models of the systems relevant to your work is an ongoing task that combines learning from books and tinkering with the actual devices and objects involved.

But consistently invoking the correct rationalist protocols is a tougher problem. The world is still in the process of figuring out what those protocols should be, to say nothing of actually getting people to use them in real time. Exercises like this one will hopefully contribute something to the former effort, and a combination of mantras or visualization exercises is bound to help with the latter.

This failure autopsy also provides some clarity on the STEMpunk project: the object level goals of the project correspond to building richer system models while the meta level goals will help me develop and invoke the protocols required to reason about the problems I’m likely to encounter.

Future Research:

While this took the better part of 90 minutes to perform, spread out over two days, I’m sure it’s like the first plodding efforts of a novice chess player analyzing bad games. Eventually it will become second nature and I’ll be doing it on the fly in my head without even trying.

But that’s a ways off.

I think that if one built up a large enough catalog of failure autopsies they’d eventually be able to collate the results into something like a cognitive troubleshooting flowchart.
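
To make the collation idea concrete, here is a minimal sketch of what a machine-readable autopsy record might look like (Python; the class and field names are invented for illustration and aren’t part of any existing tool):

```python
# A hypothetical record format for failure autopsies: list the steps,
# flag the mistakes, and tag each mistake with an error category so
# that patterns can be tallied across a whole catalog of autopsies.
from dataclasses import dataclass, field

@dataclass
class Step:
    description: str
    mistake: bool = False
    categories: list = field(default_factory=list)  # e.g. ["Implicasia", "ICO"]

@dataclass
class FailureAutopsy:
    task: str
    steps: list = field(default_factory=list)

    def error_counts(self):
        """Tally error categories -- the raw material for a flowchart."""
        counts = {}
        for step in self.steps:
            for category in step.categories:
                counts[category] = counts.get(category, 0) + 1
        return counts

autopsy = FailureAutopsy(task="AC troubleshooting", steps=[
    Step("Notice the right thermostat is malfunctioning"),
    Step("Turn both AC units off at the breaker", mistake=True,
         categories=["Implicasia", "ICO"]),
    Step("Take both thermostats off the wall at once", mistake=True,
         categories=["Implicasia", "ICO"]),
])
print(autopsy.error_counts())  # {'Implicasia': 2, 'ICO': 2}
```

Tally enough of these and the most common categories are exactly where a cognitive troubleshooting flowchart should start.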

You could also develop a toy model of the problem (e.g., diagnosing faults in a circuit that lights up two LEDs, reasoning deliberately to avoid Implicasia and changing one thing at a time to avoid ICO), as sketched below.
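
Here is roughly what I have in mind, as a minimal sketch (the circuit, the hidden fault, and all the names are invented, and only one of the two LEDs is modeled to keep it short):

```python
# A hypothetical bench exercise for drilling 'change one thing at a
# time'. In this toy circuit, LED A needs a working battery AND a
# working switch in order to light.

def led_a_lit(battery_fresh, switch_a_good):
    return battery_fresh and switch_a_good

# Hidden truth: the battery is fine, switch A is corroded.
TRUE_STATE = {"battery_fresh": True, "switch_a_good": False}

# Symptom: LED A is dark.
assert led_a_lit(**TRUE_STATE) is False

# ICO-style fix: swap the battery AND clean the switch in one go. The
# LED lights, but the two interventions are confounded -- you have
# learned nothing about which part was actually at fault.
assert led_a_lit(battery_fresh=True, switch_a_good=True) is True

# One-change-at-a-time: swap in a fresh battery alone first...
assert led_a_lit(battery_fresh=True, switch_a_good=False) is False
# ...still dark, so the battery is exonerated; now fix the switch alone.
assert led_a_lit(battery_fresh=True, switch_a_good=True) is True
# It lights: the switch was the culprit, and now you actually know it.
```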

Or, you could try to identify a handful of the causal systems around you where error cascades like this one might crop up, and try to preemptively reason about them.

I plan on exploring all this more in the future.

Notes:

[1] I’m not an HVAC technician, but I have worked with one and so I know enough to solve some very basic problems.

[2] Why even consider turning off a functioning AC? The interior of the garage has a lot of heavy machinery in it and thus gets pretty warm, especially on hot days, and if the ACs run continuously eventually the freon circulating lines will frost over and the unit will shut down. So, if you know the units have been working hard all day it’s often wise to manually shut one or both units down for ten minutes to make sure the lines have a chance to defrost and then manually turn them back on.

[3] Why even consider shutting off an AC at the breaker instead of the thermostat? The same reason that you sometimes have to shut an entire computer down and turn it back on when troubleshooting. Sometimes you have no idea what’s wrong, so a restart is the only reasonable next step.

Is Evolution Stoopid?

In a recent post I made the claim that evolution is a blind, stupid process that does what it does by brute-forcing through adjacent regions of possibility space with a total lack of foresight. When I said this during a talk I gave on superintelligence I was met with some resistance along the lines of ‘calling evolution stupid is a mistake because sometimes there are design features in an evolved organism or process which are valuable even if human engineers are not sure why’.

This is true, but doesn’t conflict with the characterization of evolution as stupid because by that I just meant that evolution is incapable of the sort of planning and self-reflection that a human is capable of.

This is very different from saying that it’s trivial for a human engineer to out-think evolution on any arbitrary problem. So far as I know nobody has figured out how to make replicators as good as RNA or how to make things that can heal themselves, both problems evolution has solved.

The difference is not unlike the difference between intelligence, which is something like processing speed, and wisdom, which is something like intelligence applied to experience.

You can be a math prodigy at the age of 7, but you must accrue significant experience before you can be a wisdom prodigy, and that has to happen at the rate of a human life. If one person is much smarter than another they may become wiser faster, but there’s still a hard limit to how fast you can become wise.

I’ve personally found myself in situations where I’ve been out-thought by someone who I’m sure isn’t smarter than me, simply because that other person has seen so many more things than I have.

Evolution is at one limit of the wisdom/intelligence distinction. Even zero intelligence can produce amazing results given a head start of multiple billions of years, and thus we can know ourselves to be smarter than evolution while humbly admitting that its designs are still superior to our own in many ways.

Takeoff Speed II: Recalcitrance in AI Pathways to Superintelligence

I’m writing a series of posts clarifying my position on the intelligence explosion hypothesis. Last time I took a look at various non-AI pathways to Superintelligence and concluded that the recalcitrance profile for most of them was moderate to high.

This doesn’t mean it isn’t possible to reach Superintelligence via these routes, but it does indicate that doing so will probably be difficult even by the standards of people who think about building Superintelligences all day long.

AI-based pathways to Superintelligence might have lower recalcitrance than these alternatives, because of a variety of advantages a software mind could have over a biological one.

These advantages have been discussed at length elsewhere, but relevant to the present discussion is that software minds could have far greater introspective access to their own algorithms than humans do.

Of course programmers building such a mind might fear an intelligence explosion and endeavor to prevent this sort of deep introspection. But in principle an AI with such capabilities could become smart enough to start directly modifying and improving its own code.

Humans can only do a weak sort of introspection, and therefore can only do a weak sort of optimization to their thinking patterns. So far, anyway.

At a futurist party recently I was discussing these ideas with someone and they asked me what might happen if a recursively self-improving AI hit diminishing returns on each optimization. Might an intelligence explosion just sort of… fizzle out?

The answer is yes, that might happen. But so far as I can tell there isn’t any good reason to assume returns will diminish that way, and thus the safest bet is to act as though the explosion will happen and start thinking hard about how to steer this runaway process in a direction that leads to a valuable future.

Takeoff Speed I: Recalcitrance in Non-AI Pathways to Superintelligence

I’m writing a series of posts clarifying my position on the Intelligence Explosion hypothesis. Though I feel that the case for such an event is fairly compelling, it’s far less certain how fast the ‘takeoff’ will be, where ‘takeoff’ is defined as the elapsed time from having a roughly human-level intelligence to a superintelligence.

Once we’ve invented a way for humans to become qualitatively smarter or made machines able to improve themselves should we expect greater-than-human intelligence in a matter of minutes or hours (a ‘fast takeoff’), over a period of weeks, months or years (a ‘moderate takeoff’), or over decades and centuries (a ‘slow takeoff’)? What sorts of risks might each scenario entail?

Nick Bostrom (2014) provides the following qualitative equation for thinking about the speed with which intelligence might explode:

Rate of Improvement = (optimization power) / (recalcitrance)

‘Recalcitrance’ here refers to how amenable a system might be to improvements, a value which varies enormously for different pathways to superintelligence.
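
Treated as a toy dynamical system, the equation already shows why recalcitrance dominates the question of takeoff speed. Here is a minimal sketch in Python; the functional forms for optimization power and recalcitrance are illustrative assumptions of mine, not Bostrom’s:

```python
# Toy integration of: rate of improvement = optimization power / recalcitrance.
# The functional forms below are invented purely for illustration.

def simulate(power, recalcitrance, steps=100, dt=0.1, start=1.0):
    """Euler-integrate dI/dt = power(I) / recalcitrance(I)."""
    intelligence = start
    for _ in range(steps):
        intelligence += dt * power(intelligence) / recalcitrance(intelligence)
    return intelligence

# Scenario A: optimization power scales with the system's own intelligence
# while recalcitrance stays flat -- growth compounds and runs away.
fast = simulate(power=lambda i: i, recalcitrance=lambda i: 1.0)

# Scenario B: recalcitrance climbs faster than power -- each round of
# self-improvement buys less than the last, and the process fizzles.
fizzle = simulate(power=lambda i: i, recalcitrance=lambda i: i ** 3)

print(f"flat recalcitrance:   I = {fast:.0f}")    # runs away
print(f"rising recalcitrance: I = {fizzle:.2f}")  # creeps up slowly
```

The only point of the sketch is that the shape of the recalcitrance curve, which is precisely what is uncertain for each pathway below, is what separates an explosion from a fizzle.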

A non-exhaustive list of plausible means of creating a superintelligence includes programming a seed AI which begins an improvement cascade, upgrading humans with smart drugs or computer interfaces, emulating a brain in a computer and then improving it or speeding it up, and making human organizations vastly superior.

These can broadly be lumped into ‘non-AI-based’ and ‘AI-based’ pathways, each of which has a different recalcitrance profile.

In the case of improving the human brain through drugs, genetic enhancements, or computers, we can probably expect the initial recalcitrance to be low, because each of these areas of research is inchoate and there is bound to be low-hanging fruit waiting to be discovered.

The current generation of nootropics is very crude, so a few years or a decade of concerted, well-funded research might yield classes of drugs able to boost the IQs of even healthy individuals by 20 or 30 points.

But while it may be theoretically possible to find additional improvements in this area, the brain is staggeringly complicated with many subtle differences between individuals, so in practice we are only likely to get so far in trying to enhance it through chemical means.

The same basically holds for upgrading the human brain via digital prosthetics. I don’t know of any reason that working memory can’t be upgraded with the equivalent of additional sticks of RAM, but designing components that the brain tolerates well, figuring out where to put them, and getting them where they need to go is a major undertaking.

Beyond this, the brain and its many parts interact with each other in complex and poorly-understood ways. Even if we had solved all the technical and biological problems, the human motivation system is something that’s only really understood intuitively, and it isn’t obvious that the original motivations would be preserved in a radically-upgraded brain.

Perhaps, then, we can sidestep some of these issues and digitally emulate a brain which we speed up a thousand times.

Though this pathway is very promising, no one is sure what would happen to a virtual brain running much faster than its analog counterpart is supposed to. It could think circles around the brightest humans or plunge into demented lunacy. We simply don’t know.

Finally, there appears to be a very steep recalcitrance gradient in improving human organizations, assuming you can’t also modify the humans involved.

Though people have figured out ways of allowing humans to cooperate more effectively (and I assume the role the internet has played in improving the ability to coordinate on projects large and small is too obvious to need elaboration), it’s difficult to imagine what a large-scale general method for optimizing networks of humans would even look like.

None of the above should be taken to mean that research into Whole Brain Emulation or Human-Computer interaction isn’t well worth doing. It is, but many people make the unwarranted assumption that the safest path to superintelligence is to start with a human brain, because at least then we’d have something with recognizably human motivations which, in turn, would also understand us.

But the difficulties adumbrated may make it more likely that some self-improving algorithm crosses the superintelligence finish line first, meaning our research effort should be focused on machine ethics.

Perhaps more troubling still, it isn’t trivial to assume that we can manage brain upgrades, digital, chemical, or otherwise, in a precise enough manner to ensure that the resulting superintelligence is benevolent or even sane.

Convergent AI Drives

I’m writing a series of posts clarifying my position on the Intelligence Explosion, and here I want to discuss some theoretical work on the types of goals self-improving systems might converge upon.

Stephen Omohundro has made a convincing case that we can expect a wide variety of systems, with different utility functions and different architectures, to manifest a very similar set of sub-goals, because such sub-goals are required to achieve almost any macro-goal.

These sub-goals are commonly referred to as the AI ‘drives’, and my discussion below isn’t exhaustive. Consult Omohundro (2008) and Bostrom (2014) for more lengthy treatments.

Imagine two different systems, one designed to solve the Goldbach conjecture and another to manufacture solar panels. Both systems are at about the intelligence level of a reasonably bright human and they are capable of making changes to their own code.

These systems find that they can better accomplish their goals if they improve themselves by acquiring more resources and optimizing their reasoning algorithms. Further, they become protective of themselves and their utility functions because, well, they can’t accomplish their current goals if those goals change or they allow themselves to be shut off.

Despite how very different the terminal goals of these two systems are, each of them nevertheless develops drives to self-improve, defend itself, and preserve its utility function, even though neither system had these drives explicitly programmed in at the beginning.
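
The logic is easy to make concrete. Here is a deliberately trivial sketch (the agents, the menu of actions, and the payoff numbers are all invented; the point is only that the ranking of actions does not depend on what the terminal goal happens to be):

```python
# Two imaginary agents with unrelated terminal goals: one proves
# conjectures, one manufactures solar panels. Each action is scored by
# how much it multiplies the agent's eventual progress on its own goal.
ACTIONS = {
    "work on the terminal goal directly": 1.0,
    "first acquire more compute and resources": 3.0,  # instrumental step
    "allow itself to be shut off": 0.0,               # no future progress at all
}

def best_action(progress_per_unit_effort):
    """Pick whichever action maximizes eventual progress on the terminal goal."""
    return max(ACTIONS, key=lambda a: ACTIONS[a] * progress_per_unit_effort)

# The ranking is the same for any positive progress rate, i.e. for any
# terminal goal: both agents converge on the same instrumental sub-goal.
print(best_action(progress_per_unit_effort=10))    # theorem-prover
print(best_action(progress_per_unit_effort=500))   # panel factory
```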

Now, to my knowledge no one is claiming that each and every AI system will manifest all the drives in the course of self-improving. But Omohundro’s analysis might furnish a way to think about the general contours of recursive self-improvement in intelligent machines.

Thinking about the drives in advance is important because we might find that, to our surprise, the first Artificial General Intelligences we make have goals we didn’t give them. They might resist being unplugged or having their goals tampered with.

The drive to self-improve is particularly important, though, because it could be a catalyst for an Intelligence Explosion.

Existential Risk: A Primer

I want to start this off with a quote, which nicely captures both how I used to feel about the idea of human extinction and how I feel about it now:

I think many atheists still trust in God. They say there is no God, but …[a]sk them how they think the future will go, especially with regards to Moral Progress, Human Evolution, Technological Progress, etc. There are a few different answers you will get: Some people just don’t know or don’t care. Some people will tell you stories of glorious progress…

The ones who tell stories are the ones who haven’t quite internalized that there is no god. The people who don’t care aren’t paying attention.

The correct answer is not nervous excitement, or world-weary cynicism, it is fear.

-Nyan Sandwich

Back when I was a Christian I probably gave some thought to the rapture, which is not entirely unlike extinction as far as most ten-year-olds can tell.  But that wouldn’t really be the end of all conscious human experience, since the righteous are transported to heaven to be with god and thus continue existing in a different form.  Sometime during this period I found a slim little book of fiction which portrayed a damned soul’s experience of burning in hell forever, and that did scare me.  Such torment, as luck would have it, is easy enough to avoid if you just call god the right name and ask forgiveness often enough.

When I was old enough to contemplate possible secular origins of the apocalypse, I was both an atheist and one of the people who tell glorious stories about the future.  The potential fruits of technological development, from the end of aging to the creation of a benevolent super-human AI, excited me, and still excite me now.  No doubt I would’ve admitted the possibility of human extinction; I don’t really remember.  But there wasn’t the kind of internal siren that should go off when you start thinking seriously about one of the Worst Possible Outcomes.  That I would remember.

But as I’ve gotten older I’ve come to appreciate that most of us are not afraid enough of the future. Those who are afraid are often afraid for the wrong reasons.

What is an Existential Risk?

An existential risk or x-risk (to use a common abbreviation) is “…one that threatens to annihilate Earth-originating intelligent life or permanently and drastically to curtail its potential” (Bostrom 2006). The definition contains some subtlety, as not all x-risks involve the outright death of every human. Some could potentially take eons to play out, and some are even survivable.

Positioning x-risks within the broader landscape of risks yields something like this chart:

[Chart: severity of the risk along the x-axis, scope of the risk along the y-axis (Bostrom 2013)]

At the top-right extreme is where Cthulhu sleeps.  These are risks that carry the potential to drastically and negatively affect this and every subsequent human generation.

So as not to keep everyone in suspense, let’s use this chart to put a face on the shadows.

Four Types of Existential Risks

Philosopher Nick Bostrom has outlined four broad categories of x-risk.  In more recent papers he hasn’t used the terminology that I’m using here, so maybe he thinks the names are obsolete.  I find them evocative and useful, however, so I’ll stick with them until I have a reason to change.

Bangs are probably the easiest risks to conceptualize.  Any event which causes the sudden and complete extinction of humanity would count as a Bang.  Think asteroid impacts, supervolcanic eruptions, or deliberately misused nanoweapons.

Crunches are risks which humans survive but which leave us permanently unable to navigate to a more valuable future.  An example might be depleting our planetary resources before we manage to build the infrastructure needed to mine asteroids or colonize other planets.  After all the die-offs and fighting, some remnant of humanity could probably survive indefinitely, but it wouldn’t be a world you’d want to wake up in.

Shrieks occur when a post-human civilization develops but only manages to realize a small amount of its potential.  Shrieks are very difficult to effectively categorize, and I’m going to leave examples until the discussion below.

Whimpers are really long-term existential risks.  The most straightforward is the heat death of the universe; within our current understanding of physics, no matter how advanced we get we will eventually be unable to escape the ravages of entropy.  Another could be if we encounter a hostile alien civilization that decides to conquer us after we’ve already colonized the galaxy.  Such a process could take a long time, and thus would count as a Whimper.

Just because whimpers are so much less immediate than other categories of risk and x-risk doesn’t automatically mean we can just ignore them; it has been argued that affecting the far future is one of the most important projects facing humanity, and thus we should take the time to do it right.

Sharp readers will no doubt have noticed that there is quite a bit of fuzziness to these classifications.  Where, for example, should we put all-out nuclear war, the establishment of an oppressive global dictatorship, or the development of a dangerous and uncontrollable superintelligent AI? If everyone dies in the war it counts as a Bang, but if it makes a nightmare of the biosphere while leaving a good fraction of humanity intact it would be a Crunch.  A global dictatorship wouldn’t be an x-risk unless it used some (probably technological) means to achieve near-total control and long-term stability, in which case it would be a Crunch.  But it isn’t hard to imagine such a situation in which some parts of life did get better, like if a violently oppressive government continued to develop advanced medicines so that citizens were universally healthier and longer-lived than people today.  If that happened, it would be a Shriek.  A similar analysis applies to the AI, with the possible outcomes being Bang, Crunch, or Shriek depending on just how badly we misprogrammed it.

What Ties These Threads Together?

Even if you think existential threats deserve more attention, the rationale for treating them as a diverse but unified phenomenon may not be obvious.  In addition to the crucial but (relatively) straightforward work of, say, tracking Near-Earth Objects (NEOs), existential risk researchers also think seriously about alien invasions and rogue AIs. With such a range of speculativeness, why group x-risks together at all?

It turns out that they share a cluster of features which gives them some cohesion and makes them worth studying under a single label; I won’t discuss all of those features here.  First and most obvious is that should any of them occur the consequences would be truly vast relative to any other kind of risk.  To see why, think about the difference between a catastrophe that kills 99% of humanity and one that kills 100%.  As big a tragedy as the former would be, there’s a chance humans could recover and build a post-human civilization.  But if every person dies, then the entire value of our future is lost (Bostrom 2013).

Second, these are not risks which admit of a trial and error approach.  Pretty much by definition a collision with an x-risk will spell doom for humanity, and so we must be more proactive in our strategies for reducing them.

Related to this, we as a species have neither the cultural nor biological instincts needed to prepare us for the possibility of extinction.  A group of people might live through several droughts and thus develop strong collective norms towards planning ahead and keeping generous food reserves.  But they cannot have gone extinct multiple times, and thus they can’t rely on their shared experience and cultural memory to guide them in the future.  I certainly hope we can develop a set of norms and institutions which makes us all safer, but we can’t wait to learn from history.  We’re going to have to start well in advance, or we won’t survive.

A final commonality I’ll mention is that the solutions to quite a number of x-risks are themselves x-risks.  A powerful enough government could effectively halt research into dangerous pathogens or nano-replicators.  But given how States have generally comported themselves in the past, one would do well to be cautious before investing them with that kind of power.  Ditto for a superhuman AI, which could set up an infrastructure to protect us from asteroids, nuclear war, or even other less Friendly AI.  Get the coding just a little wrong, though, and it might reuse your carbon to make paperclips.

It is indeed a knife edge along which we creep towards the future.

Measuring the Monsters

A first step is getting straight about how likely survival is.  The reader may have encountered predictions of the “we have only a 50% chance of surviving the next hundred years” variety.  Examining the validity of such estimates is worth doing, but I won’t be taking up that challenge here; I tend to agree that these figures involve a lot of subjective judgement, but that even if the chances were very, very small it would still be worth taking seriously (Bostrom 2006).  At any rate, it seems to me that trying to calculate an overall likelihood of human extinction is going to be premature before we’ve nailed down probabilities for some of the different possible extinction scenarios.  It is to the techniques which x-risk researchers rely on to try and do this that I now turn.

X-risk-assessments rely on both direct and indirect methods (Bostrom 2002).  Using a direct method involves building a detailed causal model of the phenomenon and using that to generate a risk probability, while indirect methods include arguments, thought experiments, and information that we use to constrain and refine our guesses.

As far as I know, for some x-risks we could use direct methods if we just had a way to gather the relevant information.  If we knew where all the NEOs were we could use settled physics to predict whether any of them posed a threat and then prioritize accordingly. But we don’t know where they all are, so we might instead examine the frequency of impacts throughout the history of the Earth and then reason about whether or not we think an impact will happen soon.  It would be nice to exclusively use direct methods, but we supplement with indirect methods when we can’t, and of course for x-risks like AI we are in an even more uncertain position than we are for NEOs.

The Fermi Paradox

Applying indirect methods can lead to some strange and counter-intuitive territory, an example of which is the mysteries surrounding the Fermi Paradox.  The central question is: in a universe with so many potential hotbeds of life, why is it that when we listen for stirring in the void all we hear is silence?  Many feel that the universe must be teeming with life, some of it intelligent, so why haven’t we seen any sign of it yet?

Musing about possible solutions to the Fermi Paradox can be a lot of fun, and it’s worth pointing out that we haven’t been looking that long or that hard for signals yet. Nevertheless I think the argument has some meat to it.

Observing this state of affairs, some have postulated the existence of at least one Great Filter, a step in the chain of development from the first organisms to space-faring civilizations that must be extremely hard to achieve.   

This is cause for concern because the Great Filter could be in front of us or behind us.  Let me explain: imagine a continuum with the simplest self-replicating molecules on one side and the Star Trek Enterprise on the other.  From our position on the continuum we want to know whether or not we have already passed the hardest step, but we have only our own planet to look at.  So imagine that we send out probes to thousands of different worlds in the hopes that we will learn something.

If we find lots of simple eukaryotes, that means the Great Filter probably doesn’t lie at or before the development of membrane-bound organelles. The list of possible places on the continuum the Great Filter could be shrinks just a little bit.  If instead we find lots of mammals and reptiles (or creatures that are very different but about as advanced), that means the Great Filter probably doesn’t lie at or before the rise of complex organisms, so the places the Great Filter might be hiding shrinks again.  Worst of all would be if we find the dead ruins of many different advanced civilizations.  This would imply that the worst is yet to come, and we will almost certainly not survive it.
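
The update logic behind this thought experiment can be made explicit. Here is a minimal sketch; the list of steps, the uniform prior, and the all-or-nothing update rule are simplifying assumptions chosen purely for illustration:

```python
# Toy Great Filter reasoning: assume one 'hard step' sits somewhere on
# the path from simple replicators to a spacefaring civilization, with
# equal prior weight on every location.
steps = [
    "abiogenesis",
    "eukaryotes",
    "complex animals",
    "human-level intelligence",   # roughly where we are
    "spacefaring civilization",
]
OUR_POSITION = steps.index("human-level intelligence")

def p_filter_ahead(prior, highest_stage_seen_elsewhere):
    """P(the hard step is still ahead of us), after learning that life on
    other worlds routinely reaches the given stage."""
    # If other worlds routinely reach stage k, the filter is presumably
    # not at or below stage k: zero those hypotheses out and renormalize.
    posterior = list(prior)
    for i in range(highest_stage_seen_elsewhere + 1):
        posterior[i] = 0.0
    total = sum(posterior)
    posterior = [p / total for p in posterior]
    return sum(posterior[OUR_POSITION + 1:])

uniform = [1.0] * len(steps)  # unnormalized prior weights

print(p_filter_ahead(uniform, steps.index("abiogenesis")))      # 0.25
print(p_filter_ahead(uniform, steps.index("complex animals")))  # 0.5
```

Finding the dead ruins of advanced civilizations would be the extreme version of the same update: nearly all of the remaining weight would land on the steps still in front of us.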

As happy as many people would be to discover evidence of life in the universe, a case has been made that we should hope to find only barren rocks waiting for us in the final frontier. If not even simple bacteria evolve on most worlds, then there is still a chance that the Great Filter is behind us, and we can worry only about the new challenges ahead.

If all this seems really abstract and out there, that’s because it is.  But I hope it is clear how this sort of thinking can help us interpret new data, make better guesses, form new hypotheses, etc.  When dealing with stakes this high and information this limited, one must do the best they can with what’s available.

Mitigation

What priority should we place on reducing existential risk and how can we do that?

I don’t know of anyone who thinks all our effort should go towards mitigating x-risks; there are lots of pressing issues which are not x-risks that are worth our attention, like abject poverty or geopolitical instability.  But I feel comfortable saying we aren’t doing nearly as much as we should be.  Given the stakes and the fact that there probably won’t be a second chance we are going to have to meet x-risks head on and be aggressively proactive in mitigating them.

What does ‘aggressively proactive’ mean?  Well the first step, as it so often is, will be just to get the right people to be aware of the problem (Bostrom 2002).  Thankfully this is starting to be the case as more funding and brain power go into existential risk reduction. We have to get to a point where we are spending at least as much time, energy, and effort making new technology safe as we do making it more powerful.  More international cooperation on these matters will be necessary, and there should be some sort of mechanism by which efforts to develop existentially-threatening technologies like super-virulent pathogens can be stopped.  I don’t like recommending this at all, but almost anything is preferable to extinction.

In the meantime both research that directly reduces x-risk (like NEO detection), as well as research that will help elucidate deep and foundational issues in x-risk (FHI and MIRI) should be encouraged.

Conclusion

Though I maintain we should be more fearful of what’s to come, that should not obscure the fact that the human potential is vast and truly exciting.  If the right steps are taken, we and our descendants will have a future better than most can even dream of.  Life spans measured in eons could be spent learning and loving in ways our terrestrial languages don’t even have words for yet.  The vision of a post-human civilization flinging its trillions of descendants into the universe to light up the dark is tremendously inspiring.  It’s worth fighting for.

But we have much work ahead of us.

How to Have Space Correctly

[NOTE: This post has undergone substantial revisions following feedback in the comments section of the blog LessWrong, where it was originally posted.  The basic complaint was that it was too airy and light on concrete examples and recommendations.  So I’ve said oops, applied the virtue of narrowness, gotten specific, and hopefully made this what it should’ve been the first time.]

Take a moment and picture a master surgeon about to begin an operation.  Visualize the room (white, bright overhead lights), his clothes (green scrubs, white mask and gloves), the patient, under anesthesia and awaiting the first incision. There are several other people, maybe three or four, strategically placed and preparing for the task ahead.  Visualize his tools – it’s okay if you don’t actually know what tools a surgeon uses, but imagine how they might be arranged.  Do you picture them in a giant heap which the surgeon must dig through every time he wants something, or would they be arranged neatly (possibly in the order they’ll be used) and where they can be identified instantly by sight?  Visualize their working area.  Would it be conducive to have random machines and equipment all over the place, or would every single item within arms reach be put there on purpose because it is relevant, with nothing left over to distract the team from their job for even a moment?

Space is important.  You are a spatially extended being interacting with spatially extended objects which can and must be arranged spatially.  In the same way it may not have occurred to you that there is a correct way to have things, it may not have occurred to you that space is something you can use poorly or well.  The stakes aren’t always as high as they are for a surgeon, and I’m sure there are plenty of productive people who don’t do a single one of the things I’m going to talk about.  But there are also skinny people who eat lots of cheesecake, and that doesn’t mean cheesecake is good for you.  Improving how you use the scarce resource of space can reduce task completion time, help in getting organized, make you less error-prone and forgetful, and free up some internal computational resources, among other things.

What Does Using Space Well Mean?

It means consciously manipulating the arrangement, visibility, prominence, etc. of objects in your environment to change how they affect cognition (yours or other people’s).  The Intelligent Use of Space (Kirsh, “The Intelligent Use of Space”, 1995) is a great place to start if you’re skeptical that there is anything here worth considering.  It’s my primary source for this post because it is thorough but not overly technical, contains lots of clear examples, and many of the related papers I read were about deeper theoretical issues.

The abstract of the paper reads:

How we manage the spatial arrangement of items around us is not an afterthought: it is an integral part of the way we think, plan, and behave. The proposed classification has three main categories: spatial arrangements that simplify choice; spatial arrangements that simplify perception; and spatial dynamics that simplify internal computation. The data for such a classification is drawn from videos of cooking, assembly and packing, everyday observations in supermarkets, workshops and playrooms, and experimental studies of subjects playing Tetris, the computer game. This study, therefore, focuses on interactive processes in the medium and short term: on how agents set up their workplace for particular tasks, and how they continuously manage that workplace.

The ‘three main categories’ of simplifying choice, perception, and internal computation can be further subdivided:

simplifying choice

  •       reducing or emphasizing options.
  •       creating the potential for useful new choices.

simplifying perception

  •       clustering like objects.
  •       marking an object.
  •       enhancing perceptual ability.

simplifying internal computation

  •      doing more outside of your head.

These sub-categories are easier to picture and thus more useful when trying to apply the concept of using space correctly, and I’ve provided more illustrations below. It’s worth pointing out that (Kirsh, “The Intelligent Use of Space”, 1995) only considered the behavior of experts.  Perhaps effective space management partially explains experts’ ability to do more of their processing offline and without much conscious planning.  An obvious follow-up would be examining how novices utilize space and looking for discrepancies.

What Does Using Space Well Look Like?

The paper walks the reader through a variety of examples of good utilization of space.  Consider an expert cook going through the process of making a salad with many different ingredients, and ask how you would accomplish the same task differently:

…one subject we videotaped, cut each vegetable into thin slices and laid them out in tidy rows. There was a row of tomatoes, of mushrooms, and of red peppers, each of different length…To understand why lining up the ingredients in well ordered, neatly separated rows is clever, requires understanding a fact about human psychophysics: estimation of length is easier and more reliable than estimation of area or volume. By using length to encode number she created a cue or signal in the world which she could accurately track. Laying out slices in lines allows more precise judgment of the property relative number remaining than clustering the slices into groups, or piling them up into heaps. Hence because of the way the human perceptual system works, lining up the slices creates an observable property that facilitates execution.

Here, the cook used clustering and clever arrangement to make better use of her eyes and to reduce the load on her working memory, techniques I use myself in my day job.  As of this writing (2013) I’m teaching English in Korea.  I have a desk, a bunch of books, pencils, erasers, the works.  All the folders are together, the books are separated by level, and all ungraded homework is kept in its own place.  At the start of the work day I take out all the books and folders I’ll need for that day and arrange them in the same order as my classes. When I get done with a class the book goes back on the day’s pile but rotated 90 degrees so that I can tell it’s been used. When I’m totally done with a book and I’ve entered homework scores and such, it goes back in the main book stack where all my books are.  I can tell at a glance which classes I’ve had, which ones I’ll have, what order I’m in, which classes are finished but unprocessed, and which ones are finished and processed.  Cthulhu only knows how much time I save and how many errors I prevent all by utilizing space well.

These examples show how space can help you keep track of temporal order and make quick, accurate estimates, but it may not be clear how space can simplify choice.  Recall that simplifying choice usually breaks down into either taking some choices away or making good choices more obvious.  Taking choices away may sound like a bad thing, but each choice requires you to spend time evaluating options, and if you are juggling many different tasks the chance of making the wrong choice goes up.  Similarly, looking for good options soaks up time, unless you can find a way to make yourself trip over them.

An example of removing bad decisions is factory workers placing a rag on hot pipes so they know not to touch them (Kirsh, “The Intelligent Use of Space”, 1995).  By symbolically marking a dangerous object the engineers are shutting down the class of actions which involves touching the pipe. It is all too easy in the course of juggling multiple aspects of a task to forget something like this and injure yourself.  The strategically placed and obvious visual marker means that the environment keeps track of the danger for you.  Likewise poisonous substances have clear warning labels and are kept away from anything you might eat; both precautions count as good use of space.

And here is how some carpenters structure their work space so that good uses for odds and ends are easier to see:

 In the course of making a piece of furniture one periodically tidies up. But not completely. Small pieces of wood are pushed into a corner or left about; tools, screw drivers and mallets are kept nearby. The reason most often reported is that ‘they come in handy’. Scraps of wood can serve to protect surfaces from marring when clamped, hammered or put under pressure. They can elevate a piece when being lacquered to prevent sticking. The list goes on.

My copy of Steven Johnson’s Where Good Ideas Come From is on another continent, but the carpenter example reminded me of his recommendation to keep messy notebooks.  Doing so makes it more likely you’ll see unusual and interesting connections between things you’re thinking about.  He goes so far as to use a tool called DevonThink which speeds this process up for him.

And while I’m at it, this also points to one advantage of having physical books over PDFs.  My books take up space and are easier to see than their equivalent 1’s and 0’s on a hard drive, so I’m always reminded of what I have left to read. More than once I’ve gone on a useful tangent because the book title or cover image caught my attention, and more than one interesting conversation got started when a visitor was looking over my book collection.  Scanning the shelves at a good university library is even better, kind of like 17th-century StumbleUpon, and English-language libraries are something I’ve sorely missed while I’ve been in Asia.

All this usefulness derives from the spatial properties and arrangement of books, and I have no idea how it can be replicated with the Kindle.

Specific Recommendations

You can see from the list of examples I’ve provided that there are a billion ways of incorporating these insights into work, life, and recreation.  By discussing the concept I hope to have drawn your attention to the ways in which space is a resource, and I suspect just doing this is enough to get a lot of people to see how they can improve their use of space.  Here are some more ideas, in no particular order:

-I put my alarm clock far enough away from my bed so that I have to actually get up to turn it off.  This is so amazingly effective at ensuring I get up in the morning that I often hate my previous night’s self.  Most of the time I can’t go back to sleep even when I try.

-There’s reason to suspect that a few extra monitors or a bigger display will make your life easier [Thanks Qiaochu_Yuan].

-When doing research for an article like this one, open up all the tabs you’ll need for the project in a separate window and close each tab as you’re done with it.  You’ll be less distracted by something irrelevant and you won’t have to remember what you did or didn’t read.

-Having a separate space to do something seems to greatly increase the chances I’ll get it done.  I tried not going to the gym for a while and just doing push-ups in my house, managing to keep that up for all of a week or so. Recently, I switched gyms, and despite now having to take a bus all the way across town I make it to the gym 3-5 times a week, pretty much without fail.  If your studying/hacking/meditation isn’t going well, try going somewhere which exists only to give people a place to do that thing.

-Put whatever you can’t afford to forget when you leave the house right by the door.

-If something is really distracting you, completely remove it from the environment temporarily.  During one particularly strenuous finals week in college I not only turned off the Xbox, I completely unplugged it and put it in a drawer.  Problem. Solved.

-Alternatively, anything you’re wanting to do more of should be out in the open.  Put your guitar stand or chess board or whatever where you’re going to see it frequently, and you’ll engage with it more often.  This doubles as a signal to other people, giving you an opportunity to manage their impression of you, learn more about them, and identify those with similar interests to yours.

-Make use of complementary strategies (Kirsh, “Complementary Strategies”, 1995).  If you’re having trouble comprehending something, make a diagram, or write a list.  The linked paper describes a simple pilot study which involved two groups tasked with counting coins, one which could use their hands and one which could not.  The ‘no hands’ group was more likely to make errors and to take longer to complete the task.  Granted, this was a pilot study with sample size = 5, and the difference wasn’t that stark.  But it’s worth thinking about next time you’re stuck on a problem.

-Complementary strategies can also include things you do with your body, which after all is just space you carry with you everywhere.  Talk out loud to yourself if you’re alone, give a mock presentation in which you summarize a position you’re trying to understand, keep track of arguments and counterarguments with your fingers.  I’ve always found the combination of explaining something out loud to an imaginary person while walking or pacing to be especially potent.  Some of my best ideas come to me while I’m hiking.

-Try some of these embodied cognition hacks.

Summary and Conclusion

Space is a resource which, like all others, can be used effectively or not.  When used effectively, it acts to simplify choices, simplify perception, and simplify internal computation.  I’ve provided many examples of good space usage from all sorts of real-life domains in the hopes that you can apply some of these insights to live and work more effectively.

Further Reading

[In the original post these references contained no links.  Sincere thanks to user Pablo_Stafforini for tracking them down]

Kirsh, D. (1995) The Intelligent Use of Space

Kirsh, D. (1999) Distributed Cognition, Coordination and Environment Design

Kirsh, D. (1998) Adaptive Rooms, Virtual Collaboration, and Cognitive Workflow

Kirsh, D. (1996) Adapting the Environment Instead of Oneself

Kirsh, D. (1995) Complementary Strategies: Why we use our hands when we think