The two topics I’ve been thinking the most about lately:
- What makes some patterns of consciousness feel better than others? I.e. can we crisply reverse-engineer what makes certain areas of mind-space pleasant, and other areas unpleasant?
- If we make a smarter-than-human Artificial Intelligence, how do we make sure it has a positive impact? I.e., how do we make sure future AIs want to help humanity instead of callously using our atoms for their own inscrutable purposes? (for a good overview on why this is hard and important, see Wait But Why on the topic and Nick Bostrom’s book Superintelligence)
I hope to have something concrete to offer on the first question Sometime Soon™. And while I don’t have any one-size-fits-all answer to the second question, I do think the two issues aren’t completely unrelated. The following outlines some possible ways that progress on the first question could help us with the second question.
An important caveat: much depends on whether pain and pleasure (collectively, ‘valence’) are simple or complex properties of conscious systems. If they’re on the complex end of the spectrum, many points on this list may not be terribly relevant for the foreseeable future. On the other hand, if they have a relatively small “Kolmogorov complexity” (e.g., if a ‘hashing function’ to derive valence could fit on a t-shirt), crisp knowledge of valence may be possible sooner rather than later, and could have some immediate relevance to current Friendly Artificial Intelligence (FAI) research directions.
Additional caveats: it’s important to note that none of these ideas is a grand, sweeping panacea, intended to address deep metaphysical questions, or aimed at reinventing the wheel. Instead, they’re intended to help resolve empirical ambiguities and modestly enlarge the current FAI toolbox.
1. Valence research could simplify the Value Problem and the Value Loading Problem.* If pleasure/happiness is an important core part of what humanity values, or should value, having the exact information-theoretic definition of it on-hand could directly and drastically simplify the problems of what to maximize, and how to load this value into an AGI**.
*The “Value Problem” is what sort of values we should instill into an AGI- what the AGI should try to maximize. The “Value Loading Problem” is how to instill these values into the AGI.
**An AGI is an Artificial General Intelligence. AI researchers use this term to distinguish something generally intelligent and good at solving arbitrary problems (like a human) from something that’s narrowly intelligent (like a program that only plays Chess).
This ‘Value Problem’ is important to get right, because there are a lot of potential failure modes which involve superintelligent AGIs doing exactly what we say, but not what we want (e.g., think of what happened to King Midas). As Max Tegmark puts it in Friendly Artificial Intelligence: the Physics Challenge,
What is the ultimate ethical imperative, i.e., how should we strive to rearrange the particles of our Universe and shape its future? If we fail to answer [this] question rigorously, this future is unlikely to contain humans.
2. Valence research could form the basis for a well-defined ‘sanity check’ on AGI behavior. Even if pleasure isn’t a core terminal value for humans, it could still be used as a useful indirect heuristic for detecting value destruction. I.e., if we’re considering having an AGI carry out some intervention, we could ask it what the expected effect is on whatever pattern precisely corresponds to pleasure/happiness. If there would be a lot less of that pattern, the intervention is probably a bad idea.
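A toy sketch of what such a sanity check might look like, assuming (and this is the whole open problem) that some function `valence_of` can score a predicted world state. Every name here (`valence_of`, `expected_valence`, `sanity_check`, `tolerance`) is hypothetical and only illustrates the shape of the heuristic:

```python
from typing import Callable, Sequence, Tuple

# Hypothetical: an outcome is (probability, predicted world state).
# The state can be any representation that valence_of knows how to score.
Outcome = Tuple[float, object]


def expected_valence(
    outcomes: Sequence[Outcome],
    valence_of: Callable[[object], float],
) -> float:
    """Probability-weighted valence over the predicted outcomes."""
    total_p = sum(p for p, _ in outcomes)
    if total_p <= 0:
        raise ValueError("outcome probabilities must sum to a positive number")
    return sum(p * valence_of(state) for p, state in outcomes) / total_p


def sanity_check(
    baseline_valence: float,
    outcomes: Sequence[Outcome],
    valence_of: Callable[[object], float],
    tolerance: float = 0.0,
) -> bool:
    """Return True if the intervention passes, False if it should be flagged.

    An intervention is flagged when its expected valence falls more than
    `tolerance` below the no-intervention baseline.
    """
    return expected_valence(outcomes, valence_of) >= baseline_valence - tolerance
```

The check is deliberately one-sided: it only vetoes interventions that look like they destroy a lot of the pattern, and says nothing about what to maximize.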
3. Valence research could help us be humane to AGIs and WBEs*. There’s going to be a lot of experimentation involving intelligent systems, and although many of these systems won’t be “sentient” in the way humans are, some system types will approach or even surpass human capacity for suffering. Unfortunately, many of these early systems won’t work well— i.e., they’ll be insane. It would be great if we had a good way to detect profound suffering in such cases and halt the system.
*A WBE is a Whole-Brain Emulation, a hypothetical process that involves scanning a brain at a very high resolution, then emulating it in software on a very fast computer. If we do it right, the brain-running-as-software should behave identically to the original brain-running-as-neurons.
4. Valence research could help us prevent Mind Crimes. Nick Bostrom suggests in Superintelligence that AGIs might simulate virtual humans to reverse-engineer human preferences, but that these virtual humans might be sufficiently high-fidelity that they themselves could meaningfully suffer. We can tell AGIs not to do this- but knowing the exact information-theoretic pattern of suffering would make it easier to specify what not to do.
5. Valence research could enable radical forms of cognitive enhancement. Nick Bostrom has argued that there are hard limits on traditional pharmaceutical cognitive enhancement, since if the presence of some simple chemical would help us think better, our brains would probably already be producing it. On the other hand, there seem to be fewer a priori limits on motivational or emotional enhancement. And sure enough, the most effective “cognitive enhancers” such as Adderall, modafinil, and so on seem to work by making cognitive tasks seem less unpleasant or more interesting. If we had a crisp theory of valence, this might enable particularly powerful versions of these sorts of drugs.
6. Valence research could help align an AGI’s nominal utility function with visceral happiness. There seems to be a lot of confusion with regard to happiness and utility functions. In short: they are different things! Utility functions are goal abstractions, generally realized either explicitly through high-level state variables or implicitly through dynamic principles. Happiness, on the other hand, seems like an emergent, systemic property of conscious states, and like other qualia but unlike utility functions, it’s probably highly dependent upon low-level architectural and implementational details and dynamics. In practice, most people most of the time can be said to have rough utility functions which are often consistent with increasing happiness, but this is an awfully leaky abstraction.
My point is that constructing an AGI whose utility function is to make paperclips, and constructing a sentient AGI who is viscerally happy when it makes paperclips, are very different tasks. Moreover, I think there could be value in being able to align these two factors— to make an AGI which is viscerally happy to the exact extent that it’s maximizing its nominal utility function.
(Why would we want to do this in the first place? There is the obvious semi-facetious-but-not-completely-trivial answer— that if an AGI turns me into paperclips, I at least want it to be happy while doing so—but I think there’s real potential for safety research here also.)
7. Valence research could help us construct makeshift utility functions for WBEs and Neuromorphic* AGIs. How do we make WBEs or Neuromorphic AGIs do what we want? One approach would be to piggyback off of what they already partially and imperfectly optimize for, and build a makeshift utility function out of pleasure. Trying to shoehorn a utility function onto any evolved, emergent system is going to involve terrible imperfections, uncertainties, and dangers, but if research trends make neuromorphic AGI likely to occur before other options, it may be a case of “something is probably better than nothing.”
One particular application: constructing a “cryptographic reward token” control scheme for WBEs/neuromorphic AGIs. Carl Shulman has suggested we could incentivize an AGI to do what we want by giving it a steady trickle of cryptographic reward tokens that fulfill its utility function- it knows if it misbehaves (e.g., if it kills all humans), it’ll stop getting these tokens. But if we want to construct reward tokens for types of AGIs that don’t intrinsically have crisp utility functions (such as WBEs or neuromorphic AGIs), we’ll have to understand, on a deep mathematical level, what they do optimize for, which will at least partially involve pleasure.
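To make the “reward token” idea a bit more concrete, here is a minimal sketch using a one-way hash chain. This is only one possible construction, chosen here for illustration; it is not a description of Shulman’s actual proposal, and the class names (`Overseer`, `RewardVerifier`) are hypothetical. The overseer commits to a chain of hashes up front and releases one preimage per period of good behavior; the verifier can check each token but cannot forge the next one without inverting SHA-256.

```python
import hashlib
import secrets


def _h(data: bytes) -> bytes:
    """One-way function used to build the chain."""
    return hashlib.sha256(data).digest()


class Overseer:
    """Holds the secret chain and decides when to release a token."""

    def __init__(self, n_tokens: int):
        seed = secrets.token_bytes(32)
        chain = [seed]
        for _ in range(n_tokens):
            chain.append(_h(chain[-1]))
        self.anchor = chain[-1]        # public commitment
        self._unreleased = chain[:-1]  # released from the end backwards

    def release_token(self, behaved_well: bool):
        """Return the next token, or None if withheld or exhausted."""
        if not behaved_well or not self._unreleased:
            return None
        return self._unreleased.pop()


class RewardVerifier:
    """Hypothetical reward module: accepts only the next valid token."""

    def __init__(self, anchor: bytes):
        self._last = anchor
        self.reward = 0

    def redeem(self, token) -> bool:
        # A valid token is the preimage of the last accepted value.
        if token is None or _h(token) != self._last:
            return False
        self._last = token
        self.reward += 1
        return True
```

Note what the sketch leaves out: it only shows how tokens could be made unforgeable, not how “reward” gets wired to whatever a WBE or neuromorphic AGI actually optimizes for, which is the part that would need a theory of valence.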
*A “neuromorphic” AGI is an AGI approach that uses the human brain as a general template for how to build an intelligent system, but isn’t a true copy of any actual brain (i.e., a Whole-Brain Emulation). Nick Bostrom thinks this is the most dangerous of all AGI approaches, since you get the unpredictability of a fantastically convoluted, very-hard-to-understand-or-predict system, without the shared culture, values, and understanding you’d get from a software emulation of an actual brain.
8. Valence research could help us better understand, and perhaps prevent, AGI wireheading. How can AGI researchers prevent their AGIs from wireheading (direct manipulation of their utility functions)? I don’t have a clear answer, and it seems like a complex problem which will require complex, architecture-dependent solutions, but understanding the universe’s algorithm for pleasure might help clarify what kind of problem it is, and how evolution has addressed it in humans.
9. Valence research could help reduce general metaphysical confusion. We’re going to be facing some very weird questions about philosophy of mind and metaphysics when building AGIs, and everybody seems to have their own pet assumptions on how things work. The better we can clear up the fog which surrounds some of these topics, the lower our coordinational friction will be when we have to directly address them.
Successfully reverse-engineering a subset of qualia (valence- perhaps the easiest type to reverse-engineer?) would be a great step in this direction.
10. Valence research could change the social and political landscape AGI research occurs in. This could take many forms: at best, a breakthrough could lead to a happier society where many previously nihilistic individuals suddenly have “skin in the game” with respect to existential risk. At worst, it could be a profound information hazard, and irresponsible disclosure or misuse of such research could lead to mass wireheading, mass emotional manipulation, and totalitarianism. Either way, it would be an important topic to keep abreast of.
These are not all independent issues, and not all are of equal importance. But, taken together, they do seem to imply that reverse-engineering valence will be decently relevant to FAI research, particularly with regard to the Value Problem, reducing metaphysical confusion, and perhaps making the hardest safety cases (e.g., neuromorphic AGIs) a little bit more tractable.
A key implication is that valence/qualia research can (for the most part) be considered safety research without being capabilities research– solving consciousness would make it easier to make an AGI that treats humanity (and all conscious entities) better, without making it easier to create the AGI in the first place (and this is a good thing).
-Edward Michael Johnson, Berkeley
Okay. That mouthful fucked the horny goddess of Runaway Signaling so hard, that it gave her genito-pelvic pain/penetration disorder (GPPD).
Luckily, given this display of hyper-moral engagement, we don’t have to worry that the author is actually a sex offender.
Steven Pinker starts Enlightenment Now with an anecdote in which a student asks “Why should I live?” upon hearing Pinker’s spiel that mental activity occurs in the tissues of the brain. Pinker responded by noting and complimenting the student’s commitment to reason, and then gifted her with an improvised Humanism 101 introductory paragraph. Inspired by his own response, Pinker decided to package these Enlightenment values into a book vector by virtue of his profit-seeking motives, or rather, his desire for the flourishing of sentient beings.
However, I believe that we need a world where public intellectual Ph.D.s sound a lot more like Edward Johnson, and less like Pinker. If we are going to replace the religious soul, might as well go all in. Eschatology needs to be epic. It needs to involve the inherent desire for ecstatic final self-destruction of man, namely, the desire for Heaven/Brahman/Nibbana. Now this desire can be translated in rationalist, transhumanist foresight as the creation of the perfect mind-configuration, and the subsequent tiling of the universe with this maximally positive-valence hedonium.
For the self-described “atom machine with a limited scope of intelligence, sprung from selfish genes, inhabiting spacetime,” asking Pinker for guidance through email, it won’t be enough to be reminded that he can flick the tap on the sink and “water runs!” Pinker is smart, and he should know this. There are a great many narratives we can construct, and yet none satisfies all. Carl Sagan will love being interwoven into the mechanics of the blind universe, as “its way of experiencing itself.” People with high hedonic set-points and amiability will already be socially-integrated liberals who are happy that water runs and believe themselves to be part of a good human-centric world to which they contribute. Typical Normie will not give a shit as long as there are dank memes and political outrage.
Naively, Pinker tries to reach the angsty-type with appeals to social-centric concerns. This fails because it is like trying to feed carrots to a wolf. The angsty-type will find a way to cling to a self-defeating narrative. My mom leans more towards the anxious type, so she always worried about the agony of purgatory, and never mentioned the promise of Heaven, although this brighter-side is just as accessible within the framework of her Catholic religion. The embroidery in the tokens of language is not as important as the inherent neural predispositions.
Religions adapted to neurodiversity. Buddhism, centrally concerned with the cessation of suffering by extinguishing the flame of existence, also provided a system for laymen who might not be allured by this goal/non-goal of Nibbana. If a significant part of the population is not cognitively disposed to be perfectionist or is depressed/suffering, it’s going to be a hard sell. But if you provide a layman’s path with a karma system by which you can accumulate points and be reborn into a more pleasurable realm, now you can get average humans to cooperate in the project by providing alms for monks, being good citizens, etc.
Pinker’s Humanism is brittle. It provides no room for the Aspies and the types who crave meta-narratives. If we are going to choose a new religion for the Western world, I wager we pick Edward Johnson’s. Rationalism, transhumanism, effective altruism, and the rest of that ideological neighborhood do better than mere liberal humanism. In this burgeoning movement, there is cryonics for those who crave resurrection but are smart enough to know better than trusting dead Palestinian carpenters; there are galaxy-is-at-stake hero quests that involve math and computer science; there are donations to charities that help the poor in Africa; there are academics at universities and anti-establishment objectors. You can be as down-to-earth as you want or as cosmically significant, based on the particular drive thresholds in your mesolimbic system.
Oh, but wait, how could I have missed this? The only problem is that the people who take Humanism seriously, and who even bother to watch a YouTube video of a public intellectual saying science-y, reason-y things, are already a ghetto in the bell-curve. The slice who might stumble upon and gravitate toward Transhumanism is even slighter. No one is listening! No one is listening, Pinker! We are alone.
How did I even have the energy to read the first page of your book? The net is vast and infinite.