ACT Workbook (Linear algebra notes)

ACT Therapy

I've spent a lot of time in therapy over the years. My favorite modality is ACT -- which stands for "Acceptance and Commitment Therapy". I think I like ACT in part because it's hard, and it's complicated. You cannot read Get Out of Your Mind and Into Your Life like you might read a novel. Each page requires careful thought. The exercises take real work. ACT reminds me, in an odd way, of Axler's Linear Algebra Done Right.

Two of the key skills you train in ACT are "cognitive defusion" and "self as context". Traditionally, these are taught through mindfulness exercises. I've done a lot of mindfulness exercises. Listened to many guided meditations. Flirted with Zen practice (I own several zafu). But recently, the approach I'm finding most helpful is to borrow habits of thought from other parts of my life. Sometimes, what "self as context" means for me is that I think very hard about how I might best model the way I'm responding to a stressful situation using some kind of homebrew TTRPG system. I make stat blocks. Model debuffs, special abilities, resource pools, all while looking for vulnerabilities or power-combos. I find it helps. It helps me, personally, more than the meditation does. "Observe your thoughts with curiosity and nonjudgment" is a nice idea. I can do it, from time to time, though it does take work. But once I realize I can approach that same task as game design, well... then it quickly becomes far more engaging. Then it stops feeling like work at all.

Recently, however, I haven't been doing as much pseudo-game design. I've been thinking, instead, about how to best model my own behaviors, and the behaviors of those around me, using relatively simple operators defined in a high-dimensional vector space.

Purpose and Audience

At a very general level, my goal here is to define the sort of rigorous mathematical system that I've been trained to work with during my time in higher ed. Broadly speaking, the purpose of this math is to support my own "self as context" and "cognitive defusion" work. But more than that, it's to give myself cognitive tools that I can use when thinking about thorny topics, specifically words like "autism" or "neurodivergent". Tools that can help me understand those words in a way that's precise enough that I feel comfortable when trying to reason about them.

If you're a clinical psychologist, I do not expect you to find most of what's here particularly interesting or relevant. If you're a fellow traveler on an ACT-inspired therapeutic journey, but you've never taken a formal linear algebra course (or perhaps you did take such a course, but you hated it), this is unlikely to be a useful blog post for you.

However, if you're in the small set of people who did learn to love linear algebra, and you also have been feeling frustrated by the seemingly contradictory ways that words like "neurodivergence" are being thrown around on social media these days, then I invite you to keep reading!

Starting point

Last year I got around to reading Downbelow Station. I immediately became convinced that Signy Mallory must be autistic. I'd been consuming a lot of Taylor Heaton's content at the time, and basically all the character details seemed like a very strong fit for the "autistic woman" experience described by Taylor. The intense masking. The obvious alexithymia. The combination of very strong systems thinking skills and very poor self-understanding. The whole "functional alcoholic" angle. Someone else must have noticed this, I thought. Has CJ Cherryh said anything about this?

A few web searches later, as best I could tell, there was no author confirmation. The closest thing I could find was a 12-year-old comment on a fan forum, where a poster with the handle BlueCatShip said something like "I don't know about any particular diagnosis, but I do think sensitivity runs on a spectrum."

The more I think about it, the more I like this way of framing the discussion. I am also suspicious of diagnostic labels when discussing fictional characters. Terms like ASD come from the healthcare community, and while I do think these DSM-5 people are stumbling towards something real, medical research is a different project than one person's journey towards self-discovery. If you are not a doctor, and if what you're trying to do is better understand yourself and the world that you live in, I think borrowing terms from the DSM is dangerous. Perhaps we can get closer to the truth by using terms like "sensitivity", but I'd suggest tweaking it slightly. "Sensitivities run on a spectrum." If you care about math, the plural really matters. The plural means we're not just dealing with one number. It means this is not a thing that can be ranked 0-5. A matrix might be a better model. ASD, ADHD, OCD -- my own guess is that these are all crude labels that hint at the ways an unusual sensitivity set can isolate a person from their peers, leaving them socially maladapted, struggling with CPTSD. (Suggested treatment for the same is typically some mix of CBT, DBT, and ACT, often alongside various psychoactives.)

But again, it's not really the healthcare implications that I most care about. It's the landscape of human experience that starts to reveal itself when I start to think about all this stuff in terms of operators in a high-dimensional vector space. I want to do a singular value decomposition on these sensitivity matrices, and then I want to use that decomposition to better reason about the strange world in which I find myself.

Terms and definitions

I'm using math here mostly as a source of metaphors for concepts from folk psychology. So I'm not going to be overly picky about the details. Because it will make the discussion simpler, let's assume that a person's sensitivities can be represented in terms of a linear transformation $T$, where $T : X \to Q$. Let's imagine that $X$ is a finite-dimensional inner-product space in which we can reasonably encode any discrete life experience. So, the events of your life, however you might choose to describe them, can be encoded as a sequence of $x_i \in X$. The partitioning details aren't super important to this discussion. Neither is the encoder. What's important is that $T$ describes the way your nervous system maps "a thing that happened" to "the feelings you had about it". $T$ describes your sensitivities. Similarly, let's assume that $Q$, the latent space for our feelings, is also a finite-dimensional inner-product space. So now we have $$ \begin{aligned} &x_i \in X &&\text{(the $i$-th life experience / event)} \\[4pt] &T : X \to Q, &&\text{(a sensitivity transform)} \\[4pt] &q_i = T x_i, \quad q_i \in Q &&\text{(the feeling you have after experiencing $x_i$)} \\[4pt] \end{aligned} $$
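Here's a minimal sketch of these definitions in Python with NumPy, for anyone who prefers to poke at the objects directly. Everything concrete in it is an assumption made for illustration: the dimensions, the random entries of `T`, and the random "events" are all made up, and nothing here claims to be a real encoder for life experience.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes for the experience space X and the feeling space Q.
dim_X, dim_Q = 6, 3

# A made-up sensitivity transform T : X -> Q, written as a dim_Q x dim_X matrix.
T = rng.normal(size=(dim_Q, dim_X))

# Two encoded life events x_1, x_2 in X (random stand-ins).
x_1 = rng.normal(size=dim_X)
x_2 = rng.normal(size=dim_X)

# The feelings q_i = T x_i in Q.
q_1 = T @ x_1
q_2 = T @ x_2

# Linearity is the simplifying assumption doing the work: the feeling about a
# blended event is the same blend of the individual feelings.
blended_feeling = T @ (x_1 + 2.0 * x_2)
print(np.allclose(blended_feeling, q_1 + 2.0 * q_2))
```

The linearity check at the end is the whole point of the exercise: it's exactly the property that makes an SVD of $T$ meaningful later on.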

But the feeling $q_i$ is just the first part of what happens after you experience $x_i$. It's the way you behave in response to the feeling that will have the biggest impact on the course of your life. Let's assume your behavioral responses can also be encoded in yet another finite-dimensional latent space; call this space $Y$, and write $y_i \in Y$ for your behavioral response to $x_i$. Clearly, $y_i$ is determined by a bunch of factors apart from your felt-response $q_i$ -- your mood at the time matters, your emotional maturity matters, whether or not you still remember anything about vector spaces may matter.

Let's split these additional factors up into 2 categories. One category is your momentary state. More specifically, if we say "rate how calm you are (0-10)", "rate how activated/engaged you are (0-10)", and "rate how tired you are (0-10)", then we could get a simple 3-dimensional state variable, $s_i \in S$, that neatly encodes a reasonably nuanced (if incomplete) set of answers to the question "how are you feeling right now". The other category is everything else that matters in the mapping from $Q$ to $Y$. There's a lot in there, but let's assume that all of it is hidden in a complex nonlinear function $\phi_i : Q \times S \to Y$.

So to summarize, the new key terms are:

$$ \begin{aligned} &c_i \in [0,10] &&\text{how calm?} \\[4pt] &a_i \in [0,10] &&\text{how activated?} \\[4pt] &t_i \in [0,10] &&\text{how tired?} \\[4pt] &s_i := (c_i,a_i,t_i), \quad s_i \in S, \quad S=[0,10]^3 &&\text{aggregate ephemeral mental state} \\[4pt] &\phi_i : Q \times S \to Y&&\text{nonlinear response function mapping feelings to behaviors} \\[4pt] &y_i \in Y, \quad y_i = \phi_i(q_i, s_i) &&\text{behavioral response to experience $x_i$} \\[4pt] \end{aligned} $$
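To make the type signature $\phi : Q \times S \to Y$ a little more tangible, here's a toy response function in Python. The specific form is an assumption I'm inventing for illustration: the matrix `W`, the `gain` formula, and the `tanh` squashing are all hypothetical choices, picked only so that the same feeling produces different behavior in different moods.

```python
import numpy as np

def make_state(calm, activated, tired):
    """Build s = (c, a, t) in S = [0, 10]^3, rejecting out-of-range ratings."""
    s = np.array([calm, activated, tired], dtype=float)
    if np.any(s < 0) or np.any(s > 10):
        raise ValueError("each rating must lie in [0, 10]")
    return s

def phi(q, s, W):
    """A made-up nonlinear response function phi : Q x S -> Y."""
    calm, activated, tired = s / 10.0
    # Hypothetical rule: activation amplifies the response, while calm and
    # tiredness dampen it. The gain is always positive.
    gain = (1.0 + activated) * (1.0 - 0.5 * tired) / (1.0 + calm)
    # tanh keeps behaviors bounded and makes the map nonlinear in q.
    return np.tanh(gain * (W @ q))

# Hypothetical numbers: a 3-dimensional feeling and a 2-dimensional behavior space.
W = np.array([[1.0, 0.0, -1.0],
              [0.5, 0.5, 0.0]])
q = np.array([0.4, -0.2, 0.1])

y_rested = phi(q, make_state(calm=3, activated=8, tired=2), W)
y_tired = phi(q, make_state(calm=3, activated=8, tired=9), W)
print(y_rested, y_tired)
```

Same $q$, same $\phi$, different $s$, different $y$. That's the only thing this sketch is meant to show.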

If, in the course of your own $x_i$, you have done any CBT, you will notice that this structure is very familiar. It's a rough formalization, using the language of abstract math, of the "experiences / feelings / behaviors" model that CBT trains you to think in. It's a bit more nuanced than standard CBT, however, in that I'm explicitly trying to separate things about us that frequently change ($s_i$) from things about us that change less often ($\phi_i$, $T$).

Now let's notice that the system is clearly recursive. The answer to "how you're doing" naturally changes after any $y_i$, which means the new $s_{i+1}$ and $\phi_{i+1}$ are themselves functions of $y_i$, $s_i$ and $\phi_i$. Let's introduce another "probably nonlinear" function to model this. If we call the set of possible behavioral response functions $\mathcal{F}:=\{ \phi : Q \times S \to Y \}$, then we can define $$ \begin{aligned} &\psi : Y \times S \times \mathcal{F} \to S \times \mathcal{F} && \text{function that updates mental state variables and response functions}\\[4pt] &(s_{i+1},\phi_{i+1}) = \psi(y_i, s_i, \phi_i) &&\text{mental state and response update recursively} \\[4pt] \end{aligned} $$

However, in the math-metaphorical language I'd like to use here, it's important that for most $i$, $$\phi_{i+1}\approx\phi_i.$$

Switching back to folk psychology language: your mood changes constantly -- basically all $y_i$ have some nontrivial impact on $s_{i+1}$. Eating a cookie makes me more activated, and also more calm. Taking a nap makes me less tired, but also less activated. Unclogging a drain makes me more calm but also uses up energy. Etc. However, behaviors that will change the way I respond to daily life in more lasting ways are rarer. Learning a new skill changes $\phi_{i+1}$. Reading a really great book sometimes changes $\phi_{i+1}$. Grieving for a dying parent changes $\phi_{i+1}$. There are a lot of $y_i$ that can change $\phi_{i+1}$, but they're not things that happen on a daily basis.
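The recursion is easy to simulate, at least in cartoon form. In the sketch below I'm assuming, purely for illustration, that $\phi_i$ can be represented by a matrix `W`, that mood shifts linearly with behavior through a hypothetical matrix `B`, and that the response function changes by a tiny reinforcement step. None of those choices come from anywhere except my need for something runnable.

```python
import numpy as np

def psi(y, s, W, B, learning_rate=1e-3):
    """Toy update (s_i, phi_i) -> (s_{i+1}, phi_{i+1}), with phi_i stood in for by W."""
    # Mood: every behavior nudges calm/activation/tiredness, clipped into [0, 10]^3.
    s_next = np.clip(s + B @ y, 0.0, 10.0)
    # Response function: a small, made-up reinforcement step, so that
    # phi_{i+1} stays close to phi_i after any single event.
    W_next = W + learning_rate * np.outer(y, y) @ W
    return s_next, W_next

rng = np.random.default_rng(1)
dim_Q, dim_Y = 3, 2
W = rng.normal(size=(dim_Y, dim_Q))   # hypothetical stand-in for phi
B = rng.normal(size=(3, dim_Y))       # hypothetical effect of behavior on mood
s = np.array([5.0, 5.0, 5.0])
W_start = W.copy()

for _ in range(20):
    q = rng.normal(size=dim_Q)                        # a random feeling
    gain = (1.0 + s[1] / 10.0) / (1.0 + s[0] / 10.0)  # mood modulates the response
    y = np.tanh(gain * (W @ q))                       # y_i = phi_i(q_i, s_i)
    s, W = psi(y, s, W, B)

drift = np.linalg.norm(W - W_start) / np.linalg.norm(W_start)
print("mood after 20 events:", s)
print("relative change in W:", drift)
```

Mood wanders all over $[0,10]^3$ in twenty steps, while the relative change in `W` stays at a few percent at most. Modeling the rare events that really do move $\phi$ would need something more than a constant `learning_rate`, and I'm not attempting that here.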

I think $T$ also changes over the course of a person's life. So at times it's useful to notice that, rather than considering $T$ to be constant for each person, $T$ is also sometimes worth subscripting with $i$. But responses that alter the way any life event makes you feel are even rarer than those that change $\phi_i$. Taking an antidepressant changes $T_{i+1}$. Puberty changes $T_{i+1}$. A traumatic brain injury can change $T_{i+1}$. Dementia changes $T_{i+1}$. But even those seismic events usually don't completely reconfigure $T$. They just tend to exaggerate or dampen patterns in the sensitivity set that already existed in earlier $T_i$.

For the rest of this discussion, let's omit time indexing with $i$ whenever it isn't saying something useful, and switch to person-based superscripts -- so $T^a$ and $\phi^a$ denote the current sensitivity transform and response function for person-$a$.

Healthcare Diagnosis

What does it mean to say that a person is "autistic"? Or that they have "ADHD"? Assuming these are diagnoses provided by healthcare professionals, we can model the process by which the diagnosis is given.

In an idealized clinical setting, a psychologist will try to consider the patient's full personal history, everything that's happened to him, and all his behaviors in response to all those events. Let's expand our terminology so we keep the conversation relatively formal. Assume the patient has $k\in\mathbb{N}$ pre-diagnosis experiences. Let: $$ \begin{aligned} &\mathbf{x}:=(x_1,\dots,x_k)\in X^k && \text{all the patient's experiences}\\[4pt] &\mathbf{y}:=(y_1,\dots,y_k)\in Y^k && \text{all the patient's behaviors} \end{aligned} $$

The doctor's knowledge of $\mathbf{x}$ and $\mathbf{y}$ is necessarily imperfect, but her goal is to somehow map this lifetime's worth of experience to a point in a symptom space $Z$. $Z$ has very low dimension compared to $X^k \times Y^k$, but it's useful in a healthcare setting, because the DSM-5 defines a set of regions $\Omega_d$ in $Z$-space. Each of these regions has a diagnostic label attached to it -- for example, $\Omega_{OCD},\Omega_{ADHD},\Omega_{ASD}$. As the clinician becomes more confident about where the patient's life events map in $Z$-space, $z^{\text{patient}}$, she can add or remove diagnostic labels $d$ by checking whether $z^{\text{patient}}\in\Omega_d$.

A few things become clear when we formalize diagnoses this way. First, there are quite a lot of other variables standing between $T$, which is the thing that determines "how life feels", and a diagnostic label $d$. For example, $\mathbf{x}$ obviously matters. If person-$a$ is constantly exposed to situations that trigger compulsive behavior, but person-$b$ is not, then we can end up in a situation where $z^a\in\Omega_{OCD}$, but $z^b\notin\Omega_{OCD}$, and that may be true even if person-$a$ and person-$b$ have very similar psychologies -- it can be true even if $T^a \approx T^b$ and $\phi^a \approx \phi^b$. Different life circumstances, by themselves, can be sufficient reason for one person to be diagnosable with a mental health condition, and another person to be deemed "normal/healthy".

More than that, the diagnostic mapping from $(\mathbf{x},\mathbf{y})$ into $Z$-space is a function of the specific doctor who does the assessment. Not all doctors interpret the same data in the same way. Sometimes healthcare professionals will argue (quite passionately) about how a given patient's experiences should be interpreted. If we define a diagnostic mapping function as $$ \begin{gathered} D_c : X^k \times Y^k \to Z \\[4pt] D_c(\mathbf{x},\mathbf{y}) = z^{\text{patient}} \end{gathered} $$ where the subscript $c$ stands for the clinical context, we can notice that $D_c$ varies with both the doctor doing the assessment and the version of the DSM that the doctor is working with. $D_c$ clearly has pretty high variance over the set of all healthcare environments.
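Both of these points can be shown with a deliberately silly toy model. Everything in it is hypothetical: the one-dimensional symptom space, the `clinician_weights`, the threshold that defines the region, and the choice to ignore mood entirely. It is not a description of how any real assessment works.

```python
import numpy as np

def behave(T, W, xs):
    """Stand-in for y_i = phi(T x_i, s_i): one behavior per row of xs, mood ignored."""
    return np.tanh(xs @ T.T @ W.T)

def diagnose(xs, ys, clinician_weights):
    """Toy D_c : X^k x Y^k -> Z, compressing a whole history to one number z."""
    # xs is accepted to match the signature, but this toy rule only reads behaviors.
    return float(clinician_weights @ ys.mean(axis=0))

def in_region(z, threshold=0.5):
    """Toy diagnostic region Omega_d = {z : z >= threshold}."""
    return z >= threshold

# Person a and person b share the same psychology: T^a = T^b and phi^a = phi^b.
T = np.eye(2)
W = np.eye(2)

# But their lives differ: axis 0 of X is a made-up "trigger intensity".
xs_a = np.tile([2.0, 0.0], (5, 1))   # constantly triggered
xs_b = np.tile([0.1, 0.0], (5, 1))   # rarely triggered
ys_a, ys_b = behave(T, W, xs_a), behave(T, W, xs_b)

clinician_1 = np.array([1.0, 0.0])
clinician_2 = np.array([0.4, 0.0])   # reads the same behaviors more conservatively

z_a = diagnose(xs_a, ys_a, clinician_1)
z_b = diagnose(xs_b, ys_b, clinician_1)
z_a_second_opinion = diagnose(xs_a, ys_a, clinician_2)

print(in_region(z_a), in_region(z_b), in_region(z_a_second_opinion))
```

Identical $T$ and $\phi$, different $\mathbf{x}$, different label. And then the same history, seen by a different clinician, lands outside the region again.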

However, if we're talking about how life feels, none of this matters very much. Naturally, healthcare professionals need to care a lot about their diagnoses -- their training requires them to construct this diagnostic mapping as carefully as they can and then use it to inform treatment. But being told you have $z \in \Omega_d$ for any particular set of $d$ does not, in and of itself, tell you everything you might want to know about the shape of your $T$. At best, it suggests that your $T$ is perhaps similar in some important ways to other people who also have $z \in \Omega_d$.

Neurodiversity

The term "neurodivergent" has become very popular. And I think it's popular because it performs a useful social function. As mental healthcare improves, and the regions $\Omega_d$ become increasingly helpful for naming categories of people who experience life in importantly different ways, there's a growing understanding that many, perhaps even most people are in one way or another importantly atypical in the way they experience the world. This appreciation for the diversity of human experience, however, is obviously not dependent on the current standards and procedures laid out in the DSM-5 -- it's a statement about the distribution of $T$ and $\phi$ over the human population, not a statement about distributions in $Z$-space.

How could we measure neurodiversity? It's obviously a tricky thing to do well. But we can start to make some progress by assuming $T$ is a linear transformation between finite-dimensional inner-product spaces, because then we can choose orthonormal bases for $X$ and $Q$, express $T$ as a matrix, and perform a singular value decomposition: $$ T = U \Sigma V^\ast $$

The SVD of $T$ tells us a lot of interesting things about this person all at once. Groups of columns of $V$ span subspaces of $X$, while the corresponding singular values give us a way of identifying how salient the experiences in any such subspace may be. Taken together, $\Sigma$ and $V$ provide a toolkit for speaking in precise ways about high-saliency subspaces of $X$. $U$, meanwhile, gives us the ability to identify the emotional-response subspaces of $Q$ that are paired with any salient experience subspace of $X$.
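For the hands-on crowd, here's what that decomposition looks like with NumPy's `np.linalg.svd`. The matrix `T` is random and its dimensions are made up, so the numbers mean nothing. What matters is the pairing between experience directions and feeling directions.

```python
import numpy as np

rng = np.random.default_rng(2)
T = rng.normal(size=(3, 6))   # hypothetical sensitivity transform, dim Q = 3, dim X = 6

# Reduced SVD: T = U @ diag(sigma) @ Vh, with sigma sorted from largest to smallest.
U, sigma, Vh = np.linalg.svd(T, full_matrices=False)

# Rows of Vh (columns of V) are orthonormal experience directions in X;
# columns of U are the paired feeling directions in Q.
v_1 = Vh[0]      # the single most salient kind of experience
u_1 = U[:, 0]    # the feeling direction it gets mapped onto

# T sends v_1 to u_1, scaled by the largest singular value.
print(np.allclose(T @ v_1, sigma[0] * u_1))

# The three factors reconstruct T exactly (up to floating point).
print(np.allclose(U @ np.diag(sigma) @ Vh, T))
```

In words: `v_1` is the kind of experience this person is most sensitive to, `sigma[0]` is how sensitive, and `u_1` is what it feels like.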

In light of all this, let's imagine a similarity score $\mathbb{S}(a,b)\in[0,1]$ that tries to summarize how similar person-$a$ and person-$b$ are. Since the relevant data for each person is the pair $(T,\phi)$, $\mathbb{S}$ should really be a function of two such pairs. If $\mathbb{T}$ is the set of all possible sensitivity transforms $T$ and $\mathcal{F}$ is the set of all possible response functions $\phi$, then a natural type signature is $$ \mathbb{S} : (\mathbb{T} \times \mathcal{F}) \times (\mathbb{T} \times \mathcal{F}) \to [0,1]. $$

Actually writing down a good formula for $\mathbb{S}$ would probably require a math-dense appendix, because normalization choices and singular-value weighting schemes both matter in interesting ways. But let's not try to write that appendix now. Instead, let's just say that a good $\mathbb{S}$ should balance at least 3 different kinds of similarity:

1. How strongly do the high-saliency subspaces of $X$ for $T^a$ and $T^b$ overlap?
2. When those salient subspaces overlap, do they map to similar subspaces of $Q$? In other words, even if both people respond strongly to the same kind of experiences, is their nervous system response similar?
3. How similar are the behavioral response functions $\phi^a$ and $\phi^b$?
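This is not the math-dense appendix, and it's not a proposal for the "right" $\mathbb{S}$. It's just a crude sketch of how the first two questions might be turned into numbers in $[0,1]$, using two standard tools that I'm swapping in as assumptions: the mean squared cosine of the principal angles between the top-$k$ salient subspaces, and a Frobenius cosine between the two matrices. The third question, about $\phi$, is left out entirely. The matrices and the function names are hypothetical.

```python
import numpy as np

def salient_subspace(T, k):
    """Orthonormal basis (as columns) for the k most salient directions in X."""
    _, _, Vh = np.linalg.svd(T, full_matrices=False)
    return Vh[:k].T

def subspace_overlap(T_a, T_b, k):
    """Question 1: mean squared cosine of the principal angles, in [0, 1]."""
    V_a, V_b = salient_subspace(T_a, k), salient_subspace(T_b, k)
    return float(np.linalg.norm(V_a.T @ V_b, "fro") ** 2 / k)

def response_alignment(T_a, T_b):
    """Questions 1 and 2 together: Frobenius cosine between T_a and T_b, clipped at 0."""
    cos = np.sum(T_a * T_b) / (np.linalg.norm(T_a) * np.linalg.norm(T_b))
    return float(max(0.0, cos))

# Hand-built examples, so the salient subspaces are obvious by inspection.
T_a = np.array([[3.0, 0.0, 0.0, 0.0],
                [0.0, 2.0, 0.0, 0.0],
                [0.0, 0.0, 1.0, 0.0]])
T_flipped = -T_a                       # same triggers, opposite feelings
T_elsewhere = np.array([[0.0, 0.0, 0.0, 3.0],
                        [0.0, 0.0, 2.0, 0.0],
                        [0.0, 1.0, 0.0, 0.0]])   # different triggers altogether

print(subspace_overlap(T_a, T_flipped, k=2), response_alignment(T_a, T_flipped))
print(subspace_overlap(T_a, T_elsewhere, k=2), response_alignment(T_a, T_elsewhere))
```

The `T_flipped` case is the interesting one: the two people care about exactly the same kinds of experience, so the subspace overlap is perfect, yet the alignment is zero because those experiences feel opposite. That's why the first two questions need to be asked separately.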

In this language, asserting "I'm ND" (neurodivergent) is a claim that for a typical person drawn from the broader population, the overlap score $\mathbb{S}(\text{me},\text{them})$ is usually small. Equivalently: people who experience the world the same way I do are relatively rare, and people who tend to behave like me are also rare.

These kinds of differences tend to create double-empathy problems, because the neurodivergent individual has a hard time guessing both what other people may be feeling, and also how they're likely to react to those feelings. ND person-$a$ has trouble anticipating the behavior of NT (neurotypical) person-$b$, because it's very difficult for person-$a$ to intuit $q^b$ in response to any potential $x$, and it's even harder to predict $y^b=\phi^b(T^b x,s^b)$. Person-$a$ knows all these key variables are very different for person-$b$, but these spaces have very high dimension, and that makes building a reliable internal model for a structurally dissimilar $T$ virtually impossible. Meanwhile, person-$b$ has the same problem in reverse -- failing to predict $y^a$ because $b$ misjudges $a$'s likely felt-response $q^a$. Even in those moments when person-$b$ can more or less guess $q^a$, the very alien $\phi^a$ still throws a wrench in the works. Person-$a$ and person-$b$ find each other confusing and difficult to relate to, and that creates social friction.

Neurotribes

However, it's also clear that "neurodivergent" has limited utility as a label because it's essentially boolean -- whatever exact thresholds you may use to define it, it just divides the population into 2 categories: ND and NT. And it's clear to everyone involved in this discussion that 2 categories are nowhere near enough. Again, if we think in terms of SVDs of different $T$, what's important is that for any 2 different people, some high-saliency subspaces of $X$ are probably behaving in pretty similar ways -- but in most cases there will be a nontrivial salient subspace of $X$ that the two of them interpret differently. And this is why the same person can feel very relatable in some situations, but very alien in others.

I suspect that when we perceive $\mathbb{S}(a,b) \approx 1$, that perception is often at the root of felt human connection. When we see someone else get hit with a tricky $x$, and they respond with a $y$ that seems very much like what we know we might do in that situation, we feel connected to that person. We can relate, we can guess that their $q$ and our $q$ would have been about the same. And we can further guess our $\phi$ are probably importantly similar in many ways as well. And if that very relatable behavior is atypical, if it implies an $(x,y)$ pair that's far from what a typical person would do, then the feeling of connection becomes even more profound. Now we can infer that both of us are likely unusual in roughly the same way.

But now language itself is starting to become a problem. What words can I use to express how and why I relate to different kinds of people? The diagnostic regions defined in the DSM-5 are gesturing pretty clearly at certain families of $(T,\phi)$. I want to be able to say something, in relatively casual conversation, that basically means: "I have some important salient-experience subspace overlaps with people in $\Omega_{ASD}$, $\Omega_{OCD}$, and $\Omega_{ADHD}$, yet I'm pretty sure I wouldn't actually qualify for a formal diagnosis of any of them. And that's partly because my $\mathbf{x}$ are relatively easy for me to handle, but also partly because critical thresholds in my $\phi$ are shifted slightly relative to the diagnosable populations in a way that generally reduces any observable pathologies below diagnostically critical thresholds". But the language of folk psychology hasn't really gotten to a place where I can easily communicate that yet. And so in my experience, there's really no way to state my position on this without making the people I'm talking to confused and uncomfortable.

But I assure you all, if you would just read the math carefully, you'll see that my position is really very reasonable! I just need the language of folk psychology to keep maturing for a few years, so that I can say "ASD-ish but also probably not really ASD for reals" in a way that doesn't sound like I'm confused and wishy-washy. I'm not confused. You're confused. You are all horribly, horribly confused about the way that language ought to work here, and I really want you to stop making so many category errors, because every time I hear one it drives me crazy. I propose we all switch to speaking about this subject using the strict language of formal mathematics, so that I can know which subspaces are relevant to a given discourse, and which are not.

Writing interesting characters

I think a lot of what makes for a good writer is that they're the sort of person who naturally intuits all of the above. CJ Cherryh, were she fluent in linear algebra, would probably consider everything I've written thus far "super obvious". Is Signy Mallory autistic? Why are we even asking this question? There's no reason an author or a reader should need to care about $Z$-space. Storytellers are in the business of looking for interesting types of human beings, and then imagining how those interesting people might behave in a fantastical setting. That's at the core of what any genre writer needs to do to create compelling fiction. If you see yourself in a fictional character, that suggests that the author has done a good job of portraying a person who experiences life in a way that's unusual, yet also very recognizable. Her writing has you perceiving $\mathbb{S}(\text{reader},\text{character}) \approx 1$. But that's not happening because CJ has read the DSM-5, and is intentionally trying to write characters that fit a particular diagnostic label. $Z$-space has nothing to do with this. This is about telling good stories. And good stories, particularly strong character dramas like Downbelow Station, are mostly concerned with interesting people, exciting events, inspiring choices, and perhaps also a few critical moments when someone experiences meaningful personal growth ($\phi_i \neq \phi_{i+1}$). If we're having a conversation about what we like about the characters in a science fiction book, $Z$-space does not matter.