how to define subjective information and interest,
when an event or an action is relevant,
what is common to aesthetics, emotion, humour and coding.
Detailed outline of Chapter 5
Subjective Information and Simplicity
Intuitive approach to subjective probability. First contact with subjective information through an example: a consecutive lottery draw.
Why subjective probability sometimes seems in total contradiction with standard algorithmic probability.
Cognitive complexity. Our brain seems to be sensitive to Kolmogorov complexity!
For instance, newsworthiness depends on the simplicity of locations.
Aesthetics, when it includes a component of surprise, depends on simplicity as well.
Relevance. The notion of unexpectedness is shown to be central to modelling relevance, which is an ultimate goal of AI! Unexpectedness is defined as a complexity drop. Coincidences will be modelled as a case study.
A compressionist view of AI. Many aspects of intelligent processing will be revisited from the compressionist perspective.
AI systems will not be regarded as intelligent as long as they remain unable to understand human intelligence and to adapt to it.
Algorithmic Information offers us a way to model key components of human intellect, such as surprise, relevance, aesthetics or emotional intensity. Why does AIT have something to say on these matters? Could it be because our mind does process algorithmic information?
Intuitive approach to subjective probability
Standard vs. algorithmic probability
Question:
Can we use probability to characterize interesting events?
Answer:
No, it doesn’t work. We must find something else.
Understanding Artificial Intelligence
through Algorithmic Information Theory
You know you shouldn’t.
After all, you know that all combinations have the same probability of occurrence.
Since all such events have the same probability, why is 1-2-3-4-5-6 big news while 8-10-12-23-31-45 isn’t?
Based on probability, the latter should be as narratable as the former!
But it’s not!!
This lottery example offers a striking proof that probability is of little help to predict whether an event will be interesting or boring.
Sure! But still, it’s hard to convince ourselves that
1-2-3-4-5-6 had the same probability to occur
as "normal" outcomes.
In 2006, we offered pre-filled lottery tickets to customers in a bar, about fifty of them. They could choose among a dozen predefined combinations.
No-one chose 1-2-3-4-5-6.
One customer laughed:
"if you want to lose, choose that one!"
So it seems that the interest of events
has to do with their probability after all!
One just needs to change the concept of probability 🤨.
As one may suspect by looking at the lottery example, subjective probability has to do with simplicity. Maybe not with absolute simplicity, though.
We are not amazed by the white surface of a whiteboard!
Simplicity is only perceived as improbable if it is unexpected.
Unexpectedness is the difference between expected complexity and observed complexity.
This definition of subjective probability
based on Algorithmic Information Theory
is found to match people’s intuitive probability judgments.
Bibliography
Dessalles, J.-L. (2013). Algorithmic simplicity and relevance. In D. L. Dowe (Ed.), Algorithmic probability and friends (LNAI 7070, pp. 119-130). Berlin: Springer.
Which lottery draw will be perceived as less likely?
2-4-6-8-10-12
6-12-18-24-30-36
7-14-21-28-35-42
Problem:
Algorithmic probability and subjective probability provide inconsistent values!
There is indeed a radical divergence between our two definitions of probability: algorithmic probability and subjective probability. The former declares that simpler events are more probable, while the latter states the exact contrary! The next video provides the beginning of an answer, in a somewhat unexpected form.
Cognitive complexity
Question:
What is the difference between Kolmogorov complexity and cognitive complexity?
Answer:
Cognitive complexity is resource-bounded.
Several phenomena point to the fact that the human brain is sensitive to Kolmogorov complexity. Nick Chater suggested that simplicity should be used as a fundamental principle in Cognitive Science. Here is an example.
The preferred hidden pattern is not the complex one, but the closing curve that appears simplest (in the usual Kolmogorov sense).
Note that the straight line, which would be the simplest closing curve in the absolute, is not the simplest one in the context. Cognitive complexity is a resource-bounded version of description complexity. It is defined as the shortest description (in bits) of the situation s that a human individual could reach at time t. In a semi-formal way:
Cognitive complexity
$$C_d(s)=\min_p\{\mathrm{length}(p)\;;\;H^t(p)=s\}$$
Here, H represents the cognitive model that simulates the individual’s behaviour.
H^{t} represents the same model that is allowed to run during t units of time.
p is any characterization that leads to the unambiguous determination of s and that is available to H^{t}.
Note that C_{d}(s) may vary through time.
From now on, the name description complexity will be used to refer to its resource-bounded version: cognitive complexity.
Cognitive Complexity
If \(C\) represents plain complexity, what should we say?
Complexity of locations: more thrilling when close
Problem:
Why does the interest of events vary with the distance from home?
This is a fundamental law of journalism: greater distance tends to make identical events less interesting. Nearby casualties in a tragic event are more newsworthy than distant ones. The phenomenon is known as the hierarchy of deaths, or as the "law of distant deaths".
Question:
Does Algorithmic Information Theory have anything to say about the "hierarchy of deaths"?
Answer:
Yes, definitely! It even predicts the phenomenon.
Complexity of Locations and Newsworthiness in a nutshell
More distant events tend to be less interesting. This is explained by the fact that distance adds to cognitive complexity.
In an isotropic 2D space,
locations of size a × a can be numbered by concentric circles around the observer. There are \(2\pi d/a\) locations at distance d, and \(\pi d^2/a^2\) locations at a distance smaller than d. The complexity of a given location at distance d amounts to \(\log_2(\pi d^2/a^2)\). It varies like \(2 \log_2(d)\).
Note that using polar coordinates will not help: far locations require much greater angular accuracy. Using landmarks may dramatically reduce complexity, and therefore increase interest. Minimizing description complexity tells us which landmark to use:
$$L_0 = \mathrm{argmin}(C(L) + 2 \times log_2(\mathrm{distance}(L, x)))$$
where \(C(L)\) can be assessed by \(\log_2(r(L))\) when \(L\) appears at position \(r(L)\) in a list of landmarks.
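As an illustration, the landmark-selection rule above can be turned into a few lines of Python. The landmark names, popularity ranks and distances below are invented for the example; \(C(L)\) is approximated by \(\log_2\) of the landmark's rank, as suggested above.

```python
import math

def location_complexity(distance, landmark_rank):
    """Description complexity (in bits) of a location seen from a landmark:
    C(L) + 2*log2(distance), with C(L) approximated by log2(rank of L)."""
    return math.log2(landmark_rank) + 2 * math.log2(distance)

def best_landmark(ranks, distances):
    """Pick the landmark minimizing C(L) + 2*log2(distance(L, x)).
    `ranks` maps landmark name -> popularity rank; `distances` maps
    landmark name -> distance to the event (same length unit throughout)."""
    return min(ranks, key=lambda L: location_complexity(distances[L], ranks[L]))
```

With these toy numbers, a very popular landmark 2 km away beats an obscure one 100 m away: `best_landmark({'EiffelTower': 1, 'LocalBakery': 1000}, {'EiffelTower': 2, 'LocalBakery': 0.1})` returns `'EiffelTower'` (2 bits vs. about 3.3 bits).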
On December 25, 2003, the French news reported three accidents in the following order:
Distances from Paris to these three events are 0.5 Mm, 4 Mm and 8 Mm (Mm = mega-metre).
What is the difference in bits of complexity between these locations (B-A and C-A)?
One may consider that dramatic impact depends on the logarithm of the number of casualties.
We may write interest I as depending on distance d and on the number of casualties N:
$$I = -2 \log_2(d) + b \log_2(1 + N/a)$$
Parameter a provides the unit value on the stimulus axis. Here we consider that a =1.
Parameter b measures the emotional interest due to the first casualty.
We ignore the effect of the cultural distance between the observer (here, the French audience) and the expected nationality of casualties. Cultural distance adds to the cognitive complexity of the event, and thus contributes to diminishing its interest. In the French news example, the most dramatic events were cited last: first A, then B and then C, though the ratios of casualties were 56 (B:A) and 115 (C:A).
This means that the added complexity due to distance more than compensates for the emotion due to more numerous casualties.
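The interest formula can be sketched directly in Python. This is a toy model, not a validated psychological law; the parameter value b = 1 below is purely illustrative.

```python
import math

def interest(d, N, b, a=1.0):
    """Interest of a dramatic event: I = -2*log2(d) + b*log2(1 + N/a),
    where d is the distance to the observer and N the number of casualties."""
    return -2 * math.log2(d) + b * math.log2(1 + N / a)
```

With the illustrative value b = 1, a nearby event (d = 0.5, N = 1) yields I = 3 bits, while a distant one (d = 8, N = 115) yields only about 0.86 bits: distance dominates, as in the French news ordering.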
Distance effects (2)
What is the maximal value of coefficient \(b\) in the French news example?
Measuring the distance of the event to the observer may not be the most economical way to assess description complexity. Quite often, events can be located in the vicinity of a known landmark (such as the Eiffel tower in Paris). The chain rule then applies to get an upper bound of C_{d}(s) using landmark L: C_{d}(s) ≤ C_{d}(L) + C_{d}(s | L).
Suppose that a dramatic event takes place in a foreign country. The media want to locate the event for the audience. A list of landmarks \(\{L_i\}\) (cities, monuments, points of interest, etc.) is available. It is ranked by decreasing popularity. The event is located at distance \(d_i\) of landmark \(L_i\).
Which is the best way to locate the event?
Simplicity and aesthetics
Problem:
Aesthetics is an intuitive feeling that has nothing to do with computation!
What is to be considered as beautiful? Giving a definitive and complete answer to this question is way beyond current theorizing. However, people such as Jürgen Schmidhuber, John Maeda and others suggest that simplicity is involved in aesthetics. This principle is well-known in design. The French engineer Pierre Bézier used what are known as Bézier curves as a way to draw harmonious automobile bodies.
Bézier curves are constructed using a set {P_{i}} of "control" points and a simple recursive parametric definition. Let’s call B_{0..n} the Bézier curve based on control points P_{0}, ..., P_{n}. It is defined for the value t of the parameter by B_{i}(t) = P_{i} and:
B_{0..n}(t) = (1 - t) B_{0..n-1}(t) + t B_{1..n}(t).
The animated illustration (from Wikipedia) above shows how the recursive definition is implemented.
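The recursion can be implemented in a few lines. Below is a minimal Python sketch of the standard De Casteljau evaluation, B_{0..n}(t) = (1 - t) B_{0..n-1}(t) + t B_{1..n}(t), for 2D control points:

```python
def bezier(points, t):
    """Evaluate a Bézier curve at parameter t (0 <= t <= 1) using
    De Casteljau's recursion on a list of (x, y) control points."""
    if len(points) == 1:
        return points[0]                 # base case: B_i(t) = P_i
    left = bezier(points[:-1], t)        # B_{0..n-1}(t)
    right = bezier(points[1:], t)        # B_{1..n}(t)
    return tuple((1 - t) * l + t * r for l, r in zip(left, right))
```

For the quadratic curve with control points (0,0), (1,2), (2,0), `bezier(..., 0.5)` returns (1.0, 1.0), the midpoint of the arc. The simplicity of the curve lies in the shortness of this generative description: a handful of control points suffice.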
Bézier curves
Explain in what way Bézier curves are simple.
Aesthetics
Explain why Larry Kagan’s structures are unexpected (you may watch this).
I find the ability of AIT to predict these cognitive phenomena quite impressive...
Yes it is! But you seem puzzled...
I wonder how it works. Are we supposing that our brains are Turing machines?
This just means that we are able to perform computations.
No, more than that! It presupposes that we monitor our own computations.
You mean, to decide that one is shorter than the other?
Yes. It is one thing to perform computations, and another thing to measure and compare their length.
So that we know that something is simple?
Precisely. It presupposes some kind of self-monitoring. I’m not fully convinced.
And yet it does explain a lot of things. The story about abnormal lottery outcomes, the effect of distance on interest, the role of simplicity in aesthetics. And, I guess, many other cognitive phenomena (we are just halfway through this chapter).
Sure. I don’t deny that. But it could be that our brain does all this through very different means.
I just read that people are very good at detecting visual patterns based on their simplicity.
No. The point was to measure reaction time when people have to pick an object that doesn’t fit.
And?
It was about geometrical shapes. Reaction times were predicted by the simplicity of the shapes.
You mean that people were quicker at designating anomalies when the shapes were simpler?
That’s it.
It just shows that our brain prefers simplicity. Not that it can measure it explicitly.
Maybe the two are inseparable.
What do you mean?
Don’t know. Just a hunch.
Relevance: the Holy Grail of AI
Problem:
How can we bring machines to be relevant?
Since 1950, when Alan Turing gave the first definition of artificial intelligence, relevance has been considered one of the most important attributes of intelligence. A relevant utterance not only makes sense, but also raises interest. On the other hand, by failing to be relevant, one runs the risk of being ignored or, worse, rejected. One of the main objectives of AI research is to design systems that are not only efficient, but also able to sustain a coherent, rational dialogue. This is how people judge each other’s intellectual abilities and even mental health. This is why relevance represents a kind of Holy Grail for AI research.
Relevance and subjective information rely on two forms of complexity computations. We first examine the newcomer, causal complexity, before observing how they interact to generate relevance.
Causal complexity: the missing piece in the Relevance puzzle
Problem:
Some events are exciting, not because they are simple, but because they seem incredible.
There is nothing simple in a rabbit
that pops out of the magician’s hat. On the contrary!
There is so much complexity there that the layperson finds it hard to believe.
Before it appears, you are pretty sure that there’s no rabbit in the hat.
You don’t expect to see one there.
This is the normal, simple, situation.
The negation of this simple situation is hugely complex. You need to imagine complex circumstances to explain why a rabbit ends up in the hat (teleportation, contortionist rabbit, you’ve gone blind for 5 seconds and have no memory of it, etc.).
So is unexpectedness sometimes due to simplicity (as in remarkable lottery draws) and sometimes due to complexity, as here?
In a way, yes... but there is a possible mix-up here.
We defined unexpectedness as the difference between expected complexity and observed complexity. Be sure to mind the difference.
Expected complexity means that you mentally consider the event before its occurrence (ex ante) and measure the complexity of this occurrence. It answers the question "What were the odds?" in computational terms (i.e., in bits).
Observed complexity means that you consider the event after its occurrence (ex post) and measure the complexity of this occurrence.
In the case of apparent magic, the expected complexity of the event (which is the negation of a normal situation) is huge. This is what creates the complexity gap, hence the unexpectedness.
So unexpectedness may result, not only from a small value of the observed complexity (i.e. simplicity), but also from a large value of the expected complexity.
On many occasions, expected complexity is assessed through an attempt to reconstruct a causal scenario that led the event to happen.
It is then called causal complexity,
or also generation complexity,
or sometimes "world complexity", and is noted C_{w}.
$$C_w(s)=\min_p\{\mathrm{length}(p)\;|\;W(p)=s\}$$
W designates the observer’s causal model of the world.
The only admissible programs p are procedures that respect causality, as understood by the observer.
Quite often, causal complexity matches probability (in bits, i.e. \(C_w(s)=\log_2(1/\Pr(s))\)).
However:
This is not always true, as for example in the case of the rabbit’s magic occurrence.
Causal complexity is more natural, as probability calculus must implicitly perform the same causal computations.
As we did for cognitive complexity, we consider a resource-bounded notion of causal complexity.
Causal complexity, also called world complexity, is defined as the shortest description (in bits) of a causal procedure that a human individual could reach at time t to account for the occurrence of a situation s. In a semi-formal way:
Causal complexity
$$C_w(s)=\min_p\{\mathrm{length}(p)\;;\;W^t(p)=s\}$$
Here, W represents the observer’s causal model of the world.
W^{t} represents the same model that is allowed to run during t units of time.
p is any procedure that respects causality as understood by the observer and that is available to W^{t}.
Note that C_{w}(s) may vary through time. In many cases (e.g. when you meet a colleague on the way to the workplace), there is one single obvious scenario that only demands that parameters be set to precise values (such as the time of departure from home). If so, causal complexity amounts to the description of these values.
Causal Complexity
In 1991, the New Jersey three-digit number lottery issued the same winning number, 741, twice in a row. What is the causal complexity of the event?
A formal definition of unexpectedness
Question:
Unexpectedness should be a statistical notion, shouldn’t it?
Answer:
No! Statistics may even play no role whatsoever. For instance, many unexpected events are unique, unheard-of.
Unexpectedness is the core ingredient of relevance and subjective information.
Unexpected situations are situations
that require complex circumstances to occur,
while being simple to describe in comparison.
We already defined unexpectedness as the difference between expected complexity and description complexity. Now, we revisit this definition by observing that expected complexity, for rational agents, takes the form of causal complexity C_{w}. We may write:
Unexpectedness
U = C_{w} – C_{d}.
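As a toy illustration of this definition, consider a 6/49 lottery. The description complexities below are crude assumed values, meant only to show how the complexity drop arises; the causal complexity is the exact log-count of combinations.

```python
import math

# Causal complexity of any particular 6/49 draw:
# log2 of the number of possible combinations (~23.6 bits, same for every draw).
c_w = math.log2(math.comb(49, 6))

# Crude, assumed description complexities:
# a remarkable draw such as 1-2-3-4-5-6 needs only a short description
# ("the six smallest numbers"), while a typical draw must spell out
# all six numbers and is essentially incompressible.
c_d_remarkable = 5        # assumed: a few bits
c_d_typical = c_w         # incompressible: ~23.6 bits

u_remarkable = c_w - c_d_remarkable   # large unexpectedness: big news
u_typical = c_w - c_d_typical         # zero: nothing to report
```

With these toy values, U is about 18.7 bits for 1-2-3-4-5-6 and 0 for a "normal" draw, matching the intuition of the lottery example.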
Unexpectedness
What is the unexpectedness of the New Jersey Lottery event? (see preceding exercise)
(we ignore the complexity of locating the event in time and space)
Blasé attitude
Can you describe the "blasé" attitude (most events aren’t worth noticing) in complexity terms?
The "blasé" attitude amounts to ...
Paranoia and surprise
When adopting a paranoiac attitude, you are not surprised by events such as accidents that affect you. Why is it so?
Fortuitous encounter
Coming across a celebrity by chance near your doorstep might be perceived as an interesting event. The story would be even better if you live in the suburbs rather than downtown. Why?
People use features, not programs
Problem:
People are not computers. They use features rather than programs to characterize things.
Indeed, people use relevant features to characterize events (rather than producing compact programs that generate those events from scratch).
Suppose we are talking about a person born in the US in 2002.
We want to mention that the person’s given name is Shea. About 0.04% of children born that year were given that name. If you regard causality in this case as a mere choice among options, causal complexity is about log_{2}(1/0.0004) ≈ 11 bits.
Mentioning the name, however, comes at a price on the description complexity side. ‘Shea’ appears 900th or so in the list of names ranked by popularity. With only this source of information, we have to pay log_{2}(900) ≈ 10 bits of description complexity to mention the name.
As we can see, the mention of the name hardly contributes by itself to making that person unexpected.
Suppose now that the name is ‘Madonna’ instead of ‘Shea’. This would have made this person unique among 4 million births for that year. For an observer aware of name distributions, causal complexity amounts to log_{2}(4 × 10^{6}) ≈ 22 bits. On the other hand, the name ‘Madonna’ is at least as popular as ‘Shea’ on the Web, which makes the former no more complex than the latter. Mentioning the first name ‘Madonna’ for a person born in 2002 is an efficient way of making her quite unexpected, as it reduces complexity from 22 (causal) bits down to less than 10 (description) bits.
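The two name examples can be replayed numerically. The frequencies and popularity ranks below are the rough figures quoted above, not exact statistics.

```python
import math

def feature_unexpectedness(frequency, popularity_rank):
    """U(f) = C_w(f) - C_d(f): causal cost log2(1/frequency) of the feature
    holding, minus description cost log2(rank in a popularity list)."""
    causal = math.log2(1 / frequency)           # bits to 'cause' the feature
    description = math.log2(popularity_rank)    # bits to mention it
    return causal - description

u_shea = feature_unexpectedness(0.0004, 900)       # ~11 - ~10: ~1.5 bits
u_madonna = feature_unexpectedness(1 / 4e6, 900)   # ~22 - ~10: ~12 bits
```

The computation confirms the point of the text: 'Shea' barely contributes any unexpectedness, while 'Madonna' (for a 2002 birth) contributes roughly a dozen bits.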
Question:
Can features be relevant?
Answer:
Yes, and this is precisely how people use language to talk about events.
Do you know how to tell a story?
Yes, of course, like everyone else on Earth.
But are you aware of what you are really doing when telling a story?
Probably not...
Let’s illustrate the underlying mechanism
through an example taken from everyday life.
- Do you remember my friend Jeannine?
- Sure.
- She installed an automatic night vision camera on her backyard.
Guess what she could film?
- A hedgehog?
- No, that was a few weeks ago. But I just got this video from her. Look at this!
- Wow! A fox? But doesn’t she live in town?
- Yes. That’s amazing, isn’t it?
Let’s summarize the event as:
a fox visited Jeannine’s backyard two days ago.
Now let’s use "features" to describe the event:
the event involves Jeannine
the event took place in her backyard
the event took place two days ago
the event involves a fox
Features can be represented computationally using predicates, as in the Prolog programming language. We may translate the above features in Prolog style as:
involves(S, 'Jeannine').
location(S, L), backyard('Jeannine', L).
date(S, -2).
involves(S, F), fox(F).
These features act as constraints on situation s.
Let’s represent them as predicates f_{i}(s) that are regarded as being true about situation s.
The description complexity of a feature f_{i} is simply C_{d}(f_{i}), independently from s.
The causal complexityC_{w}(f_{i}(s)) of f_{i}(s) is the complexity of circumstances that brought f_{i}(s) to hold in the known world.
A feature f_{i}(s) is unexpected if C_{w}(f_{i}(s))> C_{d}(f_{i})
(and the difference measures how unexpected it is).
Unexpected features contribute to making situations interesting.
The feature involves(S, 'Jeannine') is not unexpected. The balance is even negative: there is no causal cost, while the mention of ‘Jeannine’ costs a few description bits (not much, crucially, as she is a close acquaintance).
The same holds for the location in time and space.
Now here comes the climax of the story: involves(S, F), fox(F). The notion of ‘fox’ is not descriptively very costly. However, it demands complex circumstances to become true, if one knows that Jeannine lives in town (what would bring a fox to venture that deep into the city?). The complexity drop between causality and description makes the event unexpected, and thus interesting.
The initial mention of Jeannine and her backyard is descriptively (slightly) costly, but it is essential to make causal complexity large.
The dialogue continued with a tentative explanation: the fox would have taken advantage of the general lockdown associated with the Covid-19 crisis to venture into the city. Note that the effect of the explanation is to diminish causal complexity, making the event "2nd-order relevant" (more on this there).
Question:
Do unexpected features make events relevant?
Answer:
Yes, thanks to the chain rule.
Let’s observe how features’ unexpectedness is transferred to the situation.
Suppose we recount an event s by mentioning an unexpected feature f(s) (e.g. a fox wandering in the city). On the causality side, we have to measure the complexity needed by the "world" to make f(s) true. Contrary to mere description, which only cares about f, causality requires taking both f and s into account to compute the complexity of reaching a state of the world in which f(s) holds.
Note that f(s) expresses a constraint. It is not a ‘function’, but a logical predicate. This is why one may write C_{w}(s | f(s)) even if s is not fully characterized at this stage of the computation.
The chain rule for causal complexity reads: C_{w}(s) = C_{w}(f(s)) + C_{w}(s | f(s)). Equality comes from the fact that f is the only available means to compute the causality of s at this point. C_{w}(f(s)) measures the complexity of generating a state of the world in which f is true when applied to the situation s.
C_{w}(s | f(s)) measures the causal complexity of other aspects of s.
The chain rule for description complexity reads:
Chain rule for feature description
C_{d}(s) < C_{d}(f) + C_{d}(s | f(s))
Note that f appears alone in C_{d}(f). This is a crucial point: description complexity is about the conceptual nature of features, whereas causal complexity is about making them true.
Relevant features are simple to describe, but hard to produce (as in our example about the fox).
Features can be relevant for an observer while being irrelevant for another one. For instance, in this excerpt from the animated series Futurama, the two robots are amazed at a mathematical coincidence that leaves their friends completely unmoved.
[I found initially that the second serial number 2716057 is not the sum of two cubes and that it should read 7216057 == 30**3 + 193**3; however, 2716057 == 952**3 + (-951)**3 - Thank you El Jj 🙂]
We can derive a chain rule for unexpectedness:
Chain rule for unexpectedness
U(s) > U(f(s)) + U(s | f(s))
where U(f(s)) = C_{w}(f(s)) – C_{d}(f). This chain rule, when repeatedly applied, offers an optimal guide for (artificial) storytelling: choose features, and a feature order, that will eventually maximize unexpectedness.
Features
Suppose we mention now (i.e. in the 2020s) the first name of a person born in the US in 2002. How unexpected is this feature if the first name is ‘Michelle’?
Coincidences: a case study
Problem:
Why are coincidences so thrilling?
The Lincoln-Kennedy coincidence should not be considered an objective marvel. However, the fact that it is so fascinating to so many people reveals that some systematic cognitive phenomenon is at work, and this demands an explanation. Let’s call s_{1} and s_{2} the two terms of the coincidence. Under the crucial hypothesis that the two events are independent, the causal complexity of the two events taken together equals the sum of their causal complexities (note that this is the proper definition of independence; it is more rigorous and general than the probabilistic notion):
C_{w}(s_{1} & s_{2})= C_{w}(s_{1})+ C_{w}(s_{2})
On the description side, however, we apply the chain rule: C_{d}(s_{1} & s_{2}) ≤ C_{d}(s_{1}) + C_{d}(s_{2} | s_{1}).
Each element common to the two situations, such as the name of the successors in the Lincoln-Kennedy parallel, has to be generated twice causally, but has to be described only once. Its impact on the conditional complexity C_{d}(s_{2} | s_{1}) is zero. Common elements thus add to the unexpectedness of the situation. If there is a strong analogy, or at least a resemblance, between the two situations, then C_{d}(s_{2} | s_{1}) << C_{d}(s_{2}). The conjunction of the two events turns out to be simpler than expected. This creates unexpectedness.
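The bookkeeping above can be made explicit with a small helper. The numeric values one would feed it are subjective estimates (in bits), not measured quantities.

```python
def coincidence_unexpectedness(c_w1, c_w2, c_d1, c_d2_given_1):
    """U(s1 & s2) for two causally independent events:
    causal complexities add, while the chain rule bounds description,
    so U >= C_w(s1) + C_w(s2) - C_d(s1) - C_d(s2 | s1)."""
    return (c_w1 + c_w2) - (c_d1 + c_d2_given_1)
```

A strong analogy between the two events makes `c_d2_given_1` much smaller than the standalone description cost of the second event: `coincidence_unexpectedness(20, 20, 15, 2)` yields 23 bits, while the same events without shared structure (`c_d2_given_1 = 14`) yield only 11.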
More on the effect of common features
Effect of common features on coincidences. Let’s call f a feature that is common to s_{1} and to s_{2}. Suppose that f is not unexpected in itself:
C_{w}(f(s_{i}))= C_{d}(f).
Suppose also that f is independent from all other features that characterize s_{1} and s_{2}. This means that the following equality holds: C_{d}(s_{2} | s_{1}) = C_{d}(s_{2}) – C_{d}(f).
The existence of a common feature f introduces a positive term C_{w}(f(s_{2})), equal to C_{d}(f), in the expression of unexpectedness. This explains why common features contribute to coincidences, and why more complex common features are more interesting.
Coincidence (1)
Why does the mention: "KENNEDY was shot in a car named LINCOLN"
affect the interest of the coincidence?
Coincidence (2)
Why is the time interval of 100 years more interesting than an interval of 87 years?
Coincidence (3)
Most people consider that the coincidence would have been even more impressive if both successors were named LICZNEROWITCH instead of JOHNSON (as the name LICZNEROWITCH is much less frequent in the US).
Let’s call \(D\) the difference \(C_d(LICZNEROWITCH) - C_d(JOHNSON)\).
Why is it so?
I wonder whether this definition of relevance is always valid.
Which definition?
That interest, I mean relevance, is controlled by the gap between causal complexity and description complexity.
My sister bought a bike.
What?!
Ha, you see! It’s not relevant.
And?
It’s not causally complex, so it is not relevant.
You mean, it would be more relevant if she were in prison for the next ten years. Or if she were blind?
Yes. That would increase causal complexity considerably. What could bring her to buy a bike in such a case?
But what about an eclipse?
??
I mean, eclipses are events, and yet they are perfectly deterministic.
By saying "deterministic", you mean that their causal complexity is zero?
Yes.
I just read something about this example. I mean, about why the occurrence of an eclipse might be unexpected, and therefore
exciting.
I can’t see how an eclipse can be unexpected!
It all depends on the viewpoint you take.
What do you mean?
Well, it doesn’t happen everyday!
Oh, you mean that there’s some causal complexity to get an eclipse precisely today?
Yes.
But that’s just probability!
More or less. You still have to consider the description complexity of the concept "eclipse".
I thought it rather had to do with the causal complexity of having the Earth, the moon and the sun perfectly aligned.
Yes, that’s another viewpoint. It’s all explained on the Website.
I’ll definitely have a look.
Yes, you should. It’s cool!
A compressionist view of intelligent processing
Problem:
Is AI more than a heterogeneous set of clever tools?
Algorithmic information can serve as a theoretical foundation for many aspects of intelligent processing.
AIT offered us a definition of unexpectedness and of subjective information. We mentioned Gregory Chaitin’s famous sentence: "Comprehension is compression". Let’s build on the idea. The following table lists various aspects of intelligent processing and how they can be viewed from a compressionist perspective. We start with statements "X is compression" that are most relevant to the present chapter.
Compression as unexpectedness
Subjective probability is compression
Subjective probability, as measured by unexpectedness, corresponds to the compression between expected complexity and observed complexity. ➢ see this chapter
Creativity is compression
Actions or innovations are perceived as creative when they elicit surprise, and surprise is analyzed as a complexity drop. ➢ More on this.
Abduction is compression
Abduction is the ability to find out a cause for a problematic state of affairs. The best causal hypotheses, according to Occam’s razor, are the simplest ones. Their effect is to diminish the causal complexity of the path to a solution. ➢ More on this.
Relevance is compression
A relevant statement corresponds to a complexity gap
by pointing to an unexpected situation
by reducing causal complexity through explanation or action
by increasing description complexity through trivialization
Responsibility is compression
The judge decides on responsibility by showing that the defendant’s action diminished the causal complexity of the undesirable effect.
Moral judgment also takes the simplicity of the victim (seen from the judge) into account. ➢ More on this.
Emergence is compression
Emergence in collective systems can be defined as the difference between the sum of individual complexities and the collective complexity. ➢ Watch the video at the end of chapter 1.
Humour is compression
Many humorous effects result from unexpected simplicity. ➢ See next exercise.
– Why did the fly fly?
– Because the spider spied her!
Now, let’s look at some "X is compression" conjectures that are relevant to computer science in general.
Compression and computer science
Coding is compression
Shannon-Fano codes, Huffman codes, hash codes are methods for designing compact codes. These codes replace the data. Since they are more compact, they achieve compression.
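As a concrete instance of "coding is compression", here is a textbook Huffman construction in Python (a minimal sketch, not an optimized implementation):

```python
import heapq
from collections import Counter

def huffman_codes(text):
    """Build a binary Huffman code for the characters of `text`.
    Returns a dict mapping each character to its bit string."""
    # Each heap entry: (frequency, tie-breaker, {char: partial code}).
    heap = [(freq, i, {ch: ''}) for i, (ch, freq) in enumerate(Counter(text).items())]
    heapq.heapify(heap)
    if len(heap) == 1:                       # degenerate single-symbol case
        return {ch: '0' for ch in heap[0][2]}
    i = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)      # two least frequent subtrees
        f2, _, c2 = heapq.heappop(heap)
        merged = {ch: '0' + code for ch, code in c1.items()}
        merged.update({ch: '1' + code for ch, code in c2.items()})
        heapq.heappush(heap, (f1 + f2, i, merged))
        i += 1
    return heap[0][2]

def encoded_length(text):
    """Total length in bits of `text` under its own Huffman code."""
    codes = huffman_codes(text)
    return sum(len(codes[ch]) for ch in text)
```

For "aaaabbc", the frequent symbol 'a' gets a 1-bit code while 'b' and 'c' get 2 bits, so the whole string encodes in 10 bits instead of 56 raw ASCII bits: the code replaces the data with a shorter description.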
Storing is compression
Once data is stored, it can be defined by its address. As the latter is generally shorter than the data stored (which might sometimes be arbitrarily large), storing achieves compression.
Indexing methods are used to diminish the average size of addresses.
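A back-of-the-envelope illustration in Python (the record content and the store size are arbitrary):

```python
import math

# Storing N items lets each one be referred to by an address of about
# ceil(log2(N)) bits, however large the items themselves may be.
store = ["some arbitrarily long record..." * 100 for _ in range(1024)]
address_bits = math.ceil(math.log2(len(store)))  # 10 bits for 1024 records
record_bits = 8 * len(store[0])                  # thousands of bits per record

print(address_bits, "address bits vs", record_bits, "record bits")
```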
Forgetting is compression
One should forget only unimportant data. Importance, like relevance, is associated with unexpectedness. As time goes by, description complexity increases (as \(\log(t)\)), making some data less and less relevant, until they become good candidates for forgetting.
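Under the chapter’s hypothesis that the description complexity of dating an event grows as \(\log_2(t)\), the residual relevance of a memory can be sketched as follows (the initial value of 20 bits is an arbitrary illustration):

```python
import math

def residual_relevance(u0: float, t: float) -> float:
    """Unexpectedness left after t time units, assuming the description
    complexity of dating the event grows as log2(t) (chapter's hypothesis)."""
    return u0 - math.log2(t)

u0 = 20.0  # initial unexpectedness (in bits) of some piece of news
for t in (1, 32, 1024, 2 ** 20, 2 ** 25):
    print(t, residual_relevance(u0, t))
# Once residual relevance reaches 0, the memory is a good candidate for forgetting.
```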
Pattern recognition is compression
Whenever a structure is recognized in an image, the complexity of the pixels or sub-structures it contains drops as they can be predicted from the structure.
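A small experiment along these lines (my own sketch; zlib stands in for the structure-aware description): an image containing a recognizable shape compresses far better than an unstructured one of the same size.

```python
import random
import zlib

size = 64
# "Structured" image: a filled 32x32 square on a blank 64x64 background.
structured = bytearray(size * size)
for y in range(16, 48):
    for x in range(16, 48):
        structured[y * size + x] = 1

# Unstructured image: pseudo-random binary pixels.
random.seed(0)
noisy = bytes(random.randrange(2) for _ in range(size * size))

# Recognizing the square lets 4096 pixels be predicted from a few parameters;
# the compressor exploits exactly that regularity.
print(len(zlib.compress(bytes(structured))), "<", len(zlib.compress(noisy)))
```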
Image synthesis is compression
Fixed or animated image synthesis is efficient when it can be controlled using only a few parameters (e.g. recursion depth in a fractal). The fact that realistic images can be compressed down to those few parameters is often remarkable (see the demoscene mentioned in Chapter 2).
Visualizing is compression
Large data visualization, for instance for the description of large social networks, requires lossy compression to generate pictures that a human mind can understand.
Schematic is compression
A schematic results from lossy compression: contours are simplified and some objects are replaced by symbols. Note that the whole point is to keep the semantics and the relevance of what is represented.
This is also true of comics.
Hash codes
Some hash functions, such as SHA-256, are designed to be virtually impossible to reverse in practice.
They correspond to lossy compression.
Finally, let’s look at some other "X is compression" conjectures that are relevant to Artificial Intelligence.
Compression and AI
Information is compression
The smallest program p that generates an object s represents its informational content. It is a compressed version of s. ➢ see Chapter 2.
The probability of occurrence of an object is measured by the cumulative probability of the minimal programs that generate it. ➢ see Chapter 3.
Analogy is compression
Good proportional analogies (A is to B as C is to D) minimize the overall complexity of (A, B, C, D). ➢ see Chapter 4.
"Comprehension is compression"
This is Gregory Chaitin’s great aphorism. Understanding a phenomenon means replacing observations by a theory. A good theory realizes a trade-off between simplicity and accuracy. ➢ More on this.
Proving is compression
A proof replaces many true statements by a theorem. The theorem is a compressed version of the (often infinite) set of statements it proves.
Conversely, the compression argument can be used in many proofs by contradiction. ➢ see Chapter 3.
Intelligence is compression
Intelligent behaviour amounts to taking the best rewarding action based on the most probable, i.e. the simplest, future. ➢ see Chapter 4.
And more generally, any intelligent device (including the human brain) seeks to achieve maximal compression, within the constraints of available resources.
So, "everything" is compression!
What do you mean by "everything"?
From what I see here, every cognitive process generates complexity drop.
I don’t see why.
So it seems that probability, aesthetics, creativity, relevance, responsibility, even humour, and many more, correspond to compression.
Maybe it is just due to the way our brain functions.
Or maybe one could say that any computation leads to information compression.
I disagree. For instance, your brain is processing the syntax of what I’m saying, and...
and you mean that there is no complexity drop when processing a sentence?
Indeed! At least not at the syntactic level.
And yet, once you get the syntactic structure, words fall in place, and the sequence of words becomes less complex.
OK. Now I see your point. But that form of complexity reduction is quite different.
You mean, it’s not a difference between expected complexity and observed complexity?
That’s it.
Maybe we should say that we are more concerned about "explicit" complexity drop here?
Yes. The kind of compression you can experience... and talk about!
This "explicit" form of compression is what determines surprise.
To me, it’s the amazing side of complexity drop.
As in fortuitous encounters: you expect to meet someone complex and you see your neighbor.
My favorite is the story about Larry Kagan’s structures: you expect a complex shadow and you see something familiar projected on the wall.
Did you see the next topic? It’s about emotion.
Yes, I wonder what complexity drop has to do with emotion.
Emotional intensity
Problem:
Nothing can be further from emotions than the cold computations of AIT!
And yet, Algorithmic Information is crucial to compute emotions. Not the emotions themselves, but their intensity. Emotions elicited by the occurrence of events may be intense. Their intensity is due to two factors:
The type of event (are there casualties, or are we talking about a €10 loss);
The event’s unexpectedness.
Simplicity is of course at work in unexpectedness, since unexpected situations are by definition abnormally simple.
As we saw when we discussed the "law of distant deaths", simplicity is involved in the amplitude of the emotional stimulus. For instance, under the hypothesis that the impact of each casualty is proportional to its complexity, we can derive that emotional intensity varies as the logarithm of their number.
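This logarithmic law can be illustrated with a toy computation (the proportionality constant is ignored; the numbers are made up):

```python
import math

# Assumed form, following the chapter: if emotional intensity varies as the
# logarithm of the number of casualties, then doubling the toll adds a
# constant increment instead of doubling the emotion.
def intensity(n: int) -> float:
    return math.log2(n)

print(intensity(100) - intensity(50))    # going from 50 to 100 casualties
print(intensity(1000) - intensity(500))  # same increment from 500 to 1000
```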
Wheel of Fortune (1)
In a wheel of fortune game, the player launches the wheel which then stops at one of the \(N\) discrete spaces of the wheel. There is only one winning space.
Imagine for a moment that the player, though having won, is denied the reward. The value of emotional intensity would be the same as for winning, though with the opposite sign. Let’s now suppose that the player did not win because she missed the winning space.
She might be upset as well.
In these kinds of situations, people tend to consider the unexpectedness of the counterfactual win. The smaller the margin by which they failed, the more upset they are.
Their (negative) emotion is, in absolute value, as intense as the emotion of winning, minus the counterfactual unexpectedness of reaching the winning space.
Wheel of Fortune (2)
Suppose that the wheel stopped at distance \(k\) from the winning space (a near miss).
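One way to cash out the formula above, as a sketch (this is my reading, not stated explicitly in the text: I assume that generating the counterfactual "it could have moved \(k\) more spaces" costs about \(\log_2(k)\) bits):

```python
import math

# Frustration of stopping k spaces short of the single winning space, on a
# wheel with N spaces: the emotion of winning (log2(N) bits of unexpectedness)
# reduced by the ~log2(k) bits of the counterfactual path (assumed cost).
def near_miss_intensity(N: int, k: int) -> float:
    return math.log2(N) - math.log2(k)

N = 64
for k in (1, 2, 8, 32):
    print(k, near_miss_intensity(N, k))
# The closer the miss (small k), the stronger the frustration.
```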
Interview
Recap
Take-home message
We hope you enjoyed this chapter on AIT and subjective information.
Let’s leave the last word to Angel@ and Alexandr@.
Subjective probability seems to function the other way around.
What do you mean?
While algorithmic probability declares objects improbable if complex, situations are subjectively improbable if they are too simple!
Oh, as in the lottery examples?
Yes. Improbable if simple.
Not in the case of magic.
You’re right.
Now I remember.
Improbability is due to causal complexity in that case.
Or rather to the difference between causal complexity and description complexity.
Yes. That’s how unexpectedness is defined.
I was amazed by all the consequences of this definition.
You mean, concerning interest, relevance, aesthetics and all the other predictions?
Yes. Relevance is my favorite one.
Why especially relevance?
Because it defines intelligence, according to Turing’s imitation game.
Well, if I follow you correctly, relevance, and therefore complexity drop, define the way we think?
Yes. It’s as if our mind was some kind of compression machine.
That’s something!
Indeed, it’s quite unexpected (oops) to see our minds put in equations that way!
It seems you’ve become a supporter of compressionism now.
Ha ha, yes, I guess I’m officially converted.
Me too!
Test (1)
People find that getting ten "tails" in a row is (subjectively) less probable than getting tail-head-tail-head-tail-head-tail-head-tail-head. Why?
The interest of news decays with time as \(2 \times \log_2(t)\), where \(t\) is the time interval since the news.
Test (4)
Are rhyming verses appreciated because the identity of sounds spares bits of information in description complexity?
Test (5)
When asked whether they would be happy to find a folded 10-euro bill on the ground, people declare that they would be happier if it happened in a forest rather than on a commercial street in the evening.
Why?
Test (6)
Why are important events celebrated more intensely after 10, 50 or 100 years?
Dessalles, J.-L. (2008). Coincidences and the encounter problem: A formal account. In B. C. Love, K. McRae & V. M. Sloutsky (Eds.), Proceedings of the 30th Annual Conference of the Cognitive Science Society, 2134-2139. Austin, TX: Cognitive Science Society.
Dessalles, J.-L. (2013). Algorithmic simplicity and relevance. In D. L. Dowe (Ed.), Algorithmic probability and friends - LNAI 7070, 119-130. Berlin, D: Springer Verlag.
Falk, R. & Macgregor, D. (1983). The surprisingness of coincidences. In O. Svenson & A. Vári (Eds.), Analysing and aiding decision processes, 489-502. Budapest: Akadémiai Kiadó.