User blog:Rgetar/Taranovsky's C format improvement proposal

Recently I noticed similarities between Taranovsky's C and my booster-base (BB) system, and then I thought: "What if I use the BB format for Taranovsky's C itself?"

New format
I think it can be done like this: remove all "C", ",", and "0", and in every "C(a, b)" move b outside the "()", that is:
 * 0 → empty string
 * C(a, b) → (a)b

(In my BB system I use "[", "]" instead of "(", ")", but it does not matter).
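The conversion rule above can be sketched in Python as a small recursive parser. This is only an illustration under assumptions not stated in the post: the function name `to_new_format` is hypothetical, Ωn is written as the single character "Ω", and the input is assumed to be a well-formed term.

```python
def to_new_format(term: str) -> str:
    """Rewrite a term in 'C(a, b)' notation as '(a)b' notation."""
    s = term.replace(" ", "")
    pos = 0

    def expect(ch: str) -> None:
        # Consume one expected character or fail loudly.
        nonlocal pos
        if not s.startswith(ch, pos):
            raise ValueError(f"expected {ch!r} at position {pos}")
        pos += 1

    def parse() -> str:
        nonlocal pos
        if s.startswith("C(", pos):
            pos += 2
            a = parse()
            expect(",")
            b = parse()
            expect(")")
            # C(a, b) -> (a)b
            return "(" + a + ")" + b
        if s.startswith("0", pos):
            pos += 1
            return ""  # 0 -> empty string
        if s.startswith("Ω", pos):
            pos += 1
            return "Ω"  # assumption: a single symbol stands for Ωn
        raise ValueError(f"unexpected input at position {pos}")

    result = parse()
    if pos != len(s):
        raise ValueError("trailing characters after the term")
    return result


print(to_new_format("C(0, C(0, 0))"))  # ()()
```

Equivalently, every "C(" becomes "(", every "," becomes ")", and every original ")" and "0" disappears.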

Comparison
"C" and "0" were used in comparison, but we removed them.

The original comparison algorithm: remove all ",", "(", ")", reverse the strings, and then compare them lexicographically, where "C" < "0" < "Ωn".
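A minimal Python sketch of this comparison on the original format follows. The names `old_key` and `compare_old` are hypothetical, Ωn is assumed to be written as the single character "Ω", and a string that is a proper prefix of another is treated as smaller (Python's usual list ordering). The sketch does not check that its inputs are valid terms of the notation.

```python
# Symbol order used by the comparison: "C" < "0" < "Ω".
OLD_RANK = {"C": 0, "0": 1, "Ω": 2}
OLD_IGNORED = set(",() ")  # symbols removed before comparing


def old_key(term: str) -> list:
    """Strip ',', '(', ')', reverse, and map each symbol to its rank."""
    ranks = []
    for ch in term:
        if ch in OLD_IGNORED:
            continue
        if ch not in OLD_RANK:
            raise ValueError(f"unexpected symbol {ch!r}")
        ranks.append(OLD_RANK[ch])
    ranks.reverse()
    return ranks


def compare_old(x: str, y: str) -> int:
    """Return -1, 0, or 1 as x is less than, equal to, or greater than y."""
    kx, ky = old_key(x), old_key(y)
    return (kx > ky) - (kx < ky)


# C(0, C(0, 0)) -> "C0C00" -> "00C0C"; C(C(0, 0), 0) -> "CC000" -> "000CC"
print(compare_old("C(0, C(0, 0))", "C(C(0, 0), 0)"))  # -1
```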

I think that there is an equivalent comparison algorithm that can be used in the new format: reverse the strings and then compare them lexicographically, where "(" < ")" < "Ωn".
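The proposed comparison for the new format could look like the following Python sketch. As before, the names `new_key` and `compare_new` are hypothetical, Ωn is assumed to be the single character "Ω", and a proper prefix is treated as smaller. The code only implements the rule as stated; it does not establish that the rule is equivalent to the original one.

```python
# Symbol order used by the proposed comparison: "(" < ")" < "Ω".
NEW_RANK = {"(": 0, ")": 1, "Ω": 2}


def new_key(term: str) -> list:
    """Reverse the string and map each symbol to its rank (nothing is deleted)."""
    try:
        return [NEW_RANK[ch] for ch in reversed(term)]
    except KeyError as exc:
        raise ValueError(f"unexpected symbol {exc.args[0]!r}") from None


def compare_new(x: str, y: str) -> int:
    """Return -1, 0, or 1 as x is less than, equal to, or greater than y."""
    kx, ky = new_key(x), new_key(y)
    return (kx > ky) - (kx < ky)


# "()()" is C(0, C(0, 0)) and "(())" is C(C(0, 0), 0) in the new format.
print(compare_new("()()", "(())"))  # -1
```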

I was going to write a proof here (an attempted proof): Proof. If the strings are the same, then they are equal. Otherwise, we have 3 cases:
 * 1. Neither string is a C. Then Ωn > 0, and in the new format Ωn > empty string.
 * 2. One string is C(a, b), and the other is not. Let β be the other string. Then if b is not a C, compare b and β: if b ≥ β then C(a, b) > β, else C(a, b) < β. Otherwise, compare b and β. Indeed, C(a, b) → Cab → baC, and if b is one symbol, then compare it with β: if b > β, then baC > β; if b = β, then aC > empty string, so baC > β; and if b < β, then baC < β. Otherwise, compare b and β. New format: (a)b → b)a(. If b = Ωn and β = empty string (b > β), then Ωn)a( > empty string, so b)a( > β; if b = β, then )a( > empty string, so b)a( > β; if b = empty string and β = Ωn (b < β), then ) < Ωn, so )a( < Ωn, that is, b)a( < β. Otherwise, compare b and β.
 * 3. Both strings are C. Let them be C(a1, b1) and C(a2, b2). Then compare b1 and b2, and if they are equal, compare a1 and a2. Indeed, C(a1, b1) → Ca1b1 → b1a1C, and C(a2, b2) → Ca2b2 → b2a2C.

but I failed to prove it.

However, I think that the new comparison algorithm should work. I noticed that the comparison algorithm from the translation by Rpakr and koteitan of AAAgoogology's user page "TaranovskyのC表記の解析" basically coincides with the algorithm used in my programs, and I think it should be equivalent to the lexicographic algorithm. I then worked out what the new lexicographic algorithm should be and replaced the algorithm in my program with it, and it works the same, but faster.

Advantages of new format
It has fewer symbols (3 instead of 6), strings are shorter, and strings often have fewer "nested levels", that is, "()" inside "()", so it is simpler.

Also, comparison is simpler, since there is no need to delete symbols.

For example, 8
 * C(0, C(0, C(0, C(0, C(0, C(0, C(0, C(0, 0))))))))

has 8 nested levels. In new format:
 * ()()()()()()()()

it has 1 nested level. Or ω^(ω^(ω+1)·2) + ω^(ω^2) + 1:
 * C(0, C(C(C(0, C(0, 0)), 0), C(C(C(0, C(C(0, 0), 0)), C(C(0, C(C(0, 0), 0)), 0)), 0)))

In new format:
 * ()((()()))((()(()))(()(())))

Also, a tree format has been used for this notation. In the new format, the number of "nested levels" is the "height" of the tree:

 a     b     c     d
 C --  C --  C --  C --  e

is (a)(b)(c)(d)e (the height of the tree is 1, and there is 1 nested level), and

 a
 C -- b
 C -- c
 C -- d
 C -- e

is ((((a)b)c)d)e (the height of the tree is 4, and there are 4 nested levels).
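The number of "nested levels" can be measured by tracking the depth of open parentheses. The sketch below is an illustration only; the function name `nesting_depth` is hypothetical. It works on either format, since both use "(" and ")" for nesting and every other symbol is ignored.

```python
def nesting_depth(term: str) -> int:
    """Return the maximum number of simultaneously open parentheses."""
    depth = 0
    deepest = 0
    for ch in term:
        if ch == "(":
            depth += 1
            deepest = max(deepest, depth)
        elif ch == ")":
            depth -= 1
    return deepest


# With a, b, c, d, e all empty, the two trees above become:
print(nesting_depth("()()()()"))  # 1
print(nesting_depth("(((())))"))  # 4
```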