A better proof builds internal links to strengthen the theorem's inner
structure.
Example. The link from [Cou, vol. 1, chap. II, §4.1] to [Cou, vol. 1, chap. II, §4.3].
A better proof builds external links for generalization to various
perspectives.
Example. The link from [Cou, vol. 1, chap. II, §4.1] to [Cou, vol. 1, chap. II, §4.2].
A better proof eliminates complexity.
Example.
In [Cou, vol. 1, p.68, l.-4-p.69, l.26], the Fourier
coefficients are too complicated to calculate. By contrast, in [Cou,
vol. 1, p.70, l.-6-p.71, l.2], the Fourier coefficients are easy to calculate.
(Fejér's theorem)
A theorem has physical meaning. A good proof should closely follow
the theorem's physical ideas. [Cou1, vol. 1, pp.447-450] provides a good proof of Fejér's theorem.
[Cou1, vol. 1, p.448, l.-11-l.-5] is especially noteworthy because it indicates the essential physical idea of the proof,
an idea that is used again and again in quantum mechanics but is left out of many
modern advanced calculus books. [Ru1, p.176, Theorem 8.15] generalizes Fejér's theorem somewhat, but fails to pinpoint the physical idea behind the proof.
The contributions mathematicians have made in the more than seventy years since [Cou1] was published should not amount to a regression
in which tampering with the original good proof makes its physical idea unrecognizable.
A textbook like advanced calculus should not be written by someone who only has
a title of mathematics professor at an outstanding university. It must be written
by someone who has a strong background in physics because such a person knows
how to select the right material and preserve the essential point. Otherwise, a
popular textbook that is of poor quality only causes great damage to many
students simply because of the author's ignorance.
Using many
indices often involves so much detail that it hides the key message behind
the proof. In such a case, a rigorous formal proof only makes it more difficult
for the reader to understand the essential meaning of the theorem. Consequently,
it is better
to point out the main idea and then show how the idea works in simple settings.
Example [Cou, vol. 1, p.20, l.-3-l.-2].
Main idea: The derivative of a determinant equals the sum of the determinants obtained by
differentiating one row at a time while the other rows are held fixed.
It is enough to consider the coefficient of λ in
the polynomial D(u, y; λ)
for the cases n = 2 and n = 3.
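To make the main idea concrete, here is a minimal worked sketch for n = 2. The entries a, b, c, d and the parameter t are illustrative symbols introduced here, not notation taken from [Cou]; the sketch only illustrates the row-by-row rule and does not reproduce the argument about the polynomial in [Cou].

```latex
% Assumption: a(t), b(t), c(t), d(t) are differentiable functions of a parameter t.
\[
\frac{d}{dt}
\begin{vmatrix} a & b \\ c & d \end{vmatrix}
= (ad - bc)'
= (a'd - b'c) + (ad' - bc')
= \begin{vmatrix} a' & b' \\ c & d \end{vmatrix}
+ \begin{vmatrix} a & b \\ c' & d' \end{vmatrix}.
\]
```

The case n = 3 works the same way, with three determinants on the right, one for each differentiated row.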
The geometric proof in [Cou,
vol. 1, p.166, Example (d)] is more effective and
intuitive than the analytic proof in [Cou, vol. 1, p.167, l.-4-l.-1].
(Intuition) Intuition leads to guidelines for proofs.
Because the main ideas of outer regularity [Ru2, p.49, l.11], Lebesgue's dominated convergence theorem [Ru2, p.27, Theorem 1.34] & consistency of tree measure [Kem1, p.42, l. 13-l. 6; p.43, l. 8-l. 2] are similar, it is a natural consequence that their proofs are closely related to each other.
(Implicit function theorem) The statement of a
theorem should be simple to make its key idea easy to recognize. The formulation should
not be affected by the method used in its proof. The statement in [Buc1, p.284, Theorem 26]
is simple, while the statement in [Spi1, p.41, Theorem 2-12] or [Ru1, p.196,
Theorem 9.18] is too complicated.
Deduction gradually decreases the information given in the
hypothesis. Consequently, we would like to preserve as much information as
we can when we reach the conclusion. The proof of [Buc1, p.284, Theorem 26]
preserves a great deal of information for the solution because the proof is
based on the inverse function theorem. In contrast, the reasoning in [Cou1,
vol.2, p.119, l.-1-p.120, l.-8]
loses so much useful information that the solution's functional form can no
longer be recovered. In particular, we can deduce y' = φ(x, y, F_{y'})
[Cou, vol.1, p.202, l.-12] from the proof of
[Buc1, p.284, Theorem 26], but not from the proof in [Cou1, vol.2,
pp.119-121, §6]. Therefore, the proof of
[Buc1, p.284, Theorem 26] is more effective than that of [Cou1,
vol.2, p.114, l.1-l.13].
(Commutative diagrams)
Lagrange's equations imply that the Hamiltonian is a constant of motion.
This theorem can be expressed in the following two mathematical forms:
The proof of statement A is given in [Akh, p.11, l.13-p.12, l.10]. The proof of statement B is given in [Akh,
p.14, (5)]. Since the implication operation "⇒" commutes with integration or differentiation,
either of the above two proofs
by itself can be used to prove both statements.
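The two forms, statements A and B, are not reproduced in the text above. Purely as a hedged illustration, the standard differential-form computation can be sketched as follows. The symbols L, q_i and H, and the assumption that L has no explicit time dependence, belong to this sketch and are not quoted from [Akh].

```latex
% Assumptions: L = L(q, \dot q) has no explicit t-dependence,
% and H := \sum_i \dot q_i \,\partial L/\partial \dot q_i - L.
\begin{align*}
\frac{dH}{dt}
&= \sum_i \Bigl( \ddot q_i \frac{\partial L}{\partial \dot q_i}
   + \dot q_i \frac{d}{dt}\frac{\partial L}{\partial \dot q_i} \Bigr)
   - \sum_i \Bigl( \frac{\partial L}{\partial q_i}\,\dot q_i
   + \frac{\partial L}{\partial \dot q_i}\,\ddot q_i \Bigr) \\
&= \sum_i \dot q_i \Bigl( \frac{d}{dt}\frac{\partial L}{\partial \dot q_i}
   - \frac{\partial L}{\partial q_i} \Bigr)
 = 0 \quad \text{(by Lagrange's equations)}.
\end{align*}
```

Integrating dH/dt = 0 gives H = constant, and differentiating H = constant gives dH/dt = 0, which is the sense in which one proof serves both forms.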
A good proof follows geometric intuition closely. The proof
of the divergence theorem given in [Cou1, vol.2, p.385, l.2-p.386, l.17] follows geometric intuition
closely, while the
proof given in [Cou2, vol.2, pp.639-642, §A.2.b]
does not. As a result, the construction in [Cou2, vol.2, p.640, l.2] is
extremely ineffective.
If a formula is independent of coordinate systems and
the proof of the formula uses a special coordinate system, then we must prove that the formula is invariant under coordinate transformations.
Example (the divergence theorem) [Cou1, vol.2, p.387, l.-11;
Cou2, vol.2, p.638, l.6-p.639, l.9].
Strategy versus simulation. For the 3-dim case, [Cou2] gives two proofs of the divergence theorem.
The proof given in [Cou2, vol.2, pp.597-601, §5.9a]
can be easily simulated, while the proof given in [Cou2, vol.2, pp.639-642,
§A.2b] cannot. The latter proof only
gives a strategy for handling higher-dimensional surfaces. The objects whose
existence is needed at many steps of the argument are either assumed by axiom [Cou2, vol.2, p.637,
l.2] or buried in complicated calculations [Cou2, vol.2, p.641, l.12]. How
effectively these objects can be constructed depends on each individual
case we consider.
Effective design for proving a group of theorems. If we want to prove a group of theorems and try to find the most effective way to prove the entire set of theorems, we must search for the main links among them, and then arrange them
in a logical order. For example, we want to prove the following two
theorems:
J_n(r) is the solution of [Cou, vol.1, p.303, (28)].
[Cou, vol.1, p.304, (29)].
The design in [Cou, pp.303-304] is somewhat redundant,
while the design in [Col, p.245, (IV.40) & (IV.41)] is systematic and
effective.
We may use either [Wangs, p.60, l.19] or [Ru2, p.31, Theorem 1.39(b)]
to prove [Jack, p.29, (1.13)].
We may use the idea in [Wangs, p.65, l.16-26] or that in [Ru2, pp.246-247, Exercise 16] to prove [Jack, p.30, (1.15)].
A more effective proof uses a weaker property to obtain a stronger result.
Example. Let k be 0 or a positive integer, f(a) = ψ(a+k) - ψ(a),
g(a) = ψ(1-a-k) - ψ(1-a),
and h(a) = π cot πa. By [Guo, p.107, (4)], f(a) ≡ g(a) ≡ a^{-1} + … + (a+k-1)^{-1}.
The summands in f(-n) (where n is a positive
integer) have nothing to do with
ψ(-n) = ∞. However, [Guo, p.145, l.9] proves f(-n) = g(-n)
as follows: f(a) = ψ(a+k) - ψ(a) = ψ(1-a-k) - h(a+k) - [ψ(1-a) - h(a)]
(by [Guo, p.107, (3)]) = ψ(1-a-k) - ψ(1-a)
(since 1 is a period of h(a)) = g(a). Although h(-n) = ∞ = ψ(-n), f(a) and g(a) approach the same limit as
a → -n. The former
proof does nothing but change the dummy index. The latter proof uses the periodicity of the function h and the concept of limit to overcome the difficulty of explaining
∞ - ∞ = 0 in this case. An ineffective proof often
comes from the failure to study the problem deeply enough to grasp its
center or to see the essential point clearly.
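The two routes compared above can be written out side by side. Here ψ denotes the digamma function and h(a) = π cot πa. The recurrence and reflection identities used below are standard; identifying them with [Guo, p.107, (4)] and [Guo, p.107, (3)] respectively is an assumption of this sketch.

```latex
% Standard identities (assumed to match [Guo, p.107, (3)-(4)]):
%   recurrence: \psi(a+1) = \psi(a) + 1/a
%   reflection: \psi(a) = \psi(1-a) - \pi\cot(\pi a)
\begin{align*}
f(a) &= \psi(a+k) - \psi(a) = \sum_{j=0}^{k-1} \frac{1}{a+j}
  && \text{(recurrence applied $k$ times)} \\
f(a) &= \bigl[\psi(1-a-k) - h(a+k)\bigr] - \bigl[\psi(1-a) - h(a)\bigr]
  && \text{(reflection)} \\
     &= \psi(1-a-k) - \psi(1-a) = g(a)
  && \text{(since $h(a+k) = h(a)$)}.
\end{align*}
```

In the second route the two infinite contributions at a = -n sit entirely inside h(a+k) and h(a), and periodicity cancels them before any limit is taken.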
A simple and effective proof must comply with the theorem's intrinsic
nature.
The idea of analytic continuation in [Leb, p.249, l.9-l.15] fulfills the goal of extending the domain of the functions
in a simple manner, so the proof based on this idea is simpler and more effective than that of [Guo, p.160, l.-14-p.161,
l.6]. The latter proof is ineffective because it repeats a tricky
method three times.
If we want to try to use one of the two theorems (Theorem A and
Theorem B) to prove Theorem C, it is usually easier to proceed by choosing
the one whose pattern is closer to Theorem C's.
The hypergeometric functions appear on both sides of the equality in [Guo, p.204, Ex. 6]. We can express these functions in
their series representations [Guo, p.153, (1)] or in their integral representations [Guo, p.153, (7)].
The series representations involve powers of z and the integral representations involve
powers of 1-zt, so in terms of the powers of
z, the pattern of the series representations is similar to that of [Guo, p.204, Ex. 6].
This is the reason why we choose the series representations [Guo, p.153, (1)]
to prove [Guo, p.204, Ex. 6].
Among a set of proofs, the more powerful one is the one that has the potential for
generalization.
We must compare the proof given in [Jack, p.98, l.7-l.16] with the proof given in [Guo, p.210, l.-3-p.220, l.7] in a large context. The former is consistent with the scheme given in the proof of [Bir, p.258, Theorem 1], while the latter is not.
Proofs using consecutive points.
Two proofs are given in each of [Wea1, Art. 16 & Art.
18]. One proof uses the concept of consecutive points. The other is
analytical. There is some ambiguity in the concept of consecutive points.
We have to avoid answering certain questions, such as what is the
distance between two consecutive points, and is it zero or nonzero? As
applied in [Wea1, p.12, l.-20], it should be zero.
As applied in [Wea1, p.66, l.10], it should be nonzero. In geometric
terms, the trouble can be described as follows: there cannot exist two
points next to each other; there are always many points in between. However, we should not discard all the advantages the concept produces
simply because of this small drawback. Indeed, a proof using consecutive
points greatly helps us visualize the picture and formulate the statement.
In some cases, the meaning of a proof using consecutive points is vague.
We must use the analytical proof to make the meaning precise. For example,
in [Wea1, p.68, (3')], it is difficult to recognize that
κ is a variable whose value depends on the
chosen direction. We
must use the concept of principal directions and [Kre, p.126, (37.2)] to
obtain its precise meaning.
Natural proofs.
Professors in the world's elite universities love to
tamper with the proofs of masters and then claim them as their own.
This practice often disrupts or even reverses a proof's natural flow.
Thus, they do more damage than benefit to a reader's
ability to understand. The proof in a modern textbook has gone through many
cycles of this type of "plastic surgery". This is why the proof will not
stick in the reader's mind.
A natural proof is a proof in which one step naturally leads to the next.
For example, the proof in [For, p.5, l.-9-p.6,
l.8] is a natural proof, while the proof in [Wea1, p.15, l.5-l.-3]
is not. This is because [r', r'', r'''] in [Wea1, p.15, l.-7]
comes from nowhere. It makes sense only through hindsight. I would say a
tampered proof is a drunkard's proof. A normal human walks forward. Only a
drunkard sometimes walks forward and sometimes walks backward.
The argument of a natural proof must be closely related to the definition
of the problem. It should not use remote derivations.
Suppose A ⇒ A1 ⇒ A2 ⇒ A3 ⇒ A4 ⇒ A5 ⇒ A6 ⇒ A7 ⇒ A8 ⇒ A9 ⇒ A10.
We want to prove A ⇒ B. We would like to use A1
or A2 to prove B instead of A9
or A10.
Remark. [Kli, p.59, l.3-l.5] points out that [Kre, p.174, Theorem 55.2] is
incorrect. The proof of [Kre, p.174, Theorem 55.2] uses the concept of asymptotic
directions, a remote concept, while the proof of [Kli, p.59,
Theorem 3.7.9] uses the concept of principal directions, a more basic and
direct concept. In other words, the latter proof operates on familiar ground, while the former proof sees things from a distance. Thus,
it is difficult to detect the hidden error in the analysis used in the former
proof.
The proof of [Lau, p.68, Theorem 6.3.1] simplifies that of
[Kre, p.174, Theorem 55.2] by not using the
concept of asymptotic directions. However, there are two places in the proof of
[Lau, p.68, Theorem 6.3.1] that require further explanation: we should use
[Kli, p.46, Corollary 3.5.3] to justify [Lau, p.68, l.14] and use [Kre,
p.33, (10.8)] to justify that u2-curves
are straight lines [Lau, p.68, l.22].
A natural proof contains only a single and pure idea: it does not
involve complications. Unless absolutely necessary, unrelated material should be removed.
For example, the proof in [For, p.3, l.16-l.-13] is natural,
while the proof in [Wea1, p.12, l.-19-l.-13]
is not. The latter proof uses a more remote idea, [Wea1, p.12, Fig.2],
which is useful in proving the statement in [For, p.4, l.4-l.5]. Next we
consider the idea's ability to reduce the argument in [For, p.3, l.16-l.-13].
Indeed, this idea can reduce the argument, but it cannot strengthen the
result. A natural proof will use the sources within its own domain and
involve external resources as little as possible.
A proof motivated by condensing a theory is not a natural proof.
Suppose a theory contains the statements in Group A and the
statements in Group B. If every statement in Group B can be derived from the
statements in Group A, we would like to keep the number of statements in
the theory to a minimum by
omitting the statements in Group B. Furthermore, if the statements in
Group A represent the established theory and Statement C, related to the
theory, cannot be derived from the statements in Group A, we may
easily recognize Statement C as new blood in the theory.
[For, p.46, l.-10-p.47, l.17] and [For, p.46,
l.18-p.48, l.14] give two proofs of the statement
[(Q = F = 0) ⇒ (Q' = F' = Q'' = F'' = 0)].
The former proof is a natural proof, while the latter proof, motivated by
keeping the number of statements in the hypotheses of Bonnet's theorem
[For, p.50, l.14-l.16] to a minimum, is not.
A proof with many detours can be broad and deep in other aspects.
If we compare the proof given in [For, §20]
with that given in [Wea1, p.27, l.6-l.17], we see that the latter is
obviously simpler and more direct. However, the former
talks about not only the curve but also its rectifying developable, while
the latter limits its discussion to the curve only. Furthermore, the former specifies
that the fixed direction is the rectifying line, while the latter fails to do so.
Before we prove a theorem, we must understand its theme. Even though the methods of the proof can be different, the theme will not change.
Example. [Spi1, p.123, Theorem 5-4] says that the integral over a surface
is independent of coordinate systems. The proof given in [Spi1, p.123] and
that given in [Spi, vol. 1, p.347] use the definitions in slightly
different ways.
(Proof pattern) [Spi, vol. 1, p.346, l.11-l.12] says that Stokes'
theorem is the right tool to prove that certain forms on R^n - {0}
are not exact. This provides a guide on what tools we should choose when we
try to solve this type of problem [Spi, vol. 1, p.346, l.-4].
The
theorems we quote in
a proof should be as closely related to the context as possible. It is natural and
advantageous to use theorems from group theory rather than set theory when we discuss groups
[Mun00, p.411, l.-10]. References should be
treated the same way.
(The Schwarz inequality) [Spi, vol. 1, p.411, Theorem 1(2)]
The proof of the Schwarz inequality given in [Spi, vol. 1, p.411, Theorem 1(2)]
is divided into two cases: the first case leads to equality and the second
case leads to the strict inequality. Linear dependence is the key to the
neat division. In contrast, [Roy, p.210, l.12-l.21] proves the inequality
first and then, in hindsight, checks all steps of the proof to see which of them
contribute to the equality. Royden's approach involves a time-consuming
checkup and results in a redundant and entangled argument.
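The two-case division can be sketched as follows. This is the standard argument for a real inner product, written in notation chosen here (x, y, λ, ⟨·,·⟩); it is meant as an illustration and is not a transcription of [Spi, vol. 1, p.411].

```latex
% Assumption: x, y lie in a real inner product space, |x|^2 = <x, x>.
% Case 1: x and y are linearly dependent. Then equality holds:
\[ |\langle x, y\rangle| = |x|\,|y|. \]
% Case 2: x and y are linearly independent. Then \lambda y - x \neq 0 for all real \lambda:
\[
0 < |\lambda y - x|^2
  = \lambda^2 |y|^2 - 2\lambda \langle x, y\rangle + |x|^2 .
\]
% A quadratic in \lambda with no real root has negative discriminant:
\[
4\langle x, y\rangle^2 - 4|x|^2|y|^2 < 0
\quad\Longrightarrow\quad
|\langle x, y\rangle| < |x|\,|y| .
\]
```

Because linear dependence decides the case at the outset, the equality condition never has to be traced back through the steps afterwards.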
If we compare
the tools of a proof to shoes, we should tailor our tools to our need just
as we tailor our shoes to our feet. We do not try to accommodate our
argument to the form of a theorem available at hand just as we do not cut our toes to
fit the shoes.
Remark. [Dug, p.352, l.6-l.8] tries to find D, D+, D- that satisfy certain
properties. All we have to do is find D+ and D- such that D+ ∩ D- = ∅.
Then let D = f^{-1}(D+).
We do not need to use [Dug, p.86, Theorem 11.2(1)]. If a theorem
originates from the abstraction of a bad proof, we should discard the
theorem
rather than allow it to occupy precious space in a textbook.
A proof should not be ambiguous.
In [Mun00, p.368, l.17], Munkres should have said, "Apply the Lebesgue number lemma to [0, 1]" rather than "Use the
Lebesgue number lemma". If the readers were to apply the lemma to X,
they would be making a mistake.
How we make proofs effective
Do not digress from the theme.
If a problem arises from a certain theory, then as a
rule of thumb its solution is likely to be found in the same theory. The
proofs of [Cod, p.130, (7.2); p.131, (7.3)] use the theorem given in
[Cou2, vol. 1, p.286, l.35-p.287, l.11]. We may prove this theorem [Cou2,
vol. 1, p.287, l.-15-l.-1] without using the theory of analytic functions [Kap, p.614,
(8.88)]. Indeed, the digression from partial fractions to the theory of
analytic functions only makes the proof ineffective [Guo, p.64, l.-3-p.65,
l.7].
(Revisions)
In [Gon1], González proves e_i ≠ e_j (i ≠ j) three times
[Gon1, p.449, l.4-l.5; p.450, l.2-l.6; p.456, l.1-l.6]. His first proof is
not clear. His second proof is better. His third proof is the clearest of
the three. Actually, we need only one proof at the right juncture. In
order to acknowledge drawbacks in time, we should not wait until the
picture of the situation is faded and the assumptions [Gon1, p.449, l.5]
are no longer fresh in our memory.
It is usually more direct and effective to attack a problem with a closely-related method.
It is possible to prove Liouville's theorem using the method of Fourier
analysis [Ru2, p.228, l.8-p.229, l.2], but it is more direct and effective
if we use a method of complex analysis instead [Wat1,
p.105, §5.63]. In
addition, the latter approach helps us master the subject of complex
analysis.
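For comparison, a complex-analytic argument can be very short. The following is the standard sketch via Cauchy's estimate, with M and R as symbols introduced here; it is not claimed to be the exact argument of [Wat1, p.105, §5.63].

```latex
% Assumption: f is entire and |f(z)| \le M for all z.
% Cauchy's integral formula for f' on the circle |\zeta - z| = R gives
\[
|f'(z)| = \left| \frac{1}{2\pi i} \oint_{|\zeta - z| = R}
          \frac{f(\zeta)}{(\zeta - z)^2}\, d\zeta \right|
        \le \frac{1}{2\pi}\cdot\frac{M}{R^2}\cdot 2\pi R = \frac{M}{R}.
\]
% Letting R \to \infty gives f'(z) = 0 for every z, so f is constant.
```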
How to make a proof more accessible to readers
Make the tools as specific as possible so that readers may understand what
exact tools are required to solve the problem. In
[Wat1, p.496, l.4-l.6], Watson says, "By the general theory of
differential equations of the first order, these integrals cannot be
functionally independent." This statement is too vague and too general for
readers to understand. The proof will become clearer if we replace [Wat1,
p.496, l.4-l.9] with the following argument: Let v
= 0. Then u = a and C = sn u = sn
a. If Watson considers it
necessary to know the fact that a differential equation of the first order
allows only one independent integration constant, he should put it as a
side comment after his proof. This approach will not discourage readers'
attempt to fully understand the proof if they do not have any background in
differential equations. The explanation given in [Guo, p.538, l.3-l.6] is
worse than Watson's.
A formulation should not cause confusion and should help readers avoid detours.
The meaning of t given in [Wat1, p.499, l.2] is ambiguous. Judging from the appearance of the expression (1 - k^2 t^2)^{1/2},
the expression seems more closely related to the formula given in [Wat1,
p.492, l.14] than to the formula given in [Wat1, p.493, (III)]. In fact,
t = sn (u, k). In order to meet the requirement t ∈ [0, 1], we have to use the formula given in [Wat1, p.498, l.-7]
to prove that sn^{-1} y increases as y increases from 0 to 1. Consequently, sn (u,
k) increases from 0 to 1 as u increases from 0 to K. The omission of t's
definition may lead readers to the following detour: Treat (1 - k^2 t^2)^{1/2}
as a function of two variables k and t, where 0 < k < 1 and 0 ≤ t ≤ 1.
dn (u, k) = ±(1 - k^2 sn^2 (u, k))^{1/2} is a continuous and nonzero
real-valued function. Since dn (0, k) = 1, dn (u, k) > 0 on (u, k) ∈ [0, K] × (0, 1).
Remark. Readers cannot read an author's mind. Specifying the exact meaning
of his statement may save readers from unnecessary guesswork.
(The idea behind a proof vs. the key for generalization)
For a theorem's proofs, on the one hand, we desire to
find a concrete setting so that we may easily visualize the big picture;
on the other hand, we need to find a key for generalization. It might be difficult to recognize the idea behind the proof if we
were to study
the generalized version of the theorem alone.
Example 1.
The proof of the theorem given in [Zyg, vol.1, p.17, l.11] is based on the
geometric theorem given in [Zyg, vol.1, p.16, (9.1)]. This proof helps us
visualize the big picture. The proof given in [Ru2, p.65, Theorem 3.5 (1)]
uses the concept of convexity [Ru2, p.65, l.-9]
which is the key for generalization.
Example 2. [Zyg, vol.1, p.45, l.-5-l.-4] provides an intuitive proof of
the Riemann-Lebesgue theorem [Zyg, vol.1, p.45, Theorem 4.4]. The
second proof given in [Zyg, vol.1, p.46, l.1-l.6] uses [Zyg, vol.1, p.37,
the third equality of (1.13)] which is the key to proving the general case
[Zyg, vol.1, p.46, l.6].
A proof must be insightful. When proving a theorem, one should focus
on the goal rather than be distracted by side lemmas. When quoting a
theorem, one should have a clear idea about its proof.
Example. (Lebesgue's dominated convergence theorem)
Titchmarsh's approach: The proofs of [Tit2, p.337, Theorem 10.5; p.345,
Theorem 10.8] start from scratch and use the same method. They are natural
and insightful. One need not use complications such as lim sup or lim inf in Titchmarsh's
proofs. In addition, one need not make a fuss by developing small tricks
into lemmas.
Rudin's approach: [Ru2, p.27, Theorem 1.34] follows from [Ru2, p.24, Theorem 1.28]. [Ru2, p.24, Theorem 1.28] follows from [Ru2, p.22, Theorem 1.26]. There are three ideas in total.
Royden's approach: [Roy, p.88, Theorem 15] follows from [Roy, p.83, Theorem 8]. [Roy, p.83, Theorem 8] follows from [Roy, p.81, Theorem 6]. [Roy, p.81, Theorem 6] follows from [Roy, p.71, Proposition 23]. There are four ideas in total.
The proof of [Roy, p.81, Theorem 6] uses Littlewood's third principle
[Roy, p.71, l.7-l.8]. This principle provides a formal procedure which in this case is not the most effective tool
toward proving Lebesgue's dominated convergence theorem.
There are two ways to prove a formula: give an argument that leads directly to the formula, or give an argument that merely verifies the formula once it has been supplied.
The latter approach works out the argument in hindsight. We prefer the former approach because it is more creative, constructive, and thus more inspiring.
Example.
(Neumann's formula for Q_n) Both [Wat1, §15.34] and [Guo, p.227, l.-11-l.-6] prove Neumann's formula for Q_n. The former argument merely verifies the formula once it is given. The latter approach uses [Wat1, p.316, l.-7]
to derive the formula without guessing it beforehand. Remark. To make good use of cancellation, we must know how to pair the values of the gamma function. In order to prove the equality given in [Wat1, p.320, l.-9], we apply the duplication formula
[Wat1, p.240, l.-3] to Γ(n+1/2)Γ(n+1),
Γ((n+1)/2+m)Γ(n/2+1+m), Γ((n+1)/2)Γ(n/2+1) and Γ(n+m+1)Γ(n+m+3/2).
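The pairing can be made explicit. The sketch below states the Legendre duplication formula in its standard form and applies it to each of the four products, reading each garbled "2-1" above as the factor 1/2 (an assumption about the original typesetting). The choice of z in each row is part of this sketch, not a quotation from [Wat1].

```latex
% Legendre duplication formula (standard):
\[ \Gamma(z)\,\Gamma\!\left(z + \tfrac12\right) = 2^{1-2z}\sqrt{\pi}\,\Gamma(2z). \]
% Pairings, with the value of z used in each case:
\begin{align*}
\Gamma(n+\tfrac12)\,\Gamma(n+1) &= 2^{-2n}\sqrt{\pi}\,\Gamma(2n+1)
  && (z = n+\tfrac12) \\
\Gamma(\tfrac{n+1}{2}+m)\,\Gamma(\tfrac{n}{2}+1+m) &= 2^{-n-2m}\sqrt{\pi}\,\Gamma(n+2m+1)
  && (z = \tfrac{n+1}{2}+m) \\
\Gamma(\tfrac{n+1}{2})\,\Gamma(\tfrac{n}{2}+1) &= 2^{-n}\sqrt{\pi}\,\Gamma(n+1)
  && (z = \tfrac{n+1}{2}) \\
\Gamma(n+m+1)\,\Gamma(n+m+\tfrac32) &= 2^{-2n-2m-1}\sqrt{\pi}\,\Gamma(2n+2m+2)
  && (z = n+m+1)
\end{align*}
```

In each pair the two arguments differ by exactly 1/2, which is what makes the formula applicable and lets the factors of √π and the powers of 2 cancel.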
Squeezing the solution out of a maze
Suppose we want to prove the second equality given on
[Wat1, p.330, l.2].
In view of the first equality given on [Wat1, p.330, l.2], it suffices to express C_n^{ν+1} in terms of C_{n-1}^{ν+1} and C_{n-2}^{ν+1}.
We can do so by using [Guo, p.274, (3)].
Recurrence relations for Bessel functions
The proofs given in [Wat1, §17.21] use integral representations,
while those given in [Guo, p.349, l.8-p.350, l.9] use series. The latter
method is more elementary, so it is better. Some people may argue that the
former method is applicable to more general cases because ν can be real or complex
[Wat1, p.359, l.20]. However, we can use series and analytic continuation [Wat1, p.368, l.-7-l.-5] to
achieve the same thing without using integral representations.
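As an illustration of the series method, one recurrence relation can be derived term by term. The sketch uses the standard series definition of the Bessel function of order ν; which recurrences [Guo] actually derives, and in what order, is not reproduced here, so the choice of this particular relation is an assumption.

```latex
% Series definition (standard), valid for real or complex \nu:
\[ J_\nu(z) = \sum_{m=0}^{\infty} \frac{(-1)^m}{m!\,\Gamma(m+\nu+1)}
   \left(\frac{z}{2}\right)^{2m+\nu}. \]
% Differentiate z^\nu J_\nu(z) term by term and use
% \Gamma(m+\nu+1) = (m+\nu)\,\Gamma(m+\nu):
\begin{align*}
\frac{d}{dz}\bigl[z^{\nu} J_\nu(z)\bigr]
&= \sum_{m=0}^{\infty} \frac{(-1)^m\,(2m+2\nu)}{m!\,\Gamma(m+\nu+1)}
   \,\frac{z^{2m+2\nu-1}}{2^{2m+\nu}} \\
&= z^{\nu} \sum_{m=0}^{\infty} \frac{(-1)^m}{m!\,\Gamma(m+\nu)}
   \left(\frac{z}{2}\right)^{2m+\nu-1}
 = z^{\nu} J_{\nu-1}(z).
\end{align*}
```

Nothing in the computation requires the order to be an integer, which supports the point that series alone already cover real or complex order.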
Good proof: assume the answer is unknown and then construct the answer.
Bad proof: assume the answer is given and then show that it is correct.
Example. Suppose we want to prove the equality given in [Inc1, p.188, l.-9]. We may apply [Cod, p.86, (6-12)] to the left-hand side of the equality to construct the right-hand side, or start from the right-hand side to mechanically obtain the left-hand side
through differentiation.
Choosing the starting point as close to the target (conclusion) as possible
If we choose the starting point as close to the target (conclusion) as possible, the proof will go smoothly, i.e., reach the conclusion easily, quickly, or straightforwardly.
Example 1. The proof given in [Bel, p.13, (5)] reaches the conclusion more
easily than that given in [Bel, p.13, l.-14-l.-8].
Example 2. Using [Har, p.46, (1.5)] to prove the statement given in [Har, p.46, l.-10-l.-9]
reaches the conclusion [Har, p.46, l.-6] more easily and straightforwardly than using uniqueness [Har, p.46,
Corollary 1.1; l.-13].
Remark. The fact that A ⇒ B ⇒ C may give us the wrong impression that it is easier to prove B ⇒ C than to prove A ⇒ C. In fact, it is easier to prove A ⇒ C than to prove B ⇒ C because A is stronger than B, i.e., A contains more resources than B. In other words, the proof of A ⇒ C is at least as effective as that of B ⇒ C. In terms of Venn diagrams, the set characterized by A is smaller than the set characterized by B; it is easier to put a smaller ball into
a bag.
Finding the primitive model with a simple setting
If one wants to prove the statement given in [Har, p.85, l.-3-l.-2]
or that given in [Cod, p.125, l.11-l.14], there are three ways to do so: Read [Yos, p.37, l.-1-p.39,
l.16], the proof of [Har, pp.70-71, Theorem 10.1], or the proof of [Cod,
p.109, Theorem 1.1]. The first way provides the primitive model for
the case d = 2. One can easily generalize the argument to the general case
of order d. This is the reason that the first way is more inspiring
and insightful than the last two ways. The Jordan canonical form and ODEs of order d involved in
the last two ways are unnecessary complications for our pursuit.
Note that for the Jordan canonical form, [Har, p.59, (5.15)] is
consistent with [Jaco, vol.2, p.97, (29)] through a proper order of the elements of the basis, while [Cod, p.63, l.-5] is
not. Consequently, in order to obtain the solutions of desired
form given in
[Har, p.85, l.-2], we may consider the fundamental matrices Y(t)T given
in [Har, p.71, l.8], Y(t) given in [Har, p.71, (10.2)], or S(z-z0)P T given in [Cod, p.110, l.5],
but not S(z-z0)P.