- Let statement $A$ be one of the formulas [Leb, pp.247-250, (9.5.7)-(9.5.10)]. More specifically, let $A_i$ = [Leb, (9.5.i)], and let $B$ = [Guo, p.70, (13)]. The proof of $B \Rightarrow A_7$ is difficult for two reasons: it uses the theory of differential equations, and the effect of a linear transformation on the solutions of a differential equation is not intuitive. All the proofs of $B \Rightarrow A_i$ could follow the same complicated method. For example, [Guo, p.161, l.9, l.-1] follows the pattern of [Guo, p.159, l.9-p.160, l.11]. However, we can confine the complexity to the proof of $B \Rightarrow A_7$ alone and prove the other $A_i$'s by combining the derived formulas [Leb, p.249, l.16-l.17].

- (The Mehler-Dirichlet integral representation of the associated Legendre functions)

A good start is half the battle. If we work on something simple, the idea of the argument is easy to recognize. If we work on something complicated, we will have more problems to solve, and our argument will become fragmented. It is wise to start from a simpler basis that shortens the argument, produces the same effect, and highlights the essential ideas.

[Guo, p.263, (2), (3); p.264, (6)] generalize [Hob, p.25, (24), (25); p.24, (23)]. However, Guo's proofs are awkward because he extends an associated Legendre function to the complex domain at the beginning of his proofs. If he had first established the above formulas on the real line using Hobson's elementary method, and only at the end used analytic continuation to show that they remain valid in the extended domain, the argument would have been much clearer and simpler. For example, the problem Guo worries about in [Guo, p.263, l.-7] would not arise.

Remark. Nonetheless, Guo's approach has the advantage of letting us replace the integral over a line segment with a more __flexible__ contour integral. For example, [Guo, p.263, (4)] = [Guo, p.265, (7)].

- If we just follow in other people's footsteps and try to figure out the meaning of each reasoning step, we can understand a topic only superficially. To gain a complete understanding, we must study the topic from various perspectives and see the big picture by digesting the material thoroughly. Then we should try to adopt an advantageous viewpoint and find a simpler approach to the original problem.

Example {1}.

### How we articulate the proof of a theorem with depth.

Example. [Bir, p.163, Theorem 11; Pon, pp.179-180, Theorem 15].

- Simplify the argument.

Let us compare the proof of [Bir, p.163, Theorem 11] with [Pon, pp.179-180, Theorem 15]. Pontryagin first proves that the solution is differentiable with respect to the initial points. Then he proves that the partial derivatives of the solution with respect to a component of the initial point satisfy the variational equation. Birkhoff merges these two arguments into one.

For approximations to the solution, Birkhoff uses the simple Lipschitz condition once [Bir, p.164, l.1]. In contrast, Pontryagin uses the complicated Picard approximation twice [Pon, p.172, l.6; p.175, l.-3].

Using Taylor's theorem [Bir, p.164, l.13] gives a more direct and straightforward argument than using Hadamard's lemma [Pon, p.174, l.-7], and it avoids the problem that arises in [Pon, p.176, l.6].
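
The fact that both proofs establish, namely that the partial derivative of the solution with respect to the initial value satisfies the variational equation, can be checked numerically. The following Python sketch is our illustration, not Birkhoff's or Pontryagin's argument. It assumes the hypothetical scalar equation $x' = \sin x$, integrates it together with its variational equation $y' = \cos(x)\,y$, $y(0) = 1$, by the classical Runge-Kutta scheme, and compares the result with a central difference quotient in the initial value $x_0$.

```python
import math

def f(x):
    # hypothetical right-hand side of x' = f(x)
    return math.sin(x)

def df(x):
    # its derivative f'(x), the coefficient of the variational equation
    return math.cos(x)

def rk4(rhs, y0, t1, steps=2000):
    """Classical Runge-Kutta for an autonomous system y' = rhs(y) on [0, t1]."""
    h = t1 / steps
    y = list(y0)
    for _ in range(steps):
        k1 = rhs(y)
        k2 = rhs([a + h / 2 * k for a, k in zip(y, k1)])
        k3 = rhs([a + h / 2 * k for a, k in zip(y, k2)])
        k4 = rhs([a + h * k for a, k in zip(y, k3)])
        y = [a + h / 6 * (p + 2 * q + 2 * r + s)
             for a, p, q, r, s in zip(y, k1, k2, k3, k4)]
    return y

def solution(x0, t):
    return rk4(lambda y: [f(y[0])], [x0], t)[0]

def derivative_via_variational(x0, t):
    # integrate x' = f(x) together with y' = f'(x) y, y(0) = 1; y(t) approximates dx/dx0
    return rk4(lambda y: [f(y[0]), df(y[0]) * y[1]], [x0, 1.0], t)[1]

def derivative_via_difference(x0, t, eps=1e-5):
    # central difference quotient of the solution in the initial value
    return (solution(x0 + eps, t) - solution(x0 - eps, t)) / (2 * eps)

print(derivative_via_variational(1.0, 2.0), derivative_via_difference(1.0, 2.0))
```

For $x' = \sin x$ the solution is $x(t) = 2\arctan(\tan(x_0/2)\,e^t)$, so both numbers can also be compared with the exact derivative $e^t \sec^2(x_0/2) / (1 + \tan^2(x_0/2)\,e^{2t})$.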

Apply the key idea flexibly. Unless necessary, do not add mathematical structures or stray from the point. In the proof of [Arn1, p.285, Theorem], Arnold discusses the variational equation together with a differential equation and treats them as an inseparable pair. Whenever he discusses one equation, he discusses the other as well. This makes it difficult for the argument to proceed. When he discusses the variational equation, the presumed inseparability suddenly prompts him to discuss the linear system [Arn1, p.287, Lemma 7], which is not even related to the theorem. Thus, this unnecessary digression only obscures the key idea.

- Do not reverse the argument's __natural order__.

In order to derive the formula for $\partial/\partial t$, we introduce the difference quotient given in [Pon, p.174, l.16]. Then we successively derive [Pon, p.175, (18); p.176, (21); p.177, (24)]. The order of the argument is natural. In contrast, Birkhoff discusses [Bir, p.163, (25)] first, and only at the end of his proof [Bir, p.164, l.-3] do we understand why he wants to discuss [Bir, p.163, (25)].

- Do not use confusing notation.

In [Bir, p.163, (25)], Birkhoff introduces the notation $h_j$. I believe many readers will mistake what is meant to be the index of a component of a vector $(x_1, \dots, x_n)$ ($j = 1, \dots, n$) for the index of a sequence ($j = 1, 2, \dots$). This is because he writes $h_j \to 0$ several times. In fact, $h$ is the variable and $j$ merely labels the component.

In [Arn1, p.286, l.-6], Arnold uses $x_1$ to represent $y$ in [Pon, p.178, (28)]. In [Arn1, p.287, l.9], Arnold tries to use $x_1$ to represent $x$ in [Pon, p.178, (28)], but he makes a mistake in his formulation (perhaps the translator translated it incorrectly; see E). Thus, Arnold uses the same notation for two different things. In fact, Arnold's proof of [Arn1, p.285, Theorem] is essentially the same as Pontryagin's proof of [Pon, pp.179-180, Theorem 15], except that Arnold uses confusing notation. The only improvement Arnold makes is in his proof of [Arn1, p.286, Lemma 4].

- Do not say "repeating verbatim the reasoning of Sects. 3, 4, and 5" [Arn1, p.285, l.15-l.16].

This will not help readers grasp the essence of the argument. If they read the above three sections long ago, they will not even understand the reasoning. Give the __key points__ instead: we prove Proposition (B) by __induction__ on $m$ by means of the __variational equation__ [Pon, p.177, l.-14-l.-13].

- Do not make mistakes.

In order to satisfy $dC/dt = AC$ [Arn1, p.287, l.23], $x_1$ [Arn1, p.287, l.9] should have referred to the initial value instead of the solution. The proof of [Bir, p.156, Corollary] shows that Arnold has made a mistake here.

- Do not leave out essential points and calculations.

The remark given in [Bir, p.165, l.10-l.13] should have been added to the end of the proof of [Arn1, p.287, Lemma 7]; otherwise, the proof is incomplete. The claim $dx_1/dt = 0$ in [Arn1, p.287, l.12] should have been proved as follows:

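$$
\begin{aligned}
\frac{dx_1}{dt} &= \frac{dB}{dt}\,x + B\,\frac{dx}{dt} \\
&= \frac{dB}{dt}\,x + B\,\frac{dC}{dt}\,B\,x && \left(\text{because } \frac{dx}{dt} = Ax \text{ and } \frac{dC}{dt} = AC \text{ [Bir, p.156, Corollary]} \Rightarrow B\,\frac{dC}{dt}\,B = BACB = BA\right) \\
&= \frac{dB}{dt}\,x - \frac{dB}{dt}\,C\,B\,x && \left(\text{differentiating } BC = E \text{ gives } B\,\frac{dC}{dt} = -\frac{dB}{dt}\,C\right) \\
&= 0 && (\text{because } CB = E).
\end{aligned}
$$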
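
The conclusion $dx_1/dt = 0$, i.e. that $x_1 = Bx$ with $B = C^{-1}$ stays equal to the initial value, can be checked numerically. This Python sketch assumes a hypothetical $2 \times 2$ coefficient matrix $A(t)$ (any continuous choice would do). It integrates $dC/dt = AC$ with $C(0) = E$ together with $dx/dt = Ax$ by the classical Runge-Kutta method and then evaluates $x_1 = C^{-1}x$.

```python
import math

def A(t):
    # hypothetical coefficient matrix of the linear system dx/dt = A(t) x
    return [[0.0, 1.0], [-1.0 - 0.5 * math.sin(t), -0.1 * t]]

def matvec(M, v):
    return [M[0][0] * v[0] + M[0][1] * v[1], M[1][0] * v[0] + M[1][1] * v[1]]

def matmul(M, N):
    return [[sum(M[i][k] * N[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def inverse(M):
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / det, -M[0][1] / det], [-M[1][0] / det, M[0][0] / det]]

def rhs(t, y):
    # y packs the fundamental matrix C (4 entries) and the solution x (2 entries)
    C = [[y[0], y[1]], [y[2], y[3]]]
    a = A(t)
    dC = matmul(a, C)
    dx = matvec(a, [y[4], y[5]])
    return [dC[0][0], dC[0][1], dC[1][0], dC[1][1], dx[0], dx[1]]

def rk4(f, t0, y0, t1, steps):
    h = (t1 - t0) / steps
    t, y = t0, list(y0)
    for _ in range(steps):
        k1 = f(t, y)
        k2 = f(t + h / 2, [a + h / 2 * k for a, k in zip(y, k1)])
        k3 = f(t + h / 2, [a + h / 2 * k for a, k in zip(y, k2)])
        k4 = f(t + h, [a + h * k for a, k in zip(y, k3)])
        y = [a + h / 6 * (p + 2 * q + 2 * r + s) for a, p, q, r, s in zip(y, k1, k2, k3, k4)]
        t += h
    return y

def x1_at(t, x0):
    """Return x_1 = C(t)^{-1} x(t), where C(0) = E and x(0) = x0."""
    y = rk4(rhs, 0.0, [1.0, 0.0, 0.0, 1.0, x0[0], x0[1]], t, 2000)
    B = inverse([[y[0], y[1]], [y[2], y[3]]])
    return matvec(B, [y[4], y[5]])

print(x1_at(3.0, [2.0, -1.0]))
```

For a linear system each Runge-Kutta step is linear in the initial data, so the computed $x(t)$ equals the computed $C(t)\,x(0)$ up to rounding, and $x_1$ reproduces $x(0)$ almost exactly, which is the initial value, as the remark on [Arn1, p.287, l.9] requires.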

- Simplify the argument.
- (Smoothness of roots)

[Cod, p.176, l.-14] says that the eigenvalues of $A_0$ are smooth in $t$. The proof of this statement rests on a change of basis. It is easiest to first assume that $A_0(t)$ is analytic in $t$. For any fixed $t$ in the overlap of two consecutive intervals, the chosen $k$th columns of $B_0$ differ by a scalar factor because the eigenspace of each eigenvalue is one-dimensional. By expanding $A_0(t)$ in a Taylor series at the fixed $t$ instead of using continuity [Cod, p.176, l.-6-l.-5], we see that on the overlap of the two consecutive intervals the chosen $k$th columns of $B_0$ differ by a scalar factor that is an analytic function of $t$ [Cod, p.177, l.1-l.2]. Once we have proved the analytic case, we may derive the infinitely differentiable case in a similar way.

Remark. The Weierstrass preparation theorem [Whi, p.16, Theorem 5I] says that the coefficients of the Weierstrass polynomial $w(\tilde{x}, \lambda)$ are analytic. The proof uses [Whi, p.321, Lemma 5B]. Thus, the theory of complex variables is useful in proving the smoothness of the elementary symmetric functions of the zeros, but useless in proving the smoothness of the zeros themselves. However, if the polynomial $w$ has the form $\det(A(\tilde{x}) - \lambda E)$, where $A$ is an $n \times n$ matrix whose eigenvalues remain distinct, then the algebraic method above can be used to prove that its roots $\lambda_1(\tilde{x}), \lambda_2(\tilde{x}), \dots, \lambda_n(\tilde{x})$ are analytic in $\tilde{x}$.
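
A standard example (not taken from the cited sources) shows the gap between the two kinds of smoothness: the coefficients, and hence the elementary symmetric functions of the roots, can be analytic while the roots themselves are not, as soon as two eigenvalues coincide.

$$
A(x) = \begin{pmatrix} 0 & 1 \\ x & 0 \end{pmatrix}, \qquad \det(A(x) - \lambda E) = \lambda^2 - x, \qquad \lambda_1 + \lambda_2 = 0, \quad \lambda_1 \lambda_2 = -x .
$$

Both symmetric functions are analytic in $x$, but the roots $\lambda_{1,2} = \pm\sqrt{x}$ are not differentiable at $x = 0$, where the two eigenvalues coincide.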

- Seeing the big picture may help us get out of a tangled mess once and for all [1].

- Removing irrelevant facts from our consideration.

Suppose we want to prove the equality given in [Wat1, p.281, l.16]. If we let $s_m$ be the $m$th partial sum of the hypergeometric series given in [Wat1, p.281, l.5], then there are three quantities that can approach $\infty$: $z$, $\beta$, and $m$. However, letting $z \to \infty$ is irrelevant in this case; if we allow it to enter our consideration, it only confuses us. Consequently, we should fix $z = z_0$, consider $\beta > |z_0|$, and then use [Ru1, p.135, Theorem 7.11].
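
To see the fixed-$z_0$ strategy at work, here is a Python sketch under an explicit assumption: we do not reproduce the equality in [Wat1, p.281, l.16]; as a stand-in we take the limit $e^{z} = \lim_{\beta \to \infty} F(1, \beta; 1; z/\beta)$, which holds because $(\beta)_n/\beta^n \to 1$ for each fixed $n$. With $z = z_0$ fixed and $\beta > |z_0|$, we have $|z_0/\beta| < 1$, so the series converges and its partial sums $s_m$ can be computed term by term.

```python
import math

def partial_sum(z0, beta, m):
    """m-th partial sum s_m of F(1, beta; 1; z0/beta) = sum_n (beta)_n / n! * (z0/beta)**n."""
    term, total = 1.0, 1.0
    for n in range(m):
        # ratio of consecutive terms: (1 + n)(beta + n) / ((1 + n)(n + 1)) * (z0/beta)
        term *= (beta + n) / (n + 1) * (z0 / beta)
        total += term
    return total

z0 = 2.0  # z stays fixed; only beta and m vary
for beta in (10.0, 1e3, 1e6):
    print(beta, partial_sum(z0, beta, 80), math.exp(z0))
```

Since $F(1, \beta; 1; w) = (1 - w)^{-\beta}$, the printed sums approximate $(1 - z_0/\beta)^{-\beta}$, which tends to $e^{z_0}$ as $\beta \to \infty$ while $z$ never moves.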

- Avoiding complexity by ignoring insignificant contributions.

In order to find the branch points of a hypergeometric function, Watson asks readers to read [Wat1, §14.53] (see [Wat1, p.281, l.8]), and Guo asks readers to read [Guo, p.151, l.-2-p.152, l.-12]. Their approaches are too difficult to understand. Branch points are singularities, and the only possible singularities of [Guo, p.135, (1)] are $1$ and $\infty$. In order to prove that $t = 1$ is a branch point, we consider [Guo, p.151, (6)]. If we ignore insignificant contributions, the integral given in [Guo, p.151, (6)] can be regarded as the integral of $(1-t)^{\gamma - \beta - 1}$, which is, up to a constant factor, $(1-t)^{\gamma - \beta}$. Then $t = 1$ is a branch point as long as $\gamma - \beta$ is not an integer.
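
The monodromy behind this argument can be checked numerically. The Python sketch below is our illustration, not Guo's or Watson's method. It continues $f(t) = (1-t)^{c}$, with $c$ standing for $\gamma - \beta$, along a small circle around $t = 1$ by tracking a continuous branch of $\arg(1-t)$; the returned factor $f(\text{end})/f(\text{start})$ equals $e^{2\pi i c}$, which is $1$ exactly when $c$ is an integer.

```python
import cmath
import math

def monodromy_factor(c, radius=0.5, steps=1000):
    """Continue (1 - t)**c once around the circle t = 1 + radius*e^{i*phi}; return f(end)/f(start)."""
    prev = complex(-radius, 0.0)   # 1 - t at the starting point t = 1 + radius
    winding = 0.0                  # accumulated change of arg(1 - t) along the path
    for k in range(1, steps + 1):
        t = 1 + radius * cmath.exp(2j * math.pi * k / steps)
        w = 1 - t
        winding += cmath.phase(w / prev)  # small increment, so no branch cut is crossed
        prev = w
    # |1 - t| is constant on the circle, so only the change of argument contributes
    return cmath.exp(1j * c * winding)

for c in (0.5, 2.0, 1 / 3):
    print(c, monodromy_factor(c))
```

A non-integer exponent produces a factor different from $1$, so the function does not return to its starting value after one loop around $t = 1$.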

- When we try to solve a problem, we should go straight toward our goal and ignore side results.

Saks puts [Sak1, p.396, Theorem 12.8] before [Sak1, p.399, Theorem 12.11; p.400, Theorem 12.12; Lemma 12.13], because we do not need to use the latter theorems in order to prove [Sak1, p.396, Theorem 12.8]. In contrast, González puts [Gon1, p.490, Theorem 5.64] before [Gon1, p.491, Theorem 5.65]. The reader may be under the false impression that [Gon1, p.490, Theorem 5.64] is an indispensable tool for proving [Gon1, p.491, Theorem 5.65].

In order to prove [Sak1, p.395, Theorem 12.7], we use [Sak1, p.357, l.-15-l.-5; p.394, Theorem 12.5]. In contrast, González makes a fuss about the latter results and creates a complicated side theorem [Gon1, p.369, Theorem 5.6].

- Do not add feet to a snake (do not ruin the effect by adding something superfluous); do not cut one's toes to fit the shoes.

Once upon a time there was a painter who painted a snake. Then he added feet to it. Thus, he ruined the picture. In [Wat1, §9.62], Watson used [Wat1, §3.35, Example 2] to prove Riemann's first lemma. If he had used [Ru1, p.61, Theorem 3.42] instead, the complicated passage given in [Wat1, p.184, l.-7-p.185, l.11] could have been eliminated. In this case, Watson only asked for trouble when he used the generalized theorem [Wat1, §3.35, Example 2] instead of [Ru1, p.61, Theorem 3.42]. This is because a generalized theorem weakens the hypotheses and the weakened hypotheses are usually more involved and difficult to verify than the original hypotheses.

**When we solve a problem, we should focus on its essence and use only the tools that are indispensable to the solution.** In order to prove Lusin's theorem [Ru2, p.56, Theorem 2.23], Rudin uses the Riesz representation theorem [Ru2, p.56, l.7] and Urysohn's lemma [Ru2, p.56, l.-13]. In contrast, [Roy, p.72, Problem 31] avoids these complications and proves Lusin's theorem by the following simple method: every measurable function is nearly a simple function [Roy, p.70, Problem 23(b)]; every simple function is nearly a step function [Roy, p.70, Problem 23(c)]; every step function is nearly a continuous function [Roy, p.70, Problem 23(d)]. To prove the last statement, all we have to do is connect each gap in the graph of the step function with a line segment.
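
The last step, connecting each gap with a line segment, can be made concrete. In this Python sketch the names `step_function` and `connect_gaps` are ours, chosen for illustration. A step function is given by breakpoints $x_0 < \dots < x_n$ and a constant value on each interval $[x_i, x_{i+1})$. The continuous function agrees with it except on the intervals $(x_i - \delta, x_i)$ at the interior jumps, a set of total length $(n-1)\delta$, which can be made as small as we like.

```python
from bisect import bisect_right

def step_function(breaks, values):
    """breaks = [x_0, ..., x_n]; values[i] is the constant value on [x_i, x_{i+1})."""
    def f(x):
        i = bisect_right(breaks, x) - 1
        return values[min(max(i, 0), len(values) - 1)]
    return f

def connect_gaps(breaks, values, delta):
    """Continuous function equal to the step function outside the intervals (x_i - delta, x_i)."""
    assert 0 < delta < min(b - a for a, b in zip(breaks, breaks[1:]))
    step = step_function(breaks, values)

    def g(x):
        for i in range(1, len(breaks) - 1):          # interior jump points x_1, ..., x_{n-1}
            if breaks[i] - delta < x < breaks[i]:
                s = (x - (breaks[i] - delta)) / delta  # runs from 0 to 1 across the gap
                # line segment joining the value on the left to the value on the right
                return (1 - s) * values[i - 1] + s * values[i]
        return step(x)
    return g

g = connect_gaps([0.0, 1.0, 2.0, 3.0], [0.0, 5.0, -1.0], delta=0.01)
print(g(0.5), g(0.995), g(1.5))
```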

In order to construct Lebesgue measure, Rudin uses the Riesz representation theorem [Ru2, p.42, Theorem 2.14]; see [Ru2, p.53, l.-8]. The construction given in [Roy, chap.3, §3] shows that it is unnecessary to use the concept of integration [Ru2, p.42, Theorem 2.14 (a)] or Urysohn's lemma [Ru2, p.43, l.5]. Note that the proof of Urysohn's lemma [Ru2, p.40] uses the axiom of mathematical induction, which we should avoid using if possible. The essence of the Riesz representation theorem is [Roy, p.121, Theorem 8], and the key to proving the latter theorem is that an absolutely continuous function is an integral [Roy, p.106, Theorem 13]. The goal of mathematics is not to build a huge machine that may solve everything. Even if we could build such a machine, it would still be a failure, because nothing would be clear. For a college textbook, it is better to focus on the essence of a subject than to prove complicated theorems at the beginning of the book. [1]

- Suppose the proofs of Theorem A and Theorem B are similar and the proof of Theorem A is given. In order to prove Theorem B, we only need to modify the parts of the proof of Theorem A that the new situation requires.

In [Zyg, vol.1, p.104, l.-11-l.-10], Zygmund says, "A minor modification in the proof of (7.15) shows that (7.16) tends to 0 if (7.17) is replaced by X(t) = o(t^{2})." He hints that we should change an integral whose integrand has the factor x to an integral whose integrand has the factor X using integration by parts.

$$
\begin{aligned}
\text{[Zyg, vol.1, p.102, (7.16)]} &= \frac{\partial \tilde{F}(r,x)}{\partial x} - B_1 && \text{[Zyg, vol.1, p.103, l.13]} \\
&= A + B_2 && \text{[Zyg, vol.1, p.103, l.4, l.9, and l.12].}
\end{aligned}
$$

Therefore, we only need to modify $A$ and $B_2$, but not $B_1$. The observation that we need not modify $B_1$ allows us to avoid unnecessary computations.

- The formulation and proof of a theorem should be centered around the essence of the theorem.

In a simply connected region, a harmonic function has a single-valued conjugate function which is determined up to an additive constant [Ahl, p.162, l.-7-l.-5]. The key to proving this statement is the concept of exact differential [Ahl, p.141, Theorem 15]. Ahlfors' approach centers around this key concept, so his formulation and proof are direct and concise. In contrast, the formulations and proofs of [Sak, p.444, Theorem 1.8; p.447, Theorem 1.11; p.447, l.-2-p.448, l.1] are indirect and complicated without giving more information because Saks fails to recognize the key concept.
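
The role of the exact differential can be seen in a small computation. In this Python sketch we take the hypothetical harmonic function $u = x^2 - y^2$, form the differential $-u_y\,dx + u_x\,dy$, and integrate it from the origin along different polygonal paths. Because this differential is exact in the plane, every path gives the same value, namely the conjugate $v = 2xy$ normalized by $v(0,0) = 0$.

```python
def u_x(x, y):
    return 2 * x      # partial derivative of the hypothetical u = x^2 - y^2

def u_y(x, y):
    return -2 * y

def integrate_segment(p, q, n=100):
    """Midpoint-rule approximation of the integral of -u_y dx + u_x dy along the segment p -> q."""
    (x0, y0), (x1, y1) = p, q
    dx, dy = (x1 - x0) / n, (y1 - y0) / n
    total = 0.0
    for k in range(n):
        xm, ym = x0 + (k + 0.5) * dx, y0 + (k + 0.5) * dy
        total += -u_y(xm, ym) * dx + u_x(xm, ym) * dy
    return total

def conjugate_along(path):
    """Line integral of the differential along a polygonal path starting at the origin."""
    return sum(integrate_segment(path[i], path[i + 1]) for i in range(len(path) - 1))

paths = [
    [(0.0, 0.0), (1.5, 0.0), (1.5, 2.0)],   # horizontal, then vertical
    [(0.0, 0.0), (0.0, 2.0), (1.5, 2.0)],   # vertical, then horizontal
    [(0.0, 0.0), (1.5, 2.0)],               # straight line
]
print([conjugate_along(p) for p in paths], 2 * 1.5 * 2.0)
```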

Remark. The proof given in [Ahl, p.161, l.11-l.13] is simpler than the proof given in [Sak, p.445, l.5-l.17].

- We want to prove that $t_1$ maps the right half-plane $\operatorname{Re}(w) \ge 0$ onto the circular region given by the inequality [Wal, p.34, (7.3)].

Remark. There are many ways to prove the above statement; we try to find the simplest one.

Trick: Choose convenient points from the circle rather than from the imaginary axis.

Proof. Assume we know the fact given by [Ru2, p.298, l.18-l.19].

I. We want to find three points on the circle whose images are on the imaginary axis.

Solution: Take $t = 0$ and $t = (2 \operatorname{Re} b_1)^{-1}(1 \pm i)$.

II. We want to find a point inside the circle whose image is on the right half-plane.

Solution: Take $t = (2 \operatorname{Re} b_1)^{-1}$.
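
The three-point argument can be checked numerically. The Python sketch below assumes, for illustration only, that $t_1(w) = 1/(b_1 + w)$ with $\operatorname{Re} b_1 > 0$ and that (7.3) describes the disk $|t - c| \le c$ with $c = (2 \operatorname{Re} b_1)^{-1}$. These forms are our assumption, chosen to be consistent with the points above, and are not quoted from [Wal]. The sketch computes the preimages $w = 1/t - b_1$ of the chosen points. The point $t = 0$ corresponds to $w = \infty$ and is omitted.

```python
b1 = complex(0.7, -1.3)        # hypothetical coefficient; only Re b1 > 0 matters
c = 1 / (2 * b1.real)          # (2 Re b1)^{-1}, the assumed center and radius

def t1(w):
    """Assumed form of t_1 (illustration only)."""
    return 1 / (b1 + w)

def t1_inverse(t):
    return 1 / t - b1

# Step I: the points (1 +- i)c on the circle pull back to the imaginary axis
# (t = 0 pulls back to w = infinity, which lies on the extended imaginary axis).
boundary = [t1_inverse((1 + 1j) * c), t1_inverse((1 - 1j) * c)]

# Step II: the center c lies inside the circle and pulls back into the right half-plane.
interior = t1_inverse(c)

print(boundary, interior)
```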