EXTRACTA MATHEMATICAE Vol. 36, Num. 1 (2021), 25 – 50
doi:10.17398/2605-5686.36.1.25

Available online June 11, 2021

Free (rational) derivation

K. Schrempf

Austrian Academy of Sciences, Acoustics Research Institute
Wohllebengasse 12-14, 1040 Vienna, Austria
math@versibilitas.at, https://orcid.org/0000-0001-8509-009X

Received April 15, 2021; Presented by Manuel Saorín; Accepted May 21, 2021

Abstract: By representing elements in free fields (over a commutative field and a finite alphabet) using Cohn and Reutenauer's linear representations, we provide an algorithmic construction for the (partial) non-commutative (or Hausdorff-) derivative and show how it can be applied to the non-commutative version of the Newton iteration to find roots of matrix-valued rational equations.

Key words: Hausdorff derivative, free associative algebra, free field, minimal linear representation, admissible linear system, free fractions, chain rule, Newton iteration.

MSC (2020): primary 16K40, 16S85; secondary 68W30, 46G05.

Introduction

Working symbolically with matrices requires non-commuting variables and thus non-commutative (nc) rational expressions. Although the (algebraic) construction of free fields, that is, universal fields of fractions of free associative algebras, has been available due to Paul M. Cohn since 1970 [8, Chapter 7], its practical application in terms of free fractions [29] (building directly on Cohn and Reutenauer's linear representations [10]) in computer algebra systems is only at the very beginning. The main difficulty for arithmetic (or rather lexetic, from the non-existing Greek word λεξητικος, from λεξις for word, as an analogon to αριθμητικος, from αριθμος for number) was the construction of minimal linear representations [32], that is, the normal form of Cohn and Reutenauer [9].

Here we will show that free fractions also provide a framework for "free" derivation, in particular of nc polynomials. The construction we provide generalizes the univariate (commutative) case we are so used to, for example $f = f(x) = x^3 + 4x^2 + 3x + 5$ with $f' = \frac{d}{dx}f(x) = 3x^2 + 8x + 3$.

The coefficients are from a commutative field $K$ (for example the rational numbers $\mathbb{Q}$, the real numbers $\mathbb{R}$, or the complex number field $\mathbb{C}$), the (non-commuting) variables from a (finite) alphabet $X$, for example $X = \{x, y, z\}$. For simplicity we focus here (in this motivation) on the free associative algebra $R := K\langle X\rangle$, aka the "algebra of nc polynomials", and recall the properties of a (partial) derivation $\partial_x : R \to R$ (for a fixed $x \in X$), namely

• $\partial_x(\alpha) = 0$ for $\alpha \in K$ and, more generally, $\partial_x(g) = 0$ for $g \in K\langle X \setminus \{x\}\rangle$,
• $\partial_x(x) = 1$, and
• $\partial_x(fg) = \partial_x(f)\,g + f\,\partial_x(g)$.

In [27] this is called the Hausdorff derivative. For a more general (module theoretic) context we refer to [7, Section 2.7] or [1].

Before we continue we should clarify the wording: to avoid confusion we refer to the linear operator $\partial_x$ (for some $x \in X$) as free partial derivation (or just derivation) and call $\partial_x f \in R$ the (free partial) derivative of $f \in R$, sometimes denoted also as $f_x$ or $f'$ depending on the context.
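Since the rest of the paper works at the level of linear representations, it may help to see these three properties first in the most naive encoding. The following Python sketch (an illustration only, not the paper's machinery; all helper names are ad hoc) stores an nc polynomial as a dictionary mapping words, i.e., tuples of letters, to coefficients, and implements $\partial_x$ on a word by deleting one occurrence of $x$ at a time:

```python
from collections import defaultdict
from fractions import Fraction

# An nc polynomial as {word: coefficient}, a word being a tuple of letters.

def mul(p, q):
    """Concatenation product in K<X>."""
    r = defaultdict(Fraction)
    for w, a in p.items():
        for v, b in q.items():
            r[w + v] += a * b
    return {w: c for w, c in r.items() if c}

def add(p, q):
    r = defaultdict(Fraction)
    for poly in (p, q):
        for w, c in poly.items():
            r[w] += c
    return {w: c for w, c in r.items() if c}

def deriv(p, x):
    """Hausdorff derivative: delete one occurrence of x per word, summed."""
    r = defaultdict(Fraction)
    for w, a in p.items():
        for i, letter in enumerate(w):
            if letter == x:
                r[w[:i] + w[i + 1:]] += a
    return {w: c for w, c in r.items() if c}

# motivating example: d/dx (x^3 + 4x^2 + 3x + 5) = 3x^2 + 8x + 3
f = {('x',) * 3: Fraction(1), ('x',) * 2: Fraction(4),
     ('x',): Fraction(3), (): Fraction(5)}
print(deriv(f, 'x'))   # {('x','x'): 3, ('x',): 8, (): 3} (Fraction values)

# the three defining properties, checked on a sample g in K<{y,z}>
g = {('y', 'z'): Fraction(2)}
assert deriv(g, 'x') == {}                              # kills K<X\{x}>
assert deriv({('x',): Fraction(1)}, 'x') == {(): Fraction(1)}
assert deriv(mul(f, g), 'x') == add(mul(deriv(f, 'x'), g),
                                    mul(f, deriv(g, 'x')))
```

The product rule holds word by word, which is exactly why this encoding works; its limitation, taken up again at the end of Section 3, is that the number of words can grow exponentially, which is what the representation machinery below avoids.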
In the commutative setting we usually do not distinguish too much between algebraic and analytic concepts. But in the (free) non-commutative setting, analysis is quite subtle [21]. There are even concepts like "matrix convexity" and symbolic procedures to determine (nc) convexity [4]. However, a systematic treatment of the underlying algebraic tools was not available so far. We are going to close this gap in the following.

After a brief description of the setup (in particular that of linear representations of elements in free fields) in Section 1, we develop the formalism for free derivations in Section 2, with the main result, Theorem 2.8. To be able to state a (partial) "free" chain rule (in Proposition 3.2) we derive a language for the free composition (and illustrate how to "reverse" it) in Section 3. And finally, in Section 4, we show how to develop a meta algorithm "nc Newton" to find matrix-valued roots of a non-commutative rational equation.

Notation. The set of the natural numbers is denoted by $\mathbb{N} = \{1, 2, \ldots\}$, that including zero by $\mathbb{N}_0$. Zero entries in matrices are usually replaced by (lower) dots to emphasize the structure of the non-zero entries, unless they result from transformations where there were possibly non-zero entries before. We denote by $I_n$ the identity matrix (of size $n$), respectively $I$ if the size is clear from the context. By $v^\top$ we denote the transpose of a vector $v$.

1. Getting started

We represent elements (in free fields) by admissible linear systems (Definition 1.6), which are just a special form of linear representations (Definition 1.3) and "general" admissible systems [8, Section 7.1]. Rational operations (scalar multiplication, addition, multiplication, inverse) can be easily formulated in terms of linear representations [10, Section 1]. For the formulation on the level of admissible linear systems and the "minimal" inverse we refer to [30, Proposition 1.13] resp. [30, Theorem 4.13].

Let $K$ be a commutative field, $\bar{K}$ its algebraic closure and $X = \{x_1, x_2, \ldots, x_d\}$ a finite (non-empty) alphabet. $K\langle X\rangle$ denotes the free associative algebra (or free $K$-algebra) and $\mathbb{F} = K(\!\langle X\rangle\!)$ its universal field of fractions (or "free field") [6, 10]. An element in $K\langle X\rangle$ is called a (non-commutative or nc) polynomial. In our examples the alphabet is usually $X = \{x, y, z\}$. Including the algebra of nc rational series [2] we have the following chain of inclusions:
$$K \subsetneq K\langle X\rangle \subsetneq K^{\mathrm{rat}}\langle\!\langle X\rangle\!\rangle \subsetneq K(\!\langle X\rangle\!) =: \mathbb{F}.$$

Definition 1.1. ([8, Section 0.1], [10]) Given a matrix $A \in K\langle X\rangle^{n\times n}$, the inner rank of $A$ is the smallest number $k \in \mathbb{N}$ such that there exists a factorization $A = CD$ with $C \in K\langle X\rangle^{n\times k}$ and $D \in K\langle X\rangle^{k\times n}$. The matrix $A$ is called full if $k = n$, non-full otherwise.

Theorem 1.2. ([8, Special case of Corollary 7.5.14]) Let $X$ be an alphabet and $K$ a commutative field. The free associative algebra $R = K\langle X\rangle$ has a universal field of fractions $\mathbb{F} = K(\!\langle X\rangle\!)$ such that every full matrix over $R$ can be inverted over $\mathbb{F}$.

Remark. Non-full matrices become singular under a homomorphism into some field [8, Chapter 7]. In general (rings), neither do full matrices need to be invertible, nor do invertible matrices need to be full. An example for the former is the matrix
$$B = \begin{bmatrix} \cdot & z & -y \\ -z & \cdot & x \\ y & -x & \cdot \end{bmatrix}$$
over the commutative polynomial ring $K[x,y,z]$, which is not a Sylvester domain [5, Section 4]. An example for the latter are rings without unbounded generating number (UGN) [8, Section 7.3].

Definition 1.3. ([9, 10]) Let $f \in \mathbb{F}$.
A linear representation of $f$ is a triple $\pi_f = (u, A, v)$ with $u^\top, v \in K^{n\times 1}$ and a full matrix $A = A_0 \otimes 1 + A_1 \otimes x_1 + \ldots + A_d \otimes x_d$ with $A_\ell \in K^{n\times n}$ for all $\ell \in \{0, 1, \ldots, d\}$, such that $f = uA^{-1}v$. The dimension of $\pi_f$ is $\dim(u,A,v) = n$. It is called minimal if $A$ has the smallest possible dimension among all linear representations of $f$. The "empty" representation $\pi = (\,,\,,\,)$ is the minimal one of $0 \in \mathbb{F}$ with $\dim \pi = 0$. Let $f \in \mathbb{F}$ and $\pi$ be a minimal linear representation of $f$. Then the rank of $f$ is defined as $\operatorname{rank} f = \dim \pi$.

Definition 1.4. ([9]) Let $\pi = (u,A,v)$ be a linear representation of $f \in \mathbb{F}$ of dimension $n$. The families $(s_1, s_2, \ldots, s_n) \subseteq \mathbb{F}$ with $s_i = (A^{-1}v)_i$ and $(t_1, t_2, \ldots, t_n) \subseteq \mathbb{F}$ with $t_j = (uA^{-1})_j$ are called left family and right family respectively. $L(\pi) = \operatorname{span}\{s_1, s_2, \ldots, s_n\}$ and $R(\pi) = \operatorname{span}\{t_1, t_2, \ldots, t_n\}$ denote their linear spans (over $K$).

Proposition 1.5. ([9, Proposition 4.7]) A representation $\pi = (u,A,v)$ of an element $f \in \mathbb{F}$ is minimal if and only if both the left family and the right family are $K$-linearly independent. In this case, $L(\pi)$ and $R(\pi)$ depend only on $f$.

Remark. The left family $(A^{-1}v)_i$ (respectively the right family $(uA^{-1})_j$) and the solution vector $s$ of $As = v$ (respectively $t$ of $u = tA$) are used synonymously.

Definition 1.6. ([30]) A linear representation $\mathcal{A} = (u,A,v)$ of $f \in \mathbb{F}$ is called an admissible linear system (ALS) for $f$, written also as $As = v$, if $u = e_1 = [1, 0, \ldots, 0]$. The element $f$ is then the first component of the (unique) solution vector $s$.

Given a linear representation $\mathcal{A} = (u,A,v)$ of dimension $n$ of $f \in \mathbb{F}$ and invertible matrices $P, Q \in K^{n\times n}$, the transformed $P\mathcal{A}Q = (uQ, PAQ, Pv)$ is again a linear representation (of $f$). If $\mathcal{A}$ is an ALS, the transformation $(P,Q)$ is called admissible if the first row of $Q$ is $e_1 = [1, 0, \ldots, 0]$.

2. Free derivation

Before we define the concrete (partial) derivation and (partial) directional derivation, we start with a (partial) formal derivation on the level of admissible linear systems and show the basic properties with respect to the represented elements, in particular (in Corollary 2.3) that the (formal) derivation does not depend on the chosen ALS.

In other words: given some letter $x \in X$ and an admissible linear system $\mathcal{A} = (u,A,v)$, there is an algorithmic point of view in which the (free) derivation $\partial_x$ defines the ALS $\mathcal{A}' = \partial_x\mathcal{A}$. (Alternatively one can identify $x$ by its index $\ell \in \{1, 2, \ldots, d\}$ and write $\mathcal{A}' = \partial_\ell\mathcal{A}$.) Written in a sloppy way, we show that $\partial_x(\mathcal{A} + \mathcal{B}) = \partial_x\mathcal{A} + \partial_x\mathcal{B}$ and $\partial_x(\mathcal{A}\cdot\mathcal{B}) = \partial_x\mathcal{A}\cdot\mathcal{B} + \mathcal{A}\cdot\partial_x\mathcal{B}$, yielding immediately the algebraic point of view (summarized in Theorem 2.8) by taking the respective first components of the unique solution vectors, $f = (s_{\mathcal{A}})_1$ and $g = (s_{\mathcal{B}})_1$.

Remark. The following definition is much more general than usually needed. One gets the "classical" (partial) derivation with respect to some letter $x = x_\ell \in X$ (with $\ell \in \{1, 2, \ldots, d\}$) for $k = 0$, resp. (the empty word) $a = 1 \in X^*$.

Definition 2.1. Let $\mathcal{A} = (u,A,v)$ be an admissible linear system of dimension $n \geq 1$ for some element in the free field $\mathbb{F} = K(\!\langle X\rangle\!)$ and $\ell \neq k \in \{0, 1, \ldots, d\}$. The ALS
$$\partial_{\ell|k}\mathcal{A} = \partial_{\ell|k}(u,A,v) = \left( \begin{bmatrix} u & \cdot \end{bmatrix}, \begin{bmatrix} A & A_\ell \otimes x_k \\ \cdot & A \end{bmatrix}, \begin{bmatrix} \cdot \\ v \end{bmatrix} \right)$$
(of dimension $2n$) is called the (partial) formal derivative of $\mathcal{A}$ (with respect to $x_\ell, x_k \in \{1\} \cup X$). For $x, a \in \{1\} \cup X = \{1, x_1, x_2, \ldots, x_d\}$ with $x \neq a$ we write also $\partial_{x|a}\mathcal{A}$, having the indices $\ell \neq k \in \{0, 1, \ldots, d\}$ of $x$ resp. $a$ in mind.
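Definition 2.1 can be tested numerically: a linear pencil can be evaluated at matrices via Kronecker products, which turns the ALS into an ordinary linear system. A minimal numpy sketch, assuming the (easily verified) ALS $(e_1,\ I_4 - N\otimes x,\ e_4)$ for $p = x^3$ with the upper shift matrix $N$; it checks $\partial_{x|a}(x^3) = ax^2 + xax + x^2a$ for random matrix substitutions:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 3, 4                       # matrix size m, ALS dimension n
X = rng.standard_normal((m, m))   # value substituted for the letter x
D = rng.standard_normal((m, m))   # value substituted for the direction a

# ALS for p = x^3: (e_1, I - N⊗x, e_4), solution vector s = (x^3, x^2, x, 1)
N = np.diag(np.ones(n - 1), 1)    # upper shift matrix
A = np.eye(n * m) - np.kron(N, X) # pencil A_0⊗1 + A_x⊗x evaluated at X
Ax = -N                           # coefficient matrix of the letter x

# formal derivative per Definition 2.1: [[A, A_x⊗a], [0, A]] s = [0; v]
big = np.block([[A, np.kron(Ax, D)],
                [np.zeros((n * m, n * m)), A]])
rhs = np.zeros((2 * n * m, m))
rhs[-m:] = np.eye(m)              # v = e_n, blown up to m×m blocks
s = np.linalg.solve(big, rhs)

deriv = s[:m]                     # first block component of the solution
expected = D @ X @ X + X @ D @ X + X @ X @ D
print(np.allclose(deriv, expected))  # True
```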
Lemma 2.2. Let $\mathcal{A}_f = (u_f, A_f, v_f)$ and $\mathcal{A}_g = (u_g, A_g, v_g)$ be admissible linear systems of dimension $\dim\mathcal{A}_f \geq 1$ resp. $\dim\mathcal{A}_g \geq 1$. Fix $x, a \in \{1\} \cup X$ such that $x \neq a$. Then
$$\partial_{x|a}(\mathcal{A}_f + \mathcal{A}_g) = \partial_{x|a}\mathcal{A}_f + \partial_{x|a}\mathcal{A}_g.$$

Proof. Let $\ell \neq k \in \{0, 1, \ldots, d\}$ be the indices of $x$ resp. $a$. We write $A^\ell_f$ for the coefficient matrix $A_\ell$ of $\mathcal{A}_f$ resp. $A^\ell_g$ for $A_\ell$ of $\mathcal{A}_g$. Taking the sum from [30, Proposition 1.13] we have
$$\partial_{x|a}(\mathcal{A}_f + \mathcal{A}_g) = \partial_{x|a}\left( \begin{bmatrix} u_f & \cdot \end{bmatrix}, \begin{bmatrix} A_f & -A_f u_f^\top u_g \\ \cdot & A_g \end{bmatrix}, \begin{bmatrix} v_f \\ v_g \end{bmatrix} \right)$$
$$= \left( \begin{bmatrix} u_f & \cdot & \cdot & \cdot \end{bmatrix}, \begin{bmatrix} A_f & -A_f u_f^\top u_g & A^\ell_f \otimes a & -A^\ell_f u_f^\top u_g \otimes a \\ \cdot & A_g & \cdot & A^\ell_g \otimes a \\ \cdot & \cdot & A_f & -A_f u_f^\top u_g \\ \cdot & \cdot & \cdot & A_g \end{bmatrix}, \begin{bmatrix} \cdot \\ \cdot \\ v_f \\ v_g \end{bmatrix} \right)$$
$$= \left( \begin{bmatrix} u_f & \cdot & \cdot & \cdot \end{bmatrix}, \begin{bmatrix} A_f & A^\ell_f \otimes a & -A_f u_f^\top u_g & -A^\ell_f u_f^\top u_g \otimes a \\ \cdot & A_f & \cdot & -A_f u_f^\top u_g \\ \cdot & \cdot & A_g & A^\ell_g \otimes a \\ \cdot & \cdot & \cdot & A_g \end{bmatrix}, \begin{bmatrix} \cdot \\ v_f \\ \cdot \\ v_g \end{bmatrix} \right)$$
$$= \left( \begin{bmatrix} u_f & \cdot & \cdot & \cdot \end{bmatrix}, \begin{bmatrix} A_f & A^\ell_f \otimes a & -A_f u_f^\top u_g & 0 \\ \cdot & A_f & \cdot & 0 \\ \cdot & \cdot & A_g & A^\ell_g \otimes a \\ \cdot & \cdot & \cdot & A_g \end{bmatrix}, \begin{bmatrix} \cdot \\ v_f \\ \cdot \\ v_g \end{bmatrix} \right)$$
$$= \left( \begin{bmatrix} u_f & \cdot \end{bmatrix}, \begin{bmatrix} A_f & A^\ell_f \otimes a \\ \cdot & A_f \end{bmatrix}, \begin{bmatrix} \cdot \\ v_f \end{bmatrix} \right) + \left( \begin{bmatrix} u_g & \cdot \end{bmatrix}, \begin{bmatrix} A_g & A^\ell_g \otimes a \\ \cdot & A_g \end{bmatrix}, \begin{bmatrix} \cdot \\ v_g \end{bmatrix} \right) = \partial_{x|a}\mathcal{A}_f + \partial_{x|a}\mathcal{A}_g.$$
The two main steps are swapping block rows 2 and 3 and block columns 2 and 3, and eliminating the single non-zero (first) column in $-A^\ell_f u_f^\top u_g \otimes a$ and $-A_f u_f^\top u_g$ (in block column 4) using the first column in block column 2.

Corollary 2.3. Let $f, g \in \mathbb{F}$ be given by the admissible linear systems $\mathcal{A}_f = (u_f, A_f, v_f)$ and $\mathcal{A}_g = (u_g, A_g, v_g)$ of dimensions $n_f, n_g \geq 1$ respectively. Fix $x, a \in \{1\} \cup X$ such that $x \neq a$. Then $f = g$ implies that $\partial_{x|a}\mathcal{A}_f - \partial_{x|a}\mathcal{A}_g$ is an ALS for $0 \in \mathbb{F}$.

Definition 2.4. Let $f \in \mathbb{F}$ be given by the ALS $\mathcal{A} = (u,A,v)$ and fix $x, a \in \{1\} \cup X = \{1, x_1, x_2, \ldots, x_d\}$ such that $x \neq a$. Denote by $\partial_{x|a}f$ the element defined by the ALS $\partial_{x|a}\mathcal{A}$. The map $\partial_{x|a} : \mathbb{F} \to \mathbb{F}$, $f \mapsto \partial_{x|a}f$ is called (partial) formal derivation, the element $\partial_{x|a}f$ (partial) formal derivative of $f$.

Corollary 2.5. For each $x, a \in \{1\} \cup X$ with $x \neq a$, the formal derivation $\partial_{x|a} : \mathbb{F} \to \mathbb{F}$ is a linear map.

Now we are almost done. Before we show the product rule in the following Lemma 2.6, we have a look into the left family of the ALS of the (formal) derivative of a polynomial. Let $p = x^3 \in \mathbb{F}$ (and $a = 1$). An ALS for $\partial_{x|1}p$, obtained via Definition 2.1 from the minimal ALS for $p$, is given by
$$\begin{bmatrix}
1 & -x & \cdot & \cdot & 0 & -1 & \cdot & \cdot \\
\cdot & 1 & -x & \cdot & \cdot & 0 & -1 & \cdot \\
\cdot & \cdot & 1 & -x & \cdot & \cdot & 0 & -1 \\
\cdot & \cdot & \cdot & 1 & \cdot & \cdot & \cdot & 0 \\
\cdot & \cdot & \cdot & \cdot & 1 & -x & \cdot & \cdot \\
\cdot & \cdot & \cdot & \cdot & \cdot & 1 & -x & \cdot \\
\cdot & \cdot & \cdot & \cdot & \cdot & \cdot & 1 & -x \\
\cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & 1
\end{bmatrix} s = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \\ \cdot \\ \cdot \\ \cdot \\ 1 \end{bmatrix}, \qquad s = \begin{bmatrix} 3x^2 \\ 2x \\ 1 \\ 0 \\ x^3 \\ x^2 \\ x \\ 1 \end{bmatrix}.$$
Notice that the first four entries in the left family of $\partial_{x|1}\mathcal{A}$ are $s_i = \partial_{x|1}s_{i+4}$.

Lemma 2.6. (Product rule) Let $f, g \in \mathbb{F}$ be given by the admissible linear systems $\mathcal{A}_f = (u_f, A_f, v_f)$ and $\mathcal{A}_g = (u_g, A_g, v_g)$ of dimension $n_f, n_g \geq 1$ respectively. Fix $x \in X$ and $a \in \{1\} \cup X \setminus \{x\}$. Then
$$\partial_{x|a}(fg) = \partial_{x|a}f\,g + f\,\partial_{x|a}g.$$

Proof. Let $\ell \neq k \in \{0, 1, \ldots, d\}$ be the indices of $x$ resp. $a$. We write $A^\ell_f$ for the coefficient matrix $A_\ell$ of $\mathcal{A}_f$ resp. $A^\ell_g$ for $A_\ell$ of $\mathcal{A}_g$. We take the sum and the product from [30, Proposition 1.13] and start with the ALS from the right hand side,
$$\begin{bmatrix}
A_f & A^\ell_f \otimes a & 0 & -A_f u_f^\top u_g & \cdot & \cdot \\
\cdot & A_f & -v_f u_g & \cdot & \cdot & \cdot \\
\cdot & \cdot & A_g & \cdot & \cdot & \cdot \\
\cdot & \cdot & \cdot & A_f & -v_f u_g & \cdot \\
\cdot & \cdot & \cdot & \cdot & A_g & A^\ell_g \otimes a \\
\cdot & \cdot & \cdot & \cdot & \cdot & A_g
\end{bmatrix} s = \begin{bmatrix} \cdot \\ \cdot \\ v_g \\ \cdot \\ \cdot \\ v_g \end{bmatrix},$$
subtract block row 6 from block row 3, add block column 3 to block column 6 and remove block row/column 3 to get the ALS
$$\begin{bmatrix}
A_f & A^\ell_f \otimes a & -A_f u_f^\top u_g & \cdot & \cdot \\
\cdot & A_f & \cdot & \cdot & -v_f u_g \\
\cdot & \cdot & A_f & -v_f u_g & \cdot \\
\cdot & \cdot & \cdot & A_g & A^\ell_g \otimes a \\
\cdot & \cdot & \cdot & \cdot & A_g
\end{bmatrix} s = \begin{bmatrix} \cdot \\ \cdot \\ \cdot \\ \cdot \\ v_g \end{bmatrix}.$$
Now we can add block row 3 to block row 1, eliminate the remaining columns in block (1,3) by the columns $\{2, 3, \ldots, n_f\}$ from block (1,1), and remove block row/column 3 to get the ALS
$$\begin{bmatrix}
A_f & A^\ell_f \otimes a & -v_f u_g & \cdot \\
\cdot & A_f & \cdot & -v_f u_g \\
\cdot & \cdot & A_g & A^\ell_g \otimes a \\
\cdot & \cdot & \cdot & A_g
\end{bmatrix} s = \begin{bmatrix} \cdot \\ \cdot \\ \cdot \\ v_g \end{bmatrix}.$$
Swapping block rows 2 and 3 and block columns 2 and 3 yields the ALS
$$\begin{bmatrix}
A_f & -v_f u_g & A^\ell_f \otimes a & 0 \\
\cdot & A_g & \cdot & A^\ell_g \otimes a \\
\cdot & \cdot & A_f & -v_f u_g \\
\cdot & \cdot & \cdot & A_g
\end{bmatrix} s = \begin{bmatrix} \cdot \\ \cdot \\ \cdot \\ v_g \end{bmatrix}$$
of the left hand side $\partial_{x|a}(fg)$. Notice the upper right zero in the system matrix, which is there because of $x \neq 1$.

Definition 2.7. Let $f \in \mathbb{F}$, $x \in X$ and $a \in X \setminus \{x\}$. The element $\partial_x f := \partial_{x|1}f$ is called the partial derivative of $f$. The element $\partial_{x|a}f$ is called the (partial) directional derivative of $f$ (with respect to $a$).

Theorem 2.8. (Free derivation) Let $x \in X$. Then the (partial) free derivation $\partial_x : \mathbb{F} \to \mathbb{F} = K(\!\langle X\rangle\!)$ is the unique map with the properties
• $\partial_x h = 0$ for all $h \in K(\!\langle X \setminus \{x\}\rangle\!)$,
• $\partial_x x = 1$, and
• $\partial_x(fg) = \partial_x f\,g + f\,\partial_x g$ for all $f, g \in \mathbb{F} = K(\!\langle X\rangle\!)$.

Proof. Let $h$ be given by the ALS $\mathcal{A} = (u,A,v)$ and let $\ell \in \{1, 2, \ldots, d\}$ such that $x = x_\ell$. We just need to recall the ALS for $\partial_x h$,
$$\begin{bmatrix} A & A_\ell \otimes 1 \\ \cdot & A \end{bmatrix} \begin{bmatrix} s' \\ s'' \end{bmatrix} = \begin{bmatrix} \cdot \\ v \end{bmatrix}$$
and observe that $A_\ell = 0$ and thus $As' = 0$, in particular the first component of $s'$ vanishes. Therefore $\partial_x h = 0$. For $\partial_x x = 1$ we need to minimize
$$\begin{bmatrix} 1 & -x & \cdot & -1 \\ \cdot & 1 & \cdot & \cdot \\ \cdot & \cdot & 1 & -x \\ \cdot & \cdot & \cdot & 1 \end{bmatrix} s = \begin{bmatrix} \cdot \\ \cdot \\ \cdot \\ 1 \end{bmatrix}.$$
And the product rule $\partial_x(fg) = \partial_x f\,g + f\,\partial_x g$ is due to Lemma 2.6. For the uniqueness we assume that there exists another $\partial'_x : \mathbb{F} \to \mathbb{F}$ with the same properties. From the product rule we obtain
$$\partial_x(xf) = f + x\,\partial_x f = f + x\,\partial'_x f = \partial'_x(xf),$$
that is, $x(\partial_x f - \partial'_x f) = 0$ for all $f \in \mathbb{F}$, thus $\partial_x = \partial'_x$.

Corollary 2.9. (Hausdorff derivation [27]) Let $x \in X$. Then $\partial_x\kappa = 0$ for all $\kappa \in K$, $\partial_x y = 0$ for all $y \in X \setminus \{x\}$, $\partial_x x = 1$, and $\partial_x(fg) = \partial_x f\,g + f\,\partial_x g$ for all $f, g \in K\langle X\rangle$.

Remark. More generally [8, Theorem 7.5.17]: "Any derivation of a Sylvester domain extends to a derivation of its universal field of fractions." Recall however that the cyclic derivative is not (from) a derivation [27, Section 1]. For a discussion of cyclic derivatives of nc algebraic power series we refer to [24].

Proposition 2.10. Let $f \in \mathbb{F}$, $x, y \in X$ and $a, b \in X \setminus \{x, y\}$ with $a = b$ if and only if $x = y$. Then $\partial_{x|a}(\partial_{y|b}f) = \partial_{y|b}(\partial_{x|a}f)$, that is, the (partial) derivations $\partial_{x|a}$ and $\partial_{y|b}$ commute.

Proof. Let $l, k \in \{1, 2, \ldots, d\}$ be the indices of $x$ resp. $y$. There is nothing to show for the trivial case $x = y$, thus we can assume $l \neq k$. Let $f$ be given by the admissible linear system $\mathcal{A} = (u,A,v)$. Then the "left" ALS $\partial_{x|a}(\partial_{y|b}\mathcal{A})$ is
$$\begin{bmatrix}
A & A_k \otimes b & A_l \otimes a & 0 \\
\cdot & A & 0 & A_l \otimes a \\
\cdot & \cdot & A & A_k \otimes b \\
\cdot & \cdot & \cdot & A
\end{bmatrix} s = \begin{bmatrix} \cdot \\ \cdot \\ \cdot \\ v \end{bmatrix}.$$
Swapping block rows/columns 2 and 3 yields the "right" ALS $\partial_{y|b}(\partial_{x|a}\mathcal{A})$:
$$\begin{bmatrix}
A & A_l \otimes a & A_k \otimes b & 0 \\
\cdot & A & 0 & A_k \otimes b \\
\cdot & \cdot & A & A_l \otimes a \\
\cdot & \cdot & \cdot & A
\end{bmatrix} s = \begin{bmatrix} \cdot \\ \cdot \\ \cdot \\ v \end{bmatrix}.$$

Since there is no danger of ambiguity, we can define the (free) "higher" derivative of $f \in \mathbb{F}$ as $\partial_w f$ for each word $w$ in the free monoid $X^*$, with the "trivial" derivative $\partial_1 f = f$. Let $w \in X^*$ and $\sigma(w)$ denote any permutation of the letters of $w$. Then $\partial_w f = \partial_{\sigma(w)}f$. The proof for nc formal power series in [23, Proposition 1.8] is based on words (monomials), that is, $\partial_x(\partial_y w) = \partial_y(\partial_x w) \in K\langle X\rangle$. Recall that one gets the (nc) rational series by intersecting the (nc) formal series and the free field [26, Section 9]:
$$K\langle\!\langle X\rangle\!\rangle \cap K(\!\langle X\rangle\!) = K^{\mathrm{rat}}\langle\!\langle X\rangle\!\rangle.$$

Overall, (free) nc derivation does not appear that often in the literature. And when there is some discussion it is (almost) always connected with "not simple" [27, Section 1], "complicated" [16, Section 14.3], etc. This is however not due to the Hausdorff derivation but to the use of (finite) formal series as representation (for nc polynomials).
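The word-level identity $\partial_x(\partial_y w) = \partial_y(\partial_x w)$ used in [23, Proposition 1.8] is also easy to verify mechanically with the dictionary encoding from the introduction (an illustration only):

```python
from collections import defaultdict
from itertools import product

def deriv(p, x):
    """Hausdorff derivative on {word: coeff} dicts (word = tuple of letters)."""
    r = defaultdict(int)
    for w, a in p.items():
        for i, letter in enumerate(w):
            if letter == x:
                r[w[:i] + w[i + 1:]] += a
    return {w: c for w, c in r.items() if c}

# ∂x∂y = ∂y∂x on every word of length 5 over {x, y, z}
print(all(deriv(deriv({w: 1}, 'x'), 'y') == deriv(deriv({w: 1}, 'y'), 'x')
          for w in product('xyz', repeat=5)))   # True
```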
Using linear representations in the sense of Cohn and Reutenauer [9] for elements in the free field $\mathbb{F}$ can even reveal additional structure, as indicated in Example 2.11 (below). For the somewhat more "complicated" example
$$\tilde{p} = 3cyxb + 3xbyxb + 2cyxax + cybxb - cyaxb - 2xbyxax + 4xbybxb - 3xbyaxb + 3xaxyxb - 3bxbyxb + 6axbyxb + 2xaxyxax + xaxybxb - xaxyaxb - 2bxbyxax - bxbybxb + bxbyaxb + 5axbybxb - 4axbyaxb$$
from [3, Section 8.2] we refer to [31, Example 3.7].

Example 2.11. Let $p = xyzx$. A (minimal) polynomial ALS for $p$ is
$$\begin{bmatrix}
1 & -x & \cdot & \cdot & \cdot \\
\cdot & 1 & -y & \cdot & \cdot \\
\cdot & \cdot & 1 & -z & \cdot \\
\cdot & \cdot & \cdot & 1 & -x \\
\cdot & \cdot & \cdot & \cdot & 1
\end{bmatrix} s = \begin{bmatrix} \cdot \\ \cdot \\ \cdot \\ \cdot \\ 1 \end{bmatrix}.$$
Then $\partial_x p = xyz + yzx$ admits a factorization into matrices [31, Section 3]:
$$\partial_x p = \begin{bmatrix} x & y \end{bmatrix} \begin{bmatrix} y & \cdot \\ \cdot & z \end{bmatrix} \begin{bmatrix} z \\ x \end{bmatrix}.$$
A minimal ALS for $\partial_x p$ is
$$\begin{bmatrix}
1 & -x & -y & \cdot & \cdot & \cdot \\
\cdot & 1 & \cdot & -y & 0 & \cdot \\
\cdot & \cdot & 1 & 0 & -z & \cdot \\
\cdot & \cdot & \cdot & 1 & \cdot & -z \\
\cdot & \cdot & \cdot & \cdot & 1 & -x \\
\cdot & \cdot & \cdot & \cdot & \cdot & 1
\end{bmatrix} s = \begin{bmatrix} \cdot \\ \cdot \\ \cdot \\ \cdot \\ \cdot \\ 1 \end{bmatrix}.$$

Example 2.12. We will use the directional derivative later in Section 4 for the (nc) Newton iteration. For $q = x^2$ we get the "Sylvester equation" $\partial_{x|a}q = xa + ax$, which is linear in $a$.

In (the next) Section 3 we have a look at the "free" chain rule, which will turn out to be very elegant. We avoid the term "function" here since one needs to be careful with respect to evaluation (domain of definition), e.g., $f = (xy - yx)^{-1}$ is not defined for diagonal matrices. For the efficient evaluation of polynomials (by matrices) one can use Horner systems [31]. A generalization to elements in free fields is considered in future work. If there is a "compositional structure" available (in admissible linear systems) it could be used to further optimize evaluation.

Remark. For details on minimization (of linear representations) we refer to [32]. Notice in particular that the construction (of the formal derivative) in Definition 2.1 preserves refined pivot blocks. Therefore, if $\mathcal{A}$ is refined, linear (algebraic) techniques suffice for the minimization of $\partial_x\mathcal{A}$.

Last but not least, given the alphabet $X = \{x_1, x_2, \ldots, x_d\}$, we can define the "free" (canonical) gradient
$$\nabla f = [\partial_1 f, \partial_2 f, \ldots, \partial_d f]^\top = [\partial_{x_1}f, \partial_{x_2}f, \ldots, \partial_{x_d}f]^\top \in \mathbb{F}^d$$
for some $f \in K(\!\langle X\rangle\!)$. Cyclic gradients are discussed in [34]. And for a "vector valued" element $f = (f_1, f_2, \ldots, f_d)$ we can define the Jacobian matrix $J(f) = (\partial_j f_i)_{i,j=1}^d = (\partial_{x_j}f_i)_{i,j=1}^d$. For a discussion of non-commutative Jacobian matrices on the level of nc formal power series we refer to [25].
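Before turning to composition, the free gradient is easy to illustrate with the same word encoding (again only a sketch; Example 2.11 above gives $\partial_x p$ for $p = xyzx$, and the other two partial derivatives come out alongside):

```python
from collections import defaultdict

def deriv(p, x):
    """Hausdorff derivative on {word: coeff} dicts."""
    r = defaultdict(int)
    for w, a in p.items():
        for i, letter in enumerate(w):
            if letter == x:
                r[w[:i] + w[i + 1:]] += a
    return {w: c for w, c in r.items() if c}

p = {tuple('xyzx'): 1}                  # p = xyzx from Example 2.11
grad = {v: deriv(p, v) for v in 'xyz'}  # free gradient (∂x p, ∂y p, ∂z p)
print(grad['x'])   # {('y','z','x'): 1, ('x','y','z'): 1} = yzx + xyz
print(grad['y'])   # {('x','z','x'): 1} = xzx
print(grad['z'])   # {('x','y','x'): 1} = xyx
```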
3. Free composition

To be able to formulate a (partial) "free" chain rule in an elegant way, we need a suitable notation. It will turn out that Cohn's admissible systems [8, Section 7.1] provide the perfect framework for the "expansion" of letters by elements from another free field.

First we recall the "classical" (analytical) chain rule: Let $X$, $Y$ and $Z$ be (open) sets, $f = f(x)$ differentiable on $X$, $g = g(y)$ differentiable on $Y$ and $h = h(x) = g(f(x))$, that is, $X \xrightarrow{f} Y \xrightarrow{g} Z$ with $h : X \to Z$. Then
$$\frac{d}{dx}h = \frac{d}{df}g\,\frac{d}{dx}f \quad\text{resp.}\quad h'(x) = g'\bigl(f(x)\bigr)\,f'(x).$$

Notation. For a fixed $d \in \mathbb{N}$ let $X = \{x_1, x_2, \ldots, x_d\}$, $Y = \{y_1, y_2, \ldots, y_d\}$ and $Y' = \{y'_1, y'_2, \ldots, y'_d\}$ be pairwise disjoint alphabets, that is, $X \cap Y = X \cap Y' = Y \cap Y' = \emptyset$. By $\mathbb{F}_Z$ we denote the free field $K(\!\langle Z\rangle\!)$. Let $A \in K\langle Y\rangle^{n\times n}$ be a linear full matrix and $f = (f_1, f_2, \ldots, f_d)$ a $d$-tuple of elements $f_i \in \mathbb{F}_X$. By $A \circ f$ we denote the (not necessarily full) $n\times n$ matrix over $\mathbb{F}_X$ where each letter $y_i \in Y$ is replaced by the corresponding $f_i \in \mathbb{F}_X$, that is,
$$A \circ f = A_0 \otimes 1 + A_1 \otimes f_1 + \ldots + A_d \otimes f_d \in \mathbb{F}_X^{n\times n}.$$
We write $\mathcal{A} \circ \mathcal{A}_f = \mathcal{A} \circ (\mathcal{A}_{f_1}, \mathcal{A}_{f_2}, \ldots, \mathcal{A}_{f_d})$ for a linearized version induced by the $d$-tuple of admissible linear systems $\mathcal{A}_{f_i} = (u_{f_i}, A_{f_i}, v_{f_i})$.

Now let $g \in \mathbb{F}_Y$ be given by the ALS $\mathcal{A}_g = (u_g, A_g, v_g)$ and $f = (f_1, f_2, \ldots, f_d) \in \mathbb{F}_X^d$ such that $A_g \circ f$ is full. Then (the unique element) $h = g \circ f \in \mathbb{F}_X$ is defined by the admissible system $\tilde{\mathcal{A}}_h = (u_g, A_g \circ f, v_g)$ and we write
$$\mathcal{A}_h = (u_h, A_h, v_h) = \bigl(u_g \circ \mathcal{A}_f,\ A_g \circ \mathcal{A}_f,\ v_g \circ \mathcal{A}_f\bigr) =: \mathcal{A}_g \circ \mathcal{A}_f$$
for a linearized version using "linearization by enlargement" [8, Section 5.8]. (For details we refer to the proof of Proposition 3.2 below.) Fixing some $x \in X$, we get the (partial) derivative $\partial_x h$ of $h$ via the derivative $\partial_x\mathcal{A}_h$,
$$\begin{bmatrix} A_h & A^x_h \otimes 1 \\ \cdot & A_h \end{bmatrix} s = \begin{bmatrix} \cdot \\ v_h \end{bmatrix}.$$

Figure 1 (a commutative diagram relating the two derivation paths; the arrow diagram is omitted here): Let $f = (f_1, \ldots, f_d)$ with $f_i \in \mathbb{F}_X = K(\!\langle X\rangle\!)$ given by the $d$-tuple of admissible linear systems $\mathcal{A}_f = (\mathcal{A}_1, \ldots, \mathcal{A}_d)$ and $g \in \mathbb{F}_Y$ given by $\mathcal{A}_g = (u_g, A_g, v_g)$ such that the system matrix $A_g$ remains full when we replace each letter $y_i \in Y$ by the respective element $f_i$, written as $A_g \circ f$. Then $h = g \circ f \in \mathbb{F}_X$ is defined by the admissible system $\tilde{\mathcal{A}}_h = (u_g, A_g \circ f, v_g)$, which we linearize to obtain an ALS $\mathcal{A}_h$ for $h$ before we apply the (partial) derivation $\partial_x$ (left path). On the other hand, we can apply the (directional) derivation $\partial_{y|y'}$ by going over to the free field $\mathbb{F}_{Y\cup Y'} = K(\!\langle Y \cup Y'\rangle\!)$ with an extended alphabet (with "placeholders" $y'_i$), yielding $g' = \partial_{y_1|y'_1}g + \ldots + \partial_{y_d|y'_d}g$ given by some ALS $\mathcal{A}'_g = (u'_g, A'_g, v'_g)$. Then $h' = g' \circ (f, f') \in \mathbb{F}_X$ is defined by the admissible system $\tilde{\mathcal{A}}'_h = \bigl(u'_g, A'_g \circ (f, f'), v'_g\bigr)$, where also each letter $y'_i \in Y'$ is replaced by the respective element $f'_i = \partial_x f_i \in \mathbb{F}_X$. After linearization we get an ALS $\mathcal{A}'_h$ such that $\partial_x\mathcal{A}_h - \mathcal{A}'_h = 0$, that is, $\partial_x h = h'$ (right path).

To give a meaning to the right hand side of $\partial_x h = \partial_x(g \circ f)$, we introduce the "total" (directional) derivative
$$g' := \partial_{y|y'}g = \partial_{y_1|y'_1}g + \partial_{y_2|y'_2}g + \ldots + \partial_{y_d|y'_d}g \in \mathbb{F}_{Y\cup Y'}$$
given by the ALS
$$\mathcal{A}'_g := \partial_{y|y'}\mathcal{A}_g = \left( \begin{bmatrix} u_g & \cdot \end{bmatrix}, \begin{bmatrix} A_g & \sum_{i=1}^d A^{(i)}_g \otimes y'_i \\ \cdot & A_g \end{bmatrix}, \begin{bmatrix} \cdot \\ v_g \end{bmatrix} \right)$$
with letters $y'_i \in Y'$. Using a similar notation for the derivatives
$$f' = (f'_1, f'_2, \ldots, f'_d) := (\partial_x f_1, \partial_x f_2, \ldots, \partial_x f_d) = \partial_x f,$$
we can write
$$h' := \partial_x(g \circ f) = g' \circ (f, f') = \partial_{y|y'}g \circ (f, f') \tag{3.1}$$
given by the admissible system $\mathcal{A}'_g \circ (f, f') = \partial_{y|y'}\mathcal{A}_g \circ (f, f')$. After an illustration in the following example, we show in Proposition 3.2 that indeed
$$\partial_x h = h' = \partial_x(g \circ f) = \partial_{y|\partial_x}g \circ f,$$
using an abbreviation for the right hand side of (3.1). For an overview see Figure 1.

Remark. Cohn writes an admissible system $\mathcal{A} = (u,A,v)$ as a "block" $[A, v]$ with a not necessarily scalar column $v$ [8, Section 7.1]. A ring homomorphism which preserves fullness of matrices is called honest [8, Section 5.4].

Remark. It is crucial that $A_g \circ f$ is full to be able to define the composition. On a purely algebraic level this is sufficient to define the (partial) chain rule. For a $d$-tuple $g = (g_1, g_2, \ldots, g_d)$ given by admissible linear systems $\mathcal{A}_g = (\mathcal{A}_{g_1}, \mathcal{A}_{g_2}, \ldots, \mathcal{A}_{g_d})$ with $\mathcal{A}_{g_i} = (u_{g_i}, A_{g_i}, v_{g_i})$ we need $A_{g_i} \circ f$ full for all $i \in \{1, 2, \ldots, d\}$.

Example 3.1. Here we take $X = \{x, y\}$, $Y = \{f, p\}$ and abuse notation. Let $f = (x^{-1} + y)^{-1} \in \mathbb{F}_X$, $p = xy \in \mathbb{F}_X$ and $g = pfp \in \mathbb{F}_Y$.
Then $h = g \circ (f, p) \in \mathbb{F}_X$ is given by the (minimal) ALS
$$\begin{bmatrix}
1 & -x & \cdot & \cdot & \cdot & \cdot \\
\cdot & 1 & -y & \cdot & \cdot & \cdot \\
\cdot & \cdot & 1 & -x & \cdot & \cdot \\
\cdot & \cdot & y & 1 & -x & \cdot \\
\cdot & \cdot & \cdot & \cdot & 1 & -y \\
\cdot & \cdot & \cdot & \cdot & \cdot & 1
\end{bmatrix} s = \begin{bmatrix} \cdot \\ \cdot \\ \cdot \\ \cdot \\ \cdot \\ 1 \end{bmatrix},$$
and $\partial_x h = \partial_x(pfp) = \partial_x p\,fp + p\,\partial_x f\,p + pf\,\partial_x p$ by
$$\begin{bmatrix}
1 & -x & \cdot & \cdot & \cdot & \cdot & \cdot & -1 & \cdot & \cdot & \cdot & \cdot \\
\cdot & 1 & -y & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot \\
\cdot & \cdot & 1 & -x & \cdot & \cdot & \cdot & \cdot & \cdot & -1 & \cdot & \cdot \\
\cdot & \cdot & y & 1 & -x & \cdot & \cdot & \cdot & \cdot & \cdot & -1 & \cdot \\
\cdot & \cdot & \cdot & \cdot & 1 & -y & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot \\
\cdot & \cdot & \cdot & \cdot & \cdot & 1 & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot \\
\cdot & \cdot & \cdot & \cdot & \cdot & \cdot & 1 & -x & \cdot & \cdot & \cdot & \cdot \\
\cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & 1 & -y & \cdot & \cdot & \cdot \\
\cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & 1 & -x & \cdot & \cdot \\
\cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & y & 1 & -x & \cdot \\
\cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & 1 & -y \\
\cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & 1
\end{bmatrix} s = \begin{bmatrix} \cdot \\ \cdot \\ \cdot \\ \cdot \\ \cdot \\ \cdot \\ \cdot \\ \cdot \\ \cdot \\ \cdot \\ \cdot \\ 1 \end{bmatrix}. \tag{3.2}$$
On the other hand, $\partial_x(\partial_f g + \partial_p g)$ is given by (the admissible system)
$$\begin{bmatrix}
1 & -p & \cdot & \cdot & \cdot & -\partial_x p & \cdot & \cdot \\
\cdot & 1 & -f & \cdot & \cdot & \cdot & -\partial_x f & \cdot \\
\cdot & \cdot & 1 & -p & \cdot & \cdot & \cdot & -\partial_x p \\
\cdot & \cdot & \cdot & 1 & \cdot & \cdot & \cdot & \cdot \\
\cdot & \cdot & \cdot & \cdot & 1 & -p & \cdot & \cdot \\
\cdot & \cdot & \cdot & \cdot & \cdot & 1 & -f & \cdot \\
\cdot & \cdot & \cdot & \cdot & \cdot & \cdot & 1 & -p \\
\cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & 1
\end{bmatrix} s = \begin{bmatrix} \cdot \\ \cdot \\ \cdot \\ \cdot \\ \cdot \\ \cdot \\ \cdot \\ 1 \end{bmatrix}.$$
The summands $\partial_x p\,fp$ and $pf\,\partial_x p$ are easy to read off in the ALS (3.2). (Alternatively one could add row 8 to row 1 resp. row 11 to row 4.) To read off the middle summand $p\,\partial_x f\,p = p\,(1 + xy)^{-1}(1 + yx)^{-1}\,p$, we just need to recall the (minimal) ALS
$$\begin{bmatrix}
1 & -x & \cdot & -1 \\
y & 1 & \cdot & \cdot \\
\cdot & \cdot & 1 & -x \\
\cdot & \cdot & y & 1
\end{bmatrix} s = \begin{bmatrix} \cdot \\ \cdot \\ \cdot \\ 1 \end{bmatrix}.$$

In other words, with $g \in \mathbb{F}_Y$ given by the ALS $\mathcal{A}_g = (u_g, A_g, v_g)$ of dimension $n$: $\partial_x h$ is given by the admissible system $\partial_{y|\partial_x}\mathcal{A}_g$,
$$\begin{bmatrix} A_g & \partial_x A_g \\ \cdot & A_g \end{bmatrix} s = \begin{bmatrix} \cdot \\ v_g \end{bmatrix}$$
of dimension $2n$. Notice that $\partial_x A_g$ is understood here in a purely symbolic way, that is, over $\mathbb{F}_{Y\cup Y'}$ with additional letters "$\partial_x y$" in $Y'$ for each $y \in Y$. In this sense $\partial_{y|\partial_x}\mathcal{A}_g$ is actually linear.

Proposition 3.2. (Free chain rule) Let $X = \{x_1, \ldots, x_d\}$, $Y = \{y_1, \ldots, y_d\}$ and $Y' = \{y'_1, \ldots, y'_d\}$ be pairwise disjoint alphabets and fix $x \in X$. Let $g \in K(\!\langle Y\rangle\!)$ be given by the ALS $\mathcal{A}_g = (u_g, A_g, v_g)$ and let $f = (f_1, f_2, \ldots, f_d) \in K(\!\langle X\rangle\!)^d$ be such that $A_g \circ f$ is full. Denote by $f'$ the $d$-tuple $(\partial_x f_1, \partial_x f_2, \ldots, \partial_x f_d)$. Then
$$\partial_x(g \circ f) = \partial_{y|y'}g \circ (f, f') =: \partial_{y|\partial_x}g \circ f.$$

Proof. For $i \in \{1, 2, \ldots, d\}$ let $f_i \in \mathbb{F}_X$ be given by the admissible linear systems $\mathcal{A}_{f_i} = (u_{f_i}, A_{f_i}, v_{f_i})$ respectively. In the following we assume, without loss of generality, $d = 3$, and decompose $A_g \circ f$ into
$$\begin{bmatrix} A_{11} & A_{12} \\ A_{21} & a \end{bmatrix}$$
of size $n$ with $a = \alpha_0 + \alpha_1 f_1 + \alpha_2 f_2 + \alpha_3 f_3$. A generalization of the "linearization by enlargement" [8, Section 5.8] is then to start with the full matrix $A_g \circ f \oplus A_{f_1} \oplus \ldots \oplus A_{f_d}$, add $u_{f_i}A^{-1}_{f_i}$ from the corresponding block row to row $n$, and $-\alpha_i A^{-1}_{f_i}v_{f_i}$ from the corresponding block column to column $n$ in the upper left block (these transformations preserve fullness):
$$\begin{bmatrix}
A_{11} & A_{12} & \cdot & \cdot & \cdot \\
A_{21} & \alpha_0 & u_{f_1} & u_{f_2} & u_{f_3} \\
\cdot & -\alpha_1 v_{f_1} & A_{f_1} & \cdot & \cdot \\
\cdot & -\alpha_2 v_{f_2} & \cdot & A_{f_2} & \cdot \\
\cdot & -\alpha_3 v_{f_3} & \cdot & \cdot & A_{f_3}
\end{bmatrix}.$$
A partially linearized system matrix for $\partial_x(\mathcal{A}_g \circ f)$ is
$$\begin{bmatrix}
A_{11} & A_{12} & \cdot & \cdot & \cdot & * & * & \cdot & \cdot & \cdot \\
A_{21} & \alpha_0 & u_{f_1} & u_{f_2} & u_{f_3} & * & 0 & 0 & 0 & 0 \\
\cdot & -\alpha_1 v_{f_1} & A_{f_1} & \cdot & \cdot & \cdot & 0 & A^x_{f_1}\otimes 1 & \cdot & \cdot \\
\cdot & -\alpha_2 v_{f_2} & \cdot & A_{f_2} & \cdot & \cdot & 0 & \cdot & A^x_{f_2}\otimes 1 & \cdot \\
\cdot & -\alpha_3 v_{f_3} & \cdot & \cdot & A_{f_3} & \cdot & 0 & \cdot & \cdot & A^x_{f_3}\otimes 1 \\
\cdot & \cdot & \cdot & \cdot & \cdot & A_{11} & A_{12} & \cdot & \cdot & \cdot \\
\cdot & \cdot & \cdot & \cdot & \cdot & A_{21} & \alpha_0 & u_{f_1} & u_{f_2} & u_{f_3} \\
\cdot & \cdot & \cdot & \cdot & \cdot & \cdot & -\alpha_1 v_{f_1} & A_{f_1} & \cdot & \cdot \\
\cdot & \cdot & \cdot & \cdot & \cdot & \cdot & -\alpha_2 v_{f_2} & \cdot & A_{f_2} & \cdot \\
\cdot & \cdot & \cdot & \cdot & \cdot & \cdot & -\alpha_3 v_{f_3} & \cdot & \cdot & A_{f_3}
\end{bmatrix}. \tag{3.3}$$
On the other hand, the system matrix of $\partial_{y_i|y'_i}\mathcal{A}_g$ is
$$\begin{bmatrix}
A_{11} & A_{12} & A^{y_i}_{11}\otimes y'_i & A^{y_i}_{12}\otimes y'_i \\
A_{21} & a & A^{y_i}_{21}\otimes y'_i & \alpha_i y'_i \\
\cdot & \cdot & A_{11} & A_{12} \\
\cdot & \cdot & A_{21} & a
\end{bmatrix},$$
and the corresponding partially linearized system matrix of $(\partial_{y_i|y'_i}\mathcal{A}_g) \circ (f, f'_i)$ is
$$\begin{bmatrix}
A_{11} & A_{12} & \cdot & \cdot & \cdot & A^{y_i}_{11}\otimes f'_i & A^{y_i}_{12}\otimes f'_i & \cdot & \cdot & \cdot \\
A_{21} & \alpha_0 & u_{f_1} & u_{f_2} & u_{f_3} & A^{y_i}_{21}\otimes f'_i & \boxed{\alpha_i f'_i} & \cdot & \cdot & \cdot \\
\cdot & -\alpha_1 v_{f_1} & A_{f_1} & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot \\
\cdot & -\alpha_2 v_{f_2} & \cdot & A_{f_2} & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot \\
\cdot & -\alpha_3 v_{f_3} & \cdot & \cdot & A_{f_3} & \cdot & \cdot & \cdot & \cdot & \cdot \\
\cdot & \cdot & \cdot & \cdot & \cdot & A_{11} & A_{12} & \cdot & \cdot & \cdot \\
\cdot & \cdot & \cdot & \cdot & \cdot & A_{21} & \alpha_0 & u_{f_1} & u_{f_2} & u_{f_3} \\
\cdot & \cdot & \cdot & \cdot & \cdot & \cdot & -\alpha_1 v_{f_1} & A_{f_1} & \cdot & \cdot \\
\cdot & \cdot & \cdot & \cdot & \cdot & \cdot & -\alpha_2 v_{f_2} & \cdot & A_{f_2} & \cdot \\
\cdot & \cdot & \cdot & \cdot & \cdot & \cdot & -\alpha_3 v_{f_3} & \cdot & \cdot & A_{f_3}
\end{bmatrix}.$$
To eliminate the boxed entry $\alpha_i f'_i$ we just need to recall the derivative $\partial_x\mathcal{A}_{f_i}$ of $\mathcal{A}_{f_i} = (u_{f_i}, A_{f_i}, v_{f_i})$; the invertible matrix $Q$ swaps block columns 1 and 2:
$$\begin{bmatrix}
0 & u_{f_i} & \cdot \\
\cdot & A_{f_i} & A^x_{f_i}\otimes 1 \\
-\alpha_i v_{f_i} & \cdot & A_{f_i}
\end{bmatrix} Q = \begin{bmatrix}
u_{f_i} & 0 & \cdot \\
A_{f_i} & \cdot & A^x_{f_i}\otimes 1 \\
\cdot & -\alpha_i v_{f_i} & A_{f_i}
\end{bmatrix}.$$
Thus, after summing up (over $i \in \{1, 2, \ldots, d\}$), we get the system matrix (3.3) and hence
$$\partial_x\mathcal{A}_h = \partial_x(\mathcal{A}_g \circ \mathcal{A}_f) = \partial_{y|y'}\mathcal{A}_g \circ \mathcal{A}_{f,f'}, \quad\text{that is,}\quad \partial_x h = \partial_x(g \circ f) = \partial_{y|\partial_x}g \circ f.$$

Free (non-commutative) decomposition is important in control theory [12, Section 6.2.2]: "The authors do not know how to fully implement the decompose operation. Finding decompositions by hand can be facilitated with the use of certain type of collect command".

Let $g = fabf + cf + de$ and $f = xy + z$, that is, $f$ solves a Riccati equation. Given $h = g \circ f = (xy + z)ab(xy + z) + c(xy + z) + de$ by the (minimal) admissible linear system $\mathcal{A}_h = (u_h, A_h, v_h)$,
$$\begin{bmatrix}
1 & -x & -z & \cdot & -d & -c & \cdot & \cdot \\
\cdot & 1 & -y & \cdot & \cdot & \cdot & \cdot & \cdot \\
\cdot & \cdot & 1 & -a & \cdot & \cdot & \cdot & \cdot \\
\cdot & \cdot & \cdot & 1 & \cdot & -b & \cdot & \cdot \\
\cdot & \cdot & \cdot & \cdot & 1 & \cdot & \cdot & -e \\
\cdot & \cdot & \cdot & \cdot & \cdot & 1 & -x & -z \\
\cdot & \cdot & \cdot & \cdot & \cdot & \cdot & 1 & -y \\
\cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & 1
\end{bmatrix} s = \begin{bmatrix} \cdot \\ \cdot \\ \cdot \\ \cdot \\ \cdot \\ \cdot \\ \cdot \\ 1 \end{bmatrix},$$
one can read off $f = xy + z$ directly since $A_h$ has the form
$$\begin{bmatrix}
1 & -f & \cdot & -d & -c & \cdot \\
\cdot & 1 & -a & \cdot & \cdot & \cdot \\
\cdot & \cdot & 1 & \cdot & -b & \cdot \\
\cdot & \cdot & \cdot & 1 & \cdot & -e \\
\cdot & \cdot & \cdot & \cdot & 1 & -f \\
\cdot & \cdot & \cdot & \cdot & \cdot & 1
\end{bmatrix} s = \begin{bmatrix} \cdot \\ \cdot \\ \cdot \\ \cdot \\ \cdot \\ 1 \end{bmatrix}.$$

So, starting with a minimal ALS $\tilde{\mathcal{A}}_h$ for $h$, one "just" needs to find appropriate (invertible) transformation matrices $P, Q$ such that $\mathcal{A}_h = P\tilde{\mathcal{A}}_hQ$, using (commutative) Gröbner bases techniques similar to [10, Theorem 4.1] or the refinement of pivot blocks [32, Section 3]. Although this is quite challenging using brute force methods, in practical examples it is rather easy using free fractions, that is, minimal and refined admissible linear systems, as a "work bench" where one can perform the necessary row and column operations manually.

"The challenge to computer algebra is to start with an expanded version of $h = g \circ f$, which is a mess that you would likely see in a computer algebra session, and to automatically motivate the selection of $f$." [12, Section 6.2.1]

Working with free fractions is simple, in particular with nc polynomials. The main difficulty in working with (finite) formal power series is that the number of words can grow exponentially with respect to the rank, the minimal dimension of a linear representation [31, Table 1]. In this case, for $h = g \circ f = xyabxy + xyabz + zabxy + zabz + cxy + cz + de$, the main "structure" becomes (almost) visible already after minimization of the corresponding "polynomial" admissible linear system (in upper triangular form with ones in the diagonal).
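To close this section: for nc polynomials, where composition is plain substitution, the chain rule of Proposition 3.2 can be verified directly. A sketch with ad hoc helpers (`dderiv` is the directional Hausdorff derivative on words, `compose` substitutes a tuple of polynomials for letters; the placeholder letters `y1p`, `y2p` play the role of $y'_1, y'_2$); only the polynomial case is covered, genuine fractions need the ALS construction above:

```python
from collections import defaultdict

def clean(d):
    return {w: c for w, c in d.items() if c}

def add(p, q):
    r = defaultdict(int)
    for poly in (p, q):
        for w, c in poly.items():
            r[w] += c
    return clean(r)

def mul(p, q):
    r = defaultdict(int)
    for w, a in p.items():
        for v, b in q.items():
            r[w + v] += a * b
    return clean(r)

def dderiv(p, x, b):
    """Directional derivative: replace one occurrence of x by the polynomial b."""
    r = {}
    for w, a in p.items():
        for i, letter in enumerate(w):
            if letter == x:
                r = add(r, mul({w[:i]: a}, mul(b, {w[i + 1:]: 1})))
    return r

def compose(g, f):
    """Substitute the polynomial f[y] for each letter y of g."""
    r = {}
    for w, a in g.items():
        term = {(): a}
        for letter in w:
            term = mul(term, f[letter])
        r = add(r, term)
    return r

# g in K<y1,y2>, f = (f1, f2) in K<x,z>^2
g = {('y1', 'y2'): 1, ('y2', 'y2', 'y1'): 2}
f = {'y1': {('x', 'z'): 1, ('z',): 3}, 'y2': {('x',): 1, ('x', 'x'): 1}}
fp = {y: dderiv(f[y], 'x', {(): 1}) for y in f}   # f' = ∂x f

lhs = dderiv(compose(g, f), 'x', {(): 1})          # ∂x (g ∘ f)
rhs = add(compose(dderiv(g, 'y1', {('y1p',): 1}), {**f, 'y1p': fp['y1']}),
          compose(dderiv(g, 'y2', {('y2p',): 1}), {**f, 'y2p': fp['y2']}))
print(lhs == rhs)   # True: ∂x(g∘f) = Σ_i ∂_{y_i|y'_i} g ∘ (f, f')
```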
4. Application: Newton iteration

To compute the third root $\sqrt[3]{z}$ of a (positive) real number $z$, say $z = 2$, we just have to find the roots of the polynomial $p = x^3 - z$, for example by using the Newton iteration $x_{k+1} := x_k - p(x_k)/p'(x_k)$, which, given a "good" starting value $x_0$, yields a (quadratically) convergent sequence $x_0, x_1, x_2, \ldots$ such that $\lim_{k\to\infty} x_k^3 = z$. Using FriCAS [14], the first 6 iterations for $x_0 = 1$ are

k   |x_k − x_{k−1}|   x_k
1   3.333·10^{-1}     1.3333333333333333
2   6.944·10^{-2}     1.2638888888888888
3   3.955·10^{-3}     1.259933493449977
4   1.244·10^{-5}     1.2599210500177698
5   1.229·10^{-10}    1.2599210498948732
6   0                 1.2599210498948732

Detailed discussions are available in every book on numerical analysis, for example [17]. As a starting point towards current research, one could take [33].

Now we would like to compute the third root $\sqrt[3]{Z}$ of a real (square) matrix $Z$ (with positive eigenvalues). We (still) can use
$$X_{k+1} := \tfrac{2}{3}X_k + \tfrac{1}{3}X_k^{-2}Z$$
if we choose a starting matrix $X_0$ such that $X_0Z = ZX_0$, because then all $X_k$ commute with $Z$ [19, Section 7.3]. For $X_0 = I$ and
$$Z = \begin{bmatrix} 47 & 84 & 54 \\ 42 & 116 & 99 \\ 9 & 33 & 32 \end{bmatrix} \quad\text{respectively}\quad \sqrt[3]{Z} = \begin{bmatrix} 3 & 2 & 0 \\ 1 & 4 & 3 \\ 0 & 1 & 2 \end{bmatrix},$$
some iterations are (with $\|.\|_F$ denoting the Frobenius norm)

k    ‖X_k − X_{k−1}‖_F   ‖X_k − ∛Z‖_F
4    9.874               23.679
5    6.511               13.818
6    4.109               7.309
7    2.254               3.200
8    8.295·10^{-1}       9.455·10^{-1}
9    1.140·10^{-1}       1.160·10^{-1}
10   2.044·10^{-3}       2.044·10^{-3}
11   1.115·10^{-6}       6.489·10^{-7}

Does this sequence of matrices $X_0, X_1, \ldots$ converge? Some more iterations reveal immediately that something goes wrong, visible in Table 1.

k    ‖X_k − X_{k−1}‖_F   ‖X_k − ∛Z‖_F    ‖X_kZ − ZX_k‖_F
1    65.836              5.385           2.274·10^{-13}
2    22.279              60.756          7.541·10^{-13}
3    14.845              38.500          1.110·10^{-12}
4    9.874               23.679          2.031·10^{-12}
5    6.511               13.818          7.725·10^{-12}
6    4.109               7.309           7.010·10^{-11}
7    2.254               3.200           9.451·10^{-10}
8    8.295·10^{-1}       9.455·10^{-1}   1.716·10^{-8}
9    1.140·10^{-1}       1.160·10^{-1}   3.548·10^{-7}
10   2.044·10^{-3}       2.044·10^{-3}   7.473·10^{-6}
11   1.115·10^{-6}       6.489·10^{-7}   1.574·10^{-4}
12   1.979·10^{-5}       8.971·10^{-7}   3.314·10^{-3}
13   4.168·10^{-4}       1.890·10^{-5}   6.978·10^{-2}
14   8.777·10^{-3}       3.979·10^{-4}   1.469
15   1.848·10^{-1}       8.379·10^{-3}   3.094·10^{+1}
16   3.892               1.764·10^{-1}   6.515·10^{+2}
17   8.195·10^{+1}       3.715           1.372·10^{+4}
18   1.726·10^{+3}       7.823·10^{+1}   2.889·10^{+5}

Table 1: The first 18 Newton iterations to find $\sqrt[3]{Z}$ for $X_0 = I$. The values in column 2 (and 3) for the first 11 Newton steps seem to indicate convergence. However, the (Frobenius) norm of the commutator $X_kZ - ZX_k$ (column 4) increases steadily and that eventually causes a diverging sequence $(X_k)$.

The problem is that, due to rounding errors (in finite precision arithmetic), commutativity of $X_k$ (with $Z$) is lost, that is, $\|X_kZ - ZX_k\|_F$ increases with every iteration. This happens even for a starting value close to the solution, $X_0 = \sqrt[3]{Z} + \varepsilon I$. For a detailed discussion of matrix roots and how to overcome such problems we refer to [19, Section 7], for further information about numerics to [11]. A classical introduction to matrix functions is [15, Chapter V]. For the (matrix) square root and discussions about the stability (of Newton's method) we recommend [18].

Remark. To ensure commutation of the iterates $X_k$ with $Z$ one could use an additional correction step solving the Sylvester equation $Z\Delta X_k - \Delta X_kZ = X_kZ - ZX_k$ for an update $\Delta X_k$. Since this is expensive, the benefit of using the "commutative" Newton iteration would be lost. For fast solutions of Sylvester equations we refer to [20].
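The drift phenomenon of Table 1 can be reproduced in a few lines of numpy (a sketch; since the table was produced with FriCAS, double-precision values will differ in detail and the onset of divergence may shift by a few iterations):

```python
import numpy as np

Z = np.array([[47, 84, 54], [42, 116, 99], [9, 33, 32]], float)
X = np.eye(3)                                   # X0 = I commutes with Z exactly
for k in range(1, 19):
    X = (2 * X + np.linalg.solve(X @ X, Z)) / 3 # X_{k+1} = (2/3)X + (1/3)X^{-2}Z
    drift = np.linalg.norm(X @ Z - Z @ X)       # commutator, Frobenius norm
    err = np.linalg.norm(X @ X @ X - Z)         # residual of the cube
    print(f"k={k:2d}  ||X^3-Z||={err:9.3e}  ||XZ-ZX||={drift:9.3e}")
# the commutator norm grows steadily; once it dominates, the iteration diverges
```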
Although what we are going to introduce now as the "non-commutative (nc) Newton iteration" is even more expensive, it can be implemented as a black box algorithm, that is, without manual computation of the non-commutative derivative and without any individual implementation/programming. Furthermore, there is absolutely no restriction on the initial iterate.

To get the nc Newton method for solving $f(x) = 0$ we just need to "truncate" the Hausdorff polarization operator [27, Section 4],
$$f(x + b) = f(x) + \partial_{x|b}f(x) + \tfrac{1}{2}\,\partial_{x|b}\bigl(\partial_{x|b}f(x)\bigr) + \cdots,$$
as an analogon to Taylor's formula. Thus, for $f(x) = x^3 - z$ we have to solve
$$0 = \underbrace{x^3 - z}_{f} + \underbrace{bx^2 + xbx + x^2b}_{\partial_{x|b}f} \in \mathbb{R}(\!\langle X\rangle\!) =: \mathbb{F}$$
with respect to $b$. In terms of (square) matrices $X, Z, B$ this is just the generalized Sylvester equation $BX^2 + XBX + X^2B = Z - X^3$, which is linear in $B$ [19, Section 7.2]. Taking a (not necessarily with $Z$ commuting) starting matrix $X_0$, we can compute $B_0$ and get $X_1 := X_0 + B_0$ and iteratively $X_{k+1} := X_k + B_k$.

Minimal admissible linear systems for $f = x^3 - z$ and $\partial_{x|b}f = bx^2 + xbx + x^2b$ are given by
$$\begin{bmatrix}
1 & -x & \cdot & z \\
\cdot & 1 & -x & \cdot \\
\cdot & \cdot & 1 & -x \\
\cdot & \cdot & \cdot & 1
\end{bmatrix} s = \begin{bmatrix} \cdot \\ \cdot \\ \cdot \\ 1 \end{bmatrix}$$
and
$$\begin{bmatrix}
1 & -x & \cdot & -b & \cdot & \cdot \\
\cdot & 1 & -x & \cdot & -b & \cdot \\
\cdot & \cdot & 1 & \cdot & \cdot & -b \\
\cdot & \cdot & \cdot & 1 & -x & \cdot \\
\cdot & \cdot & \cdot & \cdot & 1 & -x \\
\cdot & \cdot & \cdot & \cdot & \cdot & 1
\end{bmatrix} s = \begin{bmatrix} \cdot \\ \cdot \\ \cdot \\ \cdot \\ \cdot \\ 1 \end{bmatrix}$$
respectively. A minimal ALS $\mathcal{A} = (u,A,v)$ of dimension $n = 6$ for $g = f + \partial_{x|b}f$ is immediate:
$$\begin{bmatrix}
1 & -x & \cdot & -b & \cdot & z \\
\cdot & 1 & -x & \cdot & -b & \cdot \\
\cdot & \cdot & 1 & \cdot & \cdot & -b-x \\
\cdot & \cdot & \cdot & 1 & -x & \cdot \\
\cdot & \cdot & \cdot & \cdot & 1 & -x \\
\cdot & \cdot & \cdot & \cdot & \cdot & 1
\end{bmatrix} s = \begin{bmatrix} \cdot \\ \cdot \\ \cdot \\ \cdot \\ \cdot \\ 1 \end{bmatrix}.$$
Since the system matrix is just a linear matrix pencil $A = A_1 \otimes 1 + A_b \otimes b + A_x \otimes x + A_z \otimes z$, we can plug in $m\times m$ matrices using the Kronecker product:
$$A(B,X,Z) = A_1 \otimes I + A_b \otimes B + A_x \otimes X + A_z \otimes Z.$$
In this case, for $v = e_n = [0, \ldots, 0, 1]^\top$, the evaluation of $g : M_m(\mathbb{R})^3 \to M_m(\mathbb{R})$ is just the upper right $m\times m$ block of $A(B,X,Z)^{-1}$. For how to efficiently evaluate (nc) polynomials by matrices using Horner systems we refer to [31]. Here, $A(B,X,Z)$ is invertible for arbitrary $B$, $X$ and $Z$.

Given $Z$ and $X$, to find a $B$ such that $g(B,X,Z) = 0$ we have to look for transformation matrices $P = P(T_{ij})$ and $Q = Q(U_{ij})$ of the form
$$P = \begin{bmatrix}
I & \cdot & \cdot & T_{1,1} & T_{1,2} & 0 \\
\cdot & I & \cdot & T_{2,1} & T_{2,2} & 0 \\
\cdot & \cdot & I & T_{3,1} & T_{3,2} & 0 \\
\cdot & \cdot & \cdot & I & \cdot & \cdot \\
\cdot & \cdot & \cdot & \cdot & I & \cdot \\
\cdot & \cdot & \cdot & \cdot & \cdot & I
\end{bmatrix}, \qquad Q = \begin{bmatrix}
I & \cdot & \cdot & 0 & 0 & 0 \\
\cdot & I & \cdot & U_{2,1} & U_{2,2} & U_{2,3} \\
\cdot & \cdot & I & U_{3,1} & U_{3,2} & U_{3,3} \\
\cdot & \cdot & \cdot & I & \cdot & \cdot \\
\cdot & \cdot & \cdot & \cdot & I & \cdot \\
\cdot & \cdot & \cdot & \cdot & \cdot & I
\end{bmatrix}$$
such that the upper right block of size $3m\times 3m$ of $PA(B,X,Z)Q$ is zero, that is, we need to solve a linear system of equations (with $6m^2 + 6m^2 + m^2 = 13m^2$ unknowns), similar to the word problem [30, Theorem 2.4]. Table 2 shows the nc Newton iterations for the starting matrix
$$X_0 = \begin{bmatrix} 1 & 0 & 2 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \tag{4.1}$$
using FriCAS [14] and the least squares solver DGELS from [22].

k    ‖B_k‖_F          ‖X_k − ∛Z‖_F     ‖X_kZ − ZX_k‖_F
0    46.877           5.745            113.842
1    16.081           42.298           5552.242
2    10.768           26.374           3659.788
3    7.320            15.912           2395.971
4    4.971            9.358            1534.892
5    2.934            6.122            912.700
6    2.651            4.389            414.201
7    1.380            1.846            86.771
8    3.878·10^{-1}    4.875·10^{-1}    5.638
9    9.378·10^{-2}    1.023·10^{-1}    2.652·10^{-1}
10   8.432·10^{-3}    8.506·10^{-3}    6.737·10^{-3}
11   7.407·10^{-5}    7.408·10^{-5}    1.951·10^{-5}
12   5.895·10^{-9}    5.895·10^{-9}    5.106·10^{-10}
13   1.825·10^{-14}   1.521·10^{-14}   1.491·10^{-12}

Table 2: The first 13 (nc) Newton iterations to find $\sqrt[3]{Z}$ for $X_0$ from (4.1). In the beginning, the iterates $X_k$ do not commute with $Z$ (column 4).
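For this specific equation the Newton step can also be set up without the representation machinery, by vectorizing the generalized Sylvester equation $BX^2 + XBX + X^2B = Z - X^3$ directly, using $\operatorname{vec}(AXB) = (B^\top \otimes A)\operatorname{vec}(X)$ (column-major vec). The following numpy sketch therefore illustrates the same iteration, not the paper's black-box approach via the transformation matrices $P$ and $Q$; the least squares solve mirrors the DGELS call mentioned above:

```python
import numpy as np

def newton_cbrt(Z, X, steps=25, tol=1e-12):
    """nc Newton for X^3 = Z: solve B X^2 + X B X + X^2 B = Z - X^3
    for the update B via column-major vectorization."""
    m = Z.shape[0]
    I = np.eye(m)
    for _ in range(steps):
        X2 = X @ X
        # vec(BX^2) + vec(XBX) + vec(X^2 B) = M vec(B)
        M = np.kron(X2.T, I) + np.kron(X.T, X) + np.kron(I, X2)
        rhs = (Z - X2 @ X).reshape(-1, order="F")
        B = np.linalg.lstsq(M, rhs, rcond=None)[0].reshape(m, m, order="F")
        X = X + B
        if np.linalg.norm(B) < tol:
            break
    return X

Z = np.array([[47, 84, 54], [42, 116, 99], [9, 33, 32]], float)
X0 = np.array([[1, 0, 2], [0, 1, 0], [0, 0, 1]], float)   # (4.1)
X = newton_cbrt(Z, X0)
print(np.round(X, 6))                 # ≈ [[3,2,0],[1,4,3],[0,1,2]] if converged
print(np.linalg.norm(X @ X @ X - Z))  # small residual if converged
```

Starting from (4.1) this should reproduce the qualitative behaviour of Table 2; note that commutativity of the iterates with $Z$ plays no role here.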
In general, depending on the initial iterate $X_0$, there is no guarantee that one ends up in some prescribed solution $X$, since there can be several (matrix) roots. For the principal $p$-th root (and further discussion) we refer to [19, Theorem 7.2]. For a discussion of Taylor's theorem (for matrix functions) one should have a look in [13].

The goal of this section was mainly illustration (of the use of the free derivative). How could we attack the analysis of convergence (for special classes of rational functions) in this context? "Classical" interval Newton is discussed in [28, Chapter 6.1]. What about "nc interval Newton"? The next natural step would be multi-dimensional nc Newton, say to find a root $G(X,Y,Z) = 0$ for
$$G = \begin{bmatrix} g_1 \\ g_2 \\ g_3 \end{bmatrix} = \begin{bmatrix} x^3 + z - d \\ \bigl(x(1 - yx)\bigr)^{-1} - e \\ xzx - yz + z^2 - f \end{bmatrix}$$
with (matrix-valued) parameters $d, e, f$. Although very technical, one can set up a "joint" linear system of equations to solve
$$\begin{bmatrix}
g_1 + \partial_{x|a}g_1 & g_1 + \partial_{y|b}g_1 & g_1 + \partial_{z|c}g_1 \\
g_2 + \partial_{x|a}g_2 & g_2 + \partial_{y|b}g_2 & g_2 + \partial_{z|c}g_2 \\
g_3 + \partial_{x|a}g_3 & g_3 + \partial_{y|b}g_3 & g_3 + \partial_{z|c}g_3
\end{bmatrix} = 0$$
for matrices $A$, $B$ and $C$, using the previous approach to find tuples of "individual" transformation matrices $P_{ij}$ and $Q_{ij}$ to create the respective upper right blocks of zeros. The problem, however, is that this system is overdetermined in general, and starting arbitrarily close to a root leads to a residual that causes divergence. How can one overcome that?

Acknowledgements

I am very grateful to Tobias Mai for making me aware of cyclic derivatives respectively the work of Rota, Sagan and Stein [27] in May 2017 in Graz and for a discussion in October 2018 in Saarbrücken. I also use this opportunity to thank Soumyashant Nayak for continuous "non-commutative" discussions, in particular about free associative algebras and (inner) derivations. And I thank the referees for the feedback and a hint on the literature.

References

[1] G.M. Bergman, W. Dicks, On universal derivations, J. Algebra 36 (2) (1975), 193 – 211.
[2] J. Berstel, C. Reutenauer, "Noncommutative Rational Series with Applications", Encyclopedia of Mathematics and its Applications, 137, Cambridge University Press, Cambridge, 2011.
[3] J.F. Camino, J.W. Helton, R.E. Skelton, Solving matrix inequalities whose unknowns are matrices, SIAM J. Optim. 17 (1) (2006), 1 – 36.
[4] J.F. Camino, J.W. Helton, R.E. Skelton, J. Ye, Matrix inequalities: a symbolic procedure to determine convexity automatically, Integral Equations Operator Theory 46 (4) (2003), 399 – 454.
[5] P.M. Cohn, Around Sylvester's law of nullity, Math. Sci. 14 (2) (1989), 73 – 83.
[6] P.M. Cohn, "Skew Fields. Theory of General Division Rings", Encyclopedia of Mathematics and its Applications, 57, Cambridge University Press, Cambridge, 1995.
[7] P.M. Cohn, "Further Algebra and Applications", Springer-Verlag London, Ltd., London, 2003.
[8] P.M. Cohn, "Free Ideal Rings and Localization in General Rings", New Mathematical Monographs, 3, Cambridge University Press, Cambridge, 2006.
[9] P.M. Cohn, C. Reutenauer, A normal form in free fields, Canad. J. Math. 46 (3) (1994), 517 – 531.
[10] P.M. Cohn, C. Reutenauer, On the construction of the free field, Internat. J. Algebra Comput. 9 (3-4) (1999), 307 – 323. Dedicated to the memory of Marcel-Paul Schützenberger.
[11] J.W. Demmel, "Applied Numerical Linear Algebra", Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 1997.
[12] M.C. de Oliveira, J.W. Helton, Computer algebra tailored to matrix inequalities in control, Internat. J. Control 79 (11) (2006), 1382 – 1400.
[13] E. Deadman, S.D. Relton, Taylor's theorem for matrix functions with applications to condition number estimation, Linear Algebra Appl. 504 (2016), 354 – 371.
[14] FriCAS team, FriCAS — An advanced computer algebra system, 2019. Release 1.3.5, available at http://fricas.sf.net, documentation at http://fricas.github.io.
[15] F.R. Gantmacher, "Matrizenrechnung. Teil I. Allgemeine Theorie", Zweite, berichtigte Auflage, Hochschulbücher für Mathematik, Band 36, VEB Deutscher Verlag der Wissenschaften, Berlin, 1965.
[16] W. Hackbusch, "Hierarchische Matrizen: Algorithmen und Analysis", Springer, Berlin, 2009.
[17] P. Henrici, "Elements of Numerical Analysis", John Wiley & Sons, Inc., New York-London-Sydney, 1964.
[18] N.J. Higham, Newton's method for the matrix square root, Math. Comp. 46 (174) (1986), 537 – 549.
[19] N.J. Higham, "Functions of Matrices. Theory and Computation", Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 2008.
[20] P. Kirrinnis, Fast algorithms for the Sylvester equation $AX - XB^\top = C$, Theoret. Comput. Sci. 259 (1-2) (2001), 623 – 638.
[21] D.S. Kaliuzhnyi-Verbovetskyi, V. Vinnikov, "Foundations of Free Noncommutative Function Theory", Mathematical Surveys and Monographs, 199, American Mathematical Society, Providence, RI, 2014.
[22] LAPACK 3.5.0, 2018, http://www.netlib.org/lapack.
[23] G. Popescu, Free holomorphic functions on the unit ball of $B(H)^n$, J. Funct. Anal. 241 (1) (2006), 268 – 333.
[24] C. Reutenauer, Cyclic derivation of noncommutative algebraic power series, J. Algebra 85 (1) (1983), 32 – 39.
[25] C. Reutenauer, Applications of a noncommutative Jacobian matrix, J. Pure Appl. Algebra 77 (2) (1992), 169 – 181.
[26] C. Reutenauer, Michel Fliess and non-commutative formal power series, Internat. J. Control 81 (3) (2008), 336 – 341.
[27] G.-C. Rota, B. Sagan, P.R. Stein, A cyclic derivative in noncommutative algebra, J. Algebra 64 (1) (1980), 54 – 75.
[28] S.M. Rump, Verification methods: rigorous results using floating-point arithmetic, Acta Numer. 19 (2010), 287 – 449.
[29] K. Schrempf, Free fractions: An invitation to (applied) free fields, arXiv e-prints, September 2018. Version 2, October 2020, http://arxiv.org/pdf/1809.05425.
[30] K. Schrempf, Linearizing the word problem in (some) free fields, Internat. J. Algebra Comput. 28 (7) (2018), 1209 – 1230.
[31] K. Schrempf, Horner systems: How to efficiently evaluate non-commutative polynomials (by matrices), arXiv e-prints, October 2019.
[32] K. Schrempf, A standard form in (some) free fields: How to construct minimal linear representations, Open Math. 18 (1) (2020), 1365 – 1386.
[33] D. Schleicher, R. Stoll, Newton's method in practice: Finding all roots of polynomials of degree one million efficiently, Theoret. Comput. Sci. 681 (2017), 146 – 166.
[34] D. Voiculescu, A note on cyclic gradients, Indiana Univ. Math. J. 49 (3) (2000), 837 – 841.