EXTRACTA MATHEMATICAE Vol. 36, Num. 1 (2021), 25 – 50
doi:10.17398/2605-5686.36.1.25

Available online June 11, 2021

Free (rational) derivation

K. Schrempf

Austrian Academy of Sciences, Acoustics Research Institute
Wohllebengasse 12-14, 1040 Vienna, Austria
math@versibilitas.at, https://orcid.org/0000-0001-8509-009X

Received April 15, 2021; Presented by Manuel Saorín; Accepted May 21, 2021

Abstract: By representing elements in free fields (over a commutative field and a finite alphabet) using Cohn and Reutenauer's linear representations, we provide an algorithmic construction for the (partial) non-commutative (or Hausdorff-) derivative and show how it can be applied to the non-commutative version of the Newton iteration to find roots of matrix-valued rational equations.

Key words: Hausdorff derivative, free associative algebra, free field, minimal linear representation, admissible linear system, free fractions, chain rule, Newton iteration.

MSC (2020): primary 16K40, 16S85; secondary 68W30, 46G05.

Introduction

Working symbolically with matrices requires non-commuting variables and thus non-commutative (nc) rational expressions. Although the (algebraic) construction of free fields, that is, universal fields of fractions of free associative algebras, has been available due to Paul M. Cohn since 1970 [8, Chapter 7], its practical application in terms of free fractions [29] (building directly on Cohn and Reutenauer's linear representations [10]) in computer algebra systems is only at the very beginning. The main difficulty for arithmetic (or rather lexetic, from the non-existing Greek word λεξητικος, from λεξις for word, as an analogon to αριθμητικος, from αριθμος for number) was the construction of minimal linear representations [32], that is, the normal form of Cohn and Reutenauer [9].

Here we will show that free fractions also provide a framework for "free" derivation, in particular of nc polynomials. The construction we provide generalizes the univariate (commutative) case we are so used to, for example $f = f(x) = x^3 + 4x^2 + 3x + 5$ with $f' = \frac{d}{dx}f(x) = 3x^2 + 8x + 3$.

The coefficients are from a commutative field $K$ (for example the rational numbers $\mathbb{Q}$, the real numbers $\mathbb{R}$, or the complex number field $\mathbb{C}$), the (non-commuting) variables from a (finite) alphabet $X$, for example $X = \{x, y, z\}$. For simplicity we focus here (in this motivation) on the free associative algebra $R := K\langle X\rangle$, aka the "algebra of nc polynomials", and recall the properties of a (partial) derivation $\partial_x : R \to R$ (for a fixed $x \in X$), namely

• $\partial_x(\alpha) = 0$ for $\alpha \in K$ and, more generally, $\partial_x(g) = 0$ for $g \in K\langle X \setminus \{x\}\rangle$,
• $\partial_x(x) = 1$, and
• $\partial_x(fg) = \partial_x(f)\,g + f\,\partial_x(g)$.

In [27] this is called the Hausdorff derivative. For a more general (module theoretic) context we refer to [7, Section 2.7] or [1].

Before we continue we should clarify the wording: to avoid confusion we refer to the linear operator $\partial_x$ (for some $x \in X$) as free partial derivation (or just derivation) and call $\partial_x f \in R$ the (free partial) derivative of $f \in R$, sometimes denoted also as $f_x$ or $f'$ depending on the context.
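Since the rest of the paper works at the level of linear representations, it may help to see these three properties first in the most naive encoding. The following Python sketch (an illustration only, not the paper's machinery; all helper names are ad hoc) stores an nc polynomial as a dictionary mapping words, i.e., tuples of letters, to coefficients, and implements $\partial_x$ on a word by deleting one occurrence of $x$ at a time:

```python
from collections import defaultdict
from fractions import Fraction

# An nc polynomial as {word: coefficient}, a word being a tuple of letters.

def mul(p, q):
    """Concatenation product in K<X>."""
    r = defaultdict(Fraction)
    for w, a in p.items():
        for v, b in q.items():
            r[w + v] += a * b
    return {w: c for w, c in r.items() if c}

def add(p, q):
    r = defaultdict(Fraction)
    for poly in (p, q):
        for w, c in poly.items():
            r[w] += c
    return {w: c for w, c in r.items() if c}

def deriv(p, x):
    """Hausdorff derivative: delete one occurrence of x per word, summed."""
    r = defaultdict(Fraction)
    for w, a in p.items():
        for i, letter in enumerate(w):
            if letter == x:
                r[w[:i] + w[i + 1:]] += a
    return {w: c for w, c in r.items() if c}

# motivating example: d/dx (x^3 + 4x^2 + 3x + 5) = 3x^2 + 8x + 3
f = {('x',) * 3: Fraction(1), ('x',) * 2: Fraction(4),
     ('x',): Fraction(3), (): Fraction(5)}
print(deriv(f, 'x'))   # {('x','x'): 3, ('x',): 8, (): 3} (Fraction values)

# the three defining properties, checked on a sample g in K<{y,z}>
g = {('y', 'z'): Fraction(2)}
assert deriv(g, 'x') == {}                              # kills K<X\{x}>
assert deriv({('x',): Fraction(1)}, 'x') == {(): Fraction(1)}
assert deriv(mul(f, g), 'x') == add(mul(deriv(f, 'x'), g),
                                    mul(f, deriv(g, 'x')))
```

The product rule holds word by word, which is exactly why this encoding works; its limitation, taken up again at the end of Section 3, is that the number of words can grow exponentially, which is what the representation machinery below avoids.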
In the commutative setting we usually do not distinguish too much between algebraic and analytic concepts. But in the (free) non-commutative setting, analysis is quite subtle [21]. There are even concepts like "matrix convexity" and symbolic procedures to determine (nc) convexity [4]. However, a systematic treatment of the underlying algebraic tools was not available so far. We are going to close this gap in the following.

After a brief description of the setup (in particular that of linear representations of elements in free fields) in Section 1, we develop the formalism for free derivations in Section 2, with the main result, Theorem 2.8. To be able to state a (partial) "free" chain rule (in Proposition 3.2) we derive a language for the free composition (and illustrate how to "reverse" it) in Section 3. And finally, in Section 4, we show how to develop a meta algorithm "nc Newton" to find matrix-valued roots of a non-commutative rational equation.

Notation. The set of the natural numbers is denoted by $\mathbb{N} = \{1, 2, \ldots\}$, that including zero by $\mathbb{N}_0$. Zero entries in matrices are usually replaced by (lower) dots to emphasize the structure of the non-zero entries, unless they result from transformations where there were possibly non-zero entries before. We denote by $I_n$ the identity matrix (of size $n$), respectively $I$ if the size is clear from the context. By $v^\top$ we denote the transpose of a vector $v$.

1. Getting started

We represent elements (in free fields) by admissible linear systems (Definition 1.6), which are just a special form of linear representations (Definition 1.3) and "general" admissible systems [8, Section 7.1]. Rational operations (scalar multiplication, addition, multiplication, inverse) can be easily formulated in terms of linear representations [10, Section 1]. For the formulation on the level of admissible linear systems and the "minimal" inverse we refer to [30, Proposition 1.13] resp. [30, Theorem 4.13].

Let $K$ be a commutative field, $\bar{K}$ its algebraic closure and $X = \{x_1, x_2, \ldots, x_d\}$ a finite (non-empty) alphabet. $K\langle X\rangle$ denotes the free associative algebra (or free $K$-algebra) and $\mathbb{F} = K(\!\langle X\rangle\!)$ its universal field of fractions (or "free field") [6, 10]. An element in $K\langle X\rangle$ is called a (non-commutative or nc) polynomial. In our examples the alphabet is usually $X = \{x, y, z\}$. Including the algebra of nc rational series [2] we have the following chain of inclusions:
$$K \subsetneq K\langle X\rangle \subsetneq K^{\mathrm{rat}}\langle\!\langle X\rangle\!\rangle \subsetneq K(\!\langle X\rangle\!) =: \mathbb{F}.$$

Definition 1.1. ([8, Section 0.1], [10]) Given a matrix $A \in K\langle X\rangle^{n\times n}$, the inner rank of $A$ is the smallest number $k \in \mathbb{N}$ such that there exists a factorization $A = CD$ with $C \in K\langle X\rangle^{n\times k}$ and $D \in K\langle X\rangle^{k\times n}$. The matrix $A$ is called full if $k = n$, non-full otherwise.

Theorem 1.2. ([8, Special case of Corollary 7.5.14]) Let $X$ be an alphabet and $K$ a commutative field. The free associative algebra $R = K\langle X\rangle$ has a universal field of fractions $\mathbb{F} = K(\!\langle X\rangle\!)$ such that every full matrix over $R$ can be inverted over $\mathbb{F}$.

Remark. Non-full matrices become singular under a homomorphism into some field [8, Chapter 7]. In general (rings), neither do full matrices need to be invertible, nor do invertible matrices need to be full. An example for the former is the matrix
$$B = \begin{bmatrix} \cdot & z & -y \\ -z & \cdot & x \\ y & -x & \cdot \end{bmatrix}$$
over the commutative polynomial ring $K[x,y,z]$, which is not a Sylvester domain [5, Section 4]. An example for the latter are rings without unbounded generating number (UGN) [8, Section 7.3].

Definition 1.3. ([9, 10]) Let $f \in \mathbb{F}$.
A linear representation of $f$ is a triple $\pi_f = (u, A, v)$ with $u^\top, v \in K^{n\times 1}$ and a full matrix $A = A_0 \otimes 1 + A_1 \otimes x_1 + \ldots + A_d \otimes x_d$ with $A_\ell \in K^{n\times n}$ for all $\ell \in \{0, 1, \ldots, d\}$, such that $f = uA^{-1}v$. The dimension of $\pi_f$ is $\dim(u,A,v) = n$. It is called minimal if $A$ has the smallest possible dimension among all linear representations of $f$. The "empty" representation $\pi = (\,,\,,\,)$ is the minimal one of $0 \in \mathbb{F}$ with $\dim \pi = 0$. Let $f \in \mathbb{F}$ and $\pi$ be a minimal linear representation of $f$. Then the rank of $f$ is defined as $\operatorname{rank} f = \dim \pi$.

Definition 1.4. ([9]) Let $\pi = (u,A,v)$ be a linear representation of $f \in \mathbb{F}$ of dimension $n$. The families $(s_1, s_2, \ldots, s_n) \subseteq \mathbb{F}$ with $s_i = (A^{-1}v)_i$ and $(t_1, t_2, \ldots, t_n) \subseteq \mathbb{F}$ with $t_j = (uA^{-1})_j$ are called left family and right family respectively. $L(\pi) = \operatorname{span}\{s_1, s_2, \ldots, s_n\}$ and $R(\pi) = \operatorname{span}\{t_1, t_2, \ldots, t_n\}$ denote their linear spans (over $K$).

Proposition 1.5. ([9, Proposition 4.7]) A representation $\pi = (u,A,v)$ of an element $f \in \mathbb{F}$ is minimal if and only if both the left family and the right family are $K$-linearly independent. In this case, $L(\pi)$ and $R(\pi)$ depend only on $f$.

Remark. The left family $(A^{-1}v)_i$ (respectively the right family $(uA^{-1})_j$) and the solution vector $s$ of $As = v$ (respectively $t$ of $u = tA$) are used synonymously.

Definition 1.6. ([30]) A linear representation $\mathcal{A} = (u,A,v)$ of $f \in \mathbb{F}$ is called an admissible linear system (ALS) for $f$, written also as $As = v$, if $u = e_1 = [1, 0, \ldots, 0]$. The element $f$ is then the first component of the (unique) solution vector $s$.

Given a linear representation $\mathcal{A} = (u,A,v)$ of dimension $n$ of $f \in \mathbb{F}$ and invertible matrices $P, Q \in K^{n\times n}$, the transformed $P\mathcal{A}Q = (uQ, PAQ, Pv)$ is again a linear representation (of $f$). If $\mathcal{A}$ is an ALS, the transformation $(P,Q)$ is called admissible if the first row of $Q$ is $e_1 = [1, 0, \ldots, 0]$.

2. Free derivation

Before we define the concrete (partial) derivation and (partial) directional derivation, we start with a (partial) formal derivation on the level of admissible linear systems and show the basic properties with respect to the represented elements, in particular (in Corollary 2.3) that the (formal) derivation does not depend on the chosen ALS.

In other words: given some letter $x \in X$ and an admissible linear system $\mathcal{A} = (u,A,v)$, there is an algorithmic point of view in which the (free) derivation $\partial_x$ defines the ALS $\mathcal{A}' = \partial_x\mathcal{A}$. (Alternatively one can identify $x$ by its index $\ell \in \{1, 2, \ldots, d\}$ and write $\mathcal{A}' = \partial_\ell\mathcal{A}$.) Written in a sloppy way, we show that $\partial_x(\mathcal{A} + \mathcal{B}) = \partial_x\mathcal{A} + \partial_x\mathcal{B}$ and $\partial_x(\mathcal{A}\cdot\mathcal{B}) = \partial_x\mathcal{A}\cdot\mathcal{B} + \mathcal{A}\cdot\partial_x\mathcal{B}$, yielding immediately the algebraic point of view (summarized in Theorem 2.8) by taking the respective first components of the unique solution vectors, $f = (s_{\mathcal{A}})_1$ and $g = (s_{\mathcal{B}})_1$.

Remark. The following definition is much more general than usually needed. One gets the "classical" (partial) derivation with respect to some letter $x = x_\ell \in X$ (with $\ell \in \{1, 2, \ldots, d\}$) for $k = 0$, resp. (the empty word) $a = 1 \in X^*$.

Definition 2.1. Let $\mathcal{A} = (u,A,v)$ be an admissible linear system of dimension $n \geq 1$ for some element in the free field $\mathbb{F} = K(\!\langle X\rangle\!)$ and $\ell \neq k \in \{0, 1, \ldots, d\}$. The ALS
$$\partial_{\ell|k}\mathcal{A} = \partial_{\ell|k}(u,A,v) = \left( \begin{bmatrix} u & \cdot \end{bmatrix}, \begin{bmatrix} A & A_\ell \otimes x_k \\ \cdot & A \end{bmatrix}, \begin{bmatrix} \cdot \\ v \end{bmatrix} \right)$$
(of dimension $2n$) is called the (partial) formal derivative of $\mathcal{A}$ (with respect to $x_\ell, x_k \in \{1\} \cup X$). For $x, a \in \{1\} \cup X = \{1, x_1, x_2, \ldots, x_d\}$ with $x \neq a$ we write also $\partial_{x|a}\mathcal{A}$, having the indices $\ell \neq k \in \{0, 1, \ldots, d\}$ of $x$ resp. $a$ in mind.
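Definition 2.1 can be tested numerically: a linear pencil can be evaluated at matrices via Kronecker products, which turns the ALS into an ordinary linear system. A minimal numpy sketch, assuming the (easily verified) ALS $(e_1,\ I_4 - N\otimes x,\ e_4)$ for $p = x^3$ with the upper shift matrix $N$; it checks $\partial_{x|a}(x^3) = ax^2 + xax + x^2a$ for random matrix substitutions:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 3, 4                       # matrix size m, ALS dimension n
X = rng.standard_normal((m, m))   # value substituted for the letter x
D = rng.standard_normal((m, m))   # value substituted for the direction a

# ALS for p = x^3: (e_1, I - N⊗x, e_4), solution vector s = (x^3, x^2, x, 1)
N = np.diag(np.ones(n - 1), 1)    # upper shift matrix
A = np.eye(n * m) - np.kron(N, X) # pencil A_0⊗1 + A_x⊗x evaluated at X
Ax = -N                           # coefficient matrix of the letter x

# formal derivative per Definition 2.1: [[A, A_x⊗a], [0, A]] s = [0; v]
big = np.block([[A, np.kron(Ax, D)],
                [np.zeros((n * m, n * m)), A]])
rhs = np.zeros((2 * n * m, m))
rhs[-m:] = np.eye(m)              # v = e_n, blown up to m×m blocks
s = np.linalg.solve(big, rhs)

deriv = s[:m]                     # first block component of the solution
expected = D @ X @ X + X @ D @ X + X @ X @ D
print(np.allclose(deriv, expected))  # True
```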
Lemma 2.2. Let $\mathcal{A}_f = (u_f, A_f, v_f)$ and $\mathcal{A}_g = (u_g, A_g, v_g)$ be admissible linear systems of dimension $\dim\mathcal{A}_f \geq 1$ resp. $\dim\mathcal{A}_g \geq 1$. Fix $x, a \in \{1\} \cup X$ such that $x \neq a$. Then
$$\partial_{x|a}(\mathcal{A}_f + \mathcal{A}_g) = \partial_{x|a}\mathcal{A}_f + \partial_{x|a}\mathcal{A}_g.$$

Proof. Let $\ell \neq k \in \{0, 1, \ldots, d\}$ be the indices of $x$ resp. $a$. We write $A^\ell_f$ for the coefficient matrix $A_\ell$ of $\mathcal{A}_f$ resp. $A^\ell_g$ for $A_\ell$ of $\mathcal{A}_g$. Taking the sum from [30, Proposition 1.13] we have
$$\partial_{x|a}(\mathcal{A}_f + \mathcal{A}_g) = \partial_{x|a}\left( \begin{bmatrix} u_f & \cdot \end{bmatrix}, \begin{bmatrix} A_f & -A_f u_f^\top u_g \\ \cdot & A_g \end{bmatrix}, \begin{bmatrix} v_f \\ v_g \end{bmatrix} \right)$$
$$= \left( \begin{bmatrix} u_f & \cdot & \cdot & \cdot \end{bmatrix}, \begin{bmatrix} A_f & -A_f u_f^\top u_g & A^\ell_f \otimes a & -A^\ell_f u_f^\top u_g \otimes a \\ \cdot & A_g & \cdot & A^\ell_g \otimes a \\ \cdot & \cdot & A_f & -A_f u_f^\top u_g \\ \cdot & \cdot & \cdot & A_g \end{bmatrix}, \begin{bmatrix} \cdot \\ \cdot \\ v_f \\ v_g \end{bmatrix} \right)$$
$$= \left( \begin{bmatrix} u_f & \cdot & \cdot & \cdot \end{bmatrix}, \begin{bmatrix} A_f & A^\ell_f \otimes a & -A_f u_f^\top u_g & -A^\ell_f u_f^\top u_g \otimes a \\ \cdot & A_f & \cdot & -A_f u_f^\top u_g \\ \cdot & \cdot & A_g & A^\ell_g \otimes a \\ \cdot & \cdot & \cdot & A_g \end{bmatrix}, \begin{bmatrix} \cdot \\ v_f \\ \cdot \\ v_g \end{bmatrix} \right)$$
$$= \left( \begin{bmatrix} u_f & \cdot & \cdot & \cdot \end{bmatrix}, \begin{bmatrix} A_f & A^\ell_f \otimes a & -A_f u_f^\top u_g & 0 \\ \cdot & A_f & \cdot & 0 \\ \cdot & \cdot & A_g & A^\ell_g \otimes a \\ \cdot & \cdot & \cdot & A_g \end{bmatrix}, \begin{bmatrix} \cdot \\ v_f \\ \cdot \\ v_g \end{bmatrix} \right)$$
$$= \left( \begin{bmatrix} u_f & \cdot \end{bmatrix}, \begin{bmatrix} A_f & A^\ell_f \otimes a \\ \cdot & A_f \end{bmatrix}, \begin{bmatrix} \cdot \\ v_f \end{bmatrix} \right) + \left( \begin{bmatrix} u_g & \cdot \end{bmatrix}, \begin{bmatrix} A_g & A^\ell_g \otimes a \\ \cdot & A_g \end{bmatrix}, \begin{bmatrix} \cdot \\ v_g \end{bmatrix} \right) = \partial_{x|a}\mathcal{A}_f + \partial_{x|a}\mathcal{A}_g.$$
The two main steps are swapping block rows 2 and 3 and block columns 2 and 3, and eliminating the single non-zero (first) column in $-A^\ell_f u_f^\top u_g \otimes a$ and $-A_f u_f^\top u_g$ (in block column 4) using the first column in block column 2.

Corollary 2.3. Let $f, g \in \mathbb{F}$ be given by the admissible linear systems $\mathcal{A}_f = (u_f, A_f, v_f)$ and $\mathcal{A}_g = (u_g, A_g, v_g)$ of dimensions $n_f, n_g \geq 1$ respectively. Fix $x, a \in \{1\} \cup X$ such that $x \neq a$. Then $f = g$ implies that $\partial_{x|a}\mathcal{A}_f - \partial_{x|a}\mathcal{A}_g$ is an ALS for $0 \in \mathbb{F}$.

Definition 2.4. Let $f \in \mathbb{F}$ be given by the ALS $\mathcal{A} = (u,A,v)$ and fix $x, a \in \{1\} \cup X = \{1, x_1, x_2, \ldots, x_d\}$ such that $x \neq a$. Denote by $\partial_{x|a}f$ the element defined by the ALS $\partial_{x|a}\mathcal{A}$. The map $\partial_{x|a} : \mathbb{F} \to \mathbb{F}$, $f \mapsto \partial_{x|a}f$ is called (partial) formal derivation, the element $\partial_{x|a}f$ (partial) formal derivative of $f$.

Corollary 2.5. For each $x, a \in \{1\} \cup X$ with $x \neq a$, the formal derivation $\partial_{x|a} : \mathbb{F} \to \mathbb{F}$ is a linear map.

Now we are almost done. Before we show the product rule in the following Lemma 2.6, we have a look into the left family of the ALS of the (formal) derivative of a polynomial. Let $p = x^3 \in \mathbb{F}$ (and $a = 1$). An ALS for $\partial_{x|1}p$, obtained via Definition 2.1 from the minimal ALS for $p$, is given by
$$\begin{bmatrix}
1 & -x & \cdot & \cdot & 0 & -1 & \cdot & \cdot \\
\cdot & 1 & -x & \cdot & \cdot & 0 & -1 & \cdot \\
\cdot & \cdot & 1 & -x & \cdot & \cdot & 0 & -1 \\
\cdot & \cdot & \cdot & 1 & \cdot & \cdot & \cdot & 0 \\
\cdot & \cdot & \cdot & \cdot & 1 & -x & \cdot & \cdot \\
\cdot & \cdot & \cdot & \cdot & \cdot & 1 & -x & \cdot \\
\cdot & \cdot & \cdot & \cdot & \cdot & \cdot & 1 & -x \\
\cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & 1
\end{bmatrix} s = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \\ \cdot \\ \cdot \\ \cdot \\ 1 \end{bmatrix}, \qquad s = \begin{bmatrix} 3x^2 \\ 2x \\ 1 \\ 0 \\ x^3 \\ x^2 \\ x \\ 1 \end{bmatrix}.$$
Notice that the first four entries in the left family of $\partial_{x|1}\mathcal{A}$ are $s_i = \partial_{x|1}s_{i+4}$.

Lemma 2.6. (Product rule) Let $f, g \in \mathbb{F}$ be given by the admissible linear systems $\mathcal{A}_f = (u_f, A_f, v_f)$ and $\mathcal{A}_g = (u_g, A_g, v_g)$ of dimension $n_f, n_g \geq 1$ respectively. Fix $x \in X$ and $a \in \{1\} \cup X \setminus \{x\}$. Then
$$\partial_{x|a}(fg) = \partial_{x|a}f\,g + f\,\partial_{x|a}g.$$

Proof. Let $\ell \neq k \in \{0, 1, \ldots, d\}$ be the indices of $x$ resp. $a$. We write $A^\ell_f$ for the coefficient matrix $A_\ell$ of $\mathcal{A}_f$ resp. $A^\ell_g$ for $A_\ell$ of $\mathcal{A}_g$. We take the sum and the product from [30, Proposition 1.13] and start with the ALS from the right hand side,
$$\begin{bmatrix}
A_f & A^\ell_f \otimes a & 0 & -A_f u_f^\top u_g & \cdot & \cdot \\
\cdot & A_f & -v_f u_g & \cdot & \cdot & \cdot \\
\cdot & \cdot & A_g & \cdot & \cdot & \cdot \\
\cdot & \cdot & \cdot & A_f & -v_f u_g & \cdot \\
\cdot & \cdot & \cdot & \cdot & A_g & A^\ell_g \otimes a \\
\cdot & \cdot & \cdot & \cdot & \cdot & A_g
\end{bmatrix} s = \begin{bmatrix} \cdot \\ \cdot \\ v_g \\ \cdot \\ \cdot \\ v_g \end{bmatrix},$$
subtract block row 6 from block row 3, add block column 3 to block column 6 and remove block row/column 3 to get the ALS
$$\begin{bmatrix}
A_f & A^\ell_f \otimes a & -A_f u_f^\top u_g & \cdot & \cdot \\
\cdot & A_f & \cdot & \cdot & -v_f u_g \\
\cdot & \cdot & A_f & -v_f u_g & \cdot \\
\cdot & \cdot & \cdot & A_g & A^\ell_g \otimes a \\
\cdot & \cdot & \cdot & \cdot & A_g
\end{bmatrix} s = \begin{bmatrix} \cdot \\ \cdot \\ \cdot \\ \cdot \\ v_g \end{bmatrix}.$$
Now we can add block row 3 to block row 1, eliminate the remaining columns in block (1,3) by the columns $\{2, 3, \ldots, n_f\}$ from block (1,1), and remove block row/column 3 to get the ALS
$$\begin{bmatrix}
A_f & A^\ell_f \otimes a & -v_f u_g & \cdot \\
\cdot & A_f & \cdot & -v_f u_g \\
\cdot & \cdot & A_g & A^\ell_g \otimes a \\
\cdot & \cdot & \cdot & A_g
\end{bmatrix} s = \begin{bmatrix} \cdot \\ \cdot \\ \cdot \\ v_g \end{bmatrix}.$$
Swapping block rows 2 and 3 and block columns 2 and 3 yields the ALS
$$\begin{bmatrix}
A_f & -v_f u_g & A^\ell_f \otimes a & 0 \\
\cdot & A_g & \cdot & A^\ell_g \otimes a \\
\cdot & \cdot & A_f & -v_f u_g \\
\cdot & \cdot & \cdot & A_g
\end{bmatrix} s = \begin{bmatrix} \cdot \\ \cdot \\ \cdot \\ v_g \end{bmatrix}$$
of the left hand side $\partial_{x|a}(fg)$. Notice the upper right zero in the system matrix, which is there because of $x \neq 1$.

Definition 2.7. Let $f \in \mathbb{F}$, $x \in X$ and $a \in X \setminus \{x\}$. The element $\partial_x f := \partial_{x|1}f$ is called the partial derivative of $f$. The element $\partial_{x|a}f$ is called the (partial) directional derivative of $f$ (with respect to $a$).

Theorem 2.8. (Free derivation) Let $x \in X$. Then the (partial) free derivation $\partial_x : \mathbb{F} \to \mathbb{F} = K(\!\langle X\rangle\!)$ is the unique map with the properties
• $\partial_x h = 0$ for all $h \in K(\!\langle X \setminus \{x\}\rangle\!)$,
• $\partial_x x = 1$, and
• $\partial_x(fg) = \partial_x f\,g + f\,\partial_x g$ for all $f, g \in \mathbb{F} = K(\!\langle X\rangle\!)$.

Proof. Let $h$ be given by the ALS $\mathcal{A} = (u,A,v)$ and let $\ell \in \{1, 2, \ldots, d\}$ such that $x = x_\ell$. We just need to recall the ALS for $\partial_x h$,
$$\begin{bmatrix} A & A_\ell \otimes 1 \\ \cdot & A \end{bmatrix} \begin{bmatrix} s' \\ s'' \end{bmatrix} = \begin{bmatrix} \cdot \\ v \end{bmatrix}$$
and observe that $A_\ell = 0$ and thus $As' = 0$, in particular the first component of $s'$ vanishes. Therefore $\partial_x h = 0$. For $\partial_x x = 1$ we need to minimize
$$\begin{bmatrix} 1 & -x & \cdot & -1 \\ \cdot & 1 & \cdot & \cdot \\ \cdot & \cdot & 1 & -x \\ \cdot & \cdot & \cdot & 1 \end{bmatrix} s = \begin{bmatrix} \cdot \\ \cdot \\ \cdot \\ 1 \end{bmatrix}.$$
And the product rule $\partial_x(fg) = \partial_x f\,g + f\,\partial_x g$ is due to Lemma 2.6. For the uniqueness we assume that there exists another $\partial'_x : \mathbb{F} \to \mathbb{F}$ with the same properties. From the product rule we obtain
$$\partial_x(xf) = f + x\,\partial_x f = f + x\,\partial'_x f = \partial'_x(xf),$$
that is, $x(\partial_x f - \partial'_x f) = 0$ for all $f \in \mathbb{F}$, thus $\partial_x = \partial'_x$.

Corollary 2.9. (Hausdorff derivation [27]) Let $x \in X$. Then $\partial_x\kappa = 0$ for all $\kappa \in K$, $\partial_x y = 0$ for all $y \in X \setminus \{x\}$, $\partial_x x = 1$, and $\partial_x(fg) = \partial_x f\,g + f\,\partial_x g$ for all $f, g \in K\langle X\rangle$.

Remark. More generally [8, Theorem 7.5.17]: "Any derivation of a Sylvester domain extends to a derivation of its universal field of fractions." Recall however that the cyclic derivative is not (from) a derivation [27, Section 1]. For a discussion of cyclic derivatives of nc algebraic power series we refer to [24].

Proposition 2.10. Let $f \in \mathbb{F}$, $x, y \in X$ and $a, b \in X \setminus \{x, y\}$ with $a = b$ if and only if $x = y$. Then $\partial_{x|a}(\partial_{y|b}f) = \partial_{y|b}(\partial_{x|a}f)$, that is, the (partial) derivations $\partial_{x|a}$ and $\partial_{y|b}$ commute.

Proof. Let $l, k \in \{1, 2, \ldots, d\}$ be the indices of $x$ resp. $y$. There is nothing to show for the trivial case $x = y$, thus we can assume $l \neq k$. Let $f$ be given by the admissible linear system $\mathcal{A} = (u,A,v)$. Then the "left" ALS $\partial_{x|a}(\partial_{y|b}\mathcal{A})$ is
$$\begin{bmatrix}
A & A_k \otimes b & A_l \otimes a & 0 \\
\cdot & A & 0 & A_l \otimes a \\
\cdot & \cdot & A & A_k \otimes b \\
\cdot & \cdot & \cdot & A
\end{bmatrix} s = \begin{bmatrix} \cdot \\ \cdot \\ \cdot \\ v \end{bmatrix}.$$
Swapping block rows/columns 2 and 3 yields the "right" ALS $\partial_{y|b}(\partial_{x|a}\mathcal{A})$:
$$\begin{bmatrix}
A & A_l \otimes a & A_k \otimes b & 0 \\
\cdot & A & 0 & A_k \otimes b \\
\cdot & \cdot & A & A_l \otimes a \\
\cdot & \cdot & \cdot & A
\end{bmatrix} s = \begin{bmatrix} \cdot \\ \cdot \\ \cdot \\ v \end{bmatrix}.$$

Since there is no danger of ambiguity, we can define the (free) "higher" derivative of $f \in \mathbb{F}$ as $\partial_w f$ for each word $w$ in the free monoid $X^*$, with the "trivial" derivative $\partial_1 f = f$. Let $w \in X^*$ and $\sigma(w)$ denote any permutation of the letters of $w$. Then $\partial_w f = \partial_{\sigma(w)}f$. The proof for nc formal power series in [23, Proposition 1.8] is based on words (monomials), that is, $\partial_x(\partial_y w) = \partial_y(\partial_x w) \in K\langle X\rangle$. Recall that one gets the (nc) rational series by intersecting the (nc) formal series and the free field [26, Section 9]:
$$K\langle\!\langle X\rangle\!\rangle \cap K(\!\langle X\rangle\!) = K^{\mathrm{rat}}\langle\!\langle X\rangle\!\rangle.$$

Overall, (free) nc derivation does not appear that often in the literature. And when there is some discussion it is (almost) always connected with "not simple" [27, Section 1], "complicated" [16, Section 14.3], etc. This is however not due to the Hausdorff derivation but to the use of (finite) formal series as representation (for nc polynomials).
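The word-level identity $\partial_x(\partial_y w) = \partial_y(\partial_x w)$ used in [23, Proposition 1.8] is also easy to verify mechanically with the dictionary encoding from the introduction (an illustration only):

```python
from collections import defaultdict
from itertools import product

def deriv(p, x):
    """Hausdorff derivative on {word: coeff} dicts (word = tuple of letters)."""
    r = defaultdict(int)
    for w, a in p.items():
        for i, letter in enumerate(w):
            if letter == x:
                r[w[:i] + w[i + 1:]] += a
    return {w: c for w, c in r.items() if c}

# ∂x∂y = ∂y∂x on every word of length 5 over {x, y, z}
print(all(deriv(deriv({w: 1}, 'x'), 'y') == deriv(deriv({w: 1}, 'y'), 'x')
          for w in product('xyz', repeat=5)))   # True
```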
Using linear representations in the sense of Cohn and Reutenauer [9] for elements in the free field $\mathbb{F}$ can even reveal additional structure, as indicated in Example 2.11 (below). For the somewhat more "complicated" example
$$\tilde{p} = 3cyxb + 3xbyxb + 2cyxax + cybxb - cyaxb - 2xbyxax + 4xbybxb - 3xbyaxb + 3xaxyxb - 3bxbyxb + 6axbyxb + 2xaxyxax + xaxybxb - xaxyaxb - 2bxbyxax - bxbybxb + bxbyaxb + 5axbybxb - 4axbyaxb$$
from [3, Section 8.2] we refer to [31, Example 3.7].

Example 2.11. Let $p = xyzx$. A (minimal) polynomial ALS for $p$ is
$$\begin{bmatrix}
1 & -x & \cdot & \cdot & \cdot \\
\cdot & 1 & -y & \cdot & \cdot \\
\cdot & \cdot & 1 & -z & \cdot \\
\cdot & \cdot & \cdot & 1 & -x \\
\cdot & \cdot & \cdot & \cdot & 1
\end{bmatrix} s = \begin{bmatrix} \cdot \\ \cdot \\ \cdot \\ \cdot \\ 1 \end{bmatrix}.$$
Then $\partial_x p = xyz + yzx$ admits a factorization into matrices [31, Section 3]:
$$\partial_x p = \begin{bmatrix} x & y \end{bmatrix} \begin{bmatrix} y & \cdot \\ \cdot & z \end{bmatrix} \begin{bmatrix} z \\ x \end{bmatrix}.$$
A minimal ALS for $\partial_x p$ is
$$\begin{bmatrix}
1 & -x & -y & \cdot & \cdot & \cdot \\
\cdot & 1 & \cdot & -y & 0 & \cdot \\
\cdot & \cdot & 1 & 0 & -z & \cdot \\
\cdot & \cdot & \cdot & 1 & \cdot & -z \\
\cdot & \cdot & \cdot & \cdot & 1 & -x \\
\cdot & \cdot & \cdot & \cdot & \cdot & 1
\end{bmatrix} s = \begin{bmatrix} \cdot \\ \cdot \\ \cdot \\ \cdot \\ \cdot \\ 1 \end{bmatrix}.$$

Example 2.12. We will use the directional derivative later in Section 4 for the (nc) Newton iteration. For $q = x^2$ we get the "Sylvester equation" $\partial_{x|a}q = xa + ax$, which is linear in $a$.

In (the next) Section 3 we have a look at the "free" chain rule, which will turn out to be very elegant. We avoid the term "function" here since one needs to be careful with respect to evaluation (domain of definition), e.g., $f = (xy - yx)^{-1}$ is not defined for diagonal matrices. For the efficient evaluation of polynomials (by matrices) one can use Horner systems [31]. A generalization to elements in free fields is considered in future work. If there is a "compositional structure" available (in admissible linear systems) it could be used to further optimize evaluation.

Remark. For details on minimization (of linear representations) we refer to [32]. Notice in particular that the construction (of the formal derivative) in Definition 2.1 preserves refined pivot blocks. Therefore, if $\mathcal{A}$ is refined, linear (algebraic) techniques suffice for the minimization of $\partial_x\mathcal{A}$.

Last but not least, given the alphabet $X = \{x_1, x_2, \ldots, x_d\}$, we can define the "free" (canonical) gradient
$$\nabla f = [\partial_1 f, \partial_2 f, \ldots, \partial_d f]^\top = [\partial_{x_1}f, \partial_{x_2}f, \ldots, \partial_{x_d}f]^\top \in \mathbb{F}^d$$
for some $f \in K(\!\langle X\rangle\!)$. Cyclic gradients are discussed in [34]. And for a "vector valued" element $f = (f_1, f_2, \ldots, f_d)$ we can define the Jacobian matrix $J(f) = (\partial_j f_i)_{i,j=1}^d = (\partial_{x_j}f_i)_{i,j=1}^d$. For a discussion of non-commutative Jacobian matrices on the level of nc formal power series we refer to [25].
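Before turning to composition, the free gradient is easy to illustrate with the same word encoding (again only a sketch; Example 2.11 above gives $\partial_x p$ for $p = xyzx$, and the other two partial derivatives come out alongside):

```python
from collections import defaultdict

def deriv(p, x):
    """Hausdorff derivative on {word: coeff} dicts."""
    r = defaultdict(int)
    for w, a in p.items():
        for i, letter in enumerate(w):
            if letter == x:
                r[w[:i] + w[i + 1:]] += a
    return {w: c for w, c in r.items() if c}

p = {tuple('xyzx'): 1}                  # p = xyzx from Example 2.11
grad = {v: deriv(p, v) for v in 'xyz'}  # free gradient (∂x p, ∂y p, ∂z p)
print(grad['x'])   # {('y','z','x'): 1, ('x','y','z'): 1} = yzx + xyz
print(grad['y'])   # {('x','z','x'): 1} = xzx
print(grad['z'])   # {('x','y','x'): 1} = xyx
```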
3. Free composition

To be able to formulate a (partial) "free" chain rule in an elegant way, we need a suitable notation. It will turn out that Cohn's admissible systems [8, Section 7.1] provide the perfect framework for the "expansion" of letters by elements from another free field.

First we recall the "classical" (analytical) chain rule: Let $X$, $Y$ and $Z$ be (open) sets, $f = f(x)$ differentiable on $X$, $g = g(y)$ differentiable on $Y$ and $h = h(x) = g(f(x))$, that is, $X \xrightarrow{f} Y \xrightarrow{g} Z$ with $h : X \to Z$. Then
$$\frac{d}{dx}h = \frac{d}{df}g\,\frac{d}{dx}f \quad\text{resp.}\quad h'(x) = g'\bigl(f(x)\bigr)\,f'(x).$$

Notation. For a fixed $d \in \mathbb{N}$ let $X = \{x_1, x_2, \ldots, x_d\}$, $Y = \{y_1, y_2, \ldots, y_d\}$ and $Y' = \{y'_1, y'_2, \ldots, y'_d\}$ be pairwise disjoint alphabets, that is, $X \cap Y = X \cap Y' = Y \cap Y' = \emptyset$. By $\mathbb{F}_Z$ we denote the free field $K(\!\langle Z\rangle\!)$. Let $A \in K\langle Y\rangle^{n\times n}$ be a linear full matrix and $f = (f_1, f_2, \ldots, f_d)$ a $d$-tuple of elements $f_i \in \mathbb{F}_X$. By $A \circ f$ we denote the (not necessarily full) $n\times n$ matrix over $\mathbb{F}_X$ where each letter $y_i \in Y$ is replaced by the corresponding $f_i \in \mathbb{F}_X$, that is,
$$A \circ f = A_0 \otimes 1 + A_1 \otimes f_1 + \ldots + A_d \otimes f_d \in \mathbb{F}_X^{n\times n}.$$
We write $\mathcal{A} \circ \mathcal{A}_f = \mathcal{A} \circ (\mathcal{A}_{f_1}, \mathcal{A}_{f_2}, \ldots, \mathcal{A}_{f_d})$ for a linearized version induced by the $d$-tuple of admissible linear systems $\mathcal{A}_{f_i} = (u_{f_i}, A_{f_i}, v_{f_i})$.

Now let $g \in \mathbb{F}_Y$ be given by the ALS $\mathcal{A}_g = (u_g, A_g, v_g)$ and $f = (f_1, f_2, \ldots, f_d) \in \mathbb{F}_X^d$ such that $A_g \circ f$ is full. Then (the unique element) $h = g \circ f \in \mathbb{F}_X$ is defined by the admissible system $\tilde{\mathcal{A}}_h = (u_g, A_g \circ f, v_g)$ and we write
$$\mathcal{A}_h = (u_h, A_h, v_h) = \bigl(u_g \circ \mathcal{A}_f,\ A_g \circ \mathcal{A}_f,\ v_g \circ \mathcal{A}_f\bigr) =: \mathcal{A}_g \circ \mathcal{A}_f$$
for a linearized version using "linearization by enlargement" [8, Section 5.8]. (For details we refer to the proof of Proposition 3.2 below.) Fixing some $x \in X$, we get the (partial) derivative $\partial_x h$ of $h$ via the derivative $\partial_x\mathcal{A}_h$,
$$\begin{bmatrix} A_h & A^x_h \otimes 1 \\ \cdot & A_h \end{bmatrix} s = \begin{bmatrix} \cdot \\ v_h \end{bmatrix}.$$

Figure 1 (a commutative diagram relating the two derivation paths; the arrow diagram is omitted here): Let $f = (f_1, \ldots, f_d)$ with $f_i \in \mathbb{F}_X = K(\!\langle X\rangle\!)$ given by the $d$-tuple of admissible linear systems $\mathcal{A}_f = (\mathcal{A}_1, \ldots, \mathcal{A}_d)$ and $g \in \mathbb{F}_Y$ given by $\mathcal{A}_g = (u_g, A_g, v_g)$ such that the system matrix $A_g$ remains full when we replace each letter $y_i \in Y$ by the respective element $f_i$, written as $A_g \circ f$. Then $h = g \circ f \in \mathbb{F}_X$ is defined by the admissible system $\tilde{\mathcal{A}}_h = (u_g, A_g \circ f, v_g)$, which we linearize to obtain an ALS $\mathcal{A}_h$ for $h$ before we apply the (partial) derivation $\partial_x$ (left path). On the other hand, we can apply the (directional) derivation $\partial_{y|y'}$ by going over to the free field $\mathbb{F}_{Y\cup Y'} = K(\!\langle Y \cup Y'\rangle\!)$ with an extended alphabet (with "placeholders" $y'_i$), yielding $g' = \partial_{y_1|y'_1}g + \ldots + \partial_{y_d|y'_d}g$ given by some ALS $\mathcal{A}'_g = (u'_g, A'_g, v'_g)$. Then $h' = g' \circ (f, f') \in \mathbb{F}_X$ is defined by the admissible system $\tilde{\mathcal{A}}'_h = \bigl(u'_g, A'_g \circ (f, f'), v'_g\bigr)$, where also each letter $y'_i \in Y'$ is replaced by the respective element $f'_i = \partial_x f_i \in \mathbb{F}_X$. After linearization we get an ALS $\mathcal{A}'_h$ such that $\partial_x\mathcal{A}_h - \mathcal{A}'_h = 0$, that is, $\partial_x h = h'$ (right path).

To give a meaning to the right hand side of $\partial_x h = \partial_x(g \circ f)$, we introduce the "total" (directional) derivative
$$g' := \partial_{y|y'}g = \partial_{y_1|y'_1}g + \partial_{y_2|y'_2}g + \ldots + \partial_{y_d|y'_d}g \in \mathbb{F}_{Y\cup Y'}$$
given by the ALS
$$\mathcal{A}'_g := \partial_{y|y'}\mathcal{A}_g = \left( \begin{bmatrix} u_g & \cdot \end{bmatrix}, \begin{bmatrix} A_g & \sum_{i=1}^d A^{(i)}_g \otimes y'_i \\ \cdot & A_g \end{bmatrix}, \begin{bmatrix} \cdot \\ v_g \end{bmatrix} \right)$$
with letters $y'_i \in Y'$. Using a similar notation for the derivatives
$$f' = (f'_1, f'_2, \ldots, f'_d) := (\partial_x f_1, \partial_x f_2, \ldots, \partial_x f_d) = \partial_x f,$$
we can write
$$h' := \partial_x(g \circ f) = g' \circ (f, f') = \partial_{y|y'}g \circ (f, f') \tag{3.1}$$
given by the admissible system $\mathcal{A}'_g \circ (f, f') = \partial_{y|y'}\mathcal{A}_g \circ (f, f')$. After an illustration in the following example, we show in Proposition 3.2 that indeed
$$\partial_x h = h' = \partial_x(g \circ f) = \partial_{y|\partial_x}g \circ f,$$
using an abbreviation for the right hand side of (3.1). For an overview see Figure 1.

Remark. Cohn writes an admissible system $\mathcal{A} = (u,A,v)$ as a "block" $[A, v]$ with a not necessarily scalar column $v$ [8, Section 7.1]. A ring homomorphism which preserves fullness of matrices is called honest [8, Section 5.4].

Remark. It is crucial that $A_g \circ f$ is full to be able to define the composition. On a purely algebraic level this is sufficient to define the (partial) chain rule. For a $d$-tuple $g = (g_1, g_2, \ldots, g_d)$ given by admissible linear systems $\mathcal{A}_g = (\mathcal{A}_{g_1}, \mathcal{A}_{g_2}, \ldots, \mathcal{A}_{g_d})$ with $\mathcal{A}_{g_i} = (u_{g_i}, A_{g_i}, v_{g_i})$ we need $A_{g_i} \circ f$ full for all $i \in \{1, 2, \ldots, d\}$.

Example 3.1. Here we take $X = \{x, y\}$, $Y = \{f, p\}$ and abuse notation. Let $f = (x^{-1} + y)^{-1} \in \mathbb{F}_X$, $p = xy \in \mathbb{F}_X$ and $g = pfp \in \mathbb{F}_Y$.
Then $h = g \circ (f, p) \in \mathbb{F}_X$ is given by the (minimal) ALS
$$\begin{bmatrix}
1 & -x & \cdot & \cdot & \cdot & \cdot \\
\cdot & 1 & -y & \cdot & \cdot & \cdot \\
\cdot & \cdot & 1 & -x & \cdot & \cdot \\
\cdot & \cdot & y & 1 & -x & \cdot \\
\cdot & \cdot & \cdot & \cdot & 1 & -y \\
\cdot & \cdot & \cdot & \cdot & \cdot & 1
\end{bmatrix} s = \begin{bmatrix} \cdot \\ \cdot \\ \cdot \\ \cdot \\ \cdot \\ 1 \end{bmatrix},$$
and $\partial_x h = \partial_x(pfp) = \partial_x p\,fp + p\,\partial_x f\,p + pf\,\partial_x p$ by
$$\begin{bmatrix}
1 & -x & \cdot & \cdot & \cdot & \cdot & \cdot & -1 & \cdot & \cdot & \cdot & \cdot \\
\cdot & 1 & -y & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot \\
\cdot & \cdot & 1 & -x & \cdot & \cdot & \cdot & \cdot & \cdot & -1 & \cdot & \cdot \\
\cdot & \cdot & y & 1 & -x & \cdot & \cdot & \cdot & \cdot & \cdot & -1 & \cdot \\
\cdot & \cdot & \cdot & \cdot & 1 & -y & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot \\
\cdot & \cdot & \cdot & \cdot & \cdot & 1 & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot \\
\cdot & \cdot & \cdot & \cdot & \cdot & \cdot & 1 & -x & \cdot & \cdot & \cdot & \cdot \\
\cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & 1 & -y & \cdot & \cdot & \cdot \\
\cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & 1 & -x & \cdot & \cdot \\
\cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & y & 1 & -x & \cdot \\
\cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & 1 & -y \\
\cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & 1
\end{bmatrix} s = \begin{bmatrix} \cdot \\ \cdot \\ \cdot \\ \cdot \\ \cdot \\ \cdot \\ \cdot \\ \cdot \\ \cdot \\ \cdot \\ \cdot \\ 1 \end{bmatrix}. \tag{3.2}$$
On the other hand, $\partial_x(\partial_f g + \partial_p g)$ is given by (the admissible system)
$$\begin{bmatrix}
1 & -p & \cdot & \cdot & \cdot & -\partial_x p & \cdot & \cdot \\
\cdot & 1 & -f & \cdot & \cdot & \cdot & -\partial_x f & \cdot \\
\cdot & \cdot & 1 & -p & \cdot & \cdot & \cdot & -\partial_x p \\
\cdot & \cdot & \cdot & 1 & \cdot & \cdot & \cdot & \cdot \\
\cdot & \cdot & \cdot & \cdot & 1 & -p & \cdot & \cdot \\
\cdot & \cdot & \cdot & \cdot & \cdot & 1 & -f & \cdot \\
\cdot & \cdot & \cdot & \cdot & \cdot & \cdot & 1 & -p \\
\cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & 1
\end{bmatrix} s = \begin{bmatrix} \cdot \\ \cdot \\ \cdot \\ \cdot \\ \cdot \\ \cdot \\ \cdot \\ 1 \end{bmatrix}.$$
The summands $\partial_x p\,fp$ and $pf\,\partial_x p$ are easy to read off in the ALS (3.2). (Alternatively one could add row 8 to row 1 resp. row 11 to row 4.) To read off the middle summand $p\,\partial_x f\,p = p\,(1 + xy)^{-1}(1 + yx)^{-1}\,p$, we just need to recall the (minimal) ALS
$$\begin{bmatrix}
1 & -x & \cdot & -1 \\
y & 1 & \cdot & \cdot \\
\cdot & \cdot & 1 & -x \\
\cdot & \cdot & y & 1
\end{bmatrix} s = \begin{bmatrix} \cdot \\ \cdot \\ \cdot \\ 1 \end{bmatrix}.$$

In other words, with $g \in \mathbb{F}_Y$ given by the ALS $\mathcal{A}_g = (u_g, A_g, v_g)$ of dimension $n$: $\partial_x h$ is given by the admissible system $\partial_{y|\partial_x}\mathcal{A}_g$,
$$\begin{bmatrix} A_g & \partial_x A_g \\ \cdot & A_g \end{bmatrix} s = \begin{bmatrix} \cdot \\ v_g \end{bmatrix}$$
of dimension $2n$. Notice that $\partial_x A_g$ is understood here in a purely symbolic way, that is, over $\mathbb{F}_{Y\cup Y'}$ with additional letters "$\partial_x y$" in $Y'$ for each $y \in Y$. In this sense $\partial_{y|\partial_x}\mathcal{A}_g$ is actually linear.

Proposition 3.2. (Free chain rule) Let $X = \{x_1, \ldots, x_d\}$, $Y = \{y_1, \ldots, y_d\}$ and $Y' = \{y'_1, \ldots, y'_d\}$ be pairwise disjoint alphabets and fix $x \in X$. Let $g \in K(\!\langle Y\rangle\!)$ be given by the ALS $\mathcal{A}_g = (u_g, A_g, v_g)$ and let $f = (f_1, f_2, \ldots, f_d) \in K(\!\langle X\rangle\!)^d$ be such that $A_g \circ f$ is full. Denote by $f'$ the $d$-tuple $(\partial_x f_1, \partial_x f_2, \ldots, \partial_x f_d)$. Then
$$\partial_x(g \circ f) = \partial_{y|y'}g \circ (f, f') =: \partial_{y|\partial_x}g \circ f.$$

Proof. For $i \in \{1, 2, \ldots, d\}$ let $f_i \in \mathbb{F}_X$ be given by the admissible linear systems $\mathcal{A}_{f_i} = (u_{f_i}, A_{f_i}, v_{f_i})$ respectively. In the following we assume, without loss of generality, $d = 3$, and decompose $A_g \circ f$ into
$$\begin{bmatrix} A_{11} & A_{12} \\ A_{21} & a \end{bmatrix}$$
of size $n$ with $a = \alpha_0 + \alpha_1 f_1 + \alpha_2 f_2 + \alpha_3 f_3$. A generalization of the "linearization by enlargement" [8, Section 5.8] is then to start with the full matrix $A_g \circ f \oplus A_{f_1} \oplus \ldots \oplus A_{f_d}$, add $u_{f_i}A^{-1}_{f_i}$ from the corresponding block row to row $n$, and $-\alpha_i A^{-1}_{f_i}v_{f_i}$ from the corresponding block column to column $n$ in the upper left block (these transformations preserve fullness):
$$\begin{bmatrix}
A_{11} & A_{12} & \cdot & \cdot & \cdot \\
A_{21} & \alpha_0 & u_{f_1} & u_{f_2} & u_{f_3} \\
\cdot & -\alpha_1 v_{f_1} & A_{f_1} & \cdot & \cdot \\
\cdot & -\alpha_2 v_{f_2} & \cdot & A_{f_2} & \cdot \\
\cdot & -\alpha_3 v_{f_3} & \cdot & \cdot & A_{f_3}
\end{bmatrix}.$$
A partially linearized system matrix for $\partial_x(\mathcal{A}_g \circ f)$ is
$$\begin{bmatrix}
A_{11} & A_{12} & \cdot & \cdot & \cdot & * & * & \cdot & \cdot & \cdot \\
A_{21} & \alpha_0 & u_{f_1} & u_{f_2} & u_{f_3} & * & 0 & 0 & 0 & 0 \\
\cdot & -\alpha_1 v_{f_1} & A_{f_1} & \cdot & \cdot & \cdot & 0 & A^x_{f_1}\otimes 1 & \cdot & \cdot \\
\cdot & -\alpha_2 v_{f_2} & \cdot & A_{f_2} & \cdot & \cdot & 0 & \cdot & A^x_{f_2}\otimes 1 & \cdot \\
\cdot & -\alpha_3 v_{f_3} & \cdot & \cdot & A_{f_3} & \cdot & 0 & \cdot & \cdot & A^x_{f_3}\otimes 1 \\
\cdot & \cdot & \cdot & \cdot & \cdot & A_{11} & A_{12} & \cdot & \cdot & \cdot \\
\cdot & \cdot & \cdot & \cdot & \cdot & A_{21} & \alpha_0 & u_{f_1} & u_{f_2} & u_{f_3} \\
\cdot & \cdot & \cdot & \cdot & \cdot & \cdot & -\alpha_1 v_{f_1} & A_{f_1} & \cdot & \cdot \\
\cdot & \cdot & \cdot & \cdot & \cdot & \cdot & -\alpha_2 v_{f_2} & \cdot & A_{f_2} & \cdot \\
\cdot & \cdot & \cdot & \cdot & \cdot & \cdot & -\alpha_3 v_{f_3} & \cdot & \cdot & A_{f_3}
\end{bmatrix}. \tag{3.3}$$
On the other hand, the system matrix of $\partial_{y_i|y'_i}\mathcal{A}_g$ is
$$\begin{bmatrix}
A_{11} & A_{12} & A^{y_i}_{11}\otimes y'_i & A^{y_i}_{12}\otimes y'_i \\
A_{21} & a & A^{y_i}_{21}\otimes y'_i & \alpha_i y'_i \\
\cdot & \cdot & A_{11} & A_{12} \\
\cdot & \cdot & A_{21} & a
\end{bmatrix},$$
and the corresponding partially linearized system matrix of $(\partial_{y_i|y'_i}\mathcal{A}_g) \circ (f, f'_i)$ is
$$\begin{bmatrix}
A_{11} & A_{12} & \cdot & \cdot & \cdot & A^{y_i}_{11}\otimes f'_i & A^{y_i}_{12}\otimes f'_i & \cdot & \cdot & \cdot \\
A_{21} & \alpha_0 & u_{f_1} & u_{f_2} & u_{f_3} & A^{y_i}_{21}\otimes f'_i & \boxed{\alpha_i f'_i} & \cdot & \cdot & \cdot \\
\cdot & -\alpha_1 v_{f_1} & A_{f_1} & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot \\
\cdot & -\alpha_2 v_{f_2} & \cdot & A_{f_2} & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot \\
\cdot & -\alpha_3 v_{f_3} & \cdot & \cdot & A_{f_3} & \cdot & \cdot & \cdot & \cdot & \cdot \\
\cdot & \cdot & \cdot & \cdot & \cdot & A_{11} & A_{12} & \cdot & \cdot & \cdot \\
\cdot & \cdot & \cdot & \cdot & \cdot & A_{21} & \alpha_0 & u_{f_1} & u_{f_2} & u_{f_3} \\
\cdot & \cdot & \cdot & \cdot & \cdot & \cdot & -\alpha_1 v_{f_1} & A_{f_1} & \cdot & \cdot \\
\cdot & \cdot & \cdot & \cdot & \cdot & \cdot & -\alpha_2 v_{f_2} & \cdot & A_{f_2} & \cdot \\
\cdot & \cdot & \cdot & \cdot & \cdot & \cdot & -\alpha_3 v_{f_3} & \cdot & \cdot & A_{f_3}
\end{bmatrix}.$$
To eliminate the boxed entry $\alpha_i f'_i$ we just need to recall the derivative $\partial_x\mathcal{A}_{f_i}$ of $\mathcal{A}_{f_i} = (u_{f_i}, A_{f_i}, v_{f_i})$; the invertible matrix $Q$ swaps block columns 1 and 2:
$$\begin{bmatrix}
0 & u_{f_i} & \cdot \\
\cdot & A_{f_i} & A^x_{f_i}\otimes 1 \\
-\alpha_i v_{f_i} & \cdot & A_{f_i}
\end{bmatrix} Q = \begin{bmatrix}
u_{f_i} & 0 & \cdot \\
A_{f_i} & \cdot & A^x_{f_i}\otimes 1 \\
\cdot & -\alpha_i v_{f_i} & A_{f_i}
\end{bmatrix}.$$
Thus, after summing up (over $i \in \{1, 2, \ldots, d\}$), we get the system matrix (3.3) and hence
$$\partial_x\mathcal{A}_h = \partial_x(\mathcal{A}_g \circ \mathcal{A}_f) = \partial_{y|y'}\mathcal{A}_g \circ \mathcal{A}_{f,f'}, \quad\text{that is,}\quad \partial_x h = \partial_x(g \circ f) = \partial_{y|\partial_x}g \circ f.$$

Free (non-commutative) decomposition is important in control theory [12, Section 6.2.2]: "The authors do not know how to fully implement the decompose operation. Finding decompositions by hand can be facilitated with the use of certain type of collect command".

Let $g = fabf + cf + de$ and $f = xy + z$, that is, $f$ solves a Riccati equation. Given $h = g \circ f = (xy + z)ab(xy + z) + c(xy + z) + de$ by the (minimal) admissible linear system $\mathcal{A}_h = (u_h, A_h, v_h)$,
$$\begin{bmatrix}
1 & -x & -z & \cdot & -d & -c & \cdot & \cdot \\
\cdot & 1 & -y & \cdot & \cdot & \cdot & \cdot & \cdot \\
\cdot & \cdot & 1 & -a & \cdot & \cdot & \cdot & \cdot \\
\cdot & \cdot & \cdot & 1 & \cdot & -b & \cdot & \cdot \\
\cdot & \cdot & \cdot & \cdot & 1 & \cdot & \cdot & -e \\
\cdot & \cdot & \cdot & \cdot & \cdot & 1 & -x & -z \\
\cdot & \cdot & \cdot & \cdot & \cdot & \cdot & 1 & -y \\
\cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & 1
\end{bmatrix} s = \begin{bmatrix} \cdot \\ \cdot \\ \cdot \\ \cdot \\ \cdot \\ \cdot \\ \cdot \\ 1 \end{bmatrix},$$
one can read off $f = xy + z$ directly since $A_h$ has the form
$$\begin{bmatrix}
1 & -f & \cdot & -d & -c & \cdot \\
\cdot & 1 & -a & \cdot & \cdot & \cdot \\
\cdot & \cdot & 1 & \cdot & -b & \cdot \\
\cdot & \cdot & \cdot & 1 & \cdot & -e \\
\cdot & \cdot & \cdot & \cdot & 1 & -f \\
\cdot & \cdot & \cdot & \cdot & \cdot & 1
\end{bmatrix} s = \begin{bmatrix} \cdot \\ \cdot \\ \cdot \\ \cdot \\ \cdot \\ 1 \end{bmatrix}.$$

So, starting with a minimal ALS $\tilde{\mathcal{A}}_h$ for $h$, one "just" needs to find appropriate (invertible) transformation matrices $P, Q$ such that $\mathcal{A}_h = P\tilde{\mathcal{A}}_hQ$, using (commutative) Gröbner bases techniques similar to [10, Theorem 4.1] or the refinement of pivot blocks [32, Section 3]. Although this is quite challenging using brute force methods, in practical examples it is rather easy using free fractions, that is, minimal and refined admissible linear systems, as a "work bench" where one can perform the necessary row and column operations manually.

"The challenge to computer algebra is to start with an expanded version of $h = g \circ f$, which is a mess that you would likely see in a computer algebra session, and to automatically motivate the selection of $f$." [12, Section 6.2.1]

Working with free fractions is simple, in particular with nc polynomials. The main difficulty in working with (finite) formal power series is that the number of words can grow exponentially with respect to the rank, the minimal dimension of a linear representation [31, Table 1]. In this case, for $h = g \circ f = xyabxy + xyabz + zabxy + zabz + cxy + cz + de$, the main "structure" becomes (almost) visible already after minimization of the corresponding "polynomial" admissible linear system (in upper triangular form with ones in the diagonal).
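To close this section: for nc polynomials, where composition is plain substitution, the chain rule of Proposition 3.2 can be verified directly. A sketch with ad hoc helpers (`dderiv` is the directional Hausdorff derivative on words, `compose` substitutes a tuple of polynomials for letters; the placeholder letters `y1p`, `y2p` play the role of $y'_1, y'_2$); only the polynomial case is covered, genuine fractions need the ALS construction above:

```python
from collections import defaultdict

def clean(d):
    return {w: c for w, c in d.items() if c}

def add(p, q):
    r = defaultdict(int)
    for poly in (p, q):
        for w, c in poly.items():
            r[w] += c
    return clean(r)

def mul(p, q):
    r = defaultdict(int)
    for w, a in p.items():
        for v, b in q.items():
            r[w + v] += a * b
    return clean(r)

def dderiv(p, x, b):
    """Directional derivative: replace one occurrence of x by the polynomial b."""
    r = {}
    for w, a in p.items():
        for i, letter in enumerate(w):
            if letter == x:
                r = add(r, mul({w[:i]: a}, mul(b, {w[i + 1:]: 1})))
    return r

def compose(g, f):
    """Substitute the polynomial f[y] for each letter y of g."""
    r = {}
    for w, a in g.items():
        term = {(): a}
        for letter in w:
            term = mul(term, f[letter])
        r = add(r, term)
    return r

# g in K<y1,y2>, f = (f1, f2) in K<x,z>^2
g = {('y1', 'y2'): 1, ('y2', 'y2', 'y1'): 2}
f = {'y1': {('x', 'z'): 1, ('z',): 3}, 'y2': {('x',): 1, ('x', 'x'): 1}}
fp = {y: dderiv(f[y], 'x', {(): 1}) for y in f}   # f' = ∂x f

lhs = dderiv(compose(g, f), 'x', {(): 1})          # ∂x (g ∘ f)
rhs = add(compose(dderiv(g, 'y1', {('y1p',): 1}), {**f, 'y1p': fp['y1']}),
          compose(dderiv(g, 'y2', {('y2p',): 1}), {**f, 'y2p': fp['y2']}))
print(lhs == rhs)   # True: ∂x(g∘f) = Σ_i ∂_{y_i|y'_i} g ∘ (f, f')
```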
4. Application: Newton iteration

To compute the third root $\sqrt[3]{z}$ of a (positive) real number $z$, say $z = 2$, we just have to find the roots of the polynomial $p = x^3 - z$, for example by using the Newton iteration $x_{k+1} := x_k - p(x_k)/p'(x_k)$, which, given a "good" starting value $x_0$, yields a (quadratically) convergent sequence $x_0, x_1, x_2, \ldots$ such that $\lim_{k\to\infty} x_k^3 = z$. Using FriCAS [14], the first 6 iterations for $x_0 = 1$ are

k   |x_k − x_{k−1}|   x_k
1   3.333·10^{-1}     1.3333333333333333
2   6.944·10^{-2}     1.2638888888888888
3   3.955·10^{-3}     1.259933493449977
4   1.244·10^{-5}     1.2599210500177698
5   1.229·10^{-10}    1.2599210498948732
6   0                 1.2599210498948732

Detailed discussions are available in every book on numerical analysis, for example [17]. As a starting point towards current research, one could take [33].

Now we would like to compute the third root $\sqrt[3]{Z}$ of a real (square) matrix $Z$ (with positive eigenvalues). We (still) can use
$$X_{k+1} := \tfrac{2}{3}X_k + \tfrac{1}{3}X_k^{-2}Z$$
if we choose a starting matrix $X_0$ such that $X_0Z = ZX_0$, because then all $X_k$ commute with $Z$ [19, Section 7.3]. For $X_0 = I$ and
$$Z = \begin{bmatrix} 47 & 84 & 54 \\ 42 & 116 & 99 \\ 9 & 33 & 32 \end{bmatrix} \quad\text{respectively}\quad \sqrt[3]{Z} = \begin{bmatrix} 3 & 2 & 0 \\ 1 & 4 & 3 \\ 0 & 1 & 2 \end{bmatrix},$$
some iterations are (with $\|.\|_F$ denoting the Frobenius norm)

k    ‖X_k − X_{k−1}‖_F   ‖X_k − ∛Z‖_F
4    9.874               23.679
5    6.511               13.818
6    4.109               7.309
7    2.254               3.200
8    8.295·10^{-1}       9.455·10^{-1}
9    1.140·10^{-1}       1.160·10^{-1}
10   2.044·10^{-3}       2.044·10^{-3}
11   1.115·10^{-6}       6.489·10^{-7}

Does this sequence of matrices $X_0, X_1, \ldots$ converge? Some more iterations reveal immediately that something goes wrong, visible in Table 1.

k    ‖X_k − X_{k−1}‖_F   ‖X_k − ∛Z‖_F    ‖X_kZ − ZX_k‖_F
1    65.836              5.385           2.274·10^{-13}
2    22.279              60.756          7.541·10^{-13}
3    14.845              38.500          1.110·10^{-12}
4    9.874               23.679          2.031·10^{-12}
5    6.511               13.818          7.725·10^{-12}
6    4.109               7.309           7.010·10^{-11}
7    2.254               3.200           9.451·10^{-10}
8    8.295·10^{-1}       9.455·10^{-1}   1.716·10^{-8}
9    1.140·10^{-1}       1.160·10^{-1}   3.548·10^{-7}
10   2.044·10^{-3}       2.044·10^{-3}   7.473·10^{-6}
11   1.115·10^{-6}       6.489·10^{-7}   1.574·10^{-4}
12   1.979·10^{-5}       8.971·10^{-7}   3.314·10^{-3}
13   4.168·10^{-4}       1.890·10^{-5}   6.978·10^{-2}
14   8.777·10^{-3}       3.979·10^{-4}   1.469
15   1.848·10^{-1}       8.379·10^{-3}   3.094·10^{+1}
16   3.892               1.764·10^{-1}   6.515·10^{+2}
17   8.195·10^{+1}       3.715           1.372·10^{+4}
18   1.726·10^{+3}       7.823·10^{+1}   2.889·10^{+5}

Table 1: The first 18 Newton iterations to find $\sqrt[3]{Z}$ for $X_0 = I$. The values in column 2 (and 3) for the first 11 Newton steps seem to indicate convergence. However, the (Frobenius) norm of the commutator $X_kZ - ZX_k$ (column 4) increases steadily and that eventually causes a diverging sequence $(X_k)$.

The problem is that, due to rounding errors (in finite precision arithmetic), commutativity of $X_k$ (with $Z$) is lost, that is, $\|X_kZ - ZX_k\|_F$ increases with every iteration. This happens even for a starting value close to the solution, $X_0 = \sqrt[3]{Z} + \varepsilon I$. For a detailed discussion of matrix roots and how to overcome such problems we refer to [19, Section 7], for further information about numerics to [11]. A classical introduction to matrix functions is [15, Chapter V]. For the (matrix) square root and discussions about the stability (of Newton's method) we recommend [18].

Remark. To ensure commutation of the iterates $X_k$ with $Z$ one could use an additional correction step solving the Sylvester equation $Z\Delta X_k - \Delta X_kZ = X_kZ - ZX_k$ for an update $\Delta X_k$. Since this is expensive, the benefit of using the "commutative" Newton iteration would be lost. For fast solutions of Sylvester equations we refer to [20].
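The drift phenomenon of Table 1 can be reproduced in a few lines of numpy (a sketch; since the table was produced with FriCAS, double-precision values will differ in detail and the onset of divergence may shift by a few iterations):

```python
import numpy as np

Z = np.array([[47, 84, 54], [42, 116, 99], [9, 33, 32]], float)
X = np.eye(3)                                   # X0 = I commutes with Z exactly
for k in range(1, 19):
    X = (2 * X + np.linalg.solve(X @ X, Z)) / 3 # X_{k+1} = (2/3)X + (1/3)X^{-2}Z
    drift = np.linalg.norm(X @ Z - Z @ X)       # commutator, Frobenius norm
    err = np.linalg.norm(X @ X @ X - Z)         # residual of the cube
    print(f"k={k:2d}  ||X^3-Z||={err:9.3e}  ||XZ-ZX||={drift:9.3e}")
# the commutator norm grows steadily; once it dominates, the iteration diverges
```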
Although what we are going to introduce now as the "non-commutative (nc) Newton iteration" is even more expensive, it can be implemented as a black box algorithm, that is, without manual computation of the non-commutative derivative and without any individual implementation/programming. Furthermore, there is absolutely no restriction on the initial iterate.

To get the nc Newton method for solving $f(x) = 0$ we just need to "truncate" the Hausdorff polarization operator [27, Section 4],
$$f(x + b) = f(x) + \partial_{x|b}f(x) + \tfrac{1}{2}\,\partial_{x|b}\bigl(\partial_{x|b}f(x)\bigr) + \cdots,$$
as an analogon to Taylor's formula. Thus, for $f(x) = x^3 - z$ we have to solve
$$0 = \underbrace{x^3 - z}_{f} + \underbrace{bx^2 + xbx + x^2b}_{\partial_{x|b}f} \in \mathbb{R}(\!\langle X\rangle\!) =: \mathbb{F}$$
with respect to $b$. In terms of (square) matrices $X, Z, B$ this is just the generalized Sylvester equation $BX^2 + XBX + X^2B = Z - X^3$, which is linear in $B$ [19, Section 7.2]. Taking a (not necessarily with $Z$ commuting) starting matrix $X_0$, we can compute $B_0$ and get $X_1 := X_0 + B_0$ and iteratively $X_{k+1} := X_k + B_k$.

Minimal admissible linear systems for $f = x^3 - z$ and $\partial_{x|b}f = bx^2 + xbx + x^2b$ are given by
$$\begin{bmatrix}
1 & -x & \cdot & z \\
\cdot & 1 & -x & \cdot \\
\cdot & \cdot & 1 & -x \\
\cdot & \cdot & \cdot & 1
\end{bmatrix} s = \begin{bmatrix} \cdot \\ \cdot \\ \cdot \\ 1 \end{bmatrix}$$
and
$$\begin{bmatrix}
1 & -x & \cdot & -b & \cdot & \cdot \\
\cdot & 1 & -x & \cdot & -b & \cdot \\
\cdot & \cdot & 1 & \cdot & \cdot & -b \\
\cdot & \cdot & \cdot & 1 & -x & \cdot \\
\cdot & \cdot & \cdot & \cdot & 1 & -x \\
\cdot & \cdot & \cdot & \cdot & \cdot & 1
\end{bmatrix} s = \begin{bmatrix} \cdot \\ \cdot \\ \cdot \\ \cdot \\ \cdot \\ 1 \end{bmatrix}$$
respectively. A minimal ALS $\mathcal{A} = (u,A,v)$ of dimension $n = 6$ for $g = f + \partial_{x|b}f$ is immediate:
$$\begin{bmatrix}
1 & -x & \cdot & -b & \cdot & z \\
\cdot & 1 & -x & \cdot & -b & \cdot \\
\cdot & \cdot & 1 & \cdot & \cdot & -b-x \\
\cdot & \cdot & \cdot & 1 & -x & \cdot \\
\cdot & \cdot & \cdot & \cdot & 1 & -x \\
\cdot & \cdot & \cdot & \cdot & \cdot & 1
\end{bmatrix} s = \begin{bmatrix} \cdot \\ \cdot \\ \cdot \\ \cdot \\ \cdot \\ 1 \end{bmatrix}.$$
Since the system matrix is just a linear matrix pencil $A = A_1 \otimes 1 + A_b \otimes b + A_x \otimes x + A_z \otimes z$, we can plug in $m\times m$ matrices using the Kronecker product:
$$A(B,X,Z) = A_1 \otimes I + A_b \otimes B + A_x \otimes X + A_z \otimes Z.$$
In this case, for $v = e_n = [0, \ldots, 0, 1]^\top$, the evaluation of $g : M_m(\mathbb{R})^3 \to M_m(\mathbb{R})$ is just the upper right $m\times m$ block of $A(B,X,Z)^{-1}$. For how to efficiently evaluate (nc) polynomials by matrices using Horner systems we refer to [31]. Here, $A(B,X,Z)$ is invertible for arbitrary $B$, $X$ and $Z$.

Given $Z$ and $X$, to find a $B$ such that $g(B,X,Z) = 0$ we have to look for transformation matrices $P = P(T_{ij})$ and $Q = Q(U_{ij})$ of the form
$$P = \begin{bmatrix}
I & \cdot & \cdot & T_{1,1} & T_{1,2} & 0 \\
\cdot & I & \cdot & T_{2,1} & T_{2,2} & 0 \\
\cdot & \cdot & I & T_{3,1} & T_{3,2} & 0 \\
\cdot & \cdot & \cdot & I & \cdot & \cdot \\
\cdot & \cdot & \cdot & \cdot & I & \cdot \\
\cdot & \cdot & \cdot & \cdot & \cdot & I
\end{bmatrix}, \qquad Q = \begin{bmatrix}
I & \cdot & \cdot & 0 & 0 & 0 \\
\cdot & I & \cdot & U_{2,1} & U_{2,2} & U_{2,3} \\
\cdot & \cdot & I & U_{3,1} & U_{3,2} & U_{3,3} \\
\cdot & \cdot & \cdot & I & \cdot & \cdot \\
\cdot & \cdot & \cdot & \cdot & I & \cdot \\
\cdot & \cdot & \cdot & \cdot & \cdot & I
\end{bmatrix}$$
such that the upper right block of size $3m\times 3m$ of $PA(B,X,Z)Q$ is zero, that is, we need to solve a linear system of equations (with $6m^2 + 6m^2 + m^2 = 13m^2$ unknowns), similar to the word problem [30, Theorem 2.4]. Table 2 shows the nc Newton iterations for the starting matrix
$$X_0 = \begin{bmatrix} 1 & 0 & 2 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \tag{4.1}$$
using FriCAS [14] and the least squares solver DGELS from [22].

k    ‖B_k‖_F          ‖X_k − ∛Z‖_F     ‖X_kZ − ZX_k‖_F
0    46.877           5.745            113.842
1    16.081           42.298           5552.242
2    10.768           26.374           3659.788
3    7.320            15.912           2395.971
4    4.971            9.358            1534.892
5    2.934            6.122            912.700
6    2.651            4.389            414.201
7    1.380            1.846            86.771
8    3.878·10^{-1}    4.875·10^{-1}    5.638
9    9.378·10^{-2}    1.023·10^{-1}    2.652·10^{-1}
10   8.432·10^{-3}    8.506·10^{-3}    6.737·10^{-3}
11   7.407·10^{-5}    7.408·10^{-5}    1.951·10^{-5}
12   5.895·10^{-9}    5.895·10^{-9}    5.106·10^{-10}
13   1.825·10^{-14}   1.521·10^{-14}   1.491·10^{-12}

Table 2: The first 13 (nc) Newton iterations to find $\sqrt[3]{Z}$ for $X_0$ from (4.1). In the beginning, the iterates $X_k$ do not commute with $Z$ (column 4).
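For this specific equation the Newton step can also be set up without the representation machinery, by vectorizing the generalized Sylvester equation $BX^2 + XBX + X^2B = Z - X^3$ directly, using $\operatorname{vec}(AXB) = (B^\top \otimes A)\operatorname{vec}(X)$ (column-major vec). The following numpy sketch therefore illustrates the same iteration, not the paper's black-box approach via the transformation matrices $P$ and $Q$; the least squares solve mirrors the DGELS call mentioned above:

```python
import numpy as np

def newton_cbrt(Z, X, steps=25, tol=1e-12):
    """nc Newton for X^3 = Z: solve B X^2 + X B X + X^2 B = Z - X^3
    for the update B via column-major vectorization."""
    m = Z.shape[0]
    I = np.eye(m)
    for _ in range(steps):
        X2 = X @ X
        # vec(BX^2) + vec(XBX) + vec(X^2 B) = M vec(B)
        M = np.kron(X2.T, I) + np.kron(X.T, X) + np.kron(I, X2)
        rhs = (Z - X2 @ X).reshape(-1, order="F")
        B = np.linalg.lstsq(M, rhs, rcond=None)[0].reshape(m, m, order="F")
        X = X + B
        if np.linalg.norm(B) < tol:
            break
    return X

Z = np.array([[47, 84, 54], [42, 116, 99], [9, 33, 32]], float)
X0 = np.array([[1, 0, 2], [0, 1, 0], [0, 0, 1]], float)   # (4.1)
X = newton_cbrt(Z, X0)
print(np.round(X, 6))                 # ≈ [[3,2,0],[1,4,3],[0,1,2]] if converged
print(np.linalg.norm(X @ X @ X - Z))  # small residual if converged
```

Starting from (4.1) this should reproduce the qualitative behaviour of Table 2; note that commutativity of the iterates with $Z$ plays no role here.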
In general, depending on the initial iterate $X_0$, there is no guarantee that one ends up in some prescribed solution $X$, since there can be several (matrix) roots. For the principal $p$-th root (and further discussion) we refer to [19, Theorem 7.2]. For a discussion of Taylor's theorem (for matrix functions) one should have a look in [13].

The goal of this section was mainly illustration (of the use of the free derivative). How could we attack the analysis of convergence (for special classes of rational functions) in this context? "Classical" interval Newton is discussed in [28, Chapter 6.1]. What about "nc interval Newton"? The next natural step would be multi-dimensional nc Newton, say to find a root $G(X,Y,Z) = 0$ for
$$G = \begin{bmatrix} g_1 \\ g_2 \\ g_3 \end{bmatrix} = \begin{bmatrix} x^3 + z - d \\ \bigl(x(1 - yx)\bigr)^{-1} - e \\ xzx - yz + z^2 - f \end{bmatrix}$$
with (matrix-valued) parameters $d, e, f$. Although very technical, one can set up a "joint" linear system of equations to solve
$$\begin{bmatrix}
g_1 + \partial_{x|a}g_1 & g_1 + \partial_{y|b}g_1 & g_1 + \partial_{z|c}g_1 \\
g_2 + \partial_{x|a}g_2 & g_2 + \partial_{y|b}g_2 & g_2 + \partial_{z|c}g_2 \\
g_3 + \partial_{x|a}g_3 & g_3 + \partial_{y|b}g_3 & g_3 + \partial_{z|c}g_3
\end{bmatrix} = 0$$
for matrices $A$, $B$ and $C$, using the previous approach to find tuples of "individual" transformation matrices $P_{ij}$ and $Q_{ij}$ to create the respective upper right blocks of zeros. The problem, however, is that this system is overdetermined in general, and starting arbitrarily close to a root leads to a residual that causes divergence. How can one overcome that?

Acknowledgements

I am very grateful to Tobias Mai for making me aware of cyclic derivatives respectively the work of Rota, Sagan and Stein [27] in May 2017 in Graz and for a discussion in October 2018 in Saarbrücken. I also use this opportunity to thank Soumyashant Nayak for continuous "non-commutative" discussions, in particular about free associative algebras and (inner) derivations. And I thank the referees for the feedback and a hint on the literature.

References

[1] G.M. Bergman, W. Dicks, On universal derivations, J. Algebra 36 (2) (1975), 193 – 211.
[2] J. Berstel, C. Reutenauer, "Noncommutative Rational Series with Applications", Encyclopedia of Mathematics and its Applications, 137, Cambridge University Press, Cambridge, 2011.
[3] J.F. Camino, J.W. Helton, R.E. Skelton, Solving matrix inequalities whose unknowns are matrices, SIAM J. Optim. 17 (1) (2006), 1 – 36.
[4] J.F. Camino, J.W. Helton, R.E. Skelton, J. Ye, Matrix inequalities: a symbolic procedure to determine convexity automatically, Integral Equations Operator Theory 46 (4) (2003), 399 – 454.
[5] P.M. Cohn, Around Sylvester's law of nullity, Math. Sci. 14 (2) (1989), 73 – 83.
[6] P.M. Cohn, "Skew Fields. Theory of General Division Rings", Encyclopedia of Mathematics and its Applications, 57, Cambridge University Press, Cambridge, 1995.
[7] P.M. Cohn, "Further Algebra and Applications", Springer-Verlag London, Ltd., London, 2003.
[8] P.M. Cohn, "Free Ideal Rings and Localization in General Rings", New Mathematical Monographs, 3, Cambridge University Press, Cambridge, 2006.
[9] P.M. Cohn, C. Reutenauer, A normal form in free fields, Canad. J. Math. 46 (3) (1994), 517 – 531.
[10] P.M. Cohn, C. Reutenauer, On the construction of the free field, Internat. J. Algebra Comput. 9 (3-4) (1999), 307 – 323. Dedicated to the memory of Marcel-Paul Schützenberger.
[11] J.W. Demmel, "Applied Numerical Linear Algebra", Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 1997.
[12] M.C. de Oliveira, J.W. Helton, Computer algebra tailored to matrix inequalities in control, Internat. J. Control 79 (11) (2006), 1382 – 1400.
[13] E. Deadman, S.D. Relton, Taylor's theorem for matrix functions with applications to condition number estimation, Linear Algebra Appl. 504 (2016), 354 – 371.
[14] FriCAS team, FriCAS — An advanced computer algebra system, 2019. Release 1.3.5, available at http://fricas.sf.net, documentation at http://fricas.github.io.
[15] F.R. Gantmacher, "Matrizenrechnung. Teil I. Allgemeine Theorie", Zweite, berichtigte Auflage, Hochschulbücher für Mathematik, Band 36, VEB Deutscher Verlag der Wissenschaften, Berlin, 1965.
[16] W. Hackbusch, "Hierarchische Matrizen: Algorithmen und Analysis", Springer, Berlin, 2009.
[17] P. Henrici, "Elements of Numerical Analysis", John Wiley & Sons, Inc., New York-London-Sydney, 1964.
[18] N.J. Higham, Newton's method for the matrix square root, Math. Comp. 46 (174) (1986), 537 – 549.
[19] N.J. Higham, "Functions of Matrices. Theory and Computation", Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 2008.
[20] P. Kirrinnis, Fast algorithms for the Sylvester equation $AX - XB^\top = C$, Theoret. Comput. Sci. 259 (1-2) (2001), 623 – 638.
[21] D.S. Kaliuzhnyi-Verbovetskyi, V. Vinnikov, "Foundations of Free Noncommutative Function Theory", Mathematical Surveys and Monographs, 199, American Mathematical Society, Providence, RI, 2014.
[22] LAPACK 3.5.0, 2018, http://www.netlib.org/lapack.
[23] G. Popescu, Free holomorphic functions on the unit ball of $B(H)^n$, J. Funct. Anal. 241 (1) (2006), 268 – 333.
[24] C. Reutenauer, Cyclic derivation of noncommutative algebraic power series, J. Algebra 85 (1) (1983), 32 – 39.
[25] C. Reutenauer, Applications of a noncommutative Jacobian matrix, J. Pure Appl. Algebra 77 (2) (1992), 169 – 181.
[26] C. Reutenauer, Michel Fliess and non-commutative formal power series, Internat. J. Control 81 (3) (2008), 336 – 341.
[27] G.-C. Rota, B. Sagan, P.R. Stein, A cyclic derivative in noncommutative algebra, J. Algebra 64 (1) (1980), 54 – 75.
[28] S.M. Rump, Verification methods: rigorous results using floating-point arithmetic, Acta Numer. 19 (2010), 287 – 449.
[29] K. Schrempf, Free fractions: An invitation to (applied) free fields, arXiv e-prints, September 2018. Version 2, October 2020, http://arxiv.org/pdf/1809.05425.
[30] K. Schrempf, Linearizing the word problem in (some) free fields, Internat. J. Algebra Comput. 28 (7) (2018), 1209 – 1230.
[31] K. Schrempf, Horner systems: How to efficiently evaluate non-commutative polynomials (by matrices), arXiv e-prints, October 2019.
[32] K. Schrempf, A standard form in (some) free fields: How to construct minimal linear representations, Open Math. 18 (1) (2020), 1365 – 1386.
[33] D. Schleicher, R. Stoll, Newton's method in practice: Finding all roots of polynomials of degree one million efficiently, Theoret. Comput. Sci. 681 (2017), 146 – 166.
[34] D. Voiculescu, A note on cyclic gradients, Indiana Univ. Math. J. 49 (3) (2000), 837 – 841.