Acta Polytechnica Vol. 50 No. 2/2010

FPU-Supported Running Error Analysis

T. Zahradnický, R. Lórencz

Abstract

A-posteriori forward rounding error analyses tend to give sharper error estimates than a-priori ones, as they use actual data quantities. One such a-posteriori analysis, running error analysis, uses expressions consisting of two parts: one generates the error, and the other propagates input errors to the output. This paper suggests replacing the error generating term with an FPU-extracted rounding error estimate, which produces a sharper error bound.

Keywords: analysis of algorithms, a-posteriori error estimates, running error analysis, floating point, numerical stability.

1 Introduction

Rounding error analyses are used to discover the stability of numerical algorithms and provide information on whether the computed result is valid. They are typically performed in the forward or backward direction, and either a-priori or a-posteriori. Backward error analysis [6] treats all operations as if they were exact but performed on perturbed data, and asks for which input data we have actually solved our problem. Backward error analysis is preferred, as every algorithm which is backward stable is automatically numerically stable [5], while this does not hold for forward error analyses. Forward error analyses, on the other hand, treat operations as sources of error and bound how far the computed solution lies from the exact one. Since error analyses performed a-priori can be difficult when the algorithm being analyzed is complex, a-posteriori analyses may provide a more feasible alternative (MSC 2000 classification: 65G20, 65G50).

The objective of an a-priori analysis is to find the worst case bound for an algorithm, if one exists, while a-posteriori analyses calculate the error bound concurrently with the evaluation of the result and work with actual data quantities, providing a tighter error bound.

This paper presents a replacement for the error generating term in error bounds calculated with forward a-posteriori error estimates (also known as running error analysis), obtaining sharper error bounds by exploiting the behavior of the computer's FPU. We expect the calculation to be portable (conforming strictly to [3]) and assume an Intel x86 compatible FPU which supports double extended precision (80-bit) floating point registers, which are essential for this method.

2 Running error analysis and error bounds

Running error analysis [6, 7] is a type of a-posteriori forward error analysis. The algorithm being analyzed is extended to calculate a partial error bound alongside the normal calculation. As the algorithm proceeds, the bounds are accumulated into a total error bound estimate. From here on, a binary double precision floating point arithmetic with a guard digit and the round to nearest rounding mode is assumed.

Definition 1 Let F_t ⊂ Q be a binary floating point set with a precision of t, where Q denotes the set of rational numbers.

Definition 2 Let y = ±m · 2^{e−t} ∈ F_t be a binary floating point number, where m stands for the mantissa, e for the exponent, and t for the precision. Let ex(y) = e.

Lemma 3 Let x̂ = fl(x) be the floating point representation of an exact number x, obtained by rounding x to the nearest element of F_t. The rounding process can be described as

    \hat{x} = \mathrm{fl}(x) = x(1 + \delta), \qquad |\delta| \le u_t,        (1)

where u_t = 2^{−t} is the unit roundoff in precision t. To conserve space, and for clarity, u without a subscript means u_53.
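For instance, in double precision (t = 53) the decimal constant 0.1 has no finite binary expansion and rounds to

    \mathrm{fl}(0.1) = 0.1000000000000000055511151231\ldots = 0.1\,(1 + \delta), \qquad \delta \approx 5.55 \times 10^{-17} \approx u/2,

so the relative error actually committed is about half of the worst-case bound u = 2^{−53} ≈ 1.11 × 10^{−16}.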
Assumption 4 (Standard model) We assume that the operations ♦ ∈ {+, −, ·, /} and the square root follow the Standard model [5] and are evaluated with a relative error no greater than u:

    \hat{s}_\diamond = \mathrm{fl}(\hat{a} \diamond \hat{b}) = (\hat{a} \diamond \hat{b})/(1 + \delta_\diamond), \qquad |\delta_\diamond| \le u,
    \hat{s}_{\surd} = \mathrm{fl}(\sqrt{\hat{a}}) = \sqrt{\hat{a}}/(1 + \delta_{\surd}), \qquad |\delta_{\surd}| \le u,        (2)

where ŝ denotes the computed result of an operation. The Standard model will also be used in the form ŝ_♦ = fl(â ♦ b̂) = (â ♦ b̂)(1 + δ_♦), where useful.

Table 1 summarizes the error bounds for the basic operations and the square root, with products of errors neglected. We also assume that the following holds: ŝ = s/(1 + δ_s) = s + σ, â = a/(1 + δ_a) = a + α, and b̂ = b/(1 + δ_b) = b + β, where α, β, and σ stand for absolute (static) errors.

Table 1: Error bound estimates for basic operations

    Operation                  Error bound
    Addition and subtraction   |σ_±| ≤ u|â ± b̂| + α + β
    Multiplication             |σ_·| ≤ u|âb̂| + α|b̂| + β|â|
    Division                   |σ_/| ≤ u|â/b̂| + (α|b̂| + β|â|)/b̂²
    Square root                |σ_√| ≤ u|√â| + α/(2|√â|)

The right-hand side of each error bound in Tab. 1 contains an error generating term u|·|, while the rest of each expression merely propagates the input error(s) to the output. When we substitute u = 2^{−53} into u|ŝ|, multiplying by u reduces the exponent by 53,

    u|\hat{s}| = 1.\underbrace{xx \ldots xx}_{52\ \text{bits}} \cdot 2^{\mathrm{ex}(\hat{s}) - 2t},        (3)

leaving the mantissa unchanged if u|ŝ| is normal. The unit roundoff accounts for just the leading 1 in (3), while the x digits are generally nonzero. Using u|ŝ| as a rounding error estimate has two problems:

1. The calculated error bound u|ŝ| can be up to two times larger than necessary, since u Σ_{i=0}^{t−1} ŝ_i 2^{−i} ≲ 2u, where ŝ_i refers to the i-th bit of ŝ.

2. The error generating term always generates an error, even when no error was physically committed.

The expression u|ŝ| therefore tends to give a higher error bound estimate than we need. We can, however, revise this expression to obtain a tighter error bound estimate, and we can do so by exploiting the FPU.

3 Analysis

Many computers use a processor with the Intel x86 architecture [4], which features an FPU with 8 double extended precision floating point registers. Once a number is loaded into the FPU, regardless of its precision, it is automatically converted into the double extended precision format. Further calculations are performed in this format but rounded to the precision specified in the floating point control word register (FCW). The default FCW settings specify double extended precision, the round to nearest rounding mode, and mask out all floating point exceptions. If necessary, the FCW can be read with an fstcw instruction, modified, and reloaded with an fldcw instruction. Once a result is stored back into a memory location, it is rounded (or extended, if the FCW specified single precision) to the precision determined by the type of store instruction.

3.1 Obtaining a tighter bound from the FPU

When rounding to the nearest element of F_t, the relative error is no worse than u_t, and we can extract the static error from the FPU by subtracting the value computed in double extended precision from its rounded value. This idea resembles compensated summation [5], which, in contrast to our approach, is implemented entirely in software. Subtracting the two values provides an 11-bit estimate of the real rounding error, which is no greater than u.
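Before formalizing this, the idea can be illustrated in portable C++ without assembly. On x86 with GCC, the long double type maps to the 80-bit double extended format, so rounding an extended intermediate to double and subtracting recovers the rounding error committed by that final rounding. The following is a minimal sketch under that assumption (the function name is ours), not the implementation used later in the paper:

    #include <cmath>
    #include <cstdio>

    // Sketch: add two doubles and return the double-rounded sum together
    // with an estimate of the error committed by the 64 -> 53 bit rounding.
    // Assumes long double is the x87 80-bit double extended format.
    static double add_with_error(double a, double b, double &err)
    {
        long double S = (long double)a + (long double)b; // sum with 64-bit mantissa
        volatile double s = (double)S;                   // round 64 -> 53 bits
        err = std::fabs((double)((long double)s - S));   // |s - S| <= u|s|
        return s;
    }

    int main()
    {
        double err;
        double s = add_with_error(1.0, 1e-17, err);      // 1e-17 fits in extended,
        std::printf("s = %.17g, error estimate = %g\n",  // but is lost in double
                    s, err);
        return 0;
    }

Here the extended sum holds 1 + 10^{−17} exactly enough to see that the subsequent rounding to double discards it, so the extracted error estimate is about 10^{−17}, whereas the blanket bound u|ŝ| would report roughly 1.1 × 10^{−16}.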
The entire idea can be stated as follows:

Theorem 5 Let Ŝ be the result of an operation performed in double extended precision and let ŝ be the result in double precision, obtained by rounding Ŝ to 53 bits with the round to nearest mode defined by the IEEE 754 Standard. The absolute rounding error of an operation in floating point can instead be calculated as Δ = |ŝ − Ŝ|, where |Δ| ≤ u|ŝ|.

PROOF. We can extract the intermediate result with a 64-bit mantissa (Ŝ) from the FPU before it gets rounded to 53 bits (ŝ), and calculate Δ = |ŝ − Ŝ|. The quantity Ŝ is rounded either down to f_1 ∈ F_53 or up to f_2 ∈ F_53, where f_1 and f_2 are adjacent elements of F_53 with |f_2 − f_1|/|f_1| = 2u, and

    \hat{S} \in F_{64}, \quad \hat{S} \in [f_1, f_2], \quad \hat{s} = \mathrm{round}_{64 \to 53}(\hat{S}) \Rightarrow \hat{s} \in \{f_1, f_2\},

where

    \hat{s} = \mathrm{round}_{64 \to 53}(\hat{S}) =
    \begin{cases}
      f_2                 & \text{if } |\hat{S} - f_1|/|f_1| > u \\
      f_1 \text{ or } f_2 & \text{if } |\hat{S} - f_1|/|f_1| = u \\
      f_1                 & \text{if } |\hat{S} - f_1|/|f_1| < u
    \end{cases}        (4)

and the tie case rounds to even, as specified in [3].

Fig. 1: Rounding a double extended precision number to double precision. A value Ŝ between adjacent doubles f_1 and f_2 (which are 2u apart relative to f_1) either rounds down, rounds up, or, on a tie, rounds to even.

In all cases, the relative error can be computed in the same way, |ŝ − Ŝ|/|ŝ| ≤ u, and is always bounded by u. □

Note 1 Theorem 5 is proved for positive numbers; an analogous argument holds for negative ones.

Corollary 6 The expression Δ = |ŝ − Ŝ| ≤ u|ŝ| provides an 11-bit estimate of the static rounding error, and since |Δ| ≤ u|ŝ|, it can be safely substituted for all occurrences of u|ŝ| in Tab. 1, providing a tighter error bound.

3.2 Implementation highlights

Our approach is implemented in C++ as a class fdouble with GCC AT&T inline assembly FPU statements [9]. The fdouble class overloads most operators and some standard functions (currently only the logarithm, power and exponential functions, the absolute value, the remainder, and the square root are available), which are beyond the scope of this paper, providing a handy replacement for the double data type. The transition from the double data type to fdouble is performed simply by changing the name of the data type; the rest is handled by the class. To illustrate how the class works, the source of the plus operator fdouble::operator+ and its assembly portion op_plus, which performs the FPU-supported running error analysis with the terms above, follows:

    fdouble fdouble::operator+(fdouble const &b) const
    {
        fdouble r;
        op_plus(r.d, r.e, this->d, this->e, b.d, b.e);
        return r;
    }

The fdouble::operator+ method calls the op_plus function, which contains the assembly inline and is common to all other fdouble::operator+ overloads:

    inline void op_plus(double& result, double& result_err,
                        register const double a, register const double a_err,
                        register const double b, register const double b_err)
    {
        __asm__ __volatile__(
            "fldl %1\n\t"          // 1. load b
            "fldl %2\n\t"          // 2. load a
            "faddp\n\t"            // 3. calc a+b
            "fstl %3\n\t"          // 4. store rounded result
            "fldl %3\n\t"          // 5. load rounded result
            "fsubp\n\t"            // 6. calc difference
            "fabs\n\t"             // 7. calc its absolute value
            "fldl %4\n\t"          // 8. load a_err
            "faddp\n\t"            // 9. calc diff + a_err
            "fldl %5\n\t"          // 10. load b_err
            "faddp"                // 11. calc diff + a_err + b_err
            : "=t"(result_err)     // 12. put result in result_err
            : "m"(b),              // 13. %1 = b
              "m"(a),              // 14. %2 = a
              "m"(result),         // 15. %3 = result
              "m"(a_err),          // 16. %4 = a_err
              "m"(b_err)           // 17. %5 = b_err
        );
    }
The op_plus function first loads both double precision values a and b (lines 1 and 2) onto the floating point stack, where the FPU extends them to double extended precision. The sum is then calculated (line 3), popping a from the FPU stack and replacing b with the sum. Since the result is still in double extended precision, and the Standard requires the result (Ŝ) to be rounded after each operation, it has to be stored back into memory to obtain the rounded result (line 4); the rounded result (ŝ) is then pushed back onto the floating point stack (line 5). The difference Ŝ − ŝ is calculated (line 6) and its absolute value is taken (line 7). The errors are propagated according to Tab. 1 (lines 8 through 11), and the requested absolute error is left at the top of the FPU stack (line 12). Lines 13 through 17 specify the input operand constraints for the __asm__ directive. Other operators and functions are implemented in a similar way.

Traditional running error analysis is implemented likewise and provides the class radouble. The source code for radouble::operator+ follows:

    radouble radouble::operator+(radouble const &b) const
    {
        radouble r;
        r.d = d + b.d;
        r.e = fabs(r.d) * ROUNDOFF_UNIT + e + b.e;
        return r;
    }

Here no assembly language inlines are necessary, but special compiler flags must be provided in order to obtain the same results as with the fdouble class. These GCC flags are -ffloat-store, which ensures that the result of each floating point operation is stored back into memory and rounded to the defined precision (as required by IEEE 754), and -mfpmath=387, which ensures that the x87 floating point unit is used. Without the second flag, the optimizer could use SSE instructions and produce different results. Readers interested in obtaining the complete source code for all the classes should refer to [9]; we ask that they cite this paper.

3.3 Complexity and implementation notes

Traditional running error analysis needs to calculate u|ŝ|, which takes four x86 floating point instructions: an fld instruction to load u onto the floating point stack, fmul to calculate uŝ, fabs to obtain |uŝ| = u|ŝ|, and fstl to store the result back into memory, rounding it to double precision.

Our approach substitutes |ŝ − Ŝ| for u|ŝ|, and a typical evaluation requires an fldl instruction to load ŝ onto the floating point stack, fsub to calculate ŝ − Ŝ, fabs to obtain the absolute value |ŝ − Ŝ|, and fstl to store the result into memory while rounding it to double precision. (This approach works only with the FPU, not SSE2/SSE3, as SSE does not support double extended precision and we would be unable to calculate |Ŝ − ŝ|.) According to the instruction latency tables [1], the fsub instruction latency is 3 cycles, while the fmul latency is 5. Table 2 compares the two approaches from the latency point of view, showing a small speed advantage:

Table 2: Comparing the FPU instructions necessary to evaluate the absolute error

    Traditional approach          Our approach
    Instruction    Latency        Instruction    Latency
    fldl           3              fldl           3
    fmul           5              fsub           3
    fabs           1              fabs           1
    fstl           3              fstl           3
    Total          12             Total          10

Our approach thus not only provides a better error bound estimate, but also runs at a higher speed, and we should expect a speedup of roughly 20 per cent.

3.4 Case study

The case study compares the presented approach with traditional running error analysis, and also with a forward error bound determined a-priori, on the following mathematical identity:

    y_\infty = \log 2 = \sum_{k=1}^{\infty} \frac{(-1)^{k-1}}{k}.        (5)

C++ and Mathematica [8] are used to verify the results.
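For illustration, the whole case study computation reduces to a short loop. The following is a minimal self-contained sketch of traditional running error analysis applied to (5) in the forward summation order, using a simplified stand-in for the radouble class shown above; the struct and helper are illustrative, not the library's actual interface:

    #include <cmath>
    #include <cstdio>

    // Simplified stand-in for the radouble class of [9].
    struct radouble {
        double d;   // computed value
        double e;   // accumulated absolute error bound
    };

    static const double ROUNDOFF_UNIT = 0x1p-53;   // u = 2^-53

    // Addition per Tab. 1: |sigma| <= u|a + b| + alpha + beta.
    static radouble add(radouble a, radouble b)
    {
        radouble r;
        r.d = a.d + b.d;
        r.e = std::fabs(r.d) * ROUNDOFF_UNIT + a.e + b.e;
        return r;
    }

    int main()
    {
        const long N = 1L << 20;
        radouble sum = {0.0, 0.0};
        for (long k = 1; k <= N; ++k) {
            double t = (k % 2 == 1) ? 1.0 / k : -1.0 / k;   // (-1)^(k-1) / k
            // Rounding error of the division 1/k is u|t| per Tab. 1,
            // since both operands are exact.
            radouble term = {t, std::fabs(t) * ROUNDOFF_UNIT};
            sum = add(sum, term);
        }
        std::printf("sum = %.17g  bound = %g  true error = %g\n",
                    sum.d, sum.e, std::fabs(sum.d - std::log(2.0)));
        return 0;
    }

Switching to the FPU-supported variant means replacing the u|r.d| term in add() with the extracted |ŝ − Ŝ|, exactly as op_plus does.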
A rounding error analysis of (5) for a double precision floating point arithmetic and N addends is given under the following assumptions:

1. The first two addends of the sum (1 and 1/2) are exact, as we use a binary floating point arithmetic.

2. The factor (−1)^{k−1} is evaluated as 1 when k is odd and −1 otherwise, rather than with the pow function from libm, so it is always exact.

3. All operations follow the Standard model.

The computed ŷ_N is expressed as

    \hat{y}_N = \Bigl[ \cdots \Bigl[ \Bigl(1 - \tfrac{1}{2}\Bigr)(1+\delta_1) + \tfrac{1}{3}(1+\delta_2) \Bigr](1+\delta_3) - \cdots \Bigr](1+\delta_{2N-3})        (6)

              = 1(1+\theta_N) - \tfrac{1}{2}(1+\theta_N) + \tfrac{1}{3}(1+\theta_N) - \ldots + \tfrac{(-1)^{N-1}}{N}(1+\theta_2)        (7)

              \le \Bigl( 1 - \tfrac{1}{2} + \tfrac{1}{3} - \tfrac{1}{4} + \cdots + \tfrac{(-1)^{N-1}}{N} \Bigr)(1+\theta_N) = (1+\theta_N) \sum_{k=1}^{N} \frac{(-1)^{k-1}}{k},        (8)

where |δ_i| ≤ u, ∏_{i=1}^{n} (1 + δ_i)^{ρ_i} = 1 + θ_n with ρ_i = ±1, and |θ_n| ≤ nu/(1 − nu) = γ_n, provided nu < 1 [5]. The backward error is immediately visible from equation (7), in which we can read the sum as an exact sum of data entries perturbed by relative amounts certainly bounded by γ_N. The forward a-priori error bound is then

    |\hat{y}_N - y_N| \le \gamma_N \sum_{k=1}^{N} \frac{(-1)^{k-1}}{k},        (9)

and represents the worst case error estimate, which can be far from the true error, as the following section demonstrates.

3.4.1 Results

Results are provided for two summation orders. The first case performs the summation in decreasing order of magnitude, where a poor, insufficiently accurate result quickly becomes visible.

Fig. 2: Calculated value of ŷ_N and its error bounds (forward sum direction). Curves: result error, forward error bound, running error, and our approach; the x-axis shows log_2 N and the y-axis the base 10 logarithm of the error.

Figure 2 depicts this scenario, where N stands for the number of iterations. The first curve presents the absolute error of the result, log_10(|ŷ_N − log 2|), which gets smaller with each further accumulation of the sum. The rounding error bounds grow against this error from bottom to top and are represented by the remaining three curves. From top to bottom: the first presents the worst case, i.e., the a-priori forward error bound obtained from the right-hand side of (9); the second is the a-posteriori running error bound, slightly better than the a-priori one; and the last presents our approach, which accounts only for rounding errors actually committed, which is why the first two iterations show no error at all.

We can also observe that with the a-priori bound we would have to terminate the accumulation somewhere after 2^26 iterations, with the running error bound after 2^27 iterations, and with our approach after approximately 2^28 iterations. Calculating more iterations beyond this bound makes no sense in this case, because each added term is smaller than the error of the result.
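These termination points are consistent with the magnitude of γ_N. As a rough check: at N = 2^26,

    \gamma_N = \frac{Nu}{1 - Nu} \approx Nu = 2^{26} \cdot 2^{-53} = 2^{-27} \approx 7.5 \times 10^{-9},

which is of the same order as the terms 1/N ≈ 1.5 × 10^{−8} still being added at that point, so further accumulation stops being meaningful in this region under the a-priori bound; the sharper a-posteriori bounds grow more slowly and permit a few times more iterations.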
The second case performs the sum in the reverse order, i.e. going from the numbers of smallest magnitude towards greater numbers (see Fig. 3).

Fig. 3: Calculated value of ŷ_N and its error bounds (reverse sum direction); curves and axes as in Fig. 2.

Note that this scenario demonstrates the claim that an error bound obtained a-priori (i.e. the forward error bound) is often too pessimistic and represents the worst case; using it would lead to a premature termination of the sum evaluation.

One more thing is demonstrated: we should accumulate numbers in increasing order of magnitude. If we do not, the numbers of small magnitude contribute less and less to a proportionally huge sum value, and sooner or later further additions no longer change the value of the sum. Due to this fact, the harmonic series Σ_{k=1}^{∞} 1/k, which diverges in exact arithmetic, converges in a finite arithmetic and has a finite sum depending on the type of arithmetic.

Table 3 presents selected portions of the evaluation time for both a-posteriori error analyses, together with the original computation. In addition, the table shows the run time of the objdouble class, which wraps the double data type and is used, after subtracting the double column, to determine the amount of C++ object overhead.

The results were obtained with the POSIX getrusage API, which provides, among other information, the amount of processor time used, in seconds and microseconds, since the start of the process. For measurement purposes, these values are internally converted into milliseconds, and a difference of two millisecond values performs a measurement without interference from other processes and the operating system. Each value was measured 50 times and the results were statistically evaluated [2].

When we compare the run time of traditional running error analysis to a simple evaluation in double precision, we find that object-oriented running error analysis with the radouble class is approximately 2.81 times slower. This slowdown includes the C++ overhead, consisting of object construction and operator calls. The overhead was measured with the objdouble class, and the results show that evaluation with the objdouble class took 2 times longer than with the double data type. Our approach is only 2.35 times slower than the original evaluation with the double data type, and it is 1.19 times faster than the traditional approach. This speedup is what we expected from Tab. 2.

Table 3: Run times in milliseconds for reverse sum evaluations of selected N operations with four data types: the double data type with no error analysis, the radouble data type (object-oriented traditional running error analysis), the fdouble data type (FPU-supported running error analysis), and the objdouble class, which wraps the double data type and is used to measure the C++ overhead. All values are rounded to whole milliseconds.

    log2 N    double    radouble    fdouble    objdouble
    16             1           3          3            2
    20             8          50         42           36
    24           285         803        672          573
    28         4 568      12 849     10 777        9 178
    32        73 120     205 726    172 538      146 847
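The measurement procedure described above can be sketched as follows; this is a minimal illustration of getrusage-based CPU time measurement under the stated methodology, not the authors' actual harness, and the workload shown is illustrative:

    #include <sys/resource.h>
    #include <cstdio>

    // Returns the user CPU time consumed by this process so far,
    // in milliseconds, as reported by the POSIX getrusage API.
    static long user_time_ms(void)
    {
        struct rusage ru;
        getrusage(RUSAGE_SELF, &ru);
        return ru.ru_utime.tv_sec * 1000L + ru.ru_utime.tv_usec / 1000L;
    }

    int main()
    {
        long t0 = user_time_ms();

        // ... workload under test, e.g. the reverse sum of (5) ...
        volatile double sum = 0.0;
        for (long k = 1L << 24; k >= 1; --k)                 // smallest terms first
            sum = sum + ((k % 2 == 1) ? 1.0 / k : -1.0 / k);

        long t1 = user_time_ms();
        std::printf("elapsed: %ld ms\n", t1 - t0);           // difference of readings
        return 0;
    }

Using CPU time rather than wall-clock time is what makes the measurement insensitive to other processes and the operating system.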
4 Conclusions

The traditional running error analysis approach uses the term u|ŝ| to obtain a rounding error estimate and, as we have shown, this estimate can be up to two times larger than the actual rounding error. Moreover, this term always generates an error, regardless of whether an error was physically committed.

By exploiting the behavior of the floating point unit of the Intel x86 platform, we are able to obtain an 11-bit error estimate by subtracting the rounded result ŝ from its not yet rounded equivalent Ŝ; divided by |ŝ|, this estimate is always less than or equal to the unit roundoff u. Our approach is very similar to the traditional approach; it also needs four FPU instructions, but it replaces a multiplication by a subtraction, which saves 2 CPU cycles per evaluation and obtains an almost 20 per cent speedup over traditional running error analysis.

We encourage the use of running error analysis in all iterative tasks where catastrophic cancellation can occur, such as the evaluation of a numerical derivative. As we have seen in the case study, an error bound determined a-priori can be far from the actual error, and, especially with the provided classes, running error analysis is a very quick and feasible replacement. It costs, however, 2.81 times the plain evaluation time; when replaced by FPU-supported running error analysis, the cost drops to 2.35 times while providing a yet tighter bound. With a tighter bound, we are able to calculate more iterations and be sure that the result is still valid.

Acknowledgement

This research has been partially supported by the Ministry of Education, Youth, and Sport of the Czech Republic under research program MSM 6840770014, and by the Czech Science Foundation as project No. 201/06/1039.

References

[1] Fog, A.: Instruction Tables: Lists of Instruction Latencies, Throughputs and Micro-operation Breakdowns for Intel and AMD CPUs. Copenhagen University College of Engineering, available online as a part of Software Optimization Resources, http://www.agner.org/optimize/, 2008.

[2] Bevington, P., Robinson, D. K.: Data Reduction and Error Analysis for the Physical Sciences. McGraw-Hill Science/Engineering/Math, 2002.

[3] IEEE Computer Society Standards Committee, Working group of the Microprocessor Standards Subcommittee, and American National Standards Institute: IEEE Standard for Binary Floating-point Arithmetic, ANSI/IEEE Std 754-1985. IEEE Computer Society Press, 1985.

[4] Intel Corporation: Intel 64 and IA-32 Architectures Software Developer's Manual, vol. 3A: System Programming Guide, Part 1, 11/2007.

[5] Higham, N. J.: Accuracy and Stability of Numerical Algorithms, 2nd edition. Society for Industrial and Applied Mathematics, 2002.

[6] Wilkinson, J. H.: The State of the Art in Error Analysis. NAG Newsletter, vol. 2/85, 1985.

[7] Wilkinson, J. H.: Error Analysis Revisited. IMA Bulletin, vol. 22, no. 11/12, pp. 192–200, 1986.

[8] Wolfram Research, Inc.: Mathematica 7, http://www.wolfram.com/, 2008.

[9] Zahradnický, T.: Fdouble Class, a Double Data Type Replacement C++ Class with the FPU-Supported Running Error Analysis. Available online at http://service.felk.cvut.cz/anc/zahradt/fdouble.tar.gz, 2007.

Ing. Tomáš Zahradnický was born on 9th March 1979 in Prague, Czech Republic. In 2003 he graduated (MSc) from the Department of Computer Science and Engineering of the Faculty of Electrical Engineering at the Czech Technical University in Prague. Since 2004 he has been a postgraduate student at the same department, where he became an assistant professor in 2007. Since 2009 he has been an assistant professor at the Department of Computer Systems at the Faculty of Information Technologies at the Czech Technical University in Prague. His scientific research focuses on system optimization and on parameter extraction.

Doc. Ing. Róbert Lórencz, CSc. was born on 10th August 1957 in Prešov, Slovak Republic. In 1981 he graduated (MSc) from the Faculty of Electrical Engineering at the Czech Technical University in Prague. He received his Ph.D. degree in 1990 from the Slovak Academy of Sciences.
In 1998 he became an assistant professor at the Department of Computer Science and Engineering at the Faculty of Electrical Engineering at the Czech Technical University in Prague. In 2005 he defended his habilitation thesis and became an associate professor at the same department. Since 2009 he has worked as an associate professor at the Department of Computer Systems at the Faculty of Information Technologies at the Czech Technical University in Prague. His scientific research focuses on arithmetics for numerical algorithms, residual arithmetic, error-free computation and cryptography.

Ing. Tomáš Zahradnický
Doc. Ing. Róbert Lórencz, CSc.
E-mail: zahradt|lorencz@fit.cvut.cz
Department of Computer Systems
Faculty of Information Technologies
Czech Technical University in Prague
Kolejní 550/2, 160 00 Prague, Czech Republic