welcome to the fourth part of the multivariate polynomials in c++ using templates series. in this part, i want to explain how to implement efficient polynomial evaluation using a collection of interacting templates. the output will be quite optimized code (assuming the compiler has a decent enough optimizer).

evaluating multivariate polynomials is more complicated than evaluating univariate ones. i started with a naive implementation of `operator()` in `poly<n, T>`:

    template<class S>
    poly<n - 1, S> operator() (const S & x) const
    {
        poly<n - 1, S> res;
        S xx = (S)1;
        for (unsigned i = 0; i < d_value.size(); ++i)
        {
            res += d_value[i] * xx;
            xx *= x;
        }
        return res;
    }

if `f` is of type `poly<2, int>`, then `f(4)` will be of type `poly<1, int>`, whence `f(4)(2)` will call `poly<1, int>::operator()<int>(const int &)` to yield a polynomial of type `poly<0, int>`, which automatically casts to an `int`. unfortunately, in this process, `f(4)` is created first, which requires arithmetic on polynomials of type `poly<1, int>`, and only then is the resulting polynomial evaluated again. in particular for larger `n`, this is far from optimal. hence, this solution is fine if evaluations are rare, but it is too slow if they happen often.
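to make the cost of the intermediate object concrete, here is a minimal sketch of the naive approach. the class below is a hypothetical, stripped-down stand-in for `poly<n, T>` (it only supports `+=`, multiplication by a scalar, and evaluation with `S = T`), and `naive_example` is a made-up helper; the real class from this series is more elaborate.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// hypothetical, stripped-down stand-in for the series' poly<n, T>
template<int n, class T>
class poly
{
    std::vector<poly<n - 1, T>> d_value; // coefficients are polynomials in one variable less

public:
    poly() {}
    explicit poly(std::vector<poly<n - 1, T>> v) : d_value(std::move(v)) {}

    poly & operator += (const poly & q)
    {
        if (d_value.size() < q.d_value.size())
            d_value.resize(q.d_value.size());
        for (std::size_t i = 0; i < q.d_value.size(); ++i)
            d_value[i] += q.d_value[i];
        return *this;
    }

    poly operator * (const T & s) const
    {
        poly r(*this);
        for (std::size_t i = 0; i < r.d_value.size(); ++i)
            r.d_value[i] = r.d_value[i] * s;
        return r;
    }

    // naive evaluation: builds a complete poly<n - 1, T> as an intermediate result
    poly<n - 1, T> operator() (const T & x) const
    {
        poly<n - 1, T> res;
        T xx = T(1);
        for (std::size_t i = 0; i < d_value.size(); ++i)
        {
            res += d_value[i] * xx;
            xx = xx * x;
        }
        return res;
    }
};

// constant polynomials: a thin wrapper around T which converts back to T
template<class T>
class poly<0, T>
{
    T d_value;

public:
    poly(const T & v = T()) : d_value(v) {}
    poly & operator += (const poly & q) { d_value += q.d_value; return *this; }
    poly operator * (const T & s) const { return poly(d_value * s); }
    operator T() const { return d_value; }
};

// made-up example: f(x, y) = (1 + 2 y) + (3 + y) x
int naive_example()
{
    poly<1, int> c0(std::vector<poly<0, int>>{1, 2});
    poly<1, int> c1(std::vector<poly<0, int>>{3, 1});
    poly<2, int> f(std::vector<poly<1, int>>{c0, c1});
    poly<1, int> g = f(4);   // the intermediate polynomial 13 + 6 y
    assert(int(g(0)) == 13); // its constant term
    return f(4)(2);          // evaluates the intermediate polynomial at y = 2
}
```

in this sketch, `f(4)` first builds 13 + 6y as a full `poly<1, int>` object (with its own vector of coefficients), and only afterwards is that object evaluated at 2.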

to stay with our example of `f` of type `poly<2, int>`: one could evaluate f(4, 2), written as `f(4)(2)`, directly as follows:

    int result = 0, xx1 = 1;
    for (int i = 0; i <= f.degree(); ++i)
    {
        int result2 = 0, xx2 = 1;
        for (int j = 0; j <= f[i].degree(); ++j)
        {
            result2 += f[i][j] * xx2;
            xx2 *= 2;
        }
        result += result2 * xx1;
        xx1 *= 4;
    }

clearly, this is tiresome (and prone to typos) to write every time. moreover, it is also not optimal, as our `operator[]` does more than just return an element. in particular, if `f` is not `const` in this context, the compiler will insert code at every operation `f[i]` and `f[i][j]` to check whether the index is out of range (to resize `d_value` in that case). writing more directly

    int result = 0, xx1 = 1;
    for (int i = 0; i < f.d_value.size(); ++i)
    {
        int result2 = 0, xx2 = 1;
        for (int j = 0; j < f.d_value[i].d_value.size(); ++j)
        {
            result2 += f.d_value[i].d_value[j] * xx2;
            xx2 *= 2;
        }
        result += result2 * xx1;
        xx1 *= 4;
    }

would result in faster code if it compiled, but `d_value` is `private`. for this reason, i came up with two templates:

    template<int n, class T, class S = T>
    class poly_evaluator;
    template<int n, class T, class HL, class S>
    class poly_evaluator_impl;

the template `poly_evaluator<n, T, S>` is instantiated in `poly<n, T>::operator()<S>(const S &)`, which is defined as follows:

    template<class S>
    inline poly_evaluator<n, T, S> operator() (const S & x) const
    {
        return poly_evaluator<n, T, S>(*this, x);
    }

the idea is that if `operator()` of `poly_evaluator<n, T, S>(*this, x)` is called, it will spawn an object of type `poly_evaluator_impl<n-1, T, poly_evaluator<n, T, S>, S>`. if `operator()` of this new object is called, it will create an object of type `poly_evaluator_impl<n-2, T, poly_evaluator_impl<n-1, T, poly_evaluator<n, T, S>, S>, S>`, and so on. the purpose of carrying the type of the caller around is that the outer loop of the evaluation function is the loop for the innermost object (of type `poly_evaluator<n, T, S>`), so the last object in the chain must be able to reach back through all of its owners. if `operator()` were right-associative instead of left-associative (as it is), this trouble would not be necessary.
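the chain of types can be checked at compile time. the following sketch assumes stripped-down versions of both templates which carry no data and only reproduce the return types of `operator()`; the typedef name `expected_type` is made up for this example.

```cpp
#include <type_traits>
#include <utility>

template<int n, class T, class S = T>
class poly_evaluator;
template<int n, class T, class HL, class S>
class poly_evaluator_impl;

// only the types matter here, so the classes hold no data and do no arithmetic
template<int n, class T, class S>
class poly_evaluator
{
public:
    poly_evaluator_impl<n - 1, T, poly_evaluator, S> operator() (const S &) const
    {
        return poly_evaluator_impl<n - 1, T, poly_evaluator, S>();
    }
};

template<int n, class T, class HL, class S>
class poly_evaluator_impl
{
public:
    // the type of the caller (this class) becomes the HL argument of the next level
    poly_evaluator_impl<n - 1, T, poly_evaluator_impl, S> operator() (const S &) const
    {
        return poly_evaluator_impl<n - 1, T, poly_evaluator_impl, S>();
    }
};

// last variable: there is no further operator(), which ends the chain
template<class T, class HL, class S>
class poly_evaluator_impl<1, T, HL, S>
{
};

// the nested type one expects after two more calls on a poly_evaluator<3, int, int>
typedef poly_evaluator_impl<1, int,
            poly_evaluator_impl<2, int, poly_evaluator<3, int, int>, int>,
        int> expected_type;

static_assert(std::is_same<
                  decltype(std::declval<poly_evaluator<3, int, int>>()(20)(30)),
                  expected_type>::value,
              "the chain of owner types does not match");
```

since `operator()` associates to the left, each call only knows the object it was called on, which is why that object's type ends up as the `HL` argument of the next level.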

both templates `poly_evaluator<n, T, S>` and `poly_evaluator_impl<n, T, HL, S>` provide an internal function `template<class SS, class Fun> evaluate(SS & res, Fun evalfun)`. in the case of `poly_evaluator_impl<n, T, HL, S>`, it calls the corresponding `evaluate` function of its owner (with `evalfun` replaced); in the case of `poly_evaluator<n, T, S>`, it implements the outermost loop, which loops over the coefficients of the polynomial and uses `evalfun` to evaluate them. hence, the `evalfun` handed upwards must be extended in every step to include the next variable.
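the way `evalfun` grows from level to level can be sketched without the evaluator classes. the following is only an illustration under simplifying assumptions: nested `std::vector`s stand in for `poly<1, int>`, `poly<2, int>` and `poly<3, int>`, and the names `eval_last`, `eval_extended`, `evaluate` (as a free function) and `eval_example` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// assumption: nested vectors stand in for poly<1, int>, poly<2, int>, poly<3, int>
typedef std::vector<int> poly1;
typedef std::vector<poly1> poly2;
typedef std::vector<poly2> poly3;

// innermost functor: evaluates a univariate polynomial at d_evalpoint
struct eval_last
{
    int d_evalpoint;

    int operator() (const poly1 & p) const
    {
        int res = 0, xx = 1;
        for (std::size_t i = 0; i < p.size(); ++i)
        {
            res += p[i] * xx;
            xx *= d_evalpoint;
        }
        return res;
    }
};

// extended functor: evaluates a P at d_evalpoint, using d_evalfun for its coefficients
template<class P, class Fun>
struct eval_extended
{
    int d_evalpoint;
    Fun d_evalfun;

    int operator() (const P & p) const
    {
        int res = 0, xx = 1;
        for (std::size_t i = 0; i < p.size(); ++i)
        {
            res += d_evalfun(p[i]) * xx;
            xx *= d_evalpoint;
        }
        return res;
    }
};

// the outermost loop, as it would sit in the evaluate function of the first evaluator
template<class Fun>
void evaluate(int & res, const poly3 & f, int evalpoint, const Fun & evalfun)
{
    int xx = 1;
    for (std::size_t i = 0; i < f.size(); ++i)
    {
        res += evalfun(f[i]) * xx;
        xx *= evalpoint;
    }
}

// made-up example: f = 1 + 2 z + 3 y + 4 x, with f[i][j][k] the coefficient of x^i y^j z^k
int eval_example()
{
    poly1 a = {1, 2}, b = {3}, c = {4};
    poly2 p0 = {a, b}, p1 = {c};
    poly3 f = {p0, p1};

    eval_last last = { 30 };                               // handles z
    eval_extended<poly2, eval_last> middle = { 20, last }; // handles y, then z
    assert(last(a) == 61);                                 // 1 + 2 * 30

    int res = 0;
    evaluate(res, f, 10, middle);                          // handles x, then y, then z
    return res;
}
```

each wrapper adds exactly one loop around the functor it received, so the functor that finally arrives at the outermost loop knows how to evaluate all remaining variables.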

assume that we have a polynomial `f` of type `poly<3, int>`, and we want to evaluate `f(10)(20)(30)`. this is an object of type `poly_evaluator_impl<1, int, poly_evaluator_impl<2, int, poly_evaluator<3, int, int>, int>, int>`, whose cast operator `operator int()` will start evaluation by defining a local class `poly_evaluator_impl<1, int, poly_evaluator_impl<2, int, poly_evaluator<3, int, int>, int>, int>::eval_fun`, defined as follows:

    class eval_fun
    {
        const poly_evaluator_impl<1, int, poly_evaluator_impl<2, int, poly_evaluator<3, int, int>, int>, int> & d_owner;

    public:
        inline eval_fun(const poly_evaluator_impl<1, int, poly_evaluator_impl<2, int, poly_evaluator<3, int, int>, int>, int> & owner)
            : d_owner(owner)
        {
        }

        inline int operator() (const poly<1, int> & p) const
        {
            int res = 0;
            int xx = 1;
            for (int i = 0; i < (int)p.d_value.size(); ++i)
            {
                res += p.d_value[i] * xx;
                xx = xx * d_owner.d_evalpoint;
            }
            return res;
        }
    };

an object of this class is created by `operator int()` as follows:

    inline operator int() const
    {
        int res = 0;
        d_owner.evaluate(res, eval_fun(*this));
        return res;
    }

the object `eval_fun(*this)` can now be used to evaluate polynomials of type `poly<1, int>` at the specified evaluation point `d_owner.d_evalpoint`, which equals 30 in this case.

the `operator int()` calls `poly_evaluator_impl<2, int, poly_evaluator<3, int, int>, int>::evaluate<int, Fun>(int &, Fun)`, which in turn is implemented as follows:

    template<class Fun>
    inline void evaluate(int & res, const Fun & evalfun) const
    {
        d_owner.evaluate(res, eval_fun<int, Fun>(*this, evalfun));
    }

the newly created object of type `eval_fun<int, Fun>` will now evaluate polynomials of type `poly<2, int>` by using the provided functor `evalfun`, which is an object of the class `poly_evaluator_impl<1, int, poly_evaluator_impl<2, int, poly_evaluator<3, int, int>, int>, int>::eval_fun` defined above. the class `eval_fun` in `poly_evaluator_impl<2, int, poly_evaluator<3, int, int>, int>` is defined as follows:

    template<class Fun>
    class eval_fun
    {
        const poly_evaluator_impl<2, int, poly_evaluator<3, int, int>, int> & d_owner;
        const Fun & d_evalfun;

    public:
        inline eval_fun(const poly_evaluator_impl<2, int, poly_evaluator<3, int, int>, int> & owner, const Fun & evalfun)
            : d_owner(owner), d_evalfun(evalfun)
        {
        }

        inline int operator() (const poly<2, int> & p) const
        {
            int res = 0;
            int xx = 1;
            for (int i = 0; i < (int)p.d_value.size(); ++i)
            {
                res += d_evalfun(p.d_value[i]) * xx;
                xx = xx * d_owner.d_evalpoint;
            }
            return res;
        }
    };

note that `d_owner.d_evalpoint` is 20 in this case. finally, we are at the top level, namely in the function `poly_evaluator<3, int, int>::evaluate<int, Fun>()`, defined as follows:

    template<class Fun>
    inline void evaluate(int & res, const Fun & evalfun) const
    {
        int xx = 1;
        for (int i = 0; i < (int)d_poly.d_value.size(); ++i)
        {
            res += evalfun(d_poly.d_value[i]) * xx;
            xx = xx * d_evalpoint;
        }
    }

(here, `d_evalpoint` is 10.) this is the outer loop. the middle loop is inserted when calling `evalfun(d_poly.d_value[i])`, and into that the innermost loop is inserted when it calls `d_evalfun(p.d_value[i])`. this all essentially boils down to the following combination:

    {
        int res = 0;
        int xx = 1;
        for (int i = 0; i < (int)d_poly.d_value.size(); ++i)
        {
            int tmp;
            const poly<2, int> & p = d_poly.d_value[i];
            {
                int res2 = 0;
                int xx2 = 1;
                for (int i2 = 0; i2 < (int)p.d_value.size(); ++i2)
                {
                    int tmp2;
                    const poly<1, int> & p2 = p.d_value[i2];
                    {
                        int res3 = 0;
                        int xx3 = 1;
                        for (int i3 = 0; i3 < (int)p2.d_value.size(); ++i3)
                        {
                            res3 += p2.d_value[i3] * xx3;
                            xx3 = xx3 * d_owner.d_evalpoint; // innermost level: 30
                        }
                        tmp2 = res3;
                    }
                    res2 += tmp2 * xx2;
                    xx2 = xx2 * d_owner.d_evalpoint; // middle level: 20
                }
                tmp = res2;
            }
            res += tmp * xx;
            xx = xx * d_evalpoint; // outer level: 10
        }
        return res;
    }

of course, this is just an intermediate step (to highlight the similarities to all the code snippets from above); the compiler will optimize this to something more compact.

this “example” already presented most of the concept. the implementation itself is more complicated, since different types can be used for evaluation (at different stages, even), and since the implementation has support for allocators.

an important point is that both templates `poly_evaluator<n, T, S>` and `poly_evaluator_impl<n, T, HL, S>` are specialized for `n == 1`. in that case, there is no lower level, and hence these specializations provide no `evaluate` template and do not try to pass evaluation further down. instead, `poly_evaluator_impl<1, T, HL, S>` right away passes it higher up to its owner, while `poly_evaluator<1, T, S>` just does the evaluation itself.
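putting the pieces together, the following is a compact sketch of the whole mechanism, including both specializations for `n == 1`. it is not the implementation from this series: it assumes a simplified `poly` with a public `d_value`, uses a single type `T` everywhere (no separate `S`, no allocators), replaces the `eval_fun` classes by lambdas, and stores owners by value. the helper names `eval_level`, `next_type` and `evaluator_example` are made up. the example polynomial is f = 1 + 2z + 3y + 4x, with `d_value` indexed by the exponent of x, then y, then z.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

template<int n, class T> class poly_evaluator;
template<int n, class T, class HL> class poly_evaluator_impl;
// hypothetical, simplified poly: public d_value and a single type T throughout
template<int n, class T>
struct poly
{
    std::vector<poly<n - 1, T>> d_value;
    poly_evaluator<n, T> operator() (const T & x) const { return poly_evaluator<n, T>(*this, x); }
};
template<class T>
struct poly<1, T>
{
    std::vector<T> d_value;
    poly_evaluator<1, T> operator() (const T & x) const { return poly_evaluator<1, T>(*this, x); }
};

// one loop level: the sum of fun(p.d_value[i]) * x^i
template<class P, class T, class Fun>
T eval_level(const P & p, const T & x, const Fun & fun)
{
    T res = T(), xx = T(1);
    for (std::size_t i = 0; i < p.d_value.size(); ++i, xx = xx * x)
        res += fun(p.d_value[i]) * xx;
    return res;
}

// first step f(x): refers to the polynomial and runs the outermost loop
template<int n, class T>
class poly_evaluator
{
    const poly<n, T> & d_poly;
    T d_evalpoint;
    typedef poly_evaluator_impl<n - 1, T, poly_evaluator> next_type;
public:
    poly_evaluator(const poly<n, T> & p, const T & x) : d_poly(p), d_evalpoint(x) {}
    next_type operator() (const T & x) const { return next_type(*this, x); }
    template<class Fun>
    void evaluate(T & res, const Fun & evalfun) const { res += eval_level(d_poly, d_evalpoint, evalfun); }
};

// n == 1: no lower level, so the cast evaluates directly
template<class T>
class poly_evaluator<1, T>
{
    const poly<1, T> & d_poly;
    T d_evalpoint;
public:
    poly_evaluator(const poly<1, T> & p, const T & x) : d_poly(p), d_evalpoint(x) {}
    operator T() const { return eval_level(d_poly, d_evalpoint, [](const T & c) -> T { return c; }); }
};

// later steps: extend evalfun by one variable and pass it up to the owner
template<int n, class T, class HL>
class poly_evaluator_impl
{
    HL d_owner; // a copy, so that this sketch cannot leave dangling references to owners
    T d_evalpoint;
    typedef poly_evaluator_impl<n - 1, T, poly_evaluator_impl> next_type;
public:
    poly_evaluator_impl(const HL & owner, const T & x) : d_owner(owner), d_evalpoint(x) {}
    next_type operator() (const T & x) const { return next_type(*this, x); }
    template<class Fun>
    void evaluate(T & res, const Fun & evalfun) const
    {
        d_owner.evaluate(res, [this, &evalfun](const poly<n, T> & p) -> T
            { return eval_level(p, d_evalpoint, evalfun); });
    }
};

// n == 1: last variable, so no operator() and no evaluate, only the cast
template<class T, class HL>
class poly_evaluator_impl<1, T, HL>
{
    HL d_owner;
    T d_evalpoint;
public:
    poly_evaluator_impl(const HL & owner, const T & x) : d_owner(owner), d_evalpoint(x) {}
    operator T() const
    {
        T res = T();
        d_owner.evaluate(res, [this](const poly<1, T> & p) -> T
            { return eval_level(p, d_evalpoint, [](const T & c) -> T { return c; }); });
        return res;
    }
};

int evaluator_example()
{
    poly<1, int> a{{1, 2}}, b{{3}}, c{{4}};
    poly<2, int> p0{{a, b}}, p1{{c}};
    poly<3, int> f{{p0, p1}};
    assert(int(p0(20)(30)) == 121); // (1 + 2 * 30) + 3 * 20
    return f(10)(20)(30);           // 1 + 2 * 30 + 3 * 20 + 4 * 10
}
```

no intermediate polynomial is built here: `f(10)(20)` only records evaluation points, and the arithmetic happens in one nest of loops once the final object is cast to `int`.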

this concludes the fourth part. the last part will concentrate on something less complicated: implementing long division and the euclidean algorithm.