Welcome to the fourth part of the multivariate polynomials in C++ using templates series. In this part, I want to explain how to implement efficient polynomial evaluation using a collection of interacting templates. The output will be quite optimized code (assuming the compiler has a decent enough optimizer).
Evaluating multivariate polynomials is more complicated than evaluating univariate ones. I started with a naive implementation of `operator()` in `poly<n, T>`:
```cpp
    template<class S>
    poly<n - 1, S> operator() (const S & x) const
    {
        poly<n - 1, S> res;
        S xx = (S)1;
        for (unsigned i = 0; i < d_value.size(); ++i)
        {
            res += d_value[i] * xx;
            xx *= x;
        }
        return res;
    }
```
If `f` is of type `poly<2, int>`, then `f(4)` will be of type `poly<1, int>`, whence `f(4)(2)` will call `poly<1, int>::operator()<int>(const int &)` to yield a polynomial of type `poly<0, int>`, which automatically casts to an `int`. Unfortunately, in this process, `f(4)` is created first, which requires arithmetic on polynomials of type `poly<1, int>`, and then the resulting polynomial is evaluated again. For larger `n` in particular, this is far from optimal, since every stage but the last builds a complete intermediate polynomial. Hence, this solution is fine if evaluations are done seldom, but it is too slow if they are done often.
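To make the cost of the staged approach concrete, here is a minimal sketch. It assumes a stripped-down `poly` with public coefficients and a single number type `T`; the real class in this series additionally has allocators, resizing accessors and a separate evaluation type `S`. The helper `naive_demo` is made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, stripped-down stand-in for poly<n, T>.
template<int n, class T>
struct poly
{
    std::vector<poly<n - 1, T> > d_value; // coefficients: polynomials in n-1 variables

    poly & operator+=(const poly & other)
    {
        if (d_value.size() < other.d_value.size())
            d_value.resize(other.d_value.size());
        for (std::size_t i = 0; i < other.d_value.size(); ++i)
            d_value[i] += other.d_value[i];
        return *this;
    }

    poly operator*(const T & s) const
    {
        poly res(*this);
        for (std::size_t i = 0; i < res.d_value.size(); ++i)
            res.d_value[i] = res.d_value[i] * s;
        return res;
    }

    // Naive evaluation: substitutes the outermost variable and
    // materializes a whole polynomial in n-1 variables.
    poly<n - 1, T> operator()(const T & x) const
    {
        poly<n - 1, T> res;
        T xx = T(1);
        for (std::size_t i = 0; i < d_value.size(); ++i)
        {
            res += d_value[i] * xx;
            xx *= x;
        }
        return res;
    }
};

// Zero variables: a constant that converts to T.
template<class T>
struct poly<0, T>
{
    T d_value;
    poly(const T & v = T()) : d_value(v) {}
    poly & operator+=(const poly & o) { d_value += o.d_value; return *this; }
    poly operator*(const T & s) const { return poly(d_value * s); }
    operator T() const { return d_value; }
};

// f = (1 + 2y) + (3 + y) x, evaluated at x = 4, y = 2.
int naive_demo()
{
    poly<1, int> c0, c1;
    c0.d_value.push_back(1); c0.d_value.push_back(2);  // 1 + 2y
    c1.d_value.push_back(3); c1.d_value.push_back(1);  // 3 + y
    poly<2, int> f;
    f.d_value.push_back(c0);
    f.d_value.push_back(c1);                           // c0 + c1 * x
    poly<1, int> partial = f(4);  // a complete temporary polynomial: 13 + 6y
    int value = partial(2);
    assert(partial.d_value.size() == 2);
    assert(value == 25);          // 1 + 2*2 + 3*4 + 4*2
    return value;
}
```

The temporary `partial` is exactly the overhead described above: a vector is allocated and filled with polynomial arithmetic, only to be evaluated once and thrown away.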
To stay with our example of `f` of type `poly<2, int>`: one could evaluate f(4, 2), written as `f(4)(2)`, directly as follows:
```cpp
int result = 0, xx1 = 1;
for (int i = 0; i <= f.degree(); ++i)
{
    int result2 = 0, xx2 = 1;
    for (int j = 0; j <= f[i].degree(); ++j)
    {
        result2 += f[i][j] * xx2;
        xx2 *= 2;
    }
    result += result2 * xx1;
    xx1 *= 4;
}
```
Clearly, this is tiresome (and prone to typos) to write every time. Moreover, it is not optimal either, as our `operator[]` does more than just return an element. In particular, if `f` is not `const` in this context, the compiler will insert code at every operation `f[i]` and `f[i][j]` to check whether the index is out of range (to resize `d_value` in that case). Writing more directly
```cpp
int result = 0, xx1 = 1;
for (int i = 0; i < f.d_value.size(); ++i)
{
    int result2 = 0, xx2 = 1;
    for (int j = 0; j < f.d_value[i].d_value.size(); ++j)
    {
        result2 += f.d_value[i].d_value[j] * xx2;
        xx2 *= 2;
    }
    result += result2 * xx1;
    xx1 *= 4;
}
```
would result in faster code if it compiled, but `d_value` is `private`. For this reason, I came up with two templates:
```cpp
template<int n, class T, class S = T>
class poly_evaluator;

template<int n, class T, class HL, class S>
class poly_evaluator_impl;
```
The template `poly_evaluator<n, T, S>` is instantiated in `poly<n, T>::operator()<S>(const S &)`, which is defined as follows:
```cpp
    template<class S>
    inline poly_evaluator<n, T, S> operator() (const S & x) const
    {
        return poly_evaluator<n, T, S>(*this, x);
    }
```
The idea is that if `operator()` of `poly_evaluator<n, T, S>(*this, x)` is called, it will spawn an object of type `poly_evaluator_impl<n-1, T, poly_evaluator<n, T, S>, S>`. If `operator()` of this new object is called, it will create an object of type `poly_evaluator_impl<n-2, T, poly_evaluator_impl<n-1, T, poly_evaluator<n, T, S>, S>, S>`, and so on. The purpose of carrying the type of the caller around is that the outer loop of the evaluation function is the loop for the innermost object (of type `poly_evaluator<n, T, S>`). If `operator()` were right-associative instead of left-associative (as it is), this trouble would not be necessary.
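The way the types nest can be checked at compile time. The following is only a sketch of the chaining, assuming skeleton versions of the two templates that store an evaluation point and a reference to their creator but no polynomial; the accessors `owner()` and `point()` and the typedef names `level1` to `level3` are made up for illustration.

```cpp
#include <type_traits>

// Hypothetical skeletons: only the chaining of operator() is modelled,
// the evaluation itself is left out.
template<int n, class T, class HL, class S>
class poly_evaluator_impl
{
    const HL & d_owner;  // the evaluator whose operator() created this one
    S d_evalpoint;
public:
    poly_evaluator_impl(const HL & owner, const S & x) : d_owner(owner), d_evalpoint(x) {}
    const HL & owner() const { return d_owner; }
    const S & point() const { return d_evalpoint; }

    // One level further down; the own type travels along as the new HL.
    poly_evaluator_impl<n - 1, T, poly_evaluator_impl, S> operator()(const S & x) const
    {
        return poly_evaluator_impl<n - 1, T, poly_evaluator_impl, S>(*this, x);
    }
};

template<int n, class T, class S = T>
class poly_evaluator
{
    S d_evalpoint;
public:
    explicit poly_evaluator(const S & x) : d_evalpoint(x) {}
    const S & point() const { return d_evalpoint; }

    poly_evaluator_impl<n - 1, T, poly_evaluator, S> operator()(const S & x) const
    {
        return poly_evaluator_impl<n - 1, T, poly_evaluator, S>(*this, x);
    }
};

// The types that arise for three chained calls on a polynomial in 3 variables.
typedef poly_evaluator<3, int, int> level3;
typedef poly_evaluator_impl<2, int, level3, int> level2;
typedef poly_evaluator_impl<1, int, level2, int> level1;

static_assert(std::is_same<decltype(level3(10)(20)), level2>::value,
              "first chained call yields the n-1 level");
static_assert(std::is_same<decltype(level3(10)(20)(30)), level1>::value,
              "second chained call yields the n-2 level");
```

Note that every level only holds a reference to its creator, so such a chain is only safe to use within the full expression that builds it.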
Both templates `poly_evaluator<n, T, S>` and `poly_evaluator_impl<n, T, HL, S>` provide an internal function `template<class SS, class Fun> evaluate(SS & res, Fun evalfun)`. In the case of `poly_evaluator_impl<n, T, HL, S>`, it calls the corresponding `evaluate` function of its owner (with `evalfun` replaced); in the case of `poly_evaluator<n, T, S>`, it implements the outermost loop, which loops over the coefficients of the polynomial and uses `evalfun` to evaluate them. Hence, the `evalfun` handed upwards must be extended in every step to include the next variable.
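The "extending" of a functor can be illustrated in isolation. This is a sketch under simplifying assumptions: nested `std::vector`s stand in for `poly<n, T>`, everything is `int`, and the names `identity_fun`, `extended_fun`, `evaluate_two_levels` and `extend_demo` are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins: nested vectors instead of poly<n, T>.
typedef std::vector<int> poly1;    // coefficients of one variable
typedef std::vector<poly1> poly2;  // coefficients are themselves poly1

// Functor for plain numbers: no variable left to substitute.
struct identity_fun
{
    int operator()(int c) const { return c; }
};

// Extends a functor Fun that evaluates coefficients of type Coeff to a
// functor that evaluates a vector of such coefficients at d_point.
template<class Coeff, class Fun>
class extended_fun
{
    int d_point;
    Fun d_inner;
public:
    extended_fun(int point, const Fun & inner) : d_point(point), d_inner(inner) {}

    int operator()(const std::vector<Coeff> & p) const
    {
        int res = 0, xx = 1;
        for (std::size_t i = 0; i < p.size(); ++i)
        {
            res += d_inner(p[i]) * xx;
            xx *= d_point;
        }
        return res;
    }
};

// Evaluates f at (x, y), where x belongs to the outer vector.
int evaluate_two_levels(const poly2 & f, int x, int y)
{
    // Built first, by the level of the last variable: evaluates a poly1 at y.
    extended_fun<int, identity_fun> inner(y, identity_fun());
    // Extended by the next level up: evaluates a poly2 at (x, y).
    extended_fun<poly1, extended_fun<int, identity_fun> > outer(x, inner);
    return outer(f);
}

// Self-check with f = (1 + 2y) + (3 + y) x at (x, y) = (4, 2).
int extend_demo()
{
    poly2 f = { {1, 2}, {3, 1} };
    int value = evaluate_two_levels(f, 4, 2);
    assert(value == 25);  // 1 + 2*2 + 3*4 + 4*2
    return value;
}
```

Since the functor types are nested at compile time, the calls to `d_inner` are ordinary inlinable calls, which is what allows the loops to collapse into one nested loop.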
Assume that we have a polynomial `f` of type `poly<3, int>`, and we want to evaluate `f(10)(20)(30)`. This is an object of type `poly_evaluator_impl<1, int, poly_evaluator_impl<2, int, poly_evaluator<3, int, int>, int>, int>`, whose cast operator `operator int()` will start evaluation by defining a local class `poly_evaluator_impl<1, int, poly_evaluator_impl<2, int, poly_evaluator<3, int, int>, int>, int>::eval_fun`, defined as follows:
```cpp
    class eval_fun
    {
        const poly_evaluator_impl<1, int, poly_evaluator_impl<2, int, poly_evaluator<3, int, int>, int>, int> & d_owner;

    public:
        inline eval_fun(const poly_evaluator_impl<1, int, poly_evaluator_impl<2, int, poly_evaluator<3, int, int>, int>, int> & owner)
            : d_owner(owner)
        {
        }

        inline int operator() (const poly<1, int> & p) const
        {
            int res = 0;
            int xx = 1;
            for (int i = 0; i < (int)p.d_value.size(); ++i)
            {
                res += p.d_value[i] * xx;
                xx = xx * d_owner.d_evalpoint;
            }
            return res;
        }
    };
```
An object of it is created by `operator int()` as follows:
```cpp
    inline operator int() const
    {
        int res = 0;
        d_owner.evaluate(res, eval_fun(*this));
        return res;
    }
```
The object `eval_fun(*this)` can now be used to evaluate polynomials of type `poly<1, int>` at the specified evaluation point `d_owner.d_evalpoint`, which equals 30 in this case.
The `operator int()` calls `poly_evaluator_impl<2, int, poly_evaluator<3, int, int>, int>::evaluate<int, Fun>(int &, Fun)`, which in turn is implemented as follows:
```cpp
    template<class Fun>
    inline void evaluate(int & res, const Fun & evalfun) const
    {
        // Wrap the functor received from below and hand it further up.
        d_owner.evaluate(res, eval_fun<Fun>(*this, evalfun));
    }
```
The newly created object of type `eval_fun<Fun>` will now evaluate polynomials of type `poly<2, int>` by using the provided functor `evalfun`, which is an object of the class `poly_evaluator_impl<1, int, poly_evaluator_impl<2, int, poly_evaluator<3, int, int>, int>, int>::eval_fun` defined above. The class `eval_fun` in `poly_evaluator_impl<2, int, poly_evaluator<3, int, int>, int>` is defined as follows:
```cpp
    template<class Fun>
    class eval_fun
    {
        const poly_evaluator_impl<2, int, poly_evaluator<3, int, int>, int> & d_owner;
        const Fun & d_evalfun;

    public:
        inline eval_fun(const poly_evaluator_impl<2, int, poly_evaluator<3, int, int>, int> & owner, const Fun & evalfun)
            : d_owner(owner), d_evalfun(evalfun)
        {
        }

        inline int operator() (const poly<2, int> & p) const
        {
            int res = 0;
            int xx = 1;
            for (int i = 0; i < (int)p.d_value.size(); ++i)
            {
                res += d_evalfun(p.d_value[i]) * xx;
                xx = xx * d_owner.d_evalpoint;
            }
            return res;
        }
    };
```
Note that `d_owner.d_evalpoint` is 20 in this case. Finally, we are at the top level, namely in the function `poly_evaluator<3, int, int>::evaluate<int, Fun>()`, defined as follows:
```cpp
    template<class Fun>
    inline void evaluate(int & res, const Fun & evalfun) const
    {
        int xx = 1;
        for (int i = 0; i < (int)d_poly.d_value.size(); ++i)
        {
            res += evalfun(d_poly.d_value[i]) * xx;
            xx = xx * d_evalpoint;
        }
    }
```
(Here, `d_evalpoint` is 10.) This is the outer loop. The middle loop is inserted when calling `evalfun(d_poly.d_value[i])`, and into that the innermost loop is inserted when it calls `d_evalfun(p.d_value[i])`. This all essentially boils down to the following combination:
```cpp
    {
        int res = 0;
        int xx = 1;
        for (int i = 0; i < (int)d_poly.d_value.size(); ++i)
        {
            int tmp;
            const poly<2, int> & p = d_poly.d_value[i];
            {
                int res2 = 0;
                int xx2 = 1;
                for (int i2 = 0; i2 < (int)p.d_value.size(); ++i2)
                {
                    int tmp2;
                    const poly<1, int> & p2 = p.d_value[i2];
                    {
                        int res3 = 0;
                        int xx3 = 1;
                        for (int i3 = 0; i3 < (int)p2.d_value.size(); ++i3)
                        {
                            res3 += p2.d_value[i3] * xx3;
                            xx3 = xx3 * d_owner.d_evalpoint; // point of the innermost functor: 30
                        }
                        tmp2 = res3;
                    }
                    res2 += tmp2 * xx2;
                    xx2 = xx2 * d_owner.d_evalpoint; // point of the middle functor: 20
                }
                tmp = res2;
            }
            res += tmp * xx;
            xx = xx * d_evalpoint; // point of the top level: 10
        }
        return res;
    }
```
Of course, this is just an intermediate step (to highlight the similarities to all the code snippets from above); the compiler will optimize this to something more compact.
This “example” already presented most of the concept. The implementation itself is more complicated, since different types can be used for evaluation (at different stages, even), and since the implementation has support for allocators.
An important point is that both templates `poly_evaluator<n, T, S>` and `poly_evaluator_impl<n, T, HL, S>` are specialized for `n == 1`, as in that case there is no lower level. Hence, these specializations provide no `evaluate` template and do not try to pass the evaluation further down: `poly_evaluator_impl<1, T, HL, S>` passes it right away higher up to its owner, and `poly_evaluator<1, T, S>` just does the evaluation itself.
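Putting the pieces together, the following is a compact sketch of the whole scheme including the two `n == 1` specializations. It is not the implementation from this series: it assumes a simplified `poly` with public coefficients, a single number type `T` (no separate `S`, no allocators), and one shared functor template `level_fun` in place of the nested `eval_fun` classes. The names `coeff_fun`, `level_fun` and `chain_demo` are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical simplified poly: public coefficients, a single number type T.
template<int n, class T> struct poly { std::vector<poly<n - 1, T> > d_value; };
template<class T> struct poly<1, T> { std::vector<T> d_value; };

// Functor for plain coefficients: nothing left to substitute.
template<class T> struct coeff_fun { T operator()(const T & c) const { return c; } };

// Functor evaluating a poly<n, T> at d_x, using Fun for the coefficients.
template<int n, class T, class Fun>
struct level_fun
{
    T d_x;
    Fun d_evalfun;
    level_fun(const T & x, const Fun & evalfun) : d_x(x), d_evalfun(evalfun) {}
    T operator()(const poly<n, T> & p) const
    {
        T res = T(0), xx = T(1);
        for (std::size_t i = 0; i < p.d_value.size(); ++i)
        {
            res += d_evalfun(p.d_value[i]) * xx;
            xx = xx * d_x;
        }
        return res;
    }
};

// Lower level, n > 1: extends the functor and passes it up to its owner.
template<int n, class T, class HL>
class poly_evaluator_impl
{
    const HL & d_owner;
    T d_evalpoint;
public:
    poly_evaluator_impl(const HL & owner, const T & x) : d_owner(owner), d_evalpoint(x) {}
    poly_evaluator_impl<n - 1, T, poly_evaluator_impl> operator()(const T & x) const
    { return poly_evaluator_impl<n - 1, T, poly_evaluator_impl>(*this, x); }
    template<class Fun>
    void evaluate(T & res, const Fun & evalfun) const
    { d_owner.evaluate(res, level_fun<n, T, Fun>(d_evalpoint, evalfun)); }
};

// Lower level, n == 1: no evaluate(); the conversion to T starts the work.
template<class T, class HL>
class poly_evaluator_impl<1, T, HL>
{
    const HL & d_owner;
    T d_evalpoint;
public:
    poly_evaluator_impl(const HL & owner, const T & x) : d_owner(owner), d_evalpoint(x) {}
    operator T() const
    {
        T res = T(0);
        d_owner.evaluate(res, level_fun<1, T, coeff_fun<T> >(d_evalpoint, coeff_fun<T>()));
        return res;
    }
};

// Top level, n > 1: its evaluate() runs the outermost loop.
template<int n, class T>
class poly_evaluator
{
    const poly<n, T> & d_poly;
    T d_evalpoint;
public:
    poly_evaluator(const poly<n, T> & p, const T & x) : d_poly(p), d_evalpoint(x) {}
    poly_evaluator_impl<n - 1, T, poly_evaluator> operator()(const T & x) const
    { return poly_evaluator_impl<n - 1, T, poly_evaluator>(*this, x); }
    template<class Fun>
    void evaluate(T & res, const Fun & evalfun) const
    { res = level_fun<n, T, Fun>(d_evalpoint, evalfun)(d_poly); }
};

// Top level, n == 1: a single variable, evaluated on the spot.
template<class T>
class poly_evaluator<1, T>
{
    const poly<1, T> & d_poly;
    T d_evalpoint;
public:
    poly_evaluator(const poly<1, T> & p, const T & x) : d_poly(p), d_evalpoint(x) {}
    operator T() const
    { return level_fun<1, T, coeff_fun<T> >(d_evalpoint, coeff_fun<T>())(d_poly); }
};

// Self-check with f = 2 + x*y*z at (x, y, z) = (10, 20, 30).
int chain_demo()
{
    poly<1, int> two = { {2} }, z = { {0, 1} };                 // 2 and z
    poly<2, int> c0 = { {two} }, c1 = { {poly<1, int>(), z} };  // 2 and y*z
    poly<3, int> f = { {c0, c1} };                              // c0 + c1 * x
    int value = poly_evaluator<3, int>(f, 10)(20)(30);          // plays f(10)(20)(30)
    assert(value == 6002);                                      // 2 + 10*20*30
    return value;
}
```

Here `poly_evaluator<3, int>(f, 10)` plays the role of `f(10)`. No intermediate polynomial is created anywhere: the chained calls only record evaluation points, and the conversion to `int` triggers a single pass through the nested loops.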
This concludes the fourth part. The last part will concentrate on something less complicated: implementing long division and the Euclidean algorithm.