finally.

# posts for 2011.

#### introduction.

as mentioned in part one, testing expression templates isn’t that easy. one often has to go down to assembler level to see what is really going on. to make developing expression templates easier, and to debug my (fictional) `myvec<T>`

expression templates a bit more in detail, i created a second test type: `TestType2`

. again equiped with expression templates, its aim is to break down the expressions into *three-address assembler* commands, using temporaries to achieve this. for example, `a -= (b + c) * d;`

should evaluate into something like

1 TestType2 t1, t2; // without calling constructors or destructors! 2 TT2_create(t1); 3 TT2_add(t1, b, c); 4 TT2_create(t2); 5 TT2_mul(t2, t1, d); 6 TT2_sub(a, a, t2); 7 TT2_destroy(t2); 8 TT2_destroy(t1);

by defining the functions

`add()`

, `mul()`

, etc. in a different translation unit, one can analyse the assembler output of this translation unit to see what exactly came out. by searching for terms like `TT2_add`

one can quickly find the corresponding assembler commands.
#### the implementation.

we begin by declaring the class `TestType2`

and by declaring the functions to operate on it. again, we leave out multiplication (and divison and modulo) to shorten code.

1 class TestType2; 2 3 void TT2_create(TestType2 & r); 4 void TT2_destroy(TestType2 & r); 5 void TT2_createcopy(TestType2 & r, const TestType2 & a); 6 void TT2_copy(TestType2 & r, const TestType2 & a); 7 void setZero(TestType2 & r); 8 void setOne(TestType2 & r); 9 void TT2_add(TestType2 & r, const TestType2 & a, const TestType2 & b); 10 void TT2_sub(TestType2 & r, const TestType2 & a, const TestType2 & b); 11 void TT2_neg(TestType2 & r, const TestType2 & a);

the implementation of

`TestType2`

is straightforward. we add a pointer so that the class actually stores something.1 class TestType2 2 { 3 private: 4 void * d_data; 5 6 public: 7 inline TestType2() 8 { 9 TT2_create(*this); 10 } 11 12 inline TestType2(const TestType2 & src) 13 { 14 TT2_createcopy(*this, src); 15 } 16 17 inline ~TestType2() 18 { 19 TT2_destroy(*this); 20 } 21 22 inline TestType2 & operator = (const TestType2 & src) 23 { 24 TT2_copy(*this, src); 25 return *this; 26 } 27 28 inline TestType2 & operator += (const TestType2 & b) 29 { 30 TT2_add(*this, *this, b); 31 return *this; 32 } 33 34 inline TestType2 & operator -= (const TestType2 & b) 35 { 36 TT2_sub(*this, *this, b); 37 return *this; 38 } 39 };

again, as in part one, we have templates

`TestExpression2<O, D>`

and `TestWrapper2`

:1 template<class Op, class Data> 2 class TestExpression2 3 { 4 private: 5 Op d_op; 6 Data d_data; 7 8 public: 9 inline TestExpression2(const Op & op, const Data & data) 10 : d_op(op), d_data(data) 11 { 12 } 13 14 operator TestType2 () const 15 { 16 TestType2 res; 17 evalTo(res); 18 return res; 19 } 20 21 inline TestType2 evaluate() const 22 { 23 TestType2 res; 24 evalTo(res); 25 return res; 26 } 27 28 inline void evalTo(TestType2 & dest) const 29 { 30 d_op.evalTo(dest, d_data); 31 } 32 }; 33 34 class TestWrapper2 35 { 36 private: 37 const TestType2 & d_val; 38 39 public: 40 inline TestWrapper2(const TestType2 & val) 41 : d_val(val) 42 { 43 } 44 45 inline const TestType2 & evaluate() const 46 { 47 return d_val; 48 } 49 50 inline void evalTo(TestType2 & dest) 51 { 52 TT2_copy(dest, d_val); 53 } 54 };

this time, we have both an

`evaluate()`

and a `evalTo()`

member function. the first allows to just evaluate the expression, generating temporaries for subexpressions, and the second one is used by callers like `operator=()`

of `TestType2`

to evaluate the result of an expression into an object of type `TestType2`

. the expression machinery is added to `TestType2`

by the following functions:1 template<class O, class D> 2 inline TestType2(const TestExpression2<O, D> & src) 3 { 4 TT2_createcopy(*this, src.evaluate()); 5 } 6 7 template<class O, class D> 8 inline TestType2 & operator = (const TestExpression2<O, D> & e) 9 { 10 e.evalTo(*this); 11 return *this; 12 } 13 14 template<class O, class D> 15 inline TestType2 & operator += (const TestExpression2<O, D> & e) 16 { 17 TT2_add(*this, *this, e.evaluate()); 18 return *this; 19 } 20 21 template<class O, class D> 22 inline TestType2 & operator -= (const TestExpression2<O, D> & e) 23 { 24 TT2_sub(*this, *this, e.evaluate()); 25 return *this; 26 }

the operators are defined as follows. there is not much to do: they just provide a

`evalTo()`

template function which calls the corresponding three-address operation:1 class AddOp2 2 { 3 public: 4 template<class A, class B> 5 inline void evalTo(TestType2 & dest, const std::pair<A, B> & data) const 6 { 7 TT2_add(dest, data.first.evaluate(), data.second.evaluate()); 8 } 9 }; 10 11 class SubOp2 12 { 13 public: 14 template<class A, class B> 15 inline void evalTo(TestType2 & dest, const std::pair<A, B> & data) const 16 { 17 TT2_sub(dest, data.first.evaluate(), data.second.evaluate()); 18 } 19 }; 20 21 class NegOp2 22 { 23 public: 24 template<class A> 25 inline void evalTo(TestType2 & dest, const A & data) const 26 { 27 TT2_neg(dest, data.evaluate()); 28 } 29 };

again, what is left is the tedious integration, by defining a ton of

`operator`

s:1 inline TestExpression2<AddOp2, std::pair<TestWrapper2, TestWrapper2> > operator + (const TestType2 & a, const TestType2 & b) 2 { return TestExpression2<AddOp2, std::pair<TestWrapper2, TestWrapper2> >(AddOp2(), std::make_pair(TestWrapper2(a), TestWrapper2(b))); } 3 inline TestExpression2<SubOp2, std::pair<TestWrapper2, TestWrapper2> > operator - (const TestType2 & a, const TestType2 & b) 4 { return TestExpression2<SubOp2, std::pair<TestWrapper2, TestWrapper2> >(SubOp2(), std::make_pair(TestWrapper2(a), TestWrapper2(b))); } 5 6 template<class O2, class D2> 7 inline TestExpression2<AddOp2, std::pair<TestWrapper2, TestExpression2<O2, D2> > > operator + (const TestType2 & a, const TestExpression2<O2, D2> & b) 8 { return TestExpression2<AddOp2, std::pair<TestWrapper2, TestExpression2<O2, D2> > >(AddOp2(), std::make_pair(TestWrapper2(a), b)); } 9 template<class O2, class D2> 10 inline TestExpression2<SubOp2, std::pair<TestWrapper2, TestExpression2<O2, D2> > > operator - (const TestType2 & a, const TestExpression2<O2, D2> & b) 11 { return TestExpression2<SubOp2, std::pair<TestWrapper2, TestExpression2<O2, D2> > >(SubOp2(), std::make_pair(TestWrapper2(a), b)); } 12 13 template<class O1, class D1> 14 inline TestExpression2<AddOp2, std::pair<TestExpression2<O1, D1>, TestWrapper2> > operator + (const TestExpression2<O1, D1> & a, const TestType2 & b) 15 { return TestExpression2<AddOp2, std::pair<TestExpression2<O1, D1>, TestWrapper2> >(AddOp2(), std::make_pair(a, TestWrapper2(b))); } 16 template<class O1, class D1> 17 inline TestExpression2<SubOp2, std::pair<TestExpression2<O1, D1>, TestWrapper2> > operator - (const TestExpression2<O1, D1> & a, const TestType2 & b) 18 { return TestExpression2<SubOp2, std::pair<TestExpression2<O1, D1>, TestWrapper2> >(SubOp2(), std::make_pair(a, TestWrapper2(b))); } 19 20 template<class O1, class D1, class O2, class D2> 21 inline TestExpression2<AddOp2, std::pair<TestExpression2<O1, D1>, TestExpression2<O2, D2> > > operator + (const TestExpression2<O1, D1> & a, const TestExpression2<O2, D2> & b) 22 { return TestExpression2<AddOp2, std::pair<TestExpression2<O1, D1>, TestExpression2<O2, D2> > >(AddOp2(), std::make_pair(a, b)); } 23 template<class O1, class D1, class O2, class D2> 24 inline TestExpression2<SubOp2, std::pair<TestExpression2<O1, D1>, TestExpression2<O2, D2> > > operator - (const TestExpression2<O1, D1> & a, const TestExpression2<O2, D2> & b) 25 { return TestExpression2<SubOp2, std::pair<TestExpression2<O1, D1>, TestExpression2<O2, D2> > >(SubOp2(), std::make_pair(a, b)); } 26 27 inline TestExpression2<NegOp2, TestWrapper2> operator - (const TestType2 & a) 28 { return TestExpression2<NegOp2, TestWrapper2>(NegOp2(), TestWrapper2(a)); } 29 30 template<class O1, class D1> 31 inline TestExpression2<NegOp2, TestExpression2<O1, D1> > operator - (const TestExpression2<O1, D1> & a) 32 { return TestExpression2<NegOp2, TestExpression2<O1, D1> >(NegOp2(), a); }

note that all the above code contains a lot of

`inline`

s, as opposed to part one. in part one, the aim was not to generate most efficient code, but to make visible the expressions evaluated by the expression templates. in this second part, we want to observe the assembler output, and thus want the code to be as optimized as possible.
#### a real-life example.

in this section, i want to look at three examples. first, let us consider the following example:

1 TestType2 s, t, u, x, y, z; 2 s /= t + (x - y) * z - u;

the resulting assembler code (generated with

`g++ -S -O3`

) looks as follows:1 leaq 1024(%rsp), %rcx 2 .... 3 movq %rbx, 56(%rsp) 4 call _Z10TT2_createR9TestType2 5 leaq 960(%rsp), %r12 6 movq %r12, %rdi 7 call _Z10TT2_createR9TestType2 8 leaq 928(%rsp), %rbp 9 movq %rbp, %rdi 10 call _Z10TT2_createR9TestType2 11 leaq 944(%rsp), %rbx 12 movq %rbx, %rdi 13 call _Z10TT2_createR9TestType2 14 leaq 1024(%rsp), %rdx 15 leaq 1040(%rsp), %rsi 16 movq %rbx, %rdi 17 call _Z7TT2_subR9TestType2RKS_S2_ 18 leaq 1008(%rsp), %rdx 19 movq %rbx, %rsi 20 movq %rbp, %rdi 21 call _Z7TT2_mulR9TestType2RKS_S2_ 22 movq %rbx, %rdi 23 call _Z11TT2_destroyR9TestType2 24 leaq 1072(%rsp), %rsi 25 movq %rbp, %rdx 26 movq %r12, %rdi 27 call _Z7TT2_addR9TestType2RKS_S2_ 28 movq %rbp, %rdi 29 call _Z11TT2_destroyR9TestType2 30 leaq 1056(%rsp), %rdx 31 movq %r12, %rsi 32 movq %r13, %rdi 33 call _Z7TT2_subR9TestType2RKS_S2_ 34 movq %r12, %rdi 35 call _Z11TT2_destroyR9TestType2 36 leaq 1088(%rsp), %rsi 37 movq %r13, %rdx 38 movq %rsi, %rdi 39 call _Z7TT2_divR9TestType2RKS_S2_ 40 movq %r13, %rdi 41 call _Z11TT2_destroyR9TestType2

(i removed a lot of register/memory arithmetic in the beginning, which prepares all addresses.) removing all cludder, we are left with the following function calls:

1 TT2_create() 2 TT2_create() 3 TT2_create() 4 TT2_create() 5 TT2_sub() 6 TT2_mul() 7 TT2_destroy() 8 TT2_add() 9 TT2_destroy() 10 TT2_sub() 11 TT2_destroy() 12 TT2_div() 13 TT2_destroy()

four temporaries

`t1, t2, t3, t4`

are created. then, the subtraction `x - y`

is computed into the temporary `t4`

, and the result is multiplied by `z`

into the temporary `t3`

. the temporary `t4`

used to compute `x - y`

is then destroyed, and `t`

is added to `t3`

, with the result being stored in the temporary `t2`

. the temporary `t3`

is destroyed, and `t2 - u`

is evaluated into `t1`

. finally, `s`

is divided by `t1`

and `t1`

is destroyed. this shows that the compiler generated an equivalent to1 TestType2 t1, t2, t3, t4; // without calling constructors or destructors! 2 TT2_create(t1); 3 TT2_create(t2); 4 TT2_create(t3); 5 TT2_create(t4); 6 TT2_sub(t4, x, y); 7 TT2_mul(t3, t4, z); 8 TT2_destroy(t4); 9 TT2_add(t2, t, t3); 10 TT2_destroy(t3); 11 TT2_sub(t1, t2, u); 12 TT2_destroy(t2); 13 TT2_div(s, s, t1); 14 TT2_destroy(t1);

this is pretty much optimal: if one would have done this translation by hand in a straightforward way, one would have reached the same solution.

now let us re-consider our example from part one: the

`myvec<T>`

template. assume we have two `myvec<TestType2>`

vectors `v`

and `w`

, and we write `v += v + w;`

and `v += w + v;`

. the first command should generate something like1 for (unsigned i = 0; i < v.size(); ++i) 2 { 3 add(v[i], v[i], v[i]); 4 add(v[i], v[i], w[i]); 5 }

while the second command should generate something like

1 TestType2 t; // without calling constructors or destructors! 2 create(t); 3 for (unsigned i = 0; i < v.size(); ++i) 4 { 5 add(t, w[i], v[i]); 6 add(v[i], v[i], t); 7 } 8 destroy(t);

or, if the expression templates for

`myvec<T>`

are not that good, at least something like1 for (unsigned i = 0; i < v.size(); ++i) 2 { 3 TestType2 t; // without calling constructors or destructors! 4 create(t); 5 add(t, w[i], v[i]); 6 add(v[i], v[i], t); 7 destroy(t); 8 }

first, consider

`v += v + w;`

. the assembler code generated is the following:1 movl 320(%rsp), %r9d 2 testl %r9d, %r9d 3 je .L128 4 xorl %ebp, %ebp 5 .L129: 6 mov %ebp, %r12d 7 salq $3, %r12 8 movq %r12, %rbx 9 addq 328(%rsp), %rbx 10 movq %rbx, %rdx 11 movq %rbx, %rsi 12 movq %rbx, %rdi 13 call _Z7TT2_addR9TestType2RKS_S2_ 14 movq %r12, %rdx 15 addq 312(%rsp), %rdx 16 movq %rbx, %rsi 17 movq %rbx, %rdi 18 call _Z7TT2_addR9TestType2RKS_S2_ 19 addl $1, %ebp 20 cmpl 320(%rsp), %ebp 21 jb .L129 22 .L128:

clearly, first the code tests whether the loop has to be run through at least once; if not, one jumps to label

`.L128`

. then, per loop iteration, exactly two calls to `TT2_add()`

are made. this shows that the code is essentially like1 for (unsigned i = 0; i < v.size(); ++i) 2 { 3 add(v[i], v[i], v[i]); 4 add(v[i], v[i], w[i]); 5 }

as we were hoping. thus, the expression templates worked well in this case. now let us look at

`v += w + v;`

. this time, a temporary has to be created, since otherwise `v[i]`

is modified before being used in the expression. the generated assembler code is the following:1 movl 320(%rsp), %eax 2 cmpl 304(%rsp), %eax 3 jne .L283 4 leaq 416(%rsp), %rbx 5 movq %rbx, %rdi 6 call _Z10TT2_createR9TestType2 7 movl 320(%rsp), %r8d 8 testl %r8d, %r8d 9 je .L146 10 xorl %r12d, %r12d 11 .p2align 4,,10 12 .p2align 3 13 .L147: 14 mov %r12d, %ebp 15 movq %rbx, %rdi 16 salq $3, %rbp 17 movq %rbp, %rsi 18 addq 312(%rsp), %rsi 19 call _Z8TT2_copyR9TestType2RKS_ 20 movq %rbp, %rdx 21 addq 328(%rsp), %rdx 22 movq %rbx, %rsi 23 movq %rbx, %rdi 24 call _Z7TT2_addR9TestType2RKS_S2_ 25 movq %rbp, %rdi 26 addq 328(%rsp), %rdi 27 movq %rbx, %rdx 28 movq %rdi, %rsi 29 call _Z7TT2_addR9TestType2RKS_S2_ 30 addl $1, %r12d 31 cmpl 320(%rsp), %r12d 32 jb .L147 33 .L146: 34 movq %rbx, %rdi 35 call _Z11TT2_destroyR9TestType2

this is as good as we were hoping: before the loop, a temporary is created, and destroyed after the loop. in the loop, we have three calls:

`TT2_copy()`

, `TT2_add()`

and a second time `TT2_add()`

. thus, the resulting code is1 TestType2 t; // without calling constructors or destructors! 2 create(t); 3 for (unsigned i = 0; i < v.size(); ++i) 4 { 5 copy(t, w[i]); 6 add(t, t, v[i]); 7 add(v[i], v[i], t); 8 } 9 destroy(t);

this is not as optimal as it could be: the best solution would be to optimize

1 copy(t, w[i]); 2 add(t, t, v[i]);

to one call:

`add(t, w[i], v[i]);`

. but this is still much better than1 for (unsigned i = 0; i < v.size(); ++i) 2 { 3 TestType2 t; // without calling constructors or destructors! 4 create(t); 5 add(t, w[i], v[i]); 6 add(v[i], v[i], t); 7 destroy(t); 8 }

where the temporary is created and destroyed every iteration. when trying to add this optimization, it is easier to use

`TestType`

from part one to see what is happening. once the output looks fine, one switches back to `TestType2`

to make sure the generated assembler code is fine.
#### the code.

you can download the source code of the `TestType2`

class here, and dummy implementations (to be put into another translation unit) of the `TT_*`

functions here.

#### introduction.

when developing expression templates, say a vector class allowing efficient arithmetic using expression templates, one often wonders if the long, complicated pages of template code one wrote actually does what it is supposed to do. to be sure, one has to ask the compiler to generate assembler code and wade through that code to see what is happening. this is tiresome, though essential.

most of the time, it is enough to check if the operations done with the supplied type – in the above example of a vector class, the scalar type – are what we want them to be. for example, if `v, w, x, y, z`

are variables of type `myvec<int>`

, say all of the same dimension, one wants to know whether `v = 2 * w + (x - (y - z))`

is evaluated to something like

1 for (unsigned i = 0; i < v.size(); ++i) 2 v[i] = 2 * w[i] + (x[i] - (y[i] - z[i]));

or (more likely)

1 for (unsigned i = 0; i < v.size(); ++i) 2 { 3 v[i] = 2 * w[i]; 4 v[i] += x[i]; 5 v[i] -= y[i]; 6 v[i] += z[i]; 7 }

(note that this code is not equivalent to the one above for certain more complex types than

`int`

, but for my purposes, such a transformation is fine. but anyway, that is not what we want to discuss here.) an alternative would be that temporaries are created, like in1 myvec<int> t1, t2, t3, t4; 2 t1 = 2 * w; 3 t2 = y - z; 4 t3 = x - t2; 5 t4 = t1 - t3; 6 v = t4;

this involves allocating and releasing memory and copying around data, which is significantly slower than the direct version

1 for (unsigned i = 0; i < v.size(); ++i) 2 v[i] = 2 * w[i] + (x[i] - (y[i] - z[i]));

the aim of expression templates is to generate such more optimal code. but what if your expression templates don’t do what you want? maybe they still generate temporaries, like in the following version:

1 for (unsigned i = 0; i < v.size(); ++i) 2 { 3 int t2 = 2 * w[i]; 4 int t2 = y[i] - z[i]; 5 int t3 = x[i] - t2; 6 v[i] = t1 - t3; 7 }

the compiler will still optimize this to the same as above, but in case the type is not

`int`

but, say, `myArbitraryPrecisionInteger`

(which uses expression templates itself and can deal very efficiently with expressions like `v[i] = 2 * w[i] + (x[i] - (y[i] - z[i]));`

), this code is suboptimal, and no expression templates provided by that type can make it better.therefore, one would like to have a dummy type, a type one can just plug in instead of

`int`

in the above example, which somehow outputs what exactly is done: which temporaries are created and destroyed, and which operations and expressions are created.
#### a test type.

the test type has to use expression templates itself to gather the expressions which are generated by the other expression templates, like the ones of `myvec<T>`

. then, it should just print these expressions when assignments etc. occur, say to `std::cout`

. then one can run a test program and just read the output to see what was going on. this is in general much easier than looking at the generated assembler code.

note that we only care about additive operations in the following, to make the code more readable.

the basic version of the type looks as follows:

1 class TestType 2 { 3 public: 4 TestType() 5 { 6 std::cout << "create " << this << "\n"; 7 } 8 9 TestType(const TestType & src) 10 { 11 std::cout << "create " << this << " from " << &src << "\n"; 12 } 13 14 ~TestType() 15 { 16 std::cout << "destroy " << this << "\n"; 17 } 18 19 TestType & operator = (const TestType & src) 20 { 21 std::cout << "copy " << this << " from " << &src << "\n"; 22 return *this; 23 } 24 25 TestType & operator += (const TestType & b) 26 { 27 std::cout << this << " += " << &b << "\n"; 28 return *this; 29 } 30 31 TestType & operator -= (const TestType & b) 32 { 33 std::cout << this << " -= " << &b << "\n"; 34 return *this; 35 } 36 };

this already allows us to see when

`TestType`

objects and temporaries are created and assigned, and when basic arithmetic is done. but so far, no *real*arithmetic can be done. to allow arithmetic, we introduce a

`TestExpression<O, D>`

template. here, the `O`

class describes the operand, and the `D`

class describes the argument(s). it is defined as follows:1 template<class Op, class Data> 2 class TestExpression 3 { 4 private: 5 Op d_op; 6 Data d_data; 7 8 public: 9 inline TestExpression(const Op & op, const Data & data) 10 : d_op(op), d_data(data) 11 { 12 } 13 14 operator TestType () const 15 { 16 std::cout << "casting TestExpression["; 17 d_op.print(d_data); 18 std::cout << "] to TestType()\n"; 19 return TestType(); 20 } 21 22 void print() const 23 { 24 d_op.print(d_data); 25 } 26 };

to add support for expressions to the

`TestType`

class, one adds the following methods to it:1 template<class O, class D> 2 TestType(const TestExpression<O, D> & src) 3 { 4 std::cout << "create " << this << " from "; 5 src.print(); 6 std::cout << "\n"; 7 } 8 9 template<class O, class D> 10 TestType & operator = (const TestExpression<O, D> & e) 11 { 12 std::cout << this << " = "; 13 e.print(); 14 std::cout << "\n"; 15 return *this; 16 } 17 18 template<class O, class D> 19 TestType & operator += (const TestExpression<O, D> & e) 20 { 21 std::cout << this << " += "; 22 e.print(); 23 std::cout << "\n"; 24 return *this; 25 } 26 27 template<class O, class D> 28 TestType & operator -= (const TestExpression<O, D> & e) 29 { 30 std::cout << this << " -= "; 31 e.print(); 32 std::cout << "\n"; 33 return *this; 34 }

then if we assign a

`TestExpression<O, D>`

to a `TestType`

object, or add it to it, or subtract it from it, etc., the corresponding messages are printed. now, let us discuss how the operands are implemented. these are simple classes with template members, which do not store data:1 class AddOp 2 { 3 public: 4 template<class A, class B> 5 void print(const std::pair<A, B> & data) const 6 { 7 std::cout << "("; 8 data.first.print(); 9 std::cout << " + "; 10 data.second.print(); 11 std::cout << ")"; 12 } 13 }; 14 15 class SubOp 16 { 17 public: 18 template<class A, class B> 19 void print(const std::pair<A, B> & data) const 20 { 21 std::cout << "("; 22 data.first.print(); 23 std::cout << " - "; 24 data.second.print(); 25 std::cout << ")"; 26 } 27 }; 28 29 class NegOp 30 { 31 public: 32 template<class A> 33 void print(const A & data) const 34 { 35 std::cout << "-"; 36 data.print(); 37 } 38 };

note that all operations but the unary

`NegOp`

are binary; the data object is in that case a `std::pair<T1, T2>`

object. now one main piece is missing which puts everything together: the overloaded operators which generate the expression templates. let us begin with the universal ones: the ones taking two expressions and combining them by an operator.1 template<class O1, class D1, class O2, class D2> 2 TestExpression<AddOp, std::pair<TestExpression<O1, D1>, TestExpression<O2, D2> > > operator + (const TestExpression<O1, D1> & a, const TestExpression<O2, D2> & b) 3 { return TestExpression<AddOp, std::pair<TestExpression<O1, D1>, TestExpression<O2, D2> > >(AddOp(), std::make_pair(a, b)); } 4 template<class O1, class D1, class O2, class D2> 5 TestExpression<SubOp, std::pair<TestExpression<O1, D1>, TestExpression<O2, D2> > > operator - (const TestExpression<O1, D1> & a, const TestExpression<O2, D2> & b) 6 { return TestExpression<SubOp, std::pair<TestExpression<O1, D1>, TestExpression<O2, D2> > >(SubOp(), std::make_pair(a, b)); }

this code is not exactly readable, but does its job: it takes two expressions,

`TestExpression<O1, D1>`

and `TestExpression<O2, D2>`

, and combines them to a new expression `TestExpression<NewOperand, std::pair<TestExpression<O1, D1>, TestExpression<O2, D2> > >`

. the operator for inversion looks similar, but simpler:1 template<class O1, class D1> 2 TestExpression<NegOp, TestExpression<O1, D1> > operator - (const TestExpression<O1, D1> & a) 3 { return TestExpression<NegOp, TestExpression<O1, D1> >(NegOp(), a); }

but this whole thing only works when we already have expressions. so far, we have no code which actually

*creates*an expression in the first place. this can be done by more operator overloading, and by introducing a

`TestWrapper[/url] which encapsulates an object of type [code language="c++"]TestType`

and behaves like an expression on its own. let us first show the operator definition in the unary case:1 TestExpression<NegOp, TestWrapper> operator - (const TestType & a) 2 { return TestExpression<NegOp, TestWrapper>(NegOp(), TestWrapper(a)); }

the template

`TestWrapper`

encapsulates a `TestObject`

. the definition looks as follows:1 class TestWrapper 2 { 3 private: 4 const TestType & d_val; 5 6 public: 7 inline TestWrapper(const TestType & val) 8 : d_val(val) 9 { 10 } 11 12 void print() const 13 { 14 std::cout << &d_val; 15 } 16 };

compare this to the definition of

`TestExpression`

above; note that no casting operator is needed as a `TestWrapper`

object should never show up directly to the user.now, we are left to implement the binary operators for

`+`

and `-`

. we have to go through all combinations of `TestType`

and `TestExpression<O, D>`

combinations (except two `TestExpression<O, D>`

‘s, which we already covered). this looks as follows:1 TestExpression<AddOp, std::pair<TestWrapper, TestWrapper> > operator + (const TestType & a, const TestType & b) 2 { return TestExpression<AddOp, std::pair<TestWrapper, TestWrapper> >(AddOp(), std::make_pair(TestWrapper(a), TestWrapper(b))); } 3 TestExpression<SubOp, std::pair<TestWrapper, TestWrapper> > operator - (const TestType & a, const TestType & b) 4 { return TestExpression<SubOp, std::pair<TestWrapper, TestWrapper> >(SubOp(), std::make_pair(TestWrapper(a), TestWrapper(b))); } 5 6 template<class O2, class D2> 7 TestExpression<AddOp, std::pair<TestWrapper, TestExpression<O2, D2> > > operator + (const TestType & a, const TestExpression<O2, D2> & b) 8 { return TestExpression<AddOp, std::pair<TestWrapper, TestExpression<O2, D2> > >(AddOp(), std::make_pair(TestWrapper(a), b)); } 9 template<class O2, class D2> 10 TestExpression<SubOp, std::pair<TestWrapper, TestExpression<O2, D2> > > operator - (const TestType & a, const TestExpression<O2, D2> & b) 11 { return TestExpression<SubOp, std::pair<TestWrapper, TestExpression<O2, D2> > >(SubOp(), std::make_pair(TestWrapper(a), b)); } 12 13 template<class O1, class D1> 14 TestExpression<AddOp, std::pair<TestExpression<O1, D1>, TestWrapper> > operator + (const TestExpression<O1, D1> & a, const TestType & b) 15 { return TestExpression<AddOp, std::pair<TestExpression<O1, D1>, TestWrapper> >(AddOp(), std::make_pair(a, TestWrapper(b))); } 16 template<class O1, class D1> 17 TestExpression<SubOp, std::pair<TestExpression<O1, D1>, TestWrapper> > operator - (const TestExpression<O1, D1> & a, const TestType & b) 18 { return TestExpression<SubOp, std::pair<TestExpression<O1, D1>, TestWrapper> >(SubOp(), std::make_pair(a, TestWrapper(b))); }

this is as annoying to write as it looks like, but it is necessary. but only once.

#### testing the result.

now assume that `v`

and `w`

are two objects of type `myvec<TestType>`

, each having six elements. the object `s`

is of type `TestType`

itself. assume that i write the following: `v += v + w`

. then the compiled version will output:

1 0x1b780d0 += 0x1b780d0 2 0x1b780d0 += 0x1b780f0 3 0x1b780d1 += 0x1b780d1 4 0x1b780d1 += 0x1b780f1 5 0x1b780d2 += 0x1b780d2 6 0x1b780d2 += 0x1b780f2 7 0x1b780d3 += 0x1b780d3 8 0x1b780d3 += 0x1b780f3 9 0x1b780d4 += 0x1b780d4 10 0x1b780d4 += 0x1b780f4 11 0x1b780d5 += 0x1b780d5 12 0x1b780d5 += 0x1b780f5

this shows that the command

`v += v + w`

is replaced by something like:1 for (unsigned i = 0; i < v.size(); ++i) 2 { 3 v[i] += v[i]; 4 v[i] += w[i]; 5 }

now let us look at something more complicated. if i write

`v = w + v`

, this cannot be translated to1 for (unsigned i = 0; i < v.size(); ++i) 2 { 3 v[i] += w[i]; 4 v[i] += v[i]; 5 }

anymore, as

`v[i]`

is changing its value inbetween. i added code to my `myvec<T>`

implementation to detect and try to avoid such problems. in this case, it should translate the code something like1 for (unsigned i = 0; i < v.size(); ++i) 2 { 3 TestType t = w[i] + v[i]; 4 v[i] += t; 5 }

the output is:

1 create 0x7fffc4d6b5df 2 copy 0x7fffc4d6b5df from 0x17860f0 3 0x7fffc4d6b5df += 0x17860d0 4 0x17860d0 += 0x7fffc4d6b5df 5 copy 0x7fffc4d6b5df from 0x17860f1 6 0x7fffc4d6b5df += 0x17860d1 7 0x17860d1 += 0x7fffc4d6b5df 8 copy 0x7fffc4d6b5df from 0x17860f2 9 0x7fffc4d6b5df += 0x17860d2 10 0x17860d2 += 0x7fffc4d6b5df 11 copy 0x7fffc4d6b5df from 0x17860f3 12 0x7fffc4d6b5df += 0x17860d3 13 0x17860d3 += 0x7fffc4d6b5df 14 copy 0x7fffc4d6b5df from 0x17860f4 15 0x7fffc4d6b5df += 0x17860d4 16 0x17860d4 += 0x7fffc4d6b5df 17 copy 0x7fffc4d6b5df from 0x17860f5 18 0x7fffc4d6b5df += 0x17860d5 19 0x17860d5 += 0x7fffc4d6b5df 20 destroy 0x7fffc4d6b5df

this shows that in fact, the generated code is more like this:

1 for (unsigned i = 0; i < v.size(); ++i) 2 { 3 TestType t = w[i]; 4 t += v[i]; 5 v[i] += t; 6 }

so without looking at the generated assembler code, we already have a good idea what the template expressions of

`myvec<T>`

are doing. now assume we write something like `v = s * v + w;`

. the output is1 0x1786116 += (0x1786116 * 0x7fffc4d6b7af) 2 0x1786116 += 0x1786110 3 0x1786117 += (0x1786117 * 0x7fffc4d6b7af) 4 0x1786117 += 0x1786111 5 0x1786118 += (0x1786118 * 0x7fffc4d6b7af) 6 0x1786118 += 0x1786112 7 0x1786119 += (0x1786119 * 0x7fffc4d6b7af) 8 0x1786119 += 0x1786113 9 0x178611a += (0x178611a * 0x7fffc4d6b7af) 10 0x178611a += 0x1786114 11 0x178611b += (0x178611b * 0x7fffc4d6b7af) 12 0x178611b += 0x1786115

which shows that the code was translated to something like

1 for (unsigned i = 0; i < v.size(); ++i) 2 { 3 v[i] += v[i] * s; 4 v[i] += w[i]; 5 }

(this of course assumes that we also implemented

`operator *`

for `TestType`

.)to really check what is going on one still has to check the generated assembler code. for example, in the test above which used a temporary when writing

`v += w + v`

and no temporary when writing `v += v + w`

, it is unclear if already the compiler made the decision which case to use (which he could, since he knows the addresses of `v`

and `w`

, or at least knows whether they are equal or not), or whether both cases are compiled into the program and whether the running program has to figure out which block of code to run. this cannot be detected using `TestType`

.note that for checking the assembler code, one better uses a different test type: one which translates everything to “three-address assembler”-like commands, which are declared

`extern`

(and not defined in this translation unit). then one can search for these function calls, and the whole scenario is more realistic as with complex operators (as above, where `std::cout`

is used all over the place), in which case often operators are outsourced as own functions.
#### the code.

you can download the source code of the `TestType`

class here.

almost over. decending, soon, vanishing beneath the horizon. a last glimpse of its beauty, before it is gone.

goodbye sun, see you tomorrow.

this is not the first night it snows – yesterday, flakes fell down as well. but this time, the snow is staying. not for long, i’m afraid, but at least for a little while. time to go outside and enjoy it, which i just did. not yet enough to play in it (that probably has to wait until the end of the year anyway), but great enough for some hopefully nice photos. enjoy!

here is a table of contents for the *implementing multivariate polynomials in c++ using templates* mini-series:

- part one: a usage example.
- part two: the data structure.
- part three: partial derivatives.
- part four: efficient evaluation.
- part five: long division and euclidean algorithm.

the soure code can be downloaded here (8 kb). it is licensed under the simplified bsd license. if you need the code licensed under another open source license, feel free to contact me.

welcome to the fifth (also last and easiest) part of the multivariate polynomials in c++ using templates series. this part will explain how i implemented long division and the euclidean algorithm.

### long division.

long division in is defined as follows: given polynomials , find polynomials such that and . provided that the leading term of is a unit in (or at least divides most things), and exist and are unique (in fact, i'm implicitly assuming that is commutative and unitary). this division makes an euclidean ring in case is a field (i.e. all non-zero elements are invertible). our `T`

is in general not a field, but we can still try to do long division, and let the user worry if this actually makes sense. for example, if `T == int`

, one can divide two elements (assuming the second is not zero), but this is not the division a mathematician would expect. hence, if is not monic in case of `T == int`

, our implmentation will output <i>something</i> which does not necessarily make sense.

anyway. my implementation is rather straightforward, which is not very surprising:

1 template<class T> 2 void eucliddiv(const poly<1, T> & f, const poly<1, T> & g, poly<1, T> & q, poly<1, T> & r) 3 { 4 if (f.degree() < g.degree()) 5 { 6 // Ignore the trivial case 7 q = poly<1, T>(); 8 r = f; 9 return; 10 } 11 q = poly<1, T>(true, f.degree() - g.degree() + 1); 12 r = f; 13 for (int i = q.degree(); i >= 0; --i) 14 { 15 q[i] = r[i + g.degree()] / g.leading(); 16 for (int j = g.degree(); j >= 0; --j) 17 r[i + j] -= q[i] * g[j]; 18 } 19 r.normalize(); 20 // Note that the degree of q is already correct. 21 }

here, we assume that `f.leading() / g.leading()`

is non-zero; otherwise, `q.normalize()`

would have to be called, but in that case the whole result will probably be broken anyway. we create the polynomial `q`

using the special constructor mentioned here.

### euclidean algorithm.

the implementation of the euclidean algorithm is not depending on the implementation anymore (long division made only use of the special constructor). first, we take care of special cases (one of the two arguments is zero), and then proceed with the main loop. we normalize the remainder in every iteration of the loop, and also the result, to (hopefully) increase numerical stability a bit. hence, for this to work, we really need that `T`

is a field, or otherwise it will only work in very special cases.

1 template<class T> 2 poly<1, T> GCD(const poly<1, T> & f, const poly<1, T> & g) 3 { 4 if (f.isZero()) 5 { 6 if (g.isZero()) 7 return f; // both are zero 8 else 9 return g / g.leading(); // make g monic 10 } 11 poly<1, T> d1(f / f.leading()), d2(g / g.leading()), q, r; 12 while (!d2.isZero()) 13 { 14 eucliddiv(d1, d2, q, r); 15 d1.swap(d2); 16 d2 = r; 17 d2 /= r.leading(); // make r monic 18 } 19 return d1; 20 }

i don't think there's much to write about this, which isn't already written somewhere else. this just shows how to implement a generic polynomial based algorithm.

welcome to the fourth part of the multivariate polynomials in c++ using templates series. in this part, i want to explain how to implement efficient polynomial evaluation using a collection of interacting templates. the output will be quite optimized code (assuming the compiler has a decent enough optimizer).

evaluating multivariate polynomials is more complicated. i started with a naive implementation of `operator()`

in `poly<n, T>`

:

1 template<class S> 2 poly<n - 1, S> operator() (const S & x) const 3 { 4 poly<n - 1, S> res; 5 S xx = (S)1; 6 for (unsigned i = 0; i < d_value.size(); ++i) 7 { 8 res += d_value[i] * xx; 9 xx *= x; 10 } 11 return res; 12 }

if `f`

is of type `poly<2, int>`

, then `f(4)`

will be of type `poly<1, int>`

, whence `f(4)(2)`

will call `poly<1, int>::operator()<int>(const int &)`

to yield a polynomial of type `poly<0, int>`

, which automatically casts to an `int`

. unfortunately, in this process, first `f(4)`

will be created, which requires arithmetic of polynomials of type `poly<1, int>`

, and then the resulting polynomial will be evaluated again. in particular, if `n`

is larger, this is far from being optimal. hence, this solution is fine if evaluations are seldomly done, but if they are done more often, it is too slow.

to stay with our example of `f`

of type `poly<2, int>`

. one could evaluate , written as `f(4)(2)`

, directly as follows:

1 int result = 0, xx1 = 1; 2 for (int i = 0; i <= f.degree(); ++i) 3 { 4 int result2 = 0, xx2 = 1; 5 for (int j = 0; j <= f[i].degree(); ++j) 6 { 7 result2 += f[i][j] * xx2; 8 xx2 *= 2; 9 } 10 result += result2 * xx1; 11 xx1 *= 4; 12 }

clearly, this is tiresome (and prone to typos) to write every time. moreover, this is also not optimal, as our `operator[]`

will do more than just returning an element. in particular, if `f`

is not `const`

in this context, the compiler will insert code at every operation `f[i]`

and `f[i][j]`

to check whether the index is out of range (to resize `d_value`

in that case). writing more directly

1 int result = 0, xx1 = 1; 2 for (int i = 0; i < f.d_value.size(); ++i) 3 { 4 int result2 = 0, xx2 = 1; 5 for (int j = 0; j < f.d_value[i].d_value.size(); ++j) 6 { 7 result2 += f.d_value[i].d_value[j] * xx2; 8 xx2 *= 2; 9 } 10 result += result2 * xx1; 11 xx1 *= 4; 12 }

would result in faster code, if it would compile – `d_value`

is `private`

. for this reason, i came up with two templates:

1 template<int n, class T, class S = T> 2 class poly_evaluator; 3 template<int n, class T, class HL, class S> 4 class poly_evaluator_impl;

the template `poly_evaluator<n, T, S>`

is instanciated in `poly<n, T>::operator()<S>(const S &)`

, which is defined as follows:

1 template<class S> 2 inline poly_evaluator<n, T, S> operator() (const S & x) const 3 { 4 return poly_evaluator<n, T, S>(*this, x); 5 }

the idea is that if `operator()`

of `poly_evaluator<n, T, S>(*this, x)`

is called, it will spawn an object of type `poly_evaluator_impl<n-1, T, poly_evaluator<n, T, S>, S>`

. if `operator()`

of this new object is called, it will create an object of type `poly_evaluator_impl<n-2, T, poly_evaluator_impl<n-1, T, poly_evaluator<n, T, S>, S>, S>`

, and so on. the purpose of carrying the type of the caller around is that the outer loop of the evaluation function is the loop for the inner-most object (of type `poly_evaluator<n, T, S>`

). if `operator()`

would be right-associative instead of left-associative (as it is), this trouble would not be necessary.

the idea is that both templates `poly_evaluator<n, T, S>`

and `poly_evaluator<n, T, HL, S>`

provide an internal function `template<class SS, class Fun> evaluate(SS & res, Fun evalfun)`

, which in case of `poly_evaluator<n, T, HL, S>`

calls the corresponding `evaluate`

function of its owner (with `evalfun`

replaced), and in the case of `poly_evaluator<n, T, S>`

implements the outermost loop, which loops over the coefficients of and uses `evalfun`

to evaluate the coefficients. hence, the `evalful`

handed upwards must be extended in every step to include the next variable.

assume that we have a polynomial `f`

of type `poly<3, int>`

, and we want to evaluate `f(10)(20)(30)`

. this is an object of type `poly_evaluator_impl<1, int, poly_evaluator_impl<2, int, poly_evaluator<3, int, int>, int>, int>`

, whose cast operator `operator int()`

will start evaluation by defining a local class `poly_evaluator_impl<1, int, poly_evaluator_impl<2, int, poly_evaluator<3, int, int>, int>, int>::eval_fun`

, defined as follows:

1 class eval_fun 2 { 3 const poly_evaluator_impl<1, int, poly_evaluator_impl<2, int, poly_evaluator<3, int, int>, int>, int> & d_owner; 4 5 public: 6 inline eval_fun(const poly_evaluator_impl<1, int, poly_evaluator_impl<2, int, poly_evaluator<3, int, int>, int>, int> & owner) 7 : d_owner(owner) 8 { 9 } 10 11 inline int operator() (const poly<1, int> & p) const 12 { 13 int res = 0; 14 int xx = 1; 15 for (int i = 0; i < (int)p.d_value.size(); ++i) 16 { 17 res += p.d_value[i] * xx; 18 xx = xx * d_owner.d_evalpoint; 19 } 20 return res; 21 } 22 };

an object of it is created by `operator int()`

as follows:

1 inline operator int() const 2 { 3 int res = 0; 4 d_owner.evaluate(res, eval_fun(*this)); 5 return res; 6 }

the object `eval_fun(*this)`

can now be used to evaluate polynomials of type `poly<1, int>`

at the specified evaluation point `d_owner.d_evalpoint`

, which equals 30 in this case.

the `operator int()`

calls `poly_evaluator_impl<2, int, poly_evaluator<3, int, int>, int>::evaluate<int, Fun>(int &, Fun)`

, which in turn is implemented as follows:

1 template<class Fun> 2 inline void evaluate(int & res, const Fun & evalfun) const 3 { 4 d_owner.evaluate(res, eval_fun<int, Fun>(*this, evalfun)); 5 }

the newly created object of type `eval_fun<int, Fun>`

will now evaluate polynomials of type `poly<2, int>`

by using the provided functor `evalfun`

, which is an object of the class `poly_evaluator_impl<1, int, poly_evaluator_impl<2, int, poly_evaluator<3, int, int>, int>, int>::eval_fun`

defined above. the class `eval_fun`

in `poly_evaluator_impl<2, int, poly_evaluator<3, int, int>, int>`

is defined as follows:

1 template<class Fun> 2 class eval_fun 3 { 4 const poly_evaluator_impl<2, int, poly_evaluator<3, int, int>, int> & d_owner; 5 const Fun & d_evalfun; 6 7 public: 8 inline eval_fun(const poly_evaluator_impl<2, int, poly_evaluator<3, int, int>, int> & owner, const Fun & evalfun) 9 : d_owner(owner), d_evalfun(evalfun) 10 { 11 } 12 13 inline int operator() (const poly<2, int> & p) const 14 { 15 int res = 0; 16 int xx = 1; 17 for (int i = 0; i < (int)p.d_value.size(); ++i) 18 { 19 res += d_evalfun(p.d_value[i]) * xx; 20 xx = xx * d_owner.d_evalpoint; 21 } 22 return res; 23 } 24 };

note that `d_owner.d_evalpoint`

is 20 in this case. finally, we are at the top level, namely in the function `poly_evaluator<3, int, int>::evaluate<int, Fun>()`

, defined as follows:

1 template<class Fun> 2 inline void evaluate(int & res, const Fun & evalfun) const 3 { 4 int xx = 1; 5 for (int i = 0; i < (int)d_poly.d_value.size(); ++i) 6 { 7 res += evalfun(d_poly.d_value[i]) * xx; 8 xx = xx * d_evalpoint; 9 } 10 }

(here, `d_evalpoint`

is 10.) this is the outer loop. the middle loop is inserted when calling `evalfun(d_poly.d_value[i])`

, and into that the innermost loop is inserted when it calles `d_evalfun(p.d_value[i])`

. this all essentially boils down to the following combination:

1 { 2 int res = 0; 3 int xx = 1; 4 for (int i = 0; i < (int)d_poly.d_value.size(); ++i) 5 { 6 int tmp; 7 const poly<2, int> & p = d_poly.d_value[i]; 8 { 9 int res2 = 0; 10 int xx2 = 1; 11 for (int i2 = 0; i2 < (int)p.d_value.size(); ++i2) 12 { 13 int tmp2; 14 const poly<2, int> & p2 = p.d_value[i2]; 15 { 16 int res3 = 0; 17 int xx3 = 1; 18 for (int i3 = 0; i3 < (int)p2.d_value.size(); ++i3) 19 { 20 res3 += p2.d_value[i3] * xx3; 21 xx3 = xx3 * d_owner.d_evalpoint; 22 } 23 tmp2 = res3; 24 } 25 res2 += tmp2 * xx2; 26 xx2 = xx2 * d_owner.d_evalpoint; 27 } 28 tmp = res2; 29 } 30 res += tmp * xx; 31 xx = xx * d_evalpoint; 32 } 33 return res; 34 }

of course, this is just an intermediate step (to highlight the similiarities to all the code snippets from above); the compiler will optimize this to something more compact.

this “example” already presented most of the concept. the implementation itself is more complicated, since different types can be used for evaluation (at different stages, even), and since the implementation has support for allocators.

an important point is that both templates `poly_evaluator<n, T, S>`

and `poly_evaluator<n, T, HL, S>`

are specialized for `n == 1`

, as in that case, there is no lower level, and hence these specializations provide no `evaluate`

template and try to pass evaluation further down, but right away pass it higher up to their owner, or in case of `poly_evaluator<1, T, S>`

, it will just do the evaluation.

this concludes the fourth part. the last part will concentrate something less complicated: implementing long division and the euclidean algorithm.

welcome to the third part of the multivariate polynomials in c++ using templates series. this part will explain how to compute partial derivatives using partial template specialization tricks.

the idea is very simple. given , written as with , we have

and

this allows us to implement partial derivatives iteratively. we assume that the number of the indeterminate by which one should differentiate is already known during compile time. the implementation will be done by a templated function

1 template<int N, int n, class T> 2 inline poly<n, T> derivative(const poly<n, T> & p);

the purpose of the template parameter `N`

is to say that should be computed. given a polynomial `p`

, the function can be called simply by `derivative<N>(p)`

. to implement `derivative<N>(p)`

, we would like to again use partial template specialization to distinguish between the cases `N == 0`

and `N > 0`

. unfortunately, partial template specialization is not available for template functions. hence, we need to introduce a helper template class:

1 template<int N, int n, class T> 2 class derivative_computer;

it should contain one static function, `static inline void computeDerivative(const poly<n, T> & src, poly<n, T> & dest);`

which allows to compute the partial derivative of the polynomial in `src`

and store it into `dest`

. then `derviative<N>(p)`

can simply be implemented by

1 template<int N, int n, class T> 2 inline poly<n, T, std::allocator<T> > derivative(const poly<n, T, std::allocator<T> > & p) 3 { 4 poly<n, T> res; 5 derivative_computer<N, n, T>::computeDerivative(p, res); 6 return res; 7 }

now let us focus on the class template `derivative_computer<N, n, T>`

. the general implementation, for `N > 0`

, uses the second formula from above and iteratively calls `derivative_computer<N-1, n-1, T>::computeDerivative()`

:

1 template<int N, int n, class T> 2 class derivative_computer 3 { 4 public: 5 static inline void computeDerivative(const poly<n, T> & src, poly<n, T> & dest) 6 { 7 dest.d_value.resize(src.degree() + 1); 8 for (int i = 0; i <= src.degree(); ++i) 9 derivative_computer<N - 1, n - 1, T>::computeDerivative(src[i], dest[i]); 10 dest.normalize(); 11 } 12 };

now we are left with the case that `N = 0`

. the implementation here is also simple:

1 template<int n, class T> 2 class derivative_computer<0, n, T> 3 { 4 public: 5 static inline void computeDerivative(const poly<n, T> & src, poly<n, T> & dest) 6 { 7 dest.d_value.resize(src.degree() >= 0 ? src.degree() : 0); 8 for (int i = 1; i <= src.degree(); ++i) 9 dest[i - 1] = src[i] * (T)i; 10 dest.normalize(); 11 } 12 };

here, we cast `i`

to an object of type `T`

before multiplication. from this point on, no further iteration is needed.

we are left with the case that some (malicious or just confused) user calls `derivative<N>(p)`

for a polynomial `p`

of type `poly<n, T>`

with `n <= N`

. in that case, the compiler tries to instanciate `derivative_computer<k, 0, T>::computeDerivative()`

and tries to treat `poly<0, T>`

as a polynomial. this will result in strange error messages, and it would be nice if there is something more readable. for that reason, i added another two specializations:

1 template<class T> 2 class derivative_computer<0, 0, T> 3 { 4 public: 5 class ERROR_N_must_be_strictly_less_than_n_in_derivative_template; 6 7 static inline void computeDerivative(const poly<0, T> &, poly<0, T> &) 8 { 9 ERROR_N_must_be_strictly_less_than_n_in_derivative_template(); 10 } 11 }; 12 13 template<int N, class T> 14 class derivative_computer<N, 0, T> 15 { 16 public: 17 class ERROR_N_must_be_strictly_less_than_n_in_derivative_template; 18 19 static inline void computeDerivative(const poly<0, T> &, poly<0, T> &) 20 { 21 ERROR_N_must_be_strictly_less_than_n_in_derivative_template(); 22 } 23 };

these two definitions compile fine, but as soon as one of them is instanciated by the compiler, it will try to find out something about the class `ERROR_N_must_be_strictly_less_than_n_in_derivative_template`

and not succeed, and spit out some error message hopefully containing the string `ERROR_N_must_be_strictly_less_than_n_in_derivative_template`

. for example, if i replace the line computing the partial derivative of the intersection equation in `poly-test.cpp`

by `std::cout << derivative<2>(l) << "\n";`

, `g++`

will complain as follows:

1 In file included from poly-test.cpp:2:0: 2 polynomial.hpp: In static member function ‘static void derivative_computer<N, 0, T>::computeDerivative(const poly<0, T>&, poly<0, T>&) [with int N = 1, T = double]’: 3 polynomial.hpp:1200:13: instantiated from ‘static void derivative_computer<N, n, T>::computeDerivative(const poly<n, T>&, poly<n, T>&) [with int N = 2, int n = 1, T = double]’ 4 polynomial.hpp:1247:5: instantiated from ‘poly<n, T> derivative(const poly<n, T>&) [with int N = 2, int n = 1, T = double]’ 5 poly-test.cpp:19:33: instantiated from here 6 polynomial.hpp:1227:9: error: invalid use of incomplete type ‘struct derivative_computer<1, 0, double>::ERROR_N_must_be_strictly_less_than_n_in_derivative_template’ 7 polynomial.hpp:1223:11: error: declaration of ‘struct derivative_computer<1, 0, double>::ERROR_N_must_be_strictly_less_than_n_in_derivative_template’ 8 make: *** [poly-test.o] Error 1

this is not exactly readable, but the uppercase `ERROR`

quickly lets you look at the important part of that message. (note that i removed the allocators from the error messages, so that it is in sync with the above writing. if you try it out yourself with the source code, you'll see a slightly more complicated error message.)

this completes part three of the series. in the next part, i will discuss how to evaluate polynomials efficiently using templates.