#### introduction.

when developing expression templates, say for a vector class allowing efficient arithmetic, one often wonders whether the long, complicated pages of template code one wrote actually do what they are supposed to do. to be sure, one has to ask the compiler to generate assembler code and wade through that code to see what is happening. this is tiresome, though essential.

most of the time, it is enough to check whether the operations done with the supplied type (in the above example of a vector class, the scalar type) are what we want them to be. for example, if `v`, `w`, `x`, `y` and `z` are variables of type `myvec<int>`, all of the same dimension, one wants to know whether `v = 2 * w + (x - (y - z))` is evaluated to something like

```cpp
for (unsigned i = 0; i < v.size(); ++i)
    v[i] = 2 * w[i] + (x[i] - (y[i] - z[i]));
```

or (more likely)

```cpp
for (unsigned i = 0; i < v.size(); ++i)
{
    v[i] = 2 * w[i];
    v[i] += x[i];
    v[i] -= y[i];
    v[i] += z[i];
}
```

(note that this code is not equivalent to the one above for certain types more complex than `int`, but for my purposes such a transformation is fine. anyway, that is not what we want to discuss here.) an alternative would be that temporaries are created, like in

```cpp
myvec<int> t1, t2, t3, t4;
t1 = 2 * w;
t2 = y - z;
t3 = x - t2;
t4 = t1 + t3;
v = t4;
```

this involves allocating and releasing memory and copying data around, which is significantly slower than the direct version:

```cpp
for (unsigned i = 0; i < v.size(); ++i)
    v[i] = 2 * w[i] + (x[i] - (y[i] - z[i]));
```

the aim of expression templates is to generate such efficient code. but what if your expression templates don't do what you want? maybe they still generate temporaries, like in the following version:

```cpp
for (unsigned i = 0; i < v.size(); ++i)
{
    int t1 = 2 * w[i];
    int t2 = y[i] - z[i];
    int t3 = x[i] - t2;
    v[i] = t1 + t3;
}
```

for `int`, the compiler will still optimize this to the same code as above. but if the type is not `int` but, say, `myArbitraryPrecisionInteger` (which uses expression templates itself and can deal very efficiently with expressions like `v[i] = 2 * w[i] + (x[i] - (y[i] - z[i]));`), this code is suboptimal, and no expression templates provided by that type can make it better. therefore, one would like to have a dummy type, a type one can just plug in instead of `int` in the above example, which somehow outputs what exactly is done: which temporaries are created and destroyed, and which operations and expressions are created.

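for readers who have not written such templates yet, here is a minimal sketch of how a vector class can use expression templates so that `v = 2 * w + (x - (y - z))` runs as one fused loop without temporary vectors. it assumes C++14 and is not the `myvec<T>` implementation discussed here; `MiniVec`, `Leaf`, `BinExpr`, `ScaleExpr` and `as_expr` are hypothetical names chosen for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal vector; NOT the article's myvec<T>.
template<class T>
class MiniVec
{
public:
    explicit MiniVec(std::size_t n, T init = T()) : d_data(n, init) {}
    std::size_t size() const { return d_data.size(); }
    const T & operator[](std::size_t i) const { return d_data[i]; }
    T & operator[](std::size_t i) { return d_data[i]; }

    // Assigning an expression runs one fused loop; no temporary vectors.
    template<class E>
    MiniVec & operator=(const E & e)
    {
        for (std::size_t i = 0; i < size(); ++i)
            d_data[i] = e[i];
        return *this;
    }

private:
    std::vector<T> d_data;
};

// Leaf node: refers to an existing vector, nothing is copied.
template<class T>
struct Leaf
{
    const MiniVec<T> & v;
    T operator[](std::size_t i) const { return v[i]; }
};

struct Plus  { template<class A, class B> static auto apply(A a, B b) { return a + b; } };
struct Minus { template<class A, class B> static auto apply(A a, B b) { return a - b; } };

// Inner nodes are small and stored by value inside their parent node.
template<class L, class R, class Op>
struct BinExpr
{
    L l;
    R r;
    auto operator[](std::size_t i) const { return Op::apply(l[i], r[i]); }
};

template<class S, class E>
struct ScaleExpr
{
    S s;
    E e;
    auto operator[](std::size_t i) const { return s * e[i]; }
};

// as_expr exists only for our own types, so the operator templates below
// are removed from overload resolution (SFINAE) for all other operands.
template<class T>
Leaf<T> as_expr(const MiniVec<T> & v) { return Leaf<T>{v}; }
template<class L, class R, class Op>
BinExpr<L, R, Op> as_expr(const BinExpr<L, R, Op> & e) { return e; }
template<class S, class E>
ScaleExpr<S, E> as_expr(const ScaleExpr<S, E> & e) { return e; }

template<class A, class B>
auto operator+(const A & a, const B & b)
    -> BinExpr<decltype(as_expr(a)), decltype(as_expr(b)), Plus>
{ return {as_expr(a), as_expr(b)}; }

template<class A, class B>
auto operator-(const A & a, const B & b)
    -> BinExpr<decltype(as_expr(a)), decltype(as_expr(b)), Minus>
{ return {as_expr(a), as_expr(b)}; }

template<class E>
auto operator*(int s, const E & e) -> ScaleExpr<int, decltype(as_expr(e))>
{ return {s, as_expr(e)}; }
```

sub-expressions such as `(y - z)` are copied into their parent node instead of being referenced, and only the leaves refer to named vectors; the whole right-hand side is evaluated element by element inside `MiniVec::operator=`.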
#### a test type.

the test type has to use expression templates itself to gather the expressions which are generated by the other expression templates, like those of `myvec<T>`. then it should just print these expressions when assignments etc. occur, say to `std::cout`. one can then run a test program and read the output to see what was going on. in general, this is much easier than looking at the generated assembler code.

note that we only care about additive operations in the following, to make the code more readable.

the basic version of the type looks as follows:

```cpp
#include <iostream>

class TestType
{
public:
    TestType()
    {
        std::cout << "create " << this << "\n";
    }

    TestType(const TestType & src)
    {
        std::cout << "create " << this << " from " << &src << "\n";
    }

    ~TestType()
    {
        std::cout << "destroy " << this << "\n";
    }

    TestType & operator = (const TestType & src)
    {
        std::cout << "copy " << this << " from " << &src << "\n";
        return *this;
    }

    TestType & operator += (const TestType & b)
    {
        std::cout << this << " += " << &b << "\n";
        return *this;
    }

    TestType & operator -= (const TestType & b)
    {
        std::cout << this << " -= " << &b << "\n";
        return *this;
    }
};
```
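if the messages should go somewhere other than `std::cout`, for example to check them automatically, one possible variant is sketched below. the static members `s_out` and `s_live` and the helper `count_prefix` are hypothetical additions, not part of the class above: `s_out` selects the output stream, and `s_live` counts the objects currently alive, so a test can verify that every temporary that was created was also destroyed.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>

// Variant of TestType (sketch): s_out and s_live are hypothetical additions.
class TestType
{
public:
    static std::ostream * s_out;  // where messages go (std::cout by default)
    static int s_live;            // number of TestType objects currently alive

    TestType() { ++s_live; *s_out << "create " << this << "\n"; }
    TestType(const TestType & src)
    { ++s_live; *s_out << "create " << this << " from " << &src << "\n"; }
    ~TestType() { --s_live; *s_out << "destroy " << this << "\n"; }

    TestType & operator=(const TestType & src)
    { *s_out << "copy " << this << " from " << &src << "\n"; return *this; }
    TestType & operator+=(const TestType & b)
    { *s_out << this << " += " << &b << "\n"; return *this; }
    TestType & operator-=(const TestType & b)
    { *s_out << this << " -= " << &b << "\n"; return *this; }
};

std::ostream * TestType::s_out = &std::cout;
int TestType::s_live = 0;

// Counts the lines of a log that start with the given prefix.
int count_prefix(const std::string & log, const std::string & prefix)
{
    std::istringstream in(log);
    int n = 0;
    for (std::string line; std::getline(in, line); )
        if (line.compare(0, prefix.size(), prefix) == 0)
            ++n;
    return n;
}
```

for example, pointing `TestType::s_out` to a `std::ostringstream`, running a few statements in a block and then comparing the number of `create` and `destroy` lines gives a quick leak check.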

this already allows us to see when `TestType` objects and temporaries are created and assigned, and when basic arithmetic is done. but so far, no *real* arithmetic can be done. to allow arithmetic, we introduce a `TestExpression<O, D>` template. here, the `O` class describes the operation, and the `D` class describes the argument(s). it is defined as follows:

```cpp
template<class Op, class Data>
class TestExpression
{
private:
    Op d_op;
    Data d_data;

public:
    inline TestExpression(const Op & op, const Data & data)
        : d_op(op), d_data(data)
    {
    }

    // converting an expression to a TestType only logs the expression
    operator TestType () const
    {
        std::cout << "casting TestExpression[";
        d_op.print(d_data);
        std::cout << "] to TestType()\n";
        return TestType();
    }

    void print() const
    {
        d_op.print(d_data);
    }
};
```

to add support for expressions to the `TestType` class, one adds the following methods to it:

```cpp
template<class O, class D>
TestType(const TestExpression<O, D> & src)
{
    std::cout << "create " << this << " from ";
    src.print();
    std::cout << "\n";
}

template<class O, class D>
TestType & operator = (const TestExpression<O, D> & e)
{
    std::cout << this << " = ";
    e.print();
    std::cout << "\n";
    return *this;
}

template<class O, class D>
TestType & operator += (const TestExpression<O, D> & e)
{
    std::cout << this << " += ";
    e.print();
    std::cout << "\n";
    return *this;
}

template<class O, class D>
TestType & operator -= (const TestExpression<O, D> & e)
{
    std::cout << this << " -= ";
    e.print();
    std::cout << "\n";
    return *this;
}
```

then, if we assign a `TestExpression<O, D>` to a `TestType` object, add it to one, subtract it from one, etc., the corresponding messages are printed. now let us discuss how the operations are implemented. these are simple classes with template members, which do not store any data:

```cpp
#include <utility> // std::pair

class AddOp
{
public:
    template<class A, class B>
    void print(const std::pair<A, B> & data) const
    {
        std::cout << "(";
        data.first.print();
        std::cout << " + ";
        data.second.print();
        std::cout << ")";
    }
};

class SubOp
{
public:
    template<class A, class B>
    void print(const std::pair<A, B> & data) const
    {
        std::cout << "(";
        data.first.print();
        std::cout << " - ";
        data.second.print();
        std::cout << ")";
    }
};

class NegOp
{
public:
    template<class A>
    void print(const A & data) const
    {
        std::cout << "-";
        data.print();
    }
};
```

note that all operations except the unary `NegOp` are binary; for those, the data object is a `std::pair<T1, T2>` object. now one main piece is still missing which puts everything together: the overloaded operators which generate the expression templates. let us begin with the universal ones, which take two expressions and combine them by an operator:

```cpp
template<class O1, class D1, class O2, class D2>
TestExpression<AddOp, std::pair<TestExpression<O1, D1>, TestExpression<O2, D2> > > operator + (const TestExpression<O1, D1> & a, const TestExpression<O2, D2> & b)
{ return TestExpression<AddOp, std::pair<TestExpression<O1, D1>, TestExpression<O2, D2> > >(AddOp(), std::make_pair(a, b)); }

template<class O1, class D1, class O2, class D2>
TestExpression<SubOp, std::pair<TestExpression<O1, D1>, TestExpression<O2, D2> > > operator - (const TestExpression<O1, D1> & a, const TestExpression<O2, D2> & b)
{ return TestExpression<SubOp, std::pair<TestExpression<O1, D1>, TestExpression<O2, D2> > >(SubOp(), std::make_pair(a, b)); }
```

this code is not exactly readable, but it does its job: it takes two expressions, `TestExpression<O1, D1>` and `TestExpression<O2, D2>`, and combines them into a new expression `TestExpression<NewOp, std::pair<TestExpression<O1, D1>, TestExpression<O2, D2> > >`, where `NewOp` is `AddOp` or `SubOp`. the operator for negation looks similar, but simpler:

```cpp
template<class O1, class D1>
TestExpression<NegOp, TestExpression<O1, D1> > operator - (const TestExpression<O1, D1> & a)
{ return TestExpression<NegOp, TestExpression<O1, D1> >(NegOp(), a); }
```

but this whole thing only works when we already have expressions. so far, we have no code which actually *creates* an expression in the first place. this can be done by more operator overloading, and by introducing a `TestWrapper` class which encapsulates (a reference to) an object of type `TestType` and behaves like an expression on its own. let us first show the operator definition in the unary case:

```cpp
TestExpression<NegOp, TestWrapper> operator - (const TestType & a)
{ return TestExpression<NegOp, TestWrapper>(NegOp(), TestWrapper(a)); }
```

the class `TestWrapper` holds a constant reference to a `TestType` object. its definition looks as follows (in the source file, it has to appear before the operators that return it):

```cpp
class TestWrapper
{
private:
    const TestType & d_val;

public:
    inline TestWrapper(const TestType & val)
        : d_val(val)
    {
    }

    void print() const
    {
        std::cout << &d_val;
    }
};
```

compare this to the definition of `TestExpression` above; note that no casting operator is needed, as a `TestWrapper` object should never show up directly to the user. now we are left with implementing the binary operators `+` and `-`. we have to go through all combinations of `TestType` and `TestExpression<O, D>` operands (except for two `TestExpression<O, D>` operands, which we already covered). this looks as follows:

```cpp
// TestType op TestType
TestExpression<AddOp, std::pair<TestWrapper, TestWrapper> > operator + (const TestType & a, const TestType & b)
{ return TestExpression<AddOp, std::pair<TestWrapper, TestWrapper> >(AddOp(), std::make_pair(TestWrapper(a), TestWrapper(b))); }
TestExpression<SubOp, std::pair<TestWrapper, TestWrapper> > operator - (const TestType & a, const TestType & b)
{ return TestExpression<SubOp, std::pair<TestWrapper, TestWrapper> >(SubOp(), std::make_pair(TestWrapper(a), TestWrapper(b))); }

// TestType op TestExpression
template<class O2, class D2>
TestExpression<AddOp, std::pair<TestWrapper, TestExpression<O2, D2> > > operator + (const TestType & a, const TestExpression<O2, D2> & b)
{ return TestExpression<AddOp, std::pair<TestWrapper, TestExpression<O2, D2> > >(AddOp(), std::make_pair(TestWrapper(a), b)); }
template<class O2, class D2>
TestExpression<SubOp, std::pair<TestWrapper, TestExpression<O2, D2> > > operator - (const TestType & a, const TestExpression<O2, D2> & b)
{ return TestExpression<SubOp, std::pair<TestWrapper, TestExpression<O2, D2> > >(SubOp(), std::make_pair(TestWrapper(a), b)); }

// TestExpression op TestType
template<class O1, class D1>
TestExpression<AddOp, std::pair<TestExpression<O1, D1>, TestWrapper> > operator + (const TestExpression<O1, D1> & a, const TestType & b)
{ return TestExpression<AddOp, std::pair<TestExpression<O1, D1>, TestWrapper> >(AddOp(), std::make_pair(a, TestWrapper(b))); }
template<class O1, class D1>
TestExpression<SubOp, std::pair<TestExpression<O1, D1>, TestWrapper> > operator - (const TestExpression<O1, D1> & a, const TestType & b)
{ return TestExpression<SubOp, std::pair<TestExpression<O1, D1>, TestWrapper> >(SubOp(), std::make_pair(a, TestWrapper(b))); }
```

this is as annoying to write as it looks, but it is necessary, and it only has to be done once.
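with a C++14 compiler, the list of combinations can be shortened. the sketch below swaps in a different technique than the one used above: a hypothetical helper `wrap()` turns a `TestType` into a `TestWrapper` and passes expressions through unchanged, and a single operator template per symbol relies on it. since `wrap()` only accepts these two kinds of operands, the templates drop out of overload resolution (SFINAE) for every other type. to keep it short, `TestType` is reduced to assignment from an expression, the casting operator is left out, and all output goes to a hypothetical global stream `g_log` instead of `std::cout`.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <type_traits>
#include <utility>

std::ostringstream g_log;  // hypothetical log sink, replaces std::cout here

class TestType;

// Leaf of an expression: refers to a TestType object.
class TestWrapper
{
    const TestType & d_val;
public:
    explicit TestWrapper(const TestType & val) : d_val(val) {}
    void print() const { g_log << static_cast<const void *>(&d_val); }
};

struct AddOp
{
    template<class A, class B>
    void print(const std::pair<A, B> & d) const
    { g_log << "("; d.first.print(); g_log << " + "; d.second.print(); g_log << ")"; }
};

struct SubOp
{
    template<class A, class B>
    void print(const std::pair<A, B> & d) const
    { g_log << "("; d.first.print(); g_log << " - "; d.second.print(); g_log << ")"; }
};

struct NegOp
{
    template<class A>
    void print(const A & d) const { g_log << "-"; d.print(); }
};

template<class Op, class Data>
class TestExpression
{
    Op d_op;
    Data d_data;
public:
    TestExpression(const Op & op, const Data & data) : d_op(op), d_data(data) {}
    void print() const { d_op.print(d_data); }
};

// Reduced TestType: only assignment from an expression is kept.
class TestType
{
public:
    template<class O, class D>
    TestType & operator=(const TestExpression<O, D> & e)
    { g_log << this << " = "; e.print(); g_log << "\n"; return *this; }
};

// wrap(): TestType -> TestWrapper leaf; expressions are passed through.
inline TestWrapper wrap(const TestType & t) { return TestWrapper(t); }
template<class O, class D>
const TestExpression<O, D> & wrap(const TestExpression<O, D> & e) { return e; }

// Stored operand type: TestWrapper, or a copy of the sub-expression.
template<class T>
using Node = std::decay_t<decltype(wrap(std::declval<const T &>()))>;

template<class A, class B>
TestExpression<AddOp, std::pair<Node<A>, Node<B>>> operator+(const A & a, const B & b)
{ return { AddOp(), std::make_pair(wrap(a), wrap(b)) }; }

template<class A, class B>
TestExpression<SubOp, std::pair<Node<A>, Node<B>>> operator-(const A & a, const B & b)
{ return { SubOp(), std::make_pair(wrap(a), wrap(b)) }; }

template<class A>
TestExpression<NegOp, Node<A>> operator-(const A & a)
{ return { NegOp(), wrap(a) }; }
```

for `a = b + (c - -d)`, the log then contains a line of the form `<a> = (<b> + (<c> - -<d>))`, with the addresses of the objects in place of the names.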

#### testing the result.

now assume that `v` and `w` are two objects of type `myvec<TestType>`, each having six elements, and that the object `s` is of type `TestType` itself. assume that i write `v += v + w`. then the compiled program outputs:

```
0x1b780d0 += 0x1b780d0
0x1b780d0 += 0x1b780f0
0x1b780d1 += 0x1b780d1
0x1b780d1 += 0x1b780f1
0x1b780d2 += 0x1b780d2
0x1b780d2 += 0x1b780f2
0x1b780d3 += 0x1b780d3
0x1b780d3 += 0x1b780f3
0x1b780d4 += 0x1b780d4
0x1b780d4 += 0x1b780f4
0x1b780d5 += 0x1b780d5
0x1b780d5 += 0x1b780f5
```

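this shows that the command `v += v + w` is replaced by something like the following loop (the addresses of consecutive elements differ by just one byte, as `TestType` has no data members):

```cpp
for (unsigned i = 0; i < v.size(); ++i)
{
    v[i] += v[i];
    v[i] += w[i];
}
```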

now let us look at something more complicated. if i write `v += w + v`, this cannot be translated to

```cpp
for (unsigned i = 0; i < v.size(); ++i)
{
    v[i] += w[i];
    v[i] += v[i];
}
```

anymore, as `v[i]` changes its value in between. i added code to my `myvec<T>` implementation to detect and try to avoid such problems. in this case, it should translate the code to something like

```cpp
for (unsigned i = 0; i < v.size(); ++i)
{
    TestType t = w[i] + v[i];
    v[i] += t;
}
```

the output is:

```
create 0x7fffc4d6b5df
copy 0x7fffc4d6b5df from 0x17860f0
0x7fffc4d6b5df += 0x17860d0
0x17860d0 += 0x7fffc4d6b5df
copy 0x7fffc4d6b5df from 0x17860f1
0x7fffc4d6b5df += 0x17860d1
0x17860d1 += 0x7fffc4d6b5df
copy 0x7fffc4d6b5df from 0x17860f2
0x7fffc4d6b5df += 0x17860d2
0x17860d2 += 0x7fffc4d6b5df
copy 0x7fffc4d6b5df from 0x17860f3
0x7fffc4d6b5df += 0x17860d3
0x17860d3 += 0x7fffc4d6b5df
copy 0x7fffc4d6b5df from 0x17860f4
0x7fffc4d6b5df += 0x17860d4
0x17860d4 += 0x7fffc4d6b5df
copy 0x7fffc4d6b5df from 0x17860f5
0x7fffc4d6b5df += 0x17860d5
0x17860d5 += 0x7fffc4d6b5df
destroy 0x7fffc4d6b5df
```

this shows that, in fact, the generated code is more like this:

```cpp
for (unsigned i = 0; i < v.size(); ++i)
{
    TestType t = w[i];
    t += v[i];
    v[i] += t;
}
```
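the aliasing issue behind this temporary can be reproduced with plain `int` vectors. the following sketch uses hypothetical helpers `add_sum_into_naive` and `add_sum_into` (it is not the check used in `myvec<T>`): both evaluate `dst += a + b` in the split form, but the second one falls back to a temporary when `b` is `dst` itself, which matches the pattern in the output above. in this sketch, the decision is an address comparison made at run time.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Split evaluation of  dst += a + b ; only correct if b is not dst itself.
template<class T>
void add_sum_into_naive(std::vector<T> & dst, const std::vector<T> & a, const std::vector<T> & b)
{
    for (std::size_t i = 0; i < dst.size(); ++i)
    {
        dst[i] += a[i];
        dst[i] += b[i];  // reads the already updated dst[i] if &b == &dst
    }
}

// Alias-aware version: if b is dst, the sum is built in a temporary first.
template<class T>
void add_sum_into(std::vector<T> & dst, const std::vector<T> & a, const std::vector<T> & b)
{
    if (&b == &dst)
    {
        for (std::size_t i = 0; i < dst.size(); ++i)
        {
            T t = a[i];
            t += b[i];
            dst[i] += t;
        }
    }
    else
    {
        // a may alias dst: a[i] is read before dst[i] changes, so splitting is safe.
        add_sum_into_naive(dst, a, b);
    }
}
```

with `v = {1, 2, 3}` and `w = {10, 20, 30}`, `add_sum_into(v, w, v)` yields `{12, 24, 36}`, whereas the naive version gives `22` for the first element.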

so, without looking at the generated assembler code, we already have a good idea of what the expression templates of `myvec<T>` are doing. now assume we write something like `v += s * v + w;`. the output is

```
0x1786116 += (0x1786116 * 0x7fffc4d6b7af)
0x1786116 += 0x1786110
0x1786117 += (0x1786117 * 0x7fffc4d6b7af)
0x1786117 += 0x1786111
0x1786118 += (0x1786118 * 0x7fffc4d6b7af)
0x1786118 += 0x1786112
0x1786119 += (0x1786119 * 0x7fffc4d6b7af)
0x1786119 += 0x1786113
0x178611a += (0x178611a * 0x7fffc4d6b7af)
0x178611a += 0x1786114
0x178611b += (0x178611b * 0x7fffc4d6b7af)
0x178611b += 0x1786115
```

which shows that the code was translated to something like

```cpp
for (unsigned i = 0; i < v.size(); ++i)
{
    v[i] += v[i] * s;
    v[i] += w[i];
}
```

(this of course assumes that we also implemented `operator *` for `TestType`.) to really see what is going on, one still has to check the generated assembler code. for example, in the tests above, which used a temporary for `v += w + v` and no temporary for `v += v + w`, it is unclear whether the compiler already decided which case to use (which it could, since it knows the addresses of `v` and `w`, or at least knows whether they are equal), or whether both variants are compiled into the program and the running program has to figure out which block of code to execute. this cannot be detected using `TestType`.

note that for checking the assembler code, it is better to use a different test type: one which translates everything into “three-address assembler”-like commands, which are declared `extern` (and not defined in this translation unit). then one can search for these function calls. this scenario is also more realistic: with complex operators (such as the ones above, where `std::cout` is used all over the place), the operators often end up as separate functions anyway.

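a minimal sketch of such a three-address test type, assuming hypothetical function names `ta_copy`, `ta_add` and `ta_sub`; the expression classes above could presumably call such functions instead of printing. every operation becomes a call to a function that, in real use, is defined in a different translation unit, so the calls stay visible in the assembler output (for example, as produced by `g++ -O2 -S`) and can be searched for by their (mangled) names.

```cpp
#include <cassert>

struct TAType;

// One "three-address" call per operation: dst = a op b. The names are
// hypothetical. Function declarations are extern by default; the keyword
// only makes the intent explicit.
extern void ta_copy(TAType * dst, const TAType * src);
extern void ta_add(TAType * dst, const TAType * a, const TAType * b);
extern void ta_sub(TAType * dst, const TAType * a, const TAType * b);

struct TAType
{
    TAType() = default;
    TAType(const TAType & src) { ta_copy(this, &src); }
    TAType & operator=(const TAType & src) { ta_copy(this, &src); return *this; }
    TAType & operator+=(const TAType & b) { ta_add(this, this, &b); return *this; }
    TAType & operator-=(const TAType & b) { ta_sub(this, this, &b); return *this; }
};

// Definitions only so that this sketch links on its own. For inspecting the
// assembler code they belong in a separate .cpp file; otherwise the optimizer
// may inline them and the calls disappear from the output.
int g_calls = 0;
void ta_copy(TAType *, const TAType *) { ++g_calls; }
void ta_add(TAType *, const TAType *, const TAType *) { ++g_calls; }
void ta_sub(TAType *, const TAType *, const TAType *) { ++g_calls; }
```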
#### the code.

you can download the source code of the `TestType` class here.
