skip to main content.

posts for december 2011.

finally.

posted in: photos
tags:
places:

introduction.

as mentioned in part one, testing expression templates isn’t that easy. one often has to go down to assembler level to see what is really going on. to make developing expression templates easier, and to debug my (fictional) myvec<T> expression templates a bit more in detail, i created a second test type: TestType2. again equiped with expression templates, its aim is to break down the expressions into three-address assembler commands, using temporaries to achieve this. for example, a -= (b + c) * d; should evaluate into something like

1 TestType2 t1, t2; // without calling constructors or destructors!
2 TT2_create(t1);
3 TT2_add(t1, b, c);
4 TT2_create(t2);
5 TT2_mul(t2, t1, d);
6 TT2_sub(a, a, t2);
7 TT2_destroy(t2);
8 TT2_destroy(t1);

by defining the functions add(), mul(), etc. in a different translation unit, one can analyse the assembler output of this translation unit to see what exactly came out. by searching for terms like TT2_add one can quickly find the corresponding assembler commands.

the implementation.

we begin by declaring the class TestType2 and by declaring the functions to operate on it. again, we leave out multiplication (and divison and modulo) to shorten code.

 1 class TestType2;
 2 
 3 void TT2_create(TestType2 & r);
 4 void TT2_destroy(TestType2 & r);
 5 void TT2_createcopy(TestType2 & r, const TestType2 & a);
 6 void TT2_copy(TestType2 & r, const TestType2 & a);
 7 void setZero(TestType2 & r);
 8 void setOne(TestType2 & r);
 9 void TT2_add(TestType2 & r, const TestType2 & a, const TestType2 & b);
10 void TT2_sub(TestType2 & r, const TestType2 & a, const TestType2 & b);
11 void TT2_neg(TestType2 & r, const TestType2 & a);

the implementation of TestType2 is straightforward. we add a pointer so that the class actually stores something.
 1 class TestType2
 2 {
 3 private:
 4     void * d_data;
 5     
 6 public:
 7     inline TestType2()
 8     {
 9         TT2_create(*this);
10     }
11 
12     inline TestType2(const TestType2 & src)
13     {
14         TT2_createcopy(*this, src);
15     }
16 
17     inline ~TestType2()
18     {
19         TT2_destroy(*this);
20     }
21 
22     inline TestType2 & operator = (const TestType2 & src)
23     {
24         TT2_copy(*this, src);
25         return *this;
26     }
27     
28     inline TestType2 & operator += (const TestType2 & b)
29     {
30         TT2_add(*this, *this, b);
31         return *this;
32     }
33     
34     inline TestType2 & operator -= (const TestType2 & b)
35     {
36         TT2_sub(*this, *this, b);
37         return *this;
38     }
39 };

again, as in part one, we have templates TestExpression2<O, D> and TestWrapper2:
 1 template<class Op, class Data>
 2 class TestExpression2
 3 {
 4 private:
 5     Op d_op;
 6     Data d_data;
 7     
 8 public:
 9     inline TestExpression2(const Op & op, const Data & data)
10         : d_op(op), d_data(data)
11     {
12     }
13     
14     operator TestType2 () const
15     {
16         TestType2 res;
17         evalTo(res);
18         return res;
19     }
20     
21     inline TestType2 evaluate() const
22     {
23         TestType2 res;
24         evalTo(res);
25         return res;
26     }
27     
28     inline void evalTo(TestType2 & dest) const
29     {
30         d_op.evalTo(dest, d_data);
31     }
32 };
33 
34 class TestWrapper2
35 {
36 private:
37     const TestType2 & d_val;
38     
39 public:
40     inline TestWrapper2(const TestType2 & val)
41         : d_val(val)
42     {
43     }
44     
45     inline const TestType2 & evaluate() const
46     {
47         return d_val;
48     }
49     
50     inline void evalTo(TestType2 & dest)
51     {
52         TT2_copy(dest, d_val);
53     }
54 };

this time, we have both an evaluate() and a evalTo() member function. the first allows to just evaluate the expression, generating temporaries for subexpressions, and the second one is used by callers like operator=() of TestType2 to evaluate the result of an expression into an object of type TestType2. the expression machinery is added to TestType2 by the following functions:
 1     template<class O, class D>
 2     inline TestType2(const TestExpression2<O, D> & src)
 3     {
 4         TT2_createcopy(*this, src.evaluate());
 5     }
 6     
 7     template<class O, class D>
 8     inline TestType2 & operator = (const TestExpression2<O, D> & e)
 9     {
10         e.evalTo(*this);
11         return *this;
12     }
13     
14     template<class O, class D>
15     inline TestType2 & operator += (const TestExpression2<O, D> & e)
16     {
17         TT2_add(*this, *this, e.evaluate());
18         return *this;
19     }
20     
21     template<class O, class D>
22     inline TestType2 & operator -= (const TestExpression2<O, D> & e)
23     {
24         TT2_sub(*this, *this, e.evaluate());
25         return *this;
26     }

the operators are defined as follows. there is not much to do: they just provide a evalTo() template function which calls the corresponding three-address operation:
 1 class AddOp2
 2 {
 3 public:
 4     template<class A, class B>
 5     inline void evalTo(TestType2 & dest, const std::pair<A, B> & data) const
 6     {
 7         TT2_add(dest, data.first.evaluate(), data.second.evaluate());
 8     }
 9 };
10 
11 class SubOp2
12 {
13 public:
14     template<class A, class B>
15     inline void evalTo(TestType2 & dest, const std::pair<A, B> & data) const
16     {
17         TT2_sub(dest, data.first.evaluate(), data.second.evaluate());
18     }
19 };
20 
21 class NegOp2
22 {
23 public:
24     template<class A>
25     inline void evalTo(TestType2 & dest, const A & data) const
26     {
27         TT2_neg(dest, data.evaluate());
28     }
29 };

again, what is left is the tedious integration, by defining a ton of operators:
 1 inline TestExpression2<AddOp2, std::pair<TestWrapper2, TestWrapper2> > operator + (const TestType2 & a, const TestType2 & b)
 2 { return TestExpression2<AddOp2, std::pair<TestWrapper2, TestWrapper2> >(AddOp2(), std::make_pair(TestWrapper2(a), TestWrapper2(b))); }
 3 inline TestExpression2<SubOp2, std::pair<TestWrapper2, TestWrapper2> > operator - (const TestType2 & a, const TestType2 & b)
 4 { return TestExpression2<SubOp2, std::pair<TestWrapper2, TestWrapper2> >(SubOp2(), std::make_pair(TestWrapper2(a), TestWrapper2(b))); }
 5 
 6 template<class O2, class D2>
 7 inline TestExpression2<AddOp2, std::pair<TestWrapper2, TestExpression2<O2, D2> > > operator + (const TestType2 & a, const TestExpression2<O2, D2> & b)
 8 { return TestExpression2<AddOp2, std::pair<TestWrapper2, TestExpression2<O2, D2> > >(AddOp2(), std::make_pair(TestWrapper2(a), b)); }
 9 template<class O2, class D2>
10 inline TestExpression2<SubOp2, std::pair<TestWrapper2, TestExpression2<O2, D2> > > operator - (const TestType2 & a, const TestExpression2<O2, D2> & b)
11 { return TestExpression2<SubOp2, std::pair<TestWrapper2, TestExpression2<O2, D2> > >(SubOp2(), std::make_pair(TestWrapper2(a), b)); }
12 
13 template<class O1, class D1>
14 inline TestExpression2<AddOp2, std::pair<TestExpression2<O1, D1>, TestWrapper2> > operator + (const TestExpression2<O1, D1> & a, const TestType2 & b)
15 { return TestExpression2<AddOp2, std::pair<TestExpression2<O1, D1>, TestWrapper2> >(AddOp2(), std::make_pair(a, TestWrapper2(b))); }
16 template<class O1, class D1>
17 inline TestExpression2<SubOp2, std::pair<TestExpression2<O1, D1>, TestWrapper2> > operator - (const TestExpression2<O1, D1> & a, const TestType2 & b)
18 { return TestExpression2<SubOp2, std::pair<TestExpression2<O1, D1>, TestWrapper2> >(SubOp2(), std::make_pair(a, TestWrapper2(b))); }
19 
20 template<class O1, class D1, class O2, class D2>
21 inline TestExpression2<AddOp2, std::pair<TestExpression2<O1, D1>, TestExpression2<O2, D2> > > operator + (const TestExpression2<O1, D1> & a, const TestExpression2<O2, D2> & b)
22 { return TestExpression2<AddOp2, std::pair<TestExpression2<O1, D1>, TestExpression2<O2, D2> > >(AddOp2(), std::make_pair(a, b)); }
23 template<class O1, class D1, class O2, class D2>
24 inline TestExpression2<SubOp2, std::pair<TestExpression2<O1, D1>, TestExpression2<O2, D2> > > operator - (const TestExpression2<O1, D1> & a, const TestExpression2<O2, D2> & b)
25 { return TestExpression2<SubOp2, std::pair<TestExpression2<O1, D1>, TestExpression2<O2, D2> > >(SubOp2(), std::make_pair(a, b)); }
26 
27 inline TestExpression2<NegOp2, TestWrapper2> operator - (const TestType2 & a)
28 { return TestExpression2<NegOp2, TestWrapper2>(NegOp2(), TestWrapper2(a)); }
29 
30 template<class O1, class D1>
31 inline TestExpression2<NegOp2, TestExpression2<O1, D1> > operator - (const TestExpression2<O1, D1> & a)
32 { return TestExpression2<NegOp2, TestExpression2<O1, D1> >(NegOp2(), a); }

note that all the above code contains a lot of inlines, as opposed to part one. in part one, the aim was not to generate most efficient code, but to make visible the expressions evaluated by the expression templates. in this second part, we want to observe the assembler output, and thus want the code to be as optimized as possible.

a real-life example.

in this section, i want to look at three examples. first, let us consider the following example:

1 TestType2 s, t, u, x, y, z;
2 s /= t + (x - y) * z - u;

the resulting assembler code (generated with g++ -S -O3) looks as follows:
 1         leaq    1024(%rsp), %rcx
 2         ....
 3         movq    %rbx, 56(%rsp)
 4         call    _Z10TT2_createR9TestType2
 5         leaq    960(%rsp), %r12
 6         movq    %r12, %rdi
 7         call    _Z10TT2_createR9TestType2
 8         leaq    928(%rsp), %rbp
 9         movq    %rbp, %rdi
10         call    _Z10TT2_createR9TestType2
11         leaq    944(%rsp), %rbx
12         movq    %rbx, %rdi
13         call    _Z10TT2_createR9TestType2
14         leaq    1024(%rsp), %rdx
15         leaq    1040(%rsp), %rsi
16         movq    %rbx, %rdi
17         call    _Z7TT2_subR9TestType2RKS_S2_
18         leaq    1008(%rsp), %rdx
19         movq    %rbx, %rsi
20         movq    %rbp, %rdi
21         call    _Z7TT2_mulR9TestType2RKS_S2_
22         movq    %rbx, %rdi
23         call    _Z11TT2_destroyR9TestType2
24         leaq    1072(%rsp), %rsi
25         movq    %rbp, %rdx
26         movq    %r12, %rdi
27         call    _Z7TT2_addR9TestType2RKS_S2_
28         movq    %rbp, %rdi
29         call    _Z11TT2_destroyR9TestType2
30         leaq    1056(%rsp), %rdx
31         movq    %r12, %rsi
32         movq    %r13, %rdi
33         call    _Z7TT2_subR9TestType2RKS_S2_
34         movq    %r12, %rdi
35         call    _Z11TT2_destroyR9TestType2
36         leaq    1088(%rsp), %rsi
37         movq    %r13, %rdx
38         movq    %rsi, %rdi
39         call    _Z7TT2_divR9TestType2RKS_S2_
40         movq    %r13, %rdi
41         call    _Z11TT2_destroyR9TestType2

(i removed a lot of register/memory arithmetic in the beginning, which prepares all addresses.) removing all cludder, we are left with the following function calls:
 1 TT2_create()
 2 TT2_create()
 3 TT2_create()
 4 TT2_create()
 5 TT2_sub()
 6 TT2_mul()
 7 TT2_destroy()
 8 TT2_add()
 9 TT2_destroy()
10 TT2_sub()
11 TT2_destroy()
12 TT2_div()
13 TT2_destroy()

four temporaries t1, t2, t3, t4 are created. then, the subtraction x - y is computed into the temporary t4, and the result is multiplied by z into the temporary t3. the temporary t4 used to compute x - y is then destroyed, and t is added to t3, with the result being stored in the temporary t2. the temporary t3 is destroyed, and t2 - u is evaluated into t1. finally, s is divided by t1 and t1 is destroyed. this shows that the compiler generated an equivalent to
 1 TestType2 t1, t2, t3, t4; // without calling constructors or destructors!
 2 TT2_create(t1);
 3 TT2_create(t2);
 4 TT2_create(t3);
 5 TT2_create(t4);
 6 TT2_sub(t4, x, y);
 7 TT2_mul(t3, t4, z);
 8 TT2_destroy(t4);
 9 TT2_add(t2, t, t3);
10 TT2_destroy(t3);
11 TT2_sub(t1, t2, u);
12 TT2_destroy(t2);
13 TT2_div(s, s, t1);
14 TT2_destroy(t1);

this is pretty much optimal: if one would have done this translation by hand in a straightforward way, one would have reached the same solution.
now let us re-consider our example from part one: the myvec<T> template. assume we have two myvec<TestType2> vectors v and w, and we write v += v + w; and v += w + v;. the first command should generate something like
1 for (unsigned i = 0; i < v.size(); ++i)
2 {
3     add(v[i], v[i], v[i]);
4     add(v[i], v[i], w[i]);
5 }

while the second command should generate something like
1 TestType2 t; // without calling constructors or destructors!
2 create(t);
3 for (unsigned i = 0; i < v.size(); ++i)
4 {
5     add(t, w[i], v[i]);
6     add(v[i], v[i], t);
7 }
8 destroy(t);

or, if the expression templates for myvec<T> are not that good, at least something like
1 for (unsigned i = 0; i < v.size(); ++i)
2 {
3     TestType2 t; // without calling constructors or destructors!
4     create(t);
5     add(t, w[i], v[i]);
6     add(v[i], v[i], t);
7     destroy(t);
8 }

first, consider v += v + w;. the assembler code generated is the following:
 1         movl    320(%rsp), %r9d
 2         testl   %r9d, %r9d
 3         je      .L128
 4         xorl    %ebp, %ebp
 5 .L129:
 6         mov     %ebp, %r12d
 7         salq    $3, %r12
 8         movq    %r12, %rbx
 9         addq    328(%rsp), %rbx
10         movq    %rbx, %rdx
11         movq    %rbx, %rsi
12         movq    %rbx, %rdi
13         call    _Z7TT2_addR9TestType2RKS_S2_
14         movq    %r12, %rdx
15         addq    312(%rsp), %rdx
16         movq    %rbx, %rsi
17         movq    %rbx, %rdi
18         call    _Z7TT2_addR9TestType2RKS_S2_
19         addl    $1, %ebp
20         cmpl    320(%rsp), %ebp
21         jb      .L129
22 .L128:

clearly, first the code tests whether the loop has to be run through at least once; if not, one jumps to label .L128. then, per loop iteration, exactly two calls to TT2_add() are made. this shows that the code is essentially like
1 for (unsigned i = 0; i < v.size(); ++i)
2 {
3     add(v[i], v[i], v[i]);
4     add(v[i], v[i], w[i]);
5 }

as we were hoping. thus, the expression templates worked well in this case. now let us look at v += w + v;. this time, a temporary has to be created, since otherwise v[i] is modified before being used in the expression. the generated assembler code is the following:
 1         movl    320(%rsp), %eax
 2         cmpl    304(%rsp), %eax
 3         jne     .L283
 4         leaq    416(%rsp), %rbx
 5         movq    %rbx, %rdi
 6         call    _Z10TT2_createR9TestType2
 7         movl    320(%rsp), %r8d
 8         testl   %r8d, %r8d
 9         je      .L146
10         xorl    %r12d, %r12d
11         .p2align 4,,10
12         .p2align 3
13 .L147:
14         mov     %r12d, %ebp
15         movq    %rbx, %rdi
16         salq    $3, %rbp
17         movq    %rbp, %rsi
18         addq    312(%rsp), %rsi
19         call    _Z8TT2_copyR9TestType2RKS_
20         movq    %rbp, %rdx
21         addq    328(%rsp), %rdx
22         movq    %rbx, %rsi
23         movq    %rbx, %rdi
24         call    _Z7TT2_addR9TestType2RKS_S2_
25         movq    %rbp, %rdi
26         addq    328(%rsp), %rdi
27         movq    %rbx, %rdx
28         movq    %rdi, %rsi
29         call    _Z7TT2_addR9TestType2RKS_S2_
30         addl    $1, %r12d
31         cmpl    320(%rsp), %r12d
32         jb      .L147
33 .L146:
34         movq    %rbx, %rdi
35         call    _Z11TT2_destroyR9TestType2

this is as good as we were hoping: before the loop, a temporary is created, and destroyed after the loop. in the loop, we have three calls: TT2_copy(), TT2_add() and a second time TT2_add(). thus, the resulting code is
1 TestType2 t; // without calling constructors or destructors!
2 create(t);
3 for (unsigned i = 0; i < v.size(); ++i)
4 {
5     copy(t, w[i]);
6     add(t, t, v[i]);
7     add(v[i], v[i], t);
8 }
9 destroy(t);

this is not as optimal as it could be: the best solution would be to optimize
1     copy(t, w[i]);
2     add(t, t, v[i]);

to one call: add(t, w[i], v[i]);. but this is still much better than
1 for (unsigned i = 0; i < v.size(); ++i)
2 {
3     TestType2 t; // without calling constructors or destructors!
4     create(t);
5     add(t, w[i], v[i]);
6     add(v[i], v[i], t);
7     destroy(t);
8 }

where the temporary is created and destroyed every iteration. when trying to add this optimization, it is easier to use TestType from part one to see what is happening. once the output looks fine, one switches back to TestType2 to make sure the generated assembler code is fine.

the code.

you can download the source code of the TestType2 class here, and dummy implementations (to be put into another translation unit) of the TT_* functions here.

introduction.

when developing expression templates, say a vector class allowing efficient arithmetic using expression templates, one often wonders if the long, complicated pages of template code one wrote actually does what it is supposed to do. to be sure, one has to ask the compiler to generate assembler code and wade through that code to see what is happening. this is tiresome, though essential.
most of the time, it is enough to check if the operations done with the supplied type – in the above example of a vector class, the scalar type – are what we want them to be. for example, if v, w, x, y, z are variables of type myvec<int>, say all of the same dimension, one wants to know whether v = 2 * w + (x - (y - z)) is evaluated to something like

1 for (unsigned i = 0; i < v.size(); ++i)
2     v[i] = 2 * w[i] + (x[i] - (y[i] - z[i]));

or (more likely)
1 for (unsigned i = 0; i < v.size(); ++i)
2 {
3     v[i] = 2 * w[i];
4     v[i] += x[i];
5     v[i] -= y[i];
6     v[i] += z[i];
7 }

(note that this code is not equivalent to the one above for certain more complex types than int, but for my purposes, such a transformation is fine. but anyway, that is not what we want to discuss here.) an alternative would be that temporaries are created, like in
1 myvec<int> t1, t2, t3, t4;
2 t1 = 2 * w;
3 t2 = y - z;
4 t3 = x - t2;
5 t4 = t1 - t3;
6 v = t4;

this involves allocating and releasing memory and copying around data, which is significantly slower than the direct version
1 for (unsigned i = 0; i < v.size(); ++i)
2     v[i] = 2 * w[i] + (x[i] - (y[i] - z[i]));

the aim of expression templates is to generate such more optimal code. but what if your expression templates don’t do what you want? maybe they still generate temporaries, like in the following version:
1 for (unsigned i = 0; i < v.size(); ++i)
2 {
3     int t2 = 2 * w[i];
4     int t2 = y[i] - z[i];
5     int t3 = x[i] - t2;
6     v[i] = t1 - t3;
7 }

the compiler will still optimize this to the same as above, but in case the type is not int but, say, myArbitraryPrecisionInteger (which uses expression templates itself and can deal very efficiently with expressions like v[i] = 2 * w[i] + (x[i] - (y[i] - z[i]));), this code is suboptimal, and no expression templates provided by that type can make it better.
therefore, one would like to have a dummy type, a type one can just plug in instead of int in the above example, which somehow outputs what exactly is done: which temporaries are created and destroyed, and which operations and expressions are created.

a test type.

the test type has to use expression templates itself to gather the expressions which are generated by the other expression templates, like the ones of myvec<T>. then, it should just print these expressions when assignments etc. occur, say to std::cout. then one can run a test program and just read the output to see what was going on. this is in general much easier than looking at the generated assembler code.
note that we only care about additive operations in the following, to make the code more readable.
the basic version of the type looks as follows:

 1 class TestType
 2 {
 3 public:
 4     TestType()
 5     {
 6         std::cout << "create " << this << "\n";
 7     }
 8 
 9     TestType(const TestType & src)
10     {
11         std::cout << "create " << this << " from " << &src << "\n";
12     }
13 
14     ~TestType()
15     {
16         std::cout << "destroy " << this << "\n";
17     }
18 
19     TestType & operator = (const TestType & src)
20     {
21         std::cout << "copy " << this << " from " << &src << "\n";
22         return *this;
23     }
24     
25     TestType & operator += (const TestType & b)
26     {
27         std::cout << this << " += " << &b << "\n";
28         return *this;
29     }
30     
31     TestType & operator -= (const TestType & b)
32     {
33         std::cout << this << " -= " << &b << "\n";
34         return *this;
35     }
36 };

this already allows us to see when TestType objects and temporaries are created and assigned, and when basic arithmetic is done. but so far, no real arithmetic can be done. to allow arithmetic, we introduce a TestExpression<O, D> template. here, the O class describes the operand, and the D class describes the argument(s). it is defined as follows:
 1 template<class Op, class Data>
 2 class TestExpression
 3 {
 4 private:
 5     Op d_op;
 6     Data d_data;
 7     
 8 public:
 9     inline TestExpression(const Op & op, const Data & data)
10         : d_op(op), d_data(data)
11     {
12     }
13     
14     operator TestType () const
15     {
16         std::cout << "casting TestExpression[";
17         d_op.print(d_data);
18         std::cout << "] to TestType()\n";
19         return TestType();
20     }
21     
22     void print() const
23     {
24         d_op.print(d_data);
25     }
26 };

to add support for expressions to the TestType class, one adds the following methods to it:
 1     template<class O, class D>
 2     TestType(const TestExpression<O, D> & src)
 3     {
 4         std::cout << "create " << this << " from ";
 5         src.print();
 6         std::cout << "\n";
 7     }
 8 
 9     template<class O, class D>
10     TestType & operator = (const TestExpression<O, D> & e)
11     {
12         std::cout << this << " = ";
13         e.print();
14         std::cout << "\n";
15         return *this;
16     }
17 
18     template<class O, class D>
19     TestType & operator += (const TestExpression<O, D> & e)
20     {
21         std::cout << this << " += ";
22         e.print();
23         std::cout << "\n";
24         return *this;
25     }
26     
27     template<class O, class D>
28     TestType & operator -= (const TestExpression<O, D> & e)
29     {
30         std::cout << this << " -= ";
31         e.print();
32         std::cout << "\n";
33         return *this;
34     }

then if we assign a TestExpression<O, D> to a TestType object, or add it to it, or subtract it from it, etc., the corresponding messages are printed. now, let us discuss how the operands are implemented. these are simple classes with template members, which do not store data:
 1 class AddOp
 2 {
 3 public:
 4     template<class A, class B>
 5     void print(const std::pair<A, B> & data) const
 6     {
 7         std::cout << "(";
 8         data.first.print();
 9         std::cout << " + ";
10         data.second.print();
11         std::cout << ")";
12     }
13 };
14 
15 class SubOp
16 {
17 public:
18     template<class A, class B>
19     void print(const std::pair<A, B> & data) const
20     {
21         std::cout << "(";
22         data.first.print();
23         std::cout << " - ";
24         data.second.print();
25         std::cout << ")";
26     }
27 };
28 
29 class NegOp
30 {
31 public:
32     template<class A>
33     void print(const A & data) const
34     {
35         std::cout << "-";
36         data.print();
37     }
38 };

note that all operations but the unary NegOp are binary; the data object is in that case a std::pair<T1, T2> object. now one main piece is missing which puts everything together: the overloaded operators which generate the expression templates. let us begin with the universal ones: the ones taking two expressions and combining them by an operator.
1 template<class O1, class D1, class O2, class D2>
2 TestExpression<AddOp, std::pair<TestExpression<O1, D1>, TestExpression<O2, D2> > > operator + (const TestExpression<O1, D1> & a, const TestExpression<O2, D2> & b)
3 { return TestExpression<AddOp, std::pair<TestExpression<O1, D1>, TestExpression<O2, D2> > >(AddOp(), std::make_pair(a, b)); }
4 template<class O1, class D1, class O2, class D2>
5 TestExpression<SubOp, std::pair<TestExpression<O1, D1>, TestExpression<O2, D2> > > operator - (const TestExpression<O1, D1> & a, const TestExpression<O2, D2> & b)
6 { return TestExpression<SubOp, std::pair<TestExpression<O1, D1>, TestExpression<O2, D2> > >(SubOp(), std::make_pair(a, b)); }

this code is not exactly readable, but does its job: it takes two expressions, TestExpression<O1, D1> and TestExpression<O2, D2>, and combines them to a new expression TestExpression<NewOperand, std::pair<TestExpression<O1, D1>, TestExpression<O2, D2> > >. the operator for inversion looks similar, but simpler:
1 template<class O1, class D1>
2 TestExpression<NegOp, TestExpression<O1, D1> > operator - (const TestExpression<O1, D1> & a)
3 { return TestExpression<NegOp, TestExpression<O1, D1> >(NegOp(), a); }

but this whole thing only works when we already have expressions. so far, we have no code which actually creates an expression in the first place. this can be done by more operator overloading, and by introducing a TestWrapper[/url] which encapsulates an object of type [code language="c++"]TestType and behaves like an expression on its own. let us first show the operator definition in the unary case:
1 TestExpression<NegOp, TestWrapper> operator - (const TestType & a)
2 { return TestExpression<NegOp, TestWrapper>(NegOp(), TestWrapper(a)); }

the template TestWrapper encapsulates a TestObject. the definition looks as follows:
 1 class TestWrapper
 2 {
 3 private:
 4     const TestType & d_val;
 5     
 6 public:
 7     inline TestWrapper(const TestType & val)
 8         : d_val(val)
 9     {
10     }
11     
12     void print() const
13     {
14         std::cout << &d_val;
15     }
16 };

compare this to the definition of TestExpression above; note that no casting operator is needed as a TestWrapper object should never show up directly to the user.
now, we are left to implement the binary operators for + and -. we have to go through all combinations of TestType and TestExpression<O, D> combinations (except two TestExpression<O, D>‘s, which we already covered). this looks as follows:
 1 TestExpression<AddOp, std::pair<TestWrapper, TestWrapper> > operator + (const TestType & a, const TestType & b)
 2 { return TestExpression<AddOp, std::pair<TestWrapper, TestWrapper> >(AddOp(), std::make_pair(TestWrapper(a), TestWrapper(b))); }
 3 TestExpression<SubOp, std::pair<TestWrapper, TestWrapper> > operator - (const TestType & a, const TestType & b)
 4 { return TestExpression<SubOp, std::pair<TestWrapper, TestWrapper> >(SubOp(), std::make_pair(TestWrapper(a), TestWrapper(b))); }
 5 
 6 template<class O2, class D2>
 7 TestExpression<AddOp, std::pair<TestWrapper, TestExpression<O2, D2> > > operator + (const TestType & a, const TestExpression<O2, D2> & b)
 8 { return TestExpression<AddOp, std::pair<TestWrapper, TestExpression<O2, D2> > >(AddOp(), std::make_pair(TestWrapper(a), b)); }
 9 template<class O2, class D2>
10 TestExpression<SubOp, std::pair<TestWrapper, TestExpression<O2, D2> > > operator - (const TestType & a, const TestExpression<O2, D2> & b)
11 { return TestExpression<SubOp, std::pair<TestWrapper, TestExpression<O2, D2> > >(SubOp(), std::make_pair(TestWrapper(a), b)); }
12 
13 template<class O1, class D1>
14 TestExpression<AddOp, std::pair<TestExpression<O1, D1>, TestWrapper> > operator + (const TestExpression<O1, D1> & a, const TestType & b)
15 { return TestExpression<AddOp, std::pair<TestExpression<O1, D1>, TestWrapper> >(AddOp(), std::make_pair(a, TestWrapper(b))); }
16 template<class O1, class D1>
17 TestExpression<SubOp, std::pair<TestExpression<O1, D1>, TestWrapper> > operator - (const TestExpression<O1, D1> & a, const TestType & b)
18 { return TestExpression<SubOp, std::pair<TestExpression<O1, D1>, TestWrapper> >(SubOp(), std::make_pair(a, TestWrapper(b))); }

this is as annoying to write as it looks like, but it is necessary. but only once.

testing the result.

now assume that v and w are two objects of type myvec<TestType>, each having six elements. the object s is of type TestType itself. assume that i write the following: v += v + w. then the compiled version will output:

 1 0x1b780d0 += 0x1b780d0
 2 0x1b780d0 += 0x1b780f0
 3 0x1b780d1 += 0x1b780d1
 4 0x1b780d1 += 0x1b780f1
 5 0x1b780d2 += 0x1b780d2
 6 0x1b780d2 += 0x1b780f2
 7 0x1b780d3 += 0x1b780d3
 8 0x1b780d3 += 0x1b780f3
 9 0x1b780d4 += 0x1b780d4
10 0x1b780d4 += 0x1b780f4
11 0x1b780d5 += 0x1b780d5
12 0x1b780d5 += 0x1b780f5

this shows that the command v += v + w is replaced by something like:
1 for (unsigned i = 0; i < v.size(); ++i)
2 {
3     v[i] += v[i];
4     v[i] += w[i];
5 }

now let us look at something more complicated. if i write v = w + v, this cannot be translated to
1 for (unsigned i = 0; i < v.size(); ++i)
2 {
3     v[i] += w[i];
4     v[i] += v[i];
5 }

anymore, as v[i] is changing its value inbetween. i added code to my myvec<T> implementation to detect and try to avoid such problems. in this case, it should translate the code something like
1 for (unsigned i = 0; i < v.size(); ++i)
2 {
3     TestType t = w[i] + v[i];
4     v[i] += t;
5 }

the output is:
 1 create 0x7fffc4d6b5df
 2 copy 0x7fffc4d6b5df from 0x17860f0
 3 0x7fffc4d6b5df += 0x17860d0
 4 0x17860d0 += 0x7fffc4d6b5df
 5 copy 0x7fffc4d6b5df from 0x17860f1
 6 0x7fffc4d6b5df += 0x17860d1
 7 0x17860d1 += 0x7fffc4d6b5df
 8 copy 0x7fffc4d6b5df from 0x17860f2
 9 0x7fffc4d6b5df += 0x17860d2
10 0x17860d2 += 0x7fffc4d6b5df
11 copy 0x7fffc4d6b5df from 0x17860f3
12 0x7fffc4d6b5df += 0x17860d3
13 0x17860d3 += 0x7fffc4d6b5df
14 copy 0x7fffc4d6b5df from 0x17860f4
15 0x7fffc4d6b5df += 0x17860d4
16 0x17860d4 += 0x7fffc4d6b5df
17 copy 0x7fffc4d6b5df from 0x17860f5
18 0x7fffc4d6b5df += 0x17860d5
19 0x17860d5 += 0x7fffc4d6b5df
20 destroy 0x7fffc4d6b5df

this shows that in fact, the generated code is more like this:
1 for (unsigned i = 0; i < v.size(); ++i)
2 {
3     TestType t = w[i];
4     t += v[i];
5     v[i] += t;
6 }

so without looking at the generated assembler code, we already have a good idea what the template expressions of myvec<T> are doing. now assume we write something like v = s * v + w;. the output is
 1 0x1786116 += (0x1786116 * 0x7fffc4d6b7af)
 2 0x1786116 += 0x1786110
 3 0x1786117 += (0x1786117 * 0x7fffc4d6b7af)
 4 0x1786117 += 0x1786111
 5 0x1786118 += (0x1786118 * 0x7fffc4d6b7af)
 6 0x1786118 += 0x1786112
 7 0x1786119 += (0x1786119 * 0x7fffc4d6b7af)
 8 0x1786119 += 0x1786113
 9 0x178611a += (0x178611a * 0x7fffc4d6b7af)
10 0x178611a += 0x1786114
11 0x178611b += (0x178611b * 0x7fffc4d6b7af)
12 0x178611b += 0x1786115

which shows that the code was translated to something like
1 for (unsigned i = 0; i < v.size(); ++i)
2 {
3     v[i] += v[i] * s;
4     v[i] += w[i];
5 }

(this of course assumes that we also implemented operator * for TestType.)
to really check what is going on one still has to check the generated assembler code. for example, in the test above which used a temporary when writing v += w + v and no temporary when writing v += v + w, it is unclear if already the compiler made the decision which case to use (which he could, since he knows the addresses of v and w, or at least knows whether they are equal or not), or whether both cases are compiled into the program and whether the running program has to figure out which block of code to run. this cannot be detected using TestType.
note that for checking the assembler code, one better uses a different test type: one which translates everything to “three-address assembler”-like commands, which are declared extern (and not defined in this translation unit). then one can search for these function calls, and the whole scenario is more realistic as with complex operators (as above, where std::cout is used all over the place), in which case often operators are outsourced as own functions.

the code.

you can download the source code of the TestType class here.

posted in: daily life
tags:
places:

almost over. decending, soon, vanishing beneath the horizon. a last glimpse of its beauty, before it is gone.
goodbye sun, see you tomorrow.

this is not the first night it snows – yesterday, flakes fell down as well. but this time, the snow is staying. not for long, i’m afraid, but at least for a little while. time to go outside and enjoy it, which i just did. not yet enough to play in it (that probably has to wait until the end of the year anyway), but great enough for some hopefully nice photos. enjoy!