skip to main content.

posts about computer. (page 5.)


when developing expression templates, say a vector class allowing efficient arithmetic using expression templates, one often wonders if the long, complicated pages of template code one wrote actually does what it is supposed to do. to be sure, one has to ask the compiler to generate assembler code and wade through that code to see what is happening. this is tiresome, though essential.
most of the time, it is enough to check if the operations done with the supplied type – in the above example of a vector class, the scalar type – are what we want them to be. for example, if v, w, x, y, z are variables of type myvec<int>, say all of the same dimension, one wants to know whether v = 2 * w + (x - (y - z)) is evaluated to something like

1 for (unsigned i = 0; i < v.size(); ++i)
2     v[i] = 2 * w[i] + (x[i] - (y[i] - z[i]));

or (more likely)
1 for (unsigned i = 0; i < v.size(); ++i)
2 {
3     v[i] = 2 * w[i];
4     v[i] += x[i];
5     v[i] -= y[i];
6     v[i] += z[i];
7 }

(note that this code is not equivalent to the one above for certain more complex types than int, but for my purposes, such a transformation is fine. but anyway, that is not what we want to discuss here.) an alternative would be that temporaries are created, like in
1 myvec<int> t1, t2, t3, t4;
2 t1 = 2 * w;
3 t2 = y - z;
4 t3 = x - t2;
5 t4 = t1 - t3;
6 v = t4;

this involves allocating and releasing memory and copying around data, which is significantly slower than the direct version
1 for (unsigned i = 0; i < v.size(); ++i)
2     v[i] = 2 * w[i] + (x[i] - (y[i] - z[i]));

the aim of expression templates is to generate such more optimal code. but what if your expression templates don’t do what you want? maybe they still generate temporaries, like in the following version:
1 for (unsigned i = 0; i < v.size(); ++i)
2 {
3     int t2 = 2 * w[i];
4     int t2 = y[i] - z[i];
5     int t3 = x[i] - t2;
6     v[i] = t1 - t3;
7 }

the compiler will still optimize this to the same as above, but in case the type is not int but, say, myArbitraryPrecisionInteger (which uses expression templates itself and can deal very efficiently with expressions like v[i] = 2 * w[i] + (x[i] - (y[i] - z[i]));), this code is suboptimal, and no expression templates provided by that type can make it better.
therefore, one would like to have a dummy type, a type one can just plug in instead of int in the above example, which somehow outputs what exactly is done: which temporaries are created and destroyed, and which operations and expressions are created.

a test type.

the test type has to use expression templates itself to gather the expressions which are generated by the other expression templates, like the ones of myvec<T>. then, it should just print these expressions when assignments etc. occur, say to std::cout. then one can run a test program and just read the output to see what was going on. this is in general much easier than looking at the generated assembler code.
note that we only care about additive operations in the following, to make the code more readable.
the basic version of the type looks as follows:

 1 class TestType
 2 {
 3 public:
 4     TestType()
 5     {
 6         std::cout << "create " << this << "\n";
 7     }
 9     TestType(const TestType & src)
10     {
11         std::cout << "create " << this << " from " << &src << "\n";
12     }
14     ~TestType()
15     {
16         std::cout << "destroy " << this << "\n";
17     }
19     TestType & operator = (const TestType & src)
20     {
21         std::cout << "copy " << this << " from " << &src << "\n";
22         return *this;
23     }
25     TestType & operator += (const TestType & b)
26     {
27         std::cout << this << " += " << &b << "\n";
28         return *this;
29     }
31     TestType & operator -= (const TestType & b)
32     {
33         std::cout << this << " -= " << &b << "\n";
34         return *this;
35     }
36 };

this already allows us to see when TestType objects and temporaries are created and assigned, and when basic arithmetic is done. but so far, no real arithmetic can be done. to allow arithmetic, we introduce a TestExpression<O, D> template. here, the O class describes the operand, and the D class describes the argument(s). it is defined as follows:
 1 template<class Op, class Data>
 2 class TestExpression
 3 {
 4 private:
 5     Op d_op;
 6     Data d_data;
 8 public:
 9     inline TestExpression(const Op & op, const Data & data)
10         : d_op(op), d_data(data)
11     {
12     }
14     operator TestType () const
15     {
16         std::cout << "casting TestExpression[";
17         d_op.print(d_data);
18         std::cout << "] to TestType()\n";
19         return TestType();
20     }
22     void print() const
23     {
24         d_op.print(d_data);
25     }
26 };

to add support for expressions to the TestType class, one adds the following methods to it:
 1     template<class O, class D>
 2     TestType(const TestExpression<O, D> & src)
 3     {
 4         std::cout << "create " << this << " from ";
 5         src.print();
 6         std::cout << "\n";
 7     }
 9     template<class O, class D>
10     TestType & operator = (const TestExpression<O, D> & e)
11     {
12         std::cout << this << " = ";
13         e.print();
14         std::cout << "\n";
15         return *this;
16     }
18     template<class O, class D>
19     TestType & operator += (const TestExpression<O, D> & e)
20     {
21         std::cout << this << " += ";
22         e.print();
23         std::cout << "\n";
24         return *this;
25     }
27     template<class O, class D>
28     TestType & operator -= (const TestExpression<O, D> & e)
29     {
30         std::cout << this << " -= ";
31         e.print();
32         std::cout << "\n";
33         return *this;
34     }

then if we assign a TestExpression<O, D> to a TestType object, or add it to it, or subtract it from it, etc., the corresponding messages are printed. now, let us discuss how the operands are implemented. these are simple classes with template members, which do not store data:
 1 class AddOp
 2 {
 3 public:
 4     template<class A, class B>
 5     void print(const std::pair<A, B> & data) const
 6     {
 7         std::cout << "(";
 8         data.first.print();
 9         std::cout << " + ";
10         data.second.print();
11         std::cout << ")";
12     }
13 };
15 class SubOp
16 {
17 public:
18     template<class A, class B>
19     void print(const std::pair<A, B> & data) const
20     {
21         std::cout << "(";
22         data.first.print();
23         std::cout << " - ";
24         data.second.print();
25         std::cout << ")";
26     }
27 };
29 class NegOp
30 {
31 public:
32     template<class A>
33     void print(const A & data) const
34     {
35         std::cout << "-";
36         data.print();
37     }
38 };

note that all operations but the unary NegOp are binary; the data object is in that case a std::pair<T1, T2> object. now one main piece is missing which puts everything together: the overloaded operators which generate the expression templates. let us begin with the universal ones: the ones taking two expressions and combining them by an operator.
1 template<class O1, class D1, class O2, class D2>
2 TestExpression<AddOp, std::pair<TestExpression<O1, D1>, TestExpression<O2, D2> > > operator + (const TestExpression<O1, D1> & a, const TestExpression<O2, D2> & b)
3 { return TestExpression<AddOp, std::pair<TestExpression<O1, D1>, TestExpression<O2, D2> > >(AddOp(), std::make_pair(a, b)); }
4 template<class O1, class D1, class O2, class D2>
5 TestExpression<SubOp, std::pair<TestExpression<O1, D1>, TestExpression<O2, D2> > > operator - (const TestExpression<O1, D1> & a, const TestExpression<O2, D2> & b)
6 { return TestExpression<SubOp, std::pair<TestExpression<O1, D1>, TestExpression<O2, D2> > >(SubOp(), std::make_pair(a, b)); }

this code is not exactly readable, but does its job: it takes two expressions, TestExpression<O1, D1> and TestExpression<O2, D2>, and combines them to a new expression TestExpression<NewOperand, std::pair<TestExpression<O1, D1>, TestExpression<O2, D2> > >. the operator for inversion looks similar, but simpler:
1 template<class O1, class D1>
2 TestExpression<NegOp, TestExpression<O1, D1> > operator - (const TestExpression<O1, D1> & a)
3 { return TestExpression<NegOp, TestExpression<O1, D1> >(NegOp(), a); }

but this whole thing only works when we already have expressions. so far, we have no code which actually creates an expression in the first place. this can be done by more operator overloading, and by introducing a TestWrapper[/url] which encapsulates an object of type [code language="c++"]TestType and behaves like an expression on its own. let us first show the operator definition in the unary case:
1 TestExpression<NegOp, TestWrapper> operator - (const TestType & a)
2 { return TestExpression<NegOp, TestWrapper>(NegOp(), TestWrapper(a)); }

the template TestWrapper encapsulates a TestObject. the definition looks as follows:
 1 class TestWrapper
 2 {
 3 private:
 4     const TestType & d_val;
 6 public:
 7     inline TestWrapper(const TestType & val)
 8         : d_val(val)
 9     {
10     }
12     void print() const
13     {
14         std::cout << &d_val;
15     }
16 };

compare this to the definition of TestExpression above; note that no casting operator is needed as a TestWrapper object should never show up directly to the user.
now, we are left to implement the binary operators for + and -. we have to go through all combinations of TestType and TestExpression<O, D> combinations (except two TestExpression<O, D>‘s, which we already covered). this looks as follows:
 1 TestExpression<AddOp, std::pair<TestWrapper, TestWrapper> > operator + (const TestType & a, const TestType & b)
 2 { return TestExpression<AddOp, std::pair<TestWrapper, TestWrapper> >(AddOp(), std::make_pair(TestWrapper(a), TestWrapper(b))); }
 3 TestExpression<SubOp, std::pair<TestWrapper, TestWrapper> > operator - (const TestType & a, const TestType & b)
 4 { return TestExpression<SubOp, std::pair<TestWrapper, TestWrapper> >(SubOp(), std::make_pair(TestWrapper(a), TestWrapper(b))); }
 6 template<class O2, class D2>
 7 TestExpression<AddOp, std::pair<TestWrapper, TestExpression<O2, D2> > > operator + (const TestType & a, const TestExpression<O2, D2> & b)
 8 { return TestExpression<AddOp, std::pair<TestWrapper, TestExpression<O2, D2> > >(AddOp(), std::make_pair(TestWrapper(a), b)); }
 9 template<class O2, class D2>
10 TestExpression<SubOp, std::pair<TestWrapper, TestExpression<O2, D2> > > operator - (const TestType & a, const TestExpression<O2, D2> & b)
11 { return TestExpression<SubOp, std::pair<TestWrapper, TestExpression<O2, D2> > >(SubOp(), std::make_pair(TestWrapper(a), b)); }
13 template<class O1, class D1>
14 TestExpression<AddOp, std::pair<TestExpression<O1, D1>, TestWrapper> > operator + (const TestExpression<O1, D1> & a, const TestType & b)
15 { return TestExpression<AddOp, std::pair<TestExpression<O1, D1>, TestWrapper> >(AddOp(), std::make_pair(a, TestWrapper(b))); }
16 template<class O1, class D1>
17 TestExpression<SubOp, std::pair<TestExpression<O1, D1>, TestWrapper> > operator - (const TestExpression<O1, D1> & a, const TestType & b)
18 { return TestExpression<SubOp, std::pair<TestExpression<O1, D1>, TestWrapper> >(SubOp(), std::make_pair(a, TestWrapper(b))); }

this is as annoying to write as it looks like, but it is necessary. but only once.

testing the result.

now assume that v and w are two objects of type myvec<TestType>, each having six elements. the object s is of type TestType itself. assume that i write the following: v += v + w. then the compiled version will output:

 1 0x1b780d0 += 0x1b780d0
 2 0x1b780d0 += 0x1b780f0
 3 0x1b780d1 += 0x1b780d1
 4 0x1b780d1 += 0x1b780f1
 5 0x1b780d2 += 0x1b780d2
 6 0x1b780d2 += 0x1b780f2
 7 0x1b780d3 += 0x1b780d3
 8 0x1b780d3 += 0x1b780f3
 9 0x1b780d4 += 0x1b780d4
10 0x1b780d4 += 0x1b780f4
11 0x1b780d5 += 0x1b780d5
12 0x1b780d5 += 0x1b780f5

this shows that the command v += v + w is replaced by something like:
1 for (unsigned i = 0; i < v.size(); ++i)
2 {
3     v[i] += v[i];
4     v[i] += w[i];
5 }

now let us look at something more complicated. if i write v = w + v, this cannot be translated to
1 for (unsigned i = 0; i < v.size(); ++i)
2 {
3     v[i] += w[i];
4     v[i] += v[i];
5 }

anymore, as v[i] is changing its value inbetween. i added code to my myvec<T> implementation to detect and try to avoid such problems. in this case, it should translate the code something like
1 for (unsigned i = 0; i < v.size(); ++i)
2 {
3     TestType t = w[i] + v[i];
4     v[i] += t;
5 }

the output is:
 1 create 0x7fffc4d6b5df
 2 copy 0x7fffc4d6b5df from 0x17860f0
 3 0x7fffc4d6b5df += 0x17860d0
 4 0x17860d0 += 0x7fffc4d6b5df
 5 copy 0x7fffc4d6b5df from 0x17860f1
 6 0x7fffc4d6b5df += 0x17860d1
 7 0x17860d1 += 0x7fffc4d6b5df
 8 copy 0x7fffc4d6b5df from 0x17860f2
 9 0x7fffc4d6b5df += 0x17860d2
10 0x17860d2 += 0x7fffc4d6b5df
11 copy 0x7fffc4d6b5df from 0x17860f3
12 0x7fffc4d6b5df += 0x17860d3
13 0x17860d3 += 0x7fffc4d6b5df
14 copy 0x7fffc4d6b5df from 0x17860f4
15 0x7fffc4d6b5df += 0x17860d4
16 0x17860d4 += 0x7fffc4d6b5df
17 copy 0x7fffc4d6b5df from 0x17860f5
18 0x7fffc4d6b5df += 0x17860d5
19 0x17860d5 += 0x7fffc4d6b5df
20 destroy 0x7fffc4d6b5df

this shows that in fact, the generated code is more like this:
1 for (unsigned i = 0; i < v.size(); ++i)
2 {
3     TestType t = w[i];
4     t += v[i];
5     v[i] += t;
6 }

so without looking at the generated assembler code, we already have a good idea what the template expressions of myvec<T> are doing. now assume we write something like v = s * v + w;. the output is
 1 0x1786116 += (0x1786116 * 0x7fffc4d6b7af)
 2 0x1786116 += 0x1786110
 3 0x1786117 += (0x1786117 * 0x7fffc4d6b7af)
 4 0x1786117 += 0x1786111
 5 0x1786118 += (0x1786118 * 0x7fffc4d6b7af)
 6 0x1786118 += 0x1786112
 7 0x1786119 += (0x1786119 * 0x7fffc4d6b7af)
 8 0x1786119 += 0x1786113
 9 0x178611a += (0x178611a * 0x7fffc4d6b7af)
10 0x178611a += 0x1786114
11 0x178611b += (0x178611b * 0x7fffc4d6b7af)
12 0x178611b += 0x1786115

which shows that the code was translated to something like
1 for (unsigned i = 0; i < v.size(); ++i)
2 {
3     v[i] += v[i] * s;
4     v[i] += w[i];
5 }

(this of course assumes that we also implemented operator * for TestType.)
to really check what is going on one still has to check the generated assembler code. for example, in the test above which used a temporary when writing v += w + v and no temporary when writing v += v + w, it is unclear if already the compiler made the decision which case to use (which he could, since he knows the addresses of v and w, or at least knows whether they are equal or not), or whether both cases are compiled into the program and whether the running program has to figure out which block of code to run. this cannot be detected using TestType.
note that for checking the assembler code, one better uses a different test type: one which translates everything to “three-address assembler”-like commands, which are declared extern (and not defined in this translation unit). then one can search for these function calls, and the whole scenario is more realistic as with complex operators (as above, where std::cout is used all over the place), in which case often operators are outsourced as own functions.

the code.

you can download the source code of the TestType class here.

here is a table of contents for the implementing multivariate polynomials in c++ using templates mini-series:

the soure code can be downloaded here (8 kb).

welcome to the fifth (also last and easiest) part of the multivariate polynomials in c++ using templates series. this part will explain how i implemented long division and the euclidean algorithm.

long division.

long division in R[X] is defined as follows: given polynomials f, g \in R[X], find polynomials q, r \in R[X] such that f = q \cdot g + r and \deg r < \deg g. provided that the leading term of g is a unit in R (or at least divides most things), q and r exist and are unique (in fact, i'm implicitly assuming that R is commutative and unitary). this division makes R[X] an euclidean ring in case R is a field (i.e. all non-zero elements are invertible). our T is in general not a field, but we can still try to do long division, and let the user worry if this actually makes sense. for example, if T == int, one can divide two elements (assuming the second is not zero), but this is not the division a mathematician would expect. hence, if g is not monic in case of T == int, our implmentation will output <i>something</i> which does not necessarily make sense.

anyway. my implementation is rather straightforward, which is not very surprising:

 1 template<class T>
 2 void eucliddiv(const poly<1, T> & f, const poly<1, T> & g, poly<1, T> & q, poly<1, T> & r)
 3 {
 4     if ( <
 5     {
 6         // Ignore the trivial case
 7         q = poly<1, T>();
 8         r = f;
 9         return;
10     }
11     q = poly<1, T>(true, - + 1);
12     r = f;
13     for (int i =; i >= 0; --i)
14     {
15         q[i] = r[i +] / g.leading();
16         for (int j =; j >= 0; --j)
17             r[i + j] -= q[i] * g[j];
18     }
19     r.normalize();
20     // Note that the degree of q is already correct.
21 }

here, we assume that f.leading() / g.leading() is non-zero; otherwise, q.normalize() would have to be called, but in that case the whole result will probably be broken anyway. we create the polynomial q using the special constructor mentioned here.

euclidean algorithm.

the implementation of the euclidean algorithm is not depending on the implementation anymore (long division made only use of the special constructor). first, we take care of special cases (one of the two arguments is zero), and then proceed with the main loop. we normalize the remainder in every iteration of the loop, and also the result, to (hopefully) increase numerical stability a bit. hence, for this to work, we really need that T is a field, or otherwise it will only work in very special cases.

 1 template<class T>
 2 poly<1, T> GCD(const poly<1, T> & f, const poly<1, T> & g)
 3 {
 4     if (f.isZero())
 5     {
 6         if (g.isZero())
 7             return f; // both are zero
 8         else
 9             return g / g.leading(); // make g monic
10     }
11     poly<1, T> d1(f / f.leading()), d2(g / g.leading()), q, r;
12     while (!d2.isZero())
13     {
14         eucliddiv(d1, d2, q, r);
15         d1.swap(d2);
16         d2 = r;
17         d2 /= r.leading(); // make r monic
18     }
19     return d1;
20 }

i don't think there's much to write about this, which isn't already written somewhere else. this just shows how to implement a generic polynomial based algorithm.

welcome to the fourth part of the multivariate polynomials in c++ using templates series. in this part, i want to explain how to implement efficient polynomial evaluation using a collection of interacting templates. the output will be quite optimized code (assuming the compiler has a decent enough optimizer).

evaluating multivariate polynomials is more complicated. i started with a naive implementation of operator() in poly<n, T>:

 1     template<class S>
 2     poly<n - 1, S> operator() (const S & x) const
 3     {
 4         poly<n - 1, S> res;
 5         S xx = (S)1;
 6         for (unsigned i = 0; i < d_value.size(); ++i)
 7         {
 8             res += d_value[i] * xx;
 9             xx *= x;
10         }
11         return res;
12     }

if f is of type poly<2, int>, then f(4) will be of type poly<1, int>, whence f(4)(2) will call poly<1, int>::operator()<int>(const int &) to yield a polynomial of type poly<0, int>, which automatically casts to an int. unfortunately, in this process, first f(4) will be created, which requires arithmetic of polynomials of type poly<1, int>, and then the resulting polynomial will be evaluated again. in particular, if n is larger, this is far from being optimal. hence, this solution is fine if evaluations are seldomly done, but if they are done more often, it is too slow.

to stay with our example of f of type poly<2, int>. one could evaluate f(4, 2), written as f(4)(2), directly as follows:

 1 int result = 0, xx1 = 1;
 2 for (int i = 0; i <=; ++i)
 3 {
 4     int result2 = 0, xx2 = 1;
 5     for (int j = 0; j <= f[i].degree(); ++j)
 6     {
 7         result2 += f[i][j] * xx2;
 8         xx2 *= 2;
 9     }
10     result += result2 * xx1;
11     xx1 *= 4;
12 }

clearly, this is tiresome (and prone to typos) to write every time. moreover, this is also not optimal, as our operator[] will do more than just returning an element. in particular, if f is not const in this context, the compiler will insert code at every operation f[i] and f[i][j] to check whether the index is out of range (to resize d_value in that case). writing more directly

 1 int result = 0, xx1 = 1;
 2 for (int i = 0; i < f.d_value.size(); ++i)
 3 {
 4     int result2 = 0, xx2 = 1;
 5     for (int j = 0; j < f.d_value[i].d_value.size(); ++j)
 6     {
 7         result2 += f.d_value[i].d_value[j] * xx2;
 8         xx2 *= 2;
 9     }
10     result += result2 * xx1;
11     xx1 *= 4;
12 }

would result in faster code, if it would compile – d_value is private. for this reason, i came up with two templates:

1 template<int n, class T, class S = T>
2 class poly_evaluator;
3 template<int n, class T, class HL, class S>
4 class poly_evaluator_impl;

the template poly_evaluator<n, T, S> is instanciated in poly<n, T>::operator()<S>(const S &), which is defined as follows:

1     template<class S>
2     inline poly_evaluator<n, T, S> operator() (const S & x) const
3     {
4         return poly_evaluator<n, T, S>(*this, x);
5     }

the idea is that if operator() of poly_evaluator<n, T, S>(*this, x) is called, it will spawn an object of type poly_evaluator_impl<n-1, T, poly_evaluator<n, T, S>, S>. if operator() of this new object is called, it will create an object of type poly_evaluator_impl<n-2, T, poly_evaluator_impl<n-1, T, poly_evaluator<n, T, S>, S>, S>, and so on. the purpose of carrying the type of the caller around is that the outer loop of the evaluation function is the loop for the inner-most object (of type poly_evaluator<n, T, S>). if operator() would be right-associative instead of left-associative (as it is), this trouble would not be necessary.

the idea is that both templates poly_evaluator<n, T, S> and poly_evaluator<n, T, HL, S> provide an internal function template<class SS, class Fun> evaluate(SS & res, Fun evalfun), which in case of poly_evaluator<n, T, HL, S> calls the corresponding evaluate function of its owner (with evalfun replaced), and in the case of poly_evaluator<n, T, S> implements the outermost loop, which loops over the coefficients of X_0 and uses evalfun to evaluate the coefficients. hence, the evalful handed upwards must be extended in every step to include the next variable.

assume that we have a polynomial f of type poly<3, int>, and we want to evaluate f(10)(20)(30). this is an object of type poly_evaluator_impl<1, int, poly_evaluator_impl<2, int, poly_evaluator<3, int, int>, int>, int>, whose cast operator operator int() will start evaluation by defining a local class poly_evaluator_impl<1, int, poly_evaluator_impl<2, int, poly_evaluator<3, int, int>, int>, int>::eval_fun, defined as follows:

 1     class eval_fun
 2     {
 3         const poly_evaluator_impl<1, int, poly_evaluator_impl<2, int, poly_evaluator<3, int, int>, int>, int> & d_owner;
 5     public:
 6         inline eval_fun(const poly_evaluator_impl<1, int, poly_evaluator_impl<2, int, poly_evaluator<3, int, int>, int>, int> & owner)
 7             : d_owner(owner)
 8         {
 9         }
11         inline int operator() (const poly<1, int> & p) const
12         {
13             int res = 0;
14             int xx = 1;
15             for (int i = 0; i < (int)p.d_value.size(); ++i)
16             {
17                 res += p.d_value[i] * xx;
18                 xx = xx * d_owner.d_evalpoint;
19             }
20             return res;
21         }
22     };

an object of it is created by operator int() as follows:

1     inline operator int() const
2     {
3         int res = 0;
4         d_owner.evaluate(res, eval_fun(*this));
5         return res;
6     }

the object eval_fun(*this) can now be used to evaluate polynomials of type poly<1, int> at the specified evaluation point d_owner.d_evalpoint, which equals 30 in this case.

the operator int() calls poly_evaluator_impl<2, int, poly_evaluator<3, int, int>, int>::evaluate<int, Fun>(int &, Fun), which in turn is implemented as follows:

1     template<class Fun>
2     inline void evaluate(int & res, const Fun & evalfun) const
3     {
4         d_owner.evaluate(res, eval_fun<int, Fun>(*this, evalfun));
5     }

the newly created object of type eval_fun<int, Fun> will now evaluate polynomials of type poly<2, int> by using the provided functor evalfun, which is an object of the class poly_evaluator_impl<1, int, poly_evaluator_impl<2, int, poly_evaluator<3, int, int>, int>, int>::eval_fun defined above. the class eval_fun in poly_evaluator_impl<2, int, poly_evaluator<3, int, int>, int> is defined as follows:

 1     template<class Fun>
 2     class eval_fun
 3     {
 4         const poly_evaluator_impl<2, int, poly_evaluator<3, int, int>, int> & d_owner;
 5         const Fun & d_evalfun;
 7     public:
 8         inline eval_fun(const poly_evaluator_impl<2, int, poly_evaluator<3, int, int>, int> & owner, const Fun & evalfun)
 9             : d_owner(owner), d_evalfun(evalfun)
10         {
11         }
13         inline int operator() (const poly<2, int> & p) const
14         {
15             int res = 0;
16             int xx = 1;
17             for (int i = 0; i < (int)p.d_value.size(); ++i)
18             {
19                 res += d_evalfun(p.d_value[i]) * xx;
20                 xx = xx * d_owner.d_evalpoint;
21             }
22             return res;
23         }
24     };

note that d_owner.d_evalpoint is 20 in this case. finally, we are at the top level, namely in the function poly_evaluator<3, int, int>::evaluate<int, Fun>(), defined as follows:

 1     template<class Fun>
 2     inline void evaluate(int & res, const Fun & evalfun) const
 3     {
 4         int xx = 1;
 5         for (int i = 0; i < (int)d_poly.d_value.size(); ++i)
 6         {
 7             res += evalfun(d_poly.d_value[i]) * xx;
 8             xx = xx * d_evalpoint;
 9         }
10     }

(here, d_evalpoint is 10.) this is the outer loop. the middle loop is inserted when calling evalfun(d_poly.d_value[i]), and into that the innermost loop is inserted when it calles d_evalfun(p.d_value[i]). this all essentially boils down to the following combination:

 1     {
 2         int res = 0;
 3         int xx = 1;
 4         for (int i = 0; i < (int)d_poly.d_value.size(); ++i)
 5         {
 6             int tmp;
 7             const poly<2, int> & p = d_poly.d_value[i];
 8             {
 9                 int res2 = 0;
10                 int xx2 = 1;
11                 for (int i2 = 0; i2 < (int)p.d_value.size(); ++i2)
12                 {
13                     int tmp2;
14                     const poly<2, int> & p2 = p.d_value[i2];
15                     {
16                         int res3 = 0;
17                         int xx3 = 1;
18                         for (int i3 = 0; i3 < (int)p2.d_value.size(); ++i3)
19                         {
20                             res3 += p2.d_value[i3] * xx3;
21                             xx3 = xx3 * d_owner.d_evalpoint;
22                         }
23                         tmp2 = res3;
24                     }
25                     res2 += tmp2 * xx2;
26                     xx2 = xx2 * d_owner.d_evalpoint;
27                 }
28                 tmp = res2;
29             }
30             res += tmp * xx;
31             xx = xx * d_evalpoint;
32         }
33         return res;
34     }

of course, this is just an intermediate step (to highlight the similiarities to all the code snippets from above); the compiler will optimize this to something more compact.

this “example” already presented most of the concept. the implementation itself is more complicated, since different types can be used for evaluation (at different stages, even), and since the implementation has support for allocators.

an important point is that both templates poly_evaluator<n, T, S> and poly_evaluator<n, T, HL, S> are specialized for n == 1, as in that case, there is no lower level, and hence these specializations provide no evaluate template and try to pass evaluation further down, but right away pass it higher up to their owner, or in case of poly_evaluator<1, T, S>, it will just do the evaluation.

this concludes the fourth part. the last part will concentrate something less complicated: implementing long division and the euclidean algorithm.

welcome to the third part of the multivariate polynomials in c++ using templates series. this part will explain how to compute partial derivatives using partial template specialization tricks.

the idea is very simple. given f \in R[X_0, \dots, X_{n-1}], written as f = \sum_{i=0}^k a_i X_0^i with a_i \in R[X_1, \dots, X_{n-1}], we have

\frac{\partial f}{\partial X_0} = \sum_{i=1}^k a_i X_0^{i-1}


\frac{\partial f}{\partial X_j} = \sum_{i=0}^k \biggl( \frac{\partial}{\partial X_j} a_i \biggr) X_0^i \quad \text{for } j > 0.

this allows us to implement partial derivatives iteratively. we assume that the number of the indeterminate by which one should differentiate is already known during compile time. the implementation will be done by a templated function

1 template<int N, int n, class T>
2 inline poly<n, T> derivative(const poly<n, T> & p);

the purpose of the template parameter N is to say that \frac{\partial p}{\partial X_N} should be computed. given a polynomial p, the function can be called simply by derivative<N>(p). to implement derivative<N>(p), we would like to again use partial template specialization to distinguish between the cases N == 0 and N > 0. unfortunately, partial template specialization is not available for template functions. hence, we need to introduce a helper template class:

1 template<int N, int n, class T>
2 class derivative_computer;

it should contain one static function, static inline void computeDerivative(const poly<n, T> & src, poly<n, T> & dest); which allows to compute the partial derivative \frac{\partial}{\partial X_N} of the polynomial in src and store it into dest. then derviative<N>(p) can simply be implemented by

1 template<int N, int n, class T>
2 inline poly<n, T, std::allocator<T> > derivative(const poly<n, T, std::allocator<T> > & p)
3 {
4     poly<n, T> res;
5     derivative_computer<N, n, T>::computeDerivative(p, res);
6     return res;
7 }

now let us focus on the class template derivative_computer<N, n, T>. the general implementation, for N > 0, uses the second formula from above and iteratively calls derivative_computer<N-1, n-1, T>::computeDerivative():

 1 template<int N, int n, class T>
 2 class derivative_computer
 3 {
 4 public:
 5     static inline void computeDerivative(const poly<n, T> & src, poly<n, T> & dest)
 6     {
 7         dest.d_value.resize( + 1);
 8         for (int i = 0; i <=; ++i)
 9             derivative_computer<N - 1, n - 1, T>::computeDerivative(src[i], dest[i]);
10         dest.normalize();
11     }
12 };

now we are left with the case that N = 0. the implementation here is also simple:

 1 template<int n, class T>
 2 class derivative_computer<0, n, T>
 3 {
 4 public:
 5     static inline void computeDerivative(const poly<n, T> & src, poly<n, T> & dest)
 6     {
 7         dest.d_value.resize( >= 0 ? : 0);
 8         for (int i = 1; i <=; ++i)
 9             dest[i - 1] = src[i] * (T)i;
10         dest.normalize();
11     }
12 };

here, we cast i to an object of type T before multiplication. from this point on, no further iteration is needed.

we are left with the case that some (malicious or just confused) user calls derivative<N>(p) for a polynomial p of type poly<n, T> with n <= N. in that case, the compiler tries to instanciate derivative_computer<k, 0, T>::computeDerivative() and tries to treat poly<0, T> as a polynomial. this will result in strange error messages, and it would be nice if there is something more readable. for that reason, i added another two specializations:

 1 template<class T>
 2 class derivative_computer<0, 0, T>
 3 {
 4 public:
 5     class ERROR_N_must_be_strictly_less_than_n_in_derivative_template;
 7     static inline void computeDerivative(const poly<0, T> &, poly<0, T> &)
 8     {
 9         ERROR_N_must_be_strictly_less_than_n_in_derivative_template();
10     }
11 };
13 template<int N, class T>
14 class derivative_computer<N, 0, T>
15 {
16 public:
17     class ERROR_N_must_be_strictly_less_than_n_in_derivative_template;
19     static inline void computeDerivative(const poly<0, T> &, poly<0, T> &)
20     {
21         ERROR_N_must_be_strictly_less_than_n_in_derivative_template();
22     }
23 };

these two definitions compile fine, but as soon as one of them is instanciated by the compiler, it will try to find out something about the class ERROR_N_must_be_strictly_less_than_n_in_derivative_template and not succeed, and spit out some error message hopefully containing the string ERROR_N_must_be_strictly_less_than_n_in_derivative_template. for example, if i replace the line computing the partial derivative of the intersection equation in poly-test.cpp by std::cout << derivative<2>(l) << "\n";, g++ will complain as follows:

1 In file included from poly-test.cpp:2:0:
2 polynomial.hpp: In static member function ‘static void derivative_computer<N, 0, T>::computeDerivative(const poly<0, T>&, poly<0, T>&) [with int N = 1, T = double]’:
3 polynomial.hpp:1200:13:   instantiated from ‘static void derivative_computer<N, n, T>::computeDerivative(const poly<n, T>&, poly<n, T>&) [with int N = 2, int n = 1, T = double]4 polynomial.hpp:1247:5:   instantiated from ‘poly<n, T> derivative(const poly<n, T>&) [with int N = 2, int n = 1, T = double]5 poly-test.cpp:19:33:   instantiated from here
6 polynomial.hpp:1227:9: error: invalid use of incomplete type ‘struct derivative_computer<1, 0, double>::ERROR_N_must_be_strictly_less_than_n_in_derivative_template’
7 polynomial.hpp:1223:11: error: declaration of ‘struct derivative_computer<1, 0, double>::ERROR_N_must_be_strictly_less_than_n_in_derivative_template’
8 make: *** [poly-test.o] Error 1

this is not exactly readable, but the uppercase ERROR quickly lets you look at the important part of that message. (note that i removed the allocators from the error messages, so that it is in sync with the above writing. if you try it out yourself with the source code, you'll see a slightly more complicated error message.)

this completes part three of the series. in the next part, i will discuss how to evaluate polynomials efficiently using templates.

welcome to the second part of the multivariate polynomials in c++ using templates series. this part will explain how to store multivariate polynomials via templates, how to implement storage and operations.

multivariate polynomial rings in mathematics.

as you might recall from your algebra courses, R[x_0, \dots, x_{n-1}] denotes the (ring, in case R itself is a ring) of multivariate polynomials in the n indeterminates (also called variables) X_0, \dots, X_{n-1} with coefficients in R. note that i numerate the indeterminates beginning with zero, which is rather intuitive for a c++ programmer, though often not for a mathematician. there are different ways to treat such polynomials; for example, one can work with multiindices i = (i_0, \dots, i_{n-1}) \in \N^n; for such an i, let a_i be an element of R and define X^i := X_0^{i_0} \cdots X_{n-1}^{e_{n-1}}; then a polynomial f \in R[X_0, \dots, X_{n-1}] can be represented as f = \sum_{i \in \N^n} a_i X^i. clearly, one does not want to work with infinite objects, whence one restricts to a subset of \N^n, for example of type [0, (k_0, \dots, k_{n-1})] \cap \N^n = \{ i \in \N^n \mid 0 \le i_j \le k_j \text{ for } j = 0, \dots, n-1 \}. this is one way of representing multivariate polynomials, and fine for several applications, but not the most easy one to implement in c++ as a template depending on n.

another way to treat multivariate polynomials is to iteratively write R[X_0, \dots, X_{n-1}] = (R[X_1, \dots, X_{n-1}])[X_0] (one could also use R[X_0, \dots, X_{n-1}] = (R[X_0, \dots, X_{n-2}])[X_{n-1}], but i will stick to the other as it is more natural for the implementation i'm describing). hence, a polynomial f \in R[X_0, \dots, X_{n-1}] is interpreted as an <i>univariate</i> polynomial with coefficients in the ring R[X_1, \dots, X_{n-1}]. this allows a recursive definition which is well-suited for templates.

implementing the data structure using templates.

now let us leave the realm of mathematics for a while and talk about c++. the ring R mentioned above will be replaced by an abstract type T, which will be a template parameter. note that in the discussion below, we will ignore allocators, and just talk about them at the very end of this article. our main template type will be template<int n, class T> class poly;, which essentially contains a std::vector<poly<n - 1, T> > to store coefficients. (the storage is in fact more delicate, we will discuss this below.)

clearly, this is not enough yet. the compiler has no clue that we are talking about multivariate polynomials, and that n should always be non-negative, and it also does not knows that poly<0, T> is essentially an element of T. the way to solve this is partial template specialization: we create a specialized version of poly<n, T> for the case n == 0. the declaration is as follows:

 1 template<int n, class T>
 2 class poly; // forward declaration
 4 template<class T>
 5 class poly<0, T> // specialize for n = 0
 6 {
 7 private:
 8     T d_value;
 9 public:
10     ...
11 }
13 template<int n, class T>
14 class poly // the general implementation
15 {
16 private:
17     std::vector<poly<n - 1, T> > d_value;
18 public:
19     ...
20 }

the specialization for n == 0 has a rather simple implementation. besides a constructor accepting an object of type T, with default value T(), it has a casting operator to T, provides an isZero() predicate, assignment and arithmetic operations, output to std::ostreams, and using std::swap() to swap two poly<0, T> objects, it has a simple “evaluate” function implemented as operator() which is essentially the same as the cast operator operator T().

the general implemenation is more interesting. there is a method normalize() which ensures that d_value.back() is the leading coefficient and d_value.size() - 1 the degree of the polynomial; note that we use -1 for the degree of the zero polynomial. these conditions are used for example in the implementation of isZero(), leading() and degree(). for most operations, the polynomial itself makes sure it is normalized, but when accessing a non-const polynomial with operator[], it might change its degree. after that, one better calls normalize(). it would also be possible to do this automatically using more template tricks, but i wanted to leave some responsibility with the user of the template. :-)

the arithmetic operations are pretty straightforward to implement; one has to resize d_value and later call normalize() to make sure nothing bad happens. for multiplication, one creates a temporary which contains the product (the maximal degree of the product is known, but normalize() should be run afterwards since there is no garuantee that the ring containing the objects of type T is zero-divisor free). as an example, adding two polynomials is accomplished by the following operator:

 1     inline poly operator + (const poly & q) const
 2     {
 3         poly r(*this);
 4         if (r.d_value.size() < q.d_value.size())
 5             r.d_value.resize(q.d_value.size());
 6         for (unsigned i = 0; i < q.d_value.size(); ++i)
 7             r[i] += q[i];
 8         r.normalize();
 9         return r;
10     }

first, a copy of the first argument (*this) is made. then, its size is adjusted to make sure there's enough room. then, simply all coefficients of the second argument are added, and finally the result is normalize()d and returned. as another example, here's polynomial multiplication:

 1     inline poly operator * (const poly & p) const
 2     {
 3         int d = + degree();
 4         if (d < -1)
 5             d = -1;
 6         poly r(true, d + 1);
 7         if (!isZero() && !p.isZero())
 8             for (unsigned i = 0; i < r.d_value.size(); ++i)
 9                 for (unsigned j = 0; j < d_value.size(); ++j)
10                     if (i < j + p.d_value.size())
11                         r[i] += d_value[j] * p[i - j];
12         r.normalize();
13         return r;
14     }

note that as we are inside the template, we can write poly instead of poly<n, T>. moreover, here, use is made of a special constructor poly(bool, unsigned size) which initializes the polynomial with size coefficients. the bool is just to ensure that the compiler won't confuse this constructor with any of the others. this constructor is made private.

printing the polynomial to a string is slightly more involved, as the correct names should be used for the indeterminates. for this, i use a method void print(std::ostream & s, int N = n) const which keeps track of the maximal number of indeterminates. the operator<< is then implemented as follows:

1     friend inline std::ostream & operator << (std::ostream & s, const poly & p)
2     {
3         p.print(s);
4         return s;
5     }

the implementation of print(...) works recursively, and N keeps track of n of the top-level instance. the indeterminate at the current level has index N - n.

finally, let me announce evaluation of polynomials. this will be done with operator(). the implementation inside poly<n, T> is very easy:

1     template<class S>
2     inline poly_evaluator<n, T, S> operator() (const S & x) const
3     {
4         return poly_evaluator<n, T, S>(*this, x);
5     }

the interesting part, namely the implementation of the poly_evaluator<n, T, S> template, will be discussed in part four of this series.

making the storage more intelligent.

in the above code, we use a simple std::vector<poly<n - 1, T> > to store the coefficients, which are (except for n=1) itself polynomials. for example, doing a simple resize of d_value in poly<11, T> will result in maybe many objects of type poly<10, T> to be created, copied and destroyed, and this in turn causes maybe many more objects of type poly<9, T> to be created, copied and destroyed, and so on. this is not exactly optimal. one solution is to replace std::vector<poly<n, T> > by std::vector<poly<n, T>*>, i.e. store pointers to polynomials instead of polynomials itself. this will make resizes very efficient, since only pointers have to moved around. unfortunately, it has a drawback: in the case n == 1, this is often not very optimal, as if T==int, we are storing pointers to ints, which are usually at least the same size as the ints themselves. this leads to inefficient code. one solution is to introduce another partial specialization for n=1, but this results in a lot of duplicated code. a more clever solution is to create another template, like template<class T, bool usePointers> class poly_ivector;, which stores pointers to T and makes sure objects are properly created and destroyed in case usePointers == true, and which is simply std::vector<T> in disguise in case usePointers == false. it provides a simple interface, offering the number of stored elements (unsigned size() const), to resize the array (void resize(unsigned)), accessing elements via operator[], and accessing the last element via back(). finally, it offers a void swap(poly_ivector<T, usePointers> &) method to swap the contents of two poly_ivector<T, usePointers> instances. if you are interested in the implementation of poly_ivector<T, bool>, please look at the source code.


in the real implementation, i've added basic support for allocators. these allow to provide own allocation functions (malloc/free) for memory used to store the coefficients. note that my template is not working perfectly with allocators having states, but for state-less allocators, it works fine. if you need to use it with allocators having states, better check if the code does exactly what you want.

i used the polynomial class with a self-written allocator which provides thread local memory management. this was necessary for the aforementioned toy project, for which i wanted to do massive parallel computation (128 threads). it turned out that most threads were waiting for others, as there was a lot of allocation/deallocation going on and the global memory manager, protected by a mutex, made everyone wait. after throwing in a not very efficient, but working memory manager with thread-local storage for the management data, the program ran perfectly parallel.

in this (short) series, i want to explain how to implement support for multivariate polynomials in c++ using templates. for a toy project i worked on mostly last weekend, i needed to work with polynomials in different ways:

  • create a polynomial in three variables with double coefficients;
  • computing the partial derivatives of this polynomial;
  • evaluate the above polynomials in many points (with double coordinates);
  • plug in different univariate polynomials (of form \alpha_i T + \beta_i) into the first polynomial to obtain a univariate polynomial whose zeros correspond to the intersection of the surface defined by the multivariate polynomial together with a parameterized line;
  • test whether a univariate polynomial has zeros in (0, \infty) using sturm chains.

for a given, simple enough polynomial, everything can be done “by hand”, but sooner or later, this is annoying and too restricting. and it gets really annoying for the second part, where i wanted to obtain the univariate polynomial. therefore, i thought about how to create a multivariate polynomial template which provides means to implement all of the above in an easy manner. in the end, i had a multivariate polynomial template with a nice interface, for which – as a small example – the following test program compiles:

 1 #include <iostream>
 2 #include "polynomial.hpp"
 4 int main()
 5 {
 6     poly<2, int> f = Monomial<int>(1, 2) + 3 * Monomial<int>(4, 5);
 7     std::cout << f << "\n";
 8     int e = f(4)(2);
 9     std::cout << e << "\n";
10     std::cout << derivative<1>(f) << "\n";
11     poly<1, int> g = f(2), h = f(3);
12     std::cout << g << " and " << h << "\n";
13     std::cout << GCD<double>(g, h) << "\n"; 
14     // cast polynomials to poly<1, double> first; otherwise,
15     // result will be incorrect since int is not a field
16     std::cout << GCD(g, h) << "\n";
17     // to prove our point, check for yourself that the
18     // result is X_0^5 instead of X_0^2
19     poly<1, double> l = f(Monomial<double>(1) - 1)(Monomial<double>(1) - 2);
20     std::cout << l << "\n";
21     std::cout << derivative<0>(l) << "\n";
22     std::cout << GCD(l, derivative<0>(l)) << "\n";
23 }

compiling it via g++ -O3 -Wall -o poly-test poly-test.cpp, it yielded the following output when run:

1 (1 X_1^2 X_0 + 3 X_1^5 X_0^4)
2 24592
3 (2 X_1 X_0 + 15 X_1^4 X_0^4)
4 (2 X_0^2 + 48 X_0^5) and (3 X_0^2 + 243 X_0^5)
5 1 X_0^2
6 1 X_0^5
7 (-100 + 632 X_0 + -1781 X_0^2 + 2905 X_0^3 + -3006 X_0^4 + 2043 X_0^5 + -912 X_0^6 + 258 X_0^7 + -42 X_0^8 + 3 X_0^9)
8 (632 + -3562 X_0 + 8715 X_0^2 + -12024 X_0^3 + 10215 X_0^4 + -5472 X_0^5 + 1806 X_0^6 + -336 X_0^7 + 27 X_0^8)
9 1

let me explain this example. first, a polynomial f = X_0 X_1^2 + 3 X_0^4 X_1^5 in two indeterminates with int coefficients is created as the linear combination of two monomials (X_0 X_1^2 and X_0^4 X_1^5), which is the first line of the output. then the polynomial is evaluated in (4, 2), and the result is printed as line 2. after that, the partial derivative of the polynomial f with respect to X_1 is computed and written as line 3. after that, the polynomial is evaluated in X_0 in 2 and 3, yielding two new polynomials g(X_0) = f(2, X_0) and h(X_0) = f(3, X_0) which are printed in line 4. in line 5, the monic greatest common divisor of g and h is computed, for which the polynomials are treated as polynomials with double coefficients. that this is necessary is demonstrated by the sixth line, in which the output is printed when ints are used to compute the greatest common divisor: clearly, the result is wrong. then, in line 7 of the output, we evaluate f at (T - 1, T - 2), yielding a univariate polynomial \ell(X_0) = f(X_0 - 1, X_0 - 2) with double coefficients. line 8 of the output shows its derivative \ell', and line 9 the monic greatest common divisor of \ell and \ell'.

note that the program is easy to read for people used to the notion of multivariate polynomials and c++.

in the second part of the series, i want to describe how to represent the polynomials. in the third part, i want to explain how partial derivatives are computed. the fourth part will describe how to do efficient evaluation, and the fifth part will explain the long division and euclidean algorithm implementations.

finally, the source code of the library polynomial.hpp, the test program poly-test.cpp and a simple makefile (using g++) can be downloaded as a zip file here (8 kb).

something which annoys the heck out of me, already for some time, which i wanted to mention here but never managed to.
its about the current (official) version of ubuntu: natty. (yeah, another ubuntu rant. yay.)
this distribution contains at least two packages which are broken. i.e. the programs contained in the packages are useless, they just don’t work. by default. since the release (in april), nothing changed. (yes, i checked the backports and the proposed updates.) needless to say that you can find out which source code lines to modify in many web forums, as well as bug reports here and there.
the packuages in question are xpdf and sshpass. two programs i used a lot. used, because now they are broken. xpdf crashes when you open a pdf file. and sshpass just hangs. for xpdf, i now use evince as a quick drop-in, but it is different. i want xpdf back. and for sshpass, i use… well, the keyboard.
i really don’t understand how something like this can happen. that packages break is ok. but that months after bugs have been reported, fixes have been described in the report’s discussions, that after this time, nothing happens? that’s just not acceptable.
of course, i can just fix the issue by deinstalling the packages, downloading the source, and compiling the programs myself. but then, i’m responsible to check for updates and security fixes myself. auto update will just ignore the programs. i’m not really willing to do that anymore. i have better things to do with my time.
maybe i should just install arch linux the next time i want to set up a new machine. or i should even just reinstall a machine and install arch linux instead of ubuntu. cannot get worse.

posted in: computer

in my last post, i wrote about how to maximize privacy for your site’s visitor. in the meantime, i programmed a mechanism which maximizes privacy when embedding youtube or vimeo videos. the idea is simple: i embed only a picture, and use javascript to replace the picture with the video player. in case javascript is disabled, the image has a link to the video, and in case javascript is there and the image is replaced by the video player, i enable autostart for the video. for the casual user, the result looks similar to just embedding the video, but the difference is that youtube or vimeo only get the visitor’s information if he wants to watch the video. here is an example:

[[for legal reasons, i do not want to include youtube videos here anymore. please click on this link to watch the video at youtube.]]

now, how do i do this? to retrieve the image, i use a php script so that the user will not interact directly with youtube or vimeo. the script obtains the url for the picture, retrieves the picture, and delivers it to the user. here’s the script:

 1 <?php
 2 // Tests a file name for being valid
 3 function isValidID($id)
 4 {
 5     return preg_match('#^([a-zA-Z0-9-_]+)$#', $id) == 1;
 6 }
 8 // Checks if the source is valid
 9 function isValidSource($source)
10 {
11     return ($source == 'youtube') || ($source == 'vimeo');
12 }
14 $id = $_GET['id'];
15 $source = $_GET['source'];
17 if (isValidID($id) && isValidSource($source))
18 {
19     // Find out URL
20     if ($source == 'youtube')
21     {
22         // Youtube is easy...
23         $url = "$id/0.jpg";
24     }
25     if ($source == 'vimeo')
26     {
27         $url = "$id.php";
28         $content = @file_get_contents($url); // avoid error message on 404
29         if ($content)
30         {
31             $content = unserialize($content);
32             $url = $content[0]['thumbnail_large'];
33         }
34         else
35             $url = false;
36     }
37 }
38 else
39     $url = false;
41 // Retrieve picture
42 if ($url)
43     $content = @file_get_contents($url); // avoid error message on 404
44 else
45     $content = false;
47 // Send picture, or print 404
48 if ($content)
49 {
50     // Deliver file
51     header('Status: 200 OK', true, 200);
52     header('HTTP/1.0 200 OK', true, 200);
53     header('Content-Type: image/jpeg');
54     header('Content-Transfer-Encoding: binary');
55     header('Content-Length: ' . strlen($content));
56     echo $content;
57     exit(); 
58 }
59 else
60 {
61     // This results in a 404
62     require_once('index.php');
63 }

note that the 404 display at the end works thanks to wordpress. if you have the script in another directory as your wordpress installation, or you are not using wordpress at all, you have to modify that part. just replace the require_once line by

1     header('Status: 404 File Not Found', true, 404);
2     header('HTTP/1.0 404 File Not Found', true, 404);
3     echo '<html><body>Error: invalid arguments or cannot retrieve picture.</body></html>';

or something similar.

ok. so let us assume that you put the script somewhere on the server, say at /tubeimage.php as i did. then, you need to insert the following javascript code into the header of your html files. in case you use wordpress, you can write a plugin to do that for you (i did that), or modify the wp-content/themes/yourtheme/header.php file, where you replace yourtheme by the name of the theme you’re using. add the following lines:

 1 <script type="text/javascript">//<![CDATA[
 2 function youtubeClick(id, w, h)
 3 {
 4     elt = document.getElementById('youtube-' + id);
 5     elt.innerHTML = '<object class="youtube-object type="application/x-shockwave-flash" data="' + id + '&amp;rel=0&amp;border=0&amp;autoplay=1" width="' + w + '" height="' + h + '"><param name="movie" value="' + id + '&amp;rel=0&amp;border=0&amp;autoplay=1" /></object></div>';
 6     elt.onclick = null;
 7 }
 8 function vimeoClick(id, w, h)
 9 {
10     elt = document.getElementById('vimeo-' + id);
11     elt.innerHTML = '<object class="youtube-object" type="application/x-shockwave-flash" data="' + id + '&amp;;show_title=1&amp;show_byline=1&amp;show_portrait=1&amp;color=00ADEF&amp;fullscreen=1&amp;autoplay=1&amp;loop=0" width="' + w + '" height="' + h + '"><param name="movie" value="' + id + '&amp;;show_title=1&amp;show_byline=1&amp;show_portrait=1&amp;color=00ADEF&amp;fullscreen=1&amp;autoplay=1&amp;loop=0"/></object>';
12     elt.onclick = null;
13 }
14 //]]></script>

note that the <![CDATA[...]] is the xhtml way of hiding javascript; if you’re using plain old html, you need to use <!– … –>.

now, you can add youtube or vimeo videos to your page as follows. for example, the video above was embedded as follows: <span id="vimeo-24456787" onClick="vimeoClick('24456787', 425, 341)"><a href="" rel="external" onclick="return false;"><img src="/tubeimage.php?id=24456787&amp;source=vimeo" width="425" height="341" /></a></span> note that 24456787 is the id of the vimeo video (and appears at four places in this snippet), and the player has size 425x341 (appears at two places in this snippet). for youtube videos, you write <span id="youtube-LFtSRR0xFec" onClick="youtubeClick('LFtSRR0xFec', 425, 341)"><a href=" rel="external" onclick="return false;"><img src="/tubeimage.php?id=LFtSRR0xFec&amp;source=youtube" width="425" height="341" /></a></span> here, LFtSRR0xFec is the id of the youtube video (and appears at four places in this snippet), and the player has size 425x341 (appears at two places in this snippet). if you are using wordpress, it is a good idea to create a plugin which automatically generates all the necessary code, so you can just write something like [youtube id="LFtSRR0xFec" width="425" height="341"] in your posts.

many websites leak information. for example, most websites including a facebook like button make your browser tell facebook “hey, i’m visiting this page”. some for google and their +1 button, and for most other of these fancy little “web 2.0 buttons” you can find nowadays at many places of the web. or they make your browser feed google analytics with your data, or any other web tracking service. and usually, you don’t notice any of this, as it appears hidden in the background.
in this post, i want to shed a bit more light on these things and their (possible) consequences, and tell a bit on how to avoid these problems both as a user and as a webmaster. in the beginning of spielwiese, i already wrote a little bit about this here, and i also mentioned the problem while writing about social networks. you might want to look at the first post if you don’t really know what is happening if you access a webpage with your browser.
in the examples below, i’m assuming that standard javascript and cookie settings are used.

what’s going on?

assume that some website you are visiting includes the (standard) facebook like button (via as soon as you access that page, its html code will contain

<iframe src=”…” scrolling=”no” frameborder=”0″ style=”border:none; width:450px; height:80px”></iframe>

your browser automatically accesses… to retrieve the content from that url and essentially puts it into the place of the iframe. moreover, if you’re a facebook user, and you are logged in, your browser will have cookies for * (containing your user id), which will be sent automatically with this http request. so at this point, without any javascript interaction, facebook already knows whether you are logged on, and who you are if that is the case. note that facebook could also set a cookie when none is already set, to be able to further track you. it seems like that is not the case, but if they would do, you probably won’t notice at all. now the html page sent by… includes several javascripts, which your browser will automatically execute, and which could sent more information like your screen resolution. facebook isn’t doing that, as far as i know, but they could with only very few people noticing, if at all.
another resource are web trackers, which try to gather statistical data for the webmaster who included them, but might also use this information for other things. these work similar to the like button: the user has to somehow include them. maybe as a 1x1 transparent pixel, or as a java script, or both. the pixel will ensure that basic data is sent even in case javascript is disabled or not available, as long as images are loaded. the java script will sent additional info making identification of the visitor easier, and with both accesses to the web server of the web tracker, cookies can be retrieved or set, allowing the web tracker to identify you along different sessions. they probably don’t know who you are, but they can distinguish you from your friends, even if you share your internet connection (and the same ip address!) with them and your computers are essentially identically configured.
but there also many other sources of leaking. for example, if a youtube video is included in the web page you’re accessing. in that case, your browser will ask for the flash player, and that retrieve a image from and, when you click the play button, will stream and play the video from other youtube servers. again, a lot of things can be leaked. if the flash plugin is disabled, this won’t happen, but most people have that one installed and active as otherwise, many sites will not work properly.

how to prevent your browser from leaking.

the radical solution is to disable javascript, disable cookies and disable all plugins like the flash plugin. but that still doesn’t solve the problem that basic http access data (browser id string, referrer, your ip address) is sent to certain included sites, for example if they are included using iframes or as images (maybe even transparent of size 1x1 so you won’t notice). so without add-ons, standard browsers can always be made to leak something.
for firefox, there are several helpful addons. two very helpful ones are the following:

  • noscript. this addon allows to say from which sites javascripts are allowed to run and from which not. unfortunately, this does not depend on the source site, so if you allow the facebook javascripts to work (which you need if you access facebook itself), they are also allowed to run if another site includes the like button.
  • requestpolicy. this addon allows to block access from sites to other sites, for all kind of requests (loading images, scripts, even loads made by the flash plugin). as the blocking is by default depening on both source and destination of the access, this block access to from any other site but, hence making it impossible for facebook to track you except if you allow a site to (temporarily or always from now on) load content from

note that both plugins require a lot of user interaction. most websites won’t work properly, and too many will look very ugly in case you don’t allow certain scripts to run or certain data to be fetched from different servers. it is annoying to find out which things you have to active without giving too much access, and can be very frustrating. but after some time, you’ll have the sites you’re using most set up properly, and most things you do in the web run without any more interaction. visits to new sites, though, are still adventurous.

how to prevent your page from making leak.

first, a few words why you should try not to leak. the first is trust: the visitors of your page trust you. in particular, they usually don’t want that you send their information to not perfectly trustable other sites. and then, there are users who block such things, like me. if i access your page and you rely, say, on google analytics to track and count your users, you won’t be able to see me. i’ll be missing in your statistics, even though i accessed your page. (and since i’m advanced enough to use addons to see where my data is supposed to be sent, i know how much you care about my data, and how much i can trust you.) so i do appreciate if sites do not leak information.
there are two basic strategies to avoid leaking:

  • include external things only when the user needs them. for example, you could use javascript yourself to display a decoy version of the facebook like button or the google+ +1 button (stored on your server!), and as soon as the user clicks on it, your script loads the real button and the associated javascripts and forwards your click to them. or you just use a decoy version of the button (again, stored on your server!) and just link it, for example to a facebook url which will then allow the facebook user to share your page. for example, instead of the like button described above, you could do something like this:
    <a href=’ of your site’ target=’_blank’ title=’Share!’><img alt=’Share!’ src=’images/facebook.gif’ /></a>

    here, images/facebook.gif should obviously be a image on your server.

  • act as a proxy. if you want to show information, like how many people like your page, show some faces who like it, show some statistics (how many users are online right now etc.), the usual solution is to include some javascript/iframe from the content provider (facebook, your web tracker, …), so the user’s browser will access the data directly from that provider. with the drawback that the provider knows that the user asked. (which is necessary for web trackers to work, but not for showing the numer of people who like a page.) there are also things which indicate your online status on skype, icq, or other services, which you can include in your page and which make the user’s browser access some other site. for most instances, one can avoid this problem by adding some kind of proxy: instead of making the user retrieve some (automatically generated) image, you link to a script on your page which retrieves the image from the provider’s server and forward it to the user. then the provider just sees that your server is asking for the picture, while the user still sees the information on the picture without giving information to the provider.

note that both strategies give you (more or less) extra work. especially setting up a proxy script on your server is very non-trivial, if you cannot just use something publicly available. and more importantly, if you do not have good enough access to the server – for example if you have a blogspot blog, or a wordpress blog running on – you are severely limited in what you can do. also note that you can combine the strategies. for example, if you include many youtube videos, and you use a javascript to only start the flash player when the user clicks on it, you could use the proxy strategy to let your server automatically retrieve the picture from which is shown until the user clicks. [edit: this post describes how this can be done. both spielwiese and musikwiese now use this technique.] note that it is also a good idea to check your own site for leaking, sometimes you’ll get surprises as certain plugins for wordpress for example include javascripts from random places, sometimes completely unnecessarily. for that, you can use firefox with noscript and requestpolicy installed (if you don’t like the addons, disable them but install them anyway, to be able to enable them from time to time to test your own site) and browse your site. in the firefox status bar, you’ll see a flag for requestpolicy; if it is red, then requestpolicy blocks something. click the flag to see what is blocked, and to allow (or block) certain destinations. you can also use this (as well as noscript) to test which (external) javascripts are needed for your page to be usable.
you have to decide for yourself how much data you make your visitors leak. and remember, most visitors appreciate that you don’t spread their data unnecessarily. and some visitors can even check what you’re doing, and know how much information you (try to) make them leak.