# implementing multivariate polynomials in c++ using templates [part 3/5].

welcome to the third part of the multivariate polynomials in c++ using templates series. this part will explain how to compute partial derivatives using partial template specialization tricks.

the idea is very simple. given a polynomial p in R[X_0, …, X_{n-1}], written as p = ∑_i p_i X_0^i with coefficients p_i in R[X_1, …, X_{n-1}], we have ∂p/∂X_0 = ∑_{i≥1} i p_i X_0^{i-1} and, for N > 0, ∂p/∂X_N = ∑_i (∂p_i/∂X_N) X_0^i. this allows us to implement partial derivatives iteratively. we assume that the index of the indeterminate with respect to which one differentiates is already known at compile time. the implementation will be done by a templated function

```cpp
template<int N, int n, class T>
inline poly<n, T> derivative(const poly<n, T> & p);
```


the purpose of the template parameter N is to say that ∂p/∂X_N should be computed. given a polynomial p, the function can be called simply by derivative<N>(p). to implement derivative<N>(p), we would like to again use partial template specialization to distinguish between the cases N == 0 and N > 0. unfortunately, partial template specialization is not available for function templates. hence, we need to introduce a helper class template:

```cpp
template<int N, int n, class T>
class derivative_computer;
```


it should contain one static function, static inline void computeDerivative(const poly<n, T> & src, poly<n, T> & dest);, which computes the partial derivative of the polynomial in src and stores it into dest. then derivative<N>(p) can simply be implemented by

```cpp
template<int N, int n, class T>
inline poly<n, T> derivative(const poly<n, T> & p)
{
    poly<n, T> res;
    derivative_computer<N, n, T>::computeDerivative(p, res);
    return res;
}
```


now let us focus on the class template derivative_computer<N, n, T>. the general implementation, for N > 0, uses the second formula from above and iteratively calls derivative_computer<N-1, n-1, T>::computeDerivative():

```cpp
template<int N, int n, class T>
class derivative_computer
{
public:
    static inline void computeDerivative(const poly<n, T> & src, poly<n, T> & dest)
    {
        dest.d_value.resize(src.degree() + 1);
        for (int i = 0; i <= src.degree(); ++i)
            derivative_computer<N - 1, n - 1, T>::computeDerivative(src[i], dest[i]);
        dest.normalize();
    }
};
```


now we are left with the case that N = 0. the implementation here is also simple:

```cpp
template<int n, class T>
class derivative_computer<0, n, T>
{
public:
    static inline void computeDerivative(const poly<n, T> & src, poly<n, T> & dest)
    {
        dest.d_value.resize(src.degree() >= 0 ? src.degree() : 0);
        for (int i = 1; i <= src.degree(); ++i)
            dest[i - 1] = src[i] * (T)i;
        dest.normalize();
    }
};
```


here, we cast i to an object of type T before multiplication. from this point on, no further iteration is needed.

we are left with the case that some (malicious or just confused) user calls derivative<N>(p) for a polynomial p of type poly<n, T> with n <= N. in that case, the compiler tries to instantiate derivative_computer<k, 0, T>::computeDerivative() and to treat poly<0, T> as a polynomial. this will result in strange error messages, and it would be nice to have something more readable. for that reason, i added another two specializations:

```cpp
template<class T>
class derivative_computer<0, 0, T>
{
public:
    class ERROR_N_must_be_strictly_less_than_n_in_derivative_template;

    static inline void computeDerivative(const poly<0, T> &, poly<0, T> &)
    {
        ERROR_N_must_be_strictly_less_than_n_in_derivative_template();
    }
};

template<int N, class T>
class derivative_computer<N, 0, T>
{
public:
    class ERROR_N_must_be_strictly_less_than_n_in_derivative_template;

    static inline void computeDerivative(const poly<0, T> &, poly<0, T> &)
    {
        ERROR_N_must_be_strictly_less_than_n_in_derivative_template();
    }
};
```


these two definitions compile fine, but as soon as one of them is instantiated, the compiler tries to find out something about the class ERROR_N_must_be_strictly_less_than_n_in_derivative_template, fails, and spits out an error message hopefully containing the string ERROR_N_must_be_strictly_less_than_n_in_derivative_template. for example, if i replace the line computing the partial derivative of the intersection equation in poly-test.cpp by std::cout << derivative<2>(l) << "\n";, g++ will complain as follows:

```
In file included from poly-test.cpp:2:0:
polynomial.hpp: In static member function ‘static void derivative_computer<N, 0, T>::computeDerivative(const poly<0, T>&, poly<0, T>&) [with int N = 1, T = double]’:
polynomial.hpp:1200:13:   instantiated from ‘static void derivative_computer<N, n, T>::computeDerivative(const poly<n, T>&, poly<n, T>&) [with int N = 2, int n = 1, T = double]’
polynomial.hpp:1247:5:   instantiated from ‘poly<n, T> derivative(const poly<n, T>&) [with int N = 2, int n = 1, T = double]’
poly-test.cpp:19:33:   instantiated from here
polynomial.hpp:1227:9: error: invalid use of incomplete type ‘struct derivative_computer<1, 0, double>::ERROR_N_must_be_strictly_less_than_n_in_derivative_template’
polynomial.hpp:1223:11: error: declaration of ‘struct derivative_computer<1, 0, double>::ERROR_N_must_be_strictly_less_than_n_in_derivative_template’
make: *** [poly-test.o] Error 1
```


this is not exactly readable, but the uppercase ERROR quickly lets you look at the important part of that message. (note that i removed the allocators from the error messages, so that it is in sync with the above writing. if you try it out yourself with the source code, you'll see a slightly more complicated error message.)

this completes part three of the series. in the next part, i will discuss how to evaluate polynomials efficiently using templates.

# implementing multivariate polynomials in c++ using templates [part 2/5].

welcome to the second part of the multivariate polynomials in c++ using templates series. this part will explain how to represent multivariate polynomials via templates, and how to implement storage and operations.

### multivariate polynomial rings in mathematics.

as you might recall from your algebra courses, R[X_0, …, X_{n-1}] denotes the set (a ring, in case R itself is a ring) of multivariate polynomials in the indeterminates (also called variables) X_0, …, X_{n-1} with coefficients in R. note that i number the indeterminates beginning with zero, which is rather intuitive for a c++ programmer, though often not for a mathematician. there are different ways to treat such polynomials; for example, one can work with multiindices i = (i_0, …, i_{n-1}) ∈ ℕ^n; for such an i, let c_i be an element of R and define X^i = X_0^{i_0} ⋯ X_{n-1}^{i_{n-1}}; then a polynomial can be represented as ∑_i c_i X^i. clearly, one does not want to work with infinite objects, whence one restricts i to a finite subset of ℕ^n, for example one of the type {i : i_0 + ⋯ + i_{n-1} ≤ d}. this is one way of representing multivariate polynomials, and fine for several applications, but not the easiest one to implement in c++ as a template depending on n.

another way to treat multivariate polynomials is to iteratively write R[X_0, …, X_{n-1}] = (R[X_1, …, X_{n-1}])[X_0] (one could also use (R[X_0, …, X_{n-2}])[X_{n-1}], but i will stick to the former as it is more natural for the implementation i'm describing). hence, a polynomial is interpreted as a <i>univariate</i> polynomial in X_0 with coefficients in the ring R[X_1, …, X_{n-1}]. this allows a recursive definition which is well-suited for templates.

### implementing the data structure using templates.

now let us leave the realm of mathematics for a while and talk about c++. the ring mentioned above will be replaced by an abstract type T, which will be a template parameter. note that in the discussion below, we will ignore allocators, and just talk about them at the very end of this article. our main template type will be template<int n, class T> class poly;, which essentially contains a std::vector<poly<n - 1, T> > to store coefficients. (the storage is in fact more delicate, we will discuss this below.)

clearly, this is not enough yet. the compiler has no clue that we are talking about multivariate polynomials, that n should always be non-negative, and it also does not know that poly<0, T> is essentially an element of T. the way to solve this is partial template specialization: we create a specialized version of poly<n, T> for the case n == 0. the declaration is as follows:

```cpp
template<int n, class T>
class poly; // forward declaration

template<class T>
class poly<0, T> // specialization for n == 0
{
private:
    T d_value;
public:
    ...
};

template<int n, class T>
class poly // the general implementation
{
private:
    std::vector<poly<n - 1, T> > d_value;
public:
    ...
};
```


the specialization for n == 0 has a rather simple implementation. besides a constructor accepting an object of type T (with default value T()), it has a casting operator to T, provides an isZero() predicate, assignment and arithmetic operations, output to std::ostreams, and swapping of two poly<0, T> objects via std::swap(). it also has a simple “evaluate” function, implemented as operator(), which is essentially the same as the cast operator operator T().

the general implementation is more interesting. there is a method normalize() which ensures that d_value.back() is the leading coefficient and d_value.size() - 1 the degree of the polynomial; note that we use -1 for the degree of the zero polynomial. these conditions are used, for example, in the implementation of isZero(), leading() and degree(). for most operations, the polynomial itself makes sure it is normalized, but accessing a non-const polynomial with operator[] might change its degree. after that, one had better call normalize(). it would also be possible to do this automatically using more template tricks, but i wanted to leave some responsibility with the user of the template. :-)

the arithmetic operations are pretty straightforward to implement; one has to resize d_value and later call normalize() to make sure nothing bad happens. for multiplication, one creates a temporary which contains the product (the maximal degree of the product is known, but normalize() should be run afterwards since there is no guarantee that the ring containing the objects of type T is zero-divisor free). as an example, adding two polynomials is accomplished by the following operator:

```cpp
    inline poly operator + (const poly & q) const
    {
        poly r(*this);
        if (r.d_value.size() < q.d_value.size())
            r.d_value.resize(q.d_value.size());
        for (unsigned i = 0; i < q.d_value.size(); ++i)
            r[i] += q[i];
        r.normalize();
        return r;
    }
```


first, a copy of the first argument (*this) is made. then, its size is adjusted to make sure there's enough room. then, simply all coefficients of the second argument are added, and finally the result is normalize()d and returned. as another example, here's polynomial multiplication:

```cpp
    inline poly operator * (const poly & p) const
    {
        int d = p.degree() + degree();
        if (d < -1)
            d = -1;
        poly r(true, d + 1);
        if (!isZero() && !p.isZero())
            for (unsigned i = 0; i < r.d_value.size(); ++i)
                for (unsigned j = 0; j < d_value.size(); ++j)
                    if (j <= i && i < j + p.d_value.size())
                        r[i] += d_value[j] * p[i - j];
        r.normalize();
        return r;
    }
```


note that as we are inside the template, we can write poly instead of poly<n, T>. moreover, here, use is made of a special constructor poly(bool, unsigned size) which initializes the polynomial with size coefficients. the bool is just to ensure that the compiler won't confuse this constructor with any of the others. this constructor is made private.

printing the polynomial to a string is slightly more involved, as the correct names should be used for the indeterminates. for this, i use a method void print(std::ostream & s, int N = n) const which keeps track of the maximal number of indeterminates. the operator<< is then implemented as follows:

```cpp
    friend inline std::ostream & operator << (std::ostream & s, const poly & p)
    {
        p.print(s);
        return s;
    }
```


the implementation of print(...) works recursively, and N keeps track of n of the top-level instance. the indeterminate at the current level has index N - n.

finally, let me announce evaluation of polynomials. this will be done with operator(). the implementation inside poly<n, T> is very easy:

```cpp
    template<class S>
    inline poly_evaluator<n, T, S> operator() (const S & x) const
    {
        return poly_evaluator<n, T, S>(*this, x);
    }
```


the interesting part, namely the implementation of the poly_evaluator<n, T, S> template, will be discussed in part four of this series.

### making the storage more intelligent.

in the above code, we use a simple std::vector<poly<n - 1, T> > to store the coefficients, which are (except for n == 1) themselves polynomials. for example, a simple resize of d_value in poly<11, T> may cause many objects of type poly<10, T> to be created, copied and destroyed, which in turn may cause many more objects of type poly<9, T> to be created, copied and destroyed, and so on. this is not exactly optimal.

one solution is to replace std::vector<poly<n, T> > by std::vector<poly<n, T>*>, i.e. to store pointers to polynomials instead of the polynomials themselves. this makes resizes very efficient, since only pointers have to be moved around. unfortunately, it has a drawback: in the case n == 1 with T == int, we would be storing pointers to ints, which are usually at least the same size as the ints themselves. this leads to inefficient code. one solution is to introduce another partial specialization for n == 1, but this results in a lot of duplicated code.

a more clever solution is to create another template, like template<class T, bool usePointers> class poly_ivector;, which stores pointers to T and makes sure objects are properly created and destroyed in case usePointers == true, and which is simply std::vector<T> in disguise in case usePointers == false. it provides a simple interface: querying the number of stored elements (unsigned size() const), resizing the array (void resize(unsigned)), accessing elements via operator[], and accessing the last element via back(). finally, it offers a void swap(poly_ivector<T, usePointers> &) method to swap the contents of two poly_ivector<T, usePointers> instances. if you are interested in the implementation of poly_ivector<T, bool>, please look at the source code.

### allocators.

in the real implementation, i've added basic support for allocators. these allow providing custom allocation functions (malloc/free) for the memory used to store the coefficients. note that my template does not work perfectly with stateful allocators, but for stateless allocators, it works fine. if you need to use it with stateful allocators, better check whether the code does exactly what you want.

i used the polynomial class with a self-written allocator which provides thread-local memory management. this was necessary for the aforementioned toy project, for which i wanted to do massively parallel computation (128 threads). it turned out that most threads were waiting for others, as there was a lot of allocation/deallocation going on and the global memory manager, protected by a mutex, made everyone wait. after throwing in a not very efficient, but working memory manager with thread-local storage for the management data, the program ran perfectly in parallel.

# implementing multivariate polynomials in c++ using templates [part 1/5].

in this (short) series, i want to explain how to implement support for multivariate polynomials in c++ using templates. for a toy project i worked on mostly last weekend, i needed to work with polynomials in different ways:

• create a polynomial in three variables with double coefficients;
• compute the partial derivatives of this polynomial;
• evaluate the above polynomials in many points (with double coordinates);
• plug different univariate polynomials (of the form a + b·X_0) into the first polynomial to obtain a univariate polynomial whose zeros correspond to the intersection of the surface defined by the multivariate polynomial with a parameterized line;
• test whether a univariate polynomial has real zeros using sturm chains.

for a given, simple enough polynomial, everything can be done “by hand”, but sooner or later, this is annoying and too restrictive. and it gets really annoying for the second part, where i wanted to obtain the univariate polynomial. therefore, i thought about how to create a multivariate polynomial template which provides means to implement all of the above in an easy manner. in the end, i had a multivariate polynomial template with a nice interface, for which – as a small example – the following test program compiles:

```cpp
#include <iostream>
#include "polynomial.hpp"

int main()
{
    poly<2, int> f = Monomial<int>(1, 2) + 3 * Monomial<int>(4, 5);
    std::cout << f << "\n";
    int e = f(4)(2);
    std::cout << e << "\n";
    std::cout << derivative<1>(f) << "\n";
    poly<1, int> g = f(2), h = f(3);
    std::cout << g << " and " << h << "\n";
    std::cout << GCD<double>(g, h) << "\n";
    // cast polynomials to poly<1, double> first; otherwise,
    // result will be incorrect since int is not a field
    std::cout << GCD(g, h) << "\n";
    // to prove our point, check for yourself that the
    // result is X_0^5 instead of X_0^2
    poly<1, double> l = f(Monomial<double>(1) - 1)(Monomial<double>(1) - 2);
    std::cout << l << "\n";
    std::cout << derivative<0>(l) << "\n";
    std::cout << GCD(l, derivative<0>(l)) << "\n";
}
```


compiling it via g++ -O3 -Wall -o poly-test poly-test.cpp, it yielded the following output when run:

1 (1 X_1^2 X_0 + 3 X_1^5 X_0^4)
2 24592
3 (2 X_1 X_0 + 15 X_1^4 X_0^4)
4 (2 X_0^2 + 48 X_0^5) and (3 X_0^2 + 243 X_0^5)
5 1 X_0^2
6 1 X_0^5
7 (-100 + 632 X_0 + -1781 X_0^2 + 2905 X_0^3 + -3006 X_0^4 + 2043 X_0^5 + -912 X_0^6 + 258 X_0^7 + -42 X_0^8 + 3 X_0^9)
8 (632 + -3562 X_0 + 8715 X_0^2 + -12024 X_0^3 + 10215 X_0^4 + -5472 X_0^5 + 1806 X_0^6 + -336 X_0^7 + 27 X_0^8)
9 1


let me explain this example. first, a polynomial f in two indeterminates with int coefficients is created as the linear combination of the two monomials X_0 X_1^2 and X_0^4 X_1^5, which is the first line of the output. then the polynomial is evaluated at X_0 = 4, X_1 = 2, and the result is printed as line 2. after that, the partial derivative of the polynomial with respect to X_1 is computed and written as line 3. after that, the polynomial is evaluated at X_0 = 2 and X_0 = 3, yielding two new polynomials g and h which are printed in line 4. in line 5, the monic greatest common divisor of g and h is computed, for which the polynomials are treated as polynomials with double coefficients. that this is necessary is demonstrated by the sixth line, which shows the output when ints are used to compute the greatest common divisor: clearly, the result is wrong. then, in line 7 of the output, we substitute X_0 - 1 and X_0 - 2 into f, yielding a univariate polynomial l with double coefficients. line 8 of the output shows its derivative, and line 9 the monic greatest common divisor of l and its derivative.

note that the program is easy to read for people used to the notion of multivariate polynomials and c++.

in the second part of the series, i want to describe how to represent the polynomials. in the third part, i want to explain how partial derivatives are computed. the fourth part will describe how to do efficient evaluation, and the fifth part will explain the long division and euclidean algorithm implementations.

finally, the source code of the library polynomial.hpp, the test program poly-test.cpp and a simple makefile (using g++) can be downloaded as a zip file here (10 kb).

# how to create a simple javascript image viewer.

in this post, i want to explain how to create a simple neat javascript/css “fullscreen” image viewer, or more concretely, the one i implemented myself here on spielwiese. feel free to re-use the code. it is not guaranteed to work everywhere, but it works fine with firefox, konqueror and chromium/chrome, to my knowledge, and probably also with safari (as it is based on konqueror), and hopefully with opera and more modern versions of internet explorer. i didn’t include any hacks to make it work under older versions of internet explorer or other browsers, but feel free to enhance my version and add a link to yours in the comments here.
i assume that so far, you include your images using

<a href="path/to/image.jpeg"><img src="path/to/image-thumbnail.jpeg" /></a>.

this displays a thumbnail (called path/to/image-thumbnail.jpeg) and allows the user to click it, to be forwarded to the real image (called path/to/image.jpeg). maybe you also use an alt tag, or specify the size of the thumbnail. that doesn’t really matter.
the idea is to add a javascript onclick event to the <a> tag, which will become active once the link (i.e. the image) is clicked and show the “popup”. in case the user disabled javascript, everything will fall back to the old behaviour: the user will be redirected to the image and no harm is done by the addition of the new code. the above html code has to be modified as follows:

<a href="path/to/image.jpeg" onclick="return showPicture('path/to/image.jpeg')"><img src="path/to/image-thumbnail.jpeg" /></a>

the addition is the onclick attribute. the drawback of this method is that you have to replicate the image’s name, but then, you already had to do that for the thumbnail, and if you have automated the whole process, changing your script to insert this part shouldn’t be any problem.
so now let me show you how to implement the showPicture() function.

 1 <script type="text/javascript">
2 //<![CDATA[
3 function showPicture(url) {
4   document.body.style.overflow = "hidden";
5   var el = document.createElement("div");
6   el.style.cssText = "background: black; color: white; position: fixed; left: 0px; top: 0px; width: 100%; height: 100%; text-align: center; z-index: 1000;";
7   el.onclick = Function("this.parentNode.removeChild(this); document.body.style.overflow = 'visible';");
8   var code = '<div style="width: 100%; height: 100%; display: table; position: static; overflow: hidden;"><div style="display: table-cell; vertical-align: middle; width:100%;">';
9   code += '<img onclick="this.parentNode.parentNode.parentNode.removeChild( this.parentNode.parentNode); document.body.style.overflow = \'visible\';" ';
10   code += 'style="max-width:' + window.innerWidth + 'px; max-height: ' + (window.innerHeight - 20) + 'px;" src="' + url + '" /><br />';
11   code += 'click the photo to return to the blog. click <a href="' + url + '">here</a> to go to the image file. copyright &copy; by whoever.';
12   code += '</div></div>';
13   el.innerHTML = code;
14   document.documentElement.appendChild(el);
15   return false;
16 }
17 //]]>
18 </script>


let me explain what it does. lines 1, 2, 17 and 18 tell your browser that there’s a javascript coming up in an xhtml compatible way; if you are using plain html (instead of xhtml), you might want to change lines 2 and 17. now let’s continue with the main function, showPicture(), expecting one parameter called url.
first, the function disables scrolling for the main document (line 4), as otherwise the user could scroll around while viewing the picture (which won’t move). then it creates a new <div> tag (line 5) and eventually adds it as a child to the document root (line 14). the tag is equipped with a css style (line 6) telling the browser to display it at a position fixed relative to the browser’s viewport, namely in the upper left corner with the size equal to the viewport size – meaning it covers the whole viewport. it is colored in black, with text in it being centered and written in white. then, in line 7, an onclick event is set for the tag. it simply waits until the tag is clicked (or anything inside that tag, to be precise) and removes the tag, which closes the “fullscreen” view. it also re-enables scrolling of the document (which was disabled in line 4).
the lines 8 to 12 build up the html code which will be put into the <div> tag in line 13. in line 8, we create two <div> tags which will ensure vertical centering in sane browsers. lines 9 and 10 insert the image specified in url; here, i set the max-width and max-height parameters to essentially the viewport size, which ensures that the image will be rescaled to fit into the viewport. i subtract a bit from the vertical size so there’s some space for the “disclaimer”. the “disclaimer” is added in line 11; it tells the user how to get rid of the fullscreen view, provides a link to the image file, and shows a copyright notice.
you can adjust the code as you like, for example you can change the colors and the disclaimer. adding keyboard support might also be nice, so you can get rid of the viewer by pressing escape. i guess it isn’t too hard to do; i’ll probably do that sometime later. you can also move all the formatting into your .css file, by replacing the el.style.cssText = "…"; by el.className = "imageviewer"; or however you want to call the css class for that <div>.
posted in: www

# a simple bash based scheduler, part two.

i recently presented a bash script which schedules computational tasks on multi-core machines. in the meantime, i fixed a bug in the display, made the program more flexible, and started to use local variables instead of only global ones. the new version is also more intelligent: it tries to adjust the running times of its controlled processes so that they do not drift far apart.

```bash
#!/bin/bash

initProfile() {
    PROFILEFN=bigprimerunner-$PROFILE.profile
    CORES=$(grep "^CORES " $PROFILEFN)
    CORES=${CORES/CORES }
    STARTUP=$(grep "^STARTUP " $PROFILEFN)
    STARTUP=${STARTUP/STARTUP }
    eval STARTUP=$STARTUP
}

# Startup
if [ "$1" != "" ]
then
    PROFILE=$1
else
    PROFILE=$(hostname)
fi
if [ "$2" != "" ]
then
    LOADMODIFIER=$2
fi
initProfile
if [ "$CORES" == "" ]
then
    echo "Cannot load profile $PROFILEFN!"
    exit
fi
echo Cores: $CORES
echo Load modifier: $LOADMODIFIER

# The command to execute
COMMAND=primefinder

computeFreecores() {
    FREECORES=0
    local DAY=$(date +%w)
    local LINE=$(grep "^$DAY " $PROFILEFN)
    local LINE=${LINE/$DAY }
    local HOUR=$(date +%k)
    for ((i=0;i<$HOUR;++i));
    do
        local LINE=${LINE#* }
    done
    local LINE=${LINE/ *}
    eval FREECORES=$LINE
    # Also determine how many jobs should be started
    STARTUP=$(grep "^STARTUP " $PROFILEFN)
    STARTUP=${STARTUP/STARTUP }
    eval STARTUP=$STARTUP
}

killProcess() { # One argument: PID of process to kill
    local PID=$1
    local FILE=$(lsof -p $PID -F n 2>/dev/null | grep primedatabase | grep -v "\.nfs")
    kill $PID 2> /dev/null
    local A=${FILE#n*}
    local A=${A/ (nfs*}
    if [ "$A" != "" ]
    then
        rm $A
        echo Killed $PID with open file $A
    else
        echo Killed $PID with no open file
    fi
}

stopsignal() {
    local PIDS=$(jobs -p)
    echo
    echo
    echo Terminating...
    echo Killing: $PIDS
    for PID in $PIDS;
    do
        killProcess $PID
    done
    echo done.
    exit
}

trap 'stopsignal' 2

computeFreecores

echo "Starting $STARTUP instances (in $BINDIR)"

filterRunning() { # Removes all PIDs from the arguments which are currently stopped
    ps -o pid= -o s= $* | grep R | sed -e "s/R//"
}

filterStopped() { # Removes all PIDs from the arguments which are currently running
    ps -o pid= -o s= $* | grep T | sed -e "s/T//"
}

determineToAdd() {
    computeFreecores
    local LOAD=$(uptime)
    local LOAD=${LOAD#*average: }
    local LOAD=${LOAD/,*}
    local LOAD=${LOAD/.*}
    ADD=$[CORES-FREECORES-(LOAD+LOADMODIFIER)]
    local JOBS=$(jobs -p)
    local JOBS=$(filterRunning $JOBS)
    echo "Load: $[LOAD+LOADMODIFIER], Intended number of free cores: $FREECORES, Running: $(echo $JOBS | wc -w), Started: $(jobs -p | wc -l) (should be $STARTUP)"
}

continueOne() {
    local JOBS=$(jobs -p)
    local JOBS=$(filterStopped $JOBS)
    if [ "$JOBS" != "" ]
    then
        local PID=$(ps -o pid= --sort +time $JOBS | head -1)
        echo Continuing $PID...
        kill -SIGCONT $PID
    fi
}

stopOne() {
    local JOBS=$(jobs -p)
    local JOBS=$(filterRunning $JOBS)
    if [ "$JOBS" != "" ]
    then
        local PID=$(ps -o pid= --sort -time $JOBS | head -1)
        echo Stopping $PID...
        kill -SIGSTOP $PID
    fi
}

killOne() {
    local JOBS=$(jobs -p)
    if [ "$JOBS" != "" ]
    then
        local PID=$(ps -o pid= --sort -time $JOBS | head -1)
        killProcess $PID
    fi
}

launchOne() {
    echo "Launching \"$COMMAND\"..."
    $COMMAND &
    sleep 1.5
}

computeTotaltimeInSecs() {
    # Input: $1
    # Output: $TOTALSECS
    local I=$1
    local SECS=${I##*:}
    local REST=${I%:*}
    local MINS=${REST##*:}
    local REST=${REST%:*}
    local HOURS=${REST##*-}
    local DAYS=$(expr "$REST" : '\([0-9]*-\)')
    local DAYS=${DAYS%-}
    if [ "$DAYS" == "" ]
    then
        local DAYS=0
    fi
    if [ "$HOURS" == "" ]
    then
        local HOURS=0
    fi
    if [ "$MINS" == "" ]
    then
        local MINS=0
    fi
    echo "((($DAYS * 24) + $HOURS) * 60 + $MINS) * 60 + $SECS" | bc
}

adjustProcesses() {
    local JOBS=$(jobs -p)
    local JOBS=$(filterRunning $JOBS)
    if [ "$JOBS" != "" ]
    then
        local STOPPID=$(ps -o pid= --sort -time $JOBS | head -1)
        local JOBS=$(jobs -p)
        local JOBS=$(filterStopped $JOBS)
        if [ "$JOBS" != "" ]
        then
            local CONTPID=$(ps -o pid= --sort +time $JOBS | head -1)
            # Compute times
            local I=$(ps -o time= $STOPPID)
            local STOPSEC=$(computeTotaltimeInSecs $I)
            local I=$(ps -o time= $CONTPID)
            local CONTSEC=$(computeTotaltimeInSecs $I)
            # Compare times
            local CT=$(echo $CONTSEC+60*5 | bc)
            if [ $STOPSEC -gt $CT ]
            then
                echo Stopping $STOPPID and continuing $CONTPID
                kill -SIGSTOP $STOPPID
                kill -SIGCONT $CONTPID
            fi
        fi
    fi
}

# Start programs in the background
determineToAdd
for ((i=1;i<=STARTUP;++i));
do
    launchOne
    if [ $i -gt $ADD ]
    then
        sleep 1
        kill -SIGSTOP %$i
    fi
done

# Start mainloop
while [ 1 ]
do
    sleep 60

    # Determine how many processes should be added/removed
    determineToAdd

    # Stop/continue processes
    if [ $ADD -gt 0 ]
    then
        # Add processes
        echo ADD: $ADD
        for ((i=0;i<ADD;++i))
        do
            continueOne
        done
    fi
    if [ $ADD -lt 0 ]
    then
        REM=$[-ADD]
        # Remove processes
        echo REMOVE: $REM
        for ((i=0;i<REM;++i))
        do
            stopOne
        done;
    fi

    # Launch new processes or kill running ones
    CURRLAUNCHED=$(jobs -p | wc -l)
    if [ $STARTUP != $CURRLAUNCHED ]
    then
        if [ $STARTUP -lt $CURRLAUNCHED ]
        then
            echo kill: $STARTUP $CURRLAUNCHED
            for ((i=STARTUP;i<CURRLAUNCHED;++i));
            do
                killOne
            done;
        else
            echo add: $CURRLAUNCHED $STARTUP
            for ((i=CURRLAUNCHED;i<STARTUP;++i));
            do
                launchOne
            done;
        fi
    fi
    sleep 2

    # Adjust the running times of the controlled processes
    adjustProcesses
done
```
posted in: computer

# a simple bash based scheduler.

ever had the problem that you have access to a big machine (with many cores), and you want to run many (tens of thousands) small computations, but you want to make sure that not too many cores are used?
i’ve had this problem, and since i now have a pretty nice (i think) solution, i thought that maybe more people are interested in it. so here’s my setup. i have a program, let’s call it primefinder, which, for a certain input n (where n is a natural number ≤ 21000), computes a prime of n bits with special properties. the program loops over all possible n and checks for each n whether a file n.prime exists. if it does not, it creates it (with zero content), computes the prime (which can take between minutes and days), writes the prime into the file and continues with the next n. this simple task distribution technique allows me to run the program in parallel on different machines (since the files are in an nfs folder) with many instances on each machine. now at our institute, we have a big computation machine (64 cores) and four user machines (on which the users work, each with 32 cores). since the user machines are often not used intensively (and then only during certain times of the day), i want to use these as well. but enough cores should be left free, so that the users won’t notice that there are computations going on in the background. on the computation server, other people also want to run things, so some cores should stay free there too. optimally, my program would somehow determine how many cores are used by others and use the rest, or most of them, to leave some free, especially on the user machines.
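the claiming scheme described above can be sketched as follows (a simplified stand-in: the temporary directory and the placeholder computation are made up, only the check-then-touch-then-compute pattern is the point):

```shell
#!/bin/bash
# Sketch of the n.prime claiming loop described in the text.
# WORKDIR stands in for the shared nfs folder; the actual
# (possibly days-long) prime search is replaced by an echo.
WORKDIR=$(mktemp -d)
for ((n=2;n<=10;++n))
do
    FN=$WORKDIR/$n.prime
    if [ -e $FN ]
    then
        # some other instance has already claimed this input
        continue
    fi
    # claim the input by creating the file with zero content
    touch $FN
    # ...the real computation would run here...
    echo "computed prime for n=$n" > $FN
done
echo "handled $(ls $WORKDIR | wc -l) inputs"
```

note that checking for the file and creating it afterwards is not atomic; over nfs, two instances can occasionally claim the same n. the setup described above simply accepts the occasional duplicated computation.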
after a suggestion by our it guys, i started writing a bash script which controls the instances of my program on the same machine. the first version used the time of day to determine the number of processes. everything was computed in terms of the number of cores of the machine, the load (with a load modifier applied, since some machines have uninterruptible processes running which do not effectively do anything, and which won’t go away until the next reboot) and the hour of the day. but it is not easy to find a scheme which yields good results on all machines: something which works well on the user machines wastes processor time on the computation server.
so today i rewrote the script to use profiles. a profile contains information on the number of cores (this is necessary since the computation server has hyperthreading enabled, and thus reports twice the number of physical cores), the number of processes to be started, and the number of cores to be left free during each hour of each day of the week. so for weekends and nights, i choose lower numbers of free cores on the user machines, while for the computation server the number is always 1.
a profile can look like this (this is from a user machine, the file is called primefinderrunner-user.profile for later reference):

```
CORES 32
STARTUP $[CORES-CORES/8]
0 $[CORES/16] $[CORES/16] $[CORES/16] $[CORES/16] $[CORES/16] $[CORES/16] $[CORES/16] $[CORES/16] $[CORES/8] $[CORES/8] $[CORES/8] $[CORES/8] $[CORES/8] $[CORES/8] $[CORES/8] $[CORES/8] $[CORES/8] $[CORES/8] $[CORES/8] $[CORES/8] $[CORES/8] $[CORES/8] $[CORES/8] $[CORES/8]
1 $[CORES/16] $[CORES/16] $[CORES/16] $[CORES/16] $[CORES/16] $[CORES/16] $[CORES/16] $[CORES/16] $[CORES/2] $[CORES/2] $[CORES/2] $[CORES/2] $[CORES/2] $[CORES/2] $[CORES/2] $[CORES/2] $[CORES/2] $[CORES/2] $[CORES/2] $[CORES/2] $[CORES/4] $[CORES/4] $[CORES/4] $[CORES/8]
2 $[CORES/16] $[CORES/16] $[CORES/16] $[CORES/16] $[CORES/16] $[CORES/16] $[CORES/16] $[CORES/16] $[CORES/2] $[CORES/2] $[CORES/2] $[CORES/2] $[CORES/2] $[CORES/2] $[CORES/2] $[CORES/2] $[CORES/2] $[CORES/2] $[CORES/2] $[CORES/2] $[CORES/4] $[CORES/4] $[CORES/4] $[CORES/8]
3 $[CORES/16] $[CORES/16] $[CORES/16] $[CORES/16] $[CORES/16] $[CORES/16] $[CORES/16] $[CORES/16] $[CORES/2] $[CORES/2] $[CORES/2] $[CORES/2] $[CORES/2] $[CORES/2] $[CORES/2] $[CORES/2] $[CORES/2] $[CORES/2] $[CORES/2] $[CORES/2] $[CORES/4] $[CORES/4] $[CORES/4] $[CORES/8]
4 $[CORES/16] $[CORES/16] $[CORES/16] $[CORES/16] $[CORES/16] $[CORES/16] $[CORES/16] $[CORES/16] $[CORES/2] $[CORES/2] $[CORES/2] $[CORES/2] $[CORES/2] $[CORES/2] $[CORES/2] $[CORES/2] $[CORES/2] $[CORES/2] $[CORES/2] $[CORES/2] $[CORES/4] $[CORES/4] $[CORES/4] $[CORES/8]
5 $[CORES/16] $[CORES/16] $[CORES/16] $[CORES/16] $[CORES/16] $[CORES/16] $[CORES/16] $[CORES/16] $[CORES/2] $[CORES/2] $[CORES/2] $[CORES/2] $[CORES/2] $[CORES/2] $[CORES/2] $[CORES/2] $[CORES/2] $[CORES/2] $[CORES/2] $[CORES/2] $[CORES/4] $[CORES/4] $[CORES/4] $[CORES/8]
6 $[CORES/16] $[CORES/16] $[CORES/16] $[CORES/16] $[CORES/16] $[CORES/16] $[CORES/16] $[CORES/16] $[CORES/8] $[CORES/8] $[CORES/8] $[CORES/8] $[CORES/8] $[CORES/8] $[CORES/8] $[CORES/8] $[CORES/8] $[CORES/8] $[CORES/8] $[CORES/8] $[CORES/16] $[CORES/16] $[CORES/16] $[CORES/16]
```

the line with prefix CORES gives the number of cores.
the line prefixed by STARTUP gives the number of processes to run (at most); here, we use 7/8 of the number of cores. the lines prefixed by a number between 0 (sunday) and 6 (saturday) have 24 entries following: every entry (separated by exactly one space, as the prefix itself is separated by exactly one space from the entries!) says how many cores should be left free during that hour of the day. usually during the night (up to 7 am) at least 1/16 of the total number of cores should be free, while during the workday (8 am to 7 pm) half of the cores should be free. of course, the numbers are different for the weekend (saturday and sunday) than for the working days.

now the script itself looks like this (for reference, the filename is primefinderrunner.sh):

```bash
#!/bin/bash

initProfile() {
    PROFILEFN=primefinderrunner-$PROFILE.profile
    CORES=$(grep "^CORES " $PROFILEFN)
    CORES=${CORES/CORES }
    STARTUP=$(grep "^STARTUP " $PROFILEFN)
    STARTUP=${STARTUP/STARTUP }
    eval STARTUP=$STARTUP
}

LOADMODIFIER=0
if [ "$1" != "" ]
then
    PROFILE=$1
else
    PROFILE=$(hostname)
fi
if [ "$2" != "" ]
then
    LOADMODIFIER=$2
fi
initProfile
if [ "$CORES" == "" ]
then
    echo "Cannot load profile $PROFILEFN!"
    exit
fi
echo Cores: $CORES
echo Load modifier: $LOADMODIFIER

computeFreecores() {
    # determines the current day (0..6) and hour (0..23) and
    # reads the corresponding FREECORES entry from the profile
    FREECORES=0
    DAY=$(date +%w)
    LINE=$(grep "^$DAY " $PROFILEFN)
    LINE=${LINE/$DAY }
    HOUR=$(date +%k)
    for ((i=0;i<$HOUR;++i));
    do
        LINE=${LINE#* }
    done
    LINE=${LINE/ *}
    eval FREECORES=$LINE
}

computeFreecores

stopsignal() {
    for PID in $(jobs -p);
    do
        FILE=$(lsof -p $PID -F n 2>/dev/null | grep primedatabase | grep -v "\\.nfs")
        A=${FILE#n*}
        A=${A/ (nfs*}
        echo killing $PID with open file $A
        rm $A
        kill $PID
    done
    exit
}

trap 'stopsignal' 2

echo "Starting $STARTUP instances"

determineToAdd() {
    computeFreecores
    LOAD=$(uptime)
    LOAD=${LOAD#*average: }
    LOAD=${LOAD/,*}
    LOAD=${LOAD/.*}
    ADD=$[CORES-FREECORES-LOAD-LOADMODIFIER]
    echo Load: $[LOAD-LOADMODIFIER], Intended number of free cores: $FREECORES
}

# Start programs in the background
determineToAdd
for ((i=1;i<=STARTUP;++i));
do
    primefinder &
    sleep 2
done
sleep 20
if [ $ADD -lt 0 ]
then
    ADD=0
fi
for ((i=ADD+1;i<=STARTUP;++i));
do
    kill -SIGSTOP %$i
done

CURRRUNNING=$ADD
RUNNINGSTART=1 # The first one running
RUNNINGSTOP=$CURRRUNNING # The last one running

startOne() {
    # Assume that $CURRRUNNING < $STARTUP
    RUNNINGSTOP=$[(RUNNINGSTOP % STARTUP) + 1]
    kill -SIGCONT %$RUNNINGSTOP
    CURRRUNNING=$[CURRRUNNING+1]
}

stopOne() {
    # Assume that $CURRRUNNING > 0
    kill -SIGSTOP %$RUNNINGSTART
    RUNNINGSTART=$[(RUNNINGSTART % STARTUP) + 1]
    CURRRUNNING=$[CURRRUNNING-1]
}

# Start mainloop
while [ 1 ]
do
    sleep 60

    # Determine how many processes should be added/removed
    determineToAdd

    if [ $ADD -gt 0 ]
    then
        if [ $[ADD+CURRRUNNING] -gt $STARTUP ]
        then
            ADD=$[STARTUP-CURRRUNNING]
        fi
        echo ADD: $ADD
        for ((i=0;i<ADD;++i))
        do
            startOne
        done
    fi
    if [ $ADD -lt 0 ]
    then
        REM=$[-ADD]
        # Clip
        if [ $REM -gt $CURRRUNNING ]
        then
            REM=$CURRRUNNING
        fi
        # Remove processes
        echo REMOVE: $REM
        for ((i=0;i<REM;++i))
        do
            stopOne
        done
    fi
    sleep 60
done
```
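the computeFreecores function selects the entry for the current hour by peeling off one space-separated field per hour with ${LINE#* }; here is the trick in isolation, with made-up values:

```shell
#!/bin/bash
# Standalone demo of the field selection used in computeFreecores.
LINE="4 4 4 4 2 2 2 1"   # made-up per-hour entries (hour 0, 1, 2, ...)
HOUR=4
for ((i=0;i<HOUR;++i))
do
    LINE=${LINE#* }      # drop the first field together with its space
done
LINE=${LINE/ *}          # cut everything after the (new) first field
echo $LINE               # prints 2, the entry for hour 4
```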


the script first starts all instances, then stops those which are too many, and then enters the main loop. in the main loop, it waits 60 seconds (so that the load average can adjust to the new process count), and then decides how many cores should be left free and what that means for the number of processes (adding or removing some). note that the profile file is read every minute, so it can be changed at any time without re-running the whole thing.
in case the script is stopped (with control+c), all primefinder processes are killed and their open files are deleted. to determine the open file of each process, i use lsof with some greps. you have to adjust and test that line before using this script!
note that this script is quite a hack, and far from perfect. it is also somewhat system dependent, or at least “setup dependent”, since it makes certain assumptions about the executables, about what the output of lsof looks like, and so on. so better make sure it works before you use it, especially on bigger systems. also note that in the beginning, all instances are run (started with a two second delay between consecutive instances), and everything runs for 20 seconds before the first adjustment (i.e. stopping the processes which are too many) is made. if you share the system with other people, this might already annoy them when they try to measure timings of their programs (especially if hyperthreading is enabled).
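the load extraction in determineToAdd can be tried on a canned string (the uptime output below is made up; the real script calls uptime directly):

```shell
#!/bin/bash
# Demo of extracting the integer part of the 1-minute load average,
# exactly as determineToAdd does, from a canned uptime string.
UP=" 14:23:01 up 12 days,  3:42,  5 users,  load average: 3.47, 2.10, 1.80"
LOAD=${UP#*average: }   # drop everything up to "average: "
LOAD=${LOAD/,*}         # keep only the 1-minute value
LOAD=${LOAD/.*}         # truncate the fractional part
echo $LOAD              # prints 3
```

note that this breaks in locales where uptime prints the load averages with a decimal comma; the original script has the same limitation.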

posted in: computer