
posts about bug.

today i discovered why sometimes, some of my latex output contains tildes (~) in the dvi/pdf version. usually, if you use a tilde in a tex file, it is interpreted as a non-breakable space (except in special circumstances, such as verbatim environments or in \url{…}). but thanks to a “bugfix” to texi2dvi/texi2pdf, which is a wonderful tool as it runs (pdf)latex often enough together with bibtex, makeindex etc., tildes appearing in tex files are now shown as tildes in the dvi/pdf output. which is absolutely unacceptable behaviour.
it seems that this already was reported (see here, here, here), but it is still around. i don’t really know what to think of this – is nobody responsible for working on texi2dvi/texi2pdf? or did people stop using it as it is broken?
anyway, i fixed my locally installed version (/usr/bin/texi2dvi) by changing the line catcode_special=true to catcode_special=false. a more sophisticated fix would be nice, one which only changes catcode_special for tex files (and not for texinfo files), but i don’t have time for that now.

i have several wordpress installations running on my servers. among them, there is one instance which is not publicly visible. it uses dates which are several decades in the past.
everything was working well when i last worked on that project, which is quite some time ago. but yesterday, when looking at it another time, i noticed something odd. the dates were all replaced with the current date (and time). wtf?
now, some debugging later, i found the culprit. that wordpress installation is in german (the only german one i have), and somewhere inside wordpress, a function called date_i18n is used in that case to translate a timestamp into a date, time, or whatever is requested. unfortunately (for me), this function (in Wordpress/wp-includes/functions.php) contains a “sanity check” for “php 5.1.0-”. in case the unix timestamp is negative, it is set to current time. and obviously, any date before january 1, 1970 will be turned into today.
commenting out these lines (or removing them) fixes the problem.

posted in: www

when adding a thread-specific allocator to a program of mine, to avoid terrible performance loss while using gmp and/or mpfr to do arbitrary precision integer and floating point arithmetic, respectively, i stumbled upon a problem which seems to be fixed in newer solaris versions. in case anyone experiences a similar problem and cannot just update to a new enough solaris version, here’s some information on a quick’n’dirty fix for the problem.
more precisely, i wanted to combine boost::thread_specific_ptr (a portable implementation of thread specific storage) with dlmalloc, to obtain an allocator which won’t block when used from different threads at once. if you use arbitrary precision arithmetic on a machine with many cores/cpus (say, 30 to 60), having a single blocking (via a mutex) allocator totally kills performance. for example, on our ultrasparc/solaris machine, running 29 threads (on 30 cpus) in parallel, only 20% of the system’s resources were used effectively. if the machine had only had 6 cpus, the program would have run at the same speed. quite a waste, isn’t it?
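to make the idea a bit more concrete, here is a minimal sketch of a per-thread allocator. it is not the code from my program: it assumes c++11 and uses the thread_local keyword instead of boost::thread_specific_ptr, and a trivial bump arena (the made-up class ThreadArena) instead of dlmalloc. the names local_arena and local_alloc are hypothetical, too. the only point is that every thread gets its own arena, so the allocation path never touches a shared mutex.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// hypothetical per-thread bump arena; a stand-in for a per-thread dlmalloc heap.
// it never frees individual blocks, which is fine for a sketch but not for real use.
class ThreadArena {
public:
    explicit ThreadArena(std::size_t capacity)
        : base_(static_cast<char*>(std::malloc(capacity))), size_(capacity), used_(0) {
        assert(capacity > 0);
        if (!base_) throw std::bad_alloc();
    }
    ~ThreadArena() { std::free(base_); }
    ThreadArena(const ThreadArena&) = delete;
    ThreadArena& operator=(const ThreadArena&) = delete;

    // returns nullptr when the arena is exhausted
    void* allocate(std::size_t n) {
        const std::size_t align = alignof(std::max_align_t);
        if (n > size_ - used_) return nullptr;
        const std::size_t rounded = (n + align - 1) / align * align;
        if (rounded > size_ - used_) return nullptr;
        void* p = base_ + used_;
        used_ += rounded;
        return p;
    }

    std::size_t used() const { return used_; }

private:
    char* base_;
    std::size_t size_;
    std::size_t used_;
};

// one arena per thread, created on first use in that thread (1 MiB is an arbitrary size)
inline ThreadArena& local_arena() {
    thread_local ThreadArena arena(std::size_t(1) << 20);
    return arena;
}

// lock-free from the caller's point of view: no other thread can see this arena
inline void* local_alloc(std::size_t n) { return local_arena().allocate(n); }
```

in the real program, the role of local_arena() is played by dereferencing the thread specific pointer on every allocate, reallocate and free, so whatever that dereference costs is paid on each of those calls.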
anyway, combining thread local storage and a memory allocator solves this problem. in theory, at least. when i put the two things together, and ran my program with 30 threads, still only 60% of the 30 cpus’ processing power was used – the other 40% of the cycles were still spent waiting. (solaris has some excellent profiling tools on board. that’s why i like to use our slow old outdated solaris machine to profile, instead of our blazing fast newer big linux machine. in case anyone cares.) interestingly, on our linux machine, with 64 threads (running on 64 cores), the problem wasn’t there: 100% of the cycles went into computing, and essentially none into waiting.
inspecting the problem more closely with the sun studio analyzer, it turns out that the 40% waiting cycles are caused by pthread_once, which is called by the internal boost method boost::detail::find_tss_data. that method is called every time a boost::thread_specific_ptr<> is dereferenced. which in my program happens every time the thread local allocator is fired up to allocate, reallocate or free a piece of memory. (more precisely, boost::detail::find_tss_data calls boost::detail::get_current_thread_data, which uses boost::call_once, which in turn uses pthread_once in the pthread implementation of boost::thread, which is the implementation used on unixoid systems, such as solaris and linux.)
in theory, pthread_once uses a double-checked locking mechanism to make sure that the function specified is run exactly once during the execution of the whole program. while searching online, i found the source of the pthread implementation of a newer opensolaris from 2008 here; it uses double-checked locking with a memory barrier, which should (at least in theory) turn it into a working solution (multi-threaded programming is far from being simple, both the compiler and the cpu can screw up your code by rearranging instructions in a deadly way).
anyway, it seems that the pthread_once implementation on the solaris installation on the machine i’m using just locks a mutex every time it is called. when you massively call the function from 30 threads at once, all running perfectly parallel on a machine with enough cpus, this gives a natural bottleneck. to make sure it is pthread_once which causes the problem, i wrote the following test program:
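for illustration, a minimal sketch of what double-checked locking with a memory barrier looks like when written with c++11 atomics. this is not the opensolaris code: OnceFlag, call_once_dcl, count_call and hammer are names made up for this example, and unlike a real pthread_once it does not deal with thread cancellation. the barrier is expressed through the acquire load and the release store.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// hypothetical once-control block: a flag for the fast path, a mutex for the slow path
struct OnceFlag {
    std::atomic<bool> done{false};
    std::mutex lock;
};

void call_once_dcl(OnceFlag& flag, void (*init)()) {
    // first check, without locking; the acquire load makes everything init() wrote
    // visible to a thread that sees done == true
    if (flag.done.load(std::memory_order_acquire)) return;

    std::lock_guard<std::mutex> guard(flag.lock);
    // second check, now under the mutex, so only one thread ever runs init()
    if (!flag.done.load(std::memory_order_relaxed)) {
        init();
        // the release store publishes init()'s side effects together with the flag
        flag.done.store(true, std::memory_order_release);
    }
}

static OnceFlag g_flag;
static int g_calls = 0;
static void count_call() { ++g_calls; }

// lets several threads call call_once_dcl over and over, returns how often init really ran
int hammer(int nthreads, int iterations) {
    std::vector<std::thread> threads;
    for (int t = 0; t < nthreads; ++t)
        threads.emplace_back([iterations] {
            for (int i = 0; i < iterations; ++i)
                call_once_dcl(g_flag, count_call);
        });
    for (auto& th : threads) th.join();
    return g_calls;
}
```

after the first successful call, every later call is just one atomic load and a branch, which is why such an implementation does not serialize the threads. an implementation that takes the mutex before looking at the flag makes every call wait for every other call.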

#include <pthread.h>
#include <iostream>

// shared once-control block, and a counter for how often the routine really ran
static pthread_once_t onceControl = PTHREAD_ONCE_INIT;
static int nocalls = 0;

extern "C" void onceRoutine(void)
{
    std::cout << "onceRoutine()\n";
    nocalls++;
}

// every thread calls pthread_once ten million times
extern "C" void * thethread(void * x)
{
    for (unsigned i = 0; i < 10000000; ++i)
        pthread_once(&onceControl, onceRoutine);
    return NULL;
}

int main()
{
    const int nothreads = 30;
    pthread_t threads[nothreads];

    for (int i = 0; i < nothreads; ++i)
        pthread_create(&threads[i], NULL, thethread, NULL);

    // wait for all threads before looking at the counter
    for (int i = 0; i < nothreads; ++i)
    {
        void * status;
        pthread_join(threads[i], &status);
    }

    if (nocalls != 1)
        std::cout << "pthread_once() screwed up totally!\n";
    else
        std::cout << "pthread_once() seems to be doing what it promises\n";
    return 0;
}

i compiled the program with CC -m64 -fast -xarch=native64 -xchip=native -xcache=native -mt -lpthread oncetest.cpp -o oncetest and ran it with time. the result:
real    16m9.541s
user    201m1.476s
sys     0m18.499s

compiling the same program under linux and running it there (with enough cores in the machine) yielded
real    0m0.243s
user    0m1.640s
sys     0m0.060s

quite a difference, isn’t it? the solaris machine is slower, so a few seconds total time would be ok, but 16 minutes?! inspecting the running program on solaris with prstat -Lmp <pid> shows the amount of waiting involved…
to solve this problem, at least for me, with this old solaris version running, i took the code of pthread_once from the above link – namely the includes
#include <atomic.h>
#include <thread.h>
#include <errno.h>

copied the lines 38 to 46 from the link, and the lines 157 to 179 from the link into boost_directory/libs/thread/src/pthread/once.cpp, renamed pthread_once to my_pthread_once in the code i copied and in the boost source file i added the lines to, and re-compiled boost. then, i re-ran my program, and suddenly, there was no more waiting (at least, not for mutexes :-) ). and the oncetest from above, rewritten using boost::call_once, yielded:
real    0m0.928s
user    0m20.181s
sys     0m0.036s


i stumbled over a wordpress bug making my xhtml invalid. in fact, i noticed that i stumbled over it for the second time. i think the first time was when i installed wordpress 3.0, and the second time after upgrading to 3.0.1. the problem is that shortcodes which stand alone on a line should not be enclosed in <p>…</p> according to the manual. but my wordpress installation is doing exactly what is claimed to have been fixed for some time.
i started digging a bit, and quickly noticed that i already fixed it once. since it seems to be a persistent problem i want to document it, just in case i have it again. internally, wordpress first runs the function wpautop on the content, which adds <p>…</p>, and then runs shortcode_unautop to remove <p>…</p> around shortcodes standing alone on a line. (in previous wordpress versions, both were done in wpautop if i recall correctly.) now the problem is that my wordpress installation calls these two functions in the wrong order. so shortcode_unautop is called first, finds nothing to remove, and then wpautop is called, which adds the faulty <p>…</p> code.
an easy fix is to edit wp-includes/default-filters.php and change all lines add_filter('whatever', 'shortcode_unautop'); to add_filter('whatever', 'shortcode_unautop', 11);. after that, everything is fine. i wonder whether this happens on every installation or just on some.
now that i fixed this a second time, i wanted to find out in more detail what’s going on. after some digging, i found out that the cause is a plugin i use: wp_unformatted. it removes wpautop from the filter list and adds it again later, hence moving it after shortcode_unautop if both have the same default priority. well, so the right thing is to fix that plugin.
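to see why the order matters, here is a toy model of a priority-ordered filter chain. wordpress itself is php; this sketch is c++ only, and toy_autop, toy_unautop, FilterChain and render are made-up stand-ins which are much simpler than the real wpautop and shortcode_unautop. it assumes, as in wordpress, that filters with the same priority run in the order in which they were added.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <sstream>
#include <string>
#include <utility>

using Filter = std::function<std::string(const std::string&)>;

// toy stand-in for wpautop: wraps every non-empty line in <p>...</p>
std::string toy_autop(const std::string& text) {
    std::istringstream in(text);
    std::string line, out;
    while (std::getline(in, line))
        if (!line.empty()) out += "<p>" + line + "</p>\n";
    return out;
}

// toy stand-in for shortcode_unautop: unwraps lines of the form <p>[...]</p>
std::string toy_unautop(const std::string& text) {
    std::istringstream in(text);
    std::string line, out;
    while (std::getline(in, line)) {
        const std::size_t n = line.size();
        if (n >= 9 && line.compare(0, 4, "<p>[") == 0 && line.compare(n - 5, 5, "]</p>") == 0)
            line = line.substr(3, n - 7);  // drop the leading <p> and the trailing </p>
        out += line + "\n";
    }
    return out;
}

// filters run by ascending priority; equal priorities keep their insertion order
class FilterChain {
public:
    void add(int priority, Filter f) { filters_.emplace(priority, std::move(f)); }
    std::string apply(std::string text) const {
        for (const auto& entry : filters_) text = entry.second(text);
        return text;
    }
private:
    std::multimap<int, Filter> filters_;
};

// models a plugin that re-adds the autop filter after unautop was registered
std::string render(int unautop_priority) {
    FilterChain chain;
    chain.add(unautop_priority, toy_unautop);
    chain.add(10, toy_autop);  // added later, at the default priority 10
    return chain.apply("[gallery]\nsome text\n");
}
```

with render(10), the unwrapping runs first and finds nothing to do, so the shortcode ends up as <p>[gallery]</p>. with render(11), the wrapping runs first and the shortcode line comes out as plain [gallery].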
in case you want to know how to fix that plugin, proceed as follows:

  1. change the line in wp_sponge from return wpautop($pee); to return shortcode_unautop(wpautop($pee));;
  2. add remove_filter('the_content', 'shortcode_unautop'); after remove_filter('the_content', 'wpautop');.
posted in: computer www

yesterday evening, i wanted to grab a few cds. while cdparanoia was running, i copied a text file to another place. then, i noticed that the content of the copy was garbled. a quick check showed that the content of the original file wasn’t. tried it again, the same result. and again. then, i stopped cdparanoia, and after that, copying worked. well. after restarting cdparanoia, copying still worked fine. so i stopped thinking about this and continued working – which was a fatal error.
this morning, when i turned the macbook on again, the desktop was pretty garbled and the dock was at the wrong position and had the wrong size and the wrong content, i.e. everything i changed since i first got my macbook was gone. moreover, skype wanted to know a user name and a password, and adium seemed to have forgotten a lot of things i taught it, too. after starting the terminal (which is not so easy to find, if it’s not in your dock) i quickly checked some files i created yesterday – all garbled! what the heck. the older files seem to be ok. after some more trying around, it turned out that some other files from yesterday evening (namely, the music which i ripped) were fine, too. so, what happened? i don’t know. well, most of the files which were garbled i had backed up on the institute’s server, so that wasn’t a problem. but there was one file, called termine.txt, where i collected all appointments for the next months, which i changed yesterday evening and which i created in the last few days in long hours, and which i hadn’t backed up yet: now it’s garbled, too. screwed.
well. i don’t know what happened or whose fault it was. but for me, garbling data is something which an operating system should never ever do. well, good for me that i ordered a thinkpad yesterday, so i’ll switch back to linux soon anyway, hoping it will be less annoying… after all, all big data losses i had in the last years, which weren’t related to dying hard disks, happened on osx.

well, after this rant, something more constructive. one thing that could have happened is that for some reason, something went wrong with the realtime disk encryption i enabled on the macbook. maybe a botched dma transfer (maybe initiated by cdparanoia?) somehow managed to screw this up. just guessing.

i just observed that my macbook starts to dismantle itself:

i mean, hey, it’s just a little older than one year. what the heck?!

posted in: computer daily life traveling

… my blog really has 100% valid xhtml 1.0 transitional code.

until now, there were some exceptions:

  • a wordpress bug which generated invalid xhtml code if you have more than one blogroll category (which will be fixed in the next version; i wrote a quick hack to fix the one i’m using);
  • the youtube code snippet was quite a mess; i fixed it using some hints from the w3c validator page and some tips by kornel. as wordpress tended to screw up my code by inserting tags and, thus, rendering it invalid, i also installed a plugin which allows me to format specific posts completely myself;
  • certain small bugs in my posts which were completely my fault (these will probably show up from time to time, but i’m trying to eliminate them asap).

maybe someday i’ll try to force wordpress to generate xhtml 1.0 strict code, but that’s probably too much work; probably i can hack something together myself in less time which fits my needs better and which generates such code :)

posted in: computer www

is there a reason why a variable in javascript shouldn’t be called “name”? i thought, “no”, and simply used that name. well, until i found out that this breaks safari… while it works perfectly fine under firefox. took me an hour to figure that out… thank you whoever is responsible for this, i really love wasting my time with such things…

posted in: computer www

turns out that upgrading to the newest version of ubuntu was not a good idea. not because of the visual effects, not because of the tracker, (probably) not because of ubuntu itself, but because of a very strange and annoying bug which is really killing my productivity: at random points, in particular if i’m using firefox, it “forgets” that i released a key, like the shortcut for closing a tab or for switching to the next tab. and there’s basically no way to stop it except killing firefox. the problem also appeared in other applications, though almost never. i’d guess that it is connected to how much the program keeps the cpu busy, and firefox is pretty good at doing that… so, what to do? i don’t know… the first few google searches haven’t helped me a bit… probably i have to spend even more time digging out information on this… i really hate wasting time like this.

posted in: computer feelings