
posts about internet.

from today on, i’m enforcing https for (almost) all my web pages. i’ve added an automatic redirect which redirects all http:// pages to their corresponding https:// pages.
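the redirect rule itself is not shown here. as a rough sketch of the idea only: the function name httpsTarget and the list of exempt paths below are made up for illustration and are not what actually runs on this server.

```javascript
// Hypothetical list of path prefixes that stay reachable over plain http
// (the "almost" in "almost all pages"); an assumption for illustration.
const EXEMPT_PREFIXES = ['/plain-http-only/'];

// Returns the https:// URL a plain-http request should be redirected to,
// or null if the requested path is exempt from the redirect.
function httpsTarget(host, path) {
    for (const prefix of EXEMPT_PREFIXES) {
        if (path.startsWith(prefix)) {
            return null;
        }
    }
    return 'https://' + host + path;
}
```

a server would answer a plain http request with a 301 status and a Location header set to the returned value, and serve the page normally when the function returns null.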

despite the many problems ssl/tls has (essentially, everything below TLS 1.2 is unsafe, yet only very few browsers actually support TLS 1.2, even though it was standardized back in 2008), it is better than using no encryption at all.

and yes, i know that “just” having a self-signed certificate is only partially helpful. but i don’t have a better solution at the moment, as i don’t want to dump tons of money into CAs which i don’t really trust anyway. (maybe i’ll change my mind eventually. but not right now.) so for the moment, you have to accept my self-signed certificate (whose sha-1 fingerprint is 69:02:33:1D:F7:E3:9C:DA:D2:7D:9E:1D:4A:C6:40:99:A3:F8:B2:58, and whose md5 fingerprint is E5:DA:7D:4E:11:34:20:BD:7C:9E:3B:CD:E1:C9:6A:1B. you can compare them in firefox, for example, by clicking the padlock and then clicking “more information…” and then “view certificate”, and in chromium/chrome by clicking the padlock and then “certificate information”).
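if you prefer to compare fingerprints outside the browser: a certificate fingerprint is just a hash of the DER-encoded certificate. the following is a minimal sketch assuming node.js and a certificate you have already exported as DER bytes; the function names are mine, chosen for illustration.

```javascript
const crypto = require('crypto');

// Formats the digest of a DER-encoded certificate the way browsers show it:
// upper-case hex bytes separated by colons.
function fingerprint(derBytes, algorithm) {
    const hex = crypto.createHash(algorithm).update(derBytes).digest('hex');
    return hex.toUpperCase().match(/../g).join(':');
}

// Compares two fingerprints, ignoring case and separator characters.
function sameFingerprint(a, b) {
    const normalize = (s) => s.replace(/[^0-9a-fA-F]/g, '').toLowerCase();
    return normalize(a) === normalize(b);
}
```

with the certificate bytes in a buffer called der, sameFingerprint(fingerprint(der, 'sha1'), '69:02:33:1D:…') would then tell you whether the value matches the one quoted above (paste the full fingerprint in place of the shortened one).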

posted in: computer

in my last post, i wrote about how to maximize privacy for your site’s visitors. in the meantime, i programmed a mechanism which maximizes privacy when embedding youtube or vimeo videos. the idea is simple: i embed only a picture, and use javascript to replace the picture with the video player when the visitor clicks on it. in case javascript is disabled, the picture is simply a link to the video; in case javascript is available and the picture is replaced by the video player, i enable autostart for the video. for the casual user, the result looks similar to just embedding the video, but the difference is that youtube or vimeo only get the visitor’s information if the visitor actually wants to watch the video. here is an example:

[[for legal reasons, i do not want to include youtube videos here anymore. please click on this link to watch the video at youtube.]]

now, how do i do this? to retrieve the image, i use a php script so that the user will not interact directly with youtube or vimeo. the script obtains the url for the picture, retrieves the picture, and delivers it to the user. here’s the script:

<?php
// Tests whether a video id is valid (letters, digits, '-' and '_' only)
function isValidID($id)
{
    return preg_match('#^[a-zA-Z0-9_-]+$#', $id) === 1;
}

// Checks if the source is valid
function isValidSource($source)
{
    return ($source === 'youtube') || ($source === 'vimeo');
}

// Missing or non-string parameters count as invalid instead of raising a notice
$id = (isset($_GET['id']) && is_string($_GET['id'])) ? $_GET['id'] : '';
$source = (isset($_GET['source']) && is_string($_GET['source'])) ? $_GET['source'] : '';
$url = false;

if (isValidID($id) && isValidSource($source))
{
    // Find out URL
    if ($source === 'youtube')
    {
        // Youtube is easy...
        $url = "http://img.youtube.com/vi/$id/0.jpg";
    }
    if ($source === 'vimeo')
    {
        // Ask the vimeo api for the video's metadata (serialized php)
        $content = @file_get_contents("http://vimeo.com/api/v2/video/$id.php"); // avoid error message on 404
        if ($content)
        {
            // Note: unserializing remote data means trusting vimeo's answer
            $content = @unserialize($content);
            if (is_array($content) && isset($content[0]['thumbnail_large']))
                $url = $content[0]['thumbnail_large'];
        }
    }
}

// Retrieve picture
if ($url)
    $content = @file_get_contents($url); // avoid error message on 404
else
    $content = false;

// Send picture, or print 404
if ($content)
{
    // Deliver file
    header('Status: 200 OK', true, 200);
    header('HTTP/1.0 200 OK', true, 200);
    header('Content-Type: image/jpeg');
    header('Content-Transfer-Encoding: binary');
    header('Content-Length: ' . strlen($content));
    echo $content;
    exit();
}
else
{
    // This results in a 404
    require_once('index.php');
}

note that the 404 display at the end works thanks to wordpress. if you have the script in a different directory than your wordpress installation, or you are not using wordpress at all, you have to modify that part. just replace the require_once line by

    header('Status: 404 Not Found', true, 404);
    header('HTTP/1.0 404 Not Found', true, 404);
    echo '<html><body>Error: invalid arguments or cannot retrieve picture.</body></html>';

or something similar.

ok. so let us assume that you put the script somewhere on the server, say at /tubeimage.php as i did. then, you need to insert the following javascript code into the header of your html files. in case you use wordpress, you can write a plugin to do that for you (i did that), or modify the wp-content/themes/yourtheme/header.php file, where you replace yourtheme by the name of the theme you’re using. add the following lines:

<script type="text/javascript">//<![CDATA[
// Replaces the preview picture of a youtube video by the flash player
function youtubeClick(id, w, h)
{
    var elt = document.getElementById('youtube-' + id);
    elt.innerHTML = '<object class="youtube-object" type="application/x-shockwave-flash" data="http://www.youtube-nocookie.com/v/' + id + '&amp;rel=0&amp;border=0&amp;autoplay=1" width="' + w + '" height="' + h + '"><param name="movie" value="http://www.youtube-nocookie.com/v/' + id + '&amp;rel=0&amp;border=0&amp;autoplay=1" /></object>';
    elt.onclick = null; // the player is in place, further clicks belong to it
}
// Replaces the preview picture of a vimeo video by the flash player
function vimeoClick(id, w, h)
{
    var elt = document.getElementById('vimeo-' + id);
    elt.innerHTML = '<object class="youtube-object" type="application/x-shockwave-flash" data="http://vimeo.com/moogaloop.swf?clip_id=' + id + '&amp;server=vimeo.com&amp;show_title=1&amp;show_byline=1&amp;show_portrait=1&amp;color=00ADEF&amp;fullscreen=1&amp;autoplay=1&amp;loop=0" width="' + w + '" height="' + h + '"><param name="movie" value="http://vimeo.com/moogaloop.swf?clip_id=' + id + '&amp;server=vimeo.com&amp;show_title=1&amp;show_byline=1&amp;show_portrait=1&amp;color=00ADEF&amp;fullscreen=1&amp;autoplay=1&amp;loop=0" /></object>';
    elt.onclick = null; // the player is in place, further clicks belong to it
}
//]]></script>

note that the <![CDATA[ ... ]]> wrapper is the xhtml way of hiding javascript; if you’re using plain old html, you need to use <!-- ... --> instead.

now, you can add youtube or vimeo videos to your page as follows. for example, the video above was embedded like this:

<span id="vimeo-24456787" onclick="vimeoClick('24456787', 425, 341)"><a href="http://vimeo.com/24456787" rel="external" onclick="return false;"><img src="/tubeimage.php?id=24456787&amp;source=vimeo" width="425" height="341" /></a></span>

note that 24456787 is the id of the vimeo video (and appears at four places in this snippet), and the player has size 425x341 (appears at two places in this snippet). for youtube videos, you write

<span id="youtube-LFtSRR0xFec" onclick="youtubeClick('LFtSRR0xFec', 425, 341)"><a href="http://www.youtube.com/watch?v=LFtSRR0xFec" rel="external" onclick="return false;"><img src="/tubeimage.php?id=LFtSRR0xFec&amp;source=youtube" width="425" height="341" /></a></span>

here, LFtSRR0xFec is the id of the youtube video (and appears at four places in this snippet), and the player has size 425x341 (appears at two places in this snippet). if you are using wordpress, it is a good idea to create a plugin which automatically generates all the necessary code, so you can just write something like [youtube id="LFtSRR0xFec" width="425" height="341"] in your posts.
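since the id and the size have to be repeated several times, it is easy to get one of the copies wrong when writing the markup by hand. the following is a small sketch of a generator for markup equivalent to the snippets above; the function name placeholderMarkup is hypothetical, and this is not the wordpress plugin i actually use.

```javascript
// Builds the placeholder markup for one video.
// source is 'youtube' or 'vimeo', id is the video id, w and h the player size.
function placeholderMarkup(source, id, w, h) {
    // Same id check as in the php script: letters, digits, '-' and '_' only
    if (!/^[a-zA-Z0-9_-]+$/.test(id)) {
        throw new Error('invalid video id: ' + id);
    }
    if (!Number.isInteger(w) || !Number.isInteger(h)) {
        throw new Error('width and height must be integers');
    }
    var link;
    if (source === 'youtube') {
        link = 'http://www.youtube.com/watch?v=' + id;
    } else if (source === 'vimeo') {
        link = 'http://vimeo.com/' + id;
    } else {
        throw new Error('unknown source: ' + source);
    }
    // The click handler is youtubeClick or vimeoClick, depending on the source
    return '<span id="' + source + '-' + id + '" onclick="' + source + 'Click(\'' + id + '\', ' + w + ', ' + h + ')">'
        + '<a href="' + link + '" rel="external" onclick="return false;">'
        + '<img src="/tubeimage.php?id=' + id + '&amp;source=' + source + '" width="' + w + '" height="' + h + '" />'
        + '</a></span>';
}
```

calling placeholderMarkup('vimeo', '24456787', 425, 341) gives the markup for the vimeo example, and the checks at the top refuse ids which the php script would reject anyway.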

many websites leak information. for example, most websites including a facebook like button make your browser tell facebook “hey, i’m visiting this page”. the same goes for google and their +1 button, and for most of the other fancy little “web 2.0 buttons” you can find nowadays at many places of the web. or they make your browser feed google analytics, or any other web tracking service, with your data. and usually, you don’t notice any of this, as it happens hidden in the background.
in this post, i want to shed a bit more light on these things and their (possible) consequences, and say a bit about how to avoid these problems both as a user and as a webmaster. in the beginning of spielwiese, i already wrote a little bit about this here, and i also mentioned the problem while writing about social networks. you might want to look at the first post if you don’t really know what is happening when you access a webpage with your browser.
in the examples below, i’m assuming that standard javascript and cookie settings are used.

what’s going on?

assume that some website you are visiting includes the (standard) facebook like button (via facebook.com). as soon as you access that page, its html code will contain

<iframe src="http://www.facebook.com/plugins/like.php?href=…" scrolling="no" frameborder="0" style="border:none; width:450px; height:80px"></iframe>

your browser automatically accesses http://www.facebook.com/plugins/like.php?href=… to retrieve the content from that url and essentially puts it into the place of the iframe. moreover, if you’re a facebook user, and you are logged in, your browser will have cookies for *.facebook.com (containing your user id), which will be sent automatically with this http request. so at this point, without any javascript interaction, facebook already knows whether you are logged on, and who you are if that is the case. note that facebook could also set a facebook.com cookie when none is already set, to be able to further track you. it seems like that is not the case, but if they did, you probably wouldn’t notice at all. now the html page sent by http://www.facebook.com/plugins/like.php?href=… includes several javascripts, which your browser will automatically execute, and which could send more information like your screen resolution. facebook isn’t doing that, as far as i know, but they could, with only very few people noticing, if any at all.
another source of leaks are web trackers, which try to gather statistical data for the webmaster who included them, but might also use this information for other things. these work similarly to the like button: the webmaster has to include them somehow, maybe as a 1x1 transparent pixel, or as a javascript, or both. the pixel will ensure that basic data is sent even in case javascript is disabled or not available, as long as images are loaded. the javascript will send additional info making identification of the visitor easier, and with both accesses to the web server of the web tracker, cookies can be retrieved or set, allowing the web tracker to identify you across different sessions. they probably don’t know who you are, but they can distinguish you from your friends, even if you share your internet connection (and the same ip address!) with them and your computers are essentially identically configured.
but there are also many other sources of leaking, for example a youtube video included in the web page you’re accessing. in that case, your browser will ask youtube.com for the flash player, which then retrieves an image from ytimg.com and, when you click the play button, streams and plays the video from other youtube servers. again, a lot of things can be leaked. if the flash plugin is disabled, this won’t happen, but most people have it installed and active, as otherwise many sites will not work properly.
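to make this more concrete, here is a small sketch of what a third-party server can read from one single embedded request (an iframe, a pixel, a script), before any javascript has run. it assumes a request object shaped like the one node.js hands to an http server; the function name visibleToThirdParty is made up for this illustration.

```javascript
// Collects what a third-party server learns from one embedded request.
// `req` mimics a node.js http.IncomingMessage: header names are lower-case,
// and the client's address is available on the socket.
function visibleToThirdParty(req) {
    const headers = req.headers || {};
    return {
        ip: req.socket ? req.socket.remoteAddress : undefined,
        browser: headers['user-agent'],   // the browser id string
        visitedPage: headers['referer'],  // the page which embedded the resource
        cookies: headers['cookie']        // recognizes logged-in or returning visitors
    };
}
```

none of these values needs any cooperation from the visitor; the browser sends them on its own with the request.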

how to prevent your browser from leaking.

the radical solution is to disable javascript, disable cookies and disable all plugins like the flash plugin. but that still doesn’t solve the problem that basic http access data (browser id string, referrer, your ip address) is sent to certain included sites, for example if they are included using iframes or as images (maybe even transparent of size 1x1 so you won’t notice). so without add-ons, standard browsers can always be made to leak something.
for firefox, there are several helpful addons. two very helpful ones are the following:

  • noscript. this addon lets you specify from which sites javascripts are allowed to run and from which not. unfortunately, this does not depend on the source site, so if you allow the facebook javascripts to work (which you need if you access facebook itself), they are also allowed to run if another site includes the like button.
  • requestpolicy. this addon lets you block access from sites to other sites, for all kinds of requests (loading images, scripts, even loads made by the flash plugin). as the blocking by default depends on both source and destination of the access, this blocks access to facebook.com from any other site but facebook.com, hence making it impossible for facebook to track you unless you allow a site to (temporarily or always from now on) load content from facebook.com.
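the difference between the two addons can be put into a few lines. this is only a simplified model of the two decision rules, comparing exact host names; it is not code taken from either addon, and the function names are hypothetical.

```javascript
// noscript-style decision: only the host the script comes from matters.
function scriptAllowed(allowedScriptHosts, scriptHost) {
    return allowedScriptHosts.includes(scriptHost);
}

// requestpolicy-style decision: the (source, destination) pair matters.
function requestAllowed(allowedPairs, sourceHost, destHost) {
    if (sourceHost === destHost) {
        return true; // requests within the same site always pass
    }
    return allowedPairs.some(function (pair) {
        return pair.from === sourceHost && pair.to === destHost;
    });
}
```

with facebook.com allowed in the first model, its scripts also run when example.org embeds them; in the second model, a request from example.org to facebook.com stays blocked until exactly that pair is allowed.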

note that both addons require a lot of user interaction. most websites won’t work properly, and too many will look very ugly, in case you don’t allow certain scripts to run or certain data to be fetched from different servers. it is annoying to find out which things you have to activate without giving too much access, and it can be very frustrating. but after some time, you’ll have the sites you’re using most set up properly, and most things you do on the web run without any more interaction. visits to new sites, though, are still adventurous.

how to prevent your page from leaking.

first, a few words on why you should try not to leak. the first reason is trust: the visitors of your page trust you. in particular, they usually don’t want you to send their information to other sites which are not entirely trustworthy. and then, there are users who block such things, like me. if i access your page and you rely, say, on google analytics to track and count your users, you won’t be able to see me. i’ll be missing in your statistics, even though i accessed your page. (and since i’m advanced enough to use addons to see where my data is supposed to be sent, i know how much you care about my data, and how much i can trust you.) so i do appreciate it if sites do not leak information.
there are two basic strategies to avoid leaking:

  • include external things only when the user needs them. for example, you could use javascript yourself to display a decoy version of the facebook like button or the google+ +1 button (stored on your server!), and as soon as the user clicks on it, your script loads the real button and the associated javascripts and forwards your click to them. or you just use a decoy version of the button (again, stored on your server!) and just link it, for example to a facebook url which will then allow the facebook user to share your page. for example, instead of the like button described above, you could do something like this:
    <a href='http://www.facebook.com/sharer.php?u=http://url.of/your/site&amp;t=Title of your site' target='_blank' title='Share!'><img alt='Share!' src='images/facebook.gif' /></a>

    here, images/facebook.gif should obviously be an image on your server.

  • act as a proxy. if you want to show information, like how many people like your page, some faces of people who like it, or some statistics (how many users are online right now etc.), the usual solution is to include some javascript/iframe from the content provider (facebook, your web tracker, …), so the user’s browser will access the data directly from that provider. the drawback is that the provider knows that the user asked. (this is necessary for web trackers to work, but not for showing the number of people who like a page.) there are also things which indicate your online status on skype, icq, or other services, which you can include in your page and which make the user’s browser access some other site. in most instances, one can avoid this problem by adding some kind of proxy: instead of making the user retrieve some (automatically generated) image from the provider, you link to a script on your server which retrieves the image from the provider’s server and forwards it to the user. then the provider just sees that your server is asking for the picture, while the user still sees the information on the picture without giving information to the provider.
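for the first strategy, the share link has to carry the url and the title of your page as query parameters, and both should be escaped properly. the following is a small sketch which builds such a link; the function name shareUrl is made up, and the sharer.php address is simply the one from the example above.

```javascript
// Builds the link target for a decoy "share" button stored on your own server.
// Both parameters are escaped, so spaces, '&' and '?' in the page url or
// title cannot break the query string.
function shareUrl(pageUrl, title) {
    return 'http://www.facebook.com/sharer.php'
        + '?u=' + encodeURIComponent(pageUrl)
        + '&t=' + encodeURIComponent(title);
}
```

when the result is written into an html attribute, the & between the two parameters still has to be written as &amp;, as in the other snippets.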

note that both strategies give you (more or less) extra work. setting up a proxy script on your server in particular is far from trivial if you cannot just use something publicly available. and more importantly, if you do not have good enough access to the server (for example if you have a blogspot blog, or a wordpress blog running on wordpress.com), you are severely limited in what you can do.
also note that you can combine the strategies. for example, if you include many youtube videos, and you use javascript to start the flash player only when the user clicks on it, you could use the proxy strategy to let your server automatically retrieve the picture from ytimg.com which is shown until the user clicks. [edit: this post describes how this can be done. both spielwiese and musikwiese now use this technique.]
it is also a good idea to check your own site for leaking. sometimes you’ll get surprises, as certain wordpress plugins, for example, include javascripts from random places, sometimes completely unnecessarily. for that, you can use firefox with noscript and requestpolicy installed (if you don’t like the addons, disable them but install them anyway, to be able to enable them from time to time to test your own site) and browse your site. in the firefox status bar, you’ll see a flag for requestpolicy; if it is red, then requestpolicy blocks something. click the flag to see what is blocked, and to allow (or block) certain destinations. you can also use this (as well as noscript) to test which (external) javascripts are needed for your page to be usable.
you have to decide for yourself how much data you make your visitors leak. and remember, most visitors appreciate that you don’t spread their data unnecessarily. and some visitors can even check what you’re doing, and know how much information you (try to) make them leak.
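for a first rough check of a page, one can also scan its html for resources which are loaded from other hosts. the following is only a sketch under simple assumptions: it looks at src attributes of img, script, iframe and embed tags with a regular expression, and it sees nothing which scripts load later on, so it does not replace testing with the addons. the function name externalHosts is made up.

```javascript
// Lists the external hosts a page makes the browser contact through
// src attributes of img, script, iframe and embed tags.
function externalHosts(html, ownHost) {
    const hosts = new Set();
    const pattern = /<(?:img|script|iframe|embed)\b[^>]*?\bsrc\s*=\s*["']([^"']+)["']/gi;
    let match;
    while ((match = pattern.exec(html)) !== null) {
        let url;
        try {
            // Relative addresses are resolved against the page's own host
            url = new URL(match[1], 'http://' + ownHost + '/');
        } catch (e) {
            continue; // skip values which cannot be parsed as a url
        }
        if (url.hostname !== ownHost) {
            hosts.add(url.hostname);
        }
    }
    return Array.from(hosts).sort();
}
```

every host in the returned list learns about each visitor of the page, so each entry is a candidate for one of the two strategies above.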

i thought a lot about social networks recently. i want to write some of these thoughts down here.

introduction.

during the last few decades, computers and the internet have deeply changed our society, how we communicate, how we live. electronic communication has existed for a very long time, beginning with bulletin board systems (in the 1970s), usenet (from 1980) and internet relay chat (from 1988). later, after the internet became available to the public (in the form of the world wide web) around 1995, new services emerged, like the ICQ instant messaging service and sixdegrees.com, one of the earliest “modern” social networks. after that, myspace became well-known, as well as business-oriented networks such as linkedin, and later facebook (starting in the united states, later going worldwide) and studivz (started in germany and german-speaking countries).

facebook is the best-known representative nowadays, but there are many more social networks out there; the wikipedia list, for example, contains 709 entries. depending on where you live, some social networks are much more popular among your local friends than others. (an article about a study analyzing the situation in germany some time ago can be read here.)

social networks play a more and more important role. you need to be connected to your friends to find out what’s going on. to see who’s going with whom, who’s a “friend” of whom, what’s in and what’s not, where and when the parties are. this no longer applies only to the young generation, and especially not just to a small subset of it, but to a large part of society. it is not uncommon in some areas that even your parents or grandparents are on social networks. social networks allow you to see what happens to your family, friends, to people you know but don’t have much contact with anymore. you find out which of your high school friends is marrying whom, you see photos from vacations of people you haven’t seen since kindergarten, you find out that someone you knew from university got a job, or that some guy you met on a vacation ten years ago has now become a father. a lot of these things you would have missed without a social network, maybe found out about later by chance, but more probably never heard about at all.

so definitely, social networks play an important role.

criticism.

there are two fundamentally different criticisms one can write about.

the first one is about the change in privacy, about the extended audience. things you say or do are now not just noted (and more or less quickly forgotten) by the people present at that moment, but often dragged into your social network, shared with all your friends there, which might include distant friends you met at kindergarten, colleagues at work, many people with whom you are studying, your neighbor, your extended family, or whoever else can see what happens on your profile. you do something stupid, someone takes a photo of it, puts it into the network, tags you, and everyone can see what you did. you write something stupid or very private, accidentally on someone’s wall instead of in a private message, and suddenly many people know about it. and not only that: depending on the privacy settings of the social network, maybe the whole internet can read or see these things. but i don’t want to write about these topics today.

the other big problem, from my point of view, is data ownership. think about it. why should a company create such a social network? provide lots of computing power to allow people to communicate, to search for friends, to exchange photos, etc., and that essentially for free? companies want to make money. in fact, they need to make money, to pay for the servers, for the programmers, for the support people. of course, there are ads, which make some of the money. without ads it is essentially impossible to run a huge network. but ads are not everything. what is also very important is the collection of information. information on people, their age, gender, preferences, interests, friends, what they like or not, what they find interesting. if the state tried to get this information, people would protest against it. but on the internet, they give it to a company essentially for free. of course, it is true that many of these pieces of information are available on the net anyway, at least for people like me. but then, if you have to collect them yourself, this costs a lot of time. if i have a profile at some social network and enter everything there into a form, they get all the information in a well-defined format which can easily be processed.

consider for example facebook. if you have a facebook account, they usually know your name, birthdate, email address, gender, sexual interest, where you live and work, what your marital status is, who your friends are, which websites you like. some people also use facebook as their search engine, so facebook also knows what you search for. and depending on how websites included the facebook “like” button, facebook knows which websites you visit. if you’re logged in at the same time, they can combine that information with your profile to see what you’re doing on the web. for some time now, facebook has also been trying to find out your location, by encouraging you to tell it to them, and also to tell them your friends’ locations. so they can also track you in the real world. besides these things, facebook is also known (and often criticized) for their very liberal view of privacy, and for storing all information without really allowing you to delete it.

or consider google+. if you have an account there, google knows your personal information such as name, email address, birthdate, … but besides that, google knows much more about you. google is the number one search engine in many parts of the world, and so most people use it to search for something. if you use their search engine while you are logged in at google+, they can connect that information. moreover, google analytics is a free service aimed at website administrators, which allows them to see how many people look at their website, what they do there, where they come from, etc. but it also allows google to see what people do. and if you have a google account (not just google+!), they can actually see what you are doing on the web. a huge number of websites use some google service or another. many google services are included by using some javascript, which is loaded from a google server, and so google can see where you are on the web and what you are doing there.

think about it. imagine the state sent out secret agents who followed every person and looked at what they do, where they are at any moment, what they look at. like in 1984. would you like that? i guess most of you wouldn’t. and yet, many people allow google and/or facebook to do exactly that, without giving it a thought.

a possible solution.

so now what? should one simply try not to use facebook or google? stick to smaller social networks, smaller services, which cannot track you that well? especially using smaller social networks would destroy a lot: many of your friends or people you know might not be in your network anymore, maybe forcing you to have accounts for many different social networks. this would make life much more complicated (which people do not want), and is in practice just annoying. so this is not a solution.

if one wants to use social networks at all, one does not want such fragmentation. but one also does not want certain entities, such as big corporations or even the state, to collect all that information at one place. so the best solution would be to distribute the information in some way, splitting it up so that single entities such as google or facebook or the state cannot access most of it, but you can still see information about your friends, still contact them, communicate with them.

there is in fact a social network designed like this: diaspora. everyone can run their own diaspora server, you can have friends on any other diaspora server, you see what they do, you can communicate. every server can only see what’s going on on that server itself, and what’s going on with the friends of the people who have an account on that server, as far as these friends allow the people with an account on this server to see their actions.

unfortunately, when the first alpha version of diaspora was released, it had many serious security problems, making it essentially unusable for anyone with the slightest sensitivity for privacy and security. i don’t know what the current status is; i hope it has improved dramatically. but even though the reference implementation is not good, everyone can create their own implementation, which could then communicate with servers running the reference implementation, or any other diaspora implementation. this is what makes diaspora very attractive: you are not forced to use or trust specific implementations and servers. still, diaspora is probably far from perfect. i guess that one could write books about how to design a very good open social network. i wouldn’t be surprised if there are even research projects working on such topics.

anyway. in my opinion, the future belongs to such open distributed social networks. as soon as such a network becomes usable enough, i will migrate to it.

(originally, i wanted to discuss properties of such open distributed social networks in more detail, discuss which aspects are important, discuss security of information, etc. but i’m afraid that if i really did this, the result would be way too long for a single blog post. and it would take a lot of time to write this down in detail and to reach a good enough description and discussion of most aspects; time which i simply don’t have.)

(and a note about myself: i’ve been using several different social networks in the past, most prominently facebook and studivz. except for facebook, i’ve deleted all my accounts. i’d also like to delete the facebook account, since i don’t really trust or like facebook, but the lack of alternatives currently keeps me there. i haven’t tried diaspora yet, but that’s still on my to-do list, though i want to wait until that project reaches a more stable state.)

as you may have noticed, i use wikipedia a lot, both for linking to descriptions of terms i use in this blog, and for looking up stuff myself which i encounter somewhere, be it offline or online. usually, chances are good that wikipedia offers at least some kind of description which answers my questions, or at least helps me get an idea. but from time to time, it happens that you try to look something up on wikipedia, only to find out that such an article existed but was deleted, for example because it was “not relevant”. i can understand that people do not want to see wikipedia flooded by biographies of john doe and jane roe; only a handful of people are interested in these, probably most notably john doe and jane roe themselves.
but there are cases where i simply can’t understand the decision. for example, there is the chilean doom metal band mar de grises, which i discovered by chance in zurich’s now defunct knochenhaus. according to the wikipedia deletion log, it is “not notable” and fails some guidelines. so, who decides what is notable and what is not? and, after all, the simplified ruleset explicitly mentions

ignore all rules – rules on wikipedia are not fixed in stone. the spirit of the rule trumps the letter of the rule. the common purpose of building an encyclopedia trumps both.

i can understand pretty well that not every small hobby band project should be mentioned, in particular the ones which sound bad and dissolve quickly with no or almost no productions. but that’s not the case for mar de grises. besides that, the deletion log also mentions other problems with the article (namely, being badly written and failing to provide references for some claims), but why not throw these parts out or reduce the article to a stub?
two other examples, this time from the german wikipedia, are sinnlos im weltraum and lord of the weed, two fandubs. according to the english wikipedia, sinnlos im weltraum (a redub of a star trek series), dating back to 1994, is one of the first such projects, essentially starting the whole genre of fandubs. i don’t know how many people know it, probably a huge number. lord of the weed (a redub of the beginning of 2001’s lord of the rings) is also rather well-known; i don’t remember how often i saw it, but at least ten times. well, it is obviously true that these movies haven’t been shown in movie theaters or on television, as they contain copyrighted material (i.e. the original movie), used without permission. for the same reason, they haven’t been shown at film festivals, and you can’t buy them on dvd. they are also not listed on the imdb. but so what? does that make them not notable? irrelevant?
on the other hand, a lot of totally trashy movies (which, compared to sinnlos im weltraum and lord of the weed, are really crappy and lame) are featured in the media; two good examples are a music video by grup tekkan and the infamous star wars kid, making a fool of himself. these are pushed by the media as “youtube movies you have to see” or are even shown on tv. and they can be found on wikipedia. even though they are real crap. in the case of star wars kid, the really embarrassing movie was uploaded by “friends” of its actor and will probably haunt him for a very long time. to make this even better, a lot of online versions of famous newspapers or magazines feature this video as well, showing it to an even wider audience. and i thought the use of pillories was outlawed in modern countries.
anyway. i’m still using wikipedia, despite these reasons. and i even created an account at the english wikipedia and started writing an article about infrastructures (number theory). since so far nobody else dared to write something on this subject, and a google search only gives documents featuring other kinds of infrastructures, or scientific articles about this subject, i thought it was time to add something to the web. i’ve started a series of posts on my math blog on infrastructures, but as google usually ranks wikipedia articles higher, i decided to also add something to wikipedia. so far, it is more of a stub and far from being a complete article, but it at least provides some information and several references to the literature.