many websites leak information. for example, most websites including a facebook like button make your browser tell facebook “hey, i’m visiting this page”. some for google and their +1 button, and for most other of these fancy little “web 2.0 buttons” you can find nowadays at many places of the web. or they make your browser feed google analytics with your data, or any other web tracking service. and usually, you don’t notice any of this, as it appears hidden in the background.
in this post, i want to shed a bit more light on these things and their (possible) consequences, and tell a bit on how to avoid these problems both as a user and as a webmaster. in the beginning of spielwiese, i already wrote a little bit about this here, and i also mentioned the problem while writing about social networks. you might want to look at the first post if you don’t really know what is happening if you access a webpage with your browser.
what’s going on?
assume that some website you are visiting includes the (standard) facebook like button (via facebook.com). as soon as you access that page, its html code will contain
<iframe src=”http://www.facebook.com/plugins/like.php?href=…” scrolling=”no” frameborder=”0″ style=”border:none; width:450px; height:80px”></iframe>
your browser automatically accesses
http://www.facebook.com/plugins/like.php?href=… to retrieve the content from that url and essentially puts it into the place of the iframe. moreover, if you’re a facebook user, and you are logged in, your browser will have cookies for
but there also many other sources of leaking. for example, if a youtube video is included in the web page you’re accessing. in that case, your browser will ask
youtube.com for the flash player, and that retrieve a image from
ytimg.com and, when you click the play button, will stream and play the video from other youtube servers. again, a lot of things can be leaked. if the flash plugin is disabled, this won’t happen, but most people have that one installed and active as otherwise, many sites will not work properly.
how to prevent your browser from leaking.
iframes or as images (maybe even transparent of size 1x1 so you won’t notice). so without add-ons, standard browsers can always be made to leak something.
for firefox, there are several helpful addons. two very helpful ones are the following:
requestpolicy. this addon allows to block access from sites to other sites, for all kind of requests (loading images, scripts, even loads made by the flash plugin). as the blocking is by default depening on both source and destination of the access, this block access to
facebook.comfrom any other site but
facebook.com, hence making it impossible for facebook to track you except if you allow a site to (temporarily or always from now on) load content from
note that both plugins require a lot of user interaction. most websites won’t work properly, and too many will look very ugly in case you don’t allow certain scripts to run or certain data to be fetched from different servers. it is annoying to find out which things you have to active without giving too much access, and can be very frustrating. but after some time, you’ll have the sites you’re using most set up properly, and most things you do in the web run without any more interaction. visits to new sites, though, are still adventurous.
how to prevent your page from making leak.
first, a few words why you should try not to leak. the first is trust: the visitors of your page trust you. in particular, they usually don’t want that you send their information to not perfectly trustable other sites. and then, there are users who block such things, like me. if i access your page and you rely, say, on google analytics to track and count your users, you won’t be able to see me. i’ll be missing in your statistics, even though i accessed your page. (and since i’m advanced enough to use addons to see where my data is supposed to be sent, i know how much you care about my data, and how much i can trust you.) so i do appreciate if sites do not leak information.
there are two basic strategies to avoid leaking:
<a href=’http://www.facebook.com/sharer.php?u=http://url.of/your/site&t=Title of your site’ target=’_blank’ title=’Share!’><img alt=’Share!’ src=’images/facebook.gif’ /></a>
images/facebook.gifshould obviously be a image on your server.
iframefrom the content provider (facebook, your web tracker, …), so the user’s browser will access the data directly from that provider. with the drawback that the provider knows that the user asked. (which is necessary for web trackers to work, but not for showing the numer of people who like a page.) there are also things which indicate your online status on skype, icq, or other services, which you can include in your page and which make the user’s browser access some other site. for most instances, one can avoid this problem by adding some kind of proxy: instead of making the user retrieve some (automatically generated) image, you link to a script on your page which retrieves the image from the provider’s server and forward it to the user. then the provider just sees that your server is asking for the picture, while the user still sees the information on the picture without giving information to the provider.
note that both strategies give you (more or less) extra work. especially setting up a proxy script on your server is very non-trivial, if you cannot just use something publicly available. and more importantly, if you do not have good enough access to the server – for example if you have a blogspot blog, or a wordpress blog running on
you have to decide for yourself how much data you make your visitors leak. and remember, most visitors appreciate that you don’t spread their data unnecessarily. and some visitors can even check what you’re doing, and know how much information you (try to) make them leak.