
posts about logging.

leaving traces is easy. just think of how many fingerprints you leave outside your home every day.
and this doesn’t only apply to your real life, but also to your online life. every little thing you do on the internet leaves traces in many different places. for example, think of typing a url like spielwiese.fontein.de into your browser and pressing enter. first, your browser will put that address into its history (for the pedants out there: i’m fully aware that there are situations in which this does not happen. as i don’t want to blow up this article with technicalities, i’ll simply ignore that.), so days later you can still see that you visited that page. you’ll also find copies of the page in your browser cache. back to the request itself. in order to access the site, your browser has to establish a connection with my web server and send a request for that site. this request is relayed through several different places, and every one of them is able to see that you (identifiable by your ip address, which your internet provider can trace back to you) requested this specific site. the exception is when you access pages using https: in that case, the intermediate relays can still see your ip address and which server you’re talking to, but not which page you requested.
finally, in any case, my web server will receive your request to deliver the page / from my domain to you. and, like most web servers, it will note that down in its log file, so i can see that you accessed my site. in fact, i see a lot more. in the request, your browser usually sends plenty of additional information, for example a string identifying your browser, the so-called user agent. this could be, for instance,

Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.6) Gecko/20070802 SeaMonkey/1.1.4

it usually includes information on your operating system (in this case, linux) and the exact version of your browser. the browser will usually also send the address of the page you were on before (the so-called referer, spelled exactly like that in the http standard). this information is sent for every web page you open, and for every image or other object contained in that page. hence, without any tricks, i am able to follow your path through my site, and i can see where you came from.
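to make this a bit more concrete, here’s a small python sketch of how one could pull these fields out of a single log line. it assumes the so-called combined log format, which apache and nginx can both write; the post doesn’t say which server or log format is actually used here, and the log line below is made up (the ip address comes from a range reserved for documentation, and the referer is hypothetical).

```python
import re

# Assumption: the "combined" log format used by apache/nginx, i.e.
# host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return the fields of one combined-format log line as a dict, or None."""
    m = COMBINED.match(line)
    return m.groupdict() if m else None


# Hypothetical log entry: documentation ip address, made-up referer.
sample = (
    '192.0.2.17 - - [21/Aug/2007:14:03:12 +0200] "GET / HTTP/1.1" 200 5123 '
    '"http://www.example.com/some-blog/" '
    '"Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.6) Gecko/20070802 SeaMonkey/1.1.4"'
)

entry = parse_line(sample)
print(entry["ip"], entry["path"], entry["referer"])
print(entry["agent"])
```

once a line is parsed like this, the requested page, the referer and the user agent (with its operating system and browser version) are just dictionary lookups away.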
being a bit more clever, i can find out a lot more. for example, i could modify the urls of outgoing links on my page to go through some kind of ‘gate’, like the forward.php which you might have noticed. then, if you click any link on my site that leads away from it, your browser will first contact my web server to retrieve the output of forward.php (which tells your browser ‘go to that other site’) and, surprise!, this leaves an entry in the web server’s log saying that you clicked that link. so i also know where you go when you leave my site. next, there’s a lot more information about you that one can find out using javascript, like your screen’s resolution. i’m including a little script on my site which tells the browser to load a little picture, consisting of a single, completely transparent pixel, on every page of my site. to the image’s url, the script appends the screen resolution. so by looking into my server’s log, i can see your screen resolution, at least if you haven’t turned javascript off, but most people have it turned on anyway.
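here’s a rough python sketch of both tricks, written as a tiny wsgi application. it is not the actual forward.php (that one is php, and its code isn’t shown here); the paths `/forward.php` and `/pixel.gif` and the query parameters `url` and `res` are assumptions made up for illustration. the explicit logging calls only make the point visible: the web server’s access log would record these requests anyway.

```python
import base64
import logging
from urllib.parse import parse_qs

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("tracker")

# A 1x1 transparent GIF, the usual payload of a "tracking pixel".
PIXEL = base64.b64decode("R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7")


def app(environ, start_response):
    """Tiny WSGI app: an outgoing-link 'gate' plus a resolution-reporting pixel."""
    path = environ.get("PATH_INFO", "")
    query = parse_qs(environ.get("QUERY_STRING", ""))
    client = environ.get("REMOTE_ADDR", "-")

    if path == "/forward.php":
        # The 'gate': note which outgoing link was clicked, then send the
        # browser on to the real target with a redirect.
        target = query.get("url", [""])[0]
        if not target.startswith(("http://", "https://")):
            # Minimal sanity check only; a real gate would restrict targets further.
            start_response("400 Bad Request", [("Content-Type", "text/plain")])
            return [b"bad target\n"]
        log.info("%s left via %s", client, target)
        start_response("302 Found", [("Location", target)])
        return [b""]

    if path == "/pixel.gif":
        # The screen resolution arrives as a query parameter added by javascript.
        res = query.get("res", ["unknown"])[0]
        log.info("%s has screen resolution %s", client, res)
        start_response("200 OK", [
            ("Content-Type", "image/gif"),
            ("Content-Length", str(len(PIXEL))),
            ("Cache-Control", "no-store"),  # don't cache, so every page view requests it again
        ])
        return [PIXEL]

    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found\n"]
```

on the page itself, the little script would simply set the picture’s address to something like `/pixel.gif?res=` followed by `screen.width`, an `x` and `screen.height`. to try the sketch locally, one could serve it with `wsgiref.simple_server.make_server("", 8000, app).serve_forever()`.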
so now i have a big log file containing a lot of information: which user came from where, looked at which pages, left for where, and used which browser, which operating system and which screen resolution, at which time. if i feed this log file into an analysis tool, it will gather the information and present it to me in a usable way (whatever that might mean).
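at its core, what such an analysis tool does is mostly counting and grouping. here’s a minimal python sketch, assuming the log lines have already been parsed into dictionaries; the records below are made up, the page paths are hypothetical and the user agent strings are shortened. besides counting pages, referers and browsers, it reconstructs each visitor’s path through the site by grouping requests by ip address.

```python
from collections import Counter, defaultdict


def summarize(records):
    """Aggregate parsed log records roughly the way a simple log analyzer might."""
    pages = Counter(r["path"] for r in records)
    # "-" is what the combined log format writes when no referer was sent.
    referers = Counter(r["referer"] for r in records if r["referer"] not in ("", "-"))
    agents = Counter(r["agent"] for r in records)
    # Each visitor's path through the site, keyed by ip address
    # (records are assumed to be in chronological order, as in a log file).
    trails = defaultdict(list)
    for r in records:
        trails[r["ip"]].append(r["path"])
    return pages, referers, agents, dict(trails)


# Hypothetical, already-parsed records.
records = [
    {"ip": "192.0.2.17", "path": "/", "referer": "http://www.example.com/", "agent": "SeaMonkey/1.1.4"},
    {"ip": "198.51.100.4", "path": "/", "referer": "-", "agent": "Firefox/2.0.0.6"},
    {"ip": "192.0.2.17", "path": "/some-post", "referer": "http://spielwiese.fontein.de/", "agent": "SeaMonkey/1.1.4"},
    {"ip": "192.0.2.17", "path": "/forward.php?url=http://www.example.org/",
     "referer": "http://spielwiese.fontein.de/some-post", "agent": "SeaMonkey/1.1.4"},
]

pages, referers, agents, trails = summarize(records)
print(pages.most_common(1))
print(trails["192.0.2.17"])
```

grouping by ip address is a crude way to tell visitors apart: several people behind one router share an address, and many providers hand out a different one every day.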
are you surprised? some of you won’t be, i know. anyone interested in this subject can read about it in many places on the web (for example, look here). and, in fact, one can do much more than i do. first, by using cookies, i could identify you uniquely and connect your different sessions to see how your surfing behaviour varies over a longer period of time. then, i could combine the data from several servers. if i had data from enough servers, i could put together a very detailed picture of what you are doing on the web. in that case, i’d be your big brother, watching (almost) every step you take online. luckily for you, i’m not doing that. but other people do. for example, the big web advertising companies, which place their advertisements on a huge number of web servers, can see you every time you view such a web page (unless you’re using an ad blocker). or assume that a web page uses the service of another server which tracks statistics for it. many people use such services (be it only in the form of a simple counter), so the provider of the service knows when you are looking at which site. and now assume that some of these data collectors cooperate, sharing their huge amounts of data. a creepy thought, isn’t it?
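here’s a minimal python sketch of what combining data from several servers could look like. it assumes that each record carries a cookie value `uid` which a tracking server (say, an ad server or a counter embedded in many sites) set in your browser on your first visit, and which your browser sends back whenever it loads something from that server, no matter which site embedded it, unless it blocks such third-party cookies. all records, site names and field names are hypothetical.

```python
from collections import defaultdict


def merge_by_cookie(*logs):
    """Merge visit records from several servers into one timeline per cookie ID."""
    timelines = defaultdict(list)
    for log in logs:
        for visit in log:
            timelines[visit["uid"]].append((visit["time"], visit["site"], visit["path"]))
    for visits in timelines.values():
        # ISO 8601 UTC timestamps in one fixed format sort correctly as strings.
        visits.sort()
    return dict(timelines)


# Hypothetical records, as seen by one tracker embedded on two different sites.
seen_on_news = [
    {"uid": "c0ffee42", "time": "2007-08-21T09:15:00Z", "site": "news.example", "path": "/politics"},
    {"uid": "deadbeef", "time": "2007-08-21T09:20:00Z", "site": "news.example", "path": "/sports"},
]
seen_on_shop = [
    {"uid": "c0ffee42", "time": "2007-08-21T09:05:00Z", "site": "shop.example", "path": "/cart"},
]

for uid, visits in merge_by_cookie(seen_on_news, seen_on_shop).items():
    print(uid)
    for time, site, path in visits:
        print("   ", time, site + path)
```

neither site on its own knows that the same browser first looked at a shopping cart and then at the politics section; the tracker that sees both does.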