On Fri, Jun 09, 2006 at 11:38:01AM +0100, Tim Trent wrote:
> Website log files store pretty much everything about a user's visit. They
> store where you entered the site, which pages you visited and for how long,
> and where you left the site. They even store the interval between visits
> (to an extent) to allow calculations of the uniqueness of your visit. A
> description of some of this is found at
> http://httpd.apache.org/docs/1.3/logs.html
Almost true; the small nit is that they don't know anything about
"how long". Each page request is logged, but there is no concept of
how long someone "stayed on a page". Marketing folks often ask for
this information, but it doesn't exist. You can (perhaps) infer from
the interval between page requests, but there's no information at all
beyond the final page request. This all gets even more complicated
when you encounter users who browse in multiple windows simultaneously,
or open lots of tabs in parallel, or even with users who use the 'back'
button and then follow a different link. Almost all the log analysis
software I've examined (and I've played with a *lot* of it) tends to
ignore these issues, as (a) they're hard to deal with properly, (b) the
software pretends they're rare on most sites (although tabbed browsing
is becoming increasingly common, and anyone who has built a site with
session management knows just how much of a problem the 'back button'
issue actually is; note that many banking sites explicitly terminate
your session if you try to use it!), and (c) for the most part
everything just evens out.
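
To make the "how long" point concrete, here is a rough sketch of the
inference described above. It assumes the log is in the Common Log
Format described in the Apache documentation linked earlier, and it
treats each client host as one user, which is itself a shaky assumption
(proxies, shared machines). The function name dwell_times and the sample
lines are made up for illustration. Note that the last request from each
host gets no duration at all, because the log holds nothing after it.

```python
import re
from collections import defaultdict
from datetime import datetime

# Common Log Format: host ident authuser [date] "request" status bytes
CLF = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "(?:\S+) (\S+)[^"]*" \d{3} \S+')


def dwell_times(lines):
    """Return {host: [(path, seconds_or_None), ...]} in request order.

    The "time on page" is only the gap until the same host's next
    request. Assumption: one host equals one user browsing in one window.
    """
    hits = defaultdict(list)
    for line in lines:
        m = CLF.match(line)
        if not m:
            continue  # skip anything that is not a Common Log Format line
        host, stamp, path = m.groups()
        when = datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")
        hits[host].append((when, path))

    result = {}
    for host, reqs in hits.items():
        reqs.sort(key=lambda r: r[0])
        pages = []
        for i, (when, path) in enumerate(reqs):
            if i + 1 < len(reqs):
                gap = (reqs[i + 1][0] - when).total_seconds()
            else:
                gap = None  # final request: the log cannot say how long they stayed
            pages.append((path, gap))
        result[host] = pages
    return result


# Hypothetical log lines, not taken from any real site.
sample = [
    '192.0.2.1 - - [09/Jun/2006:11:38:01 +0100] "GET /index.html HTTP/1.0" 200 2326',
    '192.0.2.1 - - [09/Jun/2006:11:38:31 +0100] "GET /about.html HTTP/1.0" 200 1043',
]
print(dwell_times(sample))
```

Tabs, multiple windows and the 'back' button all break the "gap to the
next request" assumption, so figures produced this way are an estimate
at best.
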
But, yes, most commercial websites spend (or should spend) a
considerable amount of time and effort on logfile analysis.
Very popular non-commercial sites, on the other hand, sometimes don't
keep these sorts of logs for very long, or sometimes even at all, as the
disk space required would be too great for the minimal value. Wikipedia,
for example, just doesn't bother logging requests at all.
> Your ISP also holds logs of your activity. So it is perfectly possible to
> determine precisely which machine (and login) searched for 'rose scented
> talcum powder' on St Smellbetter's Day at 10am.
Although the government has considered forcing ISPs to record and keep
this sort of data, there is currently no requirement to do so. Some ISPs
*may* keep this data, but for others it serves no useful purpose and so
they don't keep the logs for longer than a few weeks. These logs also
usually aren't anywhere near as "useful" as website logs; they're just
logs of raw data packets moving through the network. Again, there are
tools for parsing this data into more useful information, but most ISPs
I've come across (I've worked for two) don't really do anything with
these logs unless someone reports a problem. The bigger ISPs,
particularly in the US, have realised that this information is generally
useful to others, and some sell the data in anonymised form. One search
engine, for example, purchases web clickstream data from ISPs so it can
discover how users browse the sites that they're directed to from search
results, and can better tailor its results in future.
Tony