A few months ago, I blogged about a cookie-based filter I deployed to screen out librarian machines from my library’s Google Analytics and Crazy Egg data. Well, after running this experiment for a while, it looks like our IP data was actually good enough after all.
The original problem that sparked this experiment was that the computers in our library were all on dynamic IPs, so we could not reliably screen those machines out and get a clean view of what non-librarians thought was important. We considered a few options, such as putting all library machines on static IPs. The university IT department didn’t like that idea, so I had to go back to the drawing table…or baking table, if you will.
The solution was to deploy a browser cookie on each machine used by staff, which Google Analytics and Crazy Egg would then use to identify the librarians visiting various library pages. The added bonus was that we could actually see which web elements and pages were mostly used by staff and which ones were mostly used by others.
Problem is, deploying the cookie across the staff computers was a real hassle. Many computers have multiple users, like the Reference computers, so we had to go around chasing down librarians (not an easy task!) and have each of them log into each computer, launch each browser and change its default homepage to our cookie page, which sets the cookie and then refreshes to the library homepage. In some areas, ten or more people might use a single computer.
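For anyone curious what a cookie page like ours does, here is a minimal sketch in Python. It builds the response for a page that sets a cookie and then uses a zero-second meta refresh to send the browser on to the homepage. The cookie name, value, lifetime and homepage URL are all hypothetical, made up for illustration. It also only covers setting the cookie; it does not show how Google Analytics or Crazy Egg pick the cookie up, since each tool has its own mechanism for that.

```python
from html import escape
from http.cookies import SimpleCookie

# Hypothetical values: the real cookie name, value and URL aren't given in this post.
COOKIE_NAME = "library_staff"
COOKIE_VALUE = "librarian"
LIBRARY_HOME = "https://library.example.edu/"


def cookie_page(max_age_days=365):
    """Build (status, headers, body) for a page that tags the browser, then refreshes home."""
    cookie = SimpleCookie()
    cookie[COOKIE_NAME] = COOKIE_VALUE
    cookie[COOKIE_NAME]["path"] = "/"  # send the cookie with every page on this host
    cookie[COOKIE_NAME]["max-age"] = max_age_days * 24 * 60 * 60  # lifetime in seconds
    headers = [
        ("Set-Cookie", cookie[COOKIE_NAME].OutputString()),
        ("Content-Type", "text/html; charset=utf-8"),
    ]
    # A zero-second meta refresh sends the browser straight on to the library homepage.
    body = (
        "<!DOCTYPE html><html><head>"
        f'<meta http-equiv="refresh" content="0; url={escape(LIBRARY_HOME)}">'
        "</head><body></body></html>"
    )
    return 200, headers, body


status, headers, body = cookie_page()
for name, value in headers:
    print(f"{name}: {value}")
```

Setting the page as each browser's default homepage means the cookie gets refreshed every time a librarian opens that browser, which is also why every user on every shared machine had to do it.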
Needless to say, the payoff for all this work had to be really good.
And as it turned out, the IP range filters were not all that different from the cookie-filtered data.
In Google Analytics, we set up multiple views, one with no filters, one with a cookie filter and one with an IP range filter. We let the data come in for two months and then checked the results to see what the discrepancies were.
What we found was actually surprising. The IP filters were doing a pretty good job…in fact, they were doing too good of a job. It appears that the IP range covers the librarians, but also many of the public machines in our computer labs and elsewhere, so some student traffic gets filtered out along with staff traffic. This isn’t too bad, since people working from home are very likely not librarians…so the IP ranges do essentially get our librarians out of our data.
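To make the over-counting concrete, here is a toy Python model of the three views. It is not how Google Analytics implements filters; it just applies the same two rules (drop hits carrying the staff cookie, or drop hits from the library IP range) to a handful of made-up hits. The IP range and the sample hits are hypothetical, drawn from the reserved documentation address blocks.

```python
import ipaddress

# Hypothetical library range; the real IP ranges aren't published in this post.
LIBRARY_RANGE = ipaddress.ip_network("192.0.2.0/24")


def is_staff_by_cookie(hit):
    """Cookie rule: only browsers tagged by the cookie page count as staff."""
    return hit["has_staff_cookie"]


def is_staff_by_ip(hit):
    """IP rule: any address inside the library range counts as staff."""
    return ipaddress.ip_address(hit["ip"]) in LIBRARY_RANGE


def view_counts(hits):
    """Count the hits each view keeps after excluding whatever its filter flags as staff."""
    return {
        "unfiltered": len(hits),
        "cookie_filtered": sum(1 for h in hits if not is_staff_by_cookie(h)),
        "ip_filtered": sum(1 for h in hits if not is_staff_by_ip(h)),
    }


# Made-up sample hits for illustration only.
hits = [
    {"ip": "192.0.2.10", "has_staff_cookie": True},    # librarian at a staff desk
    {"ip": "192.0.2.77", "has_staff_cookie": False},   # student at a lab machine in the same range
    {"ip": "203.0.113.5", "has_staff_cookie": False},  # visitor working from home
]
print(view_counts(hits))
```

The lab machine sits inside the library range, so the IP view drops it even though it has no staff cookie. That is the "too good of a job" effect: the IP view throws away a little extra student traffic, but everything it keeps is very unlikely to come from a librarian.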
And statistically, the IP ranges were only a few percentage points off the cookie filters, so the differences were fairly benign.
For example, on Saturday, March 5, 2011, 6,751 people came to our library homepage. Of those, 49 were librarians according to our near-perfect cookie-filtered data. The IP filters identified 131 users as librarians because they also counted some lab computers used by students. Doing a little math, we found an average difference of just 1.5% between the two filters. That’s not significant enough to worry about…especially given how onerous deploying and monitoring the cookie was.
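Here is the arithmetic for that single Saturday, as a quick Python check. It only uses the figures quoted above. On that day the gap works out to a little over 1.2 percentage points; the 1.5% figure is an average, so any one day won't match it exactly, and this snippet doesn't try to recompute it.

```python
# Figures for Saturday, March 5, 2011, as quoted above.
total_visitors = 6751
cookie_librarians = 49  # flagged as staff by the cookie filter
ip_librarians = 131     # flagged as staff by the IP range filter


def share(count, total):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


print(share(cookie_librarians, total_visitors))                  # 0.7
print(share(ip_librarians, total_visitors))                      # 1.9
print(share(ip_librarians - cookie_librarians, total_visitors))  # 1.2
```

In other words, the IP filter throws out 82 extra visits that day, about one in every 82 homepage visitors, which is small next to the effort the cookie took.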