Filtering Page Requests by Url in Sueetie Analytics

Including the ability to filter page requests by url is another example of how seriously we take data accuracy in Sueetie Analytics. "Data Cleanliness Is Next To Godliness," I believe the saying goes. Filtering pages by url keeps requests from agents and IPs that aren’t meaningful to our site objectives out of our analytics logs. We filter the urls of anonymous users only, and beyond that only for requests from browsers/agents not used by any site member. All member requests are logged. Url filtering is part of Sueetie Analytics, but it works in conjunction with the Client Access Control features of the Sueetie Addon Pack, which you will see in a moment.

Below is the Filter Url feature in Sueetie Analytics, with the url filter excerpts at left. You may be wondering why we need so many url filters. Good question. It’s important to remember that when I originally designed Sueetie Analytics and Client Access Control in the Addon Pack, I was working with months of request data that had very few restrictions by agent, IP or url. If you employ the default agent and url filtering from the moment you start using Analytics, you will most likely see few urls you want to filter.

Image

The Url Filter strings are at top left. At right are the anonymous urls, that is, urls requested by browsers not used by any site member. Below the Url Filter string list is the entry form where you enter a Url Excerpt to filter. Any url containing this excerpt and requested by an anonymous client will not be logged in the Analytics tables.
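The logging rule described above can be sketched in a few lines. This is only an illustration in Python, not Sueetie's actual .NET code: the `PageRequest` type, the `should_log` function and the case-insensitive matching are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass(frozen=True)
class PageRequest:
    """Hypothetical request record; the field names are assumptions."""
    url: str
    user_agent: str
    member_id: Optional[int] = None  # None means an anonymous visitor


def should_log(request: PageRequest,
               url_excerpts: Iterable[str],
               member_agents: Set[str]) -> bool:
    """Return True if the request should be written to the analytics log."""
    # All requests made by site members are logged.
    if request.member_id is not None:
        return True
    # Agents that some member also uses are not treated as filterable.
    if request.user_agent in member_agents:
        return True
    # Anonymous client: drop the request if its url contains any excerpt.
    # Case-insensitive matching is an assumption; empty excerpts are ignored.
    url = request.url.lower()
    return not any(e and e.lower() in url for e in url_excerpts)
```

For example, with the excerpt `wp-login` in the filter list, an anonymous request for `/wp-login.php` from an agent no member uses is dropped, while the same url requested by a member is still logged.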

Below the Url Excerpt Form is a function to delete all urls like the excerpt from the logs. Remember, I was working with months of unfiltered log data, so you probably won’t have to use this function.
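A delete-by-excerpt function of this kind amounts to a single `DELETE ... LIKE` statement. The sketch below uses Python's built-in `sqlite3` module as a stand-in for the real database; the `page_requests` table, its `url` column and the `purge_urls_like` name are hypothetical and not taken from the Sueetie schema.

```python
import sqlite3


def escape_like(excerpt: str) -> str:
    """Escape LIKE wildcards so the excerpt is matched literally."""
    return (excerpt.replace("\\", "\\\\")
                   .replace("%", "\\%")
                   .replace("_", "\\_"))


def purge_urls_like(conn: sqlite3.Connection, excerpt: str) -> int:
    """Delete every logged request whose url contains the excerpt.

    Returns the number of rows removed. The table and column names
    are assumptions made for this example.
    """
    if not excerpt:
        # An empty excerpt would match, and delete, every row.
        raise ValueError("excerpt must not be empty")
    pattern = "%" + escape_like(excerpt) + "%"
    cur = conn.execute(
        "DELETE FROM page_requests WHERE url LIKE ? ESCAPE '\\'",
        (pattern,),
    )
    conn.commit()
    return cur.rowcount
```

Escaping `%` and `_` matters because an excerpt such as `a_b` would otherwise also match `axb`. Refusing an empty excerpt guards against wiping the whole log by accident.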

Digging Deeper into the Urls

I mentioned how the Url Filtering feature works hand-in-hand with the Site Access Control module. If you look at the anonymous url list you’ll see that each url is linked to the actual page and that a hyperlink arrow is displayed at right. Here’s a screenshot.

Image

You can use the page link to take you to the page. This is useful if you’re unfamiliar with any "url embellishments" you happen to notice. The arrow hyperlink takes you to an activity list for the url for further analysis. This is where the Site Access Control services come into play.

Look at the nutty url below. Something sinister is at work here no doubt, or maybe just good old-fashioned R&D. Who’s to say? Regardless, we can click on the Remote IP to determine its origin. We also have the user agent information and the date of the request. If you want to take action to block the IP or Agent you can do so easily with the links below the table.

Image
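The activity list for a url boils down to filtering the log by url and looking at who made the requests. As a rough sketch in Python, assuming a hypothetical `LoggedRequest` record with the three fields mentioned above (remote IP, user agent, request date):

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class LoggedRequest:
    """Hypothetical log row; the field names are assumptions."""
    url: str
    remote_ip: str
    user_agent: str
    requested_at: datetime


def activity_for_url(entries: Iterable[LoggedRequest],
                     url: str) -> List[LoggedRequest]:
    """All logged requests for one url, newest first."""
    matches = [e for e in entries if e.url == url]
    return sorted(matches, key=lambda e: e.requested_at, reverse=True)


def requests_by_client(activity: Iterable[LoggedRequest]
                       ) -> List[Tuple[Tuple[str, str], int]]:
    """Count requests per (remote IP, user agent), most active first."""
    counts = Counter((e.remote_ip, e.user_agent) for e in activity)
    return counts.most_common()
```

A client that shows up at the top of `requests_by_client` for a suspicious url is a natural candidate for an IP or agent block.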

Article written by

A longtime developer, I was an early adopter of Linux in the mid-90s for a few years until I entered corporate environments and worked with Microsoft technologies like ASP, then .NET. In 2008 I released Sueetie, an Online Community Platform built in .NET. In late 2012 I returned to my Linux roots and locked in on Java development. Much of my work is available on GitHub.