
Watch Web site activity with Webalizer on CentOS

February 23, 2011

Posted by Tournas Dimitrios in Linux admin tools.

You probably take for granted that your Web site is always up and that people are actually visiting it. But are they? If they are, do you actually know where your visitors are coming from, what their referrers were, or what browser they were using? Do you know what the top pages of your site are? How about your top entry and exit pages?

These are the kinds of statistics that a good Web admin needs to know. But before you start combing through log files, consider installing Webalizer. It started as a simple Perl script, but Webalizer has grown into something far more useful. Webalizer is now a very fast, reliable application that reads your server log files and presents them in a user-friendly format that can help you analyze your HTTP servers’ traffic, keeping you on top of your sites and how they are being used. In this article, I’ll show you what exactly Webalizer is and how to use it.

The installation can be done in different ways; of course, we are lazy people and prefer the simplest method 🙂. Although the demonstration is done on a CentOS 5.x box, the process is similar on all Red Hat based distributions.

Webalizer depends on the gd graphics library, so you will need to install gd first.

  • yum install gd
  • yum install webalizer
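
As a quick sanity check after the install, something like the following sketch could be used to see whether both packages are present. It assumes an RPM-based system; check_pkgs is just a helper name made up for this example, not part of yum or rpm.

```shell
#!/bin/sh
# check_pkgs is a hypothetical helper name, not part of yum or rpm.
# It prints one status line per package name given as an argument.
check_pkgs() {
    for pkg in "$@"; do
        # rpm -q exits non-zero when the package is not installed
        # (a package is also reported as missing if rpm itself is unavailable)
        if rpm -q "$pkg" >/dev/null 2>&1; then
            echo "$pkg: installed"
        else
            echo "$pkg: missing (try: yum install $pkg)"
        fi
    done
}

check_pkgs gd webalizer
```

If either line says "missing", run the corresponding yum command from the list above as root.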

Since Webalizer runs as a cron job, you’re probably assuming you can point your browser to http://localhost/webalizer/ to see what you have. If you do, the only thing you’ll see is:

Not Found
The requested URL /webalizer/ was not found on this server.

Hmmmm ... don’t panic 🙂

Let’s run "rpm -ql webalizer". The most important files that I noticed are:

  1. /etc/cron.daily/00webalizer
  2. /etc/webalizer.conf
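  3. /var/www/usage

By default the reports are written to /var/www/usage, which is not under /var/www/html (the default Apache document root), so nothing is served at /webalizer/. To publish the reports there:

  • The directive "OutputDir /var/www/usage" in /etc/webalizer.conf must be changed to
    "OutputDir /var/www/html/webalizer"
  • mkdir /var/www/html/webalizer
  • cp /var/www/usage/* /var/www/html/webalizer
  • Wait for the cron job to execute Webalizer (probably the next day), or run it manually for now: webalizer
  • Access http://localhost/webalizer/ with your browser, locally or remotely (for remote access, replace localhost with the server’s hostname).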
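
The steps above can be sketched as a small shell function. This is only an illustration under a few assumptions: relocate_output is a name I made up, the config file is assumed to contain a single uncommented OutputDir line, and the real call is left commented out so that nothing touches /etc until you decide to run it as root.

```shell
#!/bin/sh
# relocate_output is a hypothetical helper, not shipped with Webalizer.
# Arguments: config file, current output directory, new output directory.
relocate_output() {
    conf=$1
    old=$2
    new=$3

    # Create the new directory under the document root
    mkdir -p "$new" || return 1

    # Carry over files that already exist in the old location
    if [ -d "$old" ]; then
        cp -R "$old"/. "$new"/ || return 1
    fi

    # Keep a backup of the config, then rewrite the OutputDir directive
    cp "$conf" "$conf.bak" || return 1
    sed "s|^[[:space:]]*[Oo]utputDir[[:space:]].*|OutputDir      $new|" \
        "$conf.bak" > "$conf"
}

# Real run on the server (as root), followed by a manual Webalizer run:
# relocate_output /etc/webalizer.conf /var/www/usage /var/www/html/webalizer && webalizer
```

The original file is kept as /etc/webalizer.conf.bak, so the change is easy to undo if the report does not show up.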

You can customize Webalizer by making changes to its configuration file. Remember, the configuration file is /etc/webalizer.conf. Some of the configuration options you will want to deal with include:

Configuration directives of Webalizer
OutputDir As described above, this is where the Webalizer will place its output.
LogType This option defines the type of log file used. The types allowed are: clf (default), ftp (xferlogs produced by wu-ftpd), or squid (native squid logs).
Incremental If you run a larger site, you will want to enable this. Incremental processing allows you to process multiple partial log files (for example, rotated logs) instead of one large file. The default is no.
HistoryName This allows you to define the name of the history file produced. This file keeps data for up to twelve months and by default it is called webalizer.hist.
ReportTitle This is the text displayed as the title of the report.
HostName This defines the hostname used on the report. This hostname is the name used on the clickable entries within the report. If you change this, make sure it is correct. If it is not set, Webalizer tries to obtain the system hostname and falls back to localhost. Localhost, of course, will only work if you are viewing the report on the server running Webalizer.
PageType This defines, for Webalizer, what URLs you (or your system) consider a page. The defaults are htm* and cgi.
IncrementalName If you enable Incremental, you will want to check out this option (if you do not enable Incremental, ignore this option). The default name is webalizer.current. This file will store the most recent report data.
HTMLExtension This allows you to define the file extension to use when creating the HTML pages. The default is .html.
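DNSChildren This is where you can define how many child processes may be used when performing DNS lookups. Reasonable values are between 5 and 20. The default is 0, which disables DNS lookups at run time; if you set it, DNSCache must be set as well.
UseHTTPS Set this to yes if the analyzed site is served from a secure server, so that the links in the report use https:// instead of http://.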
HTMLHead This allows you to define any HTML code to insert between the <HEAD></HEAD> tags.
HTMLBody This allows you to define any HTML code inserted within the <BODY> tag.
HTMLPost This allows you to define any HTML code immediately before the first <HR> of the page.
HTMLEnd This allows you to define any HTML code to add at the very end of each HTML document. By default it closes the page with </BODY></HTML>.
HTMLTail This allows you to define any HTML code to show at the bottom of each report page, before the HTMLEnd code, for example a link back to your home page.
HTMLPre This allows you to define any HTML code to insert at the beginning of the file. The default is a DOCTYPE declaration.
DNSCache Here is where you specify your DNS cache file. This file is used for reverse DNS lookups. The default is dns_cache.db.
Quiet This option suppresses any output messages. If you are running Webalizer from a cron job it is best to use this option.
ReallyQuiet This option will suppress all messages, including warnings.
TimeMe This option will force Webalizer to show the timing information at the end of processing.
GMTTime All reports will be shown in GMT (UTC) time.
SearchEngine Allows you to define search engines and the query strings they use, so that Webalizer can report the search terms used to find your site. An example: SearchEngine google.com q=
Dump* These keywords allow sites, URLs, Referrers, User Agents, Usernames, and Search Strings to be dumped into tab-delimited text files that can be imported into database applications.
All Options These keywords enable the display of all URLs, Sites, Referrers, User Agents, Search Strings, and Usernames. When one of these is enabled, a separate HTML page is created for it, under two conditions: there must be more items than will fit in the corresponding Top table, and the listing will only show items that are normally visible (hidden items are not listed). The options are: AllSites, AllURLs, AllReferrers, AllAgents, AllSearchStr, and AllUsers.
GraphLegend This allows you to enable the color-coded legends for all graphs. Default is yes.
GraphLines This allows you to draw index lines behind the graphs to make them more easily readable. The value is the number of lines to draw; a small number works best. The default is 2.
HourlyGraph/HourlyStats These allow you to enable or disable the Hourly Graph and Hourly Stats. Defaults are yes (enabled).
CountryGraph This allows you to enable or disable the Country Graph. Default is yes (enabled).
IndexAlias Webalizer normally strips the string index. from the end of URLs, so /directory/index.html is counted as /directory/. This option lets you define additional names (for example home.htm) that should be treated the same way.
Ignore* These keywords cause Webalizer to completely ignore log records based on hostname, URL, user agent, referrer, or username. Ignored records are not counted at all.
Top Options These options set the number of entries for each table. You can define these to fit your needs. The options are: TopSites, TopKSites, TopURLs, TopKURLs, TopReferrers, TopAgents, TopCountries, TopEntry, TopExit, TopSearch, and TopUsers.
DailyGraph/DailyStats These allow you to enable or disable the Daily Graph and Daily Stats. Defaults are yes (enabled).
Include* These keywords force the inclusion of log records based on hostname, URL, user agent, referrer, or username. They take precedence over the Ignore* keywords.
Debug Prints additional information within error messages.
Hide* These keywords prevent items from being displayed in the Top tables; the hidden items are still counted in the main totals.
VisitTimeout This allows you to set the default timeout for a visit. Default is 1800 seconds.
IgnoreHist This option really shouldn’t be used. If used, it will cause Webalizer to ignore the history file.
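FoldSeqErr If set to yes, Webalizer will not drop log records that are out of chronological order; instead it treats them as if they had the same time stamp as the last valid record. By default, out-of-sequence records are ignored.
Group* These keywords group similar objects (hostnames, URLs, referrers, user agents, or usernames) together so that they are reported as one entry in the Top tables.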
For more options, read the man page: man webalizer
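
To see how a handful of these directives fit together, here is a minimal sketch that writes a small configuration file to a path of your choice. The helper name write_site_conf, the hostname www.example.com, and the chosen values are assumptions for illustration only. LogFile, HideURL, and IgnoreSite are real Webalizer keywords even though LogFile is not described in the table above; check the man page before relying on any of them.

```shell
#!/bin/sh
# write_site_conf is a hypothetical helper, not part of Webalizer.
# Arguments: config file to write, hostname for the report, output directory.
write_site_conf() {
    conf=$1
    host=$2
    outdir=$3
    # All values below are example values; adjust them to your own site
    cat > "$conf" <<EOF
LogFile         /var/log/httpd/access_log
LogType         clf
OutputDir       $outdir
HistoryName     webalizer.hist
Incremental     yes
IncrementalName webalizer.current
ReportTitle     Usage Statistics for
HostName        $host
PageType        htm*
PageType        cgi
Quiet           yes
VisitTimeout    1800
SearchEngine    google.com      q=
HideURL         *.png
IgnoreSite      127.0.0.1
TopSites        30
EOF
}

# Example use (commented out); -c tells Webalizer which config file to read:
# write_site_conf /tmp/example.conf www.example.com /var/www/html/webalizer
# webalizer -c /tmp/example.conf
```

Writing to a scratch file first and passing it with -c lets you experiment without touching /etc/webalizer.conf, which the daily cron job keeps using.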



