Watch Web site activity with Webalizer on CentOS
February 23, 2011. Posted by Tournas Dimitrios in Linux admin tools.
You probably take for granted that your Web site is always up and that people are actually visiting it. But are they? If they are, do you actually know where your visitors are coming from, what their referrers were, or what browser they were using? Do you know what the top pages of your site are? How about your top entry and exit pages?
These are the kinds of statistics a good Web admin needs to know. But before you start combing through log files, consider installing Webalizer. Webalizer started as a simple Perl script and has grown into something far more useful: a very fast, reliable application that reads your server log files and presents the results in a user-friendly format. It helps you analyze your HTTP servers' traffic and keeps you on top of your sites and how they are being used. In this article, I'll show you exactly what Webalizer is and how to use it.
Webalizer can be installed in different ways; of course, we are lazy people and prefer the simplest method 🙂. Although the demonstration uses a CentOS 5.x box, the process is similar on all Red Hat-based distributions.
Webalizer depends on the gd graphics library, so you will need to install gd first:
- yum install gd
- yum install webalizer
Since Webalizer runs as a cron job, you might assume that you can point your browser to http://localhost/webalizer/ to see the results. If you do, the only thing you'll see is:
The requested URL /webalizer/ was not found on this server.
Hmmm... don't panic 🙂
Let's run ”rpm -ql webalizer” to list the files the package installed. The most important one is the configuration file, /etc/webalizer.conf. Here is how to fix the problem:
- In /etc/webalizer.conf, the directive “OutputDir /var/www/usage” must be changed to point to the directory created in the next step.
- mkdir /var/www/html/webalizer
- cp /var/www/usage/* /var/www/html/webalizer
- Wait for the cron job to run Webalizer (probably the next day), or run it manually for now –> webalizer
- Point your browser to http://localhost/webalizer/ locally, or replace localhost with the server's hostname or IP address when browsing remotely.
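The manual steps above can also be wrapped in a small POSIX shell script. The sketch below is just one possible way to do it: set_output_dir and migrate_reports are hypothetical helper names (not part of Webalizer), and it assumes the new OutputDir is the /var/www/html/webalizer directory created by the mkdir step. It rewrites the active OutputDir line in place and copies the existing reports, without failing when the old directory is still empty.

```shell
#!/bin/sh
# Hypothetical helpers wrapping the manual steps above (not part of Webalizer).
# Assumption: the new OutputDir is the directory created by the mkdir step.

# set_output_dir CONF DIR: point the active OutputDir line in CONF at DIR.
# Commented-out lines are left alone; both spellings used in this article
# (OutputDir and outputDir) are matched. DIR must not contain '|' or '&'.
set_output_dir() {
    conf=$1
    dir=$2
    tmp="$conf.tmp.$$"
    sed "s|^[[:space:]]*[Oo]utput[Dd]ir[[:space:]].*|OutputDir $dir|" "$conf" > "$tmp" &&
        cat "$tmp" > "$conf"   # write back in place, keeping owner and mode
    status=$?
    rm -f "$tmp"
    return $status
}

# migrate_reports OLD NEW: create NEW and copy the reports already in OLD.
# Like the plain "cp OLD/* NEW" step, only regular files are copied.
migrate_reports() {
    mkdir -p "$2" || return 1
    for f in "$1"/*; do
        [ -f "$f" ] || continue   # also skips the unexpanded pattern if OLD is empty
        cp -p "$f" "$2"/ || return 1
    done
}
```

Run as root, the sequence would be set_output_dir /etc/webalizer.conf /var/www/html/webalizer, then migrate_reports /var/www/usage /var/www/html/webalizer, and finally webalizer to generate fresh reports instead of waiting for the cron job.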
You can customize Webalizer by making changes to its configuration file. Remember, the configuration file is /etc/webalizer.conf. Some of the configuration options you will want to deal with include:
|Configuration directives of Webalizer|
|Directive||Description|
|OutputDir||As described above, this is where the Webalizer will place its output.|
|LogType||This option defines the type of log file used. The types allowed are: clf (default), ftp (xferlogs produced by wu-ftp), or squid (native squid logs).|
|Incremental||If you run a larger site, you will want to enable this. Incremental processing allows multiple partial log files (for example, logs rotated more than once a month) to be processed instead of one large file. The default is no.|
|HistoryName||This allows you to define the name of the history file produced. This file keeps data for up to twelve months and by default it is called webalizer.hist.|
|ReportTitle||This is the text displayed as the title of the report.|
|HostName||This defines the hostname used on the report. This hostname is the name used on the clickable entries within the report, so if you change it, make sure it is correct. If it is not set, Webalizer tries to determine the system's hostname and falls back to localhost. Localhost, of course, will only work if you are viewing the report on the server running Webalizer.|
|PageType||This defines which URLs (by file extension) Webalizer considers a page. The defaults are htm* and cgi.|
|IncrementalName||If you enable Incremental, you will want to check out this option (if you do not enable Incremental, ignore this option). The default name is webalizer.current. This file will store the most recent report data.|
|HTMLExtension||This allows you to define the file extension to use when creating the HTML pages. The default is .html.|
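|DNSChildren||This is where you can define how many child processes may be used when performing DNS lookups to create or update the DNS cache file. Reasonable values are between 5 and 20; the default of 0 disables DNS lookups at run time. If you use this option, DNSCache must be set as well.|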
|UseHTTPS||Use this if the site being analyzed runs on a secure server, so that links in the report use https:// instead of http://.|
|HTMLHead||This allows you to define any HTML code to insert between the <HEAD></HEAD> tags.|
|HTMLBody||This allows you to define the HTML code inserted starting with the <BODY> tag. If you use it, you must supply your own <BODY> tag as the first line.|
|HTMLPost||This allows you to define any HTML code immediately before the first <HR> of the page.|
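|HTMLEnd||This allows you to define any HTML code to add at the very end of each HTML document. If you use it, you must supply the closing </BODY> and </HTML> tags yourself.|
|HTMLTail||This allows you to define HTML code to insert at the bottom of each HTML document, such as a link back to your home page or a small graphic. It is placed in a right-aligned table cell.|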
|HTMLPre||This allows you to define any HTML code to insert at the beginning of the file. The default is the DOCTYPE line.|
|DNSCache||Here is where you specify your DNS cache file. This file is used for reverse DNS lookups. The default is dns_cache.db.|
|Quiet||This option suppresses informational output messages; errors and warnings are still shown. If you are running Webalizer from a cron job, it is best to use this option.|
|ReallyQuiet||This option suppresses all messages, including errors and warnings.|
|TimeMe||This option forces Webalizer to show timing information at the end of processing.|
|GMTTime||All reports will be shown in GMT (UTC) time.|
|SearchEngine||Allows you to define search engines and their query strings that are used to find your site. An example: SearchEngine google.com q=|
|Dump*||These keywords allow sites, URLs, Referrers, User Agents, Usernames, and Search Strings to be dumped into a tab-delimited text file that can be used in database applications.|
|All Options||These keywords enable the display of all URLs, Sites, Referrers, User Agents, Search Strings, and Usernames. When one is enabled, a separate HTML page is created for that item type, but only if there are more items than fit in the corresponding Top table, and the listing only shows items that are normally visible (hidden items are left out). The options are: AllSites, AllURLs, AllReferrers, AllAgents, AllSearchStr, and AllUsers.|
|GraphLegend||This allows you to enable the color-coded legends for all graphs. Default is yes.|
|GraphLines||This draws index lines behind the graphs to make them easier to read. The value is yes, no, or a number of lines from 0 to 20; 0 or no disables the lines, and yes is the same as 2. The default is 2.|
|HourlyGraph/HourlyStats||These allow you to enable or disable the Hourly Graph and Hourly Stats. Defaults are yes (enabled).|
|CountryGraph||This allows you to enable or disable the Country Graph. Default is yes (enabled).|
|IndexAlias||Defines names that are treated as directory index pages, so the string index.html does not need to appear in an address. In other words, /directory/index.html and /directory/ are counted as the same URL.|
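|Ignore*||These keywords (IgnoreSite, IgnoreURL, IgnoreReferrer, IgnoreAgent, IgnoreUser) cause Webalizer to completely ignore log records based on hostname, URL, referrer, user agent, or username.|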
|Top Options||These options set the number of entries for each table. You can define these to fit your needs. The options are: TopSites, TopKSites, TopURLs, TopKURLs, TopReferrers, TopAgents, TopCountries, TopEntry, TopExit, TopSearch, and TopUsers.|
|DailyGraph/DailyStats||These allow you to enable or disable the Daily Graph and Daily Stats. Defaults are yes (enabled).|
|Include*||This keyword allows you to include log records based on hostname, URL, user agent, referrer, or username.|
|Debug||Prints additional information within error messages.|
|Hide*||These keywords prevent matching items from being displayed in the Top tables; the items are still counted in the main totals.|
|VisitTimeout||This allows you to set the default timeout for a visit. Default is 1800 seconds.|
|IgnoreHist||This option really shouldn’t be used. If used, it will cause Webalizer to ignore the history file.|
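|FoldSeqErr||If set to yes, Webalizer ignores sequence errors, that is, log records that are out of chronological order (as written by some web servers that buffer their log output).|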
|Group*||This keyword groups similar objects together.|
|For more options read the man page –> man webalizer|
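Most of these directives take a single value, while some (SearchEngine, PageType and the Ignore*, Hide*, Group* and Include* families) may appear on several lines. If you prefer to script your changes rather than edit /etc/webalizer.conf by hand, the following sketch shows one way to do it. set_directive and add_directive are hypothetical helpers, not part of Webalizer, and the code assumes the simple one-directive-per-line "Keyword value" layout with # comments.

```shell
#!/bin/sh
# Hypothetical helpers (not shipped with Webalizer) for editing a
# webalizer.conf-style file: one "Keyword value" per line, '#' = comment.

# set_directive CONF KEY VALUE: for single-valued keywords such as Quiet,
# HostName or DNSChildren. Replaces active KEY lines, or appends one if none.
# VALUE goes into a sed replacement, so '|', '&' and '\' would need escaping.
set_directive() {
    conf=$1
    key=$2
    value=$3
    if grep -q "^[[:space:]]*$key[[:space:]]" "$conf"; then
        tmp="$conf.tmp.$$"
        sed "s|^[[:space:]]*$key[[:space:]].*|$key $value|" "$conf" > "$tmp" &&
            cat "$tmp" > "$conf"   # write back in place, keeping owner and mode
        status=$?
        rm -f "$tmp"
        return $status
    fi
    printf '%s %s\n' "$key" "$value" >> "$conf"
}

# add_directive CONF KEY VALUE: append without touching existing lines, for
# keywords that may repeat (SearchEngine, PageType, Ignore*, Hide*, Group*,
# Include*).
add_directive() {
    printf '%s %s\n' "$2" "$3" >> "$1"
}
```

For example, set_directive /etc/webalizer.conf Quiet yes suits a cron-driven setup, and add_directive /etc/webalizer.conf SearchEngine 'google.com q=' adds the SearchEngine example from the table without disturbing the entries already there.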