
Using GoAccess to read NGINX access logs

November 03, 2019

Did you know you can still analyze your web traffic without Google Analytics? You can just use your NGINX access logs! With logrotate’s default configuration on Ubuntu, the NGINX access logs are rotated daily and kept for 14 days. You can see which logs are being rotated by checking the following directory: /etc/logrotate.d.

In here, you’ll find an nginx configuration file:

/var/log/nginx/*.log {
    daily
    missingok
    rotate 14
    compress
    delaycompress
    notifempty
    create 0640 www-data adm
    sharedscripts
    prerotate
        if [ -d /etc/logrotate.d/httpd-prerotate ]; then \
            run-parts /etc/logrotate.d/httpd-prerotate; \
        fi \
    endscript
    postrotate
        invoke-rc.d nginx rotate >/dev/null 2>&1
    endscript
}
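
If you only want to know how many rotations are kept, you can pull the rotate value out of that file instead of reading all of it. This is a minimal sketch: rotate_count is a helper name made up for this example (it is not part of logrotate), and it assumes the directive sits on its own line, as it does in the file above.

```shell
# rotate_count FILE: print the value of every `rotate N` directive found in a
# logrotate configuration file. Hypothetical helper, not part of logrotate.
rotate_count() {
    # Match only lines whose first word is exactly "rotate", so lines such as
    # "invoke-rc.d nginx rotate" are not picked up.
    awk '$1 == "rotate" { print $2 }' "$1"
}

# Example call, using the path mentioned above:
# rotate_count /etc/logrotate.d/nginx
```

Combined with the daily directive, the number printed is the number of days of logs you have to work with.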

You can find these logs in /var/log/nginx/. If you want to analyze these access logs with GoAccess, you can run:

cd /var/log/nginx && zcat -f access.log* | goaccess --log-format=COMBINED -c

This gives you a quick overview of the visitors and requests of the last 14 days. The -f flag lets zcat read the uncompressed access.log alongside the gzipped rotations, so everything ends up in a single report.

For example, I was able to determine I had around 1,300 unique visitors. However, 777 of these were crawlers, leaving a little over 500 visitors in 14 days, which is more than I expected.
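
Rather than subtracting crawlers by hand, you can ask GoAccess to leave them out with its --ignore-crawlers option. A minimal sketch, assuming the logs live in the directory passed as the first argument; humans_only is a hypothetical wrapper name. What counts as a crawler is decided by GoAccess's own user-agent matching, so the result may not line up exactly with a manual count.

```shell
# humans_only DIR: run the same pipeline as above, but skip requests from
# user agents that GoAccess recognises as crawlers.
# (humans_only is a hypothetical wrapper name used for this example.)
humans_only() {
    # The subshell keeps the cd from changing the caller's working directory.
    ( cd "$1" && zcat -f access.log* | goaccess --log-format=COMBINED --ignore-crawlers -c )
}

# Example call:
# humans_only /var/log/nginx
```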

This is what the app displays:


 Dashboard - Overall Analyzed Requests (20/Oct/2019 - 03/Nov/2019)   [Active Panel: Geo Location]

 Total Requests  11052 Unique Visitors  1338 Requested Files 299  Referrers  0
 Valid Requests  11052 Init. Proc. Time 0s   Static Files    109  Log Size   0.0   B
 Failed Requests 0     Excl. IP Hits    0    Not Found       1098 Tx. Amount 205.54 MiB
 Log Source      STDIN

> 15 - Geo Location                                                                Total: 78/78

Hits      h% Vis.      v% Tx. Amount Data
---- ------- ---- ------- ---------- ----
2934  50.17%  447 100.00%  82.65 MiB EU Europe
1369  23.41%   73  16.33%  31.15 MiB  ├─ BE Belgium
 353   6.04%  146  32.66%   9.95 MiB  ├─ FR France
 236   4.04%    3   0.67%   2.79 MiB  ├─ SE Sweden
 196   3.35%   55  12.30%  13.19 MiB  ├─ CZ Czech Republic
 180   3.08%   32   7.16%   3.65 MiB  ├─ DE Germany
 179   3.06%   54  12.08%   3.87 MiB  ├─ RU Russian Federation
  64   1.09%   12   2.68%   4.07 MiB  ├─ GB United Kingdom
  61   1.04%    7   1.57%  34.50 KiB  ├─ NL Netherlands
  56   0.96%   29   6.49% 131.90 KiB  ├─ IE Ireland
  47   0.80%    1   0.22%  11.12 KiB  ├─ CH Switzerland
  34   0.58%    4   0.89%   1.42 MiB  ├─ UA Ukraine
  31   0.53%    1   0.22%   6.67 KiB  ├─ EU Europe
  27   0.46%    1   0.22% 522.49 KiB  ├─ LT Lithuania
  24   0.41%    8   1.79%   2.27 MiB  ├─ PL Poland
[?] Help [Enter] Exp. Panel  0 - Sun Nov  3 20:01:36 2019                   [q]uit GoAccess 1.3

You can also filter out specific IP addresses, so you don’t track your own visits. In this example, the count of Belgian visitors is probably higher than usual because my own visits are (still) counted.
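
One way to do that filtering is GoAccess's --exclude-ip option. A minimal sketch: report_without is a hypothetical wrapper name, and 203.0.113.7 is only a placeholder from the documentation address range, so swap in your own address.

```shell
# report_without DIR IP: run the same pipeline as before, leaving out every
# hit from the given IP address.
# (report_without is a hypothetical wrapper name used for this example.)
report_without() {
    ( cd "$1" && zcat -f access.log* | goaccess --log-format=COMBINED --exclude-ip="$2" -c )
}

# Example call (placeholder address, replace with your own):
# report_without /var/log/nginx 203.0.113.7
```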

Anyway, here are a few fun things I found out using these access logs:

  • Plenty of the requests seemed to be coming from servers in China. (Asia was #1 in my regional overview.)
  • Bots and such are constantly being used to find vulnerabilities in public websites. Looking at requests, I can see certain pages being requested frequently (and failing), like: /aa.php, /log.php, /2.php, /wp-login.php, /editBlackAndWhiteList, /qq.php, and more. I’m guessing some of these come from China.
  • macOS Catalina (and, I’m assuming, any future version of macOS) gets reported as Mac OS 10.1 Puma, presumably because its version number, 10.15, starts with 10.1.
  • I found that a link on my website was broken and was causing lots of 404s. (Whoops!)
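
If you want the list of failing paths without opening GoAccess, a short awk pipeline gets you most of the way. This is a sketch that assumes the combined log format, where the request path is the 7th whitespace-separated field and the status code the 9th; top_404s is a hypothetical helper name.

```shell
# top_404s [N]: read combined-format log lines on stdin and print the N most
# requested paths that returned a 404, most frequent first (default: 10).
# (top_404s is a hypothetical helper name used for this example.)
top_404s() {
    awk '$9 == 404 { print $7 }' |   # keep only the path of 404 responses
        sort | uniq -c |             # count how often each path occurs
        sort -k1,1nr -k2 |           # highest count first, ties by path
        head -n "${1:-10}"
}

# Example call:
# zcat -f /var/log/nginx/access.log* | top_404s 20
```

Both broken links and vulnerability probes such as /wp-login.php show up in this list, so it is worth checking which entries are pages that should actually exist.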

Overall, I’m really liking this. If you want to keep your logs for longer than 14 days, you’ll need to adjust the logrotate settings. If you’re a company, you also have to adhere to GDPR and anonymize IP addresses after a certain period of time.
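
As a rough illustration of what anonymizing could look like, the sketch below zeroes the last octet of the client address at the start of each log line. It is an assumption-heavy example, not legal advice: mask_ipv4 is a hypothetical helper name, it only handles IPv4 addresses, and whether dropping one octet is enough for your situation is something to check for yourself.

```shell
# mask_ipv4: read log lines on stdin and replace the last octet of a leading
# IPv4 address with 0. IPv6 addresses and other lines pass through unchanged.
# (mask_ipv4 is a hypothetical helper name used for this example.)
mask_ipv4() {
    sed -E 's/^([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})\.[0-9]{1,3} /\1.0 /'
}

# Example call, writing an anonymized copy of one rotated log:
# zcat -f /var/log/nginx/access.log.2.gz | mask_ipv4 > access.anonymized.log
```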

Tagged as: Programming