How to parse Apache access logs and count requests per IP in Perl?
Parsing Apache Access Logs and Counting Requests per IP in Perl
Analyzing Apache access logs to determine the number of requests made by each IP address is a common task in system administration and log analysis. Perl, with its powerful text processing capabilities and flexible data structures, is well-suited for this kind of work.
Below is a short explanation and a runnable Perl script that:
- Reads Apache access log lines from files named on the command line, or from standard input if none are given
- Parses each line to extract the client IP
- Maintains a count of how many requests came from each IP
- Prints a summary of requests per IP in descending order
Understanding the Apache Log Format
The default Apache access log format (“Common Log Format”) typically looks like this:
127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326
The first field is the client IP address, which we want to capture and count.
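To make the field layout concrete, here is a minimal sketch that splits one Common Log Format line into named fields. The helper name parse_clf_line and the hash keys are made up for illustration, and the regex assumes the plain Common Log Format shown above, not the Combined format or a custom LogFormat.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split one Common Log Format line into named fields.
# Returns a hash reference, or undef when the line does not match.
sub parse_clf_line {
    my ($line) = @_;
    return unless $line =~ m{
        ^(\S+)\s+          # client IP (or hostname)
        (\S+)\s+           # identd value, usually "-"
        (\S+)\s+           # authenticated user, or "-"
        \[([^\]]+)\]\s+    # timestamp inside square brackets
        "([^"]*)"\s+       # request line: method, path, protocol
        (\d{3})\s+         # HTTP status code
        (\S+)              # response size in bytes, or "-"
    }x;
    return {
        ip      => $1, ident  => $2, user  => $3, time => $4,
        request => $5, status => $6, bytes => $7,
    };
}

my $sample = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326';
my $fields = parse_clf_line($sample);
print "$fields->{ip} $fields->{status} $fields->{bytes}\n";   # prints: 127.0.0.1 200 2326
```

For counting requests per IP only the first capture is needed, which is why the main script gets away with a much shorter pattern.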
Perl Concepts to Note
- @ARGV and reading from <>: Typically, Perl scripts read files given as command-line arguments, or standard input if there are no arguments.
- Hash data structure %count to store the IP → count mapping.
- Regular expressions to parse and capture the IP from each log line.
- Sorting a hash by value descending and then key ascending.
- Sigils: $ for scalars, @ for arrays, % for hashes.
Example Perl Script
This script reads the files named on the command line, or STDIN if none are given, so you can pass a log file as an argument, pipe it in, or redirect it:
#!/usr/bin/perl
use strict;
use warnings;

my %count;

while (my $line = <>) {
    chomp $line;
    # Apache common log format starts with IP as first element
    # Use regex to capture IP at start of line (IPv4 and IPv6 simplified)
    if ($line =~ /^(\S+)/) {
        my $ip = $1;
        $count{$ip}++;
    }
}

# Sort IPs by descending request count, then lex ascending IP
foreach my $ip (sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count) {
    print "$ip: $count{$ip}\n";
}
How to Run
Save the script to a file, say count_ips.pl, and make it executable:
chmod +x count_ips.pl
Then run it on your Apache log file:
./count_ips.pl access.log
Or using a pipe:
cat access.log | ./count_ips.pl
Common Pitfalls and Gotchas
- IP format variance: This example assumes the client address is the very first "word" on each line. If your log uses a custom format or the server sits behind a proxy, you may need to capture the address from a different field (X-Forwarded-For headers, for example, are sometimes logged elsewhere on the line).
- IPv6 support: The regex /^(\S+)/ captures any non-whitespace sequence at the start of the line, so it accepts both IPv4 and IPv6 addresses. It does not validate them: hostnames and malformed first fields are counted too.
- Huge logs: The script already reads one line at a time, so memory use grows with the number of distinct IPs rather than with the size of the log. If even that is too much, consider a more memory-efficient approach (e.g., a database).
- Multithreading: This script is single-threaded and works well for moderate-sized logs.
- Log rotation: Remember to process rotated or compressed logs accordingly.
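For the log rotation point, one possible sketch is to open each file named on the command line and decompress .gz files on the fly. The helper name open_log is made up for illustration, and the code assumes a gzip command is available on PATH; it is one way to handle rotated logs, not the only one.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: open a log file for reading, decompressing
# files ending in .gz on the fly. Assumes gzip is available on PATH.
sub open_log {
    my ($path) = @_;
    my $fh;
    if ($path =~ /\.gz$/) {
        # List-form pipe open: the file name is not passed through a shell
        open($fh, '-|', 'gzip', '-dc', $path)
            or die "Cannot run gzip on $path: $!";
    } else {
        open($fh, '<', $path) or die "Cannot open $path: $!";
    }
    return $fh;
}

my %count;
for my $path (@ARGV) {
    my $fh = open_log($path);
    while (my $line = <$fh>) {
        $count{$1}++ if $line =~ /^(\S+)/;
    }
    close($fh) or warn "Problem closing $path\n";
}

# Same ordering as the main script: count descending, then IP ascending
foreach my $ip (sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count) {
    print "$ip: $count{$ip}\n";
}
```

Assuming it is saved as count_rotated.pl (a hypothetical name), it could be run as ./count_rotated.pl access.log access.log.1 access.log.2.gz to get one combined tally.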
Extending This Script
- Count requests per IP per day
- Filter by HTTP method or status code
- Convert counts to percentages
- Output CSV for spreadsheet analysis
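As a rough sketch of how these extensions might fit together, the following counts requests per IP per day, keeps only GET requests with status 200, and emits CSV rows with each row's share of the total. The sample lines are invented for illustration, and the filters are arbitrary choices; to run it on a real log, replace the loop over @lines with while (my $line = <>).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample lines standing in for real input (invented for illustration)
my @lines = (
    '10.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /a HTTP/1.0" 200 100',
    '10.0.0.1 - - [10/Oct/2000:14:01:02 -0700] "GET /b HTTP/1.0" 404 50',
    '10.0.0.2 - - [10/Oct/2000:15:00:00 -0700] "POST /c HTTP/1.0" 200 10',
    '10.0.0.1 - - [11/Oct/2000:09:00:00 -0700] "GET /a HTTP/1.0" 200 100',
    '10.0.0.2 - - [11/Oct/2000:09:30:00 -0700] "GET /a HTTP/1.0" 200 100',
    '10.0.0.2 - - [11/Oct/2000:09:31:00 -0700] "GET /b HTTP/1.0" 200 100',
);

my %count;       # $count{$day}{$ip} = number of matching requests
my $total = 0;   # all matching requests, used for the percentages

for my $line (@lines) {
    # Capture IP, the date part of the timestamp, HTTP method, status code
    next unless $line =~ m{^(\S+) \S+ \S+ \[([^:\]]+):[^\]]*\] "(\S+)[^"]*" (\d{3})};
    my ($ip, $day, $method, $status) = ($1, $2, $3, $4);
    next unless $method eq 'GET';   # filter by HTTP method
    next unless $status == 200;     # filter by status code
    $count{$day}{$ip}++;
    $total++;
}

# Build CSV rows: per day, IPs by descending count, then ascending IP
my @rows;
for my $day (sort keys %count) {
    my $per_ip = $count{$day};
    for my $ip (sort { $per_ip->{$b} <=> $per_ip->{$a} || $a cmp $b } keys %$per_ip) {
        push @rows, sprintf('%s,%s,%d,%.1f',
            $day, $ip, $per_ip->{$ip}, 100 * $per_ip->{$ip} / $total);
    }
}

print "day,ip,requests,percent\n";
print "$_\n" for @rows;
# Output:
# day,ip,requests,percent
# 10/Oct/2000,10.0.0.1,1,25.0
# 11/Oct/2000,10.0.0.2,2,50.0
# 11/Oct/2000,10.0.0.1,1,25.0
```

Note that the days are sorted as plain strings, which is only chronological within a single month; a real report would probably convert the date to a sortable form first.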
Perl’s TMTOWTDI (“There’s more than one way to do it”) philosophy means you could also use split, specialized log parsing modules, or different sorting techniques, but this approach balances simplicity, readability, and functionality for most needs.
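As an illustration of the split alternative, here is a minimal sketch that takes the first whitespace-separated field instead of using a regex capture. The sample lines are invented for illustration; on a real log the loop over @lines would become while (my $line = <>).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample lines standing in for real input (invented for illustration)
my @lines = (
    '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /a HTTP/1.0" 200 2326',
    '10.0.0.5 - - [10/Oct/2000:13:56:00 -0700] "GET /b HTTP/1.0" 200 512',
    '::1 - - [10/Oct/2000:13:57:00 -0700] "GET /c HTTP/1.0" 200 64',
    '127.0.0.1 - - [10/Oct/2000:13:58:00 -0700] "GET /d HTTP/1.0" 404 0',
    '10.0.0.5 - - [10/Oct/2000:13:59:00 -0700] "GET /e HTTP/1.0" 200 512',
);

my %count;
for my $line (@lines) {
    # split ' ' splits on runs of whitespace and skips leading whitespace;
    # the limit of 2 stops splitting after the first field
    my ($ip) = split ' ', $line, 2;
    next unless defined $ip;   # blank line: nothing to count
    $count{$ip}++;
}

foreach my $ip (sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count) {
    print "$ip: $count{$ip}\n";
}
# Output:
# 10.0.0.5: 2
# 127.0.0.1: 2
# ::1: 1
```

The two addresses with equal counts come out in string order, which shows the tie-breaking part of the sort.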