Apache Log Parser
A PHP library to parse Apache logs.
Installation
This library is installable via Composer. Just run:
composer require benmorel/apache-log-parser
Requirements
This library requires PHP 7.1 or later.
Project status & release process
This library is under development.
The current releases are numbered 0.x.y. When a non-breaking change is introduced (adding new methods, optimizing existing code, etc.), y is incremented.
When a breaking change is introduced, a new 0.x version cycle is always started.
It is therefore safe to lock your project to a given release cycle, such as 0.1.*.
If you need to upgrade to a newer release cycle, check the release history for a list of changes introduced by each further 0.x.0 version.
Package contents
This library provides a single class, Parser.
Quick start
First construct a Parser object with the LogFormat defined in the httpd.conf file of the server that generated the log file:
use BenMorel\ApacheLogParser\Parser;
$logFormat = "%h %l %u %t \"%{Host}i\" \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"";
$parser = new Parser($logFormat);
The library converts every format string of your log format to a field name; the list of fields can be accessed through the getFieldNames() method:
var_export(
$parser->getFieldNames()
);
array (
0 => 'remoteHostname',
1 => 'remoteLogname',
2 => 'remoteUser',
3 => 'time',
4 => 'requestHeader:Host',
5 => 'firstRequestLine',
6 => 'status',
7 => 'responseSize',
8 => 'requestHeader:Referer',
9 => 'requestHeader:User-Agent',
)
You're then ready to parse a single line of your log file. The parse() method accepts the log line and a boolean that selects the result type. Pass false to get a numeric array, whose keys match those of the field names array:
$line = '1.2.3.4 - - [30/May/2018:15:00:23 +0200] "www.example.com" "GET / HTTP/1.0" 200 1234 "-" "Mozilla/5.0"';
var_export(
$parser->parse($line, false)
);
array (
0 => '1.2.3.4',
1 => '-',
2 => '-',
3 => '30/May/2018:15:00:23 +0200',
4 => 'www.example.com',
5 => 'GET / HTTP/1.0',
6 => '200',
7 => '1234',
8 => '-',
9 => 'Mozilla/5.0',
)
Or pass true to get an associative array, with the field names as keys:
var_export(
$parser->parse($line, true)
);
array (
'remoteHostname' => '1.2.3.4',
'remoteLogname' => '-',
'remoteUser' => '-',
'time' => '30/May/2018:15:00:23 +0200',
'requestHeader:Host' => 'www.example.com',
'firstRequestLine' => 'GET / HTTP/1.0',
'status' => '200',
'responseSize' => '1234',
'requestHeader:Referer' => '-',
'requestHeader:User-Agent' => 'Mozilla/5.0',
)
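All parsed values are strings, so converting them to typed values is left to the caller. As a minimal sketch (not part of the library), the time field in Apache's default %t layout can be converted with PHP's DateTimeImmutable, and firstRequestLine can be split into its three parts. The $record array is copied from the output above, abridged; the variable names are illustrative.

```php
<?php
// Record as returned by parse($line, true) in the example above (abridged).
$record = [
    'time'             => '30/May/2018:15:00:23 +0200',
    'firstRequestLine' => 'GET / HTTP/1.0',
    'status'           => '200',
    'responseSize'     => '1234',
];

// Apache's default %t layout is day/month/year:hour:minute:second zone.
$time = DateTimeImmutable::createFromFormat('d/M/Y:H:i:s O', $record['time']);
if ($time === false) {
    throw new UnexpectedValueException('Unexpected time format: ' . $record['time']);
}

// A well-formed request line has three space-separated parts.
[$method, $target, $protocol] = explode(' ', $record['firstRequestLine'], 3);

$status = (int) $record['status'];
// %b logs "-" when no bytes were sent, so treat that as zero.
$size = $record['responseSize'] === '-' ? 0 : (int) $record['responseSize'];

echo $time->format(DATE_ATOM), "\n"; // 2018-05-30T15:00:23+02:00
```

A custom %{FORMAT}t field would need a matching format string instead of the one assumed here.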
If a line cannot be parsed, an InvalidArgumentException is thrown. Be sure to wrap your parse() calls in a try-catch block:
try {
    $parser->parse($line, true);
} catch (\InvalidArgumentException $e) {
    // ...
}
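When processing a whole file, one malformed line usually should not abort the run. The following is a minimal sketch of such a loop. The helper name parseLines() is hypothetical, and the parser is passed in as a callable, so the loop relies only on the parse() call and the exception described above.

```php
<?php
/**
 * Parses each line, collecting records and the numbers of the lines that failed.
 * Hypothetical helper, not part of the library.
 *
 * @param iterable $lines     Raw log lines (an array, a generator, an SplFileObject...).
 * @param callable $parseLine Takes one line, returns a record or throws
 *                            InvalidArgumentException.
 * @return array Two elements: the parsed records, then the failed line numbers.
 */
function parseLines(iterable $lines, callable $parseLine): array
{
    $records = [];
    $failed = [];
    $lineNumber = 0;

    foreach ($lines as $line) {
        $lineNumber++;
        $line = rtrim($line, "\r\n");
        if ($line === '') {
            continue; // skip blank lines, e.g. after a trailing newline
        }
        try {
            $records[] = $parseLine($line);
        } catch (\InvalidArgumentException $e) {
            $failed[] = $lineNumber; // remember the line and carry on
        }
    }

    return [$records, $failed];
}
```

With the $parser from the quick start, the callable could be function (string $l) use ($parser) { return $parser->parse($l, true); }, and the lines could come from new SplFileObject('access.log'), where the file name is a placeholder.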
Field names returned by the library
This table shows how format strings are mapped to field names by the library:
| Format string | Field name |
|---|---|
| %a | clientIp |
| %{c}a | clientIp:c |
| %A | localIp |
| %B | responseSize |
| %b | responseSize |
| %{VARNAME}C | cookie:VARNAME |
| %D | responseTime |
| %{VARNAME}e | env:VARNAME |
| %f | filename |
| %h | remoteHostname |
| %H | requestProtocol |
| %{VARNAME}i | requestHeader:VARNAME |
| %k | keepaliveRequests |
| %l | remoteLogname |
| %L | requestLogId |
| %m | requestMethod |
| %{VARNAME}n | note:VARNAME |
| %{VARNAME}o | responseHeader:VARNAME |
| %p | canonicalPort |
| %{FORMAT}p | canonicalPort:FORMAT |
| %P | processId |
| %{FORMAT}P | processId:FORMAT |
| %q | queryString |
| %r | firstRequestLine |
| %R | handler |
| %s | status |
| %t | time |
| %{FORMAT}t | time:FORMAT |
| %T | timeToServe |
| %{UNIT}T | timeToServe:UNIT |
| %u | remoteUser |
| %U | urlPath |
| %v | serverName |
| %V | serverName |
| %X | connectionStatus |
| %I | bytesReceived |
| %O | bytesSent |
| %S | bytesTransferred |
| %{VARNAME}^ti | requestTrailerLine:VARNAME |
| %{VARNAME}^to | responseTrailerLine:VARNAME |
If two or more format strings yield the same field name, the second one will get a :2 suffix, the third one a :3 suffix, etc.
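The suffix rule can be illustrated with a short standalone function. This is a sketch of the rule as described here, under the hypothetical name suffixDuplicates(); the library's own implementation may differ.

```php
<?php
/**
 * Illustrates the suffix rule: repeated names get :2, :3, ... in order.
 * Hypothetical helper, not part of the library.
 *
 * @param string[] $names Field names in log format order.
 * @return string[] The same names, with duplicates suffixed.
 */
function suffixDuplicates(array $names): array
{
    $seen = [];   // name => number of occurrences so far
    $result = [];

    foreach ($names as $name) {
        $count = ($seen[$name] ?? 0) + 1;
        $seen[$name] = $count;
        // The first occurrence keeps its name; later ones get ":<count>".
        $result[] = $count === 1 ? $name : $name . ':' . $count;
    }

    return $result;
}

// %B and %b both map to responseSize, so a format such as "%B %s %b" would
// produce these raw names before suffixing:
var_export(suffixDuplicates(['responseSize', 'status', 'responseSize']));
```

For this input the result is responseSize, status, responseSize:2.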
You can expect to parse more than 250,000 records per second (> 50 MiB/s) when reading logs from a file on a modern server with an SSD.
Returning records as an associative array comes with a small performance penalty of about 6%.
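These figures depend on hardware and on the log format, so it can be worth measuring on your own data. The sketch below times any parsing callable over a set of lines. The helper name measureThroughput() is hypothetical, and because the timing includes the cost of iterating over the lines, it covers file reading when the lines come from a file.

```php
<?php
/**
 * Times a parsing callable over a set of lines and reports throughput.
 * Hypothetical helper, not part of the library.
 *
 * @param iterable $lines     Raw log lines.
 * @param callable $parseLine Takes one line and parses it.
 * @return array Keys: records, seconds, recordsPerSecond, mibPerSecond.
 */
function measureThroughput(iterable $lines, callable $parseLine): array
{
    $records = 0;
    $bytes = 0;
    $start = microtime(true);

    foreach ($lines as $line) {
        $parseLine($line);
        $records++;
        $bytes += strlen($line);
    }

    // Guard against a zero interval on very small inputs.
    $seconds = max(microtime(true) - $start, 1e-9);

    return [
        'records'          => $records,
        'seconds'          => $seconds,
        'recordsPerSecond' => $records / $seconds,
        'mibPerSecond'     => $bytes / 1048576 / $seconds,
    ];
}
```

Running it once with a callable that passes false to parse() and once with one that passes true gives a local estimate of the associative-array overhead.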