2010/04/14
tags: solaris utmpx(4) wtmpx(4) last(1)
| github home | http://github.com/mcarpenter/ckwtmpx |
|---|---|
| repository URLs | https://github.com/mcarpenter/ckwtmpx.git git://github.com/mcarpenter/ckwtmpx.git |
Periodically I discover a Solaris 10 server with a corrupted
/var/adm/wtmpx file, the accounting file that records
login times and reboots. I don't believe this to be malicious (e.g. a
hacker clumsily covering their tracks, although you should be aware
of that possibility); more likely it is some subtle bug in the depths of
the login subsystem. Unfortunately most system tools don't report
this problem and simply stop processing when they read corrupted
data, although new records do continue to be appended to the corrupted
file. In particular, last(1) does not emit any error message
when reading such a file and the only symptom that you might notice is
truncated output. Another way to spot this problem is if the size of the
/var/adm/wtmpx file is not evenly divisible by the record
length (372 bytes for current releases of Solaris 10).
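The size check is easy to script. Here is a minimal sketch in Python; the helper names `trailing_bytes` and `size_looks_sane` are made up for illustration, and the 372-byte record length is simply the figure quoted above, so adjust it if your release differs.

```python
import os

# Record length quoted above for current Solaris 10 releases.
RECORD_LEN = 372


def trailing_bytes(size, record_len=RECORD_LEN):
    """Return the number of bytes left over after the last whole record."""
    return size % record_len


def size_looks_sane(path="/var/adm/wtmpx", record_len=RECORD_LEN):
    """True if the file size is an exact multiple of the record length."""
    return trailing_bytes(os.path.getsize(path), record_len) == 0
```

A file that passes this test is not necessarily healthy (junk that happens to be a multiple of the record length would slip through), so treat it as a quick first look rather than proof.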
I haven't yet noticed any pattern to the corruption: sometimes it's just a handful of zero bytes, other times there is an identifiable ASCII source IP address, and other times it's just junk.
Consequently I wrote ckwtmpx to check wtmpx for validity and, optionally, repair it.
User Commands ckwtmpx(1)
NAME
ckwtmpx - check Solaris wtmpx files for corruption, and perform
optional repairs.
SYNOPSIS
ckwtmpx [-d] [-o output_file] [-e error_file] [-t time_travel]
ckwtmpx -h
ckwtmpx -v
DESCRIPTION
It sometimes happens, either malevolently or otherwise, that
Solaris' binary format accounting file /var/adm/wtmpx
becomes corrupted. The only normal symptom of this is that
standard tools such as last stop processing the file as soon
as the corrupt data is encountered (last produces neither an
error message nor a non-zero return code).
ckwtmpx attempts to read a wtmpx file from standard input
one record at a time. Valid records are copied to the
(optional) output file (-o), and bytes that are discarded
are copied to the (optional) error file (-e).
When an invalid record is encountered, ckwtmpx moves forward
through the standard input one character at a time until the
start of a valid record is found. Skipped bytes are written
to the error file as they are discarded. Errors and debug
information are sent to stderr.
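The resynchronisation described above can be sketched in Python as follows. This is an illustration only, not the ckwtmpx source: the function name `scan`, the in-memory buffer (the real tool reads standard input) and the caller-supplied `is_valid` predicate are all assumptions, as is the choice to treat a trailing partial record as discarded bytes.

```python
def scan(data, record_len, is_valid):
    """Split data into (valid_records, skipped_bytes).

    is_valid is a caller-supplied function that takes record_len bytes
    and returns True if they look like a well-formed record.
    """
    good = bytearray()  # would go to the -o output file
    bad = bytearray()   # would go to the -e error file
    pos = 0
    while pos + record_len <= len(data):
        candidate = data[pos:pos + record_len]
        if is_valid(candidate):
            good += candidate
            pos += record_len       # jump to the next record boundary
        else:
            bad.append(data[pos])   # discard one byte and try again
            pos += 1
    bad += data[pos:]               # assumed: a trailing partial record is discarded
    return bytes(good), bytes(bad)
```

For example, with a toy 4-byte record format in which a valid record starts with `R`, `scan(b"Raaa\x00\x00Rbbbxy", 4, lambda r: r[:1] == b"R")` keeps `Raaa` and `Rbbb` and reports the two zero bytes and the trailing `xy` as skipped.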
A valid record fulfills the following criteria:
Epoch time (ut_tv) is greater than 0 (was written after
1 Jan 1970).
Epoch time (ut_tv) is before now (was not written in the
future).
The wtmpx record type (ut_type) is valid.
Either this is the first valid record found or it is not
more than 70 seconds younger than the previous record
found. (Some systems may buffer output to wtmpx, resulting
in occasional temporal misordering of records.)
See <utmpx.h> and <utmp.h> for more details on the binary
record format, in particular struct futmpx in <utmpx.h> for
details of the record serialization.
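The criteria listed above can be expressed as a single predicate over already-decoded fields. The Python sketch below is illustrative and is not the ckwtmpx source. It makes two assumptions: that valid ut_type values run from 0 (EMPTY) to 10 (DOWN_TIME), and that the 70 second rule means a record's timestamp may be up to 70 seconds earlier than that of the previous record. The name `is_valid` and its parameters are hypothetical.

```python
import time

MAX_UT_TYPE = 10   # assumed: DOWN_TIME is the highest ut_type in <utmpx.h>
MAX_BACKSTEP = 70  # seconds of temporal misordering tolerated


def is_valid(ut_sec, ut_type, prev_sec=None, now=None):
    """Apply the four validity criteria to one decoded record.

    ut_sec   -- seconds field of ut_tv
    ut_type  -- record type
    prev_sec -- ut_sec of the previous valid record, or None if this
                would be the first valid record
    now      -- current epoch time (defaults to time.time())
    """
    if now is None:
        now = time.time()
    if ut_sec <= 0:                          # written after 1 Jan 1970
        return False
    if ut_sec >= now:                        # not written in the future
        return False
    if not 0 <= ut_type <= MAX_UT_TYPE:      # known record type
        return False
    if prev_sec is not None and ut_sec < prev_sec - MAX_BACKSTEP:
        return False                         # too far out of order (assumed reading)
    return True
```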
SunOS 5.10 Last change: 14 Apr 2010 1
OPTIONS
Flags -d, -e and -o may be combined as required but note
that
ckwtmpx -o /var/adm/wtmpx </var/adm/wtmpx
will almost certainly cause pain. Use a temporary file.
The following options are supported:
-d Enable debug output to stderr.
-h Print usage to stdout and exit.
-e error_file Writes bytes skipped from the
input wtmpx file to error_file.
-o output_file Writes the corrected wtmpx file to
output_file.
-v Print version to stdout and exit.
RETURN VALUE
Returns 0 if the wtmpx file is okay, 1 if it is corrupt and
2 on fatal errors (syntax, file permissions, ...) so ckwtmpx
can be run with no arguments for testing file validity
without spurious output:
if ! ckwtmpx