formats index
*** LHA, LZH, LZS (LHArc compressed files)
** Document revision 1.1
These files are created with LHA on the C64 (or C128), and can present
special problems to the typical PC user. The compression used is LH1, an
old method used on LZH 1.xx (pre-version 2), so any version of LHA on the
PC can uncompress them. However, LHA allows filenames of up to 18
characters long, and DOS doesn't know how to handle them (Windows 95 unLHA
utilities will extract the full filename). Usually, some of the files
already uncompressed will be overwritten by other files just being
uncompressed because the name seems the same to DOS. To LHA however, the
filenames are quite different.
LHA archives always have a string two bytes into the file ("-L??-") which
describe the type of compression used. Over the development life of LHA
there have been several different compression algorithms used. The "??" in
the "-L??-" can be one of several possibilites, but on the C64 it is likely
limited to "H0" (no compression) and "H1". Newer versions of LHA/LZH use
other combinations like "H2", "H3", "H4", "H5", "ZS", "Z5", and "Z4". The
letters typically used in the compression string come from a combination of
the creators initials of the LZ algorithm, Lempel/Ziv, and the author of
the LHA program, Haruyasu Yoshizaki.
The following is a sample of an LHA header. Note the string to search for
at byte $0002:
00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F ASCII
----------------------------------------------- ----------------
0000: 24 93 2D 6C 68 31 2D 39 02 00 00 16 04 00 00 00 ..-lh1-.........
0010: 08 4C 14 00 00 0E 73 79 73 2E 48 6F 75 73 65 20 ................
0020: 4D 34 00 53 DE 06 11 1C 12 C4 C8 FA 3A 5B DC CE ................
0030: B2 FA 38 1E 46 B0 B6 9E 9B 75 7A 49 71 72 B3 53 ................
0040: 6E 4E B4 A0 BF 5E 95 B3 05 8A 75 D5 6C E3 03 4A ................
0050: 2C 54 F4 AF 05 18 59 E2 F4 34 4A 0A 28 D4 33 E2 ................
0060: C4 9D 04 D7 C7 8B 91 66 0E E5 DE 98 3C 92 CC B5 ................
The header layout is fairly basic. The header for each file starts *two*
bytes before the "-lh?-" string. The above example has already been trimmed
down to start at these two bytes. Each header has the same layout, only the
length varies due to the length of the filename. Here is a breakdown of the
above example.
Bytes: $0000: 24 - Length of header (known as "LEN", not including this
and the next byte). If it is zero, we are at the end
of the file.
0001: 93 - Header checksum
0002: 2D 6C 68 31 2D - LHA compression type "-LH1-"
0007: 39 02 00 00 - Compressed file size ($00000239)
000B: 16 04 00 00 - Uncompressed file size ($00000416)
000F: 00 08 4C 14 - Time/date stamp
0013: 00 - File attribute
0014: 00 - Header level
00 = non-extended header
01, 02 = extended header
0015: 0E - Length of the following filename
0016: 73 79 73 2E 48 6F 75 - Filename, with a zero and filetype
73 65 20 4D 34 00 53 appended ("SYS.HOUSE M4úS"). The
name can be up to 18 characters in
length. Note the length *includes*
the zero and filetype, making the
actual filename length 2 bytes
shorter.
0024: DE 06 - File data checksum (starts at LEN)
0026: 11 1C 12 C4 C8 FA... - File data (starts at LEN+2)
The header checksum at byte $0001 is calculated by adding the bytes in
the header from $0002 (LHA compression type) to LEN+1 (File data checksum),
without carry.
The time/date stamp (bytes $000F-$0012), is broken down as follows:
Bytes:$000F-0010: Time of last modification:
BITS 0- 4: Seconds divided by 2
(0-58, only even numbers)
BITS 5-10: Minutes (0-59)
BITS 11-15: Hours (0-23, no AM or PM)
Bytes:$0011-0012: Date of last modification:
BITS 0- 4: Day (1-31)
BITS 5- 9: Month (1-12)
BITS 10-15: Year minus 1980
The format of the compressed data is much too complex to get into here.
Understanding the layout would require knowledge of Huffman coding and
sliding dictionaries, and is nowhere near as simple as ZipCode! The
description given in the LHA source code for the different compression
modes are as follows:
-lh0- no compression, file stored
-lh1- 4k sliding dictionary (max 60 bytes) + dynamic Huffman + fixed
encoding of position
-lh2- 8k sliding dictionary (max 256 bytes) + dynamic Huffman
-lh3- 8k sliding dictionary (max 256 bytes) + static Huffman
-lh4- 4k sliding dictionary (max 256 bytes) + static Huffman +
improved encoding of position and trees
-lh5- 8k sliding dictionary (max 256 bytes) + static Huffman +
improved encoding of position and trees
-lzs- 2k sliding dictionary (max 17 bytes)
-lz4- no compression, file stored
-lz5- 4k sliding dictionary (max 17 bytes)
There are several utilities that you can use to decompress these files,
like the already-mentioned LHA on the PC, or Star LHA, one of the many
excellent utilities contained in the Star Commander distribution package.
If you use Star LHA, keep in mind it needs the program LHA v2.14 (or newer)
to extract. If an older version of LHA is used (such as the common version
2.13), then the files being extracted will be corrupt. It will extract the
files directly into a D64 archive, so the long C64 filenames will not be
lost.
To an emulator user there is no use to these files, as their only real
usage on a C64 was for storage and transmission benefits. The standard
compression program on the PC is PKZIP (or ZIP compatibles), so unless you
have some need to send *compressed* files back the C64, there is no use in
using LHA.
˙