	*** ZIP (PKZip compressed files)

** Document revision 1.1

  The ZIP files seen on the C64 are generally PKZIP 1.1-compatible archives
using the older IMPLODE algorithm, and can be decompressed on the C64/C128
with various utilities. All versions of PKUNZIP (and compatible programs)
will also handle these older archives. The explanation below covers the
format up to PKZIP 2.04g, the newest version at the time of writing.

  ZIP archives always start with the string 'PK' at the beginning of the
file, and the first filename follows shortly after. The sample archive
below holds 4-pack ZipCode files, as the filename (1!MSHOW) shows:

       00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F        ASCII
       -----------------------------------------------   ----------------
00000: 50 4B 03 04 14 00 00 00 08 00 00 00 00 00 19 A1   PK..............
00010: EB 0D C2 45 00 00 50 69 00 00 07 00 00 00 31 21   ..............1!
00020: 4D 53 48 4F 57 E4 BC 7B 5C 53 47 DA 38 3E 67 CE   MSHOW...........

  Bytes: $00-03: PKZIP local file header signature ($50 $4B $03 $04,  first
                 two bytes are ASCII "PK"). This signature is used  at  the
                 beginning of *each* compressed file.
          04-05: PKZIP version needed to extract the file:
                  decimal value/10 = major version # (in this case 2)
                  decimal value%10 = minor version # (in this case .0)
          06-07: General purpose bit flags:
                  bit 0:    set - file is encrypted
                          clear - file is not encrypted
                  bit 1: if compression method 6 used (imploding)
                            set - 8K sliding dictionary
                          clear - 4K sliding dictionary
                  bit 2: if compression method 6 used (imploding)
                            set - 3 Shannon-Fano trees were used to  encode
                                    the sliding dictionary output
                          clear - 2 Shannon-Fano trees were used

                          For method 8 compression (deflate):
                           bit 2  bit 1
                             0      0    Normal (-en) compression
                             0      1    Maximum (-ex) compression
                             1      0    Fast (-ef) compression
                             1      1    Super Fast (-es) compression

                          Note:  Bits  1  and  2  are  undefined   if   the
                          compression method is any other than 6 or 8.
                  bit 3: if compression method 8 (deflate)
                            set - the fields crc-32,  compressed  size  and
                                  uncompressed size are set to zero in  the
                                  local header. The correct values are  put
                                  in  the   data   descriptor   immediately
                                  following the compressed data.

                  The upper three bits are reserved and used internally  by
                  the software when processing the zipfile.  The  remaining
                  bits are unused.
          08-09: Compression method:
                  0 - Stored (no compression)
                  1 - Shrunk
                  2 - Reduced with compression factor 1
                  3 - Reduced with compression factor 2
                  4 - Reduced with compression factor 3
                  5 - Reduced with compression factor 4
                  6 - Imploded
                  7 - Reserved for Tokenizing compression algorithm
                  8 - Deflated
          0A-0B: Last modified file time in MSDOS format
                  Bits 00-04: Seconds/2 (0-58, only even numbers)
                       05-10: Minutes (0-59)
                       11-15: Hours (0-23, no AM or PM)
          0C-0D: Last modified file date in MSDOS format
                  Bits 00-04: Day (1-31)
                       05-08: Month (1-12)
                       09-15: Year minus 1980
          0E-11: CRC-32 of file (low-high format)
          12-15: Compressed size of file (low-high format)
          16-19: Uncompressed size of file (low-high format)
          1A-1B: Filename length (FL)
          1C-1D: Extra field length (EFL)
          1E-(1E+FL-1): Filename
          (1E+FL)-(1E+FL+EFL-1): Extra field
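
  As a rough illustration of the layout above, the fixed 30-byte ($1E) part
of the header can be unpacked with Python's struct module. This is only a
sketch: the function name parse_local_header and the dictionary keys are my
own choices for the example, and the sample bytes are copied from the hex
dump at the top of this document.

```python
import struct

# Bytes $00-$2F, copied from the hex dump earlier in this document.
SAMPLE = bytes.fromhex(
    "50 4B 03 04 14 00 00 00 08 00 00 00 00 00 19 A1 "
    "EB 0D C2 45 00 00 50 69 00 00 07 00 00 00 31 21 "
    "4D 53 48 4F 57 E4 BC 7B 5C 53 47 DA 38 3E 67 CE "
)

# "<" = little-endian (low-high), 4s = signature, H = 2 bytes, I = 4 bytes.
LOCAL_HEADER = struct.Struct("<4sHHHHHIIIHH")  # 30 bytes in total


def parse_local_header(data, offset=0):
    """Decode one local file header starting at the given offset."""
    (sig, version, flags, method, dos_time, dos_date,
     crc32, csize, usize, fl, efl) = LOCAL_HEADER.unpack_from(data, offset)
    if sig != b"PK\x03\x04":
        raise ValueError("no local file header signature at this offset")
    name_start = offset + LOCAL_HEADER.size
    extra_start = name_start + fl
    return {
        "version": version,            # e.g. 20 means 2.0
        "flags": flags,
        "method": method,              # 8 = deflated, 6 = imploded, ...
        "dos_time": dos_time,
        "dos_date": dos_date,
        "crc32": crc32,
        "compressed_size": csize,
        "uncompressed_size": usize,
        "filename": data[name_start:extra_start],
        "extra": data[extra_start:extra_start + efl],
        "data_offset": extra_start + efl,  # where the compressed data begins
    }


if __name__ == "__main__":
    for key, value in parse_local_header(SAMPLE).items():
        print(key, value)
```

  Run against the dump, this reports version 20 (that is, 2.0), method 8
(deflated), the filename "1!MSHOW", and compressed data starting at offset
$25.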

  You will notice that in the above byte layout, there is no mention of C64
filetype. That particular field seems to be stored in the central directory
at the end of the ZIP archive.
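
  Going back to the time and date words at offsets $0A-$0D, the packed
values can be pulled apart with shifts and masks. The sketch below assumes
the standard MS-DOS layout (time: seconds/2 in bits 0-4, minutes in bits
5-10, hours in bits 11-15; date: day in bits 0-4, month in bits 5-8, year
minus 1980 in bits 9-15). The function names and the test values are made
up for the example. Note that the sample archive stores zero in both
fields, which does not decode to a valid date.

```python
def decode_dos_time(value):
    """Return (hours, minutes, seconds) from a 16-bit MS-DOS time word."""
    hours = (value >> 11) & 0x1F
    minutes = (value >> 5) & 0x3F
    seconds = (value & 0x1F) * 2   # stored as seconds/2, so always even
    return hours, minutes, seconds


def decode_dos_date(value):
    """Return (year, month, day) from a 16-bit MS-DOS date word."""
    year = ((value >> 9) & 0x7F) + 1980
    month = (value >> 5) & 0x0F
    day = value & 0x1F
    return year, month, day


if __name__ == "__main__":
    # Hypothetical values, not taken from the sample archive.
    print(decode_dos_date(0x1A41))  # 1 February 1993
    print(decode_dos_time(0x645C))  # 12:34:56
```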

  There are several other signatures used within the ZIP format. The byte
sequence 50 4B 01 02 marks the start of each file entry in the central
directory, while the byte sequence 50 4B 05 06 marks the record that ends
the central directory.
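
  A simple way to get a feel for where these records sit is to search a
file image for the three signatures. The sketch below is a naive scan with
names of my own choosing (find_all, locate_records). Since the same four
bytes can also turn up by chance inside compressed data, treat the results
as hints rather than a reliable parse.

```python
LOCAL_SIG = b"PK\x03\x04"    # local file header
CENTRAL_SIG = b"PK\x01\x02"  # central directory entry
END_SIG = b"PK\x05\x06"      # end of central directory


def find_all(data, sig):
    """Return every offset at which sig occurs in data."""
    offsets = []
    pos = data.find(sig)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(sig, pos + 1)
    return offsets


def locate_records(data):
    """Collect signature offsets; 'end' is -1 if no end record is found."""
    return {
        "local": find_all(data, LOCAL_SIG),
        "central": find_all(data, CENTRAL_SIG),
        # Search from the back, since the end record closes the archive.
        "end": data.rfind(END_SIG),
    }


if __name__ == "__main__":
    # Synthetic buffer for illustration only, not a real archive.
    fake = LOCAL_SIG + b"x" * 10 + CENTRAL_SIG + b"y" * 5 + END_SIG + b"z" * 3
    print(locate_records(fake))
```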

  The above explanation is only included for completeness.  Without  source
code, it is almost impossible to work with ZIP archives.
