Nolce (Netscape's Off Line Cache Explorer) is a Linux program that lets you browse Netscape Navigator cache files off-line by restoring their original names and adjusting their links.
Introduction
Every Netscape Navigator user probably knows that it saves almost every file downloaded from the Internet on the local hard disk, unless the user has disabled this option. HTML files, images, and downloaded documents are normally stored under the directory $HOME/.netscape/cache.
You may want to view those downloaded documents off-line as well, read them at leisure, and possibly save them together with their images.
This isn't immediately possible, because files stored in the cache are renamed: an original main.html may become cache33BAD64001B0829.html.
Besides, they are stored under the cache directory in subdirectories like 00, 01, ... with no regard for the files' original relative positions.
So even if you can guess which cached file corresponds to the document you want, you see it without any images and with none of its links working.
Saving a document from Netscape after receiving it doesn't save the related images and links, so off-line you can see only the textual part of the document.
You might expect Netscape to retrieve the missing images from the cache in this situation, but it doesn't: before using a cached file, it tries to connect to the original site to check whether the remote file is more recent than the local one. Since this check isn't possible when you're off-line, the local file isn't used.
Usage and what the program does
The file index.db under the Netscape cache directory contains the information needed to associate cached files with their original names, sizes, creation dates, file types and so on. Netscape creates it when the first documents are cached.
Nolce must not run while Netscape is running, because index.db may be damaged if two programs open it at the same time. To avoid problems, nolce uses and recognizes the same lock file as Netscape, so when one of the two programs is running, the other knows that it can't use the cache.
The lock file is a symbolic link called lock, created in the directory $HOME/.netscape.
(However, from version 1.8-1 the option -l may be used to ignore the lock file; see below.)
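The lock check described above can be sketched in a few lines. This is a Python illustration only, not nolce's actual C code: the function names and the ignore_lock parameter (standing in for the -l option) are hypothetical, and only the presence of the lock entry is tested.

```python
import os

def cache_is_locked(netscape_dir):
    """Return True if an entry named 'lock' exists in netscape_dir."""
    lock_path = os.path.join(netscape_dir, "lock")
    # lexists() does not follow symbolic links, so a link whose target
    # is not a real file is still detected as a lock.
    return os.path.lexists(lock_path)

def may_process_cache(netscape_dir, ignore_lock=False):
    """Hypothetical helper: ignore_lock plays the role of the -l option."""
    return ignore_lock or not cache_is_locked(netscape_dir)
```

For example, may_process_cache(os.path.expanduser("~/.netscape")) would tell a tool whether it is safe to open the cache.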
With this information nolce can copy the files into a new directory structure under dest_dir (default is $HOME/cached), which mirrors the directory structure of each file's original site and restores the files' real names.
For example, if 00/cache33BAD64001B0829.html corresponds to a URL like http://www.rai.it/raiuno/aree.html, the program creates the directory www.rai.it, then the directory raiuno under it, and finally copies cache33BAD64001B0829.html to aree.html inside raiuno.
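The mapping in this example can be sketched as follows. It is a minimal Python illustration, not the program's real implementation: the function names are hypothetical, and the sketch follows only the example above, so it does not model the separate http and ftp subdirectories of dest_dir.

```python
import os
import shutil
from urllib.parse import urlsplit

def local_path_for(url, dest_dir):
    """Build dest_dir/<host>/<path components> for a URL."""
    parts = urlsplit(url)
    # Drop empty, "." and ".." components so the result stays under dest_dir.
    components = [p for p in parts.path.split("/") if p not in ("", ".", "..")]
    return os.path.join(dest_dir, parts.netloc, *components)

def copy_cached_file(cache_file, url, dest_dir):
    """Copy a cached file to the location derived from its original URL."""
    target = local_path_for(url, dest_dir)
    os.makedirs(os.path.dirname(target), exist_ok=True)
    shutil.copyfile(cache_file, target)  # the cache file itself is left untouched
    return target
```

With dest_dir set to /home/user/cached, the URL above maps to /home/user/cached/www.rai.it/raiuno/aree.html.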
A summary file is created as an HTML file, so once the program finishes you can easily see which HTML documents it retrieved and browse them.
When viewing retrieved documents, links shown in italics point to other cached files, so you can view those off-line too.
Note that some fixed fonts may render italics as bold.
Copied HTML files are slightly modified when necessary; this is covered in the section How it works.
Nolce doesn't change in any way the original Netscape cache, which continues to work normally.
From version 1.5, nolce can also process caches generated by Netscape for Windows, with the option -p.
Now let's look at how to use nolce.
First of all, you can get brief help by launching it with --help, and this is what you get:
n_hours parameter is very useful when you want to process only the files downloaded during the last connection.
dest_dir is the directory under which the directory structures are created.
The program distinguishes between http:// and ftp:// documents, putting the former under a subdirectory http of dest_dir and the latter under ftp.
summary_file is always created in dest_dir, even if you supply an absolute path. If the summary file exists, it is not overwritten; new entries are appended to it.
summary_file contains an entry for every HTML file processed.
summary_file
.
-m
option, missing
images are kept.
sub_string (options -g or -G) is case sensitive.
-p must be used if the cache to be processed was generated by Netscape for Windows. In this case the name of the index file is assumed to be fat.db and file names are all converted to lower case, as DOS file names appear when viewed from Linux.
With -l, the cache is processed even if there is a lock file in $HOME/.netscape. It's useful when the cache specified with -c isn't the one used by the running Netscape, or when Netscape isn't installed. Use with care, and don't launch more than one copy of nolce on the same directory.
nolce -smc /cache is the same as nolce -s -m -c /cache or nolce smc/cache.
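The grouping of short options can be illustrated with Python's standard getopt module. This is only a sketch under an assumption: that nolce parses its options getopt-style. The option string below is hypothetical and covers just the three options in the example, with -s and -m as plain flags and -c taking the cache directory.

```python
import getopt

# Hypothetical option string: "c:" means -c takes an argument.
OPTSTRING = "smc:"

def parse(argv):
    """Return (options, remaining arguments) for a getopt-style command line."""
    return getopt.getopt(argv, OPTSTRING)

grouped, _ = parse(["-smc", "/cache"])
separate, _ = parse(["-s", "-m", "-c", "/cache"])
# Both spellings yield the same list of (option, value) pairs.
print(grouped == separate)
```

Under these assumptions the two command lines are indistinguishable once parsed, which is why the grouped form is just a shorthand.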
Installation
This software is available in a package containing both source and binary versions.
It can be obtained at
ftp://sunsite.unc.edu/pub/Linux/apps/www/plugins and at
https://members.tripod.com/~giustrov/download.html
To use this program, you must have the DB library installed. It is needed to read the records stored in the index.db file.
In practice you need libdb.so to run the compiled version, plus the db include files to compile the program.
On Linux, with the Slackware and Redhat distributions, the library should be present by default.
For the include files, on Redhat you must install a package called db-devel or similar. On Slackware they are in libc.tgz, so they aren't a problem.
To compile, cd to the src subdirectory and run make.
Run make install to compile and copy the executable to /usr/bin, the man page to /usr/man/man1 and the documentation to /usr/doc/nolce-VERSION.
If the standard destinations don't suit you, change them in the Makefile.
Compatibility
I have tested the program under Linux only, with Netscape Navigator 3.01, 4.0b5, and 4.03.
It probably works with version 2.0 too, since the present cache format was introduced with that release.
It should also work on other Unix systems, if their Netscape indexes its cache in the same way as the Linux version, that is, with a DB hash file named index.db under $HOME/.netscape/cache.
If the name is different, it's easy to change the value of CACHE_FILE in the defines section of the source file.
As for the language, I use only code conforming to the ANSI C or POSIX standards, so if your system supports them, there should be no problems.
As far as I know, the following circumstances may cause problems or errors when compiling nolce:
make correctly defines the variable CC as your site compiler name (e.g. cc or gcc). Every make should ensure this, but if yours doesn't, define it by hand.
flex -l. This is what happens on some Slackware systems, where flex calls the real program flex.slk with the -l option. The result is a segmentation fault when nolce is executed.
-Darray to DEFINES in the Makefile (see below), solves the problem.
LEX=flex is present in the Makefile. On non-Linux systems, this probably should be changed.
-lfl library, and it's provided in the variable LDFLAGS of the Makefile.
LFLAGS variable.
yylex() function, called in the process_html_file of main.c.
Input and output files are supplied to yylex through the extern variables yyin and yyout. This probably doesn't conform to the original AT&T lex, but, as far as I know, it conforms to the POSIX specification for lex and, above all, it's almost the only way available with flex.
yytext as a char pointer, while other lex implementations may define it as a char array. If this is your case, you must compile main.c with the -Darray option, which can be done by setting the variable DEFINES in the Makefile.
If you discover a bug, e.g. an abnormal exit of the program with a Segmentation Fault error, please let me know. Send me an e-mail with a brief description of the circumstances under which the error happened, the command line options, and above all the core file generated by the program (compress it to keep the mail message small).
Shells let you decide whether you get a core dump after a program terminates abnormally. With bash, see the command ulimit.
For the core file to be useful to me, it must be generated by a program compiled with debug info: add the option -g3 to CFLAGS in the Makefile. If you have libg installed, also add -lg to LDFLAGS.
However, before sending the core file, the plain output of gdb could be useful. In case of problems, compile nolce with debug info, launch it from the debugger and, when execution stops with the error, give the command bt inside gdb and send me the information displayed.
How it works
i. INDEX.HTML
Many URLs, e.g. http://home.netscape.com, don't contain an HTML file name.
In this situation the server provides a default HTML file, usually index.html, and nolce appends this same name to these URLs.
An HTML file may contain a link to such a URL with the file name spelled out. If that name is different from index.html, the link doesn't work.
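A small Python sketch of this rule follows. It is an illustration rather than nolce's own code: the function name is hypothetical, and it assumes that a URL "without a file name" is one whose path is empty or ends with a slash.

```python
from urllib.parse import urlsplit, urlunsplit

DEFAULT_NAME = "index.html"  # the usual default document name

def with_default_name(url):
    """Append index.html to URLs whose path is empty or ends with '/'."""
    parts = urlsplit(url)
    path = parts.path
    if path == "" or path.endswith("/"):
        path = (path or "/") + DEFAULT_NAME
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))
```

Under this assumption http://home.netscape.com becomes http://home.netscape.com/index.html, while a URL that already names a file is returned unchanged.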
ii. LINKS
The main work nolce does is changing links in HTML files so that they point to local files.
There are various types of links (imagine you're browsing the document http://www.aaaa.com/bbb/index.html):
HREF="ccc/image.gif". In this case the browser loads the file image.gif from the directory ccc under bbb.
HREF="http://www.aaaa.com/ccc/image.gif". In this case Netscape will always try to obtain the document from the net, so nolce transforms the link into something like "../ccc/image.gif".
HREF="/ccc/image.gif". These links must be interpreted as http://www.aaaa.com/ccc/image.gif, regardless of the directory the HTML file is in.
If a link points to a document present in the cache, it is changed to a relative link; otherwise it's turned into an absolute link.
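The rule can be sketched in Python as below. This is an illustration under stated assumptions, not the lex-based code nolce actually uses: the function names are hypothetical, the set of cached URLs is assumed to be already known, and local files are assumed to live under host/path as in the earlier example.

```python
import posixpath
from urllib.parse import urljoin, urlsplit

def site_path(url):
    """Local layout assumed here: <host>/<path of the URL>."""
    parts = urlsplit(url)
    return parts.netloc + parts.path

def rewrite_link(page_url, href, cached_urls):
    """Return a relative link if the target is cached, else an absolute URL."""
    absolute = urljoin(page_url, href)  # resolves "ccc/x", "/ccc/x" and full URLs
    if absolute not in cached_urls:
        return absolute
    page_dir = posixpath.dirname(site_path(page_url))
    return posixpath.relpath(site_path(absolute), start=page_dir)
```

When browsing http://www.aaaa.com/bbb/index.html with http://www.aaaa.com/ccc/image.gif in the cache, both HREF="/ccc/image.gif" and the full URL come out as ../ccc/image.gif.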
iii. LEX
If your lex program is GNU flex, the flag -Cf may be given to it (put it in the variable LFLAGS of the Makefile). This makes the program bigger, but execution speeds up by 10-15%.
iv. MISCELLANEOUS
nolce.h there are some defines which can be customized.
<h3>Link</h3>, the italics isn't shown.
`?'. Mainly for this reason, when creating directories, strange characters like `?', `=', `(' and so on are replaced with an underscore.
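The substitution can be sketched as follows. The exact set of characters nolce replaces is not listed here, so this Python sketch makes an assumption: every character other than letters, digits, dot, hyphen, underscore and tilde counts as "strange". The function name is hypothetical.

```python
import re

# Assumption: anything outside this whitelist is treated as a strange character.
_STRANGE = re.compile(r"[^A-Za-z0-9._~-]")

def safe_component(name):
    """Replace each strange character in one path component with '_'."""
    return _STRANGE.sub("_", name)
```

The function is meant to be applied to each directory or file name separately, so that the slashes separating path components are left alone.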
Contacting the author
For any question, bug report or comment, email g.trovato@usa.net
My home page is
https://members.tripod.com/~giustrov
Nolce web page is:
https://members.tripod.com/~giustrov/nolce.html
LICENCE