Don't let your directories get too large
If you tend to create directories that hold a very large number of items, you may
be throwing away performance without even knowing it. This happens often
when you write programs that process data for you automatically.
Keep your directories small
Whenever you, or one of your programs, accesses a file, Solaris must first
locate it. To do so, Solaris must look up each component of the path
to learn exactly where on the disk drive your file is located.
For example, if you specify the file /share/work/sort.c, Solaris
starts at the root directory to find the entry for the share directory.
Next, Solaris reads from the share directory looking for the work
directory; from there, it searches for the file sort.c. Suppose
for a moment that Solaris must read the disk for each directory access.
If a directory is small, the directory may take only one or two sectors
of disk space. However, when you search through a large directory that
consumes more space, you have to read more directory entries from the disk
and spend more time looking for the entry you want.
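
The lookup order described above can be sketched in a few lines of Python. This is only an illustration of the sequence of directory searches, not how Solaris implements it; the function names lookup_steps and count_entries are hypothetical helpers invented for this example.

```python
import os
import posixpath

def lookup_steps(path):
    """Return (directory, name) pairs in the order a path lookup visits them."""
    steps = []
    current = "/"
    for name in (part for part in path.split("/") if part):
        steps.append((current, name))      # search 'current' for 'name'
        current = posixpath.join(current, name)
    return steps

def count_entries(directory):
    """Count the entries in a directory; a rough proxy for its search cost."""
    with os.scandir(directory) as entries:
        return sum(1 for _ in entries)

for directory, name in lookup_steps("/share/work/sort.c"):
    print(f"search {directory} for {name}")
```

Calling count_entries on each directory returned by lookup_steps shows which step of a lookup has the most entries to wade through.
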
These disk reads are very expensive. If you're trying to read
a tiny file, Solaris might spend more time looking for the file than it
would reading the file for you.
To make file access as speedy as possible, Solaris maintains two levels
of buffering. First, frequently used disk sectors are cached in memory,
because once you use a file, chances are good that you'll need to access
it again soon. The second level of buffering is the directory-name lookup
cache. This is simply a buffer that contains information about the starting
location of frequently used directories.
So when you access a directory that's not in the buffers, Solaris
goes through the process we've described to locate it.
In doing so, it buffers the disk sectors it reads to accomplish the task
and creates a lookup-cache entry for each directory it searched through.
The problem with large directories is that when Solaris searches
through them for a specified entry, it may have to read multiple sectors
from the hard drive. Because the disk sector buffer is a limited resource,
it eventually fills up, and Solaris begins discarding older sectors to make room
for the new ones.
This, then, is the major problem with large directories. The larger
the directory, the more information gets discarded, slowing down all processes
as Solaris is forced to re-read sectors from the disk drives. Even worse,
when a directory becomes really large, the directory itself might not
fit in the buffer. Thus, even accessing the same directory repeatedly may force
Solaris to re-read data from the disk.
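
To see why a directory larger than the buffer is so costly, consider a toy simulation. It assumes a least-recently-used replacement policy, which is our assumption for illustration only; the article doesn't state which policy Solaris uses, and the BlockCache class and repeated_scan_reads function are hypothetical names.

```python
from collections import OrderedDict

class BlockCache:
    """Toy sector buffer with least-recently-used eviction (an assumed policy)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()
        self.disk_reads = 0

    def read(self, block):
        if block in self.blocks:
            self.blocks.move_to_end(block)   # cache hit: mark as recently used
            return
        self.disk_reads += 1                 # cache miss: go to the disk
        self.blocks[block] = True
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # discard the oldest sector

def repeated_scan_reads(cache_size, dir_sectors, scans=3):
    """Disk reads needed to scan one directory from start to end 'scans' times."""
    cache = BlockCache(cache_size)
    for _ in range(scans):
        for sector in range(dir_sectors):
            cache.read(("dir", sector))
    return cache.disk_reads

print(repeated_scan_reads(cache_size=8, dir_sectors=2))   # directory fits in the buffer
print(repeated_scan_reads(cache_size=8, dir_sectors=12))  # directory exceeds the buffer
```

In this model, the two-sector directory is read from disk only on the first scan, while the twelve-sector directory pushes its own sectors out of the buffer and must be re-read in full on every scan.
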
What's the limit?
So, how many files can you place in a subdirectory before performance penalties
accrue? We can't give you a hard-and-fast answer. First, only you can determine
what performance tradeoffs you're willing to make. Second, directory entries
aren't of a fixed size. Their size varies primarily with the length of the filename.
For example, a directory containing five files named a, b, c, d, and e
could take less space than a directory containing a single file with a
name such as
GNU_gcc_v2.7.2_Pentium_Optimized_Solaris_2.5.1_i386.pkg.tar.gz.
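
A rough estimate makes the comparison concrete. The sketch below assumes a BSD-style UFS entry layout: an 8-byte header (inode number, record length, name length) followed by the NUL-terminated name padded to a 4-byte boundary. That layout is our assumption, not something the article specifies, and real directories also carry entries for . and .. plus block-level slack. The function name entry_size is hypothetical.

```python
def entry_size(name):
    """Approximate bytes used by one directory entry (assumed UFS-style layout)."""
    header = 8                             # assumed: inode (4) + record length (2) + name length (2)
    name_bytes = len(name.encode()) + 1    # filename plus its terminating NUL
    padded = (name_bytes + 3) // 4 * 4     # round up to a 4-byte boundary
    return header + padded

short_names = ["a", "b", "c", "d", "e"]
long_name = "GNU_gcc_v2.7.2_Pentium_Optimized_Solaris_2.5.1_i386.pkg.tar.gz"

print("five short names:", sum(entry_size(n) for n in short_names), "bytes")
print("one long name:   ", entry_size(long_name), "bytes")
```

Under these assumptions the five one-letter names together need fewer bytes than the single long name, which is the point of the example above.
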
Conclusion
It may not matter if a directory is large if you use it infrequently. However,
the directories that you use often should be kept small to keep the system
running at its peak. Typically, you can arrange such a directory as a hierarchical
structure, breaking it into several smaller directories, and distributing
the files among them in a structured fashion. For example, if you have
many users, but only a few log in at any given time, you may want to divide
your user-account directory into multiple pieces. Thus, if you typically
place user accounts on /acct, you might break up the directory into
/acct/a, /acct/b, /acct/c..., and use the first letter
of the account to select the directory in which you'll put the user account.
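
The first-letter scheme is easy to automate. Here is a minimal sketch, assuming account names are non-empty and that folding the first letter to lowercase is acceptable (both are our assumptions); account_dir is a hypothetical helper, and /acct is the example location used above.

```python
import posixpath

def account_dir(name, base="/acct"):
    """Return the directory for an account, bucketed by its first letter."""
    if not name:
        raise ValueError("account name must not be empty")
    bucket = name[0].lower()   # assumption: fold case so 'Bob' and 'bob' share a bucket
    return posixpath.join(base, bucket, name)

for user in ["alice", "bob", "carol"]:
    print(user, "->", account_dir(user))
```

If account names cluster heavily on a few letters, a different key (such as the first two letters) may spread the accounts more evenly.
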
Remember, each sector of directory information that Solaris reads
in can force other data in memory to be discarded. Keeping your directories
small helps you keep your data in RAM and gives you the highest possible
performance.