Don't let your directories get too large

If you tend to create directories with very many items in them, you may be throwing away performance without even knowing it. This often happens when you write programs that process data automatically.

Keep your directories small

Whenever you or one of your programs accesses a file, Solaris must first locate it. To do so, it resolves each component of the path in turn until it knows exactly where on the disk your file is stored. For example, if you specify the file /share/work/sort.c, Solaris searches the root directory for the entry named share. Next, it reads the share directory looking for the entry named work; from there, it searches the work directory for sort.c. Suppose for a moment that Solaris must read the disk for every directory access. A small directory may occupy only one or two disk sectors. A large directory, however, spans more sectors, so Solaris must read more directory entries from the disk and spend more time scanning them for the one you want.
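
The walk described above can be modeled in a few lines of Python. This is a toy sketch, not how UFS is implemented: each directory here is an in-memory list of (name, child) pairs searched from the front, and the tree and file contents are hypothetical. Counting how many entries are examined shows why a long directory anywhere along the path makes every lookup through it slower.

```python
# Toy model of component-by-component path lookup. Directories are
# in-memory lists of (name, child) pairs searched linearly; this is an
# illustrative assumption, not the on-disk UFS format.

def lookup(root, path):
    """Resolve an absolute path one component at a time.

    Returns (node, scanned), where scanned is the total number of
    directory entries examined across every directory searched.
    """
    node = root
    scanned = 0
    for component in path.strip("/").split("/"):
        for name, child in node:
            scanned += 1
            if name == component:
                node = child
                break
        else:
            # No entry matched this component in the current directory.
            raise FileNotFoundError(path)
    return node, scanned


# Hypothetical tree mirroring /share/work/sort.c from the text.
work = [("notes.txt", "notes data"), ("sort.c", "sort.c contents")]
share = [("docs", []), ("work", work)]
root = [("bin", []), ("etc", []), ("share", share)]

node, scanned = lookup(root, "/share/work/sort.c")
print(node, scanned)  # sort.c contents 7
```

Padding the work directory with a thousand extra entries ahead of sort.c would raise the count by a thousand, even though the path itself is no longer.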

These disk reads are expensive. If you're trying to read a tiny file, Solaris might spend more time looking for the file than reading it.

To make file access as fast as possible, Solaris maintains two levels of buffering. First, frequently used disk sectors are cached in memory, because once you use a file, chances are good that you'll need it again soon. The second level is the directory-name lookup cache (DNLC), which remembers recently resolved path components so that Solaris can find a frequently used directory or file without searching its parent directory again.

So when you access a directory that isn't in either cache, Solaris goes through the process we've described to locate it. Along the way, it buffers the disk sectors it read and adds a lookup-cache entry for each directory it searched through.
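
The idea behind the name lookup cache can be sketched as a small least-recently-used map keyed by (directory, component name). The capacity, the key format, the class and function names, and the dictionary-of-dictionaries tree are all hypothetical choices made for illustration; the real DNLC is a kernel structure and works differently in detail.

```python
from collections import OrderedDict


class NameCache:
    """Toy LRU cache mapping (directory, component name) -> resolved path.

    Loosely modeled on a directory-name lookup cache; capacity and key
    format are illustrative assumptions, not Solaris internals.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, directory, name):
        key = (directory, name)
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self.entries[key]
        self.misses += 1
        return None

    def put(self, directory, name, value):
        key = (directory, name)
        self.entries[key] = value
        self.entries.move_to_end(key)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used


def resolve(cache, tree, path):
    """Walk path, consulting the cache before searching each directory.

    tree maps a directory path to a dict of {component: child path}.
    """
    current = "/"
    for component in path.strip("/").split("/"):
        child = cache.get(current, component)
        if child is None:
            # Cache miss: search the directory itself, then remember the result.
            child = tree[current][component]  # KeyError if the entry is missing
            cache.put(current, component, child)
        current = child
    return current


# Hypothetical tree for /share/work/sort.c.
tree = {
    "/": {"share": "/share"},
    "/share": {"work": "/share/work"},
    "/share/work": {"sort.c": "/share/work/sort.c"},
}
cache = NameCache(capacity=16)
resolve(cache, tree, "/share/work/sort.c")
resolve(cache, tree, "/share/work/sort.c")
print(cache.misses, cache.hits)  # 3 3
```

Resolving /share/work/sort.c twice misses three times on the first walk and hits three times on the second, so the second lookup never has to search the directories themselves.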

The problem with large directories is that when Solaris searches one for a given entry, it may have to read many sectors from the hard drive. Because the disk sector buffer is a limited resource, it eventually fills up, and Solaris begins discarding older sectors to make room for the new ones.

This, then, is the major problem with large directories. The larger the directory, the more cached data gets discarded, slowing down every process as Solaris is forced to re-read sectors from the disk. Even worse, a directory that becomes really large may not fit in the buffer at all. In that case, even repeatedly searching the same directory forces Solaris to re-read it from the disk.
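
The effect of a directory too large for the buffer can be reproduced with a simulated buffer. This sketch assumes a plain least-recently-used policy and a linear scan of the directory's sectors from first to last; the function name is hypothetical and the actual Solaris buffer management is more sophisticated, so treat the numbers as an illustration of the trend only.

```python
from collections import OrderedDict


def count_disk_reads(capacity, directory_sectors, passes):
    """Count sector reads when one directory is scanned repeatedly.

    The sector buffer is modeled as a simple LRU cache holding `capacity`
    sectors (an assumption). Each pass reads sectors 0 .. directory_sectors-1
    in order, as a linear directory search would.
    """
    buffer = OrderedDict()
    reads = 0
    for _ in range(passes):
        for sector in range(directory_sectors):
            if sector in buffer:
                buffer.move_to_end(sector)  # hit: refresh recency
            else:
                reads += 1  # miss: must go to disk
                buffer[sector] = True
                if len(buffer) > capacity:
                    buffer.popitem(last=False)  # evict least recently used
    return reads


print(count_disk_reads(capacity=8, directory_sectors=4, passes=2))   # 4
print(count_disk_reads(capacity=8, directory_sectors=10, passes=2))  # 20
```

With an 8-sector buffer, a 4-sector directory costs 4 reads no matter how often it is searched, while a 10-sector directory costs 10 reads on every pass, because LRU keeps evicting sectors shortly before the scan returns to them.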

What's the limit?

So, how many files can you place in a directory before performance suffers? We can't give you a hard-and-fast answer. First, only you can decide which performance tradeoffs you're willing to make. Second, directory entries aren't a fixed size; their size depends mainly on the length of the filename. For example, a directory containing five files named a, b, c, d, and e could take less space than a directory containing a single file with a name such as GNU_gcc_v2.7.2_Pentium_Optimized_Solaris_2.5.1_i386.pkg.tar.gz.
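
To put rough numbers on that example, assume the classic UFS on-disk directory format: each record holds a 4-byte inode number, a 2-byte record length, and a 2-byte name length, followed by the name and a terminating NUL padded to a multiple of 4 bytes. The sketch below (the function name is hypothetical) ignores the . and .. entries and the slack that UFS folds into the last record of each directory block, so it compares record sizes rather than blocks on disk.

```python
def ufs_entry_size(name):
    """Bytes a UFS-style directory record needs for `name`.

    Assumes an 8-byte header (4-byte inode number, 2-byte record length,
    2-byte name length) followed by the name and a terminating NUL,
    padded up to a multiple of 4 bytes.
    """
    name_bytes = len(name.encode()) + 1  # name plus terminating NUL
    padded = (name_bytes + 3) & ~3       # round up to a 4-byte boundary
    return 8 + padded


short_names = ["a", "b", "c", "d", "e"]
long_name = "GNU_gcc_v2.7.2_Pentium_Optimized_Solaris_2.5.1_i386.pkg.tar.gz"

print(sum(ufs_entry_size(n) for n in short_names))  # 60
print(ufs_entry_size(long_name))                    # 72
```

Under those assumptions, the five one-letter names need 60 bytes in total, while the single long name needs 72.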

Conclusion

A large directory may not matter if you use it infrequently. However, you should keep the directories you use often small so the system runs at its peak. Typically, you can reorganize such a directory into a hierarchy, breaking it into several smaller directories and distributing the files among them in a structured way. For example, if you have many users but only a few log in at any given time, you may want to divide your user-account directory into multiple pieces. If you typically place user accounts in /acct, you might break that directory up into /acct/a, /acct/b, /acct/c..., and use the first letter of each account name to select the directory that holds the account.
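
The first-letter scheme is simple to automate. The helper below is a hypothetical sketch (the function name and the lowercasing of the first letter are assumptions, not part of the original advice) that maps an account name to its bucketed home directory.

```python
import posixpath


def account_dir(root, account):
    """Return the home-directory path for `account` under a bucketed root.

    Uses the /acct/a, /acct/b, ... scheme: the first letter of the account
    name selects the subdirectory. Lowercasing that letter is an added
    assumption so that "Jsmith" and "jsmith" land in the same bucket.
    """
    if not account:
        raise ValueError("account name must not be empty")
    bucket = account[0].lower()
    return posixpath.join(root, bucket, account)


print(account_dir("/acct", "jsmith"))  # /acct/j/jsmith
```

Bucketing by first letter keeps paths easy to guess, but the buckets will typically be uneven, because account names are not spread evenly across the alphabet. If one letter's subdirectory grows too large, it can be split again by the second letter in the same way.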

Remember, once the buffer is full, each sector of directory information that Solaris reads forces other data in memory to be discarded. Keeping your directories small helps keep your data in RAM and gives you the highest possible performance.