
                       Directory Structures for NewCore

                              Draft Proposal


Author:    Mike Challis

History:

  0.01  21-Dec-93  Document started
  0.02  22-Dec-93  Mailed to WS,JRe,JRo,KW for initial comments


Outstanding issues:

  - Should directory blocks be "doubled up" like map blocks? (AG says yes!)
  - Extra fields in directory blocks will be needed for integrity
  - Should there be an overall path-name length limit?



1  Summary


This document describes the structure and format of directories for NewCore,
but does not describe the detailed internal format of each directory entry.

Instead, we assume that each directory entry includes a variable-length part
in which whatever information is desired may be stored.



2  Introduction


A "directory" is a mapping from names to objects and object attributes, where
an "object" is a file or a directory.

Each element of this mapping is represented by a "directory entry" in which
is held the name of the referenced object, its location, and other attribute
information.

Each file is held as an object (small or large), and its directory entry 
contains the following information:

  - name
  - object-id
  - allocated size (sectors)
  - current extent (bytes)
  - other attribute data

A directory is represented as a sequence of fixed-size small objects* called
"directory blocks", and its directory entry includes:

  - name
  - object-id (of first directory block)
  - other attribute data

There may be other kinds of directory entry (for example, soft links) which
specify the referenced object in client-dependent ways.

  [ * An alternative representation is as a single small object, but this
      has the disadvantage that the directory's object-id may change when
      the directory is extended. If this happens, then the corresponding
      entry in the directory's parent must be updated - not necessarily an
      easy thing to manage in a "safe" way. ]


We expect that typical directory entries will be less than 80 bytes in size,
and that typical directories will have fewer than 40 entries. On the other
hand, some directory entries may become very large (long file names, and
extended attribute data), and some directories may include large numbers of
files.

For simplicity, we propose that directory entries may not cross directory
block boundaries. This places an upper limit on the size of any directory
entry (a few bytes less than the block size), and with this in mind we
propose a directory block size of 2048 bytes*. This means that many typical
directories will fit into just one or two blocks, and also means that little
space is wasted by unused space in small directories. It's also a convenient
buffer size for cacheing directory blocks in RAM.

  [ * This size can vary from disc to disc - but the limit on directory entry
      size must be fixed independently in order to ensure that directories
      can be copied from any disc to any other disc. ]


The blocks of a directory are linked together in a two-way chain, and within
each block directory entries are linked together as a list. The logical order
of the entries defined by these links is alphabetic by name.

Each block includes information about the total amount of free space
available as well as the "high water mark", and all entries in the block are
compacted when necessary.


Allocation strategies for inserting and deleting directory entries are
designed to maintain high occupancy of directory blocks, and to remain
efficient when directories are created or deleted in alphabetical order - as
happens, for example, when copying or deleting the contents of a directory.



3  Structure details


3.1  Directory block

Each directory block consists of a fixed size header, a variable size area
containing entries, and unused space:

                ----------------
               |  block header  |
               |----------------|
               |                |
               |    entries     |
               |                |
               |----------------|
               |    unused      |
                ----------------

The block header includes the following fields:

  int next_block     - object-id of the next directory block - or 0 if this
                       is the last one

  int prev_block     - object-id of the previous directory block - or 0 if
                       this is the first one

  short first_entry  - offset of the (logically) first entry - or 0 if the
                       block is empty

  short last_entry   - offset of the (logically) last entry - or 0 if the
                       block is empty

  short spare_space* - number of bytes spare

  short hwm*         - offset of first byte of unused space ("high water
                       mark")

  [ * The number of bytes spare is equal to the sum of the amount of unused
      space and the space occupied by any "dead" entries (see below); it is
      the amount of space that can be made available by compaction. The "hwm"
      is equal to the block size less the amount of unused space. ]


3.2  Directory entries

The entire space between the block header and the unused area is filled with
directory entries, which are either "active" or "dead".

Active entries are linked together on a chain whose first and last entries
are referenced by the corresponding first_entry and last_entry fields in the
block header.

Dead entries completely fill any gaps between active entries. They arise when
an active entry is deleted or truncated, and are not linked together.

All entries are word-aligned.


Five types of directory entry are defined in this document:

  TYPE_DIR      - which describes a directory
  TYPE_FILE     - which describes a file
  TYPE_INTERNAL - which holds information private to the object management
                  routines
  TYPE_EXTERNAL - which holds information which can only be interpreted by
                  higher-level routines
  TYPE_SPARE    - which describes an area of space made free when an active
                  entry is truncated

TYPE_SPARE entries are always dead entries; other types are active (but the
type information is preserved when an active entry becomes dead just in case
it can be of help when trying to untangle a damaged disc).


Every type of directory entry has the following fields:

  short next_log       - active entry:
                            offset of (logically) next entry - or 0 if this
                            is the last entry in the block
                         dead entry:
                            1 (this can never be a valid offset for an active
                               entry)

  short type_len       - length of this entry in bytes plus type flags:

                           bits         meaning

                          0 - 10    entry length in bytes (max 2048 less
                                      header size)
                         11 - 12    reserved
                         13 - 15    entry_type (as listed above)

Note that dead entries can be as small as one word, and, since entries are
word-aligned, this means that any gap between active entries can always be
filled with a dead one.


All types except TYPE_INTERNAL and TYPE_SPARE have a variable length name
field:

  char obj_name[..]    - entry name as zero-terminated string


TYPE_DIR and TYPE_FILE directory entries identify the location of the
corresponding object as follows:

  int obj_id           - object-id of file                   (if file)
                       - object-id of first directory block  (if directory)


TYPE_FILE directory entries include extra length fields:

  int obj_size         - allocated space in sectors

  int file_extent      - current extent in bytes


All types except TYPE_SPARE finish with a variable length list of further
type-specific attributes:

  char other_attrs[..] - other attributes (variable length)


3.3  Home zones entry

At present only one TYPE_INTERNAL directory entry is defined. It is always
the first entry in the directory, and holds a (partial) list of the zones in
which small files within this directory are held. Its format is:

    short next_log
    short type_len
    short zones[..]   - a variable length array of zone-id's

The maximum number of entries allowed in the zones[..] list is likely to be
of the order of 50.



4  Algorithms


4.1  Locating a directory entry or point of insertion

Given a name, the corresponding directory entry of type TYPE_FILE, TYPE_DIR
or TYPE_EXTERNAL is located by linear search from the beginning of the
directory. Any TYPE_INTERNAL entries encountered on the way are ignored.

A similar search is done to determine the entry after which a new entry is
to be inserted (note that such a "previous" entry always exists - even if it
is the TYPE_INTERNAL entry which is present at the start of every directory.


Notes:

  This process is not as bad as it sounds, since whole blocks of entries can
  be skipped by looking at the names of the first/last entries. Indeed, these
  values can be treated as a kind of index once a number of directory blocks
  have been cached in memory.

  Inserting entries in alphabetical order (or reverse) need also not be slow,
  since it makes sense to look at the most-recently-referenced directory
  block - presumably still buffered in memory - first.


4.2  Inserting a new directory entry

  a) Locate the point of insertion.

  b) If there is not enough spare space in the block, go to step (f).

 [Enough spare space]
  c) If there is not enough unused space, compact the block.

 [Enough unused space]
  d) Add the entry to the end of the "entries" area, thus reducing the size
     of the "unused" area.

  e) Link it into the list of active entries after the point of insertion.


 [Insufficient spare space]
  f) Can entries be moved from the current block to the next/previous block
     to create sufficient spare space? (The new entry is included in the list
     as this calculation is done). If not, go to step (h).

  g) Move entries from one block to another until sufficient space has been
     created; it may be necessary to compact one or both of the blocks during
     this process. Can now create the new entry (in one or other of the
     blocks) as in (d) and (e).


 [Insufficient space in adjacent blocks]
  h) Allocate and insert a new directory block before/after the current one,
     and go to (f).


Notes:

  The strategy used to allocate a new directory block to a directory is the
  same as that used to allocate space for a new small file within that
  directory.

  Since we don't want the address of the directory to change, no new block
  is ever inserted *before* the first block.

  If new entries are inserted in alphabetical order, then each entry will be
  added at the end of the directory. Once the final block is full, a new one
  is allocated after it, and that one is filled up - so no unnecessary entry
  shuffling takes place.

  If new entries are inserted in reverse alphabetical order, then each entry
  will be inserted immediately following the TYPE_INTERNAL entry (containing
  zone details) in the first directory block. Once the first block is full,
  a new block is allocated after it, and sufficient entries are moved from
  the first to the second block to create space for the new entry, which can
  then be added to the first block. Thus creating a large number of entries
  in reverse alphabetical order will result in most entries being copied once
  from the first block to another one.

  In both cases above, the newly-filled directory will be compact (and so
  will have little wasted space).


4.3  Deleting a directory entry

The entry is removed from the active list, and is then marked as dead by
setting its next_log field to 1.

If it is now possible to combine this directory block with the next/previous
block, then entries are moved from one to the other and the original block
is freed.


Notes:

  This strategy ensures that two adjacent blocks will never have average
  occupancy of less than 50%.


4.4  Extending a directory entry

  a) Is there sufficient space in dead entries immediately following the
     directory entry? If so, use this.

 [Insufficient free space immediately following the entry]
  b) Treat this as the insertion of an entry whose size is equal to the
     difference between the original entry and the new one.


4.5  Truncating a directory entry

Replace the space released at the end of the entry by a TYPE_SPARE entry.


4.6  Compaction

  a) Scan the entries in physical order (using the type_len field) to
     identify the blocks of active entries that must be moved, and by how
     far each block must be moved, eg:

          block start   block end    distance

               0           80            0
             100          140           20
             200          250           80

     [The distance can be derived by accumulating the inter-block gaps, but
      it is convenient to hold it explicitly.]

  b) Move each block as indicated by the table.

  c) Scan the entries in physical order, and update the next_log field of
     each one according to the table: for example, if its value lies between
     200 and 250, subtract 80 from it.


Notes:

  This algorithm can be modified to compact part of a block at a time; this
  might be appropriate if the number of gaps is very large. After each
  partial compaction all active entries must be scanned in logical order,
  updating the next_log pointers where necessary "on-the-fly".
