Storage Media Output Format

General

This document describes the media format written by the Storage daemon. The Storage daemon reads and writes in units of blocks. Blocks contain records. Each block has a block header followed by records, and each record has a record header followed by record data.

This chapter is intended to be a technical discussion of the Media Format.

Take note that the type char [n] almost always means that the value is a variable sized null terminated string of length at most n - 1.

Definitions

Block

A block represents the primitive unit of information that the Storage daemon reads and writes to a physical device. Normally, for a tape device, it will be the same as a tape block. The Storage daemon always reads and writes blocks. A block consists of block header information followed by records. Clients of the Storage daemon (the File daemon) normally never see blocks. However, some of the Storage tools (bls, bscan, bextract, …) may use block header information.

Record

A record consists of a Record Header, which is managed by the Storage daemon and Record Data, which is the data received from the Client. A record is the primitive unit of information sent to and from the Storage daemon by the Client (File daemon) programs. The details are described below.

JobId

A number assigned by the Director daemon for a particular job. This number will be unique for that particular Director (Catalog). The daemons use this number to keep track of individual jobs. Within the Storage daemon, the JobId may not be unique if several Directors are accessing the Storage daemon simultaneously.

Session

A Session is a concept used in the Storage daemon that corresponds one to one to a Job, with the exception that each session is uniquely identified within the Storage daemon by a unique VolSessionId/VolSessionTime pair.

VolSessionId

A unique sequential number assigned by the Storage daemon to a particular session (Job) it is having with a File daemon. This number is sequential since the start of the daemon. This number by itself is not unique to the given Volume, but with the VolSessionTime, it is unique.

VolSessionTime

A unique number assigned by the Storage daemon to a particular Storage daemon execution. It is actually the Unix time_t value of when the Storage daemon began execution cast to a 32 bit unsigned integer. The combination of the VolSessionId and the VolSessionTime for a given Storage daemon is guaranteed to be unique for each Job (or session).

File Index

A sequential number beginning at one assigned by the File daemon to the files within a job that are sent to the Storage daemon for backup. The Storage daemon ensures that this number is greater than zero and sequential. Note, the Storage daemon uses negative FileIndexes to flag Session Start and End labels as well as End of Volume Labels. Thus, the combination of VolSessionId, VolSessionTime and FileIndex uniquely identifies the records for a single file written to a Volume.

Stream

While writing the information for any particular file to the Volume, there can be any number of distinct pieces of information about that file, e.g. the attributes, the file data, … The Stream indicates what piece of data it is, and it is an arbitrary number assigned by the File daemon to the parts (Unix attributes, Win32 attributes, data, compressed data, …) of a file that are sent to the Storage daemon. The Storage daemon has no knowledge of the details of a Stream; it simply represents a numbered stream of bytes. The data for a given stream may be passed to the Storage daemon in a single or multiple records.

Block Header

A block header consists of a block identification (“BB02”), a block length in bytes (typically 64,512), a checksum, and a sequential block number. Each block starts with a Block Header and is followed by Records. Current block headers also contain the VolSessionId and VolSessionTime for the records written to that block.

Record Header

A record header contains the VolSessionId, the VolSessionTime, the FileIndex, the Stream type and the size of the data record which follows. The Record Header is always immediately followed by a Data Record if the size given in the Header is greater than zero.

Note, for Block headers of level BB02 (Bacula >= 1.27 and Bareos), the Record header as written to tape does not contain the Volume Session Id and the Volume Session Time as these two fields are stored in the BB02 Block Header. The in-memory record header does have those fields for convenience.

Data Record

A data record consists of a binary Stream of bytes and is always preceded by a Record Header. The details of the meaning of the binary stream of bytes are unknown to the Storage daemon, but the Client programs (File daemon) define and thus know the details of each record type.

Label

Volume, Start Of Session and End Of Session are special records that are used as Labels.

Volume Label

A label placed by the Storage daemon at the beginning of each storage volume. It contains general information about the volume. It is written in record format. The Storage daemon manages Volume Labels, and the client may also read them if it needs to.

Start Of Session Record

The Start Of Session (SOS) record is a special record placed by the Storage daemon on the storage medium as the first record of an append session job with a File daemon. This record is useful for finding the beginning of a particular session (Job), since no records with the same VolSessionId and VolSessionTime will precede this record. This record is not normally visible outside of the Storage daemon. The Begin Session Label is similar to the Volume Label except that it contains additional information pertaining to the Session.

End Of Session Record

The End Of Session (EOS) Record is a special record placed by the Storage daemon on the storage medium as the last record of an append session job with a File daemon. The End Session Record is distinguished by a FileIndex with a value of minus five (-5, EOS_LABEL). This record is useful for detecting the end of a particular session since no records with the same VolSessionId and VolSessionTime will follow this record. This record is not normally visible outside of the Storage daemon. The End Session Label is similar to the Volume Label except that it contains additional information pertaining to the Session.
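
The identifiers defined above can be combined into a lookup key. The following is a minimal C sketch of that idea; the struct and function names are hypothetical and are not taken from the Bareos sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical key type; the fields mirror the definitions above. */
struct record_key {
    uint32_t vol_session_id;   /* sequential since Storage daemon start */
    uint32_t vol_session_time; /* start time of the Storage daemon */
    int32_t  file_index;       /* > 0 for file data, < 0 for labels */
};

/* Two records belong to the same session (Job) when both the
 * VolSessionId and the VolSessionTime match. */
bool same_session(const struct record_key *a, const struct record_key *b)
{
    assert(a != NULL && b != NULL);
    return a->vol_session_id == b->vol_session_id &&
           a->vol_session_time == b->vol_session_time;
}

/* Two records belong to the same file when they share a session and
 * carry the same positive FileIndex. */
bool same_file(const struct record_key *a, const struct record_key *b)
{
    return same_session(a, b) && a->file_index > 0 &&
           a->file_index == b->file_index;
}
```

Comparing only the VolSessionId would not be enough, because that counter restarts with every execution of the Storage daemon.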

Overall Format

A Bareos output file consists of Blocks of data. Each block contains a block header followed by records. Each record consists of a record header followed by the record data. The first record on a tape will always be the Volume Label Record.

No Record Header will be split across Bareos blocks. However, Record Data may be split across any number of Bareos blocks. Obviously this will not be the case for the Volume Label which will always be smaller than the Bareos Block size.

To simplify reading tapes, the Start of Session (SOS) and End of Session (EOS) records are never split across blocks. If this is about to happen, Bareos will write a short block before writing the session record (actually, the SOS record should always be the first record in a block, excepting perhaps the Volume label).

Due to hardware limitations, the last block written to the tape may not be fully written. If your drive permits backspace record, Bareos will back up over the last record written on the tape, re-read it and verify that it was correctly written.

When a new tape is mounted Bareos will write the full contents of the partially written block to the new tape ensuring that there is no loss of data. When reading a tape, Bareos will discard any block that is not totally written, thus ensuring that there is no duplication of data. In addition, since Bareos blocks are sequentially numbered within a Job, it is easy to ensure that no block is missing or duplicated.
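
The sequential numbering makes a gap or a duplicate easy to spot. A minimal sketch, assuming the block numbers of one Job have already been collected into an array (the function name is hypothetical):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return the index of the first block number that does not directly
 * follow its predecessor, or count if the whole sequence is contiguous. */
size_t first_sequence_break(const uint32_t *numbers, size_t count)
{
    size_t i;

    assert(numbers != NULL || count == 0);
    for (i = 1; i < count; i++) {
        if (numbers[i] != (uint32_t)(numbers[i - 1] + 1u)) {
            return i; /* missing or duplicated block at position i */
        }
    }
    return count;
}
```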

Storage Daemon File Output Format

The file storage and tape storage formats are identical, with one exception. Tape records are by default blocked into blocks of 64,512 bytes, and the last block is the actual number of bytes written rounded up to a multiple of 1024, whereas the last record of file storage is not rounded up. Each Session written to tape is terminated with an End of File mark (this will be removed later). Sessions written to file are simply appended to the end of the file.
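
The rounding rule for the last tape block can be illustrated with a small helper. This is only a sketch of the arithmetic described above; the function and macro names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define TAPE_ROUNDING 1024u /* rounding unit named in the text above */

/* Size of the last tape block: the bytes actually used, rounded up to
 * the next multiple of 1024. */
uint32_t tape_last_block_size(uint32_t bytes_used)
{
    assert(bytes_used <= UINT32_MAX - (TAPE_ROUNDING - 1u));
    return (bytes_used + TAPE_ROUNDING - 1u) / TAPE_ROUNDING * TAPE_ROUNDING;
}
```

Note that the default block size of 64,512 bytes is itself a multiple of 1024 (63 x 1024), so a full block is left unchanged by this rounding.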

Serialization

All Block Headers, Record Headers and Label Records are written using Bareos’s serialization routines. These routines guarantee that the data is written to the output volume in a machine independent format.
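
A machine independent format means that every integer is written with a fixed byte order regardless of the host CPU. The sketch below assumes most significant byte first (network byte order); the text above only guarantees machine independence, so treat the byte order and the helper names as assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write a 32-bit value most significant byte first (assumed order). */
void put_u32_be(uint8_t *out, uint32_t v)
{
    assert(out != NULL);
    out[0] = (uint8_t)(v >> 24);
    out[1] = (uint8_t)(v >> 16);
    out[2] = (uint8_t)(v >> 8);
    out[3] = (uint8_t)v;
}

/* Read back a value written by put_u32_be() on any host. */
uint32_t get_u32_be(const uint8_t *in)
{
    assert(in != NULL);
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8) | (uint32_t)in[3];
}
```

Because the helpers work byte by byte, they give the same result on little-endian and big-endian machines.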

Block Header

The current Block Header version is BB02. (The prior version BB01 is unsupported.)

Each session or Job uses its own private blocks.

The format of a Block Header is:

uint32_t CheckSum;                /* Block check sum */
uint32_t BlockSize;               /* Block byte size including the header */
uint32_t BlockNumber;             /* Block number */
char ID[4] = "BB02";              /* Identification and block level; not null terminated */
uint32_t VolSessionId;            /* Session Id for Job */
uint32_t VolSessionTime;          /* Session Time for Job */

The Block Header is a fixed length and fixed format.

The CheckSum field is a 32 bit checksum of the block data and the block header but not including the CheckSum field.

The Block Header is always immediately followed by a Record Header. If the tape is damaged, a Bareos utility will be able to recover as much information as possible from the tape by recovering blocks which are valid. The Block header is written using the Bareos serialization routines and thus is guaranteed to be in machine independent format.
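
A reader of the format could decode and sanity-check the 24-byte header roughly as follows. This is a hedged sketch, not the Bareos implementation: the struct and function names are hypothetical, integers are assumed to be stored most significant byte first, and the CheckSum is decoded but not verified because the checksum algorithm is not specified here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_HEADER_SIZE 24u /* six 4-byte fields */

struct block_header {
    uint32_t check_sum;
    uint32_t block_size;       /* includes the header itself */
    uint32_t block_number;
    char     id[4];            /* "BB02", not null terminated */
    uint32_t vol_session_id;
    uint32_t vol_session_time;
};

/* Read a 32-bit value, most significant byte first (assumed order). */
uint32_t bh_read_u32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Decode a header. Returns false if the buffer is too short, the ID is
 * not "BB02", or BlockSize disagrees with the number of bytes read. */
bool parse_block_header(const uint8_t *buf, size_t bytes_read,
                        struct block_header *out)
{
    assert(buf != NULL && out != NULL);
    if (bytes_read < BLOCK_HEADER_SIZE) {
        return false;
    }
    out->check_sum = bh_read_u32(buf);
    out->block_size = bh_read_u32(buf + 4);
    out->block_number = bh_read_u32(buf + 8);
    memcpy(out->id, buf + 12, 4);
    out->vol_session_id = bh_read_u32(buf + 16);
    out->vol_session_time = bh_read_u32(buf + 20);
    if (memcmp(out->id, "BB02", 4) != 0) {
        return false;
    }
    return out->block_size == bytes_read;
}
```

A recovery tool could use such a check to skip damaged blocks and continue with the next valid one.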

Record Header

Each binary data record is preceded by a Record Header. The Record Header is fixed length and fixed format, whereas the binary data record is of variable length. The Record Header is written using the Bareos serialization routines and thus is guaranteed to be in machine independent format.

The format of the Record Header is:

int32_t FileIndex;   /* File index supplied by File daemon */
int32_t Stream;      /* Stream number supplied by File daemon */
uint32_t DataSize;   /* size of following data record in bytes */

This version 2 Record Header is written to the medium when using Version BB02 Block Headers.

This record is followed by the binary Stream data of DataSize bytes, followed by another Record Header record and the binary stream data. For the definitive definition of this record, see record.h in the src/stored directory.
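
The alternation of Record Headers and data can be walked with a simple loop. The following sketch is an illustration only (record.h remains the definitive reference); the names are hypothetical and the byte order is assumed to be most significant byte first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RECORD_HEADER_SIZE 12u /* three 4-byte fields */

struct record_header {
    int32_t  file_index;
    int32_t  stream;
    uint32_t data_size;
};

/* Read a 32-bit value, most significant byte first (assumed order). */
uint32_t rec_read_u32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Decode the Record Header at p; 12 bytes must be readable. */
struct record_header read_record_header(const uint8_t *p)
{
    struct record_header h;

    assert(p != NULL);
    h.file_index = (int32_t)rec_read_u32(p);
    h.stream = (int32_t)rec_read_u32(p + 4);
    h.data_size = rec_read_u32(p + 8);
    return h;
}

/* Count the Record Headers in the record area of one block, that is,
 * the bytes after the Block Header. Data that runs past the end of the
 * area is treated as continuing in a later block. */
size_t count_record_headers(const uint8_t *area, size_t len)
{
    size_t pos = 0, count = 0;

    assert(area != NULL || len == 0);
    while (len - pos >= RECORD_HEADER_SIZE) {
        struct record_header h = read_record_header(area + pos);
        size_t remaining;

        pos += RECORD_HEADER_SIZE;
        count++;
        remaining = len - pos;
        pos += (h.data_size < remaining) ? h.data_size : remaining;
    }
    return count;
}
```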

Additional notes on the above:

FileIndex

is a sequential file number within a job. The Storage daemon requires this index to be greater than zero and sequential. Note, however, that the File daemon may send multiple Streams for the same FileIndex. In addition, the Storage daemon uses negative FileIndices to hold the Begin Session Label, the End Session Label, and the End of Volume Label.

Stream

is defined by the File daemon and is used to identify separate parts of the data saved for each file (Unix attributes, Win32 attributes, file data, compressed file data, sparse file data, …). The Storage Daemon has no idea of what a Stream is or what it contains except that the Stream is required to be a positive integer. Negative Stream numbers are used internally by the Storage daemon to indicate that the record is a continuation of the previous record (the previous record would not entirely fit in the block).

For Start Session and End Session Labels (where the FileIndex is negative), the Storage daemon uses the Stream field to contain the JobId.

The current stream definitions are:

#define STREAM_UNIX_ATTRIBUTES    1    /* Generic Unix attributes */
#define STREAM_FILE_DATA          2    /* Standard uncompressed data */
#define STREAM_MD5_SIGNATURE      3    /* MD5 signature for the file */
#define STREAM_GZIP_DATA          4    /* GZip compressed file data */
/* Extended Unix attributes with Win32 Extended data.  Deprecated. */
#define STREAM_UNIX_ATTRIBUTES_EX 5    /* Extended Unix attr for Win32 EX */
#define STREAM_SPARSE_DATA        6    /* Sparse data stream */
#define STREAM_SPARSE_GZIP_DATA   7
#define STREAM_PROGRAM_NAMES      8    /* program names for program data */
#define STREAM_PROGRAM_DATA       9    /* Data needing program */
#define STREAM_SHA1_SIGNATURE    10    /* SHA1 signature for the file */
#define STREAM_WIN32_DATA        11    /* Win32 BackupRead data */
#define STREAM_WIN32_GZIP_DATA   12    /* Gzipped Win32 BackupRead data */
#define STREAM_MACOS_FORK_DATA   13    /* Mac resource fork */
#define STREAM_HFSPLUS_ATTRIBUTES 14   /* Mac OS extra attributes */
#define STREAM_UNIX_ATTRIBUTES_ACCESS_ACL 15 /* Standard ACL attributes on UNIX */
#define STREAM_UNIX_ATTRIBUTES_DEFAULT_ACL 16 /* Default ACL attributes on UNIX */

DataSize

is the size in bytes of the binary data record that follows the Record Header. The Storage Daemon has no idea of the actual contents of the binary data record. For standard Unix files, the data record typically contains the file attributes or the file data. For a sparse file, the first 64 bits of the file data contain the storage address for the data block.
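
The sign conventions described in the notes on FileIndex and Stream can be summarised in a small classifier. This is a sketch with hypothetical type and function names; it only encodes the rules stated above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum record_kind {
    RECORD_LABEL,        /* negative FileIndex: label record */
    RECORD_CONTINUATION, /* negative Stream: continues the previous record */
    RECORD_FILE_DATA,    /* positive FileIndex and positive Stream */
    RECORD_INVALID       /* zero FileIndex or zero Stream */
};

/* Only the two fields needed for classification. */
struct record_ids {
    int32_t file_index;
    int32_t stream;
};

enum record_kind classify_record(const struct record_ids *r)
{
    assert(r != NULL);
    if (r->file_index < 0) {
        return RECORD_LABEL; /* for session labels, Stream holds the JobId */
    }
    if (r->file_index == 0) {
        return RECORD_INVALID;
    }
    if (r->stream < 0) {
        return RECORD_CONTINUATION;
    }
    if (r->stream == 0) {
        return RECORD_INVALID;
    }
    return RECORD_FILE_DATA;
}
```

The FileIndex is tested first because a label record reuses the Stream field for the JobId, so its Stream value must not be interpreted as a stream number.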

The Record Header is never split across two blocks. If there is not enough room in a block for the full Record Header, the block is padded to the end with zeros and the Record Header begins in the next block. The data record, on the other hand, may be split across multiple blocks and even multiple physical volumes. When a data record is split, the second (and possibly subsequent) piece of the data is preceded by a new Record Header. In this case the first record header has DataSize equal to the total size of the data record, whereas each other record header has DataSize equal to the size of its own piece of data. Thus each piece of data is always immediately preceded by a Record Header. When reading a record, if Bareos finds only part of the data in the first record, it will automatically read the next record and concatenate the pieces to form a full data record.
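
The concatenation of split pieces can be sketched as follows. The code assumes, as described above, that the first Record Header carries the total size and that every later piece reports only its own size; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct reassembly {
    uint8_t *buf;   /* caller-provided buffer of at least total bytes */
    uint32_t total; /* DataSize from the first Record Header */
    uint32_t have;  /* bytes collected so far */
};

/* Start collecting a record whose first header announced total bytes. */
void reassembly_begin(struct reassembly *r, uint8_t *buf, uint32_t total)
{
    assert(r != NULL && (buf != NULL || total == 0));
    r->buf = buf;
    r->total = total;
    r->have = 0;
}

/* Append one piece. Returns 1 when the record is complete, 0 when more
 * pieces are expected, and -1 if the piece would exceed the total. */
int reassembly_add(struct reassembly *r, const uint8_t *piece,
                   uint32_t piece_size)
{
    assert(r != NULL && (piece != NULL || piece_size == 0));
    if (piece_size > r->total - r->have) {
        return -1;
    }
    if (piece_size > 0) {
        memcpy(r->buf + r->have, piece, piece_size);
    }
    r->have += piece_size;
    return r->have == r->total;
}
```

For a 10-byte record of which only 4 bytes fit into the current block, the first call returns 0 and the call with the remaining 6 bytes returns 1.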

Volume Label Format

Tape volume labels are created by the Storage daemon in response to a label command given to the Console program, or alternatively by the btape program. Each volume is labeled with the following information using the Bareos serialization routines, which guarantee machine byte order independence.

For Bareos versions 12.4 and later, the Volume Label Format is:

char Id[32] = "Bareos 2.0 immortal\n"; /* Identification */
uint32_t VerNum;          /* Label version number; = 20 since Bareos 12.4 */
btime_t   label_btime;    /* Time/date tape labeled */
btime_t   write_btime;    /* Time/date tape first written */
float64_t write_date;     /* Always 0 */
float64_t write_time;     /* Always 0 */
char VolName[128];        /* Volume name */
char PrevVolName[128];    /* Previous Volume Name */
char PoolName[128];       /* Pool name */
char PoolType[128];       /* Pool type */
char MediaType[128];      /* Type of this media */
char HostName[128];       /* Host name of writing computer */
char LabelProg[32];       /* Label program name */
char ProgVersion[32];     /* Program version */
char ProgDate[32];        /* Program build date/time */

Note, the LabelType (Volume Label, Volume PreLabel, Session Start Label, …) is stored in the record FileIndex field of the Record Header and does not appear in the data part of the record.

Session Label

The Session Label is written at the beginning and end of each session as well as the last record on the physical medium. It has the following binary format:

char Id[32];              /* Bareos Immortal ... */
uint32_t VerNum;          /* Label version number */
uint32_t JobId;           /* Job id */
btime_t   write_btime;    /* time/date record written */
float64_t write_time;     /* Always 0 */
char PoolName[128];       /* Pool name */
char PoolType[128];       /* Pool type */
char JobName[128];        /* base Job name */
char ClientName[128];
char Job[128];            /* Unique Job name */
char FileSetName[128];    /* FileSet name */
uint32_t JobType;
uint32_t JobLevel;
char FileSetChecksum[128];

In addition, the EOS label contains:

/* The remainder are part of EOS label only */
uint32_t JobFiles;
uint64_t JobBytes;
uint32_t start_block;
uint32_t end_block;
uint32_t start_file;
uint32_t end_file;
uint32_t JobErrors;
uint32_t JobStatus;         /* Job termination code */

Note, the LabelType (Volume Label, Volume PreLabel, Session Start Label, …) is stored in the record FileIndex field and does not appear in the data part of the record. Also, the Stream field of the Record Header contains the JobId. This permits quick filtering without actually reading all the session data in many cases.
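
The quick filtering mentioned above needs only the Record Header. A minimal sketch, using the LabelType values listed in this document; the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* LabelType values as stored in the FileIndex field. */
#define PRE_LABEL (-1)
#define VOL_LABEL (-2)
#define EOM_LABEL (-3)
#define SOS_LABEL (-4)
#define EOS_LABEL (-5)

/* True for Start of Session and End of Session labels. */
bool is_session_label(int32_t file_index)
{
    return file_index == SOS_LABEL || file_index == EOS_LABEL;
}

/* JobId carried in the Stream field of a session label header. */
uint32_t session_label_job_id(int32_t file_index, int32_t stream)
{
    assert(is_session_label(file_index));
    return (uint32_t)stream;
}

/* Decide from the header alone whether a label belongs to job_id,
 * without decoding the label data that follows. */
bool label_matches_job(int32_t file_index, int32_t stream, uint32_t job_id)
{
    return is_session_label(file_index) &&
           session_label_job_id(file_index, stream) == job_id;
}
```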

Overall Storage Format

               Bareos Tape Format
               28 September 2002
A Bareos tape is composed of tape Blocks.  Each block
  has a Block header followed by the block data. Block
  Data consists of Records. Records consist of Record
  Headers followed by Record Data.
:=======================================================:
|                                                       |
|                 Block Header (24 bytes)               |
|                                                       |
|-------------------------------------------------------|
|                                                       |
|              Record Header (12 bytes)                 |
|                                                       |
|-------------------------------------------------------|
|                                                       |
|                  Record Data                          |
|                                                       |
|-------------------------------------------------------|
|                                                       |
|              Record Header (12 bytes)                 |
|                                                       |
|-------------------------------------------------------|
|                                                       |
|                       ...                             |
Block Header: the first item in each block. The format is
  shown below.
Partial Data block: occurs if the data from a previous
  block spills over to this block (the normal case except
  for the first block on a tape). However, this partial
  data block is always preceded by a record header.
Record Header: identifies the FileIndex, the Stream
  and the following Record Data size. See below for format.
Record data: arbitrary binary data.
                 Block Header Format BB02
:=======================================================:
|              CheckSum         (uint32_t)              |
|-------------------------------------------------------|
|              BlockSize        (uint32_t)              |
|-------------------------------------------------------|
|              BlockNumber      (uint32_t)              |
|-------------------------------------------------------|
|              "BB02"           (char [4])              |
|-------------------------------------------------------|
|              VolSessionId     (uint32_t)              |
|-------------------------------------------------------|
|              VolSessionTime   (uint32_t)              |
:=======================================================:
BB02: Serves to identify the block as a
  Bareos block and also serves as a block format identifier
  should we ever need to change the format.
BlockSize: is the size in bytes of the block. When reading
  back a block, if the BlockSize does not agree with the
  actual size read, Bareos discards the block.
CheckSum: a checksum for the Block.
BlockNumber: is the sequential block number on the tape.
VolSessionId: a unique sequential number that is assigned
              by the Storage Daemon to a particular Job.
              This number is sequential since the start
              of execution of the daemon.
VolSessionTime: the time/date that the current execution
              of the Storage Daemon started.  It assures
              that the combination of VolSessionId and
              VolSessionTime is unique for all jobs
              written to the tape, even if there was a
              machine crash between two writes.
               Record Header Format BB02
:=======================================================:
|              FileIndex        (int32_t)               |
|-------------------------------------------------------|
|              Stream           (int32_t)               |
|-------------------------------------------------------|
|              DataSize         (uint32_t)              |
:=======================================================:
FileIndex: a sequential file number within a job.  The
              Storage daemon enforces this index to be
              greater than zero and sequential.  Note,
              however, that the File daemon may send
              multiple Streams for the same FileIndex.
              The Storage Daemon uses negative FileIndices
              to identify Session Start and End labels
              as well as the End of Volume labels.
Stream: defined by the File daemon and is intended to be
              used to identify separate parts of the data
              saved for each file (attributes, file data,
              ...).  The Storage Daemon has no idea of
              what a Stream is or what it contains.
DataSize: the size in bytes of the binary data record
              that follows the Session Record header.
              The Storage Daemon has no idea of the
              actual contents of the binary data record.
              For standard Unix files, the data record
              typically contains the file attributes or
              the file data.  For a sparse file
              the first 64 bits of the data contains
              the storage address for the data block.
                    Volume Label
:=======================================================:
|              Id               (32 bytes)              |
|-------------------------------------------------------|
|              VerNum           (uint32_t)              |
|-------------------------------------------------------|
|              label_btime      (btime_t)               |
|-------------------------------------------------------|
|              write_btime      (btime_t)               |
|-------------------------------------------------------|
|                  0            (float64_t)             |
|-------------------------------------------------------|
|                  0            (float64_t)             |
|-------------------------------------------------------|
|              VolName          (128 bytes)             |
|-------------------------------------------------------|
|              PrevVolName      (128 bytes)             |
|-------------------------------------------------------|
|              PoolName         (128 bytes)             |
|-------------------------------------------------------|
|              PoolType         (128 bytes)             |
|-------------------------------------------------------|
|              MediaType        (128 bytes)             |
|-------------------------------------------------------|
|              HostName         (128 bytes)             |
|-------------------------------------------------------|
|              LabelProg        (32 bytes)              |
|-------------------------------------------------------|
|              ProgVersion      (32 bytes)              |
|-------------------------------------------------------|
|              ProgDate         (32 bytes)              |
:=======================================================:

Id: 32 byte identifier "Bareos 2.0 immortal\n"
LabelType (Saved in the FileIndex of the Header record).
    PRE_LABEL -1    Volume label on unwritten tape (deprecated since version 23.1.0: worm support)
    VOL_LABEL -2    Volume label after tape written
    EOM_LABEL -3    Label at EOM (not currently implemented)
    SOS_LABEL -4    Start of Session label (format given below)
    EOS_LABEL -5    End of Session label (format given below)
VerNum: 20
label_btime: Bareos time/date tape labeled
write_btime: Bareos time/date tape first used (data written)
VolName: "Physical" Volume name
PrevVolName: The VolName of the previous tape (if this tape is
             a continuation of the previous one).
PoolName: Pool Name
PoolType: Pool Type
MediaType: Media Type
HostName: Name of host that is first writing the tape
LabelProg: Name of the program that labeled the tape
ProgVersion: Version of the label program
ProgDate: Date Label program built
                    Session Label
:=======================================================:
|              Id               (32 bytes)              |
|-------------------------------------------------------|
|              VerNum           (uint32_t)              |
|-------------------------------------------------------|
|              JobId            (uint32_t)              |
|-------------------------------------------------------|
|              write_btime      (btime_t)               |
|-------------------------------------------------------|
|                 0             (float64_t)             |
|-------------------------------------------------------|
|              PoolName         (128 bytes)             |
|-------------------------------------------------------|
|              PoolType         (128 bytes)             |
|-------------------------------------------------------|
|              JobName          (128 bytes)             |
|-------------------------------------------------------|
|              ClientName       (128 bytes)             |
|-------------------------------------------------------|
|              Job              (128 bytes)             |
|-------------------------------------------------------|
|              FileSetName      (128 bytes)             |
|-------------------------------------------------------|
|              JobType          (uint32_t)              |
|-------------------------------------------------------|
|              JobLevel         (uint32_t)              |
|-------------------------------------------------------|
|              FileSetMD5       (128 bytes)             |
|-------------------------------------------------------|
        Additional fields in End Of Session Label
|-------------------------------------------------------|
|              JobFiles         (uint32_t)              |
|-------------------------------------------------------|
|              JobBytes         (uint64_t)              |
|-------------------------------------------------------|
|              start_block      (uint32_t)              |
|-------------------------------------------------------|
|              end_block        (uint32_t)              |
|-------------------------------------------------------|
|              start_file       (uint32_t)              |
|-------------------------------------------------------|
|              end_file         (uint32_t)              |
|-------------------------------------------------------|
|              JobErrors        (uint32_t)              |
|-------------------------------------------------------|
|              JobStatus        (uint32_t)              |
:=======================================================:
Id: 32 byte identifier "Bareos 2.0 immortal\n"
LabelType (in FileIndex field of Header):
    EOM_LABEL -3     Label at EOM
    SOS_LABEL -4     Start of Session label
    EOS_LABEL -5     End of Session label
VerNum: 20
JobId: JobId
write_btime: Bareos time/date this tape record written
PoolName: Pool Name
PoolType: Pool Type
MediaType: Media Type
ClientName: Name of File daemon or Client writing this session
            Not used for EOM_LABEL.

Examine Volumes

bls command

To get this information from actual volumes (disk or tape volumes), the bls command can be used.

  • bls <StorageName> -V <VolumeName>

    • shows general volume information, jobs and files in these jobs

  • bls <StorageName> -V <VolumeName> -v

    • shows general volume, block and detailed record information. As files are stored in records, all files are also listed, together with information about sparse, compression, encryption, …

  • bls <StorageName> -V <VolumeName> -k -vv

    • shows block and record information. Unlike the previous commands, it also shows all parts of records that are split across block boundaries.

Unix File Attributes

The Unix File Attributes packet consists of the following:

FileIndex Type Filename@FileAttributes@Link @ExtendedAttributes@

where

@

represents a byte containing a binary zero.

FileIndex

is the sequential file index starting from one assigned by the File daemon.

Type

is one of the following:

#define FT_LNKSAVED   1    /* hard link to file already saved */
#define FT_REGE       2    /* Regular file but empty */
#define FT_REG        3    /* Regular file */
#define FT_LNK        4    /* Soft Link */
#define FT_DIR        5    /* Directory */
#define FT_SPEC       6    /* Special file -- chr, blk, fifo, sock */
#define FT_NOACCESS   7    /* Not able to access */
#define FT_NOFOLLOW   8    /* Could not follow link */
#define FT_NOSTAT     9    /* Could not stat file */
#define FT_NOCHG     10    /* Incremental option, file not changed */
#define FT_DIRNOCHG  11    /* Incremental option, directory not changed */
#define FT_ISARCH    12    /* Trying to save archive file */
#define FT_NORECURSE 13    /* No recursion into directory */
#define FT_NOFSCHG   14    /* Different file system, prohibited */
#define FT_NOOPEN    15    /* Could not open directory */
#define FT_RAW       16    /* Raw block device */
#define FT_FIFO      17    /* Raw fifo device */

Filename

is the fully qualified filename.

FileAttributes

consists of the 13 fields of the stat() buffer in ASCII base64 format separated by spaces. These fields and their meanings are shown below. This stat() packet is in Unix format, and MUST be provided (constructed) for ALL systems.

Link

when the FT code is FT_LNK or FT_LNKSAVED, the item in question is a Unix link, and this field contains the fully qualified link name. When the FT code is not FT_LNK or FT_LNKSAVED, this field is null.

ExtendedAttributes

The exact format of this field is operating system dependent. It contains additional or extended attributes of a system dependent nature. Currently, this field is used only on WIN32 systems, where it contains an ASCII base64 representation of the WIN32_FILE_ATTRIBUTE_DATA structure as defined by Windows. The fields in the base64 representation of this structure are, like the File-Attributes, separated by spaces.
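
Since the parts of the packet are delimited by zero bytes, a reader can split it with a single pass over the buffer. The sketch below is an illustration with a hypothetical function name; it does not interpret the individual fields.

```c
#include <assert.h>
#include <stddef.h>

/* Split a packet into its zero-terminated fields. Stores up to
 * max_fields pointers into the packet and returns the number of
 * complete fields found. */
size_t split_nul_fields(const char *packet, size_t len,
                        const char **fields, size_t max_fields)
{
    size_t start = 0, count = 0, i;

    assert(packet != NULL && fields != NULL);
    for (i = 0; i < len && count < max_fields; i++) {
        if (packet[i] == '\0') {
            fields[count++] = packet + start;
            start = i + 1;
        }
    }
    return count;
}
```

With the layout shown above, the first field holds "FileIndex Type Filename", followed by the FileAttributes, the Link and the ExtendedAttributes fields; an empty Link or ExtendedAttributes field shows up as an empty string.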

The File-attributes consist of the following:

Stat Name   Unix                           Windows                         MacOS
----------  -----------------------------  ------------------------------  ----------------------------------------
st_dev      Device number of filesystem    Drive number                    vRefNum
st_ino      Inode number                   Always 0                        fileID/dirID
st_mode     File mode                      File mode                       777 dirs/apps; 666 docs; 444 locked docs
st_nlink    Number of links to the file    Number of links (only on NTFS)  Always 1
st_uid      Owner ID                       Always 0                        Always 0
st_gid      Group ID                       Always 0                        Always 0
st_rdev     Device ID for special files    Drive No.                       Always 0
st_size     File size in bytes             File size in bytes              Data fork file size in bytes
st_blksize  Preferred block size           Always 0                        Preferred block size
st_blocks   Number of blocks allocated     Always 0                        Number of blocks allocated
st_atime    Last access time since epoch   Last access time since epoch    Last access time -66 years
st_mtime    Last modify time since epoch   Last modify time since epoch    Last modify time -66 years
st_ctime    Inode change time since epoch  File create time since epoch    File create time -66 years
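
Each of the stat() fields above is stored as a space-separated ASCII base64 value. The decoder below is a sketch under an explicit assumption: each field is a single number written as base-64 digits, most significant digit first, over the alphabet A-Z, a-z, 0-9, +, / with an optional leading minus sign. This is not the byte-oriented base64 of RFC 4648, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static const char b64_digits[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Value of one base-64 digit, or -1 if c is not a digit. */
int b64_digit_value(char c)
{
    const char *q = (c != '\0') ? strchr(b64_digits, c) : NULL;
    return q ? (int)(q - b64_digits) : -1;
}

/* Decode one field starting at p. Stores the number in *value and
 * returns a pointer to the first character after the field (normally a
 * space or the terminating zero), or NULL if no digit was found. */
const char *decode_b64_field(const char *p, int64_t *value)
{
    bool negative;
    uint64_t v = 0;
    int d;

    assert(p != NULL && value != NULL);
    negative = (*p == '-');
    if (negative) {
        p++;
    }
    if (b64_digit_value(*p) < 0) {
        return NULL;
    }
    while ((d = b64_digit_value(*p)) >= 0) {
        v = (v << 6) | (uint64_t)d; /* six bits per digit */
        p++;
    }
    *value = negative ? -(int64_t)v : (int64_t)v;
    return p;
}
```

Under this assumption the field "IGk" decodes to 33188, which is 0100644 in octal: a regular file with mode 644.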