==========================================================
==========================================================
TLC: Tecella Lossless Compression Format
File Format Specification V1
Copyright Tecella LLC 2010
www.tecella.com
==========================================================
==========================================================

Release Notes:

6/1/2009 - V1 Draft
	* Added a new differential encoding scheme that trades a small percent of
		file size for lower cpu usage and parallelizability of encoding/decoding.
9/27/2008 - V0 Draft
	* Supports a simple differential encoding scheme.


================================================================================
0) Introduction
================================================================================
Features:
	File size can be reduced by 50-60% for 16-bit samples. 
		Even higher % savings for 15-bit samples or lower.
	Uses lossless compression schemes.
		Easily extendable to add new schemes.
		Switching schemes has low file-size overhead.
	Relatively low computational complexity.
		Good for high channel counts.
	Large file support. 64-bit addresses.
	Quick Random Access.
		Samples are partitioned into frames that can be accessed in constant time.
	Possible file recovery if file does not end properly.
		For example, in the event of a power outage or computer lockup.

Drawbacks:
	Higher CPU usage than raw data.
	Access to individual samples is not as quick as raw data.
		All samples within a frame (B-Frame) must be decoded to get the last sample of that frame.
	Large files (>4GB) may not be readable on some systems.

Recommendations:
	Noise cannot be compressed well, so reduce noise as much as possible.  For example, use a good amplifier or reduce the number of bits per sample.


---------------------------------------
0.1) General High-Level File Format
---------------------------------------
The file data is arranged in the following order:

*Identifying ASCII String "TLCave" - To make sure we are decoding a valid file.
*File Header - Contains details about the file.  (Sample period, number of channels, etc.)
*Channel Headers - Contains info for each of the channels. (Label, scale, etc.)
*Sequence of A-Frames - A single A-Frame represents a block of samples for all channels.
*A-Frame Index - File pointers to each A-Frame, for quick random access.
*Identifying ASCII String "TLCave" - To make sure the file was closed properly.


Sections 1.x explain the first level of the file hierarchy.
Sections 2.x explain the A-Frame.
Section 3 explains the B-Frame.
Section 4 explains the C-Frame and its encoding scheme.
Section 5 covers possible future features and encoding schemes.


================================================================================
1.1) File Header
================================================================================
Note: Multi-byte integer fields are encoded in little endian, with the least significant byte first.

Bytes	Description

9	Identifying ASCII String: "TLCave"
		History note: The id was "TCWave" when the format's extension was TCW for Tecella Compressed Waveform.  Replacing "TCW" with "TLC" gives us "TLCave".

1	TLC Version Number. 8-bit unsigned integer.
	0x00: Beta file format.
	0x01: Indicates file was encoded using the v1 file format.
	0x02: Indicates file was encoded using this specification.
	[0x03,0xFF]: Reserved for future versions.

2	Number of channels. 16-bit unsigned integer.
	0x0000 - 1 channel
	0x0001 - 2 channels
	...

1	SamplePeriod/SampleRate Flag:
	0x0: Next value is a SamplePeriod (seconds)
	0x1: Next value is a SampleRate (Hz)

8	SamplePeriod/SampleRate Value.  64-bit IEEE float.

1	Bits per sample.  8-bit unsigned integer.

8	Total number of samples per channel.  64-bit unsigned integer.

4	Number of samples per A-Frame.  32-bit unsigned integer.
	32 bits is overkill, but 16 bits might be too small.

8	Byte offset of the A-Frame Index.  64-bit unsigned integer.
	When writing, should be kept zero until the file is complete.
	When reading, if zero the A-Frame index must be re-built for random access.

4	Size of user defined metadata in bytes.  32-bit unsigned int.

??	Metadata.  Size depends on the previous field.
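
Example (Python sketch, not a reference implementation): the header fields above can be read in table order with the standard struct module.  The names FileHeader, read_exact and read_file_header are hypothetical.  The table lists 9 bytes for the identifying string while "TLCave" is 6 characters; this sketch assumes the field is 9 bytes wide and checks only the leading 6 characters, since the remaining bytes are not described.

```python
import io
import struct
from dataclasses import dataclass

# Assumption: the id field is 9 bytes wide, as listed in the table above.
ID_FIELD_SIZE = 9


@dataclass
class FileHeader:
    version: int
    num_channels: int
    value_is_rate: bool      # True: value is a SampleRate (Hz); False: SamplePeriod (s)
    period_or_rate: float
    bits_per_sample: int
    samples_per_channel: int
    samples_per_aframe: int
    aframe_index_offset: int  # 0 means the index was never written
    metadata: bytes


def read_exact(f, n):
    """Read exactly n bytes or fail; a short read means a truncated file."""
    data = f.read(n)
    if len(data) != n:
        raise EOFError("truncated TLC header")
    return data


def read_file_header(f):
    ident = read_exact(f, ID_FIELD_SIZE)
    if not ident.startswith(b"TLCave"):
        raise ValueError("not a TLC file")
    # "<" selects little endian with no padding.
    # B = 8-bit unsigned, H = 16-bit, I = 32-bit, Q = 64-bit, d = 64-bit float.
    fixed = struct.Struct("<BHBdBQIQI")
    (version, channels_minus_1, flag, value, bits,
     total, per_aframe, index_offset, meta_size) = fixed.unpack(
        read_exact(f, fixed.size))
    metadata = read_exact(f, meta_size)
    # The channel count is stored minus one (0x0000 means 1 channel).
    return FileHeader(version, channels_minus_1 + 1, flag == 1, value, bits,
                      total, per_aframe, index_offset, metadata)
```

After this call the file position is at the first per-channel header (section 1.2).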


---------------------------------------
1.2) Per-Channel Headers
---------------------------------------
The per-channel headers are variable length and are arranged in channel order.

The following sequence is repeated for each channel:

Bytes	Description
8	Scale factor used to convert signed integer samples to the proper units.
		64-bit IEEE floating point value.
1	Channel label length.  8-bit unsigned integer. 0 implies no label.
??	Channel label. Size depends on previous field.
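
Example (Python sketch): reading the variable-length channel headers in channel order.  The function name read_channel_headers is hypothetical, and the label is assumed to be ASCII text, which this section does not state.

```python
import io
import struct


def read_channel_headers(f, num_channels):
    """Return a list of (scale, label) tuples, one per channel, in channel order."""
    channels = []
    for _ in range(num_channels):
        # 8-byte little-endian float scale, then a 1-byte label length.
        scale, label_len = struct.unpack("<dB", f.read(9))
        # Assumption: labels are ASCII; undecodable bytes are replaced.
        label = f.read(label_len).decode("ascii", errors="replace")
        channels.append((scale, label))
    return channels
```

A decoded integer sample multiplied by the channel's scale gives the value in the channel's units.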


---------------------------------------
1.3) Sequence of A-Frames
---------------------------------------
All A-Frames represent the same number of samples, as specified in the file header, except the last A-Frame, which contains the leftover samples.

A-Frames are arranged in sequential order, back-to-back.

A-Frame 0
...
A-Frame M

See sections 2.x for a more detailed description of A-Frames.


---------------------------------------
1.4) A-Frame Index
---------------------------------------
Bytes	Description
4	Number of A-Frame offsets in this index.  32-bit unsigned integer.
8	File byte offset of A-Frame #0. 64-bit unsigned integer.
8	File byte offset of A-Frame #1. 64-bit unsigned integer.
...
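
Example (Python sketch): loading the index and using it for constant-time lookup of the A-Frame that holds a given sample.  The function names are hypothetical; index_offset is the "Byte offset of the A-Frame Index" field from the file header.

```python
import io
import struct


def read_aframe_index(f, index_offset):
    """Return the list of absolute A-Frame byte offsets stored in the index."""
    f.seek(index_offset)
    (count,) = struct.unpack("<I", f.read(4))
    return list(struct.unpack("<%dQ" % count, f.read(8 * count)))


def locate_sample(sample_number, samples_per_aframe, aframe_offsets):
    """Return (A-Frame byte offset, sample position inside that A-Frame)."""
    frame, within = divmod(sample_number, samples_per_aframe)
    return aframe_offsets[frame], within
```

Because every A-Frame except the last holds the same number of samples, the frame number is a single integer division and no searching is needed.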


================================================================================
2) A-Frames
================================================================================
A-Frames contain samples for all the channels within a block of time.
A-Frames have the following structure:

*B-Frame Index (see 2.1)
*Ch0 B-Frame0
...
*ChN B-Frame0

See sections 3.x for a more detailed description of B-Frames.

---------------------------------------
2.1) B-Frame Index
---------------------------------------
Note: All B-Frame offsets are relative to the start of the parent A-Frame.  This allows us to use 32-bit B-Frame offsets instead of the 64-bit offsets used for A-Frames.

Bytes	Description
4	Offset of B-Frame for channel 2. (The offset for channel 1 can already be inferred.)
...
4	Offset of B-Frame for channel N.
4	Offset of next A-Frame (necessary to rebuild frame index, without any sample decoding)
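
Example (Python sketch): reading one B-Frame Index and following the "next A-Frame" offsets to rebuild the A-Frame Index without decoding any samples.  The function names are hypothetical.  Two assumptions are made that this section does not state outright: the first channel's B-Frame begins immediately after the index, and the "next A-Frame" offset is relative to the parent A-Frame like the B-Frame offsets.

```python
import io
import struct


def read_bframe_index(f, aframe_offset, num_channels):
    """Return (absolute B-Frame offsets per channel, absolute offset of next A-Frame)."""
    f.seek(aframe_offset)
    # N-1 B-Frame offsets plus 1 next-A-Frame offset = N 32-bit values.
    rel = struct.unpack("<%dI" % num_channels, f.read(4 * num_channels))
    # Assumption: the first channel's B-Frame starts right after the index.
    first = 4 * num_channels
    bframes = [aframe_offset + r for r in (first,) + rel[:-1]]
    # Assumption: the last entry is also relative to the parent A-Frame.
    return bframes, aframe_offset + rel[-1]


def rebuild_aframe_index(f, first_aframe_offset, num_aframes, num_channels):
    """Walk the chain of A-Frames and collect their absolute offsets."""
    offsets = []
    pos = first_aframe_offset
    for _ in range(num_aframes):
        offsets.append(pos)
        _, pos = read_bframe_index(f, pos, num_channels)
    return offsets
```

When recovering a file that was not closed properly, the number of A-Frames may not be known in advance; in that case the walk would have to stop when it reaches the end of the file instead of after a fixed count.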


================================================================================
3) B-Frame
================================================================================
The B-Frame represents a stream of samples for a single channel.

The B-Frame is split into C-Frames of 1025 samples each.
If we let N be the number of C-Frames, the format of a B-Frame is as follows:

Bytes	Description
2*N	N 16-bit sizes.  One for each C-Frame.
??	The sequence of N C-Frames.  See section 4 for details on each C-Frame.
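
Example (Python sketch): computing N and slicing a B-Frame into its C-Frames using the size table.  The function names are hypothetical, and the 16-bit sizes are assumed to be unsigned byte counts.

```python
import struct

SAMPLES_PER_CFRAME = 1025


def cframe_count(samples_in_bframe):
    """Number of C-Frames needed; the last one may hold fewer than 1025 samples."""
    return -(-samples_in_bframe // SAMPLES_PER_CFRAME)  # ceiling division


def split_bframe(bframe, samples_in_bframe):
    """Return the raw bytes of each C-Frame in a B-Frame."""
    n = cframe_count(samples_in_bframe)
    # Assumption: each size is an unsigned 16-bit byte count.
    sizes = struct.unpack("<%dH" % n, bframe[:2 * n])
    cframes = []
    pos = 2 * n  # the C-Frames start right after the size table
    for size in sizes:
        cframes.append(bframe[pos:pos + size])
        pos += size
    return cframes
```

Since the sizes are known up front, each C-Frame can be located without decoding the ones before it, which is what allows C-Frames to be decoded in parallel.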


================================================================================
4) C-Frames
================================================================================
Each C-Frame represents 1025 samples, or possibly fewer if it is the last C-Frame in a B-Frame.

The first sample is uncompressed, and the remaining 1024 samples are converted into deltas and are split into 128 blocks of 8 deltas each.  The 8 deltas in a block are encoded using the same number of bits (usually the minimum number of bits necessary for the largest of the 8 deltas.)

There is a header that encodes how many bytes are used for each block of 8 deltas.  Since it's likely that neighboring blocks are the same size, a run length encoding scheme is used for the header.

The C-Frame has the following format:
Bytes	Description
2	First sample uncompressed.

1	R-1, where R is the total number of runs.  (0 means 1 run, 1 means 2 runs, etc.)
2*R	1 byte for run length, in blocks.  (0 means 1, 1 means 2, etc.)
	1 byte for bits per delta.  (0 = 0 bits).
	Repeat R times.

??	Encoded deltas, tightly packed.


Note that the choice of 8 deltas per block ensures everything starts and ends on byte boundaries, which is not the case for TLCv0.
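
Example (Python sketch): expanding the run-length header of a C-Frame into one bit width per block and computing the size of the delta payload.  The function names are hypothetical.  The first sample is assumed to be a signed 16-bit little-endian integer, and each run length is assumed to count blocks of 8 deltas.

```python
import struct


def read_cframe_header(cframe):
    """Return (first_sample, bits_per_block, payload_offset) for one C-Frame."""
    # Assumption: the first sample is a signed 16-bit little-endian integer.
    (first_sample,) = struct.unpack_from("<h", cframe, 0)
    runs = cframe[2] + 1  # stored value 0 means 1 run
    bits_per_block = []
    pos = 3
    for _ in range(runs):
        run_length = cframe[pos] + 1  # stored value 0 means 1 block
        bits = cframe[pos + 1]        # bits per delta for every block in the run
        bits_per_block.extend([bits] * run_length)
        pos += 2
    return first_sample, bits_per_block, pos


def payload_size(bits_per_block):
    """8 deltas of b bits each occupy exactly b bytes, so sizes simply add up."""
    return sum(bits_per_block)
```

For a full C-Frame the expanded list has 128 entries.  Unpacking the deltas themselves is not shown, because the bit order and the signed representation of a delta are not specified in this section.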


================================================================================
5) The Future
================================================================================
The following are features that may or may not make it into future versions of the format.

---------------------------------------
5.1) Future Features
---------------------------------------
*Store the min/max values of each B-Frame to allow quicker searches for interesting parts of a file.
*Implement some kind of marker system.
*Mip-map the samples to allow quick navigation at various "zoom levels".

---------------------------------------
5.2) Future Encoding Schemes
---------------------------------------
*Run length encoding.  If there is little to no noise, this can result in very large compression ratios.
