==========================================================
==========================================================
TLC: Tecella Lossless Compression Format
File Format Specification V0
Copyright Tecella LLC 2008
www.tecella.com
==========================================================
==========================================================

Release Notes:

9/27/2008 - V0 Draft
	*Supports a simple differential encoding scheme.


================================================================================
0) Introduction
================================================================================
Features:
	File size can be reduced by 50-60% for 16-bit samples. 
		Even higher savings for samples of 15 bits or fewer.
	Uses lossless compression schemes.
		Easily extendable to add new schemes.
		Switching schemes has low file-size overhead.
	Relatively low computational complexity.
		Good for high channel counts.
	Large file support. 64-bit addresses.
	Quick Random Access.
		Samples are partitioned into frames that can be accessed in constant time.
	Possible file recovery if file does not end properly.
		For example, in the event of a power outage or computer lockup.

Drawbacks:
	Higher CPU usage than raw data.
	Access to individual samples is not as quick as raw data.
		All samples within a frame (B-Frame) must be decoded to get the last sample of that frame.
	Large files (>4GB) may not be readable on some systems.

Recommendations:
	Noise cannot be compressed well, so reduce noise as much as possible.  For example, using a good amplifier or reducing bits per sample helps.


---------------------------------------
0.1) General High-Level File Format
---------------------------------------
The file data is arranged in the following order:

*Identifying ASCII String "TLCave" - To make sure we are decoding a valid file.
*File Header - Contains details about the file.  (Sample period, number of channels, etc.)
*Channel Headers - Contains info for each of the channels. (Label, scale, etc.)
*Sequence of A-Frames - A single A-Frame encodes all samples of all channels within a constant time period.
	*B-Frame Index - Contains offsets for each B-Frame.
	*Sequence of B-Frames - One for each channel.  Same constant time period as the parent A-Frame.
		*First sample - Uncompressed.
		*Sequence of C-Frames - of variable time period and of variable compression schemes.
*A-Frame Index - File pointers to each A-Frame, for quick random access.
*Identifying ASCII String "TLCave" - To make sure the file was closed properly.

Sections 1.x explain the first level of the file hierarchy.
Sections 2.x explain the A-Frame.
Sections 3.x explain the B-Frame and C-Frame.
Sections 4.x explain the different encoding schemes used for C-Frames.
Sections 5.x talk about possible future features and encoding schemes.


================================================================================
1.1) File Header
================================================================================
Note: Multi-byte integer fields are encoded in little endian, with the least significant byte first.

Bytes	Description

9	Identifying ASCII String: "TLCave"
		History note: The id was "TCWave" when the format's extension was TCW for Tecella Compressed Waveform.  Replacing "TCW" with "TLC" gives us "TLCave".

1	TLC Version Number. 8-bit unsigned integer.
	0x00: Beta file format.
	0x01: Indicates file was encoded using this specification.
	[0x02,0xFF]: Reserved for future versions.

2	Number of channels. 16-bit unsigned integer.
	0x0000 - 1 channel
	0x0001 - 2 channels
	...

1	Selects the meaning of the next field.  8-bit unsigned integer.
	0=SamplePeriod, 1=SampleRate

8	SamplePeriod/SampleRate.  64-bit IEEE float.

1	Bits per sample.  8-bit unsigned integer.

8	Total number of samples per channel.  64-bit unsigned integer.

4	Number of samples per A-Frame.  32-bit unsigned integer.
	32 bits is overkill, but 16 bits might be too small.

8	Byte offset of the A-Frame Index.  64-bit unsigned integer.
	When writing, should be kept zero until the file is complete.
	When reading, if zero the A-Frame index must be re-built for random access.

4	Size of user defined metadata in bytes.  32-bit unsigned int.

??	Metadata.  Size depends on the previous field.
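
The fixed-size part of the header above can be read with a single little-endian unpack.  The following Python is a minimal sketch, not a reference implementation.  The names FileHeader and parse_file_header are hypothetical.  It assumes the 6-character id sits at the start of the 9-byte id field; how the remaining 3 bytes are filled is not stated here, so only the prefix is checked.

```python
import struct
from dataclasses import dataclass
from typing import Tuple

# Fixed-size fields, little endian, no padding:
# 9s id, B version, H channels-1, B period/rate flag, d period or rate,
# B bits per sample, Q samples per channel, I samples per A-Frame,
# Q A-Frame Index offset, I metadata size.
HEADER_FMT = "<9sBHBdBQIQI"
HEADER_SIZE = struct.calcsize(HEADER_FMT)  # 46 bytes


@dataclass
class FileHeader:
    version: int
    channels: int
    is_rate: bool              # False: value is a sample period
    period_or_rate: float
    bits_per_sample: int
    samples_per_channel: int
    samples_per_aframe: int
    aframe_index_offset: int   # 0 means the index must be rebuilt
    metadata: bytes


def parse_file_header(buf: bytes) -> Tuple[FileHeader, int]:
    """Return the parsed header and the offset of the first channel header."""
    (ident, version, ch_minus_1, flag, value, bits,
     total, per_aframe, index_off, meta_len) = struct.unpack_from(HEADER_FMT, buf, 0)
    # Assumption: only the leading 6 bytes of the id field are compared.
    if not ident.startswith(b"TLCave"):
        raise ValueError("not a TLC file")
    end = HEADER_SIZE + meta_len
    if len(buf) < end:
        raise ValueError("truncated metadata")
    # The channel field stores the channel count minus one.
    header = FileHeader(version, ch_minus_1 + 1, flag == 1, value, bits,
                        total, per_aframe, index_off, buf[HEADER_SIZE:end])
    return header, end
```

The returned offset is where the per-channel headers are expected to start.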


---------------------------------------
1.2) Per-Channel Headers
---------------------------------------
The per-channel headers are variable length and are arranged in channel order.

The following sequence is repeated for each channel:

Bytes	Description
8	Scale factor used to convert signed integer samples to the proper units.
		64-bit IEEE floating point value.
1	Channel label length.  8-bit unsigned integer. 0 implies no label.
??	Channel label. Size depends on previous field.
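
Because each label has its own length byte, the channel headers have to be read one after another.  A minimal Python sketch follows.  The function name is hypothetical, and decoding the label as ASCII is an assumption, since the label encoding is not specified.

```python
import struct
from typing import List, Tuple


def parse_channel_headers(buf: bytes, offset: int,
                          channels: int) -> Tuple[List[Tuple[float, str]], int]:
    """Read `channels` headers starting at `offset`.

    Returns a list of (scale, label) pairs and the offset just past the
    last channel header.
    """
    result = []
    for _ in range(channels):
        scale, label_len = struct.unpack_from("<dB", buf, offset)
        offset += 9  # 8-byte scale + 1-byte label length
        raw = buf[offset:offset + label_len]
        if len(raw) != label_len:
            raise ValueError("truncated channel label")
        # Assumption: labels are ASCII text.
        result.append((scale, raw.decode("ascii", errors="replace")))
        offset += label_len
    return result, offset
```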


---------------------------------------
1.3) Sequence of A-Frames
---------------------------------------
All A-Frames represent the same number of samples, as specified in the file header.  The exception is the last A-Frame, which contains the leftover samples.

A-Frames are arranged in sequential order, back-to-back.

A-Frame 0
...
A-Frame M

See sections 2.x for a more detailed description of A-Frames.
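
The number of A-Frames and the size of the last one follow from two header fields.  A small Python sketch of that arithmetic is shown here.  The function name is hypothetical, and it assumes the last A-Frame is a full one when the total divides evenly.

```python
def aframe_layout(total_samples: int, samples_per_aframe: int):
    """Return (number of A-Frames, samples per channel in the last A-Frame)."""
    if samples_per_aframe <= 0:
        raise ValueError("samples_per_aframe must be positive")
    if total_samples == 0:
        return 0, 0
    count = -(-total_samples // samples_per_aframe)  # ceiling division
    last = total_samples - (count - 1) * samples_per_aframe
    return count, last
```

For example, 1000 samples per channel with 256 samples per A-Frame gives 4 A-Frames, the last holding 232 samples per channel.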


---------------------------------------
1.4) A-Frame Index
---------------------------------------
Bytes	Description
4	Number of A-Frame offsets in this index.  32-bit unsigned integer.
8	File byte offset of A-Frame #0. 64-bit unsigned integer.
8	File byte offset of A-Frame #1. 64-bit unsigned integer.
...
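
With the index loaded, finding the A-Frame that holds a given sample is one division.  The Python below is a sketch only; both function names are hypothetical, and the whole file is assumed to be available as a bytes object.

```python
import struct
from typing import List, Tuple


def parse_aframe_index(buf: bytes, index_offset: int) -> List[int]:
    """Read the A-Frame Index located at byte `index_offset`."""
    (count,) = struct.unpack_from("<I", buf, index_offset)
    return list(struct.unpack_from("<%dQ" % count, buf, index_offset + 4))


def locate_sample(offsets: List[int], samples_per_aframe: int,
                  n: int) -> Tuple[int, int]:
    """Map sample number `n` to (A-Frame byte offset, position in that frame)."""
    frame, pos = divmod(n, samples_per_aframe)
    return offsets[frame], pos
```

The position inside the frame still has to be reached by decoding the B-Frame from its start.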


================================================================================
2) The A-Frame
================================================================================
A-Frames contain samples for all the channels within a block of time.
A-Frames have the following structure:

*B-Frame Index
*Ch0 B-Frame0
...
*ChN B-Frame0

See sections 3.x for a more detailed description of B-Frames.

---------------------------------------
2.1) B-Frame Index
---------------------------------------
Note: All B-Frame offsets are relative to the start of the parent A-Frame.  This allows us to get away with 32-bit B-Frame offsets as compared to 64-bit A-Frame offsets.

Bytes	Description
4	Offset of B-Frame for channel 2. (Channel 1 can already be inferred)
...
4	Offset of B-Frame for channel N.
4	Offset of next A-Frame (necessary to rebuild frame index, without any sample decoding)
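
The index above can be turned into absolute offsets, and its last entry can be followed from frame to frame to rebuild the A-Frame Index of an unfinished file.  The Python below is a sketch under two assumptions that are not stated in this specification: the first channel's B-Frame starts immediately after the B-Frame Index, and an A-Frame is only counted when it fits entirely inside the available data.  Both function names are hypothetical.

```python
import struct
from typing import List, Tuple


def parse_bframe_index(buf: bytes, aframe_offset: int,
                       channels: int) -> Tuple[List[int], int]:
    """Return (absolute B-Frame offsets, absolute offset of the next A-Frame)."""
    # channels - 1 B-Frame offsets followed by one next-A-Frame offset.
    entries = struct.unpack_from("<%dI" % channels, buf, aframe_offset)
    # Assumption: the first channel's B-Frame directly follows the index.
    relative = [4 * channels] + list(entries[:-1])
    bframes = [aframe_offset + r for r in relative]
    return bframes, aframe_offset + entries[-1]


def rebuild_aframe_index(buf: bytes, first_aframe: int,
                         channels: int) -> List[int]:
    """Walk the next-A-Frame offsets when the header's index offset is zero."""
    offsets = []
    pos = first_aframe
    while pos + 4 * channels <= len(buf):
        _, nxt = parse_bframe_index(buf, pos, channels)
        if nxt <= pos or nxt > len(buf):
            break  # incomplete or corrupt trailing frame
        offsets.append(pos)
        pos = nxt
    return offsets
```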


================================================================================
3) The B-Frame and C-Frame
================================================================================

B-Frames contain a fixed number of samples encoded using various schemes.  C-Frames, however, can have a variable number of samples and are encoded using a single scheme.  This allows the encoder to partition B-Frames into C-Frames wherever is most efficient, while still allowing for quick random access to B-Frames.

In order to help facilitate certain encoding schemes, it's assumed the previous sample is known.  For this reason, B-Frames consist of the first sample uncompressed, followed by a sequence of C-Frames.

The B-Frame and C-Frame are streams of tightly packed bits that are not byte aligned (except where noted), and have the following bit structure.

Bits	Description
X	First sample of the frame, uncompressed where X=bits per sample
4	Encoding method of the next C-Frame
??	C-Frame
4	Encoding method of the next C-Frame
??	C-Frame
...
?	Zero padding up to the next byte boundary.

The stream goes on until it encodes the number of expected samples in the B-Frame.

These are the C-Frame types currently supported:

Value	Description
0	Simple differential reduced bit width.
[1,F]	Reserved
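
Reading a B-Frame requires a reader that works on bits rather than bytes.  The Python below is a minimal sketch.  The bit order within a byte and the representation of signed values are not defined in this specification, so the sketch assumes most-significant-bit-first packing and two's complement samples.  BitReader and read_bframe_prefix are hypothetical names.

```python
class BitReader:
    """Reads bit fields from a byte buffer, most significant bit first (assumed)."""

    def __init__(self, data: bytes, byte_offset: int = 0):
        self.data = data
        self.pos = byte_offset * 8  # current position, in bits

    def read(self, nbits: int, signed: bool = False) -> int:
        value = 0
        for _ in range(nbits):
            byte = self.data[self.pos >> 3]
            bit = (byte >> (7 - (self.pos & 7))) & 1
            value = (value << 1) | bit
            self.pos += 1
        if signed and nbits > 0 and value >= 1 << (nbits - 1):
            value -= 1 << nbits  # two's complement (assumed)
        return value

    def align_to_byte(self) -> None:
        """Skip the zero padding that ends a B-Frame."""
        self.pos = (self.pos + 7) & ~7


def read_bframe_prefix(reader: BitReader, bits_per_sample: int):
    """Read the uncompressed first sample and the first C-Frame method code."""
    first = reader.read(bits_per_sample, signed=True)
    method = reader.read(4)
    if method != 0:
        raise ValueError("reserved C-Frame type %d" % method)
    return first, method
```

A decoder would keep reading a 4-bit method code and a C-Frame until the expected number of samples has been produced, then call align_to_byte().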


================================================================================
4) The C-Frame Types
================================================================================
Every C-Frame type may assume that the value of the sample preceding its first sample is known.
Every C-Frame type must provide its own method of determining where its data ends.

---------------------------------------
4.1) Differential Reduced Bit Width
---------------------------------------
Encode the difference between successive samples using tightly packed, constant-width N-bit integers.  The last difference is NOT padded with 0's to the next byte boundary.

Bits	Description
4	N = Number of bits per diff. (Vary size of this field depending on bits per sample?)
2	Diff count size.  00=>4 bits; 01=>8 bits; 10=>12 bits; 11=>16 bits;
?	Diff count. Actual width depends on previous two bits.
N	First difference.
N	Second difference.
...
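
A round trip through this C-Frame layout can be sketched in Python as follows.  For readability the bit stream is modelled as a string of '0' and '1' characters instead of packed bytes.  Several details are assumptions rather than part of this specification: fields are written most significant bit first, differences are N-bit two's complement values, the N field holds N directly (so N is at most 15), and the count field holds the number of differences directly.  All function names are hypothetical.

```python
from typing import List, Tuple


def min_signed_width(diffs: List[int]) -> int:
    """Smallest two's complement width that holds every difference."""
    width = 1
    while any(not -(1 << (width - 1)) <= d < (1 << (width - 1)) for d in diffs):
        width += 1
    return width


def encode_cframe0(prev: int, samples: List[int]) -> str:
    """Encode `samples` (which follow `prev`) as a string of '0'/'1' bits."""
    diffs = [b - a for a, b in zip([prev] + samples[:-1], samples)]
    n = min_signed_width(diffs)
    if n > 15 or len(diffs) > 0xFFFF:
        raise ValueError("does not fit the assumed field ranges")
    # Smallest count width (4, 8, 12 or 16 bits) that holds the count.
    size_code = next(c for c in range(4) if len(diffs) < 1 << (4 * (c + 1)))
    bits = format(n, "04b") + format(size_code, "02b")
    bits += format(len(diffs), "0%db" % (4 * (size_code + 1)))
    for d in diffs:
        bits += format(d & ((1 << n) - 1), "0%db" % n)  # two's complement
    return bits


def decode_cframe0(prev: int, bits: str, pos: int = 0) -> Tuple[List[int], int]:
    """Decode one C-Frame; return (samples, bit position after the frame)."""
    def take(width: int) -> int:
        nonlocal pos
        value = int(bits[pos:pos + width], 2)
        pos += width
        return value

    n = take(4)
    if n == 0:
        raise ValueError("N = 0 is not handled in this sketch")
    count = take(4 * (take(2) + 1))
    samples = []
    for _ in range(count):
        d = take(n)
        if d >= 1 << (n - 1):
            d -= 1 << n  # sign-extend the N-bit difference
        prev += d
        samples.append(prev)
    return samples, pos
```

The encoder picks the narrowest N that fits every difference in the run, which is where the size reduction comes from: four 16-bit samples with small differences take 26 bits here instead of 64.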


================================================================================
5) The Future
================================================================================
The following are features that may or may not make it into future versions of the format.

---------------------------------------
5.1) Future Features
---------------------------------------
*Store the min/max values of each B-Frame to allow quicker searches for interesting parts of a file.
*Implement some kind of marker system.
*Mip-map the samples to allow quick navigation at various "zoom levels".

---------------------------------------
5.2) Future Encoding Schemes
---------------------------------------
*Run length encoding.  If there is little to no noise, this can result in very large compression ratios.
