The RLE acronym stands for Run Length Encoding. It is a form of compression/encoding, and is perhaps the simplest and most basic of such schemes available. The general way in which it works is to take a stream of bytes and look for sections in that stream where identical bytes (in our case pixels) repeat in runs. This will typically occur in a visual image where the majority of the image is a background colour, which is then divided up by lines and forms of other colours. When thinking about this, it is clear that the scheme is fairly limited. A horizontal line of a uniform colour, would represent a good candidate for encoding as a run, since the line will cause a sequence of identical bytes in the binary image. On the other hand a vertical line will not compress very well since a trace of it's manifestation appears once on every line, and therefore does not constitute a "run". So then, when a run is found, the length of the run is determined, and the run is then represented in the output by data that indicates the value and length of the run. That, is run length encoding.
The RLE library functions have been divided into two discrete groups. One deals with RLE8 the other RLE4. The two formats are essentially the the same, with the exception that the RLE format attempts to seek runs within nibbles (four bit words), or bytes, depending on the request. The two halves of the library are essentially the same, with the exception that type changes, and minor differences to the logic, allow the two formats to be read accurately. Had the library been written for C++, it is almost certain that the common functionality would have been placed in a base class. Subclasses could then have been written to reflect the slight differences in format. As it is the rest of the library is written as C code, so this code follows that convention. The code could have been written to mirror the notional C++ architecture, but that was deemed unworthwhile.
The implementation of the Microsoft bitmap RLE format, involves some careful consideration. Such considerations are largely to do with the way they define the standard. In our discussion of this scheme we assume the perspective of the RLE8 decompression machine.
The RLE data stream takes the form of a pipe. When we read data from the pipe, we automatically increment a pointer to the data in the pipe, such that subsequent reads extract subsequent data. When writing data to uncompressed output image, we will already have determined the overall size of the image, and allocated memory for it. By default when we write data to the output, it is written to the next pixel location in the image. We assume that the next pixel is to the right of the current one, and at the end of the line, the next pixel is the first on the next line.
As a result of reading the data we view the machine as being in one of three states;
IDLE
ESCAPED
ENCODED
Before reading any data, we assume the IDLE state. In this IDLE state, we read a byte from the pipe, and this byte tells us which of the other states to enter. If the byte is zero, then we enter the ESCAPED state, otherwise we enter the ENCODED state.
In the ESCAPED state the next byte read from the pipe may either be a command or a count. There are three commands;
0x00 - End of line - indicates that there is no more data available for the current line and that the next pixel should be placed in the output image at the beginning of the next line.
0x01 - End of bitmap - indicates that there is no more data available for the output image and that the next pixel should be considered to be the first in the image.
0x02 - Delta - indicates that the next pixel is at a given offset from the current one. Another two bytes are read from the input to determine the x and y sizes of this offset. The offsets are added to the current output pixel position.
If the command code is none of these three then the command code is considered to be a count instead. The count represents the number of bytes to read from the input that should be repeated as pixels on the output. Microsoft refer to this condition as "Absolute Mode". This is a mode where uncompressed data is retrieved from the input pipe.
After either a command or an absolute mode block in the ESCAPED state has been processed the machine returns to the IDLE state.
In the ENCODED state we read a byte from the input, and this is considered to be a count. We read another, and it is considered to be a pixel value. We add this single pixel value to the next output pixel, and repeat as indicated by the count. Microsoft refer to this condition as "Encoded Mode". This is a mode where compressed data is retrieved from the input. Once this action is complete the machine returns to the IDLE state.
The following table encapulates these considerations;
IDLE State
Byte 0
Byte 1
Byte 2
Byte 3
Byte n
0x00 ESCAPE State
0x00 End of line (Command Mode)
(invalid)
0x01 End of bitmap (Command Mode)
(invalid)
0x02 Delta (Command Mode)
X Offset
Y Offset
(invalid)
0x03 - 0xFF "Absolute Mode"
Pixel 0 (direct transfer)
Pixel 1 (direct transfer)
Pixel n-2 (direct transfer)
0x00 - 0xFF ENCODED State "Encoded Mode"
Pixel (repeat replicate)
(invalid)
In the best tradition of home car mechanics, compression is the reverse of expansion! There is, however, one notable point, that the table above makes clearer. When compressing bitmaps to RLE, the circumstance may arise that it is necassary to emit one, or two bytes that are not compressed. The single byte case is the clue, because a single byte repeat in encoded mode is the same as the single byte transfer we may wish to make in absolute mode. Sending the one byte in encoded mode is more efficient than sending one byte in absolute mode, were that available. Clearly if it is possible to send one absolute byte in encoded mode, then sending two is also possible.
The final observation, is that two different encoders could produce two different but completely valid RLE data files, from the same source bitmap.
We leave the detail of the encoder for the reader to estabish, the sources are zipped up under the download page on the left. Our encoder is not as smart at optimisation as some. Both the encoder and decoder have been tested, against itself and external readers and writers. Given the simplicity of this standard, compression, even on images which one would not normally expect dramatic benefits, show surprising results.