Here is some code I wrote a long time ago for a PPC target (it still contains some asm statements that might work, although I have not tested them recently).
It should be fairly simple to reuse it on any other target that can do 16×16-bit multiplications.
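To show what relying only on 16×16-bit multiplications looks like, here is a minimal sketch of an 8-point forward DCT. Every product is a 16-bit sample times a 16-bit constant, accumulated in 32 bits. This is not the code in dct1d.c. It is a naive direct-form DCT-II with JPEG scaling, the names (`fdct8`, `cos_q13`, `COS_Q13`, `round_shift14`) are hypothetical, and the cosine constants are assumed to be stored in Q13 fixed point (value × 8192). A 2D 8×8 DCT can be built by applying it to the rows of a block and then to the columns.

```c
#include <stdint.h>

/* cos(k*pi/16) for k = 0..8 in Q13 fixed point (value * 8192, rounded). */
static const int16_t COS_Q13[9] = {8192, 8035, 7568, 6811, 5793, 4551, 3135, 1598, 0};

/* cos(m*pi/16) in Q13 for any m >= 0, using the symmetries of cosine. */
static int16_t cos_q13(int m) {
    m %= 32;                                      /* period 2*pi */
    if (m > 16) m = 32 - m;                       /* cos(x) = cos(2*pi - x) */
    if (m > 8) return (int16_t)-COS_Q13[16 - m];  /* cos(x) = -cos(pi - x) */
    return COS_Q13[m];
}

/* Round-to-nearest division by 2^14, well defined for negative values too. */
static int32_t round_shift14(int32_t v) {
    return v >= 0 ? (v + 8192) >> 14 : -((-v + 8192) >> 14);
}

/* Direct-form 8-point DCT-II with JPEG scaling:
 *   out[k] = C(k)/2 * sum_n in[n] * cos((2n+1)*k*pi/16),
 *   C(0) = 1/sqrt(2), C(k) = 1 otherwise.
 * Inputs are expected to be level-shifted samples (-128..127). */
static void fdct8(const int16_t in[8], int32_t out[8]) {
    for (int k = 0; k < 8; k++) {
        int32_t acc = 0;
        for (int n = 0; n < 8; n++) {
            /* For k = 0 the cosine is 1, so use C(0) = 1/sqrt(2) = cos(4*pi/16). */
            int16_t c = (k == 0) ? COS_Q13[4] : cos_q13((2 * n + 1) * k);
            acc += (int32_t)in[n] * c;   /* 16x16 -> 32-bit multiply */
        }
        /* Q13 constants plus the factor 1/2 give a total shift of 14. */
        out[k] = round_shift14(acc);
    }
}
```

For a constant row of 100 the result is 283 (the exact value is 800/√8 ≈ 282.8) followed by seven zeros, because the rounded constants keep the exact sign symmetry of the cosine.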
The main loop looks like this:

    for each group of 8×8 blocks (MCU):
        process MCU
        write output buffer
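For an 8-bit grayscale image such as this one, each MCU is a single 8×8 block, so the loop above could be sketched in C as shown here. This is not the author's fjpeg.c: `encode_blocks` and `process_mcu` are hypothetical names, the real per-MCU work (DCT, quantization, Huffman coding) is replaced by a checksum, and the image dimensions are assumed to be multiples of 8 (true for 1280×1024).

```c
#include <stdint.h>

/* Hypothetical stand-in state for the real per-MCU work. */
static long checksum;
static int mcu_count;

/* Placeholder for "process MCU": DCT, quantization and Huffman coding
 * would go here. This sketch only sums the samples so the traversal
 * can be checked. */
static void process_mcu(const int16_t blk[64]) {
    for (int i = 0; i < 64; i++)
        checksum += blk[i];
    mcu_count++;
}

/* Walk an 8-bit grayscale image in 8x8 blocks, left to right, top to bottom.
 * Assumes width and height are multiples of 8. */
static void encode_blocks(const uint8_t *img, int width, int height) {
    int16_t blk[64];
    for (int by = 0; by < height; by += 8) {
        for (int bx = 0; bx < width; bx += 8) {
            /* Copy one block and level-shift 0..255 to -128..127. */
            for (int y = 0; y < 8; y++)
                for (int x = 0; x < 8; x++)
                    blk[y * 8 + x] = (int16_t)(img[(by + y) * width + bx + x] - 128);
            process_mcu(blk);
            /* "write output buffer": flush finished entropy-coded bytes here. */
        }
    }
}
```

For a 1280×1024 image this visits 160 × 128 = 20480 MCUs. The level shift maps unsigned samples into the signed range that the DCT expects.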
The code is about 15 KB of standard C and compiles to a static executable of about the same size (14440 bytes on my machine).
In the example, the input file is ‘image.y’ (a Netpbm PGM “rawbits” image, 1280×1024).
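A binary (“rawbits”, magic number P5) PGM file is a short text header followed by the raw pixels, one byte per pixel when maxval is below 256. A header with single whitespace separators, such as `P5\n1280 1024\n255\n`, is 17 bytes long, and 17 + 1280 × 1024 = 1310737 bytes, which matches the size of image.y in the listing. The sketch below parses such a header and returns the offset of the pixel data. `parse_pgm_header` and its helpers are hypothetical names, and only the part of the Netpbm format needed here is handled (comments, 1-byte samples).

```c
#include <stddef.h>
#include <stdint.h>

static int is_ws(uint8_t c) {
    return c == ' ' || c == '\t' || c == '\n' || c == '\r';
}

/* Skip whitespace and '#' comments, as allowed in PGM headers. */
static size_t skip_ws(const uint8_t *buf, size_t len, size_t pos) {
    while (pos < len) {
        if (buf[pos] == '#') {
            while (pos < len && buf[pos] != '\n') pos++;
        } else if (is_ws(buf[pos])) {
            pos++;
        } else {
            break;
        }
    }
    return pos;
}

/* Read one unsigned decimal integer; returns -1 if no digit is present. */
static long read_uint(const uint8_t *buf, size_t len, size_t *pos) {
    long v = -1;
    while (*pos < len && buf[*pos] >= '0' && buf[*pos] <= '9') {
        v = (v < 0 ? 0 : v * 10) + (buf[*pos] - '0');
        (*pos)++;
    }
    return v;
}

/* Parse a binary (P5) PGM header. Returns the byte offset of the first
 * pixel, or 0 on error. Only maxval <= 255 (1-byte samples) is accepted. */
static size_t parse_pgm_header(const uint8_t *buf, size_t len,
                               int *width, int *height, int *maxval) {
    if (len < 2 || buf[0] != 'P' || buf[1] != '5') return 0;
    size_t pos = 2;
    long vals[3];                       /* width, height, maxval */
    for (int i = 0; i < 3; i++) {
        pos = skip_ws(buf, len, pos);
        vals[i] = read_uint(buf, len, &pos);
        if (vals[i] <= 0) return 0;
    }
    if (vals[2] > 255) return 0;        /* 2-byte samples not handled here */
    /* Exactly one whitespace character separates maxval from the raster. */
    if (pos >= len || !is_ws(buf[pos])) return 0;
    pos++;
    *width = (int)vals[0];
    *height = (int)vals[1];
    *maxval = (int)vals[2];
    return pos;
}
```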
The output file is ‘testout.jpg’.
The JPEG header and the related quantization tables are fixed and precomputed (in included files).
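As an illustration of how a fixed table is used, here is a sketch that quantizes one block of DCT coefficients with a constant table and emits the result in zigzag order, the order in which JPEG entropy-codes the coefficients. The table values are an assumption: they are the example luminance table from Annex K of the JPEG standard (ITU-T T.81), not necessarily what makequant produces from quant.txt. `quantize_block`, `QTAB` and `ZIGZAG` are hypothetical names.

```c
#include <stdint.h>

/* ASSUMPTION: example luminance table from ITU-T T.81 Annex K (Table K.1),
 * in natural row-major order. The real table comes from makequant/quant.txt. */
static const uint8_t QTAB[64] = {
    16, 11, 10, 16,  24,  40,  51,  61,
    12, 12, 14, 19,  26,  58,  60,  55,
    14, 13, 16, 24,  40,  57,  69,  56,
    14, 17, 22, 29,  51,  87,  80,  62,
    18, 22, 37, 56,  68, 109, 103,  77,
    24, 35, 55, 64,  81, 104, 113,  92,
    49, 64, 78, 87, 103, 121, 120, 101,
    72, 92, 95, 98, 112, 100, 103,  99
};

/* ZIGZAG[i] is the row-major index of the i-th coefficient in zigzag order. */
static const uint8_t ZIGZAG[64] = {
     0,  1,  8, 16,  9,  2,  3, 10,
    17, 24, 32, 25, 18, 11,  4,  5,
    12, 19, 26, 33, 40, 48, 41, 34,
    27, 20, 13,  6,  7, 14, 21, 28,
    35, 42, 49, 56, 57, 50, 43, 36,
    29, 22, 15, 23, 30, 37, 44, 51,
    58, 59, 52, 45, 38, 31, 39, 46,
    53, 60, 61, 54, 47, 55, 62, 63
};

/* Quantize one 8x8 block of DCT coefficients (row-major input) and
 * write the quantized values in zigzag order. */
static void quantize_block(const int32_t coef[64], int16_t out[64]) {
    for (int i = 0; i < 64; i++) {
        int n = ZIGZAG[i];
        int32_t c = coef[n];
        int32_t d = QTAB[n];
        /* Round to nearest, symmetric around zero. */
        out[i] = (int16_t)(c >= 0 ? (c + d / 2) / d : -((-c + d / 2) / d));
    }
}
```

On a target without a fast hardware divider, one common option is to precompute reciprocals of the table entries and replace each division with a multiplication and a shift, at the cost of occasional off-by-one rounding differences.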
‘image.y’ was created from this original file:
Contents of the tgz file:
total 1388
-rwxrwxr-x 1 jmichel jmichel 8004 Nov 20 16:40 dct1d.c
-rwxrwxr-x 1 jmichel jmichel 14440 Nov 20 16:55 fjpeg
-rwxrwxr-x 1 jmichel jmichel 3216 Nov 20 16:41 fjpeg.c
-rwxrwxr-x 1 jmichel jmichel 2606 Nov 20 16:40 huffAC.h
-rwxrwxr-x 1 jmichel jmichel 238 Nov 20 16:40 huffDC.h
-rwxrwxr-x 1 jmichel jmichel 1310737 Nov 20 16:41 image.y
-rwxrwxr-x 1 jmichel jmichel 1714 Nov 20 16:40 jpegheader.h
-rwxrwxr-x 1 jmichel jmichel 9069 Nov 20 16:47 makequant
-rwxrwxr-x 1 jmichel jmichel 879 Nov 20 16:45 makequant.c
-rwxr-xr-x 1 jmichel jmichel 88 Nov 20 16:47 prepare_quant
-rwxrwxr-x 1 jmichel jmichel 269 Nov 20 16:47 quant.c
-rwxrwxr-x 1 jmichel jmichel 204 Nov 20 16:44 quant.txt
-rw-rw-r-- 1 jmichel jmichel 36239 Nov 20 16:41 testout.jpg