Do you recognise these images? There's a good chance you've seen some of them before now, probably several times. Here's a hint: They all depict the same thing.
They all look pretty similar - so why the size difference?
A quick intro to PNG
A basic PNG file is an 8-byte signature followed by a series of chunks. Each chunk consists of four parts:
- Length of the chunk data (four bytes)
- Chunk type (four bytes)
- Chunk data (as many bytes as the length field says)
- CRC (four bytes)
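The following is a minimal Python sketch of that layout. It uses only the standard library, and `make_chunk` is a hypothetical helper name. PNG's CRC is the same CRC-32 that `zlib.crc32` computes, taken over the type and data fields:

```python
import struct
import zlib

# Every PNG file starts with this 8-byte signature, followed by its chunks.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def make_chunk(chunk_type: bytes, data: bytes) -> bytes:
    """Serialise one chunk: length, type, data, CRC (big-endian integers)."""
    length = struct.pack(">I", len(data))  # counts the data bytes only
    # The CRC covers the type and data fields, not the length field.
    crc = struct.pack(">I", zlib.crc32(chunk_type + data))
    return length + chunk_type + data + crc


# IEND carries no data, so it is the same 12 bytes in every PNG.
print(make_chunk(b"IEND", b"").hex(" ").upper())
# 00 00 00 00 49 45 4E 44 AE 42 60 82
```

Those 12 bytes are exactly how the hex dump below ends.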
Here's the hex content of the smallest PNG shown above - the 103 byte file:
```
89 50 4E 47 0D 0A 1A 0A 00 00 00 0D 49 48 44 52 | .PNG........IHDR
00 00 01 00 00 00 01 00 01 03 00 00 00 66 BC 3A | .............f¼:
25 00 00 00 03 50 4C 54 45 B5 D0 D0 63 04 16 EA | %....PLTEµÐÐc..ê
00 00 00 1F 49 44 41 54 68 81 ED C1 01 0D 00 00 | ....IDATh.íÁ....
00 C2 A0 F7 4F 6D 0E 37 A0 00 00 00 00 00 00 00 | .Â ÷Om.7 .......
00 BE 0D 21 00 00 01 9A 60 E1 D5 00 00 00 00 49 | .¾.!....`áÕ....I
45 4E 44 AE 42 60 82                            | END®B`.
```
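If you want to check a dump like this by machine rather than by eye, a short chunk walker is enough. This is a sketch, not a full decoder. `iter_chunks` is a hypothetical name, and the bytes are transcribed from the dump above:

```python
import struct
import zlib

# The 103-byte tile, transcribed from the hex dump above.
TILE = bytes.fromhex(
    "89 50 4E 47 0D 0A 1A 0A 00 00 00 0D 49 48 44 52 "
    "00 00 01 00 00 00 01 00 01 03 00 00 00 66 BC 3A "
    "25 00 00 00 03 50 4C 54 45 B5 D0 D0 63 04 16 EA "
    "00 00 00 1F 49 44 41 54 68 81 ED C1 01 0D 00 00 "
    "00 C2 A0 F7 4F 6D 0E 37 A0 00 00 00 00 00 00 00 "
    "00 BE 0D 21 00 00 01 9A 60 E1 D5 00 00 00 00 49 "
    "45 4E 44 AE 42 60 82"
)


def iter_chunks(png: bytes):
    """Yield (type, data, crc_ok) for every chunk after the signature."""
    if png[:8] != b"\x89PNG\r\n\x1a\n":
        raise ValueError("missing PNG signature")
    pos = 8
    while pos < len(png):
        (length,) = struct.unpack_from(">I", png, pos)
        chunk_type = png[pos + 4 : pos + 8]
        data = png[pos + 8 : pos + 8 + length]
        (crc,) = struct.unpack_from(">I", png, pos + 8 + length)
        yield chunk_type.decode("ascii"), data, zlib.crc32(chunk_type + data) == crc
        pos += 12 + length  # 4 (length) + 4 (type) + data + 4 (CRC)


for name, data, crc_ok in iter_chunks(TILE):
    status = "CRC ok" if crc_ok else "CRC mismatch"
    print(f"{name}: {len(data)} bytes of data, {status}")
```

On this tile it should list IHDR (13 bytes of data), PLTE (3), IDAT (31) and IEND (0). With the 8-byte signature and 12 bytes of overhead per chunk, that accounts for all 103 bytes.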
The IHDR section (the PNG specification has the full details) marks this as a 256x256 pixel image (00 00 01 00 00 00 01 00) with a bit depth of 1 (01), where each pixel is represented as an index into a palette (03). The image uses the default deflate compression (00), basic filtering (00) and no interlacing (00).
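IHDR's 13 data bytes are two big-endian 32-bit integers followed by five single bytes, so `struct` can decode them in one call. In this sketch, the colour type names follow the PNG specification:

```python
import struct

# The 13 data bytes of the IHDR chunk from the dump above.
ihdr = bytes.fromhex("00 00 01 00 00 00 01 00 01 03 00 00 00")

# Width and height (4 bytes each), then five single-byte fields.
width, height, bit_depth, colour_type, compression, filter_method, interlace = (
    struct.unpack(">IIBBBBB", ihdr)
)

COLOUR_TYPES = {
    0: "greyscale",
    2: "truecolour",
    3: "indexed-colour (palette)",
    4: "greyscale with alpha",
    6: "truecolour with alpha",
}

print(f"{width}x{height}, {bit_depth} bit(s) per pixel, {COLOUR_TYPES[colour_type]}")
print(compression, filter_method, interlace)  # 0 0 0
```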
The PLTE section is mandatory for palettised images (PNG also supports true color images, where it isn't required). It can contain up to 256 entries, but this one contains only a single entry, B5 D0 D0, because only a single color is needed. This is also an example of a "truncated palette": the one-bit depth selected in the IHDR section allows 2 colors, but as only one color is used, only one is included in the palette.
The IDAT section is the actual image data. It's 31 bytes long and stored as a zlib stream: a 2-byte header, the DEFLATE-compressed data and a 4-byte Adler-32 checksum. If you inflate the data in this example, you get 8448 bytes of zeros. Why 8448? 8192 bytes hold the pixels at one bit per pixel (256*256/8 = 8192), and the other 256 are one byte at the start of each line of the image, specifying a filter to apply. In our case the filter byte is always 0, which means no filtering is performed, and every pixel is 0, meaning palette entry 0.
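The following sketch uses Python's `zlib` to inflate the IDAT bytes from the dump. It also checks the expected size against a general formula: one filter byte per row plus the row's pixels packed into whole bytes. The `raw_size` helper is a hypothetical name:

```python
import zlib

# The 31 data bytes of the IDAT chunk from the dump above: a zlib stream
# (2-byte header, DEFLATE data, 4-byte Adler-32 checksum).
IDAT = bytes.fromhex(
    "68 81 ED C1 01 0D 00 00 00 C2 A0 F7 4F 6D 0E 37 "
    "A0 00 00 00 00 00 00 00 00 BE 0D 21 00 00 01"
)


def raw_size(width: int, height: int, bits_per_pixel: int) -> int:
    """Uncompressed scanline data: per row, one filter-type byte
    followed by the row's pixels packed into whole bytes."""
    return height * (1 + (width * bits_per_pixel + 7) // 8)


raw = zlib.decompress(IDAT)
print(len(raw), raw.count(0) == len(raw))  # length, and whether it's all zeros
print(raw_size(256, 256, 1))  # 8448
print(raw_size(256, 256, 8))  # 65792
```

The header bytes 68 81 declare a 16 KB window. The last four bytes, 21 00 00 01, are the Adler-32 of 8448 zero bytes: 0x2100 = 8448 in the high half and 1 in the low half. As a result, `zlib.decompress` raises an error if the inflated data doesn't match that checksum.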
The IEND section marks the end of the image.
Why the differences in size between the images above?
| Size | Contents |
|---|---|
| 103 bytes | Palettised, 1 bit per pixel, 3-byte PLTE, 31-byte IDAT |
| 156 bytes | Palettised, 8 bits per pixel, 3-byte PLTE, 84-byte IDAT |
| 178 bytes | Palettised, 8 bits per pixel, 3-byte PLTE, 85-byte IDAT, pHYs section (physical pixel dimensions) |
| 379 bytes | Palettised, 8 bits per pixel, 3-byte PLTE, 307-byte IDAT (the IDAT inflates to the same all-zeros content as the previous two images, so presumably it was just compressed less effectively) |
| 921 bytes | Palettised, 8 bits per pixel, 768-byte PLTE (padding in all but the first three bytes), 84-byte IDAT |
| 1,189 bytes | Palettised, 8 bits per pixel, 768-byte PLTE (padding in all but the first three bytes), 84-byte IDAT, 256-byte tRNS section (transparency data for palette entries, padding in all but the first byte) |
(The 921-byte tile is actually served by redirecting to a cached resource, so in practice it isn't as bad as it looks compared to the others.)
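The table adds up exactly if each file contains only the chunks listed. A file's size is the 8-byte signature plus, for each chunk, 12 bytes of overhead and its data. The following is a quick check under that assumption. The 9-byte pHYs data size comes from the PNG specification, not from a measurement:

```python
SIGNATURE = 8        # bytes
CHUNK_OVERHEAD = 12  # length (4) + type (4) + CRC (4)
IHDR, IEND = 13, 0   # data sizes, identical in every file here
PHYS = 9             # pHYs data: two 4-byte densities and a unit byte


def png_size(*data_lengths: int) -> int:
    """File size from the data length of each chunk."""
    return SIGNATURE + sum(CHUNK_OVERHEAD + n for n in data_lengths)


# (size from the table, computed size from chunk data lengths in file order)
rows = [
    (103, png_size(IHDR, 3, 31, IEND)),
    (156, png_size(IHDR, 3, 84, IEND)),
    (178, png_size(IHDR, 3, PHYS, 85, IEND)),
    (379, png_size(IHDR, 3, 307, IEND)),
    (921, png_size(IHDR, 768, 84, IEND)),
    (1189, png_size(IHDR, 768, 256, 84, IEND)),  # PLTE, tRNS, IDAT
]
for expected, computed in rows:
    print(expected, computed, "ok" if expected == computed else "mismatch")
```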
Why would it take 84 bytes to represent data that was all zeros? Haven't these people heard of run-length encoding?
The deflate algorithm used to compress the IDAT section of PNGs does support a form of run-length encoding: a back-reference can say "repeat the previous byte N times". But a single match can cover at most 258 bytes, and every match still costs some Huffman-coded bits, so the theoretical maximum compression ratio for deflate is about 1030.3:1.
We don't achieve that theoretical maximum in practice, because of fixed overheads such as the zlib header and checksum and the Huffman code tables at the start of a compressed block. The compression ratios we see include 264.2:1 for the 103-byte file and 780.1:1 for the 156-byte file.
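Those two ratios come out exactly if you divide the pixel bytes, excluding the 256 filter bytes, by the IDAT size and truncate to one decimal place. That is an inference about how they were computed, not something the text states. The last lines compare them with zlib itself on a long run of zeros. No figure is quoted for that result because it depends on the zlib build:

```python
import zlib


def ratio_1dp(pixel_bytes: int, idat_bytes: int) -> float:
    """Pixel bytes per IDAT byte, truncated to one decimal place."""
    return (pixel_bytes * 10 // idat_bytes) / 10


# Pixel data only; the 256 filter-type bytes are left out.
print(ratio_1dp(256 * 256 // 8, 31))  # 264.2 (1 bit per pixel)
print(ratio_1dp(256 * 256, 84))       # 780.1 (8 bits per pixel)

# For comparison: a million zero bytes through zlib at maximum compression.
zeros = bytes(1_000_000)
print(len(zeros) / len(zlib.compress(zeros, 9)))
```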
So who needs these micro-optimisations?
Here's the 103 byte image in a more familiar context:
Map tiles representing the sea!
Slippy maps traditionally serve fixed-size 'map tiles' in a grid at a given zoom level. For example, http://b.tile.openstreetmap.org/6/29/20.png is zoom level 6, column 29, row 20; http://a.tile.openstreetmap.org/19/516664/319949.png is zoom level 19, column 516664, row 319949. Each zoom level doubles the number of tiles in both directions, so the higher the zoom level, the more tiles are needed to show the entire planet, and there's a whole lot of sea within that data. Depending on the size of your monitor, you might have downloaded 10 or more all-sea tiles just to render the map above, and that's before you've zoomed in or scrolled around.
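The following sketch puts numbers on that growth and shows how a position maps to a tile. The conversion is the standard Web Mercator formula documented on the OpenStreetMap wiki. The function names are hypothetical, and the URL pattern is copied from the examples above:

```python
from __future__ import annotations

import math


def tiles_at_zoom(zoom: int) -> int:
    """The world is 2**zoom tiles wide and 2**zoom tiles tall."""
    return 4 ** zoom


def lonlat_to_tile(lon: float, lat: float, zoom: int) -> tuple[int, int]:
    """Web Mercator slippy-map indices: x counts columns eastward from 180°W,
    y counts rows southward from the top edge (about 85.05°N), both from 0."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    return x, y


def osm_tile_url(zoom: int, x: int, y: int, subdomain: str = "a") -> str:
    # Pattern taken from the example URLs in the text: /zoom/x/y.png
    return f"http://{subdomain}.tile.openstreetmap.org/{zoom}/{x}/{y}.png"


print(tiles_at_zoom(6), tiles_at_zoom(19))  # 4096 274877906944
print(lonlat_to_tile(0.0, 0.0, 1))          # (1, 1)
print(osm_tile_url(6, 29, 20, "b"))
```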
Servers could send a redirect to a cached resource instead, and some maps do exactly that! But when you can send a tile in just 103 bytes, sending the tile itself costs about as much as sending the redirect.
| Size | Map provider |
|---|---|
| 156 bytes | Bing Maps |
| 178 bytes | Google Maps (classic) |
| 379 bytes | Skobbler Maps |
| 921 bytes | TomTom (actually a redirect, so the size of the initial image doesn't matter that much) |
| 1,189 bytes | here.com (Navteq) |
Got any other tricks up your sleeve?
When you make a very small image, it's actually more efficient to use a true color image instead of a palettised image. A palettised image needs a PLTE chunk, and every chunk in a PNG file adds 12 bytes of length, chunk type and CRC on top of its data. But that benefit quickly disappears as the image gets bigger: converting the 103-byte file to true color increases its size to 567 bytes.
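You can see the crossover with a small experiment. The following sketch builds a one-colour PNG both ways using Python's zlib at level 9, where `solid_png` and `chunk` are hypothetical helpers. It won't reproduce the exact 103- and 567-byte figures, since those came from other encoders. It should still show the ordering: the 1x1 true color file is smaller than the palettised one, and at 256x256 the palettised file is far smaller:

```python
from __future__ import annotations

import struct
import zlib


def chunk(kind: bytes, data: bytes) -> bytes:
    crc = zlib.crc32(kind + data)
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def solid_png(size: int, rgb: tuple[int, int, int], palettised: bool) -> bytes:
    """A size x size PNG filled with one colour (hypothetical helper)."""
    if palettised:
        # Colour type 3 at 1 bit per pixel: every pixel is palette index 0.
        ihdr = struct.pack(">IIBBBBB", size, size, 1, 3, 0, 0, 0)
        row = b"\x00" + bytes((size + 7) // 8)  # filter byte + packed indices
        extra = chunk(b"PLTE", bytes(rgb))      # 15 bytes for a 1-entry palette
    else:
        # Colour type 2 at 8 bits per channel: no PLTE chunk at all.
        ihdr = struct.pack(">IIBBBBB", size, size, 8, 2, 0, 0, 0)
        row = b"\x00" + bytes(rgb) * size
        extra = b""
    idat = zlib.compress(row * size, 9)
    return (b"\x89PNG\r\n\x1a\n" + chunk(b"IHDR", ihdr) + extra
            + chunk(b"IDAT", idat) + chunk(b"IEND", b""))


SEA = (0xB5, 0xD0, 0xD0)
for size in (1, 256):
    pal = len(solid_png(size, SEA, palettised=True))
    rgb = len(solid_png(size, SEA, palettised=False))
    print(f"{size}x{size}: palettised {pal} bytes, true color {rgb} bytes")
```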
Some slippy maps use larger tiles: a 512px tile taking up 126 bytes can replace four 256px tiles taking up 412 bytes (4 × 103) between them, and it saves three requests' worth of HTTP headers too. But in areas where tiles show more detail, you'll spend relatively more bandwidth on content outside the viewport, which won't be shown unless the user scrolls. Some versions of Google Maps use 512px tiles, as does here.com.
TomTom doesn't serve the 921-byte tile repeatedly. Instead, users get an HTTP 302 redirect with the header "Location: http://s3.amazonaws.com/mascoma-renderer-static-san-prod/rialto/water.png", which is an 84-byte header, but it means users only have to retrieve the generic water tile once; after that it's served from the browser cache. The gains can be even greater if you want something more complicated than a single-color image!
Of course, no matter how small you get your PNG you'll still have to send 400+ bytes of HTTP headers in each direction!
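The following back-of-the-envelope sketch puts rough numbers on the last few paragraphs. It assumes exactly 400 bytes of headers in each direction per request and ignores HTTP/2 header compression. Even under those assumptions, the headers dwarf the tiles themselves:

```python
# Assumption: exactly 400 bytes of HTTP headers per request and per
# response (the text says "400+"), with no header compression.
HEADERS_EACH_WAY = 400


def transfer_bytes(tile_bytes: int, tiles: int) -> int:
    """Bytes on the wire for fetching `tiles` tiles of `tile_bytes` each."""
    return tiles * (tile_bytes + 2 * HEADERS_EACH_WAY)


four_256px = transfer_bytes(103, 4)  # four 103-byte sea tiles
one_512px = transfer_bytes(126, 1)   # one 126-byte tile covering the same area
print(four_256px, one_512px)  # 3612 926
```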
You can make a PNG pretty small if you want to, and different software and settings can produce files that vary in size quite a bit.
If you're doing something that involves moving lots of small PNGs around, a hex editor is all you need to check you're using the minimum number of bits per pixel, your palette is truncated, and you don't have any data segments you don't absolutely need.
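If you'd rather automate part of that checklist, the following sketch flags two of the easy wins. It reports a bit depth larger than the palette needs, and ancillary chunks, which the PNG specification marks with a lowercase first letter in the chunk type. `lint_png` is a hypothetical helper. It doesn't decode pixels, so it can't tell whether padded palette entries are actually used:

```python
from __future__ import annotations

import struct
import zlib


def lint_png(png: bytes) -> list[str]:
    """Flag an oversized bit depth for the palette, and ancillary chunks."""
    if png[:8] != b"\x89PNG\r\n\x1a\n":
        raise ValueError("missing PNG signature")
    warnings: list[str] = []
    bit_depth = colour_type = None
    pos = 8
    while pos < len(png):
        (length,) = struct.unpack_from(">I", png, pos)
        kind = png[pos + 4 : pos + 8]
        data = png[pos + 8 : pos + 8 + length]
        pos += 12 + length
        if kind == b"IHDR":
            bit_depth, colour_type = data[8], data[9]
        elif kind == b"PLTE" and colour_type == 3:
            entries = length // 3
            # Smallest indexed-colour bit depth that can address every entry.
            needed = next(d for d in (1, 2, 4, 8) if 2 ** d >= entries)
            if bit_depth > needed:
                warnings.append(
                    f"{entries}-entry palette needs only {needed}-bit pixels, "
                    f"file uses {bit_depth}-bit"
                )
        elif kind[:1].islower():
            # A lowercase first letter marks an ancillary (optional) chunk.
            warnings.append(f"ancillary chunk {kind.decode('ascii')} costs {length + 12} bytes")
    return warnings


def chunk(kind: bytes, data: bytes) -> bytes:
    crc = zlib.crc32(kind + data)
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


# Example input: an 8-bit, one-colour 256x256 tile with a pHYs chunk,
# similar in shape to the 178-byte tile in the table above.
raw = (b"\x00" + bytes(256)) * 256
png = (
    b"\x89PNG\r\n\x1a\n"
    + chunk(b"IHDR", struct.pack(">IIBBBBB", 256, 256, 8, 3, 0, 0, 0))
    + chunk(b"PLTE", b"\xb5\xd0\xd0")
    + chunk(b"pHYs", struct.pack(">IIB", 2835, 2835, 1))  # 2835 px/m = 72 dpi
    + chunk(b"IDAT", zlib.compress(raw, 9))
    + chunk(b"IEND", b"")
)
for warning in lint_png(png):
    print(warning)
```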
The compression in PNG files falls short of what the source coding theorem suggests is possible, so large blocks of identical data don't shrink to almost nothing the way you might hope.