the notes of ZIP workshop 1-3

introduction

this entry follows http://d.hatena.ne.jp/takahirox/20110628/1309267617

what i did

i implemented CRC 32 calculation logic into archive_zip i made at last entry.

the summary of CRC32

  • it is the error detecting code
    • not perfect logic
      • the target is limited like inverted several bits due to network error
      • it can't detect an intentional alteration
    • fast
  • the key point of the calculation
    • handling data as numerical number
    • the CRC32 value is the reminder of division by certain constant
    • using modulo2 because division is heavy
      • modulo2 ignores carry bits
      • modulo2 calculation is the same as XOR. so modulo2 division is much lighter than original division
    • the flow is
      • first, pretreating a dividend(original data)
      • second, doing modulo2 division
      • finally, aftertreating remainder. and identifying it as CRC32 value
    • we can gather the part of the calculating process due to the commutative low of XOR
    • in most case, first of all calculating remainder of all 8bits patterns and save them into table
modulo2 division example

         10011
     ---------
1011) 10101001
      1011      <-- key point
      --------        doing calculation
         1100         if the highest bits are both 1
         1011         even divisor is greater than dividend.
         -----
          1111
          1011
          ----
           100  <-- remainder

you can see the detail at the host blog. http://d.hatena.ne.jp/n7shi/20110529

source code

you can see the source code implemented into archive_zip.c at google code.

uint32_t table[ 256 ] ;

void init_table( ) {

  int i, j ;

  for( i = 0; i < 256; i++ ) {
    uint32_t a = i ;
    for( j = 0; j <= 7; j++ ) {
      uint32_t old_a = a ;
      a >>= 1 ;
      if( old_a & 1 ) a ^= 0xedb88320 ;
    }
    table[i] = a ;
  }

}

uint32_t crc32( FILE *fp ) {

  uint32_t a = ~0 ;
  int c ;

  fseek( fp, 0, SEEK_SET ) ;
  while( ( c = getc( fp ) ) != EOF ) {
    a ^= ( uint8_t ) c ;
    a = ( a >> 8 ) ^ table[ a & 0xff ] ;
  }
  return ~a ;

}

... snip ...

    zip_headers[ i ].crc32              = crc32( fp ) ;

it is the almost same as the sample code :)

i just edited to use File *fp on crc32( ).

others

it was so boring, that's because i just copied it from beautiful sample code :), that i translated it into python.

class Homu:

  table = [0] * 256

  def init_table( self ):
    for i in xrange( 256 ):
      a = i
      for j in xrange( 8 ):
        old_a = a
        a >>= 1
        if old_a & 1:
          a ^= 0xedb88320
      self.table[i] = a
    return

  def get_crc32( self, str ):
    a = 0xffffffff
    for s in str.encode( 'utf-8' ):
      a = a ^ ord( s ) & 0xffffffff
      a = ( a >> 8 ) ^ self.table[ a & 0xff ]
    return a ^ 0xffffffff

and i made the bot @crc32hoge replies CRC32 value with python and GAE.

i'm going to introduce it another entry in detail.

conclusion

finally i implemented calculating CRC32 logic into archive_zip.c. so the figure is no longer displayed.

i'm going to implement compression logic into archive_zip.c at last. i'll really do it :)