i made a bot that replies with CRC32, using GAE datastore and python

introduction

i translated the CRC32 calculation logic from C into python, as you saw in the last entry.

then i made a bot that replies with a CRC32 value, using that logic together with the GAE datastore. i'll introduce it here.

sample

@crc32hoge

if you tweet to it, it replies with the CRC32 value of your text. japanese text is handled as utf-8.

like this,


and you can confirm that the result is the same as the one from the C version of the CRC32 calculation logic.

% cat main.c

#include <stdio.h>
#include <stdint.h>

uint32_t table[256] ;

void init_table( ) {
  int i, j ;
  for( i = 0; i < 256; i++ ) {
    uint32_t a = i ;
    for( j = 0; j < 8; j++ ) {
      uint32_t old_a = a ;
      a >>= 1 ;
      if( old_a & 1 )
        a ^= 0xedb88320 ;
    }
    table[ i ] = a ;
  }
}

uint32_t crc32( uint8_t *buf, int len ) {
  uint32_t a = ~0 ;  // start from all bits set (0xffffffff)
  int i ;
  for( i = 0; i < len; i++ ) {
    a ^= buf[ i ] ;
    a = ( a >> 8 ) ^ table[ a & 0xff ] ;
  }
  return ~a ;
}


int main( int argc, char **argv ) {

  unsigned int length = 0 ;
  char *p ;

  if( argc != 2 ) {
    printf( "usage : %s <strings>\n", argv[ 0 ] ) ;
    return -1 ;
  }

  // count the length of argv[1] by hand (strlen() from <string.h> would also work)
  p = argv[ 1 ] ;
  while( *p ) {
    p++ ;
    length++ ;
  }

  init_table( ) ;
  printf( "%x\n", crc32( ( uint8_t * )argv[ 1 ], length ) ) ;
  return 0 ;
}

% ./main abcde
8587d865

source code

you can get the source code at google code.


one of the key points is to invert the bits with XOR '^', not NOT '~'. python integers are not fixed at 32 bits, and NOT is defined as ~x = -(x+1), so ~a gives a negative number instead of a 32 bit value. if you do want to use '~', you need to write ~a & 0xffffffff.
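as a small check of the point above, here is a sketch that compares the three ways of inverting. the value 0x12345678 is just an arbitrary example, not something from the bot.

```python
a = 0x12345678  # arbitrary 32 bit example value

# NOT alone follows ~x == -(x + 1), so the result is negative
not_only = ~a

# masking NOT, or XOR with all ones, both stay within 32 bits
not_masked = ~a & 0xffffffff
xor_ones = a ^ 0xffffffff

print(not_only)             # -305419897
print("%08x" % not_masked)  # edcba987
print("%08x" % xor_ones)    # edcba987
```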

encoding into utf-8 is also important. in python, multi byte strings are typically handled as the Unicode type. (i know its internal form is similar to utf-16, but i'm not sure it's exactly the same as utf-16.) so you need to encode strings into utf-8 before calculating.

otherwise, the result is not correct when the data mixes multi byte characters and single byte characters. if you don't want to encode into utf-8 and would rather calculate the CRC32 value of the utf-16 form, you need to treat each single byte character as a two byte character whose higher byte is 0x00.

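  def get_crc32( self, text ):
    # text is a Unicode string; the CRC32 is calculated over its utf-8 bytes
    a = 0xffffffff
    for s in text.encode( 'utf-8' ):
      # python 2: s is a 1 character str, so ord() gives the byte value
      a = ( a ^ ord( s ) ) & 0xffffffff
      a = ( a >> 8 ) ^ self.table[ a & 0xff ]
    # invert with XOR instead of '~' to stay within 32 bits
    return a ^ 0xffffffff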
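to try the same logic outside the bot, a self-contained sketch might look like this. it is not the bot's actual code: the names make_table and crc32_of_text are made up for this example, and the table is built the same way as init_table( ) in the C version. it compares the result with zlib.crc32 from the standard library, and it also shows how the bytes differ between utf-8 and utf-16.

```python
import zlib


def make_table():
    # same table as init_table( ) in the C version
    table = []
    for i in range(256):
        a = i
        for _ in range(8):
            if a & 1:
                a = (a >> 1) ^ 0xedb88320
            else:
                a >>= 1
        table.append(a)
    return table


TABLE = make_table()


def crc32_of_text(text, encoding="utf-8"):
    # bytearray yields integers on both python 2 and python 3
    a = 0xffffffff
    for b in bytearray(text.encode(encoding)):
        a ^= b
        a = (a >> 8) ^ TABLE[a & 0xff]
    return a ^ 0xffffffff


# same value as the standard library
print(crc32_of_text(u"abcde") == (zlib.crc32(b"abcde") & 0xffffffff))  # True

# a single byte character gets a 0x00 higher byte in utf-16
print(list(bytearray(u"a".encode("utf-8"))))      # [97]
print(list(bytearray(u"a".encode("utf-16-le"))))  # [97, 0]
```

the & 0xffffffff on the zlib side is there because zlib.crc32 can return a negative number on python 2.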

the datastore of GAE

this bot replies to tweets, so it needs to remember which tweets it has already replied to.

on GAE, we need to use the datastore to save data. we can't write files there.

how to use the datastore of GAE

we don't need to do any configuration in advance.

just

  • use the webapp framework (or maybe just importing db is enough??)
  • add a class that specifies the data structure to the source code.
from google.appengine.ext import webapp
from google.appengine.ext.webapp import util
from google.appengine.ext import db

... snip...

class Replied( db.Model ):
  id = db.IntegerProperty( required=True )

... snip ...

in this case, i defined the Replied class to remember the tweets to which the bot has already replied.

i defined only 'id' as an IntegerProperty, and i keep only one id in the datastore. that's because the tweet id is a serial number. it's enough to remember the id of the last tweet the bot replied to; there is no need to remember every tweet it has replied to.
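the idea of keeping only the last id can be sketched like this. to make it runnable anywhere, it uses a plain in-memory object instead of the datastore: LastRepliedStore and pick_unreplied are hypothetical names for this example, not part of the bot or of GAE. it also assumes, as above, that a newer tweet always has a larger id.

```python
class LastRepliedStore(object):
    """hypothetical in-memory stand-in for the single Replied entity."""

    def __init__(self, last_id=0):
        self.last_id = last_id


def pick_unreplied(tweet_ids, store):
    # only ids larger than the remembered one are new
    new_ids = sorted(i for i in tweet_ids if i > store.last_id)
    if new_ids:
        # remember just the largest id, not the whole list
        store.last_id = new_ids[-1]
    return new_ids


store = LastRepliedStore(last_id=100)
print(pick_unreplied([99, 101, 105, 100], store))  # [101, 105]
print(store.last_id)                               # 105
print(pick_unreplied([101, 105], store))           # []
```

in the real bot, the reading and updating of last_id would go through the Replied entity in the datastore instead of this object.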

you can see the saved data from your GAE account.

you can find the details of the GAE datastore in google's documentation.

conclusion

i made a bot that replies with a CRC32 value, and i also tried the GAE datastore.

i believe that knowing how to use the GAE datastore will help me make various types of bots.