1193: Externalities

edo
Posts: 436
Joined: Thu Mar 28, 2013 7:05 pm UTC
Location: ~TrApPeD iN mY PhOnE~

Re: 1193: Externalities

Postby edo » Fri Apr 05, 2013 10:11 pm UTC

If I did the math right, a password made of four common words (from a 2048-word list) can be cracked in 5 days at 1e6 hashes per second. There are permutations (spaces, capitalization), but IdonTthink heMixed thEse ( :D ), so they only add a few bits, and if each person takes a permutation, we could get back to 5 days to try all of them...
Co-proprietor of a Mome and Pope Shope
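
A quick way to sanity-check an estimate like this is sketched below in C. The function names are hypothetical, and the inputs (a 2048-word list, four words, 1e6 hashes per second) are simply the figures from the post. With exactly those inputs the full keyspace is 2^44 candidates, which comes to roughly 204 days for a single machine at that rate, so a 5-day figure would need a noticeably higher combined hash rate.

```c
#include <stdint.h>

/* Number of candidate passphrases: list_size raised to the word count. */
uint64_t keyspace(uint64_t list_size, unsigned words)
{
    uint64_t total = 1;
    for (unsigned i = 0; i < words; i++)
        total *= list_size;   /* assumes the product fits in 64 bits */
    return total;
}

/* Days needed to try every candidate at a fixed hash rate. */
double days_to_exhaust(uint64_t candidates, double hashes_per_second)
{
    return (double)candidates / hashes_per_second / 86400.0;
}
```

Calling days_to_exhaust(keyspace(2048, 4), 1e6) gives the full-search time; on average a hit would come after about half of that.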

pitareio
Posts: 128
Joined: Wed Sep 19, 2012 7:03 pm UTC
Location: the land of smelly cheese

Re: 1193: Externalities

Postby pitareio » Fri Apr 05, 2013 10:35 pm UTC

speising wrote:btw, does anybody know why this comic's font doesn't render correctly in IE10? it just shows the text in times new roman.
i thought ie10 was supposed to be standards compliant.


Same with Firefox if you use http://www.xkcd.com/1193/, but it renders ok if you use http://xkcd.com/1193/ .

seraku
Posts: 10
Joined: Tue Apr 02, 2013 10:10 pm UTC

Re: 1193: Externalities

Postby seraku » Sat Apr 06, 2013 5:56 am UTC

pitareio wrote:
speising wrote:btw, does anybody know why this comic's font doesn't render correctly in IE10? it just shows the text in times new roman.
i thought ie10 was supposed to be standards compliant.


Same with Firefox if you use http://www.xkcd.com/1193/, but it renders ok if you use http://xkcd.com/1193/ .

In IE10, you can bring up the Developer Tools (F12) and look at the console output when loading the page. In particular, the following error is logged when loading the font:

Code: Select all

CSS3114: @font-face failed OpenType embedding permission check. Permission must be Installable.
xkcd-Regular.otf

I'm not an expert on web fonts, but it looks like IE may be enforcing a stricter policy here.
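
For anyone who wants to inspect the font directly: as far as I can tell, the permission in that message is the fsType field of the font's OS/2 table, where a value of 0 means "installable embedding". The C sketch below reads that field from a font already loaded into memory. The function name font_fs_type is made up for illustration, and the code assumes a plain single-font OpenType/TrueType file (not a collection or a WOFF wrapper).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read big-endian 16- and 32-bit values from a byte buffer. */
static uint32_t be16(const unsigned char *p) { return ((uint32_t)p[0] << 8) | p[1]; }
static uint32_t be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | p[3];
}

/*
 * Return the OS/2 fsType value of a font held in memory, or -1 if the
 * buffer is too short or contains no OS/2 table.
 * fsType == 0 means "installable embedding".
 */
int font_fs_type(const unsigned char *font, size_t len)
{
    if (len < 12) return -1;                       /* 12-byte file header */
    size_t num_tables = be16(font + 4);            /* table count in the header */
    for (size_t i = 0; i < num_tables; i++) {
        size_t rec = 12 + 16 * i;                  /* 16-byte table records */
        if (rec + 16 > len) return -1;
        if (memcmp(font + rec, "OS/2", 4) != 0) continue;
        size_t off = be32(font + rec + 8);         /* table offset from file start */
        if (off > len || len - off < 10) return -1;
        return (int)be16(font + off + 8);          /* fsType sits 8 bytes into OS/2 */
    }
    return -1;
}
```

A nonzero result for xkcd-Regular.otf would be consistent with the console error, but that is an assumption; I have not checked the actual file.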

Flumble
Yes Man
Posts: 2265
Joined: Sun Aug 05, 2012 9:35 pm UTC

Re: 1193: Externalities

Postby Flumble » Sat Apr 06, 2013 8:49 pm UTC

seraku wrote:...

I'm not an expert on web fonts, but it looks like IE may be enforcing a stricter policy here.

A very strange and mystical policy concerning distribution rights, at that. (source)

trakof
Posts: 12
Joined: Sat Jul 17, 2010 8:33 pm UTC

Re: 1193: Externalities

Postby trakof » Sat Apr 06, 2013 10:10 pm UTC

Here's an updated version of my OpenCL code. It has a decent speed increase, a better random number generator, and variable-length inputs to hash.

kernel.txt
Spoiler:

Code: Select all

//const short modifier_words = 2;
//const short state_words = 16;
//const short state_bytes = 128;
//const short state_bits = 1024;
//const short block_bytes = 128;

typedef unsigned char u08b_t;   
typedef unsigned long u64b_t; 

typedef struct
{
   uint  hashBitLen;
   uint  bCnt;
   u64b_t  T[2];   
} Skein_Ctxt_Hdr_t;


typedef struct
{
   Skein_Ctxt_Hdr_t h;
   u64b_t  X[16];
    u08b_t  b[128];   
} Skein1024_Ctxt_t;

#define SKEIN_T1_BIT(BIT)       ((BIT) - 64)            /* offset 64 because it's the second word  */
                               
#define SKEIN_T1_POS_TREE_LVL   SKEIN_T1_BIT(112)       /* bits 112..118: level in hash tree       */
#define SKEIN_T1_POS_BIT_PAD    SKEIN_T1_BIT(119)       /* bit  119     : partial final input byte */
#define SKEIN_T1_POS_BLK_TYPE   SKEIN_T1_BIT(120)       /* bits 120..125: type field               */
#define SKEIN_T1_POS_FIRST      SKEIN_T1_BIT(126)       /* bit  126     : first block flag         */
#define SKEIN_T1_POS_FINAL      SKEIN_T1_BIT(127)       /* bit  127     : final block flag         */
                               
/* tweak word T[1]: flag bit definition(s) */
#define SKEIN_T1_FLAG_FIRST     (((u64b_t)  1 ) << SKEIN_T1_POS_FIRST)
#define SKEIN_T1_FLAG_FINAL     (((u64b_t)  1 ) << SKEIN_T1_POS_FINAL)
#define SKEIN_T1_FLAG_BIT_PAD   (((u64b_t)  1 ) << SKEIN_T1_POS_BIT_PAD)
                               
/* tweak word T[1]: tree level bit field mask */
#define SKEIN_T1_TREE_LVL_MASK  (((u64b_t)0x7F) << SKEIN_T1_POS_TREE_LVL)
#define SKEIN_T1_TREE_LEVEL(n)  (((u64b_t) (n)) << SKEIN_T1_POS_TREE_LVL)

/* tweak word T[1]: block type field */
#define SKEIN_BLK_TYPE_KEY      ( 0)                    /* key, for MAC and KDF */
#define SKEIN_BLK_TYPE_CFG      ( 4)                    /* configuration block */
#define SKEIN_BLK_TYPE_PERS     ( 8)                    /* personalization string */
#define SKEIN_BLK_TYPE_PK       (12)                    /* public key (for digital signature hashing) */
#define SKEIN_BLK_TYPE_KDF      (16)                    /* key identifier for KDF */
#define SKEIN_BLK_TYPE_NONCE    (20)                    /* nonce for PRNG */
#define SKEIN_BLK_TYPE_MSG      (48)                    /* message processing */
#define SKEIN_BLK_TYPE_OUT      (63)                    /* output stage */
#define SKEIN_BLK_TYPE_MASK     (63)                    /* bit field mask */

#define SKEIN_T1_BLK_TYPE(T)   (((u64b_t) (SKEIN_BLK_TYPE_##T)) << SKEIN_T1_POS_BLK_TYPE)
#define SKEIN_T1_BLK_TYPE_KEY   SKEIN_T1_BLK_TYPE(KEY)  /* key, for MAC and KDF */
#define SKEIN_T1_BLK_TYPE_CFG   SKEIN_T1_BLK_TYPE(CFG)  /* configuration block */
#define SKEIN_T1_BLK_TYPE_PERS  SKEIN_T1_BLK_TYPE(PERS) /* personalization string */
#define SKEIN_T1_BLK_TYPE_PK    SKEIN_T1_BLK_TYPE(PK)   /* public key (for digital signature hashing) */
#define SKEIN_T1_BLK_TYPE_KDF   SKEIN_T1_BLK_TYPE(KDF)  /* key identifier for KDF */
#define SKEIN_T1_BLK_TYPE_NONCE SKEIN_T1_BLK_TYPE(NONCE)/* nonce for PRNG */
#define SKEIN_T1_BLK_TYPE_MSG   SKEIN_T1_BLK_TYPE(MSG)  /* message processing */
#define SKEIN_T1_BLK_TYPE_OUT   SKEIN_T1_BLK_TYPE(OUT)  /* output stage */
#define SKEIN_T1_BLK_TYPE_MASK  SKEIN_T1_BLK_TYPE(MASK) /* field bit mask */

#define SKEIN_T1_BLK_TYPE_CFG_FINAL       (SKEIN_T1_BLK_TYPE_CFG | SKEIN_T1_FLAG_FINAL)
#define SKEIN_T1_BLK_TYPE_OUT_FINAL       (SKEIN_T1_BLK_TYPE_OUT | SKEIN_T1_FLAG_FINAL)

#define SKEIN_VERSION           (1)

#ifndef SKEIN_ID_STRING_LE      /* allow compile-time personalization */
#define SKEIN_ID_STRING_LE      (0x33414853)            /* "SHA3" (little-endian)*/
#endif

#define SKEIN_MK_64(hi32,lo32)  ((lo32) + (((u64b_t) (hi32)) << 32))
#define SKEIN_SCHEMA_VER        SKEIN_MK_64(SKEIN_VERSION,SKEIN_ID_STRING_LE)
#define SKEIN_KS_PARITY         SKEIN_MK_64(0x1BD11BDA,0xA9FC1A22)

#define SKEIN_CFG_STR_LEN       (4*8)

/* bit field definitions in config block treeInfo word */
#define SKEIN_CFG_TREE_LEAF_SIZE_POS  ( 0)
#define SKEIN_CFG_TREE_NODE_SIZE_POS  ( 8)
#define SKEIN_CFG_TREE_MAX_LEVEL_POS  (16)

#define SKEIN_CFG_TREE_LEAF_SIZE_MSK  (((u64b_t) 0xFF) << SKEIN_CFG_TREE_LEAF_SIZE_POS)
#define SKEIN_CFG_TREE_NODE_SIZE_MSK  (((u64b_t) 0xFF) << SKEIN_CFG_TREE_NODE_SIZE_POS)
#define SKEIN_CFG_TREE_MAX_LEVEL_MSK  (((u64b_t) 0xFF) << SKEIN_CFG_TREE_MAX_LEVEL_POS)

#define SKEIN_CFG_TREE_INFO(leaf,node,maxLvl)                   \
    ( (((u64b_t)(leaf  )) << SKEIN_CFG_TREE_LEAF_SIZE_POS) |    \
      (((u64b_t)(node  )) << SKEIN_CFG_TREE_NODE_SIZE_POS) |    \
      (((u64b_t)(maxLvl)) << SKEIN_CFG_TREE_MAX_LEVEL_POS) )

#define SKEIN_CFG_TREE_INFO_SEQUENTIAL SKEIN_CFG_TREE_INFO(0,0,0)

#define Skein_Start_New_Type(ctxPtr,BLK_TYPE) { Skein_Set_T0_T1(ctxPtr,0,SKEIN_T1_FLAG_FIRST | SKEIN_T1_BLK_TYPE_##BLK_TYPE); (ctxPtr)->h.bCnt=0; }
#define Skein_Get_Tweak(ctxPtr,TWK_NUM)         ((ctxPtr)->h.T[TWK_NUM])
#define Skein_Set_Tweak(ctxPtr,TWK_NUM,tVal)    {(ctxPtr)->h.T[TWK_NUM] = (tVal);}

#define Skein_Get_T0(ctxPtr)    Skein_Get_Tweak(ctxPtr,0)
#define Skein_Get_T1(ctxPtr)    Skein_Get_Tweak(ctxPtr,1)
#define Skein_Set_T0(ctxPtr,T0) Skein_Set_Tweak(ctxPtr,0,T0)
#define Skein_Set_T1(ctxPtr,T1) Skein_Set_Tweak(ctxPtr,1,T1)

/* set both tweak words at once */
#define Skein_Set_T0_T1(ctxPtr,T0,T1)           \
    {                                           \
    Skein_Set_T0(ctxPtr,(T0));                  \
    Skein_Set_T1(ctxPtr,(T1));                  \
    }

#define Skein_Set_Type(ctxPtr,BLK_TYPE)         \
    Skein_Set_T1(ctxPtr,SKEIN_T1_BLK_TYPE_##BLK_TYPE)
#define Skein_Clear_First_Flag(hdr)      { (hdr).T[1] &= ~SKEIN_T1_FLAG_FIRST;       }
#define Skein_Set_Bit_Pad_Flag(hdr)      { (hdr).T[1] |=  SKEIN_T1_FLAG_BIT_PAD;     }

#define Skein_Set_Tree_Level(hdr,height) { (hdr).T[1] |= SKEIN_T1_TREE_LEVEL(height);}

void Skein1024_Init(Skein1024_Ctxt_t* ctx, unsigned int hashBitLen)
{
   //union
   //{
   //   u08b_t b[128];
   //   u64b_t w[16];
   //}cfg;

   ctx->h.hashBitLen = hashBitLen;

    //case 1024: memcpy(ctx->X,SKEIN1024_IV_1024,sizeof(ctx->X)); break;
    //((lo32) + (((u64b_t) (hi32)) << 32))
    ctx->X[0] = (0x41E72355) + (((u64b_t)(0xD593DA07)) << 32);
    ctx->X[1] = (((u64b_t)(0x15B5E511)) << 32)+(0xAC73E00C);
    ctx->X[2] = (((u64b_t)(0x5180E5AE)) << 32)+(0xBAF2C4F0);
    ctx->X[3] = (((u64b_t)(0x03BD41D3)) << 32)+(0xFCBCAFAF);
    ctx->X[4] = (((u64b_t)(0x1CAEC6FD)) << 32)+(0x1983A898);
    ctx->X[5] = (((u64b_t)(0x6E510B8B)) << 32)+(0xCDD0589F);
    ctx->X[6] = (((u64b_t)(0x77E2BDFD)) << 32)+(0xC6394ADA);
    ctx->X[7] = (((u64b_t)(0xC11E1DB5)) << 32)+(0x24DCB0A3);
    ctx->X[8] = (((u64b_t)(0xD6D14AF9)) << 32)+(0xC6329AB5);
    ctx->X[9] = (((u64b_t)(0x6A9B0BFC)) << 32)+(0x6EB67E0D);
    ctx->X[10] = (((u64b_t)(0x9243C60D)) << 32)+(0xCCFF1332);
    ctx->X[11] = (((u64b_t)(0x1A1F1DDE)) << 32)+(0x743F02D4);
    ctx->X[12] = (((u64b_t)(0x0996753C)) << 32)+(0x10ED0BB8);
    ctx->X[13] = (((u64b_t)(0x6572DD22)) << 32)+(0xF2B4969A);
    ctx->X[14] = (((u64b_t)(0x61FD3062)) << 32)+(0xD00A579A);
    ctx->X[15] = (((u64b_t)(0x1DE0536E)) << 32)+(0x8682E539);

   Skein_Start_New_Type(ctx,MSG);
}

void* memcpy(void* dest, const void* src, uint count) {
    char* dst8 = (char*)dest;
    const char* src8 = (const char*)src;   /* keep the source const-qualified */

    while (count--) {                      /* plain byte-by-byte copy */
        *dst8++ = *src8++;
    }
    return dest;
}

void *memset(char *s, char c, size_t n)
{
    uint i;
    for (i = 0; i < n; i++)
    {
        s[i] = c;   /* index rather than advance s, so the start pointer is returned */
    }
    return s;
}

#define BLK_BITS        (WCNT*64)               
#define KW_TWK_BASE     (0)
#define KW_KEY_BASE     (3)
#define ks              (kw + KW_KEY_BASE)               
#define ts              (kw + KW_TWK_BASE)
#define WCNT 16
#define RCNT 10
#define SKEIN_UNROLL_1024 1
#define RotL_64(x,N)    (((x) << (N)) | ((x) >> (64-(N))))
#define Skein_Put64_LSB_First(dst08,src64,bCnt) memcpy(dst08,src64,bCnt)
#define Skein_Get64_LSB_First(dst64,src08,wCnt) memcpy(dst64,src08,8*(wCnt))
enum   
    { 
    R1024_0_0=24, R1024_0_1=13, R1024_0_2= 8, R1024_0_3=47, R1024_0_4= 8, R1024_0_5=17, R1024_0_6=22, R1024_0_7=37,
    R1024_1_0=38, R1024_1_1=19, R1024_1_2=10, R1024_1_3=55, R1024_1_4=49, R1024_1_5=18, R1024_1_6=23, R1024_1_7=52,
    R1024_2_0=33, R1024_2_1= 4, R1024_2_2=51, R1024_2_3=13, R1024_2_4=34, R1024_2_5=41, R1024_2_6=59, R1024_2_7=17,
    R1024_3_0= 5, R1024_3_1=20, R1024_3_2=48, R1024_3_3=41, R1024_3_4=47, R1024_3_5=28, R1024_3_6=16, R1024_3_7=25,
    R1024_4_0=41, R1024_4_1= 9, R1024_4_2=37, R1024_4_3=31, R1024_4_4=12, R1024_4_5=47, R1024_4_6=44, R1024_4_7=30,
    R1024_5_0=16, R1024_5_1=34, R1024_5_2=56, R1024_5_3=51, R1024_5_4= 4, R1024_5_5=53, R1024_5_6=42, R1024_5_7=41,
    R1024_6_0=31, R1024_6_1=44, R1024_6_2=47, R1024_6_3=46, R1024_6_4=19, R1024_6_5=42, R1024_6_6=44, R1024_6_7=25,
    R1024_7_0= 9, R1024_7_1=48, R1024_7_2=35, R1024_7_3=52, R1024_7_4=23, R1024_7_5=31, R1024_7_6=37, R1024_7_7=20
    };

void Skein1024_Process_Block(Skein1024_Ctxt_t *ctx,const u08b_t *blkPtr,uint blkCnt,uint byteCntAdd)
{
   uint r;
   u64b_t kw[WCNT+4+RCNT*2];

    u64b_t  X00,X01,X02,X03,X04,X05,X06,X07,X08,X09,X10,X11,X12,X13,X14,X15;
    u64b_t  w [WCNT];   

   ts[0] = ctx->h.T[0];
    ts[1] = ctx->h.T[1];
   
   do  {
        /* this implementation only supports 2**64 input bytes (no carry out here) */
        ts[0] += byteCntAdd;                    /* update processed length */

        /* precompute the key schedule for this block */
        ks[ 0] = ctx->X[ 0];
        ks[ 1] = ctx->X[ 1];
        ks[ 2] = ctx->X[ 2];
        ks[ 3] = ctx->X[ 3];
        ks[ 4] = ctx->X[ 4];
        ks[ 5] = ctx->X[ 5];
        ks[ 6] = ctx->X[ 6];
        ks[ 7] = ctx->X[ 7];
        ks[ 8] = ctx->X[ 8];
        ks[ 9] = ctx->X[ 9];
        ks[10] = ctx->X[10];
        ks[11] = ctx->X[11];
        ks[12] = ctx->X[12];
        ks[13] = ctx->X[13];
        ks[14] = ctx->X[14];
        ks[15] = ctx->X[15];
      ks[16] = ks[ 0] ^ ks[ 1] ^ ks[ 2] ^ ks[ 3] ^
                 ks[ 4] ^ ks[ 5] ^ ks[ 6] ^ ks[ 7] ^
                 ks[ 8] ^ ks[ 9] ^ ks[10] ^ ks[11] ^
                 ks[12] ^ ks[13] ^ ks[14] ^ ks[15] ^ SKEIN_KS_PARITY;
      
      ts[2]  = ts[0] ^ ts[1];

        Skein_Get64_LSB_First(w,blkPtr,WCNT); /* get input block in little-endian format */////////////////////////////////////

        X00    = w[ 0] + ks[ 0];                 /* do the first full key injection */
        X01    = w[ 1] + ks[ 1];
        X02    = w[ 2] + ks[ 2];
        X03    = w[ 3] + ks[ 3];
        X04    = w[ 4] + ks[ 4];
        X05    = w[ 5] + ks[ 5];
        X06    = w[ 6] + ks[ 6];
        X07    = w[ 7] + ks[ 7];
        X08    = w[ 8] + ks[ 8];
        X09    = w[ 9] + ks[ 9];
        X10    = w[10] + ks[10];
        X11    = w[11] + ks[11];
        X12    = w[12] + ks[12];
        X13    = w[13] + ks[13] + ts[0];
        X14    = w[14] + ks[14] + ts[1];
        X15    = w[15] + ks[15];

      #define Round1024(p0,p1,p2,p3,p4,p5,p6,p7,p8,p9,pA,pB,pC,pD,pE,pF,ROT,rNum) \
    X##p0 += X##p1; X##p1 = RotL_64(X##p1,ROT##_0); X##p1 ^= X##p0;   \
    X##p2 += X##p3; X##p3 = RotL_64(X##p3,ROT##_1); X##p3 ^= X##p2;   \
    X##p4 += X##p5; X##p5 = RotL_64(X##p5,ROT##_2); X##p5 ^= X##p4;   \
    X##p6 += X##p7; X##p7 = RotL_64(X##p7,ROT##_3); X##p7 ^= X##p6;   \
    X##p8 += X##p9; X##p9 = RotL_64(X##p9,ROT##_4); X##p9 ^= X##p8;   \
    X##pA += X##pB; X##pB = RotL_64(X##pB,ROT##_5); X##pB ^= X##pA;   \
    X##pC += X##pD; X##pD = RotL_64(X##pD,ROT##_6); X##pD ^= X##pC;   \
    X##pE += X##pF; X##pF = RotL_64(X##pF,ROT##_7); X##pF ^= X##pE;   \
   
   #define R1024(p0,p1,p2,p3,p4,p5,p6,p7,p8,p9,pA,pB,pC,pD,pE,pF,ROT,rn) \
      Round1024(p0,p1,p2,p3,p4,p5,p6,p7,p8,p9,pA,pB,pC,pD,pE,pF,ROT,rn)

   #define I1024(R)                                                      \
    X00   += ks[r+(R)+ 0];    /* inject the key schedule value */     \
    X01   += ks[r+(R)+ 1];                                            \
    X02   += ks[r+(R)+ 2];                                            \
    X03   += ks[r+(R)+ 3];                                            \
    X04   += ks[r+(R)+ 4];                                            \
    X05   += ks[r+(R)+ 5];                                            \
    X06   += ks[r+(R)+ 6];                                            \
    X07   += ks[r+(R)+ 7];                                            \
    X08   += ks[r+(R)+ 8];                                            \
    X09   += ks[r+(R)+ 9];                                            \
    X10   += ks[r+(R)+10];                                            \
    X11   += ks[r+(R)+11];                                            \
    X12   += ks[r+(R)+12];                                            \
    X13   += ks[r+(R)+13] + ts[r+(R)+0];                              \
    X14   += ks[r+(R)+14] + ts[r+(R)+1];                              \
    X15   += ks[r+(R)+15] +    r+(R)   ;                              \
    ks[r  +       (R)+16] = ks[r+(R)-1];  /* rotate key schedule */   \
    ts[r  +       (R)+ 2] = ts[r+(R)-1];                             
   
   for (r=1;r <= 2*RCNT;r+=2*SKEIN_UNROLL_1024)
   {
      #define R1024_8_rounds(R) \
         R1024(00,01,02,03,04,05,06,07,08,09,10,11,12,13,14,15,R1024_0,8*(R) + 1); \
         R1024(00,09,02,13,06,11,04,15,10,07,12,03,14,05,08,01,R1024_1,8*(R) + 2); \
         R1024(00,07,02,05,04,03,06,01,12,15,14,13,08,11,10,09,R1024_2,8*(R) + 3); \
         R1024(00,15,02,11,06,13,04,09,14,01,08,05,10,03,12,07,R1024_3,8*(R) + 4); \
         I1024(2*(R));                                                             \
         R1024(00,01,02,03,04,05,06,07,08,09,10,11,12,13,14,15,R1024_4,8*(R) + 5); \
         R1024(00,09,02,13,06,11,04,15,10,07,12,03,14,05,08,01,R1024_5,8*(R) + 6); \
         R1024(00,07,02,05,04,03,06,01,12,15,14,13,08,11,10,09,R1024_6,8*(R) + 7); \
         R1024(00,15,02,11,06,13,04,09,14,01,08,05,10,03,12,07,R1024_7,8*(R) + 8); \
         I1024(2*(R)+1);

        R1024_8_rounds(0);
   }
      ctx->X[ 0] = X00 ^ w[ 0];
        ctx->X[ 1] = X01 ^ w[ 1];
        ctx->X[ 2] = X02 ^ w[ 2];
        ctx->X[ 3] = X03 ^ w[ 3];
        ctx->X[ 4] = X04 ^ w[ 4];
        ctx->X[ 5] = X05 ^ w[ 5];
        ctx->X[ 6] = X06 ^ w[ 6];
        ctx->X[ 7] = X07 ^ w[ 7];
        ctx->X[ 8] = X08 ^ w[ 8];
        ctx->X[ 9] = X09 ^ w[ 9];
        ctx->X[10] = X10 ^ w[10];
        ctx->X[11] = X11 ^ w[11];
        ctx->X[12] = X12 ^ w[12];
        ctx->X[13] = X13 ^ w[13];
        ctx->X[14] = X14 ^ w[14];
        ctx->X[15] = X15 ^ w[15];

      ts[1] &= ~SKEIN_T1_FLAG_FIRST;
        blkPtr += 128;
    } while (--blkCnt);
    ctx->h.T[0] = ts[0];
    ctx->h.T[1] = ts[1];
}

void Skein1024_Update(Skein1024_Ctxt_t *ctx,  const u08b_t *msg, uint msgByteCnt)
{
   uint n;

    /* process full blocks, if any */
    if (msgByteCnt + ctx->h.bCnt > 128)
        {
        if (ctx->h.bCnt)                              /* finish up any buffered message data */
            {
            n = 128 - ctx->h.bCnt;  /* # bytes free in buffer b[] */
            if (n)
                {
                memcpy(&ctx->b[ctx->h.bCnt],msg,n);
                msgByteCnt  -= n;
                msg         += n;
                ctx->h.bCnt += n;
                }
            Skein1024_Process_Block(ctx,ctx->b,1,128);//////////////////////////////////////////////////////////
            ctx->h.bCnt = 0;
            }
        /* now process any remaining full blocks, directly from input message data */
        if (msgByteCnt > 128)
            {
            n = (msgByteCnt-1) / 128;   /* number of full blocks to process */
            Skein1024_Process_Block(ctx,msg,n,128);///////////////////////////////////////////////////////////
            msgByteCnt -= n * 128;
            msg        += n * 128;
            }
        }

    /* copy any remaining source message data bytes into b[] */
    if (msgByteCnt)
    {
    memcpy(&ctx->b[ctx->h.bCnt],msg,msgByteCnt);
    ctx->h.bCnt += msgByteCnt;
    }
}


void Skein1024_Final(Skein1024_Ctxt_t *ctx, u08b_t *hashVal)
{
   uint i,n,byteCnt;
   u64b_t X[16];

   ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;           
   if (ctx->h.bCnt < 128)   
   {
      memset(&ctx->b[ctx->h.bCnt],0,128 - ctx->h.bCnt);
   }     
   Skein1024_Process_Block(ctx,ctx->b,1,ctx->h.bCnt); 
   
   byteCnt = (ctx->h.hashBitLen + 7) >> 3;   

   
   memset(ctx->b,0,sizeof(ctx->b));
   memcpy(X,ctx->X,sizeof(X));   
   for (i=0;i*128 < byteCnt;i++)
   {
      Skein_Start_New_Type(ctx,OUT_FINAL);
      Skein1024_Process_Block(ctx,ctx->b,1,sizeof(u64b_t));
      n = byteCnt - i*128; 
      if (n >= 128)
      {
         n  = 128;
      }
      Skein_Put64_LSB_First(hashVal+i*128,ctx->X,n);

      //memcpy(ctx->X,X,sizeof(X));
      ctx->X[0] = X[0];
      ctx->X[1] = X[1];
      ctx->X[2] = X[2];
      ctx->X[3] = X[3];
      ctx->X[4] = X[4];
      ctx->X[5] = X[5];
      ctx->X[6] = X[6];
      ctx->X[7] = X[7];
      ctx->X[8] = X[8];
      ctx->X[9] = X[9];
      ctx->X[10] = X[10];
      ctx->X[11] = X[11];
      ctx->X[12] = X[12];
      ctx->X[13] = X[13];
      ctx->X[14] = X[14];
      ctx->X[15] = X[15];
   }
}

const int bits[256] = {
0,1,1,2,1,2,2,3,1,2,2,3,2,3,3,4,
1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5,
1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5,
2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,
1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5,
2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,
2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,
3,4,4,5,4,5,5,6,4,5,5,6,5,6,6,7,
1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5,
2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,
2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,
3,4,4,5,4,5,5,6,4,5,5,6,5,6,6,7,
2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,
3,4,4,5,4,5,5,6,4,5,5,6,5,6,6,7,
3,4,4,5,4,5,5,6,4,5,5,6,5,6,6,7,
4,5,5,6,5,6,6,7,5,6,6,7,6,7,7,8};

const char target[] = "5b4da95f5fa08280fc9879df44f418c8f9f12ba424b7757de02bbdfbae0d4c4fdf9317c80cc5fe04c6429073466cf29706b8c25999ddd2f6540d4475cc977b87f4757be023f19b8f4035d7722886b78869826de916a79cf9c94cc79cd4347d24b567aa3e2390a573a373a48a5e676640c79cc70197e1c5e7f902fb53ca1858b6";
const char goal[] = {0x5b, 0x4d, 0xa9, 0x5f, 0x5f, 0xa0, 0x82, 0x80, 0xfc, 0x98, 0x79, 0xdf, 0x44, 0xf4, 0x18, 0xc8, 0xf9, 0xf1, 0x2b, 0xa4, 0x24, 0xb7, 0x75, 0x7d, 0xe0, 0x2b, 0xbd, 0xfb, 0xae, 0x0d, 0x4c, 0x4f, 0xdf, 0x93, 0x17, 0xc8, 0x0c, 0xc5, 0xfe, 0x04, 0xc6, 0x42, 0x90, 0x73, 0x46, 0x6c, 0xf2, 0x97, 0x06, 0xb8, 0xc2, 0x59, 0x99, 0xdd, 0xd2, 0xf6, 0x54, 0x0d, 0x44, 0x75, 0xcc, 0x97, 0x7b, 0x87, 0xf4, 0x75, 0x7b, 0xe0, 0x23, 0xf1, 0x9b, 0x8f, 0x40, 0x35, 0xd7, 0x72, 0x28, 0x86, 0xb7, 0x88, 0x69, 0x82, 0x6d, 0xe9, 0x16, 0xa7, 0x9c, 0xf9, 0xc9, 0x4c, 0xc7, 0x9c, 0xd4, 0x34, 0x7d, 0x24, 0xb5, 0x67, 0xaa, 0x3e, 0x23, 0x90, 0xa5, 0x73, 0xa3, 0x73, 0xa4, 0x8a, 0x5e, 0x67, 0x66, 0x40, 0xc7, 0x9c, 0xc7, 0x01, 0x97, 0xe1, 0xc5, 0xe7, 0xf9, 0x02, 0xfb, 0x53, 0xca, 0x18, 0x58, 0xb6};

int countbits(char* c1, char* c2)
{
   int count = 0;
   uchar x;
   for(int iu=0;iu<128;++iu)
   {
      x = c1[iu] ^ c2[iu];
      count += bits[x];
   }
   return count;
}

__kernel void skein(__global char* input,
   __global int* scores,
   const unsigned int INPUT_SIZE)           
{                                       
    uint i = get_global_id(0);

   char word[64];
   int isi = (INPUT_SIZE * i);
   for(int j=0;j<INPUT_SIZE;j++)
   {
      word[j] = *(input + isi + j);
   }
   

   u08b_t hashval[128] = {0};
   Skein1024_Ctxt_t ctx;
   Skein1024_Init(&ctx, 1024);
   Skein1024_Update(&ctx, word, INPUT_SIZE);
   Skein1024_Final(&ctx, hashval);

   scores[i] = countbits(&hashval,&goal);
}
 


main.cpp
Spoiler:

Code: Select all

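#include <iostream>
#include <fstream>
#include <string>
#include <cstdio>    // printf, getchar
#include <cstdlib>   // exit, EXIT_FAILURE
#include <cstring>   // memset, strlen, strncpy
#include <time.h>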
using namespace std;

#define __NO_STD_VECTOR
#include <CL/cl.h>

#define DATA_SIZE (1LL<<18)//21)


/* Period parameters */ 
#define N 624
#define M 397
#define MATRIX_A 0x9908b0dfUL   /* constant vector a */
#define UPPER_MASK 0x80000000UL /* most significant w-r bits */
#define LOWER_MASK 0x7fffffffUL /* least significant r bits */

static unsigned long mt[N]; /* the array for the state vector  */
static int mti=N+1; /* mti==N+1 means mt[N] is not initialized */

void init_genrand(unsigned long s)
{
    mt[0]= s & 0xffffffffUL;
    for (mti=1; mti<N; mti++) {
        mt[mti] =
       (1812433253UL * (mt[mti-1] ^ (mt[mti-1] >> 30)) + mti);
        /* See Knuth TAOCP Vol2. 3rd Ed. P.106 for multiplier. */
        /* In the previous versions, MSBs of the seed affect   */
        /* only MSBs of the array mt[].                        */
        /* 2002/01/09 modified by Makoto Matsumoto             */
        mt[mti] &= 0xffffffffUL;
        /* for >32 bit machines */
    }
}


unsigned long genrand_int32(void)
{
    unsigned long y;
    static unsigned long mag01[2]={0x0UL, MATRIX_A};
    /* mag01[x] = x * MATRIX_A  for x=0,1 */

    if (mti >= N) { /* generate N words at one time */
        int kk;

        //if (mti == N+1)   /* if init_genrand() has not been called, */
        //    init_genrand(5489UL); /* a default initial seed is used */

        for (kk=0;kk<N-M;kk++) {
            y = (mt[kk]&UPPER_MASK)|(mt[kk+1]&LOWER_MASK);
            mt[kk] = mt[kk+M] ^ (y >> 1) ^ mag01[y & 0x1UL];
        }
        for (;kk<N-1;kk++) {
            y = (mt[kk]&UPPER_MASK)|(mt[kk+1]&LOWER_MASK);
            mt[kk] = mt[kk+(M-N)] ^ (y >> 1) ^ mag01[y & 0x1UL];
        }
        y = (mt[N-1]&UPPER_MASK)|(mt[0]&LOWER_MASK);
        mt[N-1] = mt[M-1] ^ (y >> 1) ^ mag01[y & 0x1UL];

        mti = 0;
    }
 
    y = mt[mti++];

    /* Tempering */
    y ^= (y >> 11);
    y ^= (y << 7) & 0x9d2c5680UL;
    y ^= (y << 15) & 0xefc60000UL;
    y ^= (y >> 18);

    return y;
}

/* generates a random number on [0,0x7fffffff]-interval */
long genrand_int31(void)
{
    return (long)(genrand_int32()>>1);
}



int main(int argc, char* argv[])
{
   //srand(time(NULL));
   init_genrand(time(NULL));
  int devType=CL_DEVICE_TYPE_GPU;

 
  cl_int err; 
 
  size_t global; 
  size_t local;   
 
  cl_platform_id cpPlatform;
  cl_device_id device_id; 
  cl_context context;   
  cl_command_queue commands;
  cl_program program;   
  cl_kernel kernel;
 

  err = clGetPlatformIDs(1, &cpPlatform, NULL);
  if (err != CL_SUCCESS) {
    cerr << "Error: Failed to find a platform!" << endl;
   getchar();
    return EXIT_FAILURE;
  }
 
  err = clGetDeviceIDs(cpPlatform, devType, 1, &device_id, NULL);
  if (err != CL_SUCCESS) {
    cerr << "Error: Failed to create a device group!" << endl;
   getchar();
    return EXIT_FAILURE;
  }
 
  context = clCreateContext(0, 1, &device_id, NULL, NULL, &err);
  if (!context) {
    cerr << "Error: Failed to create a compute context!" << endl;
   getchar();
    return EXIT_FAILURE;
  }
 
  commands = clCreateCommandQueue(context, device_id, 0, &err);
  if (!commands) {
    cerr << "Error: Failed to create a command queue!" << endl;
   getchar();
    return EXIT_FAILURE;
  }
 
  string source;
  std::ifstream in("kernel.txt", std::ios::in | std::ios::binary);
  if (in)
  {
    in.seekg(0, std::ios::end);
    source.resize(in.tellg());
    in.seekg(0, std::ios::beg);
    in.read(&source[0], source.size());
    in.close();
  }
  else
  {
   cerr << "Error: Could not load kernel.txt!" << endl;
   getchar();
    return EXIT_FAILURE;
  }
  const char* ks = source.c_str();

  program = clCreateProgramWithSource(context, 1,
                (const char**)&ks,
                  NULL, &err);
  if (!program) {
    cerr << "Error: Failed to create compute program!" << endl;
   getchar();
    return EXIT_FAILURE;
  }
 
  err = clBuildProgram(program, 0, NULL, NULL, NULL, NULL);
  if (err != CL_SUCCESS) {
    size_t len;
    char buffer[2048];
   
    cerr << "Error: Failed to build program executable!" << endl;
    clGetProgramBuildInfo(program, device_id, CL_PROGRAM_BUILD_LOG,
           sizeof(buffer), buffer, &len);
    cerr << buffer << endl;
   getchar();
    exit(1);
  }
 
  kernel = clCreateKernel(program, "skein", &err);
  if (!kernel || err != CL_SUCCESS) {
    cerr << "Error: Failed to create compute kernel!" << endl;
   getchar();
    exit(1);
  }

  unsigned int MIN_INPUT_SIZE = 12;
  unsigned int MAX_INPUT_SIZE = 64;
  unsigned int LAST_INPUT_SIZE = MAX_INPUT_SIZE;
  unsigned int INPUT_SIZE = MIN_INPUT_SIZE;
  unsigned int NEXT_INPUT_SIZE = INPUT_SIZE + 1;

  char* databuff1 = new char[DATA_SIZE * MAX_INPUT_SIZE];
  char* databuff2 = new char[DATA_SIZE * MAX_INPUT_SIZE]; 
  char** data = &databuff1;// = new char[DATA_SIZE * INPUT_SIZE];
  char** otherdata = &databuff2;
  int* score = new int[DATA_SIZE];
  int bestScore = 10000;
  char* bestWord = new char[MAX_INPUT_SIZE + 1];
  bestWord[0] = 0;
  cl_mem input;                   
  cl_mem outputscores;
 
  unsigned int count = DATA_SIZE;
  memset(score,0,DATA_SIZE * sizeof(int));

  input = clCreateBuffer(context,  CL_MEM_READ_ONLY, 
          sizeof(char) * MAX_INPUT_SIZE * count, NULL, NULL);
  outputscores = clCreateBuffer(context, CL_MEM_WRITE_ONLY,
         sizeof(int) * count, NULL, NULL);
  if (!input || !outputscores) {
    cerr << "Error: Failed to allocate device memory!" << endl;
   getchar();
    exit(1);
  }   

  const char* chars = "0123456789qwertyuioplkjhgfdsazxcvbnmQWERTYUIOPLKJHGFDSAZXCVBNM";
  int charslen = strlen(chars);

  unsigned int is = INPUT_SIZE;

  unsigned long long loops = 0LL;

  char* b = *data;
   for(int i=0;i<DATA_SIZE * INPUT_SIZE;i++){
       //b[i] = chars[(rand() % charslen)];
      b[i] = chars[genrand_int32() % charslen];
   }
       
err = clGetKernelWorkGroupInfo(kernel, device_id,
         CL_KERNEL_WORK_GROUP_SIZE,
         sizeof(local), &local, NULL);

  while(1)
  {
     err = clEnqueueWriteBuffer(commands, input,
                CL_TRUE, 0, sizeof(char) * INPUT_SIZE * count,
                *data, 0, NULL, NULL);
   //  if (err != CL_SUCCESS) {
   //   cerr << "Error: Failed to write to source array!" << endl;
   //   getchar();
   //   exit(1);
   //  }
 
     err = 0;
     err  = clSetKernelArg(kernel, 0, sizeof(cl_mem), &input);
     err |= clSetKernelArg(kernel, 1, sizeof(cl_mem), &outputscores);
     err |= clSetKernelArg(kernel, 2, sizeof(unsigned int), &INPUT_SIZE);
   //  if (err != CL_SUCCESS) {
   //   cerr << "Error: Failed to set kernel arguments! " << err << endl;
   //   getchar();
   //   exit(1);
   //  }
 

   //  if (err != CL_SUCCESS) {
   //   cerr << "Error: Failed to retrieve kernel work group info! "
   //    <<  err << endl;
   //   getchar();
   //   exit(1);
   //  }
 

     global = count;
     err = clEnqueueNDRangeKernel(commands, kernel,
                  1, NULL, &global, &local,
                  0, NULL, NULL);
     //if (err != CL_SUCCESS) {
   //   cerr << "Error: Failed to execute kernel!" << endl;
   //   getchar();
   //   return EXIT_FAILURE;
    // }
 
     clFlush(commands);

     if(loops > 0)
     {
        for(int i=0;i<count;++i)
        {
           if(score[i] < bestScore)
           {
              bestScore = score[i];
              char* w = &(*otherdata)[i * LAST_INPUT_SIZE];
              strncpy(bestWord, w, LAST_INPUT_SIZE);
              bestWord[LAST_INPUT_SIZE] = 0;

              printf("%d %s\n",bestScore, bestWord);
           }
        }
     }


   char* b = *otherdata;
   for(int i=0;i<DATA_SIZE * NEXT_INPUT_SIZE;i++){
       b[i] = chars[(genrand_int32() % charslen)];
   }



     clFinish(commands);
 

     err = clEnqueueReadBuffer( commands, outputscores,
               CL_TRUE, 0, sizeof(int) * count,
               score, 0 ,NULL, NULL);
     //if (err != CL_SUCCESS) {
   //   cerr << "Error: Failed to read output array! " <<  err << endl;
   //   getchar();
   //   exit(1);
    // }



     char* t = *data;
     *data = *otherdata;
     *otherdata = t;

     INPUT_SIZE++;
     NEXT_INPUT_SIZE++;
     LAST_INPUT_SIZE++;
     if(LAST_INPUT_SIZE > MAX_INPUT_SIZE)
     {
        LAST_INPUT_SIZE = MIN_INPUT_SIZE;
     }
     if(INPUT_SIZE > MAX_INPUT_SIZE)
     {
        INPUT_SIZE = MIN_INPUT_SIZE;
     }
     if(NEXT_INPUT_SIZE > MAX_INPUT_SIZE)
     {
        NEXT_INPUT_SIZE = MIN_INPUT_SIZE;
     }
    
     loops++;
     if(loops % 1000 == 0)
     {
        //srand(time(NULL));
        //init_genrand(time(NULL));
        printf("%llu\t%d - %s\n",loops * DATA_SIZE, bestScore, bestWord);
     }
  }

  // data and otherdata only point at databuff1/databuff2, so free the buffers themselves
  //delete [] results;
  delete [] score;
  delete [] bestWord;
  delete [] databuff1;
  delete [] databuff2;
 
  clReleaseMemObject(input);
  //clReleaseMemObject(output);
  clReleaseMemObject(outputscores);
  clReleaseProgram(program);
  clReleaseKernel(kernel);
  clReleaseCommandQueue(commands);
  clReleaseContext(context);
 
  getchar();

  return 0;
}

cm_
Posts: 11
Joined: Wed Apr 03, 2013 5:22 pm UTC

Re: 1193: Externalities

Postby cm_ » Sun Apr 07, 2013 2:09 am UTC

trakof wrote:Here's an updated version of my OpenCL code. It has a decent speed increase, a better random number generator, and variable-length inputs to hash.

I get about 900 khash/sec out of this on Linux / nvidia 304.64 drivers for an (old) 8800 GT, vs about 1.4 Mhash/sec out of each of my 4 intel cores. Still, 6 Mhash/sec is better than 5 Mhash/sec =).

(With the previous code, I got about 850 khash/sec... a little improvement.)

trakof
Posts: 12
Joined: Sat Jul 17, 2010 8:33 pm UTC

Re: 1193: Externalities

Postby trakof » Sun Apr 07, 2013 6:12 am UTC

cm_ wrote:
trakof wrote:Here's an updated version of my OpenCL code. It has a decent speed increase, a better random number generator, and variable-length inputs to hash.

I get about 900 khash/sec out of this on Linux / nvidia 304.64 drivers for an (old) 8800 GT, vs about 1.4 Mhash/sec out of each of my 4 intel cores. Still, 6 Mhash/sec is better than 5 Mhash/sec =).

(With the previous code, I got about 850 khash/sec... a little improvement.)


Cool, thanks for the numbers. More than half the time seems to be spent counting the bit difference, guess I should try optimizing that some.

cm_
Posts: 11
Joined: Wed Apr 03, 2013 5:22 pm UTC

Re: 1193: Externalities

Postby cm_ » Sun Apr 07, 2013 3:45 pm UTC

trakof wrote:Cool, thanks for the numbers. More than half the time seems to be spent counting the bit difference, guess I should try optimizing that some.

x86 cores have a builtin instruction for this that outperforms the math (at least on the iteration of x86 processor I own). I wonder if OpenCL cores do as well? Btw, I had to make some changes to get this to build with nvidia's OpenCL.
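
For anyone following along, here is a minimal CPU-side sketch of the bit-difference count using that instruction. It assumes GCC or Clang, whose `__builtin_popcountll` usually becomes a single POPCNT when the compiler is told the target has it (e.g. `-mpopcnt`). The name `hamming_distance` is made up for illustration and is not taken from trakof's code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Number of differing bits between two byte buffers of equal length.
 * Works on 8 bytes per step; __builtin_popcountll is a GCC/Clang builtin
 * (an assumption about the toolchain, not part of standard C). */
static int hamming_distance(const unsigned char *a, const unsigned char *b,
                            size_t len)
{
    int count = 0;
    size_t i = 0;

    for (; i + 8 <= len; i += 8) {
        uint64_t x, y;
        memcpy(&x, a + i, 8);   /* memcpy sidesteps alignment and aliasing issues */
        memcpy(&y, b + i, 8);
        count += __builtin_popcountll(x ^ y);
    }
    for (; i < len; i++)        /* leftover bytes when len is not a multiple of 8 */
        count += __builtin_popcount((unsigned)(a[i] ^ b[i]));
    return count;
}
```

For a 128-byte Skein-1024 digest this does 16 popcounts instead of 128 table lookups.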

The kernel required some __constant annotations and explicit casts in places:

kernel.txt diff
Spoiler:

Code: Select all

diff --git b/exp/kernel2.cl a/exp/kernel2.cl
index cb076dc..d449131 100644
--- b/exp/kernel2.cl
+++ a/exp/kernel2.cl
@@ -351,116 +351,116 @@ void Skein1024_Update(Skein1024_Ctxt_t *ctx,  const u08b_t *msg, uint msgByteCnt
             }
         }
 
     /* copy any remaining source message data bytes into b[] */
     if (msgByteCnt)
     {
     memcpy(&ctx->b[ctx->h.bCnt],msg,msgByteCnt);
     ctx->h.bCnt += msgByteCnt;
     }
 }
 
 
 void Skein1024_Final(Skein1024_Ctxt_t *ctx, u08b_t *hashVal)
 {
    uint i,n,byteCnt;
    u64b_t X[16];
 
    ctx->h.T[1] |= SKEIN_T1_FLAG_FINAL;           
    if (ctx->h.bCnt < 128)   
    {
-      memset(&ctx->b[ctx->h.bCnt],0,128 - ctx->h.bCnt);
+      memset((char*)&ctx->b[ctx->h.bCnt],0,128 - ctx->h.bCnt);
    }     
    Skein1024_Process_Block(ctx,ctx->b,1,ctx->h.bCnt); 
     
    byteCnt = (ctx->h.hashBitLen + 7) >> 3;   
 
     
-   memset(ctx->b,0,sizeof(ctx->b));
+   memset((char*)ctx->b,0,sizeof(ctx->b));
    memcpy(X,ctx->X,sizeof(X));   
    for (i=0;i*128 < byteCnt;i++)
    {
       Skein_Start_New_Type(ctx,OUT_FINAL);
       Skein1024_Process_Block(ctx,ctx->b,1,sizeof(u64b_t));
       n = byteCnt - i*128; 
       if (n >= 128)
       {
          n  = 128;
       }
       Skein_Put64_LSB_First(hashVal+i*128,ctx->X,n);
 
       //memcpy(ctx->X,X,sizeof(X));
       ctx->X[0] = X[0];
       ctx->X[1] = X[1];
       ctx->X[2] = X[2];
       ctx->X[3] = X[3];
       ctx->X[4] = X[4];
       ctx->X[5] = X[5];
       ctx->X[6] = X[6];
       ctx->X[7] = X[7];
       ctx->X[8] = X[8];
       ctx->X[9] = X[9];
       ctx->X[10] = X[10];
       ctx->X[11] = X[11];
       ctx->X[12] = X[12];
       ctx->X[13] = X[13];
       ctx->X[14] = X[14];
       ctx->X[15] = X[15];
    }
 }
 
-const int bits[256] = {
+__constant const int bits[256] = {
 0,1,1,2,1,2,2,3,1,2,2,3,2,3,3,4,
 1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5,
 1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5,
 2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,
 1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5,
 2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,
 2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,
 3,4,4,5,4,5,5,6,4,5,5,6,5,6,6,7,
 1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5,
 2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,
 2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,
 3,4,4,5,4,5,5,6,4,5,5,6,5,6,6,7,
 2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,
 3,4,4,5,4,5,5,6,4,5,5,6,5,6,6,7,
 3,4,4,5,4,5,5,6,4,5,5,6,5,6,6,7,
 4,5,5,6,5,6,6,7,5,6,6,7,6,7,7,8};
 
-const char target[] = "5b4da95f5fa08280fc9879df44f418c8f9f12ba424b7757de02bbdfbae0d4c4fdf9317c80cc5fe04c6429073466cf29706b8c25999ddd2f6540d4475cc977b87f4757be023f19b8f4035d7722886b78869826de916a79cf9c94cc79cd4347d24b567aa3e2390a573a373a48a5e676640c79cc70197e1c5e7f902fb53ca1858b6";
-const char goal[] = {0x5b, 0x4d, 0xa9, 0x5f, 0x5f, 0xa0, 0x82, 0x80, 0xfc, 0x98, 0x79, 0xdf, 0x44, 0xf4, 0x18, 0xc8, 0xf9, 0xf1, 0x2b, 0xa4, 0x24, 0xb7, 0x75, 0x7d, 0xe0, 0x2b, 0xbd, 0xfb, 0xae, 0x0d, 0x4c, 0x4f, 0xdf, 0x93, 0x17, 0xc8, 0x0c, 0xc5, 0xfe, 0x04, 0xc6, 0x42, 0x90, 0x73, 0x46, 0x6c, 0xf2, 0x97, 0x06, 0xb8, 0xc2, 0x59, 0x99, 0xdd, 0xd2, 0xf6, 0x54, 0x0d, 0x44, 0x75, 0xcc, 0x97, 0x7b, 0x87, 0xf4, 0x75, 0x7b, 0xe0, 0x23, 0xf1, 0x9b, 0x8f, 0x40, 0x35, 0xd7, 0x72, 0x28, 0x86, 0xb7, 0x88, 0x69, 0x82, 0x6d, 0xe9, 0x16, 0xa7, 0x9c, 0xf9, 0xc9, 0x4c, 0xc7, 0x9c, 0xd4, 0x34, 0x7d, 0x24, 0xb5, 0x67, 0xaa, 0x3e, 0x23, 0x90, 0xa5, 0x73, 0xa3, 0x73, 0xa4, 0x8a, 0x5e, 0x67, 0x66, 0x40, 0xc7, 0x9c, 0xc7, 0x01, 0x97, 0xe1, 0xc5, 0xe7, 0xf9, 0x02, 0xfb, 0x53, 0xca, 0x18, 0x58, 0xb6};
+__constant const char target[] = "5b4da95f5fa08280fc9879df44f418c8f9f12ba424b7757de02bbdfbae0d4c4fdf9317c80cc5fe04c6429073466cf29706b8c25999ddd2f6540d4475cc977b87f4757be023f19b8f4035d7722886b78869826de916a79cf9c94cc79cd4347d24b567aa3e2390a573a373a48a5e676640c79cc70197e1c5e7f902fb53ca1858b6";
+__constant const char goal[] = {0x5b, 0x4d, 0xa9, 0x5f, 0x5f, 0xa0, 0x82, 0x80, 0xfc, 0x98, 0x79, 0xdf, 0x44, 0xf4, 0x18, 0xc8, 0xf9, 0xf1, 0x2b, 0xa4, 0x24, 0xb7, 0x75, 0x7d, 0xe0, 0x2b, 0xbd, 0xfb, 0xae, 0x0d, 0x4c, 0x4f, 0xdf, 0x93, 0x17, 0xc8, 0x0c, 0xc5, 0xfe, 0x04, 0xc6, 0x42, 0x90, 0x73, 0x46, 0x6c, 0xf2, 0x97, 0x06, 0xb8, 0xc2, 0x59, 0x99, 0xdd, 0xd2, 0xf6, 0x54, 0x0d, 0x44, 0x75, 0xcc, 0x97, 0x7b, 0x87, 0xf4, 0x75, 0x7b, 0xe0, 0x23, 0xf1, 0x9b, 0x8f, 0x40, 0x35, 0xd7, 0x72, 0x28, 0x86, 0xb7, 0x88, 0x69, 0x82, 0x6d, 0xe9, 0x16, 0xa7, 0x9c, 0xf9, 0xc9, 0x4c, 0xc7, 0x9c, 0xd4, 0x34, 0x7d, 0x24, 0xb5, 0x67, 0xaa, 0x3e, 0x23, 0x90, 0xa5, 0x73, 0xa3, 0x73, 0xa4, 0x8a, 0x5e, 0x67, 0x66, 0x40, 0xc7, 0x9c, 0xc7, 0x01, 0x97, 0xe1, 0xc5, 0xe7, 0xf9, 0x02, 0xfb, 0x53, 0xca, 0x18, 0x58, 0xb6};
 
-int countbits(char* c1, char* c2)
+int countbits(char* c1, __constant char* c2)
 {
    int count = 0;
    uchar x;
    for(int iu=0;iu<128;++iu)
    {
       x = c1[iu] ^ c2[iu];
       count += bits[x];
    }
    return count;
 }
 
 __kernel void skein(__global char* input,
    __global int* scores,
    const unsigned int INPUT_SIZE)           
 {                                       
     uint i = get_global_id(0);
 
    char word[64];
    int isi = (INPUT_SIZE * i);
    for(int j=0;j<INPUT_SIZE;j++)
    {
       word[j] = *(input + isi + j);
    }
   
 
    u08b_t hashval[128] = {0};
    Skein1024_Ctxt_t ctx;
    Skein1024_Init(&ctx, 1024);
-   Skein1024_Update(&ctx, word, INPUT_SIZE);
+   Skein1024_Update(&ctx, (const u08b_t*)word, INPUT_SIZE);
    Skein1024_Final(&ctx, hashval);
 
-   scores[i] = countbits(&hashval,&goal);
+   scores[i] = countbits((char*)hashval,goal);
 }
   
For main.cpp, the main change is adding some C headers not included by default on my platform and giving clGetProgramBuildInfo() a larger buffer to store compilation errors. Otherwise, nvidia's libOpenCL doesn't dump any logs at all: just "error!" I also added some benchmarking code.
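
The benchmarking part boils down to the arithmetic below. This is a standalone sketch rather than the code in the diff: `seconds_between` and `hash_rate` are illustrative names, and the two timestamps are assumed to come from something like `clock_gettime(CLOCK_MONOTONIC_RAW, ...)` as in the diff.

```c
#include <stdint.h>
#include <time.h>

/* Seconds elapsed between two timespec readings (end - begin). */
static double seconds_between(const struct timespec *begin,
                              const struct timespec *end)
{
    return (double)(end->tv_sec - begin->tv_sec)
         + 1e-9 * (double)(end->tv_nsec - begin->tv_nsec);
}

/* Average hashes per second over the interval; 0 if no time has passed. */
static double hash_rate(uint64_t hashes, const struct timespec *begin,
                        const struct timespec *end)
{
    double s = seconds_between(begin, end);
    return s > 0.0 ? (double)hashes / s : 0.0;
}
```

Because the rate is measured from program start, it is a running average and will lag behind any sudden change in throughput.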

main.cpp diff
Spoiler:

Code: Select all

diff --git b/exp/main2.cpp a/exp/main2.cpp
index 52e115c..528ba5a 100644
--- b/exp/main2.cpp
+++ a/exp/main2.cpp
@@ -1,57 +1,90 @@
+#include <cassert>
 #include <iostream>
 #include <fstream>
 #include <string>
 #include <time.h>
+#include <cstdlib>
+#include <cstring>
 using namespace std;
 
 #define __NO_STD_VECTOR
-#include <CL/cl.h>
+#include <include/CL/cl.h>
 
 #define DATA_SIZE (1LL<<18)//21)
 
 
 /* Period parameters */ 
 #define N 624
 #define M 397
 #define MATRIX_A 0x9908b0dfUL   /* constant vector a */
 #define UPPER_MASK 0x80000000UL /* most significant w-r bits */
 #define LOWER_MASK 0x7fffffffUL /* least significant r bits */
 
 static unsigned long mt[N]; /* the array for the state vector  */
 static int mti=N+1; /* mti==N+1 means mt[N] is not initialized */
 
 void init_genrand(unsigned long s)
 {
    mt[0]= s & 0xffffffffUL;
    for (mti=1; mti<N; mti++) {
       mt[mti] =
           (1812433253UL * (mt[mti-1] ^ (mt[mti-1] >> 30)) + mti);
       /* See Knuth TAOCP Vol2. 3rd Ed. P.106 for multiplier. */
       /* In the previous versions, MSBs of the seed affect   */
       /* only MSBs of the array mt[].                        */
       /* 2002/01/09 modified by Makoto Matsumoto             */
       mt[mti] &= 0xffffffffUL;
       /* for >32 bit machines */
    }
 }
 
+static inline void
+ASSERT(intptr_t i)
+{
+
+   if (i == 0)
+      abort();
+}
+
+static inline void
+gettime(struct timespec *t)
+{
+   int r;
+
+   r = clock_gettime(CLOCK_MONOTONIC_RAW, t);
+   ASSERT(r == 0);
+}
+
+static inline double
+elapsed(struct timespec *begin)
+{
+   struct timespec cur;
+   double el;
+
+   gettime(&cur);
+
+   el = (double)cur.tv_sec - begin->tv_sec + 0.000000001 *
+       ((double)cur.tv_nsec - begin->tv_nsec);
+   return el;
+}
+
 
 unsigned long genrand_int32(void)
 {
    unsigned long y;
    static unsigned long mag01[2]={0x0UL, MATRIX_A};
    /* mag01[x] = x * MATRIX_A  for x=0,1 */
 
    if (mti >= N) { /* generate N words at one time */
       int kk;
 
       //if (mti == N+1)   /* if init_genrand() has not been called, */
       //    init_genrand(5489UL); /* a default initial seed is used */
 
       for (kk=0;kk<N-M;kk++) {
          y = (mt[kk]&UPPER_MASK)|(mt[kk+1]&LOWER_MASK);
          mt[kk] = mt[kk+M] ^ (y >> 1) ^ mag01[y & 0x1UL];
       }
       for (;kk<N-1;kk++) {
          y = (mt[kk]&UPPER_MASK)|(mt[kk+1]&LOWER_MASK);
          mt[kk] = mt[kk+(M-N)] ^ (y >> 1) ^ mag01[y & 0x1UL];
@@ -71,40 +104,44 @@ unsigned long genrand_int32(void)
    y ^= (y >> 18);
 
    return y;
 }
 
 /* generates a random number on [0,0x7fffffff]-interval */
 long genrand_int31(void)
 {
    return (long)(genrand_int32()>>1);
 }
 
 
 
 int main(int argc, char* argv[])
 {
    //srand(time(NULL));
    init_genrand(time(NULL));
    int devType=CL_DEVICE_TYPE_GPU;
 
 
+   struct timespec begin;
+
+   gettime(&begin);
+
    cl_int err; 
 
    size_t global; 
    size_t local;   
 
    cl_platform_id cpPlatform;
    cl_device_id device_id; 
    cl_context context;   
    cl_command_queue commands;
    cl_program program;   
    cl_kernel kernel;
 
 
    err = clGetPlatformIDs(1, &cpPlatform, NULL);
    if (err != CL_SUCCESS) {
       cerr << "Error: Failed to find a platform!" << endl;
       getchar();
       return EXIT_FAILURE;
    }
 
@@ -113,75 +150,76 @@ int main(int argc, char* argv[])
       cerr << "Error: Failed to create a device group!" << endl;
       getchar();
       return EXIT_FAILURE;
    }
 
    context = clCreateContext(0, 1, &device_id, NULL, NULL, &err);
    if (!context) {
       cerr << "Error: Failed to create a compute context!" << endl;
       getchar();
       return EXIT_FAILURE;
    }
 
    commands = clCreateCommandQueue(context, device_id, 0, &err);
    if (!commands) {
       cerr << "Error: Failed to create a command commands!" << endl;
       getchar();
       return EXIT_FAILURE;
    }
 
    string source;
-   std::ifstream in("kernel.txt", std::ios::in | std::ios::binary);
+   std::ifstream in("kernel2.cl", std::ios::in | std::ios::binary);
    if (in)
    {
       in.seekg(0, std::ios::end);
       source.resize(in.tellg());
       in.seekg(0, std::ios::beg);
       in.read(&source[0], source.size());
       in.close();
    }
    else
    {
       cerr << "Error: Could not load kernel.txt!" << endl;
       getchar();
       return EXIT_FAILURE;
    }
    const char* ks = source.c_str();
 
    program = clCreateProgramWithSource(context, 1,
        (const char**)&ks,
        NULL, &err);
    if (!program) {
       cerr << "Error: Failed to create compute program!" << endl;
       getchar();
       return EXIT_FAILURE;
    }
 
    err = clBuildProgram(program, 0, NULL, NULL, NULL, NULL);
    if (err != CL_SUCCESS) {
       size_t len;
-      char buffer[2048];
+      char *buffer = (char*)calloc(1, 16384);
 
       cerr << "Error: Failed to build program executable!" << endl;
       clGetProgramBuildInfo(program, device_id, CL_PROGRAM_BUILD_LOG,
-          sizeof(buffer), buffer, &len);
+          16384, buffer, &len);
       cerr << buffer << endl;
+      free(buffer);
       getchar();
       exit(1);
    }
 
    kernel = clCreateKernel(program, "skein", &err);
    if (!kernel || err != CL_SUCCESS) {
       cerr << "Error: Failed to create compute kernel!" << endl;
       getchar();
       exit(1);
    }
 
    unsigned int MIN_INPUT_SIZE = 12;
    unsigned int MAX_INPUT_SIZE = 64;
    unsigned int LAST_INPUT_SIZE = MAX_INPUT_SIZE;
    unsigned int INPUT_SIZE = MIN_INPUT_SIZE;
    unsigned int NEXT_INPUT_SIZE = INPUT_SIZE + 1;
 
    char* databuff1 = new char[DATA_SIZE * MAX_INPUT_SIZE];
    char* databuff2 = new char[DATA_SIZE * MAX_INPUT_SIZE]; 
    char** data = &databuff1;// = new char[DATA_SIZE * INPUT_SIZE];
@@ -250,41 +288,41 @@ int main(int argc, char* argv[])
       //    <<  err << endl;
       //   getchar();
       //   exit(1);
       //  }
 
 
       global = count;
       err = clEnqueueNDRangeKernel(commands, kernel,
           1, NULL, &global, &local,
           0, NULL, NULL);
       //if (err != CL_SUCCESS) {
       //   cerr << "Error: Failed to execute kernel!" << endl;
       //   getchar();
       //   return EXIT_FAILURE;
       // }
 
       clFlush(commands);
 
       if(loops > 0)
       {
-         for(int i=0;i<count;++i)
+         for(unsigned i=0;i<count;++i)
          {
             if(score[i] < bestScore)
             {
                bestScore = score[i];
                char* w = &(*otherdata)[i * LAST_INPUT_SIZE];
                strncpy(bestWord, w, LAST_INPUT_SIZE);
                bestWord[LAST_INPUT_SIZE] = 0;
 
                printf("%d %s\n",bestScore, bestWord);
             }
          }
       }
 
 
       char* b = *otherdata;
       for(int i=0;i<DATA_SIZE * NEXT_INPUT_SIZE;i++){
          b[i] = chars[(genrand_int32() % charslen)];
       }
 
 
@@ -307,45 +345,58 @@ int main(int argc, char* argv[])
       *data = *otherdata;
       *otherdata = t;
 
       INPUT_SIZE++;
       NEXT_INPUT_SIZE++;
       LAST_INPUT_SIZE++;
       if(LAST_INPUT_SIZE > MAX_INPUT_SIZE)
       {
          LAST_INPUT_SIZE = MIN_INPUT_SIZE;
       }
       if(INPUT_SIZE > MAX_INPUT_SIZE)
       {
          INPUT_SIZE = MIN_INPUT_SIZE;
       }
       if(NEXT_INPUT_SIZE > MAX_INPUT_SIZE)
       {
          NEXT_INPUT_SIZE = MIN_INPUT_SIZE;
       }
 
       loops++;
-      if(loops % 1000 == 0)
+      if(loops % 16 == 0)
       {
          //srand(time(NULL));
          //init_genrand(time(NULL));
+
          printf("%llu\t%d - %s\n",loops * DATA_SIZE, bestScore, bestWord);
+
+         uint64_t hashes = (uint64_t)DATA_SIZE * loops;
+         struct timespec now;
+
+         gettime(&now);
+
+         double seconds = ((double)now.tv_sec - begin.tv_sec) +
+             (1./1000000000.) * ((double)now.tv_nsec - begin.tv_nsec);
+
+         double hps = (double)hashes / seconds;
+
+         printf("HPS %.01f (total hashes: %llu loops: %llu)\n", hps, (unsigned long long)hashes, loops);
       }
    }
 
    delete [] data;
    //delete [] results;
    delete [] score;
    delete [] databuff1;
    delete [] databuff2;
 
    clReleaseMemObject(input);
    //clReleaseMemObject(output);
    clReleaseMemObject(outputscores);
    clReleaseProgram(program);
    clReleaseKernel(kernel);
    clReleaseCommandQueue(commands);
    clReleaseContext(context);
 
    getchar();
 
    return 0;

brock256
Posts: 4
Joined: Wed Apr 03, 2013 8:21 am UTC

Re: 1193: Externalities

Postby brock256 » Sun Apr 07, 2013 6:38 pm UTC

edo wrote:If I did the math right, if he used a four common word (2048 word list) password, at 1e6 hashes per second, it can be cracked in 5 days. There are permutations (spaces, capitalization) IdonTthink heMixed thEse, ( :D ) so it only adds a few bits, and if each person takes a permutation, we could get back to the 5 days to try all of them...

I've been running (with some hiccups due to other factors) since Thursday doing exactly that, but with a reduced word list (850 word Basic English wordlist). The only permutation I'm doing is spaces/no spaces, since that wasn't clear in the original comic. The string manipulations slow things down a bit from the more random approach (and I'm sure it doesn't help that I haven't spent much time optimizing beyond getting a compiler that could actually target x86-64), but I'm still getting about 1.3e6 hashes/s (running with 6 threads on a non-dedicated Core i7). Current ETA for completion (after hiccups) is sometime next Monday.
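
As a rough check of that ETA, the search time is just keyspace divided by hash rate. The sketch below assumes exactly four words per phrase and a fixed number of spelling variants. `keyspace` and `days_to_exhaust` are hypothetical helper names.

```c
#include <stdint.h>

/* Candidates for `words_per_phrase` picks from a list of `list_size` words,
 * times `variants` spelling variants (e.g. 2 for spaces / no spaces). */
static uint64_t keyspace(uint64_t list_size, unsigned words_per_phrase,
                         uint64_t variants)
{
    uint64_t n = variants;
    for (unsigned i = 0; i < words_per_phrase; i++)
        n *= list_size;          /* no overflow check: fine for these sizes */
    return n;
}

/* Days needed to try every candidate at a constant hash rate. */
static double days_to_exhaust(uint64_t candidates, double hashes_per_second)
{
    return (double)candidates / hashes_per_second / 86400.0;
}
```

`days_to_exhaust(keyspace(850, 4, 2), 1.3e6)` comes to roughly 9.3 days of uninterrupted running. By the same arithmetic, 2048^4 candidates at 1e6 hashes/s works out to roughly 204 days rather than 5, so the smaller list makes a real difference.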

Even if he did use that password pattern (as opposed to the other pattern from the comic, a non-published password pattern, or a pure random hash), I'm not very optimistic about my chances of hitting it with this word list, but since you'd have to go to a 2^14 word list before picking up "staple", I didn't see much benefit to trying with a 2^11 word list. Also, 2 hours of programming time + 2ish weeks of cpu time seemed like a reasonable maximum investment in this particular problem to me :D In hindsight, though, I wish I had used as my dictionary the "ten hundred" words used to make Up Goer Five.
cm_ wrote:x86 cores have a builtin instruction for this that outperforms the math (at least on the iteration of x86 processor I own). I wonder if OpenCL cores do as well? Btw, I had to make some changes to get this to build with nvidia's OpenCL.

Not sure about OpenCL, but CUDA seems to offer __popc(). I can't post links yet, but the first Google result for "cuda __popc" (no quotes) tells the tale. The CUDA C Programming Guide indicates this compiles to a single instruction for compute capability >= 2.x (i.e. Fermi architecture/GeForce 400-series and newer).

cm_
Posts: 11
Joined: Wed Apr 03, 2013 5:22 pm UTC

Re: 1193: Externalities

Postby cm_ » Sun Apr 07, 2013 9:44 pm UTC

brock256 wrote:The string manipulations slow things down a bit from the more random approach (and I'm sure it doesn't help that I haven't spent much time optimizing beyond getting a compiler that could actually target x86-64), but I'm still getting about 1.3e6 hashes/s (running with 6 threads on a non-dedicated Core i7). Current ETA for completion (after hiccups) is sometime next Monday.

Hm, I get about 1.4e6 per core on my i7 (x4 cores at nearly linear speedup = 5e6 hashes/s), even with the string approach you should be able to do a bit better than that, I think. Oh well, Monday is soon.
brock256 wrote:Also, 2 hours of programming time + 2ish weeks of cpu time seemed like a reasonable maximum investment in this particular problem to me :D

Hah, yeah I've definitely wasted too much time on this.
brock256 wrote:
cm_ wrote:x86 cores have a builtin instruction for this that outperforms the math (at least on the iteration of x86 processor I own). I wonder if OpenCL cores do as well? Btw, I had to make some changes to get this to build with nvidia's OpenCL.

Not sure about OpenCL, but CUDA seems to offer __popc(). I can't post links yet, but the first Google result for "cuda __popc" (no quotes) tells the tale. The CUDA C Programming Guide indicates this compiles to a single instruction for compute capability >= 2.x (i.e. Fermi architecture/GeForce 400-series and newer).
Cool, although I have an older architecture card =(.

brock256
Posts: 4
Joined: Wed Apr 03, 2013 8:21 am UTC

Re: 1193: Externalities

Postby brock256 » Sun Apr 07, 2013 10:02 pm UTC

cm_ wrote:
brock256 wrote:The string manipulations slow things down a bit from the more random approach (and I'm sure it doesn't help that I haven't spent much time optimizing beyond getting a compiler that could actually target x86-64), but I'm still getting about 1.3e6 hashes/s (running with 6 threads on a non-dedicated Core i7). Current ETA for completion (after hiccups) is sometime next Monday.

Hm, I get about 1.4e6 per core on my i7 (x4 cores at nearly linear speedup = 5e6 hashes/s), even with the string approach you should be able to do a bit better than that, I think. Oh well, Monday is soon.

Hm. Mine's an older one (Bloomfield core@2.8GHz), but that shouldn't explain the full discrepancy. Guess I'm at least curious enough to drop it in gprof and see if there's anywhere I'm blatantly wasting time.
Not sure about OpenCL, but CUDA seems to offer __popc(). I can't post links yet, but the first Google result for "cuda __popc" (no quotes) tells the tale. The CUDA C Programming Guide indicates this compiles to a single instruction for compute capability >= 2.x (i.e. Fermi architecture/GeForce 400-series and newer).

Cool, although I have an older architecture card =(.

On the older architectures it compiles to a multi-instruction sequence, which I would assume should still be highly optimized, so it may be worth a try just for grins.

trakof
Posts: 12
Joined: Sat Jul 17, 2010 8:33 pm UTC

Re: 1193: Externalities

Postby trakof » Sun Apr 07, 2013 10:20 pm UTC

x86 cores have a builtin instruction for this that outperforms the math (at least on the iteration of x86 processor I own). I wonder if OpenCL cores do as well? Btw, I had to make some changes to get this to build with nvidia's OpenCL.

Not sure about OpenCL, but CUDA seems to offer __popc(). I can't post links yet, but the first Google result for "cuda __popc" (no quotes) tells the tale. The CUDA C Programming Guide indicates this compiles to a single instruction for compute capability >= 2.x (i.e. Fermi architecture/GeForce 400-series and newer).


OpenCL does have a popcount() function, it didn't seem to make that much of a difference, was somewhere in the 3-5% range.

ekim
Posts: 109
Joined: Mon Dec 18, 2006 12:40 pm UTC
Location: Seattle

Re: 1193: Externalities

Postby ekim » Mon Apr 08, 2013 2:43 pm UTC

trakof wrote:
x86 cores have a builtin instruction for this that outperforms the math (at least on the iteration of x86 processor I own). I wonder if OpenCL cores do as well? Btw, I had to make some changes to get this to build with nvidia's OpenCL.

Not sure about OpenCL, but CUDA seems to offer __popc(). I can't post links yet, but the first Google result for "cuda __popc" (no quotes) tells the tale. The CUDA C Programming Guide indicates this compiles to a single instruction for compute capability >= 2.x (i.e. Fermi architecture/GeForce 400-series and newer).


OpenCL does have a popcount() function, it didn't seem to make that much of a difference, was somewhere in the 3-5% range.

I imagine precomputing a lookup table is the fastest portable way to go, but Kernighan's bitcount is still my fave.

Code: Select all

while (c) {
    c &= c - 1;   /* clear the lowest set bit */
    count++;
}
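
To make the comparison concrete, here is a small sketch with both approaches side by side: Kernighan's loop and a 256-entry table built at startup instead of typed out by hand. All function and variable names are illustrative, and this is not taken from the kernel posted earlier in the thread.

```c
#include <stdint.h>

/* Kernighan's method: each pass clears the lowest set bit,
 * so the loop runs once per set bit. */
static int popcount_kernighan(uint32_t c)
{
    int count = 0;
    while (c) {
        c &= c - 1;
        count++;
    }
    return count;
}

static uint8_t bits_table[256];

/* Fill the table using bits(i) = bits(i / 2) + (lowest bit of i). */
static void init_bits_table(void)
{
    bits_table[0] = 0;
    for (int i = 1; i < 256; i++)
        bits_table[i] = (uint8_t)(bits_table[i >> 1] + (i & 1));
}

/* Table-driven popcount of a 32-bit value, one byte at a time.
 * init_bits_table() must have been called first. */
static int popcount_table(uint32_t c)
{
    return bits_table[c & 0xff] + bits_table[(c >> 8) & 0xff]
         + bits_table[(c >> 16) & 0xff] + bits_table[c >> 24];
}
```

With one-byte entries the table takes 256 bytes. Declaring it as `int`, as the kernel's `bits[256]` does, makes it four times larger on typical platforms.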

rmsgrey
Posts: 3655
Joined: Wed Nov 16, 2011 6:35 pm UTC

Re: 1193: Externalities

Postby rmsgrey » Mon Apr 08, 2013 3:09 pm UTC

ekim wrote:
trakof wrote:
x86 cores have a builtin instruction for this that outperforms the math (at least on the iteration of x86 processor I own). I wonder if OpenCL cores do as well? Btw, I had to make some changes to get this to build with nvidia's OpenCL.

Not sure about OpenCL, but CUDA seems to offer __popc(). I can't post links yet, but the first Google result for "cuda __popc" (no quotes) tells the tale. The CUDA C Programming Guide indicates this compiles to a single instruction for compute capability >= 2.x (i.e. Fermi architecture/GeForce 400-series and newer).


OpenCL does have a popcount() function, it didn't seem to make that much of a difference, was somewhere in the 3-5% range.

I imagine precomputing a lookup table is the fastest portable way to go, but Kernighan's bitcount is still my fave.

Code: Select all

while (c) {
    c &= c - 1;   /* clear the lowest set bit */
    count++;
}


The problem with pre-computed lookup tables is that as soon as you start missing the cache, the performance plummets - naive performance testing by running through large numbers of values in sequence will get good cache performance because of the high spatial locality. Real-life applications are likely to have more random access, making cache optimisation much harder.

nit_picker
Posts: 9
Joined: Fri Dec 30, 2011 5:21 am UTC

Re: 1193: Externalities

Postby nit_picker » Mon Apr 08, 2013 4:55 pm UTC

I wonder how much bigger the dog will become …

ekim
Posts: 109
Joined: Mon Dec 18, 2006 12:40 pm UTC
Location: Seattle

Re: 1193: Externalities

Postby ekim » Wed Apr 10, 2013 2:32 am UTC

rmsgrey wrote:The problem with pre-computed lookup tables is that as soon as you start missing the cache, the performance plummets - naive performance testing by running through large numbers of values in sequence will get good cache performance because of the high spatial locality. Real-life applications are likely to have more random access, making cache optimisation much harder.

Maybe? Cache misses are slow, but our lookup table is tiny (256 bytes if we're making it for one-byte values) and might not get bumped from the cache in the time between accesses while we do the skein hashing. Sounds like lookup tables did pretty well on x86, at least.

score_under
Posts: 6
Joined: Tue Sep 02, 2008 5:24 pm UTC

Re: 1193: Externalities

Postby score_under » Sat Apr 13, 2013 3:11 pm UTC

The comic won't load.

With the name "Externalities", I assumed that it was poking fun at websites which hosted all their content fragmented across other, frequently unresponsive, servers, and took years to load as a consequence.
If it's not poking fun at that, it's a very good example of it nonetheless.

brock256
Posts: 4
Joined: Wed Apr 03, 2013 8:21 am UTC

Re: 1193: Externalities

Postby brock256 » Tue Apr 16, 2013 1:47 am UTC

brock256 wrote:I've been running (with some hiccups due to other factors) since Thursday doing exactly that, but with a reduced word list (850 word Basic English wordlist). The only permutation I'm doing is spaces/no spaces, since that wasn't clear in the original comic. The string manipulations slow things down a bit from the more random approach (and I'm sure it doesn't help that I haven't spent much time optimizing beyond getting a compiler that could actually target x86-64), but I'm still getting about 1.3e6 hashes/s (running with 6 threads on a non-dedicated Core i7). Current ETA for completion (after hiccups) is sometime next Monday.

Finished some time this afternoon, best result I found was:
new best distance = 400!
input was "nailsystemcakefrom"

Not too exciting :D
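
For reference, one way to walk a four-word keyspace like this is to treat a running counter as a four-digit number in base n (the list size) and use each digit to pick a word. The sketch below is only an illustration of that idea, not necessarily how the search above was written. `make_candidate` and its parameters are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Build the index-th four-word candidate from `words` (a list of n words).
 * `index` is assumed to be below n^4 and is decoded as four base-n digits,
 * so consecutive indices visit every combination exactly once. With `spaces`
 * nonzero the words are separated by single spaces. Returns the length
 * written, or 0 if `out` (capacity `cap`, including the NUL) is too small. */
static size_t make_candidate(const char *const *words, size_t n,
                             uint64_t index, int spaces,
                             char *out, size_t cap)
{
    size_t pick[4], len = 0;

    for (int k = 3; k >= 0; k--) {   /* least significant digit = last word */
        pick[k] = (size_t)(index % n);
        index /= n;
    }
    for (int k = 0; k < 4; k++) {
        size_t wl = strlen(words[pick[k]]);
        size_t sep = (spaces && k > 0) ? 1 : 0;
        if (len + sep + wl + 1 > cap)
            return 0;
        if (sep)
            out[len++] = ' ';
        memcpy(out + len, words[pick[k]], wl);
        len += wl;
    }
    out[len] = '\0';
    return len;
}
```

Since every index maps to exactly one candidate, splitting the work across threads or machines is just a matter of handing each one a contiguous index range.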

davidy22
Posts: 39
Joined: Sat Jan 26, 2013 9:29 am UTC

Re: 1193: Externalities

Postby davidy22 » Sat Apr 20, 2013 12:48 pm UTC

Anyone wanna help push the thing over the $50000 mark, to see the dog grow? I'm pretty darned sure the dog grows again at 50000, it's a nice round number.

tielenaar
Posts: 1
Joined: Thu Apr 25, 2013 4:59 pm UTC

Re: 1193: Externalities

Postby tielenaar » Thu Apr 25, 2013 5:01 pm UTC

50k reached. Very nice! Dog looks the same size to me though.

elmarj
Posts: 1
Joined: Sat Apr 04, 2009 11:37 am UTC

Re: 1193: Externalities

Postby elmarj » Tue May 07, 2013 7:30 am UTC

seraku wrote:

Code: Select all

CSS3114: @font-face failed OpenType embedding permission check. Permission must be Installable.
xkcd-Regular.otf

I'm not an expert on web fonts, but it looks like IE may be enforcing a stricter policy here.


IE9/10 otf-embedding only works when the embedding bits in the font-file are set to installable, some protection against unlicensed copying of fonts. Still, I would have expected it to fall back to the provided eot file.
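
As a sketch of what that check amounts to, assuming "Installable" refers to the embedding-licence bits of the `fsType` field in the font's OS/2 table as defined by the OpenType spec: the font counts as installable when none of the three restriction bits is set. The macro and function names are illustrative, and exactly what IE inspects beyond this is not something the error message tells us.

```c
#include <stdint.h>

/* Embedding-licence bits of the OS/2 table's fsType field (OpenType spec). */
#define FSTYPE_RESTRICTED 0x0002u  /* restricted licence embedding */
#define FSTYPE_PREVIEW    0x0004u  /* preview & print embedding    */
#define FSTYPE_EDITABLE   0x0008u  /* editable embedding           */

/* "Installable" is the case where none of the three bits above is set. */
static int fstype_is_installable(uint16_t fs_type)
{
    return (fs_type & (FSTYPE_RESTRICTED | FSTYPE_PREVIEW | FSTYPE_EDITABLE)) == 0;
}
```

Reading `fsType` out of an actual .otf file means locating the OS/2 table through the font's table directory, which is left out here.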

marsman57
Posts: 67
Joined: Fri Jul 27, 2007 1:40 pm UTC

Re: 1193: Externalities

Postby marsman57 » Tue May 07, 2013 4:13 pm UTC

:( I saw this at the top and really hoped that people were posting updated results still.

mrob27
Posts: 1337
Joined: Tue Jun 28, 2011 2:19 am UTC
Location: ]〖  

Server failures?

Postby mrob27 » Sun Dec 08, 2013 10:08 pm UTC

This comic (Externalities) has not been working for me for a couple of days now. I suspect some kind of problem with the servers that are responsible for running "imgs․xkcd․com/comics/externalities.png" . I get a normal xkcd window with blank space (780x969 pixels) in the area where the comic would be. Chrome 29.0.x, Safari 5.1.10, Opera 12.16, and an ancient Firefox 2.0.x all have the same problem; also two iPad 2's, one with iOS 5.0 and another with iOS 6.1.3. I have no problems with the Store¹, Blag, fora, other xkcd comics, etc.

Can anyone else test this? Perhaps the first 2 or 3 people who get any kind of success (like seeing part of a comic, or perhaps the text but not the images, etc.) could post here to let us know your browser, OS, device type, and your approximate location in the world.

- Robert Munafo

¹ The Store banner on my second iPad 2 tells me, "ɪ ᴡɪʟʟ ᴘʀᴏʙᴀʙʟʏ ɴᴏᴛ ꜱʜɪᴘ ʏᴏᴜ ᴀ BOBCAT". That's very reassuring, BHG :twisted:
Robert Munafo | http://mrob.com | @mrob_27
I ᴍᴀᴅᴇ sᴏɍᴛᴡᴀʀᴇ ᴛʜᴀᴛ Rᴀɴᴅᴀʟʟ ɍᴏᴜɴᴅ ᴜsᴇɍᴜʟ ɪɴ ᴛʜɪs хᴋᴄᴅ

Klear
Posts: 1965
Joined: Sun Jun 13, 2010 8:43 am UTC
Location: Prague

Re: 1193: Externalities

Postby Klear » Sun Dec 08, 2013 11:19 pm UTC

It's down for me too, on Chrome. I recall that it often failed to load even when it was new.

edo
Posts: 436
Joined: Thu Mar 28, 2013 7:05 pm UTC
Location: ~TrApPeD iN mY PhOnE~

Re: 1193: Externalities

Postby edo » Fri Apr 11, 2014 9:55 pm UTC

Based on the latest comic, has anyone tried "CoHoBaSt"?
Co-proprietor of a Mome and Pope Shope

pitareio
Posts: 128
Joined: Wed Sep 19, 2012 7:03 pm UTC
Location: the land of smelly cheese

Re: 1193: Externalities

Postby pitareio » Sat Apr 12, 2014 12:02 am UTC

edo wrote:Based on the latest comic, has anyone tried "CoHoBaSt"?


Such a simple input would probably have been guessed by the insane number of people running brute-force hash calculation on insanely powerful computers.

For all we know, there may not even have been a "correct" input to begin with, and Randall just picked a random hash.

