浅析java的hashmap

话说java的hashmap很好用啊，看了java的源代码之后其实发现实现也蛮简单的，map的用法就是key-value的用法，然后对于每一个对象都有一个hashcode，在hashmap内部就是利用这个hashcode来找到value的。

那么hashcode一般而言是对象的在jvm中的一个32位的地址，所以不会重复，可能有人注意到，这样的话如果用string做key那么两个内容相同的串其地址不同，那么hashcode也不同，这样就不合理了，所以实际上在String.java里是override了hashcode的方法的。

public int hashCode() {
    int h = hash;
    if (h == 0) {
        int off = offset;
        char val[] = value;
        int len = count;

        for (int i = 0; i < len; i++) {
            h = 31*h + val[off++];
        }
        hash = h;
    }
    return h;
}

其实这就是BKDRHash，详细可以看这里，话说31也是个magic number 呢。

有了这个可以作为主键的值，剩下的问题就是hash函数和冲突处理了。

static int indexFor(int h, int length) {
    return h & (length-1);
}

hash函数用了位运算里常用的&操作，而这里的length是2的幂，这样比一般的取模操作快了很多。

我们再看get里冲突是如何处理的（put同理）

public V get(Object key) {
    if (key == null)
        return getForNullKey();
    int hash = hash(key.hashCode());
    for (Entry e = table[indexFor(hash, table.length)];
        e != null;
        e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k)))
            return e.value;
    }
    return null;
}

一般处理冲突的方法有两种，一个是开散列，就是如果这个位置被占了就换个位置。另一个是hash链表，对于每个位置都创建链表来存储所有hash值相同的，而java里就是用了第二种方案。

注意到还有个hash函数，这是为了让所有位都能够影响到index的计算

static int hash(int h) {
    // This function ensures that hashCodes that differ only by
    // constant multiples at each bit position have a bounded
    // number of collisions (approximately 8 at default load factor).
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}

ok，基本就是这样了，道理还是比较好懂的，利用hashcode使得所有对象都可以作为key还是比较好的想法。