[Data Structures and Algorithms] A First Look at Hash Tables

xiaoxiao 2021-11-30

Hash tables exist so that data can be inserted and looked up in O(1) time on average.

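After working through two books, 《STL源码剖析》 (The Annotated STL Sources) and Data Structures and Algorithm Analysis in C, I finally have a basic understanding of hash tables.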

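Any discussion of hash tables leads to collisions. Both books cover open-addressing methods such as linear probing and quadratic probing for resolving them, but the SGI STL library resolves collisions with separate chaining instead.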
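Linear and quadratic probing both keep every element inside the table itself and differ only in which slot they try next after a collision. The sketch below shows one common way to write the two probe sequences. The function names and parameters are illustrative assumptions and do not come from either book.

```cpp
#include <cstddef>

// Slot examined on the i-th attempt (i = 0, 1, 2, ...) for a key whose
// hash value is h, in a table with n slots (n is assumed to be non-zero).

// Linear probing: move one slot further on each attempt.
std::size_t linear_probe(std::size_t h, std::size_t i, std::size_t n) {
    return (h % n + i) % n;
}

// Quadratic probing: the offset from the home slot grows as i squared.
std::size_t quadratic_probe(std::size_t h, std::size_t i, std::size_t n) {
    return (h % n + i * i) % n;
}
```

With h = 7 and n = 10, linear probing visits slots 7, 8, 9, 0, ... while quadratic probing visits 7, 8, 1, 6, ... Separate chaining needs no probe sequence at all, because colliding elements are simply linked together in the same bucket.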

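The hashtable in SGI STL is implemented as follows:
1) it consists of an array of buckets;
2) each bucket holds a list made up of hash_nodes;
3) each hash_node consists of a Val object (which stores the actual element) and a hash_node pointer (the next pointer).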

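The hashtable works as follows:
1) hash the key with HashFcn;
2) take the hash result modulo n (where n is the number of buckets in the hashtable) to locate the bucket;
3) use EqualKey to compare the key against each hash_node in that bucket in turn; if a node equal to the input element is found, return it; otherwise add a new node and return that.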
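The layout and the three steps above can be sketched in a few dozen lines of standard C++11. This is a minimal illustration and not the SGI source: the class name ChainedSet, its member names, and the default of 53 buckets are assumptions made for the example, and std::hash / std::equal_to stand in for SGI's hash and equal_to.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical separate-chaining set; the bucket count must be non-zero.
template <class Value,
          class HashFcn = std::hash<Value>,
          class EqualKey = std::equal_to<Value> >
class ChainedSet {
    struct Node {
        Value val;   // the stored element
        Node* next;  // next node in the same bucket
    };

    std::vector<Node*> buckets_;  // each entry is the head of one chain
    HashFcn hash_;
    EqualKey equals_;
    std::size_t size_ = 0;

public:
    explicit ChainedSet(std::size_t n = 53) : buckets_(n, nullptr) {}
    ChainedSet(const ChainedSet&) = delete;  // copying is left out of the sketch
    ChainedSet& operator=(const ChainedSet&) = delete;

    ~ChainedSet() {
        for (Node* head : buckets_) {
            while (head != nullptr) {
                Node* next = head->next;
                delete head;
                head = next;
            }
        }
    }

    // Steps 1 and 2: hash the key, then reduce it modulo the bucket count.
    std::size_t bucket_index(const Value& v) const {
        return hash_(v) % buckets_.size();
    }

    // Step 3: walk the chain; return the match, or add a new node.
    const Value& find_or_insert(const Value& v) {
        const std::size_t b = bucket_index(v);
        for (Node* cur = buckets_[b]; cur != nullptr; cur = cur->next) {
            if (equals_(cur->val, v)) return cur->val;
        }
        buckets_[b] = new Node{v, buckets_[b]};  // link in at the head of the chain
        ++size_;
        return buckets_[b]->val;
    }

    bool contains(const Value& v) const {
        for (Node* cur = buckets_[bucket_index(v)]; cur != nullptr; cur = cur->next) {
            if (equals_(cur->val, v)) return true;
        }
        return false;
    }

    std::size_t size() const { return size_; }
};
```

Calling find_or_insert twice with the same value leaves size() unchanged the second time, since the chain walk finds the existing node before anything is allocated.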

SGI STL implements hashtable and builds hash_set, hash_map, hash_multiset, and hash_multimap on top of it. To keep things simple, we use hash_set as the example. The hash_set class template takes four parameters:

    template <class Value,                       // in a hash_set the key is the value itself
              class HashFcn = hash<Value>,       // hash functor that does the actual hashing; defaults to hash<Value>
              class EqualKey = equal_to<Value>,  // comparison functor used to find an element inside a bucket after a collision; defaults to equal_to<Value>
              class Alloc = alloc>               // memory allocator; has a default argument
    class hash_set { ... };

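SGI STL supplies hash functions for the integer types char, short, int, and long, together with their signed and unsigned versions. Other types, such as string, float, and double, are not handled and need a user-defined hash function (and a comparison function too, wherever == does not do the right thing). The integer hash functions actually do nothing at all and simply return the value unchanged. For C strings (const char *), however, a simple conversion function is provided: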

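    // The following is defined in <stl_hash_fun.h>
    template <class Key> struct hash { };

    inline size_t __stl_hash_string(const char *s)
    {
        unsigned long h = 0;
        for (; *s; ++s)
            h = 5 * h + *s;
        return size_t(h);
    }

That really is about as simple as it gets. Even with ASCII characters limited to the range 0~127, different strings can end up with the same hash value. In addition, while the built-in hash function is fine for C strings, the built-in comparison functor equal_to<T> is not: it only checks whether the two pointers are equal and never compares the strings character by character. We can build a comparison functor on strcmp() instead: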

    struct cmp
    {
        // Compare the characters, not the pointer values.
        bool operator()(const char *s1, const char *s2) const
        {
            return strcmp(s1, s2) == 0;
        }
    };

With that in place, a lookup function can use find():

    void lookup(const hash_set<const char *, hash<const char *>, cmp> &se,
                const char *word)
    {
        hash_set<const char *, hash<const char *>, cmp>::const_iterator iter
            = se.find(word);
        cout << " " << word << ": "
             << (iter != se.end() ? "present" : "not present")
             << endl;
    }

A complete example follows:

    #include <iostream>
    #include <string.h>
    #include <hash_set>
    using namespace std;
    using namespace __gnu_cxx;

    struct cmp
    {
        bool operator()(const char *s1, const char *s2) const
        {
            return strcmp(s1, s2) == 0;
        }
    };

    int main()
    {
        // Test 1: default equal_to<const char*>, which compares pointer values.
        cout << "------------------Test 1------------------" << endl;
        hash_set<const char*> se;
        hash_set<const char*>::const_iterator iter;
        se.insert("kiwi");
        char c1[] = "kiwi";        // a separate array with its own address
        const char *c2 = "kiwi";   // points at a string literal
        iter = se.find(c1);
        if (iter != se.end())
        {
            cout << "Find c1" << endl;
        }
        iter = se.find(c2);
        if (iter != se.end())
        {
            cout << "Find c2" << endl;
        }

        // Test 2: cmp compares the characters with strcmp().
        cout << "------------------Test 2------------------" << endl;
        hash_set<const char*, hash<const char*>, cmp> newSe;
        hash_set<const char*, hash<const char*>, cmp>::const_iterator newIter;
        newSe.insert("kiwi");
        char c3[] = "kiwi";
        const char *c4 = "kiwi";
        newIter = newSe.find(c3);
        if (newIter != newSe.end())
        {
            cout << "Find c3" << endl;
        }
        newIter = newSe.find(c4);
        if (newIter != newSe.end())
        {
            cout << "Find c4" << endl;
        }
        return 0;
    }

The output is:

    ------------------Test 1------------------
    Find c2
    ------------------Test 2------------------
    Find c3
    Find c4

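Sure enough, the default equal_to<Value> compares only the pointer values, so c1, a separate array with its own address, is not found in Test 1. c2 is found there only because the compiler happened to store the identical "kiwi" literals at a single address, which is not guaranteed. To compare the contents, strcmp() has to be used instead.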
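The difference between the two comparisons can also be checked without any hash container. The sketch below uses only the standard library; the helper names same_address and same_text are made up for this illustration.

```cpp
#include <cstring>
#include <functional>

// Pointer comparison: true only when both arguments hold the same address.
// This is what equal_to<const char*> does.
bool same_address(const char* a, const char* b) {
    return std::equal_to<const char*>()(a, b);
}

// Content comparison: true when the two C strings spell the same text.
// This is what the strcmp()-based cmp functor does.
bool same_text(const char* a, const char* b) {
    return std::strcmp(a, b) == 0;
}
```

Given `char buf[] = "kiwi";` and a pointer to the literal "kiwi", same_address reports false because the array has storage of its own, while same_text reports true.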

The next example shows how to add a hash function for string:

    #include <iostream>
    #include <string>
    #include <hash_set>
    using namespace std;
    using namespace __gnu_cxx;

    namespace __gnu_cxx
    {
        // Specialize hash for string, using the same 5 * h + c scheme
        // as the const char* version.
        template<> struct hash<string>
        {
            size_t operator()(const string &str) const
            {
                size_t res = 0;
                for (size_t i = 0; i < str.size(); ++i)
                    res = res * 5 + str[i];
                return res;
            }
        };
    }

    int main()
    {
        hash_set<string, hash<string>, equal_to<string> > seString;
        hash_set<string, hash<string>, equal_to<string> >::const_iterator iter;
        seString.insert(string("kiwi"));
        seString.insert(string("plum"));
        seString.insert(string("apple"));
        seString.insert(string("mango"));
        seString.insert(string("apricot"));
        seString.insert(string("banana"));
        for (iter = seString.begin(); iter != seString.end(); ++iter)
        {
            cout << *iter << endl;
        }
        return 0;
    }

This hash function is the same as hash<const char *>, so the output above matches the output shown on p. 274 of 《STL源码剖析》.
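Because both versions use the same h = 5 * h + c recurrence, it is easy to construct short strings that collide. The sketch below reimplements the recurrence under the made-up name times5_hash so that it compiles without the SGI headers; for a two-character ASCII string the result is just 5 * c0 + c1.

```cpp
#include <cstddef>

// Same recurrence as __stl_hash_string: h = 5 * h + c for each character.
std::size_t times5_hash(const char* s) {
    unsigned long h = 0;
    for (; *s; ++s) h = 5 * h + *s;
    return static_cast<std::size_t>(h);
}
```

For instance, "ba" gives 5 * 98 + 97 = 587 and "af" gives 5 * 97 + 102 = 587, so the two strings always share a bucket and can only be told apart by the comparison functor.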

A few points I am still not clear about:

1. The namespace issue;

2. Why a default equal_to<string> is available.
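On the second point, one likely explanation is that equal_to<T> is a generic template that simply evaluates a == b, and std::string defines operator== to compare contents, so no string-specific version has to be written. The sketch below checks this with the standard library only; the helper name strings_match is made up for the illustration.

```cpp
#include <functional>
#include <string>

// std::equal_to<T> evaluates a == b. For std::string, operator== compares
// the characters, so two distinct objects holding the same text match.
bool strings_match(const std::string& a, const std::string& b) {
    return std::equal_to<std::string>()(a, b);
}
```

This is the opposite of the const char * case, where == on the keys compares addresses and a strcmp()-based functor is needed.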

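One thing needs particular care: when the key of the nodes stored in a hash_set is a const char *, memory management becomes an issue. The const char * passed to insert must stay valid, and the characters it points to must not be modified, for as long as the element is in the table. Otherwise key comparison inside the bucket goes badly wrong: at best the element can no longer be found, at worst the program crashes with a core dump. For more on this point, see the article "Avoid using char* as a hash_map key whenever possible".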

When reposting, please credit the original address: https://ju.6miu.com/read-678981.html
