HBase table-level metadata consistency and how hbck works

xiaoxiao2021-03-25 93

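I recently returned to the familiar world of HBase, and it stirred up a lot of feelings. First, I can finally settle down and focus on technology again; second, seeing today's driven, ambitious young engineers is like seeing my former self. Big data needs to be passed on from one generation to the next.

Recently, for easier cluster management and better resource utilization, we deployed the region group patch, backporting a patch that was only merged in 2.0 to our 0.98 build. It worked well at first, but we did hit one problem: when migrating a table between groups, the metadata shown on the master page was wrong, while the actual region assignment was fine.

To understand why this can happen, we need to look at how HBase manages table and region metadata.

First, the metadata most people know about is what is stored in ZK.

1. ZK metadata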

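The top-level znode is /hbase. Its second-level child znodes are:

/meta-region-server ## location of the rs hosting the meta table. The original Bigtable paper had two levels, a root table pointing to meta tables, and HBase used to have both as well. It later turned out that a single meta table can index enough regions, and that an extra root level only adds one more routing hop for no benefit, so it was dropped

/acl ## child nodes store table-level and namespace-level access control; the next level down stores which users hold which permissions

/backup-masters ## child nodes store the address, port, and start time of each standby master

/table ## child nodes store every table in the cluster, whether enabled or not

/draining ## region servers that are being drained; typically used when taking several region servers offline

/region-in-transition ## regions that are in the middle of a transition (split/online/offline/compact, etc.)

/running ## whether the HBase cluster is running normally

/table-lock ## table lock information, used while a table is being altered

/master ## address of the cluster's master

/balancer ## whether the load balancer is switched on

/namespace ## all current namespaces

/hbaseid ## the unique id generated when the cluster starts

/online-snapshot ## online snapshots

/replication ## HBase replication configuration, with two pieces of metadata: rs and peers

/groupInfo ## the stored group information

/splitWAL ## used to build a region server's splitlog directory

/recovering-regions ## regions that are being recovered

/rs ## all region servers that are currently online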

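2. Metadata in the meta table

The meta table stores the state of every region.

The rowkey has the form namespace:tablename,startkey,timestamp.md5. (the start key is empty for a table's first region, which is why such rows look like namespace:tablename,,timestamp.md5.)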
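To make the rowkey layout concrete, here is a minimal Java sketch that splits such a rowkey into its parts. The class name RegionNameSketch and the sample rowkey are made up for illustration; this is not HBase's own parser, and it assumes the table name itself contains no comma.

```java
// Illustrative only: splits "table,startkey,timestamp.md5." into its parts.
final class RegionNameSketch {
    final String table;
    final String startKey;
    final long regionId;      // the timestamp part
    final String encodedName; // the md5 part

    private RegionNameSketch(String table, String startKey, long regionId, String encodedName) {
        this.table = table;
        this.startKey = startKey;
        this.regionId = regionId;
        this.encodedName = encodedName;
    }

    static RegionNameSketch parse(String rowKey) {
        // The table name ends at the first comma; the id part starts after the
        // last comma, because the start key in between may itself contain commas.
        int firstComma = rowKey.indexOf(',');
        int lastComma = rowKey.lastIndexOf(',');
        if (firstComma < 0 || firstComma == lastComma) {
            throw new IllegalArgumentException("not a region rowkey: " + rowKey);
        }
        String tail = rowKey.substring(lastComma + 1);
        int dot = tail.indexOf('.');
        if (dot < 0 || dot == tail.length() - 1 || !tail.endsWith(".")) {
            throw new IllegalArgumentException("missing encoded name: " + rowKey);
        }
        return new RegionNameSketch(
                rowKey.substring(0, firstComma),
                rowKey.substring(firstComma + 1, lastComma),
                Long.parseLong(tail.substring(0, dot)),
                tail.substring(dot + 1, tail.length() - 1));
    }

    public static void main(String[] args) {
        RegionNameSketch r =
                parse("ns1:orders,row500,1490000000000.5f4dcc3b5aa765d61d8327deb882cf99.");
        System.out.println(r.table + " | " + r.startKey + " | " + r.regionId + " | " + r.encodedName);
    }
}
```

Scanning from both ends is the point of the sketch: only the table name and the trailing id are delimited reliably, so the start key is whatever lies between them.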

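The column family is info, with these qualifiers:

server: the location of the regionserver that hosts the region

serverstartcode: the startcode from when that server started. After every restart an rs is no longer "itself" but a new rs identified by its startcode, so its regions have to be reassigned

regioninfo: the region's ENCODED name, STARTKEY, and ENDKEY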
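The role of serverstartcode is easier to see in code. HBase prints a server's full name as host,port,startcode; the sketch below models that triple with a hypothetical ServerNameSketch class (not the HBase ServerName class) and shows that a restarted process on the same host and port counts as a different server.

```java
import java.util.Objects;

// Illustrative model of a "host,port,startcode" server identity.
final class ServerNameSketch {
    final String host;
    final int port;
    final long startCode;

    ServerNameSketch(String host, int port, long startCode) {
        this.host = host;
        this.port = port;
        this.startCode = startCode;
    }

    static ServerNameSketch parse(String text) {
        String[] parts = text.split(",");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected host,port,startcode: " + text);
        }
        return new ServerNameSketch(parts[0], Integer.parseInt(parts[1]), Long.parseLong(parts[2]));
    }

    // Same machine and port, regardless of which incarnation of the process.
    boolean sameHostAndPort(ServerNameSketch other) {
        return host.equals(other.host) && port == other.port;
    }

    // Full identity includes the startcode, so a restart yields a new server.
    @Override
    public boolean equals(Object o) {
        if (!(o instanceof ServerNameSketch)) {
            return false;
        }
        ServerNameSketch other = (ServerNameSketch) o;
        return sameHostAndPort(other) && startCode == other.startCode;
    }

    @Override
    public int hashCode() {
        return Objects.hash(host, port, startCode);
    }

    public static void main(String[] args) {
        ServerNameSketch before = parse("rs1.example.com,60020,1490000000000");
        ServerNameSketch after = parse("rs1.example.com,60020,1490000360000");
        System.out.println("same host and port: " + before.sameHostAndPort(after));
        System.out.println("same server:        " + before.equals(after));
    }
}
```

A meta row that still points at the old startcode therefore refers to a server that no longer exists, which is why the region needs a new assignment.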

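3. Metadata in the HDFS directories

/hbase/.tmp ## temporary directory, generally holding temporary files produced during operations such as compact and split

/hbase/WALs ## the WAL logs of every regionserver; each subdirectory is named after one rs

/hbase/archive ## HFiles no longer needed after compaction; the data of deleted tables ends up here too. Files are deleted once they expire (5 minutes). It can be used to recover an accidentally dropped table, and HFiles referenced by snapshots are kept here as well

/hbase/corrupt ## path for corrupt files

/hbase/data ## all table data lives under data

/hbase/hbase.id ## the cluster's unique id, generated when the cluster is first initialized and kept across restarts

/hbase/hbase.version ## HBase version number

/hbase/oldWALs ## expired WAL logs waiting to be cleaned up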
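Under /hbase/data, each region gets its own directory, laid out as data/namespace/table/encoded-region-name, with one subdirectory per column family and a .regioninfo file describing the region. A small sketch of how those paths compose follows; the class name HdfsLayoutSketch and the sample values are illustrative, and plain strings stand in for Hadoop Path objects.

```java
// Illustrative only: builds the on-disk paths for one region as plain strings.
final class HdfsLayoutSketch {
    static String tableDir(String rootDir, String namespace, String table) {
        return rootDir + "/data/" + namespace + "/" + table;
    }

    static String regionDir(String rootDir, String namespace, String table, String encodedRegion) {
        return tableDir(rootDir, namespace, table) + "/" + encodedRegion;
    }

    static String familyDir(String rootDir, String namespace, String table,
                            String encodedRegion, String family) {
        return regionDir(rootDir, namespace, table, encodedRegion) + "/" + family;
    }

    // The per-region descriptor file kept inside the region directory.
    static String regionInfoFile(String rootDir, String namespace, String table, String encodedRegion) {
        return regionDir(rootDir, namespace, table, encodedRegion) + "/.regioninfo";
    }

    public static void main(String[] args) {
        String encoded = "5f4dcc3b5aa765d61d8327deb882cf99";
        System.out.println(regionDir("/hbase", "ns1", "orders", encoded));
        System.out.println(familyDir("/hbase", "ns1", "orders", encoded, "info"));
        System.out.println(regionInfoFile("/hbase", "ns1", "orders", encoded));
    }
}
```

Because the region directory name is the same encoded name that closes the meta rowkey, the HDFS layout is a second, independent record of which regions exist.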

4. Data in HMaster memory

This part is rarely covered anywhere, yet it is very important!

infoServer: holds the information the web UI needs

ZookeeperWatcher: keeps the connection to ZooKeeper

activeMasterManager: manages and records the currently active master

regionServerTracker: tracks regionservers

drainingServerTracker: tracks regionservers in the draining state

groupAdminServer: region group metadata

tableNamespaceManager: namespace metadata

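5. The HBck check process

Given the metadata described above, you may have noticed that the same information is stored in four different places in HBase, so the copies can disagree. Let's look, from the point of view of HBase's built-in hbck, at what HBase considers a metadata anomaly and how it goes about repairing it.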
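As a rough illustration of what "the copies disagree" means for one region, the sketch below compares three views: the server recorded in meta, the set of region directories on HDFS, and the servers that actually report the region as deployed. It is a simplification written for this article, not hbck's code; the class name, the Verdict labels, and the sample data are all made up, and real hbck distinguishes more cases.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative only: classifies one region by comparing three metadata views.
final class ConsistencySketch {
    enum Verdict { OK, UNKNOWN_REGION, NOT_IN_META, NOT_IN_HDFS, NOT_DEPLOYED, MULTI_DEPLOYED, SERVER_MISMATCH }

    static Verdict classify(String region,
                            Map<String, String> metaAssignment,   // region -> server recorded in meta
                            Set<String> hdfsRegions,              // regions with a directory on HDFS
                            Map<String, List<String>> deployed) { // region -> servers hosting it
        boolean inMeta = metaAssignment.containsKey(region);
        boolean inHdfs = hdfsRegions.contains(region);
        List<String> servers = deployed.getOrDefault(region, Collections.<String>emptyList());

        if (!inMeta && !inHdfs && servers.isEmpty()) {
            return Verdict.UNKNOWN_REGION;
        }
        if (!inMeta) {
            return Verdict.NOT_IN_META;
        }
        if (!inHdfs) {
            return Verdict.NOT_IN_HDFS;
        }
        if (servers.isEmpty()) {
            return Verdict.NOT_DEPLOYED;
        }
        if (servers.size() > 1) {
            return Verdict.MULTI_DEPLOYED;
        }
        return servers.get(0).equals(metaAssignment.get(region)) ? Verdict.OK : Verdict.SERVER_MISMATCH;
    }

    public static void main(String[] args) {
        Map<String, String> meta = new HashMap<>();
        meta.put("r1", "rs1");
        meta.put("r2", "rs1");
        Set<String> hdfs = new HashSet<>(Arrays.asList("r1", "r2", "r3"));
        Map<String, List<String>> deployed = new HashMap<>();
        deployed.put("r1", Arrays.asList("rs1"));
        deployed.put("r2", Arrays.asList("rs2"));

        for (String region : Arrays.asList("r1", "r2", "r3")) {
            System.out.println(region + " -> " + classify(region, meta, hdfs, deployed));
        }
    }
}
```

Each verdict other than OK corresponds to a different repair: adding or removing a meta row, rebuilding from HDFS, or assigning and unassigning the region.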

Only the core checks are analyzed here; the preparation steps that precede them are skipped.

    // do the real work of hbck
    connect();
    try {
      // if corrupt file mode is on, first fix them since they may be opened later
      if (checkCorruptHFiles || sidelineCorruptHFiles) {
        LOG.info("Checking all hfiles for corruption");
        HFileCorruptionChecker hfcc = createHFileCorruptionChecker(sidelineCorruptHFiles);
        setHFileCorruptionChecker(hfcc); // so we can get result
        Collection<TableName> tables = getIncludedTables();
        Collection<Path> tableDirs = new ArrayList<Path>();
        Path rootdir = FSUtils.getRootDir(getConf());
        if (tables.size() > 0) {
          for (TableName t : tables) {
            tableDirs.add(FSUtils.getTableDir(rootdir, t));
          }
        } else {
          tableDirs = FSUtils.getTableDirs(FSUtils.getCurrentFileSystem(getConf()), rootdir);
        }
        hfcc.checkTables(tableDirs);
        hfcc.report(errors);
      }
      // Up to this point: as the very first check, verify that the HFiles are well-formed

      // check and fix table integrity, region consistency.
      int code = onlineHbck(); // onlineHbck is called here to run the online checks
      setRetCode(code);
      // If we have changed the HBase state it is better to run hbck again
      // to see if we haven't broken something else in the process.
      // We run it only once more because otherwise we can easily fall into
      // an infinite loop.
      if (shouldRerun()) {
        try {
          LOG.info("Sleeping " + sleepBeforeRerun + "ms before re-checking after fix...");
          Thread.sleep(sleepBeforeRerun);
        } catch (InterruptedException ie) {
          return this;
        }
        // Just report
        setFixAssignments(false);
        setFixMeta(false);
        setFixHdfsHoles(false);
        setFixHdfsOverlaps(false);
        setFixVersionFile(false);
        setFixTableOrphans(false);
        errors.resetErrors();
        code = onlineHbck();
        setRetCode(code);
      }
    } finally {
      IOUtils.cleanup(null, connection, meta, admin);
    }
    return this;

    ----------------------------------------

    /**
     * Contacts the master and prints out cluster-wide information
     * @return 0 on success, non-zero on failure
     */
    public int onlineHbck() throws IOException, KeeperException, InterruptedException, ServiceException {
      // print hbase server version
      errors.print("Version: " + status.getHBaseVersion());
      offlineHdfsIntegrityRepair(); // checks whether the HBase tables' storage paths on HDFS are well-formed

      // turn the balancer off
      boolean oldBalancer = admin.setBalancerRunning(false, true);
      try {
        onlineConsistencyRepair();
      } finally {
        admin.setBalancerRunning(oldBalancer, false);
      }

      if (checkRegionBoundaries) {
        checkRegionBoundaries();
      }

      offlineReferenceFileRepair();

      checkAndFixTableLocks();

      // Check (and fix if requested) orphaned table ZNodes
      checkAndFixOrphanedTableZNodes();

      // Remove the hbck lock
      unlockHbck();

      // Print table summary
      printTableSummary(tablesInfo);
      return errors.summarize();
    }

    ---------------------------------------- checkAndFixConsistency(); ----------------------------------------

    private void checkRegionConsistencyConcurrently(
        final List<CheckRegionConsistencyWorkItem> workItems)
        throws IOException, KeeperException, InterruptedException {
      if (workItems.isEmpty()) {
        return; // nothing to check
      }
      // workItems are the tasks that actually carry out the repairs
      List<Future<Void>> workFutures = executor.invokeAll(workItems);
      for (Future<Void> f : workFutures) {
        try {
          f.get();
        } catch (ExecutionException e1) {
          LOG.warn("Could not check region consistency ", e1.getCause());
          if (e1.getCause() instanceof IOException) {
            throw (IOException) e1.getCause();
          } else if (e1.getCause() instanceof KeeperException) {
            throw (KeeperException) e1.getCause();
          } else if (e1.getCause() instanceof InterruptedException) {
            throw (InterruptedException) e1.getCause();
          } else {
            throw new IOException(e1.getCause());
          }
        }
      }
    }

6. Thoughts

As things stand, HBase does not keep its metadata in one central place, so a failed operation can leave the copies inconsistent. HBCK is provided as the way to repair them, but it neither checks nor repairs the metadata of the new region group feature; that is left for later improvement. Beyond that, spreading the data around like this remains a challenge for HBase's consistency.
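One possible shape for the missing region group check is sketched below: given the group metadata and the actual region deployment, report every region that sits on a server outside its table's group. This is purely hypothetical. The class name, method name, and data shapes are invented for this sketch and are not part of hbck or of the region group patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical check: do deployed regions agree with the region group metadata?
final class GroupPlacementSketch {
    static List<String> findMisplaced(
            Map<String, Set<String>> groupServers,            // group -> servers in that group
            Map<String, String> tableToGroup,                 // table -> group it belongs to
            Map<String, Map<String, String>> deployment) {    // table -> (region -> hosting server)
        List<String> problems = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> tableEntry : deployment.entrySet()) {
            String table = tableEntry.getKey();
            String group = tableToGroup.get(table);
            Set<String> allowed = group == null ? null : groupServers.get(group);
            if (allowed == null) {
                // The table has no group, or its group is unknown.
                problems.add(table + ": no usable group metadata");
                continue;
            }
            for (Map.Entry<String, String> region : tableEntry.getValue().entrySet()) {
                if (!allowed.contains(region.getValue())) {
                    problems.add(table + "/" + region.getKey() + ": on " + region.getValue()
                            + ", outside group " + group);
                }
            }
        }
        Collections.sort(problems); // stable, readable report order
        return problems;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> groupServers = new HashMap<>();
        groupServers.put("groupA", new HashSet<>(Arrays.asList("rs1", "rs2")));
        groupServers.put("groupB", new HashSet<>(Arrays.asList("rs3")));

        Map<String, String> tableToGroup = new HashMap<>();
        tableToGroup.put("orders", "groupB"); // metadata says the table was moved to groupB

        Map<String, String> ordersRegions = new HashMap<>();
        ordersRegions.put("r1", "rs3");
        ordersRegions.put("r2", "rs1"); // still served from a groupA server
        Map<String, Map<String, String>> deployment = new HashMap<>();
        deployment.put("orders", ordersRegions);

        for (String problem : findMisplaced(groupServers, tableToGroup, deployment)) {
            System.out.println(problem);
        }
    }
}
```

The check only reports a disagreement; deciding which side is wrong (the group metadata or the assignment) would still be up to the operator, much like running hbck without any fix options.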

When reposting, please credit the original URL: https://ju.6miu.com/read-21564.html
