hive 高级数据类型使用

    xiaoxiao2026-03-04  7

    hive 高级数据类型使用

    用了许久的hive,但是一直都是简单的sql join,sort, order by等,今天有一个业务场景需要使用array数据类型存储数据并进行横表转纵表的转换。mark下以后用了可以查询。  数据样子是这样的。

    ID type_flag tags 10001 3 11_20_30,11_22_34,12_23_30,13_24_36 10002 2 11_20,11_22,12_23,13_24 10003 1 11,12

    表格1

    需要转化成的样子如下:

    ID type_flag tag1 tag2 tag3 10001 3 11 20 30 10001 3 11 22 34 10001 3 12 23 30 10001 3 13 24 36 10002 2 11 20   10002 2 11 22   10002 2 12 23   10002 2 13 24   10003 1 11     10003 1 12    

    表格2

    创建表tmp_type_tags存储表格1的数据 <code class="hljs cpp has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: "Source Code Pro", monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal; background: transparent;">create table tmp.tmp_type_tags ( id bigint , type_flag <span class="hljs-built_in" style="color: rgb(102, 0, 102); box-sizing: border-box;">string</span>, tags <span class="hljs-stl_container" style="box-sizing: border-box;"><span class="hljs-built_in" style="color: rgb(102, 0, 102); box-sizing: border-box;">array</span> < <span class="hljs-built_in" style="color: rgb(102, 0, 102); box-sizing: border-box;">string</span> ></span> ) row format delimited fields terminated by <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'\t'</span> COLLECTION ITEMS TERMINATED BY <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">','</span> stored as textfile;</code><ul class="pre-numbering" style="box-sizing: border-box; position: absolute; width: 50px; top: 0px; left: 0px; margin: 0px; padding: 6px 0px 40px; border-right: 1px solid rgb(221, 221, 221); list-style: none; text-align: right; background-color: rgb(238, 238, 238);"><li style="box-sizing: border-box; padding: 0px 5px;">1</li><li style="box-sizing: border-box; padding: 0px 5px;">2</li><li style="box-sizing: border-box; padding: 0px 5px;">3</li><li style="box-sizing: border-box; padding: 0px 5px;">4</li><li style="box-sizing: border-box; padding: 0px 5px;">5</li><li style="box-sizing: border-box; padding: 0px 5px;">6</li><li style="box-sizing: border-box; padding: 0px 5px;">7</li><li style="box-sizing: border-box; padding: 0px 5px;">8</li></ul> 创建表tmp_type_tag_split存储转化中间的数据 <code class="hljs cpp has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: "Source Code Pro", monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal; background: transparent;">create table tmp.tmp_type_tag_split ( id bigint , type_flag <span class="hljs-built_in" style="color: rgb(102, 0, 102); box-sizing: border-box;">string</span>, tag <span class="hljs-stl_container" style="box-sizing: border-box;"><span class="hljs-built_in" style="color: rgb(102, 0, 102); box-sizing: border-box;">array</span> < <span class="hljs-built_in" style="color: rgb(102, 0, 102); box-sizing: border-box;">string</span> ></span> ) row format delimited fields terminated by <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'\t'</span> COLLECTION ITEMS TERMINATED BY <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'_'</span> stored as textfile;</code><ul class="pre-numbering" style="box-sizing: border-box; position: absolute; width: 50px; top: 0px; left: 0px; margin: 0px; padding: 6px 0px 40px; border-right: 1px solid rgb(221, 221, 221); list-style: none; text-align: right; background-color: rgb(238, 238, 238);"><li style="box-sizing: border-box; padding: 0px 5px;">1</li><li style="box-sizing: border-box; padding: 0px 5px;">2</li><li style="box-sizing: border-box; padding: 0px 5px;">3</li><li style="box-sizing: border-box; padding: 0px 5px;">4</li><li style="box-sizing: border-box; padding: 0px 5px;">5</li><li style="box-sizing: border-box; padding: 0px 5px;">6</li><li style="box-sizing: border-box; padding: 0px 5px;">7</li><li style="box-sizing: border-box; padding: 0px 5px;">8</li></ul> 第一步横转纵  将数据按照第一层分隔符,转化成数据size那么多的行数。sql如下 <code class="hljs sql has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: "Source Code Pro", monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal; background: transparent;"><span class="hljs-operator" style="box-sizing: border-box;"><span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">insert</span> overwrite <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">table</span> tmp.tmp_type_tag_split <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">select</span> id , type_flag, split(tag0, <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'_'</span>) <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">from</span> tmp.tmp_type_tags lateral <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">view</span> explode(tags) r1 <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">as</span> tag0</span></code><ul class="pre-numbering" style="box-sizing: border-box; position: absolute; width: 50px; top: 0px; left: 0px; margin: 0px; padding: 6px 0px 40px; border-right: 1px solid rgb(221, 221, 221); list-style: none; text-align: right; background-color: rgb(238, 238, 238);"><li style="box-sizing: border-box; padding: 0px 5px;">1</li><li style="box-sizing: border-box; padding: 0px 5px;">2</li><li style="box-sizing: border-box; padding: 0px 5px;">3</li><li style="box-sizing: border-box; padding: 0px 5px;">4</li><li style="box-sizing: border-box; padding: 0px 5px;">5</li><li style="box-sizing: border-box; padding: 0px 5px;">6</li><li style="box-sizing: border-box; padding: 0px 5px;">7</li></ul>

    数据结果: 

    第二步按照类型进行分列 <code class="hljs sql has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: "Source Code Pro", monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal; background: transparent;"> <span class="hljs-operator" style="box-sizing: border-box;"><span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">create</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">table</span> tmp.tmp_type_tag_split_info <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">as</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">select</span> id , type_flag, (<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">case</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">when</span> type_flag <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">in</span>(<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">1</span>, <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">2</span>, <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">3</span>) <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">then</span> tag[<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">0</span>] <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">else</span> <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">''</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">end</span>) <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">as</span> tag1, (<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">case</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">when</span> type_flag <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">in</span>(<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">2</span>, <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">3</span>) <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">then</span> tag[<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">1</span>] <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">else</span> <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">''</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">end</span>) <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">as</span> tag2, (<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">case</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">when</span> type_flag <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">in</span>(<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">3</span>) <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">then</span> tag[<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">2</span>] <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">else</span> <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">''</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">end</span>) <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">as</span> tag3 <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">from</span> tmp.tmp_type_tag_split</span></code><ul class="pre-numbering" style="box-sizing: border-box; position: absolute; width: 50px; top: 0px; left: 0px; margin: 0px; padding: 6px 0px 40px; border-right: 1px solid rgb(221, 221, 221); list-style: none; text-align: right; background-color: rgb(238, 238, 238);"><li style="box-sizing: border-box; padding: 0px 5px;">1</li><li style="box-sizing: border-box; padding: 0px 5px;">2</li><li style="box-sizing: border-box; padding: 0px 5px;">3</li><li style="box-sizing: border-box; padding: 0px 5px;">4</li><li style="box-sizing: border-box; padding: 0px 5px;">5</li><li style="box-sizing: border-box; padding: 0px 5px;">6</li><li style="box-sizing: border-box; padding: 0px 5px;">7</li><li style="box-sizing: border-box; padding: 0px 5px;">8</li><li style="box-sizing: border-box; padding: 0px 5px;">9</li></ul>

    最终表数据结果如下:转换完成 

    转载请注明原文地址: https://ju.6miu.com/read-1307591.html
    最新回复(0)