Spark SQL-Pivot

2021-08-09


Similar to an Excel pivot table, PIVOT groups and aggregates data by category. It can also turn rows into columns, i.e. pivot data from "long" to "wide" format.

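Statistical analysis regularly calls for grouped subtotals, much like an Excel pivot table. Written in SQL with CASE WHEN or IF, such queries tend to get bloated; PIVOT ¹ is a more convenient way to express them, but Hive does not support it 😢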
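For comparison, here is a minimal sketch of the CASE WHEN approach, run through Python's built-in sqlite3 as a stand-in engine. The table layout (region, month, product, sales) and all rows are hypothetical. The assumption is that the same SUM(CASE WHEN ...) pattern carries over to Hive and Spark SQL, with backticks instead of double quotes around the column aliases. Note that every product value needs its own hand-written line.

```python
import sqlite3

# Hypothetical sample rows: (region, month, product, sales).
ROWS = [
    ("north", "2021-01", "毛巾", 10),  # 毛巾 = towel
    ("south", "2021-01", "毛巾", 5),
    ("north", "2021-01", "肥皂", 3),   # 肥皂 = soap
    ("south", "2021-02", "肥皂", 7),
]

# Conditional aggregation: one SUM(CASE WHEN ...) per product value.
# No ELSE branch, so a month with no sales of a product yields NULL.
QUERY = """
SELECT month,
       SUM(CASE WHEN product = '毛巾' THEN sales END) AS "毛巾",
       SUM(CASE WHEN product = '肥皂' THEN sales END) AS "肥皂"
FROM sales_table
GROUP BY month          -- region is not grouped, so regions are summed together
ORDER BY month
"""


def monthly_totals():
    """Load the sample rows into an in-memory table and run QUERY."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales_table (region TEXT, month TEXT, product TEXT, sales INTEGER)"
    )
    conn.executemany("INSERT INTO sales_table VALUES (?, ?, ?, ?)", ROWS)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in monthly_totals():
        print(row)
```

Adding a third product means editing the query by hand, which is exactly the bloat PIVOT avoids. Missing month/product combinations come back as NULL (None in Python).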

Everything below applies to spark-2.4.5U3 and later.

Basic syntax

PIVOT ( { aggregate_expression [ AS aggregate_expression_alias ] } [ , ... ]
       FOR column_list IN ( expression_list ) )

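The PIVOT clause can be specified after the table name or subquery. Every input column that appears neither in an aggregate expression nor in column_list becomes an implicit grouping column.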
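To make the implicit grouping concrete, here is a plain-Python model of what a PIVOT query computes. It is a simplified sketch, not Spark's implementation: the function name `pivot`, its parameters and the sample rows are hypothetical. The column-naming rule is modeled on Spark's behavior (a single aggregate names each output column after its value, several aggregates produce `<value>_<alias>`). Aliases inside the IN list, null pivot values and multi-column pivots are not modeled.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List, Sequence, Tuple

Row = Dict[str, Any]
# (alias, input column, aggregate function), e.g. ("s", "sales", sum)
Aggregate = Tuple[str, str, Callable[[List[Any]], Any]]


def pivot(rows: Sequence[Row], aggregates: Sequence[Aggregate],
          pivot_column: str, values: Sequence[Any]) -> List[Row]:
    """Hypothetical, simplified model of PIVOT (aggs FOR pivot_column IN (values))."""
    if not rows:
        return []
    # Implicit GROUP BY: every column that is neither the pivot column
    # nor the input of an aggregate expression.
    used = {pivot_column} | {col for _, col, _ in aggregates}
    group_cols = [c for c in rows[0] if c not in used]

    # group key -> pivot value -> rows carrying that value
    buckets = defaultdict(lambda: defaultdict(list))
    for r in rows:
        key = tuple(r[c] for c in group_cols)
        # A row whose pivot value is not listed still creates its group,
        # but never feeds any output column.
        buckets[key][r[pivot_column]].append(r)

    result = []
    for key, by_value in buckets.items():
        record: Row = dict(zip(group_cols, key))
        for v in values:
            matched = by_value.get(v, [])
            for alias, col, fn in aggregates:
                # One aggregate: column named after the value.
                # Several aggregates: "<value>_<alias>".
                name = str(v) if len(aggregates) == 1 else f"{v}_{alias}"
                # No matching rows -> None, standing in for SQL NULL.
                record[name] = fn([m[col] for m in matched]) if matched else None
        result.append(record)
    return result


if __name__ == "__main__":
    sales = [
        {"region": "north", "month": "2021-01", "product": "毛巾", "sales": 10},
        {"region": "south", "month": "2021-01", "product": "毛巾", "sales": 5},
        {"region": "north", "month": "2021-01", "product": "肥皂", "sales": 3},
    ]
    # region is left in, so it becomes a grouping column: one row per (region, month).
    print(pivot(sales, [("s", "sales", sum)], "product", ["毛巾", "肥皂"]))
    # Projecting region away first gives one row per month, summed across regions.
    slim = [{k: v for k, v in r.items() if k != "region"} for r in sales]
    print(pivot(slim, [("s", "sales", sum)], "product", ["毛巾", "肥皂"]))
```

The practical takeaway is that any column you do not want to group by has to be projected away in a subquery before PIVOT sees it.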

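Suppose a table, sales_table, stores the monthly sales of each product in each region, and we need, for every month, the total sales of each product summed across all regions, in this shape 👇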

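select  month
      , `毛巾`    -- towel; pivoted columns are identifiers, so use backticks, not quotes
      , `肥皂`    -- soap
from    (
          -- keep only the columns PIVOT needs; any other column (e.g. region)
          -- would become an implicit grouping column
          select  month, product, sales
          from    sales_table
        ) t
pivot   (
          sum(sales)
          for product in ('毛巾', '肥皂')
        )
;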