Browsing All posts tagged under »parquet«

Creating Nested data (Parquet) in Spark SQL/Hive from non-nested data

April 4, 2015


Sometimes you need to create denormalized data from normalized data. For instance, suppose you have data that looks like CREATE TABLE flat ( propertyId string, propertyName string, roomname1 string, roomsize1 int, roomname2 string, roomsize2 int, .. ) but you want something like CREATE TABLE nested ( propertyId string, propertyName string, rooms array<struct<roomname:string,roomsize:int>> ) […]
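
To make the target shape concrete, here is a minimal sketch in plain Python (not Spark or Hive code, and not necessarily the approach the full post takes) of how one flat row maps onto the nested layout. The function name `nest_rooms` and the sample values are hypothetical; the only assumption taken from the schemas above is that room columns follow the `roomnameN` / `roomsizeN` naming pattern.

```python
import re
from typing import Any, Dict

# Matches the numbered room columns, e.g. "roomname1" or "roomsize2".
ROOM_COLUMN = re.compile(r"^(roomname|roomsize)(\d+)$")


def nest_rooms(flat_row: Dict[str, Any]) -> Dict[str, Any]:
    """Fold roomnameN/roomsizeN columns into a single 'rooms' list.

    Hypothetical helper for illustration only: it mimics the
    array<struct<roomname:string,roomsize:int>> column of the nested table.
    """
    nested: Dict[str, Any] = {}
    rooms_by_index: Dict[int, Dict[str, Any]] = {}

    for column, value in flat_row.items():
        match = ROOM_COLUMN.match(column)
        if match is None:
            # Non-room columns (propertyId, propertyName, ...) pass through.
            nested[column] = value
            continue
        field, index = match.group(1), int(match.group(2))
        rooms_by_index.setdefault(index, {})[field] = value

    rooms = []
    for index in sorted(rooms_by_index):
        room = rooms_by_index[index]
        # Skip unused slots where the flat row has no room name.
        if room.get("roomname") is None:
            continue
        size = room.get("roomsize")
        rooms.append({
            "roomname": room["roomname"],
            # Cast to int in case the flat column stores the size as a string.
            "roomsize": int(size) if size is not None else None,
        })

    nested["rooms"] = rooms
    return nested


if __name__ == "__main__":
    flat = {
        "propertyId": "p1",
        "propertyName": "Lake House",
        "roomname1": "kitchen",
        "roomsize1": "12",
        "roomname2": "bedroom",
        "roomsize2": 20,
    }
    print(nest_rooms(flat))
```

Each pair of numbered columns becomes one struct-like dictionary, and empty slots are dropped rather than stored as null entries, so the length of `rooms` reflects how many rooms a property actually has.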