Structured data in Hive: a generic UDF to sort arrays of structs

September 17, 2013

Introduction: Hive has a rich and complex data model that supports maps, arrays and structs, which can be mixed and matched, leading to arbitrarily nested structures, as in JSON. I wrote about a JSON SerDe in another post, and if you use it, you know it can lead to pretty complicated nested tables. Unfortunately, Hive […]

Posted in: programming

A JSON read/write SerDe for Hive

July 11, 2011

Today I finished coding another SerDe for Hive which, with my employer’s permission, I published on GitHub here: https://github.com/rcongiu/Hive-JSON-Serde.git. Since the code is still fresh in my mind, I thought I’d write another article on how to write a SerDe, since the official documentation on how to do it is scarce and you’d have to […]

Posted in: programming

Writing a Hive SerDe for LWES event files

October 27, 2009

I am currently working on setting up an OLAP data warehouse using Hive on top of Hadoop. We have a considerable amount of data coming from the ad servers, on which we need to perform various kinds of analysis. Writing a map-reduce job is not difficult in principle – it’s just time-consuming and […]

Posted in: programming

Data Warehousing Books

October 27, 2009

With the constant increase in the quantity of data that companies collect and need to process, Data Warehousing is a job sector that’s expanding even in the recession. It is also enjoying a second youth, thanks to a number of open source projects that have been slowly but surely gaining popularity in a manner similar […]

Posted in: programming

Joins in Hadoop using CompositeInputFormat

June 7, 2009

One of the first questions that a ‘traditional’ ETL engineer asks when learning Hadoop is, “How do I do a join?” For instance, how can we do in Hadoop something like querying for the names of all employees who are in a California city: SELECT e.name, c.name FROM employees e INNER JOIN cities c […]

Posted in: programming

Setting up a JSF Maven project in NetBeans (including working autocompletion for JSP/JSF)

April 6, 2009

Today’s rich IDEs make a lot of tasks easier… usually. With Java and its IDEs you often end up spending more time than you anticipated just setting up a project, especially when dealing with the complexities of J2EE: there are multiple versions of the specifications (1.3, 1.4, 5.0), each one with multiple implementations by different vendors plus […]

Posted in: programming