Browsing All posts tagged under »java«

A JSON read/write SerDe for Hive

July 11, 2011 by


Today I finished coding another SerDe for Hive which, with my employer’s permission, I published on GitHub here: Since the code is still fresh in my mind, I thought I’d write another article on how to write a SerDe, since the official documentation on how to do it is scarce and you’d have to […]

Writing a Hive SerDe for LWES event files

October 27, 2009 by


I am currently working to set up an OLAP data warehouse using Hive on top of Hadoop. We have a considerable amount of data that comes from the ad servers on which we need to perform various kinds of analysis. Writing a map-reduce job is not difficult in principle – it’s just time-consuming and […]

Data Warehousing Books

October 27, 2009 by


With the constant increase in the quantity of data that companies collect and need to process, data warehousing is a job sector that’s expanding even in the recession. It is also living a second youth, thanks to a number of open source projects that have been slowly but surely gaining popularity in a manner similar […]

Joins in Hadoop using CompositeInputFormat

June 7, 2009 by


One of the first questions that a ‘traditional’ ETL engineer asks when learning Hadoop is, “How do I do a join?” For instance, how can we do in Hadoop something like querying for the names of all employees who are in a California city: SELECT, from employees e INNER JOIN cities c […]

Setting up a JSF Maven project in NetBeans (including working autocompletion for JSP/JSF)

April 6, 2009 by


Today’s rich IDEs make a lot of tasks easier…usually. With Java and its IDEs you often end up spending more time than you anticipated just setting up a project, especially when dealing with the complexities of J2EE: there are multiple versions of the specifications (1.3, 1.4, 5.0), each one with multiple implementations by different vendors plus […]