data warehousing

Writing a SerDe in Hive for Lwes event files

I am currently working to set up an OLAP data warehouse using Hive on top of Hadoop. We have a considerable amount of data coming from the ad servers, and we need to perform various kinds of analysis on it.

Writing a map-reduce job is not difficult in principle; it is just time-consuming and requires the skills of a trained Java engineer, which would not be needed if we were using SQL. That's where Hive comes in: it allows us to query a Hadoop data store using a flavor of SQL.

Data Warehousing books

With the quantity of data that companies collect and need to process constantly increasing, data warehousing is a job sector that's expanding even in the recession. It is also enjoying a second youth, thanks to a number of open source projects that have been slowly but surely gaining popularity, much as Linux did 10 years ago. One of these technologies is Hadoop, a distributed filesystem and data processing framework based on Google's MapReduce paper. Hadoop powers Yahoo! Search, Facebook and many other sites' data warehouses.
