Browsing Archives of Author »Roberto Congiu«

Basic Authorization and htaccess-style authentication on the Play! Framework and Silhouette

August 18, 2018


Silhouette is probably the best library for implementing authentication and authorization within the Play Framework. It is very powerful: you can manage a common identity across multiple providers, so users can log into your site via Google, Facebook, JWT, and many other methods. It also allows you to fine […]

Custom Window Function in Spark to create Session IDs

October 29, 2017


(Note: crossposted from my Nuvolatech Blog.) If you’ve worked with Spark, you have probably written some custom UDFs or UDAFs. UDFs are ‘User Defined Functions’, which let you introduce complex logic in your queries/jobs, for instance to calculate a digest for a string, or to use a Java/Scala library in your queries.

Creating Nested data (Parquet) in Spark SQL/Hive from non-nested data

April 4, 2015


Sometimes you need to create denormalized data from normalized data, for instance if you have data that looks like CREATE TABLE flat ( propertyId string, propertyName string, roomname1 string, roomsize1 int, roomname2 string, roomsize2 int, .. ) but you want something like CREATE TABLE nested ( propertyId string, propertyName string, rooms array<struct<roomname:string,roomsize:int>> ) […]

Panna Cotta, my recipe.

January 10, 2015


Panna cotta is one of my favorite desserts and one you can enjoy at many Italian restaurants here in LA. It looks and sounds fancy, but it’s incredibly easy to make if you just get the right ingredients, in particular the gelatin. It is also very important to get very fresh ingredients, since it’s basically […]

Structured data in Hive: a generic UDF to sort arrays of structs

September 17, 2013


Hive has a rich and complex data model that supports maps, arrays and structs, which can be mixed and matched, leading to arbitrarily nested structures, like in JSON. I wrote about a JSON SerDe in another post and if you use it, you know it can lead to pretty complicated nested tables. Unfortunately, Hive […]

A JSON read/write SerDe for Hive

July 11, 2011


Today I finished coding another SerDe for Hive which, with my employer’s permission, I published on GitHub. Since the code is still fresh in my mind, I thought I’d write another article on how to write a SerDe, since the official documentation on how to do it is scarce and you’d have to […]

Writing a Hive SerDe for LWES event files

October 27, 2009


I am currently working to set up an OLAP data warehouse using Hive on top of Hadoop. We have a considerable amount of data that comes from the ad servers on which we need to perform various kinds of analysis. Writing a map-reduce job is not difficult in principle – it’s just time consuming and […]

Data Warehousing Books

October 27, 2009


With the constant increase in the quantity of data that companies collect and need to process, data warehousing is a job sector that’s expanding even in the recession. It is also living a second youth, thanks to a number of open source projects that have been slowly but surely gaining popularity in a manner similar […]

Joins in Hadoop using CompositeInputFormat

June 7, 2009


One of the first questions that a ‘traditional’ ETL engineer asks when learning Hadoop is, “How do I do a join?” For instance, how can we do in Hadoop something like querying for the names of all employees who are in a California city: SELECT, from employees e INNER JOIN cities c […]

Setting up a JSF Maven project in NetBeans (including working autocompletion for JSP/JSF)

April 6, 2009


Today’s rich IDEs make a lot of tasks easier…usually. With Java and its IDEs you often end up spending more time than you anticipated just setting up a project, especially when dealing with the complexities of J2EE: there are multiple versions of the specifications (1.3, 1.4, 5.0), each one with multiple implementations by different vendors plus […]