ETL

Joins in Hadoop using CompositeInputFormat

One of the first questions a 'traditional' ETL engineer asks when learning Hadoop is, "How do I do a join?"

For instance, how would we express in Hadoop a query for the names of all employees who are in a California city, such as:

SELECT e.name, c.name FROM employees e INNER JOIN cities c
    ON e.city_id = c.id AND c.state = 'CA'
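
To make the query concrete, here is a minimal sketch in plain Java (no Hadoop dependency) of the merge-style inner join that the SQL describes. It assumes both inputs are already sorted on the join key and that city ids are unique. CompositeInputFormat has a similar precondition: its inputs must be sorted and partitioned identically on the join key. The class, field, and method names below are hypothetical and chosen only for illustration; this is not Hadoop's own implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: a single-machine merge join over two sorted lists.
class MergeJoinSketch {

    // Hypothetical row types mirroring the employees and cities tables.
    static final class Employee {
        final int cityId;
        final String name;
        Employee(int cityId, String name) { this.cityId = cityId; this.name = name; }
    }

    static final class City {
        final int id;
        final String name;
        final String state;
        City(int id, String name, String state) { this.id = id; this.name = name; this.state = state; }
    }

    // Inner join on employees.cityId = cities.id, keeping only cities in the given state.
    // Assumes employees are sorted by cityId, cities are sorted by id, and city ids are unique.
    static List<String> join(List<Employee> employees, List<City> cities, String state) {
        List<String> rows = new ArrayList<>();
        int e = 0;
        int c = 0;
        while (e < employees.size() && c < cities.size()) {
            Employee emp = employees.get(e);
            City city = cities.get(c);
            if (emp.cityId < city.id) {
                e++;                      // employee has no matching city yet; advance employees
            } else if (emp.cityId > city.id) {
                c++;                      // city has no more employees; advance cities
            } else {
                if (city.state.equals(state)) {
                    rows.add(emp.name + "\t" + city.name);
                }
                e++;                      // stay on this city: the next employee may share it
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Employee> employees = Arrays.asList(
            new Employee(1, "Ann"), new Employee(1, "Bob"),
            new Employee(2, "Cara"), new Employee(4, "Dan"));
        List<City> cities = Arrays.asList(
            new City(1, "Fresno", "CA"), new City(2, "Reno", "NV"),
            new City(3, "Oakland", "CA"), new City(4, "San Diego", "CA"));
        for (String row : join(employees, cities, "CA")) {
            System.out.println(row);
        }
    }
}
```

Because both inputs are read once, in key order, the join needs no shuffle or in-memory lookup table. That is the property that makes a map-side join attractive when the sorting and partitioning precondition can be met.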