Thursday, 19 September 2013

Pig: Counting the occurrence of a grouped column

In this raw data we have information about baseball players; the schema is
name:chararray, team:chararray, position:bag{t:(p:chararray)}, bat:map[].
Using the following script we are able to list players and the different
positions they have played. How do we get a count of how many players have
played a particular position (for instance, how many players were in the
'Designated_hitter' position)?
The Pig script and sample output are listed below. Please help me out,
thanks in advance.
--pig script
players = load 'baseball' as (name:chararray, team:chararray,
    position:bag{t:(p:chararray)}, bat:map[]);
pos = foreach players generate name, flatten(position) as position;
groupbyposition = group pos by position;
dump groupbyposition;
--dump groupbyposition (output for one position, i.e. Designated_hitter)
(Designated_hitter,{(Ken Griffey, Jr.,Designated_hitter),(Jack
Cust,Designated_hitter),(Jason Giambi,Designated_hitter),(Michael
Young,Designated_hitter),(Hank Blalock,Designated_hitter),(Hensley
Meulens,Designated_hitter),(Johnny Damon,Designated_hitter),(Ryan
Garko,Designated_hitter),(Lance Berkman,Designated_hitter),(Rocco
Baldelli,Designated_hitter),(Adam Lind,Designated_hitter),(Carlos
Guillén,Designated_hitter),(Josh Bard,Designated_hitter),(Vladimir
Guerrero,Designated_hitter),(Nick Johnson,Designated_hitter),(Willy
Aybar,Designated_hitter),(Luke Scott,Designated_hitter),(Milton
Bradley,Designated_hitter),(Jason Kubel,Designated_hitter),(Jermaine
Dye,Designated_hitter),(Wladimir Balentien,Designated_hitter),(Hideki
Matsui,Designated_hitter),(Adam Dunn,Designated_hitter),(Cliff
Floyd,Designated_hitter),(Travis Hafner,Designated_hitter),(Billy
Butler,Designated_hitter),(Matt Stairs,Designated_hitter),(Pat
Burrell,Designated_hitter),(Jorge Posada,Designated_hitter),(Jeff
Larish,Designated_hitter),(Marcus Thames,Designated_hitter),(Jim
Thome,Designated_hitter),(David Ortiz,Designated_hitter)})
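
One possible way to get the count, sketched under the assumption that the script above runs as posted: after grouping, apply Pig's built-in COUNT to the grouped bag, then filter for the position of interest. The aliases poscount and dh and the field name num_players are hypothetical names chosen for illustration; they are not from the original script.

```pig
-- Load and flatten exactly as in the original script.
players = load 'baseball' as (name:chararray, team:chararray,
    position:bag{t:(p:chararray)}, bat:map[]);
pos = foreach players generate name, flatten(position) as position;
groupbyposition = group pos by position;

-- After GROUP, each record is (group, {bag of pos tuples}).
-- COUNT(pos) counts the tuples in that bag whose first field (name) is not null.
poscount = foreach groupbyposition generate group as position,
    COUNT(pos) as num_players;

-- Keep only the position of interest (hypothetical alias dh).
dh = filter poscount by position == 'Designated_hitter';
dump dh;
```

Dumping poscount instead of dh should give one (position, count) row for every position. Note that this counts (name, position) tuples, so if a player's position bag could list the same position twice, or if two players share a name, the figure may not equal the number of distinct players; running DISTINCT on pos before grouping would be one way to handle the first case.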
