- Describe any 4 of the following Pig statements and explain how you would use them in real-life data processing. - 1 PT
LOAD
FILTER
STORE
FOREACH
UNION
SPLIT
LIMIT
COUNT
TOKENIZE
JOIN
Ref: http://pig.apache.org/docs/r0.17.0/basic.html
http://pig.apache.org/docs/r0.17.0/func.html
- Describe Pig Latin operators that can help you debug your Pig Latin statements - 1 PT
- Create a Dataproc Hadoop cluster, upload this file to Cloud Storage, and then copy it to HDFS. The file contains 3 fields with schema (Product:int, User:int, Ratings:float). - 1 PT
pig_rating.txt
Write Pig code to show the use of LOAD, FILTER, STORE and FOREACH, and share your code and results with the class.
Sample code:
A = LOAD 'student' USING PigStorage(',') AS (name:chararray, age:int, gpa:float); -- loading data
B = FILTER A BY age > 25; -- filter records with age > 25
C = FOREACH B GENERATE name; -- transforming data (applied to the filtered relation B)
DUMP C; -- retrieving results
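
The sample above can be adapted to the ratings file roughly as follows. This is only a sketch under stated assumptions: the HDFS paths, the comma delimiter, and the 4.0 threshold are hypothetical and are not given in the assignment, so adjust them to match your cluster and the actual layout of pig_rating.txt.

```pig
-- Assumption: the file was copied to this (hypothetical) HDFS path and is comma-delimited.
ratings = LOAD '/user/pig/pig_rating.txt' USING PigStorage(',')
          AS (Product:int, User:int, Ratings:float);

-- Keep only rows at or above an arbitrary illustrative threshold.
high = FILTER ratings BY Ratings >= 4.0;

-- Project just the columns of interest.
pairs = FOREACH high GENERATE Product, Ratings;

-- Write the result to HDFS; the (hypothetical) output directory must not already exist.
STORE pairs INTO '/user/pig/high_ratings' USING PigStorage(',');
```

Unlike DUMP, which prints a relation to the console, STORE writes it out as part files under the output directory, so the results can be inspected afterwards with `hdfs dfs -cat`.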