BIG DATA Management - Data Transformation with Apache PIG and Google Cloud Platform

  1. Describe any four of the following Pig Latin operators and built-in functions, and explain how you would use each in real-life data processing. (1 pt)

LOAD

FILTER

STORE

FOREACH

UNION

SPLIT

LIMIT

COUNT

TOKENIZE

JOIN

Ref: http://pig.apache.org/docs/r0.17.0/basic.html

http://pig.apache.org/docs/r0.17.0/func.html
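As an illustration of how these pieces might fit into one real-life pipeline, the sketch below cleans order records, splits them by size, enriches them with customer data, counts orders per city, and counts words in free-text reviews. It is a minimal sketch, not a reference solution: the input files `orders.csv`, `customers.csv`, and `reviews.txt`, their schemas, the 100.0 split threshold, and the `output/...` paths are all hypothetical. COUNT and TOKENIZE are built-in functions rather than relational operators, so they appear inside FOREACH ... GENERATE. UNION simply recombines the two SPLIT outputs here to show the syntax; in practice it more often appends files that share a schema, such as daily log extracts.

```pig
-- Hypothetical inputs (illustrative only):
--   orders.csv    -> order_id, customer_id, amount
--   customers.csv -> customer_id, name, city
--   reviews.txt   -> one free-text review per line
orders    = LOAD 'orders.csv' USING PigStorage(',')
            AS (order_id:int, customer_id:int, amount:double);
customers = LOAD 'customers.csv' USING PigStorage(',')
            AS (customer_id:int, name:chararray, city:chararray);

-- FILTER: drop rows with a missing or non-positive amount
valid = FILTER orders BY amount IS NOT NULL AND amount > 0.0;

-- SPLIT: route rows into two relations in a single pass
SPLIT valid INTO small_orders IF amount < 100.0,
                 large_orders IF amount >= 100.0;

-- UNION: append relations with the same schema (here, the two SPLIT outputs)
recombined = UNION small_orders, large_orders;

-- JOIN: inner equi-join that adds customer attributes to each order
joined = JOIN recombined BY customer_id, customers BY customer_id;

-- FOREACH ... GENERATE: keep only the needed fields; '::' disambiguates
enriched = FOREACH joined GENERATE recombined::order_id AS order_id,
                                   customers::city      AS city,
                                   recombined::amount   AS amount;

-- COUNT: built-in function applied to the bag that GROUP creates
by_city     = GROUP enriched BY city;
city_counts = FOREACH by_city GENERATE group AS city,
                                       COUNT(enriched) AS n_orders;

-- LIMIT: keep the ten cities with the most orders
ranked     = ORDER city_counts BY n_orders DESC;
top_cities = LIMIT ranked 10;

-- STORE: write the result as comma-separated text (directory must not exist)
STORE top_cities INTO 'output/top_cities' USING PigStorage(',');

-- TOKENIZE: split each review into words, then count word frequencies.
-- TextLoader reads each whole line as a single chararray field.
reviews     = LOAD 'reviews.txt' USING TextLoader() AS (line:chararray);
words       = FOREACH reviews GENERATE FLATTEN(TOKENIZE(line)) AS word;
word_groups = GROUP words BY word;
word_counts = FOREACH word_groups GENERATE group AS word,
                                           COUNT(words) AS n_occurrences;
STORE word_counts INTO 'output/word_counts' USING PigStorage(',');
```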

  2. Describe the Pig Latin operators that can help you debug your Pig Latin statements. (1 pt)
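Pig's diagnostic operators are DESCRIBE, EXPLAIN, ILLUSTRATE, and DUMP. The sketch below applies each of them to the `student` relation from the sample code in this assignment; the `student` file itself is hypothetical, and the LIMIT of 5 rows before DUMP is just an example value chosen to keep console output short.

```pig
A = LOAD 'student' USING PigStorage(',') AS (name:chararray, age:int, gpa:float);
B = FILTER A BY age > 25;

-- DESCRIBE: print the schema of a relation, e.g. to confirm the AS clause took effect
DESCRIBE B;

-- ILLUSTRATE: run the statements on a small sample and show example rows
-- at each step, a quick way to check FILTER/FOREACH logic
ILLUSTRATE B;

-- EXPLAIN: print the logical, physical, and execution plans Pig will run
EXPLAIN B;

-- DUMP: execute the pipeline and print the resulting tuples to the console;
-- pairing it with LIMIT keeps a large relation from flooding the terminal
preview = LIMIT B 5;
DUMP preview;
```

DESCRIBE and EXPLAIN only inspect the script, whereas ILLUSTRATE and DUMP actually read data, so the first two are the cheapest checks to run while a script is still being written.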
  3. Create a Dataproc Hadoop cluster, upload the file below to Cloud Storage, and then copy it into HDFS. The file contains three fields with the schema (Product:int, User:int, Ratings:float). (1 pt)

pig_rating.txt
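The cluster can be created from the Cloud Console or with `gcloud dataproc clusters create`, and the file can be uploaded to a bucket from the console or with `gsutil cp pig_rating.txt gs://YOUR_BUCKET/`. The sketch below covers the remaining step, assuming it is run in the Pig Grunt shell (started with `pig`) after SSH-ing into the cluster's master node. `YOUR_BUCKET` and `/user/pig_demo` are hypothetical placeholders. Pig's `fs` command passes its arguments to the Hadoop file-system shell, and Dataproc's Cloud Storage connector lets that shell read `gs://` paths.

```pig
-- Hypothetical placeholders: replace YOUR_BUCKET and /user/pig_demo with your own.

-- Create a target directory in HDFS
fs -mkdir -p /user/pig_demo;

-- Copy the file from Cloud Storage into HDFS
fs -cp gs://YOUR_BUCKET/pig_rating.txt /user/pig_demo/;

-- Confirm the copy and show the last kilobyte of the file
fs -ls /user/pig_demo;
fs -tail /user/pig_demo/pig_rating.txt;
```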

Write Pig code that demonstrates LOAD, FILTER, STORE, and FOREACH, and share your code and results with the class.

Sample code:

A = LOAD 'student' USING PigStorage(',') AS (name:chararray, age:int, gpa:float); -- loading data
B = FILTER A BY age > 25; -- filtering records with age > 25
C = FOREACH B GENERATE name; -- transforming data (projecting the name field of the filtered records)

DUMP C; -- retrieving results
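Applied to `pig_rating.txt`, the same pattern might look like the sketch below. It assumes, without confirmation from the assignment, that the file sits in HDFS at the hypothetical path `/user/pig_demo/pig_rating.txt` and is comma-separated like the sample; a tab-separated file would need `PigStorage('\t')`. The 4.0 rating cut-off and the output directory names are arbitrary examples.

```pig
-- Assumptions (hypothetical): the file is at /user/pig_demo/pig_rating.txt
-- in HDFS and its fields are comma-separated.

-- LOAD: read the file with the schema given in the assignment
ratings = LOAD '/user/pig_demo/pig_rating.txt' USING PigStorage(',')
          AS (Product:int, User:int, Ratings:float);

-- FILTER: keep high ratings only; the 4.0 cut-off is an arbitrary example
high_ratings = FILTER ratings BY Ratings >= 4.0;

-- FOREACH ... GENERATE: project just the product and its rating
high_product_ratings = FOREACH high_ratings GENERATE Product, Ratings;

-- FOREACH over a GROUP: average rating and number of ratings per product
by_product    = GROUP ratings BY Product;
product_stats = FOREACH by_product GENERATE group AS Product,
                                            AVG(ratings.Ratings) AS avg_rating,
                                            COUNT(ratings) AS n_ratings;

-- STORE: write both results back to HDFS as comma-separated text;
-- each output directory must not already exist
STORE high_product_ratings INTO '/user/pig_demo/high_ratings' USING PigStorage(',');
STORE product_stats INTO '/user/pig_demo/product_stats' USING PigStorage(',');
```

After the job finishes, running `fs -cat /user/pig_demo/high_ratings/part-*` in Grunt prints the stored rows, which can be shared along with the script.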