Contents

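1 When is Pig a good fit?
2 Getting the data and loading it into HDFS
3 Running a Pig script, multiquery execution, and STORE
4 Positional references for relations without a schema
5 sample data input
6 References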


1 When is Pig a good fit?

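Personal impressions..
  • ETL
  • Analyzing large data sets through a data processing pipeline --> is that the idea? Not sure.
  • .. .. seems like it would be a good fit..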

2 Getting the data and loading it into HDFS

Download the example data for Hadoop: The Definitive Guide and upload it to HDFS.
wget http://hanb.co.kr/exam/1746/htdg-examples-0.1.1.tar.gz
tar xzf htdg-examples-0.1.1.tar.gz
mv htdg-examples-0.1.1 samples
cd samples
hadoop fs -put input input
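
To confirm the upload from inside Pig, Grunt commands along these lines could be used. This is only a sketch: it assumes the archive unpacked into the layout that the script in section 3 expects (input/ncdc/micro-tab/sample.txt).

```pig
-- Run inside the Grunt shell (started with `pig`).
-- List the uploaded directory in HDFS.
fs -ls input/ncdc/micro-tab
-- Print the sample file that the later LOAD statement reads.
cat input/ncdc/micro-tab/sample.txt
```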

3 Running a Pig script, multiquery execution, and STORE

vi store_test.pig
--store_test.pig
rmf output/1949
rmf output/1950

A = LOAD 'input/ncdc/micro-tab/sample.txt' AS (year:chararray, temperature:int, quality:int);
B = FILTER A BY year == '1949';
C = FILTER A BY year == '1950';
STORE B INTO 'output/1949';
STORE C INTO 'output/1950';

cat output/1949/part-m-00000
cat output/1950/part-m-00000
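When this is run as a script (batch mode), A does not have to be read twice: B and C are both derived from A, so Pig can read A once and write both outputs.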

Run the Pig script.
pig store_test.pig

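In the Grunt shell, use the exec and run commands. exec runs the script in batch mode, while run executes it statement by statement (statements delimited by ;), as if they had been typed into the shell (?). The difference is multiquery optimization, that is, whether A is read once or twice; only exec reads it once.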

grunt> exec store_test.pig



grunt> run store_test.pig
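
Another visible difference between the two commands, as far as the Pig documentation describes it, is alias visibility. The transcript below is a sketch of what to expect, not captured output: after exec the script's aliases should be unknown to the shell, while after run they should remain available.

```pig
-- exec runs the script in its own batch context.
grunt> exec store_test.pig
grunt> DESCRIBE B;   -- expected to fail: alias B is not defined in this shell

-- run behaves as if each statement had been typed in by hand.
grunt> run store_test.pig
grunt> DESCRIBE B;   -- expected to print the schema of B
```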

4 Positional references for relations without a schema

B = FOREACH A GENERATE $0; -- $0 refers to the first attribute (field) of relation A
DUMP B;
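
Positional references matter most when a relation is loaded without an AS clause. A minimal sketch, assuming the same sample file as in section 3; the aliases raw, picked and hot and the threshold 0 are illustrative only.

```pig
-- No AS clause: fields have no names and default to bytearray.
raw = LOAD 'input/ncdc/micro-tab/sample.txt';
-- Select fields by position, then cast and name them.
picked = FOREACH raw GENERATE (chararray)$0 AS year, (int)$1 AS temperature;
-- From here on the fields can be used by name.
hot = FILTER picked BY temperature > 0;
DUMP hot;
```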

5 sample data input

wget http://databaser.net/moniwiki/pds/Hive_ec_98_88_ec_a0_9c_ed_8c_8c_ec_9d_bc/data.zip
unzip data.zip
hadoop fs -mkdir scott
hadoop fs -put dept.csv scott
hadoop fs -put emp.csv scott
hadoop fs -put salgrade.csv scott
pig

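-- Total salary per department, joined with the department table.
emp = load 'scott/emp.csv' using PigStorage(',') as (empno, ename, job, mgr, hiredate, sal, comm, deptno:int);
grouped = group emp by deptno;
-- `group` is the grouping key (deptno); emp.sal is the bag of salaries in each group.
total = foreach grouped generate group, SUM(emp.sal) as total_sal;
-- Not this: emp.deptno would project a bag of deptno values rather than the key.
--total = foreach grouped generate emp.deptno, SUM(emp.sal) as total_sal;

dept = load 'scott/dept.csv' using PigStorage(',') as (dname, loc, deptno:int);
-- Left outer join keeps departments from total even without a match in dept.
join_data = join total by group left, dept by deptno;
-- Positions in join_data: $0 group, $1 total_sal, $2 dname, $3 loc, $4 deptno.
view = foreach join_data generate $0, $3, $1;
dump view;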
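
The positional projection above is easier to check with DESCRIBE. The following is a sketch that assumes the dept schema exactly as declared in the LOAD statement; view_named is an illustrative alias, not part of the original session.

```pig
-- Print the schema of the joined relation to see which field each $n refers to.
describe join_data;
-- Fields coming out of a join can also be referenced by name, prefixed with
-- the source alias (for example dept::loc).
view_named = foreach join_data generate $0 as deptno, dept::loc as loc, $1 as total_sal;
dump view_named;
```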

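-- Top 3 salaries among employees earning at least 2000.
-- sal is declared as int so that the filter and the sort compare numerically
-- (an untyped field is a bytearray).
emp = load 'scott/emp.csv' using PigStorage(',') as (empno, ename, job, mgr, hiredate, sal:int, comm, deptno:int);
emp = foreach emp generate ename, sal;
filtered_set = filter emp by sal >= 2000;
sorted_set = order filtered_set by sal desc;
top3 = limit sorted_set 3;
dump top3;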

6 References