#title Using the Hive Server and the Hive Python Client
[[TableOfContents]]

Original article: http://mixellaneous.tistory.com/826

==== Abstract ====
This page shows how to use the Hive Thrift server together with the Python client.

==== Hive ====
 * Hive is a data warehouse infrastructure built on top of Hadoop that provides tools for easy data summarization, ad hoc querying, and analysis of large datasets stored in Hadoop files.
 * http://hadoop.apache.org/hive/

==== Apache Thrift ====
 * Thrift is a software framework for scalable cross-language services development. It combines a software stack with a code generation engine to build services that work efficiently and seamlessly between C++, Java, Python, PHP, Ruby, Erlang, Perl, Haskell, C#, Cocoa, Smalltalk, and OCaml.
 * http://incubator.apache.org/thrift/

==== Hive Server ====
The Hive server runs as a Thrift server.

Starting the server
{{{
$ hive --service hiveserver &
[1] 9818
$ Starting Hive Thrift Server
09/12/17 16:59:39 INFO service.HiveServer: Starting hive server on port 10000
.
.
$
}}}
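
Before running a client, it can help to confirm that the server is actually accepting connections on the port shown in the log. The following is a minimal sketch using only the standard library; the helper name `is_port_open` and the one-second timeout are illustrative assumptions, not part of Hive or Thrift.

```python
import socket


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        conn = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Refused, unreachable, or timed out: treat all as "not listening".
        return False
    conn.close()
    return True


# Example (10000 is the port from the server log above):
#   is_port_open("localhost", 10000)
```

A successful connection only shows that something is listening on the port; it does not prove that the listener is the Hive server.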

==== Hive Python Client ====
Install Hadoop and Hive and verify the installation (Hadoop 0.20.1 and Hive 0.4.0)
{{{
$ rpm -qa | grep hadoop-0.20
hadoop-0.20-jobtracker-0.20.1+133-1
hadoop-0.20-libhdfs-0.20.1+133-1
hadoop-0.20-tasktracker-0.20.1+133-1
hadoop-0.20-0.20.1+133-1
hadoop-0.20-datanode-0.20.1+133-1
hadoop-0.20-secondarynamenode-0.20.1+133-1
hadoop-0.20-conf-pseudo-0.20.1+133-1
hadoop-0.20-pipes-0.20.1+133-1
hadoop-0.20-namenode-0.20.1+133-1
hadoop-0.20-native-0.20.1+133-1
hadoop-0.20-docs-0.20.1+133-1
$
$ rpm -qa | grep hive
hadoop-hive-webinterface-0.4.0+14-1
hadoop-hive-0.4.0+14-1
}}}

Sample data
{{{
$ cat /tmp/r.txt
a       1       1.0
b       2       2.0
c       3       3.0
$
}}}
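
The table created later on this page declares tab as its field delimiter, so the columns in the sample file need to be separated by tab characters rather than spaces. As a rough sketch, the file could be generated like this; the helper name `write_sample` and the `SAMPLE_ROWS` constant are made up for illustration.

```python
def write_sample(path, rows):
    """Write each row as one line with tab-separated fields."""
    with open(path, "w") as f:
        for row in rows:
            f.write("\t".join(str(value) for value in row) + "\n")


# The three records shown in the sample data above.
SAMPLE_ROWS = [("a", 1, 1.0), ("b", 2, 2.0), ("c", 3, 3.0)]

# Example:
#   write_sample("/tmp/r.txt", SAMPLE_ROWS)
```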

Set PYTHONPATH (Hive Python library)
{{{
$ export PYTHONPATH="/usr/lib/hive/lib/py"
$ env | grep PYTHONPATH
PYTHONPATH=/usr/lib/hive/lib/py
}}}
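
If changing the shell environment is inconvenient, a script can instead extend its own module search path before importing the Hive modules. This is only a sketch of that alternative; the helper name `add_to_sys_path` is hypothetical, and the directory is the same one used in the export above.

```python
import sys

# Same directory as in the PYTHONPATH export above.
HIVE_PY_LIB = "/usr/lib/hive/lib/py"


def add_to_sys_path(path):
    """Prepend path to sys.path unless it is already present."""
    if path not in sys.path:
        sys.path.insert(0, path)


# Must run before any "from hive_service import ..." statement.
add_to_sys_path(HIVE_PY_LIB)
```

Setting PYTHONPATH keeps the script itself free of installation paths, whereas this variant makes the script self-contained; which is preferable depends on how the script is deployed.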

Code (hive_py.py)
{{{
from hive_service import ThriftHive
from hive_service.ttypes import HiveServerException
from thrift import Thrift
from thrift.transport import TSocket
from thrift.transport import TTransport
from thrift.protocol import TBinaryProtocol

try:
    # Connect to the Hive Thrift server on localhost:10000.
    transport = TSocket.TSocket('localhost', 10000)
    transport = TTransport.TBufferedTransport(transport)
    protocol = TBinaryProtocol.TBinaryProtocol(transport)

    client = ThriftHive.Client(protocol)
    transport.open()
    try:
        # Create a tab-delimited text table and load the sample file into it.
        client.execute("CREATE TABLE r(a STRING, b INT, c DOUBLE) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t' STORED AS TEXTFILE")
        client.execute("LOAD DATA LOCAL INPATH '/tmp/r.txt' OVERWRITE INTO TABLE r")

        # Run a query and print every returned row.
        client.execute("SELECT * FROM r")
        for row in client.fetchAll():
            print(row)
    finally:
        # Close the connection even if a statement fails.
        transport.close()

except HiveServerException as hx:
    # Errors reported by the Hive server itself.
    print('%s' % (hx.message))
except Thrift.TException as tx:
    # Thrift transport and protocol errors.
    print('%s' % (tx.message))
}}}

Run
{{{
$ python hive_py.py
a       1       1.0
b       2       2.0
c       3       3.0
}}}
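
The printed output suggests that each fetched row arrives as a single string with tab-separated columns. Under that assumption, the rows can be converted into typed Python values matching the table schema (STRING, INT, DOUBLE). The helper `parse_row` below is a hypothetical illustration, and the `rows` list stands in for the result of `client.fetchAll()`.

```python
def parse_row(line):
    """Split one tab-delimited result row into a (str, int, float) tuple."""
    a, b, c = line.rstrip("\n").split("\t")
    return a, int(b), float(c)


# Stand-in for client.fetchAll(); assumes tab-separated strings.
rows = ["a\t1\t1.0", "b\t2\t2.0", "c\t3\t3.0"]
parsed = [parse_row(r) for r in rows]
```

This sketch does not handle NULL values or fields that themselves contain tabs.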
