#title Using the Hive Server and the Hive Python Client
[[TableOfContents]]

Original article: http://mixellaneous.tistory.com/826

==== Abstract ====
This page shows how to run the Hive Thrift server and query it from a Python client.

==== Hive ====
 * Hive is a data warehouse infrastructure built on top of Hadoop that provides tools to enable easy data summarization, ad hoc querying and analysis of large datasets stored in Hadoop files.
 * http://hadoop.apache.org/hive/

==== Apache Thrift ====
 * Thrift is a software framework for scalable cross-language services development. It combines a software stack with a code generation engine to build services that work efficiently and seamlessly between C++, Java, Python, PHP, Ruby, Erlang, Perl, Haskell, C#, Cocoa, Smalltalk, and OCaml.
 * http://incubator.apache.org/thrift/

==== Hive Server ====
The Hive server runs as a Thrift server.

Start the server in the background; the log shows it listening on port 10000:
{{{
$ hive --service hiveserver &
[1] 9818
$ Starting Hive Thrift Server
09/12/17 16:59:39 INFO service.HiveServer: Starting hive server on port 10000
.
.
$
}}}

==== Hive Python Client ====
Install Hadoop and Hive, then verify the installed packages (Hadoop 0.20.1 & Hive 0.4.0):
{{{
$ rpm -qa | grep hadoop-0.20
hadoop-0.20-jobtracker-0.20.1+133-1
hadoop-0.20-libhdfs-0.20.1+133-1
hadoop-0.20-tasktracker-0.20.1+133-1
hadoop-0.20-0.20.1+133-1
hadoop-0.20-datanode-0.20.1+133-1
hadoop-0.20-secondarynamenode-0.20.1+133-1
hadoop-0.20-conf-pseudo-0.20.1+133-1
hadoop-0.20-pipes-0.20.1+133-1
hadoop-0.20-namenode-0.20.1+133-1
hadoop-0.20-native-0.20.1+133-1
hadoop-0.20-docs-0.20.1+133-1
$
$ rpm -qa | grep hive
hadoop-hive-webinterface-0.4.0+14-1
hadoop-hive-0.4.0+14-1
}}}

Sample data (fields separated by tabs, matching the table definition in the code below):
{{{
$ cat /tmp/r.txt
a	1	1.0
b	2	2.0
c	3	3.0
$
}}}

Set PYTHONPATH to the Hive Python library directory:
{{{
$ export PYTHONPATH="/usr/lib/hive/lib/py"
$ env | grep PYTHONPATH
PYTHONPATH=/usr/lib/hive/lib/py
}}}

Code (saved as hive_py.py):
{{{
import sys

from hive_service import ThriftHive
from hive_service.ttypes import HiveServerException
from thrift import Thrift
from thrift.transport import TSocket
from thrift.transport import TTransport
from thrift.protocol import TBinaryProtocol

try:
    # Connect to the Hive Thrift server started above (localhost:10000)
    transport = TSocket.TSocket('localhost', 10000)
    transport = TTransport.TBufferedTransport(transport)
    protocol = TBinaryProtocol.TBinaryProtocol(transport)
    client = ThriftHive.Client(protocol)
    transport.open()
    try:
        # '\\t' in Python source sends the two characters \t, which Hive reads as a tab
        client.execute("CREATE TABLE r(a STRING, b INT, c DOUBLE) "
                       "ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t' "
                       "STORED AS TEXTFILE")
        client.execute("LOAD DATA LOCAL INPATH '/tmp/r.txt' OVERWRITE INTO TABLE r")
        client.execute("SELECT * FROM r")
        # Each result row is returned as a single string
        for row in client.fetchAll():
            print(row)
    finally:
        # Close the connection even if a query fails
        transport.close()
except HiveServerException as hx:
    # Query errors reported by the Hive server
    print('Hive error: %s' % hx.message)
    sys.exit(1)
except Thrift.TException as tx:
    # Connection and protocol errors
    print('%s' % tx.message)
    sys.exit(1)
}}}

Run:
{{{
$ python hive_py.py
a	1	1.0
b	2	2.0
c	3	3.0
}}}

==== References ====
 * Hive Wiki
 * Apache Thrift
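If hiveserver is not yet listening, `transport.open()` in the client fails with a Thrift transport error. As a minimal sketch using only the Python standard library, the port can be checked before connecting. `is_port_open` is a hypothetical helper name, not part of Hive or Thrift, and 10000 is simply the port shown in the server log above.

```python
import socket


def is_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout.

    Hypothetical helper: it only checks that something accepts TCP
    connections on the port, not that it is actually a Hive server.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake
        sock = socket.create_connection((host, port), timeout)
    except socket.error:
        # Connection refused, timed out, or host could not be resolved
        return False
    sock.close()
    return True
```

For example, `is_port_open("localhost", 10000)` should return True once the "Starting hive server on port 10000" line has appeared.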
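The table `r` is declared with `FIELDS TERMINATED BY '\t'`, so `/tmp/r.txt` must use tabs between fields. As a sketch, the sample file can be generated from Python instead of by hand; `ROWS`, `to_tsv`, and `write_tsv` are hypothetical names used only for illustration.

```python
# Sample rows for table r: (a STRING, b INT, c DOUBLE)
ROWS = [("a", 1, 1.0), ("b", 2, 2.0), ("c", 3, 3.0)]


def to_tsv(rows):
    """Render rows as tab-separated lines, one row per line."""
    return "".join("\t".join(str(v) for v in row) + "\n" for row in rows)


def write_tsv(path, rows):
    """Write rows to path in the tab-delimited layout that table r expects."""
    with open(path, "w") as f:
        f.write(to_tsv(rows))
```

Calling `write_tsv("/tmp/r.txt", ROWS)` produces the same three lines as the sample data.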
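Instead of exporting PYTHONPATH in the shell, a script can extend `sys.path` itself before importing `hive_service`. This is a minimal sketch assuming the packaged library location `/usr/lib/hive/lib/py` used above; `add_to_path` is a hypothetical helper, and the Hive imports must come after the call.

```python
import sys

# Location of Hive's bundled Python bindings (hive_service, thrift) in the
# packaged install above; adjust if your layout differs.
HIVE_PY_LIB = "/usr/lib/hive/lib/py"


def add_to_path(path):
    """Prepend path to sys.path once, similar in effect to setting PYTHONPATH."""
    if path not in sys.path:
        sys.path.insert(0, path)


add_to_path(HIVE_PY_LIB)
# Imports such as "from hive_service import ThriftHive" should follow this call.
```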
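The doubled backslash in the CREATE TABLE string is easy to get wrong: in Python source, `'\\t'` yields the two characters backslash and `t`, which HiveQL interprets as a tab, while a single `'\t'` would embed a literal tab character in the statement instead. The sketch below rebuilds the same statement from a column list; `create_table_sql` is a hypothetical helper, not a Hive API, and it does no quoting, so it is only suitable for trusted table and column names.

```python
def create_table_sql(name, columns, delimiter="\\t"):
    """Build a CREATE TABLE statement for a tab-delimited text table.

    delimiter is copied into the HiveQL text verbatim; the default is the
    two characters backslash + t, which Hive reads as a tab.
    """
    cols = ", ".join("%s %s" % (col, typ) for col, typ in columns)
    return ("CREATE TABLE %s(%s) ROW FORMAT DELIMITED "
            "FIELDS TERMINATED BY '%s' STORED AS TEXTFILE" % (name, cols, delimiter))
```

With `[("a", "STRING"), ("b", "INT"), ("c", "DOUBLE")]` this returns exactly the statement passed to `client.execute` in the script.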
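The printed output suggests that `fetchAll()` returns each row as one tab-separated string rather than typed values. Assuming that format, a small parser can convert rows back to Python types using the table's schema. `SCHEMA` and `parse_row` are hypothetical names; NULL values, which Hive returns as text, would need extra handling.

```python
# Converters for the columns of table r: a STRING, b INT, c DOUBLE
SCHEMA = (str, int, float)


def parse_row(line, schema=SCHEMA, sep="\t"):
    """Split one tab-delimited result row and convert each field by column type."""
    fields = line.split(sep)
    if len(fields) != len(schema):
        raise ValueError("expected %d fields, got %d: %r"
                         % (len(schema), len(fields), line))
    return tuple(conv(field) for conv, field in zip(schema, fields))
```

In the client loop, `print(parse_row(row))` would print `('a', 1, 1.0)` for the first row.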
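`fetchAll()` pulls the whole result set into memory at once. The ThriftHive service definition also declares `fetchN(numRows)`; assuming your generated bindings in `hive_service/ThriftHive.py` include it and that it returns an empty list once the results are exhausted, rows can be streamed in batches instead. `iter_rows` is a hypothetical helper, and `FakeClient` is a hypothetical stand-in used only so the sketch runs without a server.

```python
def iter_rows(client, batch_size=1000):
    """Yield result rows batch by batch via client.fetchN().

    Assumes fetchN(n) returns at most n rows and an empty list when done.
    """
    while True:
        batch = client.fetchN(batch_size)
        if not batch:
            break
        for row in batch:
            yield row


class FakeClient(object):
    """Hypothetical stand-in for ThriftHive.Client, for exercising iter_rows."""

    def __init__(self, rows):
        self._rows = list(rows)

    def fetchN(self, n):
        # Hand out the next n rows and drop them from the pending list
        batch, self._rows = self._rows[:n], self._rows[n:]
        return batch
```

In the script above, `for row in iter_rows(client): print(row)` would replace the `fetchAll()` loop after `client.execute("SELECT * FROM r")`.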