Contents

1 Abstraction
2 Hive
3 Apache Thrift
4 Hive Server
5 Hive Python Client
6 References


Source: http://mixellaneous.tistory.com/826

1 Abstraction #

This page describes how to connect a Python client to the Hive Thrift server.

2 Hive #

  • Hive is a data warehouse infrastructure built on top of Hadoop that provides tools for easy data summarization, ad hoc querying, and analysis of large datasets stored in Hadoop files.
  • http://hadoop.apache.org/hive/

3 Apache Thrift #

  • Thrift is a software framework for scalable cross-language services development. It combines a software stack with a code generation engine to build services that work efficiently and seamlessly between C++, Java, Python, PHP, Ruby, Erlang, Perl, Haskell, C#, Cocoa, Smalltalk, and OCaml.
  • http://incubator.apache.org/thrift/

4 Hive Server #

Start the Hive server as a Thrift server.



$ hive --service hiveserver &
[1] 9818
$ Starting Hive Thrift Server
09/12/17 16:59:39 INFO service.HiveServer: Starting hive server on port 10000
.
.
$
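Before running a client, it can help to confirm that something is listening on the port reported in the log line above. The following is a minimal sketch using only Python's standard `socket` module. The helper name `port_is_open` and the 2-second timeout are illustrative assumptions, not part of Hive.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        # create_connection resolves the host and applies the timeout.
        sock = socket.create_connection((host, port), timeout)
    except (socket.error, socket.timeout):
        return False
    sock.close()
    return True


if __name__ == "__main__":
    # 10000 is the port reported in the HiveServer log line above.
    print(port_is_open("localhost", 10000))
```

A successful connection only shows that some process accepts TCP connections on that port. It does not prove that the listener is the Hive Thrift server.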

5 Hive Python Client #

Hadoop & Hive installation status (Hadoop 0.20.1 & Hive 0.4.0)
$ rpm -qa | grep hadoop-0.20
hadoop-0.20-jobtracker-0.20.1+133-1
hadoop-0.20-libhdfs-0.20.1+133-1
hadoop-0.20-tasktracker-0.20.1+133-1
hadoop-0.20-0.20.1+133-1
hadoop-0.20-datanode-0.20.1+133-1
hadoop-0.20-secondarynamenode-0.20.1+133-1
hadoop-0.20-conf-pseudo-0.20.1+133-1
hadoop-0.20-pipes-0.20.1+133-1
hadoop-0.20-namenode-0.20.1+133-1
hadoop-0.20-native-0.20.1+133-1
hadoop-0.20-docs-0.20.1+133-1
$ rpm -qa | grep hive
hadoop-hive-webinterface-0.4.0+14-1
hadoop-hive-0.4.0+14-1

Sample data (tab-separated fields)
$ cat /tmp/r.txt
a       1       1.0
b       2       2.0
c       3       3.0

PYTHONPATH setting (Hive Python library)
$ export PYTHONPATH="/usr/lib/hive/lib/py"
$ env | grep PYTHONPATH
PYTHONPATH=/usr/lib/hive/lib/py
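To check that the PYTHONPATH setting took effect, one option is to try importing the packages that this page's client script uses. This is a small sketch, not an official Hive tool; the helper name `missing_modules` is an assumption made for illustration.

```python
def missing_modules(names):
    """Return the subset of module names that cannot be imported."""
    missing = []
    for name in names:
        try:
            __import__(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Top-level packages imported by the client script in this section.
    required = ["hive_service", "thrift"]
    print(missing_modules(required))
```

An empty list means both packages were found on the import path. Any name that is printed points to a directory that is still missing from PYTHONPATH.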


# hive_py.py: run a few HiveQL statements through the Hive Thrift server.
from hive_service import ThriftHive
from hive_service.ttypes import HiveServerException
from thrift import Thrift
from thrift.transport import TSocket
from thrift.transport import TTransport
from thrift.protocol import TBinaryProtocol

try:
    # Connect to the Hive Thrift server on localhost:10000.
    transport = TSocket.TSocket('localhost', 10000)
    transport = TTransport.TBufferedTransport(transport)
    protocol = TBinaryProtocol.TBinaryProtocol(transport)

    client = ThriftHive.Client(protocol)
    transport.open()

    # '\\t' reaches Hive as the two characters \t, which Hive reads as a tab delimiter.
    client.execute("CREATE TABLE r(a STRING, b INT, c DOUBLE) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t' STORED AS TEXTFILE")
    client.execute("LOAD DATA LOCAL INPATH '/tmp/r.txt' OVERWRITE INTO TABLE r")
    client.execute("SELECT * FROM r")

    # Print every row of the SELECT result.
    for row in client.fetchAll():
        print(row)

    transport.close()

except Thrift.TException as tx:
    # "as" and print() are accepted by both Python 2.6+ and Python 3.
    print('%s' % (tx.message))
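In the script above, `transport.close()` is skipped if any `client.execute` call raises, because control jumps straight to the `except` clause. One possible way to always release the connection is a small context manager. The sketch below assumes only that the transport object has `open()` and `close()` methods; the names `opened` and `FakeTransport` are hypothetical and exist only so the example runs without a Hive server.

```python
from contextlib import contextmanager


@contextmanager
def opened(transport):
    """Open a transport and guarantee close() on exit, even on errors."""
    transport.open()
    try:
        yield transport
    finally:
        transport.close()


class FakeTransport(object):
    """Stand-in for a Thrift transport so the sketch runs without a server."""

    def __init__(self):
        self.events = []

    def open(self):
        self.events.append("open")

    def close(self):
        self.events.append("close")


if __name__ == "__main__":
    fake = FakeTransport()
    try:
        with opened(fake):
            raise RuntimeError("simulated query failure")
    except RuntimeError:
        pass
    print(fake.events)  # ['open', 'close']
```

With a real connection, the same helper would wrap the buffered transport, with the `client.execute` calls placed inside the `with opened(transport):` block.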

Run
$ python hive_py.py
a       1       1.0
b       2       2.0
c       3       3.0
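The output above suggests that each fetched row arrives as one string with tab-separated columns. Under that assumption, a row can be converted to Python values that match the table definition `r(a STRING, b INT, c DOUBLE)`. The helper name `parse_row` is hypothetical, and the rows are hard-coded here in place of a real `client.fetchAll()` result.

```python
def parse_row(line):
    """Split one tab-separated row into (a, b, c) typed like table r."""
    a, b, c = line.rstrip("\n").split("\t")
    return a, int(b), float(c)


if __name__ == "__main__":
    # Hard-coded rows standing in for the result of client.fetchAll().
    rows = ["a\t1\t1.0", "b\t2\t2.0", "c\t3\t3.0"]
    for row in rows:
        print(parse_row(row))
```

This sketch does not handle NULL values or columns that themselves contain tabs; both would need extra care in real use.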

6 References #

  • Hive Wiki
  • Apache Thrift