acryl-pyhive
Python interface to Hive
Python interface to Hive
To install packages from this channel, you must first login at the command line:
anaconda loginYou can then use the channel temporarily with conda or add it to your .condarc file for configured ongoing access. Learn more
PyHive is a collection of Python DB-API and SQLAlchemy interfaces for Presto and Hive.
from pyhive import presto # or import hive or import trino cursor = presto.connect('localhost').cursor() cursor.execute('SELECT * FROM my_awesome_data LIMIT 10') print cursor.fetchone() print cursor.fetchall()
from pyhive import hive from TCLIService.ttypes import TOperationState cursor = hive.connect('localhost').cursor() cursor.execute('SELECT * FROM my_awesome_data LIMIT 10', async=True) status = cursor.poll().operationState while status in (TOperationState.INITIALIZED_STATE, TOperationState.RUNNING_STATE): logs = cursor.fetch_logs() for message in logs: print message # If needed, an asynchronous query can be cancelled at any time with: # cursor.cancel() status = cursor.poll().operationState print cursor.fetchall()
In Python 3.7 async became a keyword; you can use async_ instead:
cursor.execute('SELECT * FROM my_awesome_data LIMIT 10', async_=True)
First install this package to register it with SQLAlchemy (see setup.py).
from sqlalchemy import * from sqlalchemy.engine import create_engine from sqlalchemy.schema import * # Presto engine = create_engine('presto://localhost:8080/hive/default') # Trino engine = create_engine('trino://localhost:8080/hive/default') # Hive engine = create_engine('hive://localhost:10000/default') logs = Table('my_awesome_data', MetaData(bind=engine), autoload=True) print select([func.count('*')], from_obj=logs).scalar() # Hive + HTTPS + LDAP or basic Auth engine = create_engine('hive+https://username:password@localhost:10000/') logs = Table('my_awesome_data', MetaData(bind=engine), autoload=True) print select([func.count('*')], from_obj=logs).scalar()
Note: query generation functionality is not exhaustive or fully tested, but there should be no problem with raw SQL.
# DB-API hive.connect('localhost', configuration={'hive.exec.reducers.max': '123'}) presto.connect('localhost', session_props={'query_max_run_time': '1234m'}) trino.connect('localhost', session_props={'query_max_run_time': '1234m'}) # SQLAlchemy create_engine( 'presto://user@host:443/hive', connect_args={'protocol': 'https', 'session_props': {'query_max_run_time': '1234m'}} ) create_engine( 'trino://user@host:443/hive', connect_args={'protocol': 'https', 'session_props': {'query_max_run_time': '1234m'}} ) create_engine( 'hive://user@host:10000/database', connect_args={'configuration': {'hive.exec.reducers.max': '123'}}, ) # SQLAlchemy with LDAP create_engine( 'hive://user:password@host:10000/database', connect_args={'auth': 'LDAP'}, )
Install using
PyHive works with
Run the following in an environment with Hive/Presto:
./scripts/make_test_tables.sh virtualenv --no-site-packages env source env/bin/activate pip install -e . pip install -r dev_requirements.txt py.test
WARNING: This drops/creates tables named one_row, one_row_complex, and many_rows, plus a database called pyhive_test_database.
The TCLIService module is autogenerated using a TCLIService.thrift file. To update it, the generate.py file can be used: python generate.py <TCLIServiceURL>. When left blank, the version for Hive 2.3 will be downloaded.
Summary
Python interface to Hive
Last Updated
May 1, 2026 at 17:24
License
Apache-2.0