Thursday, October 26, 2017

Install and configure custom Python version for Spark

As of CentOS 7, the default Python version remains Python 2.7, and Python 3 is not available in the base repositories. If you need Python 3 for a PySpark application, there are several ways to install it on CentOS.

Method 1) Build and Install Python3 from the Source

Step 1) Install the following packages:
yum install gcc python-devel yum-utils

Step 2) Then, using yum-builddep, set up the build environment for Python and install any missing build dependencies:
yum-builddep python

Step 3) Download the desired Python 3 release (e.g., Python 3.5.0) from https://www.python.org/ftp/python/
wget https://www.python.org/ftp/python/3.5.0/Python-3.5.0.tgz

Step 4) Build and install Python 3. The default installation directory is /usr/local; if you want to install to some other directory, pass "--prefix=/alternative/path" to configure before running make.
tar xf Python-3.5.0.tgz
cd Python-3.5.0
./configure
make
sudo make install
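
Source builds quietly skip optional modules (notably ssl and zlib) when their development headers are missing at configure time, which later breaks pip3. A quick sanity check, run with the freshly built interpreter:

```python
# Run with the newly built interpreter, e.g. /usr/local/bin/python3.
# An ImportError here means openssl-devel or zlib-devel was missing at
# configure time; install the package and rebuild.
import ssl
import zlib

print("ssl built against:", ssl.OPENSSL_VERSION)
print("zlib version:", zlib.ZLIB_VERSION)
```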

Step 5) This installs python3, pip3, setuptools, and the Python 3 libraries on your CentOS system. Check the installed Python version to validate:
python3 --version
Python 3.5.0
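
Since the system python on CentOS 7 is still 2.7, a job script can guard against being launched with the wrong interpreter. A minimal sketch:

```python
import sys

# Fail fast if this script was launched with the system Python 2.7
# instead of the newly installed Python 3 interpreter.
if sys.version_info < (3, 5):
    raise RuntimeError(
        "Python 3.5+ required, got %d.%d" % sys.version_info[:2])
print("Interpreter OK:", sys.version.split()[0])
```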

Now that the Python 3.5 installation is complete, you can configure Spark 2.1.0 to use Python 3.5.

Add the following configuration to $SPARK_HOME/conf/spark-env.sh:
export PYSPARK_PYTHON=/usr/local/bin/python3
export PYSPARK_DRIVER_PYTHON=python3
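
Alternatively, Spark also accepts the interpreter as configuration properties (spark.pyspark.python and spark.pyspark.driver.python, added in Spark 2.1), e.g. in $SPARK_HOME/conf/spark-defaults.conf:

```
# Equivalent to the spark-env.sh exports above
spark.pyspark.python          /usr/local/bin/python3
spark.pyspark.driver.python   /usr/local/bin/python3
```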

Open the pyspark shell to validate:
 
/opt/mapr/spark/spark-2.1.0/bin/pyspark
Python 3.5.0 (default, Oct 26 2017, 14:40:09)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-18)] on linux
Type "help", "copyright", "credits" or "license" for more information.
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.1.0-mapr-1707
      /_/

Using Python version 3.5.0 (default, Oct 26 2017 14:40:09)
SparkSession available as 'spark'.
>>>
