
Friday, November 9, 2018

Spark port range instead of random ports

My cluster has a firewall between our edge node (where Spark is installed) and the master/data nodes (where YARN is installed). When running a Spark application in yarn-client mode, the driver and executors have to communicate through ports opened in the firewall.

For security reasons, we can only open a range of ports on the firewall between the gateway and master nodes, rather than allowing access to all ports.

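The driver port can only be assigned a specific port or a completely random one. However, we can effectively bind it to a range of ports instead: give it a starting port, and on each failed bind attempt Spark increments the port by 1, up to spark.port.maxRetries times.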

The same approach works for all of these ports (each is random by default):

spark.blockManager.port (random)
spark.broadcast.port (random)
spark.driver.port (random)
spark.executor.port (random)
spark.fileserver.port (random)
spark.replClassServer.port (random)
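The retry behaviour can be illustrated with a minimal Python sketch. It is not Spark's code; it only assumes the documented rule for spark.port.maxRetries (try the configured port, then add 1 after each failed attempt). The function name bind_in_range is hypothetical.

```python
import socket


def bind_in_range(start: int, max_retries: int, host: str = "127.0.0.1") -> socket.socket:
    """Bind a TCP socket to the first free port in start..start+max_retries.

    Hypothetical helper that imitates the documented spark.port.maxRetries
    rule: try the configured port, then increment it by 1 on each failure.
    """
    end = start + max_retries
    if start <= 0 or end > 65535:
        raise ValueError(f"invalid port range {start}-{end}")
    last_error = None
    for port in range(start, end + 1):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((host, port))
            return sock  # first free port wins
        except OSError as err:
            sock.close()  # port taken: release the socket and try the next one
            last_error = err
    raise OSError(f"no free port in {start}-{end}") from last_error
```

If the starting port is already in use, the socket ends up on a later port in the range, which is why only the range (and not every possible port) has to be opened in the firewall.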

For example, setting spark.blockManager.port=20000 and spark.port.maxRetries=40 gives the port range 20000-20040.
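To plan the firewall rules, a small Python sketch can compute the range each starting port implies and build the matching spark-submit --conf flags. The helper names (port_range, fits_firewall, conf_args) are hypothetical, and this sketch simply rejects starting ports below 1024 and ranges that run past 65535 rather than modelling what Spark does there.

```python
from typing import Dict, List


def port_range(start: int, max_retries: int) -> range:
    """Ports that may be tried for one property: start .. start + max_retries."""
    end = start + max_retries
    # Simplification: reject privileged ports and ranges beyond 65535.
    if not 1024 <= start <= end <= 65535:
        raise ValueError(f"invalid port range {start}-{end}")
    return range(start, end + 1)


def fits_firewall(start: int, max_retries: int, fw_low: int, fw_high: int) -> bool:
    """True if every port that may be tried lies inside the opened firewall range."""
    ports = port_range(start, max_retries)
    return fw_low <= ports[0] and ports[-1] <= fw_high


def conf_args(start_ports: Dict[str, int], max_retries: int) -> List[str]:
    """Build spark-submit --conf flags for the given starting ports."""
    args = ["--conf", f"spark.port.maxRetries={max_retries}"]
    for prop, start in start_ports.items():
        port_range(start, max_retries)  # validate before emitting the flag
        args += ["--conf", f"{prop}={start}"]
    return args


if __name__ == "__main__":
    # Hypothetical choice: 20000-20040 for the block manager, 20041-20081 for the driver.
    flags = conf_args({"spark.blockManager.port": 20000, "spark.driver.port": 20041}, 40)
    print(" ".join(flags))
```

Note that the range is inclusive, so spark.port.maxRetries=40 covers 41 ports (the first attempt plus 40 retries).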

3 comments:

  1. Hi Vinayak,

    Not sure if this is similar.

    We are using Hadoop 2.7.7, Spark 2.4.4, and Hive 2.3.3. Hive has been configured to use Spark as its execution engine, and the deployment mode is yarn-cluster.

    The Hive CLI is deployed in a separate network segment, and its access to and from YARN is restricted to specific ports.

    We noticed that when a query is submitted in Hive, the spark-submit command it generates in the background includes arguments such as --remote-host and --remote-port.

    The hostname (--remote-host) is set to the Hive server's hostname, and the port (--remote-port) is randomly generated.

    We would like to control the port numbers that the Hive shell generates, so that our firewall rules can be changed accordingly.

    Any thoughts are much appreciated.

    Below is a sample command generated for a request from the Hive CLI:

    spark-submit --executor-cores 1 --executor-memory 2g --num-executors 5 --properties-file /tmp/spark-submit.8964692037304815807.properties --class org.apache.hive.spark.client.RemoteDriver /usr/local/apache-hive-2.3.3-bin/lib/hive-exec-2.3.3.jar --remote-host hiveserver-hostname --remote-port 46342 --conf hive.spark.client.connect.timeout=30000 --conf hive.spark.client.server.connect.timeout=60000 --conf hive.spark.client.channel.log.level=null --conf hive.spark.client.rpc.max.size=1262485504 --conf hive.spark.client.rpc.threads=8 --conf hive.spark.client.secret.bits=256 --conf hive.spark.client.rpc.server.address=null

    Thank you,
    Srini
