Orca Context

orca.init_orca_context

bigdl.orca.common.init_orca_context(cluster_mode=None, runtime='spark', cores=None, memory='2g', num_nodes=1, init_ray_on_spark=False, init_executor_gateway=True, **kwargs)

Creates or gets a SparkContext for the given Spark cluster mode (launching Ray services across the cluster if necessary), or an OrcaRayContext when the runtime is “ray”.

Parameters
  • runtime – The backend runtime. One of “ray” or “spark”. Defaults to “spark”.

  • cluster_mode

    The mode for the Spark cluster. One of “local”, “yarn-client”, “yarn-cluster”, “k8s-client”, “k8s-cluster”, “standalone”, “spark-submit” and “bigdl-submit”. Installing and running bigdl through pip is highly recommended, as it is more convenient.

    Defaults to None, in which case your application is expected to already have a SparkContext created by spark-submit, and the Spark configurations must be set through command-line options or the properties file. To make this easier, using bigdl-submit after pip install bigdl is recommended.

    For “yarn-client” and “yarn-cluster”, use a conda environment and set the environment variable HADOOP_CONF_DIR.

    For “k8s-client” and “k8s-cluster”, you must additionally specify the arguments master and container_image.

  • cores – The number of cores to use on each node. In Spark local mode, defaults to all the cores on the node. For any other cluster_mode, defaults to 2 cores per node. Setting this value explicitly, rather than relying on the default, is highly recommended.

  • memory – The memory allocated to each node. Defaults to ‘2g’.

  • num_nodes – The number of nodes to use in the cluster. Defaults to 1. In Spark local mode, num_nodes should always be 1, so there is no need to change it.

  • init_ray_on_spark – Whether to launch Ray services across the cluster. Defaults to False, in which case the Ray cluster is launched lazily when Ray becomes involved in Project Orca.

  • init_executor_gateway – Whether to launch a Java gateway on executors. Defaults to True.

  • kwargs – Extra keyword arguments, if any, used for creating the SparkContext and launching Ray.
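
The mode-specific requirements described for the parameters above can be checked before the context is created. The following is a minimal sketch, not part of bigdl: build_orca_args is a hypothetical helper, and it covers only the explicitly named cluster modes (not the None default).

```python
import os

# Cluster modes listed in the cluster_mode description.
CLUSTER_MODES = {
    "local", "yarn-client", "yarn-cluster", "k8s-client",
    "k8s-cluster", "standalone", "spark-submit", "bigdl-submit",
}


def build_orca_args(cluster_mode, cores=None, memory="2g", num_nodes=1, **kwargs):
    """Hypothetical helper: validate the documented rules and return a kwargs dict."""
    if cluster_mode not in CLUSTER_MODES:
        raise ValueError(f"unsupported cluster_mode: {cluster_mode!r}")
    # YARN modes expect HADOOP_CONF_DIR to be set in the environment.
    if cluster_mode.startswith("yarn") and "HADOOP_CONF_DIR" not in os.environ:
        raise RuntimeError("HADOOP_CONF_DIR must be set for YARN modes")
    # Kubernetes modes additionally need master and container_image.
    if cluster_mode.startswith("k8s"):
        missing = [k for k in ("master", "container_image") if k not in kwargs]
        if missing:
            raise ValueError(f"k8s modes also need: {', '.join(missing)}")
    # Local mode always runs on a single node.
    if cluster_mode == "local" and num_nodes != 1:
        raise ValueError("num_nodes should be 1 in local mode")
    args = dict(cluster_mode=cluster_mode, memory=memory, num_nodes=num_nodes, **kwargs)
    if cores is not None:  # leave cores out to keep the library default
        args["cores"] = cores
    return args
```

The resulting dictionary could then be passed on as init_orca_context(**build_orca_args("local", cores=4, memory="4g")). Any master URL or container image name supplied for the k8s modes is a placeholder chosen by the caller.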

Returns

An instance of SparkContext or OrcaRayContext.
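
As an illustration of the return value, here is a minimal sketch of local-mode usage, assuming bigdl is installed through pip and that stop_orca_context is used for cleanup. The function name sum_on_local_spark is hypothetical, and the cores and memory values are arbitrary.

```python
def sum_on_local_spark(cores=2, memory="2g"):
    """Start a local Orca context, run a tiny Spark job, then shut down."""
    # Imported lazily so this module loads even where bigdl is not installed.
    from bigdl.orca import init_orca_context, stop_orca_context

    sc = init_orca_context(cluster_mode="local", cores=cores, memory=memory)
    try:
        # With runtime="spark", the returned object is a SparkContext,
        # so ordinary RDD operations are available on it.
        return sc.parallelize(range(10)).sum()
    finally:
        # Release the context even if the job above fails.
        stop_orca_context()
```

Calling sum_on_local_spark() starts and stops the context around a single job. In a longer application the context would typically be created once and stopped at the end.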