The fastest way to get a Jupyter-based local development environment for Apache Spark 3 in Scala

Mario Cartia
3 min read · Jun 15, 2021

As one of my main activities is training on Big Data topics, I often find myself having to set up local development environments for using the Apache Spark framework in Scala. Undoubtedly, notebooks are one of the best choices in this regard because they are particularly suitable for sharing portions of code, being able to view the output immediately, etc.

Jupyter is, from this point of view, almost a de-facto standard if we talk about Python. However, if our language of interest is Scala, even though there are several well-made kernels, we can run into unpleasant surprises, especially if we intend to use the most recent versions of Apache Spark compiled against equally recent versions of Scala.

Going into detail, there are currently three Scala kernels for Jupyter:

- Spylon
- Apache Toree
- Almond

For all three there are Docker images, official or third-party, that let you spin up an environment in a few minutes.

However, Spylon and Apache Toree do not work out of the box with the most recent versions of Spark (3.x). For this reason, I will describe the procedure for setting up an environment based on the Almond kernel.

First of all, it is advisable to create a local directory to be mounted on the Docker container's filesystem, for example to permanently save the notebooks and the data you intend to use.
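For example (the directory name is just an illustration, any path will do):

```shell
# Create a local directory to hold notebooks and data;
# this will be bind-mounted into the container.
mkdir -p "$HOME/almond-spark/notebooks"
```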

Once this is done, we can launch the docker image with the following command:
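A command along these lines should work. It is a sketch with a few assumptions: `almondsh/almond` is the Almond project's published Docker image, and the mount point inside the container (`/home/jovyan/work` here) may differ between image versions, so check the image documentation:

```shell
# Run the Almond Jupyter image, exposing Jupyter (8888) and the Spark Web UI (8889),
# and mounting a local directory so notebooks survive container restarts.
docker run -it --rm \
  -p 8888:8888 -p 8889:8889 \
  -v "$HOME/almond-spark/notebooks:/home/jovyan/work" \
  almondsh/almond
```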

In particular, local ports 8888 and 8889, mapped to the same ports inside the container, will be used by Jupyter and the Spark Web UI respectively.

Once the container has started, we can connect to Jupyter by following the link printed on the console; the link includes the authentication token, so we don't have to enter it manually on first access.

We will then create a new notebook that uses the Almond kernel with Scala 2.12,

and at this point we will be able to run our first piece of Spark code.
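A typical first cell pulls Spark in via Ivy and creates the session. The sketch below assumes Spark 3.1.2 and the `almond-spark` helper library (both version numbers are illustrative, pick ones matching your kernel's Scala version):

```scala
// Fetch Spark and Almond's Spark integration through Ivy (versions are examples)
import $ivy.`org.apache.spark::spark-sql:3.1.2`
import $ivy.`sh.almond::almond-spark:0.11.2`

import org.apache.log4j.{Level, Logger}
Logger.getLogger("org").setLevel(Level.OFF) // keep notebook output readable

import org.apache.spark.sql._

// NotebookSparkSession (provided by almond-spark) shows job progress in the notebook
val spark = NotebookSparkSession.builder()
  .master("local[*]")
  .config("spark.ui.port", "8889") // match the port mapped in `docker run`
  .getOrCreate()

// A first distributed computation: sum the integers 1..100
val rdd = spark.sparkContext.parallelize(1 to 100)
println(rdd.sum()) // 5050.0
```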

Of course, once the code has run and the SparkSession has been created, the Spark Web UI will be accessible on local port 8889.

For more information on the syntax for importing dependencies via Ivy, and more, you can consult the Almond documentation at the following link:
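For reference, the Ammonite-style Ivy imports that Almond understands look like this (the coordinates below are just examples):

```scala
// Double colon :: appends the Scala binary version (e.g. _2.12) automatically
import $ivy.`org.typelevel::cats-core:2.6.1`

// Single colon : is for plain Java artifacts with no Scala suffix
import $ivy.`com.google.guava:guava:30.1.1-jre`
```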

https://almond.sh/docs/intro

Mario Cartia

Old school developer, veteran system administrator, technology lover and jazz piano player.