
Jupyter Data Science Notebook & MySQL on a Docker mounted volume

  • Writer: Norma López-Sancho
  • May 7, 2024
  • 5 min read

Updated: May 11, 2024

In this post we will mount volumes in Docker so as to have one container running the Jupyter Data Science Notebook image and another running the MySQL image.


By mounting a volume for each container, the notebooks and database files persist on the host. The two services also share the Docker network that Compose creates for the project, which means the Jupyter service can reach the MySQL database by service name (and vice versa). This allows for seamless communication and data sharing between the Jupyter environment and the MySQL database.


For example, if we are running a Python notebook in Jupyter and want to connect to the MySQL database, we could use libraries like PyMySQL or SQLAlchemy to establish the connection.

We would use the MySQL server’s address (‘db’ for us, as we will see a bit later in our docker-compose.yml) and the appropriate credentials (also defined in the .yml config).


Another advantage of mounting volumes is that if we stop, restart or delete the container, the data persists in the volume, so we don’t lose any information.


Here is all that we will need:


Docker Desktop (to host our containers):


MySQL Workbench (this will be our interface for database creation and SQL fun):


If you are using a Mac and don’t know your chip type, click the Apple icon in the top left corner and open ‘About This Mac’ to see whether your processor is Intel (then download x64) or Apple M1/M2 (then it’s ARM).

Last but not least, Visual Studio Code (this is where we will write the Compose file for mounting the volumes later):



Now let’s create a new folder on our computer, which can be done the normal way:

Finder/Explorer > create new folder


Or by running the following command in your Terminal/Command console:

mkdir data_science

This will create a folder named data_science in your home directory (which is where the terminal/command console starts by default, unless you’ve changed that, of course).
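If you prefer to stay in Python, the same folder can be created with the standard library; a minimal sketch (the folder name matches the one above):

```python
from pathlib import Path

# Equivalent of `mkdir data_science`; exist_ok avoids an error
# if the folder already exists.
Path("data_science").mkdir(exist_ok=True)
```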


Now go to Visual Studio and open the folder we’ve just created:

And create a new file with the first icon beside your data_science folder; it must be named:

docker-compose.yml


In the right panel we can write the following configuration:

version: "3.7"
services:
  jupyter:
    image: quay.io/jupyter/datascience-notebook:2024-04-29
    ports:
      - 8888:8888
    volumes:
      - ./jupyter:/home/jovyan/work
  db:
    image: mysql:8.0
    restart: always
    environment:
      MYSQL_ROOT_PASSWORD: whateveryouwant
    ports:
      - 3306:3306
    volumes:
      - mysql:/var/lib/mysql
volumes:
  mysql:

(Ignore the jupyter folder, it will appear later)


Let’s break down the code:


Version 3.7 specifies the version of the Docker Compose file syntax we’re using.

It’s always a good idea to specify the version to ensure compatibility and consistency across different environments and versions of Docker Compose.


For the image we’re specifying which ones we want, version included.


Setting restart: always on the MySQL service makes sure that the database comes back up even if it crashes or encounters temporary issues, given that in production environments data integrity and availability are critical.

This is a common practice for databases to minimise downtime and ensure data consistency.


We don’t do this with Jupyter as it is not a critical service; it is used for development or analysis tasks, so restarts are usually better done manually than automatically, since an automatic restart could disrupt ongoing work (just imagine…!)


Environment is kind of obvious, this one: it sets the root password for our MySQL server!


Ports are the standardised ones for these services (but bear in mind that if you have other containers running with the same images, you will need to map a different host port for them, as two containers cannot bind the same host port at the same time; host ports must be unique).
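For instance, if port 8888 were already taken on your machine, the Jupyter mapping could be changed like this in docker-compose.yml (the left side is the host port, the right side the container port; 8889 here is just an example):

```yaml
    ports:
      - 8889:8888   # host port 8889 → container port 8888
```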


The volumes map the folders within the containers in which data will be stored, so that it persists on the host.


For the Jupyter service, the ./jupyter directory will be on our host machine and will be mapped to the /home/jovyan/work directory inside the container. Therefore any data written to /home/jovyan/work from within the container will be stored in the ./jupyter directory on our host.
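To see the mapping in action, you could run something like the following from a notebook cell. /home/jovyan/work is the container path from the compose file above; the snippet falls back to the current directory so it also runs outside the container, and the file name is just an example:

```python
from pathlib import Path

# Inside the Jupyter container the shared folder is /home/jovyan/work;
# anything written there shows up in ./jupyter on the host.
work = Path("/home/jovyan/work")
if not work.exists():  # fall back when running outside the container
    work = Path(".")

target = work / "hello_from_container.txt"
target.write_text("written from the container\n")
print(target.resolve())
```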


For MySQL, /var/lib/mysql is the conventional location for MySQL’s data directory within a containerised environment, and it’s where MySQL stores its database files by default.



Now that we have our .yml file ready, in our terminal/command console we write the following:


cd data_science

Which will get us within the data_science folder.


If we want to see what the folder contains we can write ls in the terminal (which lists what the folder has):



Once within the folder we are ready to compose (still in the terminal):

docker-compose up -d

-d frees up the terminal (detached mode) so we can use it for other tasks, or close it without terminating the process, which keeps running in the background (you can still check the logs in Docker Desktop or with docker-compose logs).


If you don’t have the MySQL image or the Jupyter Data Science Notebook image, it will pull them for you. For demonstration purposes I have deleted my JDS notebook image in Docker:




TROUBLESHOOTING: if the terminal throws an error and the screenshots above did not happen at all, try clicking yes in Visual Studio Code to install the Docker extension; it will offer it to you as soon as you name the file docker-compose.yml.


And here we have our containers up in Docker, with the mounted volumes:


Here are the container logs:


The log tells us how to access the Jupyter Notebook, which is via the browser; usually it is the second link, but try both if you need to.


Now, our database container being named data_science-db-1 (the db service from our data_science project), one way to connect to it in our notebook (having done import mysql.connector) will be:

connection = mysql.connector.connect(
    host='data_science-db-1',  # 'db' also works inside the Compose network
    user='root',
    password='your_password',
    database='your_database_name'
)
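Once connected, queries go through a cursor. Here is a minimal sketch using the standard mysql.connector cursor API (the helper name fetch_version is ours, not from the library):

```python
# Fetch the server version over an open mysql.connector connection.
def fetch_version(conn):
    cur = conn.cursor()
    cur.execute("SELECT VERSION()")
    (version,) = cur.fetchone()
    cur.close()
    return version

# Usage, with the `connection` created above:
# print(fetch_version(connection))
```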

Or if you need to create a SQLAlchemy engine:

engine = sa.create_engine('mysql+mysqlconnector://root:your_password@data_science-db-1/your_database_name')

sa being the alias we gave when we ran import sqlalchemy as sa.
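The connection URL is just a string assembled from the same pieces we passed to mysql.connector. Here is a sketch of how it is put together; the credentials are placeholders, and inside the Compose network the service name db works as the host just like the container name:

```python
# Placeholder credentials; replace with the ones from your docker-compose.yml.
user = "root"
password = "your_password"
host = "db"  # service name from docker-compose.yml; 'data_science-db-1' also works
database = "your_database_name"

url = f"mysql+mysqlconnector://{user}:{password}@{host}/{database}"
print(url)
```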


And as a last step, we need to connect the MySQL server (aka the db-1 container) to the Workbench.


A note on versions: we specified the MySQL 8.0 image (in the docker-compose.yml we wrote) because it is the one that works with the Workbench, which at the moment has only been tested up to that version; so, to avoid possible problems (and the warning that comes with them), we used 8.0 instead of latest (which is what you get if you don’t specify a version).


Once you have it up and running, on the home screen press the + where it says MySQL Connections and set the parameters as below (the default user, if you do not define one, is always ‘root’).

Press Test Connection, which will prompt for your password, and boom!

We are ready to roll


Let’s see our babies:


The Workbench


Data Science Notebook on the browser


If you want a quick test of the Notebook, open a Python notebook and write the following:


import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

df = pd.DataFrame(np.random.randint(0,100,size=(100,2)), columns=list('AB'))
df.plot(kind='scatter', x='A', y='B', color='red')
plt.show()

Voilà


And to test the Workbench, let’s run:


show databases;

Run the following statement:

CREATE DATABASE mission_complete;

And now let’s show the databases again (and refresh your SCHEMAS on the left panel):


Happy coding!
