Selenium is a pretty neat bit of kit: a framework that makes it easy to automate browsers for testing and other web-scraping activities. Unfortunately there seems to be a dependency mess just to get going, and when I hit these types of problems I turn to Docker to contain the mess.
While there are a number of “Selenium + Docker” posts out there, many use more complex multi-container setups. I wanted a very simple single container holding Chrome, Selenium, and my code to go grab something off the web. This article is close, but doesn’t work out of the box due to various software updates. This blog post will cover the changes needed.
First up is the Dockerfile.
```dockerfile
FROM --platform=linux/amd64 python:3.9-buster

# install google chrome
RUN wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add -
RUN sh -c 'echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google-chrome.list'
RUN apt-get -y update
RUN apt-get install -y google-chrome-stable

# install chromedriver
RUN apt-get install -yqq unzip
RUN wget -O /tmp/chromedriver.zip https://edgedl.me.gvt1.com/edgedl/chrome/chrome-for-testing/`curl -sS https://googlechromelabs.github.io/chrome-for-testing/LATEST_RELEASE_STABLE`/linux64/chromedriver-linux64.zip
RUN unzip /tmp/chromedriver.zip chromedriver-linux64/chromedriver -d /tmp
RUN cp /tmp/chromedriver-linux64/chromedriver /usr/local/bin/

# set display port to avoid crash
ENV DISPLAY=:99

# install selenium
RUN pip install selenium==4.14.0

COPY . .

CMD python tests.py
```
The changes needed from the original article are minor. Since Chrome 115, chromedriver downloads have moved to the new “Chrome for Testing” location, and the zip file layout is slightly different. I also updated it to pull the latest version of Selenium.
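The `wget` line in the Dockerfile assembles the download URL from the latest stable version number reported by the `LATEST_RELEASE_STABLE` endpoint. A minimal sketch of that URL construction in Python (the helper function name is my own, not from any library):

```python
# Base of the Chrome for Testing CDN used in the Dockerfile's wget line.
CFT_BASE = "https://edgedl.me.gvt1.com/edgedl/chrome/chrome-for-testing"

def chromedriver_url(version: str, platform: str = "linux64") -> str:
    """Build the chromedriver zip URL for a given Chrome for Testing version.

    Mirrors the path layout the Dockerfile relies on:
    <base>/<version>/<platform>/chromedriver-<platform>.zip
    """
    return f"{CFT_BASE}/{version}/{platform}/chromedriver-{platform}.zip"

# Example: the URL for a specific (illustrative) version string
print(chromedriver_url("119.0.6045.105"))
```

Swapping `platform` for e.g. `mac-arm64` or `win64` gives the corresponding build, which is handy if you test the same script outside the container.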
ChromeDriver is a standalone server that implements the W3C WebDriver standard. This is what Selenium will use to control the Chrome browser.
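Concretely, Selenium speaks HTTP to ChromeDriver: it opens a session and then issues commands against it. A simplified sketch of the JSON body behind a W3C “New Session” request (the real payload Selenium sends carries more capabilities than this):

```python
import json

# Roughly the W3C WebDriver "New Session" payload that Selenium POSTs
# to chromedriver's /session endpoint when webdriver.Chrome() starts up.
new_session = {
    "capabilities": {
        "alwaysMatch": {
            "browserName": "chrome",
            # Vendor-prefixed Chrome options, including the Docker-friendly flags
            "goog:chromeOptions": {
                "args": ["--no-sandbox", "--headless", "--disable-dev-shm-usage"]
            },
        }
    }
}

print(json.dumps(new_session, indent=2))
```

You never build this payload by hand; the `Options` object in the script below does it for you. It is just useful to know what is travelling over the wire when debugging.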
The second part is the Python script tests.py
```python
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

# Define options for running the chromedriver
chrome_options = Options()
chrome_options.add_argument("--no-sandbox")
chrome_options.add_argument("--headless")
chrome_options.add_argument("--disable-dev-shm-usage")

# Initialize a new chrome driver instance
driver = webdriver.Chrome(options=chrome_options)

driver.get('https://www.example.com/')

header_text = driver.find_element(By.XPATH, '//h1').text
print("Header text is:")
print(header_text)

driver.quit()
```
Again, only minor changes here to account for updates to the Selenium API. The script also applies a few key ‘tricks’ (the extra arguments passed to Chrome) needed for Chrome to run inside Docker.
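If you end up writing several scrapers, it can help to collect those container-friendly flags in one place. A small sketch under that assumption (the helper and constant names are mine, not part of Selenium):

```python
# Flags that make Chrome behave inside a container:
#   --no-sandbox             Chrome's sandbox needs privileges a default container lacks
#   --headless               there is no display server inside the container
#   --disable-dev-shm-usage  /dev/shm is small by default in Docker; fall back to /tmp
DOCKER_CHROME_FLAGS = ["--no-sandbox", "--headless", "--disable-dev-shm-usage"]

def docker_chrome_options(extra_flags=None):
    """Build a selenium Options object preloaded with Docker-friendly flags."""
    from selenium.webdriver.chrome.options import Options  # lazy import
    options = Options()
    for flag in DOCKER_CHROME_FLAGS + list(extra_flags or []):
        options.add_argument(flag)
    return options
```

Then each scraper just calls `webdriver.Chrome(options=docker_chrome_options())`, and any one-off flags can be passed via `extra_flags`.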
This is a very basic ‘hello world’ style test case, but it’s a starting point for writing a more complicated web scraper.
Building is as simple as:
```shell
docker build -t webscraper .
```
And then we run it and get output on stdout:
```shell
$ docker run webscraper
Header text is:
Example Domain
```
Armed with this simple Docker container, and with the Python Selenium documentation at hand, you can now scrape complex web pages with relative ease.