Getting Started with Page Object Models in Python

Michael Ruttenberg Jun 04, 2020 Tutorial

So I assume you have read my article Scaredy Cat’s Guide to Getting Started with Selenium in Python. If not, I suggest you do so first, in order to understand basic Selenium Webdriver in Python, loading basic pages and calling some basic elements on a page.

Now I’d like to show you how to get going with Page Objects. Page what? Okay let’s back this up a sec.

I had heard about Page Objects for many years, and didn’t grasp the concept until now (many years later). Recently I saw an incomplete example on a webinar, copied it down and had a play around with it until it worked. It’s a satisfying feeling getting something to work and, more importantly, understanding why.

So let’s start with an explanation about some computer languages. Python (and Ruby) are great languages for scripting. You don’t have to create any objects, you can just write a program (a “script”) that does what you tell it to and runs from start to finish. That is “a scripting language”. Java is not a scripting language. It is verbose and forces everything to be an "object". You can’t avoid it, even if you wanted to. Personally, I’m a fan of scripting languages.

The Page Object Model breaks a script up into chunks that can be plugged back together again, making them reusable without repetition. If you are checking multiple web pages, you write scripts for each page separately, and call the bits you want when you want them. This means you don’t have a single script in charge of all the details about all the pages you're testing. Everything to do with a page is written into a different script that deals just with that page. It doesn’t know (or care) about any other page. If something changes on the page, you change the code only in one place.

Automation as a tool

If I write code at all, I write code that helps me as a tester. But how? I use it as a shortcut to doing something else repetitive. Code is great for doing things like registering many users quickly by filling in forms. I don’t use it to assert anything, which is checking that things work as expected. That’s not how I prefer to work.

So why use code? Automation is a fast, rigid and unquestioning robot. It never goes on a bender, takes a smoke break or gets the hump. It doesn’t get tired or go AWOL, well, mostly not. Automation is a tool, not a goal. I still like the brain-centred, thinking part of testing. Michael Bolton and James Bach call this "sapience", and automating code to assert things gives me little satisfaction. That said, there is an alternative to writing one long script, and the Page Object Model is it.

Setting the scene for our Page Object example

Imagine we have a script that runs through a set of actions as follows:

  1. Open a specific site’s homepage
  2. Log into the site using valid credentials
  3. Add product X to your basket
  4. Click Checkout
  5. Verify you are on the checkout page and item is in your basket
  6. Close the session
  7. Open the site’s homepage (the same site as step 1)
  8. Add product X to your basket
  9. Click Checkout. You are prompted to log in (since you were not logged in previously)
  10. Log in to the site using valid credentials
  11. Verify you are on the checkout page and item is in your basket
  12. Close the session

Hopefully you will have noticed that there are really only six actions above, each repeated twice in a slightly different sequence. If I wrote these actions in isolation from each other, I’d probably write the same or similar code, including the repetition. This violates the principle of "DRY" (Don’t Repeat Yourself). It might be that I’m having a bad day/week/month. Sometimes the code may differ, even slightly, sometimes it’s entirely different, and sometimes it is written by different people with different coding styles. Either way, it’s duplication.

Why DRY?

Now imagine you called the same page many times to do many different things, e.g. for discounts, promotions, out-of-stock labels or whatever. You would be calling the same pages (login screen, product listing screen, checkout screen) many times over, for different purposes. You are viewing the same pages every time. And imagine those pages are accessed by different scripts all going to the same places from different places in the code, and with a variety of styles and authors for those pieces of code.

Now imagine (heaven forfend!) that a developer changes the page structure for one or more pages that your code is accessing. Each one of this myriad of scripts will probably fail. When the developer changes the code, you have to change the code for each reference to the page that has now changed. You probably can’t find-replace all the changes. You might change the ones you spotted and forget some of the others (if you even knew they existed at all). Now imagine doing this dozens (or hundreds) of times. It is unsustainable. Your code now works in some places but fails in others, for the same page. And for the bits that didn’t fail yet and that you didn’t know about, it is a bug lurking and will probably choose the most inopportune moment to cause havoc, probably on a Friday evening when someone deploys something small. What could possibly go wrong? I hope you don’t have dinner or weekend plans!

How we are going to approach the problem

Going back to the 12 steps in the example above, there are really only six steps, each of which can be reused with the same code and put in more or less any sequence:

  1. Open the site on the homepage
  2. Log in to the site using valid credentials
  3. Add product X to your basket
  4. Click Checkout
  5. Verify you are on the checkout page and item is in your basket
  6. Close the session

So how do you do this with code? Taking steps 1 and 2 for our example, we will start by putting all the code for the page we want into one script (piece of code) so that we only have one place where everything about the page lives. If it goes wrong, it’s all only in one place in the code, not copied (with or without variations) in many places. Now we can execute the actions on it (and those actions can also be rolled up into a single neat piece of code, perhaps even as small as one line of code!). Let’s start with a simple script. Afterwards I’ll show you how to do the same thing but using the Page Object Model.
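Before we get to Selenium, here is a rough sketch of the "six reusable steps" idea in plain Python. The function names and the log list are made up for illustration, and the step bodies only record what would happen rather than driving a browser.

```python
# Each step is written once; a test flow is just a list of steps in order.
# The bodies are placeholders that record what a real step would do.

def open_homepage(log):
    log.append("open homepage")

def log_in(log):
    log.append("log in")

def add_product(log):
    log.append("add product X")

def click_checkout(log):
    log.append("click checkout")

def verify_checkout(log):
    log.append("verify checkout")

def close_session(log):
    log.append("close session")

def run_flow(steps):
    """Run the given steps in order and return a record of what happened."""
    log = []
    for step in steps:
        step(log)
    return log

# Steps 1-6 of the original list: log in first, then buy.
logged_in_first = run_flow([open_homepage, log_in, add_product,
                            click_checkout, verify_checkout, close_session])

# Steps 7-12: add to basket first, log in when prompted at checkout.
log_in_at_checkout = run_flow([open_homepage, add_product, click_checkout,
                               log_in, verify_checkout, close_session])
```

Both flows use exactly the same six functions. Only the order differs, so a change to how login works would be made in one place.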

System setup

For the following to work you’ll need Python 3.x, and Chromedriver installed somewhere on your system where Windows (or your Mac etc.) can find it. In my case, just as in the Scaredy Cat article, I put all my code in C:\selenium and have subfolders under there. I put Chromedriver in C:\selenium. I have C:\selenium in my system PATH so that Windows can find Chromedriver. See How to add a folder to PATH.

Chromedriver can be downloaded from its homepage.

Note that Chromedriver needs to match the version of Chrome that you are using, so you may have to download an updated version of Chromedriver from time to time.

In case you haven’t installed Selenium already, open a command line (or Terminal on Mac) and type pip install -U selenium. This will also update any existing version you may have installed. You don’t need to download anything for this from the Selenium website.

For reference, I use SublimeText 3 for editing Python code, but you can use any IDE or editor that works for you.

If you have Python 2 and Python 3 installed, you may have to use pip3 to force Python3’s pip to run instead of Python 2’s pip.

What we’re going to do

So let’s start with a simple Python script to open a page and log into a dummy site. I’m using "The Internet". On the login page, the credentials are "tomsmith" and "SuperSecretPassword!".

We’ll be telling Python to do the following:

  • Load Selenium Webdriver so we can call it,
  • start a session of Webdriver using Chrome,
  • go to the The Internet site’s Login page,
  • find and fill in the username and password fields,
  • find the Submit button and then click it,
  • and lastly, we’ll close the session.

There are two ways to do this: The scripted way, and the Page Object Model way. Let’s get started with the scripted way. Each line has a comment (prefixed by #) to tell you what is happening.

The scripted way

theinternet_scripted.py

# Make Selenium available to Python
from selenium import webdriver 
# Give us access to By, which we use to say how an element should be located. See note 1
from selenium.webdriver.common.by import By
# we need this to use "sleep" (pause)
import time
 
# Open a Chrome session. Relies on Chromedriver. See note 2
driver = webdriver.Chrome() 
 
# Open the web page
driver.get("https://the-internet.herokuapp.com/login") 
 
# Find the username field by ID and send a value
driver.find_element(By.ID, "username").send_keys("tomsmith")
 
# Find the password field by ID and send a value
driver.find_element(By.ID, "password").send_keys("SuperSecretPassword!") 
 
# Wait for 3 seconds
time.sleep(3)
 
# Find the Submit button by its three classes, written as a CSS selector (the "." prefix means "class"), and click it
driver.find_element(By.CSS_SELECTOR, ".fa.fa-2x.fa-sign-in").click() 
 
# Wait for 3 seconds
time.sleep(3)
 
# Close the session
driver.quit()

NOTE 1: In my Scaredy Cat article I used find_element_by_… instead of find_element(By…). Here I am using find_element(By…), so I need to include from selenium.webdriver.common.by import By at the top to give me access to By. The find_element_by_… methods were deprecated in Selenium 4 and removed in version 4.3, so on a current install only the By form works. By is also said to be cleaner for Page Object Model code. I don’t know if I agree, but that’s the convention so let’s go with the flow.

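NOTE 2: If Chromedriver is not in a folder which is in PATH, you can specify its location explicitly here. In Selenium 3 that was driver = webdriver.Chrome("/PATH/TO/YOUR/chromedriver"), pointing at the Chromedriver executable itself rather than its folder. Selenium 4 expects the path wrapped in a Service object instead: add from selenium.webdriver.chrome.service import Service at the top, then use driver = webdriver.Chrome(service=Service("/PATH/TO/YOUR/chromedriver")).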

How to run our Scripted Way code

You can run the code either from your IDE or command line. If you are using the Command Line interface, change to the folder where the theinternet_scripted.py file is and then run python theinternet_scripted.py. Since I have Python 2 and 3 installed, I run py -3 theinternet_scripted.py to force Python 3 to run.

If you see an error mentioning Chromedriver, you probably need to download and update Chromedriver as above.

If your setup is correct, you should see the login page load and get filled in, then a pause, then the success page load, another pause, and finally the browser close.

The Page Object Model way

Next we are going to rewrite the above script using a Page Object, by splitting the code into sections. Yes, the code is longer, but it’s more robust and reusable.

We will need to create two files. Let’s call them runlogintest.py and loginpage.py.

In runlogintest.py we will do the following:

  1. Open a session of Webdriver.
  2. Go to the relevant page in a custom function, which will call the contents from loginpage.py and then quit the session when it completes.

In loginpage.py we will set up all the variables and features that we want to make the login process work.

Here’s my code. I’ll explain what is happening along the way in notes below the code snippets. I won’t restate the comments where things are already explained in the “the scripted way” code above.

runlogintest.py

from selenium import webdriver
from selenium.webdriver.common.by import By
from loginpage import *
 
def setup():
    return webdriver.Chrome()
 
def logintest():
    driver = setup()
 
    driver.get("https://the-internet.herokuapp.com/login")
 
    login_form = Login(driver)
    login_form.login()
    driver.quit()
 
if __name__ == "__main__":
    logintest()

So what is going on in runlogintest.py?

In the first two lines we call all the libraries we need to get Selenium to work, just as before. In line 3 we are loading in the contents of loginpage.py (using from loginpage import *). This is the way to say "get everything from loginpage.py". (You don’t need to include the file extension .py.)

In def setup() we are saying "when something calls the setup() method, start a Webdriver session and send that back [return] to the place that asked", in this case with a new Chrome session.

In def logintest() we are doing a lot of things with what seems like not much code:

  • Call setup(), which we defined above, and store the Webdriver session it returns in a variable named driver. We don’t want to define this many times, so we define it once and call it when we need it.
  • Next, go to the login page using driver.get(...). Here we specify the URL of the page we want to navigate to.
  • Create the variable login_form containing the login page details and behavior.
    • The details are defined in loginpage.py. Specifically, when we call Login(...), we get an object from the Class defined in loginpage.py. (We’ll explain more about objects and classes below.) Remember that we are able to call Login() because we imported the loginpage.py contents in the header.
    • The (driver) part of Login(driver) allows the Login object to access the Selenium session.
  • Next, do all the login actions specified for the login page (run login_form.login()). .login() means “run the login() method specified in the Login Class from loginpage.py”.
  • Then end the session by calling driver.quit().
  • Lastly, the if __name__ == "__main__": line is a trick in Python that says "if the file being run is called directly (i.e. you run THIS file of code, in our case, runlogintest.py), run whatever is below; otherwise, just make the code in this script available to whatever is calling it and don’t run it." Our code says we’d like to run logintest(), which in effect runs everything that we just described above, in one single line of code. If you want to run the test twice, just put logintest() in twice.
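To see the __name__ trick on its own, here is a tiny sketch with no Selenium in it. The logintest() body is a stand-in that just returns a message.

```python
def logintest():
    # Stand-in for the real test: returns a message instead of driving a browser.
    return "login test ran"

if __name__ == "__main__":
    # Reached only when this file is run directly, e.g. "python thisfile.py".
    # A script that does "import thisfile" gets logintest() but skips this block.
    print(logintest())
```

Run directly, the file prints the message. Imported from another script, it prints nothing until that script calls logintest() itself.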

Next, let’s look at loginpage.py.

loginpage.py

from selenium.webdriver.common.by import By
 
class Login:
    def __init__(self, driver):
        self.driver = driver
 
        self.email = "tomsmith"
        self.password = "SuperSecretPassword!"
 
        self.username_locator = "username"
        self.password_locator = "password"
 
        # 3 classes chained together, which we will use via a CSS selector
        self.login_button = ".fa.fa-2x.fa-sign-in"
 
    def login(self):
        self.driver.find_element(By.ID, self.username_locator).send_keys(self.email)
        self.driver.find_element(By.ID, self.password_locator).send_keys(self.password)
        self.driver.find_element(By.CSS_SELECTOR, self.login_button).click()

So what is going on in loginpage.py?

First, since we are going to be interacting with the login screen, we want to put everything we are going to do with that screen in a single "object". An object is a bundle of data and code in memory that we want to do something with. We can then reference it ("call it") later. Objects are built from a "class". A class is a template which we can reuse many times, and in that class we put the controls ("methods") and attributes (characteristics) that we want its objects to have. All the controls for the login page are rolled up nicely together here and named "Login". Now that we have the Login class defined, we can call that in runlogintest.py.

The next bits are a bit tricky so stick with me...

Anything in def __init__(...) says "when the class is called ("instantiated"), run this next bit automatically". Here we set up all the variables for the class so it knows what values to use and/or where to find them. This is also where we would add or edit the variables later, for example if the code of the login page changes.

"self..." is a bit confusing and took me a long time to get my head around. It refers to the particular object that was created from the class. Writing self.something in __init__ attaches that variable to the object, so the other methods in the class (such as login) can read it as self.something too. A variable set without self exists only inside the method that created it and disappears when that method finishes. This is a simplification of something called "scope" but suffice to say, it works. (Read more about scope here).
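Here is a small sketch of self that has nothing to do with Selenium. Greeter and its variable names are invented purely for illustration.

```python
class Greeter:
    def __init__(self, name):
        self.name = name          # stored on the object, visible to other methods
        temporary = name.upper()  # plain local variable, gone when __init__ returns

    def greet(self):
        # self.name works here; a bare "temporary" would raise a NameError
        return "Hello, " + self.name

first = Greeter("tomsmith")
second = Greeter("someone else")
```

Each object keeps its own values: first.greet() uses "tomsmith" and second.greet() uses "someone else", even though both come from the same class.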

Finally, in def login(...) we specify the actions to access the username and password form fields for login, and click the Submit button. Note that we are calling the variables that we specified above. No actual values are hard-coded here: each line names the locator type (in this case two IDs and a CSS selector) and takes the value to look for, and the text to type, from a variable. All the moving parts live in the variables, so the code here is quite light. We are making two assumptions:

  • The variables are correct, but if they change in the web page source code, we just need to edit the variable values in __init__.
  • The page structure of the page we are accessing won’t change; e.g. if we are referencing an ID and the ID gets removed entirely, the code will break. However, we can swap the "find_element(By.XXX)" to something else quite easily.
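One way to make that swap even easier, sketched below as a variation rather than the code from this article, is to keep the locator type and its value together as a pair. The strings "id" and "css selector" are the values behind Selenium's By.ID and By.CSS_SELECTOR; in real code you would write (By.ID, "username"). FakeDriver and FakeElement are hypothetical stand-ins that only record calls, so the sketch runs without a browser.

```python
class Login:
    # Each locator is a (strategy, value) pair, so changing how an element
    # is found means editing one line here and nothing in login().
    USERNAME = ("id", "username")
    PASSWORD = ("id", "password")
    LOGIN_BUTTON = ("css selector", ".fa.fa-2x.fa-sign-in")

    def __init__(self, driver, username="tomsmith", password="SuperSecretPassword!"):
        self.driver = driver
        self.username = username
        self.password = password

    def login(self):
        # The * unpacks the pair into find_element(strategy, value)
        self.driver.find_element(*self.USERNAME).send_keys(self.username)
        self.driver.find_element(*self.PASSWORD).send_keys(self.password)
        self.driver.find_element(*self.LOGIN_BUTTON).click()


class FakeElement:
    """Hypothetical stand-in for a web element: records what is done to it."""
    def __init__(self, calls, locator):
        self.calls = calls
        self.locator = locator

    def send_keys(self, text):
        self.calls.append(("send_keys", self.locator, text))

    def click(self):
        self.calls.append(("click", self.locator))


class FakeDriver:
    """Hypothetical stand-in for a Webdriver session."""
    def __init__(self):
        self.calls = []

    def find_element(self, by, value):
        return FakeElement(self.calls, (by, value))


fake = FakeDriver()
Login(fake).login()
```

After this runs, fake.calls holds the three actions in order, which is also a cheap way to check a page object without opening Chrome.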

How to run our Page Object Model code

Assuming you have a folder on your machine with the runlogintest.py and loginpage.py in the same folder, you run runlogintest.py either within your IDE/editor, or by using the Command Line, navigating to the folder where you saved the two files and then running "python runlogintest.py". As before, if you have both Python 2.x and 3.x installed, you may need to use "python3 … " or "py -3 ..." instead. Hopefully the Chrome page loaded, filled in the form and then shut.

Add a pause to the test

It all happens rather fast. If you want to put in a pause so you can see what happened when we entered the username and password but before we clicked Submit, then in loginpage.py add import time to the top somewhere, and add time.sleep(3) in the login method. We can add another pause for the results page so we can see that we landed there successfully before the browser session got closed.

    def login(self):
        self.driver.find_element(By.ID, self.username_locator).send_keys(self.email)
        self.driver.find_element(By.ID, self.password_locator).send_keys(self.password)
        
        # add a pause of 3 seconds
        time.sleep(3)
 
        self.driver.find_element(By.CSS_SELECTOR, self.login_button).click()
        
        # pause so we can see the logged in page after we clicked the submit button
        time.sleep(3)

I don’t suggest you put sleep into your scripts, but for the purposes of learning, it’s a way to slow down the code so you can see it happening and review it as it happens.
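The usual alternative to a fixed sleep is to keep checking for a condition and carry on as soon as it is true. Selenium provides WebDriverWait (in selenium.webdriver.support.ui) for this. The sketch below is not that class, just a plain-Python illustration of the idea; wait_until, page_is_ready and the timings are made up for the example.

```python
import time

def wait_until(condition, timeout=10.0, poll_every=0.5):
    """Call condition() repeatedly until it returns something truthy.

    Returns that value, or raises TimeoutError once the timeout has passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %s seconds" % timeout)
        time.sleep(poll_every)

# Stand-in condition: becomes true on the third check.
checks = []

def page_is_ready():
    checks.append(1)
    return len(checks) >= 3

ready = wait_until(page_is_ready, timeout=2, poll_every=0.01)
```

With a real session the condition might be something like lambda: "/secure" in driver.current_url, assuming the site sends you to a /secure page after a successful login.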

Summary of what we’ve learnt

  • We can take code and break it up, so that the standard stuff (running Selenium) is configured separately from the moving parts (the web page we are interacting with), while still allowing them to be able to talk to each other.
  • We can remove duplication by putting everything for setup in one file and everything from the page we are interacting with in another file, so that if something changes, we only need to change our code in that one place.
  • We can consolidate anything we need from the web page into variables (which are more likely to change) and locators (which are less likely to change), so that we can interact with them (e.g. find this, click that). Either way, they are in one file, so easy to find and fix, if needed.
  • We can call the page as many times as we want without duplicating the code.

How to extend the code further

If we want to extend what we have done further, we could abstract (pull out and put somewhere else) the def setup() and driver.quit() bits of code from runlogintest.py. We could put them into a "Base" class. That means that any file or script can access them to start up and close a browser session, not just the runlogintest.py file.
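Here is one possible shape for that, as a sketch only. Instead of a base class that tests inherit from, it uses a small helper class with Python's "with" statement, which guarantees the browser is closed even if the test fails halfway. BrowserSession, driver_factory and FakeDriver are names I have invented for the example, and FakeDriver stands in for Chrome so the sketch runs without a browser.

```python
class BrowserSession:
    """Starts a browser on entry and always quits it on exit."""

    def __init__(self, driver_factory=None):
        # driver_factory is any zero-argument callable that returns a driver.
        self.driver_factory = driver_factory or self._chrome
        self.driver = None

    @staticmethod
    def _chrome():
        # Imported here so the class can be loaded without Selenium installed.
        from selenium import webdriver
        return webdriver.Chrome()

    def __enter__(self):
        self.driver = self.driver_factory()
        return self.driver

    def __exit__(self, exc_type, exc, tb):
        self.driver.quit()   # runs even if the test body raised an error
        return False         # do not hide the error from the caller


class FakeDriver:
    """Hypothetical stand-in for a Webdriver session."""
    def __init__(self):
        self.visited = []
        self.closed = False

    def get(self, url):
        self.visited.append(url)

    def quit(self):
        self.closed = True


fake = FakeDriver()
with BrowserSession(lambda: fake) as driver:
    driver.get("https://the-internet.herokuapp.com/login")
```

In a real test you would write "with BrowserSession() as driver:" and create Login(driver) inside the block, so no test file needs its own setup() or driver.quit().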

Thanks for reading. I hope you found this useful.

Michael Ruttenberg

Tester, all round quality geek, radio ham, father, husband, Francophile and linguist, not necessarily in that order. Michael lives and works in London, UK and represents the UK in his ham radio hobby.