Selenium is a versatile web scraping tool accessible from multiple programming languages.
It's distinguished from text-parsing scrapers like BeautifulSoup in that it simulates an actual browsing session, which lets you scrape websites that rely heavily on JavaScript and iframes.
That makes Selenium especially powerful when you need to scrape large websites, such as e-commerce sites.
However, on large websites the pages you scrape won't be perfectly identical to one another.
Hence, error (exception) handling is crucial.
Without proper exception handling you will run into error after error and waste time, because any uncaught error simply halts your scraping job.
This is especially painful when you have set the job to run over lunch or overnight.
It is very common for some pages to lack a particular element.
For example, if you are scraping Amazon for product reviews, products with no reviews simply have no review element to show you.
This is easily handled by appending "" or None to the list you are populating with your results.
from selenium.common.exceptions import NoSuchElementException

reviews = []
for page in pages_to_scrape:
    try:
        # Insert your scraping action here
        reviews.append(driver.find_element_by_css_selector('div.review').text)
    except NoSuchElementException:
        # Just append a None or ""
        reviews.append(None)
Some websites are simply slow, or so small that your scraping overloads their server.
The latter is always bad: you shouldn't crash other people's websites while scraping.
In any case, timeouts (i.e. a page failing to load) are common on large websites.
However, I don't recommend catching this error unless you know from experience that the website is prone to timeouts; it can be a huge waste of resources if the website times out only 0.01% of the time.
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException

driver.get("your url")

# Remove the while loop and break if you don't want to try again when it takes too long
while True:
    try:
        # Define an element that you can start scraping from once it appears
        # If the element appears within 5 seconds, break the loop and continue
        WebDriverWait(driver, 5).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, "your selector"))
        )
        break
    except TimeoutException:
        # If loading took too long, print a message and try again
        print("Loading took too much time!")
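If you'd rather give up after a fixed number of attempts instead of looping forever, the retry logic can be factored into a small generic helper. This is a sketch of my own (`retry` and its parameters are not part of Selenium); you would pass it a function wrapping the `WebDriverWait` call and catch `TimeoutException`:

```python
def retry(action, exceptions, attempts=3):
    """Call action() until it succeeds, retrying on the given
    exception types at most `attempts` times in total."""
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except exceptions:
            if attempt == attempts:
                raise  # out of retries: let the error surface
            print(f"Attempt {attempt} failed, trying again...")

# With Selenium this would look something like:
# retry(lambda: WebDriverWait(driver, 5).until(...), (TimeoutException,))
```

Re-raising on the final attempt means a persistently broken page still stops the run instead of silently being skipped.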
This error occurs when the element you try to click (e.g. the "next page" button) is covered by another element and becomes unclickable.
A common cause is a pop-up being triggered, or a chat box appearing at the bottom right of the page.
There are a few ways to solve this. The first is to close the pop-up.
import time
from selenium.common.exceptions import ElementClickInterceptedException

try:
    # Try to click an element
    driver.find_element_by_css_selector("button selector").click()
except ElementClickInterceptedException:
    # If a pop-up overlay appears, click the X button to close it
    time.sleep(2)  # Sometimes the pop-up takes time to load
    driver.find_element_by_css_selector("close button selector").click()
You can also use JavaScript to remove the blocking element (credit to Louis' StackOverflow answer):
from selenium.common.exceptions import ElementClickInterceptedException

try:
    # Try to click an element
    driver.find_element_by_css_selector("button selector").click()
except ElementClickInterceptedException:
    # Remove the blocking element from the DOM with JavaScript
    element = driver.find_element_by_class_name("blocking element's class")
    driver.execute_script("""
        var element = arguments[0];
        element.parentNode.removeChild(element);
    """, element)
If it's not a pop-up, the problem may be solved by scrolling away, in the hope that the blocking element moves with you and away from the button or link to be clicked.
from selenium.common.exceptions import ElementClickInterceptedException

try:
    # Try to click an element
    driver.find_element_by_css_selector("button selector").click()
except ElementClickInterceptedException:
    # Use JavaScript to scroll to the bottom of the page
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
A stale element error occurs when the element you hold a reference to has been deleted or is no longer attached to the DOM.
Although not a very common error, some websites are prone to it.
When you encounter this error, you can simply retry a limited number of times.
from selenium.common.exceptions import StaleElementReferenceException

for attempt in range(3):  # retry up to 3 times
    try:
        item_list.append(driver.find_element_by_id("item id").text)
        break  # once the try succeeds, stop retrying
    except StaleElementReferenceException:
        continue  # if the element went stale, try again
Catching errors with these exception handlers slows your code down noticeably; wrapping every scraping action in a try/except can make a regular one-hour scraping task take twice as long.
Sometimes that is unavoidable, and full coverage of exceptions is then necessary so that you can run the job overnight without worries.
In general, though, I recommend trial and error: implement exception handling only for the errors you actually encounter.
Start with a small, representative sample of your web pages to find out which types of errors can happen.
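One way to run such a survey (a sketch of my own; `survey_errors`, `scrape_page`, and `urls` are hypothetical names, not Selenium APIs) is to scrape a random sample of pages and tally which exception types come up:

```python
import random
from collections import Counter

def survey_errors(urls, scrape_page, sample_size=20):
    """Scrape a random sample of pages and tally the exception
    types raised, so you know which handlers are worth writing."""
    sample = random.sample(urls, min(sample_size, len(urls)))
    error_counts = Counter()
    for url in sample:
        try:
            scrape_page(url)  # your normal one-page scraping routine
        except Exception as exc:  # broad on purpose: we only want the names
            error_counts[type(exc).__name__] += 1
    return error_counts
```

The resulting Counter shows which exceptions actually occur in practice, so you only add the handlers you need.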
If you're scraping several elements per page, it's best to assign each scraped value to a temporary placeholder variable before appending to its list.
Otherwise, if you append directly and an error occurs partway through, the lists appended earlier will end up with more elements than the others.
# ====================
# Use this way
# ====================
try:
    item_one_temp = driver.find_element_by_id("item one id").text
    item_two_temp = driver.find_element_by_id("item two id").text
    item_one.append(item_one_temp)
    item_two.append(item_two_temp)
except NoSuchElementException:
    item_one.append(None)
    item_two.append(None)

# ====================
# Instead of this
# ====================
try:
    item_one.append(driver.find_element_by_id("item one id").text)
    # If the next element does not exist, item_one has already been appended,
    # i.e. one of your lists ends up longer than the other
    item_two.append(driver.find_element_by_id("item two id").text)
except NoSuchElementException:
    item_one.append(None)
    item_two.append(None)
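A variation on the same idea (my own suggestion, not from the snippets above) is to build one record per page and only combine them at the end, so the fields can never drift out of sync. `fetch_field` here is a hypothetical stand-in for your `find_element` calls, and `KeyError` stands in for `NoSuchElementException`:

```python
def collect_row(fetch_field, field_names, missing=None):
    """Fetch every field for one page; if any field is missing,
    record `missing` for that field instead of skipping it."""
    row = {}
    for name in field_names:
        try:
            row[name] = fetch_field(name)
        except KeyError:  # stand-in for NoSuchElementException
            row[name] = missing
    return row

# Rows built this way are always complete, so lists derived
# from them always have matching lengths
rows = [collect_row(lambda name: {"title": "Widget"}[name], ["title", "price"])]
```

Because every row carries all fields, a missing element on one page can never shift the columns of every page after it.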
It's useful to catch the specific errors you expect, since different errors call for different treatments.
Only use a broad except statement if you plan to do the same thing no matter which error arises.
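The principle can be sketched without a browser at all: map each expected exception type to its own remedy and let anything unexpected propagate. `attempt` and `fallbacks` are hypothetical names of my own, not Selenium's:

```python
def attempt(action, fallbacks):
    """Run action(); if it raises an expected exception type,
    return that type's fallback value, otherwise re-raise."""
    try:
        return action()
    except Exception as exc:
        for exc_type, fallback in fallbacks.items():
            if isinstance(exc, exc_type):
                return fallback
        raise  # an error we did not plan for should still halt the run

# With Selenium, a missing element could map to None:
# attempt(lambda: driver.find_element_by_id("x").text,
#         {NoSuchElementException: None})
```

Anything not listed in `fallbacks` still crashes loudly, which is exactly what you want for errors you haven't diagnosed yet.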
I wish I had known all this before I started scraping; it would have saved me a lot of time on StackOverflow. I hope this guide has been useful :)