Building a strong pool of quality backlinks is key to a good organic search ranking and is a standard SEO practice.
We will use some basic Python code to gather a list of potential backlinks: websites that have mentioned your brand but have not linked to your website.
With that list of URLs, you can assess how likely each site is to help you and then contact its admin.
In this example, I'll be using the Eve V product. Back when it was released in late 2017, it was branded as the Microsoft Surface killer. Since then it has faced delivery issues, like many other crowdfunded indie projects.
We'll use this example as I've found quite a few unlinked review sites.
First off, here are the dependencies:
tqdm==4.23.4
selenium==3.12.0
requests==2.18.4
fake_useragent==0.1.11
beautifulsoup4==4.7.1
We'll first establish our search parameters:
query = "eve v review"
owned_domain = 'eve-tech.com'
exclude_urls = ['forums', 'forum', 'comment', 'comments', 'wikipedia',
'youtube', 'facebook', 'instagram', 'pinterest', 'ebay',
'tripadvisor', 'reddit', 'twitter', 'flickr', 'amazon', 'etsy',
'dailymotion', 'linkedin', 'google', 'aliexpress']
You may define your own search query or run the script a few times across several queries, and you can also add more entries to exclude_urls.
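If you do want to sweep several queries, a minimal sketch might look like this (the extra queries are just examples, and gather_links() is a hypothetical helper that wraps the scraping and cleaning steps shown below):

queries = ["eve v review", "eve v hands on", "eve v vs surface"]  # example queries

all_links = []
for q in queries:
    all_links.extend(gather_links(q))  # gather_links() is assumed, not defined here

# Drop duplicate URLs while preserving order
all_links = list(dict.fromkeys(all_links))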
Next, we'll build the full query string, using Google's inurl: operator (negated with -) to exclude the sites above, and URL-encode it.
import urllib.parse

for exclude in exclude_urls:
    query = query + " -inurl:" + exclude

query = urllib.parse.quote_plus(query)
number_result = 20  # You may define more, but it will take longer
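If you want to double-check what the operators and the URL encoding produced, printing the variable is a quick sanity check (the output shown is only indicative):

print(query)
# e.g. eve+v+review+-inurl%3Aforums+-inurl%3Aforum+... (spaces become '+', colons become '%3A')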
We'll use the method to scrape Google Search results that was already introduced in this post.
import requests
from fake_useragent import UserAgent
from bs4 import BeautifulSoup

ua = UserAgent()
google_url = "https://www.google.com/search?q=" + query + "&num=" + str(number_result)
response = requests.get(google_url, headers={"User-Agent": ua.random})
soup = BeautifulSoup(response.text, "html.parser")

result_div = soup.find_all('div', attrs={'class': 'g'})

links = []
titles = []
descriptions = []
for r in result_div:
    # Each result block should contain a link, a title and a description;
    # skip any block that doesn't match the expected structure.
    try:
        link = r.find('a', href=True)
        title = r.find('h3', attrs={'class': 'r'}).get_text()
        description = r.find('span', attrs={'class': 'st'}).get_text()
        if link != '' and title != '' and description != '':
            links.append(link['href'])
            titles.append(title)
            descriptions.append(description)
    except Exception:
        continue
If the classes defined above do not work for you, i.e. they're returning empty results, please refer to the guide on finding the right selectors.
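If you'd rather not depend on Google's class names at all, one rough fallback is to key on the /url?q= pattern that organic result links use, reusing the soup object from above (a looser filter, so expect some noise):

# Fallback: keep any anchor whose href looks like an organic result link
links = []
for a in soup.find_all('a', href=True):
    if a['href'].startswith('/url?q='):
        links.append(a['href'])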
The links will be quite dirty at this point, so we'll clean them up while making sure they're valid. If the regex search comes up empty (i.e. the link doesn't match the expected pattern), we'll drop the entry.
import re

to_remove = []
clean_links = []
for i, l in enumerate(links):
    clean = re.search(r'\/url\?q\=(.*)\&sa', l)
    if clean is None:
        to_remove.append(i)  # track indices whose links didn't match the pattern
        continue
    clean_links.append(clean.group(1))
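To make the regex concrete, here's what a single raw result link looks like before and after cleaning (the URL is just an illustration):

raw = '/url?q=https://www.example.com/eve-v-review/&sa=U'  # illustrative raw link from the search results
m = re.search(r'\/url\?q\=(.*)\&sa', raw)
print(m.group(1))  # https://www.example.com/eve-v-review/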
We're now ready to assess each website for linking opportunities.
The following is a large chunk of code in a loop, so here's the process in brief: for every cleaned URL, we open the page in Selenium, give it a few seconds to load, collect every hyperlink on the page, and check whether any of them point to our own domain. If none do, the page likely mentions the brand without linking back, so it's recorded as an opportunity. Pages that repeatedly fail to load are set aside for manual checking.
opportunity = []
error_manual = []

import requests
from bs4 import BeautifulSoup
from selenium import webdriver
from tqdm import tqdm
import time

for url in tqdm(clean_links, ncols=100):
    tries = 0
    while tries < 10:
        driver = None
        try:
            # Path to your local geckodriver binary
            driver = webdriver.Firefox(executable_path='/Users/PSChua/Py_codes/Scraping/geckodriver')
            driver.get(url)
            time.sleep(6)  # give the page time to load
            body = driver.page_source
            soup = BeautifulSoup(body, 'html.parser')

            # Collect every outgoing hyperlink on the page
            hyperlinks = []
            for a in soup.find_all('a', href=True):
                hyperlinks.append(a['href'])

            # If any hyperlink already points to our domain, it's not an opportunity
            found = False
            for h in hyperlinks:
                if owned_domain in h:
                    print('False Alert')
                    found = True
                    break
            if not found:
                print('Opportunity Found')
                opportunity.append(url)

            driver.quit()
            break
        except Exception:
            tries = tries + 1
            if driver is not None:
                driver.quit()  # don't leave a browser window behind on failure
    if tries == 10:
        print('Error')
        error_manual.append(url)
As of the date of this post, the following are the opportunities we have found for Eve V.
It will take some qualitative analysis of these websites to determine whether they can link to you and whether they're worth contacting.
For example, the last link is just a video, so there is no way they could add a backlink. Meanwhile, some websites already link internally instead, so they'll probably be more reluctant to link out.
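To make that review easier, you might dump both lists to a CSV that you can annotate during outreach (a small sketch; the filename is arbitrary):

import csv

with open('backlink_opportunities.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['url', 'status'])
    for url in opportunity:
        writer.writerow([url, 'opportunity'])
    for url in error_manual:
        writer.writerow([url, 'check manually'])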
In any case, you've already automated a huge chunk of the work, and this will certainly speed up finding potential backlinks. #AutomateTheBoringStuff