Internal Linking

Internal linking is crucial for SEO, helping search engines crawl your site and improving content visibility. However, manually finding internal linking opportunities can be tedious, especially for larger websites.

This article introduces a Python script that automates the process, making it easier to identify potential links across your content. By utilizing Python’s data processing and web scraping capabilities, the script quickly analyzes your website, saving you time and improving your internal link structure. In the following sections, we’ll walk you through setting up the script and using it to enhance your SEO strategy. Credit goes to the article “Finding Inlink Opportunities with Python for SEO,” which inspired the method behind this script.

So, what are we going to do?

Before diving into the script, let’s briefly outline the overall process:

  1. Download Keyword Data from SEMrush: We’ll start by exporting data from SEMrush, including keyword positions, search volume, and keyword difficulty.
  2. Parse Website Content: Using Python, we’ll crawl and parse the text from your website to analyze the content on each page.
  3. Identify Internal Linking Opportunities: The script will then search for instances where these keywords appear in your content but are not currently linked to relevant pages.
  4. Generate a Recommendations File: Finally, the script outputs an Excel file with recommendations, showing where each keyword appears, the source page (where the internal link should be added), and the target page (the URL that should be linked).

Step #1: Loading Keyword Data

To begin the process, the first step is to gather and load your keyword data. You’ll need information like keyword positions, search volume, and keyword difficulty, which can be easily obtained from SEMrush.

Download Keyword Data from SEMrush: The first step in the script is to load the keyword data from a CSV file. This data typically includes important metrics such as keyword positions, search volume, keyword difficulty, and URLs that are ranking for these keywords.

Loading the Data with Pandas: After downloading the CSV file, the next step is to load it into Python using the pandas library. This will allow us to process and analyze the data efficiently. Here’s how to load the data:

import pandas as pd
import requests
from bs4 import BeautifulSoup

# 1. Load Keywords Data
list_keywords = pd.read_csv('your_file.csv')
list_keywords = list_keywords.values.tolist()

pandas is a powerful Python library used for data manipulation and analysis. It provides data structures like DataFrames, which are ideal for handling and processing structured data, such as the keyword data in a CSV file. In this script, pandas is used to load and manipulate the keyword data from the CSV file. It allows for easy reading of the data, conversion to different formats (like lists), and further processing.
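Because the script accesses columns by position later on, it is worth verifying the column order of your export before converting it to a list. Here is a quick sanity check; the column names shown are typical of a SEMrush organic-positions export and may differ in yours:

import pandas as pd

# Verify the column layout before relying on positional indexes
df = pd.read_csv('your_file.csv')
print(df.columns.tolist())
# A typical SEMrush export looks like:
# ['Keyword', 'Position', 'Previous position', 'Search Volume',
#  'Keyword Difficulty', 'CPC', 'URL', ...]
print(df.head())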

requests is a popular Python library used to make HTTP requests, such as retrieving the content of a webpage. It simplifies the process of interacting with web resources. The script uses requests to send HTTP requests to the URLs extracted from the keyword data. It retrieves the HTML content of these pages, which is then parsed to find potential internal linking opportunities.

BeautifulSoup is a library from the bs4 package that is used for parsing HTML and XML documents. It creates a parse tree that can be used to extract data from HTML tags. After the requests library retrieves the HTML content of a webpage, BeautifulSoup is used to parse this content. It helps in finding and extracting specific elements, such as paragraphs of text and existing links on the page, which are essential for identifying where internal links can be added.
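To see how requests and BeautifulSoup work together before running the full script, here is a minimal sketch that fetches a single page and extracts its paragraphs and links. The URL is a placeholder, and the timeout is an addition the main script does not include:

import requests
from bs4 import BeautifulSoup

# Minimal fetch-and-parse sketch; the URL below is a placeholder
page = requests.get('https://example.com/some-page/', timeout=10)
soup = BeautifulSoup(page.text, 'html.parser')

paragraphs = [p.text for p in soup.find_all('p')]    # visible body text
links = [a.get('href') for a in soup.find_all('a')]  # links already on the page

print(len(paragraphs), 'paragraphs,', len(links), 'links found')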

Step #2: Get the URL List

The next step in the script involves extracting the URLs from the keyword data. These URLs represent the pages on your website that are already ranking for the keywords you’re analyzing. This is crucial because these are the pages where you’ll potentially want to add internal links.

# 2. Get the URL list
list_urls = []
for x in list_keywords:
    list_urls.append(x[6])  # Assuming the URL is in the 7th column (index 6)

  • Extracting Relevant URLs: This step identifies the specific pages on your website that are currently ranking for the keywords in your data. These are the pages where you may want to add internal links. (A safer, name-based way to pull these URLs is sketched after this list.)
  • Prepping for Crawling: The extracted URLs will be used in the next step of the script, where the pages will be crawled and analyzed for potential internal linking opportunities.
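
Indexing by position works, but it silently breaks if SEMrush changes the export layout. If your CSV names the column 'URL' (an assumption; check your own export), selecting it by name is more robust:

import pandas as pd

# Safer alternative: select the column by name rather than by position
# (assumes your export actually names this column 'URL')
df = pd.read_csv('your_file.csv')
list_urls = df['URL'].tolist()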

Step #3: Remove Duplicate URLs and Prepare the Keyword-URL List

In this step, the script removes duplicate URLs from the list and builds a structured list that associates each keyword with its corresponding URL and related metrics. Both steps keep the crawl that follows efficient and accurate.

# Remove duplicate URLs
list_urls = list(dict.fromkeys(list_urls))

After extracting the URLs from the keyword data, the next task is to ensure there are no duplicates in the list. Duplicate URLs can lead to redundant processing, which is inefficient. The script removes these duplicates by converting the list of URLs into a dictionary and back into a list, leveraging the fact that dictionary keys must be unique.
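
Here is a tiny demonstration of the trick; dictionaries preserve insertion order from Python 3.7 onwards, so the first occurrence of each URL is kept:

# dict.fromkeys keeps the first occurrence of each key and preserves order
urls = ['/page-a', '/page-b', '/page-a', '/page-c']
print(list(dict.fromkeys(urls)))  # ['/page-a', '/page-b', '/page-c']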

list_keyword_url = []
for x in list_keywords:
    # Columns: URL (index 6), Keyword (index 0), Position (index 1),
    # Search Volume (index 3), Keyword Difficulty (index 4)
    list_keyword_url.append([x[6], x[0], x[1], x[3], x[4]])

This loop builds the structured keyword-URL list: each entry pairs a ranking URL with its keyword, position, search volume, and keyword difficulty. This is the list the crawler will check every page against in the next step.
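
For reference, each entry in list_keyword_url ends up as a five-element list; the values below are purely illustrative:

# Structure of one list_keyword_url entry (illustrative values only):
entry = ['https://example.com/blog/seo-tips',  # target URL (index 6 in the CSV)
         'seo tips',                           # keyword (index 0)
         4,                                    # position (index 1)
         1300,                                 # search volume (index 3)
         52]                                   # keyword difficulty (index 4)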

Step #4: Crawling the Pages and Finding Internal Linking Matches

In this step, the script crawls each URL extracted earlier, analyzes the content of the pages, and identifies opportunities to add internal links based on the keywords in your dataset.

# 3. Crawling the pages and finding the matches
internal_linking_opportunities = []
# The absolute root of your site (e.g. https://www.example.com), used to
# compare relative and absolute links on equal terms
absolute_route = input("Insert your absolute route: ")

for iteration in list_urls:
    page = requests.get(iteration)
    print(iteration)
    soup = BeautifulSoup(page.text, 'html.parser')
    paragraphs = [p.text for p in soup.find_all('p')]

    # Collect every link already present on the page
    links = []
    for link in soup.find_all('a'):
        links.append(link.get('href'))

    for x in list_keyword_url:
        for y in paragraphs:
            # Look for the keyword as a standalone word (basic punctuation
            # stripped), and skip the page that is itself the ranking URL
            cleaned = " " + y.lower().replace(",", "").replace(".", "").replace(";", "").replace("?", "").replace("!", "") + " "
            if " " + x[1].lower() + " " in cleaned and iteration != x[0]:
                links_presence = False
                for z in links:
                    try:
                        # Strip the domain so relative and absolute URLs compare equal
                        if x[0].replace(absolute_route, "") == z.replace(absolute_route, ""):
                            links_presence = True
                    except AttributeError:
                        # link.get('href') returns None for <a> tags without an href
                        pass

                if not links_presence:
                    internal_linking_opportunities.append([x[1], y, iteration, x[0], "False", x[2], x[3], x[4]])
                else:
                    internal_linking_opportunities.append([x[1], y, iteration, x[0], "True", x[2], x[3], x[4]])

For each URL in your list, the script retrieves the page content and parses it to find paragraphs of text. It then checks each paragraph for the presence of keywords from your dataset.

If a keyword is found and an internal link to the relevant page does not already exist, the script logs this as an opportunity. It records the keyword, the paragraph where it was found, the current page (source URL), the target URL, and whether or not the link already exists.
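
One caveat: the punctuation-stripping check only removes commas, periods, semicolons, question marks, and exclamation marks, so a keyword next to a colon, parenthesis, or quotation mark will be missed. If you want a stricter match, a regex with word boundaries is one possible alternative (a sketch, not part of the original script):

import re

def keyword_in_text(keyword, text):
    # \b word boundaries match the keyword as a whole word regardless of
    # surrounding punctuation; re.escape guards against special characters
    pattern = r'\b' + re.escape(keyword.lower()) + r'\b'
    return re.search(pattern, text.lower()) is not None

print(keyword_in_text('seo tips', 'Practical SEO tips: start here.'))  # True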

Step #5: Output the DataFrame to Excel

# 4. Output the results to Excel
pd.DataFrame(
    internal_linking_opportunities,
    columns=["Keyword", "Text", "Source URL", "Target URL", "Link Presence",
             "Keyword Position", "Search Volume", "Keyword Difficulty"]
).to_excel('internal_linking_opportunities.xlsx', header=True, index=False)

The final step of the script involves taking the internal linking opportunities identified in the previous step and exporting them into an Excel file. This allows you to review the results and implement the suggested internal links on your website.

The script uses pandas to create a DataFrame from the list of internal linking opportunities, assigning meaningful column names to each piece of data. This transforms the raw results into a format that is easy to review and act on.

The DataFrame is then exported to an Excel file (internal_linking_opportunities.xlsx), which includes all the necessary details: the keyword found, the specific text where it appears, the source page, the target page, and additional SEO metrics. This file serves as a practical guide for improving your website’s internal link structure, providing clear and actionable insights for your SEO efforts.
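
Before handing the file to a content team, you might also keep only the rows where a link is still missing and surface the highest-volume keywords first. A possible post-processing sketch, building on the variables from the script above (the output filename is an arbitrary choice):

import pandas as pd

# Assumes internal_linking_opportunities from the script above is in scope
df = pd.DataFrame(
    internal_linking_opportunities,
    columns=["Keyword", "Text", "Source URL", "Target URL", "Link Presence",
             "Keyword Position", "Search Volume", "Keyword Difficulty"]
)
missing = df[df["Link Presence"] == "False"]               # links not yet present
missing = missing.sort_values("Search Volume", ascending=False)
missing.to_excel('missing_internal_links.xlsx', index=False)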

Conclusion

Automating the process of finding internal linking opportunities using Python not only saves time but also ensures a more thorough and systematic approach to enhancing your website’s SEO. By following the steps outlined in this guide, you can efficiently identify pages that can be interconnected through relevant keywords, improving your site’s overall link structure and search engine visibility. If you have any questions or need further assistance, please don’t hesitate to reach out.
