Google index checker API

 

Google Search Console (GSC) is a powerful tool for webmasters and SEO specialists to monitor site performance in Google search results. However, if you’re dealing with hundreds or thousands of URLs, manually inspecting each one is not feasible. This is where automation can save time and effort.

In this tutorial, I will walk you through a Python script that connects to the Google Search Console API and automates URL inspections. You can fetch data like last crawl time, indexing state, and robots.txt status, and export it directly into a CSV file. I’ll also show how to handle your list of URLs dynamically using a simple text file.

Why Automate Google Search Console URL Inspection?

Google Search Console provides the “URL Inspection” tool, which shows how Google sees a specific URL on your property: whether it is indexed, when it was last crawled, and how robots.txt affects it. If you need to check many URLs, though, working through them one by one in the UI is slow. Automating the process has several benefits:

  • Save Time: Automatically inspect thousands of URLs in a single run.
  • Data at Scale: Extract crawl times, indexing states, robots.txt statuses, and coverage states for all URLs.
  • Custom Analysis: Export data to CSV for further analysis in tools like Excel, Google Sheets, or even Pandas in Python.

With this approach, you can also see exactly when Googlebot last crawled each of your pages.

Prerequisites:

Before we dive into the Python script, make sure you have:

  1. Google Cloud Account: Create a project in Google Cloud Console.
  2. Service Account: Download the JSON key for a service account with access to your Google Search Console.
  3. List of URLs: A text file containing the URLs you want to inspect (a sample layout is shown right after this list).
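
For reference, List_of_URLs.txt is just a plain text file with one fully qualified URL per line. The example.com addresses below are placeholders for your own pages:

https://example.com/
https://example.com/blog/sample-post/
https://example.com/category/seo/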

Step-by-Step Guide: Automating URL Inspection with Python

1. Setting Up Google Search Console API Credentials

To access the Google Search Console API, you need to create a service account in your Google Cloud Console:

  • Go to APIs & Services > Credentials.
  • Click on Create credentials > Service account.
  • After setting up the service account, download the JSON file containing your credentials. This file will be used for authentication in the Python script.
  • Finally, add the service account’s email address as a user of your property in Google Search Console; without this access, the API will reject the inspection requests.

2. Installing Required Python Libraries

Make sure you have the necessary Python libraries installed. Run the following commands in your terminal or Jupyter environment:

pip install google-auth google-auth-oauthlib google-api-python-client pandas requests

3. The Python Script

Here’s the core Python script that automates the process of fetching data from Google Search Console:

import pandas as pd
import json
import requests
from google.oauth2 import service_account
from google.auth.transport.requests import Request

# Load your credentials from the service account JSON file
SERVICE_ACCOUNT_FILE = 'path_to_your_service_account.json'  # You need to generate this file
SCOPES = ['https://www.googleapis.com/auth/webmasters']

# Authenticate and obtain an access token
credentials = service_account.Credentials.from_service_account_file(
    SERVICE_ACCOUNT_FILE, scopes=SCOPES)
credentials.refresh(Request())  # Fetch/refresh the access token
access_token = credentials.token

# Function to get data from the Google Search Console API
def get_data(gsc_property, url):
    api_url = "https://searchconsole.googleapis.com/v1/urlInspection/index:inspect"
    headers = {"Authorization": f"Bearer {access_token}"}

    payload = {
        "inspectionUrl": url,
        "siteUrl": gsc_property
    }

    response = requests.post(api_url, headers=headers, json=payload)
    if response.status_code == 200:
        return response.json()
    else:
        print(f"Error fetching data for {url}: {response.text}")
        return None

# Load URLs from the "List_of_URLs.txt" file (provided by you)
with open('List_of_URLs.txt', 'r') as file:
    urls = [line.strip() for line in file if line.strip()]

# Site to inspect (replace with your own site)
# Use "sc-domain:example.com" for a domain property,
# or the full URL prefix (e.g. "https://example.com/") for a URL-prefix property
gsc_property = "sc-domain:example.com"

# Prepare a list to store the results
results = []

# Fetch data for each URL and append to results
for url in urls:
    data = get_data(gsc_property, url)
    if data:
        inspection_result = data.get('inspectionResult', {}).get('indexStatusResult', {})
        results.append({
            "URL": url,
            "Crawl Time": inspection_result.get('lastCrawlTime'),
            "Coverage State": inspection_result.get('coverageState'),
            "Robots State": inspection_result.get('robotsTxtState'),
            "Indexing State": inspection_result.get('indexingState')
        })

# Convert results to a DataFrame and save as CSV
df = pd.DataFrame(results)
df.to_csv('gsc_results.csv', index=False)

print("Results saved to gsc_results.csv")

4. Explanation of Key Components:

  • Authentication: The script uses a service account to authenticate with the Google Search Console API.
  • API Request: The function get_data() sends an API request to inspect a given URL and retrieves crawl time, coverage state, robots state, and indexing state (a trimmed example of the raw response is shown right after this list).
  • Reading URLs: The script reads URLs from a text file named List_of_URLs.txt, which contains the URLs you want to inspect.
  • Saving to CSV: Once the data is collected, it is stored in a Pandas DataFrame and saved as gsc_results.csv.
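
For context, the relevant part of the raw API response that get_data() returns looks roughly like the snippet below. The structure (inspectionResult → indexStatusResult) and field names match what the script reads; the values are illustrative placeholders, not real data:

{
  "inspectionResult": {
    "indexStatusResult": {
      "coverageState": "Submitted and indexed",
      "robotsTxtState": "ALLOWED",
      "indexingState": "INDEXING_ALLOWED",
      "lastCrawlTime": "2024-01-01T00:00:00Z"
    }
  }
}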

5. Running the Script

Once you have the URLs listed in your List_of_URLs.txt, run the script to get your results. The output will be stored in a file called gsc_results.csv, which you can easily open in Excel or Google Sheets for further analysis.
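
Keep in mind that the URL Inspection API enforces daily and per-minute quotas. For large URL lists, a simple (if crude) way to stay under the limits is to pause briefly between calls. The sketch below reuses get_data(), urls, and gsc_property from the script above; the one-second delay is an arbitrary value you should tune to your own quota:

import time

for url in urls:
    data = get_data(gsc_property, url)  # same helper as in the main script
    # ... process the response exactly as in the main loop ...
    time.sleep(1)  # pause between inspections to avoid hitting rate limits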

6. What You Can Do with the Data

Now that you’ve automated the URL inspection process, you can:

  • Track indexing status: Monitor which URLs are indexed or not.
  • Check crawl times: Analyze how often Google is crawling your pages.
  • Optimize your robots.txt: Ensure that the correct URLs are being crawled by reviewing the robots state.
  • Export Data: Use the CSV file for reporting or deeper analysis using Python’s Pandas library (a short example follows this list).
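
As a quick illustration of that last point, here is a minimal snippet that loads gsc_results.csv with Pandas and lists the URLs whose coverage state is not "Submitted and indexed". The exact wording of coverage states returned by Google can vary, so treat the string comparison as an example filter rather than a definitive rule:

import pandas as pd

# Load the CSV produced by the inspection script
df = pd.read_csv('gsc_results.csv')

# Example filter: URLs that do not report "Submitted and indexed"
not_indexed = df[df['Coverage State'] != 'Submitted and indexed']
print(not_indexed[['URL', 'Coverage State', 'Crawl Time']])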


Conclusion:

Automating the Google Search Console URL inspection process with Python can save you hours of manual work. By leveraging the GSC API and Python’s powerful libraries, you can easily scale your SEO efforts, monitor key metrics, and keep your website optimized.

Feel free to adapt the script to suit your specific needs, whether it’s adding more data points or integrating it with other automation workflows.
