• About The Tech SEO

The Tech SEO

Because SEO Is More Than Keywords

  • Tech SEO Blog
  • SEO Tools
    • Quick Click SEO Audit – Chrome Extension
    • Discover Lost URLs
    • Redirectinator
You are here: Home / SEO Tools / Discover Lost URLs with Wayback Machine’s API

Discover Lost URLs with Wayback Machine’s API

After reading Patrick Stox’s post on Search Engine Land about Fixing Historical Redirects, I was inspired to start using Wayback Machine’s API within one of my Google Sheets. After a little playing with some formula’s, I developed the Discover Lost URLs Google Sheet.

Discover Lost URLs Spreadsheet

How The Sheet Works

Step 1: Enter Domain

Enter a domain name that you are looking to audit. The URL can be entered in several ways.

  • domain.com
  • subdomain.domain.com
  • www.domain.com/sub-folder/

Once the URL is entered, the sheet will begin to populate.

Step 2: Modify Dates

Set the “Date From” and “Date To” fields to pull the URLs that are in the Wayback Machine’s database.

Step 3: Choose What To Show

By default, the Wayback API spits out all URLs. This includes images, CSS, JS, and HTML. The sheet is set to show “Just HTML” by default. To show all, drop down the the selection box to “All”

Reading the Sheet

Time Stamp: The Time Stamp column shows the date the URL was in the index in the format of yyyyMMddhhmmss.

Wayback Machine URL: This is the URL from the database

Mime Type: This indicates whether the URL is for an HTML page, CSS file, JS file, etc…

Current Status: This is a custom function that was written to pull back the URLs status code. If there is a redirect in place, it will follow the redirect and post the path as “301 > 200”. If it is is a redirect chain, it will pull in the chain too, such as “301 > 301 > 302 > 404”

Final URL: This URL is the current final destination of the older Wayback URL. This is great to see where SEO value is being lost.

Download The Discover Lost URLs Google Sheet Now!

Sheet Limitations

Google Sheets can choke up on large amounts of data. I set the API to only pull in 20,000 rows max. I think it will struggle at that, depending on ram and browser. If the sheet is running too slow, try to do a few runs with a smaller date range.

Get My New Tools First! Sign Up For The Tech SEO Mailing List

Get the first dibs at beta testing some of my new sheets and receive information on building your own. I hate spam myself, so your email is protected with me.

Welcome To The Tech SEO

Welcome to The Tech SEO website. My name is Jeff Louella. This site is dedicated to some SEO tools I build to help with my daily SEO job. I make them available to the world for free so that you can used them for yourself. Please share the tools with your teams as well a mention where you get them from. If you have any recommendations on make these tools better, feel free to reach out to me at any of the social channels below. Also, vote for me as your favorite SEO on the Be an SEO Hero website.

Connect With Me On Social

  • Facebook
  • Instagram
  • Linkedin
  • Twitter
  • YouTube

Recent Posts

  • How Technical SEO Procrastination Hurts Your Redesign Effort

Newest Tools

  • Discover Lost URLs with Wayback Machine’s API
  • Quick Click SEO Audit – Chrome Extension
  • Redirectinator: A URL Redirect Monitoring Spreadsheet

Tags

redesigning replatforming seo website redesign