Screaming Frog: Find all URLs on your website

As an online marketer, you may be familiar with the Screaming Frog program. It crawls your website, letting you simulate what happens when Google does the same on your site. After reading this article, the program will become even more powerful for you, because you will find all the URLs on your site.

If you want to check your site for technical errors with Screaming Frog, it helps if the program crawls every page on your website. Otherwise errors may sit on pages the program cannot find, and you will never fix them. With the default configuration, a lot of pages are not included in the crawl, which is a shame!

Crawling a website can be compared to a librarian who wants to read all the books by one author. Think of each book as a page and the references in the book as links to other pages. Crawling retrieves the contents of a page and discovers new links, much like a librarian following references from book to book.

Default vs custom configuration

When you open Screaming Frog, the program starts with a default configuration. This will find a fair number of URLs, but you won't harness the program's true power until you tweak the configuration.

After applying the changes in this article to ZIGT's website, I discovered 3,900% more URLs than with the default configuration.

[Image: Screaming Frog custom configuration vs. default configuration]

What did the custom configuration find that the default configuration did not?

  • Links in the sitemap that are not linked to on the site (old news items, for example, plus the images in those news items)
  • Loads of external links to Facebook/Twitter blocked in robots.txt
  • URLs in Google Analytics / Google Search Console that are not linked on the site (usually URLs with UTM parameters)

Where can I adjust everything?

Everything we are going to adjust can be found under the 'Configuration' menu; for each setting I indicate exactly where to make the adjustment.

[Image: Screaming Frog Configuration menu]

1. Spider

Under the first menu item, Spider, you can adjust the crawler's behavior to your liking. To do so, we look at three tabs: Crawl, Limits and Advanced.

Crawl

I recommend ticking everything that is outlined in red.

[Image: Screaming Frog Crawl tab configuration]

The biggest gains in the 'Crawl' tab come from ticking the extra options under 'Crawl Behaviour' and 'XML Sitemaps'. These allow you, for example, to crawl URLs that are accidentally set to nofollow on the website, or URLs that are in the sitemap but not linked to anywhere on the site.

Is the sitemap URL not listed in the robots.txt? Then you can also enter the sitemap URL manually.
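
To illustrate what the sitemap option does, here is a minimal Python sketch (not part of Screaming Frog) that pulls all URLs from an XML sitemap, so you can compare them against the URLs a plain link-following crawl finds. The sitemap location below is a hypothetical example on this article's example domain.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Hypothetical sitemap location on the article's example domain.
SITEMAP_URL = "https://www.website.nl/sitemap.xml"

def sitemap_urls(sitemap_url: str) -> list[str]:
    """Return all <loc> URLs from an XML sitemap."""
    with urllib.request.urlopen(sitemap_url) as resp:
        tree = ET.parse(resp)
    ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
    return [loc.text for loc in tree.findall(".//sm:loc", ns)]

# URLs that appear here but are never found by following links are
# exactly the extra URLs the 'XML Sitemaps' option adds to the crawl.
print(sitemap_urls(SITEMAP_URL)[:10])
```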

It is good to know that the numbers can increase considerably once you tick these settings. If a staging domain is accessible, for example, you may suddenly crawl twice as many URLs. Following external nofollow links can also make the crawl a lot bigger.

Tip! Use the exclude function in Screaming Frog if, for example, you want to keep external URLs out of the crawl. You do this with a regular expression of the form .*domainname.* in the exclude list, for example .*.facebook.com.*
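
For clarity, here is a small Python sketch of how such regex exclude patterns match URLs. The patterns and URLs are just examples, not Screaming Frog's internals.

```python
import re

# Example exclude patterns, in the same regex style as the tip above.
EXCLUDES = [r".*\.facebook\.com.*", r".*\.twitter\.com.*"]

def is_excluded(url: str) -> bool:
    """True if the URL fully matches any exclude pattern."""
    return any(re.fullmatch(pattern, url) for pattern in EXCLUDES)

print(is_excluded("https://www.facebook.com/zigt"))   # True: excluded
print(is_excluded("https://www.website.nl/nieuws/"))  # False: crawled
```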

Limits

[Image: Screaming Frog Limits tab configuration]

If your site has more than 5 million URLs and your computer can handle it, you can disable the crawl limit. In practice you will rarely hit the limit, but if you do, it will cut your crawl short.

A good setting to adjust is the number of redirects that Screaming Frog follows. Google stops crawling after 20 redirects, so it is worth raising this limit for exceptional cases. Is an important category only reachable after 18 redirects? Then Screaming Frog will still pick it up.
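
To show what such a redirect limit means in practice, here is a minimal Python sketch that follows a redirect chain hop by hop and gives up after a maximum number of hops. It is only an illustration under those assumptions, not how Screaming Frog is implemented.

```python
import requests

def follow_redirects(url: str, max_hops: int = 20) -> str:
    """Follow a redirect chain one hop at a time, giving up after max_hops."""
    for hop in range(1, max_hops + 1):
        resp = requests.get(url, allow_redirects=False, timeout=60)
        if resp.status_code not in (301, 302, 303, 307, 308):
            return url  # final destination reached
        # Resolve relative Location headers against the current URL.
        url = requests.compat.urljoin(url, resp.headers["Location"])
        print(f"hop {hop}: redirected to {url}")
    raise RuntimeError(f"gave up after {max_hops} redirects")

# Example with the article's hypothetical domain:
# follow_redirects("https://www.website.nl/oude-categorie/")
```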

Advanced

Here too, I would recommend ticking everything that is outlined in red.

[Image: Screaming Frog Advanced tab configuration]

Always following redirects and canonicals mainly affects Screaming Frog's list mode, but it can be useful: did you accidentally include a redirect in your list? Then you will also see where that redirect leads.

We increase the response timeout so that the server has more time to return a response during the crawl. Still nothing after 60 seconds? Only then does Screaming Frog report an error.

We do the same by increasing the 5xx Response Retries. If a server receives many requests at once, it may return a 500 error, and sometimes this is only temporary. With this setting, Screaming Frog only marks a URL as a 500 error after 10 attempts.
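
The combination of a generous timeout and 5xx retries boils down to logic like the following Python sketch. The numbers match the settings above, but the code itself is only an illustration, not Screaming Frog's implementation.

```python
import time
import requests

def fetch_with_retries(url: str, timeout: int = 60, retries: int = 10) -> requests.Response:
    """Fetch a URL, retrying on 5xx responses before treating the error as real."""
    resp = None
    for attempt in range(retries):
        resp = requests.get(url, timeout=timeout)
        if resp.status_code < 500:
            return resp  # success or a client error: retrying won't help
        time.sleep(2)  # short pause; the 5xx may be a temporary overload
    return resp  # still 5xx after all retries: now it counts as an error
```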

2. Robots.txt

[Image: Screaming Frog robots.txt configuration]

In the robots.txt configuration, we change the setting so that Screaming Frog ignores the rules in the robots.txt, but still reports which URLs are blocked by it.

Suppose you have products that are only listed under https://www.website.nl/uitluizen-product-categorie/. If you have accidentally excluded this category in the robots.txt, Screaming Frog will not find these products with the default configuration. Thanks to this change, the products are now crawled and their pages are checked for errors.
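
If you want to check outside Screaming Frog which URLs a robots.txt blocks, Python's standard library can run the same check. The domain and paths below are the article's hypothetical examples.

```python
from urllib import robotparser

# Parse the live robots.txt of the example domain.
rp = robotparser.RobotFileParser("https://www.website.nl/robots.txt")
rp.read()

for url in (
    "https://www.website.nl/uitluizen-product-categorie/product-1",
    "https://www.website.nl/nieuws/",
):
    status = "allowed" if rp.can_fetch("*", url) else "blocked by robots.txt"
    print(f"{status}: {url}")
```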

3. User-Agent

[Image: Screaming Frog User-Agent configuration]

By default, Screaming Frog visits the site as 'Screaming Frog SEO Spider'. Ideally, you pretend to be Googlebot Smartphone instead, so that the simulation shows everything exactly as Google would see it. Very occasionally a site serves separate content to Google, and this way that content is included as well.

Did you know that Google crawls mobile first? This means that Google always visits your website as a mobile user.
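
Outside Screaming Frog, you can run the same check by sending a request with a Googlebot Smartphone user-agent and comparing the response with a normal one. The user-agent string below is approximated from Google's documentation (the Chrome version in it changes over time), and the URL is the article's example domain.

```python
import requests

# Googlebot Smartphone user-agent, approximated from Google's documentation.
GOOGLEBOT_MOBILE = (
    "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Mobile Safari/537.36 "
    "(compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
)

url = "https://www.website.nl/"
as_googlebot = requests.get(url, headers={"User-Agent": GOOGLEBOT_MOBILE}, timeout=60)
as_browser = requests.get(url, timeout=60)

# A large difference in size can hint at separate content being served to Google.
print(len(as_googlebot.text), len(as_browser.text))
```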

4. API Access

Screaming Frog gives you the option to connect the APIs of Google Analytics, Google Search Console and other tools.

A big advantage is that you can tick an option so that Screaming Frog also crawls the new URLs found through these tools.

[Image: Screaming Frog Google Analytics 4 configuration]

In the date range, you indicate which period Screaming Frog should retrieve. Want specific metrics or dimensions? You can set it all up easily. The data can then be viewed per URL, which is ideal if you want to compare the data of different URLs side by side.

[Image: Screaming Frog with Google Analytics and Google Search Console data]

Have you done a site migration? Then pull the URLs from Google Analytics and Google Search Console for the last 12 months, sort them by number of sessions, and check whether you have forgotten to redirect any important URLs. At several sites for which we had not done the SEO, for example, we found important URLs without a redirect.
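
The migration check itself comes down to a set difference, as in this short Python sketch. The URL lists are placeholders for what you would export from Google Analytics/Search Console and from the crawl.

```python
# Placeholder data: in practice, export these lists from Google Analytics /
# Google Search Console and from the Screaming Frog crawl.
analytics_urls = {
    "https://www.website.nl/oude-pagina/",
    "https://www.website.nl/nieuws/",
}
crawled_urls = {
    "https://www.website.nl/nieuws/",
}

# URLs with traffic that the crawl no longer finds: candidates for a redirect.
for url in sorted(analytics_urls - crawled_urls):
    print("possibly missing a redirect:", url)
```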

Avoid resetting

By now we have made quite a few changes to the configuration. Via File > Configuration in Screaming Frog, you can save your current configuration as the default. That way you don't have to repeat all the changes every time you want to run a crawl.

You also have the option to return to the default configuration if you wish.

[Image: Screaming Frog save configuration as default]

Actually find all URLs

An online marketer's goal is to find all URLs with Screaming Frog. With the steps above, the chances are many times greater that you will actually find all the URLs of your website.

This article has been checked by the SEO panel.


Source: Frankwatching (www.frankwatching.com).
