Using Log Analysis for SEO


On the Google side, technical SEO is one of the most important areas of work. Large-scale sites, in particular, need to make effective use of the time Google spends visiting the site. There are several methods for doing this, and the most challenging of them is log analysis. The basic picture is this: a computer (the user's client) and a server communicate with each other, and the server records every request that computers, i.e. users, send to it, even if not for SEO purposes.


How Can We Draw SEO Insights From the Recorded Data?


First of all, we should understand how this data is structured. Here is a sample entry in the Apache Common Log Format (a short parsing sketch in Python follows the field breakdown):


127.0.0.1 user-identifier frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326

  • 127.0.0.1 - The name of the remote host. The IP address is displayed, as in this example, when the DNS hostname is not available or DNS lookup is disabled.
  • user-identifier - the remote username / user ID per RFC 1413. (This field is rarely populated and not that important.)
  • frank - the ID of the user requesting the page, as determined by HTTP authentication. From what I see on my Moz profile, Moz log entries would probably show either "SamuelScott" or "392388" whenever I visit a page after logging in.
  • [10/Oct/2000:13:55:36 -0700] - the date, time, and time zone of the request, in strftime format.
  • "GET /apache_pb.gif HTTP/1.0" - "GET" and "POST" are the two most common HTTP methods: "GET" retrieves a URL, while "POST" sends something (for example, a comment on a forum). The second part is the URL being accessed, and the last part is the HTTP version used.
  • 200 - the HTTP status code of the returned document.
  • 2326 - Size of the returned document in bytes.
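
To work with these fields programmatically, each line can be parsed with a regular expression. Here is a minimal Python sketch for the Common Log Format shown above; the group names are my own labels, not part of any standard:

import re

# Regex covering the Common Log Format fields described above.
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

line = ('127.0.0.1 user-identifier frank [10/Oct/2000:13:55:36 -0700] '
        '"GET /apache_pb.gif HTTP/1.0" 200 2326')
match = CLF_PATTERN.match(line)
if match:
    fields = match.groupdict()
    print(fields['path'], fields['status'], fields['size'])
    # -> /apache_pb.gif 200 2326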


For large sites, log analysis can span millions of pages, because every request made to the site is recorded as a single line in the log. For very large websites and e-commerce sites, you may want to estimate how large these files will grow before recording and analyzing these transactions; it is generally recommended to process the files monthly. In addition, information such as the total number of bytes downloaded, the hostname, the IP address that sent the request, and how long the request took to answer can be stored in the log file. Exactly what is recorded can vary depending on where the data comes from. There are usually two sources: the first is data taken directly from the server; the second arises when requests pass through several machines sitting in front of the server (for example, load balancers or proxies), possibly managed by separate hardware or software teams. In that case, the logs must be collected from each source and normalized into a format the analysis tools you use can work with.
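
As a sketch of what monthly processing might look like, the snippet below streams (possibly gzipped) log files line by line and counts Googlebot hits per day. The file names are hypothetical, and it assumes the user agent appears in each line (the "combined" log format rather than the bare Common Log Format shown earlier):

import gzip
from collections import Counter

def read_lines(paths):
    # Stream lines without loading whole files into memory;
    # a month of logs on a large site can run to millions of lines.
    for path in paths:
        opener = gzip.open if path.endswith('.gz') else open
        with opener(path, 'rt', errors='replace') as fh:
            yield from fh

hits_per_day = Counter()
# Hypothetical file names; adjust to your server's rotation scheme.
for line in read_lines(['access.log-20231001.gz', 'access.log-20231002.gz']):
    if 'Googlebot' not in line or '[' not in line:
        continue
    day = line.split('[', 1)[1].split(':', 1)[0]  # e.g. 10/Oct/2000
    hits_per_day[day] += 1

for day, hits in sorted(hits_per_day.items()):
    print(day, hits)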


Key indicators (a short analysis sketch follows this list):

Error codes and server codes: every request Googlebot makes to the server is answered with a status code, for example 404 (page not found), 500 (server error), 200 (success), and so on.

Total Bot Crawl Coverage: suppose you have over 1.5 million pages on your website. Does Googlebot really crawl all of them, or is its crawl footprint much smaller? Googlebot may not be crawling the pages that really matter to you; log analysis makes this much clearer.

Bot Crawl Priorities: is Googlebot spending its time on the pages that really matter to the site? If you run a news website, how long does it take Googlebot to reach an article after you publish it?

Wasted crawl budget: Googlebot spends too much time on unwanted or low-value pages.

Timing: some pages of the website may need to be updated, but updating them achieves little if Googlebot never sees or revisits them.
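
The first two indicators can be computed directly once the logs are parsed. Below is a rough Python sketch; crawl_report, googlebot_hits, and all_site_urls are hypothetical names, and it assumes the hits have already been parsed into field dicts (like those from the earlier parsing sketch) and filtered down to Googlebot requests:

from collections import Counter

def crawl_report(googlebot_hits, all_site_urls):
    # googlebot_hits: parsed field dicts, filtered to Googlebot requests.
    # all_site_urls: the set of URLs you expect to be crawled, e.g. from
    # your sitemap.
    status_counts = Counter()
    crawled_urls = set()
    for hit in googlebot_hits:
        status_counts[hit['status']] += 1
        crawled_urls.add(hit['path'])

    coverage = len(crawled_urls & all_site_urls) / len(all_site_urls)
    print('Status code distribution:', dict(status_counts))
    print(f'Crawl coverage: {coverage:.1%} of {len(all_site_urls)} known URLs')
    # URLs crawled but absent from your sitemap often point to wasted
    # crawl budget (faceted navigation, URL parameters, soft 404s).
    print('Crawled but not in sitemap:', len(crawled_urls - all_site_urls))

crawl_report([{'status': '200', 'path': '/'}, {'status': '404', 'path': '/old'}],
             {'/', '/products'})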


What should we ask?

What is my crawl rate? After crawling my pages, which groups of pages do Googlebot and other search engine spiders focus on? Once you can identify these clearly, you will see problems in your link structure and notice that, because of those problems, Google indexes the wrong areas of the site. How often Googlebot revisits your current pages is one of the most important questions you need to answer.

The number of pages Google crawls each month: when the data shows, for example, that Google crawls 30% of your pages and skips the other 70%, this often helps you understand that there are problems with your link structure and your core site-wide SEO.

How thoroughly is Google crawling the site?

Is there a relationship between the priority of your target keywords and Googlebot's crawling habits?

User agent: is there a difference between the crawl rates of Googlebot and other search engine spiders? The server may respond differently to Googlebot requests from time to time. Googlebot's path through the site can easily be traced by analyzing the logs, as the sketch below illustrates.
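
One way to approach the user-agent question is to group crawl activity by bot and by site section. The sketch below is a minimal illustration, assuming each hit is a (user_agent, path) pair extracted from combined-format logs; grouping by top-level directory is my own simplification:

from collections import Counter
from urllib.parse import urlsplit

def section_of(path):
    # Reduce a URL path to its top-level directory, e.g. /blog/post-1 -> /blog
    parts = urlsplit(path).path.strip('/').split('/')
    return '/' + parts[0] if parts[0] else '/'

def crawl_by_section(hits):
    per_bot = {}
    for agent, path in hits:
        bot = ('Googlebot' if 'Googlebot' in agent
               else 'bingbot' if 'bingbot' in agent
               else 'other')
        per_bot.setdefault(bot, Counter())[section_of(path)] += 1
    return per_bot

sample = [('Mozilla/5.0 (compatible; Googlebot/2.1)', '/blog/post-1'),
          ('Mozilla/5.0 (compatible; bingbot/2.0)', '/products/widget')]
for bot, sections in crawl_by_section(sample).items():
    print(bot, sections.most_common(5))

Comparing these per-section counts across bots, or across months, makes shifts in crawl priorities visible at a glance.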