Googlebot Crawls & Indexes First 15 MB HTML Content

According to an update to Google's help documentation, Googlebot, Google's web crawler, uses only the first 15 MB of a page's HTML when determining rankings.

Google has updated its Googlebot help document to confirm that the crawler fetches only the first 15 MB of a web page and excludes anything beyond that point from ranking calculations.

Google explains in its help document:

“All resources referenced in HTML, such as images, videos, and CSS, are pulled separately.

Googlebot stops crawling files over 15 MB and indexes only the first 15MB.

The uncompressed data is subject to the file size limit."

Some in the SEO community wondered whether images count toward the limit, which would mean Googlebot might ignore text placed below images in an HTML file.

Google Search Advocate John Mueller clarified on Twitter: "It's specific to the HTML file itself, as it's written."

"Embedded resources/content pulled in with IMG tags is not a part of the HTML file."

What Does This Mean for SEO?

Important content should be placed near the top of web pages to ensure that Googlebot crawls and indexes it.

This means that code must be structured so that SEO-relevant information falls within the first 15 MB of an HTML file or other supported text-based file.
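To check whether a key piece of content starts early enough, you can look for it in the first 15 MB of the page's raw bytes. The minimal sketch below assumes the page is UTF-8 encoded, that "15 MB" means 15 MiB (Google's documentation does not say which), and that a plain byte match is good enough; `content_within_limit` and `LIMIT_BYTES` are hypothetical names, not part of any Google tool.

```python
# Hypothetical helper, not a Google API.
# Assumption: "15 MB" is read as 15 MiB; Google does not specify.
LIMIT_BYTES = 15 * 1024 * 1024


def content_within_limit(html: bytes, snippet: str, limit: int = LIMIT_BYTES) -> bool:
    """Return True if `snippet` appears entirely within the first `limit` bytes of `html`.

    Assumes the page and the snippet are both UTF-8 encoded, and that the snippet
    appears literally in the markup (no entities or tags inside it).
    """
    needle = snippet.encode("utf-8")
    # Keep only the prefix Googlebot would consider; bytes after it are ignored.
    prefix = html[:limit]
    return needle in prefix


# Usage with a small artificial limit so the effect is visible.
page = b"<html><body><h1>Key heading</h1>" + b"x" * 100 + b"<p>Footer</p></body></html>"
print(content_within_limit(page, "Key heading", limit=50))  # True
print(content_within_limit(page, "Footer", limit=50))       # False
```

In practice you would pass the uncompressed HTML as served, since Google applies the limit to uncompressed data.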

This also means that images and videos should not be embedded directly in the HTML (for example, as base64 data URIs), because inline data counts toward the 15 MB limit.
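One way to spot inline media is to search the HTML for base64 data URIs and add up how many bytes they occupy. This is a rough sketch: the regular expression is a simplified assumption that only matches base64-encoded `image/`, `video/` and `audio/` URIs, and `inline_media_bytes` is a hypothetical helper name.

```python
import re

# Simplified pattern (assumption): base64 data URIs for image, video or audio,
# e.g. src="data:image/png;base64,iVBORw0...". Stops at the closing quote.
DATA_URI = re.compile(rb"data:(?:image|video|audio)/[\w.+-]+;base64,[A-Za-z0-9+/=]+")


def inline_media_bytes(html: bytes) -> int:
    """Return the total number of HTML bytes taken up by inline base64 media URIs."""
    return sum(len(match.group(0)) for match in DATA_URI.finditer(html))


# Usage: report what share of the page is inline media.
page = b'<img src="data:image/png;base64,AAAA"><p>text</p>'
print(inline_media_bytes(page), "of", len(page), "bytes are inline media")
```

Moving such images or videos to separate files referenced by URL takes those bytes out of the HTML file entirely.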

Common SEO guidance recommends keeping HTML pages under 100 KB, so most websites are unlikely to be affected by the change. Many tools can check page size, such as Google PageSpeed Insights.
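Because Google measures the limit on uncompressed data, a gzip-compressed response can look much smaller on the wire than the size that counts. The sketch below assumes you already have the response body and its `Content-Encoding` header value; it only handles gzip (Brotli is not in Python's standard library), reads "KB" and "MB" as KiB and MiB, and uses hypothetical helper names.

```python
import gzip

# Assumptions: "100 KB" read as 100 KiB and "15 MB" as 15 MiB.
GUIDANCE_BYTES = 100 * 1024
GOOGLEBOT_LIMIT = 15 * 1024 * 1024


def uncompressed_size(body: bytes, content_encoding: str = "") -> int:
    """Return the HTML size after undoing gzip transfer compression (gzip only)."""
    if content_encoding.lower() == "gzip":
        body = gzip.decompress(body)
    return len(body)


def check_page_size(body: bytes, content_encoding: str = "") -> str:
    """Classify a page against the 100 KB guideline and the 15 MB Googlebot limit."""
    size = uncompressed_size(body, content_encoding)
    if size > GOOGLEBOT_LIMIT:
        return "over 15 MB: content past the limit is ignored"
    if size > GUIDANCE_BYTES:
        return "under 15 MB but above the 100 KB guideline"
    return "within the 100 KB guideline"


# Usage: a highly compressible 200 KiB page is tiny when gzipped,
# but it is the uncompressed size that is checked.
html = b"a" * (200 * 1024)
print(check_page_size(gzip.compress(html), "gzip"))
```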

The possibility that content on a page could be left out of the index may sound alarming, but 15 MB of HTML is quite a lot.

Google states that resources such as images and videos are fetched separately; the 15 MB limit applies only to the HTML file itself.

Unless you publish entire books of text on a single page, it would be hard to exceed that limit with HTML alone.

If your pages exceed 15 MB of HTML, you may have underlying issues that should be addressed.

© Intentify Media Group