AJAX and SEO: Problems with indexing

AJAX combines several technologies: the Dynamic HTML approach for updating the content of a web page on the fly, and several techniques for addressing the server dynamically, in particular dynamic generation of img and script tags and dynamically created frames (including hidden iframes).
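
To make the "dynamic script tag" technique concrete, here is a minimal sketch (the /api/latest-news.js endpoint is a made-up example): the page asks the browser to load and execute an extra script from the server without a full reload.

```typescript
// Dynamically generate a <script> tag so the browser fetches and runs
// server-generated code without reloading the page.
function loadScript(src: string): void {
  const script = document.createElement("script");
  script.src = src;
  document.head.appendChild(script);
}

// Hypothetical endpoint that returns JavaScript with fresh content.
loadScript("/api/latest-news.js");
```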

As for data formats, AJAX can exchange XML, plain text, JSON or ready-made HTML.
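
As a rough illustration, assuming an /api/products endpoint that returns JSON and a #product-list container on the page (both invented for the example), an AJAX exchange might look like this:

```typescript
// Fetch JSON from the server and update part of the page with Dynamic HTML,
// without reloading the whole document.
interface Product {
  id: number;
  name: string;
}

async function loadProducts(): Promise<void> {
  const response = await fetch("/api/products");
  const products: Product[] = await response.json();

  const list = document.querySelector("#product-list");
  if (list) {
    list.innerHTML = products.map((p) => `<li>${p.name}</li>`).join("");
  }
}

void loadProducts();
```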

Can Google crawl AJAX content? The short answer is yes, it can. The longer answer is yes, but it’s harder for crawlers to do. Single-page web applications that use AJAX frameworks have historically been very problematic from an SEO perspective. Here are their main drawbacks:

Crawling problems. Content that mattered to crawlers was hidden inside JavaScript and only rendered on the client side, which meant that Google's robots essentially saw a blank page (see the sketch after this list).
Navigation problems. The browser's Back button didn't work, or didn't work correctly.
Cloaking. With the AJAX approach, webmasters created two versions of the content: one for users and another for search engines. This is forbidden and punishable with sanctions.
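
To illustrate the crawling problem, here is a hedged sketch of the kind of client-only rendering that caused it (the #app container and the injected markup are invented for the example). A crawler that does not execute JavaScript only ever sees the empty shell delivered in the initial HTML.

```typescript
// All meaningful content is injected at runtime. Without JavaScript
// execution, a crawler sees only an empty <div id="app"></div>.
const app = document.querySelector("#app");
if (app) {
  app.innerHTML = `
    <h1>Spring catalogue</h1>
    <p>Everything a visitor actually came to read is generated here.</p>
  `;
}
```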

For years, Google advised webmasters to use its dedicated AJAX crawling scheme to let crawlers know that a site serves AJAX content. The AJAX crawling scheme, based on the _escaped_fragment_ parameter, allowed Google to request a pre-rendered version of the page.

Such a version contained plain static HTML that Google could easily parse and index. In other words, the server instructed crawlers to crawl a different page than the one available in the source code.
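
A minimal sketch of the URL mapping behind that scheme (the example.com URL is illustrative): the hashbang fragment was rewritten into an _escaped_fragment_ query parameter, and the server answered that request with the pre-rendered HTML snapshot.

```typescript
// Rewrite a hashbang URL the way the (now deprecated) AJAX crawling scheme
// did: the fragment after "#!" becomes an _escaped_fragment_ parameter.
function toEscapedFragmentUrl(hashbangUrl: string): string {
  const [base, fragment = ""] = hashbangUrl.split("#!");
  const separator = base.includes("?") ? "&" : "?";
  return `${base}${separator}_escaped_fragment_=${encodeURIComponent(fragment)}`;
}

// "https://example.com/#!/products/42" ->
// "https://example.com/?_escaped_fragment_=%2Fproducts%2F42"
console.log(toEscapedFragmentUrl("https://example.com/#!/products/42"));
```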

That all changed in 2015. Google announced that its crawlers had learned to crawl, render and index content inside JavaScript without any problems, making the AJAX crawling scheme with the _escaped_fragment_ parameter obsolete.

What happens to AJAX page indexing in 2022
Google often lies. Today it claims that it has no difficulty crawling and indexing AJAX sites. But it would be risky to simply take its word for it, leaving years of accumulated traffic and rankings to chance.

Indeed, Google can index dynamic AJAX content. But there are non-obvious points:

Hidden HTML. If important content is hidden inside JavaScript, it may be harder for crawlers to access it, and indexing (and subsequent ranking) can stall as a result. To avoid this, make sure that the content relevant to users is delivered as HTML. In that case, Google and Yandex crawlers will index it easily.
Missing links. Google uses internal links as a signal of how the pages of a site relate to each other, and external links are one of the ranking factors: when content is high-quality and authoritative, other trusted domains link to it. It is therefore very important that links on the site are accessible to crawlers and not hidden inside JavaScript (see the sketch after this list).
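
As a sketch of the links point (the /blog/ajax-and-seo URL is made up), compare a link a crawler can follow with one whose target exists only inside JavaScript:

```typescript
// Crawler-friendly: a real anchor with an href that Googlebot can discover
// and follow like any other internal link.
const crawlableLink = document.createElement("a");
crawlableLink.href = "/blog/ajax-and-seo";
crawlableLink.textContent = "AJAX and SEO";

// Crawler-hostile: the target URL lives only inside a click handler, so
// there is no href for a crawler to extract.
const hiddenLink = document.createElement("span");
hiddenLink.textContent = "AJAX and SEO";
hiddenLink.addEventListener("click", () => {
  window.location.assign("/blog/ajax-and-seo");
});

document.body.append(crawlableLink, hiddenLink);
```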

How AJAX affects SEO
So does it turn out that, as long as you meet the conditions above, you don't need to worry about AJAX content indexing at all?

To answer that, let's go back in time to the official answer Google gave on the subject:

“…as long as you don’t block Googlebot from crawling JavaScript or CSS, Google will display your pages in search results.”

The wording is somewhat vague, but the point is clear. It's as if Google is saying, “It's not our problem, it's yours.” Thus, in 2022 you no longer need workarounds and “crutches” to let Google know which content is AJAX and which is conventional. It knows how to crawl it on its own.

Google actively advocates for quality content and user experience. AJAX content is somewhat at odds with this: just take the incorrect page URLs that the approach generates. And for crawlers this is important: the URL must reflect the actual location of the page!

To solve the URL problem in the AJAX approach, you need to use the History API with pushState(). It changes the URL that is shown in the browser on the client side.

Using pushState allows you to keep AJAX content on the site and solves the problem of incorrect page URLs.
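
A minimal sketch of that approach, assuming an /api/products endpoint that returns an HTML fragment and a #content container (both invented for the example): each AJAX view gets a real URL, and the Back button keeps working.

```typescript
// Load a product view with AJAX and give it a proper URL via pushState.
async function showProduct(id: number): Promise<void> {
  const response = await fetch(`/api/products/${id}`);
  const html = await response.text();
  document.querySelector("#content")!.innerHTML = html;

  // Update the address bar without a full page reload.
  history.pushState({ productId: id }, "", `/products/${id}`);
}

// Keep the browser's Back/Forward buttons working by re-rendering on popstate.
window.addEventListener("popstate", (event) => {
  const state = event.state as { productId?: number } | null;
  if (state?.productId) {
    void showProduct(state.productId);
  }
});
```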

Another thing Google pays attention to in 2022 is the hashbang (#!). Google looks for hashbang fragments to identify dynamic URLs and process them (in different ways). The crawler takes everything that comes after the hashbang, passes it as a URL parameter, and then simply requests a static version of the page that it can read, index, and rank.
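
For context, a hedged sketch of what a hashbang URL looks like from the client side (the example route is invented): everything after #! acts as the route of the dynamic view, and it is this fragment that ends up being passed along as a parameter.

```typescript
// Client-side hashbang routing: for a URL like https://example.com/#!/about
// everything after "#!" is treated as the route of the dynamic view.
function currentHashbangRoute(): string {
  const hash = window.location.hash; // e.g. "#!/about"
  return hash.startsWith("#!") ? hash.slice(2) : "/";
}

window.addEventListener("hashchange", () => {
  console.log("Route changed to", currentHashbangRoute());
});
```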