A crawl is migration evidence, not a vanity audit

Crawl checks prove whether important old URLs, new Shopify destinations, redirects, canonicals, indexability and internal links behave the way the migration plan says they should.

Run separate crawls for old, staging and live

The old site crawl captures what must be protected. The staging crawl catches Shopify template and indexability mistakes. The live crawl confirms redirects, canonicals, robots and sitemap output after launch.

Crawl data needs commercial context

A crawler will not know which URLs earn revenue or links. Merge crawl data with Search Console, analytics and backlink evidence before deciding what matters.

A Shopify migration can look successful in the browser and still be unclear to search engines.

The old URLs may redirect, but to weak destinations. The new sitemap may exist, but not include the pages that matter. Important collections may be live, but canonical signals, internal links or noindex rules may stop them being treated as primary pages.

The critical checks sit between “the store launched” and “search engines can understand the new store”.

The work is not to crawl every possible URL. It is to prove that important old value has a crawlable, indexable and relevant new home.

Start with old-site evidence

Before judging the new Shopify store, collect evidence from the old store.

You need:

  • old crawl export
  • old sitemap URLs
  • Search Console landing page data
  • top organic pages
  • backlink target URLs
  • old canonical targets
  • old noindex rules
  • old redirect paths
  • old parameter/filter patterns

Without the old evidence, you cannot tell whether Shopify has simplified the site cleanly or accidentally removed important search paths.
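
As a rough sketch of how those exports can be joined, the example below keys every source on a normalised URL. It assumes each export has already been loaded as a list of dicts; the field names (`url`, `clicks`, `ref_domains`) and the normalisation rules are illustrative assumptions, not a fixed export format.

```python
from urllib.parse import urlsplit


def normalise(url: str) -> str:
    """Lower-case the host, drop the fragment and any trailing slash on the path."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    query = f"?{parts.query}" if parts.query else ""
    return f"{parts.scheme.lower()}://{parts.netloc.lower()}{path}{query}"


def merge_evidence(crawl_rows, gsc_rows, backlink_rows):
    """Combine three exports into one record per old URL.

    URLs that only appear in Search Console or backlink data are kept,
    because they may no longer be linked internally on the old site.
    """
    merged = {}

    def record(url):
        key = normalise(url)
        return merged.setdefault(
            key, {"url": key, "in_crawl": False, "clicks": 0, "ref_domains": 0}
        )

    for row in crawl_rows:
        record(row["url"])["in_crawl"] = True
    for row in gsc_rows:
        record(row["url"])["clicks"] += int(row["clicks"])
    for row in backlink_rows:
        record(row["url"])["ref_domains"] += int(row["ref_domains"])
    return merged
```

Records where `in_crawl` is false but clicks or referring domains are non-zero are worth a close look: they are old URLs with value that a crawl alone would have missed.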

Build a crawl sample set

Do not crawl only the new homepage and sitemap.

Build a sample set with:

  • top old organic landing pages
  • top old product URLs
  • top old category URLs
  • important blog/guide URLs
  • old filtered URLs
  • old tag/archive URLs
  • backlink targets
  • discontinued product URLs
  • newly created Shopify collections
  • newly created Shopify products
  • resource/download pages if relevant

The sample set should expose risk. If it only includes clean pages, the crawl will look better than the migration really is.
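
One possible way to keep risky page types in the sample is to pick the top URLs within each type rather than the top URLs overall. This is a minimal sketch; the `page_type` and `clicks` fields and the `per_type` default are assumptions for illustration.

```python
from collections import defaultdict


def build_sample(rows, per_type=2):
    """Pick the top `per_type` URLs by clicks within each page type.

    Sampling per type stops high-traffic products from crowding out
    filtered, tag or discontinued URLs that carry the migration risk.
    """
    by_type = defaultdict(list)
    for row in rows:
        by_type[row["page_type"]].append(row)

    sample = []
    for page_type in sorted(by_type):
        ranked = sorted(by_type[page_type], key=lambda r: r["clicks"], reverse=True)
        sample.extend(ranked[:per_type])
    return sample
```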

Check old URLs first

For each important old URL, confirm:

  • does it still resolve?
  • does it redirect?
  • is the redirect permanent?
  • does it redirect in one hop?
  • is the destination relevant?
  • does the destination return 200?
  • is the destination indexable?
  • does the destination canonicalise to itself or a sensible parent?

A redirect that lands on the homepage is usually not a successful SEO migration for an important category or product.

The useful question is not only “does it redirect?”.

Ask what the redirect proves:

Crawl result | What it usually means | First response
--- | --- | ---
Old URL returns 404 | The redirect is missing or was never mapped | Check the redirect sheet and old URL priority
Old URL redirects to homepage | The old intent may not be preserved | Find or build a closer destination
Old URL redirects in several hops | Old rules were probably carried over | Flatten the chain to the final Shopify URL
Destination is noindex | The redirect points into a page that cannot rank | Fix indexability or choose another destination
Destination canonicalises elsewhere | The signal may be diluted or confused | Check whether the canonical target is intentional

This is where crawl data becomes migration judgement.
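
The rows of that table can be expressed as a small classifier. The sketch below assumes the crawler has already recorded each hop as a `(url, status_code)` pair and that destination indexability and canonical were collected separately; the function name, parameters and issue labels are hypothetical.

```python
from urllib.parse import urlsplit


def classify_redirect(hops, dest_indexable=True, dest_canonical=None):
    """Map one old URL's crawl result onto the table's rows.

    `hops` is the ordered list of (url, status_code) pairs a crawler recorded,
    starting with the old URL and ending with the final response.
    """
    first_status = hops[0][1]
    final_url, final_status = hops[-1]
    if first_status == 404:
        return ["missing redirect"]

    issues = []
    if final_status != 200:
        issues.append(f"destination returns {final_status}")
    redirects = [status for _, status in hops[:-1]]
    if len(redirects) > 1:
        issues.append("redirect chain")
    if any(status not in (301, 308) for status in redirects):
        issues.append("non-permanent redirect")
    if len(hops) > 1 and urlsplit(final_url).path in ("", "/"):
        issues.append("redirects to homepage")
    if not dest_indexable:
        issues.append("destination is noindex")
    if dest_canonical and dest_canonical != final_url:
        issues.append("destination canonicalises elsewhere")
    return issues or ["ok"]
```

Destination relevance still needs a human decision. The classifier only flags the homepage case, which is the most common sign that relevance was skipped.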

Check Shopify sitemap coverage

Shopify generates a sitemap.xml index and its child sitemaps automatically, but the presence of a sitemap is not the same as good coverage.

Check whether important Shopify pages are included:

  • collections
  • products
  • blogs/posts
  • pages
  • resources if they are public

Then compare the sitemap to your priority list.

If an important collection is live but absent from the sitemap or not internally linked, investigate before assuming it will be discovered quickly.
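
A minimal sketch of that comparison, assuming the sitemap XML has already been downloaded as text and the priority list is a plain list of URLs. The helper names are hypothetical. On a sitemap index, `sitemap_locs` returns the child sitemap URLs, which then need fetching and parsing in turn.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_locs(xml_text):
    """Return every <loc> value from a sitemap or sitemap index document."""
    root = ET.fromstring(xml_text)
    return {loc.text.strip() for loc in root.iterfind(".//sm:loc", SITEMAP_NS) if loc.text}


def missing_from_sitemap(priority_urls, sitemap_urls):
    """Priority URLs that the sitemap does not list, in their original order."""
    listed = {u.rstrip("/") for u in sitemap_urls}
    return [u for u in priority_urls if u.rstrip("/") not in listed]
```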

Check robots and noindex rules

Look for mistakes that block the wrong pages.

Check:

  • robots.txt output
  • noindex tags
  • X-Robots-Tag headers if used
  • password protection
  • app-injected directives
  • staging-domain remnants
  • template-level rules

Most crawl/indexing disasters are not subtle. They are usually simple blocking rules applied too broadly.
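
The three most common blockers can be checked together for a single page. The sketch below assumes the robots.txt body, page HTML and response headers have already been fetched; `blocking_reasons` is a hypothetical helper. Python's standard-library robots parser does simple prefix matching and does not implement wildcard rules, so treat its answer as a first pass and confirm anything surprising in a crawler or Search Console.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class _MetaRobots(HTMLParser):
    """Collect the content of <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() in ("robots", "googlebot"):
            self.directives.append((attr.get("content") or "").lower())


def blocking_reasons(url, robots_txt, html, headers=None, agent="Googlebot"):
    """List the reasons this URL may be blocked from crawling or indexing."""
    reasons = []

    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())
    if not robots.can_fetch(agent, url):
        reasons.append("blocked by robots.txt")

    parser = _MetaRobots()
    parser.feed(html)
    if any("noindex" in d for d in parser.directives):
        reasons.append("meta noindex")

    header = {k.lower(): v for k, v in (headers or {}).items()}.get("x-robots-tag", "")
    if "noindex" in header.lower():
        reasons.append("X-Robots-Tag noindex")
    return reasons
```

Running this across one URL per template (product, collection, blog post, page) is usually enough to catch a template-level rule applied too broadly.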

Check canonical signals

For priority pages, check:

  • canonical URL
  • status code of canonical target
  • whether canonical target is indexable
  • whether internal links point to the canonical version
  • whether old product-with-collection paths create confusion
  • whether filters or parameters canonicalise sensibly

Canonical tags are hints, but inconsistent hints create avoidable doubt.
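
A short sketch of a canonical check on saved HTML, using only the standard library. `canonical_status` and its labels are hypothetical; it reports whether the page canonicalises to itself, to another URL, to several conflicting URLs, or has no canonical at all. Status code and indexability of the target still need a separate lookup.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class _Canonical(HTMLParser):
    """Collect the href of every <link rel="canonical"> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rels and attr.get("href"):
            self.hrefs.append(attr["href"])


def canonical_status(page_url, html):
    """Return (label, target); target is a list when canonicals conflict."""
    parser = _Canonical()
    parser.feed(html)
    # Resolve relative hrefs and ignore trailing-slash differences.
    targets = {urljoin(page_url, href).rstrip("/") for href in parser.hrefs}
    if not targets:
        return "missing", None
    if len(targets) > 1:
        return "conflicting", sorted(targets)
    target = targets.pop()
    if target == page_url.rstrip("/"):
        return "self", target
    return "points elsewhere", target
```

A "points elsewhere" result is not automatically a fault. A product reached through a collection path that canonicalises to its plain product URL is expected; the question is whether internal links use the same version.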

Check collection indexing quality

Important collections should be easy to crawl, index and understand.

For each priority collection, check:

  • 200 status
  • indexable directive
  • self-referencing canonical or deliberate canonical target
  • title and H1
  • product relevance
  • internal links from navigation/content/products
  • no accidental filter URL selected as the main page
  • crawl depth

If a collection is commercially important but buried, thin or internally unsupported, indexing may not be the only problem.

Check product indexing quality

For important products, check:

  • product URL status
  • canonical
  • indexability
  • image/media output
  • structured data
  • collection membership
  • internal links from collections
  • discontinued/out-of-stock handling
  • redirect from old product URL

Do not index every product blindly if the catalogue has duplicates, variants or discontinued items. Decide what should remain discoverable.

Check filtered and parameter URLs

Faceted navigation can create migration noise.

Look for URLs with:

  • filter parameters
  • sort parameters
  • tag paths
  • vendor/type patterns
  • search URLs
  • app-generated filter URLs

For each pattern, decide:

  • should this be crawlable?
  • should it be indexable?
  • should it canonicalise to the base collection?
  • should a high-demand filter become a dedicated collection instead?

Do not let filters become accidental landing pages because they happened to exist after launch.

Example:

An old WooCommerce URL ending in ?filter_size=wide may have earned search demand because shoppers wanted that specific product group. If Shopify replaces it with a crawlable filter URL and no stable collection page, the migration may preserve access but weaken the landing page.

In that case, the better fix may be a proper collection, not just a canonical tag.
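
Before making those decisions it helps to group crawled URLs by pattern. The sketch below is one way to do that; the parameter names and path shapes it looks for (`filter.` and `filter_` prefixes, `sort_by`, `orderby`, `/search`, `/collections/vendors`, `/collections/types`, three-segment collection paths) are assumptions to adjust against what the old site and the Shopify theme or filter app actually emit.

```python
from urllib.parse import parse_qs, urlsplit


def classify_url(url):
    """Bucket a URL as search, vendor/type, filter, sort, tag or clean."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    params = parse_qs(parts.query, keep_blank_values=True)

    if segments[:1] == ["search"]:
        return "search"
    if segments[:2] in (["collections", "vendors"], ["collections", "types"]):
        return "vendor/type"
    if any(key.startswith(("filter.", "filter_")) for key in params):
        return "filter"
    if "sort_by" in params or "orderby" in params:
        return "sort"
    # /collections/<handle>/<tag> has three segments; product paths have four.
    if len(segments) == 3 and segments[0] == "collections":
        return "tag"
    return "clean"
```

Counting URLs per bucket, and joining each bucket to clicks, shows which patterns are noise and which deserve a dedicated collection.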

Use Search Console carefully after launch

Search Console will not update instantly, but it will show patterns.

Monitor:

  • indexing status for priority URLs
  • submitted vs indexed sitemap URLs
  • crawl errors
  • soft 404s
  • pages with redirects
  • excluded by noindex
  • duplicate/canonical reports
  • clicks and impressions by page type

Do not panic at every early warning. Look for repeated patterns across important URL groups.

First crawl after launch

Run the first post-launch crawl with:

  • sitemap crawl
  • internal crawl
  • redirect list crawl
  • priority URL list crawl
  • rendered HTML where useful

Compare outputs rather than looking at one report.

A sitemap crawl tells you what Shopify is submitting. An internal crawl tells you what the site actually links to. A redirect crawl tells you whether old value has a new home.
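
Comparing the three outputs is mostly set arithmetic. This is a minimal sketch assuming each crawl has been reduced to a collection of URLs and the redirect crawl to a mapping of old URL to final status code; the function and key names are hypothetical.

```python
def compare_crawls(sitemap_urls, internal_urls, redirect_results):
    """Cross-check sitemap, internal-link and redirect crawls.

    redirect_results maps each old URL to the status code of its final response.
    """
    sitemap, internal = set(sitemap_urls), set(internal_urls)
    return {
        # Submitted to search engines but not linked anywhere on the site.
        "sitemap_only": sorted(sitemap - internal),
        # Linked on the site but not submitted in the sitemap.
        "linked_not_submitted": sorted(internal - sitemap),
        # Old URLs whose final response is anything other than 200.
        "old_urls_without_home": sorted(
            old for old, status in redirect_results.items() if status != 200
        ),
    }
```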

Common crawl/indexing mistakes

Watch for:

  • old URLs redirecting to irrelevant pages
  • important collections missing from navigation
  • accidental noindex on templates
  • staging URLs left in links
  • canonical targets pointing to the wrong version
  • filter URLs being crawled heavily
  • Shopify product URLs linked inconsistently
  • blog content not linked from commercial pages
  • sitemap submitted but not compared against priority URLs

These are fixable, but they are expensive to discover late.

Minimum crawl and indexing sheet

Use these columns:

  • URL
  • old URL
  • page type
  • priority
  • status code
  • indexability
  • canonical
  • sitemap presence
  • internal link count
  • redirect source
  • destination relevance
  • Search Console status
  • issue
  • severity
  • owner
  • action

This turns crawl data into decisions.
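
If the sheet is generated from crawl exports rather than typed by hand, a fixed column list keeps every export consistent. A small sketch using Python's csv module; `write_sheet` is a hypothetical helper and the column names simply mirror the list above.

```python
import csv
import io

COLUMNS = [
    "URL", "old URL", "page type", "priority", "status code", "indexability",
    "canonical", "sitemap presence", "internal link count", "redirect source",
    "destination relevance", "Search Console status", "issue", "severity",
    "owner", "action",
]


def write_sheet(rows, handle):
    """Write rows (dicts keyed by column name) as CSV.

    Missing columns are left blank; unknown keys raise ValueError so a
    misspelled column name is caught immediately.
    """
    writer = csv.DictWriter(handle, fieldnames=COLUMNS, restval="", extrasaction="raise")
    writer.writeheader()
    writer.writerows(rows)


# Example: build the sheet in memory; pass an open file handle to save it.
buffer = io.StringIO()
write_sheet(
    [{"URL": "https://shop.example/collections/boots", "priority": "high", "status code": 200}],
    buffer,
)
```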

If the redirect review is the weak point, pair this sheet with the Migration Redirect Risk Review.

What to do next

If redirects are failing, use the Shopify redirect mapping guide.

If launch QA is still underway, use the Shopify migration QA checklist.

If traffic has already dropped, use the Shopify SEO traffic drop after migration runbook.

For broader live-store technical checks, use the Shopify technical SEO checklist.

Quick answer

Run crawl and indexing checks before and after a Shopify migration so the team can prove which old URLs existed, which new URLs replaced them, and which pages are crawlable, indexable and internally linked after launch.

What you will do

  • Save old-site crawl evidence before migration work changes the source site.
  • Catch staging noindex, canonical, robots, sitemap and template problems before launch.
  • Use live crawl evidence to fix redirect chains, 404s and indexation gaps quickly.

What to check first

  • Screaming Frog, Sitebulb or an equivalent crawler.
  • Google Search Console page, query and indexing exports.
  • GA4 or Shopify reports for landing-page value.
  • Backlink export for URLs that may not appear in the current crawl.
  • Shopify sitemap, robots.txt and URL redirect controls.

Work through it in this order

  1. Crawl the old site and export indexable URLs, status codes, titles, meta descriptions, canonicals, H1s and inlinks.
  2. Merge the crawl with Search Console, analytics and backlink exports so commercial URLs are not treated like low-value crawl noise.
  3. Crawl the Shopify staging store and check product, collection, blog and page templates for indexability, schema, links and password/noindex leftovers.
  4. Prepare an old-URL test list from the top organic, revenue and backlinked pages.
  5. After launch, crawl the live domain, sitemap URLs and old high-priority URL list.
  6. Fix redirects that fail the one-hop test, unexpected 404s, noindex/canonical mistakes and sitemap-only orphan pages before lower-value warnings.
  7. Keep the old, staging and live crawl exports in the migration evidence folder.

Real-world notes

  • Old category pages often vanish because the new Shopify collection structure was built from product imports rather than search demand.
  • Staging crawls regularly catch noindex tags, password remnants and app schema conflicts before anyone notices in Search Console.
  • A post-launch crawl can reveal that internal links still point through old redirected URLs even when the redirect map itself works.
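
The last note, internal links that still pass through redirected URLs, can be checked with a simple lookup. The sketch assumes the crawler exports internal links as `(source_page, href)` pairs and that the redirect map is a dict of old URL to final URL; both shapes and the function name are assumptions.

```python
def links_through_redirects(internal_links, redirect_map):
    """Find internal links whose href is an old URL that now redirects.

    internal_links: iterable of (source_page, href) pairs.
    redirect_map: dict mapping each old URL to its final destination.
    """
    fixes = []
    for source, href in internal_links:
        final = redirect_map.get(href)
        if final and final != href:
            fixes.append({"page": source, "current_href": href, "update_to": final})
    return fixes
```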

Final checks

  • Old site crawl saved before migration changes.
  • Search Console and analytics data merged into URL list.
  • Staging crawl reviewed for noindex, canonical, robots and schema issues.
  • Top old URLs tested after launch.
  • Redirect chains and loops reviewed.
  • Live sitemap URLs crawled.
  • 404s prioritised by traffic, links and revenue.
  • Crawl exports stored with launch date.

Watch-outs

  • If stock sync unpublishes products after launch, crawl data may show sudden 404s that are actually inventory process problems.
  • If an app creates filter or search-result pages, the crawl may expose index bloat that the original migration plan never considered.
  • If the old WordPress site already had redirects, importing them into Shopify unchanged can create chains unless each old URL is mapped directly to its final destination.

Next action

Run this crawl pass before final redirect QA, then use the traffic-drop guide if Search Console movement looks abnormal.

Field questions

Should I crawl the old WooCommerce site before moving to Shopify?

Yes. Crawl the old site before URLs, navigation, content or plugins change. Keep the export because it becomes the source list for redirects, metadata checks and post-launch 404 monitoring.

What should I crawl after Shopify launch?

Crawl the final domain, high-priority old URLs, sitemap URLs, top organic landing pages and a sample of product, collection, blog and page templates.

Can Search Console replace a migration crawl?

No. Search Console is essential, but it lags and does not replace a controlled crawl of old URLs, staging URLs and live redirects.
