
Domains & crawling

Lumear crawls your website so it can match AI prompts to your actual page content. This is what makes recommendations specific instead of generic.

Adding a domain

Most users add their primary domain at brand-create time. To add additional domains (microsites, landing pages, country variants), go to the Sites page and click + Add domain.

Enter the bare host: example.com. No https://, no trailing slash, no path. If you provide a custom sitemap URL, Lumear will use it as the entry point; otherwise we discover URLs through Firecrawl's site mapper.
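Reducing whatever a user pastes to a bare host is a small normalization step. Here is a minimal sketch of how it might look; `normalize_host` is a hypothetical helper, not Lumear's actual code:

```python
from urllib.parse import urlparse

def normalize_host(raw: str) -> str:
    """Reduce input like 'https://Example.com/page/' to a bare host."""
    raw = raw.strip()
    # urlparse only fills .netloc when a scheme is present, so add one.
    if "://" not in raw:
        raw = "https://" + raw
    host = urlparse(raw).netloc.lower()
    # Strip credentials and a port if the user pasted them.
    host = host.split("@")[-1].split(":")[0]
    if not host or "." not in host:
        raise ValueError(f"not a valid domain: {raw!r}")
    return host
```

So `normalize_host("https://Example.com/pricing/")` and `normalize_host("example.com")` both come out as `example.com`.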

Crawl scopes

Three scopes control how aggressive a crawl is:

  • Topical (default). We /map the domain, then filter the URL list down to pages whose path tokens overlap with your prompt-set vocabulary. The cheapest mode, and usually the right one.
  • Cited-only. We crawl only the URLs that AI assistants have already referenced for one of your prompts. Used by the auto-cascade — typically you don't trigger this manually.
  • Full audit. Every URL Firecrawl can find, up to the per-domain page_cap (default 1000). The most expensive mode — use it occasionally to refresh the full index.
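The topical scope's "path tokens overlap with your prompt-set vocabulary" step can be pictured as a simple set intersection over URL path words. A sketch under that assumption (the function name and `min_overlap` threshold are illustrative, not Lumear internals):

```python
import re
from urllib.parse import urlparse

def topical_filter(urls, prompt_vocab, min_overlap=1):
    """Keep URLs whose path tokens share at least `min_overlap`
    words with the prompt-set vocabulary (lowercase terms)."""
    vocab = {w.lower() for w in prompt_vocab}
    kept = []
    for url in urls:
        path = urlparse(url).path
        tokens = {t for t in re.split(r"[^a-z0-9]+", path.lower()) if t}
        if len(tokens & vocab) >= min_overlap:
            kept.append(url)
    return kept
```

A mapped list like `["https://ex.com/pricing-plans", "https://ex.com/about"]` filtered against the vocabulary `{"pricing"}` keeps only the pricing page, which is why this mode stays cheap: most URLs are discarded before any scraping happens.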

When does crawling happen automatically?

Lumear fires a crawl in three situations:

  1. You add a brand with a domain → topical crawl of that domain.
  2. You add a domain on the Sites page → topical crawl.
  3. A prompt run completes → for each URL the AI cited, we kick off a cited-only crawl of that competitor URL so we have its content to compare against.
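The third trigger, the auto-cascade, amounts to collecting every cited URL from a run, grouping by domain, and queuing cited-only crawls. A rough sketch; the `run_results` shape and `enqueue_crawl` callback are assumptions for illustration:

```python
from urllib.parse import urlparse

def cascade_cited_crawls(run_results, enqueue_crawl):
    """After a prompt run, queue a cited-only crawl per cited domain.
    `run_results`: list of {"prompt": ..., "cited_urls": [...]} dicts.
    `enqueue_crawl(domain, urls, scope)`: hands work to the crawler."""
    by_domain = {}
    for result in run_results:
        for url in result["cited_urls"]:
            by_domain.setdefault(urlparse(url).netloc, set()).add(url)
    for domain, urls in by_domain.items():
        enqueue_crawl(domain, sorted(urls), scope="cited_only")
```

Grouping by domain before enqueuing means one crawl job per competitor site rather than one per URL, which keeps the cascade from fanning out excessively.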

You almost never need to click the Crawl button manually. It's there for when you've made site changes you want re-indexed before the next scheduled crawl.

What gets stored

For each page we successfully scrape:

  • target_pages — the page metadata (URL, title, H1, page type, content hash, HTTP status).
  • target_content_blocks — the page split into heading-bounded sections, each with a 1536-dimensional embedding.
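"Heading-bounded sections" means each block runs from one heading to the next. Assuming the scraped page arrives as markdown, the split might look like this sketch (not Lumear's actual chunker):

```python
import re

# Match H1-H3 markdown headings at the start of a line.
HEADING = re.compile(r"^(#{1,3})\s+(.*)", re.MULTILINE)

def split_by_headings(markdown: str):
    """Return (heading, body) pairs, one per H1-H3 heading."""
    matches = list(HEADING.finditer(markdown))
    sections = []
    for i, m in enumerate(matches):
        start = m.end()
        end = matches[i + 1].start() if i + 1 < len(matches) else len(markdown)
        sections.append((m.group(2).strip(), markdown[start:end].strip()))
    return sections
```

Each resulting `(heading, body)` pair would then be embedded and stored as one content block.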

The embeddings are what let us match your prompts to specific sections of your pages with vector similarity. Page-type classification (faq, product, blog, etc.) helps us pick the right surface to recommend edits on.
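Vector-similarity matching here typically means cosine similarity between a prompt's embedding and each block's embedding, keeping the top scorers. A minimal sketch, assuming plain Python lists stand in for the 1536-dimensional vectors:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def best_blocks(prompt_vec, blocks, top_k=3):
    """Rank content blocks by similarity to a prompt embedding.
    `blocks` is a list of (block_id, embedding) pairs."""
    scored = [(cosine(prompt_vec, vec), block_id) for block_id, vec in blocks]
    scored.sort(reverse=True)
    return [block_id for _, block_id in scored[:top_k]]
```

In production this ranking is usually pushed into the database via a vector index rather than computed in application code, but the scoring is the same idea.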

Re-crawl frequency

Crawls are currently manual or event-driven. Scheduled re-crawls (weekly, monthly) are on the roadmap; for now, hit Re-crawl on the Sites page any time you push content updates.

Troubleshooting

  • 0 pages crawled, status “success”. Firecrawl couldn't reach the site (down, behind aggressive bot protection, or no public sitemap). Try a different domain, or check whether https://yourdomain.com loads in a normal browser session.
  • Crawl failed. Check Troubleshooting for the common error codes.