About This Sitemap Extractor
What This Tool Does
This sitemap extractor fetches and parses any XML sitemap or sitemap index and extracts every URL with its metadata. It supports automatic discovery of sitemap locations from a website's robots.txt, recursive fetching of nested sitemap indexes, and four different views of the extracted data — making it the fastest way to audit a site's full URL inventory without downloading raw XML files.
- robots.txt auto-discovery — enter a homepage URL and the tool finds the sitemap automatically via robots.txt or common paths
- Recursive sitemap index support — follows all child sitemaps in a sitemap index file to extract every URL
- Metadata extraction — captures lastmod, changefreq, and priority fields alongside each URL
- Deduplication — removes repeated URLs that appear in multiple child sitemaps
- Configurable limit — optionally cap extraction at 100, 500, 1000, or 5000 URLs
- 4 result tabs — URL List table, Sitemap Tree hierarchy, Analysis charts, Raw XML viewer
- Filters and sorting — filter by keyword, change frequency, and priority; sort any column
- 3 export formats — CSV (with headers), plain TXT (one URL per line), JSON
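The robots.txt auto-discovery step in the feature list can be sketched like this. This is an illustrative outline, not the tool's actual code; the function name is ours. It scans robots.txt for `Sitemap:` directives, which are case-insensitive and may appear more than once:

```python
def find_sitemaps_in_robots(robots_txt: str) -> list[str]:
    """Return every sitemap URL declared in a robots.txt body."""
    sitemaps = []
    for line in robots_txt.splitlines():
        # Split on the first colon only, so the URL's own "://" survives.
        key, _, value = line.partition(":")
        if key.strip().lower() == "sitemap":
            url = value.strip()
            if url:
                sitemaps.append(url)
    return sitemaps
```

When no `Sitemap:` directive exists, tools typically fall back to probing common paths such as `/sitemap.xml`.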
How to Use This Tool
- Enter a sitemap URL (ending in .xml) or a website homepage URL — the tool auto-discovers the sitemap
- Enable Recurse nested sitemaps to follow all child sitemaps in a sitemap index (recommended for large sites)
- Enable Auto-discover from robots.txt to find sitemap URLs from the site's robots.txt file automatically
- Enable Deduplicate URLs to remove repeated URLs that appear in multiple child sitemaps
- Set a URL limit if you only need a sample or want faster results
- Click Extract URLs — the live progress log shows each sitemap as it is fetched
- Use the URL List tab to browse, filter, sort, and copy individual URLs
- Check the Sitemap Tree tab to see the hierarchy of index and urlset files
- Open the Analysis tab for charts of frequency, priority, URL depth, and file extension distribution
- Click CSV, TXT, or JSON to download the filtered URL list for use in other tools
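The CSV export described in the last step is essentially a header row plus one row per URL. The sketch below shows the general shape; the column names mirror the metadata fields the tool captures but are our assumption, not its exact column layout:

```python
import csv
import io

def urls_to_csv(rows: list[dict]) -> str:
    """Serialise extracted URL entries as CSV with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["loc", "lastmod", "changefreq", "priority"])
    writer.writeheader()
    for row in rows:
        writer.writerow(row)
    return buf.getvalue()
```

The TXT export is simpler still: one `loc` value per line with no header.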
XML Sitemap Complete Reference Guide
What Is an XML Sitemap?
An XML sitemap is a file that lists all important URLs on a website along with optional metadata: lastmod (last modification date), changefreq (update frequency hint), and priority (relative importance from 0.0 to 1.0). Search engines like Google, Bing, and Yandex use sitemaps to discover and crawl pages efficiently — especially useful for large sites, new pages without inbound links, or pages behind deep navigation. Sitemaps do not guarantee indexing but significantly improve crawl coverage.
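A minimal urlset sitemap looks like the embedded document below. Note that every element lives in the sitemaps.org namespace, so a parser must be namespace-aware; this parsing sketch uses Python's standard library and is an illustration, not this tool's implementation:

```python
import xml.etree.ElementTree as ET

# A minimal single-URL sitemap; real files list many <url> entries.
SITEMAP_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2024-05-01</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>"""

# Map a prefix to the sitemaps.org namespace so find/findall can match.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
root = ET.fromstring(SITEMAP_XML)
locs = [url.findtext("sm:loc", namespaces=NS) for url in root.findall("sm:url", NS)]
```

Without the namespace mapping, `findall("url")` would silently match nothing, which is a common stumbling block when parsing sitemaps by hand.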
Sitemap Index Files
Large websites split their URL inventory across multiple files using a sitemap index. The index file (a sitemapindex element) links to child sitemaps (urlset files), each of which may contain up to 50,000 URLs and must not exceed 50 MB uncompressed. This tool automatically detects sitemap index files and, when recursion is enabled, fetches every child sitemap to build the complete URL list. The Sitemap Tree tab visualises the full hierarchy, showing which child sitemaps belong to which index.
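The detect-and-recurse behaviour described above can be sketched as follows: inspect the root tag to tell a sitemapindex from a urlset, and follow each child `loc`. Here `fetch` is a hypothetical stand-in for an HTTP GET that returns the XML body; this is a sketch of the technique, not the tool's code:

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def collect_urls(url: str, fetch) -> list[str]:
    """Return all page URLs reachable from a sitemap or sitemap index."""
    root = ET.fromstring(fetch(url))
    tag = root.tag.split("}")[-1]  # strip the "{namespace}" prefix
    if tag == "sitemapindex":
        urls = []
        # Follow every child sitemap listed in the index.
        for child in root.findall("sm:sitemap/sm:loc", NS):
            urls.extend(collect_urls(child.text.strip(), fetch))
        return urls
    # Plain urlset: collect its page URLs directly.
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS)]
```

A production version would also need to handle gzipped sitemaps, fetch errors, and cycles between index files.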
Sitemap Metadata Fields
The loc field (required) is the absolute URL of the page. lastmod (optional) is the ISO 8601 date of the last content change; search engines use it to decide whether to recrawl. changefreq (optional) hints at update frequency: always, hourly, daily, weekly, monthly, yearly, or never. priority (optional) is a decimal from 0.0 to 1.0 indicating relative importance. Google has stated that it treats changefreq and priority as hints at most and may ignore them, but SEO teams still use these fields to audit sitemap quality and identify stale or misconfigured pages.
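Because only loc is required, an extractor has to tolerate missing fields. A minimal sketch of per-URL metadata extraction, with absent optional fields defaulting to None (the structure of the returned dicts is our choice, not the tool's):

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def extract_entries(xml_text: str) -> list[dict]:
    """Extract loc plus optional metadata from a urlset sitemap."""
    root = ET.fromstring(xml_text)
    entries = []
    for url in root.findall("sm:url", NS):
        entries.append({
            "loc": url.findtext("sm:loc", namespaces=NS),
            # findtext returns None when the optional element is absent.
            "lastmod": url.findtext("sm:lastmod", namespaces=NS),
            "changefreq": url.findtext("sm:changefreq", namespaces=NS),
            "priority": url.findtext("sm:priority", namespaces=NS),
        })
    return entries
```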
SEO Use Cases for Sitemap Extraction
Extracting sitemap URLs supports several SEO workflows:
- Importing the URL list into Screaming Frog or Sitebulb to crawl only sitemapped pages and check for 4xx, 5xx, or redirect responses
- Comparing the sitemap URL list against a site crawl to find orphaned pages (listed in the sitemap but not internally linked)
- Filtering by lastmod to identify stale content not updated in over a year
- Auditing the priority distribution to confirm high-priority pages are correctly marked
- Exporting to CSV to track URL count changes over time as a quick measure of content growth or site migrations
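Two of these audits, finding orphaned pages and flagging stale lastmod dates, reduce to simple set and date operations once the URL list is exported. A sketch under the assumption that the inputs come from this tool's CSV export; the function names and the one-year threshold are illustrative:

```python
from datetime import date

def orphaned(sitemap_urls: set[str], crawled_urls: set[str]) -> set[str]:
    """URLs in the sitemap that the crawler never reached via links."""
    return sitemap_urls - crawled_urls

def stale(lastmod_by_url: dict[str, str], today: date, max_age_days: int = 365) -> list[str]:
    """URLs whose lastmod (ISO 8601 date) is older than max_age_days."""
    return [
        url for url, lastmod in lastmod_by_url.items()
        if (today - date.fromisoformat(lastmod)).days > max_age_days
    ]
```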