The sitemap is a simple yet powerful SEO tool, useful not only for new and unknown sites but in general. If you don't have one, create one right now (and submit it to Google). I made this tool and wrote this quick guide to help you do it fast and do it right.
Already know all about sitemaps and just want the Sitemap Generator itself? Skip the rest and go straight to the download section.
A sitemap is an XML[1] or plain text file that contains a list of all the important pages on your website.
It tells search engines what you want them to index.
The SEO usefulness of the sitemap is twofold:
1) to make it easier for the bots to find your pages and
2) to help the search engine make sense of your website.
The sitemap helps the spider find your pages, which is especially useful for new sites. But the real reason to use a sitemap is different:
When done right, it tells the search engine what is important from your point of view. (Including or omitting a URL in the sitemap is a signal about what you think matters.)
On the technical side:
The two popular sitemap formats are XML and plain text.
XML -- This is the usual way to submit your sitemaps to Google. A minimal example (with placeholder URLs) looks like this:
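    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url>
        <loc>https://www.example.com/</loc>
        <lastmod>2022-02-27</lastmod>
      </url>
      <url>
        <loc>https://www.example.com/about/</loc>
      </url>
    </urlset>

(The lastmod field is optional.)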
NB: XML is a strict format, so you should avoid it if you are creating a sitemap by hand. Unlike HTML, XML doesn't tolerate errors. If you want to create a sitemap manually, use plain text.
TXT -- The plain text sitemap works fine in most cases and is much simpler: you just list the URLs, one per line, and that's all. The XML example above looks like this as a TXT sitemap:
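    https://www.example.com/
    https://www.example.com/about/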
XML seems to be the “industry standard” nowadays (because it's more flexible).
In both cases, your sitemap should follow some requirements:
No more than 50,000 URLs in one sitemap file. If you want to submit more, you will need to split the links into several files (see the sitemap index example after this list).
No more than 50MB file size. As with the point above, you must split the links into multiple sitemaps if you are over the limit.
No redirects. They are pointless at best and eat into your crawl "budget" (if such a thing exists).
No error pages. Why on earth would anyone want a search engine bot to visit a 404 page?
No pages forbidden by robots.txt or marked with a "noindex" tag. These give conflicting signals to the search engine and are useless at best.
No duplicates and no "non-canonical" links. The best-case scenario is a "canonical" tag set on each page, with duplicates pointing to the canonical version. Never add duplicate or non-canonical URLs to the sitemap.
As a general rule, you should include only meaningful content in your sitemap.
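If you do need to split, the standard approach is a sitemap index file that lists the individual sitemap files. A minimal example (the file names here are hypothetical):

    <?xml version="1.0" encoding="UTF-8"?>
    <sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <sitemap>
        <loc>https://www.example.com/sitemap-1.xml</loc>
      </sitemap>
      <sitemap>
        <loc>https://www.example.com/sitemap-2.xml</loc>
      </sitemap>
    </sitemapindex>

You then submit the index file, and the search engine fetches the individual sitemaps from it.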
Even though it is possible to create a sitemap by hand, I wouldn't recommend it. The best way is to use your site's integrated sitemap features (if any), or a third-party tool (a sitemap generator) that will do the job for you.
Depending on your site's technology and whether you have access to the server, there are quite a few options:
for WordPress or other content management systems — the best way is to use the integrated sitemap features or plugins. These work from the database itself, so they are fast and update automatically as you add content over time.
for database-driven sites in general — if custom-made or proprietary — the developer who built the site can help with this.
for any site where server-side solutions are not an option — use sitemap generator software. This includes old static sites, or custom sites you have no access to (like clients' sites, if you provide SEO services).
I know there are plenty of Sitemap Generator tools around the web, but most of them don't fit my simple need: a spider that is SEO-aware. (See Ahrefs' sitemap creation post for the SEO perspective.)
I didn't like the idea of using online tools, Ahrefs doesn't work well for my use cases, and most of the tools on the market (even the old version of this one) don't take the SEO side of things into account.
So — I stole a few weeks from my "main" job and reminded myself what it means to be a software developer. A month later, here it is: the best free SEO-aware sitemap generator around.
A tool like this should, in general, crawl the entire target site and list only the pages that matter from an SEO point of view. It should be aware of error pages, noindex tags, robots.txt directives, canonicals, etc.
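To illustrate the idea, here is a rough sketch in Python of the kind of per-URL checks such a crawler makes. It assumes the third-party requests and beautifulsoup4 libraries; the include_in_sitemap helper is a hypothetical name, and this is not the actual code of the tool.

    # Sketch only: the kind of SEO checks a sitemap crawler performs.
    import urllib.robotparser
    import requests                      # pip install requests
    from bs4 import BeautifulSoup        # pip install beautifulsoup4

    def include_in_sitemap(url, robots):
        """Return True if `url` belongs in the sitemap."""
        if not robots.can_fetch("*", url):
            return False                 # forbidden by robots.txt
        resp = requests.get(url, timeout=10, allow_redirects=False)
        if resp.status_code != 200:
            return False                 # drop 3xx/4xx/5xx pages
        if "noindex" in resp.headers.get("X-Robots-Tag", "").lower():
            return False                 # noindex sent as an HTTP header
        soup = BeautifulSoup(resp.text, "html.parser")
        meta = soup.find("meta", attrs={"name": "robots"})
        if meta and "noindex" in meta.get("content", "").lower():
            return False                 # noindex meta tag in the document
        canonical = soup.find("link", rel="canonical")
        if canonical and canonical.get("href") not in (None, url):
            return False                 # non-canonical duplicate
        return True

    robots = urllib.robotparser.RobotFileParser("https://www.example.com/robots.txt")
    robots.read()
    print(include_in_sitemap("https://www.example.com/", robots))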
I made the new version of this sitemap generator with SEO in mind. It:
removes 3xx, 4xx, and 5xx pages (redirects, client errors, and server errors)
follows robots.txt directives (skips forbidden URLs)
removes non-canonical duplicates
scans the documents themselves for "noindex" tags (not only the server responses)
generates reports on what it adds to or leaves out of the sitemap, and why
crawls with multiple threads
produces XML, TXT, and HTML sitemaps
Note that it currently supports only sitemaps under 50,000 pages and 50MB in size.
Click the button below to download the tool, run the setup, and follow the instructions.
System Requirements:
64-bit Windows
8GB RAM
6GB free disk space (SSD is better)
1) Download and install the Sitemap Generator tool
2) Enter your site URL, click "Create" to create a project, and "Run" to start crawling. If everything is OK, after some time you'll get the sitemap in the "Final Sitemap" tab.
3) Save the XML/TXT file(s) and upload them to your site's root folder. That's it. Now you are ready to go to the respective webmaster console and submit the sitemap to Google, Yahoo, and the rest.
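Optionally, you can also point crawlers to the sitemap from your robots.txt file using the standard Sitemap directive (the domain here is a placeholder):

    Sitemap: https://www.example.com/sitemap.xml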
Last Updated: February 27, 2022
Author: Vlad Hristov