You can create a proper sitemap.xml, served from the right URL, in a vanilla Umbraco installation in three easy steps:
1) Create an "XML Sitemap" document type and template in the backoffice
2) Add a line to the template for the "XML Sitemap" document type so that Umbraco serves it as XML rather than HTML
3) Configure Umbraco's built-in URL rewriting module to handle a request that ends in ".xml"
If you've read my previous post on creating a robots.txt in Umbraco, you'll notice that this tutorial is almost the same. Here's what we're going to achieve, with the example taken from a site which I work on. This sitemap is dynamically generated by Umbraco.
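In case the format is new to you, here is a minimal sketch of what a sitemap.xml looks like under the sitemaps.org protocol. The example.com URLs are placeholders for illustration only and are not taken from the real site.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Minimal sitemap: one <url> entry per page, each with a required <loc>. -->
<!-- The URLs below are hypothetical placeholders. -->
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/</loc>
  </url>
  <url>
    <loc>https://www.example.com/about/</loc>
  </url>
</urlset>
```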
1) Creating the document types
There are two document types you should create for this sitemap.xml.
- An XML sitemap document type (with template)
- (optional) An "XML Sitemap Settings" document type, without template, which you can compose into other document types to implement optional sitemap settings like change frequency.
First you'll need to create the XML Sitemap document type. No extra properties are needed here, since everything the sitemap needs is generated dynamically from the rest of your site's content tree.
The next step, which is optional, is to create a document type without a template. You can use it in document type compositions to implement the optional properties defined in the sitemap.xml standard (see http://www.sitemaps.org/protocol.html#xmlTagDefinitions for a full list). Here's an example from one of my sites, with sitemap-relevant settings skillfully highlighted:
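To show where those optional settings end up, here is a sketch of a single sitemap entry using the optional tags from the sitemaps.org protocol. The URL, date, and values are made up for illustration; which of these tags you expose as properties on your settings document type is up to you.

```xml
<!-- One entry inside <urlset>; only <loc> is required by the protocol. -->
<url>
  <loc>https://www.example.com/news/</loc>
  <!-- Date of last modification, in W3C Datetime format (YYYY-MM-DD is allowed). -->
  <lastmod>2016-01-15</lastmod>
  <!-- One of: always, hourly, daily, weekly, monthly, yearly, never. -->
  <changefreq>daily</changefreq>
  <!-- Relative priority within your own site, 0.0 to 1.0 (default 0.5). -->
  <priority>0.8</priority>
</url>
```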
2) Write the razor template for the XML sitemap
You can get a little creative here so that you end up with the right solution for your site (e.g. if you need to split the sitemap up into multiple files), but I'll first provide my example code for you to read, then explain the interesting features.
I stress that this is an example - however, you could use this code as-is, and it will correctly handle a site that has multiple root nodes. On my production version of this I have an extension method which I use in place of IPublishedContent.UrlAbsolute() to ensure absolute URLs are rendered correctly when SSL is provided by my CDN.
Line 5: This line instructs Umbraco to serve this page with the content type of text/xml (instead of the default text/html), ensuring that browsers and crawlers understand that this is an XML page.
Line 9: The call to Umbraco.TypedContentAtRoot() and the two foreach loops are needed to ensure that sites with multiple root nodes have all of their content nodes listed in the sitemap.
Lines 12 & 23: The version of this code which is closer to what I have in production, available here, has a line handling pages which have a canonicalUrl property. This code, however, just calls UrlAbsolute(), which is fine for most cases.
Lines 14 & 25: This is where the optional settings mentioned in section 1 come into play, with some pages having an updateFrequency property which tells Google that the page has frequently changing content. This property is, however, completely optional.
3) Use Umbraco's built-in URL rewriting module to give our sitemap the right URL
Did you know that Umbraco has a built-in URL rewriting module? It takes one line of XML to configure it to handle requests to /sitemap.xml, giving us the standard sitemap URL and keeping Google happy.
The config file is located at
First, after you've created a page with the XML Sitemap document type, you'll need to find the URL that Umbraco has generated for you. This is found under the properties tab of your page:
And then, all you have to do is add one line of XML configuration to rewrite the URL:
<add name="sitemap-rewrite" virtualUrl="^~/sitemap.xml" destinationUrl="~/sitemapxml" />
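For context, here is a sketch of where that line sits in the rewriting config. It assumes the stock UrlRewritingNet layout that ships with Umbraco 7, so check the root element and namespace against your own file rather than copying them from here. The rule itself is kept exactly as written above.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Assumed layout: the default UrlRewritingNet config shipped with Umbraco 7. -->
<urlrewritingnet xmlns="http://www.urlrewriting.net/schemas/config/2006/07">
  <rewrites>
    <!-- virtualUrl is the incoming request to match; destinationUrl is the URL -->
    <!-- Umbraco generated for your sitemap page (yours may differ from ~/sitemapxml). -->
    <add name="sitemap-rewrite" virtualUrl="^~/sitemap.xml" destinationUrl="~/sitemapxml" />
  </rewrites>
</urlrewritingnet>
```

If the URL on your page's properties tab is something other than /sitemapxml, change destinationUrl to match it.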
Your sitemap is now correctly configured and ready for prime time. You might also be interested in my tutorial that applies this same technique to robots.txt.