How to create a robots.txt in Umbraco and edit it from the backoffice

It's very easy to create a robots.txt in Umbraco which you can edit from the backoffice. You can achieve this natively without installing any packages or writing custom code with these simple steps:

1) Create a "txt file" document type in the backoffice
2) Add a line in the template for the "txt file" doctype so that Umbraco serves it as text rather than HTML
3) Configure Umbraco's built-in URL rewriting module to handle a request that ends in ".txt"

Here's what we're going to achieve:

[Screenshot: Editing the robots.txt in the Umbraco backoffice]

This file will be accessible to the crawlers at www.mywebsite.co.uk/robots.txt

1. Create a "txt file" document type

We need to create a document type for ".txt" files. I avoid calling it "robots.txt doctype" or similar, because there's no reason this document type couldn't be reused for another txt file web standard, such as humans.txt.

[Screenshot: Creating the robots.txt document type]

This is easy. All we need is a text area for the file content. Make sure that the document type is created with an accompanying template and that it is allowed wherever you want to create it in the content tree.

2. Write the txt file template

Again, this is very simple. We need one line of Razor code to take the string from our text area and render it in the template.
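As a minimal sketch, assuming an Umbraco 7 MVC template and a text area whose property alias is fileContent (a hypothetical alias, so substitute whatever you named yours), the template might look something like this. It is laid out so that the content type assignment falls on line 5:

```cshtml
@inherits Umbraco.Web.Mvc.UmbracoTemplatePage
@{
    Layout = null; // no master layout, so no HTML wrapper is emitted
    // Serve this page as plain text instead of the default text/html
    Response.ContentType = "text/plain";
}
@Html.Raw(Model.Content.GetPropertyValue("fileContent"))
```

Html.Raw is used here so that Razor doesn't HTML-encode characters such as & in the file content. "fileContent" is an assumed alias, not something Umbraco creates for you.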

Line 5, however, may take some explaining. This line tells Umbraco to set this page's content type header to plain text. Web servers send headers like this along with the web page to tell the user's browser what type of content it's dealing with, so that it can be rendered properly. For instance, it would obviously be incorrect for the browser to assume that HTML files and PNG images should be rendered in the same way!

Umbraco's default content type header is text/html, so we need to change it to text/plain so that our clients know they're dealing with a plain text file.

Now we can create the robots.txt file in our content tree and add our content to it.

3. Configure Umbraco to recognise the "robots.txt" URL

Once you've created a "robots.txt" node with your new document type in the backoffice and you try to access it at www.mywebsite.co.uk/robots.txt, you may see a 404 page, a blank page, or something else, depending on how your web server is configured. This is because Umbraco doesn't intercept URLs with extensions like .txt by default. You'll need to configure your site to intercept the request to /robots.txt and send it to your content node.

The good news is that Umbraco provides an out-of-the-box solution for this. Did you know that Umbraco comes with a URL rewriting module? You can easily configure it to intercept a URL with one line of XML configuration.

First you'll need to find the URL that Umbraco will have generated for you by clicking on your content node and going to the "Properties" tab.

[Screenshot: Umbraco's automatically generated URL in the backoffice]

You can see here that Umbraco's auto-generated URL for me is /robotstxt. Notice that even though there's a dot in the name of this content node, Umbraco doesn't include one in the URL.

Now, we need to open up Umbraco's URL rewriting config in our editor and add a line to rewrite the /robots.txt URL.

The config file is located at ~\Config\UrlRewriting.config. You can find an example from a live Umbraco site here.

And here's the line that we have to add:
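As a minimal sketch, assuming the auto-generated URL is /robotstxt as in my case and that the file uses the standard UrlRewritingNet rule syntax, the rule might look something like this. The name value and the rewriteUrlParameter and ignoreCase settings are my assumptions, so adjust them to suit your site:

```xml
<!-- Goes inside the <rewrites> element of ~\Config\UrlRewriting.config -->
<!-- Assumption: /robotstxt is the URL Umbraco generated for the content node -->
<add name="robotstxt"
     virtualUrl="^~/robots\.txt$"
     rewriteUrlParameter="ExcludeFromClientQueryString"
     destinationUrl="~/robotstxt"
     ignoreCase="true" />
```

The dot in virtualUrl is escaped because the value is a regular expression, and the ^ and $ anchors keep the rule from matching other URLs that merely contain "robots.txt".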

Name can be set to whatever you want, as long as it's unique in the file. virtualUrl is the URL people will enter to get to your page (represented as a regular expression). destinationUrl is the URL we're rewriting to (Umbraco's auto-generated URL from the properties tab in the backoffice).

Et voilà

[Screenshot: A robots.txt file shown in a web browser]