Control the pages Web crawlers can access

Updated on 20-October-2016 at 10:16 AM

This article explains how to prevent search engines from indexing a specific domain, or any URL that meets a certain condition.

To set this up, add the Liquid code below to the templates used to render your pages. If your pages do not use templates, add the snippet to each page individually.

To add the code snippet to a page template:

  1. Log in to the Admin panel
  2. Go to Site Manager -> Page Template and switch to HTML view
  3. Paste the code below, exactly as it is, inside the head section

It is very important to paste the code directly in the head section of the template. If you place it in the body it will not work.


  {% if globals.site.host contains "worldsecuresystems.com" %}
    <meta name="ROBOTS" content="NOINDEX, NOFOLLOW">
  {% endif %}
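
For orientation, here is a minimal sketch of how the snippet might sit inside a page template. The surrounding markup (the title and the body content) is a hypothetical placeholder and not part of any real template; only the conditional block is taken from the example above.

  <!DOCTYPE html>
  <html>
  <head>
    <!-- Placeholder title, for illustration only -->
    <title>Sample page</title>
    <!-- The conditional robots tag sits inside the head, not the body -->
    {% if globals.site.host contains "worldsecuresystems.com" %}
      <meta name="ROBOTS" content="NOINDEX, NOFOLLOW">
    {% endif %}
  </head>
  <body>
    <p> Sample content here </p>
  </body>
  </html>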

Specifying the crawler

This is a page-specific approach to controlling how an individual page should be indexed and served to users in search results. You need to put the robots meta tag in the <head> section of a given page, like this:


  <!DOCTYPE html>
  <html>
  <head>
  <meta name="robots" content="noindex"  />
  </head>
  <body>
    <p> Sample content here </p>
  </body>
  </html>
  

The example above instructs all search engines not to index the page. The value of the name attribute (robots in the example above) specifies that the instruction applies to all crawlers from all search engine providers.
To address a specific crawler, replace the robots value of the name attribute with the name of the crawler that you are addressing. Here are a few more examples:

  • To prevent all search engine web crawlers from indexing a page on your site, place the following meta tag:
<meta name="robots" content="noindex" />
  • To prevent only Googlebot from indexing your page, update the tag as follows:
<meta name="googlebot" content="noindex" />
  • To prevent only Slurp, the Yahoo crawler, from indexing your page, update the tag as follows:
<meta name="slurp" content="noindex" />
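
Crawler-specific tags can also appear together on one page. As a small sketch, the head below addresses two crawlers separately and gives no instruction to any other crawler; the comments are illustrative only.

  <head>
    <!-- Ask Googlebot not to index this page -->
    <meta name="googlebot" content="noindex" />
    <!-- Ask Slurp, the Yahoo crawler, not to index this page -->
    <meta name="slurp" content="noindex" />
  </head>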

Only use the meta if...

With Liquid you can output meta tags only when a condition or set of conditions is met. This was not possible before Liquid. For example, let's prevent the Yahoo search engine from indexing any pages served from a host name that contains secure:


  {% if globals.site.host contains "secure" %}
    <meta name="slurp" content="NOINDEX">
  {% endif %}

The conditions can be even more complex. Let's prevent the Yahoo crawler from indexing the webpages in the secure folder and on the worldsecuresystems.com domain. This would be done like so:


  {%if globals.site.host  contains "/secure/" and globals.site.host  contains "worldsecuresystems.com" %}
  	<meta  name="slurp" content="NOINDEX">
  {%endif%}  

You can even specify the date after which crawlers are allowed to index your pages, or after which they are blocked. The code snippet below prevents crawlers from indexing your site before the specified date:


  {% assign myDate = "2015-08-07" | convert: "date" %}
  {% if globals.site.dateNow < myDate %}
    <meta name="robots" content="NOINDEX">
  {% endif %}
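
The opposite rule, where crawlers may index pages up to a date and are blocked afterwards, can be sketched by reversing the comparison. This assumes that globals.site.dateNow compares against the converted date with > in the same way it does with < above; the date itself is an arbitrary placeholder.

  {% assign myDate = "2015-08-07" | convert: "date" %}
  {% if globals.site.dateNow > myDate %}
    <meta name="robots" content="NOINDEX">
  {% endif %}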

With this technique you can even create content-related rules, for example to keep products that match a certain parameter out of the index. In this example we do not want to index any products with a salePrice over 100:


  {% if salePrice > 100 %}
    <meta name="robots" content="NOINDEX">
  {% endif %}

Other resources