Best practices for using XML Sitemaps and RSS Feeds
by Geethalakshmi 2014-10-17 16:33:03
Best practices
Important fields
XML sitemaps and RSS/Atom feeds, at their core, are lists of URLs with metadata attached to them. The two most important pieces of information for Google are the URL itself and its last modification time:
URLs
URLs in XML sitemaps and RSS/Atom feeds should adhere to the following guidelines:
- Only include URLs that can be fetched by Googlebot. A common mistake is including URLs disallowed by robots.txt, which Googlebot cannot fetch, or URLs of pages that don't exist.
- Only include canonical URLs. A common mistake is to include URLs of duplicate pages. This increases the load on your server without improving indexing.
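The first guideline can be checked before a sitemap or feed is generated. The following is a minimal sketch, assuming the robots.txt content is already available as a string; the function name `fetchable_urls`, the sample rules, and the example.com URLs are hypothetical. It uses Python's standard `urllib.robotparser`, whose matching is simpler than Googlebot's own, so treat the result as a first filter rather than a guarantee.

```python
from urllib.robotparser import RobotFileParser


def fetchable_urls(urls, robots_txt, user_agent="Googlebot"):
    """Return only the URLs that robots_txt allows user_agent to fetch."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if parser.can_fetch(user_agent, url)]


# Hypothetical robots.txt and candidate URLs, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

CANDIDATES = [
    "https://example.com/",
    "https://example.com/private/report.html",
    "https://example.com/articles/sitemaps.html",
]

# URLs under /private/ are dropped; the rest may go into the sitemap.
SITEMAP_URLS = fetchable_urls(CANDIDATES, ROBOTS_TXT)
```

This sketch does not detect duplicate or missing pages; those checks depend on how your site defines canonical URLs.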
Last modification time
Specify a last modification time for each URL in an XML sitemap and RSS/Atom feed. The last modification time should be the last time the content of the page changed meaningfully. If a change is meant to be visible in the search results, then the last modification time should be the time of this change.
- XML sitemap uses <lastmod>
- RSS uses <pubDate>
- Atom uses <updated>
Be sure to set or update last modification time correctly:
- Specify the time in the correct format: W3C Datetime for XML sitemaps, RFC 3339 for Atom and RFC 822 for RSS.
- Only update modification time when the content changed meaningfully.
- Don’t set the last modification time to the current time whenever the sitemap or feed is served.
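The three date formats named above can be produced from one stored timestamp. The following is a sketch, assuming each page's last meaningful change is kept as a timezone-aware `datetime`; the function names are hypothetical. For RSS it emits a four-digit year, which RSS 2.0 permits as an exception to RFC 822.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def sitemap_lastmod(dt):
    """W3C Datetime string for an XML sitemap, normalized to UTC."""
    return dt.astimezone(timezone.utc).isoformat(timespec="seconds")


def atom_updated(dt):
    """RFC 3339 string for an Atom feed, normalized to UTC."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def rss_pub_date(dt):
    """RFC 822-style string (four-digit year) for an RSS feed, in GMT."""
    return format_datetime(dt.astimezone(timezone.utc), usegmt=True)


# An arbitrary example timestamp; in practice this is the stored time of the
# last meaningful content change, never the time the feed is served.
changed = datetime(2014, 10, 17, 16, 33, 3, tzinfo=timezone.utc)

print(sitemap_lastmod(changed))  # 2014-10-17T16:33:03+00:00
print(atom_updated(changed))     # 2014-10-17T16:33:03Z
print(rss_pub_date(changed))     # Fri, 17 Oct 2014 16:33:03 GMT
```

Pass only timezone-aware values: `astimezone` treats a naive `datetime` as local time, which can silently shift the reported modification time.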
XML sitemaps
XML sitemaps should contain URLs of all pages on your site. They are often large and are updated infrequently. Follow these guidelines:
- For a single XML sitemap: update it at least once a day (if your site changes regularly) and ping Google after you update it.
- For a set of XML sitemaps: maximize the number of URLs in each XML sitemap. The limit is 50,000 URLs or a maximum size of 10 MB uncompressed, whichever is reached first. Ping Google for each updated XML sitemap (or once for the sitemap index, if that's used) every time it is updated. A common mistake is to put only a handful of URLs into each XML sitemap file, which usually makes it harder for Google to download all of these XML sitemaps in a reasonable time.
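Filling each file up to the limits stated above can be sketched as follows. This is an illustration under assumptions, not a complete generator: the function names and the `(url, lastmod)` entry shape are hypothetical, the limits are the ones quoted in this article, and writing the files, building a sitemap index, and pinging are left out.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000               # URL limit quoted in the article
MAX_BYTES = 10 * 1024 * 1024    # uncompressed size limit quoted in the article


def chunk(entries, size=MAX_URLS):
    """Yield consecutive slices of entries, each as full as allowed."""
    for start in range(0, len(entries), size):
        yield entries[start:start + size]


def build_sitemap(entries):
    """Serialize (url, lastmod) pairs into one sitemap document (bytes)."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


def build_sitemaps(entries):
    """Return as few sitemap documents as the URL limit allows."""
    documents = []
    for group in chunk(entries):
        xml_bytes = build_sitemap(group)
        if len(xml_bytes) > MAX_BYTES:
            # Very long URLs can hit the size limit before the URL limit.
            raise ValueError("sitemap exceeds the size limit; split further")
        documents.append(xml_bytes)
    return documents
```

With 120,000 entries this yields three documents (50,000, 50,000, and 20,000 URLs) rather than many small files.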
RSS/Atom
RSS/Atom feeds should convey recent updates of your site. They are usually small and updated frequently. For these feeds, we recommend:
- When a new page is added or an existing page changes meaningfully, add the URL and the modification time to the feed.
- So that Google does not miss updates, the RSS/Atom feed should contain all updates made since at least the last time Google downloaded it. The best way to achieve this is by using PubSubHubbub. The hub will propagate the content of your feed to all interested parties (RSS readers, search engines, etc.) in the fastest and most efficient way possible.
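The two recommendations above can be sketched for an Atom feed as follows. This is a minimal illustration, assuming updates are available as `(page_url, page_title, updated)` tuples with RFC 3339 UTC timestamps and that at least one update exists; the function name, the example.com URLs, and the hub address are hypothetical. The feed advertises its hub through a `link` element with `rel="hub"`, which is how PubSubHubbub subscribers discover it. Elements a complete Atom feed also needs, such as an author, are omitted.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"


def build_atom_feed(feed_url, hub_url, title, updates):
    """Build an Atom feed listing recent updates and advertising a hub.

    updates: non-empty list of (page_url, page_title, updated) tuples, where
    updated is an RFC 3339 UTC string such as "2014-10-17T16:33:03Z".
    """
    feed = ET.Element("feed", xmlns=ATOM_NS)
    ET.SubElement(feed, "title").text = title
    ET.SubElement(feed, "id").text = feed_url
    ET.SubElement(feed, "link", rel="self", href=feed_url)
    ET.SubElement(feed, "link", rel="hub", href=hub_url)
    # Same-format UTC strings sort chronologically, so max() is the newest.
    ET.SubElement(feed, "updated").text = max(u[2] for u in updates)
    for page_url, page_title, updated in updates:
        entry = ET.SubElement(feed, "entry")
        ET.SubElement(entry, "id").text = page_url
        ET.SubElement(entry, "title").text = page_title
        ET.SubElement(entry, "link", href=page_url)
        ET.SubElement(entry, "updated").text = updated
    return ET.tostring(feed, encoding="unicode")


# Hypothetical feed, hub, and pages, for illustration only.
FEED_XML = build_atom_feed(
    "https://example.com/feed.atom",
    "https://hub.example.com/",
    "Example site updates",
    [
        ("https://example.com/new-page", "New page", "2014-10-17T16:33:03Z"),
        ("https://example.com/old-page", "Revised page", "2014-10-16T09:00:00Z"),
    ],
)
```

Declaring the hub only makes it discoverable. The publisher typically still has to notify the hub after each feed update, and how that is done depends on the hub in use.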