Submitted by captaindav on Fri, 2009-03-20 13:04
The robots.txt file tells search engine crawlers which pages you don't want them to crawl and index. Drupal comes with a default robots.txt that is fairly complete. However, if you are using pathauto, you may want to add the line Disallow: /node/ alongside the other Disallow rules. This keeps crawlers from indexing duplicate content. For example, www.example.com/node/999 and its pathauto alias (say, www.example.com/pathauto-named) are the same page. Having both indexed may lower the page's score and/or waste site bandwidth as crawlers fetch each page twice.
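One way to check the effect of the new rule before deploying it is with Python's standard-library `urllib.robotparser`. The sketch below is minimal. It parses an abridged, illustrative robots.txt (not Drupal's full default file) with `Disallow: /node/` added. It then asks whether a generic crawler (`*`) may fetch the raw node URL and a pathauto alias. The helper names `build_parser` and `check` and the alias `pathauto-named` are hypothetical placeholders. Real search engines may interpret robots.txt slightly differently from Python's parser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical, abridged robots.txt for illustration only. The "Disallow: /node/"
# line is the addition discussed above; the other rules merely stand in for
# Drupal's default file and are not a complete copy of it.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
# Paths (clean URLs)
Disallow: /admin/
Disallow: /node/add/
Disallow: /user/login/
# Added for pathauto: keep crawlers off the unaliased /node/NNN URLs
Disallow: /node/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt rules from a string instead of fetching them over HTTP."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def check(parser: RobotFileParser, urls: list[str], agent: str = "*") -> dict[str, bool]:
    """Return, for each URL, whether `agent` may crawl it under the parsed rules."""
    return {url: parser.can_fetch(agent, url) for url in urls}


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    urls = [
        "http://www.example.com/node/999",        # raw node path
        "http://www.example.com/pathauto-named",  # hypothetical pathauto alias
    ]
    for url, allowed in check(rp, urls).items():
        print(f"{'allowed' if allowed else 'blocked'}: {url}")
```

Disallow rules match by path prefix, so the single `/node/` line covers every URL under /node/. That includes any node that pathauto has not given an alias, which crawlers would then skip entirely.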