Monday, October 18, 2010

Stop a non-HTML file (for example, a downloadable PDF) from appearing in search results


The first step requires the webmaster to make a change. If you own the site, you'll need to do ONE of the actions listed below. If you don't own the site, contact the webmaster and request that one of these changes be made. (If one of these changes isn't made, you will not be able to use this tool to process your removal request.)
  • If the page no longer exists, make sure that the server returns a 404 (Not Found) or 410 (Gone) HTTP status code. This will tell Google that the page is gone and that it should no longer appear in search results.
  • If the page still exists but you don't want it to appear in search results, use robots.txt to prevent Google from crawling it. Note that in general, even if a URL is disallowed by robots.txt, Google may still index the page if it finds the URL on another site. However, Google won't index the page if it's blocked in robots.txt and there's an active removal request for the page.
  • Alternatively, you can use a noindex meta tag. When Google sees this tag on a page, it will completely drop the page from its search results, even if other pages link to it. This is a good solution if you don't have direct access to the site server. (You will need to be able to edit the HTML source of the page.)
In addition, if you want a non-HTML file (for example, a downloadable PDF) to be removed from search results, you or the webmaster should ensure that the file is removed from the server. Once it's gone, use the process below to request that the file be completely removed from search results.
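
The three options above can be checked before a removal request is submitted. The following Python sketch is only an illustration and is not part of Google's tooling. The function names, the example robots.txt rules, and the example.com URLs are all assumptions made for this example. It works on a status code, a robots.txt body, and an HTML source that you have already fetched yourself.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks every crawler from /private/.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

# Hypothetical page carrying a noindex meta tag in its <head>.
EXAMPLE_PAGE = (
    '<html><head><meta name="robots" content="noindex"></head>'
    "<body>Old page</body></html>"
)


def is_gone(status_code: int) -> bool:
    """Option 1: the server answers 404 (Not Found) or 410 (Gone)."""
    return status_code in (404, 410)


def is_blocked_by_robots(robots_txt: str, url: str,
                         user_agent: str = "Googlebot") -> bool:
    """Option 2: robots.txt disallows crawling of the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


class _RobotsMetaParser(HTMLParser):
    """Records whether a robots/googlebot meta tag contains noindex."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            directives = [d.strip().lower()
                          for d in attr.get("content", "").split(",")]
            if "noindex" in directives:
                self.noindex = True


def has_noindex(html: str) -> bool:
    """Option 3: the page source contains a noindex meta tag."""
    parser = _RobotsMetaParser()
    parser.feed(html)
    return parser.noindex
```

For instance, `is_blocked_by_robots(EXAMPLE_ROBOTS_TXT, "http://www.example.com/private/report.pdf")` evaluates to `True`, and `has_noindex(EXAMPLE_PAGE)` evaluates to `True`. A page that passes none of the three checks does not yet meet the requirements for a removal request.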

If you own the site

  1. Verify your ownership of the site in Webmaster Tools.
  2. On the Webmaster Tools home page, click the site you want.
  3. On the Dashboard, click Site configuration in the left-hand navigation.
  4. Click Crawler access, and then click Remove URL.
  5. Click New removal request.
  6. Type the URL of the page you want removed from search results (not the Google search results URL or cached page URL), and then click Continue. (See How to find the right URL.) Note that the URL is case-sensitive: you will need to submit the URL using exactly the same characters and the same capitalization that the site uses.
  7. Click Remove page from search results and cache.
  8. Select the checkbox to confirm that you have completed the requirements listed in this article, and then click Submit Request.

If you don't own the site

  1. Go to http://www.google.com/webmasters/tools/removals.
  2. If you're not immediately taken to the 'Create a new request' page, click New Removal Request.
  3. Type the URL of the webpage you want removed (not the Google search results URL or cached page URL), and then click Continue. (See How to find the right URL.) Note that the URL is case-sensitive: you will need to submit the URL using exactly the same characters and the same capitalization that the site uses.
  4. Click Webmaster has already blocked the page.
  5. Select the checkbox to confirm the requirements listed in this article have been completed, and then click Submit Request.
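
Both procedures above ask for the page's own URL, typed with the exact capitalization the site uses, rather than a Google search results or cached page URL. A rough Python sketch of that sanity check follows. The helper names are hypothetical, and the host and path patterns used to recognize Google result and cache URLs are assumptions for illustration, not an official list.

```python
from urllib.parse import urlsplit


def is_google_result_or_cache_url(url: str) -> bool:
    """Heuristic: True if the URL looks like a Google results or cache URL.

    The patterns below are assumptions made for this sketch.
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host == "webcache.googleusercontent.com":
        return True
    is_google_host = host == "google.com" or host.endswith(".google.com")
    return is_google_host and parts.path.startswith("/search")


def differs_only_by_case(submitted: str, actual: str) -> bool:
    """True if the two URLs match except for capitalization."""
    return submitted != actual and submitted.lower() == actual.lower()
```

If `differs_only_by_case` returns `True` for the URL you typed and the URL the site actually serves, correct the capitalization before submitting the request.
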
Source: http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=164734&from=61062&rd=1
