I am doing some research into Google’s behavior, prompted by an increase in 404s reported by our customers. It appears that (contrary to what I previously understood) Google is indexing pages that don’t actually exist (they are “search results”).

For instance, REW has a listings tool on our site, and the only way to get /listings/?search into a URL is to actually fill out the form. Therefore no page with those parameters (the possible combinations run into the hundreds of thousands) should be indexed.

Yet when I search for site:http://www.realestatewebmasters.com/listings/?search, Google reports over 50,000 of them. DOH!!!!

Now, just so you know: Google already has a way to deal with these duplicate (or near-duplicate) pages, https://support.google.com/webmaster…answer=1235687, and they should not do any harm except when Google chooses one of these pages over one of your quality pages in the index.

I am testing various methods of blocking these URLs (I never wanted them in the index). There are several ways to do this:
In the .htaccess file
In the robots.txt file (especially if they are in a folder)
In the Google URL parameters tool
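
Whichever method you pick, it helps to first confirm which URLs actually count as "search result" pages. Here is a minimal Python sketch for auditing a list of URLs (say, from a 404 report or a crawl log). The path /listings/ and the search parameter come from the example above; the function name, the sample URLs, and the robots.txt rule in the constant are illustrative assumptions, not something I have rolled out on REW.

```python
from urllib.parse import urlsplit, parse_qs

# Taken from the example above: the listings tool lives at /listings/
# and form submissions add a "search" parameter to the query string.
SEARCH_PATH = "/listings/"
SEARCH_PARAM = "search"

# Assumption: a plain prefix rule like this is one way to express the
# block in robots.txt. It is shown for illustration only.
ROBOTS_TXT_RULE = "User-agent: *\nDisallow: /listings/?search\n"


def is_search_result_url(url: str) -> bool:
    """Return True if the URL looks like a listings search-result page."""
    parts = urlsplit(url)
    if parts.path != SEARCH_PATH:
        return False
    # keep_blank_values=True so a bare "?search" (no "=value") still counts.
    params = parse_qs(parts.query, keep_blank_values=True)
    return SEARCH_PARAM in params


if __name__ == "__main__":
    # Hypothetical sample URLs for the audit.
    sample = [
        "http://www.realestatewebmasters.com/listings/?search",
        "http://www.realestatewebmasters.com/listings/?search=1&city=x",
        "http://www.realestatewebmasters.com/listings/",
        "http://www.realestatewebmasters.com/blog/?search=1",
    ]
    for url in sample:
        label = "search result" if is_search_result_url(url) else "normal page"
        print(f"{label}: {url}")
```

Running the same check before and after a block goes in gives a rough count of how many of these URLs are still turning up.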

Will it improve the number of “intentional” pages in the index? Not sure (and it is kind of hard to test on REW because we have so many pages and so much authority), but it is definitely something to take a look at. Is anyone else monitoring these kinds of pages?