How to disappear from the Google, Yahoo, MSN, etc. indexes with CommunityServer 2.0 in less than a week

  • Hi guys,

    Many months ago, I found a BIG bug in ASP.NET 2.
    The big surprise for me tonight was to see that Community Server 2.0 has this bug too (and no one has seen it or talked about it; I searched this forum before posting).

    A small example is better than a long text (and sorry, I'm French, so it's not easy for me to write in English).

    Install Fiddler (http://www.fiddlertool.com/)
    Build this request in the Request Builder:

    Accept: */*
    Accept-Language: fr
    Accept-Encoding: gzip, deflate
    User-Agent: Mozilla/5.0+(compatible;+Yahoo!+Slurp;+http://help.yahoo.com/help/us/ysearch/slurp)

    Put http://communityserver.org/blogs/dailynews/archive/2006/06/28/536312.aspx in the URL field and press the "Execute !" button.

    Look at the HTTP sessions list view on the left side.
    You will see Result: 302 instead of 200!
    Double-click the result line and go to the TextView tab. You will see this:

    <html><head><title>Object moved</title></head><body>
    <h2>Object moved to <a href="/error.htm?aspxerrorpath=/blogs/post.aspx">here</a>.</h2>
    </body></html>

    This is EXACTLY what Yahoo Slurp, Googlebot, MSNBot, etc. see (this is the case for MANY bots).

    (Try it with:

    User-Agent: Mozilla/5.0+(compatible;+Googlebot/2.1;++http://www.google.com/bot.html)

    It's the same problem.)

    Months ago I had exactly the same problem on my site. The problem comes from the URL rewriting (sorry, it's difficult for me to explain the root of the problem in English).
    I sent feedback to Scott Guthrie's team, but I have had no news for many months.

    I didn't test this on my blogs when I migrated to CS 2.0 because I was sure that Telligent would have done this kind of test... This is not the case, and now I understand why my blogs lost their ranking in Google & co. (I'm "a little bit" angry about this.)

    I fixed the problem on my blogs tonight. If the Telligent guys are interested in a "dirty hack" that took me a week to find, contact me.

    I tested a lot of sites using CommunityServer, like TheSpoke.net, and every time I get a 302...
    Try this URL for example: http://thespoke.net/blogs/laurelle/archive/2006/06/07/960872.aspx => 302!

    Nicolas SOREL (Nix)
    MVP .NET

    Sorry for my English, I'm French. I hope my text is readable.

    P.S.: My solution really is a "dirty hack", which is why I'm not posting it here for the moment. I would prefer that Microsoft release a hotfix rather than explain a bad way to fix the problem.
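
The Fiddler check described in the post above can also be scripted. The following is only a minimal sketch using Python's standard `http.client`, which does not follow redirects, so a 302 stays visible. The function names `fetch_status` and `looks_like_error_redirect` are hypothetical helpers introduced here; the User-Agent strings are copied as written in the post.

```python
import http.client
from urllib.parse import urlsplit

# User-Agent values exactly as quoted in the post above.
SLURP_UA = "Mozilla/5.0+(compatible;+Yahoo!+Slurp;+http://help.yahoo.com/help/us/ysearch/slurp)"
GOOGLEBOT_UA = "Mozilla/5.0+(compatible;+Googlebot/2.1;++http://www.google.com/bot.html)"


def fetch_status(url, user_agent, timeout=10):
    """Send one GET with the given User-Agent and return (status, Location header).

    http.client never follows redirects, so a 302 is reported as a 302.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        headers = {
            "Accept": "*/*",
            "Accept-Language": "fr",
            "User-Agent": user_agent,
        }
        conn.request("GET", path, headers=headers)
        response = conn.getresponse()
        response.read()  # drain the body so the connection closes cleanly
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def looks_like_error_redirect(status, location):
    """True when the response matches the symptom in the post:
    a 302 whose target carries the aspxerrorpath query parameter."""
    return status == 302 and location is not None and "aspxerrorpath=" in location
```

Calling `fetch_status(url, SLURP_UA)` and then the same URL with an ordinary browser User-Agent, and comparing the two results, mirrors the manual test; whether a given site still shows the difference is not something this sketch can promise.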

  • If this behavior is confirmed, it's certainly a huge problem.

    I suppose this problem only concerns people using CS with ASP.NET 2.0?

    I'll track this post.

  • The bug is in ASP.NET 2.
    I didn't try it on ASP.NET 1.x, but this rewriting bug is not in ASP.NET 1.x, so if you use ASP.NET 1.x you are OK.

  • I'm very surprised that nobody has commented on this post.
    Maybe my English is worse than I think.

    I'll try to explain the situation more clearly:

    If you use CommunityServer 2.0 on ASP.NET 2.0, YOU CAN'T be indexed in Google, Yahoo, MSN, etc.

    Doesn't anybody care about this?
  • I hope the Telligent guys fix it by C.S. 2.1, because I'm considering buying a license and migrating my community.
    I have been looking around this community for days, and if what NicolasS said is true, that's a really big issue.

    You folks at Telligent do a fantastic job here... but is there any feedback about this?

    Sorry about my English, and thanks in advance!

  • Not sure if you would call this "a dirty hack" but if you set this in the web.config, the issue is resolved:

    <forms name=".CommunityServer" cookieless="UseCookies" protection="All" timeout="60" loginUrl="login.aspx" slidingExpiration="true" />

    The difference is the cookieless attribute.

    This overrides the default behavior. The default behavior doesn't seem to work correctly for the search engine bots.
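
To check which mode a given web.config ends up with, something like the following sketch may help. It assumes the standard ASP.NET layout where `<forms>` sits under `<authentication>`, and it assumes the ASP.NET 2.0 default of `UseDeviceProfile` when the attribute is missing (that default picks cookies or cookieless based on the browser profile matched from the User-Agent). The function name `forms_cookieless_mode` is a hypothetical helper, not part of any library.

```python
import xml.etree.ElementTree as ET

# Assumed ASP.NET 2.0 default when <forms> has no cookieless attribute.
ASPNET_DEFAULT_COOKIELESS = "UseDeviceProfile"


def _local_name(tag):
    """Strip an XML namespace prefix such as '{uri}forms' down to 'forms'."""
    return tag.rsplit("}", 1)[-1]


def forms_cookieless_mode(web_config_xml):
    """Return the effective cookieless mode of the first <forms> element,
    or None when the config has no <forms> element at all."""
    root = ET.fromstring(web_config_xml)
    for element in root.iter():
        if _local_name(element.tag) == "forms":
            return element.get("cookieless", ASPNET_DEFAULT_COOKIELESS)
    return None


# Example input: the <forms> line from the post, wrapped in a minimal config.
SAMPLE_CONFIG = """
<configuration>
  <system.web>
    <authentication mode="Forms">
      <forms name=".CommunityServer" cookieless="UseCookies" protection="All"
             timeout="60" loginUrl="login.aspx" slidingExpiration="true" />
    </authentication>
  </system.web>
</configuration>
"""

print(forms_cookieless_mode(SAMPLE_CONFIG))
```

A result other than `UseCookies` would suggest the site still relies on per-browser detection for the forms authentication cookie.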

     

  • That could stop some 500 errors, but I'm not sure it will resolve the real bug, which is in .NET 2.
    The problem comes from the way .NET handles browsers, and forcing the use of cookies can resolve the problem in some cases.
    This trick resolves part of the CS 2.0 problem.

  • Well, it does resolve the problem of CS 2.0 sites not being indexed by search-engines. As far as the underlying .NET 2.0 issue, I haven't looked into that.
  • Thanks for taking the time to post this, NicolasS. I will be tracking this thread as well.

    By the way, I'm from the UK and your English is better than mine.

    Good luck in the Coupe Du Monde today

  • JoeWork:

    Thanks for taking the time to post this, NicolasS. I will be tracking this thread as well.

    By the way, I'm from the UK and your English is better than mine.

    Good luck in the Coupe Du Monde today

    I'll try to find time next week to blog (in French) a full explanation of this .NET 2 bug, and a friend of mine should translate/explain it in English on his blog.
    This bug doesn't only affect CS 2.0; it could affect many .NET 2 sites that use rewriting. And if no one has seen the problem in CS 2.0 for many months, we can imagine that many, many, many sites have the ".NET 2 Bot Crash".
    (I discovered this bug and sent feedback about it in February... I think it's time now that everyone knows how to prevent it.)

    I'm happy to read that my English is not so bad.

  • NicolasS,

    I did confirm the user agents you specified below return 302 status codes (and I sent you an email about your fix, I can be reached at scottw -----  telligent.com).

    However, I did notice that the dailynews blog has been updated recently in the google index. See Here

    -Scott

  • ScottW:

    NicolasS,

    I did confirm the user agents you specified below return 302 status codes (and I sent you an email about your fix, I can be reached at scottw -----  telligent.com).

    However, I did notice that the dailynews blog has been updated recently in the google index. See Here

    -Scott

    Yes, it's possible IF Googlebot used its "old" signature ( Googlebot/2.1 (+http://www.google.com/bot.html) ), which doesn't crash.
    But Google uses this signature less and less; it now uses this signature, which does crash:
    Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

    Nicolas

  • Windows Live Search seems to have indexed my site correctly, with no 302 errors. The last time it looked at it was the 28th, and my site runs CS in ASP.NET 2.0 mode.

    http://www.live.com/#q=site%3Anbdev.co.uk&offset=1
    http://search.msn.co.uk/results.aspx?q=site%3Anbdev.co.uk&FORM=SMCRT

    Google's a little slower, and has only a cache from 22nd

    http://www.google.co.uk/search?q=site%3Anbdev.co.uk&start=0&ie=utf-8&oe=utf-8

    And yahoo is the 26th

    http://uk.search.yahoo.com/search?p=site%3Anbdev.co.uk&ie=UTF-8 

  • Nick: For MSNBot it depends on the user agent used to index your site, and it seems OK for live.com.
    BUT for Google it's not; it indexed only the main pages.
    I tried: http://www.google.co.uk/search?hl=fr&q=site%3Anbdev.co.uk+CSMVP+server
    Your post called "CSMVP Server" is not indexed, and it should be.

    Same for Yahoo: http://uk.search.yahoo.com/search?p=site%3Anbdev.co.uk+CSMVP+server&ei=UTF-8&x=wrt&meta=vc%3D
    Crawlers that crash can only get the "main pages", not the post pages with your main content :p

    The indexing problem comes from the user agent used to crawl your page; for example, Google sometimes uses an old signature that doesn't crash the page.

    The strange thing is that I built a project for a post I'm preparing to explain this bug. Months ago, the user agent MSNBot/X.X crashed on my production site, but in my test project I can no longer reproduce the crash for this signature. That changes nothing, though: it crashes for a lot of user agent signatures.
    I'll post a zip with a test project soon.

  • I wrote a (big) post on my blog to explain the bug.
    Sorry, for the moment this post is in French, but Sander_G should translate it into English for those who can't read French.

    http://blogs.developpeur.org/nix/archive/2006/07/01/DOTNET_2_GoogleBot_Crash_Bug_Pas_seulement_googlebot.aspx

    I hope this helps the ASP.NET 2 developer community.

    If a Microsoft employee reads this post, please hear us and release a patch.