Re: "Plague" of Broken Links

by Chris Babcock <cbabcock@[EMAIL PROTECTED] > Dec 2, 2007 at 06:18 PM

> >The good news is that I'm starting to see some instances of "Item
> >already checked" flashing by. This would mean that I probably don't
> >have to wait another 16 hours or more for final results and that
> >there may not be 3,000 broken links by the time it's finished
> >checking the site...
> 
> What you mean, I think, is that you're doublecounting some of the 
> "repeated" broken links where the same link exists in many places and
> of course each one is broken?  Some things that are important.... how
> HUGE the site is, how many links there are!!!  And is 3% large or
> small?  I think that's about what I would have thought it was.  I
> think that's neither large nor small but about what such a huge site,
> originally built over ten years ago, has as a legacy.

There turned out to be a nasty bit of recursion in the site - items
that are in the "DipPouch" folder on the site are physically located in
the root folder... as is the link (in the filesystem sense) that
redirects the traffic there. It's a clever thing to do in a couple of
ways. It was just inconvenient for this project. In the end, I had to
'break' that link in order to successfully crawl the site with the
spider. Otherwise I would have gotten ever deeper levels of URLs that
look like:

"diplom.org/DipPouch/DipPouch/DipPouch/DipPouch/DipPouch/DipPouch/..."
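An alternative to breaking the link on the server would be to have the
spider refuse URLs whose path repeats the same folder back to back. What
follows is only a rough sketch of that idea in Python, not the tool used
for this crawl; the function name and the max_repeats parameter are made
up for illustration.

```python
from urllib.parse import urlsplit


def has_repeated_segment(url: str, max_repeats: int = 1) -> bool:
    """Return True if any path segment occurs more than max_repeats
    times in a row, e.g. .../DipPouch/DipPouch/...

    Hypothetical helper: a spider could call this before queueing a URL.
    """
    # Split the path into its non-empty segments.
    segments = [s for s in urlsplit(url).path.split("/") if s]
    run = 1
    for prev, cur in zip(segments, segments[1:]):
        # Count how long the current run of identical segments is.
        run = run + 1 if cur == prev else 1
        if run > max_repeats:
            return True
    return False


if __name__ == "__main__":
    looping = "http://diplom.org/DipPouch/DipPouch/DipPouch/index.html"
    normal = "http://diplom.org/DipPouch/index.html"
    print(has_repeated_segment(looping))
    print(has_repeated_segment(normal))
```

This only catches a folder nested directly inside itself, which is the
pattern shown above. A loop that passes through several different folders
would need something stronger, such as a depth limit.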

The final result is that there are 37776 links to 5892 unique targets
(including images). There are 4824 good links and 945 bad links; the
number for 'bad links' unfortunately includes those links that needed
to be temporarily disabled. I'll be contacting the maintainers
individually with specifics on their sections as soon as I can generate
reports.
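
For the per-section reports, one possible approach is to group the bad
links by the top-level folder of the page they appear on. The sketch
below assumes the checker's output can be reduced to (source page,
target, ok) tuples; that format, the function name and the sample URLs
are all assumptions for illustration, not what the actual tool produces.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def broken_by_section(results):
    """Group broken links by the top-level folder of the source page.

    results: iterable of (source_url, target_url, ok) tuples
    (a hypothetical format, assumed for this sketch).
    Returns {section: sorted list of (source_url, target_url)}.
    """
    report = defaultdict(set)
    for source, target, ok in results:
        if ok:
            continue  # only bad links go in the report
        segments = [s for s in urlsplit(source).path.split("/") if s]
        # Pages directly in the site root have no folder of their own.
        section = segments[0] if len(segments) > 1 else "(root)"
        # A set drops the same source/target pair if it was checked twice.
        report[section].add((source, target))
    return {name: sorted(pairs) for name, pairs in report.items()}


if __name__ == "__main__":
    # Made-up sample data, not results from the real crawl.
    sample = [
        ("http://example.org/Zine/a.html", "http://example.org/gone.gif", False),
        ("http://example.org/Zine/a.html", "http://example.org/gone.gif", False),
        ("http://example.org/index.html", "http://example.org/old.html", False),
        ("http://example.org/Zine/b.html", "http://example.org/ok.html", True),
    ]
    for section, links in broken_by_section(sample).items():
        print(section, len(links))
```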

So I've got a tool that can help find the broken links (with some human
intervention), but the statistics are more obviously useless than is
normally the case (and the recursion makes it difficult to assess the
size of the site too).

Chris
 




