Re: "Plague" of Broken Links

by Jim Burgess <burgess@[EMAIL PROTECTED] > Dec 3, 2007 at 09:25 PM

Chris Babcock <cbabcock@[EMAIL PROTECTED]> writes:

>> >The good news is that I'm starting to see some instances of "Item
>> >already checked" flashing by. This would mean that I probably don't
>> >have to wait another 16 hours or more for final results and that
>> >there may not be 3,000 broken links by the time it's finished
>> >checking the site...
>> 
>> What you mean, I think, is that you're doublecounting some of the 
>> "repeated" broken links where the same link exists in many places and
>> of course each one is broken?  Some things that are important.... how
>> HUGE the site is, how many links there are!!!  And is 3% large or
>> small?  I think that's about what I would have thought it was.  I
>> think that's neither large nor small but about what such a huge site,
>> originally built over ten years ago, has as a legacy.

>There turned out to be a nasty bit of recursion in the site - items
>that are in the "DipPouch" folder on the site are physically located in
>the root folder... as is the link (in the filesystem sense) that
>redirects the traffic there. It's a clever thing to do in a couple
>ways. It was just inconvenient for this project. In the end, I had to
>'break' that link in order to successfully crawl the site with the
>spider. Otherwise I would have gotten ever deeper levels of URLs that
>look like:

>"diplom.org/DipPouch/DipPouch/DipPouch/DipPouch/DipPouch/DipPouch/..."

I see.  And yes, knowing how the directories are formed, I see why that 
happened.  This partly has to do with the fact that the Pouch is the site 
and the Szine.
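
For anyone else running a spider over a site with this kind of self-referencing link, here is a minimal sketch of a guard that skips URLs whose path repeats the same segment over and over. The function name `has_path_loop` and the `max_repeats` threshold are hypothetical, chosen only for illustration; this is not what Chris's tool does (he broke the filesystem link instead).

```python
from urllib.parse import urlsplit


def has_path_loop(url: str, max_repeats: int = 2) -> bool:
    """Return True if a path segment repeats consecutively more than max_repeats times.

    Hypothetical helper: a crawler could call this before queueing a URL
    so that .../DipPouch/DipPouch/DipPouch/... is never followed.
    """
    # Split the path into non-empty segments, e.g. ["DipPouch", "DipPouch", ...]
    segments = [s for s in urlsplit(url).path.split("/") if s]
    run = 1  # length of the current run of identical consecutive segments
    for prev, cur in zip(segments, segments[1:]):
        run = run + 1 if cur == prev else 1
        if run > max_repeats:
            return True
    return False


# Example: the looping URL is flagged, an ordinary one is not.
looping = "http://diplom.org/DipPouch/DipPouch/DipPouch/index.html"
normal = "http://diplom.org/DipPouch/index.html"
print(has_path_loop(looping), has_path_loop(normal))
```

A threshold of 2 tolerates one legitimate repeat; a stricter crawl could pass `max_repeats=1`.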

>The final result is that there are 37776 links to 5892 unique targets
>(including images). There are 4824 good links and 945 bad links; the
>number for 'bad links' unfortunately includes those links that needed
>to be temporarily disabled. I'll be contacting the maintainers
>individually with specifics on their sections as soon as I can generate
>reports. 

Ohh, now that's not so good.  That's more like 16%, which is getting high,
depending on how many of them are the "temporarily disabled" links.

I look forward to my report for the postal section; I know about some of 
the bad links and they just need to be deleted.
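
As a sanity check on the percentage, here is a minimal sketch that works it out from the counts Chris quoted (4824 good, 945 bad, 5892 unique targets). The function name `broken_rate` is hypothetical, and the counts still include the temporarily disabled links, so the real figure is presumably somewhat lower.

```python
def broken_rate(bad: int, total: int) -> float:
    """Return the share of bad links as a percentage of total."""
    return 100.0 * bad / total


good, bad, unique_targets = 4824, 945, 5892

# Rate among targets that were classified as good or bad.
rate_checked = broken_rate(bad, good + bad)
# Rate against all unique targets reported by the spider.
rate_unique = broken_rate(bad, unique_targets)

print(f"{rate_checked:.1f}% of classified targets are bad")  # 16.4%
print(f"{rate_unique:.1f}% of unique targets are bad")       # 16.0%
```

Note that good plus bad comes to 5769, not 5892; the post does not say how the remaining targets were classified.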

>So I've got a tool that can help find the broken links (with some human
>intervention), but the statistics are more obviously useless than is
>normally the case (and the recursion makes it difficult to assess the
>size of the site too).

>Chris 

I see.

Jim-Bob
 



