Moz Q&A is closed.
After more than 13 years and tens of thousands of questions, Moz Q&A closed on 12th December 2024. While we're not completely removing the content (many posts will remain viewable), we have locked both new posts and new replies. More details here.
Crawlers crawl weird long urls
-
I started a crawl for the first time and got many errors. The strange part is that the crawler keeps finding long, duplicated URLs that don't actually exist.
For example (to be clear):
there is a page: www.website.com/dogs/dog.html
but then it is continuing crawling:
www.website.com/dogs/dog.html
www.website.com/dogs/dogs/dog.html
www.website.com/dogs/dogs/dogs/dog.html
www.website.com/dogs/dogs/dogs/dogs/dog.html
www.website.com/dogs/dogs/dogs/dogs/dogs/dog.html
What can I do about this? Screaming Frog gave me the same issue, so I know it's something on my website.
-
Answer from Screaming Frog!
The reason the SEO Spider is crawling these URLs is incorrect relative linking on the site, starting from the login URL.
It happens when the spider crawls the login page, http://www.website.com/login?returnurl=%2F, which leads to http://www.website.com/Home/ctl/SendPassword?returnurl=http:/www.website.com/, and then to this /Home/ subdirectory URL, http://www.website.com/Home/ctl/page/dogs.aspx, which links to http://www.website.com/Home/ctl/page/page/dogs.aspx, and so on. This is the path of the incorrect relative linking (attached for you). To stop this, you can correct the incorrect relative linking or, more easily, simply exclude the login page from the crawl.
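The nesting described above is just standard relative-URL resolution at work: a link without a leading slash is resolved against the *directory* of the page it appears on, so each hop adds another path segment. A minimal sketch using Python's `urllib.parse.urljoin` (the `www.website.com` URLs are the example placeholders from the question, not real pages):

```python
from urllib.parse import urljoin

# A relative href (no leading slash) is resolved against the directory
# of the page it appears on, so each hop adds another path segment.
page = "http://www.website.com/dogs/dog.html"
relative_href = "dogs/dog.html"    # incorrect relative link
root_relative = "/dogs/dog.html"   # root-relative link (stable)

print(urljoin(page, relative_href))  # http://www.website.com/dogs/dogs/dog.html
print(urljoin(page, root_relative))  # http://www.website.com/dogs/dog.html

# Repeating the resolution shows the runaway nesting a crawler sees:
url = page
for _ in range(3):
    url = urljoin(url, relative_href)
print(url)  # http://www.website.com/dogs/dogs/dogs/dogs/dog.html
```

This is why making the link root-relative (or absolute) fixes the loop: the resolved URL no longer depends on how deep the crawler already is.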
-
Wow, big mistakes were made on /Home.
Maybe because of the .aspx extension? All pages have SEO-friendly URLs.
Thanks Wesley and Paddy Displays!
-
I see a link to http://www.odin-groep.nl/Home/ctl/OverOdin/OverOdin/HeutinkICT.aspx from http://www.odin-groep.nl/Home/ctl/OverOdin/ReindersICT.aspx.
It's the bottom-left block that creates this link, and that's what produces the big nesting effect.
-
OK found one problem
on this page
http://www.odin-groep.nl/Home/ctl/OverOdin/ReindersICT.aspx
you have a link to
http://www.odin-groep.nl/Home/ctl/OverOdin/OverOdin/LesscherIT.aspx
which I think should be
-
OK, I did a quick Screaming Frog crawl and I think I have an idea: you just have to follow the breadcrumbs.
You said in your example "In Links 9". You need to find out what those pages are and follow them back to the point of origin, as I think it's just one bad link that causes this nested-link effect.
eg
http://www.odin-groep.nl/Home/ctl/OverOdin/OverOdin/OverOdin/OverOdin/HeutinkICT.aspx
is being linked from
http://www.odin-groep.nl/Home/ctl/OverOdin/OverOdin/OverOdin/StationtoStation.aspx (as well as others)
You just have to follow that trail till you find the source of the problem
-
every link, except the homepage itself
-
I can't see any source:
The pages are like:
| URL | www.website.com/page/ |
| Status Code | 200 |
| Status | OK |
| Type | text/html; charset=utf-8 |
| Size | 55811 |
| Title | |
| Level | 10 |
| In Links | 9 |
| Out Links | 38 |
-
Which URL(s) is/are causing problems?
-
Please feel free to check: http://tinyurl.com/lox7le9
-
You don't necessarily have to remove the link, as long as you can verify that it points to the right page.
But I'm curious to see what caused the problem.
-
I think Screaming Frog will tell you the page on which it found the weird URL; then you can check the source and find out what's producing that link.
-
That is a good one! It's true that I have links from the page to itself. I will remove all links of that kind first and crawl again. I'll keep you posted!
-
Are you somehow linking to www.website.com/dogs/dog.html from the page itself? There could be something wrong with that link.
I made a small mistake not so long ago with a redirection plugin. I told it to redirect to domain.com, but the plugin appended the target to the base URL, so it ended up at domain.com/domain.com. Perhaps you made a similar mistake. Maybe you can send me the URL and I can take a look at it?
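The plugin mistake above is the same resolution rule in a different disguise: a redirect target without a scheme ("domain.com" instead of "http://domain.com") is just a relative path, so it gets appended to the current site. A quick sketch of the mechanism (domain.com is the placeholder from the anecdote; the plugin's exact internals are assumed here):

```python
from urllib.parse import urljoin

# Without a scheme, "domain.com" is a relative path reference, so it
# resolves against the current site and the host name doubles up.
base = "http://domain.com/"
print(urljoin(base, "domain.com"))          # http://domain.com/domain.com
print(urljoin(base, "http://domain.com/"))  # http://domain.com/
```

Giving the plugin a fully qualified URL, scheme included, avoids the doubling.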