Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Can you use Screaming Frog to find all instances of relative or absolute linking?
-
My client wants to pull every instance of an absolute URL on their site so that they can update them for an upcoming migration to HTTPS (the majority of the site uses relative linking). Is there a way to use the extraction tool in Screaming Frog to crawl one page at a time and extract every occurrence of _href="http://" _?
I have gone back and forth between using an x-path extractor as well as a regex and have had no luck with either.
Ex. X-path: //*[starts-with(@href, “http://”)][1]
Ex. Regex: href=\”//
-
This only works if you have downloaded all the HTML files to your local computer. That said, it works quite well! I am betting this is a database driven site and so would not work in the same way.
-
Regex: href=("|'|)http:(?:/{1,3}|[a-z0-9%])|[a-z0-9.-]+.
This allows for your link to have the " or ' or nothing between the = and the http If you have any other TLDs you can just keep expanding on the |
I modified this from a posting in github https://gist.github.com/gruber/8891611
You can play with tools like http://regexpal.com/ to test your regexp against example text
I assumed you would want the full URL and that was the issue you were running into.
As another solution why not just fix the https in the main navigation etc, then once you get the staging/testing site setup, run ScreamingFrog on that site and find all the 301 redirects or 404s and then use that report to find all the URLs to fix.
I would also ping ScreamingFrog - this is not the first time they have been asked this question. They may have a better regexp and/or solution vs what I have suggested.
-
Depending on how you've coded everything you could try to setup a Custom Search under Configuration. This will scan the HTML of the page so if the coding was consistent you could put something like href="http://www.yourdomain.com" as the string it's looking for and in the Custom tab on the resulting pages it'll show you all the ones that match the string.
That's the only way I can think of to get Screaming Frog to pull it but looking forward to anyone else's thoughts.
-
If you have access to all the website's files, you could try finding all instances in the directory using something like Notepad++. Could even use find and replace.
This is how I tend to locate those one-liners among hundreds of files.
Good luck!
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Unsolved Question about a Screaming Frog crawling issue
Hello, I have a very peculiar question about an issue I'm having when working on a website. It's a WordPress site and I'm using a generic plug in for title and meta updates. When I go to crawl the site through screaming frog, however, there seems to be a hard coded title tag that I can't find anywhere and the plug in updates don't get crawled. If anyone has any suggestions, thatd be great. Thanks!
Technical SEO | | KyleSennikoff0 -
Can OG titles be used as a substitute for Meta titles
We use og (open graph) titles in lieu of meta titles. Is there any downside to using just one. Should we be using both og and meta titles on our page. Appreciate any insight. Himanshu
Technical SEO | | patilhimanshu0 -
Updating inbound links vs. 301 redirecting the page they link to
Hi everyone, I'm preparing myself for a website redesign and finding conflicting information about inbound links and 301 redirects. If I have a URL (we'll say website.com/website) that is linked to by outside sources, should I get those outside sources to update their links when I change the URL to website.com/webpage? Or is it just as effective from a link juice perspective to simply 301 redirect the old page to the new page? Are there any other implications to this choice that I may want to consider? Thanks!
Technical SEO | | Liggins0 -
Links from Instructables.com?
This is a silly newbie question. But will posting on www.instructables.com with some valuable content and url link back to my site help with "linking"? Or do they put a no-follow on all links on their site? Thanks for answering! Ron
Technical SEO | | yatesandcojewelers0 -
Diagnosing Canonical Errors Is Screaming frog reliable?
Morning from suny & warm wetherby UK 🙂 On this page http://www.goldsboroughestates.co.uk/how-we-care-for-you/right-to-manage/ screaming frog is citing a canonical error but I'm confused as this piece of code is in place: http://www.goldsboroughestates.co.uk/About/right-to-manage" /> So my question is please - "Does this page http://www.goldsboroughestates.co.uk/how-we-care-for-you/right-to-manage/ have a caninical error or is screaming frog useless? Other examples where screaming frog is picking up canonical errors include:
Technical SEO | | Nightwing
http://www.goldsboroughestates.co.uk/what-our-customers-say/right-to-manage/
http://www.goldsboroughestates.co.uk/buying-a-home/right-to-manage/ Oh forgot to say the preffered version is http://www.goldsboroughestates.co.uk/About/right-to-manage/ Any insights welcvome 🙂0 -
Effective use of hReview
Hi fellow Mozzers! I am just in the process of adding various reviews to our site (a design agency), but I wanted to use the ratings in different ways depending on the page. So for the home page and the services (branding, POS, direct mail etc) I wanted to aggregate relevant reviews (giving us an average of all reviews for the home page, an average of ratings from all brand projects and so on). Then, I wanted to put specific reviews on our portfolio pages, so the review relates specifically to that project. This is the easiest to do as the hReview generator is geared up for reviews that come from one source, but I can't find a way of aggregating the star ratings to make an average rating rich snippet. Anyone know where I can get the coding for this? Thanks in advance! Nick.
Technical SEO | | themegroup0 -
Can you mark up a page using Schema.org and Facebook Open Graph?
Is it possible to use both Schema.org and Facebook Open Graph for structured data markup? On the Google Webmaster Central blog, they say, "you should avoid mixing the formats together on the same web page, as this can confuse our parsers." Source - http://googlewebmastercentral.blogspot.com/2011/06/introducing-schemaorg-search-engines.html
Technical SEO | | SAMarketing1 -
Is link cloaking bad?
I have a couple of affiliate gaming sites and have been cloaking the links, the reason I do this is to stop have so many external links on my sites. In the robot.txt I tell the bots not to index my cloaked links. Is this bad, or doesnt it really matter? Thanks for your help.
Technical SEO | | jwdesign0