It’s not an easy process.
Domain Name Wire is 15 years old and has over 12,000 posts.
With so many posts, and so many of them years old, it makes sense that a lot of links included in posts are no longer valid. Services I’ve written about have gone defunct, news sources I’ve linked to have closed their doors, and third-party files have been removed.
I was reading an SEO guide the other day that recommended cleaning up non-working links. The idea certainly makes sense from a user perspective, if nothing else. And no site owner wants a link that once pointed to a useful website to now resolve to one filled with “adult” photos.
But where do you start when you have tens of thousands of outbound links?
I came across a service designed to look for bad outbound links. Dr. Link Check scans your site for links that fit into three main categories:
- Broken links, e.g., 404 errors
- Links to malware/blacklisted sites
- Links to parked pages
The monthly service charges varying fees depending on the number of links on your site, the types of “bad” links it identifies, and how often it crawls for link changes.
It’s difficult to know how many links your site has before you get started. I optimistically paid for 20,000 links for $21.00 per month and began the scan.
20,000 links weren’t enough. Unfortunately, there’s no way to know how many links you have until they are all crawled. Dr. Link Check also doesn’t let you add links to your plan and continue where you left off; you have to scan everything again. (The company provided some helpful link scanning rules I can apply to keep the service from checking internal links.)
Of those 20,000 links, Dr. Link Check identified 481 problematic links. 379 of these were labeled “broken,” although a subset were links that the service couldn’t scan because the linked pages block bots. Of the remainder, most were 404 pages.
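To make the “broken” versus “blocked” distinction concrete, here is a minimal Python sketch of how a link might be triaged by its HTTP status. It is not how Dr. Link Check works internally; the function names, the `link-audit-sketch` user agent, and the choice to treat 401, 403, and 429 as “blocked” are all assumptions made for illustration.

```python
import urllib.error
import urllib.request


def classify_status(status):
    """Map an HTTP status code (or None) to a rough triage label."""
    if status is None:
        return "unreachable"  # DNS failure, timeout, refused connection
    if status in (401, 403, 429):
        return "blocked"      # assumption: the site may simply be refusing bots
    if 400 <= status < 600:
        return "broken"       # 404, 410, 500 and similar
    return "ok"


def fetch_status(url, timeout=10):
    """Return the final HTTP status for a URL, or None if it can't be reached."""
    request = urllib.request.Request(
        url,
        method="HEAD",
        headers={"User-Agent": "link-audit-sketch"},  # hypothetical user agent
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code       # the server answered, but with an error status
    except (OSError, ValueError):
        return None           # network failure or malformed URL
```

Calling `classify_status(fetch_status(url))` on each outbound link gives a first pass. Anything labeled “blocked” still deserves a manual look in a browser, since the page may be perfectly fine for human visitors.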
105 of the links led to parked domains. For most websites, this means the site linked to no longer exists. In the unique case of a blog about domain names, I’d say about 1/3 of the parked page links originally linked to active sites. The others were examples of parked pages that I purposefully linked to.
I learned a lot by looking at the errors. Many of them were for defunct services. A lot were for sites that I linked to liberally, such as Domain Name News, that no longer exist. Links to online auctions almost always break over time. Many of the broken links are in the comments.
Some of the observations will help me when linking in the future. For example, it seems that quarterly earnings releases on Yahoo Finance break after some time, so I’ll no longer link to earnings releases there.
With this context in mind, the next challenge was figuring out what to do with the broken links.
Given the number, I decided not to worry about links in the comments. WordPress sets these to “nofollow” by default. Google isn’t following them, and I assume most users who click on them understand that it’s user-generated content. Dr. Link Check allows you to filter out nofollow links.
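If you wanted to do the same kind of filtering yourself, the idea can be sketched in a few lines of Python using only the standard library. This is an illustration, not Dr. Link Check’s method: the class and function names are made up, and it assumes your own site is identified by one exact hostname (so `www.` variants would need extra handling).

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class OutboundLinkCollector(HTMLParser):
    """Collect links that leave the site and are not marked nofollow."""

    def __init__(self, own_host):
        super().__init__()
        self.own_host = own_host.lower()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href") or ""
        rel_values = (attr.get("rel") or "").lower().split()
        host = (urlparse(href).hostname or "").lower()
        if not host or host == self.own_host:
            return  # relative or internal link
        if "nofollow" in rel_values:
            return  # e.g. links in comments
        self.links.append(href)


def outbound_followed_links(html, own_host):
    """Return external, followed link targets found in an HTML fragment."""
    parser = OutboundLinkCollector(own_host)
    parser.feed(html)
    return parser.links
```

Feeding each post’s HTML through `outbound_followed_links` leaves only the links you chose to include yourself, which is a much shorter list to review than everything on the page.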
Other links are easy to clean: there’s no need to keep a broken link to an online auction from 2007. These can be safely removed.
But what about links that added context? Often, I’d quote another site’s article or a forum post, and that page no longer exists. How can you explain that this content existed and no longer does? Does it make sense to remove these links, or should you keep them in the text so that people understand that you linked to another site for more context?
For now, I added a note in some posts that states that I removed a link because it’s no longer active. From now on, I think I’ll change this so that invalid links point to a page on Domain Name Wire that explains that a link was removed because it no longer works.
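Pointing dead links at an explanation page could be scripted rather than done by hand. The sketch below is one hypothetical way to do it in Python: the function name and the `/link-removed/` path are invented for the example, and the simple pattern only handles `href` values in double quotes, so it is a starting point rather than a production tool.

```python
import re

# Matches the opening of an <a> tag up to and including a double-quoted href.
HREF_PATTERN = re.compile(r'(<a\b[^>]*?\bhref=")([^"]*)(")')


def redirect_dead_links(html, dead_urls, notice_url):
    """Rewrite links whose target is in dead_urls to point at notice_url."""
    dead = set(dead_urls)

    def swap(match):
        if match.group(2) in dead:
            return f"{match.group(1)}{notice_url}{match.group(3)}"
        return match.group(0)  # leave working links untouched

    return HREF_PATTERN.sub(swap, html)
```

For example, `redirect_dead_links(post_html, ["http://dead.example/auction"], "/link-removed/")` keeps the original anchor text in place, so readers can still see that a source was cited, while the click lands on the explanation instead of an error page.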
The other issue I ran into is the sheer volume of bad links and the time it takes to correct them. Once you get on top of the backlog, though, it should be easier to manage going forward. A service like Dr. Link Check surfaces newly broken links, which should be manageable on a monthly basis.
Getting on top of invalid links will take time, but I think it’s a worthwhile practice for a 15-year-old blog.