Many inherent legacy issues with WordPress open the door to bad actors. It’s an amazing CMS, but I wish the core team would focus on correcting issues that have plagued WordPress for over a decade. One of the most persistent is comment spam. It’s so bad that I’ve disabled comments altogether here on Martech Zone.
While comments are disabled and removed from view, that doesn’t stop bots from attempting to publish comment spam anyway. My pages receive a ton of visits with the replytocom query string parameter appended, each one an attempt to post spam. I also see these URLs throughout Google Search Console, which drives me crazy as I’m trying to correct real issues with the site.
replytocom
The replytocom parameter is commonly found in WordPress URLs when users reply to comments directly on a post. When someone replies to a comment, WordPress appends ?replytocom=[comment ID] to the URL, which allows the site to load the page with the reply tied to the relevant comment thread. The comment ID is just a sequential number, so it’s simple to build a bot that iterates through the IDs and tries to post spam to each one.
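To see how predictable these URLs are, here is a minimal Python sketch that builds reply URLs following the pattern described above. The post URL is a hypothetical placeholder, and reply_url is a hypothetical helper that only constructs strings; it doesn’t contact any site.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit

def reply_url(post_url: str, comment_id: int) -> str:
    """Append replytocom=<comment ID> to a post URL, keeping any existing query."""
    scheme, netloc, path, query, fragment = urlsplit(post_url)
    extra = urlencode({"replytocom": comment_id})
    query = f"{query}&{extra}" if query else extra
    return urlunsplit((scheme, netloc, path, query, fragment))

# Because comment IDs are sequential integers, every reply URL is trivially predictable.
post = "https://example.com/sample-post/"  # hypothetical post URL
for comment_id in range(101, 104):
    print(reply_url(post, comment_id))
```

A bot needs nothing more than a loop like this one to enumerate every possible reply URL on a post.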
WordPress could harden itself by using a dynamic, site-specific query string parameter name and hashing the comment ID… I’m not sure why they don’t. Granted, these URLs are likely marked noindex, but they’re still publicly available for abuse and identical across every WordPress installation, so they’re easily scripted and abused.
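Plain hashing alone wouldn’t be enough, since anyone can hash 1, 2, 3 just as easily as a bot can count. A keyed hash (HMAC) with a per-site secret would close that gap. The Python sketch below illustrates the idea under stated assumptions: SITE_SECRET, param_name, sign_comment_id, and verify_token are hypothetical names, and none of this exists in WordPress core.

```python
import hashlib
import hmac
import secrets
from typing import Optional

# Hypothetical per-site secret: generated once, stored privately, never shared.
SITE_SECRET = secrets.token_bytes(32)

def param_name(secret: bytes) -> str:
    """Derive a site-specific parameter name to use instead of the shared 'replytocom'."""
    return "r_" + hmac.new(secret, b"reply-param", hashlib.sha256).hexdigest()[:8]

def sign_comment_id(comment_id: int, secret: bytes) -> str:
    """Return '<id>.<tag>'; without the secret, a bot can't produce a valid tag."""
    tag = hmac.new(secret, str(comment_id).encode(), hashlib.sha256).hexdigest()[:16]
    return f"{comment_id}.{tag}"

def verify_token(token: str, secret: bytes) -> Optional[int]:
    """Return the comment ID if the token's tag is valid, otherwise None."""
    cid, _, _tag = token.partition(".")
    if not (cid.isascii() and cid.isdigit()):
        return None
    expected = sign_comment_id(int(cid), secret)
    # Constant-time comparison on bytes (safe even if the token has non-ASCII input).
    return int(cid) if hmac.compare_digest(expected.encode(), token.encode()) else None

# Usage: the reply link would carry ?<name>=<id>.<tag> instead of ?replytocom=<id>.
name = param_name(SITE_SECRET)
token = sign_comment_id(42, SITE_SECRET)
print(f"?{name}={token}")
```

With a different parameter name and an unforgeable tag on every site, a single script could no longer target every WordPress installation the same way.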
The replytocom parameter attracts spam bots, which exploit it to post comment spam. These URLs also end up in Google Search Console under the Excluded section as duplicate pages or URLs with query parameters.
This can create several issues:
- Index Bloat: Since search engines treat each replytocom URL as a unique page, these variations can significantly increase the number of indexed pages, making it harder for search engines to find and index the main content on your site.
- Duplicate Content: Search engines may view replytocom URLs as duplicate content because they are only variations of the same page. This can negatively affect SEO rankings.
- Spam Comment Generation: Bots can use these URLs to target comment sections for spam, resulting in an influx of spam comments that need to be filtered or deleted.
- Crawler Inefficiency: Search engine crawlers may waste resources crawling unnecessary replytocom URLs instead of focusing on more valuable pages.
.htaccess
The .htaccess file is a powerful configuration file used by Apache-based web servers to manage and control various aspects of a website. Typically located in the site’s root directory, this file allows site owners to handle tasks such as URL redirection, access control, and performance optimization.
By placing specific rules within .htaccess, you can modify the server’s behavior on a per-directory basis, making it an essential tool for improving website security, managing redirects, and, in this case, handling unwanted URL parameters that can impact SEO and attract spam bots.
To address these issues, you can use the .htaccess file to strip the replytocom parameter from URLs with a 301 redirect. Below is the .htaccess code, followed by a breakdown of how it works. I’ve placed this before the default WordPress .htaccess section (the block that starts with # BEGIN WordPress).
RewriteEngine On
# If replytocom is the only parameter, redirect to the URL without any query string
RewriteCond %{QUERY_STRING} ^replytocom=[^&]*$ [NC]
RewriteRule ^(.*)$ /$1? [R=301,L]
# If replytocom is at the beginning, with other parameters following
RewriteCond %{QUERY_STRING} ^replytocom=[^&]*&(.*)$ [NC]
RewriteRule ^(.*)$ /$1?%1 [R=301,L]
# If replytocom is at the end, preceded by other parameters
RewriteCond %{QUERY_STRING} ^(.*)&replytocom=[^&]*$ [NC]
RewriteRule ^(.*)$ /$1?%1 [R=301,L]
# If replytocom is in the middle, with parameters before and after
RewriteCond %{QUERY_STRING} ^(.*)&replytocom=[^&]*&(.*)$ [NC]
RewriteRule ^(.*)$ /$1?%1&%2 [R=301,L]
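If you want to check which rule fires for a given query string before deploying, the four conditions can be mirrored in Python. This is a hedged approximation, not Apache itself: it assumes the path already includes the leading slash, treats [NC] as re.IGNORECASE, and returns the first match as the [L] flag would. RULES and redirect_target are hypothetical names.

```python
import re
from typing import Optional

# Each entry mirrors one RewriteCond/RewriteRule pair: (pattern, new query builder).
RULES = [
    (re.compile(r"^replytocom=[^&]*$", re.I), lambda m: ""),
    (re.compile(r"^replytocom=[^&]*&(.*)$", re.I), lambda m: m.group(1)),
    (re.compile(r"^(.*)&replytocom=[^&]*$", re.I), lambda m: m.group(1)),
    (re.compile(r"^(.*)&replytocom=[^&]*&(.*)$", re.I), lambda m: f"{m.group(1)}&{m.group(2)}"),
]

def redirect_target(path: str, query: str) -> Optional[str]:
    """Return the 301 target the rules would produce, or None if no rule matches."""
    for pattern, build in RULES:
        m = pattern.match(query)
        if m:  # [L]: the first matching rule wins
            new_query = build(m)
            # An empty query means the trailing "?" drops the query string entirely.
            return f"{path}?{new_query}" if new_query else path
    return None

print(redirect_target("/sample-post/", "utm_source=x&replytocom=123&utm_medium=y"))
# → /sample-post/?utm_source=x&utm_medium=y
```

A query string that contains replytocom more than once is cleaned over successive redirects, one occurrence per request.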
How the Code Works
Each block in this code is designed to handle a specific position of the replytocom parameter. Every condition uses [NC] for a case-insensitive match, and every rule uses [R=301,L] to issue a permanent redirect and stop processing further rules:
- RewriteEngine On: This command enables URL rewriting, allowing the subsequent RewriteCond and RewriteRule directives to execute.
- First Condition: If replytocom is the only parameter in the URL, it matches ^replytocom=[^&]*$, which looks for replytocom alone without any other parameters. The rule then removes the entire query string (the trailing ? in the substitution discards it) and redirects with a 301 status code (permanent redirect).
- Second Condition: If replytocom is at the beginning of the query string and is followed by other parameters, it matches ^replytocom=[^&]*&(.*)$. The rewrite rule then removes replytocom but retains the other parameters, captured as %1.
- Third Condition: If replytocom is at the end of the query string and preceded by other parameters, it matches ^(.*)&replytocom=[^&]*$. This rule removes the replytocom parameter while keeping the preceding parameters (%1).
- Fourth Condition: If replytocom is sandwiched between other parameters, it matches ^(.*)&replytocom=[^&]*&(.*)$. This rule removes replytocom while preserving the parameters before (%1) and after (%2) it.
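The same cleanup can be done outside the web server, for example to de-duplicate a list of URLs exported from Google Search Console. This Python sketch uses only the standard library’s urllib.parse; strip_replytocom is a hypothetical helper name and the URLs are placeholders. Unlike the four pattern-based rules, it removes every replytocom occurrence in one pass, though re-encoding through urlencode may normalize how some characters are escaped.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def strip_replytocom(url: str) -> str:
    """Drop every replytocom parameter (case-insensitive), keeping the rest in order."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() != "replytocom"
    ]
    # urlunsplit omits the "?" entirely when the remaining query is empty.
    return urlunsplit(parts._replace(query=urlencode(kept)))

# Hypothetical URLs, e.g. copied from a Search Console export.
urls = [
    "https://example.com/post/?replytocom=5",
    "https://example.com/post/?a=1&replytocom=5&b=2",
]
for url in urls:
    print(strip_replytocom(url))
```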
This .htaccess configuration ensures that search engines and visitors are redirected to a clean version of the URL without the replytocom parameter. It reduces index bloat, mitigates duplicate content issues, and helps improve overall SEO performance.
Does this Eliminate Comment Spam?
Using .htaccess to remove the replytocom parameter can help reduce certain types of comment spam, but it won’t eliminate it. This method specifically targets spam bots that exploit replytocom URLs to flood comment sections. By stripping out this parameter, you reduce the number of entry points for bots that rely on this URL structure. However, determined bots may still find other ways to abuse the comment form, such as submitting directly to WordPress’s comment handler (wp-comments-post.php), which this redirect doesn’t affect.
To further combat comment spam, consider implementing these additional strategies:
- Enable CAPTCHA: Adding a CAPTCHA to your comment form can block automated bots.
- Use Anti-Spam Plugins: WordPress plugins like CleanTalk are highly effective at filtering out spam.
- Require Login for Comments: Restrict commenting to logged-in users only, which can reduce spam by limiting access to registered users.
- Moderate Comments: Enable manual moderation or set up keyword-based filters to catch suspicious comments before they go live.
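As an illustration of the keyword-based filtering mentioned above, here is a minimal Python sketch that holds a comment if it contains a blocked keyword or too many links. It is loosely modeled on the kind of rules WordPress exposes under Settings > Discussion, but the function name, keyword list, and threshold are hypothetical, and this is not WordPress’s implementation.

```python
import re
from typing import Iterable

# Counts http:// and https:// occurrences as links (a deliberately simple heuristic).
LINK_RE = re.compile(r"https?://", re.IGNORECASE)

def needs_moderation(text: str, keywords: Iterable[str], link_threshold: int = 2) -> bool:
    """Return True if the comment should be held for manual review."""
    lowered = text.lower()
    # Substring match, so a keyword also matches inside longer words.
    if any(word.lower() in lowered for word in keywords if word):
        return True
    return len(LINK_RE.findall(text)) >= link_threshold

blocked = ["casino", "payday loan"]  # hypothetical keyword list
print(needs_moderation("Great article, thanks!", blocked))  # False
print(needs_moderation("Cheap CASINO bonus here", blocked))  # True
```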
While .htaccess is a great first line of defense, combining it with other anti-spam measures is the most effective way to protect your WordPress site from spam comments.