#vss365 today - Handling spam
We are all familiar with spam emails. It's that email you get every day about some topic you neither care about nor want in your inbox. Every time it arrives, you roll your eyes, wish it were gone, and delete it with a sigh. ProTip™: click the unsubscribe link at the bottom of the email, which United States law requires for commercial email. You're welcome. And while such emails are annoying, they are only one type of email spam. Another type is spam email addresses. While they may or may not be as prevalent as spam messages, they are just as annoying and, depending on how they are weaponized, may cost an individual or business more than the time employees waste deleting messages.
This is the story of how #vss365 today ended up with a lot of spam addresses and how I dealt with it. Why am I writing this? Because I wanted to document it for myself and for others, in case anyone ever wonders why things are the way they are. I also like to be transparent about what I'm doing.
To start, a quick recap of how the email notification system worked. Anyone who wanted to receive the #vss365 word prompt in their inbox shortly after it was posted would go to the site, enter their email address in a small form at the bottom of the page, and submit the form. That was it.
#vss365 today from October 31, 2020. Notice the simple form at the bottom of the page to sign up for the email notifications.
This kind of sign-up system works well for these types of emails. Newsletters, notifications, and other auto-generated messages typically use the same process.
The Real Python email subscription box. Notice how it uses the same signup model vss365today originally used.
This sign-up method had worked since the site's inception. It let the mailing list grow quickly and make an impact in the community, so others could get their hands on the newest prompt sooner, which was and still is the primary purpose of the whole vss365today project.
Now, do spam or non-working addresses sneak into the mailing list from time to time? Yes, they do. I typically check the sending stats to remove the bad addresses and save money on the Mailgun email bill (because both successful and failed sends cost money). I would not normally need to check this, but I had stopped using Mailgun's Email Validation API when it started costing more than I could afford at the time.
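Pruning bad addresses from the sending stats can be sketched roughly like this. It assumes the stats have already been exported as simple (address, event) pairs, where the event is either "delivered" or "failed"; that record shape and the `find_failing_addresses` and `prune` helpers are hypothetical illustrations, not Mailgun's actual export format or API.

```python
from collections import Counter

# Hypothetical delivery record: (address, event), where event is
# "delivered" or "failed". This is NOT Mailgun's real export format.
DeliveryEvent = tuple[str, str]


def find_failing_addresses(events: list[DeliveryEvent], min_failures: int = 1) -> set[str]:
    """Return addresses that failed at least `min_failures` times
    and were never successfully delivered to."""
    failures = Counter()
    delivered = set()
    for address, event in events:
        addr = address.strip().lower()  # normalize so duplicates match
        if event == "failed":
            failures[addr] += 1
        elif event == "delivered":
            delivered.add(addr)
    return {
        addr
        for addr, count in failures.items()
        if count >= min_failures and addr not in delivered
    }


def prune(subscribers: set[str], events: list[DeliveryEvent]) -> set[str]:
    """Return the subscriber set with failing addresses removed."""
    bad = find_failing_addresses(events)
    return {s for s in subscribers if s.strip().lower() not in bad}
```

Skipping addresses that have ever delivered successfully is a judgment call in this sketch: a single failure on an otherwise working address may just be a temporary bounce, and raising `min_failures` makes the pruning more conservative still.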
Over 2020, the number of subscribers steadily increased. However, in early March 2021, they shot up. Waaaay up.
As you can see, the total number of emails sent out started spiking on March 11. Incidentally, so did the number of failed sends. At the peak, 679 messages were successfully sent in one day, with nearly 30 failed sends on the same day. Before the spike, 441 messages a day were being sent. This was clearly a flood of spam addresses. Most likely, a bot found my subscription form and submitted it with as many addresses as it could. I discovered the spike on March 19. Because I did not have time to handle it properly then, I disabled the form to prevent any new additions and removed all failing addresses.
Once I got an opportunity to look through the newly added addresses, I quickly confirmed that it was likely a spam bot and realized that I had no way to tell spam from legitimate additions. The addresses looked real. Most of them were Gmail addresses. But there were also odd ones, such as a bellsouth[dot]net and a juno[dot]com address (I remember Dad having a Juno address when I was 6!).
A censored sample of some of the spam addresses added to the mailing list.
Because I could not determine what to keep and what to remove, and I needed to bring down the number of addresses on the mailing list, I decided on March 26 to delete all addresses added after March 10. It was the only course of action I felt I could take. That brought the list down to 385 addresses, fewer than before the spike. The count dropped that far because I also took the time to remove obvious spam (but successfully delivering) addresses that had accumulated over time.
vss365today runs entirely at my own expense. As I've written before, I pay for running it and don't ask for donations. The March email bill ended up being nearly 18 USD, when it is usually 10 USD. Yikes.
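The date-based cleanup amounts to a simple filter on when each address was added. Here is a minimal sketch, assuming each subscriber record stores the date it was added; the `Subscriber` class and `drop_added_after` helper are hypothetical, and the real mailing list schema may look different.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Subscriber:
    # Hypothetical record: the real mailing list schema may differ.
    email: str
    added_on: date


def drop_added_after(subscribers: list[Subscriber], cutoff: date) -> list[Subscriber]:
    """Keep only subscribers added on or before `cutoff`,
    preserving their original order."""
    return [s for s in subscribers if s.added_on <= cutoff]


# Cutoff used in the cleanup: everything added after March 10, 2021 goes.
CUTOFF = date(2021, 3, 10)
```

Because the comparison is `<=`, addresses added on March 10 itself are kept, matching "delete all addresses added after March 10."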
😬
It took about a month before I had time to fix the subscription process. I ended up creating a basic math-based captcha. When you visit the new subscription page, two random numbers are generated, with one of them spelled out. To add your email, you need to fill in the box with the correct answer. When you submit the form, your answer is checked. If it is correct and the address passes the Mailgun Email Validation check, the address is added to the mailing list. Is it simple? Yes. Could it be broken? Possibly, especially since I keep the numbers small. But will it block the most basic spam additions? Most likely, and that was my goal. If larger numbers become necessary, I can use them. If I need to swap out my captcha method altogether later, I can do that too.
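A captcha along these lines can be sketched in a few functions. This is an illustration, not the site's actual code: it assumes the two numbers are added together (the post doesn't say which operation is used), keeps operands between 0 and 10 so they can be spelled out from a small word list, and treats `is_valid_address` and `add_to_list` as hypothetical hooks standing in for the Mailgun validation call and the mailing-list update.

```python
import secrets
from typing import Callable

# Spelled-out forms for the small numbers used in the challenge.
NUMBER_WORDS = ["zero", "one", "two", "three", "four", "five",
                "six", "seven", "eight", "nine", "ten"]


def make_challenge(max_value: int = 10) -> tuple[str, int]:
    """Return (question, expected_answer) for an addition captcha.

    One operand is shown as digits and the other is spelled out."""
    if not 0 <= max_value < len(NUMBER_WORDS):
        raise ValueError("max_value must be between 0 and 10")
    a = secrets.randbelow(max_value + 1)
    b = secrets.randbelow(max_value + 1)
    # Randomly pick which operand gets spelled out.
    if secrets.randbelow(2):
        question = f"What is {NUMBER_WORDS[a]} plus {b}?"
    else:
        question = f"What is {a} plus {NUMBER_WORDS[b]}?"
    return question, a + b


def check_answer(submitted: str, expected: int) -> bool:
    """True only if the submitted text is an integer equal to `expected`."""
    try:
        return int(submitted.strip()) == expected
    except ValueError:
        return False


def handle_signup(email: str, submitted: str, expected: int,
                  is_valid_address: Callable[[str], bool],
                  add_to_list: Callable[[str], None]) -> bool:
    """Add `email` only if the captcha answer is right and the address
    passes validation. Both callables are hypothetical hooks."""
    if not check_answer(submitted, expected):
        return False  # reject before calling the (paid) validation step
    if not is_valid_address(email):
        return False
    add_to_list(email)
    return True
```

Checking the captcha before the address validation means bot submissions that fail the math never reach the validation service, which matters when each validation costs money. The expected answer would also need to stay on the server (for example, in the session) rather than in a hidden form field, or a bot could simply read it.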
The new #vss365 today subscription process, showing the new math-based captcha.
You might be wondering why I went this route instead of using Google's reCAPTCHA, as that would have been the simple option. Simply put, I didn't want to: reCAPTCHA challenges personally annoy me a bit, and I didn't want to put that annoyance on something as simple as a newsletter signup. Additionally, using Google products only contributes to their massive data store and consumer privacy violations, which I am increasingly not OK with. Finally, using reCAPTCHA would have required the form processing to use JavaScript for validation and submission. While this is not a bad thing, requiring JavaScript is something I have so far managed to avoid (with the exception of the search page, where I considered it appropriate). Anyone visiting the site with JavaScript disabled would have been unable to join the mailing list. I would be annoyed if I could not do something this basic with JavaScript disabled in my browser. So, I went with my simple, custom route.
And that's the story.