Improving e-mail IP reputation with automation
The Ban
Thanks, Microsoft
This past Saturday, I replied to a long-running e-mail thread with an invitation to join a Zoom meeting for Sunday School class the next day.
That’s pretty standard, right? It’s literally copy/pasted from Zoom’s website. I’m certain that millions of such messages have been sent this past year. Wrong I was! My IP has sent nothing but good e-mails for the past few years, but this one stepped over the line. Microsoft, which owns Live, MSN, Outlook, and Hotmail, decided to block my server.
I host my own e-mail server using Mailcow that fronts a combination of personal and business domains. Being blocked by Microsoft isn’t just an inconvenience: It will actively hurt business.
I sent an e-mail to Outlook support and I got a rejection back:
We have completed reviewing the IP(s) you submitted. The following table contains the results of our investigation.
Not qualified for mitigation
54.81.217.253/32
Our investigation has determined that the above IP(s) do not qualify for mitigation.
Please ensure your emails comply with the Outlook.com policies, practices and guidelines found here: http://mail.live.com/mail/policies.aspx.
Of course, I replied. A series of messages followed in which Microsoft refused to provide evidence that I had violated their anti-spam policies. Finally, I wrote a scathing letter to Microsoft and left the case alone. A few days and a few tickets later, after a second IP had been banned (see below), someone picked up my ticket. They finally reviewed my activity and unblocked my original IP.
The fact is that large e-mail providers like Google and Microsoft don’t really care if your small-time mail server gets blocked. For our small-time mail server, though, that’s a huge issue. I decided that I should do something about it.
An Idea
While investigating the blacklist issue, I came across a comment from @EricThi on GitHub detailing how he managed to keep his IP reputation positive. He had a script that downloads a few jokes from an API and then automatically e-mails them from his Mailcow installation to a Microsoft-hosted e-mail address that he owns. He then logs in (manually, I presume) once a week to mark all the e-mails as read.
I thought this was a phenomenal idea. It keeps your reputation up by sending a couple hundred “good” e-mails per day. There’s only one problem: I was re-blocked while trying this out on a new IP address! At this point I was at the end of my rope.
Maybe the e-mails weren’t varied enough? Maybe my IP was too new? I will never know why Microsoft blocked my brand new IP. Luckily, Microsoft unbanned my old IP during this time, so I quickly swapped my e-mail server back to it and began working on some automation of my own.
Enron to the rescue!
Instead of sending simple jokes, I thought it would be best to find a real-world data set of known good e-mails. I searched the internet and I found two datasets on Kaggle:
- 7,945 of Hillary Clinton’s e-mails. Ehhhh maybe I shouldn’t use the former Secretary of State’s hacked e-mails.
- A whopping 517,401 e-mails from Enron, the company famous for shredding evidence. The data set was originally made public by the federal government.
Enron’s data should be perfect for me. I can use a large data set of real e-mails to get a lot of varied content sent from my server to my personal MSN e-mail.
Using the data
The idea is relatively straightforward:
- Set up an automated e-mail account
- Periodically pull a random e-mail from the database
- Send it with the original subject to myself
- Periodically log in to the server (IMAP/POP3) to “read” the e-mails and clear them out
Before I could do any of this, I needed to get the data into a format I could use. Following are the steps I took.
Compressing the data
I didn’t want to load over a gigabyte of data into memory each time I needed a single e-mail. Thus, I loaded the CSV into Pandas, converted it to Parquet, and compressed it using Brotli, which gave me better compression than gzip.
First, install required libraries with pip3 install pyarrow pandas
This one-time conversion reduces the file size from 1.43GB to 246.2MB.
Randomly selecting e-mails
Next, I wrote this Python script to select n e-mails at random:
This script simply reads the e-mail dump, selects some at random, and writes them in the following format in text files:
Subject: Test e-mail
Body of e-mail
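A rough sketch of a selection script along these lines is shown below; it is not the original script. It assumes the raw messages live in a column named `message` (an assumption about the data set's layout), that output files are named `0.txt`, `1.txt`, and so on, and that a blank line separates the subject from the body, as mail tools generally expect. The function names are hypothetical.

```python
import email
from pathlib import Path

import pandas as pd


def format_email(raw_message):
    """Reduce a raw message to 'Subject: ...', a blank line, and the body."""
    msg = email.message_from_string(raw_message)
    # Collapse folded (multi-line) subjects into a single line.
    subject = " ".join(msg.get("Subject", "").split()) or "(no subject)"
    payload = msg.get_payload()
    body = payload if isinstance(payload, str) else ""  # skip multipart bodies
    return f"Subject: {subject}\n\n{body}"


def write_random_emails(df, count, out_dir, seed=None):
    """Write `count` randomly chosen e-mails to out_dir as 0.txt, 1.txt, ..."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # Remove leftovers from the previous run so the queue starts fresh.
    for old in out.glob("*.txt"):
        old.unlink()
    sample = df.sample(n=min(count, len(df)), random_state=seed)
    for i, raw in enumerate(sample["message"]):
        (out / f"{i}.txt").write_text(format_email(raw), encoding="utf-8")
    return len(sample)
```

With the Parquet file in place, `write_random_emails(pd.read_parquet("emails.parquet"), 300, "emails")` would produce one day's worth of messages.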
Sending the e-mails
Next, I configured msmtp, a simple mail sending utility:
/etc/msmtprc:
To send an e-mail with this configuration, simply run: cat emails/0.txt | msmtp -a blackwell destination@email.com
Easy!
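For anyone who would rather trigger the send from Python than from a shell pipe, the msmtp command above could be wrapped roughly as follows. This is a sketch, not part of the original setup: `msmtp_command` and `send_file` are hypothetical names, and the `run` parameter exists only so the function can be tried without a mail server.

```python
import subprocess
from pathlib import Path


def msmtp_command(account, recipient):
    """Build the msmtp argument list; -a selects the configured account."""
    return ["msmtp", "-a", account, recipient]


def send_file(path, account, recipient, run=subprocess.run):
    """Pipe one prepared message file into msmtp on stdin."""
    message = Path(path).read_text(encoding="utf-8")
    # check=True raises CalledProcessError when msmtp exits non-zero,
    # so a failed delivery is not mistaken for a successful one.
    run(msmtp_command(account, recipient), input=message, text=True, check=True)
```

`send_file("emails/0.txt", "blackwell", "destination@email.com")` would then be equivalent to the shell pipeline.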
“Reading” the e-mails
Finally, I configured fetchmail to fetch, read, and delete the e-mails that I receive:
~/.fetchmailrc:
Note: I had to enable POP3 access from the Outlook web UI. When connecting with IMAP, I kept getting blocked.
To fetch, read, and delete e-mail, all you have to do is run the command fetchmail with no arguments. It’s that simple!
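To make the fetch-and-delete cycle concrete, here is a sketch of the same idea using Python's standard `poplib` module instead of fetchmail. It is a swapped-in illustration, not the setup described above: the function names are hypothetical, the host and credentials are placeholders, and it assumes the provider accepts a plain username and password over POP3 with TLS.

```python
import poplib


def drain_mailbox(conn):
    """Retrieve, then delete, every message on an authenticated POP3 session.

    Returns the number of messages processed.
    """
    count, _total_bytes = conn.stat()
    for number in range(1, count + 1):  # POP3 message numbers start at 1
        conn.retr(number)  # download the message (the closest POP3 gets to "reading")
        conn.dele(number)  # mark it for deletion
    conn.quit()  # QUIT ends the session and commits the deletions
    return count


def connect(host, user, password, port=995):
    """Open a POP3-over-TLS session; host and credentials are placeholders."""
    conn = poplib.POP3_SSL(host, port)
    conn.user(user)
    conn.pass_(password)
    return conn
```

Keeping `drain_mailbox` separate from `connect` means the loop can be tested against a stand-in object before pointing it at a real account.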
Putting it all together
Stitching it all together was the easy part. My cron jobs (replace <user> with your user):
0 3 * * * <user> python3 /home/<user>/random-mail-sender/random_email.py 300;
Every day, at 3AM UTC, remove the e-mails, and generate 300 new ones randomly.
3,8,13,18,23,28,33,38,43,48,53,58 * * * * <user> cat $(ls /home/<user>/random-mail-sender/emails/*.txt | head -n 1) | msmtp -a blackwell destination@email.com; rm $(ls /home/<user>/random-mail-sender/emails/*.txt | head -n 1);
This selects the first e-mail in the folder, cats it out, and sends it to my destination e-mail address. Then, it removes that e-mail from the folder.
I run this process every 5 minutes, 24 hours a day. That amounts to 288 e-mails sent per day (12 per hour × 24 hours), which is why my first cron job generates 300 e-mails.
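One thing the shell one-liner does not guard against is a failed send: the file is removed either way. A hedged sketch of the same "send the first file, then delete it" step in Python, deleting only after a successful send, might look like this. The `send_next` name and the `send` callable are hypothetical; any function that takes a path and raises on failure would do.

```python
from pathlib import Path


def send_next(queue_dir, send):
    """Send the first queued e-mail and delete it only if sending succeeded.

    Returns the path that was sent, or None when the queue is empty.
    """
    # Lexical order, like the `ls | head -n 1` in the cron line.
    queued = sorted(Path(queue_dir).glob("*.txt"))
    if not queued:
        return None
    first = queued[0]
    send(first)  # an exception here leaves the file in place for a retry
    first.unlink()
    return first


# 12 runs per hour (minutes 3, 8, ..., 58) times 24 hours.
SENDS_PER_DAY = len(range(3, 60, 5)) * 24
```

The constant at the end just spells out the arithmetic behind the daily total of 288.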
9 * * * * <user> fetchmail
On the 9th minute of every hour, I run fetchmail to retrieve and delete all the e-mails.
Results
I have 288 randomly selected e-mails being sent to Microsoft servers each day that are actively read by a user. Those e-mails should help ensure that my IP stays off the blacklist for the foreseeable future.
Thanks, Enron!