The Ban

Thanks, Microsoft

This past Saturday, I replied to a long-running e-mail thread with an invitation to join a Zoom meeting for Sunday School class the next day.

That’s pretty standard, right? It’s literally copy/pasted from Zoom’s website. I’m certain that millions of such messages have been sent this past year. Wrong I was! My IP has had nothing but good e-mails coming from it for the past few years, but this one stepped over the line. Microsoft, which owns Live, MSN, Outlook, and Hotmail, decided to block my server.

I host my own e-mail server using Mailcow that fronts a combination of personal and business domains. Being blocked by Microsoft isn’t just an inconvenience: It will actively hurt business.

I sent an e-mail to Outlook support and I got a rejection back:

We have completed reviewing the IP(s) you submitted. The following table contains the results of our investigation.

Not qualified for mitigation
54.81.217.253/32
Our investigation has determined that the above IP(s) do not qualify for mitigation.

Please ensure your emails comply with the Outlook.com policies, practices and guidelines found here: http://mail.live.com/mail/policies.aspx.

Of course, I replied back. Then followed a series of messages where Microsoft refused to provide evidence of my violation of their anti-spam policies. Finally, I wrote a scathing letter to Microsoft, and left the case alone. A few days later, a few tickets later, and after having a second IP banned (see below), someone picked up my ticket. They finally reviewed my activity, and unblocked my original IP.

The fact is that large e-mail providers like Google and Microsoft don’t really care if your small-time mail server gets blocked. For our small-time mail server, though, that’s a huge issue. I decided that I should do something about it.

An Idea

While investigating the blacklist issue, I came across a comment from @EricThi on GitHub detailing how he managed to keep his IP reputation positive. He had a script that downloads a few jokes from an API and then automatically e-mails them from his Mailcow installation to a Microsoft-hosted e-mail address that he owns. He then logs in (manually, I presume) once a week to mark all the e-mails as read.

I thought this was a phenomenal idea. It keeps your reputation up by sending a couple hundred “good” e-mails per day. There’s only one problem: I was re-blocked while trying this out on a new IP address! At this point I was at the end of my rope.

Maybe the e-mails weren’t varied enough? Maybe my IP was too new? I will never know why Microsoft blocked my brand new IP. Luckily, Microsoft unbanned my old IP during this time, so I quickly swapped my e-mail server back to it and began working on some automation of my own.

Enron to the rescue!

Instead of sending simple jokes, I thought it would be best to find a real-world data set of known good e-mails. I searched the internet and I found two datasets on Kaggle:

7,945 of Hillary Clinton’s e-mails. Ehhhh maybe I shouldn’t use the former Secretary of State’s hacked e-mails.
A whopping 517,401 e-mails from Enron, the company famous for shredding evidence. The data set was originally made public by the federal government.

Enron’s data should be perfect for me. I can use a large data set of real e-mails to get a lot of varied content sent from my server to my personal MSN e-mail.

Using the data

The idea is relatively straightforward:

Set up an automated e-mail account
Periodically pull a random e-mail from the database
Send it with the original subject to myself
Periodically log in to the server (IMAP/POP3) to “read” the e-mails and clear them out

Before I could do any of this, I needed to get the data into a format I could use. Following are the steps I took.

Compressing the data

I didn’t want to load over a gigabyte of data into memory each time I needed a single e-mail. Thus, I loaded the CSV into Pandas, converted it to Parquet, and compressed it using Brotli, which gave me better compression than gzip.

First, install required libraries with pip3 install pyarrow pandas

import pandas as pd

df = pd.read_csv("emails.csv")
df.to_parquet('emails.parquet.brotli', compression='brotli')

This one-time conversion reduces the file size from 1.43GB to 246.2MB.

Randomly selecting e-mails

Next, I wrote this Python script to select n e-mails at random:

# Usage: python3 random_email.py [number-of-emails]
# python3 random_email.py <-- get 1
# python3 random_email.py 50 <-- get 50

from pandas import read_parquet
import re
import sys
import os

emails_dir = os.path.join(sys.path[0], 'emails')
os.makedirs(emails_dir, exist_ok=True)  # Create the output folder on first run
for f in os.listdir(emails_dir):
    os.remove(os.path.join(emails_dir, f))  # Delete pre-existing emails

desired_emails = int(sys.argv[1]) if len(sys.argv) > 1 else 1
formatted_emails = []

# Only the raw 'message' column is needed
emails = read_parquet(os.path.join(sys.path[0], 'emails.parquet.brotli'), columns=['message'])

while len(formatted_emails) < desired_emails:
    message = emails.sample().iloc[0]['message']

    subject_match = re.search(r'^Subject: (.*)$', message, re.I | re.M)
    # The body is taken to be everything after the X-FileName header line
    body_match = re.search(r'^X-FileName: .*?$(.*)', message, re.I | re.M | re.S)
    if not subject_match or not body_match:  # Skip messages missing either header
        continue

    subject = subject_match.group(1).strip()
    body = body_match.group(1).strip()
    if subject == "" or body == "":  # Don't send any e-mails with empty subjects or bodies
        continue

    formatted_emails.append(f'Subject: {subject}\n\n{body}')

for index, email in enumerate(formatted_emails):
    with open(os.path.join(emails_dir, f'{index}.txt'), "w") as f:
        f.write(email)

This script simply reads the e-mail dump, selects some at random, and writes them in the following format in text files:

Subject: Test e-mail

Body of e-mail
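
As an aside, the two regular expressions could be swapped for Python's standard e-mail parser. The following is a minimal sketch of that alternative, not the script described here; the function name extract_subject_and_body and the sample message are made up for illustration, and it assumes each raw message is plain text with ordinary headers.

```python
from email import message_from_string


def extract_subject_and_body(raw_message):
    """Return (subject, body) from a raw header-plus-body message string."""
    parsed = message_from_string(raw_message)
    subject = (parsed.get("Subject") or "").strip()
    payload = parsed.get_payload()
    # Multipart messages return a list; this sketch only handles plain text.
    body = payload.strip() if isinstance(payload, str) else ""
    return subject, body


# Hypothetical sample in the same shape as a row of the data set.
SAMPLE = (
    "Message-ID: <1.example@example.invalid>\n"
    "Subject: Test e-mail\n"
    "X-FileName: sample.nsf\n"
    "\n"
    "Body of e-mail\n"
)

subject, body = extract_subject_and_body(SAMPLE)
formatted = f"Subject: {subject}\n\n{body}"
print(formatted)
```

The parser splits headers from the body at the first blank line, so it does not depend on X-FileName being the last header.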

Sending the e-mails

Next, I configured msmtp, a simple mail sending utility:

/etc/msmtprc:

defaults
auth        on
tls         on
logfile     ~/.msmtp.log
host        mail.blackwell.email
port        587

account     blackwell
from        my@address.com
user        my@address.com
password    blah

To send an e-mail with this configuration, simply run cat emails/0.txt | msmtp -a blackwell destination@email.com. Easy!
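
For anyone who would rather stay in Python than pipe to msmtp, roughly the same submission can be sketched with the standard library. This is an illustration under assumptions, not the setup used here: build_message and send_message are hypothetical helper names, and the addresses are the same placeholders that appear in the msmtp configuration. The sketch only builds the message; send_message is defined but never called.

```python
import smtplib
from email.message import EmailMessage


def build_message(file_text, sender, recipient):
    """Turn a 'Subject: ...' line, a blank line, and a body into an EmailMessage."""
    header, _, body = file_text.partition("\n\n")
    if header.startswith("Subject:"):
        header = header[len("Subject:"):]
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = header.strip()
    msg.set_content(body)
    return msg


def send_message(msg, host, port, user, password):
    """Submit the message over STARTTLS. Not called in this sketch."""
    with smtplib.SMTP(host, port) as smtp:
        smtp.starttls()
        smtp.login(user, password)
        smtp.send_message(msg)


msg = build_message(
    "Subject: Test e-mail\n\nBody of e-mail",
    "my@address.com",
    "destination@email.com",
)
print(msg["Subject"])
```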

“Reading” the e-mails

Finally, I configured fetchmail to fetch, read, and delete the e-mails that I receive:

~/.fetchmailrc:

set bouncemail
set no spambounce
set softbounce
set properties ""
defaults:
  antispam -1
  batchlimit 100

poll outlook.office365.com with proto POP3
  user 'destination@email.com' there with password 'blah'
  ssl

Note: I had to enable POP3 access from the Outlook web UI. When connecting with IMAP, I kept getting blocked.

To fetch, read, and delete e-mail, all you have to do is run the command fetchmail with no arguments. It’s that simple!
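
What fetchmail does here (download each message, then delete it from the server) can be sketched with Python's poplib for readers who want to see the steps spelled out. This is an assumption-laden illustration rather than a replacement I have run against Outlook: drain_mailbox and connect are hypothetical names, and whether the server accepts plain password logins is not something the sketch verifies.

```python
import poplib


def drain_mailbox(conn):
    """Retrieve and delete every message on an open POP3 connection.

    Returns the number of messages processed.
    """
    _, listings, _ = conn.list()   # one b'<number> <size>' entry per message
    for entry in listings:
        number = int(entry.split()[0])
        conn.retr(number)          # download the message ("read" it)
        conn.dele(number)          # mark it for deletion
    conn.quit()                    # deletions are committed when the session ends
    return len(listings)


def connect(host, user, password):
    """Open a POP3-over-TLS session (port 995 by default). Not called here."""
    conn = poplib.POP3_SSL(host)
    conn.user(user)
    conn.pass_(password)
    return conn
```

A call would look like drain_mailbox(connect('outlook.office365.com', 'destination@email.com', 'blah')), using the same placeholder credentials as the fetchmail configuration.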

Putting it all together

Stitching it all together was the easy part. My cron jobs (replace <user> with your user):

0 3 * * * <user> python3 /home/<user>/random-mail-sender/random_email.py 300;

Every day, at 3AM UTC, remove the e-mails, and generate 300 new ones randomly.

3,8,13,18,23,28,33,38,43,48,53,58 * * * * <user> cat $(ls /home/<user>/random-mail-sender/emails/*.txt | head -n 1) | msmtp -a blackwell destination@email.com; rm $(ls /home/<user>/random-mail-sender/emails/*.txt | head -n 1);

Selects the first e-mail in the folder, cats it out, and sends it to my destination e-mail address. Then, it removes the first e-mail from the folder.
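
One thing to note about the cron line is that the rm runs even if msmtp fails, so a failed send still consumes a queued e-mail. A small Python wrapper could delete the file only after a successful send. The sketch below is one hypothetical way to do that, not what my cron job runs; send_next and send_with_msmtp are made-up names, and the demo at the end uses a temporary folder and stand-in senders instead of calling msmtp.

```python
import os
import subprocess
import tempfile


def send_next(emails_dir, send):
    """Send the first queued .txt file; delete it only if `send` succeeds.

    `send` is any callable that takes the file's text and returns True on success.
    Returns the path that was sent, or None if nothing was sent.
    """
    queued = sorted(name for name in os.listdir(emails_dir) if name.endswith(".txt"))
    if not queued:
        return None
    path = os.path.join(emails_dir, queued[0])
    with open(path) as f:
        text = f.read()
    if not send(text):
        return None  # leave the file in place for the next run
    os.remove(path)
    return path


def send_with_msmtp(text):
    """Pipe the text to msmtp, mirroring the cron command. Not called in the demo."""
    result = subprocess.run(
        ["msmtp", "-a", "blackwell", "destination@email.com"],
        input=text,
        text=True,
    )
    return result.returncode == 0


# Demo with stand-in senders so nothing is actually mailed.
with tempfile.TemporaryDirectory() as queue_dir:
    for name in ("0.txt", "1.txt"):
        with open(os.path.join(queue_dir, name), "w") as f:
            f.write("Subject: Test e-mail\n\nBody of e-mail")
    failed = send_next(queue_dir, lambda text: False)
    remaining_after_failure = sorted(os.listdir(queue_dir))
    sent = send_next(queue_dir, lambda text: True)
    remaining_after_success = sorted(os.listdir(queue_dir))
```

In real use the call would be send_next('/home/<user>/random-mail-sender/emails', send_with_msmtp).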

I run this process every 5 minutes, 24 hours a day. That is 12 sends per hour, or 288 e-mails per day, which is why my first cron job generates 300 e-mails: a round number with a few to spare.
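
The arithmetic can be checked in a few lines of Python. The variable names are only for illustration; the minute list and the figure of 300 come straight from the two cron entries.

```python
send_minutes = list(range(3, 60, 5))          # 3, 8, 13, ..., 58, as in the cron entry
runs_per_hour = len(send_minutes)             # sends per hour
sent_per_day = runs_per_hour * 24             # sends per day
generated_per_day = 300                       # argument passed to random_email.py
leftover = generated_per_day - sent_per_day   # unsent files cleared by the 3AM run
print(runs_per_hour, sent_per_day, leftover)
```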

9 * * * * <user> fetchmail

On the 9th minute of every hour, I run fetchmail to retrieve and delete all the e-mails.

Results

I have 288 randomly selected e-mails being sent to Microsoft servers each day that are actively read by a user. Those e-mails should help ensure that my IP stays off the blacklist for the foreseeable future.

Thanks, Enron!

Clete Blackwell II

Improving e-mail IP reputation with automation