The Score That Means Nothing
You run a test. Mail-Tester gives you a 10/10. You feel good. You hit send.
Your reply rate is 0.1%.
That gap between a perfect test score and a dead campaign is the most common deliverability problem in cold email right now - and the tool gave you false confidence before you burned your domain.
Email deliverability test tools vary wildly in what they measure and how useful that is. I see it constantly - comparison articles lumping them all together like they do the same job. They do not. Using the wrong tool for your use case is like checking your tire pressure before a race and ignoring the engine. The number looks fine. The car still does not run.
This article breaks down exactly what each category of tool measures, what it misses, and which tools to stack depending on whether you are a cold emailer, a newsletter sender, or a transactional email team.
Why Tools Give Conflicting Results
One of the most documented frustrations among cold email operators is running the same email through multiple tools and getting completely different answers. A sender tests one email and gets a 10/10 on Mail-Tester, a clean result from Unspam, but then runs it through GlockApps and sees 66% spam placement - for the exact same email and domain.
These tools are built to measure different things.
Every inbox placement tool uses a different seed list. A seed list is a collection of real email accounts that the tool controls. When you send a test email, it goes to those accounts, and the tool checks where it landed. The problem is that seed lists vary dramatically in composition.
Some tools weight their seed lists toward consumer accounts - personal Gmail, Yahoo, AOL, Hotmail. Others weight them toward professional accounts like Google Workspace and Microsoft Office 365. The filters on those two types of inboxes behave differently. The algorithms are different. The history those mailboxes have with your domain is different.
So when you get conflicting results, neither tool is necessarily wrong. They are each measuring placement into a different population of inboxes. And if your prospects live in Microsoft 365 corporate inboxes, a tool measuring your placement into Yahoo and personal Gmail accounts is giving you structurally irrelevant data.
EmailTooltester, which has run deliverability tests for years using seed list tools, eventually retired its standard seed list testing methodology after discovering that test results often fluctuated depending on the seed list and test timing - and the results did not always reflect real-world inbox performance for everyday senders.
Threads in r/coldemail and r/Emailmarketing have documented this repeatedly. One thread described two different tools showing 85% inbox placement versus 40% for the exact same campaign. The tools were not malfunctioning. They were measuring different things.
The 5 Types of Email Deliverability Test Tools
I see this every week - comparison articles treating all five categories of tools as if they compete. They do not. They solve different problems. Using the wrong category of tool for your situation is how you end up with a false sense of security.
Here is how to think about the five categories before you spend a dollar on any of them.
Category 1 - Infrastructure and DNS Checkers
These tools check whether your technical setup is correct. SPF record configured properly? DKIM signing active? DMARC policy in place? Mail server reverse DNS set up? Is your domain or IP on any blacklists?
These are diagnostic tools. They tell you if something is broken, but not where your emails are landing in real inboxes.
MxToolbox is the most-used tool in this category. It is a free online diagnostic platform that performs DNS lookups, MX record checks, SMTP diagnostics, blacklist monitoring, and email authentication verification. The free tier lets you scan your domain or sending IP against over 100 DNS-based blacklists in under 60 seconds.
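Under the hood, a blacklist check is just a DNS lookup: reverse the IP's octets, append the blacklist zone, and query the resulting hostname - an answer means listed, NXDOMAIN means clean. Here is a minimal sketch using only the Python standard library; `zen.spamhaus.org` is one well-known zone used for illustration, and live lookups are subject to each list's usage policy:

```python
import socket

def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the reversed-octet hostname that a DNSBL lookup queries."""
    reversed_octets = ".".join(reversed(ip.split(".")))
    return f"{reversed_octets}.{zone}"

def is_listed(ip: str, zone: str = "zen.spamhaus.org") -> bool:
    """True if the blacklist zone answers for this IP (i.e. it is listed)."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        # NXDOMAIN (or any lookup failure) - treated here as "not listed"
        return False
```

MxToolbox's value is running this same lookup against 100+ zones at once and monitoring for changes over time - not something worth rebuilding yourself.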
MxToolbox checks infrastructure configuration - whether your setup is built to allow delivery. That is an essential first step, but it is only the first step. If your SPF, DKIM, and DMARC all pass and your IP is clean, you still have no idea whether you are landing in the inbox or the spam folder of actual recipients.
Think of MxToolbox as the preflight checklist. It tells you the plane is airworthy. It does not tell you whether you will land at the right airport.
Google Postmaster Tools belongs here too, though it does more than infrastructure checks. It gives you your domain reputation score with Gmail, your spam rate, your delivery errors, and your IP reputation - all based on real Gmail user data. It is free and genuinely useful. The catch: it only shows Gmail data, tells you nothing about Outlook or other providers, and only populates data once you are sending at some volume. It is a rearview mirror, not a windshield. But it is the most honest signal available about your Gmail standing.
Microsoft SNDS (Smart Network Data Services) is the Microsoft equivalent - free, and shows you how your sending IPs are classified by Outlook and Hotmail. Set it up alongside Postmaster Tools if you are sending to any volume of Microsoft addresses.
Category 2 - Spam Score Testers
These tools check your email content and headers against spam filter rules. You send a test email to a unique address, and the tool returns a score based on SpamAssassin rules, blacklist checks, authentication pass/fail, and similar signals.
Mail-Tester is the most popular tool in this category. It is free, fast, and gives you a score out of 10. Millions of people use it. It is also the most frequently misunderstood tool in the space.
Mail-Tester checks spam filter rules and your technical configuration. It does not send your email to real inboxes and measure where it lands. A perfect 10/10 means your email passes rule-based filters and your authentication is set up. It says nothing about your domain reputation with Gmail or Outlook, nothing about your sending history, and nothing about whether a real recipient will see your email in their inbox.
Mail-Tester's own documentation even includes a section explaining that a clean score does not guarantee inbox placement - because this confusion is that common.
Mail-Tester is useful for diagnosing configuration problems. It is not a deliverability test. Treating it as one is how domains get burned.
Unspam.email is similar in function but adds an AI-generated heatmap of your email, showing where readers' attention would go and flagging design elements that might trigger filters. It offers a free tier and is useful for content optimization - but like Mail-Tester, it does not measure actual inbox placement with real seed accounts.
Category 3 - Inbox Placement Monitors
These tools send your email to a managed network of real email accounts and report back where the email actually landed - inbox, promotions tab, spam folder, or missing entirely.
This is the closest thing to a real deliverability test. It is also where the seed list composition problem matters most.
GlockApps is the dominant tool in this category. It tests placement across 70+ seed addresses spanning Gmail, Outlook.com, Microsoft 365, Yahoo, AOL, and other providers. It also runs your email through five spam filters, checks authentication, monitors your domain and IP against 50+ blacklists, and provides DMARC analytics. Pricing starts at $59/month for the Essential plan with 360 test credits. Standalone credit packs start around $16.99 for 3 tests.
GlockApps is strong for newsletter teams. The Gmail tab prediction feature - forecasting whether your email lands in Primary versus Promotions - is particularly reliable. One practitioner noted that GlockApps predictions matched actual open rates in their ESP closely enough to use as a planning tool for campaign decisions.
But GlockApps has a documented limitation for cold emailers. Its seed list was built for newsletter and bulk email testing. Per MailReach data, GlockApps uses only 1-2 accounts each for Google Workspace and Office 365 in its seed list. That is a tiny sample for the environment where your cold email prospects actually live.
Cold email prospects are not on personal Gmail or Yahoo. They are on corporate Microsoft 365 accounts or Google Workspace accounts managed by their IT departments. Those inboxes have stricter filtering, different sender reputation calculations, and different behavioral signals than consumer accounts. Measuring placement into personal Gmail accounts and calling it a B2B deliverability test is solving the wrong problem.
Saleshandy Inbox Radar addresses this directly. It is purpose-built for cold outreach and tests placement across 40+ professional inboxes - specifically Google Workspace and Microsoft Office 365 accounts. The ESP-to-ESP Placement Report shows exactly how your email performs across different sending environments, which lets you identify which sending accounts perform best for specific prospect segments.
MailReach takes a similar approach for B2B accuracy, focusing its seed list on Google Workspace and Office 365. It also includes email warmup as part of the platform, combining placement testing with reputation building - something GlockApps does not include.
Validity Everest is the enterprise-grade option in this category. It claims the largest global seed list of any deliverability tool and bundles inbox placement testing, engagement analytics, competitor benchmarking, and Sender Score monitoring. The entry plan starts at $20/month for 5 placement tests. The next meaningful tier is $525/month. Enterprise contracts run significantly higher. Unless you are operating an enterprise ESP or a high-volume newsletter business, Everest is more tool than you need.
Allegrow takes a different approach in this category. Rather than testing placement as a snapshot, it focuses on ongoing sender reputation building for B2B outbound. It monitors your sending health continuously rather than providing one-time diagnostic snapshots. Plans start around $30 per user per month and it holds a 4.7 rating on G2.
Category 4 - Rendering Testers
These tools check how your email looks across different email clients, devices, and operating systems. Does it render properly in Apple Mail versus Outlook 2019 versus Gmail on Android?
Litmus is the standard here. It is used by 80% of the Fortune 100 and serves 700,000+ marketing professionals. Plans start at $99/month.
Litmus tests rendering. An email can render beautifully in Litmus and still land in spam. An email can render poorly on mobile and still hit the inbox every time, because deliverability and visual presentation are entirely separate problems. Rendering testers are essential for marketing email teams who care about visual presentation. They are not deliverability tools and should not be evaluated as such.
Email on Acid is the main alternative in this space, with similar functionality at a slightly lower price point. Both tools are irrelevant to inbox placement and belong in a different conversation than deliverability testing.
Category 5 - Email Verification Tools
These tools check whether an email address is valid - does the mailbox exist, will it bounce, is it a known spam trap, is it risky to send to?
ZeroBounce and NeverBounce are the most commonly used tools here. They verify list hygiene before you send, which indirectly protects your deliverability by keeping your bounce rate low. High bounce rates damage your sender reputation quickly.
One operation sending 200,000 cold emails per month reported verifying over 300,000 addresses to hit that number - because not every email scraped from any source is valid. That verification cost alone ran to $2,700 per month with standard tools - over $32,000 per year. The economics of list verification at scale are significant and worth optimizing carefully.
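The arithmetic behind that figure is worth making explicit. A quick sketch, assuming a blended per-verification rate of $0.009 - an illustrative figure, not a quoted price from ZeroBounce or NeverBounce:

```python
scraped = 300_000        # addresses verified per month, per the example above
deliverable = 200_000    # emails actually sent per month
price_per_check = 0.009  # assumed per-verification rate in USD (illustrative)

invalid_rate = 1 - deliverable / scraped  # share of scraped addresses that fail
monthly_cost = scraped * price_per_check  # $2,700 at these assumptions

print(f"{invalid_rate:.0%} invalid, ${monthly_cost:,.0f}/month")
```

A third of a scraped list failing verification is why per-check pricing matters as much as accuracy when comparing vendors at volume.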
These tools measure one thing: whether an email address is deliverable. Your sending domain health and where your emails land once sent are outside their scope entirely.
The Cold Emailer Problem with Every Tool on This List
Here is something almost no tool comparison article explains: most email deliverability test tools were built for newsletter and bulk marketing email. The seed list methodology, the spam filter checks, the content scoring - all of it is calibrated against the patterns of opt-in marketing email. Cold email operates by entirely different rules.
Newsletter senders have permission. They send to engaged lists. Mailbox providers treat them differently than they treat unsolicited outbound. A Gmail inbox that has opened and clicked newsletters from a sender for months will treat future emails from that sender very differently than a corporate account belonging to a cold prospect who has never seen your domain.
When a cold emailer runs a GlockApps test, they are checking placement into inboxes that have no history with their domain, weighted toward consumer mailboxes. The result tells you something. It just does not tell you what happens when your actual cold email hits a corporate Outlook account belonging to a VP of Sales who has never heard of you.
The signal a cold emailer wants is this: how does my sending domain perform in corporate Microsoft 365 and Google Workspace inboxes when sending to people with zero prior relationship? Very few tools measure that accurately. Saleshandy Inbox Radar and MailReach are the closest options built specifically for that use case.
This is why practitioners running B2B cold outreach at volume stack multiple tools - using a B2B-focused placement tool alongside Google Postmaster Tools for the Gmail reputation signal, plus MxToolbox for infrastructure verification as a baseline.
Stack by Sender Type
Stop asking which email deliverability test tool is best. Ask which stack fits your specific sending pattern. These are the combinations that work at each level.
Cold Email Stack - B2B Outbound
Your goal is to understand how corporate Google Workspace and Microsoft 365 inboxes respond to your sending domain. Consumer-weighted seed lists are noise for this use case.
Step 1 - Infrastructure check: MxToolbox, free. Confirm SPF, DKIM, and DMARC are configured. Check your domain and sending IPs against the blacklist database before anything else. This takes five minutes and should be the first thing you do with any new sending domain.
Step 2 - B2B placement test: Saleshandy Inbox Radar or MailReach. These tools test your email against professional inboxes - Google Workspace and Microsoft 365 - rather than consumer accounts. For cold outreach, this is the placement test that matters. Run this before starting any sequence on a new domain.
Step 3 - Ongoing reputation monitoring: Google Postmaster Tools, free. Set this up immediately for any domain you are sending from at volume. It is the only tool that shows you how Gmail classifies your domain reputation in real time. Watch the domain reputation score. If it drops from High to Medium, stop and diagnose before continuing to ramp.
Step 4 - List hygiene: ZeroBounce or NeverBounce before every new list enters a sequence. If you are building lists with a tool like ScraperCity - which lets you search millions of contacts by title, industry, location, and company size - verify those addresses before they go into any sequence. High bounce rates from unverified lists are one of the fastest ways to tank a fresh domain reputation.
Newsletter and Bulk Email Stack
You have permission. Your seed list testing results are more representative because the seed list population is closer to your actual recipients.
Step 1 - Infrastructure check: MxToolbox, free. Same as above. Always start here.
Step 2 - Pre-send content check: Mail-Tester, free. Confirm your authentication is passing and your content does not trigger obvious rule-based filters. Know going in that a 10/10 does not mean inbox placement is guaranteed - it means you passed the rule check.
Step 3 - Inbox placement testing: GlockApps starting at $59/month or credit packs from $16.99. GlockApps is where newsletter senders get the most useful signal. The Gmail tab prediction - Primary versus Promotions - is reliable and actionable. Run a placement test before any major send: new template, new domain, new ESP, or new sending infrastructure.
Step 4 - Ongoing Gmail reputation: Google Postmaster Tools, free. Monitor domain reputation and spam complaint rates after every major campaign. Watch it after every send.
Transactional Email Stack - Developer and SaaS Teams
Your concern is delivery confirmation and infrastructure reliability, not inbox placement for marketing campaigns.
Step 1 - Infrastructure: MxToolbox for DNS and blacklist checks on your sending domain and IPs.
Step 2 - Pre-send testing: Mailtrap provides a developer sandbox for testing email sends without hitting real inboxes. It is designed for development environments and staging, with a free tier and plans from $15/month. It is not built for cold outreach monitoring or inbox placement at scale, but for developers testing email sending code, it is the right tool. Mailtrap holds a 4.8 rating on G2, the highest of any tool in this category.
Step 3 - Authentication monitoring: GlockApps DMARC Analytics or a dedicated DMARC monitoring tool if your sending volume warrants dedicated oversight.
The Numbers Behind Broken Deliverability
Roughly 1 in 6 commercial emails never reaches the inbox, per Validity benchmark data. That statistic does not account for cold email specifically - where the deliverability challenge is harder because you lack the permission and engagement history that protect newsletter senders.
The math of broken deliverability is significant at scale. Cold email operators can see reply rates around 2% when deliverability is healthy and the offer is clear. When deliverability breaks - when a domain reputation tanks and emails route to spam - that same list can produce reply rates under 0.2%. The cost is not just the sequence spend. It is the domain, the warmup time, and the infrastructure rebuild before you can start again.
One way to put the scale into context: 387 targeted emails sent to the right prospects with solid deliverability can generate the same pipeline as over 6,000 spray-and-pray emails with broken delivery. Deliverability and targeting quality drive that entire difference.
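That comparison is reply-rate arithmetic using the figures quoted above - 2% for healthy targeted sends, and an assumed 0.13% (within the "under 0.2%" range) for broken spray-and-pray:

```python
targeted_replies = 387 * 0.02    # healthy deliverability, tight targeting
spray_replies = 6_000 * 0.0013   # broken deliverability (0.13% assumed for illustration)

# Both work out to roughly 8 replies - over 15x the volume for the same pipeline.
```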
A blacklisting caught by MxToolbox on Monday morning, before you have sent 50,000 emails, is worth an enormous amount more than discovering it on Friday when your open rates have already collapsed. Running email deliverability test tools proactively is not overhead. It is risk management for the channel everything else depends on.
The Warmup Duration Debate
Practitioners disagree significantly on how long to warm up a new sending domain before running full sequences. Common recommendations in practitioner communities range from 2 weeks to 8 weeks. The honest answer is that it depends on where you are sending and what infrastructure you are using.
The 2-3 week warmup that worked for Gmail inboxes a few years ago is now frequently reported as insufficient for Outlook and Microsoft 365. Multiple practitioners in r/coldemail threads have noted that Microsoft has become significantly more aggressive, and domains hitting Outlook need longer warmup runways before placement stabilizes. Practitioners working with SMTP infrastructure - rather than Gmail or Google Workspace sending accounts - often report needing 4-6 weeks before placement is reliable. Most warmup guides suggest a shorter runway than that.
The industry standard sending volume has also tightened. Practitioners commonly sent 30+ emails per day per account a few years ago. Current practitioner consensus in cold email communities sits at 20-25 emails per day per inbox as a safer ceiling, with more cautious operators running 10-15 per day on newer domains.
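Those ceilings translate directly into infrastructure planning: the number of sending accounts you need is a ceiling division of your daily target by the per-inbox cap. The 500/day target below is an arbitrary example:

```python
import math

daily_target = 500  # total cold emails per day (example figure)

standard = math.ceil(daily_target / 25)  # inboxes at the consensus ceiling
cautious = math.ceil(daily_target / 15)  # inboxes at the newer-domain ceiling
```

Every additional inbox is another account to warm up, so the cautious cap carries a real infrastructure cost, not just a slower ramp.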
What deliverability test tools can tell you during warmup: your current placement score at a specific moment in time.
What they cannot tell you: whether your reputation is on a positive or negative trajectory. A static test on day 14 looks different than the same test on day 20. Run placement tests every 3-4 days during warmup and watch the trend, not just any single number. Running these tests does not harm your domain - tests go to managed seed addresses, not live prospects.
Seed Lists Have a Fundamental Limitation
Seed lists have a fundamental limitation that every practitioner should understand before trusting any tool result.
Gmail spam filters use machine learning with engagement history as a significant input - whether previous emails to a given address were opened, clicked, replied to, or marked as spam. Seed addresses, by definition, have no engagement history with your domain. The filter has no behavioral data to work from for your specific sending identity.
This means seed list results show you how a fresh inbox responds to your email based purely on content signals, authentication, and domain reputation. They do not simulate what happens with a real Gmail user who has engagement patterns, subscription history, and behavioral signals built up over time.
For newsletter senders with established engagement, this may understate your real placement rates. For cold emailers hitting brand-new contacts, it may overstate them - because real recipient inboxes with no prior engagement history can apply more aggressive filtering than seed list results suggest.
Use placement tests directionally. They are indicators, not predictions. When a placement test shows 60% inbox, that does not mean 60% of your real sends will land in the inbox. It means your email passed the seed list test at that moment in time.
The Test You Should Run Before Spending Anything
Before spending money on inbox placement testing, run this sequence first.
Go to MxToolbox and run a domain health check on every sending domain you own. Fix anything flagged. This is free and takes ten minutes per domain.
Set up Google Postmaster Tools for every domain you send from. This is free and provides the most direct available signal about your Gmail reputation outside of Google itself.
Send a test email to your own business Gmail account and a personal Gmail account from your sending domain. Check where it lands. If it routes to spam in your own test, you have a problem that no paid tool will fix for you - you need to address the root cause first.
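That self-test is easy to script so it runs identically on every new domain. A sketch using Python's standard library - the host, credentials, and addresses are placeholders to replace with your own, and the actual send is isolated in its own function:

```python
import smtplib
from email.message import EmailMessage

def build_test_message(sender: str, recipient: str) -> EmailMessage:
    """A plain-text probe email for the self-test step."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "Deliverability self-test"
    msg.set_content("If this landed in spam, fix the root cause before paying for tools.")
    return msg

def send(msg: EmailMessage, host: str, port: int = 587,
         user: str = "", password: str = "") -> None:
    """Send over STARTTLS; credentials here are placeholders."""
    with smtplib.SMTP(host, port) as smtp:
        smtp.starttls()
        smtp.login(user, password)
        smtp.send_message(msg)
```

Then check where the probe landed in both a personal Gmail account and a Workspace account - the manual part of this check is the part that matters.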
Only after those free steps are clean should you invest in paid inbox placement testing. If the infrastructure is broken, a paid placement test will confirm that the infrastructure is broken. The free layer above tells you the same thing at no cost.
The most expensive mistake in deliverability is paying for diagnostic tools before fixing the obvious problems those tools are going to find anyway.
What the Tool Comparison Articles Get Wrong
I see it in almost every tool comparison I come across: deliverability tools evaluated on features and pricing without clarifying the use case each tool was built for. GlockApps is a strong tool for newsletter senders. It is also the wrong primary tool for cold emailers who need B2B placement accuracy. Mail-Tester is a useful quick-check for configuration. It is misleading if you treat it as an inbox placement test. MxToolbox is indispensable as a starting point. It also tells you nothing about where your emails land once authentication passes.
Pick the tool that fits this specific problem at this stage of sending.
Infrastructure broken? MxToolbox first, free, immediately.
Newsletter sender testing placement before a major campaign? GlockApps at $59/month is the right investment.
Cold emailer testing B2B placement accuracy? Saleshandy Inbox Radar or MailReach, built specifically for professional inboxes.
Need Gmail reputation data on a live sending domain? Google Postmaster Tools. Free. Always running.
Need email list quality before sending a cold sequence? ZeroBounce or NeverBounce before the list touches any sequence.
None of these tools replace the others. Each one answers a different question. Stack the right ones for your situation rather than searching for a single tool that does everything.
The Operator Mindset on Deliverability
Deliverability is the first variable in cold email, not the last one to optimize. Operators running serious cold outreach volume are systematic about testing every new domain before it touches a real prospect. I see it repeatedly - smaller senders skipping testing because it feels like overhead before the campaign starts.
That logic is backwards. Cold email at any scale lives or dies on inbox placement. A 2% reply rate with solid deliverability and a clear offer produces pipeline. A 0.1% reply rate because emails are routing to spam produces nothing - regardless of how good the copy is, regardless of how well-targeted the list is. The sequence does not matter if no one sees it.
The cold email operators who scale sustainably are not the ones who found the perfect template. They are the ones who solved the infrastructure problems first and then turned their attention to messaging. Deliverability is table stakes. Everything else - timing, personalization, follow-up cadence, offer clarity - only matters if the email reaches the inbox.
Running the right email deliverability test tools, at the right stage, for the right use case is how you protect the channel. It is not a nice-to-have. It is the foundation that everything else is built on.