
I can’t stand Google scanning my emails anymore

September 27, 2024

Google is applying deep learning AI models to help secure Gmail inboxes


Gmail has been changing the way we think about email since 2004. In that time, it has gained an eye-popping 1.5 billion users, according to Google. I’m one of them, and the chances are high that you are as well. A lot has changed since then. A lot has stayed the same. One of the constants in the world of email is malware, specifically malware in a document attached to your email. Macro viruses, mainly infecting Microsoft Word documents, have been a thing since long before Gmail, of course: hands up who remembers Concept way back in 1995? Microsoft does, no doubt, as Concept kick-started the Word macro security problem that led to the default disabling of macros in Office 2000. That didn’t, unfortunately, stop the problem. The attachment malware problem has continued to evolve, and the defenses against this threat vector have evolved as well. Google reckons that malicious documents currently represent 58% of all malware that targets Gmail users. Now Google is fighting back by employing “Deep Learning” AI to prevent this malware from reaching your inbox.

Google blocks 99.9% of malicious Gmail attachments

It should come as no surprise that Google is investing in security. Earlier this year I reported how it had paid hacker bounties of $6.5 million (£5 million) to keep the internet safe. Then there was the pre-emptive step it took to suspend all paid extensions from the Chrome Web Store when an uptick in fraud was detected. It’s only natural, then, that Google should be using machine learning models as part of the Gmail security process, and it has been doing so behind the scenes for many years. Indeed, it was back in 2017 that Google announced machine learning models were helping prevent 99.9% of spam and phishing messages from reaching your inbox. That was a huge number, given that more than 50% of all the messages Gmail received at the time were spam. Fast-forward to 2020 and the machine learning models have been honed, with that 99.9% success rate still standing when it comes to spam, phishing and malware blocking. The malware scanning part of the equation is what interests me most, not least thanks to the crazy numbers involved. The Gmail scanner processes an incredible 300 billion Gmail attachments every single week, looking for malicious documents to block. Of the documents that are blocked, Google says that 63% are different from one day to the next. It’s this ever-evolving threat from malicious documents that prompted Google to deploy the next generation of machine learning scanners into the mix: ones based on deep learning.
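To put that weekly figure into perspective, here is a quick back-of-the-envelope calculation. The only input is the 300 billion attachments per week quoted by Google; the assumption that the load is spread evenly across the week is mine, made purely to get a feel for the scale.

```python
# Back-of-the-envelope scale check using the weekly figure quoted by Google.
# Assumption (mine, not Google's): attachments arrive evenly over the week.
ATTACHMENTS_PER_WEEK = 300_000_000_000  # 300 billion

SECONDS_PER_WEEK = 7 * 24 * 60 * 60

per_day = ATTACHMENTS_PER_WEEK / 7
per_second = ATTACHMENTS_PER_WEEK / SECONDS_PER_WEEK

print(f"{per_day / 1e9:.1f} billion attachments per day")
print(f"{per_second:,.0f} attachments per second")
```

On those assumptions, that works out to roughly 43 billion attachments a day, or close to half a million every second.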

How Google is using deep learning to keep your inbox clean of malware

Plenty has been written already that will let you take a deep dive into what deep learning is and how it is being applied commercially. At the risk of hugely oversimplifying the concept, you can think of machine learning as a branch of “AI” that employs self-modifying algorithms, which need structured data fed into the system to work properly; in other words, it needs human intervention to succeed. Deep learning is more like the human brain, to the small degree that it can be, using a neural network approach to data processing and stacking layers of these networks one on top of the other to become a “deep” neural network. Deep learning is very good at certain things, such as identifying photos and categorizing them, or understanding spoken commands. Google already uses deep learning for these things, and now you can add malware scanning into the mix.
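For the curious, the "stacking layers" idea can be shown in a few lines of plain Python. This is a toy sketch only: the weights are hand-picked, hypothetical numbers rather than anything learned from data, and it bears no relation to the model Google actually runs. Each layer transforms the output of the one before it, and the final value is squashed into a score between 0 and 1.

```python
import math

def relu(values):
    """Keep positive values, zero out negatives (a common activation)."""
    return [max(0.0, v) for v in values]

def dense(inputs, weights, biases):
    """One fully connected layer: a weighted sum per output neuron."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]

def sigmoid(x):
    """Squash any number into the range 0..1."""
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical, hand-picked weights. A real network learns these from data.
HIDDEN_LAYERS = [
    ([[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]], [0.0, 0.1]),  # 3 inputs -> 2 values
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),              # 2 values -> 2 values
]
OUTPUT_LAYER = ([[0.7, -0.3]], [0.0])                     # 2 values -> 1 score

def forward(features):
    """Pass the features through each stacked layer in turn."""
    activations = features
    for weights, biases in HIDDEN_LAYERS:
        activations = relu(dense(activations, weights, biases))
    (logit,) = dense(activations, *OUTPUT_LAYER)
    return sigmoid(logit)

score = forward([1.0, 0.0, 2.0])
print(f"score = {score:.3f}")
```

Adding more entries to `HIDDEN_LAYERS` is what makes a network "deeper"; real systems differ mainly in having far more layers, far more weights, and a training process that sets those weights automatically.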

The numbers don’t lie; deep learning detection rates are on the up

According to Google, the new deep learning scanner has been working since the end of 2019. During this time, it has increased the “daily detection coverage of Office documents that contain malicious scripts by 10%.” That’s another huge number within the context of the sheer scale of documents being scanned by Google every day. It’s a number that gets even bigger when you look at something the scanner does particularly well, namely “detecting adversarial, bursty attacks.” By this, Google means the kind of botnet-driven mass document distribution that tends to come in spurts rather than at a measured pace. In those cases, deep learning has improved the malicious document detection rate by 150%. It works by employing a TensorFlow deep learning model and a custom document analyzer for each file type. TensorFlow is an open-source software library used in dataflow and differentiable programming, and Google trains its model with the TensorFlow Extended (TFX) platform. The custom document analyzers are key, taking care of not only parsing the attached documents but also identifying attack patterns and deobfuscating content.
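The overall shape of that design, one analyzer per file type feeding a shared model, might look something like the sketch below. Everything in it is hypothetical: the analyzer names, the features they extract, the weights and the threshold are my own stand-ins for illustration, and the scoring function is a simple weighted sum rather than a TensorFlow model. Google has not published its implementation.

```python
import re
from typing import Callable, Dict, List

# Hypothetical analyzers: each turns raw attachment content into numeric
# features. Real analyzers would also parse the format and deobfuscate it.
def analyze_office_doc(content: str) -> List[float]:
    has_auto_macro = 1.0 if "AutoOpen" in content else 0.0
    # Long unbroken encoded-looking runs can hint at an obfuscated payload.
    encoded_runs = float(len(re.findall(r"[A-Za-z0-9+/]{40,}", content)))
    return [has_auto_macro, encoded_runs]

def analyze_pdf(content: str) -> List[float]:
    has_script = 1.0 if "/JavaScript" in content else 0.0
    return [has_script, 0.0]

# One analyzer per file type, looked up by a (hypothetical) type label.
ANALYZERS: Dict[str, Callable[[str], List[float]]] = {
    "doc": analyze_office_doc,
    "pdf": analyze_pdf,
}

def score(features: List[float]) -> float:
    """Stand-in for a trained model: a weighted sum capped at 1.0."""
    weights = [0.6, 0.2]  # hypothetical, not learned from data
    return min(1.0, sum(w * f for w, f in zip(weights, features)))

def scan(file_type: str, content: str, threshold: float = 0.5) -> bool:
    """Return True if the attachment would be blocked in this sketch."""
    analyzer = ANALYZERS.get(file_type)
    if analyzer is None:
        return False  # unknown file types are out of scope here
    return score(analyzer(content)) >= threshold

print(scan("doc", "Sub AutoOpen() ... End Sub"))
print(scan("doc", "Meeting notes for Tuesday"))
```

The point of the split is that the format-specific work lives in the analyzers, so supporting a new file type means adding one entry to the lookup table rather than changing the model.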

“Malware evolves at a rate that the security industry struggles to keep up with,” Jake Moore, a cybersecurity specialist at ESET, said, “but using deep learning looks like it could help minimize the risk of malicious software reaching inboxes around the world.”

