The Case for Including Traffic Analysis into Your Threat Model
by: [Erno Kuusela][erno]
What is traffic analysis?
The history of encryption on the Internet is a success story. I say this because, with Let’s Encrypt, the Internet community finally managed to democratize the use of transport layer security. Moreover, since the Snowden leaks, using encrypted connections in and between data centers is once more considered sensible instead of paranoid.
The continuous focus on denying adversaries plaintext access to communications has taken our attention away from the other half of the techniques used to attack the confidentiality of our communications.
Just as in traditional cryptography, the rule of thumb that attacks will always get better over time applies to traffic analysis. Machine learning and the current fast-paced AI technology development are ideally suited to automating and extending the reach of traffic analysis, thus making us more vulnerable to it than ever.
We, the information security professionals, are mostly not yet practiced enough in using our imagination to envision traffic analysis correlation risks. There is therefore good reason to believe we are missing relevant risks that are simply not covered in our continued training and education.
Why You Should Care About Your Communication Fingerprint
Our everyday use of networked computing creates a fingerprint of activity on the net that can be thought of as a big graph. Attacks that exploit this fingerprint have overlapping facets known by different names: side channels, traffic analysis, correlation attacks, traffic fingerprinting, etc.
Traffic analysis and the related technical terms and concepts above lack a widely understood umbrella term that could be used in less technical discussions of the wider subject. On Wikipedia, traffic analysis is explained in an abstract article that links it first of all to military intelligence. So in this sense we also have a limitation of language and awareness to deal with.
A Glimpse Under the Hood
For a glimpse under the hood, you can open your browser’s development tools network tab and have a look at the network requests flying around when using your favorite web service. As a thought exercise, think about how easy it would be to identify what you’re doing if someone had access to a significant number of these session traces with:
- network endpoints
- durations
- and even more detailed packet stream traces.
Secondly, imagine if your adversary has this information for all the networked apps you use, cross-correlated. And finally, imagine if they have this information for most people in your social or professional circle – also cross-correlated.
The magic of traffic analysis is that it stacks up: an observer who can correlate requests to one service can keep linking other related flows of observed events to build an even bigger graph of your actions. As an automated profile accumulates more variables and each of the correlation steps above is repeated, the set of people fitting the profile is vastly reduced.
Because all of this can be automated in bulk, the refined outputs can then be fed back into further automated analysis across sets of users, across longer time spans, and so on.
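As a toy illustration of how correlation steps stack up, the following Python sketch intersects a candidate set with one observation after another. Everything in it is invented for illustration: the `Observation` type, the `narrow_candidates` helper, the attribute descriptions and the user population are assumptions, not a description of any real tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One hypothetical traffic-derived attribute and the users it matches."""
    description: str
    matching_users: frozenset


def narrow_candidates(population, observations):
    """Intersect the candidate set with each observation in turn.

    Returns the candidate-set size after every step and the final set.
    """
    candidates = set(population)
    sizes = [len(candidates)]
    for obs in observations:
        candidates &= obs.matching_users
        sizes.append(len(candidates))
    return sizes, candidates


# Invented toy data: 1000 users identified by integers.
population = range(1000)
observations = [
    Observation("active on service A around 08:00", frozenset(range(0, 1000, 2))),
    Observation("session lasts roughly 40 minutes", frozenset(range(0, 1000, 3))),
    Observation("contacts service B within a minute", frozenset(range(0, 1000, 7))),
]

sizes, remaining = narrow_candidates(population, observations)
print(sizes)  # [1000, 500, 167, 24]
```

Each individual attribute here is weak on its own, yet three of them together shrink the candidate set from 1000 users to 24. Real attributes are rarely this independent, so the numbers only show the direction of the effect.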
In the bigger picture, I don’t actually know what the more carefully thought-out abuses of this data would be, because the topic is so neglected by the field of information security. It feels obvious that thinking further and red-teaming various traffic analysis approaches would reveal scenarios that leak more information, since, as I stated above:
traffic analysis stacks up.
Zooming out to People
If you can monitor a group of people and their communication patterns, you can deduce a lot about their relationships. Since you don’t need to analyze the content, this is relatively cheap and computationally easy to do for even very large sets of people.
There is an entire subfield of research called “social network analysis”, and in the open literature this kind of inference is pretty much what it revolves around. Besides being a privacy threat in itself, this can be used for target selection and for tailoring targeted attacks.
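To make the point about relationships concrete, here is a minimal Python sketch that derives a weighted contact graph from metadata alone. The record format `(sender, receiver, timestamp)` and the function names are assumptions made up for this example. No message content is used at any point.

```python
from collections import Counter


def build_contact_graph(records):
    """Count observed events per unordered pair of endpoints."""
    edges = Counter()
    for sender, receiver, _timestamp in records:
        if sender != receiver:
            edges[frozenset((sender, receiver))] += 1
    return edges


def strongest_ties(edges, top=3):
    """Return the most frequently communicating pairs, strongest first."""
    return [(tuple(sorted(pair)), count) for pair, count in edges.most_common(top)]


# Invented metadata records: who contacted whom, and when.
records = [
    ("alice", "bob", 1), ("bob", "alice", 2), ("alice", "bob", 3),
    ("alice", "carol", 4), ("dave", "carol", 5), ("carol", "dave", 6),
]

edges = build_contact_graph(records)
print(strongest_ties(edges))
# [(('alice', 'bob'), 3), (('carol', 'dave'), 2), (('alice', 'carol'), 1)]
```

A single pass with a counter is all this takes, which is why the approach scales to very large sets of people.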
Wireless Communication for the Win
How does your adversary get access to traffic profiles passively?
There are lots of ways for this to happen in the wired world, but listening to radio transmissions is an easy way of doing it. This can happen over Wi-Fi, Bluetooth and possibly over mobile communications. The beauty of these attacks is that the adversary doesn’t even have to be local. If they have compromised a nearby Wi-Fi-enabled device, they can use it to monitor the encrypted Wi-Fi traffic flying by.
But it’s only the spooks!
This is a common thesis when talking about traffic analysis concerns.
I would argue that it is not true.
Traffic analysis can be used by traditional cyber criminals after they have established initial access to a system or a network endpoint. It is actually a necessity for reconnaissance and lateral movement. A good example is a situation where a perpetrator has penetrated a VPN gateway and is actively looking for initial SSH connections to newly deployed cloud servers.
Traffic analysis lets the perpetrator identify the opportune moment, either by observing a cloud service API call that creates a new server or by gathering history about where SSH connections have been made in the past. A man-in-the-middle attack is possible here because of SSH’s trust on first use (TOFU) security model. In other words, if the user does not verify the fingerprint of the remote server on the first connection, the perpetrator will be able to intercept the whole connection and gain a foothold in the newly created cloud server.
After all, we all fastidiously verify the fingerprints of our first SSH connections, right?
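For readers who want to see what verifying a fingerprint amounts to, here is a small Python sketch. It computes the SHA256-style fingerprint of an OpenSSH public key line (base64 of the SHA-256 digest of the decoded key blob, without trailing padding) and compares it with a value obtained out of band. The function names are hypothetical, and the key below is a placeholder blob, not a real host key.

```python
import base64
import hashlib


def ssh_sha256_fingerprint(public_key_line):
    """Return the SHA256 fingerprint for one OpenSSH public key line.

    The line is expected to look like "<key-type> <base64-blob> [comment]".
    """
    fields = public_key_line.split()
    blob = base64.b64decode(fields[1])
    digest = hashlib.sha256(blob).digest()
    # OpenSSH prints the digest as base64 without the trailing '=' padding.
    return "SHA256:" + base64.b64encode(digest).decode("ascii").rstrip("=")


def matches_expected(public_key_line, expected_fingerprint):
    """True only if the offered key hashes to the fingerprint we expect."""
    return ssh_sha256_fingerprint(public_key_line) == expected_fingerprint


# Placeholder data: this blob is NOT a real key, it only exercises the code.
placeholder_blob = base64.b64encode(b"not a real host key").decode("ascii")
host_key_line = f"ssh-ed25519 {placeholder_blob} example-host"

expected = ssh_sha256_fingerprint(host_key_line)  # in practice: obtained out of band
print(matches_expected(host_key_line, expected))
```

The check is only as good as the channel the expected fingerprint arrives through. If it is read over the same intercepted connection, it proves nothing.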
Tooling used in break-ins always gets more sophisticated, and collecting and using this type of data is not rocket science.
I’ll give a few more use case examples:
- Your adversaries might be ad networks and your local ISP might be colluding with them to associate your VPN-protected traffic to ad impressions, whereafter they store and leak the de-anonymized impressions.
- Your home IoT devices likely have wireless traffic patterns that reveal when your house is empty or occupied, which is especially of interest to professional burglars.
- You might self-identify as a classic intelligence target, working on national defense, foreign service or an adjacent sector of interest.
Also consider the very large portion of the world’s population living under autocratic regimes with pervasive surveillance. But more importantly, I’d also argue that people should include actors who employ mass surveillance and spyware such as Pegasus in their threat model.
From a human rights and freedom from surveillance standpoint, we shouldn’t accept being subjected to pervasive surveillance. And likely targets of targeted attacks, for example system administrators and IT professionals working on solutions for clients, certainly should be more interested in this risk.
After all, the persecuted reporter, the political activist and the involuntary subject of organizational blanket surveillance are all on the list of likely targets.
What would work as a remedy?
In circles that have historically taken the problem more seriously, solutions have been in use for a long time. For example, during the Cold War (and after), the so-called “numbers stations” broadcast constant streams of encrypted data, or padding. An outside observer has no way to tell whether a given transmission carried a signal intended for somebody or not. No doubt similar channel padding is used in more modern technology in that field as well.
Current Internet access VPNs decidedly do not offer channels padded to fixed bandwidth. Someone should build and offer this.
When auditing or building new apps, padding is easier to introduce. For example, if you are building a machine-to-machine system that communicates periodically, it is often possible to avoid inefficient constant transmission by using clocked, constant-size exchanges instead.
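A rough Python sketch of such a clocked, constant-size exchange follows. The 256-byte frame size, the 2-byte length header and the function names are assumptions chosen for illustration. In a real system the padded frame would also have to be encrypted before sending, so that an observer cannot tell real payloads from padding.

```python
import struct

FRAME_SIZE = 256              # hypothetical fixed frame size in bytes
HEADER = struct.Struct("!H")  # 2-byte big-endian payload length


def pad_frame(payload: bytes, frame_size: int = FRAME_SIZE) -> bytes:
    """Wrap a payload into a frame that is always exactly frame_size bytes."""
    max_payload = frame_size - HEADER.size
    if len(payload) > max_payload:
        raise ValueError(f"payload exceeds {max_payload} bytes")
    padding = bytes(max_payload - len(payload))
    return HEADER.pack(len(payload)) + payload + padding


def unpad_frame(frame: bytes) -> bytes:
    """Recover the original payload from a padded frame."""
    (length,) = HEADER.unpack_from(frame)
    return frame[HEADER.size:HEADER.size + length]


def next_frame(queue: list) -> bytes:
    """Called once per clock tick: real data if queued, else an empty frame."""
    payload = queue.pop(0) if queue else b""
    return pad_frame(payload)


outbox = [b"sensor reading 42"]
first = next_frame(outbox)   # carries real data
second = next_frame(outbox)  # nothing queued, so only padding is sent
print(len(first), len(second), unpad_frame(first))
```

Because a frame of the same size leaves on every tick whether or not there is data, neither message length nor timing reveals when something actually happened. The cost is the bandwidth spent on empty frames.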
Tor is the obvious example of a somewhat traffic-analysis-resistant network in wide use, and other systems have been proposed in research, for example TARANET.
Welcome to the Other Side: Leveraging Traffic Analysis
It seems that in the business world, just as on the spook side, the information security field is more engaged in leveraging traffic analysis than in defending against it.
Many products from corporate security product vendors advertise features (Data Loss Prevention, DLP) for monitoring user behavior patterns and other features to detect malware traffic patterns even with encrypted traffic. These features are likely implemented using traffic analysis techniques.
This might actually be one reason why security product companies and consultancies are not that eager to develop or offer defenses against traffic analysis. Many business environments may simply be satisfied with the current balance of limited privacy versus monitorability.
Three Main Takeaways
Hopefully this write-up has motivated you to think about including traffic analysis in your threat model. To do this, I think these three points will help:
- Attacks always get better over time and we should expect the current fast-paced AI advances to make traffic analysis attacks more and more lucrative and widely accessible.
- The privacy impact of traffic analysis is widely recognized and addressing it as a threat is the logical next step after getting one’s traffic payload confidentiality in good shape (with the help of widely deployed and robust encryption mechanisms available on the Internet).
- Traffic analysis does stack up: correlating, linking and inferring human behavior patterns from observed traffic traces requires information security professionals to engage in a new kind of big-picture thinking when building their threat models.
Some Recommended Further Reading
- Resisting Traffic Analysis on Unclassified Networks. Roger Dingledine, Nick Mathewson, Catherine Meadows, Paul Syverson. Paper presented at the RTO IST Symposium on “Adaptive Defence in Unclassified Networks”.
- Confidentiality in the Face of Pervasive Surveillance: A Threat Model and Problem Statement. Internet Architecture Board. R. Barnes, B. Schneier, C. Jennings, T. Hardie, B. Trammell, C. Huitema, D. Borkmann.
- TARANET: Traffic-Analysis Resistant Anonymity at the NETwork layer. Chen Chen, Daniele E. Asoni, Adrian Perrig, David Barrera, George Danezis, Carmela Troncoso.
Credits
- Hero image courtesy of Sherman Kwan on Unsplash.