- Bytes to Schlep? Use a FEP: Hiding Protocol Metadata with Fully Encrypted Protocols [paper] [BibTeX]
- Full version of CCS 2024 conference paper, adding appendices containing definitions, proofs, and experimental details.
Ellis Fenske and Aaron Johnson
In Proceedings of the 31st ACM Conference on Computer and Communications Security (CCS 2024), October 2024.
Show abstract
Fully Encrypted Protocols (FEPs) have arisen in practice as a technique to avoid network censorship. Such protocols are designed to produce messages that appear completely random. This design hides communications metadata, such as version and length fields, and makes it difficult to even determine what protocol is being used. Moreover, these protocols frequently support padding to hide the length of protocol fields and the contained message. These techniques have relevance well beyond censorship circumvention, as protecting protocol metadata has security and privacy benefits for all Internet communications. The security of FEP designs depends on cryptographic assumptions, but neither security definitions nor proofs exist for them. We provide novel security definitions that capture the metadata-protection goals of FEPs. Our definitions are given in both the datastream and datagram settings, which model the ubiquitous TCP and UDP interfaces available to protocol designers. We prove relations among these new notions and existing security definitions. We further present new FEP constructions and prove their security. Finally, we survey existing FEP candidates and characterize the extent to which they satisfy FEP security. We identify novel ways in which these protocols are identifiable, including their responses to the introduction of data errors and the sizes of their smallest protocol messages.
- Repositioning Real-World Website Fingerprinting on Tor [paper] [BibTeX]
Rob Jansen, Ryan Wails, and Aaron Johnson
In Proceedings of the 23rd ACM Workshop on Privacy in the Electronic Society (WPES 2024), October 2024.
Show abstract
Website fingerprinting (WF) is a potentially devastating attack against Tor because it can break anonymity by linking a Tor user to their purportedly unlinkable internet destinations. Previous work proposes that an adversary trains WF classifiers either on synthetic traces that are programmatically generated using automated tools, or on real-world traces collected from one or more Tor exit relays. However, no existing work accurately represents a real-world threat model in which a WF adversary's classifiers must be tested against real-world entry traces that are naturally created by real Tor users. In this paper we present Retracer, a novel method for producing labeled entry traces of genuine Tor traffic patterns. Retracer uses high-fidelity network simulation to accurately transform real-world exit traces into entry traces prior to training and testing WF classifiers. After first demonstrating that Retracer accurately transforms exit traces into entry traces, we then apply it to the recently released GTT23 dataset in a WF evaluation in which more than 3,500 classifiers are tested against, for the first time, labeled entry traces of real Tor traffic patterns. Our evaluation yields the best available estimate of the performance an adversary can achieve when directing WF attacks at real Tor users.
- A Measurement of Genuine Tor Traces for Realistic Website Fingerprinting [paper] [BibTeX]
Rob Jansen, Ryan Wails, and Aaron Johnson
Technical Report (arXiv:2404.07892), April 2024.
Show abstract
Website fingerprinting (WF) is a dangerous attack on web privacy because it enables an adversary to predict the website a user is visiting, despite the use of encryption, VPNs, or anonymizing networks such as Tor. Previous WF work almost exclusively uses synthetic datasets to evaluate the performance and estimate the feasibility of WF attacks despite evidence that synthetic data misrepresents the real world. In this paper we present GTT23, the first WF dataset of genuine Tor traces, which we obtain through a large-scale measurement of the Tor network. GTT23 represents real Tor user behavior better than any existing WF dataset, is larger than any existing WF dataset by at least an order of magnitude, and will help ground the future study of realistic WF attacks and defenses. In a detailed evaluation, we survey 25 WF datasets published over the last 15 years and compare their characteristics to those of GTT23. We discover common deficiencies of synthetic datasets that make them inferior to GTT23 for drawing meaningful conclusions about the effectiveness of WF attacks directed at real Tor users. We have made GTT23 available to promote reproducible research and to help inspire new directions for future work.
- Waks-On/Waks-Off: Fast Oblivious Offline/Online Shuffling and Sorting with Waksman Networks [paper] [code] [BibTeX]
- Full version of CCS 2023 conference paper, adding appendices containing proofs, implementation optimizations, and additional experiments.
Sajin Sasy, Aaron Johnson, and Ian Goldberg
In Proceedings of the 30th ACM Conference on Computer and Communications Security (CCS 2023), November 2023.
Show abstract
As more privacy-preserving solutions leverage trusted execution environments (TEEs) like Intel SGX, it becomes pertinent that these solutions can by design thwart TEE side-channel attacks that research has brought to light. In particular, such solutions need to be fully oblivious to circumvent leaking private information through memory or timing side channels.
In this work, we present fast fully oblivious algorithms for shuffling and sorting data. Oblivious shuffling and sorting are two fundamental primitives that are frequently used for permuting data in privacy-preserving solutions. We present novel oblivious shuffling and sorting algorithms in the offline/online model such that the bulk of the computation can be done in an offline phase that is independent of the data to be permuted. The resulting online phase provides performance improvements over state-of-the-art oblivious shuffling and sorting algorithms both asymptotically (O(β n log n) vs. O(β n log2 n)) and concretely (> 5× and > 3× speedups), when permuting n items each of size β.
Our work revisits Waksman networks, and it uses the key observation that setting the control bits of a Waksman network for a uniformly random shuffle is independent of the data to be shuffled. However, setting the control bits of a Waksman network efficiently and fully obliviously poses a challenge, and we provide a novel algorithm to this end. The total costs (inclusive of offline computation) of our WaksShuffle shuffling algorithm and our WaksSort sorting algorithm are lower than all other fully oblivious shuffling and sorting algorithms when the items are at least moderately sized (i.e., β > 1400 B), and the performance gap only widens as the item sizes increase. Furthermore, WaksShuffle improves the online cost of oblivious shuffling by > 5× for shuffling 220 items of any size; similarly WaksShuffle+QS, our other sorting algorithm, provides > 2.7× speedups in the online cost of oblivious sorting.
- Throwing Your Weight Around: Fixing Tor's Positional Weighting [paper] [BibTeX]
Aaron Johnson, Aaron D. Jaggard, and Paul Syverson
In Proceedings on Privacy Enhancing Technologies (PoPETS 2023), Vol. 2023, Issue 4, July 2023.
Show abstract
We analyze deficiencies in Tor's positional weighting system, identifying cases in which the system either fails to produce valid weights or fails to properly load balance across positions. We describe how an attacker can take advantage of these failures to reduce Tor's performance, thereby also easing censorship and surveillance through a denial-of-service attack. Our attacks exploit incorrectly determined positional-weight equations by adding new capacity to the network or, for even more covertness, by just minor changes in the status of existing malicious relays. Our analysis of past Tor consensuses shows that these attacks could have reduced the throughput of the network by as much as 45% due only to their triggering of Tor's flawed position weights. Rather than a mere patch to Tor's currently ad hoc scheme, we then propose a new, systematic method for deriving positional weights and propose two goal sets generated using that method. We derive new sets of weights, prove that they satisfy these goal sets, and give examples of how they would change the weights from the current system. Tor could use our results to quickly fix the main deficiencies of its positional weights as well as adopt a better approach long-term.
- Proteus: Programmable Protocols for Censorship Circumvention [paper] [BibTeX]
Ryan Wails, Rob Jansen, Aaron Johnson, and Micah Sherr
In Free and Open Communications on the Internet (FOCI 2023), July 2023.
Show abstract
We present the Proteus system for censorship circumvention. Proteus provides a programmable protocol environment in which new communication protocols can be expressed as concise and comprehensible specification files. This design allows clients and proxies to quickly respond to new censorship strategies just by installing new specification files. Proteus improves on prior programmable designs by improving host safety from malicious specifications, providing a specification language that is complete and comprehensible to non-specialists, and supporting multiple simultaneous protocols at a proxy for versioning and localization. This paper represents work in progress and provides an overview of the Proteus design, as well as examples showing that it can express existing encrypted protocols.
- Security Notions for Fully Encrypted Protocols [paper] [BibTeX]
Ellis Fenske and Aaron Johnson
In Free and Open Communications on the Internet (FOCI 2023), February 2023.
Show abstract
A common strategy to avoid network censorship is to use a fully encrypted protocol (FEP). Such protocols are designed to make all bytes appear uniformly random and make packet sizes unpredictable. These goals depend on cryptographic assumptions, but no mathematical formalization of them has been presented. We give the first definitions for FEPs, examples of how existing protocols fail to satisfy them, and a novel protocol that does satisfy them.
- Fast Fully Oblivious Compaction and Shuffling [paper] [code] [BibTeX]
- Full version of CCS 2022 conference paper, including (1) appendices with proofs and experimental details, (2) small corrections and improvements to the pseudocode for ORCompact and ORCPar (Figures 2 and 4), and (3) small corrections and improvements to the pseudocode for BORPStream (Figures 9 and 10).
Sajin Sasy, Aaron Johnson, and Ian Goldberg
In Proceedings of the 29th ACM Conference on Computer and Communications Security (CCS 2022), November 2022.
Show abstract
Several privacy-preserving analytics frameworks have been proposed that use trusted execution environments (TEEs) like Intel SGX. Such frameworks often use compaction and shuffling as core primitives. However, due to advances in TEE side-channel attacks, these primitives, and the applications that use them, should be fully oblivious; that is, perform instruction sequences and memory accesses that do not depend on the secret inputs. Such obliviousness would eliminate the threat of leaking private information through memory or timing side channels, but achieving it naively can result in a significant performance cost.
In this work, we present fast, fully oblivious algorithms for compaction and shuffling. We implement and evaluate our designs to show that they are practical and outperform the state of the art. Our oblivious compaction algorithm, ORCompact, is always faster than the best alternative and can yield up to a 5x performance improvement. For oblivious shuffling, we provide two novel algorithms: ORShuffle and BORPStream. ORShuffle outperforms prior fully oblivious shuffles in all experiments, and it provides the largest speed increases—up to 1.8x—when shuffling a large number of small items. BORPStream outperforms all other algorithms when shuffling a large number of large items, with a speedup of up to 1.4x in such cases. It can obtain even larger performance improvements in application settings where the items to shuffle arrive incrementally over time, obtaining a speedup of as much as 4.2x. We additionally give parallel versions of all of our algorithms, prove that they have low parallel step complexity, and experimentally show a 5–6x speedup on an 8-core processor.
Finally, ours is the first work with the explicit goal of ensuring full obliviousness of complex functionalities down to the implementation level. To this end, we design Fully Oblivious Assembly Verifier (FOAV), a tool that verifies the binary has no secret-dependent conditional branches.
- Differentially Private Maximal Information Coefficients [paper] [code] [BibTeX]
John Lazarsfeld, Aaron Johnson, and Emmanuel Adeniran
In Proceedings of the 39th International Conference on Machine Learning (ICML 2022), PMLR Volume 162, July 2022.
Show abstract
The Maximal Information Coefficient (MIC) is a powerful statistic to identify dependencies between variables. However, it may be applied to sensitive data, and publishing it could leak private information. As a solution, we present algorithms to approximate MIC in a way that provides differential privacy. We show that the natural application of the classic Laplace mechanism yields insufficient accuracy. We therefore introduce the MICr statistic, which is a new MIC approximation that is more compatible with differential privacy. We prove MICr is a consistent estimator for MIC, and we provide two differentially private versions of it. We perform experiments on a variety of real and synthetic datasets. The results show that the private MICr statistics significantly outperform direct application of the Laplace mechanism. Moreover, experiments on real-world datasets show accuracy that is usable when the sample size is at least moderately large.
- Accountable Private Set Cardinality for Distributed Measurement [paper] [code] [BibTeX]
- Journal version of CCS 2017 conference paper. Improvements over the conference version include (1) a modified protocol to provide accountability, (2) added measures to prevent input modification, and (3) a description of how to modify the protocol to compute set-intersection cardinality.
Ellis Fenske, Akshaya Mani, Aaron Johnson, and Micah Sherr
In ACM Transactions on Privacy and Security (TOPS),
Volume 25, Number 4, Article No. 25, May 2022.
Show abstract
We introduce cryptographic protocols for securely and efficiently computing the cardinality of set union and set intersection. Our private set-cardinality protocols (PSC) are designed for the setting in which a large set of parties in a distributed system makes observations, and a small set of parties with more resources and higher reliability aggregates the observations. PSC allows for secure and useful statistics gathering in privacy-preserving distributed systems. For example, it allows operators of anonymity networks such as Tor to securely answer the questions: How many unique users are using the network? and How many hidden services are being accessed?
We prove the correctness and security of PSC in the Universal Composability framework against an active adversary that compromises all but one of the aggregating parties. Although successful output cannot be guaranteed in this setting, PSC either succeeds or terminates with an abort, and we furthermore make the adversary accountable for causing an abort by blaming at least one malicious party. We also show that PSC prevents adaptive corruption of the data parties from revealing past observations, which prevents them from being victims of targeted compromise, and we ensure safe measurements by making outputs differentially private.
We present a proof-of-concept implementation of PSC and use it to demonstrate that PSC operates with low computational overhead and reasonable bandwidth. It can count tens of thousands of unique observations from tens to hundreds of data-collecting parties while completing within hours. PSC is thus suitable for daily measurements in a distributed system.
- Consistency of the Maximal Information Coefficient Estimator [paper] [BibTeX]
John Lazarsfeld and Aaron Johnson
Technical Report (arXiv:2107.03836), July 2021.
Show abstract
The Maximal Information Coefficient (MIC) of Reshef et al. (Science, 2011) is a statistic for measuring dependence between variable pairs in large datasets. In this note, we prove that MIC is a consistent estimator of the corresponding population statistic MIC*. This corrects an error in an argument of Reshef et al. (JMLR, 2016), which we describe.
- FlashFlow: A Secure Speed Test for Tor [paper] [BibTeX]
Matthew Traudt, Rob Jansen, and Aaron Johnson
In Proceedings of the 41st IEEE International Conference on Distributed Computing Systems (ICDCS 2021), July 2021.
Show abstract
The Tor network uses a measurement system called TorFlow to estimate its relays'
forwarding capacity and to balance traffic among them. This system has been
shown to be vulnerable to adversarial manipulation, and inaccuracies even in
benign circumstances have long been observed. To solve the issues with security
and accuracy, we present FlashFlow, a system to measure the capacity of Tor
relays. Our analysis shows that FlashFlow limits a malicious relay to obtaining
a capacity estimate at most 1.33 times its true capacity. Through realistic
Internet experiments, we find that FlashFlow measures relay capacity with
≥89% accuracy 95% of the time. Through simulation, we find that FlashFlow can
measure the entire Tor network in less than 5 hours using 3 measurers with 1
Gbit/s of bandwidth each. Performance simulations using FlashFlow for load
balancing shows that, compared to TorFlow, network weight error decreases by
86%, while the median of 50 KiB, 1 MiB, and 5 MiB transfer times decreases by
15%, 29%, and 37%, respectively. Moreover, FlashFlow yields more consistent
client performance: the median rate of transfer timeouts decreases by 100%,
while the standard deviation of 50 KiB, 1 MiB, and 5 MiB transfer times
decreases by 55%, 61%, and 41%, respectively. We also find that the performance
improvements increase relative to TorFlow as the total client-traffic load
increases, demonstrating that FlashFlow is better suited to supporting network
growth.
- On the Accuracy of Tor Bandwidth Estimation [paper] [code & data] [BibTeX]
Rob Jansen and Aaron Johnson
In Proceedings of the 22nd Passive and Active Measurement Conference (PAM 2021), March 2021.
Show abstract
The Tor network estimates its relays' bandwidths using relay self-measurements of client traffic speeds. These estimates largely determine how existing traffic load is balanced across relays, and they are used to evaluate the network's capacity to handle future traffic load increases. Thus, their accuracy is important to optimize Tor's performance and strategize for growth. However, their accuracy has never been measured. We investigate the accuracy of Tor's capacity estimation with an analysis of public network data and an active experiment run over the entire live network. Our results suggest that the bandwidth estimates underestimate the total network capacity by at least 50% and that the errors are larger for high-bandwidth and low-uptime relays. Our work suggests that improving Tor's bandwidth measurement system could improve the network's performance and better inform plans to handle future growth.
- CLAPS: Client-Location-Aware Path Selection in Tor [paper] [BibTeX]
Florentin Rochet, Ryan Wails, Aaron Johnson, Prateek Mittal, and Olivier Pereira
In Proceedings of the 27th ACM Conference on Computer and Communications Security (CCS 2020), November 2020.
Show abstract
Much research has investigated improving the security and performance of Tor by having Tor clients choose paths through the network in a way that depends on the client's location. However, this approach has been demonstrated to lead to serious deanonymization attacks. Moreover, we show how in some scenarios it can lead to significant performance degradation. For example, we demonstrate that using the recently-proposed Counter-RAPTOR system when guard bandwidth isn't abundant could increase median download times by 28.7%. We propose the CLAPS system for performing client-location-aware path selection, which fixes the known security and performance issues of existing designs. We experimentally compare the security and performance of CLAPS to Counter-RAPTOR and DeNASA. CLAPS puts a strict bound on the leakage of information about the client's location, where the other systems could completely reveal it after just a few connections. It also guarantees a limit on the advantage that an adversary can obtain by strategic relay placement, which we demonstrate to be overwhelming against the other systems. Finally, due to a powerful formalization of path selection as an optimization problem, CLAPS is approaching or even exceeding the original goals of algorithms to which it is applied, while solving their known deficiencies.
- Stormy: Statistics in Tor by Measuring Securely [paper] [BibTeX]
Ryan Wails, Aaron Johnson, Daniel Starin, Arkady Yerukhimovich, and S. Dov Gordon
In Proceedings of the 26th ACM Conference on Computer and Communications Security (CCS 2019), November 2019.
Show abstract
Tor is a tool for Internet privacy with millions of daily users.
The Tor system benefits in many ways from information gathered
about the operation of its network. Measurements guide operators
in diagnosing problems, direct the efforts of developers, educate users about
the level of privacy they obtain, and inform policymakers about Tor's impact.
However, data collection and reporting can degrade user privacy, contradicting
Tor's goals. Existing approaches to measuring Tor have limited capabilities and
security weaknesses.
We present Stormy, a general-purpose, privacy-preserving measurement system
that overcomes these limitations. Stormy uses secure multiparty
computation (MPC) to compute any function of the observations made by Tor
relays, while keeping those observations secret. Stormy makes use of
existing efficient MPC protocols that are secure in the malicious model, and in
addition it includes a novel input-sharing protocol that is secure,
efficient, and fault tolerant. The protocol is non-interactive, which is
consistent with how relays currently submit measurements, and it allows the
relays to go offline after input submission, even while ensuring that an honest
relay will not have its input excluded or modified. The input-sharing protocol
is compatible with MPC protocols computing on authenticated values and may be of
independent interest.
We show how Stormy can be deployed in two realistic models: (1) run
primarily by a small set of dedicated authorities, or (2) run decentralized
across the relays in the Tor network. Stormy scales efficiently to Tor's
thousands of relays, tolerates network churn, and provides security depending
only on either Tor's existing trust assumption that at least one authority is
honest (in the first model) or the existing assumption that a large fraction of relay bandwidth is
honest (in the second model).
We demonstrate how to use the system to compute two broadly-applicable
statistics: the median of relay inputs and the cardinality of set-union across
relays. We implement Stormy and experimentally evaluate system
performance. When Stormy is run among authorities we can perform 151 median
computations or 533 set-union cardinalities over 7,000 relay inputs in a single
day. When run among the relays themselves, Stormy can perform 36 median
computations or 134 set union cardinalities per day. Thus, both deployments
enable non-trivial analytics to be securely computed in the Tor network.
- Guard Placement Attacks on Path Selection Algorithms for Tor [paper] [BibTeX]
Gerry Wan, Aaron Johnson, Ryan Wails, Sameer Wagh, and Prateek Mittal
In Proceedings on Privacy Enhancing Technologies (PoPETS 2019), Vol. 2019, Issue 4, July 2019.
Show abstract
The popularity of Tor has made it an attractive target for a variety of deanonymization and fingerprinting attacks. Location-based path selection algorithms have been proposed as a countermeasure to defend against such attacks. However, adversaries can exploit the location-awareness of these algorithms by strategically placing relays in locations that increase their chances of being selected as a client's guard. Being chosen as a guard facilitates website fingerprinting and traffic correlation attacks over extended time periods.
In this work, we rigorously define and analyze the guard placement attack. We present novel guard placement attacks and show that three state-of-the-art path selection algorithms—Counter-RAPTOR, DeNASA, and LASTor—are vulnerable to these attacks,
overcoming defenses considered by all three systems.
For instance, in one attack, we show that an adversary contributing only 0.216% of Tor's total bandwidth can attain an average selection probability of 18.22%, 84× higher than what it would be under Tor currently. Our findings indicate that existing location-based path selection algorithms allow guards to achieve disproportionately high selection probabilities relative to the cost required to run the guard.
Finally, we propose and evaluate a generic defense mechanism that provably defends any path selection algorithm against guard placement attacks. We run our defense mechanism on each of the three path selection algorithms, and find that our mechanism significantly enhances the security of these algorithms against guard placement attacks with only minimal impact to the goals or performance of the original algorithms.
- Understanding Tor Usage with Privacy-Preserving Measurement [paper] [BibTeX]
Akshaya Mani, T Wilson-Brown, Rob Jansen, Aaron Johnson, and Micah Sherr
In Proceedings of the Internet Measurement Conference 2018 (IMC 2018).
Show abstract
The Tor anonymity network is difficult to measure because, if not done carefully, measurements could risk the privacy (and potentially the safety) of the network's users. Recent work has proposed the use of differential privacy and secure aggregation techniques to safely measure Tor, and preliminary proof-of-concept prototype tools have been developed in order to demonstrate the utility of these techniques. In this work, we significantly enhance two such tools — PrivCount and Private Set-Union Cardinality — in order to support the safe exploration of new types of Tor usage behavior that have never before been measured. Using the enhanced tools, we conduct a detailed measurement study of Tor covering three major aspects of Tor usage: how many users connect to Tor and from where do they connect, with which destinations do users most frequently communicate, and how many onion services exist and how are they used. Our findings include that Tor has ~8 million daily users, a factor of four more than previously believed. We also find that ~40% of the sites accessed over Tor have a torproject.org domain name, ~10% of the sites have an amazon.com domain name, and ~80% of the sites have a domain name that is included in the Alexa top 1 million sites list. Finally, we find that ~90% of lookups for onion addresses are invalid, and more than 90% of attempted connections to onion services fail.
- Tempest: Temporal Dynamics in Anonymity Systems [paper] [BibTeX]
Ryan Wails, Yixin Sun, Aaron Johnson, Mung Chiang, and Prateek Mittal
In Proceedings on Privacy Enhancing Technologies (PoPETS 2018), Vol. 2018, Issue 3, July 2018.
Show abstract
Many recent proposals for anonymous communication omit from their security analyses a consideration
of the effects of time on important system components.
In practice, many components of anonymity systems, such as the client location and
network structure, exhibit changes and patterns over time.
In this paper, we focus on the effect of such temporal dynamics on the security
of anonymity networks. We present Tempest, a suite of novel attacks based on
(1) client mobility, (2) usage patterns, and (3) changes in the underlying network routing.
Using experimental analysis on real-world datasets, we demonstrate that these
temporal attacks degrade user privacy across a wide range of anonymity networks,
including deployed systems such as Tor; path-selection protocols for Tor such as DeNASA, TAPS, and
Counter-RAPTOR; and network-layer anonymity protocols for Internet routing such as
Dovetail and HORNET. The degradation is in some cases surprisingly severe. For example, a single
host failure or network route change could quickly and with high certainty identify the client's ISP
to a malicious host or ISP.
The adversary behind each attack is relatively weak — generally passive and
in control of one network location or a small number of hosts.
Our findings suggest that designers of anonymity systems should rigorously consider the impact of
temporal dynamics when analyzing anonymity.
- Distributed Measurement with Private Set-Union Cardinality [paper] [BibTeX]
Ellis Fenske, Akshaya Mani, Aaron Johnson, and Micah Sherr
In Proceedings of the 24th ACM Conference on Computer and Communications Security (CCS 2017).
Show abstract
This paper introduces a cryptographic protocol for efficiently aggregating a count of unique items across a set of data parties privately — that is, without exposing any information other than the count. Our protocol allows for more secure and useful statistics gathering in privacy-preserving distributed systems such as anonymity networks; for example, it allows operators of anonymity networks such as Tor to securely answer the questions: how many unique users are using the distributed service? and how many hidden services are being accessed? We formally prove the correctness and security of our protocol in the Universal Composability framework against an active adversary that compromises all but one of the aggregation parties. We also show that the protocol provides security against adaptive corruption of the data parties, which prevents them from being victims of targeted compromise. To ensure safe measurements, we also show how the output can satisfy differential privacy.
We present a proof-of-concept implementation of the private set-union cardinality protocol (PSC) and use it to demonstrate that PSC operates with low computational overhead and reasonable bandwidth. In particular, for reasonable deployment sizes, the protocol run at timescales smaller than the typical measurement period would be and thus is suitable for distributed measurement.
- PeerFlow: Secure Load Balancing in Tor [paper] [BibTeX]
Aaron Johnson, Rob Jansen, Nicholas Hopper, Aaron Segal, and Paul Syverson
In Proceedings on Privacy Enhancing Technologies (PoPETS 2017), Vol. 2017, Issue 2, April 2017.
Show abstract
We present PeerFlow, a system to securely load balance client traffic in Tor. Security in Tor requires that no adversary handle too much traffic. However, Tor relays are run by volunteers who cannot be trusted to report the relay bandwidths, which Tor clients use for load balancing. We show that existing methods to determine the bandwidths of Tor relays allow an adversary with little bandwidth to attack large amounts of client traffic. These methods include Tor's current bandwidth-scanning system, TorFlow, and the peer-measurement system EigenSpeed. We present an improved design called PeerFlow that uses a peer-measurement process both to limit an adversary's ability to increase his measured bandwidth and to improve accuracy. We show our system to be secure, fast, and efficient. We implement PeerFlow in Tor and demonstrate its speed and accuracy in large-scale network simulations.
- Avoiding The Man on the Wire: Improving Tor's Security with Trust-Aware Path Selection [paper] [BibTeX]
Aaron Johnson, Rob Jansen, Aaron D. Jaggard, Joan Feigenbaum, and Paul Syverson
In Proceedings of the 24th Network and Distributed System Security Symposium (NDSS 2017).
Show abstract
Tor users are vulnerable to deanonymization by an adversary that can observe some Tor relays
or some parts of the network. We demonstrate that previous network-aware path-selection
algorithms that propose to solve this problem are vulnerable to attacks across multiple Tor
connections. We suggest that users use trust to choose the paths through Tor that
are less likely to be observed, where trust is flexibly modeled as a probability
distribution on the location of the user's adversaries, and we present the Trust-Aware Path
Selection algorithm for Tor that helps users
avoid traffic-analysis attacks while still choosing paths that could have
been selected by many other users. We evaluate this algorithm in two settings using a high-level map
of Internet routing: (i) users try to avoid a single global
adversary that has an independent chance to control each Autonomous System organization, Internet
Exchange Point organization, and Tor relay family, and (ii) users try to avoid
deanonymization by any single country. We also examine the performance of Trust-Aware Path
selection using the Shadow network simulator.
- Safely Measuring Tor [paper] [BibTeX]
Rob Jansen and Aaron Johnson
In Proceedings of the 23rd ACM Conference on Computer and Communications Security (CCS 2016).
Show abstract
Tor is a popular network for anonymous communication. The usage and operation of Tor is not well-understood, however, because its privacy goals make common measurement approaches ineffective or risky. We present PrivCount, a system for measuring the Tor network designed with user privacy as a primary goal. PrivCount securely aggregates measurements across Tor relays and over time to produce differentially private outputs. PrivCount improves on prior approaches by enabling flexible exploration of many diverse kinds of Tor measurements while maintaining accuracy and privacy for each. We use PrivCount to perform a measurement study of Tor of sufficient breadth and depth to inform accurate models of Tor users and traffic. Our results indicate that Tor has 710,000 users connected but only 550,000 active at a given time, that Web traffic now constitutes 91% of data bytes on Tor, and that the strictness of relays' connection policies significantly affects the type of application data they forward.
- Defending Tor from Network Adversaries: A Case Study of Network Path Prediction [paper] [BibTeX]
Joshua Juen, Aaron Johnson, Anupam Das, Nikita Borisov, and Matthew Caesar
In Proceedings on Privacy Enhancing Technologies (PoPETS 2015), Vol. 2015, Issue 2, June 2015.
Show abstract
The Tor anonymity network has been shown vulnerable to traffic analysis attacks by autonomous systems (ASes) and Internet exchanges (IXes), which can observe different overlay hops belonging to the same circuit. We evaluate whether network path prediction techniques provide an accurate picture of the threat from such adversaries, and whether they can be used to avoid this threat. We perform a measurement study by collecting 17.2 million traceroutes from Tor relays to destinations around the Internet. We compare the collected traceroute paths to predicted paths using state-of-the-art path inference techniques. We find that traceroutes present a very different picture, with the set of ASes seen in the traceroute path differing from the predicted path 80% of the time. We also consider the impact that prediction errors have on Tor security. Using a simulator to choose paths over a week, our traceroutes indicate a user has nearly a 100% chance of at least one compromise in a week with 11\% of total paths containing an AS compromise and less than 1% containing an IX compromise when using default Tor selection. We find modifying the path selection to choose paths predicted to be safe lowers total paths with an AS compromise to 0.14% but still presents a 5-11% chance of at least one compromise in a week while making 5% of paths fail, with 96% of failures due to false positives in path inferences. Our results demonstrate more measurement and better path prediction is necessary to mitigate the risk of AS and IX adversaries to Tor.
- Hidden-service statistics reported by relays [paper] [BibTeX]
David Goulet, Aaron Johnson, George Kadianakis, and Karsten Loesing
Tor Technical Report 2015-04-001, April 2015.
Media coverage of resulting statistics: BBC News,
Ars Technica
Show abstract
In order to understand how hidden services are being used in Tor, relays can collect and
report statistics about the hidden-service activity they observe. However, such statistics
might harm the privacy of individual Tor users. We describe how relays can be used to
collect statistics about hidden services, how they might be useful, and how they can be
published in a way that protects individual privacy.
- 20,000 In League Under the Sea: Anonymous Communication, Trust, MLATs, and Undersea Cables [paper] [BibTeX]
Aaron D. Jaggard, Aaron Johnson, Sarah Cortes, Paul Syverson, and Joan Feigenbaum
In Proceedings on Privacy Enhancing Technologies (PoPETS 2015), Vol. 2015, Issue 1, April 2015.
Show abstract
Motivated by the effectiveness of correlation attacks
against Tor, the censorship arms race, and observations of malicious
relays in Tor, we propose that Tor users capture their trust in network
elements using probability distributions over the sets of elements
observed by network adversaries. We present a modular system that allows
users to efficiently and conveniently create such distributions and use
them to improve their security.
To illustrate this system, we present two novel types of adversaries. First,
we study a powerful, pervasive adversary that can compromise an unknown
number of Autonomous System organizations, Internet Exchange Point organizations,
and Tor relay families.
Second, we initiate the study of how an adversary might use Mutual Legal Assistance
Treaties (MLATs)
to enact surveillance. As part of this, we identify submarine cables as a
potential subject of trust and incorporate data about these into our MLAT analysis by
using them as a proxy for adversary power.
Finally, we present preliminary experimental results that show the potential for our trust
framework to be used by Tor clients and services to improve security.
- Security Analysis of Accountable Anonymity in Dissent [paper] [BibTeX]
Ewa Syta, Aaron Johnson, Henry Corrigan-Gibbs, Shu-Chun Weng, David Wolinsky, and Bryan Ford
In ACM Transactions on Information and System Security (TISSEC),
Volume 17, Issue 1, Article No. 4, August 2014.
Show abstract
Users often wish to communicate anonymously on the Internet,
for instance, in group discussion forums or over instant messaging.
Existing solutions are vulnerable to misbehaving users, however,
who may abuse their anonymity to disrupt communication.
Dining Cryptographers networks leave groups vulnerable to denial-of-service and Sybil
attacks, mix networks are difficult to protect
against traffic analysis, and accountable voting schemes
are unsuited to general anonymous messaging.
DISSENT, originally introduced by Corrigan-Gibbs and Ford [2010], is the first general
communication protocol that offers
provable anonymity and accountability for moderate-size groups, while
efficiently handling unbalanced communication demands among users.
We provide a full description of an improved DISSENT protocol, define its precise security
properties, and give rigorous proofs of these properties.
Our improved protocol is a direct result of this security analysis, which
identified several non-trivial attacks on the original protocol stemming from subtle
design flaws.
- The Sniper Attack: Anonymously Deanonymizing and Disabling the Tor Network [paper] [BibTeX]
Rob Jansen, Florian Tschorsch, Aaron Johnson, and Björn Scheuermann
In Proceedings of the 21st Network and Distributed System Security Symposium (NDSS 2014).
Show abstract
Tor is a distributed onion-routing network used for achieving anonymity and resisting censorship online. Because of Tor's growing popularity, it is attracting increasingly larger threats against which it was not securely designed. In this paper, we present the Sniper Attack, an extremely low cost but highly destructive denial of service attack against Tor that an adversary may use to anonymously disable arbitrary Tor relays. The attack utilizes valid protocol messages to boundlessly consume memory by exploiting Tor's end-to-end reliable data transport. We design and evaluate a prototype of the attack to show its feasibility and efficiency: our experiments show that an adversary may consume a victim relay's memory by as much as 2187 KiB/s while using at most only 92 KiB/s of upstream bandwidth. We extend our experimental results to estimate the threat against the live Tor network and find that a strategic adversary could disable all of the top 20 exit relays in only 29 minutes, thereby reducing Tor's bandwidth capacity by 35 percent. We also show how the attack enables the deanonymization of hidden services through selective denial of service by forcing them to choose guard nodes in control of the adversary. Finally, we discuss defenses against the Sniper Attack that provably render the attack ineffective, and suggest defenses against deanonymization by denial-of-service attacks in general that significantly mitigate the threat.
- Users Get Routed: Traffic Correlation on Tor by Realistic Adversaries [paper] [BibTeX]
Aaron Johnson, Chris Wacek, Rob Jansen, Micah Sherr, and Paul Syverson
In Proceedings of the 20th ACM Conference on Computer and Communications Security (CCS 2013).
Media coverage: The Register, Vice, The Irish Times, Ars Technica, Finnish Broadcasting Company (YLE), NewScientist, MIT Technology Review
Show abstract
We present the first analysis of the popular Tor anonymity network
that indicates the security of typical users against
reasonably realistic adversaries in the Tor network or in the
underlying Internet. Our results show that Tor users are far more
susceptible to compromise than indicated by prior work. Specific
contributions of the paper include (1) a model of various typical
kinds of users, (2) an adversary model that includes Tor network relays,
autonomous
systems (ASes), Internet exchange points (IXPs), and groups of
IXPs drawn from empirical study,
(3) metrics that
indicate how secure users are over a period of time,
(4) the most
accurate topological model to date of ASes and IXPs as they relate
to Tor usage and network configuration, (5) a novel realistic Tor
path simulator (TorPS), and
(6) analyses of security making use of all the above. To show that
our approach is useful to explore alternatives and not just Tor as
currently deployed, we also analyze a published alternative path
selection algorithm, Congestion-Aware Tor. We create an empirical
model of Tor congestion, identify novel attack
vectors, and show that it too is more vulnerable than previously
indicated.
- Privacy-Preserving Data Exploration in Genome-Wide Association Studies
Aaron Johnson and Vitaly Shmatikov
In Proceedings of the 19th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2013).
Show abstract
Genome-wide association studies (GWAS) have become a popular method
for analyzing sets of DNA sequences in order to discover the genetic
basis of disease. Unfortunately, statistics published as the result of
GWAS can be used to identify individuals participating in the study.
To prevent privacy breaches, even previously published results have
been removed from public databases, impeding researchers' access to
the data and hindering collaborative research. Existing techniques for
privacy-preserving GWAS focus on answering specific questions, such as
correlations between a given pair of SNPs (DNA sequence variations).
This does not fit the typical GWAS process, where the analyst may not
know in advance which SNPs to consider and which statistical tests to use,
how many SNPs are significant for a given dataset, etc.
We present a set of practical, privacy-preserving data mining algorithms
for GWAS datasets. Our framework supports exploratory data
analysis, where the analyst does not know a priori how many and which SNPs
to consider. We develop privacy-preserving algorithms for computing the
number and location of SNPs that are significantly associated with the
disease, the significance of any statistical test between a given SNP
and the disease, any measure of correlation between SNPs, and the block
structure of correlations. We evaluate our algorithms on real-world
datasets and demonstrate that they produce significantly more accurate
results than prior techniques while guaranteeing differential privacy.
- Poster: Onions for Sale: Putting Privacy on the Market [paper] [slides]
Aaron Johnson, Rob G. Jansen, and Paul Syverson
In Proceedings of the 17th International Conference on Financial Cryptography and Data Security (FC 2013).
Show abstract
We propose that Tor support the purchase of its services.
- LIRA: Lightweight Incentivized Routing for Anonymity [paper] [BibTeX]
Rob G. Jansen, Aaron Johnson, and Paul Syverson
In Proceedings of the 20th Network and Distributed System Security Symposium (NDSS 2013).
Show abstract
Tor, the most popular deployed distributed onion routing network, suffers from
performance and scalability problems stemming from a lack of incentives for
volunteers to contribute. Insufficient capacity limits scalability and harms the
anonymity of its users. We introduce LIRA, a lightweight scheme that creates
performance incentives for users to contribute bandwidth resources to the Tor
network. LIRA uses a novel cryptographic lottery: winners may be guessed
with tunable probability by any user or bought in exchange for resource
contributions. The traffic of those winning the lottery is prioritized through
Tor. The uncertainty of whether a buyer or a guesser is getting priority
improves the anonymity of those purchasing winners, while the performance
incentives encourage contribution. LIRA is more lightweight than prior reward
schemes that pay for service and provides better anonymity than schemes that
simply give priority to traffic originating from fast relays. We analyze
LIRA's efficiency, anonymity, and incentives, present a prototype
implementation, and describe experiments that show it indeed improves
performance for those servicing the network.
- Dissent in Numbers: Making Strong Anonymity Scale [paper] [BibTeX]
David Isaac Wolinsky, Henry Corrigan-Gibbs, Bryan Ford, and Aaron Johnson
In Proceedings of the Tenth USENIX Symposium on Operating Systems Design and Implementation (OSDI '12).
Show abstract
Current anonymous communication systems exhibit a trade-off between weak anonymity among many nodes, via onion routing, or strong anonymity among a few nodes, via DC-nets. To addresses this trade-off we introduce Dissent, a practical anonymity system that increases by 1-2 orders of magnitude the anonymity set sizes achievable using traffic-analysis-resistant techniques. Dissent's design achieves these gains via a client/server architecture, in which many unreliable clients partially offload communication and computation costs to a smaller and more robust, but decentralized, set of servers. Clients trust only that at least one server in the set is honest, but need not know or choose which server to trust. Unlike the quadratic costs of prior peer-to-peer DC-nets schemes, Dissent's client/server design reduces communication and processing costs to linear in the number of clients, and hence in anonymity set size. Further, Dissent's servers can unilaterally ensure progress, even if clients respond slowly or disconnect at arbitrary times, ensuring robustness against client churn, tail latencies, and DoS attacks. On PlanetLab, Dissent scales to 2,000 online participants and offers latencies as low as 3 seconds for network sizes of 500. An anonymous Web browsing application also shows that Dissent's performance suffices for interactive communication within smaller local-area groups.
- Probabilistic Analysis of Onion Routing in a Black-box Model [paper] [BibTeX]
Joan
Feigenbaum, Aaron Johnson, and Paul Syverson
In ACM Transactions on Information and System Security (TISSEC), Volume 15 Issue 3, November 2012.
Show abstract
We perform a probabilistic analysis of onion routing. The analysis is presented in a black-box model of
anonymous communication in the Universally Composable framework that abstracts the essential properties
of onion routing in the presence of an active adversary who controls a portion of the network and knows all
a priori distributions on user choices of destination. Our results quantify how much the adversary can gain
in identifying users by exploiting knowledge of their probabilistic behavior. In particular, we show that, in
the limit as the network gets large, a user u's anonymity is worst either when the other users always choose
the destination u is least likely to visit or when the other users always choose the destination u chooses. This
worst-case anonymity with an adversary that controls a fraction b of the routers is shown to be comparable to the best-case anonymity against an adversary that controls a fraction
√b.
- Scalable Anonymous Group Communication in the Anytrust Model [paper] [BibTeX]
David Isaac Wolinsky, Henry Corrigan-Gibbs, Bryan Ford, and Aaron Johnson
In Proceedings of the Fifth European Workshop on System Security (EuroSec 2012).
Show abstract
Anonymous communication capabilities are useful and desirable, but practical onion routing approaches are vulnerable to traffic analysis and DoS attacks — especially against a powerful adversary, such as a repressive government or compromised ISP. To fill this gap we introduce D3, the first practical anonymous group communication system offering anonymity and disruption resistance against strong traffic analysis and collusion attacks, with scalability to hundreds of group members and quick response to churn. D3 builds on a trust model we call anytrust. Anytrust is a decentralized client/server network model, in which each of many clients — representing group members — trust only that at least one of a smaller but diverse set of “server” or “super-peers” behaves honestly, but clients need not know which server to trust. By combining and adapting verifiable shuffle and DC-nets techniques to anytrust, D3 achieves short shuffle latencies and efficient tree-based DC-nets ciphertext combining, while guaranteeing message anonymity and integrity, transmission proportionality among group members, and adaptability to both network/node failures and active disruption. Experiments with a working prototype demonstrate that D3 is practical at scales infeasible in prior systems offering comparable security.
- Trust-based Anonymous Communication: Adversary Models and Routing Algorithms [paper] [BibTeX]
Aaron Johnson, Paul Syverson, Roger Dingledine, and Nick Mathewson
In Proceedings of the 18th ACM Conference on Computer and Communications Security (CCS 2011).
Show abstract
We introduce a novel model of routing security that incorporates the ordinarily overlooked variations in trust that users have for different parts of the network. We focus on anonymous communication, and in particular onion routing, although we expect the approach to apply more broadly.
This paper provides two main contributions. First, we present a novel model to consider the various security concerns for route selection in anonymity networks when users vary their trust over parts of the network. Second, to show the usefulness of our model, we present as an example a new algorithm to select paths in onion routing. We analyze its effectiveness against deanonymization and other information leaks, and particularly how it fares in our model versus existing algorithms, which do not consider trust. In contrast to those, we find that our trust-based routing strategy can protect anonymity against an adversary capable of attacking a significant fraction of the network.
- Preventing Active Timing Attacks in Low-Latency Anonymous Communication (Extended Abstract) [paper] [BibTeX]
Joan
Feigenbaum, Aaron Johnson, and Paul Syverson
In Proceedings of the 10th Privacy Enhancing Technologies Symposium (PETS 2010).
Show abstract
Low-latency anonymous communication protocols in general,
and the popular onion-routing protocol in particular, are broken
against simple timing attacks. While there have been few proposed solutions
to this problem when the adversary is active, several padding
schemes have been proposed to defend against a passive adversary that
just observes timing patterns. Unfortunately active adversaries can break
padding schemes by inserting delays and dropping messages.
We present a protocol that provides anonymity against an active adversary
by using a black-box padding scheme that is effective against a passive
adversary. Our protocol reduces, in some sense, providing anonymous
communication against active attacks to providing a padding scheme
against passive attacks.
Our analytical results show that anonymity can be made arbitrarily good
at the cost of some added latency and required bandwidth. We also perform
measurements on the Tor network to estimate the real-world performance
of our protocol, showing that the added delay is not excessive.
- More Anonymous Onion Routing Through Trust [paper] [BibTeX]
Aaron Johnson and Paul
Syverson
In Proceedings of the 22nd IEEE Computer Security Foundations
Symposium (CSF 2009).
Show abstract
We consider using trust information to improve the security
of onion-routing networks. In particular, we introduce a model of trust in
network nodes and use it to design path-selection strategies that minimize the
probability that the adversary can successfully perform a correlation
attack. We first describe the general case in which onion routers can be
assigned arbitrary levels of trust. Selecting a strategy can be formulated in a
straightforward way as a linear program, but it is exponential in size. We thus
analyze a natural simplification of path selection for this case. More
importantly, however, when choosing routes in practice, only a very coarse
assessment of trust in specific onion routers is likely to be
feasible. Therefore, we focus next on the special case in which there are only
two trust levels. For this more practical case we identify three optimal
route-selection strategies such that at least one is optimal, depending on the
trust levels of the two classes, their size, and the reach of the
adversary. This can yield practical input into routing decisions . We set out
the relevant parameters and choices for making such decisions.
-
Online and Offline Selling in Limit Order Markets [paper] [BibTeX]
Kevin L. Chang and Aaron Johnson
In Proceedings of the 4th International Workshop on Internet and
Network Economics (WINE 2008).
Show abstract
Completely automated electronic securities exchanges and algorithms for trading
in these exchanges have become very important for modern finance. Kakade et
al. introduced the limit order market model, which is a prevalent paradigm in
electronic markets. In this paper, we consider both online and offline
algorithms for maximizing revenue when selling in limit order markets. We first
prove that the standard reservation price algorithm has an optimal competitive
ratio for this problem. This ratio is not constant, and so we consider
computing solutions offline. We show that the offline optimization problem is
NP-hard, even for very restricted instances. We complement the hardness result
by presenting an approximation scheme that runs in polynomial time for a wide class of instances.
-
Probabilistic Analysis of Onion Routing in a Black-box
Model (Extended abstract) [paper] [BibTeX]
Joan
Feigenbaum, Aaron Johnson, and Paul Syverson
In Proceedings of the 2007 ACM Workshop on Privacy in the Electronic
Society (WPES 2007).
Show abstract
We perform a probabilistic analysis of onion routing. The
analysis is presented in a black-box model of anonymous
communication that abstracts the essential properties of onion
routing in the presence of an active adversary that controls a
portion of the network and knows all a priori distributions
on user choices of destination. Our results quantify how
much the adversary can gain in identifying users by exploiting
knowledge of their probabilistic behavior. In particular,
we show that a user u's anonymity is worst either when the
other users always choose the destination u is least likely to
visit or when the other users always choose the destination u
chooses. This worst-case anonymity with an adversary that
controls a fraction b of the routers is comparable to the best-case
anonymity against an adversary p that controls a fraction
√b.
-
Private Web Search [paper] [BibTeX] [software]
Felipe Saint-Jean, Aaron Johnson, Dan Boneh, and Joan
Feigenbaum
In Proceedings of the 2007 ACM Workshop on Privacy in the Electronic
Society (WPES 2007).
Show abstract
Web search is currently a source of growing concern about
personal privacy. It is an essential and central part of most
users' activity online and therefore one through which a significant
amount of personal information may be revealed. To
help users protect their privacy, we have designed and implemented
Private Web Search (PWS), a usable client-side tool
that minimizes the information that users reveal to a search
engine. Our tool protects users against attacks that involve
active components and timing information, to which more
general Web-browsing privacy tools (including the combination
of FoxTor and Privoxy) are vulnerable. PWS is a
Firefox plugin that functions as an HTTP proxy and as a
client for the Tor anonymity network. It configures Firefox
so that search queries executed from the PWS search
box are routed through the HTTP proxy and Tor client, filtering
potentially sensitive or identifying components of the
request and response.
-
A Model of Onion Routing with Provable Anonymity [paper] [BibTeX]
Joan
Feigenbaum, Aaron Johnson, and Paul
Syverson
In Proceedings of Financial Cryptography and Data Security '07 (FC 2007).
Show abstract
Onion routing is a scheme for anonymous communication
that is designed for practical use. Until now, however, it has had no formal
model and therefore no rigorous analysis of its anonymity guarantees.
We give an IO-automata model of an onion-routing protocol and, under
possibilistic definitions, characterize the situations in which anonymity
and unlinkability are guaranteed.