legitimate responses. Thus, both of these adversaries necessarily
create an observable artifact: the victim, if it waits sufficiently
long, will receive both the attacker’s packet and the legitimate
reply. (We employed a similar form of this anomaly to detect
TCP reset injection attacks [7].) Only an in-path adversary,
capable of blocking and modifying packets, can prevent the
legitimate reply from reaching the victim.
Although in-path approaches have more power, on-path ap-
proaches have several advantages, making their use appealing
for attackers. Censorship tools commonly use on-path rather
than in-path techniques to ease deployment and to make the
system failure- and load-tolerant, as the censorship system
can then operate on a traffic mirror rather than the live
traffic.1 Similarly, on-path WiFi packet injection works without
modifying drivers, but suppressing legitimate replies requires
hardware-specific access to the low-level air interface to detect
and squelch a broadcast in flight.
B. Vulnerability of current implementations
Systems that implement the DNS standard [2], [8] are
vulnerable to on-path spoofing, despite the presence of the
later legitimate reply, because the resolver attempts to “get
the answer as quickly as possible” [2]. Upon receiving a reply,
the resolver checks the ID field in the header and then will
“verify that the question section corresponds to the information
currently desired” [8]. Clearly, these steps do not provide
sufficient diligence, as the design goal of quickly returning
an answer causes the resolver to return the attacker’s value.
DNSSEC adds cryptographic authentication to prevent the
acceptance of invalid DNS replies [9], [10], [3]. Although
attackers cannot redirect victims using spoofed replies, they
can still mount denial-of-service attacks, which will often
suffice to satisfy a censor’s objective. The denial of service occurs because
the resolver will attempt to process the attacker’s packet,
determine that the DNSSEC signature is absent or invalid,
and immediately return “Bogus”, depriving the client of the
ability to connect to the host corresponding to the name.
Because of this behavior, DNSSEC does not suffice as a
replacement for a mechanism such as Hold-On: resolvers need
to maintain an open port for a period of time in order to
attempt to validate all responses received for a query, not just
the first.
C. Other related work
DNS has a long history of poisoning attacks [4], [5], [11],
[12]. Besides those mentioned above, several previous efforts
counter DNS poisoning attacks by increasing the difficulty of
blindly injecting DNS answers [13], [14], [6], [15]. These
efforts focus on deterring off-path injection by increasing the
information entropy required to match a valid DNS reply.
Our work, however, addresses the threat from attackers that
can observe queries, which allows them to circumvent these
previous defenses.
1 TCP traffic control tools also have used this vantage point. For example,
Comcast deployed Sandvine’s Policy Traffic Switch devices to disrupt BitTorrent
traffic in an on-path configuration [7], even though the devices themselves
supported in-path operation.
Poisoning attacks based on on-path injection are not limited
to DNS. Malicious injections, such as TCP RSTs and ICMP
unreachable messages, have been used in both individual
attacks [7] and ISP-scale censorship [16], [17]. Similar to
DNS poisoning, traffic sent from the remote peer of the
legitimate communication will still arrive at the victim after
these malicious injections. Therefore, the use of Hold-On
mechanisms similar to those explored here will likely have
applicability to deter these malicious injections as well.
III. HOLD-ON AND DILIGENT VALIDATION
As a consequence of the inability of on-path attackers
to suppress legitimate replies, we investigate the benefits of
stub resolvers or forwarders waiting for a “Hold-On” period
to allow subsequent legitimate replies to arrive. Part of this
procedure entails validating replies with more diligence when
a resolver receives two or more replies for the same question.
This improvement effectively protects against DNS injections
in the case of non-disruptive attacks, where the attacker lacks
the ability to block either the resolver’s request or the
authority’s response.
A. Assumptions
We predicate our approach on the following assumptions,
which we view as reasonable based on our experience with
censorship activity that employs on-path injectors:
(1) The user under attack or censorship is able to access
a trustworthy recursive resolver outside of the attacked or
censored networks, such as Google Public DNS [18] or
OpenDNS [19], services that users already commonly employ. In particular, in
the censorship case, we assume that the censor does not
block access to this resolver, which we argue is a plausible
assumption given the large number (158,364 in January 2012)
of known open resolvers [20].
(2) The attacker/censor injects fake responses according to
a blacklist rather than a whitelist. That is, the user knows
some non-sensitive domain names that can be used to measure
normal (free of attacker interference) communication be-
tween the client (stub resolver) and the DNS server (recursive
resolver).
(3) The attacker injects fake replies as quickly as possible
in order to ensure that the replies arrive earlier than the
legitimate ones. Hence, the injection mechanism will transmit
immediately upon seeing the client’s request. The mechanism
cannot wait for the arrival of the legitimate reply from the
server, because doing so risks the injected reply arriving after
the legitimate one and failing to take effect.
(4) The attacker cannot construct a properly signed
DNSSEC response.
Based on these assumptions, the stub resolver can estimate
when it expects legitimate replies to arrive, in order to discern
between injected replies and correct ones.
[Fig. 1 (diagram): after issuing a DNS query, the stub resolver starts a 15-second Hold-On timer and expects a reply with TTL 20 after roughly 200 ms. An injected reply with TTL 231 arrives after 10 ms and another with TTL 8 after 50 ms; both mismatch the expected TTL and are ignored. The legitimate reply with TTL 20 arrives after 250 ms, matches, is accepted, and stops the Hold-On timer.]
Fig. 1. Hold-On while waiting for a legitimate DNS reply.
B. Hold-On and Validation
The stub resolver or forwarder needs to first learn the
expected RTT and hop-count distance (in terms of expected
TTL) associated with communication involving its remote
recursive resolver, which it does using active measurement.
(Recall that we presume the remote resolver lies outside of
the censored network.) Upon start-up, the resolver issues a
series of non-sensitive queries to measure the initial RTT and
TTL seen on arriving replies for entries cached at the remote
resolver by repeatedly querying for the same name. During this
period, the resolver maintains an open port for an additional
period to validate that an on-path adversary has not tampered
with these initial measurements by injecting replies. During
normal operation, the stub resolver also continually updates
these values based on passive measurements of its ongoing
traffic.
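A minimal sketch of this start-up measurement in Python, assuming a hypothetical probe() helper that sends one query for a non-sensitive, cacheable name to the remote recursive resolver and returns the observed (RTT in seconds, IP TTL) pair; the name, sample count, and helper are illustrative assumptions, not the authors' code.

from typing import Callable, List, Set, Tuple

def measure_baseline(probe: Callable[[str], Tuple[float, int]],
                     name: str = "www.example.com",
                     samples: int = 10) -> Tuple[float, Set[int]]:
    """Estimate the expected RTT and the set of expected IP TTLs.

    The first sample is discarded because the remote resolver may have had
    to recurse rather than answer from its cache (cf. Sec. V-A); the RTT
    estimate is the minimum of the remaining samples, and the TTL set
    normally collapses to a single value.
    """
    observations: List[Tuple[float, int]] = [probe(name) for _ in range(samples)]
    observations = observations[1:]                       # drop the cache-warming query
    expected_rtt = min(rtt for rtt, _ in observations)    # minimum observed RTT
    expected_ttls = {ttl for _, ttl in observations}      # expected IP TTL value(s)
    return expected_rtt, expected_ttls

During normal operation, the same update can be fed from passive observations of ongoing traffic rather than active probes.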
Given estimates of the legitimate RTT and TTL, the resolver
works as shown in Figure 1:
(1) After issuing a DNS query, the resolver starts its Hold-
On timer. A natural setting for the timer would be 15 seconds,
as this reflects the default timeout value for both the BIND
resolver [21, p. 108] and Microsoft Windows [22]. Naturally,
in most cases the resolver will return much sooner, unless the
remote resolver is unreachable.
(2) When the resolver expects a DNSSEC-protected re-
sponse, for each reply it performs a local signature validation.
It returns to the client the first fully validated reply. If it
finds all replies as either Insecure, Bogus, or Indeterminate [3,
p. 20], and the Hold-On timer expires, the resolver returns a
DNSSEC error.
(3) Without DNSSEC, upon receiving a reply before the
Hold-On timer expires, the resolver performs two additional
validations:
• Timing. Does the reply arrive too early? The test we
use here is for replies that arrive sooner than half of
the expected (measurement-derived) RTT. We note that
the resolver could also determine this threshold more
precisely by measuring known injections in the resolver’s
actual environment, generating queries for censored
names to non-existent resolvers.
• TTL. Does the TTL field in the IP header have the
expected value(s)? We assume that the route between the
remote DNS server and the client is stable over at least short
periods (such as 5 minutes), so we can obtain and update
the expected TTLs through periodic measurement.
Upon observing either of the above mismatches, the resolver
ignores the response and continues to wait. If on the other hand
a reply arrives before the Hold-On time expires and validates
based on the above tests, the resolver accepts the new reply
and returns it to the client.
If the stub resolver receives no valid reply before the Hold-
On timer expires, it returns the latest non-validating reply it
observed. Doing so means that in the presence of significantly
changed network conditions, users experience delay, but not
inadvertent blocking of their access.
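A minimal sketch of this Hold-On loop in Python, assuming a queue of incoming reply objects and a validate() predicate implementing the timing and TTL tests above; DNSSEC handling and the retry schedule of Section V are omitted, and all names are illustrative.

import queue
import time
from typing import Callable, Optional

def hold_on(replies: "queue.Queue",
            validate: Callable[[object], bool],
            hold_on_seconds: float = 15.0) -> Optional[object]:
    """Return the first reply that validates; at timer expiry, fall back to
    the latest non-validating reply seen (or None if nothing arrived)."""
    deadline = time.monotonic() + hold_on_seconds
    last_reply = None
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return last_reply                  # Hold-On timer expired: fall back
        try:
            reply = replies.get(timeout=remaining)
        except queue.Empty:
            return last_reply                  # nothing more arrived in time
        if validate(reply):
            return reply                       # stop the Hold-On timer early
        last_reply = reply                     # mismatch: ignore and keep waiting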
In most cases, the resolver will not wait until the Hold-
On timer times out; it will stop waiting upon receipt of a
legitimate response. Thus, generally this approach will not
cause extra delay, except in the case that network conditions
have changed such that legitimate replies now return sooner
and without DNSSEC protection.
IV. FEASIBILITY ASSESSMENT
To assess the viability of our approach, we investigate the
phenomenon of observing multiple replies for a single DNS
query in both a censored network and a non-censored network.
In the latter, we look at whether normal DNS traffic generates
such replies; that is, whether Hold-On and validation could
cause significant false positives. In the censored network,
we assess how different the injected replies appear from the
legitimate ones, which indicates whether the approach could
suffer from significant false negatives.
A. Observation in an uncensored network
We can view use of the Hold-On approach as a form of
anomaly detector, looking for a condition that represents an
attack. Although it is clear that a packet-injection based DNS
attack must create an anomaly where the client receives two
distinct replies, we must ensure that normal DNS traffic does
not generate these anomalies: in some cases, if it proves
impossible to heuristically distinguish an attacker’s packet
from a legitimate reply lacking a DNSSEC signature, there may
be no effective resolution beyond simply noting the attack and
returning no valid answer. If the resolver simply ignores
replies it cannot validate (and returns the last such, if no valid
replies are received), then such anomalies arising in legitimate
traffic will not in fact cause any problems. If, however, the
resolver flags such replies as reflecting an attack, then these
false positives will incur a degree of collateral damage.
We developed a Bro [23] IDS policy script to directly
detect anomalous secondary DNS replies. This script operates
by tracking all DNS requests and matching replies, checking
any subsequent reply that arrives within a 1-minute timeout2
to determine whether the number of records in the reply
and the contents of each are unchanged. We validated that
this script accurately detects attack packets using traces of
injected packets we captured by sending DNS query requests
that transited a network that uses DNS-based packet-injection
censorship.
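The Bro script itself is not reproduced here; the following Python sketch captures the same detection logic under simplifying assumptions: the trace has already been parsed into per-reply tuples of (query ID, question, answer records, timestamp), demultiplexed per client/resolver pair.

from typing import Dict, Iterable, List, Tuple

WINDOW = 60.0   # seconds: examine any later reply arriving within the 1-minute timeout

Reply = Tuple[int, str, Tuple[str, ...], float]   # (qid, question, answers, timestamp)

def find_anomalous_duplicates(replies: Iterable[Reply]) -> List[Tuple[Reply, Reply]]:
    """Flag pairs of replies to the same (query ID, question) whose record sets differ."""
    first_seen: Dict[Tuple[int, str], Reply] = {}
    anomalies: List[Tuple[Reply, Reply]] = []
    for reply in replies:
        qid, question, answers, ts = reply
        key = (qid, question)
        if key not in first_seen:
            first_seen[key] = reply
            continue
        prev = first_seen[key]
        # A later reply within the window whose record count or contents changed
        if ts - prev[3] <= WINDOW and set(answers) != set(prev[2]):
            anomalies.append((prev, reply))
    return anomalies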
We ran this script against 6 days of normal DNS traffic
captured at ICSI’s border, consisting of 11,700,000 DNS
requests.3 During this period we observed no DNS anomalies
that would create a false positive, only deliberate testing
intended to trigger a DNS censorship system.
Running on a 1.5 hour trace gathered in August 2011
at the UC Berkeley campus border (a total of 15.2M DNS
transactions,4 both inbound and outbound), we observed two
benign authorities that triggered the basic anomaly detector.
The first server, an authority server for the BBC, returned two
distinct replies for the same query for several names. Although
distinct in value, both values were within the same /24 subnet.
The second, an authority for businessinsider.com, returned
two values for the same query. The first reply was a CNAME
to an external domain with the root authority information
included in the reply, while the second was a SERVFAIL
containing the same CNAME but no authority or additional
fields, triggering the alert.
We also observed both multiple incidents of DNS censorship
(caused by local users configured to use resolvers in a censored
country) and a few false positives due to script bugs that would
not disrupt a Hold-On resolver.
B. Observation in a censored network
To assess potential false negatives, we trigger a DNS censor-
ship system to inject DNS replies with sensitive domain names
(such as twitter.com). We generated these measurements from
within the censored network, communicating with destinations
outside the censored network. To differentiate the legitimate
from the injected replies, we first query a non-existent DNS
server outside the censored network with sensitive names, and
we receive only injected replies. We then query an open DNS
server with non-sensitive names (such as www.mit.edu), by
which we receive only legitimate replies.
2 We chose a longer timeout to be conservative in this analysis, attempting
to detect potential anomalies that would not affect a resolver using Hold-On.
3 We excluded lookups issued by an ICSI measurement tool.
4 Excluding a known high-volume DNS crawler used for research.
Fig. 2. Comparison of arrival times for legitimate packets (green cross) and
packets injected by the censor (red plus).
Fig. 3. Comparison of TTLs for legitimate packets (green cross) and packets
injected by the censor (red plus).
With this method, we collected a data trace including
≈ 100,000 queries and corresponding replies over 9 days.
Figures 2 and 3 show comparisons of the RTTs and TTLs observed
for legitimate DNS packets and for packets injected by the DNS
censor. Distinguishing the legitimate packets from the injected
ones appears straightforward. Most injected packets arrive much earlier than
legitimate ones because the injector and the client reside within
the same ISP, while the DNS server resides in another country.
We found that the IP TTL values of the legitimate DNS
responses were quite stable over a period of 9 days (either 44
or 42), but the TTL value of the injected packets varied in the
range of [0–255], presumably to avoid simple filtering.
In another 10-hour trace, we selected one pair of (RTT, TTL)
every 5 minutes, and used this pair as the expected RTT and TTL
to validate other packets in the following time window. In
our experiment, we varied the TTL and RTT thresholds
to evaluate the false positive rate and false negative rate,
as shown in Table I. For example, if we set the TTL threshold
to 1 (that is, the reply is valid only if TTL ∈
[expected TTL − 1, expected TTL + 1]) and set the RTT threshold
to 0.5 · expected RTT (that is, the reply is valid only
if it does not arrive 0.5 · expected RTT earlier than expected),
then the approach generates no false positives or
negatives.
[Fig. 4 (diagram): stub resolver (end user) → DNS forwarder → DNS proxy → recursive resolver, with an attacker (injector) on the path.]
Fig. 4. Environment of DNS proxy
TTL threshold   RTT threshold   FP (%)   FN (%)
0–2             0.5             0        0
3               0.5             0        0.01
4               0.5             0        0.06
5               0.5             0        0.07
6               0.5             0        0.10
7               0.5             0        0.11
2               0.1             5.96     0
2               0.2             1.53     0
2               0.3–0.8         0        0
2               0.9             0        0.31
TABLE I
FALSE POSITIVE (FP) AND FALSE NEGATIVE (FN) RATES
CORRESPONDING TO DIFFERENT THRESHOLDS FOR IP TTL AND RTT
DIFFERENCES.
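A sketch of how such a threshold sweep can be computed from a labeled trace; the data layout (lists of (RTT, IP TTL) samples labeled legitimate or injected, plus a single expected TTL per window) is our assumption, not the authors' code.

from typing import Iterable, Tuple

def accepted(rtt: float, ip_ttl: int, expected_rtt: float, expected_ttl: int,
             ttl_threshold: int, rtt_threshold: float) -> bool:
    """Return True if a reply passes both the TTL and the early-arrival tests."""
    ttl_ok = abs(ip_ttl - expected_ttl) <= ttl_threshold
    rtt_ok = (expected_rtt - rtt) < rtt_threshold * expected_rtt   # not "too early"
    return ttl_ok and rtt_ok

def fp_fn_rates(legit: Iterable[Tuple[float, int]], injected: Iterable[Tuple[float, int]],
                expected_rtt: float, expected_ttl: int,
                ttl_threshold: int, rtt_threshold: float) -> Tuple[float, float]:
    """FP: legitimate replies rejected; FN: injected replies accepted."""
    legit, injected = list(legit), list(injected)
    fp = sum(not accepted(r, t, expected_rtt, expected_ttl, ttl_threshold, rtt_threshold)
             for r, t in legit) / len(legit)
    fn = sum(accepted(r, t, expected_rtt, expected_ttl, ttl_threshold, rtt_threshold)
             for r, t in injected) / len(injected)
    return fp, fn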
V. IMPLEMENTATION AND EVALUATION
We implemented a DNS proxy to explore how Hold-On
works in practice. The proxy operates as a DNS forwarder that
aims to protect against DNS injection by on-path adversaries,
as illustrated in Figure 4.
A. Design and implementation of a DNS proxy
To estimate the expected RTT and TTL to/from the remote
recursive resolver, the proxy issues requests upon start-up
for non-sensitive names.5 To estimate the RTT, the resolver
queries the same name multiple times, selecting the minimum
of RTT observed. The resolver excludes the first query, be-
cause it might include additional time consumed by the server
to resolve the name recursively, rather than answering from its
cache. The expected TTL(s) should typically remain constant,
but could vary due to routing changes.6 We assume that the set
of expected TTLs does not vary within a measurement period (see
below).
5 It could instead simply monitor initial queries for duplicate replies, and
formulate its estimates from those that engender only a single reply. Doing so
would also help with combating injection from attackers who have different
goals than censorship.
6 A potentially pathological case would be replies that vary across a set
of arriving TTL values due to the use of per-flow load-balancing that causes
different replies to take different routes.
Algorithm 1 Hold-On and Validation for DNS Proxy
Timeout ← 5
while GetDNSRequestFromClient(request) do
  retry ← 1; gotAnyReply ← false
  repeat
    ForwardRequestToResolver(Resolver, request)
    StartHoldOnTimer(retry · Timeout)
    while NOT Timeout and GetDNSReply(replyPkt) do
      gotAnyReply ← true   {from server or injector}
      if ValidateDNSSEC_OK(replyPkt) then
        SendDNSReplyToClient(replyPkt.msg)
        StopHoldOnTimer()
        return
      else if ValidateTTL_OK(replyPkt.ipTTL) and ValidateRTT_OK(replyPkt.RTT) then
        SendDNSReplyToClient(replyPkt.msg)
        StopHoldOnTimer()
        return
      else
        DropAndLog(replyPkt)
      end if
    end while
    retry ← retry + 1
  until retry == 3
  if gotAnyReply then
    {No valid reply; return the latest non-validating reply}
    SendDNSReplyToClient(replyPkt.msg)
  end if
end while
In our current implementation, the set contains only one
value. During normal operation, a separate thread repeats
this measurement (see § IV-B) periodically (such as every
5 minutes) and updates the expected RTT and TTL values
to adapt to potential changes in network conditions.
Algorithm 1 details how the proxy processes DNS
requests and replies. When the proxy receives a DNS request
from its client (end user or DNS forwarder), it forwards the
request to the remote recursive resolver and starts the Hold-On
timer. We set the initial value of the timer to 5 seconds; if no
legitimate reply has arrived when the timer expires, we reset the timer to
10 s for the second try, and similarly to 15 s for the third try.
If the proxy receives a DNS reply (from either the remote re-
cursive resolver or an injector), it validates both the TTL and the RTT
against the expected values (the expected TTLs could include
multiple values because of multiple paths to the resolver).
If the request is DNSSEC-enabled, the corresponding reply
should also be validated using its DNSSEC records (not yet imple-
mented in our prototype). For DNSSEC-disabled requests,
ValidateDNSSEC_OK always returns false. ValidateRTT_OK
and ValidateTTL_OK return true if, respectively:
expected RTT − replyPkt.RTT < 0.5 · expected RTT
replyPkt.ipTTL ∈ expectedTTLs
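These two checks transcribe directly into Python (a sketch under the same naming, not the prototype's code):

from typing import Set

def validate_rtt_ok(reply_rtt: float, expected_rtt: float) -> bool:
    # Reject replies that arrive "too early": more than half the expected RTT sooner.
    return (expected_rtt - reply_rtt) < 0.5 * expected_rtt

def validate_ttl_ok(reply_ip_ttl: int, expected_ttls: Set[int]) -> bool:
    # The expected set can hold several TTLs if multiple paths reach the resolver.
    return reply_ip_ttl in expected_ttls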
Fig. 5. For cached names, comparison of query times with Hold-On enabled
(green square) and Hold-On disabled (red cross). Most of the points in the
two sets overlap, so only a green line is visible.
B. Evaluation
To evaluate the delay increased by Hold-On, we use two
sets of domain names for cached or uncached queries. For
the first set, we used a fixed prefix (“www”) with different
levels of domains appended (“.” or “null”, “com”, “org”, “net”,
“cn”, “jp”, “de”, “google.com”, “mit.edu”). We queried each
name once to let the recursive resolver cache
the result, and then measured the query time for subsequent
queries. As shown in Figure 5, the time needed for each query
approximates the RTT.
We generated the second set randomly, using a nonsense prefix,
again with different levels of domains appended. We query each
name only once.
Because the resolver must fully resolve the name for
each query, the query time includes RTT (from the proxy
to the recursive resolver) as well as the subsequent time
consumed by the resolver (from recursive resolver to au-
thoritative name servers). As shown in Figure 6, the time
varies considerably across different levels of domains. For exam-
ple, it is much faster to resolve “nonsense12847.google.com”
than “nonsense2132323.de” because the resolver (in our case,
Google Public DNS) has a low-latency path to a google.com
authority, but a higher latency path to the .de authority.
As shown in Figures 5 and 6, in both cases (cached and
uncached queries) Hold-On and validation do not introduce any
apparent delay.
VI. DISCUSSION
The current Hold-On implementation operates as a stub
resolver to a known-uncensored remote recursive resolver,
which enables accurate initial measurement of RTT and TTL
to enable subsequent detection of unexpected replies. Without
DNSSEC, it relies on attack packets exhibiting significant
differences in IP TTL or RTT in order to distinguish them
from legitimate replies.
Fig. 6. For uncached names, comparison of query times with Hold-On enabled
(green square) and Hold-On disabled (red cross).
[Fig. 7 (diagram): the injector observes the stub resolver’s query to the recursive resolver and times its injected reply to coincide with the legitimate reply.]
Fig. 7. The attacker carefully crafts the packets’ TTLs and injects them with
the expected timing.
If the attacker carefully crafts the attack packets’ TTLs, and
only injects them just before the likely arrival of the legitimate
reply (see Figure 7), our Hold-On technique without DNSSEC
will fail to distinguish between the legitimate reply and the
attack packet.7 However, the resolver can still detect that an at-
tack has likely occurred (due to the differing responses). If the
resolver cannot correctly determine the legitimate response, it
should conservatively not return any answer—though doing so
still enables attackers to impose denial-of-service. Hold-On
can provide a robust defense against on-path injection only
when combined with DNSSEC.
7 The adversary must take the risk that the injected packets arrive later than
the legitimate ones.
Extending our implementation to recursive resolvers will
take some additions. However, most recursive resolvers al-
ready maintain estimated RTTs to different authority servers
in order to select between nameservers. This timing data,
combined with tracking of TTLs, can enable detection (if not
protection) of injected packets.
In addition, as developed so far, our particular Hold-On
mechanism does not suffice to protect against on-path adver-
saries operating in environments that users only occasionally
visit, such as public WiFi hotspots; we lack knowledge of
“non-sensitive” domains to look up in order to obtain estimates
of RTT and TTL in the absence of injection. In such
environments, we can in principle instead look for symptoms
of injection (not only for DNS, but potentially for DHCP, ARP,
and TCP), alert the user in some fashion, and terminate any
associated connections. Clearly, developing a robust, workable
approach along these lines will take significant investigation.
Finally, a recursive resolver should also continue to listen
for replies even after it processes and forwards an answer,
in order to detect attacks. If the resolver detects an attack,
it could revert to a more paranoid mode (performing active
measurements of RTTs and TTLs for all new resolvers to
detect subsequent attacks). In addition, it can flush all cache
entries associated with the attack if it is unable to distinguish
between the legitimate reply and the attack. (If it can tell that
the later reply is the legitimate one, the resolver can use it to
replace the cache entry.) Doing so limits the damage from the
injected packet to the improperly returned question. It also
means that the user, upon noticing a failure or redirection
for a potentially censored site, can simply attempt to reload
the site, this time receiving the valid reply for their answer,
circumventing the censorship. Finally, the extended recursive
resolver can also itself issue an additional response (with the
second reply’s contents) in response to the client’s query, so
that any client or stub that also detects mismatched replies will
be aware of the attack.
VII. CONCLUSION
Packet injection by on-path adversaries poses a particular
threat against DNS. One particular use of such injection
comes from those building DNS-based censorship tools. Even
DNSSEC-validating implementations may be vulnerable, as
injected packets can cause denial-of-service if the resolver fails
to wait for the legitimate reply before returning “Bogus”.
Such adversaries cannot however block the receipt of the
legitimate DNS reply in addition to their injection. We show
that this artifact—differing replies to the same question—
appears to occur only very rarely outside of an actual attack.
Thus, stub resolvers can use this artifact’s presence to detect when packet
injection occurs.
In addition, unless censors take additional steps, imperfec-
tions in their packet injection tools can allow resolvers to not
only detect injection attacks, but also to potentially distinguish
between legitimate replies and injected answers based on
artifacts in the IP header’s TTL field and in the round-trip time
when receiving replies. Accordingly, we propose that resolvers
should utilize a Hold-On period, waiting for additional replies.
If a reply arrives too early, with an unexpected IP TTL,
or fails to validate (if DNSSEC validation occurs), the stub
waits for the potential arrival of a subsequent legitimate reply
before proceeding. For censors who take steps to match the
TTL and RTT of their injections with those expected for
legitimate replies, Hold-On still allows detection that injection
has occurred; and, for DNSSEC-enabled resolution, prevents
the censor from imposing denial-of-service on obtaining the
correct reply.
Finally, we developed a DNS forwarder that implements
Hold-On, and demonstrated that this forwarder is effective
at distinguishing legitimate replies from those injected by a
widely deployed network censorship system. Our evaluation
found that use of such forwarding imposes minimal additional
latency on undisrupted DNS queries, and thus Hold-On has
promise for enabling more robust DNS resolution in the face
of on-path censorship.
VIII. ACKNOWLEDGMENTS
This work was sponsored in part by NSF grants 0831780,
0905631, and CNS-1015835. Special thanks to Robin Sommer
for assistance with the UC Berkeley network traces.
REFERENCES
[1] G. Lowe, P. Winters, and M. L. Marcus, “The Great DNS Wall of China,”
Dec. 2007. http://cs.nyu.edu/~pcw216/work/nds/final.pdf.
[2] P. Mockapetris, “Domain Names—Concepts and Facilities, RFC 1034.”
http://www.ietf.org/rfc/rfc1034.txt.
[3] R. Arends, R. Austein, M. Larson, D. Massey, and S. Rose,
“Protocol Modifications for the DNS Security Extensions,” 2005.
http://www.ietf.org/rfc/rfc4035.txt.
[4] S. M. Bellovin, “Using the Domain Name System for System Break-
ins,” in Proceedings 5th USENIX Security Symposium, 1995.
[5] D. Kaminsky, “Black ops 2008: It’s the end of the cache as we know
it,” Black Hat USA, 2008.
[6] D. Dagon, M. Antonakakis, P. Vixie, T. Jinmei, and W. Lee, “Increased
DNS Forgery Resistance Through 0x20-bit Encoding: security via leet
queries,” in Proceedings of the 15th ACM conference on Computer and
communications security, pp. 211–222, ACM, 2008.
[7] N. Weaver, R. Sommer, and V. Paxson, “Detecting Forged TCP Reset
Packets,” in NDSS’09, 2009.
[8] P. Mockapetris, “Domain Names—Implementation and Specification,
RFC 1035.” http://www.ietf.org/rfc/rfc1035.txt.
[9] R. Arends, R. Austein, M. Larson, D. Massey, and S. Rose,
“DNS Security Introduction and Requirements,” 2005.
http://www.ietf.org/rfc/rfc4033.txt.
[10] R. Arends, R. Austein, M. Larson, D. Massey, and S. Rose,
“Resource Records for the DNS Security Extensions,” 2005.
http://www.ietf.org/rfc/rfc4034.txt.
[11] D. Dagon, N. Provos, C. Lee, and W. Lee, “Corrupted DNS Resolution
Paths: The Rise of a Malicious Resolution Authority,” in Proceedings
of Network and Distributed Security Symposium, 2008.
[12] P. Roberts, “Chinese DNS Tampering A Big Threat To Internet Security,”
2010. https://threatpost.com/en_us/blogs/chinese-dns-tampering-big-threat-internet-security-112410.
[13] J. G. Høy, “Anti DNS Spoofing-Extended Query ID (XQID).”
http://www.jhsoft.com/dns-xqid.htm, 2008.
[14] D. Atkins and R. Austein, “Threat Analysis of the Domain Name System
(DNS),” RFC3833, 2004.
[15] B. Hubert and R. Mook, “Measures for Making DNS
More Resilient against Forged Answers, RFC 5452,” 2009.
http://tools.ietf.org/html/rfc5452.
[16] R. Clayton, S. Murdoch, and R. Watson, “Ignoring the great firewall of
china,” in Privacy Enhancing Technologies, pp. 20–35, Springer, 2006.
[17] J. R. Crandall and E. Barr, “Conceptdoppler: A weather tracker for
internet censorship,” in 14th ACM Conference on Computer and Com-
munications Security, 2007.
[18] “Google Public DNS.” http://code.google.com/speed/public-dns/.
[19] …
Congestion Avoidance and Control∗
Van Jacobson†
Lawrence Berkeley Laboratory
Michael J. Karels‡
University of California at Berkeley
November, 1988
Introduction
Computer networks have experienced an explosive growth
over the past few years and with that growth have come severe
congestion problems. For example, it is now common to see
internet gateways drop 10% of the incoming packets because
of local buffer overflows. Our investigation of some of these
problems has shown that much of the cause lies in transport
protocol implementations (not in the protocols themselves):
The ‘obvious’ ways to implement a window-based transport
protocol can result in exactly the wrong behavior in response
to network congestion. We give examples of ‘wrong’ behav-
ior and describe some simple algorithms that can be used to
make right things happen. The algorithms are rooted in the
idea of achieving network stability by forcing the transport
connection to obey a ‘packet conservation’ principle. We
show how the algorithms derive from this principle and what
effect they have on traffic over congested networks.
In October of ’86, the Internet had the first of what became
a series of ‘congestion collapses’. During this period, the
data throughput from LBL to UC Berkeley (sites separated
by 400 yards and two IMP hops) dropped from 32 Kbps to 40
bps. We were fascinated by this sudden factor-of-thousand
drop in bandwidth and embarked on an investigation of why
things had gotten so bad. In particular, we wondered if the
4.3BSD (Berkeley Unix) TCP was misbehaving or if it could
be tuned to work better under abysmal network conditions.
The answer to both of these questions was “yes”.
∗ Note: This is a very slightly revised version of a paper originally
presented at SIGCOMM ’88 [11]. If you wish to reference this work, please
cite [11].
† This work was supported in part by the U.S. Department of Energy
under Contract Number DE-AC03-76SF00098.
‡ This work was supported by the U.S. Department of Commerce,
National Bureau of Standards, under Grant Number 60NANB8D0830.
Since that time, we have put seven new algorithms into
the 4BSD TCP:
(i) round-trip-time variance estimation
(ii) exponential retransmit timer backoff
(iii) slow-start
(iv) more aggressive receiver ack policy
(v) dynamic window sizing on congestion
(vi) Karn’s clamped retransmit backoff
(vii) fast retransmit
Our measurements and the reports of beta testers suggest that
the final product is fairly good at dealing with congested con-
ditions on the Internet.
This paper is a brief description of (i) – (v) and the ra-
tionale behind them. (vi) is an algorithm recently developed
by Phil Karn of Bell Communications Research, described
in [15]. (vii) is described in a soon-to-be-published RFC
(Arpanet “Request for Comments”).
Algorithms (i) – (v) spring from one observation: The
flow on a TCP connection (or ISO TP-4 or Xerox NS SPP con-
nection) should obey a ‘conservation of packets’ principle.
And, if this principle were obeyed, congestion collapse would
become the exception rather than the rule. Thus congestion
control involves finding places that violate conservation and
fixing them.
By ‘conservation of packets’ we mean that for a connec-
tion ‘in equilibrium’, i.e., running stably with a full window
of data in transit, the packet flow is what a physicist would
call ‘conservative’: A new packet isn’t put into the network
until an old packet leaves. The physics of flow predicts that
systems with this property should be robust in the face of
congestion.1 Observation of the Internet suggests that it was
not particularly robust. Why the discrepancy?
1 A conservative flow means that for any given time, the integral of the
packet density around the sender–receiver–sender loop is a constant. Since
packets have to ‘diffuse’ around this loop, the integral is sufficiently
continuous to be a Lyapunov function for the system. A constant function trivially
meets the conditions for Lyapunov stability, so the system is stable and any
superposition of such systems is stable. (See [2], chap. 11–12 or [20], chap. 9
for excellent introductions to system stability theory.)
There are only three ways for packet conservation to fail:
1. The connection doesn’t get to equilibrium, or
2. A sender injects a new packet before an old packet has
exited, or
3. The equilibrium can’t be reached because of resource
limits along the path.
In the following sections, we treat each of these in turn.
1 Getting to Equilibrium: Slow-start
Failure (1) has to be from a connection that is either starting
or restarting after a packet loss. Another way to look at the
conservation property is to say that the sender uses acks as
a ‘clock’ to strobe new packets into the network. Since the
receiver can generate acks no faster than data packets can get
through the network, the protocol is ‘self clocking’ (fig. 1).
Self clocking systems automatically adjust to bandwidth and
delay variations and have a wide dynamic range (important
considering that TCP spans a range from 800 Mbps Cray chan-
nels to 1200 bps packet radio links). But the same thing that
makes a self-clocked system stable when it’s running makes
it hard to start — to get data flowing there must be acks to
clock out packets but to get acks there must be data flowing.
To start the ‘clock’, we developed a slow-start algorithm
to gradually increase the amount of data in transit.2 Al-
though we flatter ourselves that the design of this algorithm
is rather subtle, the implementation is trivial — one new state
variable and three lines of code in the sender:
• Add a congestion window, cwnd, to the per-connection
state.
• When starting or restarting after a loss, set cwnd to one
packet.
• On each ack for new data, increase cwnd by one packet.
• When sending, send the minimum of the receiver’s
advertised window and cwnd.
2 Slow-start is quite similar to the cute algorithm described in [13]. We
didn’t know about cute at the time we were developing slow-start, but we
should have—cute preceded our work by several months.
When describing our algorithm at the Feb., 1987, Internet Engineering
Task Force (IETF) meeting, we called it soft-start, a reference to an electronics
engineer’s technique to limit in-rush current. The name slow-start
was coined by John Nagle in a message to the IETF mailing list in March,
’87. This name was clearly superior to ours and we promptly adopted it.
Actually, the slow-start window increase isn’t that slow:
it takes time R log₂ W, where R is the round-trip time and
W is the window size in packets (fig. 2). This means the
window opens quickly enough to have a negligible effect
on performance, even on links with a large bandwidth–delay
product. And the algorithm guarantees that a connection will
source data at a rate at most twice the maximum possible
on the path. Without slow-start, by contrast, when 10 Mbps
Ethernet hosts talk over the 56 Kbps Arpanet via IP gateways,
the first-hop gateway sees a burst of eight packets delivered at
200 times the path bandwidth. This burst of packets often puts
the connection into a persistent failure mode of continuous
retransmissions (figures 3 and 4).
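A sketch of the three slow-start rules in Python, with the window kept in packets for clarity; the class and its names are ours, for illustration only.

class SlowStartSender:
    """Congestion-window bookkeeping for the slow-start rules above."""

    def __init__(self, receiver_window: int):
        self.awnd = receiver_window    # receiver's advertised window (packets)
        self.cwnd = 1                  # start (or restart after a loss) at one packet

    def on_loss(self) -> None:
        self.cwnd = 1                  # restarting: rebuild the ack clock from scratch

    def on_ack_for_new_data(self) -> None:
        self.cwnd += 1                 # each ack opens the window by one packet

    def send_window(self) -> int:
        return min(self.awnd, self.cwnd)

Because each arriving ack both clocks out a replacement packet and opens cwnd by one more, the amount in flight roughly doubles per round trip, which is where the R log₂ W opening time noted above comes from.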
2 Conservation at equilibrium:
round-trip timing
Once data is flowing reliably, problems (2) and (3) should
be addressed. Assuming that the protocol implementation
is correct, (2) must represent a failure of the sender’s retransmit
timer. A good round trip time estimator, the core of the
retransmit timer, is the single most important feature of any
protocol implementation that expects to survive heavy load.
And it is frequently botched ([26] and [12] describe typical
problems).
One mistake is not estimating the variation, σ_R, of the
round trip time, R. From queuing theory we know that R and
the variation in R increase quickly with load. If the load is
ρ (the ratio of average arrival rate to average departure rate),
R and σ_R scale like (1 − ρ)⁻¹. To make this concrete, if the
network is running at 75% of capacity, as the Arpanet was
in last April’s collapse, one should expect round-trip time to
vary by a factor of sixteen (−2σ to +2σ).
The TCP protocol specification [23] suggests estimating
mean round trip time via the low-pass filter
R ← αR + (1 − α)M
where R is the average RTT estimate, M is a round trip time
measurement from the most recently acked data packet, and
α is a filter gain constant with a suggested value of 0.9. Once
the R estimate is updated, the retransmit timeout interval,
rto, for the next packet sent is set to βR.
The parameter β accounts for RTT variation (see [4], sec-
tion 5). The suggested β = 2 can adapt to loads of at most
30%. Above this point, a connection will respond to load
increases by retransmitting packets that have only been de-
layed in transit. This forces the network to do useless work,
Figure 1: Window Flow Control ‘Self-clocking’
This is a schematic representation of a sender and receiver on high bandwidth networks connected
by a slower, long-haul net. The sender is just starting and has shipped a window’s worth of packets,
back-to-back. The ack for the first of those packets is about to arrive back at the sender (the vertical
line at the mouth of the lower left funnel).
The vertical dimension is bandwidth, the horizontal dimension is time. Each of the shaded boxes is
a packet. Bandwidth × Time = Bits, so the area of each box is the packet size. The number of bits
doesn’t change as a packet goes through the network, so a packet squeezed into the smaller long-haul
bandwidth must spread out in time. The time Pb represents the minimum packet spacing on the
slowest link in the path (the bottleneck). As the packets leave the bottleneck for the destination net,
nothing changes the inter-packet interval, so on the receiver’s net packet spacing Pr = Pb. If the
receiver processing time is the same for all packets, the spacing between acks on the receiver’s net
Ar = Pr = Pb. If the time slot Pb was big enough for a packet, it’s big enough for an ack, so the
ack spacing is preserved along the return path. Thus the ack spacing on the sender’s net As = Pb.
So, if packets after the first burst are sent only in response to an ack, the sender’s packet spacing will
exactly match the packet time on the slowest link in the path.
wasting bandwidth on duplicates of packets that will even-
tually be delivered, at a time when it’s known to be having
trouble with useful work. I.e., this is the network equivalent
of pouring gasoline on a fire.
We developed a cheap method for estimating variation
(see appendix A)3 and the resulting retransmit timer essen-
tially eliminates spurious retransmissions. A pleasant side
effect of estimating β rather than using a fixed value is that
low load as well as high load performance improves, partic-
ularly over high delay paths such as satellite links (figures 5
and 6).
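The spec's filter and the variance-aware timer this section argues for, sketched in Python. The gains below (1/8 for the mean, 1/4 for the deviation, rto = R + 4·dev) are the values that became standard in later TCP implementations, not necessarily the exact constants of appendix A, so read this as an illustration.

class RttEstimator:
    """Retransmit-timeout estimation from per-packet RTT measurements M."""

    def __init__(self, alpha: float = 0.9, beta: float = 2.0):
        self.alpha, self.beta = alpha, beta    # the spec's suggested constants
        self.r = None                          # smoothed RTT estimate R
        self.dev = 0.0                         # smoothed mean deviation of R

    def rto_fixed_beta(self, m: float) -> float:
        """R <- alpha*R + (1 - alpha)*M;  rto = beta*R  (the spec's suggestion)."""
        self.r = m if self.r is None else self.alpha * self.r + (1 - self.alpha) * m
        return self.beta * self.r

    def rto_with_deviation(self, m: float, g: float = 0.125, h: float = 0.25) -> float:
        """Track the deviation as well, instead of relying on a fixed beta."""
        if self.r is None:
            self.r, self.dev = m, m / 2
        else:
            err = m - self.r
            self.r += g * err
            self.dev += h * (abs(err) - self.dev)
        return self.r + 4 * self.dev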
Another timer mistake is in the backoff after a retrans-
mit: If a packet has to be retransmitted more than once, how
should the retransmits be spaced? For a transport endpoint
embedded in a network of unknown topology and with an
3 We are far from the first to recognize that transport needs to estimate both
mean and variation. See, for example, [5]. But we do think our estimator is
simpler than most.
unknown, unknowable and constantly changing population
of competing conversations, only one scheme has any hope
of working—exponential backoff—but a proof of this is be-
yond the scope of this paper.4 To finesse a proof, note that
a network is, to a very good approximation, a linear system.
That is, it is composed of elements that behave like linear op-
erators — integrators, delays, gain stages, etc. Linear system
theory says that if a system is stable, the stability is exponen-
tial. This suggests that an unstable system (a network subject
4 See [7]. Several authors have shown that backoffs ‘slower’ than ex-
ponential are stable given finite populations and knowledge of the global
traffic. However, [16] shows that nothing slower than exponential behav-
ior will work in the general case. To feed your intuition, consider that an IP
gateway has essentially the same behavior as the ‘ether’ in an ALOHA net or
Ethernet. Justifying exponential retransmit backoff is the same as showing
that no collision backoff slower than an exponential will guarantee stability
on an Ethernet. Unfortunately, with an infinite user population even ex-
ponential backoff won’t guarantee stability (although it ‘almost’ does—see
[1]). Fortunately, we don’t (yet) have to deal with an infinite user population.
Figure 2: The Chronology of a Slow-start
[Diagram: successive round-trip times 0R–3R stacked vertically; grey boxes are packets 1–15 and white boxes the corresponding acks, with two new packets generated for each arriving ack.]
The horizontal direction is time. The continuous time line has been chopped into one-round-trip-time
pieces stacked vertically with increasing time going down the page. The grey, numbered boxes are
packets. The white numbered boxes are the corresponding acks. As each ack arrives, two packets
are generated: one for the ack (the ack says a packet has left the system so a new packet is added to
take its place) and one because an ack opens the congestion window by one packet. It may be clear
from the figure why an add-one-packet-to-window policy opens the window exponentially in time.
If the local net is much faster than the long haul net, the ack’s two packets arrive at the bottleneck
at essentially the same time. These two packets are shown stacked on top of one another (indicating
that one of them would have to occupy space in the gateway’s outbound queue). Thus the short-term
queue demand on the gateway is increasing exponentially and opening a window of size W packets
will require W/2 packets of buffer capacity at the bottleneck.
to random load shocks and prone to congestive collapse5 )
can be stabilized by adding some exponential damping (ex-
ponential timer backoff) to its primary excitation (senders,
traffic sources).
3 Adapting to the path: congestion
avoidance
If the timers are in good shape, it is possible to state with
some confidence that a timeout indicates a lost packet and not
5 The phrase congestion collapse (describing a positive feedback instability
due to poor retransmit timers) is again the coinage of John Nagle, this
time from [22].
a broken timer. At this point, something can be done about
(3). Packets get lost for two reasons: they are damaged in
transit, or the network is congested and somewhere on the
path there was insufficient buffer capacity. On most network
paths, loss due to damage is rare (≪ 1%), so it is probable that
a packet loss is due to congestion in the network.6
6 Because a packet loss empties the window, the throughput of any window
flow control protocol is quite sensitive to damage loss. For an RFC 793
standard TCP running with window w (where w is at most the bandwidth-
delay product), a loss probability of p degrades throughput by a factor of
(1 + 2pw)⁻¹. E.g., a 1% damage loss rate on an Arpanet path (8 packet
window) degrades TCP throughput by 14%.
The congestion control scheme we propose is insensitive to damage loss
until the loss rate is on the order of the window equilibration length (the
number of packets it takes the window to regain its original size after a loss).
If the pre-loss size is w, equilibration takes roughly w²/3 packets so, for the
Arpanet, the loss sensitivity threshold is about 5%. At this high loss rate, the
empty window effect described above has already degraded throughput by
44% and the additional degradation from the congestion avoidance window
shrinking is the least of one’s problems.
We are concerned that the congestion control noise sensitivity is quadratic
in w, but it will take at least another generation of network evolution to
reach window sizes where this will be significant. If experience shows this
sensitivity to be a liability, a trivial modification to the algorithm makes it
linear in w. An in-progress paper explores this subject in detail.
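A quick arithmetic check of the figures quoted in footnote 6 above (our own calculation, not part of the original):

\[
  w = 8,\ p = 0.01:\quad 1 - (1 + 2pw)^{-1} = 1 - \tfrac{1}{1.16} \approx 14\%,
  \qquad
  p = 0.05:\quad 1 - \tfrac{1}{1.8} \approx 44\%,
\]
\[
  \text{equilibration length} \approx \tfrac{w^2}{3} = \tfrac{64}{3} \approx 21 \text{ packets}
  \;\Longrightarrow\; \text{loss-sensitivity threshold} \approx \tfrac{1}{21} \approx 5\%.
\]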
A ‘congestion avoidance’ strategy, such as the one pro-
posed in [14], will have two components: The network must
be able to signal the transport endpoints that congestion is
occurring (or about to occur). And the endpoints must have a
policy that decreases utilization if this signal is received and
increases utilization if the signal isn’t received.
If packet loss is (almost) always due to congestion and
if a timeout is (almost) always due to a lost packet, we have
a good candidate for the ‘network is congested’ signal. Par-
ticularly since this signal is delivered automatically by all
existing networks, without special modification (as opposed
to [14] which requires a new bit in the packet headers and a
modification to all existing gateways to set this bit).
The other part of a congestion avoidance strategy, the
endnode action, is almost identical in the DEC/ISO scheme and
our TCP7 and follows directly from a first-order time-series
model of the network:8 Say network load is measured by
average queue length over fixed intervals of some appropriate
length (something near the round trip time). If L_i is the load
at interval i, an uncongested network can be modeled by
saying L_i changes slowly compared to the sampling time.
I.e.,
L_i = N
(N constant). If the network is subject to congestion, this
zeroth order model breaks down. The average queue length
becomes the sum of two terms, the N above that accounts
for the average arrival rate of new traffic and intrinsic delay,
and a new term that accounts for the fraction of traffic left
over from the last time interval and the effect of this left-over
traffic (e.g., induced retransmits):
L_i = N + γ L_{i−1}
(These are the first two terms in a Taylor series expansion of
L(t). There is reason to believe one might eventually need a
three-term, second-order model, but not until the Internet has
grown substantially.)
7 This is not an accident: We copied Jain’s scheme after hearing his
presentation at [9] and realizing that the scheme was, in a sense, universal.
8 See any good control theory text for the relationship between a system
model and admissible controls for that system. A nice introduction appears
in [20], chap. 8.
When the network is congested, γ must be large and
the queue lengths will start increasing exponentially.9 The
system will stabilize only if the traffic sources throttle back
at least as quickly as the queues are growing. Since a source
controls load in a window-based protocol by adjusting the
size of the window, W, we end up with the sender policy
On congestion:
W_i = d W_{i−1}    (d < 1)
I.e., a multiplicative decrease of the window size (which be-
comes an exponential decrease over time if the congestion
persists).
If there’s no congestion, γ must be near zero and the
load approximately constant. The network announces, via a
dropped packet, when demand is excessive but says nothing
if a connection is using less than its fair share (since the net-
work is stateless, it cannot know this). Thus a connection has
to increase its bandwidth utilization to find out the current
limit. E.g., you could have been sharing the path with some-
one else and converged to a window that gives you each half
the available bandwidth. If she shuts down, 50% of the band-
width will be wasted unless your window size is increased.
What should the increase policy be?
The first thought is to use a symmetric, multiplicative in-
crease, possibly with a longer time constant, W_i = bW_{i−1},
1 < b ≤ 1/d. This is a mistake. The result will oscillate
wildly and, on the average, deliver poor throughput. The an-
alytic reason for this has to do with the fact that it is easy
to drive the net into saturation but hard for the net to recover
(what [17], chap. 2.1, calls the rush-hour effect).10 Thus
9 I.e., the system behaves like L_i ≈ γ L_{i−1}, a difference equation with
the solution
L_n = γⁿ L_0
which goes exponentially to infinity for any γ > 1.
10In fig. 1, note that the ‘pipesize’ is 16 packets, 8 in each path, but the
sender is using a window of 22 packets. The six excess packets will form
a queue at the entry to the bottleneck and that queue cannot shrink, even
though the sender carefully clocks out packets at the bottleneck link rate.
This stable queue is another, unfortunate, aspect of conservation: The queue
would shrink only if the gateway could move packets into the skinny pipe
faster than the sender dumped packets into the fat pipe. But the system tunes
itself so each time the gateway pulls a packet off the front of its queue, the
sender lays a new packet on the end.
A gateway needs excess output capacity (i.e., ρ < 1) to dissipate a queue
and the clearing time will scale like (1 − ρ)⁻² ([17], chap. 2 is an excellent
discussion of this). Since at equilibrium our transport connection ‘wants’ to
run the bottleneck link at 100% (ρ = 1), we have to be sure that during the
non-equilibrium window adjustment, our control policy allows the gateway
enough free bandwidth to dissipate queues that inevitably form due to path
testing and traffic fluctuations. By an argument similar to the one used to
show exponential timer backoff is necessary, it’s possible to show that an
exponential (multiplicative) window increase policy will be ‘faster’ than the
dissipation time for some traffic mix and, thus, leads to an unbounded growth
of the bottleneck queue.
overestimating the available bandwidth is costly. But an ex-
ponential, almost regardless of its time constant, increases so
quickly that overestimates are inevitable.
Without justification, we’ll state that the best increase pol-
icy is to make small, constant changes to the window size:11
On no congestion:
W_i = W_{i−1} + u    (u ≪ W_max)
where W_max is the pipesize (the delay–bandwidth product of
the path minus protocol overhead — i.e., the largest sensible
window for the unloaded path). This is the additive increase
/ multiplicative decrease policy suggested in [14] and the
policy we’ve implemented in TCP. The only difference be-
tween the two implementations is the choice of constants for
d andu. We used 0.5 and 1 for reasons partially explained
in appendix D. A more complete analysis is in yet another
in-progress paper.
The preceding has probably made the congestion control
algorithm sound hairy but it’s not. Like slow-start, it’s three
lines of code:
• On any timeout, set cwnd to half the current window
size (this is the multiplicative decrease).
• On each ack for new data, increase cwnd by 1/cwnd
(this is the additive increase).12
• When sending, send the minimum of the receiver’s
advertised window and cwnd.
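A sketch of these three rules in Python, with the window kept in packets for clarity (4.3BSD keeps cwnd in bytes and increments by maxseg*maxseg/cwnd, as footnote 12 explains); this illustrates only the stated rules and omits the slow-start restart discussed next.

class CongestionAvoidanceSender:
    """Additive-increase / multiplicative-decrease window adjustment (d = 0.5, u = 1)."""

    def __init__(self, receiver_window: float):
        self.awnd = receiver_window        # receiver's advertised window (packets)
        self.cwnd = 1.0                    # congestion window (packets, fractional)

    def on_timeout(self) -> None:
        self.cwnd = max(1.0, self.cwnd / 2)    # multiplicative decrease

    def on_ack_for_new_data(self) -> None:
        self.cwnd += 1.0 / self.cwnd           # additive increase: about one packet per RTT

    def send_window(self) -> int:
        return int(min(self.awnd, self.cwnd))  # never exceed the advertised window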
Note that this algorithm is only congestion avoidance; it
doesn’t include the previously described slow-start. Since
the packet loss that signals congestion will result in a re-start,
it will almost certainly be necessary to slow-start in addition
to the above. But, because both congestion avoidance and
slow-start are triggered by a timeout and both manipulate
the congestion window, they are frequently confused. They
are actually independent algorithms with completely different
objectives. To emphasize the difference, the two algorithms
11See [3] for a complete analysis of these increase and decrease policies.
Also see [7] and [8] for a control-theoretic analysis of a similar class of
control policies.
12 This increment rule may be less than obvious. We want to increase the
window by at most one packet over a time interval of length R (the round
trip time). To make the algorithm ‘self-clocked’, it’s better to increment
by a small amount on each ack rather than by a large amount at the end of
the interval. (Assuming, of course, the sender has effective silly window
avoidance (see [4], section 3) and doesn’t attempt to send packet fragments
because of the fractionally sized window.) A window of size cwnd packets
will generate at most cwnd acks in one R. Thus an increment of 1/cwnd
per ack will increase the window by at most one packet in one R. In
TCP, windows and packet sizes are in bytes so the increment translates to
maxseg*maxseg/cwnd where maxseg is the maximum segment size and cwnd
is expressed in bytes, not packets.
have been presented separately even though in practise they
should be implemented together. Appendix B describes a
combined slow-start/congestion avoidance algorithm.13
Figures 7 through 12 show the behavior of TCP connec-
tions with and without congestion avoidance. Although the
test conditions (e.g., 16 KB windows) were deliberately cho-
sen to stimulate congestion, the test scenario isn’t far from
common practice: The Arpanet IMP end-to-end protocol al-
lows at most eight packets in transit between any pair of
gateways. The default 4.3BSD window size is eight packets
(4 KB). Thus simultaneous conversations between, say, any
two hosts at Berkeley and any two hosts at MIT would exceed
the network capacity of the UCB–MIT IMP path and would
lead14 to the type of behavior shown.
4 Future work: the gateway side of
congestion control
While algorithms at the transport endpoints can insure the net-
work capacity isn’t exceeded, they cannot insure fair sharing
of that capacity. Only in gateways, at the convergence of
flows, is there enough information to control sharing and fair
allocation. Thus, we view the gateway ‘congestion detection’
algorithm as the next big step.
The goal of this algorithm is to send a signal to the endnodes
as early as possible, but not so early that the gateway becomes
13 We have also developed a rate-based variant of the congestion avoid-
ance algorithm to apply to connectionless traffic (e.g., domain server queries,
RPC requests). Remembering that the goal of the increase and decrease poli-
cies is bandwidth adjustment, and that ‘time’ (the controlled parameter in a
rate-based scheme) appears in the denominator of bandwidth, the algorithm
follows immediately: The multiplicative decrease remains a multiplica-
tive decrease (e.g., double the interval between packets). But subtracting
a constant amount from the interval does not result in an additive increase in
bandwidth. This approach has been tried, e.g., [18] and [24], and appears
to oscillate badly. To see why, note that for an inter-packet interval I and
decrement c, the bandwidth change of a decrease-interval-by-constant pol-
icy is
1/I → 1/(I − c),
a non-linear, and destabilizing, increase.
An update policy that does result in a linear increase of bandwidth over
time is
I_i = α I_{i−1} / (α + I_{i−1})
where I_i is the interval between sends when the ith packet is sent and α is
the desired rate of increase in packets per packet/sec.
We have simulated the above algorithm and it appears to perform well. To
test the predictions of that simulation against reality, we have a cooperative
project with Sun Microsystems to prototype RPC dynamic congestion control
algorithms using NFS as a test-bed (since NFS is known to have congestion
problems yet it would be desirable to have it work over the same range of
networks as TCP).
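A one-line check of the claimed linearity (our derivation, not in the original footnote): taking reciprocals of the update gives

\[
  \frac{1}{I_i} \;=\; \frac{\alpha + I_{i-1}}{\alpha\, I_{i-1}} \;=\; \frac{1}{I_{i-1}} + \frac{1}{\alpha},
\]

so the bandwidth 1/I grows by the constant amount 1/α on each send, i.e., linearly in the number of packets sent.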
14 did lead.
starved for traffic. Since we plan to continue using packet
drops as a congestion signal, gateway ‘self protection’ from
a mis-behaving host should fall-out for free: That host will
simply have most of its packets dropped as the gateway tries
to tell it that it’s using more than its fair share. Thus, like
the endnode algorithm, the gateway algorithm should reduce
congestion even if no endnode is modified to do congestion
avoidance. And nodes that do implement congestion avoid-
ance will get their fair share of bandwidth and a minimum
number of packet drops.
Since congestion grows exponentially, detecting it early
is important. If detected early, small adjustments to the
senders’ windows will cure it. Otherwise massive adjust-
ments are necessary to give the net enough spare capacity
to pump out the backlog. But, given the bursty nature of
traffic, reliable detection is a non-trivial problem. Jain [14]
proposes a scheme based on averaging between queue regen-
eration points. This should yield good burst filtering but we
think it might have convergence problems under high load
or significant second-order dynamics in the traffic.15 We
plan to use some of our earlier work on ARMAX models for round-trip-time/queue-length prediction as the basis of detection. Preliminary results suggest that this approach works
well at high load, is immune to second-order effects in the
traffic and is computationally cheap enough to not slow …
An Empirical Study of Collusion Behavior in the
Maze P2P File-Sharing System
Qiao Lian1, Zheng Zhang1, Mao Yang13, Ben Y. Zhao2, Yafei Dai3, Xiaoming Li3
Microsoft Research Asia, Beijing, China1
U. C. Santa Barbara, Santa Barbara, CA, USA2
Peking University, Beijing, China3
Abstract— Peer-to-peer networks often use incentive policies to
encourage cooperation between nodes. Such systems are generally
susceptible to collusion by groups of users in order to gain unfair
advantages over others. While techniques have been proposed to
combat web spam collusion, there are few measurements of real
collusion in deployed systems. In this paper, we report analysis
and measurement results of user collusion in Maze, a large-scale
peer-to-peer file sharing system with a non-net-zero point-based
incentive policy. We search for colluding behavior by examining
complete user logs, and incrementally refine a set of collusion
detectors to identify common collusion patterns. We find collusion
patterns similar to those found in web spamming. We evaluate
how proposed reputation systems would perform on the Maze
system. Our results can help guide the design of more robust
incentive schemes.
I. INTRODUCTION
File-sharing networks such as Kazaa and Gnutella have
popularized the peer-to-peer (P2P) resource sharing model. In
these networks, selfish users often “free-ride,” or act in their
self interests to exploit the system. Prior research has focused
on the use of incentive systems [11] to encourage sharing
among users. Despite their effectiveness, these incentives are
generally vulnerable to user collusion along with the Sybil
Attack [4], where users create and control multiple online
identities.
Very little is known about user collusion in real systems.
Measuring user collusion is extremely difficult, since it re-
quires a global view of all transactions in a system. To date,
most user studies log traffic at the edge nodes while per-
forming queries or membership operations [13], [12]. While
informative, these results reveal only a small subset of peer
transactions, and do not shed light on active user collusion.
In this paper, we present measurements of user collusion
activity in the Maze peer-to-peer file-sharing network [17].
Maze is a popular Napster-like P2P network designed, im-
plemented and deployed by an academic research team at
Peking University, Beijing China. As a measurement platform,
Maze is unique in two ways. First, our control over the
Maze software allows us to deploy and embed customized
measurement code inside clients. Second, Maze’s centralized
architecture means all control and query traffic is logged
and available to us. Maze uses a simple incentive system
where user points increase with uploads and decrease with
downloads. Our central server audits file transfers and adjusts
user points accordingly.
In our study, we define collusion as collaborative activity
of a subset of users that grants its members benefits they
would not be able to gain as individuals. We note that we
cannot determine a user’s true intent, or definitively whether
multiple online identities belong to the same user. Issues
such as dynamic addressing via DHCP and shared addresses
behind NATs prevent us from reliably detecting the use of
multiple virtual identities. Instead of measuring user intent,
our study focuses purely on observable action patterns that
produce results similar to those produced by colluding users.
Quantifying all forms of collusion is a topic to be addressed
in ongoing work.
To the best of our knowledge, this is the first empirical
study of collusion behavior in an incentive-based P2P system.
Our results validate conclusions of previous work on incentive
mechanisms [3], [8], [7], but also show that certain types of
collusion are difficult to detect and deter. Our work makes
several key contributions. First, we describe techniques to
aggregate logs and construct collusion detectors that pro-
gressively reveal collusion behaviors and their patterns. We
believe that these techniques have general applicability to other
experimental settings when detecting collusions. Second, we
find that while some behaviors are linked to our incentive
system, their patterns are nearly identical to those found in
Web spamming, albeit at smaller scales. Finally, we evaluate
the ability of the EigenTrust reputation system [6], [7] to detect
and mark colluders with lower reputations. Our results show
that under several collusion patterns, EigenTrust produces non-
ideal reputations for both well-behaved and colluding peers.
The rest of the paper is organized as follows. Section 2
gives a brief description of Maze and the measurement data
used for this study. Next, Section 3 describes each of our
collusion detectors in detail, along with results from the Maze
dataset. Then in Section 4, we apply the EigenTrust algorithm
to the Maze dataset and analyze the results. Finally, we discuss
related work in Section 5 and conclude in Section 6.
II. THE MAZE PEER-TO-PEER SYSTEM
We begin by providing necessary background information
on the Maze file-sharing system. Maze was originally deployed
to address issues of data location and load-balancing on FTP
servers as part of the T-Net Web search project [14]. T-Net’s
increasing popularity led to significantly degraded download
performance across its limited number of FTP servers. Maze
Fig. 1. The Maze point system
provided a way to distribute content while minimizing infras-
tructure costs.
At login, each Maze peer1 authenticates to a central server,
and uploads an index of its locally available shared files. The
central server maintains heartbeats with all online peers, and
its central index supports full-text queries across all shared
files in the system. In addition to searching for files via the
central index, users can browse three peer lists for files: a
friend-list, a neighborhood-list, and an altruistic list.
The friend-list is a user-controlled list of friendly peers,
initially bootstrapped as a random set of peers by the central
server. Over time, these friend-lists across users form a con-
tinuously adapting social network. A peer’s neighborhood list
is a set of peers with the same B-class IP address provided by
the server. These are peers likely to share high-bandwidth, low
latency links with the local host. Finally, the altruistic list is
a collection of peers with the highest “points” as determined
by the server. They are hosts who have contributed the most
to the system as determined by the Maze incentive system.
Their status as “celebrities” in the user population provides
additional social incentive for sharing [17].
A peer can browse and download the storage contents of
any host on these lists. As of November 2004, more than two-
thirds of all downloads are initiated through these peer lists.
As will become clear later, these social lists have unexpected
impacts on the design of a good incentive system. After each
transaction, peers involved send a report to the central server,
which adjusts their points accordingly.
A. The Maze incentive system
Maze currently operates using a point system, where peers
consume points by downloading files and earn points by
uploading files. Download requests are queued according to
their points: requestTime − 3 log P, where P is the requester's
point total. Frequent uploads provide peers with higher points
and faster downloads. The Maze user community voted for
an asymmetric point system where uploading receives more
points than downloading the same amount of data. While this
1We use the terms “user” and “peer” interchangeably in this paper. We also
use “clients” of peer X to refer to the peers that download from X.
encourages uploads, it also allows two peers to create a net
gain in points after mutual interaction. To allow new users to
participate, Maze initializes new users with enough points to
download more than 1 GB of data before their downloads are throttled.
The details of the points system are as follows:
1) New users are initialized with 4096 points.
2) Uploads: +1.5 points per MB uploaded.
3) Points used per file downloaded:
• -1.0/MB downloaded within first 100MB.
• -0.7/MB per additional MB between 100MB and
400MB.
• -0.4/MB between 400MB and 800MB.
• -0.1/MB per additional MB over 800MB.
4) Service differentiation:
• Requests are ordered by T = requestTime − 3 log P, where P is the requester's point total.
• Users with P < 512 are limited to 200Kb/s.
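A small sketch of the accounting implied by the rules above (the schedule boundaries and rates are taken from the list; the function names are illustrative, and the base of the logarithm is not specified in the text):

```python
import math

def upload_credit(mb_uploaded):
    """Points earned for an upload: +1.5 per MB (rule 2)."""
    return 1.5 * mb_uploaded

def download_cost(mb_downloaded):
    """Points spent for a download, using the piecewise schedule in rule 3."""
    brackets = [(100, 1.0), (300, 0.7), (400, 0.4), (float("inf"), 0.1)]  # (width in MB, cost/MB)
    cost, remaining = 0.0, mb_downloaded
    for width, rate in brackets:
        step = min(remaining, width)
        cost += step * rate
        remaining -= step
        if remaining <= 0:
            break
    return cost

def queue_priority(request_time, points):
    """Service differentiation (rule 4): requests with smaller T are served first."""
    return request_time - 3 * math.log(points)   # log base unspecified in the paper
```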
B. Data collection
We perform our analysis on a log segment gathered during
the span of a one-month period from 2/19/05 to 3/24/05.
During this period, more than 161,000 users participated in
more than 32 million file transfers totaling more than 437
Terabytes. Data gathered for this study includes user point
values and the detailed traffic log. Each log entry contains
the following: uploading peer-id, downloading peer-id, upload
time, transfer start and end times (source), bytes transferred,
file size, downloader IP, file MD5 hash, and full file path.
We tried to associate online identities with the physical
machine the peer uses to detect when a user was controlling
multiple identities. We first used the hash of the hard drive
serial number, but later discovered that the serial number is
not guaranteed to be unique. Thus to uniquely identify the
machine that a peer uses, we concatenate the peer’s IP address
with the hash of the hard drive serial number. As ongoing
work, we are investigating the use of network MAC addresses
as an alternative identifier.
All logs are anonymized for user privacy. In this paper, we
refer to distinct users using common names (e.g. Alice and
Bob), and random alphabetic letters to represent 8-bit blocks
of an IP address (e.g. C.H.97.140).
III. IDENTIFYING COLLUSION TOPOLOGIES
We now discuss our efforts to detect collusion attempts in
the Maze system. Based on our experiences and analysis of the
traffic logs, we design a number of collusion detectors aimed
at locating different types of collusion patterns. We describe
these in detail in this section.
A. Repetition-based Collusion Detection
Our first attempt starts by looking at how users use uploads
to generate points. Given the point system generates a net gain
from a symmetric operation, colluders can benefit from using
only a small “working set” of files to generate points. We use
this assumption to generate our first collusion detector.
Fig. 2. Duplication degree in uploading from peer A to peer B
Detector 1: (Repetition detector) Colluders generate large
amounts of upload traffic with repeated content.
We examine our transaction log and construct a large graph,
where vertices represent individual users and edges represent
aggregated file transfers between users. This results in a
directed graph with roughly 4.5 million individual edges. Out
of all edges, 221,000 (roughly 4.9%) contain duplicate files in
the transfer traffic. We define duplication degree as the ratio
of total upload traffic (bytes) to amount of non-duplicated data
(bytes). A high duplication degree means a small proportion
of traffic across the link is unique.
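A sketch of Detector 1 under the definition above; the log-record fields mirror those listed in Section II-B, and the threshold values are the ones quoted in the text:

```python
from collections import defaultdict

def duplication_degrees(log_entries):
    """log_entries: iterable of (uploader, downloader, file_md5, bytes_transferred).
    Returns the duplication degree of each directed edge (uploader -> downloader)."""
    total = defaultdict(int)            # total bytes carried by the edge
    unique = defaultdict(dict)          # bytes of each distinct file on the edge
    for up, down, md5, nbytes in log_entries:
        edge = (up, down)
        total[edge] += nbytes
        unique[edge][md5] = nbytes      # each distinct file counted once
    return {e: total[e] / sum(unique[e].values()) for e in total}

def repetition_detector(log_entries, threshold=10):
    """Flag edges whose duplication degree exceeds a threshold (e.g., 5, 10, or 20)."""
    return [e for e, deg in duplication_degrees(log_entries).items() if deg > threshold]
```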
We plot the duplication degree of all edges against their
total upload traffic as a scatter plot in Figure 2. For thresholds
set at duplication degrees of 5, 10 and 20, there are 890, 148,
27 edges with duplication degree greater than each threshold.
Since this data is generated from activities performed over the
span of a month, it is highly likely that many of these peers
are actively colluding. We also note that colluders are likely to
use nearby machines to perform the transfers. Such network
locality maximizes throughput and gain from collusion. We
show this in Figure 2 by classifying edges by the IP affinity
between the two peers. IP affinity confirms that edges with
high amounts of duplicate traffic are likely to be across peers
with similar IP addresses.
To better understand this behavior, we take a closer look
at the temporal distribution of duplicate traffic by individual
users. Table I lists the top-6 edges with the most duplicate
traffic. The table shows each user’s total uploads, and uploads
on the edge with the most repeat traffic (max edge). Each
table entry also includes a temporal locality graph. Each bar
stands for one day, and the height of the bar is proportional
to that day’s upload traffic. These results show that there is
strong temporal locality present. If the same file is uploaded
multiple times close in time, then it is more likely to be used
as a colluding tool than legitimate sharing.
TABLE I
TOP 6 EDGES WITH THE MOST REDUNDANT TRAFFIC
Fig. 3. Collusion link topology of 100 links with the highest ratio of duplicate
transfers
The temporal locality provides strong evidence that all
5 of these peers are colluding aggressively. The maximum
duplication degree is close to 43 by Cindy. David colludes with
two different peers, with non-overlapping temporal behavior.
Our data shows that the data transferred during these colluding
sessions are generally large files or directories. For example,
Alice uploaded a 3GB DVD image 29 times.
To better understand collusion topologies, we built a vi-
sualization tool that draws edges with the highest ratios of
duplicate traffic. Figure 3 gives a snapshot of the top-100
duplicate traffic links. This figure shows collusion patterns
graphically. In addition to pair-wise collusions that exploit
the net-gain in points from transfers, we also observe more
complex 3-party and star-shaped topologies. We examine these
complex collusion behaviors in detail in the next two sections.
B. Group-based Collusion Detection
We now turn our attention to mutually colluding peers.
In Maze, group collusion occurs when peers exchange large
amounts of data among themselves to earn points. This is a
consequence of the asymmetric point assignment in Maze. If
Fig. 4. Pair-wise collusion detector by the ratio of mutual upload traffic over
total traffic
two peers upload 10GB data to each other, each of them will
acquire at least 5000 points. The asymmetric point system was
a result of extensive discussions and voting on the Maze user
forum, where users wanted to encourage uploading.
Our data shows that most group collusion occurs in pairs, and groups of three or more are rare. Intuitively, the traffic pattern for a pair-wise colluding group is one where the traffic between two peers outweighs their traffic with outside peers. Note that
the total traffic for a peer measured through our log segment is
a rough granularity approximation of a peer’s download/upload
rate. We define the metric pair-wise degree as the ratio of total
traffic between two peers to the sum of all traffic uploaded by
both peers. We use this as our first group collusion detector:
Detector 2: (Pair-wise detector) high rate of mutual upload
traffic compared to total uploads.
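A sketch of Detector 2 using the pair-wise degree defined above (mutual traffic over the sum of all uploads by both peers); the 0.5 cut-off is the line plotted in Figure 4:

```python
from collections import defaultdict

def pairwise_degrees(log_entries):
    """log_entries: iterable of (uploader, downloader, bytes_transferred)."""
    uploads = defaultdict(int)          # total bytes uploaded by each peer
    mutual = defaultdict(int)           # bytes exchanged within each unordered pair
    for up, down, nbytes in log_entries:
        uploads[up] += nbytes
        mutual[frozenset((up, down))] += nbytes
    return {pair: traffic / sum(uploads[p] for p in pair)
            for pair, traffic in mutual.items() if len(pair) == 2}

def pairwise_detector(log_entries, cutoff=0.5):
    """Flag pairs whose mutual traffic exceeds their combined uploads to everyone else."""
    return [pair for pair, deg in pairwise_degrees(log_entries).items() if deg > cutoff]
```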
Figure 4 shows the results of applying this detector to
our dataset. There are 28,000 pairs of peers with mutual
uploads, each plotted as a single point with total uploads on
the x-axis and pair-wise degree on the y-axis. A horizontal
line is plotted for pair-wise degree equal to 0.5. Above that
line are 73 pairs of peers whose pair-wise upload exceeds
uploads to external peers. While it is possible for two friends
to share large amounts of mutually interesting content, the
highly concentrated nature of these uploads appears to indicate
collusion. Regardless of whether the users intend to collude,
this type of behavior results in artificially inflated point values
for peers who are not contributing to the community at large.
One impediment to effective collusion of any kind is con-
nectivity. It is tedious to transfer large amounts of data through
a narrow pipe just for collusion. While pair-wise collusion
across the wide-area is possible, signing on as a new user
(whitewashing [18]) is an easier alternative to replenish a
peer’s points. Thus, we assert that pair-wise collusion requires
good connectivity between peers. To verify this, we show the
IP affinity of peer pairs in Figure 4 with different symbols.
We see that most colluding peers have similar IP addresses,
TABLE II
TOP-3 MUTUAL UPLOAD PAIRS. PEER 1/EXTERNAL SHOWS ALL TRAFFIC FROM PEER 1 NOT GOING TO 2.
Peer 1 external | Peer 1 | Mutual upload   | Peer 2 | Peer 2 external
1.7GB           | Fred   | 24GB ←→ 23GB   | Gary   | 5GB
23GB            | Cindy  | 81GB ←→ 27GB   | Harry  | 0GB
52GB            | David  | 62GB ←→ 126GB  | Alice  | 32GB
implying that the peers are likely physically close in the
network and likely to be connected using a high bandwidth
connection.
We take a closer look at specific examples of possible
colluding peer-pairs. Table II lists the top 3 pairs ranked by
pair-wise degree. Given the asymmetric point system, even the
most unbalanced peer (Harry) will end up with a net point gain
(after uploading 27GB and then downloading 81GB).
C. Spam Account Collusion
In the repetition-based topology of Figure 3, we observed
an unexpected colluding topology: the star-shaped colluding
an unexpected colluding topology: the star-shaped colluding
group. The center of the star gains points by uploading to
many other peers without downloading anything in return. This
seems contrary to behavior observed in our previous colluding
patterns. The willingness of these “leaf peers” to download
duplicate content from the central peer without personal gain
is puzzling.
In reality, all peers in this topology are controlled by the
same user, and all peers collaborate to increase the point total
of the center peer. We call these leaf-peers spam accounts,
since they are created and then discarded when they expend
their initial point allocations2. This strategy is similar to the
link spam [16] problem in search engines using page rank
to sort results. One might ask why a user would prefer
this strategy to performing whitewashing (restarting with new
identities). One explanation is that users need to maintain
one persistent primary account either for social status from
higher point values or to maintain a persistent friends-list. This
strategy is also efficient because it earns points much faster
than pair-wise collusion for the same amount of traffic. For this
to work, however, the colluder must use multiple machines.
There are 4 star-shaped topologies caught by the repetition
detector in Figure 3. Ted has fan-out of 8, Mary and Sam
have fan-out of 4, and Ingrid has fan-out of 3. We take a
closer look at them in Table III. Except for Ted’s group, there
is generally strong IP address proximity between the center
peer and its leaf-peers. All of this indicates a high likelihood
of collusion. Peer Ted, however, turns out to be the Maze user
with the highest uploads of the month (3.8TB). Since its 8 edges carrying duplicate traffic show very little IP address similarity, Ted is likely not a colluder.
While zero-cost identities are easy to generate, physically
separate machines are expensive to obtain. This means spam
2Note that spam accounts can be produced by performing the Sybil [4]
attack.
TABLE III
PEERS SUSPECTED OF SPAM ACCOUNT COLLUDING, AS FOUND BY THE REPETITION DETECTOR
accounts can be many in number, but are likely to reside
on few machines. We define the PM ratio as the ratio of
number of peers to number of machines to describe how
densely a peer’s clients are distributed across different physical
machines.
Detector 3: (Spam account detector) high Peer to Machine
ratio can indicate spam account collusion.
We use the method described in Section II-B to associate a
peer with its machine. One problem with the PM value is the
signal to noise ratio. A single upload to a random peer counts
as an additional peer-machine pair, and significantly reduces
the PM value. We remove these noisy values by discarding the smallest uploads that, in aggregate, hold less than 20% of all upload traffic.
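A sketch of Detector 3; the machine identifier below stands in for the IP-plus-disk-serial hash described in Section II-B, and the 20% noise filter follows the paragraph above (the helper names and thresholds other than those quoted in the text are illustrative):

```python
def pm_ratio(client_records, noise_fraction=0.2):
    """client_records: list of (client_peer_id, client_machine_id, bytes_downloaded)
    for one uploader. Returns the peer-to-machine ratio after dropping the smallest
    clients that together carry less than noise_fraction of the traffic."""
    records = sorted(client_records, key=lambda r: r[2])          # smallest uploads first
    cutoff = noise_fraction * sum(r[2] for r in records)
    dropped, kept = 0, []
    for peer, machine, nbytes in records:
        if dropped + nbytes < cutoff:
            dropped += nbytes                                     # discard noisy small uploads
        else:
            kept.append((peer, machine))
    peers = {p for p, _ in kept}
    machines = {m for _, m in kept}
    return len(peers) / max(1, len(machines))

def spam_account_detector(uploads_by_peer, ratio_threshold=3, min_upload_gb=10):
    """uploads_by_peer: {uploader: (total_upload_gb, client_records)}.
    Flags uploaders with large uploads and a high PM ratio (cf. Table IV)."""
    return [peer for peer, (total_gb, recs) in uploads_by_peer.items()
            if total_gb > min_upload_gb and pm_ratio(recs) > ratio_threshold]
```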
Figure 5 plots all peers as points with the total uploads
made by the peer on the X-axis, and the peer’s PM ratio
on the Y-axis. Most peers have PM ratio slightly above 1
and below 2. This is statistically normal because many users
perform whitewashing. However, a number of peers have
exceptionally high PM ratios (up to 7). These peers are likely
machines generating new user accounts to help a peer collude.
Table IV lists the peers whose upload > 10GB and have
PM ratio > 3. The temporal column shows when each client
generated its peak loads of Maze traffic. Consistent temporal
collisions between virtual nodes on the same machine may
signal collusion.
We check for three additional properties of likely colluders.
First, we use IP address proximity to infer whether these
accounts have good connectivity to the center peer as expected.
Fig. 5. Spam account detection by PM ratio
Second, we check if these accounts only download data
from the center peer. Finally, we examine whether the spam
accounts perform a large amount of downloads in a relatively
short life-span. All of these heuristics confirm the likelihood
that these peers are colluding. Most spam accounts live on the
same machine; they generally download exclusively from the
center peer; and they are only active for short life spans (1-2
days).
D. Upload Traffic Concentration
Pair-wise colluding and spam account colluding share one
common trait: a high volume of uploads to a small number
of machines. We now shift our focus from the flow among
peers to among physical machines. We define the traffic
concentration degree (TC degree), as the ratio of a peer’s
highest upload traffic to a single machine to his total upload
traffic. For instance, if X uploads to 10 clients for a total of
100GB, and with 90GB going to one machine, then the TC
degree of X is 0.9. The higher the TC degree, the more likely
that the peer is performing either pair-wise or spam-account
colluding.
Detector 4: (Traffic concentration detector) peers with
exceptionally high TC degree.
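A sketch of Detector 4 using the TC-degree definition above; the 50GB and 0.6 thresholds are the ones used later in the text:

```python
from collections import defaultdict

def tc_degree(uploads):
    """uploads: list of (destination_machine_id, bytes) for one uploader.
    TC degree = largest per-machine share of the peer's total upload traffic."""
    per_machine = defaultdict(int)
    for machine, nbytes in uploads:
        per_machine[machine] += nbytes
    total = sum(per_machine.values())
    return max(per_machine.values()) / total if total else 0.0

def tc_detector(uploads_by_peer, min_upload_bytes=50 * 2**30, degree_threshold=0.6):
    """Flag peers that upload a lot yet concentrate most of it on one machine."""
    return [peer for peer, uploads in uploads_by_peer.items()
            if sum(b for _, b in uploads) > min_upload_bytes
            and tc_degree(uploads) > degree_threshold]
```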
We plot the results from this detector in Figure 6 where we
plot the total upload of each peer on the X-axis and its TC
degree on the Y-axis. For non-colluding users, we expect an
increase in upload traffic to correlate with more destinations
and thus a lower TC degree. Figure 6 confirms this in our data
set. Most peers who upload around 10GB have TC degrees
around 10%. For heavy uploaders who upload around 1TB,
the TC degree drops to about 1%.
We identify a number of potential colluders who have TC
degrees significantly higher than their peers. For example,
seven peers have uploaded more than 50GB and have TC
degrees higher than 0.6. This means that more than 60\% of
the 50GB uploaded is going to a single machine. We list these
TABLE IV
SOME TOP SPAM ACCOUNT COLLUDERS
TABLE V
TOP 7 COLLUDERS DETECTED BY TC DETECTOR
peers and their traffic in Table V. Six of these have been
detected by previous detectors. The pair-wise detector missed
the new peer Nancy because it has no pair-wise traffic with any
other peer. The spam account detector missed Nancy because
it mainly uploads to only one peer and its PM ratio is close
to one. It turns out that it ranks #7 by the repetition detector
(we listed only the top 6 in Section 3.1).
E. Detectors Compared
After presenting four different collusion detectors, we sum-
marize the top colluders discussed in earlier sections in Fig-
ure 7, and graphically show how they were detected by each
Fig. 6. Upload traffic concentration detector result
TABLE VI
SEVEN TOP COLLUDERS AND HOW OUR DETECTORS HAVE FOUND AND MISSED THEM
of our four detectors. Table VI lists the top seven colluders
according to their total upload traffic. The detectors responsible for detecting each colluder are shown in shaded cells. Our
first observation is that spam-account and pair-wise colluders
do not overlap. This is logical, because the two detectors are
designed specifically with these two patterns in mind. While
colluders can engage in both activities simultaneously to evade
potential detectors, the fact that spam-account colluding is
more “cost effective” and that these detectors were applied
to logs after the fact, leads us to believe that this does not
happen in practice.
As discussed earlier, the traffic concentration detector relies
on the observation that colluders generally control relatively
few machines. Thus it examines how a peer’s upload traffic
is spread across its partners. Figure 7 shows that this detector
detects the majority of both spam-account and pair-wise col-
luders. We now examine the colluders that TC degree failed
to detect. With the exception of “Bob,” who was missed due
to a low TC degree (0.39), the other colluders it missed all
have high TC degrees (>0.8), and avoided detection only
because their total upload traffic was low. Lowering our traffic
TABLE VII
SUMMARY OF STRENGTHS AND WEAKNESSES OF COLLUSION DETECTORS.
Detector              | Heuristic                                   | Strength | Weakness
Repetition            | Colluders use a small colluding working set | General  | Cannot detect randomized colluding group
Pair-wise             | Uploads: pair-wise > external               | Accurate | Limited to pair-wise topologies
Spam                  | One peer uploads to many                    | Accurate | Limited to spam collusion topologies
Traffic concentration | Colluder controls few hosts                 | General  | Difficult to optimize parameters
Fig. 7. Venn diagram of collusion detectors
threshold from the current value of 50GB would allow us
to detect all of our colluders, showing that the TC detector
is in fact quite effective. However, because we still cannot
guarantee perfect results when mapping peers to specific
machines (in the presence of DHCP and NATs), we cannot
yet deploy the TC detector as an online collusion detector.
Finally, we are still investigating how to choose reasonable
traffic thresholds to avoid false positives.
Table VI shows that the repetition detector misses six colluders. All six peers have too little redundant traffic on any single link to be noticed. For example, peers Larry and Jane have a lot of collusion traffic, but their traffic is scattered across multiple upload links. The repetition can only be found if we aggregate multiple links together (Figure 3). The pair-wise detector also missed four colluders. Two of the four have only a small amount of mutual upload traffic with other peers; the other two have no mutual uploads with any peer at all. The spam account detector missed five colluders, all of whom evaded it because they have very low PM ratios.
Figure 7 shows that the repetition detector also works quite
well. However, the reason that it works at all is because the
current version of Maze has no explicit defense mechanism
against collusion. This detector can easily be circumvented
by a colluder if it simply modifies the content slightly, even
by just flipping one single bit. Also, differentiating legitimate
repeated downloads from colluders will be a challenging task.
For example, peers could lose their local cache and be required
Fig. 8. The basic EigenTrust algorithm for computing trust.
to repeat previous downloads. We have used this detector in the study to lead the way to other, more robust detectors, taking advantage of the fact that colluders today do not bother to cover their tracks by randomizing their colluding working sets. We summarize the four detectors in Table VII.
IV. EIGENTRUST AND COLLUSION
Our access to a complete view of all transactions in Maze
gives us a unique opportunity to evaluate how proposed
reputation systems from research would perform on real world
systems. In this section, we will evaluate EigenTrust [6], a
well-known reputation system that generates a global ranking.
The EigenTrust ranking can be used for both reputation man-
agement [6] (clients choose trustworthy download sources),
and free-rider detection [7] (uploaders choose trustworthy
clients). Ideally, the ranking algorithm would assign low scores
to malicious colluders.
A. An Overview of EigenTrust
We first give a high level description of the EigenTrust
system [6]. EigenTrust calculates global trust values for all peers in a peer-to-peer file-sharing system using power iteration. The algorithm is similar to the PageRank algorithm.
First, peer i assigns peer j trust values Cij based on its
downloading experience from j. Trust values for all j are
normalized locally by each peer i. We obtain from this a
matrix C containing a measure of trust across all peer pairs in
the network. The trust vector t is defined as the left principal
eigenvector of C. The component ti is called the EigenRank
of peer i, and represents the peer’s global reputation.
The basic algorithm can be further improved to enhance its
robustness against malicious users. This is done by incorpo-
rating the notion of pre-trusted peers in the set P . So, for
peer i, we define pi = 1/|P | if i ∈ P , and pi = 0 otherwise.
The algorithm is summarized in Figure 8. Parameter α is a
constant less than 1 that is used to control the level of trust
Fig. 9. The EigenRank of peers
each peer places on the pre-trusted peers. A higher value of
α implies more confidence on the pre-trusted peers.
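A sketch of one common formulation of the iteration summarized in Figure 8 (the figure itself is not reproduced in this excerpt, so the stopping rule and the default value of α below are assumptions; numpy is used for the linear algebra):

```python
import numpy as np

def eigentrust(C, pretrusted, alpha=0.15, tol=1e-10, max_iter=1000):
    """C[i, j]: peer i's normalized local trust in peer j (rows sum to 1).
    pretrusted: indices of pre-trusted peers. Returns the global trust vector t."""
    n = C.shape[0]
    p = np.zeros(n)
    p[list(pretrusted)] = 1.0 / len(pretrusted)     # p_i = 1/|P| for pre-trusted peers
    t = p.copy()                                    # start from the pre-trust distribution
    for _ in range(max_iter):
        t_next = (1 - alpha) * C.T @ t + alpha * p  # higher alpha => more weight on pre-trust
        if np.linalg.norm(t_next - t, 1) < tol:
            break
        t = t_next
    return t
```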
B. Applying EigenTrust to Maze
We map the EigenTrust algorithm to the Maze system as follows. First, we define the trust value c_ij to be proportional to the total traffic peer i downloads from peer j during the log period. We then normalize the local trust values so that Σ_{j=1}^{N} c_ij = 1 for each peer i. Next, we select 10 pre-trusted peers from
the Maze population by choosing ten well-known users who
we have …
Analysis of the Increase and Decrease Algorithms for Congestion Avoidance in Computer Networks

Dah-Ming CHIU and Raj JAIN
Digital Equipment Corporation, 550 King Street (LKG1-2/A19), Littleton, MA 01460-1289, U.S.A.
Abstract. Congestion avoidance mechanisms allow a network to operate in the optimal region of low delay and high throughput, thereby preventing the network from becoming congested. This is different from the traditional congestion control mechanisms that allow the network to recover from the congested state of high delay and low throughput. Both congestion avoidance and congestion control mechanisms are basically resource management problems. They can be formulated as system control problems in which the system senses its state and feeds this back to its users, who adjust their controls.

The key component of any congestion avoidance scheme is the algorithm (or control function) used by the users to increase or decrease their load (window or rate). We abstractly characterize a wide class of such increase/decrease algorithms and compare them using several different performance metrics. The key metrics are efficiency, fairness, convergence time, and size of oscillations.

It is shown that a simple additive increase and multiplicative decrease algorithm satisfies the sufficient conditions for convergence to an efficient and fair state regardless of the starting state of the network. This is the algorithm finally chosen for implementation in the congestion avoidance scheme recommended for Digital Networking Architecture and OSI Transport Class 4 Networks.

Keywords. Computer Network, Network Performance, Resource Management, Congestion Control, Congestion Avoidance, Flow Control, Fairness.
North-Holland
Computer Networks and ISDN Systems 17 (1989) 1–14
1. Introduction

1.1. Background

Congestion in computer networks is becoming an important issue due to the increasing mismatch in link speeds caused by intermixing of old and new technology. Recent technological advances
Dah-Ming Chiu received the B.Sc. degree with first class honours from Imperial College of Science and Technology, London University, in 1975, and the M.S. and Ph.D. degrees from Harvard University, Cambridge, MA, in 1976 and 1980 respectively. From 1979 to 1980, he was with AT&T Bell Laboratories, where he worked on applying queuing theory to network modeling. Since 1980, he has been with Digital Equipment Corporation, where he has worked on performance modeling and analysis of computer systems and networks. Currently, he is a member of the Distributed Systems Architecture group, where he is working on the name service architecture and analyzing various distributed algorithms. His research interests include operating systems, distributed systems, computer networks, performance analysis, and optimization theories. Dr. Chiu is a member of the ACM and the IEEE.

Raj Jain received the B.E. degree from A.P.S. University, Rewa, India, the M.E. degree from Indian Institute of Science, Bangalore, India, and the Ph.D. degree from Harvard University, Cambridge, MA, in 1972, 1974, and 1978, respectively. His Ph.D. dissertation, entitled Control Theoretic Formulation of Operating Systems Resource Management Policies, was published by Garland Publishing, Inc. of New York in their Outstanding Dissertations in the Computer Sciences series. Since 1978, he has been with Digital Equipment Corporation, where he has been involved in performance modeling and analysis of a number of computer systems and networks including VAX Clusters, DECnet, and Ethernet. Currently, he is a Consulting Engineer in the Distributed Systems Architecture and Performance Group. He spent the 1983–1984 academic year on a sabbatical at the Massachusetts Institute of Technology doing research on the performance of networks and local area systems. For three years he also taught a graduate course on computer systems performance techniques at MIT and is writing a textbook on this subject, to be published by Wiley-Interscience. Dr. Jain is a member of the Association for Computing Machinery, and a senior member of the IEEE.
Fig. 1. Network performance as a function of the load (throughput and response time plotted against load, with the knee and cliff marked). Broken curves indicate performance with deterministic service and interarrival times.
such as local area networks (LANs) and fiber
optic LANs have resulted in a significant increase
in the bandwidths of computer network links.
However, these new technologies must coexist with
the old low bandwidth media such as the twisted
pair. This heterogeneity has resulted in a mis-
match of arrival and service rates in the inter-
mediate nodes in the network causing increased
queuing and congestion.
Traditional congestion control schemes help
improve performance after congestion has oc-
curred. Figure 1 shows general patterns of re-
sponse time and throughput of a network as the
network load increases. If the load is small,
throughput generally keeps up with the load. As
the load increases, throughput increases. After the
load reaches the network capacity, throughput
stops increasing. If the load is increased any fur-
ther, the queues start building, potentially result-
ing in packets being dropped. Throughput may
suddenly drop when the load increases beyond
this point and the network is said to be congested.
The response-time curve follows a similar pattern.
At first the response time increases little with
load. When the queues start building up, the re-
sponse time increases linearly until finally, as the
queues start overflowing, the response time in-
creases drastically.
The point at which the packets start getting lost
is called a cliff due to the fact that the throughput
falls off rapidly after this point. We use the term
knee to describe the point after which the increase
in the throughput is small, but when a significant
increase in the response time results.
A scheme that allows the network to operate at
the knee is called a congestion avoidance scheme,
as distinguished from a congestion control scheme
that tries to keep the network operating in the
zone to the left of the cliff. A properly designed
congestion avoidance scheme will ensure that the
users are encouraged to increase their traffic load
as long as this does not significantly affect the
response time, and are required to decrease it
if that happens. Thus, the network load oscillates
around the knee.
Both congestion avoidance and congestion con-
trol mechanisms are dynamic resource manage-
ment problems that can be formulated as system
control problems in which the system senses its
state and feeds this back to its users who adjust
their controls. For the congestion problem, the
state consists of the load on the network and the
control is the number of packets put into the
network by the users. Often a window mechanism
is used in the transport layer protocols to limit the
number of packets put into the network. An alter-
native mechanism consists of setting a limit on the
rate (packets per second or bits per second) that
can be sent by a user. In either case, the control
(window or rate) can be dynamically adjusted as
the total load on the system changes. This control,
which we call the increase/decrease algorithm, is
at the heart of all congestion avoidance mecha-
nisms.
We have investigated a number of congestion
avoidance mechanisms, reported in a series of
papers, and this paper is a part of that series
[7,8,10,11]. The concept of congestion avoidance
and several alternatives are described in [7]. We
chose a particular alternative called the binary
feedback scheme which is described in detail in
[11]. This scheme is later extended in [10] to
include a selective feedback mechanism in which
the routers monitor different users and permit
some users to increase load while requesting others
to decrease load. All of our work on congestion
avoidance is summarized in [8].
This paper concentrates on a detailed analysis of the increase/decrease algorithms. This analysis resulted in the selection of the increase/decrease algorithms used in the binary feedback scheme proposed in [11] and [10]. However, the analysis presented here is general and applies to many other applications besides congestion avoidance.

Briefly, the binary feedback scheme for congestion avoidance operates as follows. The resources in the network monitor their usage and determine if they are loaded below or above an optimal load level. Depending upon the load level, the resource sends a binary feedback (1 = overloaded, 0 = underloaded) to the users, who then adjust their load using an increase/decrease algorithm. This binary feedback is sent by setting a bit in the packet header. The use of a bit in the packet header as a feedback mechanism has been incorporated into the OSI connectionless networking protocol standards [4]. The bit is called a congestion experienced bit and is a part of a field called quality of service in the network layer header.

The abstract model assumes that all the users sharing the same bottleneck will receive the same feedback. Based on this feedback, the users try to adjust their load so that the bottleneck is efficiently used as well as equally shared by all users. In this abstracted context, we assume that the feedback and control loop for all users is synchronous, that is, all users receive the same feedback and react to it; the next feedback is then generated after all users have reacted to the feedback, and so on. Also, we concentrate on one bottleneck resource and the users that share it. Because of these abstractions, we are able to demonstrate some of the subtle behavior of this type of algorithm. The results presented here were verified by detailed simulations of real networks [7,10,11].
1.2. Past Work

The algorithms studied here belong to a class of distributed algorithms for managing distributed resources. A spectrum of such distributed algorithms have been studied in the literature. At one end of the spectrum, we have centralized decision-making. In this paradigm, information (about user demands) flows to the resource managers, and the decision of how to allocate the resource is made at the resource. The analysis by Sanders [12] is a good example of this approach.

At the other end of the spectrum, we have decentralized decision-making. In this case the decisions are made by the users while the resources feed information regarding current resource usage. Algorithms studied by Jaffe [5] and later extensions by Gafni [2] and Mosely [9] are all good examples of this approach.

In this paper we analyze a class of decentralized decision-making algorithms that are based on a special form of feedback, namely, the feedback from the resource is a binary signal. This binary signal indicates whether the resource is currently overloaded or underutilized. A very good reason for considering a binary form of feedback is the motivation of making the controller/manager of the resource as simple and efficient as possible. The requirement of a binary feedback often minimizes the work at the resource in generating the feedback.
1.3. Notations and Definitions

Figure 2 shows the assumed model of the network with n users sharing it.

Fig. 2. A control system model of n users sharing a network. (Each user i feeds its load x_i into the network; the network compares the total load Σ x_i with the goal X_goal and returns a binary feedback y to all users.)

The congestion state of the system is determined by the number of packets in the system. We assume a discrete time operation with time divided into small slots. These slots basically represent intervals at the beginning of which the users set their load level based on the network feedback received during the previous interval. If during time slot t the i-th user's load is x_i(t), then the total load at the bottleneck resource is Σ_i x_i(t), and the state of the system is denoted by the n-dimensional vector x(t) = {x_1(t), x_2(t), ..., x_n(t)}. Since we are operating at or near the knee, all resources demanded by the users are granted (this is not true at the cliff). Thus, x_i(t) denotes the i-th user's
D. -M. Chit~ R. Jain / Congestion Avoidance in Computer Networks
demand as well as the allocation of the system's resources. During the interval, the system determines its load level and sends a binary feedback y(t), which is interpreted by the users as follows:

y(t) = 0  ⇒  increase load,
y(t) = 1  ⇒  decrease load.
The users cooperate with the system and change (increase or decrease) their demands by an amount u_i(t). Thus,

x_i(t + 1) = x_i(t) + u_i(t).    (1)

The change u_i(t) represents the i-th user's control. It is a function of the user's previous demand and the system feedback:

u_i(t) = f(x_i(t), y(t)).    (2)

In other words, x_i(t + 1) = x_i(t) + f(x_i(t), y(t)). Notice that the users are not aware of other users' individual demands and, thus, cannot make u_i(t) a function of x_j(t), j ≠ i. In general, the control function f() can be any linear or nonlinear function. However, we will focus first on linear controls. The state equations (1) then reduce to

x_i(t + 1) = a_I + b_I x_i(t)   if y(t) = 0 ⇒ increase,
x_i(t + 1) = a_D + b_D x_i(t)   if y(t) = 1 ⇒ decrease.
Here, a_I, b_I, a_D, b_D are constants. The following are some examples of the control functions:

(1) Multiplicative Increase/Multiplicative Decrease:
x_i(t + 1) = b_I x_i(t)   if y(t) = 0 ⇒ increase,
x_i(t + 1) = b_D x_i(t)   if y(t) = 1 ⇒ decrease.
Here, b_I > 1 and 0 < b_D < 1. All users increase their demands by multiplying their previous demands by a constant factor. The decrease is also multiplicative.

(2) Additive Increase/Additive Decrease:
x_i(t + 1) = a_I + x_i(t)   if y(t) = 0 ⇒ increase,
x_i(t + 1) = a_D + x_i(t)   if y(t) = 1 ⇒ decrease.
Here, a_I > 0 and a_D < 0. All users increase their demands by adding a constant amount to their previous demands. The decrease is also additive.1

(3) Additive Increase/Multiplicative Decrease:
x_i(t + 1) = a_I + x_i(t)   if y(t) = 0 ⇒ increase,
x_i(t + 1) = b_D x_i(t)   if y(t) = 1 ⇒ decrease.
The increase is by a constant amount but the decrease is by a constant factor.

(4) Multiplicative Increase/Additive Decrease:
x_i(t + 1) = b_I x_i(t)   if y(t) = 0 ⇒ increase,
x_i(t + 1) = a_D + x_i(t)   if y(t) = 1 ⇒ decrease.
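A compact sketch of the four linear controls listed above (the parameter values chosen here are illustrative only; the paper does not prescribe them):

```python
def linear_control(x, y, a_i=1.0, b_i=1.0, a_d=0.0, b_d=1.0):
    """One step of the general linear control: x(t+1) = a + b*x(t), with
    (a_I, b_I) used on increase feedback (y = 0) and (a_D, b_D) on decrease (y = 1)."""
    a, b = (a_i, b_i) if y == 0 else (a_d, b_d)
    return max(0.0, a + b * x)          # truncate at zero, per footnote 1

# The four example policies as (a_I, b_I, a_D, b_D); the numbers are illustrative.
POLICIES = {
    "MIMD": dict(a_i=0.0, b_i=1.1, a_d=0.0, b_d=0.5),   # multiplicative incr./decr.
    "AIAD": dict(a_i=1.0, b_i=1.0, a_d=-1.0, b_d=1.0),  # additive incr./decr.
    "AIMD": dict(a_i=1.0, b_i=1.0, a_d=0.0, b_d=0.5),   # additive incr., mult. decr.
    "MIAD": dict(a_i=0.0, b_i=1.1, a_d=-1.0, b_d=1.0),  # mult. incr., additive decr.
}
```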
To evaluate the effectiveness of these controls, we define a set of criteria in the next section.
1.4. Criteria for Selecting Controls
The key criteria are: efficiency, fairness, distrib-
utedness, and convergence. We define them for-
mally as follows:
(1) Efficiency: The efficiency of a resource usage is defined by the closeness of the total load on the resource to its knee. If X_goal denotes the desired load level at the knee, then the resource is operating efficiently as long as the total allocation X(t) = Σ_i x_i(t) is close to X_goal. Overload (X(t) > X_goal) or underload (X(t) < X_goal) are both undesirable and are considered inefficient; we consider both as equally undesirable.
Notice that efficiency relates only to the total allocation, and thus two different allocations can both be efficient as long as the total allocation is close to the goal. The distribution of the total allocation among individual users is measured by the fairness criterion.
(2) Fairness: The fairness criterion has been widely studied in the literature. When multiple users share multiple resources, the maxmin fairness criterion has been widely adopted [2,3,5,10]. Essentially, the set of users is partitioned into equivalent classes according to which resource is their primary bottleneck. The maxmin criterion then asserts that the users in the same equivalent
1 It is assumed that truncation is applied when a_D + x_i(t) is less than zero, so that x_i(t) will never become negative.
class ought to have an equal share of the bottleneck. Thus, a system in which x_i(t) = x_j(t) for all i, j sharing the same bottleneck is operating fairly. If all users do not get exactly equal allocations, the system is less fair and we need an index or a function that quantifies the fairness. One such index is [6]:

F(x) = (Σ_i x_i)² / (n Σ_i x_i²).
This index has the following properties:
(a) The fairness is bounded between 0 and 1 (or 0% and 100%). A totally fair allocation (with all x_i's equal) has a fairness of 1, and a totally unfair allocation (with all resources given to only one user) has a fairness of 1/n, which goes to 0 in the limit as n tends to ∞.
(b) The fairness is independent of scale, i.e., the unit of measurement does not matter.
(c) The fairness is a continuous function. Any slight change in allocation shows up in the fairness.
(d) If only k of n users share the resource equally, with the remaining n − k users not receiving any resource, then the fairness is k/n.
For other properties of this fairness function, see [6].
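A direct sketch of the fairness index F(x) defined above (pure Python; the example allocations are illustrative and chosen to match properties (a) and (d)):

```python
def fairness_index(allocations):
    """Fairness index from [6]: F(x) = (sum x_i)^2 / (n * sum x_i^2)."""
    n = len(allocations)
    total = sum(allocations)
    sum_sq = sum(x * x for x in allocations)
    return (total * total) / (n * sum_sq) if sum_sq else 0.0

print(fairness_index([5, 5, 5, 5]))   # 1.0: totally fair
print(fairness_index([20, 0, 0, 0]))  # 0.25 = 1/n: totally unfair
print(fairness_index([5, 5, 0, 0]))   # 0.5 = k/n with k = 2 of n = 4 sharing equally
```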
(3) Distributedness: The next requirement that we put on the control scheme is that it be distributed. A centralized scheme requires complete knowledge of the state of the system. For example, we may want to know each individual user's demand or their sum. This information may be available at the resource. However, conveying this information to each and every user causes considerable overhead, especially since a user may be using several resources at the same time. We are thus primarily interested in control schemes that can be implemented in real networks and, therefore, we assume that the system provides the minimum amount of feedback. It only tells whether it is underloaded or overloaded via the binary feedback bits. Other information such as X_goal and the number of users sharing the resource is assumed to be unknown by the users. This restricts the set of feasible schemes. We, therefore, describe the set of feasible schemes with and without this restriction.
Fig. 3. Responsiveness and smoothness. (Total load on the network plotted against time, oscillating about the goal; the time to reach the goal measures responsiveness, and the size of the oscillations measures smoothness.)
(4) Convergence: Finally, we require the control scheme to converge. Convergence is generally measured by the speed with which (or time taken till) the system approaches the goal state from any starting state. However, due to the binary nature of the feedback, the system does not generally converge to a single steady state. Rather, the system reaches an equilibrium in which it oscillates around the optimal state. The time taken to reach this equilibrium and the size of the oscillations jointly determine the convergence. The time determines the responsiveness, and the size of the oscillations determines the smoothness of the control. Ideally, we would like the time as well as the oscillations to be small. Thus, controls with smaller time and smaller amplitude of oscillations are called more responsive and more smooth, respectively, as shown in Fig. 3.
1.5. Outline of this Paper

In this paper, we develop a simple and intuitive methodology to explain when and why a control converges. We address the following questions: What are all the possible solutions that converge to efficient and fair states? How do we compare those controls that converge?
The paper is organized as follows. In Section 2 we characterize the set of all linear controls that converge and, thus, identify the set of feasible controls. Then we narrow down the feasible set to a subset that satisfies our distributedness criterion. This subset still includes controls that have unacceptable magnitudes of oscillation or those that converge too slowly. Then in Section 3, we discuss
how to find the subset of feasible distributed controls that represent the optimal trade-off of responsiveness and smoothness, as we defined in convergence. In Section 4, we discuss how the results extend to nonlinear controls. And in the last section we summarize the results and discuss some of the practical considerations (such as simplicity, robustness, and scalability).
2. Feasible Linear Controls
2.1. Vector Representation o f the Dynamics
In determining the set of feasible controls, it is helpful to view the system state transitions as a trajectory through an n-dimensional vector space. We describe this method using a 2-user case, which can be viewed in a 2-dimensional space.
As shown in Fig. 4, any 2-user resource allocation {x_1(t), x_2(t)} can be represented as a point (x_1, x_2) in a 2-dimensional space. In this figure, the horizontal axis represents allocations to user 1, and the vertical axis represents allocations to user 2. All allocations for which x_1 + x_2 = X_goal are efficient allocations. This corresponds to the straight line marked "efficiency line". All allocations for which x_1 = x_2 are fair allocations. This corresponds to the straight line marked "fairness line". The two lines intersect at the point (X_goal/2, X_goal/2), which is the optimal point. The goal of control schemes should be to bring the
Fig. 4. Vector representation of a two-user case. (User 1's allocation x_1 on the horizontal axis and user 2's allocation x_2 on the vertical axis, showing the efficiency line, the fairness line, an equi-fairness line, and the overload region above the efficiency line.)
system to this point regardless of the starting position.
All points below the efficiency line represent an underloaded system, and ideally the system would ask users to increase their load. Consider, for example, the point x_0 = (x_10, x_20). The additive increase policy of increasing both users' allocations by a_I corresponds to moving along a 45° line. The multiplicative increase policy of increasing both users' allocations by a factor b_I corresponds to moving along the line that connects the origin to the point. Similarly, all points above the efficiency line represent an overloaded system; additive decrease is represented by a 45° line, while multiplicative decrease is represented by the line joining the point to the origin.
The fairness at any point (x1, x2) is given by

    Fairness = (x1 + x2)^2 / (2 (x1^2 + x2^2)).
Notice that multiplying both allocations by a factor b does not change the fairness. That is, (bx1, bx2) has the same fairness as (x1, x2) for all values of b. Thus, all points on the line joining a point to the origin have the same fairness. We, therefore, call a line passing through the origin an equi-fairness line. The fairness decreases as the slope of the line either increases above or decreases below the fairness line.
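As a quick check of this invariance (an illustrative snippet of ours, using the two-user formula above):

    def fairness(x1, x2):
        # Two-user fairness index from the formula above.
        return (x1 + x2) ** 2 / (2 * (x1 ** 2 + x2 ** 2))

    print(fairness(0.2, 0.6))          # ~0.8
    print(fairness(3 * 0.2, 3 * 0.6))  # ~0.8: scaling both allocations by b = 3 leaves fairness unchanged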
Figure 5 shows a complete trajectory of the two-user system starting from point x0 using an additive increase/multiplicative decrease control policy. The point x0 is below the efficiency line and so both users are asked to increase. They do so additively by moving at an angle of 45°. This brings them to x1, which happens to be above the efficiency line. The users are asked to decrease and they do so multiplicatively. This corresponds to moving towards the origin on the line joining x1 and the origin. This brings them to point x2, which happens to be below the efficiency line, and the cycle repeats. Notice that x2 has higher fairness than x0. Thus, with every cycle, the fairness increases slightly, and …
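The behavior described above can be reproduced with a short simulation (a sketch under assumed parameter values; Xgoal, the increase step, and the decrease factor are illustrative choices, not values from the paper):

    # Additive increase / multiplicative decrease for two users (illustrative values).
    X_GOAL = 1.0    # efficiency line: x1 + x2 = Xgoal
    A_I = 0.05      # additive increase step
    B_D = 0.7       # multiplicative decrease factor

    def fairness(x1, x2):
        return (x1 + x2) ** 2 / (2 * (x1 ** 2 + x2 ** 2))

    x1, x2 = 0.1, 0.5   # starting point x0, below the efficiency line
    for step in range(20):
        if x1 + x2 < X_GOAL:          # underloaded: increase additively (45-degree move)
            x1, x2 = x1 + A_I, x2 + A_I
        else:                         # overloaded: decrease multiplicatively (toward the origin)
            x1, x2 = x1 * B_D, x2 * B_D
        print(f"step {step:2d}  x = ({x1:.3f}, {x2:.3f})  fairness = {fairness(x1, x2):.3f}")

    # The fairness index never decreases across cycles and approaches 1, while the
    # point oscillates around the optimal point (Xgoal/2, Xgoal/2).

Each additive step raises the fairness slightly and each multiplicative step preserves it, which is why x2 ends up fairer than x0 in the trajectory described above.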