CS 170, Spring 2020 HW 10 A. Chiesa J. Nelson

CS 170 HW 10

Due 2019-04-13, at 10:00 pm

1 Study Group

List the names and SIDs of the members in your study group. If you have no collaborators,

you must explicitly write none.

2 Opting for releasing your solutions

We are considering releasing a subset of homework submissions written by students for stu-

dents to see what a full score submission looks like. If your homework solutions are well

written, we may consider releasing your solution. If you wish that your solutions not be

released, please respond to this question with a ”No, do not release any submission to any

problems”. Otherwise, say ”Yes, you may release any of my submissions to any problems”.

3 Zero-Sum Games

Alice and Bob are playing a zero-sum game whose payoff matrix is shown below. The ijth

entry of the matrix shows the payoff that Alice receives if she plays strategy i and Bob plays

strategy j. Alice is the row player and is trying to maximize her payoff, and Bob is the

column player trying to minimize Alice’s payoff.

Alice \Bob 1 2

1 4 1

2 2 5

Now we will write a linear program to find a strategy that maximizes Alice’s payoff. Let the

variables of the linear program be x1, x2 and p, where xi is the probability that Alice plays

strategy i and p denotes Alice’s payoff.

(a) Write the linear program for maximizing Alice’s payoff. (Hint : You should think of set-

ting up the constraints of the program such that it finds the best worst case strategy. This

would depend on the strategy Bob plays assuming he knows what Alice’s (probabilistic)

strategy is.)

(b) Eliminate x2 from the linear program and write it in terms of p and x1 alone.

(c) Draw the feasible region of the above linear program in p and x1. You are encouraged to

use a plotting software for this.

(d) Write a linear program from Bob’s perspective trying to minimizing Alice’s payoff. Let

the variables of the linear program be y1, y2 and p, where yi is the probability that Bob

plays strategy i and p denotes Alice’s payoff.

(e) What is the optimal solution and what is the value of the game?

1

CS 170, Spring 2020 HW 10 A. Chiesa J. Nelson

4 Zero-Sum Battle

Two Pokemon trainers are about to engage in battle! Each trainer has 3 Pokemon, each of

a single, unique type. They each must choose which Pokemon to send out first. Of course

each trainer’s advantage in battle depends not only on their own Pokemon, but on which

Pokemon their opponent sends out.

The table below indicates the competitive advantage (payoff) Trainer A would gain (and

Trainer B would lose). For example, if Trainer B chooses the fire Pokemon and Trainer A

chooses the rock Pokemon, Trainer A would have payoff 2.

Trainer B:

ice water fire

dragon -10 3 3

Trainer A: steel 4 -1 -3

rock 6 -9 2

Feel free to use an online LP solver to solve your LPs in this problem.

Here is an example of a Python LP Solver and its Tutorial.

1. Write an LP to find the optimal strategy for Trainer A. What is the optimal strategy

and expected payoff?

2. Now do the same for Trainer B. What is the optimal strategy and expected payoff?

5 How to Gamble With Little Regret

Suppose that you are gambling at a casino. Every day you play at a slot machine, and your

goal is to minimize your losses. We model this as the experts problem. Every day you must

take the advice of one of n experts (i.e. a slot machine). At the end of each day t, if you take

advice from expert i, the advice costs you some cti in [0, 1]. You want to minimize the regret

R, defined as:

R =

1

T

(

T∑

t=1

cti(t) − min1≤i≤n

T∑

t=1

cti

)

where i(t) is the expert you choose on day t. Your strategy will be probabilities where pti

denotes the probability with which you choose expert i on day t. Assume an all powerful

adversary (i.e. the casino) can look at your strategy ahead of time and decide the costs

associated with each expert on each day. Let C denote the set of costs for all experts and all

days. Compute maxC(E[R]), or the maximum possible (expected) regret that the adversary

can guarantee, for each of the following strategies.

(a) Choose expert 1 at every step, i.e. pt1 = 1 for all t.

(b) Any deterministic strategy, i.e. for each t, there exists some i such that pti = 1.

(c) Always choose an expert according to some fixed probability distribution at every time

step. That is, fix some p1, . . . , pn, and for all t, set p

t

i = pi.

What distribution minimizes the regret of this strategy? In other words, what is

argminp1,...,pn maxC(E[R])?

2

CS 170, Spring 2020 HW 10 A. Chiesa J. Nelson

This analysis should conclude that a good strategy for the problem must necessarily be

randomized and adaptive.

6 Solving Linear Programs using Multiplicative Weights

In this problem, we will develop an algorithm to approximately solve linear programs using

multiplicative weights. For simplicity, we will restrict our attention to a specific kind of linear

programs given as follows:

Given vectors aj for j = 1, . . . ,m in Rn, the linear program on variables x = (x1, . . . , xn) is

given by:

max(0)

for all j = 1, . . . ,m : 〈aj ,x〉 ≤ 0

n∑

i=1

xi = 1

for all i = 1, . . . , n : xi ≥ 0

Here 〈x,y〉 denotes the dot product between vectors, specifically 〈x,y〉 =

n∑

i=1

xi · yi where

xi, yi refers to the i

th components of the vector x,y.

We will now use multiplicative weights update towards solving our linear program. The idea

would be to execute multiplicative weights update against an appropriate adversary (sequence

of losses) and let the weights of the algorithm determine the solution x. Formally, the rough

outline of our algorithm to solve the above LP is as follows:

For t = 1 to a sufficiently large number T

1. Multiplicative weights update will return a current probability

distribution x(t)

2. If x(t) approximately satisfies the LP, return x(t).

3. Adversary A will present a loss vector `(t) ∈ Rn.

4. Multiplicative weights algorithm will incur a loss of 〈`(t),x(t)〉 and

update its weights.

Return x(T )

We will design an algorithm that will act as the adversary A, so that x(T ) is an approximate

solution to the LP.

Recall that the multiplicative weights algorithm obeys the following regret bound:

3

CS 170, Spring 2020 HW 10 A. Chiesa J. Nelson

Theorem. The multiplicative weights algorithm starts with an iterate x(1) ∈ Rn, and

then suffers a sequence of loss vectors `(1), . . . , `(T ) ∈ Rn such that `(t)i ∈ [−1, 1]. It then

produces a sequence of iterates x(2), . . . ,x(T ) such that for some 0 < ≤ 12 ,

T∑

t=1

〈`(t),x(t)〉 ≤ min

1≤i≤n

(

T∑

t=1

`

(t)

i

)

+ 2

(

T +

log(n)

)

(a) Let p = (p1, . . . , pn) be a probability distribution over {1, . . . , n}, i.e., pi ≥ 0 and∑n

i=1 pi = 1. For any vector v ∈ Rn prove that,

n

min

i=1

vi ≤

n∑

i=1

pivi

(In words, the minimum is smaller than the average under every distribution)

(b) We will now observe an important property of multiplicative weights. The algorithm not

only has small regret against any fixed best expert, but also has small regret against any

fixed probability distribution over experts.

Let p∗ = (p∗1, . . . , p∗n) be a probability distribution on {1, . . . , n}, i.e. p∗i ≥ 0 for each

1 ≤ i ≤ n and ∑ni=1 p∗i = 1. Using the bounds in the regret of the multiplicative weights

algorithm mentioned above, prove that

T∑

t=1

(

〈`(t),x(t)〉 − 〈`(t),p∗〉

)

≤ 2

( log(n)

+ T

)

(c) At the tth iteration, the probability distribution x(t) output by multiplicative weights

update is a candidate solution to our LP. But it is likely to violate some of the constraints

of the LP, namely 〈ai,x(t)〉 ≤ 0.

Describe how to implement the adversaryA to nudge the multiplicative weights algorithm

towards a feasible solution to the LP.

Hint : What loss vector should the adversary A pick in order to penalize the multiplicative

weights algorithm the most for violations?

(d) Let us call a solution x to be η-approximate solution to the LP if it satisfies all the

constraints within an error of η where η ≤ 2, i.e.,

〈ai,x〉 ≤ η

.

4

CS 170, Spring 2020 HW 10 A. Chiesa J. Nelson

Assume that the LP is feasible in that there exists some probability distribution x∗ that

satisfies all the constraints. For simplicity, let us assume that all the coefficients in our

LP, |(aj)i| ≤ 1. Show that if multiplicative weights uses = η4 then after T = 16 log(n)η2

iterations, the distribution x(T ) is an η-approximate solution.

Hint : In every iteration t, if the current distribution x(t) violates a constraint by more

than η, then it accumlates regret for not being the feasible solution x∗

7 Restoring the Balance!

We are given a network G = (V,E) whose edges have integer capacities ce, and a maximum

flow f from node s to node t. Explicitly, f is given to us in the representation of integer flows

along every edge e, (fe).

However, we find out that one of the capacity values of G was wrong: for edge (u, v), we used

cuv whereas it really should have been cuv − 1. This is unfortunate because the flow f uses

that particular edge at full capacity: fuv = cuv. We could run Ford Fulkerson from scratch,

but there’s a faster way.

Describe an algorithm to fix the max-flow for this network in O(|V |+ |E|) time. Also give a

proof of correctness and runtime justification.