This was a nice, creative piece of work. It was a bit light on the "combinatorial" components of the course, but a lot of fun nevertheless. It was also clearly and well written.

You assert that tight theoretical bounds cannot be given under an adversarial model. What does this mean: that no such bound exists, or that you don't know how to prove one?

A natural generalization of your problem would let you "hold" up to h bandits at any one time (your "storage space") and pull any one of them; alternatively, you could evict one and replace it with a new one.

It's not necessarily bad to be using a max-expectation objective, but you don't really explain why you chose *not* to consider competitive analysis. It's true that you have a high-probability bound on the value of OPT, so you are comparing to what usually happens, but it's interesting to ask how you would do if an unlikely set of inputs arose (no high-value bandit, for example).

I'm a bit uneasy about Lemma 2.1 claiming "increasing" when all that is obvious to me is "nondecreasing".

Motivated by the secretary problem, another way to measure the quality of your answer would be in terms of its rank among the arms. How close to rank 1 do your bandit algorithms get?

Another potential tails strategy is to flip a coin until you get exactly t tails, then move on (a quick sketch of this rule appears at the end of these comments). This decreases your chance of abandoning a good coin, and I don't think it will dramatically increase your chances of staying on a bad coin.

Finding good metrics in your work is tricky because the optimum is so close to 1. I would have been inclined to measure "loss", i.e., the difference from 1, as the default measure. One question you didn't answer is how much loss you incur in the standard Bernoulli bandit: how much does not being able to go back cost you? This also suggests another generalization: suppose you can go back to each bandit some fixed number of times > 1. How do you do then?

---

Great job! The writeup was a bit on the long side (and the small font did not help readability), but it was really polished and mature. My only (minor) comment here would be to avoid stating algorithms within the proofs of theorems, which you tend to do. It is better to describe the algorithm explicitly outside the theorem, and then analyze only its performance in the theorem itself.

In general, it is evident you really dug into this problem, and it was satisfying to see how you built a global understanding of it by working out the underlying subproblems. Really nice work.

I guess my only real complaint is that, in the end, this project had only a very tangential relationship to the material we covered in class. Beyond the high-level connection to online algorithms and the notion of regret, it does not seem that anything we talked about in this class was of much help or relevance here. This is a bit of a shame, as, for instance, there is a very nice optimization-based framework for understanding the multi-armed bandit problem, and I was hoping you might explore that direction a bit.

Still, this is a very nice project, and I think it would be worth checking the existing literature to see how novel this model really is. Also, trying to apply the general multi-armed bandit machinery to this problem (in particular, the Bayesian approaches) could be interesting too. Let me know if you need help with that.

Grade: A
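
P.S. To make the "t tails" suggestion concrete, here is a minimal sketch in Python. It assumes the Bernoulli coin-flipping setup from your writeup; the function name, the parameter t, and the little simulation are mine, purely for illustration, not a claim about how you implemented anything.

```python
import random

def pull_until_t_tails(p, t, rng=random):
    """Pull a Bernoulli(p) coin until exactly t tails have been seen.

    Returns (heads, pulls): the reward collected and the total number
    of pulls made before the coin is abandoned. (Illustrative names.)
    """
    heads = tails = pulls = 0
    while tails < t:
        pulls += 1
        if rng.random() < p:   # heads: a success, keep pulling this coin
            heads += 1
        else:                  # a tail: one strike against the coin
            tails += 1
    return heads, pulls

# Rough illustration: the stopping time is negative binomial, with
# expected pulls t / (1 - p), so a good coin survives far longer.
if __name__ == "__main__":
    t = 3
    for p in (0.9, 0.5):
        avg = sum(pull_until_t_tails(p, t)[1] for _ in range(10_000)) / 10_000
        print(f"p = {p}: average pulls before moving on = {avg:.1f}")
```

Since the expected number of pulls before abandoning a coin of bias p is t / (1 - p), a coin with p = 0.9 is kept roughly five times as long as a fair coin at the same t, which is why I don't expect this rule to dramatically increase the time wasted on bad coins.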