Probability and Statistics for Computer Science
Abstract
An understanding of probability and statistics is an essential tool for a modern computer scientist. If your tastes run to
theory, then you need to know a lot of probability (e.g., to understand randomized algorithms, to understand the probabilistic
method in graph theory, to understand a lot of work on approximation, and so on) and at least enough statistics to bluff
successfully on occasion. If your tastes run to the practical, you will find yourself constantly raiding the larder of statistical
techniques (particularly classification, clustering, and regression). For example, much of modern artificial intelligence is built
on clever pirating of statistical ideas. As another example, thinking about statistical inference for gigantic datasets has had a
tremendous influence on how people build modern computer systems.
Computer science undergraduates are traditionally required to take either a course in probability, typically taught by
the math department, or a course in statistics, typically taught by the statistics department. A curriculum committee in my
department decided that the curricula of these courses could do with some revision. So I taught a trial version of a course, for
which I wrote notes; these notes became this book. There is no new fact about probability or statistics here, but the selection
of topics is my own; I think it’s quite different from what one sees in other books.
The key principle in choosing what to write about was to cover the ideas in probability and statistics that I thought every
computer science undergraduate student should have seen, whatever their chosen specialty or career. This means the book is
broad, and its coverage of many areas is shallow. I think that’s fine, because my purpose is to ensure that all students have seen enough
to know that, say, firing up a classification package will make many problems go away. So I’ve covered enough to get you
started and to get you to realize that it’s worth knowing more.
The notes I wrote have been useful to graduate students as well. In my experience, many learned some or all of this
material without realizing how useful it was and then forgot it. If this happened to you, I hope the book is a stimulus to your
memory. You really should have a grasp of all of this material. You might need to know more, but you certainly shouldn’t
know less.