

I was following the fantastic Harvard online course Stat 110. In the first two lectures, they talked about how to "count" things and, more importantly, how to properly formulate a counting problem as one of several "patterns".
Many counting problems (at least in statistics) come in the form of choosing/sampling objects from a population. One of the points they made is that when you think about a counting problem, it helps to keep three questions in mind:

(1) Is the sampling done with or without replacement? Sampling with replacement is like borrowing a book from a library: once you return the book, someone else can check it out again.

(2) Does the order of objects in your sample matter? In other words, do you visualize your sample as a set of objects or as an ordered list of them?

(3) Are the objects in the population or in your sample distinguishable from each other? Sometimes it is hard to see why we would ever consider a situation where the objects are indistinguishable, since in practice it seems we can always distinguish them by labeling them. However, certain counting problems become easier if we frame them so that some of the objects are assumed indistinguishable; those are usually not the "real objects" but imaginary ones. We will see an example below.

Based on different combinations of these constraints, people have categorized counting problems into several patterns, dubbed in the class the "sampling table" and more generally known as ball-picking patterns.

| pattern name | symbol | w/ or w/o replacement? | order matters? | distinguishable? | example |
|---|---|---|---|---|---|
| string | n^{k} | with replacement | ordered samples | yes | all possible strings of length k over an alphabet of size n |
| permutation | P_{n}^{k}=\frac{n!}{(n-k)!} | without replacement | ordered samples | yes | deal k cards, one at a time (order recorded), from a deck of n |
| multiset/multichoose | \binom{n+k-1}{k} | with replacement | unordered samples | yes | pick one friend out of n contacts to go out, repeated k times |
| combination | \binom{n}{k} | without replacement | unordered samples | yes | pick k friends out of n contacts to go out at the same time |
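All four patterns in the table can be checked by brute-force enumeration. Here is a minimal sketch in Python, assuming a small hypothetical population of n = 4 distinguishable objects and samples of size k = 2, using `itertools` to enumerate the samples and `math.comb`/`math.perm` for the formulas:

```python
from itertools import (combinations, combinations_with_replacement,
                       permutations, product)
from math import comb, perm

# Hypothetical small case: population of n = 4 distinguishable objects,
# samples of size k = 2.
n, k = 4, 2
pool = range(n)

# string: ordered, with replacement -> n^k
assert len(list(product(pool, repeat=k))) == n ** k  # 16

# permutation: ordered, without replacement -> n!/(n-k)!
assert len(list(permutations(pool, k))) == perm(n, k)  # 12

# multiset/multichoose: unordered, with replacement -> C(n+k-1, k)
assert len(list(combinations_with_replacement(pool, k))) == comb(n + k - 1, k)  # 10

# combination: unordered, without replacement -> C(n, k)
assert len(list(combinations(pool, k))) == comb(n, k)  # 6
```

Enumeration like this only works for tiny n and k, of course, but it is a handy way to convince yourself that a formula matches the sampling process you have in mind.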

The first thing to notice is that in all four patterns the objects are considered distinguishable, which is the usual case when we count things in the real world. (If the objects were indistinguishable, the number of ways of forming a string of length k would always be 1!) Among the four patterns, "multichoose" is of special interest: unlike the other three, which can easily be "seen" by applying the rule of product, the proof of the multichoose formula needs a trick. Intuitively, the formula says that if replacement is allowed, choosing an unordered sample of k objects from n distinguishable objects is like choosing k objects from n+k-1 objects without replacement. That makes some sense: it is as if, whenever you pick an object from the pool, you clone it and put a copy back, so the pool always offers n choices, even though you have "stealthily" added k-1 objects during the choosing process (no copy is needed after the last pick). But how do we verify this?

The proof of the multichoose formula is a good example of using "imaginary indistinguishable objects" in a counting problem. The trick is to think about the problem "inside-out": an alternative to picking k objects out of n is to put k tags on the n objects. Whenever we choose an object, we put a tag on it. Choosing with replacement means a single object can receive multiple tags, anywhere from 0 to k of them, with the tag counts adding up to k. So the next step is to imagine distributing k indistinguishable tags among n distinguishable objects. The tags need to be indistinguishable because what we actually care about is how to split the integer k into n non-negative parts, not the order in which the tags were placed.

Now we have another pattern: partitioning indistinguishable objects. The answer is simple. Code the k indistinguishable objects as a string of "x"s; putting them into n boxes just means inserting n-1 delimiters among them. For k = 10 and n = 3, one example is "xxx|xxxxxx|x" and another is "xxxxxxxxx|x|". Note that in the second example box 3 holds zero objects, which is fine in our case. The mixed string of "x" and "|" has length n-1+k, so the partitioning problem is equivalent to choosing n-1 positions out of n-1+k, which is \binom{n+k-1}{n-1} = \binom{n+k-1}{k}, the same number given in the table above.
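The stars-and-bars encoding can be enumerated directly. Here is a quick sketch, using the hypothetical k = 10, n = 3 case from the example above, that generates every delimiter string and checks the count against the binomial:

```python
from itertools import combinations
from math import comb

# Hypothetical check with the k = 10, n = 3 example from the text.
k, n = 10, 3
slots = n - 1 + k  # total length of the mixed "x"/"|" string

# Each choice of n-1 delimiter positions among the slots encodes one
# partition of k indistinguishable objects into n boxes.
strings = [
    "".join("|" if i in bars else "x" for i in range(slots))
    for bars in combinations(range(slots), n - 1)
]

# C(12, 2) = C(12, 10) = 66 partitions in total.
assert len(strings) == comb(n + k - 1, n - 1) == comb(n + k - 1, k)
# Both example strings from the text appear in the enumeration.
assert "xxx|xxxxxx|x" in strings
assert "xxxxxxxxx|x|" in strings
```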

The last thing worth mentioning is a couple of simple applications of the counting patterns above. In each example, it is easy to see that the left-hand side of the equation equals the right-hand side, with a simple use of the above patterns:

1. Proof of n\times\binom{n-1}{k-1}=k\times\binom{n}{k}: the left-hand side is equivalent to picking one object out of n first, and then picking k-1 objects out of the remaining n-1; the order of objects does not matter within the second step, but steps 1 and 2 happen in sequence. The right-hand side is an unordered sampling of k objects out of n (the combination pattern), after which we pick 1 object out of those k (another combination). In both cases we glue the steps together with the product rule. Both sides count the same thing: the ways to choose a group of k out of n with one designated member.
2. \binom{n+m}{k}=\sum_{i=0}^{k}\binom{n}{i}\times\binom{m}{k-i} (Vandermonde's identity): the left-hand side is the combination pattern applied to a pool of n+m objects; the right-hand side splits the pool into groups of n and m, applies the combination pattern to each, and glues the pieces together with the product and sum rules.
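Both identities are easy to sanity-check numerically. A small sketch with arbitrary hypothetical values n = 7, k = 3, m = 5:

```python
from math import comb

# Arbitrary hypothetical values for a numeric sanity check.
n, k, m = 7, 3, 5

# Identity 1: n * C(n-1, k-1) == k * C(n, k); both sides equal 105 here.
assert n * comb(n - 1, k - 1) == k * comb(n, k)

# Identity 2 (Vandermonde): C(n+m, k) == sum over i of C(n, i) * C(m, k-i);
# both sides equal C(12, 3) = 220 here.
assert comb(n + m, k) == sum(comb(n, i) * comb(m, k - i) for i in range(k + 1))
```

A loop over a range of small n, k, m values would make this a slightly stronger check, but of course no amount of numerical testing replaces the counting arguments above.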


