Exercise 1: Probability distributions

In this exercise, you will simulate probability distributions, and use the probability distributions available in scipy.

Solve the tasks described below. Write a short report containing your answers, including the plots. Send the report and your Python code by email to the course instructor (richard.johansson -at- gu.se). If you wish, your solution to the first question can be submitted on paper directly.

NB: submit your answers individually. You may discuss the tasks with your fellow students, but do not write code together.

Deadline: January 30

References

In Lecture 1, slides 17–22, we saw how to plot histograms, and how to compute the mean, variance and standard deviation of a dataset.
In Lecture 2, we defined most of the notions we are using in this exercise.
In Lecture 3, we saw how to use random variables in scipy.
Reference documentation for the plotting library.
Reference documentation for scipy's statistical functions and random variables.

Task 1: Thinking about distributions

Think of the following scenarios and draw the probability mass function (pmf) of the corresponding random variables. A rough sketch on paper is enough.

the first digit in the registration number of a random car in Gothenburg
the number of questions in a 5-question exam correctly answered by a random student
the length of a randomly selected sentence in English

Task 2: Coin-tossing experiments

Make a new Python file that starts with the following imports:

from matplotlib import pyplot as plt
import random
import scipy
import scipy.stats

We define a function that tosses a biased coin and returns 'heads' or 'tails' depending on the outcome:

def coin_toss(p_heads):
    # random.random() is uniform on [0.0, 1.0), so this test succeeds with probability p_heads
    if random.random() < p_heads:
        return 'heads'
    else:
        return 'tails'
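
One quick way to check a function like coin_toss is to toss the coin many times and compare the fraction of heads with p_heads. The following is only a minimal sketch: the helper fraction_of_heads, the seed and the number of tosses are illustrative assumptions and not part of the exercise.

```python
import random

def coin_toss(p_heads):
    # random.random() is uniform on [0.0, 1.0)
    if random.random() < p_heads:
        return 'heads'
    else:
        return 'tails'

def fraction_of_heads(p_heads, n_toss):
    # Hypothetical helper: the fraction of n_toss tosses that came up 'heads'.
    n_heads = sum(1 for _ in range(n_toss) if coin_toss(p_heads) == 'heads')
    return n_heads / n_toss

random.seed(0)  # fixed seed so that the run can be repeated
estimate = fraction_of_heads(0.7, 100000)
print(estimate)  # should be close to 0.7
```

The more tosses you use, the closer the fraction should typically be to p_heads.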

Next, we make another function that simulates an experiment where we toss the coin a number of times, and count how many times we got 'heads'.

def count_heads(p_heads, n_toss):
    tosses = [ coin_toss(p_heads) for _ in range(n_toss) ]
    return tosses.count('heads')

Write a function that calls count_heads several times, and collects the results of all the calls in a list. Then print the mean and standard deviation of the collected results, and plot a histogram of them (using first plt.hist and then either plt.show or plt.savefig).
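
As a reminder of how the mean and standard deviation of a list of results can be computed, here is a small sketch that uses Python's standard statistics module. The data is deliberately not the coin experiment but the sum of two dice, and all variable names are illustrative. Note that there are two common definitions of the standard deviation; check which one was used in the lecture.

```python
import random
import statistics

random.seed(1)  # fixed seed so that the run can be repeated
n_experiments = 1000

# Illustrative data: the sum of two six-sided dice, repeated n_experiments times.
results = [random.randint(1, 6) + random.randint(1, 6) for _ in range(n_experiments)]

mean = statistics.mean(results)
pop_std = statistics.pstdev(results)    # population version: divides by n
sample_std = statistics.stdev(results)  # sample version: divides by n - 1

print('mean:', mean)
print('standard deviation (population):', pop_std)
print('standard deviation (sample):', sample_std)
```

For a list this long the two standard deviations are almost identical; the difference matters mostly for small samples.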

Run your function four times, where you call count_heads 10, 100, 1000, and 10000 times, respectively. Use p_heads=0.7 and n_toss=20.

Hint 1: If your histogram is ugly, increase the bins parameter of the plt.hist function. I used a value of 100.

Hint 2: To make the plots easier to compare, you can adjust the x and y axes:

plt.axis([-1, n_toss+1, 0, n_experiments])

Here, n_experiments is the number of times you have called count_heads.

Hint 3: You can use plt.xlabel and plt.ylabel to add text to the x and the y axis, respectively.

plt.xlabel('Number of heads')
plt.ylabel('Frequency')

Task 3: The binomial distribution

Make a binomially distributed random variable using the same parameters n_toss and p_heads as above. This r.v. is a mathematical model of the coin-tossing experiment.

rv = scipy.stats.binom(n_toss, p_heads)

We will now plot the pmf for the coin-tossing experiment. This is similar to what I did for the die roll r.v., on slide 9 in my lecture.

First, what are the possible outcomes we could get in an experiment where we toss a coin 20 times and count the number of times we get 'heads'? Make a list of all these possible outcomes.

Then compute the pmf for all the possible outcomes of the coin-tossing experiment. Finally plot the result using a bar plot:

outcomes = (... a list of all the possible results of a coin-tossing experiment ...)
pmf_for_outcomes = (... the probability for each of those possible results ...)

plt.bar(outcomes, pmf_for_outcomes, width=0.1)
plt.axis([-1, n_toss+1, 0, 1])

In addition, print the mean and standard deviation of this random variable. Did your simulations in Task 2 give you reasonable results compared to what you get now?
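
To illustrate the scipy calls involved, here is a sketch for a different random variable, a single roll of a fair six-sided die. This is my own illustration under that assumption, not the code from the lecture slides; the names die, outcomes and pmf_for_outcomes are chosen only for this example.

```python
import scipy.stats

# Illustrative r.v. (not the coin experiment): one roll of a fair six-sided die.
# scipy.stats.randint(low, high) is uniform on the integers low, ..., high - 1.
die = scipy.stats.randint(1, 7)

outcomes = list(range(1, 7))
pmf_for_outcomes = die.pmf(outcomes)  # one probability per outcome

for outcome, p in zip(outcomes, pmf_for_outcomes):
    print(outcome, p)

print('mean:', die.mean())
print('standard deviation:', die.std())
```

The same methods (pmf, mean, std) are available on any discrete random variable created with scipy.stats, including binom.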

Task 4: Calculations with the binomial distribution

Compute the following probabilities using the binomial random variable rv. To better understand the calculations, it can also help to explain each one in terms of the plot you made in Task 3.

the probability that you get 'heads' 10 times in the coin-tossing experiment
... that you get 'heads' at most 10 times
... that you get 'heads' more than 10 times
... that you get 'heads' 6–12 times
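
The kinds of probabilities listed above map onto a few methods of a scipy random variable. The sketch below shows them for an illustrative fair six-sided die rather than for the coin experiment; the variable names are assumptions made for this example.

```python
import scipy.stats

die = scipy.stats.randint(1, 7)  # fair six-sided die, outcomes 1, ..., 6 (illustrative)

p_exactly_3 = die.pmf(3)                 # P(X = 3)
p_at_most_3 = die.cdf(3)                 # P(X <= 3)
p_more_than_3 = die.sf(3)                # P(X > 3), the same as 1 - die.cdf(3)
p_from_2_to_4 = die.cdf(4) - die.cdf(1)  # P(2 <= X <= 4) = P(X <= 4) - P(X <= 1)

# In terms of a pmf bar plot, the interval probability is the total
# height of the bars for the outcomes 2, 3 and 4.
p_from_2_to_4_by_sum = sum(die.pmf(k) for k in range(2, 5))

print(p_exactly_3, p_at_most_3, p_more_than_3, p_from_2_to_4, p_from_2_to_4_by_sum)
```

Be careful with the endpoints of an interval: for a discrete variable, P(X <= 4) - P(X <= 2) would leave out the outcome 2.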

Finally, compute the 5th percentile of the coin-tossing experiment: the smallest number N such that the number of heads is N or lower in at least 5% of all experiments.
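
Percentiles of a scipy random variable can be computed with the ppf method (the percent point function, which is the inverse of cdf). The sketch below again uses an illustrative fair six-sided die and an arbitrary level of 90%, so it shows the technique without giving the answer for the coin experiment.

```python
import scipy.stats

die = scipy.stats.randint(1, 7)  # fair six-sided die, outcomes 1, ..., 6 (illustrative)

q = 0.9
n = die.ppf(q)  # the smallest outcome n such that P(X <= n) >= q

# Check the definition: the cdf reaches q at n but not yet at n - 1.
print(n, die.cdf(n), die.cdf(n - 1))
```

Because the variable is discrete, the cdf at the returned value is usually somewhat larger than q rather than exactly equal to it.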