Assignment 2: Classifier implementation

Note: In response to your questions, I wrote a document that tries to clarify the notation in the paper and explain how we get from the mathematical model to the algorithm in the pseudocode. Please tell me if there is anything you would like to see added to this document.

In this assignment, you will implement an algorithm to train a support vector classifier. The main objectives of this assignment are, first, that you get some hands-on experience with the practical considerations of implementing machine learning algorithms, and second, that you get a taste of what a typical machine learning paper looks like.

Write Python code to implement the algorithm. Please also try to answer the exercise question. Send the code and your answer to the exercise question by email to the course instructor (richard.johansson -at- gu.se). This assignment is to be solved individually.

Deadline: September 24

Preliminaries

Review the material from the lecture on linear classifiers.

Have a look at the paper Pegasos: Primal Estimated sub-GrAdient SOlver for SVM, Shalev-Shwartz et al. (2010). You can skip the mathematical details if you prefer, but please read the introduction and make sure you understand the pseudocode in Figure 1, which is what you will implement in this assignment. The clarification document mentioned in the note above also explains the notation used in the pseudocode.

Exercise question

We are developing a simple system to predict the weather based on our location and the season. Here is a training set.

City:      Month:    | Weather:
--------------------------------
Gothenburg July      | rain
Gothenburg December  | rain
Paris      July      | sun
Paris      December  | rain

We build the system by training a perceptron from scikit-learn on this training set. We then test the classifier on the same set. As expected, the classifier correctly classifies all the instances in this set; after all, it's the same set we used to train it.

Now, we repeat this experiment using another training set. We train the classifier using the following examples and evaluate it using the same examples.

City:      Month:    | Weather:
--------------------------------
Sydney     July      | rain
Sydney     December  | sun
Paris      July      | sun
Paris      December  | rain

When we use this set, for some strange reason our classifier performs poorly! We can't improve it by switching to an SVM. Do you have an idea what's going on?

Here is the Python code if you want to try the experiment yourself.
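In case that code is not at hand, the experiment can be reproduced along the following lines. This is a minimal sketch assuming scikit-learn's DictVectorizer and Perceptron; the course's actual script may be organized differently.

    from sklearn.feature_extraction import DictVectorizer
    from sklearn.linear_model import Perceptron
    from sklearn.pipeline import make_pipeline

    # The second training set: city and month as categorical features.
    X = [{'city': 'Sydney', 'month': 'July'},
         {'city': 'Sydney', 'month': 'December'},
         {'city': 'Paris',  'month': 'July'},
         {'city': 'Paris',  'month': 'December'}]
    Y = ['rain', 'sun', 'sun', 'rain']

    # One-hot encode the features, then train and evaluate on the same set.
    classifier = make_pipeline(DictVectorizer(), Perceptron())
    classifier.fit(X, Y)
    print('accuracy on the training set:', classifier.score(X, Y))

Swapping in the Gothenburg/Paris examples from the first table lets you compare the two runs.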

Introduction

Download and unpack this zip file. The package contains the code presented during the lecture (ml_lecture3.py). It also contains a program (experiment.py) that trains and evaluates the perceptron implementation using the review dataset that we've used previously.

Run the experiment code and make sure it works on your machine.

Hint 1: The perceptron implementations that use dense vectors might cause memory problems on some machines. If this happens, reduce the number of features as described in the code.

Your task

Implement the Pegasos algorithm for training SVMs by converting the pseudocode in Figure 1 into proper Python. Test your implementation by plugging your classifier into assignment2_experiment in experiment.py. If you want, you can use the perceptron implementations from the lecture as a starting point. If you are aiming for the VG grade (see below), your implementation should handle sparse vectors; otherwise, it is up to you whether you use dense or sparse vectors.

You can try to find good values for the parameters T and λ. In my experiments, I set T to 10 times the size of the training set (roughly equivalent to 10 perceptron iterations) and λ to 0.01, which gave a classification accuracy slightly higher than that of the perceptron.
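To give an idea of the shape of the solution, here is a minimal dense-vector sketch of Figure 1. It assumes NumPy arrays, labels coded as -1/+1, and a function name (train_pegasos) of my own choosing, and it omits the paper's optional projection step; your own solution should follow the pseudocode in the paper.

    import numpy as np

    def train_pegasos(X, Y, lam=0.01, T=None):
        """Train a linear SVM with Pegasos. X: (n, d) array; Y: labels in {-1, +1}."""
        n, d = X.shape
        if T is None:
            T = 10 * n                   # about 10 "epochs", as suggested above
        w = np.zeros(d)
        rng = np.random.default_rng()
        for t in range(1, T + 1):
            i = rng.integers(n)          # draw a training instance at random
            x, y = X[i], Y[i]
            eta = 1.0 / (lam * t)        # step length 1/(lambda*t)
            if y * w.dot(x) < 1:         # nonzero hinge loss: scale and add
                w = (1 - eta * lam) * w + eta * y * x
            else:                        # zero hinge loss: just scale
                w = (1 - eta * lam) * w
        return w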

Extra tasks for VG

Speeding up the scaling operation

First, make sure that your code can handle sparse feature vectors.

In some places in the pseudocode, the whole weight vector is scaled by some factor. This can be a slow operation if our training set consists of high-dimensional, sparse vectors. Read section 2.4, which describes a trick for speeding up the scaling operation, and implement the learning algorithm so that it uses this trick.
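The idea, as I read section 2.4, is to represent the weight vector as a scalar times a vector, w = scale * v, so that scaling the whole vector only touches the scalar. A minimal sketch, with dicts standing in for sparse vectors and names of my own invention:

    class ScaledWeights:
        """Weight vector represented as w = scale * v."""

        def __init__(self):
            self.v = {}              # sparse vector: feature -> value
            self.scale = 1.0

        def scale_by(self, factor):
            self.scale *= factor     # O(1) instead of touching every weight

        def add(self, x, factor):
            # w += factor * x  is implemented as  v += (factor / scale) * x
            for f, value in x.items():
                self.v[f] = self.v.get(f, 0.0) + (factor / self.scale) * value

        def dot(self, x):
            # w . x = scale * (v . x); only x's nonzero features matter
            return self.scale * sum(self.v.get(f, 0.0) * value
                                    for f, value in x.items())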

Logistic regression

As we saw in the lecture, the logistic regression classifier can be stated similarly to the SVM. The only difference is which loss function is used: the SVM uses the hinge loss and logistic regression the log loss.
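Concretely, both classifiers minimize a regularized average loss over the training set S of size m; only the loss term differs. This is my transcription of the paper's notation, so double-check it against the paper:

    \min_{\mathbf{w}} \; \frac{\lambda}{2} \|\mathbf{w}\|^2
        + \frac{1}{m} \sum_{(\mathbf{x}, y) \in S} \ell(\mathbf{w}; (\mathbf{x}, y))

    \ell_{\mathrm{hinge}}(\mathbf{w}; (\mathbf{x}, y)) = \max\{0,\; 1 - y \, \mathbf{w} \cdot \mathbf{x}\}
    \qquad
    \ell_{\mathrm{log}}(\mathbf{w}; (\mathbf{x}, y)) = \log\bigl(1 + e^{-y \, \mathbf{w} \cdot \mathbf{x}}\bigr)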

Take a look at the table on page 15 in the Pegasos paper. The first line shows the hinge loss and the corresponding update operation that is carried out at each step of the algorithm.

Implement the log loss (the second line) as well. Train a classifier and see how well it works compared to your previous classifier. Your implementation should work with sparse feature vectors, and use the scaling trick mentioned above.
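For example, a single log-loss update inside the Pegasos loop might look like the sketch below (dicts for sparse vectors, names of my own choosing, based on my reading of the page 15 table). For clarity it scales the whole vector directly; in your solution you would replace that with the scaling trick above.

    import math

    def log_loss_step(w, x, y, eta, lam):
        """One Pegasos step with the log loss; y in {-1, +1}."""
        # Margin under the current weights, before the update.
        margin = y * sum(w.get(f, 0.0) * value for f, value in x.items())
        # Scale: w <- (1 - eta*lam) * w   (use the scaling trick here instead)
        for f in w:
            w[f] *= 1.0 - eta * lam
        # Gradient step: the log loss scales the update by 1/(1 + exp(margin)).
        # (For very large margins, exp may overflow; a real implementation
        # would guard against that.)
        coeff = eta * y / (1.0 + math.exp(margin))
        for f, value in x.items():
            w[f] = w.get(f, 0.0) + coeff * value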