In this assignment, you will implement two algorithms for training classifiers: support vector machines and logistic regression. The main objectives are, first, that you get some hands-on experience of the practical considerations of implementing machine learning algorithms, and second, that you get a taste of how a typical machine learning paper looks.
Write Python code to implement the algorithms. Please also try to answer the exercise question. Send the code and the answer to the exercise by email to the lab instructor (luis.nieto.pina -at- svenska.gu.se). This assignment is solved individually.
Deadline: September 24
Review the material from the lecture about linear classifiers (in particular the perceptron code near the end) and the lecture about SVMs and LR.
Have a look at the paper Pegasos: Primal Estimated sub-GrAdient SOlver for SVM, Shalev-Shwartz et al. (2010). You're free to skip the mathematical details, but please read the introduction and make sure you understand the pseudocode in Figure 1, which is what you will implement in this assignment.
In addition to the paper, I have written a separate document that clarifies some points about the pseudocode, explains the algorithm in more detail, and uses a notation more similar to what we have seen previously.
We are developing a simple system to predict the weather based on our location and the season. Here is a training set.
City:        Month:      | Weather:
-----------------------------------
Gothenburg   July        | rain
Gothenburg   December    | rain
Paris        July        | sun
Paris        December    | rain
We build the system by training a perceptron from scikit-learn on this training set. We then test the classifier on the same set. As expected, the classifier correctly classifies all the instances in this set; after all, it's the same set we used to train it.
Now, we repeat this experiment using another training set. We train the classifier using the following examples and evaluate it using the same examples.
City:        Month:      | Weather:
-----------------------------------
Sydney       July        | rain
Sydney       December    | sun
Paris        July        | sun
Paris        December    | rain
When we use this set, for some strange reason our classifier performs poorly! We can't improve it by switching to an SVM. Do you have an idea what's going on?
Here is the Python code if you want to try the experiment yourself.
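If the original script is not at hand, the experiment can be reconstructed along the following lines. This is a sketch that assumes scikit-learn's DictVectorizer and Perceptron; the exact code provided with the course may differ in detail.

```python
# Sketch of the weather experiment: train a perceptron and evaluate it
# on the same examples it was trained on.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import Perceptron
from sklearn.pipeline import make_pipeline

def run_experiment(examples, labels):
    clf = make_pipeline(DictVectorizer(), Perceptron(random_state=0))
    clf.fit(examples, labels)
    return clf.score(examples, labels)   # accuracy on the training set itself

set1 = [{'city': 'Gothenburg', 'month': 'July'},
        {'city': 'Gothenburg', 'month': 'December'},
        {'city': 'Paris', 'month': 'July'},
        {'city': 'Paris', 'month': 'December'}]
labels1 = ['rain', 'rain', 'sun', 'rain']

set2 = [{'city': 'Sydney', 'month': 'July'},
        {'city': 'Sydney', 'month': 'December'},
        {'city': 'Paris', 'month': 'July'},
        {'city': 'Paris', 'month': 'December'}]
labels2 = ['rain', 'sun', 'sun', 'rain']

print(run_experiment(set1, labels1))  # perfect accuracy, as described above
print(run_experiment(set2, labels2))  # lower accuracy: the exercise asks why
```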
Download and unpack this zip file.
The package contains the perceptron code presented during the second lecture (lecture2.py). It also contains a program that trains and evaluates the perceptron implementation using the review dataset that we've used previously.
Run the experiment code and make sure it works on your machine.
Implement the Pegasos algorithm for training SVMs by converting the pseudocode in Figure 1 into proper Python. Test your implementation by using your own classifier instead of the perceptron in the experiment code. It's probably easiest if you start from the code we saw during the lecture, in particular the linear classifier class presented there.
You can try to find good values for the parameters T and λ. In my experiments, I set T to 10 times the size of the training set (roughly equivalent to 10 perceptron iterations) and λ to 0.01, which gave a classification accuracy that was slightly higher than that of the perceptron.
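As a point of comparison, here is one way the Figure 1 pseudocode might be turned into Python. This is a dense-vector sketch: the variable names and the random sampling scheme are my own choices, the optional projection step from the paper is omitted, and the default parameter values are the ones suggested above.

```python
import numpy as np

def pegasos_train(X, Y, lam=0.01, T=None):
    """Train a linear SVM with the Pegasos algorithm (Figure 1 in the paper).

    X: 2-D array of feature vectors; Y: array of labels in {-1, +1}.
    lam and T are the regularization parameter and the number of steps.
    """
    n, n_features = X.shape
    if T is None:
        T = 10 * n                    # roughly 10 "epochs" worth of updates
    w = np.zeros(n_features)
    rng = np.random.default_rng(0)
    for t in range(1, T + 1):
        i = rng.integers(n)           # pick a training instance at random
        x, y = X[i], Y[i]
        eta = 1.0 / (lam * t)         # step size eta_t = 1 / (lambda * t)
        score = w.dot(x)
        w *= (1 - eta * lam)          # shrink the weights (regularization)
        if y * score < 1:             # hinge loss nonzero: also move toward x
            w += (eta * y) * x
    return w
```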
As we saw in the lecture, the logistic regression classifier can be stated similarly to the SVM. The difference is just in which loss function is used: the SVM uses the hinge loss and LR the log loss.
Take a look at the table on page 15 in the Pegasos paper. The first line shows the hinge loss and the corresponding update operation that is carried out in each step in the algorithm.
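Written out as Python, the two loss functions look like this (a sketch, where w and x are NumPy arrays and y is +1 or -1):

```python
import numpy as np

def hinge_loss(w, x, y):
    # SVM: zero loss once the example is on the right side with margin >= 1
    return max(0.0, 1.0 - y * np.dot(w, x))

def log_loss(w, x, y):
    # LR: always positive, but approaches zero as y * w.x grows
    return np.log1p(np.exp(-y * np.dot(w, x)))
```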
Make a new training algorithm that uses the log loss (the second line) instead of the hinge loss. See how well it works compared to your previous classifier.
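One possible way to adapt the SVM training loop to the log loss is sketched below. The update follows the gradient from the table on page 15, and the sigmoid factor is computed in a numerically stable way; as before, the sampling scheme and defaults are my own choices.

```python
import numpy as np

def logreg_train(X, Y, lam=0.01, T=None):
    """Pegasos-style stochastic gradient descent with the log loss,
    i.e. regularized logistic regression. X: 2-D array of feature
    vectors; Y: labels in {-1, +1}."""
    n, n_features = X.shape
    if T is None:
        T = 10 * n
    w = np.zeros(n_features)
    rng = np.random.default_rng(0)
    for t in range(1, T + 1):
        i = rng.integers(n)
        x, y = X[i], Y[i]
        eta = 1.0 / (lam * t)
        z = y * w.dot(x)
        # sigmoid(-z) = 1 / (1 + exp(z)), computed without overflow
        factor = np.exp(-z) / (1.0 + np.exp(-z)) if z > 0 else 1.0 / (1.0 + np.exp(z))
        w *= (1 - eta * lam)           # regularization step, as in the SVM version
        w += (eta * y * factor) * x    # gradient step for the log loss
    return w
```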
Our implementation is slow and wasteful in terms of memory if we use a lot of different features (which is common in NLP). With sparse vectors, this can be improved. Follow the example of SparsePerceptron in the lecture code and implement sparse versions of the SVM and LR algorithms. In addition, remove the SelectKBest step from the pipeline and check the difference.
Hint. The helper function sparse_dense_dot(x, w) carries out a dot product between a sparse vector x and a dense vector w. The function add_sparse_to_dense(x, w, xw) is the counterpart of the operation w += xw * x.
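For reference, helpers of this kind could be implemented as follows. This sketch assumes the sparse vector is a single-row SciPy CSR matrix, as produced by scikit-learn's vectorizers; the versions in the provided lecture code may differ in detail.

```python
import numpy as np
from scipy.sparse import csr_matrix

def sparse_dense_dot(x, w):
    # dot product between a sparse row vector x and a dense vector w,
    # touching only the nonzero positions of x
    return np.dot(w[x.indices], x.data)

def add_sparse_to_dense(x, w, factor):
    # in-place counterpart of w += factor * x for a sparse x
    w[x.indices] += factor * x.data

# tiny usage example
x = csr_matrix([[0.0, 2.0, 0.0, 3.0]])
w = np.ones(4)
print(sparse_dense_dot(x, w))   # 1*2 + 1*3 = 5.0
add_sparse_to_dense(x, w, 2.0)
print(w)                        # [1. 5. 1. 7.]
```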
At some places in the pseudocode, the whole weight vector is scaled by some factor. This can be a slow operation if our training set consists of high-dimensional, sparse vectors. Read section 2.4 about a trick for speeding up the scaling operation. Implement the learning algorithm so that it uses this trick.
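The trick can be sketched like this (dense vectors for brevity; combining it with the sparse helpers is the point of the exercise). The weight vector is stored implicitly as w = a * v, so scaling w only updates the scalar a instead of every entry; a renormalization step avoids dividing by a near-zero a. The details below are my own choices, not the paper's exact formulation.

```python
import numpy as np

def pegasos_scaled(X, Y, lam=0.01, T=None):
    """Pegasos with the scaling trick from section 2.4 of the paper:
    represent the weight vector as w = a * v, so that scaling w by a
    constant is an O(1) operation on the scalar a."""
    n, n_features = X.shape
    if T is None:
        T = 10 * n
    v = np.zeros(n_features)          # w is represented as a * v
    a = 1.0
    rng = np.random.default_rng(0)
    for t in range(1, T + 1):
        i = rng.integers(n)
        x, y = X[i], Y[i]
        eta = 1.0 / (lam * t)
        score = a * v.dot(x)          # w.x = a * (v.x)
        a *= (1 - eta * lam)          # scaling w now touches only the scalar a
        if a < 1e-9:                  # avoid dividing by (near-)zero below
            v *= a
            a = 1.0
        if y * score < 1:
            v += (eta * y / a) * x    # w += eta*y*x  becomes  v += eta*y*x / a
    return a * v
```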