# Assignment 2: Classifier implementation

In this assignment, you will implement two algorithms for training classifiers: the support vector machine and logistic regression. The main objectives of this assignment are, first, to give you some experience of the practical considerations involved in implementing machine learning algorithms, and second, to give you a taste of how a typical machine learning paper reads.

Write Python code to implement the algorithms. Please also try to answer the exercise question. Send the code and the answer to the exercise by email to the lab instructor (luis.nieto.pina -at- svenska.gu.se). This assignment is solved individually.

## Preliminaries

Review the material from the lecture about linear classifiers (in particular the perceptron code near the end), and the lecture about SVM and LR.

Have a look at the paper Pegasos: Primal Estimated sub-GrAdient SOlver for SVM, Shalev-Shwartz et al. (2010). You're free to skip the mathematical details, but please read the introduction and make sure you understand the pseudocode in Figure 1, which is what you will implement in this assignment. Here are some clarifications about the pseudocode:

• The authors use angle brackets for dot products: <w, x> means w·x
• S is the training set and |S| is the size of the training set.
• T is the number of steps in the algorithm. (This is a bit different from our perceptron, where we specified the number of times to process the whole training set.)
• λ is a regularization parameter that controls the tradeoff between keeping the classifier simple (i.e. the weights small) and fitting the training set. See slide 17 in the lecture.
• The optional line is there for theoretical reasons and can be ignored.

In addition to the paper, I have written a separate document that explains the algorithm in more detail, and uses a notation more similar to what we have seen previously.

## Exercise question

We are developing a simple system to predict the weather based on our location and the season. Here is a training set.

```
City:      Month:    | Weather:
--------------------------------
Gothenburg July      | rain
Gothenburg December  | rain
Paris      July      | sun
Paris      December  | rain
```

We build the system by training a perceptron from scikit-learn on this training set. We then test the classifier on the same set. As expected, the classifier correctly classifies all the instances in this set; after all, it's the same set we used to train it.

Now, we repeat this experiment using another training set. We train the classifier using the following examples and evaluate it using the same examples.

```
City:      Month:    | Weather:
--------------------------------
Sydney     July      | rain
Sydney     December  | sun
Paris      July      | sun
Paris      December  | rain
```

When we use this set, for some strange reason our classifier performs poorly! We can't improve it by switching to an SVM. Do you have an idea what's going on?

Here is the Python code if you want to try the experiment yourself.
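In case you want to see the general shape of such an experiment, here is a minimal sketch (the actual course script may differ; the use of scikit-learn's `DictVectorizer` and `Perceptron` in a pipeline is my assumption here):

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import Perceptron
from sklearn.pipeline import make_pipeline

def train_and_evaluate(examples, labels):
    """Train a perceptron on (city, month) features and
    evaluate it on the same examples it was trained on."""
    clf = make_pipeline(DictVectorizer(), Perceptron())
    clf.fit(examples, labels)
    return clf.score(examples, labels)

# The first training set from the exercise.
set1 = [{'city': 'Gothenburg', 'month': 'July'},
        {'city': 'Gothenburg', 'month': 'December'},
        {'city': 'Paris', 'month': 'July'},
        {'city': 'Paris', 'month': 'December'}]
labels1 = ['rain', 'rain', 'sun', 'rain']

# The second training set from the exercise.
set2 = [{'city': 'Sydney', 'month': 'July'},
        {'city': 'Sydney', 'month': 'December'},
        {'city': 'Paris', 'month': 'July'},
        {'city': 'Paris', 'month': 'December'}]
labels2 = ['rain', 'sun', 'sun', 'rain']

print(train_and_evaluate(set1, labels1))  # all instances correct
print(train_and_evaluate(set2, labels2))  # performs poorly
```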

## Introduction

Download and unpack this zip file. The package contains the perceptron code presented during the second lecture (`lecture2.py`). It also contains a program (`doc_classification.py`) that trains and evaluates the perceptron implementation using the review dataset that we've used previously.

Run the experiment code and make sure it works on your machine.

### Implementing the SVM

Implement the Pegasos algorithm for training SVMs by converting the pseudocode in Figure 1 into proper Python. Test your implementation by using your own classifier instead of the perceptron in `doc_classification.py`. It's probably easiest if you start from the code we saw during the lecture, in particular the class called `NewPerceptron2`.

You can try to find good values for the parameters T and λ. In my experiments, I set T to 10 times the size of the training set (roughly equivalent to 10 perceptron iterations) and λ to 0.01, which gave a classification accuracy that was slightly higher than that of the perceptron.
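As a concrete starting point, here is a minimal dense NumPy sketch of the Figure 1 pseudocode (the function and variable names are my own, and labels are assumed to be encoded as +1/-1, as in the paper):

```python
import numpy as np

def pegasos_train(X, Y, lam=0.01, T=None):
    """Pegasos SVM training, following Figure 1 of
    Shalev-Shwartz et al. (2010).
    X: 2D feature matrix, Y: labels encoded as +1/-1."""
    n, d = X.shape
    if T is None:
        T = 10 * n                   # roughly 10 passes over the data
    w = np.zeros(d)
    rng = np.random.default_rng(0)
    for t in range(1, T + 1):
        i = rng.integers(n)          # pick a random training instance
        x, y = X[i], Y[i]
        eta = 1.0 / (lam * t)        # decreasing learning rate
        score = y * w.dot(x)         # the margin y * <w, x>
        w *= (1 - eta * lam)         # regularization shrink
        if score < 1:                # hinge loss is nonzero: take a step
            w += eta * y * x
    return w
```

Note that the shrinking step `w *= (1 - eta * lam)` is applied on every iteration, while the additive update only happens when the instance is inside the margin.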

### Logistic regression

As we saw in the lecture, the logistic regression classifier can be stated similarly to the SVM. The difference is just in which loss function is used: the SVM uses the hinge loss and LR the log loss.

Take a look at the table on page 15 in the Pegasos paper. The first line shows the hinge loss and the corresponding update operation that is carried out in each step in the algorithm.

Make a new training algorithm that uses the log loss (the second line) instead of the hinge loss. See how well it works compared to your previous classifier.
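Under the same assumptions as above (a dense NumPy implementation, labels encoded as +1/-1), switching to the log loss only changes the update step: instead of checking the hinge condition, every iteration moves `w` by an amount scaled by `1 / (1 + exp(y * <w, x>))`. A sketch:

```python
import numpy as np

def logreg_train(X, Y, lam=0.01, T=None):
    """Pegasos-style SGD with the log loss instead of the hinge loss.
    X: 2D feature matrix, Y: labels encoded as +1/-1."""
    n, d = X.shape
    if T is None:
        T = 10 * n                       # roughly 10 passes over the data
    w = np.zeros(d)
    rng = np.random.default_rng(0)
    for t in range(1, T + 1):
        i = rng.integers(n)              # pick a random training instance
        x, y = X[i], Y[i]
        eta = 1.0 / (lam * t)            # decreasing learning rate
        margin = y * w.dot(x)            # computed before shrinking w
        w *= (1 - eta * lam)             # regularization shrink
        # Log-loss gradient step: always applied, but its size
        # shrinks as the margin grows.
        w += (eta * y / (1.0 + np.exp(margin))) * x
    return w
```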

Our implementation is slow, and wasteful in terms of memory, if we use a lot of different features (which is common in NLP). This can be improved by using sparse vectors. Follow the example of `SparsePerceptron` in the lecture code and implement sparse versions of the SVM and LR algorithms.

In addition, remove the `SelectKBest` step from the pipeline and check the difference.

Hint: the helper function `sparse_dense_dot(x, w)` carries out a dot product between a sparse vector `x` and a dense vector `w`, while `add_sparse_to_dense(x, w, xw)` is the counterpart of the operation `w += xw * x`.
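To make the helpers concrete, here is one possible sketch of a sparse Pegasos trainer. It assumes sparse vectors are represented as `{feature_index: value}` dicts; the lecture code may use a different representation, so adapt accordingly. Note also that the shrink `w *= (1 - eta*lam)` still touches every weight; a scaling-factor trick can avoid that, but is omitted here for clarity:

```python
import numpy as np

def sparse_dense_dot(x, w):
    """Dot product between a sparse vector x ({index: value})
    and a dense weight vector w."""
    return sum(v * w[i] for i, v in x.items())

def add_sparse_to_dense(x, w, xw):
    """In-place counterpart of w += xw * x for a sparse x."""
    for i, v in x.items():
        w[i] += xw * v

def sparse_pegasos_train(X, Y, n_features, lam=0.01, T=None):
    """Pegasos SVM training where each instance in X is a
    {feature_index: value} dict and Y holds +1/-1 labels."""
    n = len(X)
    if T is None:
        T = 10 * n                    # roughly 10 passes over the data
    w = np.zeros(n_features)
    rng = np.random.default_rng(0)
    for t in range(1, T + 1):
        i = rng.integers(n)           # pick a random training instance
        x, y = X[i], Y[i]
        eta = 1.0 / (lam * t)         # decreasing learning rate
        score = y * sparse_dense_dot(x, w)
        w *= (1 - eta * lam)          # dense shrink (see note above)
        if score < 1:                 # inside the margin: take a step
            add_sparse_to_dense(x, w, eta * y)
    return w
```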