In this assignment, you will implement two algorithms to train classifiers: support vector machine and logistic regression. The main objectives of this assignment are first that you should get some experience of the practical considerations of implementing machine learning algorithms, and secondly that you should get a taste of how a typical machine learning paper looks.

Write Python code to implement the algorithms. Please also try to
answer the exercise question. Send the code
and the answer to the exercise by email to the lab instructor
(luis.nieto.pina -at- svenska.gu.se).
This assignment
is solved **individually**.

**Deadline: September 24**

Review the material from the lecture about linear classifiers (in particular the perceptron code near the end), and the lecture about SVM and LR.

Have a look at the paper Pegasos: Primal Estimated sub-GrAdient SOlver for SVM, Shalev-Shwartz et al. (2010). You're free to skip the mathematical details, but please read the introduction and make sure you understand the pseudocode in Figure 1, which is what you will implement in this assignment. Here are some clarifications about the pseudocode:

- The authors use angle brackets for dot products: <w, x> means w·x.
- *S* is the training set and |*S*| is the **size** of the training set. *T* is the number of steps in the algorithm. (This is a bit different from our perceptron, where we specified the number of times to process the whole training set.)
- λ is a **regularization** parameter that controls the tradeoff between keeping the classifier simple (i.e. the weights small) and fitting the training set. See slide 17 in the lecture.
- The optional line is there for theoretical reasons and can be ignored.
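To make the pseudocode concrete, here is a minimal dense NumPy sketch of the training loop (the function name `pegasos_train` and the tiny dataset are my own; labels are assumed to be coded as +1/−1, as in the paper):

```python
import numpy as np

def pegasos_train(X, Y, lam=0.01, T=1000, seed=0):
    """Sketch of the Pegasos pseudocode in Figure 1.

    X: array of feature vectors, Y: labels coded as +1/-1.
    lam: the regularization parameter lambda, T: number of steps.
    """
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for t in range(1, T + 1):
        i = rng.integers(len(Y))      # pick one training instance at random
        x, y = X[i], Y[i]
        eta = 1.0 / (lam * t)         # step size 1/(lambda * t)
        margin = y * w.dot(x)         # computed with the *current* w
        w *= (1 - eta * lam)          # shrink the weights (regularization)
        if margin < 1:                # hinge loss is nonzero: move toward y*x
            w += eta * y * x
    return w

# Tiny sanity check on a linearly separable toy problem
X = np.array([[1.0, 0.0], [0.9, 0.2], [0.0, 1.0], [0.1, 0.9]])
Y = np.array([1, 1, -1, -1])
w = pegasos_train(X, Y, lam=0.01, T=2000)
print(np.sign(X.dot(w)))
```

Each step thus performs the update w ← (1 − η<sub>t</sub>λ)w, plus η<sub>t</sub>·y·x when the margin is below 1, which matches the pseudocode.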

In addition to the paper, I have written a separate document that explains the algorithm in more detail, and uses a notation more similar to what we have seen previously.

We are developing a simple system to predict the weather based on our location and the season. Here is a training set.

| City       | Month    | Weather |
|------------|----------|---------|
| Gothenburg | July     | rain    |
| Gothenburg | December | rain    |
| Paris      | July     | sun     |
| Paris      | December | rain    |

We build the system by training a perceptron from scikit-learn on this training set. We then test the classifier on the same set. As expected, the classifier correctly classifies all the instances in this set; after all, it's the same set we used to train it.

Now, we repeat this experiment using another training set. We train the classifier using the following examples and evaluate it using the same examples.

| City   | Month    | Weather |
|--------|----------|---------|
| Sydney | July     | rain    |
| Sydney | December | sun     |
| Paris  | July     | sun     |
| Paris  | December | rain    |

When we use this set, for some strange reason our classifier performs poorly! We can't improve it by switching to an SVM. Do you have an idea what's going on?

Here is the Python code if you want to try the experiment yourself.
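If you want a quick preview before downloading, the experiment can be reconstructed in a few lines of scikit-learn. This is a sketch, not the code from the zip file; the dictionary-based feature encoding and the function name `train_and_eval` are my own choices:

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import Perceptron
from sklearn.pipeline import make_pipeline

def train_and_eval(examples, labels):
    """Train a perceptron and evaluate it on its own training set."""
    clf = make_pipeline(DictVectorizer(),
                        Perceptron(max_iter=100, tol=None, random_state=0))
    clf.fit(examples, labels)
    return clf.score(examples, labels)  # accuracy on the training data

set1 = [{'city': 'Gothenburg', 'month': 'July'},
        {'city': 'Gothenburg', 'month': 'December'},
        {'city': 'Paris', 'month': 'July'},
        {'city': 'Paris', 'month': 'December'}]
labels1 = ['rain', 'rain', 'sun', 'rain']

set2 = [{'city': 'Sydney', 'month': 'July'},
        {'city': 'Sydney', 'month': 'December'},
        {'city': 'Paris', 'month': 'July'},
        {'city': 'Paris', 'month': 'December'}]
labels2 = ['rain', 'sun', 'sun', 'rain']

print(train_and_eval(set1, labels1))  # first training set
print(train_and_eval(set2, labels2))  # second training set
```

Running it should reproduce the puzzle: perfect accuracy on the first set, but not on the second.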

Download and unpack this zip file. The package contains the perceptron code presented during the second lecture (`lecture2.py`). It also contains a program (`doc_classification.py`) that trains and evaluates the perceptron implementation using the review dataset that we've used previously.

Run the experiment code and make sure it works on your machine.

Implement the Pegasos algorithm for training SVMs by converting the pseudocode in Figure 1 into proper Python. Test your implementation by using your own classifier instead of the perceptron in `doc_classification.py`. It's probably easiest if you start from the code we saw during the lecture, in particular the class called `NewPerceptron2`.

You can try to find good values for the parameters *T* and
λ. In my experiments, I set *T* to 10 times the size of
the training set (roughly equivalent to 10 perceptron iterations) and
λ to 0.01, which gave a classification accuracy that was
slightly higher than that of the perceptron.

As we saw in the lecture, the logistic regression classifier can be
stated similarly to the SVM. The difference is just in which **loss
function** is used: the SVM uses the **hinge loss** and LR
the **log loss**.

Take a look at the table on page 15 in the Pegasos paper. The first line shows the hinge loss and the corresponding update operation that is carried out in each step in the algorithm.

Make a new training algorithm that uses the log loss (the second line) instead of the hinge loss. See how well it works compared to your previous classifier.
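As a starting point, here is a sketch of how the hinge-loss update can be swapped for the log-loss update from the second line of the table, keeping the rest of the Pegasos loop unchanged (function name and toy data are my own; labels are again assumed to be +1/−1):

```python
import numpy as np

def logreg_train(X, Y, lam=0.01, T=1000, seed=0):
    """Pegasos-style loop with the log loss instead of the hinge loss.

    The update is always applied, scaled by 1/(1 + exp(y*w.x)),
    rather than all-or-nothing as with the hinge loss.
    """
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for t in range(1, T + 1):
        i = rng.integers(len(Y))
        x, y = X[i], Y[i]
        eta = 1.0 / (lam * t)
        # gradient of the log loss w.r.t. w is -y*x / (1 + exp(y*w.x))
        g = 1.0 / (1.0 + np.exp(y * w.dot(x)))
        w *= (1 - eta * lam)          # regularization shrinking, as before
        w += eta * y * g * x          # scaled update instead of hinge test
    return w

# Same toy check as for the SVM version
X = np.array([[1.0, 0.0], [0.9, 0.2], [0.0, 1.0], [0.1, 0.9]])
Y = np.array([1, 1, -1, -1])
w = logreg_train(X, Y, lam=0.01, T=2000)
print(np.sign(X.dot(w)))
```

Note that the log-loss update never vanishes entirely: confidently classified examples just get a very small scaling factor `g`.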

Our implementation is slow and wasteful in terms of memory if we use a lot of different features (which is common in NLP). With sparse vectors, this can be improved. Follow the example of `SparsePerceptron` in the lecture code and implement sparse versions of the SVM and LR algorithms.

In addition, remove the `SelectKBest` step from the pipeline and check the difference.

**Hint**. The helper function `sparse_dense_dot(x, w)` is used to carry out a dot product between a sparse vector `x` and a dense vector `w`, while `add_sparse_to_dense(x, w, xw)` is the counterpart of the operation `w += xw * x`.
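The lecture code contains the actual implementations of these helpers; in case you want to see roughly what they do, here is one plausible way to write them, assuming `x` is a SciPy CSR row vector (the exact lecture versions may differ):

```python
import numpy as np
from scipy.sparse import csr_matrix

def sparse_dense_dot(x, w):
    """Dot product of a sparse row vector x (CSR) with a dense array w.

    Only the nonzero positions of x are touched.
    """
    return np.dot(w[x.indices], x.data)

def add_sparse_to_dense(x, w, xw):
    """In-place equivalent of w += xw * x for a sparse row vector x."""
    w[x.indices] += xw * x.data

# Small demonstration
x = csr_matrix(np.array([[0.0, 2.0, 0.0, 1.0]]))
w = np.ones(4)
d = sparse_dense_dot(x, w)       # 2*1 + 1*1 = 3.0
add_sparse_to_dense(x, w, 0.5)   # w += 0.5 * x
print(d, w)
```

The point is that both operations cost time proportional to the number of nonzero features in `x`, not to the full dimensionality of `w`.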

At some places in the pseudocode, the whole weight vector is scaled by some factor. This can be a slow operation if our training set consists of high-dimensional, sparse vectors. Read section 2.4 about a trick for speeding up the scaling operation. Implement the learning algorithm so that it uses this trick.
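One way to realize the trick is to store the weight vector as a scalar times an array, so that rescaling the whole vector only touches the scalar. Here is a sketch of the idea (the class name and interface are my own; the paper additionally keeps track of the norm of *w*, which is omitted here):

```python
import numpy as np

class ScaledVector:
    """Represents w = scale * v, so that w *= factor is O(1)."""

    def __init__(self, n):
        self.v = np.zeros(n)
        self.scale = 1.0

    def scale_by(self, factor):
        # w *= factor becomes a single scalar multiplication
        self.scale *= factor

    def dot(self, x):
        # w . x = scale * (v . x)
        return self.scale * self.v.dot(x)

    def add(self, x, factor):
        # w += factor * x  becomes  v += (factor / scale) * x
        self.v += (factor / self.scale) * x

    def to_array(self):
        return self.scale * self.v

# Check against doing the same operations on a plain array
w = ScaledVector(3)
w.add(np.array([1.0, 2.0, 0.0]), 1.0)
w.scale_by(0.5)
w.add(np.array([0.0, 0.0, 4.0]), 1.0)
print(w.to_array())
```

One caveat if you use this in Pegasos: at step t = 1 the shrinking factor (1 − η₁λ) is exactly zero, so the scale would become zero and later divisions by it would fail; starting the loop at t = 2 or special-casing the first step avoids this.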