March 2018
SMS Spam Filter
The purpose of the project was to research the efficacy of different machine
learning algorithms for a given dataset and use case. My topic of choice was
training a predictor on the given dataset to classify whether an SMS was spam
or not.
Considerations included:
- use-cases,
- programming language,
- libraries,
- train-test splitting,
- feature extraction,
- baseline algorithms.
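As a rough illustration of three of these considerations (train-test splitting, feature extraction and a baseline), here is a minimal sketch using only the Python standard library. The toy messages, the function names and the 25% test fraction are assumptions made for this example; they are not taken from the project's code or dataset.

```python
import random
from collections import Counter

# Hypothetical toy messages; the project's actual dataset is not reproduced here.
MESSAGES = [
    ("ham", "are we still meeting for lunch today"),
    ("spam", "win a free prize now call this number"),
    ("ham", "can you pick up milk on the way home"),
    ("spam", "free entry claim your cash prize now"),
    ("ham", "see you at the gym later"),
    ("ham", "running late will call you soon"),
    ("spam", "urgent you have won a free holiday"),
    ("ham", "thanks for dinner last night"),
]


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and hold out a fraction for testing."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = max(1, int(len(rows) * test_fraction))
    return rows[n_test:], rows[:n_test]


def bag_of_words(text):
    """Map a message to lowercase token counts (whitespace tokenisation)."""
    return Counter(text.lower().split())


def majority_baseline(train_rows):
    """Return a predictor that always outputs the most common training label."""
    label_counts = Counter(label for label, _ in train_rows)
    most_common_label = label_counts.most_common(1)[0][0]
    return lambda text: most_common_label


train, test = train_test_split(MESSAGES)
predict = majority_baseline(train)
accuracy = sum(predict(text) == label for label, text in test) / len(test)
print(len(train), len(test), accuracy)
```

A majority-class baseline like this gives a floor that any trained classifier should beat, which matters on spam data because the classes are usually imbalanced.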
The investigated machine learning algorithms were:
- Single Layer Perceptron,
- Multinomial Naive Bayes,
- Support Vector Machines,
- Neural Networks.
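To make one of these concrete, the following is a minimal sketch of Multinomial Naive Bayes over word counts with add-one (Laplace) smoothing. It is written from the textbook definition using only the standard library; the class name, the whitespace tokenisation and the toy training messages are assumptions for illustration, and this is not the implementation used in the research.

```python
import math
from collections import Counter, defaultdict


class MultinomialNaiveBayes:
    """Multinomial Naive Bayes over word counts with add-one smoothing."""

    def fit(self, texts, labels):
        # Count documents per class and word occurrences per class.
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocabulary = {
            word for counts in self.word_counts.values() for word in counts
        }
        return self

    def log_score(self, text, label):
        """Unnormalised log posterior: log P(label) + sum log P(word | label)."""
        total_docs = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / total_docs)
        total_words = sum(self.word_counts[label].values())
        denominator = total_words + len(self.vocabulary)
        for word in text.lower().split():
            if word in self.vocabulary:  # skip words never seen in training
                count = self.word_counts[label][word]
                score += math.log((count + 1) / denominator)
        return score

    def predict(self, text):
        # Choose the class with the highest log score.
        return max(self.class_counts, key=lambda label: self.log_score(text, label))


# Hypothetical toy training data, for illustration only.
texts = [
    "win a free prize now",
    "free cash prize claim now",
    "are we meeting for lunch",
    "see you at lunch later",
]
labels = ["spam", "spam", "ham", "ham"]

model = MultinomialNaiveBayes().fit(texts, labels)
print(model.predict("claim your free prize"))
print(model.predict("see you at lunch"))
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied together, and the smoothing keeps a word that is unseen in one class from driving that class's probability to zero.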
All the code (in Python) used for the research can be found on GitHub, along
with my solutions to exercise questions (in MATLAB).
My report on the project followed the structure below:
- Present the problem of your choice in a formalised way, and choose a loss
  function that reflects the potential use of the predictor.
- Propose and implement baseline predictors/classifiers and methods to
  train them. Present your findings.
- Propose more advanced algorithms to solve the problem. Implement these
  methods, and give insights into the training and evaluation process.
- Assess the performance of your algorithms and present your proposed
  solution to the problem.
- Discuss your overall findings and conclusions.
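The first point of this structure asks for a loss function that reflects how the predictor would be used. One possible reading, sketched below, is an asymmetric loss in which blocking a legitimate message costs more than letting a spam message through. The function name and the cost values (5.0 and 1.0) are assumptions chosen purely for illustration; they are not the loss or weights used in the report.

```python
def weighted_loss(true_labels, predicted_labels,
                  false_positive_cost=5.0, false_negative_cost=1.0):
    """Average cost per message under illustrative, assumed error costs."""
    total = 0.0
    for truth, guess in zip(true_labels, predicted_labels):
        if truth == "ham" and guess == "spam":
            # A legitimate message was blocked (false positive).
            total += false_positive_cost
        elif truth == "spam" and guess == "ham":
            # A spam message reached the inbox (false negative).
            total += false_negative_cost
    return total / len(true_labels)


truth = ["ham", "ham", "spam", "spam"]
blocks_one_ham = ["spam", "ham", "spam", "spam"]
misses_one_spam = ["ham", "ham", "ham", "spam"]

print(weighted_loss(truth, blocks_one_ham))   # 1.25
print(weighted_loss(truth, misses_one_spam))  # 0.25
```

Both predictors make exactly one mistake, so plain accuracy would rank them equally, whereas the weighted loss prefers the one that never blocks a legitimate message.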