March 2018

SMS Spam Filter

The purpose of the project was to research how effective different machine learning algorithms are for a given dataset and use case. My chosen topic was training a predictor on the given dataset to classify whether an SMS message is spam.

Considerations included:

  • use cases,
  • programming language,
  • libraries,
  • train-test splitting,
  • feature extraction,
  • baseline algorithms.
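
Two of the considerations above, train-test splitting and feature extraction, can be illustrated with a short standard-library sketch. It assumes a plain shuffled split and a bag-of-words representation (token counts over a vocabulary built from the training split only, so nothing from the test messages leaks into the features). The helper names, the default 80/20 ratio and the toy messages are hypothetical and are not taken from the project's code or dataset.

```python
import random
import re
from collections import Counter


def tokenize(text):
    """Lowercase a message and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def train_test_split(samples, test_ratio=0.2, seed=0):
    """Shuffle (message, label) pairs reproducibly and split off a test set."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


def build_vocabulary(messages):
    """Assign a column index to every token seen in the training messages."""
    vocab = {}
    for message in messages:
        for token in tokenize(message):
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab


def bag_of_words(message, vocab):
    """Return a token-count vector; tokens outside the vocabulary are ignored."""
    vector = [0] * len(vocab)
    for token, count in Counter(tokenize(message)).items():
        if token in vocab:
            vector[vocab[token]] = count
    return vector


# Invented toy messages (1 = spam, 0 = ham), not taken from the project's dataset.
data = [
    ("WIN a FREE prize now, call now", 1),
    ("Are we still meeting for lunch?", 0),
    ("Free entry to win cash, text WIN", 1),
    ("Running late, see you soon", 0),
    ("Congratulations, you won a free holiday", 1),
]
train, test = train_test_split(data, test_ratio=0.4)
# Build the vocabulary from the training split only, then vectorise both splits.
vocab = build_vocabulary([message for message, _ in train])
X_train = [bag_of_words(message, vocab) for message, _ in train]
X_test = [bag_of_words(message, vocab) for message, _ in test]
```

With a real dataset the split would usually also be stratified, so that the proportion of spam is similar in the training and test sets.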

The machine learning algorithms investigated were:
  • Single Layer Perceptron,
  • Multinomial Naive Bayes,
  • Support Vector Machines,
  • Neural Networks.
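
To illustrate one of the listed algorithms, here is a minimal from-scratch sketch of Multinomial Naive Bayes over token counts. It assumes add-one (Laplace) smoothing with `alpha=1.0`, ignores tokens never seen during training, and scores each class in log space to avoid numerical underflow. The class, its methods and the toy messages are hypothetical; this is not the implementation used in the project.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a message and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class MultinomialNB:
    """Multinomial Naive Bayes over token counts with add-alpha smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, messages, labels):
        self.classes = sorted(set(labels))
        class_totals = Counter(labels)
        # Log prior: log P(c) estimated from class frequencies.
        self.log_prior = {c: math.log(class_totals[c] / len(labels)) for c in self.classes}
        # Per-class token counts.
        self.token_counts = {c: Counter() for c in self.classes}
        for message, label in zip(messages, labels):
            self.token_counts[label].update(tokenize(message))
        self.vocab = set()
        for counts in self.token_counts.values():
            self.vocab.update(counts)
        self.total_tokens = {c: sum(self.token_counts[c].values()) for c in self.classes}
        return self

    def _log_likelihood(self, token, c):
        # P(token | c) = (count(token, c) + alpha) / (total tokens in c + alpha * |V|)
        numerator = self.token_counts[c][token] + self.alpha
        denominator = self.total_tokens[c] + self.alpha * len(self.vocab)
        return math.log(numerator / denominator)

    def predict(self, message):
        # Tokens never seen in training are skipped.
        tokens = [t for t in tokenize(message) if t in self.vocab]
        scores = {
            c: self.log_prior[c] + sum(self._log_likelihood(t, c) for t in tokens)
            for c in self.classes
        }
        return max(scores, key=scores.get)


# Invented toy messages (1 = spam, 0 = ham).
messages = ["win a free prize now", "free cash win win", "lunch at noon?", "see you at home tonight"]
labels = [1, 1, 0, 0]
model = MultinomialNB().fit(messages, labels)
spam_guess = model.predict("free prize")      # tokens seen mostly in spam
ham_guess = model.predict("see you at home")  # tokens seen mostly in ham
```

In practice a library implementation such as scikit-learn's `MultinomialNB` would normally be used; writing it out just makes the smoothed per-class token probabilities explicit.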

All the Python code used for the research can be found on GitHub, along with my solutions to the exercise questions (in MATLAB).

My report on the project followed the structure below:
  • Present the problem of your choice in a formalised way and choose a loss function that reflects the potential use of the predictor.
  • Propose and implement baseline predictors/classifiers and methods to train them. Present your findings.
  • Propose more advanced algorithms to solve the problem. Implement these methods, give insights into the training and evaluation process.
  • Assess the performance of your algorithms and present your proposed solution to the problem.
  • Discuss your overall findings and conclusions.
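
Choosing a loss that reflects how the predictor will be used, and then assessing performance, can be made concrete with a cost-weighted error. The sketch below assumes that wrongly flagging a genuine message as spam (a false positive) is ten times as costly as letting a spam message through (a false negative); that 10:1 ratio, the function names and the example labels are illustrative assumptions, not values from the report. It also derives the spam-probability threshold that minimises expected cost under those weights.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn), treating `positive` (spam) as the positive class."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1  # ham flagged as spam
        elif truth == positive:
            fn += 1  # spam let through
        else:
            tn += 1
    return tp, fp, fn, tn


def weighted_cost(y_true, y_pred, cost_fp=10.0, cost_fn=1.0):
    """Average cost per message under the assumed 10:1 false-positive penalty."""
    _, fp, fn, _ = confusion_counts(y_true, y_pred)
    return (cost_fp * fp + cost_fn * fn) / len(y_true)


def precision_recall(y_true, y_pred):
    """Precision and recall for the spam class (0.0 when undefined)."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def cost_sensitive_threshold(cost_fp=10.0, cost_fn=1.0):
    """Predict spam only when (1 - p) * cost_fp < p * cost_fn,
    i.e. when p > cost_fp / (cost_fp + cost_fn)."""
    return cost_fp / (cost_fp + cost_fn)


# Illustrative labels only (1 = spam, 0 = ham).
y_true = [1, 0, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1]
counts = confusion_counts(y_true, y_pred)
cost = weighted_cost(y_true, y_pred)
precision, recall = precision_recall(y_true, y_pred)
threshold = cost_sensitive_threshold()
```

With a 10:1 ratio the threshold is 10/11 ≈ 0.91, so a probabilistic classifier would only label a message as spam when it is quite confident; plain accuracy, by contrast, would treat both kinds of mistake as equally bad.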