Predicting Titanic Survival with Kaggle Data
Posted on Fri 11 August 2017 in Data Science • Tagged with kaggle, random forests
A month ago, I finally joined Kaggle to get some practice applying machine learning algorithms. My first submission was to their Titanic competition. For the uninitiated, Kaggle is a website that hosts data science competitions, which are open to anyone, anywhere with an internet connection (with some exceptions).
Generally, competitors try to write algorithms that best predict some kind of outcome. In the Titanic competition, for example, Kaggle provides data on 891 actual passengers aboard the Titanic. This includes information like name, passenger class, other family on board, and most importantly, whether or not each passenger survived. The goal is to use this 'training data' to build some kind of model or algorithm that correctly predicts each passenger's survival outcome. But the real test is whether your algorithm accurately predicts the survival outcomes for a separate set of passengers for whom you do not have survival information. This is called the 'test data', and your final accuracy score is the proportion of correct predictions you make for this unlabeled data.
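To make the train/predict workflow concrete, here is a minimal sketch using scikit-learn's `RandomForestClassifier` (the post is tagged with random forests, though the exact model and features used are my assumption). The rows below are made-up stand-ins for Kaggle's `train.csv` and `test.csv`; only the column names follow the real competition files.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Toy stand-ins for Kaggle's train.csv / test.csv. Column names follow
# the real competition files, but these rows are invented for illustration.
train = pd.DataFrame({
    "Pclass":   [1, 3, 3, 1, 2, 3, 1, 2],
    "Sex":      ["female", "male", "female", "male",
                 "female", "male", "female", "male"],
    "Age":      [29, 22, 26, 54, 35, 20, 38, 27],
    "Survived": [1, 0, 1, 0, 1, 0, 1, 0],
})
test = pd.DataFrame({
    "Pclass": [3, 1],
    "Sex":    ["male", "female"],
    "Age":    [25, 40],
})

features = ["Pclass", "Sex", "Age"]

def encode(df):
    # Random forests need numeric inputs, so map Sex to 0/1.
    out = df[features].copy()
    out["Sex"] = out["Sex"].map({"male": 0, "female": 1})
    return out

# Fit on the labeled 'training data'...
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(encode(train), train["Survived"])

# ...then predict survival (0 or 1) for the unlabeled 'test data'.
# On Kaggle, these predictions are what gets scored for accuracy.
predictions = model.predict(encode(test))
```

In the actual competition you would read the real CSVs, handle missing values (e.g. `Age` has gaps), and write `predictions` out as a submission file.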