Hello, and welcome to our presentation
on taxi data prediction.
I am Tobias, and I am presenting the work
of team One Punch,
which consists of Guanfang, Mingkai, Tobias and Yumi.
The main idea of task 5.1 was to use five different algorithms,
each predicting the best order for every row.
This gave us five lists of predicted orders, one entry per row.
For each row, we then took the most frequent order across these five lists
as our prediction of which order was best.
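The voting step can be sketched as below. This is a minimal sketch, assuming each algorithm returns one order label per row; the function name `majority_vote` and the example labels are hypothetical. Ties are broken in favour of the label that appears first among a row's votes, which is only one possible choice, since the presentation does not say how ties were handled.

```python
from collections import Counter


def majority_vote(predictions):
    """Return, for each row, the most frequent label across all models.

    predictions: one list per model, all of the same length (one label per row).
    Ties go to the label encountered first in that row, because
    Counter.most_common orders equal counts by first appearance.
    """
    voted = []
    # zip(*predictions) yields one tuple per row holding every model's vote.
    for row_votes in zip(*predictions):
        label, _count = Counter(row_votes).most_common(1)[0]
        voted.append(label)
    return voted


# Hypothetical predictions from five models for three rows.
preds = [
    ["A", "B", "C"],
    ["A", "C", "C"],
    ["B", "B", "A"],
    ["A", "B", "C"],
    ["C", "B", "A"],
]
result = majority_vote(preds)
print(result)  # ['A', 'B', 'C']
```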
The tips were predicted differently,
by taking the mean of the regression-based models.
With this, we achieved an accuracy of 58%,
which is higher than baseline guessing,
which would only give 33%.
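Averaging several regressors can be sketched with scikit-learn as follows. The presentation does not name the regression models or features, so the three model types, the two features (imagined as fare amount and trip distance) and the synthetic tip target are all assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
# Hypothetical features (e.g. fare amount, trip distance) and a synthetic tip target.
X = rng.uniform(0, 50, size=(200, 2))
y = 0.15 * X[:, 0] + rng.normal(0, 0.5, size=200)

X_train, X_test = X[:150], X[150:]
y_train = y[:150]

# Assumed set of regression models; any fitted regressors could be averaged.
models = [
    LinearRegression(),
    Ridge(alpha=1.0),
    DecisionTreeRegressor(max_depth=4, random_state=0),
]
for model in models:
    model.fit(X_train, y_train)

# One row of predictions per model; the ensemble prediction is the column-wise mean.
all_preds = np.vstack([model.predict(X_test) for model in models])
tip_pred = all_preds.mean(axis=0)
```

A plain mean gives every model equal weight; scikit-learn's `VotingRegressor` wraps the same idea in a single estimator.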
The main idea of task 5.2 was to use agglomerative clustering
to group our pickup and drop-off locations into 20 clusters.
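The clustering step might look like the sketch below. It assumes pickup and drop-off points are clustered separately, each into 20 groups, using latitude/longitude columns; the column names and the synthetic coordinates are hypothetical, and the presentation does not state the linkage used (Ward is scikit-learn's default).

```python
import numpy as np
import pandas as pd
from sklearn.cluster import AgglomerativeClustering

rng = np.random.default_rng(42)
n = 500
# Hypothetical trip table with synthetic coordinates roughly in the New York City area.
trips = pd.DataFrame({
    "pickup_lat": rng.uniform(40.60, 40.85, n),
    "pickup_lon": rng.uniform(-74.05, -73.75, n),
    "dropoff_lat": rng.uniform(40.60, 40.85, n),
    "dropoff_lon": rng.uniform(-74.05, -73.75, n),
})

# Cluster pickup points and drop-off points separately into 20 groups each.
pickup_model = AgglomerativeClustering(n_clusters=20, linkage="ward")
trips["pickup_cluster"] = pickup_model.fit_predict(trips[["pickup_lat", "pickup_lon"]])

dropoff_model = AgglomerativeClustering(n_clusters=20, linkage="ward")
trips["dropoff_cluster"] = dropoff_model.fit_predict(trips[["dropoff_lat", "dropoff_lon"]])
```

`AgglomerativeClustering` has no `predict` method, so assigning new trips needs an extra step, for example mapping each point to the nearest cluster centroid. Because it works with pairwise distances, it can also become expensive on the full dataset, which is one reason to cluster a sample.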
This time, we used our regression models from the previous task
and took their average to predict the tips.
We also converted payment_type into numeric labels,
so that we had a numeric data type to train on and predict.
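Turning `payment_type` into numeric labels can be sketched with scikit-learn's `LabelEncoder`. The string values below are hypothetical, since the presentation does not show what the raw `payment_type` values looked like.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical payment_type values stored as strings.
df = pd.DataFrame({"payment_type": ["Credit card", "Cash", "No charge", "Cash"]})

encoder = LabelEncoder()
# LabelEncoder sorts the distinct classes and maps each one to its index.
df["payment_type_label"] = encoder.fit_transform(df["payment_type"])

print(encoder.classes_.tolist())          # ['Cash', 'Credit card', 'No charge']
print(df["payment_type_label"].tolist())  # [1, 0, 2, 0]
```

`encoder.inverse_transform` maps predicted labels back to the original strings. `LabelEncoder` is meant for target columns; for encoding input features, `OrdinalEncoder` is the usual counterpart.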
With this, we achieved an accuracy of 77%,
which is well above the baseline.
Something additional we did for task 5 was data cleaning:
we deleted invalid data,
detected outliers using the IsolationForest method,
and then dropped the 5% of the data flagged as outliers.
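The cleaning steps can be sketched as follows. The validity rules (positive fare and distance), the column names and the synthetic data are assumptions; only the use of an isolation forest and the 5% drop come from the presentation. In scikit-learn, `contamination=0.05` sets the decision threshold so that about 5% of the training rows are labelled as outliers.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(7)
# Hypothetical numeric trip features (synthetic).
df = pd.DataFrame({
    "trip_distance": rng.exponential(3.0, 1000),
    "fare_amount": rng.normal(15.0, 5.0, 1000),
    "tip_amount": rng.exponential(2.0, 1000),
})

# Step 1: delete invalid rows (assumed rules: fare and distance must be positive).
valid = df[(df["fare_amount"] > 0) & (df["trip_distance"] > 0)]

# Step 2: fit_predict returns -1 for outliers and 1 for inliers;
# contamination=0.05 flags roughly the most isolated 5% of rows.
iso = IsolationForest(contamination=0.05, random_state=0)
labels = iso.fit_predict(valid)

# Step 3: keep only the inliers.
cleaned = valid[labels == 1].reset_index(drop=True)
print(len(valid) - len(cleaned))  # number of rows dropped as outliers
```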
As for future improvements of our work,
one of the main things we could have done was to use cross-validation.
With cross-validation, we can estimate the generalization error,
and with that we could validate
the methods and choose the best one.
Likewise, we could detect and reduce overfitting, and we could
do hyperparameter optimization.
Many of the models we used in this project have hyperparameters.
These could then be tuned with cross-validation,
which would probably give a better fit.
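Cross-validation and hyperparameter search could be combined roughly as below. This is a sketch on synthetic data: the decision tree, the parameter grid and the three-class target are assumptions chosen for illustration, not the models from the project.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the taxi features and a three-class target.
X, y = make_classification(n_samples=600, n_features=8, n_informative=5,
                           n_classes=3, random_state=0)

# 5-fold cross-validation: the mean fold accuracy estimates generalization performance.
baseline_scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5)
print("CV accuracy:", baseline_scores.mean())

# Grid search runs the same 5-fold CV for every hyperparameter combination
# and keeps the one with the best mean validation accuracy.
param_grid = {"max_depth": [2, 4, 6, 8, None], "min_samples_leaf": [1, 5, 10]}
search = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid,
                      cv=5, scoring="accuracy")
search.fit(X, y)
print("Best parameters:", search.best_params_)
print("Best CV accuracy:", search.best_score_)
```

For classifiers, an integer `cv` makes scikit-learn use stratified folds, so each fold keeps roughly the same class proportions.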
In task 5.1, we computed a correlation between the pickup location and the tips and fare.
If we introduced more features or
had some domain knowledge, we could perhaps get
even better accuracy here.
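A correlation check like this can be sketched with pandas. It assumes the pickup location is given as latitude/longitude coordinates; the column names and synthetic values are hypothetical. If the location were a categorical zone ID instead, a Pearson coefficient on the raw ID would not be meaningful, and comparing mean tips per zone would be a more sensible check.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 300
# Hypothetical trips: pickup coordinates plus fare and tip amounts (synthetic).
df = pd.DataFrame({
    "pickup_lat": rng.uniform(40.60, 40.85, n),
    "pickup_lon": rng.uniform(-74.05, -73.75, n),
    "fare_amount": rng.uniform(5, 60, n),
})
df["tip_amount"] = 0.15 * df["fare_amount"] + rng.normal(0, 1.0, n)

# Pearson correlation matrix between pickup location, fare and tip.
corr = df[["pickup_lat", "pickup_lon", "fare_amount", "tip_amount"]].corr()
print(corr.round(2))
```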
So this is the end of the presentation.
Thank you for listening.
See you later, alligator.
