Single number evaluation metric
- Use a single real-number evaluation metric
- An ML project is a loop (Idea -> Code -> Experiment), and each iteration needs evaluation
- There are many evaluation metrics. ex) Precision and Recall for a classifier
- Combine these metrics into a single new one. ex) F1 score
- To decide which algorithm is better when you have several numbers (e.g. error on several dev sets), use a single summary such as the average
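As a sketch of combining Precision and Recall into one number, the F1 score is their harmonic mean. The precision/recall values below are hypothetical, just to show the comparison:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall: one real-number metric."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Two hypothetical classifiers, each with (precision, recall):
a = f1_score(0.95, 0.90)  # classifier A
b = f1_score(0.98, 0.85)  # classifier B
print(a > b)  # with a single number, picking the better classifier is trivial
```

With separate Precision and Recall columns it is often unclear which classifier wins; the single F1 number makes the comparison mechanical.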
Satisficing and Optimizing metric
- For example, suppose you are choosing a classifier based on Accuracy and Running time.
- A classifier with better accuracy but a very long running time may still be a bad choice.
- So set a satisfaction condition on running time. For example, select the classifier with the highest accuracy subject to the condition that running time is less than 100 ms. Here Accuracy is the optimizing metric and Running time is the satisficing metric.
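The selection rule above can be sketched directly: filter by the satisficing constraint, then maximize the optimizing metric. The classifier names, accuracies, and running times are made up for illustration:

```python
# Hypothetical candidates: (name, accuracy, running_time_ms)
classifiers = [
    ("A", 0.90, 80),
    ("B", 0.92, 95),
    ("C", 0.95, 1500),  # best accuracy, but far too slow
]

MAX_RUNTIME_MS = 100  # satisficing threshold

# 1) Keep only classifiers that satisfy the running-time condition.
feasible = [c for c in classifiers if c[2] <= MAX_RUNTIME_MS]
# 2) Among those, optimize (maximize) accuracy.
best = max(feasible, key=lambda c: c[1])
print(best[0])  # -> B
```

Note that C is rejected despite the highest accuracy, because it fails the satisficing condition.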
Train / dev / test datasets
- Choose dev and test sets to reflect the data you expect to get in the future and consider important to do well on (i.e., drawn from the same distribution).
Size of the dev and test sets
- Old way of splitting data (when datasets were small): Train : Test = 70% : 30%, or Train : Dev : Test = 60% : 20% : 20%
- With a much larger dataset you can increase the train percentage, e.g. 98% : 1% : 1%
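A minimal sketch of such a split (fractions and dataset size are illustrative; the test set simply takes the remainder):

```python
import random

def split_dataset(examples, train_frac, dev_frac, seed=0):
    """Shuffle and split examples into train/dev/test; test gets the remainder."""
    examples = list(examples)
    random.Random(seed).shuffle(examples)
    n = len(examples)
    n_train = int(n * train_frac)
    n_dev = int(n * dev_frac)
    return (examples[:n_train],
            examples[n_train:n_train + n_dev],
            examples[n_train + n_dev:])

data = list(range(100_000))
train, dev, test = split_dataset(data, 0.98, 0.01)
print(len(train), len(dev), len(test))  # -> 98000 1000 1000
```

With 100,000 examples, 1% still leaves 1,000 dev/test examples, which is often enough to compare algorithms; that is why the train share can grow as the dataset grows.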
When to change dev/test sets and metrics
- Cat dataset example
- Metric: classification error
- Algorithm A: 3% error, but lets porn pics through
- Algorithm B: 5% error
- In this case, Metric + Dev: prefer A, but Users: prefer B
- Change the metric -> for instance, add a weight for porn pics: w is 1 if x is non-porn, w is 10 if x is porn.
- Mobile cat pic example
- If doing well on your metric + dev/test set does not correspond to doing well on your application (e.g. smartphone photos), change your metric and/or dev/test set.
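The reweighted error from the cat example can be sketched as follows. The predictions, labels, and porn flags are hypothetical; the weights (1 for non-porn, 10 for porn) come from the rule above:

```python
def weighted_error(predictions, labels, is_porn):
    """Classification error where mistakes on porn pics count 10x."""
    weights = [10 if porn else 1 for porn in is_porn]
    wrong = sum(w for w, p, y in zip(weights, predictions, labels) if p != y)
    return wrong / sum(weights)

# Four images; the second is porn and is misclassified as a cat (pred=1, label=0).
preds  = [1, 1, 1, 0]
labels = [1, 0, 1, 0]
porn   = [False, True, False, False]
print(weighted_error(preds, labels, porn))
```

The unweighted error here would be 1/4, but the weighted metric charges the porn mistake 10x, so the error jumps to 10/13; an algorithm that lets porn through now scores badly, matching user preference.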