UPDATE: I don't think it is worth doing this in 2025, as the quality of foundation models has improved substantially and costs have gone down. This might still be a good idea at scale, but I don't operate at scale.
- background
- the task
- data set, training costs: ~1,200 examples cost $1.41
- Original GPT-4 prompt
- Manually reviewed collected data.
- Two orders of magnitude cheaper, due to needing fewer hints and prompt tokens per request (see the grading-call sketch at the end of these notes).
- cost reduction
- past attempts
- the training data (a sketch of the JSONL format is at the end of these notes)
- Possible issues:
  - Since many of my cards (~1,200) exist in the initial training set, there may be confirmation bias, and the model might see a reduction in quality as new cards are added.
  - The fine-tuned model is more permissive about minor grammatical issues, whereas GPT-4 often rejects grammar mistakes. This might be fixable with more fine-tuning but would require native-speaker labeling.
  - Still better than self-grading via Anki, since GPT-3.5 has a better understanding of Korean than I do.
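
For reference, here is a minimal sketch of what one record in the training file might look like, assuming OpenAI's chat-format fine-tuning JSONL. The system prompt, card, sentence, and grade below are placeholders, not my actual prompt or labels.

```python
import json

# Hypothetical training example in the chat JSONL format used by OpenAI fine-tuning.
# All of the text below is illustrative, not the real prompt or grading labels.
example = {
    "messages": [
        {"role": "system", "content": "You grade Korean sentences written for Anki cards."},
        {"role": "user", "content": "Card: 사과 (apple)\nSentence: 나는 사과를 먹었어요."},
        {"role": "assistant", "content": "PASS: natural sentence, vocabulary used correctly."},
    ]
}

# One JSON object per line in the .jsonl file that gets uploaded for the fine-tuning job.
with open("train.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(example, ensure_ascii=False) + "\n")
```

Because the grading behaviour is baked in via the assistant turns, inference only needs the card and the sentence, which is where most of the prompt-token savings come from.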
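And a sketch of the grading call itself, assuming the v1 openai Python SDK; the model ID is hypothetical (the real one is issued when the fine-tuning job finishes). The short prompt is the point: the original GPT-4 prompt needed lengthy instructions and hints in every request.

```python
from openai import OpenAI

client = OpenAI()

# Hypothetical fine-tuned model ID; the real one is returned when the job completes.
MODEL = "ft:gpt-3.5-turbo-0613:personal::abc12345"

def grade(card: str, sentence: str) -> str:
    """Grade one sentence; no few-shot hints needed since they were trained in."""
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": "You grade Korean sentences written for Anki cards."},
            {"role": "user", "content": f"Card: {card}\nSentence: {sentence}"},
        ],
        temperature=0,
    )
    return resp.choices[0].message.content

print(grade("사과 (apple)", "나는 사과를 먹었어요."))
```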