Senior software engineer at Qualia Labs · Co-founder of Fox.Build Makerspace · Former co-founder of FarmBot

Fine-tuning an LLM to grade language learning prompts

UPDATE: I don't think it is worth doing this in 2025, as the quality of foundation models has improved substantially and costs have gone down. This might still be a good idea at scale, but I don't operate at scale.

  • background
  • the task
  • data set, training costs: ~1,200 examples cost $1.41
    • Original GPT-4 prompt
    • Manually reviewed collected data.
    • Two orders of magnitude cheaper due to needing fewer hints and prompt tokens.
  • cost reduction
  • past attempts
  • the training data
  • Possible issues:
    • Since many of my cards (~1,200) exist in the initial training set, there may be confirmation bias and the model might see a reduction in quality as new cards are added.
    • The fine-tuned model is more permissive about minor grammatical issues, whereas GPT-4 often rejects grammar mistakes. This might be fixable with more fine-tuning, but would require labeling by a native speaker.
      • Still better than self-grading via Anki, since GPT-3.5 has a better understanding of Korean than I do.
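To make the training-data bullet above concrete: OpenAI's chat fine-tuning for GPT-3.5 expects a JSONL file with one chat transcript per line, each containing a `messages` array of system/user/assistant turns. The sketch below builds that format from (sentence, grade) pairs. The system prompt, Korean sentences, and PASS/FAIL labels are hypothetical illustrations, not the author's actual dataset.

```python
import json

# Hypothetical (learner sentence, grade) pairs. The real dataset
# would hold ~1,200 manually reviewed examples.
EXAMPLES = [
    ("나는 어제 학교에 갔어요.", "PASS"),
    ("나는 학교에 가요 어제.", "FAIL"),
]

SYSTEM_PROMPT = "Grade the learner's Korean sentence. Reply PASS or FAIL."

def to_jsonl_lines(examples):
    """Convert (sentence, grade) pairs into chat-format JSONL lines."""
    lines = []
    for sentence, grade in examples:
        record = {
            "messages": [
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": sentence},
                {"role": "assistant", "content": grade},
            ]
        }
        # ensure_ascii=False keeps Hangul readable in the file.
        lines.append(json.dumps(record, ensure_ascii=False))
    return lines

lines = to_jsonl_lines(EXAMPLES)
```

Writing `"\n".join(lines)` to a `.jsonl` file produces an upload-ready training file; at this scale, token counts stay small, which is why the whole run cost on the order of a dollar.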