Introduction
Language learners, especially language learners at the intermediate level, need to acquire an extremely large volume of vocabulary words, and ideally those words should be acquired in context rather than in isolation. This means having a large set of example sentences where a target word is used. Generating example sentences is time-consuming and usually requires human intervention. Additionally, In the case where a learner is using a spaced repetition system to memorize sentences, it is very important that the sentence is appropriately sized. The sentence can't be too long, and it can't be too short, otherwise spaced repetition becomes ineffective or impractical.
Applications
There are a number of applications where it would be beneficial to have an LLM system that can ingest a target vocabulary word and output appropriately sized, grammatically correct, realistic example sentences. The one that comes to mind most frequently for me is the idea of a language student ingesting a word frequency list and studying the output via spaced repetition, starting at the most common words and moving into a more advanced vocabulary as they progress. As the learner finds hard to memorize words, they can compensate for learning challenges by simply generating more example sentences on words that are more difficult than others.
Failed Attempts
I've tried many methods to get language models to make example sentences on their own, but I've faced several issues. The biggest issue I face is getting the model to output an appropriately sized sentence. This is not as easy as it sounds, since large language models often do not have a good grasp of things like syllable count or word length. Another issue is the lack of variety in the topics of the sentences. In my tests, where I looked at about a thousand sentences made by a language model in Korean, I noticed that nearly 14% of the sentences were about school, books, or reading. It seems like the model prefers some topics over others. Sometimes, I've also gotten responses where the model would just say the word is important
without making a useful sentence.
While example sentences are crucial for learning new words, getting language models to create useful and varied sentences remains both a challenge and a very enticing goal to achieve.