While working on my site for Learning and Quiz Generation, I noticed that asking LLMs for JSON output was problematic. Too many times the response was not valid JSON. After running into this repeatedly, I felt I should look at other output formats that I could parse instead. One such method that seems to work is to make use of Tag Markers.
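For concreteness, the failure mode is the usual one: the entire response has to survive JSON.parse, so a single stray character ruins the whole thing. A minimal sketch of what I mean (the rawResponse parameter and the error handling here are just illustration, not my actual code):

function parseQuizJson(rawResponse: string): unknown {
  try {
    // One trailing comma, unescaped quote, or bit of prose wrapped around
    // the JSON and this throws, losing the entire response.
    return JSON.parse(rawResponse);
  } catch (err) {
    console.error("Model returned invalid JSON:", err);
    return null;
  }
}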
So instead of asking the LLM to output responses in this format:
"questions": [
{
"text" : "Full text/context that will be used by all subquestions",
"subquestions" : [
"question": "Clear and detailed question.",
"options": ["Option 1", "Option 2", "Option 3", "Option 4"],
"correctAnswer": "Exact text of the correct option",
"explanation": "Detailed explanation of why this answer is correct",
"difficulty": number between 1 and 100,
"topics": ["Topic 1", "Topic 2"]
]
}
]
I asked them to generate responses in this format instead:
<QUESTION_START>
<TEXT>Full text/context that will be used by all subquestions</TEXT>
<SUBQUESTION_START>
<QUESTION>First subquestion - clear and detailed question.</QUESTION>
<OPTION_1>First option</OPTION_1>
<OPTION_2>Second option</OPTION_2>
<OPTION_3>Third option</OPTION_3>
<OPTION_4>Fourth option</OPTION_4>
<CORRECT_ANSWER>Exact text of the correct option</CORRECT_ANSWER>
<EXPLANATION>Detailed explanation of why this answer is correct</EXPLANATION>
<DIFFICULTY>number between 1 and 100</DIFFICULTY>
<TOPICS>Topic 1, Topic 2</TOPICS>
<SUBQUESTION_END>
<QUESTION_END>
And I used a regex like this to extract the text between each pair of Tag Markers (with TAG_MARKER replaced by the actual tag name):
<TAG_MARKER>([\s\S]*?)<\/TAG_MARKER>
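Here is a minimal sketch of the extraction as a whole, assuming the model reply arrives as one string; the helper names (extractTag, parseQuestions) and the TypeScript interfaces are my own naming for illustration:

// Pull the inner text of a single <TAG>...</TAG> pair out of a block.
function extractTag(block: string, tag: string): string | null {
  const match = block.match(new RegExp(`<${tag}>([\\s\\S]*?)</${tag}>`));
  return match ? match[1].trim() : null;
}

interface SubQuestion {
  question: string;
  options: string[];
  correctAnswer: string;
  explanation: string;
  difficulty: number;
  topics: string[];
}

interface Question {
  text: string;
  subquestions: SubQuestion[];
}

function parseQuestions(response: string): Question[] {
  // Each question block sits between <QUESTION_START> and <QUESTION_END>.
  const questionBlocks =
    response.match(/<QUESTION_START>([\s\S]*?)<QUESTION_END>/g) ?? [];

  return questionBlocks.map((qBlock) => {
    // Each subquestion sits between <SUBQUESTION_START> and <SUBQUESTION_END>.
    const subBlocks =
      qBlock.match(/<SUBQUESTION_START>([\s\S]*?)<SUBQUESTION_END>/g) ?? [];

    const subquestions = subBlocks.map((sBlock) => ({
      question: extractTag(sBlock, "QUESTION") ?? "",
      options: [1, 2, 3, 4]
        .map((i) => extractTag(sBlock, `OPTION_${i}`) ?? "")
        .filter((o) => o.length > 0),
      correctAnswer: extractTag(sBlock, "CORRECT_ANSWER") ?? "",
      explanation: extractTag(sBlock, "EXPLANATION") ?? "",
      difficulty: Number(extractTag(sBlock, "DIFFICULTY") ?? "0"),
      topics: (extractTag(sBlock, "TOPICS") ?? "")
        .split(",")
        .map((t) => t.trim())
        .filter((t) => t.length > 0),
    }));

    return { text: extractTag(qBlock, "TEXT") ?? "", subquestions };
  });
}

The nice part is that every field comes back as its own string, so a single malformed field only degrades that field instead of taking down the whole response the way one broken character does in a JSON document.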
Seems to work really well so far.