While working on my site for Learning and Quiz Generation, I noticed that asking LLMs for JSON output was problematic. Too many times the response was not valid JSON. After running into this repeatedly, I felt I should look at other output formats that I could parse instead. One such method that seems to work is to make use of Tag Markers.
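For concreteness, the failure mode is the usual one: the entire response has to survive JSON.parse, so a single stray character ruins the whole thing. A minimal sketch of what I mean (the rawResponse parameter and the error handling here are just illustration, not my actual code):

function parseQuizJson(rawResponse: string): unknown {
  try {
    // One trailing comma, unescaped quote, or bit of prose wrapped around
    // the JSON and this throws, losing the entire response.
    return JSON.parse(rawResponse);
  } catch (err) {
    console.error("Model returned invalid JSON:", err);
    return null;
  }
}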
So instead of asking the LLM to output responses in this format:
"questions": [
{
"text" : "Full text/context that will be used by all subquestions",
"subquestions" : [
"question": "Clear and detailed question.",
"options": ["Option 1", "Option 2", "Option 3", "Option 4"],
"correctAnswer": "Exact text of the correct option",
"explanation": "Detailed explanation of why this answer is correct",
"difficulty": number between 1 and 100,
"topics": ["Topic 1", "Topic 2"]
]
}
]
I asked them to generate responses in this format instead:
<QUESTION_START>
<TEXT>Full text/context that will be used by all subquestions</TEXT>
<SUBQUESTION_START>
<QUESTION>First subquestion - clear and detailed question.</QUESTION>
<OPTION_1>First option</OPTION_1>
<OPTION_2>Second option</OPTION_2>
<OPTION_3>Third option</OPTION_3>
<OPTION_4>Fourth option</OPTION_4>
<CORRECT_ANSWER>Exact text of the correct option</CORRECT_ANSWER>
<EXPLANATION>Detailed explanation of why this answer is correct</EXPLANATION>
<DIFFICULTY>number between 1 and 100</DIFFICULTY>
<TOPICS>Topic 1, Topic 2</TOPICS>
<SUBQUESTION_END>
<QUESTION_END>
And I used a regex like this to extract the text between each pair of Tag Markers (with TAG_MARKER replaced by the actual tag name):
<TAG_MARKER>([\s\S]*?)<\/TAG_MARKER>
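Here is a minimal sketch of the extraction as a whole, assuming the model reply arrives as one string; the helper names (extractTag, parseQuestions) and the TypeScript interfaces are my own naming for illustration:

// Pull the inner text of a single <TAG>...</TAG> pair out of a block.
function extractTag(block: string, tag: string): string | null {
  const match = block.match(new RegExp(`<${tag}>([\\s\\S]*?)</${tag}>`));
  return match ? match[1].trim() : null;
}

interface SubQuestion {
  question: string;
  options: string[];
  correctAnswer: string;
  explanation: string;
  difficulty: number;
  topics: string[];
}

interface Question {
  text: string;
  subquestions: SubQuestion[];
}

function parseQuestions(response: string): Question[] {
  // Each question block sits between <QUESTION_START> and <QUESTION_END>.
  const questionBlocks =
    response.match(/<QUESTION_START>([\s\S]*?)<QUESTION_END>/g) ?? [];

  return questionBlocks.map((qBlock) => {
    // Each subquestion sits between <SUBQUESTION_START> and <SUBQUESTION_END>.
    const subBlocks =
      qBlock.match(/<SUBQUESTION_START>([\s\S]*?)<SUBQUESTION_END>/g) ?? [];

    const subquestions = subBlocks.map((sBlock) => ({
      question: extractTag(sBlock, "QUESTION") ?? "",
      options: [1, 2, 3, 4]
        .map((i) => extractTag(sBlock, `OPTION_${i}`) ?? "")
        .filter((o) => o.length > 0),
      correctAnswer: extractTag(sBlock, "CORRECT_ANSWER") ?? "",
      explanation: extractTag(sBlock, "EXPLANATION") ?? "",
      difficulty: Number(extractTag(sBlock, "DIFFICULTY") ?? "0"),
      topics: (extractTag(sBlock, "TOPICS") ?? "")
        .split(",")
        .map((t) => t.trim())
        .filter((t) => t.length > 0),
    }));

    return { text: extractTag(qBlock, "TEXT") ?? "", subquestions };
  });
}

The nice part is that every field comes back as its own string, so a single malformed field only degrades that field instead of taking down the whole response the way one broken character does in a JSON document.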
Seems to work really well so far.