In recent years, the AI community has begun to face an obvious problem:
LLMs reason well, but they were not designed to process structured data like JSON efficiently.
Anyone who works with these models knows it well: a few hundred JSON objects are enough to saturate the context window, increase costs, and slow down pipelines.
To address this problem, a new proposal was born: TOON (Token-Oriented Object Notation), a syntax specifically designed to be easier for language models to consume, to reduce token counts, and to increase efficiency.
In this article, we'll look at what it is, how it works, and above all how much it actually saves compared to JSON.
⭐ What is TOON (Token-Oriented Object Notation)
TOON (Token-Oriented Object Notation) is:
- A serialization format designed specifically for LLMs.
- It is designed to be compact (lightweight), human-readable and, above all, optimized to reduce the number of tokens when passing structured data into LLM prompts.
- It is lossless: you can convert JSON data to TOON and then back again without loss.
- Its sweet spot is uniform arrays (i.e., arrays where each object has the same fields), because in that case TOON can compress very efficiently.
- YAML-inspired syntax (indentation for structure) + CSV-like “tabular” style for uniform arrays.
- It has some useful guardrails for LLMs: for example, it declares the length of arrays ([N]) and the field names ({field1,field2,…}), which helps the model understand the structure.
- It also supports “key folding”: if you have a chain of single-key objects, you can collapse the keys into a dotted path to save indentation and tokens (see the sketch below).
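To make the last point concrete, here is an illustrative sketch of key folding based on the description above (refer to the TOON spec for the exact rules; treat this as an example, not a normative definition):

```json
{ "config": { "database": { "host": "localhost", "port": 5432 } } }
```

With the single-key chain folded into a dotted path, the TOON equivalent becomes:

```
config.database:
  host: localhost
  port: 5432
```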
Here's a direct comparison.
JSON
{
  "users": [
    { "id": 1, "name": "Alice", "role": "admin" },
    { "id": 2, "name": "Bob", "role": "user" }
  ]
}
TOON
users[2]{id,name,role}:
  1,Alice,admin
  2,Bob,user
TOON eliminates:
- redundant { } and [ ] brackets
- repeated key strings
- double quotes
- long indentations
- structural symbols that carry no information for an LLM
The result? Far fewer tokens.
Why does TOON save tokens?
When an LLM tokenizes JSON:
- braces and brackets → 1 token each
- quotation marks → 1 token each
- each JSON key → typically 1–3 tokens
- separators like : and , → 1 token each
- spaces and newlines → additional tokens
- field names are repeated for every array element
A JSON array of 1,000 objects with 10 fields produces:
- 10,000 key occurrences, even if the fields are identical
- ~20,000–25,000 tokens for the structure alone (not counting the values)
TOON instead:
- declares the fields only once
- declares the array length only once
- uses a syntax closer to CSV
- greatly reduces punctuation
- drastically reduces string repetition
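If you want to check these savings on your own data, here is a minimal sketch (not the official TOON encoder): it serializes a uniform list of dicts into TOON's tabular form by hand and compares token counts against JSON using the tiktoken package. Value handling is simplified (no quoting or escaping of values that contain commas), so exact numbers will vary with your data and formatting.

```python
import json
import tiktoken  # pip install tiktoken

def to_toon_table(name, rows):
    """Render a uniform array of dicts as a TOON-style tabular block."""
    fields = list(rows[0].keys())
    header = f"{name}[{len(rows)}]{{{','.join(fields)}}}:"
    lines = ["  " + ",".join(str(row[f]) for f in fields) for row in rows]
    return "\n".join([header] + lines)

# Synthetic data roughly matching Scenario 1 below: 100 objects, 5 short fields.
users = [
    {"id": i, "name": f"user{i}", "role": "admin" if i % 10 == 0 else "user",
     "age": 20 + i % 40, "active": i % 2 == 0}
    for i in range(1, 101)
]

enc = tiktoken.get_encoding("cl100k_base")  # a GPT-4-style tokenizer
json_text = json.dumps({"users": users})
toon_text = to_toon_table("users", users)

print("JSON tokens:", len(enc.encode(json_text)))
print("TOON tokens:", len(enc.encode(toon_text)))
```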
Numerical Analysis: How Much Do You Really Save?
Below are some realistic scenarios based on tests performed with GPT-4-style tokenizers.
Scenario 1 — 100 objects, 5 fields each
Average content (short values):
- id: number
- name: 5–10 characters
- role: 5 characters
- age: number
- active: boolean
Estimated Tokens
| Format | Structure tokens | Value tokens | Total |
|---|---|---|---|
| JSON | ~1,450 | ~550 | ~2,000 |
| TOON | ~120 | ~550 | ~670 |
➜ Savings: –66% tokens
Scenario 2 — 1,000 objects, 10 fields
| Format | Total Tokens |
|---|---|
| JSON | ~22,000–25,000 |
| TOON | ~7,000–8,500 |
➜ Savings: –60% / –70%
(over 15,000 tokens saved)
For those who pay for API usage by volume, this can mean:
- around –60% cost on input tokens (see the rough estimate below)
- more context window available to the model
- lower tokenization latency
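As a back-of-the-envelope illustration of the cost impact: the price and request volume below are purely hypothetical placeholders, so substitute your provider's actual rate per million input tokens and your real traffic.

```python
# Hypothetical figures combined with the Scenario 2 token estimates above.
PRICE_PER_M_INPUT_TOKENS = 2.50   # $ per 1M input tokens (illustrative only)
REQUESTS_PER_MONTH = 10_000       # illustrative volume

json_tokens = 23_500   # midpoint of the ~22,000-25,000 JSON estimate
toon_tokens = 7_750    # midpoint of the ~7,000-8,500 TOON estimate

json_cost = REQUESTS_PER_MONTH * json_tokens / 1e6 * PRICE_PER_M_INPUT_TOKENS
toon_cost = REQUESTS_PER_MONTH * toon_tokens / 1e6 * PRICE_PER_M_INPUT_TOKENS
print(f"JSON: ${json_cost:,.2f}/month  TOON: ${toon_cost:,.2f}/month  "
      f"saved: ${json_cost - toon_cost:,.2f}/month")
```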
Scenario 3 — Large objects with long strings
When values weigh more than keys, the savings decrease, but remain significant.
Example: 50 objects, 6 fields, 3 text fields of 100 characters.
| Format | Total Tokens |
|---|---|
| JSON | ~7,200 |
| TOON | ~4,600 |
➜ Savings: –35% / –40%
Why does TOON also help LLM comprehension?
- Structure declarations like users[100]{id,name,role}, which announce the length and fields up front (see the check sketched below)
- Uniform arrays that are easier to interpret
- Less syntactic noise
- A more CSV-like format
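To illustrate why the [N] and {fields} declarations are useful guardrails, here is a minimal sketch that verifies a tabular TOON block is internally consistent. The function name and regex are my own illustrations, not part of any official TOON library, and values containing commas would need proper quoting that this sketch ignores.

```python
import re

# Matches a tabular TOON header such as: users[2]{id,name,role}:
HEADER = re.compile(r"^(?P<key>[\w.]+)\[(?P<count>\d+)\]\{(?P<fields>[^}]*)\}:\s*$")

def check_toon_table(text: str):
    """Check that the declared row count and field count match the actual rows."""
    lines = text.strip().splitlines()
    match = HEADER.match(lines[0])
    if not match:
        raise ValueError("not a tabular TOON block")
    declared = int(match.group("count"))
    fields = match.group("fields").split(",")
    rows = [line.strip().split(",") for line in lines[1:] if line.strip()]
    if len(rows) != declared:
        raise ValueError(f"declared {declared} rows, found {len(rows)}")
    if any(len(row) != len(fields) for row in rows):
        raise ValueError("a row's width does not match the declared field count")
    return declared, fields

check_toon_table("users[2]{id,name,role}:\n  1,Alice,admin\n  2,Bob,user")
```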
⚠️ Limitations of TOON
- If the objects are not uniform, the gain decreases.
- For completely flat tables, a CSV can be more compact.
- Some models (especially smaller open-source ones) expect JSON as input because they are trained more heavily on that format.
- It is not yet a widespread industry standard.
Current Programming Language Support for TOON
Although TOON is a relatively recent proposal, the ecosystem is growing rapidly. The goal is to make this notation as easy to adopt as JSON, YAML, or CSV, and several languages already have libraries or preliminary implementations.
Python
Python is currently the language with the most mature support.
A dedicated library already exists: py-toon, which provides:
- TOON parsing → Python dict
- dict serialization → TOON
- TOON ↔ JSON conversion
- Field schema validation
- Support for large arrays and streaming
The API is very similar to that of the json module, making adoption immediate:
import toon
data = toon.load("data.toon")
json_data = data.to_json()
This currently makes Python the best platform for experimenting with TOON in AI pipelines, data engineering, or preprocessing for LLMs.
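As a hypothetical usage sketch for an LLM pipeline: it assumes py-toon exposes a json-style dumps(), as the article's description of the API suggests, so check the library's actual documentation before relying on it.

```python
import toon  # assumed package: py-toon; the API below is hypothetical

records = [
    {"id": 1, "name": "Alice", "role": "admin"},
    {"id": 2, "name": "Bob", "role": "user"},
]

# Assumed json-like serialization API, per the article's description.
toon_block = toon.dumps({"users": records})

prompt = (
    "The data below is in TOON format: the header declares the array length "
    "and the field names, and each following line is one record.\n\n"
    f"{toon_block}\n\n"
    "List the names of all users whose role is 'admin'."
)
print(prompt)
```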
☕ JavaScript / Node.js
There is a first unofficial implementation, toon-js (work in progress), which provides:
- Basic parsing
- Support for arrays declared with schema
- Serialization to TOON
Still missing:
- Advanced error handling
- Completely lossless JSON → TOON conversion
- Performance optimizations
It's stable for prototypes, but not yet recommended for production.
Rust
The Rust community has started work on rs-toon (alpha):
- Partial parsing
- Focus on efficiency and zero-copy parsing
- Future integration with serde
The goal is to provide performance comparable to the fastest JSON libraries while preserving the format's readability.
C# / .NET
Support is currently experimental:
- TOON parser → .NET objects
- Optional integration with System.Text.Json via custom converters
A full-fledged TOON serializer is still missing.
Go
Some independent prototypes implement:
- parsing of TOON tables
- direct conversion to map[string]interface{}
- incremental parsing of very large arrays (useful in distributed systems)
Support is in progress, but promising.
General evolution of the ecosystem
TOON is attracting interest, especially in the AI world, thanks to the concrete benefits in terms of tokens and costs.
Currently:
- Python is fully usable
- JavaScript and Rust are maturing
- C#, Go, and other languages are in their early stages
In the coming months, the following are likely to appear:
- Official cross-language libraries
- Direct integration into LLM frameworks (LangChain, LlamaIndex, Haystack)
- Extensions for ETL tools and data pipelines
TOON is rapidly moving towards becoming a de facto standard, especially in contexts where token efficiency is critical.
Conclusion: Should you use TOON?
In many cases, yes, and the reason is simple:
TOON typically reduces tokens by 35% to 70% compared to JSON, without losing any information and with a readable syntax optimized for LLMs.
If you work with:
- tabular datasets
- lists of objects
- context-bound pipelines
- token-based API pricing
TOON is almost always a significant improvement.
Further Reading
- Token-Oriented Object Notation (TOON) — https://github.com/toon-format/toon
Follow me #techelopment
Official site: www.techelopment.it
facebook: Techelopment
instagram: @techelopment
X: techelopment
Bluesky: @techelopment
telegram: @techelopment_channel
whatsapp: Techelopment
youtube: @techelopment
