TOON: The new data notation optimized for AI models that will replace JSON

  

In recent years, the AI community has had to face an obvious problem:
LLMs reason well, but they were never designed to process structured data like JSON efficiently.

And those who work with these models know it well: just a few hundred JSON objects are enough to saturate the context window, drive up costs and slow down pipelines.

To solve this problem, a new proposal was born: TOON (Token-Oriented Object Notation), a syntax designed specifically to be more digestible for language models, reducing token counts and increasing efficiency.

In this article we look at what it is, how it works and, above all, how much it actually saves compared to JSON.

🔗 Do you like Techelopment? Check out the site for all the details!

⭐ What is TOON (Token-Oriented Object Notation)

TOON (Token-Oriented Object Notation) is:

  • A serialization format designed specifically for LLMs.
  • Compact, human-readable and, above all, optimized to reduce the number of tokens when feeding structured data into LLM prompts.
  • Lossless: you can convert JSON data to TOON and back again without losing information.
  • Its sweet spot is uniform arrays (i.e., each object has the same fields), because in that case TOON can compress very efficiently.
  • YAML-inspired syntax (indentation for structure) + CSV-like “tabular” style for uniform arrays.
  • Useful guardrails for LLMs: for example, it declares the length of arrays ([N]) and the field names ({field1,field2,…}), which helps the model understand the structure.
  • Support for “key folding”: a chain of objects with a single key each can be collapsed into a dotted path, saving indentation and tokens (see the short example below).
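
For example, here is key folding in action (a short sketch based on the TOON proposal; the field names are illustrative):

JSON

{
   "data": { "items": { "count": 2 } }
}

TOON (with key folding)

data.items.count: 2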

Here's a direct comparison.

JSON

{
   "users": [
      { "id": 1, "name": "Alice", "role": "admin" },
      { "id": 2, "name": "Bob", "role": "user" }
   ]
}

TOON

users[2]{id,name,role}:
  1,Alice,admin
  2,Bob,user

TOON eliminates:

  • redundant { } and [ ] brackets
  • repeated key strings
  • double quotes
  • long indentations
  • structural symbols that carry no information for an LLM

The result? Far fewer tokens.
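
You can check this claim yourself with a tokenizer. Here is a minimal sketch using the tiktoken library (counts vary by tokenizer; cl100k_base is assumed here):

import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

# The same two users, serialized both ways
json_text = '{"users":[{"id":1,"name":"Alice","role":"admin"},{"id":2,"name":"Bob","role":"user"}]}'
toon_text = "users[2]{id,name,role}:\n  1,Alice,admin\n  2,Bob,user"

print("JSON tokens:", len(enc.encode(json_text)))
print("TOON tokens:", len(enc.encode(toon_text)))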


πŸ” Why does TOON save tokens?

LLMs tokenize:

  • parentheses → 1 token each
  • quotation marks → 1 token
  • each JSON key → typically 1–3 tokens
  • characters like :, ,, [], {} → 1 token
  • spaces and newlines → more tokens
  • field names are repeated for each array element

A JSON array of 1,000 objects with 10 fields produces:

  • 10,000 key occurrences, even if the fields are identical
  • ~20,000–25,000 tokens for the structure alone (not counting the values)

TOON instead:

  • declares the fields only once
  • declares the array length only once
  • uses a syntax closer to CSV
  • greatly reduces punctuation
  • drastically reduces string repetition
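
To make these mechanics concrete, here is a minimal sketch of an encoder for the uniform-array case (illustrative only: a real implementation must also handle quoting, escaping, nesting and non-uniform data):

def toon_encode_table(key, rows):
    # Declare field names and array length once, up front
    fields = list(rows[0].keys())
    header = f"{key}[{len(rows)}]{{{','.join(fields)}}}:"
    # Each row becomes a CSV-like line of values, with no repeated keys
    lines = ["  " + ",".join(str(row[f]) for f in fields) for row in rows]
    return "\n".join([header] + lines)

print(toon_encode_table("users", [
    {"id": 1, "name": "Alice", "role": "admin"},
    {"id": 2, "name": "Bob", "role": "user"},
]))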

📊 Numerical Analysis: How Much Do You Really Save?

Below are some realistic scenarios based on tests performed with GPT-4-style tokenizers.

Scenario 1 — 100 objects, 5 fields each

Average content (short names)

  • id: number
  • name: 5–10 characters
  • role: 5 characters
  • age: number
  • active: boolean

Estimated Tokens

Format   Structure tokens   Value tokens   Total
JSON     ~1,450             ~550           ~2,000
TOON     ~120               ~550           ~670
➜ Savings: –66% tokens


Scenario 2 — 1,000 objects, 10 fields

Format   Total tokens
JSON     ~22,000–25,000
TOON     ~7,000–8,500

➜ Savings: –60% / –70%
(over 15,000 tokens saved)

For those who pay for API usage by volume, this can mean:

  • –60% input cost
  • more context available to the model
  • lower tokenization latency
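
A quick back-of-the-envelope example (token counts from Scenario 2; the price per token is purely illustrative):

# ~15,750 tokens saved per request, at an assumed $2.50 per 1M input tokens
json_tokens = 23_500
toon_tokens = 7_750
price_per_token = 2.50 / 1_000_000
saving = (json_tokens - toon_tokens) * price_per_token
print(f"~${saving:.4f} saved per request")  # ≈ $0.0394, which adds up at volume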

Scenario 3 — Large objects with long strings

When the values weigh more than the keys, the savings shrink but remain significant.

Example: 50 objects, 6 fields, 3 text fields of 100 characters.

Format   Total tokens
JSON     ~7,200
TOON     ~4,600

➜ Savings: –35% / –40%


🧠 Why does TOON also help LLMs understand the data?

  • The structure is declared up front:
    users[100]{id,name,role}
  • Uniform arrays are easier to interpret
  • Less syntactic noise
  • A more CSV-like format
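
In practice, embedding TOON in a prompt can look like this (a hypothetical sketch; the instruction wording is your choice):

prompt = """Answer using only the data below.

users[2]{id,name,role}:
  1,Alice,admin
  2,Bob,user

Question: Which users have the admin role?"""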

⚠️ Limitations of TOON

TOON is powerful, but not perfect.

  • If the objects are not uniform, the gain shrinks.
  • For completely flat tables, a CSV can be more compact.
  • Some models (especially smaller open-source ones) expect JSON as input because they are trained more heavily on that format.
  • It is not yet a widespread industry standard.

🔧 Current Programming Language Support for TOON

Although TOON is a relatively recent proposal, the ecosystem is growing rapidly. The goal is to make this notation as easy to adopt as JSON, YAML or CSV, and several languages already have libraries or preliminary implementations.

🐍 Python

Python is currently the language with the most mature support.
A dedicated library already exists:

  • py-toon
    • TOON parsing → Python dict
    • dict serialization → TOON
    • TOON conversion ↔ JSON
    • Field schema validation
    • Support for large arrays and streaming

The API is very similar to that of the json module, making adoption immediate:

import toon

# Illustrative usage; the exact py-toon API may differ
data = toon.load("data.toon")   # parse TOON into Python data
json_data = data.to_json()      # convert it back to JSON

This currently makes Python the best platform for experimenting with TOON in AI pipelines, data engineering, or preprocessing for LLMs.


🟨 JavaScript / Node.js

There is a first unofficial implementation:

  • toon-js (work-in-progress)
    • Basic parsing
    • Support for arrays declared with schema
    • Serialization to TOON

Still missing:

  • Advanced error handling
  • Completely lossless JSON → TOON conversion
  • Performance optimizations

It's stable for prototypes, but not yet recommended for production.


🦀 Rust

The Rust community has started work on:

  • rs-toon (alpha)
    • Partial parsing
    • Focus on efficiency and zero-copy
    • Future integration with serde

The goal is to provide performance comparable to the fastest JSON libraries, while maintaining the format's readability.


🟦 C# / .NET

Support is currently experimental:

  • TOON parser → .NET objects
  • Optional integration with System.Text.Json via custom converters

A full-fledged TOON serializer is still missing.


🐹 Go

Some independent prototypes implement:

  • parsing TOON tables
  • direct conversion to map[string]interface{} maps
  • incremental parsing of very large arrays (useful in distributed systems)

Support is in progress, but promising.


📈 General evolution of the ecosystem

TOON is attracting interest, especially in the AI world, thanks to its concrete benefits in terms of tokens and costs.

Currently:

  • Python is fully usable
  • JavaScript and Rust are maturing
  • C#, Go and other languages are in their early stages

In the coming months, the following are likely to appear:

  • Official cross-language libraries
  • Direct integration into LLM frameworks (LangChain, LlamaIndex, Haystack)
  • Extensions for ETL tools and data pipelines

TOON is rapidly moving towards becoming a de facto standard, especially in contexts where token efficiency is critical.


🏁 Conclusion: Should you use TOON?

In many cases, yes, and the reason is simple:

  • TOON typically reduces tokens by 35% to 70% compared to JSON,
    without losing any information, with a readable syntax optimized for LLMs.

If you work with:

  • tabular datasets
  • lists of objects
  • context-bound pipelines
  • token-priced API models

TOON is almost always a significant improvement.



Follow me #techelopment

Official site: www.techelopment.it
facebook: Techelopment
instagram: @techelopment
X: techelopment
Bluesky: @techelopment
telegram: @techelopment_channel
whatsapp: Techelopment
youtube: @techelopment