GTS — Graph Transport Substrate — Specification
| Field | Value |
|---|---|
| Document version | 0.9-draft |
| Wire-format major version | 1 |
| Date | 2026-06-18 |
| Editor | Patrick Audley, Blackcat Informatics® Inc. |
| This version | bfbe9f34142b116c26e5577963f418e5cfc3d267 |
| DOI | https://doi.org/10.67342/6pta6imnmw/v1 |
Abstract
GTS (Graph Transport Substrate) is an ontology-independent binary container and transport format for RDF 1.2 datasets and content-addressed binary payloads. A GTS file is a CBOR Sequence of one or more append-only segments. Each segment consists of a deterministic CBOR header followed by deterministic CBOR frames linked by BLAKE3 content identifiers. The logical dataset is obtained by a deterministic fold over the segment sequence. GTS supports partial readability, opaque encrypted or unknown-codec frames, append-only suppression, optional signatures and encryption, and cross-language conformance through a shared vector corpus.
Status of this document
| Field | Value |
|---|---|
| Status | Working draft |
| Document version | 0.9-draft |
| Wire-format major version | 1, encoded in the segment header "v" field |
| Date | 2026-06-18 |
| Document DOI | https://doi.org/10.67342/6pta6imnmw/v1 |
| Stability | Wire-format changes remain possible until v1.0 |
| Change control | Blackcat Informatics / GTS governance process |
| Conformance | Defined by this document and the versioned vector corpus (§19) |
| Implementation versions | Package versions are independent release artifacts |
| Corpus version | The corpus is versioned separately from package releases |
This specification is maintained in the gmeow-gts
repository, alongside four interoperable reference engines (Rust,
Python, Go, TypeScript) that gate against the shared vector corpus.
Report errata and propose changes there. Core semantic changes, registry
additions, and optional-standard profile promotion follow the GTS governance process.
GTS is ontology-independent. GMEOW is a primary downstream consumer and distribution use case for GTS, but GTS readers and writers do not require GMEOW vocabulary, tooling, or semantics. Domain-specific profiles, including GMEOW and music-package profiles, are layered above the core format.
Document history
This section records changes to this specification document. Package releases, package version numbers, and per-engine release notes are separate artifacts and are not implied by the document version.
Changes in v0.9-draft (2026-06-18):
- Aligns publication metadata with the current v1.0-rc1 preparation state while keeping package versions independent from the spec document version.
- Clarifies conformance scopes, reader/writer classes, streaming-reader memory bounds, and canonical reader diagnostics.
- Formalises the graph fold, multi-segment value union, blank-node
scoping, RDF 1.2 triple-term and
rdf:reifiesmapping, position constraints, and duplicate/conflict behavior. - Adds streamable-layout rules, optional index/MMR proof preimages, proof verification, unknown extension-key behavior, media type and HTTP serving contracts, and durability/security considerations.
- Expands vector-corpus and manifest references so conformance claims name corpus revisions, subsets, tiers, modes, and release-stamped manifest artifacts.
- Pins the RDF 1.2 substrate to the 07 April 2026 W3C Candidate Recommendation Snapshot and states which RDF semantics GTS imports.
Earlier v0.3 document notes:
- Multi-segment files (
cat-append composition, §3.1); segment-scoped term-ids (§7.2); per-segment fold and value-union semantics (§7.5); cross-segment suppression (§11); profile union and per-section language-tag discipline (§13); composition-tool requirements (§14.1); conformance vectors 15–21 (§19). - Layout states and the streamable claim (§3.3, §5); streamable
compaction with detached frame signatures (§10.1); the
streamvocabulary (§13.3); thecompactverb (§14.1); conformance vectors 24–26 (§19).
Table of contents
- 1. Overview and non-goals
- 2. Terminology and conformance
- 3. File structure
- 4. CBOR conventions
- 5. Header
- 6. Frames
- 7. Graph data model and fold
- 8. Transform catalog
- 9. Integrity and confidentiality
- 10. Compaction
- 11. Suppression (additive “deletion”)
- 12. Binary and content-addressing
- 13. Profiles
- 14. Transforms out
- 15. Worked examples
- 16. Media type and HTTP serving contract
- 17. Versioning and durability guarantees
- 18. Security considerations
- 19. Conformance test vectors
- 20. IANA considerations
- 21. Complete CDDL appendix
- 22. Hash, signature, and extension-key preimages
- 23. References
1. Overview and non-goals
GTS encodes a graph as an append-only log of CBOR frames. The logical graph is the fold (replay) of the log. Growth is an append; “deletion” is suppression, never a physical removal; optimisation is a separate, explicitly lossy compaction that rewrites the log into a snapshot.
Four properties define the format:
- CBOR all the way down (RFC 8949). One ubiquitous, IETF-standardised binary encoding with native byte strings (no base64 tax), deterministic encoding (clean content hashes), and CBOR Sequences — concatenated data items with no enclosing length, so append is cheap. A reader needs only a CBOR library.
- A durable transform catalog. Each frame’s payload
carries a stackable chain of codecs drawn from an open,
long-lived catalog (
identity,base64,base85,gzip,zstd,lzma2,cose-encrypt, …). The catalog separates structure durability (CBOR + this spec, forever) from density and confidentiality (swappable codecs). - Integrity by construction. Every frame carries an independent BLAKE3 self-hash (a content-id) and names its predecessor’s id — a git-style content-addressed chain. Verification is parallel, a damaged frame is independently detectable (and the survivors recoverable given an intact index, §9.1), and the head id transitively commits to all history. Cryptographic signatures and encryption (COSE, RFC 9052) are optional, layered, and algorithm-agile.
- Recursive composition (matryoshka). A payload, after its transforms are reversed, is just bytes — and a GTS file is just bytes. So a payload MAY itself be a complete GTS, wrapped in any transform (compressed or encrypted). A whole signed graph can ride inside an encrypted field, with its own independent header, chain, and signatures (§12.1).
Non-goals. GTS does not define a query language, an index format mandatory for reading, a reasoner, or a mutation protocol. Random-access query, deep traversal, and SPARQL are the job of a transform target, not of GTS.
Informative motivation. GTS keeps the baseline reader surface small: a reader needs CBOR, BLAKE3, the mandatory codecs, and the fold rules rather than an RDF text parser. Tools that need richer query, indexing, or analytics project the folded data to an operating substrate such as N-Quads, SQLite, DuckDB, or Parquet.
2. Terminology and conformance
The key words MUST, MUST NOT, REQUIRED, SHALL, SHOULD, MAY, and OPTIONAL are to be interpreted as described in BCP 14 (RFC 2119, RFC 8174).
- Log — the ordered sequence of frames in a GTS file.
- Frame — one CBOR data item in the log (§6).
- Fold — the deterministic replay of the log into a graph state (§7.5).
- Term — an RDF term (IRI, literal, blank node, or quoted triple) with a stable integer id.
- Reifier — a term that denotes a quoted triple, carrying statement-level metadata (RDF 1.2).
- Capability — what a reader must hold to decode a payload: a codec library or a key.
- Opaque node — the graph representation of a frame the reader could not decode (§7.6).
2.1 Conformance scopes
This specification separates the following conformance scopes:
- Wire-format conformance covers the byte-level CBOR Sequence structure, deterministic CBOR encoding, header and frame grammar, content-id preimages, and segment boundaries.
- Reader conformance covers parsing, chain verification, payload resolution, fold behavior, diagnostics, opaque-node handling, and resource-bound behavior.
- Writer conformance covers deterministic output, valid headers and frames, correct content identifiers, codec declarations, and signature/hash preimages.
- Tool conformance covers command-line or library policy that is stricter than local file validity, such as validating composition, extraction, publication, or archive operations.
- Profile conformance covers profile-specific vocabulary, validation, capability, and trust rules layered above the core format.
- Deployment conformance covers serving and distribution behavior such as media type, caching, range requests, and byte-preservation across HTTP or artifact hosting.
The conformance classes below define reader and writer behavior. Tool, profile, and deployment requirements are scoped explicitly in the sections that define them.
Baseline reader/writer conformance is independent of profile validation, CLI verbs, transform targets, and HTTP deployment behavior. A locally valid GTS file remains locally valid when it declares an unsupported profile; a reader records the profile declaration and folds the bytes according to its reader class, while a profile-aware tool MAY apply additional checks in the profile conformance scope.
Profiles, tools, and deployments MUST NOT change header or frame grammar, segment-boundary detection, content-id or signature/hash preimages, transform-catalog resolution, or the core fold semantics in §7. A stricter profile MAY reject an otherwise valid artifact only as a profile-level validation failure, not by redefining core GTS validity.
2.2 Reader and writer conformance classes
- A Baseline Reader MUST: parse the CBOR sequence;
verify the id/prev chain (§9.1); fold
terms,quads,reifies,annot,blob,suppress,meta, andsnapshotframes; support theidentity,gzip, andzstdcodecs; and surface any frame it cannot decode as an opaque node (§7.6). It MAY ignore signatures and encryption. - A Streaming Reader is a Baseline Reader that
processes frames one at a time and emits to a sink without
materialising the whole graph: it maintains only the term
dictionary (and a running chain check), plus the maximum decoded frame
size and validation sidecar state, giving O(distinct terms + maximum
decoded frame size + validation sidecar state) retained memory rather
than O(triples + blobs) (§7.7). The
gts → duckdb/sqlitetransforms (§14) are Streaming Reader-shaped when implemented through a non-materializing sink. - A Full Reader additionally verifies COSE signatures, decrypts COSE-encrypted frames for which it holds keys, MAY recurse into nested GTS blobs (§12.1), and MAY use the optional index frame (§6.2) for parallel verification and random access.
- A Writer MUST emit deterministic CBOR (§4) for any
bytes that are hashed or signed, and MUST compute each frame’s
"id"self-hash and set"prev"to the previous item’s"id".
2.3 Baseline reader API shape
A Baseline Reader SHOULD expose at least:
open(bytes|path) -> Graph # parse + verify chain + fold
Graph.quads() -> iterator[(s,p,o,g)] # term ids resolved to terms
Graph.term(id) -> Term
Graph.annotations(reifier) -> iterator[(prop, value)]
Graph.blob(digest) -> bytes | OpaqueRef
Graph.opaque() -> iterator[OpaqueNode]
Graph.to_nquads(out) # §14
This API shape is intentionally small: it exposes the folded tables,
diagnostics, and common projection path without requiring an RDF text
parser, prefix resolver, query engine, or reasoner. The cross-language
API and CLI parity contract is maintained in GTS-API-CLI-PARITY.md.
2.4 Reader diagnostics
Readers surface machine-observable diagnostics with these canonical classes (an implementation MAY map them to error returns or structured warnings):
| class | meaning |
|---|---|
EmptyFile |
empty byte stream or no segment header; return an empty result with a fatal diagnostic rather than aborting (§3) |
TornAppendError |
trailing incomplete CBOR item at EOF (§3) |
DamagedFrame |
self-"id" mismatch / invalid frame hash (content
corruption); opaque reason:"damaged" (§7.6) |
BrokenChain |
valid frame hash, but "prev" ≠ the previous item’s
"id" (insertion / reorder / splice) (§9.1) |
TruncatedLog |
a head commitment is present but the observed head differs (§9, §18) |
UnknownCodec |
a transform names a codec the reader lacks; opaque
reason:"unknown-codec" |
MissingKey |
an encrypt codec the reader cannot decrypt; opaque
reason:"missing-key" |
KeyWrapFailed |
a deferred multi-recipient key unwrap failed; opaque
reason:"missing-key" |
ConflictingReifier |
a reifier rebound to a different triple (§7.8) |
PositionConstraint |
a term appears in an illegal subject, predicate, object, or graph-name position; reject/diagnose the offending row (§7.4) |
ForwardReference |
a term-id reference names a term not introduced by an earlier frame in the same segment (§7.2, §7.5) |
SegmentBoundary |
a compatibility reader reaches a later segment header where file-global term ids would misfold; stop with a fatal diagnostic (§3.1, §19) |
RecursionLimit |
nested-GTS depth or decoded-size budget exceeded (§12.1, §18) |
StreamableLayoutError |
a segment claims "layout": "streamable" but its covered
region violates delivery ordering, or its index footer is missing or
contradicts the frames it covers (§3.3) |
IndexMmrError |
an optional index.mmr root is present but does not
match the covered frame ids (§6.2) |
UnknownFrameType |
a frame type is not understood by the reader/profile; preserve chain verification and either ignore it or surface it as opaque until a profile handles it (§7.8) |
3. File structure
A GTS file is a CBOR Sequence (RFC 8742): zero
framing bytes between items, each item a well-formed CBOR data item.
Each segment MAY begin with the CBOR self-describe tag
55799 (0xd9 0xd9 0xf7) as a magic number for
that segment. If present, tag 55799 MUST tag the segment
Header data item; it is not a separate log item, has no
"id", and does not participate in the id/prev chain. A GTS
file MUST NOT wrap the whole sequence in an outer CBOR item.
GTS-file = segment *segment
segment = [self-describe-tag] header *frame
- The first data item of a segment MUST be a Header (§5).
- Every subsequent data item of a segment is a Frame (§6), in log order, until the next Header (which begins a new segment) or end of input.
- Append = concatenate one more frame (extending the last segment), or concatenate a whole further segment (§3.1). No length prefix or count is stored, so a writer never rewrites earlier bytes.
3.1 Multi-segment files (cat-append composition)
A GTS file is one or more segments, each a complete,
self-contained header *frame log. The defining property:
byte-concatenation of valid GTS files is a valid GTS
file —
cat music.gts >> core.gts # core.gts is now a valid two-segment GTS- Boundary detection (normative). A reader that has
consumed at least one frame and encounters a data item that is a map
containing the key
"gts"and lacking the key"t"MUST treat it as the Header of a new segment (the optional self-describe tag55799MAY tag that header; writers SHOULD emit the tag on every segment header to make boundaries human-recognizable). The tag is attached to the segment header, so byte-concatenation of independently valid tagged segments remains byte-concatenation of CBOR Sequence items, not a nested whole-file wrapper. Any other non-frame item remains malformed input (§17). - Independent integrity. Each segment has its own
genesis (its header
"id"), its own id/prev chain, its own signatures, and its own optionalindex(an index covers ONLY its segment). The file’s composite identity is the ordered list of segment head ids. A third-party segment carries its own signer; concatenation rewrites nothing (acatcannot rewrite an earlier segment’s header without breaking its self-hash — by design). - Identity across segments. Term-ids are segment-scoped (§7.2); the ONLY cross-segment identity is the term value (IRI, literal, quoted-triple structure). Blank-node labels are segment-local and MUST NOT be merged across segments (the §12.1 nested-GTS rule, applied at the top level).
- Profiles union. The file’s effective
profile/requirement set is the union of the segment headers’
"prof"values (and any profile requirements carried in segment metadata). A reader lacking the capabilities a segment requires degrades that segment’s frames to opaque nodes (§7.6) — “this data needs the gmeow-music profile” is a header read, not an error. - Relationship to nesting. Nested GTS (§12.1)
composes by containment (a sealed, independently-shippable
subgraph); segments compose by concatenation (open, tool-free
aggregation). Both yield a union fold; choose nesting when the part must
travel or seal independently, segments when plain
catmust work.
3.2 Streaming and progressive enhancement
The append-only log makes streaming a property of the format, not a feature of a tool. Three facts compose, and conformant implementations MUST preserve all three:
- Prefix-fold validity (normative). Every byte prefix
of a valid GTS file that ends on a data-item boundary is itself a valid
GTS file, and a reader MUST fold it to exactly the state it would reach
folding those same items inside the complete file. A live stream in
flight is therefore indistinguishable from a file with a torn
append (§3): the partial trailing item means “not yet arrived”, and a
consumer MAY keep reading as bytes land (
tail -fsemantics) — every intermediate fold is a real, usable graph state, never a half-parsed error state. - Monotone refinement. Appended frames only ever
add knowledge: quads accumulate (§7.8 set semantics), a reifier
binding is first-wins so an established rendering never changes under
it, and suppression is an additive display overlay (§11) — arrival of a
suppressframe refines presentation without invalidating any prior fold. The chain check is likewise incremental: O(1) state (the expected"prev") verifies each frame as it arrives. - Chunk-safe framing. CBOR Sequence items are
self-delimiting, so item boundaries are safe re-chunking points for
relays and proxies, and resumption is content-addressed: a receiver that
states the last frame
"id"it verified can be resumed from the next byte with no negotiation beyond that hash.
Progressive enhancement. Producers SHOULD order
content most-significant-first so an early prefix is maximally useful:
within a segment, terms/quads (the graph)
before bulky blob frames, and small or preview
manifestations before large ones; across a file, segments ARE the
enhancement layers — a base segment (core graph + thumbnails) followed
by enhancement segments (full-resolution blobs, computed projections)
gives a receiver a complete, verifiable package at every segment
boundary, §3.1’s composition rules applied as a delivery schedule.
Checkpoint index frames (§6.2) emitted
periodically give a streaming consumer intermediate truncation anchors
("head"), random-access offsets for ranged re-fetch, and a
manifest of what has arrived; the index remains an accelerator, never a
dependency (§3, §6.2).
The manifest is the graph. GTS needs no
table-of-contents structure, because the frames that describe
content can precede the frames that carry it: a producer SHOULD
emit the quads naming each upcoming manifestation — its content digest,
media type, size, role — before the blob frames whose bytes
they promise. The fold of an early prefix then contains the delivery
schedule as ordinary knowledge: every digest the graph names but the
stream has not yet delivered is a content-addressed IOU, so “stop here”,
“skip ahead” and “range-fetch only the RAW file” are informed
consumer decisions, taken against a verifiable catalog rather than a
guess. (A blob that never arrives in this file is simply an external
blob, §12 — the reference degrades gracefully to “bytes live
elsewhere”.)
Worked delivery schedule — a photograph as a progressive stream; a consumer may stop at any item boundary with a complete, verified package of everything above its stopping point:
header profile, codec catalog
terms/quads the catalog: Work + every manifestation below,
each with digest, mt, size, role (the IOUs)
blob image/webp ~20 KB thumbnail — first paint
blob image/jxl ~8 MB full-resolution render
terms/quads scene description (what is IN the image)
blob image/x-raw ~80 MB RAW sensor dump
meta/quads full camera metadata
terms/quads/annot AI analysis as RDF, statement-level provenance
terms/quads/annot opinions — standpoint-qualified claims
terms/quads processing-pipeline provenance
index footer: offsets, head anchor, MMR (§6.2)
A casual viewer stops after the thumbnail; an archivist takes everything; an editor range-fetches the RAW by digest after reading only the catalog. Same bytes, same chain, three consumers.
A reader streams items until end of input. Trailing partial bytes (a
torn append) MUST be detected and ignored with a diagnostic: a reader
attempts to decode each successive CBOR item, and if the decoder signals
an incomplete item or unexpected EOF at end-of-file, it MUST treat the
trailing bytes as a torn append, ignore that incomplete item, and
surface a machine-observable diagnostic (e.g. a
TornAppendError warning). In particular, if a crash
occurred while writing an index frame (§6.2) the trailing
index is torn: a reader MUST ignore it and fall back to an earlier
intact index or to a plain sequential
scan, so every surviving frame remains recoverable. The
optional index is an accelerator, never a dependency.
Every property above holds for any frame order; what a producer chooses as the order is a separate, named concern: a segment is in one of two layout states — accretive (append-ordered) or streamable (delivery-ordered) — defined next (§3.3).
3.3 Layout states: accretive and streamable
A GTS segment is always valid and always prefix-foldable (§3.2), but it lives in one of two layout states:
- Accretive — append-optimized. Frames land in arrival order (live capture, agent memory accrual, evidence accrual). Writes are cheap forever and the stream is consumable as it lands, but significance is not front-loaded and the catalog may trail the bytes it describes. This is the default state; it is never declared.
- Streamable — delivery-ordered. The catalog
presages the payload: a leading streaming
index (ordinary
terms/quadsframes in thestreamvocabulary, §13.3 — onestream:Manifestationper promised blob, carrying digest, media type, size, role, and intended order) precedes everyblobframe, blobs follow most-significant-first, and a trailing offsetindex(§6.2) closes the covered region as the random-access footer.
Append-friendly and stream-optimal are different layouts of the
same content (precedent: mp4 faststart, zip
central-directory rewrites, LSM compaction). A one-pass writer cannot
produce the second state, so conversion is an explicit rewrite —
streamable compaction (§10.1), exposed as
gts compact --streamable (§14.1).
The claim (normative). A segment declares the
streamable state with the optional header key
"layout": "streamable" (§5). The claim is per-segment (each
segment has its own header, §3.1) and tamper-evident (the header
self-hash covers it). Streamability is a declared-vs-computed
claim in the sense of §14.1 — refuse-don’t-trust:
- The covered region of a claimed segment is the
prefix delimited by the segment’s last intact
indexframe:"count"frames, ending at the frame whose"id"equals the index’s"head". The footer MUST immediately follow the frames it covers ("count"= the index’s own frame position − 1) — otherwise frames could sit between the covered prefix and the footer, counted neither as covered nor as accretive tail. A claimed segment with no intactindexframe, whose last index is not immediately adjacent to its covered prefix, or whose"head"does not equal the id of frame"count", is in violation. - Within the covered region, every inline
blobframe MUST be preceded by aquadsframe that describes its digest viastream:digest(§13.3) — catalog-before-payload. A covered blob delivered before its description is in violation. - A reader encountering a violation MUST surface a
StreamableLayoutErrordiagnostic (§2.3); a verifying tool treats it as an error (§14.1). The claim can never rot against the bytes.
Appends after compaction are legal and foldable.
Frames after the last index are simply unpresaged:
they are the segment’s accretive tail, carry no
ordering obligation, and trigger no diagnostic. The segment is then
“streamable through frame N, accretive after” — tooling SHOULD
report the boundary (§14.1). Re-compact to re-streamline. Likewise, a
segment appended by cat makes no claim unless its own
header claims.
In-flight prefixes. A prefix of a streamable segment
cut before the trailing index has, by construction, a claim
and no footer yet; a streaming consumer MUST NOT treat the missing
footer as a lie while input may still be arriving — the missing-footer
violation applies to a complete file. The
catalog-before-payload rule, by contrast, is prefix-stable: a violation
observed in any prefix is a violation of the whole file.
4. CBOR conventions
- Maps use short text-string keys
(e.g.
"t","d") for self-description and eyeball debuggability; compactness is the transform layer’s job, not the schema’s. - Any bytes that are hashed or signed MUST use Deterministic Encoding (RFC 8949 §4.2): shortest-form integers, definite-length items, and map keys sorted bytewise on their encoded form — explicitly the RFC 8949 rule, NOT RFC 7049’s length-first canonical ordering. (For the short text keys GTS itself uses the two coincide, because a CBOR text string’s initial byte embeds its length; the rules diverge on mixed-type keys, so implementations MUST NOT rely on a CBOR library’s legacy “canonical” mode without checking which ordering it implements.)
- Unsigned integers are used for all ids. BLAKE3 digests are 32-byte (256-bit) byte strings.
- Short grammar fragments are given in CDDL (RFC 8610). The complete copyable CDDL appendix is §21, and the canonical preimage rules are §22.
term-id = uint ; append-order, frozen (§7.2)
digest = bstr .size 32 ; BLAKE3-256
content-id = digest ; a frame's self-hash (§9.1)
digest-ref = digest / tstr ; raw digest or "blake3:<hex>" text (§21.2)
codec-id = uint ; index into the header codec catalog (§8)
5. Header
The Header is the first data item and the chain genesis; it is not a
frame (it has no "prev").
header = {
"gts" : "GTS1", ; magic / format id
"v" : uint, ; spec major version (1)
"prof" : tstr, ; profile (§13); "generic" if unspecified
"cat" : { * codec-id => codec }, ; the transform catalog (§8)
? "layout": tstr, ; layout-state claim (§3.3); absent = accretive
? "dct": { * tstr => bstr }, ; named, UNCOMPRESSED dictionaries for dict-codecs
? "meta": any, ; free-form, non-normative metadata
"id" : content-id, ; self-hash of the header content (the chain genesis)
}
codec = {
"name" : tstr, ; "identity" | "gzip" | "zstd" | "lzma2" | "cose-encrypt" | ...
"cls" : "encode" / "compress" / "encrypt",
? "dct": tstr, ; references header "dct" key (dict codecs)
? "p" : any, ; codec parameters (e.g. lzma2 level)
}
The catalog is closed within a file (a frame may
only reference codec-ids the header declares) but open across
the ecosystem (new codecs may be registered by name). The
Header carries its own "id" (self-hash of its content) and
no "prev" — it is the genesis, and the first frame’s
"prev" is the Header’s "id". The Header
"id" MUST equal the BLAKE3-256 of the deterministic CBOR of
the Header map excluding the "id" key; all
other keys (including "meta" and unknown extension keys)
participate. The preimage table in §22 is the single source of truth for
hashed and signed bytes. The optional "layout" key claims a
layout state (§3.3): the only value defined by this revision is
"streamable", which a verifying reader MUST check against
the segment’s actual layout; readers MUST ignore unknown
"layout" values (forward compatibility — an unknown state
imposes no check). Dictionaries are stored uncompressed and
in-band — there is no external-dictionary dependency. A codec’s
"dct" value MUST match a key in the header
"dct" map, and the codec MUST use the corresponding byte
string as its compression/encoding dictionary.
6. Frames
All frames share one envelope:
frame = {
"t" : frame-type, ; discriminator
? "x" : [+ codec-id], ; transform chain, applied in order on encode; default [identity]
? "pub": any, ; CLEARTEXT public envelope (always readable; §9.4)
? "to": [+ recipient], ; recipients, for encrypt-class chains
? "d" : bstr / any, ; payload: bstr when "x" transforms it; structured CBOR otherwise
"prev": content-id, ; the PREVIOUS data item's "id" (chain link; §9.1)
"id" : content-id, ; BLAKE3-256 self-hash of this frame's CONTENT (all keys but "id"/"sig")
? "sig": bstr, ; COSE_Sign1 over "id" (§9.2)
}
frame-type = "terms" / "quads" / "reifies" / "annot" / "blob" / "suppress"
/ "snapshot" / "meta" / "index" / "opaque"
recipient = { "kid": tstr, ? "alg": tstr, * tstr => any } ; key identifier; never the key
Each frame’s "id" MUST equal the BLAKE3-256 of the
deterministic CBOR of its content (every key except "id"
and "sig"; unknown extension keys participate). Each
frame’s "prev" MUST equal the previous data item’s
"id"; the first frame’s
"prev" is the Header’s "id". Because
"prev" is inside the hashed content, each "id"
transitively commits to all prior frames (§9.1). §22 centralizes the
complete preimage and subject rules.
6.1 Payload resolution
To obtain a frame’s logical payload:
- If
"x"is absent, the payload is"d"directly (structured CBOR) — equivalent to a singleidentitytransform; a chain resolving to onlyidentitylikewise leaves"d"unchanged. - If
"x"is present,"d"MUST be a byte string and every codec-id MUST resolve through the header"cat"; apply the reverse of each codec, last to first. Each step requires a capability (§8.3). On any missing capability (unknown codec or missing key), stop and treat the frame as opaque (§7.6). - The fully-decoded bytes are a CBOR item; decode them to the type-specific structure (§7).
6.2 Index frame (optional)
A writer MAY append an index frame — a footer that
accelerates large files without raising the simple-reader floor (a
Baseline Reader ignores it). Because the log is append-only, a fresh
index MAY be appended after more frames; the
last index wins.
index-payload = {
"count" : uint, ; frames covered
"head" : content-id, ; "id" of the last covered frame (truncation anchor)
? "off" : [+ uint], ; byte offset of each frame (random access; parallel verify)
? "ti" : { * frame-type => [+ uint] },; frame indices by type
? "dict" : [+ uint], ; indices of "terms" frames (dictionary locator; §7.7)
? "mmr" : content-id, ; Merkle-Mountain-Range root over frame ids (§9.1)
}
Given "off", a Full Reader dispatches frame-hash
verification across threads and seeks to any frame; given
"dict", a Streaming Reader loads only the dictionary
(§7.7); given "head"/ "mmr", it detects
truncation and produces O(log n) inclusion proofs. A
checkpoint index is simply an index
emitted periodically rather than only as a footer; an earlier
index MAY still serve as a recovery anchor even though the
last intact index is preferred for acceleration. Current
package support and deferrals for off/ti,
dict, mmr, proof verbs, range fetch, and
replication workflows are tracked in GTS-ADVANCED-PRIMITIVES.md.
The mmr root uses a Merkle-Mountain-Range over the
ordered frame ids covered by the index. The index frame itself is not
covered by its own mmr; a later index may cover an earlier
index frame like any other earlier frame. Peaks are built left-to-right
by appending one leaf per frame id and repeatedly merging the two
rightmost peaks while their heights match. The root for
count = 0 is the root preimage with an empty peak list.
All MMR preimages use deterministic CBOR (§4) and BLAKE3-256:
leaf(index, frame_id) =
BLAKE3-256(deterministic-CBOR(["gts-mmr-leaf-v1", index, frame_id]))
parent(parent_height, left_hash, right_hash) =
BLAKE3-256(deterministic-CBOR(["gts-mmr-parent-v1",
parent_height, left_hash, right_hash]))
root(count, peaks) =
BLAKE3-256(deterministic-CBOR(["gts-mmr-root-v1",
count,
[[peak_height, peak_hash], ...]]))
A detached proof JSON object has this stable shape:
{
"schema": "gts-mmr-proof-v1",
"hash": "blake3-256",
"preimage": "gts-mmr-v1",
"count": 4,
"leaf_index": 2,
"frame_id": "<32-byte frame id hex>",
"root": "<32-byte mmr root hex>",
"peak_index": 0,
"peaks": [{"height": 2, "hash": "<32-byte peak hex>"}],
"path": [
{"side": "right", "parent_height": 1, "hash": "<32-byte sibling hex>"}
]
}verify-proof MUST reject a proof unless the peak heights
match count, leaf_index belongs to
peak_index, every hash field is 32 bytes, the path
reconstructs the selected peak, and the peaks reconstruct
root. Proof verification does not require the original
.gts file.
7. Graph data model and fold
A fold is the deterministic projection of an ordered frame log into a value-indexed graph state. Term ids are local compression artifacts used while reading a segment; the exposed file graph is defined by the value-union rules below.
Folded state model (normative). A reader folds each segment into this logical state:
terms: an ordered vector of segment-local term values.quads: a set of asserted RDF quads over term values.reifiers: a partial map from a reifier term value to exactly one quoted triple value.annotations: an ordered multiset of(reifier, predicate, value)rows over term values.blobs: a content-addressed map from BLAKE3 digest to inline bytes or an external content reference.blob_meta: a shallow metadata map per blob digest, built from blob"pub"maps.meta: a shallow segment metadata map, plus a file-level shallow merge (§7.5).suppressions: an ordered list of display/precedence directives (§11).opaque: an ordered list of frames intentionally or necessarily carried without decoded payload semantics (§7.6).signaturesanddiagnostics: ordered reader observations about frames. They are part of the folded GTS state but not part of the RDF dataset.segment_heads,segment_profiles,segment_meta, and segment layout state: the ordered segment ledger needed to preserve cat-append identity, profile requirements, per-segment metadata, and streamable-layout claims.
File fold (normative). A file fold is the ordered value-union of its segment folds: segments are processed in file order; every term id is first resolved within its own segment; then quads, reifier bindings, annotations, blob declarations, metadata, suppressions, opaque nodes, signatures, diagnostics, and segment ledgers are merged by the rules in §7.5 and §7.8. The union MUST NOT compare raw term ids across segments. Cross-segment identity is always value identity.
RDF substrate (normative). GTS imports the RDF 1.2
Concepts and Abstract Data Model Candidate Recommendation Snapshot dated
07 April 2026 for IRIs, blank nodes, literals, RDF datasets, triple
terms, version label "1.2", and rdf:reifies
(§23.1). GTS freezes that substrate for major version 1 unless a later
GTS major version updates this reference. A Baseline Reader need not
implement an RDF parser, query language, entailment regime,
canonicalization algorithm, or RDF 1.2 concrete syntax; it only needs
the term, quad, reifier, annotation, and dataset mapping defined here.
RDF Semantics entailment regimes are not part of the core GTS fold
unless a profile or projection explicitly applies them above the
transport layer.
Value equality (normative). The fold compares values as follows:
| value kind | equality rule |
|---|---|
| IRI | Exact Unicode string equality after CBOR decoding. No percent, case, Unicode, base-IRI, or prefix normalization is applied by core GTS. |
| Literal | Same lexical string, same datatype IRI value after defaulting
(§7.1), and same language-tag string when present. Datatype lexical
canonicalization is not applied; "01"^^xsd:int and
"1"^^xsd:int are distinct transport values. |
| Language tag | Exact string equality in core GTS. Profiles and projections MAY apply language-range matching, preferred BCP 47 casing, or public/private tag translation (§13.1), but that is not term identity. |
| Datatype | Equality of the datatype IRI value, not the local term id that names it. |
| Blank node | Equality is scoped to the blank-node scope plus the non-empty label.
Blank nodes from different segments or nested GTS files are never equal.
A blank node with absent or empty "v" is anonymous: each
term entry is a distinct blank node within its scope. |
| Quoted triple term | Equality of the resolved subject, predicate, and object term values of the quoted triple. Quotation alone does not assert the triple (§7.3). |
| Graph name | Equality of the graph-name term value. An absent graph slot is the default graph and is not equal to any named graph. |
| Blob | Equality of the normalized BLAKE3 digest
(blake3:<hex> or raw digest bytes normalized to that
form); inline byte equality is proved by the digest. |
| Opaque node | Equality of an opaque occurrence is its segment identity plus frame content id. Exact duplicate presentations MAY be collapsed by a display layer, but the fold preserves occurrence order. |
| Metadata | Map keys compare by exact string; values compare by deterministic CBOR data-model equality. The file-level view is a shallow later-wins merge, while per-segment originals remain addressable (§7.5). |
| Suppression | A suppression target first resolves in its own segment and then applies value-wise to the file union (§11). Repeated directives have idempotent display effect but are retained as ordered fold state. |
7.1 Terms (terms frame)
Payload: an ordered array of terms. Ids are assigned
by append order within the current segment dictionary (or within a
snapshot dictionary before it is shifted into the enclosing
segment).
terms-payload = [+ term]
term = {
"k" : 0 / 1 / 2 / 3, ; 0=IRI 1=literal 2=bnode 3=quoted-triple
? "v" : tstr, ; IRI string | literal lexical form | bnode label
? "dt": term-id, ; literal datatype IRI (a term)
? "l" : tstr, ; literal language tag (BCP 47)
? "rf": term-id, ; quoted-triple: the reifier (§7.3) whose triple this term denotes
}
Literal datatype defaulting (normative). For a
k:1 (literal) term: if "l" (language tag) is
present and "dt" is absent, the datatype is
rdf:langString; if both "l" and
"dt" are absent, the datatype is
xsd:string.
Blank-node labels (normative). A k:2
(blank node) term’s non-empty "v" label is local to the
current blank-node scope: the segment for ordinary frames, the snapshot
dictionary for a snapshot, or the nested GTS file for
recursive composition (§12.1). It MUST NOT be treated as a globally
stable identifier and MUST NOT be merged with the same label in another
segment or nested GTS. If "v" is absent or the empty
string, the term is anonymous and denotes a fresh blank node for that
term entry within the scope. Transforms MAY relabel blank nodes while
preserving blank-node isomorphism and scope separation.
7.2 Term-id assignment (normative)
Term ids are unsigned integers assigned in append order, per
segment, starting at 0 at each segment’s header,
and are frozen within their segment: a term minted
while folding frame N keeps its id for the rest of that
segment. A quads, annot, or
reifies frame at position N MUST only reference
term-ids introduced at positions 0..N-1 of the same
segment (such frames introduce no terms of their own). This
makes writing pure-append, reading single-pass, and concatenation sound:
term-ids are compression artifacts, never identity —
cross-segment identity is the term value only (§3.1), exactly
as a snapshot’s dictionary already restarts at
0 (§10). An implementation that applied file-global ids to
a multi-segment file would misfold silently; the boundary rule (§3.1)
and vector 17 (§19) exist to make that failure loud instead.
7.3 Quoted triples and reifiers (reifies frame)
RDF 1.2 lets a triple be the subject or object of another. GTS keeps
quoted triples in the id domain: a reifier is an
ordinary IRI/bnode term; a reifies frame binds it to the
triple it quotes.
reifies-payload = { * term-id => [term-id, term-id, term-id] } ; reifier => (s, p, o)
A quoted triple used as a node is a term with "k": 3 and
"rf" pointing at its reifier.
RDF dataset mapping (normative). A folded GTS graph
maps to an RDF 1.2 dataset as follows: each quads row
(S,P,O,G?) asserts the RDF triple (S,P,O) in
the default graph when G is absent, or in the named graph
G when G is present. A reifies
binding R => (S,P,O) asserts the triple
R rdf:reifies <<( S P O )>> in the default
graph. A k:3 term denotes that triple term, reached through
its reifier R. Each annot row
(R, P', V') asserts the triple R P' V' in the
default graph. Profiles MAY define additional graph-placement
conventions for projection, but the core mapping above is the
interoperable baseline.
Quotation does not imply assertion (normative).
Referencing a triple term, either via a reifier or a k:3
term, does NOT assert the base triple (S P O). The base
triple is asserted iff it also appears in a quads
frame.
RDF 1.1 degradation (informative). RDF 1.1 has no
quoted triple term. A lossy RDF 1.1 projection MAY replace a quoted
triple term with its reifier resource and emit ordinary
reification-style triples such as R rdf:subject S,
R rdf:predicate P, and R rdf:object O, or
carry R rdf:reifies as an extension predicate understood by
the consumer. Such a projection MUST NOT assert (S P O)
merely because the GTS file quoted it, and tooling SHOULD label the
projection as lossy whenever a triple term was present.
7.4 Quads and annotations
quads-payload = [+ [term-id, term-id, term-id, ? term-id]] ; s, p, o, (g; default graph if absent)
annot-payload = [+ [term-id, term-id, term-id]] ; reifier, predicate, value
Statement-level metadata (confidence, validity interval,
standpoint/vantage, modality, …) is expressed as annot rows
on a reifier. Contested claims coexist: several
annot rows on one reifier, or several reifiers over one
(s,p,o), are all retained — none is privileged. Annotations are an
ordered multiset in the folded GTS state: readers MUST preserve row
order within each segment and concatenate segment annotation rows in
file order. Exact duplicate annotation rows are retained in the GTS
fold; an RDF dataset projection MAY collapse identical emitted RDF
triples because RDF datasets are set-valued.
Position constraints (normative). In a
quads row the predicate p MUST be an IRI
(k:0); the subject s MUST be an IRI, blank
node, or quoted triple (k:0|2|3); the object o
MAY be any term; and the graph name g, when present, MUST
be an IRI or blank node (k:0|2) — never a literal or quoted
triple. A reifies triple (S,P,O) obeys the
same subject/predicate/object constraints. In an annot row
the predicate MUST be an IRI.
7.5 Fold algorithm (normative)
result := empty file state
(terms, quads, reifiers, annotations, blobs, blob_meta, meta,
suppressions, opaque, signatures, diagnostics, segment ledger)
for segment in file order: # §3.1; single-segment files: one iteration
verify each frame's id (self-hash) and prev-link within the segment;
record sig status if "sig" present
terms := [] graph := {} reif := {} annot := []
blobs := {} blob_meta := {} meta := {} suppressed := [] opaque := []
diagnostics := []
for frame in segment log order:
P := resolve payload (§6.1); if undecodable -> add opaque node (§7.6); continue
switch frame.t:
"terms" : append each term (assign next id); each "dt"/"rf" MUST name an
already-introduced term-id (no forward references)
"quads" : add each (s,p,o,g) value tuple to graph
"reifies" : bind reifier to (s,p,o), keeping the first non-conflicting binding (§7.8)
"annot" : append (reifier, predicate, value)
"blob" : if "d" present -> blobs[BLAKE3(decoded "d")] := bytes (inline);
else -> register external blob by "pub".digest;
shallow-merge "pub" into blob_meta[digest]
"suppress" : append directive to `suppressed` (display contract; §11)
"snapshot" : load a self-contained fold wholesale (§10)
"meta" : shallow-merge map into segment meta (later keys overwrite earlier)
"opaque" : add explicit opaque node
union segment fold into result BY TERM VALUE # ids resolve locally, never cross segments;
# bnodes keep their scope (§3.1, §12.1)
result
The fold is deterministic: the same intact log yields the same value
state in every conformant reader. Within a segment, meta
accumulates as a shallow union over one map — a later frame’s keys
replace earlier ones; values are not concatenated. Across
segments, each segment’s folded meta is exposed
per segment (keyed by segment head id) AND shallow-merged in file order
into a file-level view — a later segment’s keys win, but the per-segment
originals remain addressable (a third-party segment’s metadata is never
silently absorbed).
7.6 Opaque nodes
When a frame’s payload cannot be decoded — an unknown codec, or a
cose-encrypt codec for which the reader holds no key — the
reader MUST NOT drop it. It MUST add an opaque node to
the graph carrying everything still in cleartext:
opaque-node = {
"id" : content-id, ; the frame's self-hash
"type" : frame-type, ; declared "t"
? "pub" : any, ; the cleartext public envelope, if any
? "to" : [+ recipient], ; declared recipients
"sigstat" : "none" / "valid" / "invalid" / "unverified",
"reason" : "unknown-codec" / "missing-key" / "damaged",
}
Most opaque nodes are produced by a reader at decode time; a writer
MAY also emit an explicit opaque frame (e.g. a redaction
placeholder) whose payload is the structure above, in which case
"sigstat" is omitted (a reader determines it). A
damaged frame (failed self-hash, or absent) is isolated and
folded as an opaque node too (§9.1): a reader MAY surface its cleartext
fields as untrusted diagnostic metadata, but MUST set
"sigstat" to invalid/unverified
and "reason": "damaged" — the bytes are not trustworthy.
The frame still participates in the id/prev chain, so it cannot be
silently stripped.
7.7 Streaming fold and bounded memory
A graph need not be materialised to be transformed. A Streaming Reader (§2.1) processes frames in order and emits to a sink, holding only the term dictionary, the current decoded frame or blob, and the running id/prev and validation state:
gts → duckdb/sqlite(§14) keep the integer-id model: streamtermsdeltas into atermstable andquads/reifies/annotdeltas into id-valued tables, bulk-inserting as frames arrive. No term resolution and no graph materialisation occur — memory is bounded by the dictionary, the largest decoded frame, and validation sidecar state rather than by triples or blobs. The relational join that resolves ids is the engine’s job, later.gts → ttl/nqmust resolve ids to emit text. If the dictionary exceeds memory, the reader uses the index"dict"locator (§6.2) to load (or memory-map, or spill to an on-disk kv) only thetermsframes first, then streams the quads.
Even O(distinct terms + maximum decoded frame size + validation
sidecar state) can exceed memory for pathologically irregular graphs
(e.g. a crawl dumping millions of unique UUID IRIs, or a single very
large inline blob). A Streaming Reader therefore MAY flush its
in-memory dictionary to a temporary on-disk key-value store
when a memory limit is reached, trading RAM for a local spill file;
correctness is unaffected because term-ids are append-order and frozen
(§7.2). The gts → duckdb/sqlite transforms get
this for free — the target table is the spill.
The package-level streaming-sink claim boundary and memory benchmark
helper are maintained in GTS-ADVANCED-PRIMITIVES.md.
A multi-gigabyte log thus transforms to an operating substrate in bounded memory — the resolve-and-materialise OOM failure mode is avoided by construction.
7.8 Duplicates and conflicts (normative)
All duplicate and conflict behavior is defined here so frame handlers do not invent local policy:
| item | fold behavior |
|---|---|
| Duplicate terms | A writer SHOULD intern repeated terms, but each term entry still
receives its own segment-local id. Non-blank values that compare equal
by §7 are the same value in the file union. Anonymous blank nodes
("v" absent or empty) are fresh per term entry. |
| Duplicate quads | The folded graph is a set: identical (s,p,o,g) value
rows collapse to one without diagnostic. |
| Reifier bindings | A reifier SHOULD have exactly one reifies binding.
Repeated identical bindings are harmless. A conflicting binding is a
data-quality error: the reader surfaces ConflictingReifier,
keeps the first binding in file order, and ignores the conflicting
binding for the reifier map. |
| Annotations | Annotation rows are an ordered multiset (§7.4). Multiple rows on one reifier coexist; exact duplicate rows are retained in the GTS fold. RDF dataset projections may collapse identical emitted triples. |
| Blob bytes | Blobs are addressed by digest. Repeating the same digest/bytes is idempotent; a content-addressed view stores one byte value per digest. Validating extraction re-hashes inline bytes against the requested digest (§14.1). |
| Blob metadata | blob_meta[digest] is a shallow map built in file order.
Later metadata keys for the same digest replace earlier keys in the
file-level view; earlier declarations remain in the original
frames. |
| Suppressions | Suppression directives are additive. Repeating an equivalent directive has idempotent display effect but remains present in the ordered suppression list. There is no unsuppress frame (§11). |
| Metadata keys | Segment meta is shallow later-wins; file
meta is the shallow later-wins merge of segment metadata in
file order. Per-segment metadata remains addressable (§7.5). |
| Malformed frames | A frame whose payload cannot be decoded or whose handler cannot safely fold it becomes an opaque node with a diagnostic when recovery is possible (§7.6, §9.1). Surviving later frames are still foldable when item boundaries are known. |
| Unknown structural frame types | A Baseline Reader does not assign graph semantics to an unknown frame type. It MUST preserve chain verification and MAY surface an opaque node or diagnostic; a profile-aware Full Reader MAY interpret the frame. |
| Profile conflicts | Profile declarations and profile requirements union across segments (§3.1, §13). Unsupported profile requirements degrade the affected segment’s unsupported payloads to opaque nodes or profile diagnostics; they do not invalidate core wire-format folding by themselves. |
8. Transform catalog
8.1 Classes
Every catalog entry declares a class:
| class | examples | capability needed to reverse |
|---|---|---|
encode |
identity, base64, base85 |
none (pure function) |
compress |
gzip, zstd, lzma2 |
a codec library |
encrypt |
cose-encrypt0, cose-encrypt |
a key (per recipient) |
8.2 Stacking
"x" is applied in array order on encode and reversed on
decode. Example: [zstd, cose-encrypt] means compress,
then encrypt; a reader decrypts (if keyed) then decompresses.
8.3 Capability model and graceful degradation
Decoding a chain requires every capability it names.
A missing capability is uniform whether it is a library
(unknown-codec) or a key (missing-key): the
frame becomes an opaque node (§7.6). This single mechanism yields
in-file content negotiation — a logical object MAY
appear as several frames in different codecs/formats (e.g. a
high-fidelity representation a reader can’t decode and a
widely-supported fallback it can), and the reader uses the best frame
for which it holds the capabilities.
8.4 Mandatory core set and durability
A Baseline Reader MUST implement identity,
gzip, and zstd — so a conformant reader’s full
dependency set is CBOR + BLAKE3 + gzip + zstd. Writers
targeting maximum longevity SHOULD restrict to the core set.
Density-oriented writers MAY use lzma2 with an in-band
dictionary. All core codecs are stable, widely deployed primitives.
Rsyncable codecs. A compress-class
codec MAY be rsyncable: it periodically synchronizes (resets)
its compression state so that a local change in the uncompressed input
only affects a bounded neighborhood of the compressed output. This
improves delta-transfer tools (e.g. rsync) and
version-control delta compression (e.g. Git packfiles) at the cost of a
small compression-ratio overhead. The only rsyncable codec defined in
this revision is zstd-rsyncable (§8.5).
8.5 Canonical codec registry (v1)
Catalog entries are referenced by integer id within a file (§5), but
each entry’s "name" MUST be a canonical identifier from
this registry so writers interoperate:
| name | cls | baseline? | parameters |
|---|---|---|---|
identity |
encode |
yes | none |
gzip |
compress |
yes | level? |
zstd |
compress |
yes | level?, window?, dct? |
zstd-rsyncable |
compress |
no | block_size: uint (default 65536) |
lzma2 |
compress |
no | level?, dct? |
base64url |
encode |
no | none (unpadded) |
base85 |
encode |
no | none |
cose-encrypt0 |
encrypt |
no | COSE_Encrypt0 (1 recipient) |
cose-encrypt |
encrypt |
no | COSE_Encrypt (n recipients) |
A reader MUST match codecs by canonical "name", not by
catalog id (ids are file-local). Later spec versions register new codecs
by canonical name; an unknown name degrades to an opaque node
(§8.3).
9. Integrity and confidentiality
GTS keeps four integrity concerns distinct:
- Frame integrity — the per-frame BLAKE3 self-hash
"id"(§9.1). - History integrity — the
"prev"content-id chain (§9.1). - Origin / authorship — optional COSE signatures (§9.2).
- Freshness / non-truncation — a head commitment: a
signature over the head
"id", or an index"mmr"/"head"root (§9.1, §13).
The first two are mandatory and key-free; the last two are optional and profile-driven.
9.1 Per-frame self-hash and content-id chain (mandatory)
Each frame’s "id" is the BLAKE3-256 of its own content
(every key except "id" and "sig"), so a frame
is content-addressed and independently verifiable. Each
frame’s "prev" names the previous frame’s
"id"; because "prev" is part of the hashed
content, the chain is a git-style content-addressed list in which the
head id transitively commits to all history.
- Parallel verification. Every
"id"is a hash of a self-contained byte range; with the index"off"table (§6.2) all frame hashes are recomputed concurrently, followed by a trivial O(n)"prev"-equality pass. No accumulating dependency forces single-threaded reading. (The only inherently sequential step is discovering frame boundaries in a bare CBOR sequence — a cheap length-scan the index removes.) - Damage isolation and recovery. A corrupt frame
fails its own
"id", so damage is independently detectable. Recovery of subsequent frames, however, is guaranteed only when their byte offsets are known — from an intactindex"off"table, a checkpoint frame, external framing, or the storage layer. In a bare CBOR Sequence (no per-frame length) arbitrary byte corruption can desynchronise the decoder: a reader with offsets skips the bad frame and folds the survivors (reason: "damaged"), while a reader without offsets MAY be unable to resynchronise past the damage.evidencewriters SHOULD emit periodic checkpoint indexes (§13) so recovery is robust. - Tamper-evidence. Any insertion, reordering, or
mutation breaks a
"prev"link or a self- hash. Truncation (dropping trailing frames) is detected only against a head commitment — a signature over the head"id", the index"head"/"mmr"root (§6.2), or an out-of-band anchor. Opaque frames are part of the chain, so confidential frames cannot be stripped undetectably.
A Merkle-Mountain-Range (MMR) root over the frame ids (optional, carried in the index) is a single whole-file commitment that is itself parallel to compute and supports O(log n) inclusion proofs — proving a frame is in the log without shipping the log.
9.2 Signatures (optional, algorithm-agile)
A frame MAY carry "sig", a COSE_Sign1 (RFC
9052) over the frame’s "id". Because "id" is
the self-hash of the whole content — "pub",
"d" (the ciphertext, if encrypted), and "prev"
(the chain position) — one signature over "id"
binds the public claims to the sealed payload and to
the chain position, and signing the head "id" thereby
anchors all prior history (§9.1). The signing algorithm is declared in
the COSE header (e.g. EdDSA/Ed25519, ES256);
readers MUST honour the declared algorithm. The evidence
and opaque profiles (§13) REQUIRE signatures. Key discovery
and trust anchoring (which keys are authentic, which signers are
authorised) are profile/deployment policy, not core
GTS: sigstat: "valid" means a signature is
cryptographically valid under a resolved key, not that the key
is trusted.
9.3 Encryption (optional)
An encrypt-class codec wraps the payload as
COSE_Encrypt/COSE_Encrypt0. Recipients are
listed in cleartext "to" by key identifier
only — never the key material. Multiple recipients MAY share
one sealed payload (each unwraps the content-encryption key with its own
key). Key escrow, rotation, and revocation are the
issuer’s responsibility and are out of scope; a payload
encrypted to a retired key MAY become permanently opaque.
This draft’s v1 conformance surface implements and tests
COSE_Encrypt0 for one direct recipient. Multi-recipient
COSE_Encrypt envelopes and ECDH key-wrap are deferred
outside v1 until dedicated vectors, key-management policy, and
cross-engine interop tests exist; see GTS-SECURITY-POLICY.md.
The deferred cose-encrypt contract is pinned as a future
Full Reader capability, not as a v1 implementation claim. A future
implementation that claims it MUST consume a CBOR-tagged
COSE_Encrypt envelope whose content encryption uses
A256GCM and whose recipient array contains one entry per
declared recipient in the frame’s cleartext "to" list. The
initially defined key-management mode is ECDH-ES+A256KW:
each recipient entry carries the information needed to derive a
key-encryption key and unwrap the same 256-bit content-encryption key
with A256KW. A conforming future reader tries only
recipients for which it holds the corresponding private key
material.
Deferred failure modes are:
- no held key matches any recipient
kid: emitMissingKeyand preserve the frame as an opaque node withreason:"missing-key"; - ECDH recipient metadata is malformed, the derived key-encryption key
cannot unwrap the content key, or AES-KW authentication fails: emit
KeyWrapFailedand preserve the frame as opaque withreason:"missing-key"; - content-authentication failure after a successful unwrap is a damaged encrypted payload and MUST NOT expose plaintext.
The descriptor vectors in vectors/crypto-deferred/*.json
pin the two-recipient shape and the wrong-key, missing-key, and
failed-unwrap opacity cases until byte-level COSE_Encrypt
vectors replace the placeholders.
9.4 The opacity invariant (normative)
Opacity hides content — never existence, provenance, or position.
For every frame,
{"id", "prev", "t", "x", "to", "pub", "sig"} MUST remain in
cleartext (the transform chain "x" is cleartext so a reader
knows which codecs to reverse). A reader without the relevant key
therefore still learns that the frame exists, what
kind it is, who it is sealed for, who signed it,
and where it sits in the chain. This is what makes selective
disclosure safe: a holder can carry — and a verifier can authenticate
the position of — data neither can read.
10. Compaction
Compaction folds a log and re-emits it as a single self-contained
snapshot frame (re-interned dictionary, deduplicated quads,
dropped self-loops, optionally a materialised entailment closure). A
snapshot’s payload is a self-contained graph fold — terms,
quads, reifiers, annotations, inline blobs, and meta:
snapshot-payload = {
"terms" : terms-payload,
? "quads" : quads-payload,
? "reifies": reifies-payload,
? "annot" : annot-payload,
? "blobs" : { * digest => bstr }, ; inline content-addressed blobs
? "meta" : any,
}
A reader folds a snapshot exactly as it would fold the
equivalent sequence of terms/quads/
reifies/annot/blob frames;
term-ids restart at 0 within the snapshot’s own dictionary.
Compaction is lossy by definition: it discards the
original per-frame signatures and the temporal stacking of the log. A
compactor:
- MUST record the provenance of the fold (source log digest, time, agent) as quads in the snapshot, and
- SHOULD emit a fresh signature over the snapshot.
Two artifact classes follow: an evidentiary log
(append-only, signed, never compacted) and a distribution
snapshot (compacted, dense, lossy — ideal for shipping). A
reader can tell which it holds from the profile and the presence of a
snapshot frame.
10.1 Streamable compaction (ordering-only)
Streamable compaction converts an accretive segment (or multi-segment file) into one delivery-ordered segment in the streamable layout state (§3.3). Unlike snapshot compaction above, it is a re-authoring of the ORDERING, and only the ordering: the folded graph, the inline blobs, and every content-addressed fact are preserved. Three signature subjects behave differently under the rewrite, and a compactor MUST honour all three:
- Content signatures (subject = a content digest: a blob’s BLAKE3, a statement or claim hash — “this is true, signed by Bob”) are ordinary quads/annotations about digests. They are compaction-invariant and survive fully intact: nothing they attest to has changed.
- Frame signatures (a COSE_Sign1 over a frame
"id", which commits to"prev", §9.2) become detached, not broken: they verify against the original frame id forever. A compactor MUST carry every source frame signature in compaction provenance — onestream:DetachedSignaturenode per signature, recording the original frame id (stream:sourceFrame) and the original COSE bytes (stream:cose), plus onestream:sourceHeadper source segment head (§13.3) — so each remains a checkable claim about the original log. - Ordering commitments (a signed head, an index
"mmr"root) are the only layout-bound attestations. They cannot survive a reordering; the compactor re-issues the ordering commitment (the new trailingindexwith its"head", §6.2) and thereby becomes the sole attester of the new ordering. A compactor MAY additionally COSE-sign the new head.
A compactor MUST record the rewrite itself as provenance quads in the
output — a stream:Compaction node carrying the acting tool
(stream:agent), the time (stream:timestamp),
and the source segment heads (stream:sourceHead) — the §10
provenance MUST, given concrete vocabulary by §13.3.
Profiles demanding pristine third-party chain
attestation. For an evidence segment the original
signed chain is the artifact; a compactor MUST refuse it —
unless it seals the original log verbatim as a nested
GTS blob (§12.1) inside the streamable rewrite (role
"source", referenced from the provenance node via
stream:sealedSource). The original bytes, chain, and
signatures stay byte-intact and independently verifiable inside; the
outer layout is delivery-ordered; one content digest binds them.
Refusals for publication tools (§14.1). A compactor
MUST refuse: input that does not verify cleanly (any diagnostic); and
input whose fold carries a frame-addressed suppression
(kind: "frame", §11) — the rewrite assigns new frame ids,
so a frame-digest target would silently dangle. Digest-addressed
blob suppressions are carried forward verbatim
(content-addressing is layout-independent); id-addressed suppressions
are carried forward value-wise (§11).
11. Suppression (additive “deletion”)
GTS never physically deletes. To retract or hide prior content, a
writer appends a suppress frame referencing the superseded
subgraph or frame digest. The suppressed bytes remain present and
hash-linked; suppression is a display/precedence
contract, interpreted by the consumer, not an erasure. This
preserves a complete, tamper-evident history.
suppress-payload = { "targets": [+ suppress-target], ? "reason": tstr, ? "by": term-id }
suppress-target =
{ "kind": "frame", "id": digest } / ; a frame, by its "id"
{ "kind": "blob", "digest": digest } / ; a content-addressed blob
{ "kind": "term", "id": term-id } / ; a term + quads it appears in
{ "kind": "quad", "q": [term-id, term-id, term-id, ? term-id] } / ; one specific quad
{ "kind": "reifier", "id": term-id } ; a reifier + its annotations
Suppression is monotonic and additive: a matched
target is hidden from default resolution (a term target
also hides every quad in which the term appears); the bytes remain
present and hash-linked, and a consumer MAY surface suppressed content
explicitly. There is no un-suppress in v1 — later frames may add further
suppressions, and a later identical assertion does not revive a
suppressed target.
Cross-segment suppression (normative, §3.1).
Digest-addressed targets (frame, blob) are
file-global: a content-id names the same bytes wherever they sit, so a
later segment MAY suppress an earlier segment’s frame or blob by digest.
Id-addressed targets (term, quad,
reifier) carry term-ids, which are segment-local — they are
first resolved to term values within the suppress frame’s own
segment, and the suppression then applies value-wise to
the whole union fold: a quad target hides every
matching (s,p,o,g) value tuple in any segment, and a
term target hides the term value (and the quads it appears
in) file-wide. This is what lets an appended belief-revision segment
suppress a statement made by an earlier segment without rewriting a byte
of it — the earlier segment’s record stays present, signed, and
hash-linked (content-addressed at the wire level).
12. Binary and content-addressing
; a `blob` frame carries raw bytes in "d" (subject to "x"); its metadata lives in cleartext "pub":
blob-pub = { ? "mt": tstr, ? "rep": tstr, ? "digest": digest-ref }
; INLINE blob -> "d" present; digest = BLAKE3(decoded "d").
; EXTERNAL blob -> "d" absent; "pub".digest names bytes held elsewhere.
- A
blobframe’s bytes are addressed by their BLAKE3-256 digest — for an inline blob theBLAKE3of the decoded"d", for an external blob"pub".digest; the graph references the blob by that digest. Identical bytes appearing twice are stored once by convention. - A blob MAY be inline (bytes present, a self-contained package) or external (only the digest appears in the graph; bytes live elsewhere).
- A logical object MAY have multiple representations
(
"rep"/"mt"distinguishing, e.g., a master and a widely-supported fallback) — see content negotiation, §8.3. - Transforming to a text format (§14) externalises inline blobs to a sidecar directory.
12.1 Nested GTS (recursive composition)
A blob whose media type is
application/vnd.blackcat.gts+cbor-seq is itself a complete
GTS file. Because a payload after transform reversal is opaque bytes,
any frame payload MAY carry a nested GTS, wrapped in
any transform chain — [zstd], [cose-encrypt],
or both. The normative carrier is a blob whose
"pub".mt is
application/vnd.blackcat.gts+cbor-seq.
- Fold semantics. A Full Reader MAY recurse: decode the blob (subject to §6.1 capability rules), then fold the inner bytes as an independent GTS, exposing its result as a subgraph the parent graph references by the blob’s digest. A Baseline Reader MAY treat a nested GTS as an ordinary blob (no recursion).
- Blank-node scope. The inner GTS has an independent blank-node scope. If a Full Reader exposes the inner fold beside the parent fold, it MUST relabel or scope inner blank nodes so labels cannot collide with the parent or with sibling nested GTS files.
- Independent integrity. The inner GTS has its own header, id/prev chain, and signatures. The outer chain proves the nested blob is present and intact at its position; the inner chain proves the nested log is intact. The two guarantees compose but do not depend on each other.
- Composed opacity. If the nested GTS is reached
through an
encrypt-class transform and the reader lacks the key, the entire subgraph — including its inner header — is an opaque node (§7.6): the holder can carry and prove the position of a whole sealed graph it cannot read. This is the matryoshka case (“a whole GTS inside an encrypted field”). - Bounded recursion. Readers MUST enforce a maximum nesting depth and total decoded-size budget (§18).
This composition needs no new frame type: nesting is “a blob that
happens to be a GTS.” The v1 Full Reader helper and negative security
vectors for recursion limits are tracked in GTS-SECURITY-POLICY.md.
13. Profiles
A profile is a named set of conventions over the one format, declared
in the segment header "prof" field. Profiles can define
vocabulary expectations, validation rules, trust policy, capability
requirements, and publication workflows, but they sit above the core
wire format.
The status values used by this specification are:
- core-required: part of baseline wire-format, reader, or writer conformance.
- optional-standard: specified here as interoperable infrastructure, but not required for a Baseline Reader or Writer.
- experimental: described for early interoperability; details can change without changing core GTS.
- domain-specific: owned by an application or downstream community, not by core GTS.
| profile or family | status | profile-level meaning | core impact |
|---|---|---|---|
generic |
core-required default | Any conformant log with no extra profile validation. | None; this is the absence of profile-specific requirements. |
dist |
optional-standard | A compacted distribution snapshot: vocabulary,
definitions, and materialised closure. |
None. |
evidence |
optional-standard | Append-only custody chain; profile validators require signatures and a head commitment. | None; signatures remain optional in core GTS. |
opaque |
optional-standard | Selective-disclosure convention over encrypt-class
frames, signatures, and pseudonymous kids. |
None; encryption remains optional in core GTS. |
bundle |
optional-standard | A GTS whose blobs are themselves GTS files
(mt: application/vnd.blackcat.gts+cbor-seq), using
§12.1. |
None. |
files |
optional-standard | A portable file-tree archive profile defined in §13.2 and §14.2. | None; baseline readers fold its graph normally. |
stream |
optional-standard | Streaming vocabulary and publication layout support used by §3.3 and §10.1. | None; layout checks are reader/tool diagnostics, not new frame grammar. |
image |
experimental | Blob representations plus descriptive metadata and analysis frames. | None. |
ai-package |
experimental | A concept plus logic, observations, opinions, refuted claims, embeddings, and data. | None. |
music-package |
domain-specific | GMEOW music transport conventions; informative here, specified by the downstream profile. | None. |
| GMEOW distribution profiles | domain-specific | Downstream GMEOW package conventions layered on GTS distribution artifacts. | None. |
agent-memory |
domain-specific | Application conventions for memory, belief revision, suppression, and provenance. | None. |
Profiles constrain conventions, not the wire format; a
generic reader reads all profile declarations it can parse.
A reader that does not implement a named profile still parses, verifies,
and folds the file under its reader class, then reports the unsupported
profile in diagnostics or metadata. In a multi-segment file each segment
declares its own profile; the file’s effective requirement set is the
union (§3.1).
The evidence profile requires a head commitment (§9,
item 4) at profile level, and writers SHOULD emit a checkpoint
index at least every 1024 frames or 64 MiB, whichever comes
first, so a damaged log recovers robustly (§9.1). That requirement does
not make signatures, indexes, or evidence-profile support mandatory for
baseline GTS.
Profile-policy configuration. A profile-aware verifier MAY accept a deployment trust policy beside the GTS bytes. The v1 policy document is JSON or YAML with these fields:
trusted_signers:
- did:example:issuer
require_trusted_signer: true
pseudonymous_kid_pattern: "^anon:[0-9a-fA-F]{32,}$"trusted_signers lists signer kid values
authorized by the deployment. require_trusted_signer makes
profiles that require signatures fail unless at least one valid
signature is from a trusted signer.
pseudonymous_kid_pattern controls the high-privacy
recipient-id shape for the opaque profile. These settings
are profile/deployment policy only: they do not change core GTS parsing,
folding, frame ids, signature preimages, or baseline reader
validity.
Third-party profile registration template. A third-party profile definition SHOULD publish:
- Stable profile name used in the header
"prof"field. - Owner, change-control process, contact URI, and specification URI.
- Status (
experimental,optional-standard, ordomain-specific) and intended compatibility policy. - Vocabulary namespace IRIs, term shapes, and any profile-specific validation rules.
- Required codecs, keys, signature algorithms, trust anchors, or deployment assumptions.
- Failure taxonomy: which violations are errors, warnings, or informative diagnostics for a profile-aware tool.
- Interaction with segments,
catcomposition, suppression, compaction, and nested GTS blobs. - Conformance vectors, including unsupported-profile behavior for baseline readers.
- Security and privacy considerations.
A profile definition MUST state that it does not change header/frame grammar, segment-boundary detection, content-id or signature/hash preimages, transform-catalog resolution, or the core fold semantics in §7. New profile behavior must be expressed as graph vocabulary, existing frame types, transform capabilities, metadata, or profile-aware validation rules.
The registry change policy, reserved namespaces, and
optional-standard profile promotion process are maintained in GTS-GOVERNANCE.md.
13.1 Language-tag discipline (profile-level normative)
This subsection defines a profile/projection-writer rule, not a Baseline Reader requirement.
A producer’s graph payload MAY carry internal private-use
language tags (e.g. GMEOW’s x-gmeow-*): the
payload of a dist or ai-package segment
is the canonical form, and canonical forms keep their internal
tags. Every projection section — docs blobs, derived
views, down-projected representations, anything generated for an
external consumer — MUST carry public BCP 47 tags
only; a producer that leaks private-use tags into a projection
section MUST fail at write time, not warn (vector 20). The boundary is
per role, not per file: one package legitimately carries a
canonical payload with internal tags beside public-tagged docs sections.
(This mirrors the GMEOW generator framework’s internal-tag leak gate;
the reference producer reuses its retag machinery at the
section boundary.)
13.2 The files profile (optional-standard)
The files profile is an optional-standard,
content-addressed archive of a file tree. It is the GTS answer to tar’s
c/x/d: pack a directory into a
single-segment GTS, unpack it later, and diff it against a
directory without byte comparison. The rules below are profile-level
conformance requirements for files writers and validators;
a Baseline Reader folds the graph without implementing archive
tooling.
Namespace. The profile owns a small, spec-defined
vocabulary at https://w3id.org/gts/files# (prefix
files). GTS independence means an unpacker MUST NOT require
GMEOW, schema.org, or any other ontology to read the archive; the
vocabulary is authored in the spec and carried as literal IRIs in the
graph.
| term | IRI | shape |
|---|---|---|
FileEntry |
https://w3id.org/gts/files#FileEntry |
Class. One archived file. |
path |
https://w3id.org/gts/files#path |
Relative path string, / separators, no leading
/, no .. components. |
digest |
https://w3id.org/gts/files#digest |
blake3:<hex> content digest of the file
bytes. |
size |
https://w3id.org/gts/files#size |
Byte size as xsd:integer. |
mode |
https://w3id.org/gts/files#mode |
POSIX permission bits as a decimal xsd:integer (for
example, 420 for 0o644). File-type bits are
not recorded. |
modified |
https://w3id.org/gts/files#modified |
Modification time as xsd:dateTime in UTC. |
mediaType |
https://w3id.org/gts/files#mediaType |
Declared IANA media type string. |
Quad shape. Each file in the archive is described by
one blank-node FileEntry:
_:entry a files:FileEntry ;
files:path "relative/path.txt" ;
files:digest "blake3:<hex>" ;
files:size 1234 ;
files:mode 33204 ;
files:modified "2026-06-10T20:00:00Z"^^xsd:dateTime ;
files:mediaType "text/plain" .
Determinism. A files archive MUST be
byte-reproducible for the same input tree:
- Paths are sorted lexicographically by their UTF-8 byte sequence before emission.
- Stored paths use
/separators and MUST be non-empty relative paths. Writers, unpackers, and diff tools MUST refuse absolute paths, Windows drive-relative paths,..components,.components, empty components, and backslash separators before touching file bytes. - Modification times are normalised to UTC and serialised as
xsd:dateTimewith second precision. Fractional seconds MUST be truncated before emission. - Only POSIX mode and mtime are recorded; ownership, uid/gid, xattrs, and ACLs are deliberately excluded — they are tar’s portability tarpit.
- Symlinks are deliberately excluded.
packanddiffMUST refuse symlink entries rather than following them;unpackMUST also refuse any destination escape through an existing symlink below the output directory.
Inline and external blobs. A file’s bytes MAY be
carried as an inline blob frame ("d" present,
digest = BLAKE3(decoded "d")) or as an external blob
("d" absent, pub.digest names bytes held
elsewhere, §12). Identical bytes appearing under multiple paths are
stored once by convention. The standard gts pack command
emits inline blobs only. Implementations MAY add an explicit
external-blob mode, but it MUST be opt-in and documented.
gts unpack MUST refuse an unsuppressed
FileEntry whose inline blob is absent;
gts diff MAY compare by files:digest without
fetching external bytes.
Suppression. A blob-targeted suppression (§11) hides
matching file bytes from default extraction. gts unpack and
gts extract skip/refuse suppressed blobs by default and
expose an explicit --include-suppressed override when the
operator intentionally wants retained history.
Relationship to other vocabularies. The profile is
deliberately self-contained, but the terms align by reference to common
surface vocabularies: files:size ↔︎ schema.org
contentSize, files:mediaType ↔︎ schema.org
encodingFormat, files:modified ↔︎ NFO
fileLastModified, files:path ↔︎ NFO
fileName. These alignments live in GMEOW’s mapping DSL; the
files profile itself does not depend on them.
13.3 The stream vocabulary (optional-standard)
The streamable layout state (§3.3) and streamable compaction (§10.1)
use a small, optional-standard vocabulary at
https://w3id.org/gts/stream# (prefix stream) —
the same independence decision as the files profile
(§13.2): no GMEOW or external ontology is required to stream a photo
archive; the terms are authored here and carried as literal IRIs in the
graph. The vocabulary is deliberately distinct from files#
(the two compose: a files archive that is also streamable
describes each file once as a files:FileEntry and once as a
stream:Manifestation — the profile check (§14.1) and the
layout check (§3.3) stay independent).
Streaming-index terms — one
stream:Manifestation per promised blob, emitted in the
leading streaming index before any blob frame (§3.3):
| term | IRI | shape |
|---|---|---|
Manifestation |
https://w3id.org/gts/stream#Manifestation |
Class. One blob this segment promises to deliver. |
digest |
https://w3id.org/gts/stream#digest |
blake3:<hex> content digest — the IOU the blob
redeems. |
mediaType |
https://w3id.org/gts/stream#mediaType |
Declared IANA media type (mirrors the blob’s
pub.mt). |
size |
https://w3id.org/gts/stream#size |
Byte size of the decoded blob as xsd:integer. |
role |
https://w3id.org/gts/stream#role |
Delivery role string: "preview" /
"primary" / "source"; open set. |
order |
https://w3id.org/gts/stream#order |
Intended delivery position among the segment’s blobs,
xsd:integer, 0-based. |
Compaction-provenance terms — the concrete vocabulary for §10/§10.1’s provenance MUST:
| term | IRI | shape |
|---|---|---|
Compaction |
https://w3id.org/gts/stream#Compaction |
Class. One rewrite event (a blank node). |
agent |
https://w3id.org/gts/stream#agent |
The acting tool, a string (e.g. "gts-compact"). |
timestamp |
https://w3id.org/gts/stream#timestamp |
Rewrite time as xsd:dateTime in UTC. |
sourceHead |
https://w3id.org/gts/stream#sourceHead |
blake3:<hex> head id of one source segment;
repeated per segment. |
sealedSource |
https://w3id.org/gts/stream#sealedSource |
blake3:<hex> digest of the nested-GTS blob
holding the verbatim original (§10.1). |
DetachedSignature |
https://w3id.org/gts/stream#DetachedSignature |
Class. One carried-over frame signature (a blank node). |
sourceFrame |
https://w3id.org/gts/stream#sourceFrame |
blake3:<hex> original frame "id" the
COSE signature verifies against, forever. |
cose |
https://w3id.org/gts/stream#cose |
The original COSE_Sign1 bytes, base64url (unpadded) literal. |
Quad shape (a compacted segment’s streaming index, then provenance):
_:m0 a stream:Manifestation ;
stream:digest "blake3:<hex>" ;
stream:mediaType "image/webp" ;
stream:size 20480 ;
stream:role "primary" ;
stream:order 0 .
_:c a stream:Compaction ;
stream:agent "gts-compact" ;
stream:timestamp "2026-01-01T00:00:00Z"^^xsd:dateTime ;
stream:sourceHead "blake3:<hex>" .
_:s0 a stream:DetachedSignature ;
stream:sourceFrame "blake3:<hex>" ;
stream:cose "<base64url>" .
Claim coupling (normative). Use of
stream# terms in a segment that does NOT claim
"layout": "streamable" is a warning, not
an error (§14.1): provenance quads legitimately survive
gts → nq → gts round trips and re-accretion after appends.
The error class is reserved for the opposite rot — a claimed layout the
bytes contradict (§3.3).
13.4 Domain profile example: music-package
(informative)
This subsection is an informative example of a domain-specific
profile. A Baseline Reader, Writer, or verifier is not required to
implement GMEOW vocabulary, music-domain rules, notation projection
rules, or the music-package validator to be conformant with
core GTS.
The music-package profile can be defined as a
single-segment GTS that carries frame-relative musical content: a
MusicalWork/MusicalExpression, its
Voices and MusicalSegments,
TuningSystem and MusicalTimeFrame reference
frames, atomic ToneEvents, DegreeOfFreedom
declarations, and standpoint-indexed analysis claims. It is the
canonical transport form for the GMEOW music slice and the input to
notation projections.
Namespace. The profile reuses the GMEOW music
vocabulary (https://blackcatinformatics.ca/gmeow/). A
music-package is not required to be a dist
profile: it may carry only the musical content graph plus any projection
blobs, and it MAY rely on an external dist snapshot for
vocabulary definitions.
Header. A music-package segment
declares "prof": "music-package". The profile can be
append-only for new claims; existing triples are not deleted, only
superseded by statement-layer provenance (§7.3).
Example quad shape. A minimal package can contain:
@prefix gmeow: <https://blackcatinformatics.ca/gmeow/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
:piece a gmeow:MusicalExpression ;
gmeow:hasVoice :voice1 .
:voice1 a gmeow:Voice ;
gmeow:voiceTuningFrame :tuning12EDO ;
gmeow:voiceTimeFrame :timeGrid .
:tuning12EDO a gmeow:TuningSystem .
:timeGrid a gmeow:MusicalTimeFrame .
:event1 a gmeow:ToneEvent ;
gmeow:segmentOf :voice1 ;
gmeow:toneEventPitchValue :pitchC4 ;
gmeow:segmentSpan :span1 .
:span1 a gmeow:MusicalTimeSpan ;
gmeow:hasMusicalTimeFrame :timeGrid ;
gmeow:timeStartNumerator 0 ;
gmeow:timeStartDenominator 1 ;
gmeow:timeDurationNumerator 1 ;
gmeow:timeDurationDenominator 4 .
Time and pitch are frame-relative:
toneEventPitchValue points to a PitchValue
interpreted under the event’s voice tuning frame, and offsets/durations
are rational values interpreted under the voice time frame.
Projections. A music-package can
contain blob frames whose bytes are down-projected
representations (MusicXML, MEI, ABC, LilyPond, Humdrum **kern, MIDI,
Scala .scl, tablature, mensural, graphic notation). A
music-package profile validator can require each projection to be
accompanied by a declared-loss manifest that lists the
NotationProjectionProfile used, the
MusicalParameters it can represent, and the
ProjectionLosses it incurs. The manifest can be a Turtle
sidecar or an embedded header/comment and is considered part of the
projection, not the canonical graph.
Bundle profile coupling. A bundle
profile (§12.1) whose blobs are music-package segments
provides the multi-movement / multi-version transport case. Each nested
segment keeps its own profile declaration; the outer bundle does not
impose additional conventions.
Verification. A music-package-aware verifier can
check that every NotationSystem referenced by a projection
blob has a corresponding NotationProjectionProfile, and
that the profile accounts for every MusicalParameter
declared in the music slice (no silent omissions). Baseline
gts verify is not required to implement this profile; it
can report the unsupported profile without failing wire-format
validity.
14. Transforms out
Transforms convert GTS to operating substrates. Each is a thin shim over the folded tables — no RDF text parser is involved.
gts → nquads/gts → turtle— serialisequads+reifies/annot(the latter as RDF 1.2 reification). Inline blobs are externalised to./blobs/<blake3>.bin, and the graph’s digest references resolve to those paths. Opaque frames serialise as their opaque-node descriptions.gts → duckdb/gts → sqlite— bulk-load the four tables (terms,quads,reifies,annot) plus ablobstable; create the indexes appropriate to the engine. This is a near-mechanical load because the GTS tables already match the relational shape.
Each transform SHOULD be verifiable by round-trip
equivalence: for fully-decodable frames,
gts → nq → gts MUST yield the same folded graph (modulo
blank-node labelling and deterministic CBOR re-encoding). Opaque nodes
are excluded — they serialise as opaque-node descriptions and re-import
as ordinary quads, not as opaque frames.
14.1 Composition tooling requirements (normative for conformant tools)
This section defines tool conformance only. A Baseline Reader or Writer need not ship these CLI verbs, transform targets, archive commands, or publication policies to be core-conformant. Profile-aware tools enforce only the profile validators they claim to support; unsupported profiles are surfaced as diagnostics or metadata unless the user explicitly requested that profile’s validation.
Raw cat always works (§3.1); a conformant
validating composer (gts cat) and verifier
(gts verify) add the refuse-don’t-trust posture:
gts catMUST refuse degenerate inputs: an input that is not a valid GTS, a segment whose fold yields zero quads and zero blobs (almost always a wiring bug, never a real package), or an output in which a suppress-only segment would hide every prior frame. Publish-class tools never trust a pathological state to be intentional.gts verifyMUST check declared-vs-computed requirements for supported profiles: a segment whose graph uses a supported profile’s vocabulary without declaring the profile is an error; a declared-but-unused supported profile is a warning. Declarations a tool reads (the CLI dependency report, §13) must not be able to rot against the content they describe.gts verifySHOULD report per-segment: head id, signer set, profile, term/quad counts, opaque-node count with reasons — the composition ledger of the file.gts verifyMUST check the layout claim (§3.3): a segment claiming"layout": "streamable"whose covered region violates delivery ordering, or whose index footer is missing or contradicts the frames it covers, is an error (StreamableLayoutError, §2.3);stream#vocabulary in an unclaimed segment is a warning (§13.3).gts infoandgts verifySHOULD report the streamable boundary of a claimed segment — “streamable through frame N, accretive tail of M frame(s)”.gts compact --streamable <in> -o <out>is the layout rewrite (§10.1). It MUST refuse input that does not verify cleanly, input carrying frame-addressed suppressions, andevidenceinput without the seal-the-original option (--seal-original, §10.1); it MUST emit a single claimed segment in the normative streamable shape (§3.3) with compaction provenance and detached signatures (§13.3), and its output MUST be byte-deterministic for the same input and parameters (blobs ordered by ascending decoded size, ties broken by ascending digest; the rewrite timestamp is a parameter, not ambient time).- Deterministic graph authoring mode is the
reproducible-build writer surface for a folded graph. It emits one
ordinary segment and MUST remap local term ids before writing: terms are
sorted by semantic value (IRI string; literal lexical form plus
effective datatype IRI plus language tag; blank-node label, with
anonymous blank nodes using their input occurrence as a tiebreaker;
quoted triple resolved to its subject/predicate/object value). It then
emits authorable frames in this fixed order:
terms,quads,reifies,annot,blob,meta,suppress. Quads, reifier bindings, annotations, blobs, metadata keys, and suppression frames are sorted by the remapped deterministic-CBOR representation. The mode does not replay reader observations (opaque, signatures, diagnostics, or segment ledgers); publication tools that need to preserve those observations must use a profile-specific rewrite such as streamable compaction or seal the original bytes as evidence. - Blob extraction is verification, never conversion
(
gts ls,gts extract): blobs are addressed by content digest (frame indices are physical accidents that shift undercat); extraction re-hashes the bytes against the requested digest; a blob suppressed by digest (§11) is refused by default (suppression is a display contract and extraction is display) with an explicit override; a media-type flag is an assertion against the blob’s declaredpub.mt— a validating publication tool refuses a mismatch rather than transcoding.
14.2 Archive tooling (files profile)
The files profile adds three validating publication
commands. They share the refuse-don’t-trust posture of §14.1: raw byte
operations are always valid GTS, but a tool refuses pathological states
rather than trusting them to be intentional.
gts pack <dir|file>... -o out.gtsProduce a single-segment GTS whose header declares"prof": "files". Each argument is archived: a file is added under its basename; a directory is added recursively. The resulting archive contains, in order, thetermsandquadsdescribing everyfiles:FileEntry, followed by the inlineblobframes for the file contents. The command MUST refuse:- inputs that contain unsafe stored paths: absolute paths,
drive-relative paths,
..,., empty components, or backslash separators; - symlinks;
- inputs that are not readable or that disappear during the walk.
- inputs that contain unsafe stored paths: absolute paths,
drive-relative paths,
gts unpack <archive> [-C dir]Write everyfiles:FileEntryin the archive to the destination directory (default current working directory). The command MUST:- refuse to write outside the destination directory (
.., absolute paths, or symlinks that escape it); - re-hash each written file and verify it matches
files:digest; - restore the file’s declared modification time and permissions (subject to the host OS);
- skip entries whose digest is suppressed (§11) by default, with an
explicit
--include-suppressedoverride.
- refuse to write outside the destination directory (
gts diff <archive> <dir>Compare the archive’sfiles:FileEntryset to the current state of<dir>by content digest. Report added, removed, and modified paths. Exit0if the directory matches the archive exactly; exit1if any path differs or if the input is refused. No byte comparison is needed: content addressing makes the operation O(read) on the directory.
Archive workflow comparison.
| workflow | usual table of contents | GTS files profile behavior |
|---|---|---|
tar |
Header records are interleaved with file bytes; path and metadata interpretation is tool policy. | The manifest is RDF quads, so path, digest, size, mode, mtime, and media type are queryable before or beside blobs. |
zip |
Central directory enables random access but is a rewrite-oriented footer. | GTS stays append-only; optional indexes accelerate access without making the footer the archive identity. |
| BagIt-style package | Payload files plus sidecar manifests/checksums. | The graph-native manifest and content bytes travel in one verifiable CBOR Sequence; external blobs remain content-addressed when used. |
The value proposition is not compression ratio. Use a compression
transform or an outer transport when size dominates. The
files profile is for graph-native manifests,
digest-addressed deduplication, append composition, and consistent
safety policy across engines.
15. Worked examples
CBOR is shown in diagnostic notation (RFC 8949 §8).
Hashes/signatures are elided as h'…'.
15.1 Minimal distribution snapshot (dist)
55799( / self-describe magic /
{ "gts": "GTS1", "v": 1, "prof": "dist",
"cat": { 0: {"name":"identity","cls":"encode"},
4: {"name":"zstd","cls":"compress"} },
"id": h'…header.id…' }
)
{ "t": "terms", "prev": h'…header.id…', "id": h'…terms.id…',
"d": [ {"k":0,"v":"https://example.org/Cat"}, / id 0 /
{"k":0,"v":"http://www.w3.org/2000/01/rdf-schema#label"}, / id 1 /
{"k":1,"v":"Cat","l":"en"} ] } / id 2 /
{ "t": "quads", "prev": h'…terms.id…', "id": h'…', "x": [4],
"d": h'…zstd([[0,1,2]])…' } / Cat rdfs:label "Cat"@en /
Term 2 is a literal with a language tag and no "dt", so
its datatype is rdf:langString (§7.1).
15.2 Evidence: image + signed accrual (evidence)
{ "t": "blob", "prev": h'…header.id…', "id": h'…',
"pub": {"mt":"image/jp2"}, "d": h'…image bytes…', / digest = blake3(d) /
"sig": h'COSE_Sign1 by did:photographer' }
{ "t": "annot", "prev": h'…blob.id…', "id": h'…',
"d": [[10,11,12]], / reifier 10: capturedAt … /
"sig": h'COSE_Sign1 by did:photographer' }
{ "t": "annot", "prev": h'…prev.id…', "id": h'…', / later custody transfer, separate signer /
"pub": {"event":"custody-transfer"},
"d": [[13,11,14]], "sig": h'COSE_Sign1 by did:evidence-clerk' }
Nothing is rewritten; every accrual is hash-linked and independently signed.
15.3 Notary: partially-opaque frame (opaque)
{ "t": "annot", "prev": h'…prev.id…', "id": h'…',
"pub": { "claim": "I hereby notarized this document.",
"notary": "did:notary:jane", "ts": "2026-06-09T12:00:00Z" },
"x": [4, 7], / 7 = cose-encrypt /
"to": [ {"kid":"anon:7f3a…","alg":"ECDH-ES+A256KW"} ], / pseudonymous kid (opaque profile, §18) /
"d": h'COSE_Encrypt(verified ID record + provenance)',
"sig": h'COSE_Sign1 by did:notary:jane' }
Anyone verifies the public notarization and its signature; only the
court key decrypts the sealed record; the signature binds the two
(§9.2). A reader without the court key folds this to an opaque node with
reason:"missing-key", pub intact,
sigstat:"valid".
15.4 Graceful degradation (image, content
negotiation)
{ "t": "blob", "prev": h'…', "id": h'…', "pub": {"mt":"image/vnd.djvu","rep":"master"}, "x":[9], "d": h'…' }
{ "t": "blob", "prev": h'…', "id": h'…', "pub": {"mt":"image/jpeg","rep":"fallback"}, "d": h'…' }
A reader lacking codec 9 (djvu) folds the master to an
opaque node and uses the JPEG fallback — both are present, both are
hash-linked.
15.5 Matryoshka: a whole signed GTS sealed inside a frame
(bundle / opaque)
{ "t": "blob", "prev": h'…', "id": h'…',
"pub": { "rep": "sealed-evidence-graph", "mt": "application/vnd.blackcat.gts+cbor-seq" },
"x": [4, 7], / zstd then cose-encrypt /
"to": [ {"kid":"did:court:registry"} ],
"d": h'COSE_Encrypt( zstd( <a complete, independently-signed GTS file> ) )' }
Without the court key this folds to one opaque node — a whole subgraph the holder carries but cannot read, yet whose presence and position the outer chain proves. With the key, a Full Reader recurses (§12.1) and folds the inner GTS — header, chain, signatures and all — into a verifiable subgraph.
16. Media type and HTTP serving contract
GTS files are published artifacts. This section defines deployment conformance: a locally stored GTS file can be wire-format valid, reader-conformant, and writer-conformant even when it is never served over HTTP. A conformant deployment MUST advertise the media type, support range requests, and set cache headers that respect the format’s immutability.
16.1 Media type and file extension (normative)
- Media type:
application/vnd.blackcat.gts+cbor-seq(registration template in §20.1). GTS uses the+cbor-seqstructured-syntax suffix because a GTS file is a CBOR Sequence ([RFC 8742]) of segment headers and frames, not a single CBOR data item. The earlier provisionalapplication/vnd.blackcat.gts+cborspelling is obsolete; deployments MUST emitapplication/vnd.blackcat.gts+cbor-seq. Readers MAY accept the obsolete spelling as a legacy alias, but MUST NOT emit it in newly written metadata. - File extension:
.gts. - Magic bytes: the CBOR self-describe tag
55799(0xd9 0xd9 0xf7) at the start of the first segment’s Header when the first segment is tagged. A reader MAY use these three bytes as one signal while identifying a candidate GTS file, but MUST confirm the Header shape before treating the bytes as GTS.
Servers that do not recognise
application/vnd.blackcat.gts+cbor-seq SHOULD fall back to
application/octet-stream rather than a wrong text type;
clients SHOULD inspect the first CBOR data item when the media type is
missing or generic.
16.2 File identification algorithm (normative)
Media type metadata is authoritative when it is available. When a reader must identify bytes without trusted metadata, it MUST use this algorithm:
- Treat
.gtsandapplication/octet-streamas hints only; neither proves nor disproves GTS. - If the first three bytes are
0xd9 0xd9 0xf7, parse the first CBOR item as a tagged item and unwrap tag55799. Otherwise parse the first CBOR item from byte offset0. - The unwrapped first item MUST be a Header map containing
"gts": "GTS1"and lacking frame key"t". A mismatch is not a GTS file. - A positive identification is still only an identification result. Complete validity requires parsing the whole observed byte stream as a CBOR Sequence (§3), applying segment-boundary rules (§3.1), and validating ids, chains, profiles, and capabilities as required by the selected conformance class.
- Implementations MUST NOT require a whole-file CBOR wrapper, a total
item count, or a length prefix. Independently valid tagged segments may
be concatenated, so later
55799tags identify later segment headers, not nested whole-file objects.
16.3 HTTP serving semantics (normative)
A GTS package is served like any other immutable binary release, with three extra requirements:
Accept-Ranges: bytesMUST be sent for every.gtsresponse. The format is designed for partial, streaming consumption (§3.2): a consumer can fold the header and a prefix of frames without downloading the whole file. Clients choose byte ranges from discovered CBOR item offsets, indexes, or other trusted manifests; HTTP range support does not by itself validate or repair local file bytes.- No transforms at the edge. Because the bytes are a content-addressed chain, proxies and servers MUST NOT apply compression, minification, or any byte-altering transform. The frames are already compressed by the writer’s chosen codec; re-compressing at the transport layer breaks content hashes and signatures.
- CORS. A public vocabulary/dataset package is
expected to be cross-origin readable. Responses SHOULD include
Access-Control-Allow-Origin: *for the served.gtsorigin.
16.4 Immutability-aware caching (normative)
Published GTS releases are immutable; a GTS package URL names one exact byte sequence.
Versioned URLs (
…/gmeow/1.2.3/gmeow.gts,…/packages/music/2026-06-18/music.gts, or any URL that contains a version/date/head identifier) MUST be served with:Cache-Control: public, max-age=31536000, immutable ETag: "<last-segment-head>"The natural ETag is the hex of the file’s last segment head id (§3.1), because it transitively commits to every byte of the file. The
immutabledirective tells caches they need not revalidate for the one-year lifetime.latest/ conneg aliases (URLs that resolve to the current release and may change) MUST NOT be cached as a single variant:Cache-Control: private, no-store Vary: AcceptThe
Vary: Acceptprevents conneg-cache poisoning when the same path negotiates to HTML, Turtle, or the GTS package. This is the same cache-poisoning class addressed for slice IRIs by the Apache generator.
Profile selection remains URL-shaped in v0.2: one URL per package.
RFC 6906 / Accept-Profile is noted as a possible future
extension, not required for v0.2 conformance.
17. Versioning and durability guarantees
- The header
"v"is the spec major version. A reader MUST refuse a major version it does not implement, but MUST still verify the id/prev chain and enumerate frame types/ids. - Segment semantics and older readers. A reader
implementing this revision MUST support segment boundaries (§3.1). A
reader that does NOT (a pre-§3.1 implementation) encounters a second
Header as a non-frame data item: such input is malformed for
that reader, and it MUST surface a fatal diagnostic for the
remainder of the file rather than skip the item — silently
misfolding (applying file-global term-ids across a boundary) is the one
forbidden outcome (vector 17). Because
catcannot rewrite the first segment’s header (the self-hash seals it), multi-segment files cannot advertise themselves in the first header; boundary detection is therefore structural, and the hard-fail rule is what protects the ecosystem’s oldest readers. - Structure durability: a GTS file plus this specification is decodable forever with no engine and no external dictionary — CBOR is an IETF standard and dictionaries are in-band.
- Density durability: governed by the codec catalog;
the mandatory core set
(
identity/gzip/zstd) guarantees a baseline that any era can decode.
18. Security considerations
- The id/prev chain provides integrity, not
confidentiality; use
encrypt-class codecs for confidentiality. - Truncation (dropping trailing frames) is
undetectable from the chain alone; an
evidenceartifact MUST anchor the head — a signature over the head"id", or the index"head"/"mmr"root (§6.2) — so a verifier can detect a shortened log. - Recovery of frames after a damaged one is guaranteed only with known offsets (an intact index, a checkpoint frame, or external framing); a bare CBOR Sequence can desynchronise on arbitrary corruption (§9.1). GTS defines no parity/erasure coding — durability against bulk loss is the storage layer’s concern.
"to"/kidvalues can leak relationship metadata (who a frame is sealed for). Theopaqueprofile therefore REQUIRES pseudonymouskids; other high-privacy profiles SHOULD use them. Use a per-document or pairwise identifier — e.g."kid": "anon:<BLAKE3(true-kid ∥ head-id)>"— or key blinding, so the same recipient is unlinkable across files.- A valid signature attests the signer over the frame’s bytes; it does not assert the truth of the claims (consistent with attestation semantics — vouching ≠ correctness).
- Opaque frames are unreadable but not invisible; do
not place secrets in
"pub","to", or"meta". - Snapshot compaction (§10) destroys original signatures; an
evidenceartifact MUST NOT be snapshot-compacted. Streamable compaction (§10.1) detaches frame signatures rather than destroying them, but the re-ordered chain is attested only by the compactor; anevidenceartifact MUST NOT be streamable-compacted except by sealing the original verbatim (§10.1), and a consumer’s trust in the ordering of a compacted file is trust in the compactor. - Decompression of attacker-supplied frames MUST be bounded (zip-bomb resistance); readers SHOULD cap decoded sizes.
- Nested GTS (§12.1) MUST be bounded: readers MUST enforce a maximum recursion depth and a total decoded-size budget across all nesting levels (matryoshka-bomb resistance).
- Segments are independently authentic, not mutually vouched. Concatenation implies no endorsement: segment A’s signer attests nothing about segment B. A verifier MUST report signer sets per segment (§14.1), and a consumer deciding trust MUST NOT treat the file-level union as carrying the strongest segment’s authority. Value-wise cross-segment suppression (§11) means an untrusted appended segment can HIDE earlier content from default resolution — readers SHOULD surface which segment suppressed what, and high-assurance consumers MAY resolve suppression only from segments whose signers they trust.
- A torn append at a segment boundary looks like a torn header: the §3 torn-append rule applies; the prior segments fold intact.
19. Conformance test vectors
A conformant implementation MUST pass a shared corpus. v1 requires at least these vectors (shipped with the reference implementation), each as the GTS bytes plus the expected folded graph (N-Quads) and the expected diagnostics:
The companion GTS-CONFORMANCE.md defines
the tiered conformance claims, named vector subsets, expected JSON
fields, vector manifest schema, diagnostics registry, and read/verify
modes used to turn this corpus into comparable implementation
claims.
- Minimal valid file (header + one
terms+ onequads). - A
zstd-transformedquadsframe. - An unknown-codec frame → opaque
reason:"unknown-codec". - A frame with a wrong self-
"id"→DamagedFrameopaque. - A torn append at EOF →
TornAppendError, survivors intact. - Header self-hash verification (positive and tampered).
- RDF 1.2 reifier +
annotround-trip (gts → nq → gts), including quotation-without-assertion. - A nested GTS blob
(
mt: application/vnd.blackcat.gts+cbor-seq), recursed and folded. - Suppression over a term-id and over a frame digest.
- Truncation detection against a signed head / index
"mmr"root. - Literal datatype defaulting (§7.1): a literal with
"l"and no"dt"→rdf:langString; with neither →xsd:string. - A reifier rebound to a different triple →
ConflictingReifier, first binding kept (§7.8). - A position-constraint violation, e.g. a literal in predicate position → rejected/diagnosed (§7.4).
- Blank-node label locality (§7.1, §12.1): identical bnode labels in an outer and a nested GTS remain isolated (not merged).
- Two-segment union (§3.1):
catof two single-segment files folds to the value-union of both graphs; term-ids resolve segment-locally (a shared IRI unifies; identical ids naming different values do NOT collide); identical bnode labels across segments stay isolated. 15b: label-less blank nodes (absent or empty"v") are distinct terms within a segment and across segments, and the union’s serialized labels MUST keep them distinct — relabeling that merges what the graph separates is the forbidden outcome. - Composed round-trip (§3.1, §14): a
cat-composed file survivesgts → nq → gtswith the same union fold. - Pre-segment reader hard-fail (§17, negative): an implementation in pre-§3.1 mode fed a two-segment file MUST surface a fatal diagnostic at the second header — folding frames past the boundary with file-global term-ids is the forbidden outcome this vector exists to catch.
- Cross-segment suppression (§11): a second segment suppresses (a) an earlier segment’s frame by digest and (b) a quad by value; default resolution hides both; the suppressed segment’s bytes verify intact; the verifier reports which segment suppressed what (§18).
- Profile union + graceful segment opacity (§3.1): a two-segment file whose second segment requires an undeclared-to-the-reader capability folds segment one fully and segment two as opaque nodes with the profile named in the diagnostics.
- Language-tag discipline (§13.1, negative): a
producer emitting a private-use language tag into a projection/docs
section MUST fail at write time; the same tag in a canonical
distpayload section is accepted. - Degenerate composition refused (§14.1, negative):
gts catrefuses an empty-fold segment and a suppress-everything composition; raw bytecatof the same inputs still yields a structurally valid file (the tool is stricter than the format, by design). - Inline blob (§12, §14.1): an inline blob folds to
its
blake3:<hex>digest with declared metadata (pub.mt) retained; extraction by digest re-verifies the bytes; a digest-suppressed blob is refused by default. - Prefix-fold streaming property (§3.2, derived): not a vector but a property test over EVERY vector in this corpus — each item-boundary prefix folds without error, and across growing prefixes the folded tables only ever extend (terms/quads are list-prefixes while the segment count is unchanged; ground (blank-node-free) N-Quads lines are monotone across the single→multi-segment representation switch).
- Streamable compaction (§3.3, §10.1, §13.3): an
accretive source (blobs interleaved before their catalog, one
COSE-signed frame, no claim) and its compacted rewrite — the rewrite
claims
"layout": "streamable", leads with the streaming index, orders blobs most-significant-first, closes with the offsetindexfooter, and carries compaction provenance including the detached source signature; both files fold to the same content graph; the compacted bytes are frozen and double as the cross-engine determinism oracle (same input + same timestamp parameter ⇒ byte-identical output in every engine). - Streamable claim that lies (§3.3, negative): a
segment claiming
"layout": "streamable"that delivers a covered blob before the quads describing its digest →StreamableLayoutError; a verifying tool MUST refuse (exit non-zero). - Appended-after-compaction boundary (§3.3): a
compacted segment with frames appended after its
indexfooter folds cleanly with no diagnostic, and tooling reports “streamable through frame N, accretive tail” — the unpresaged tail is legal. - Total-reader hostility regressions (§2.4, negative): empty input, a first CBOR item that is not a Header, an unsupported header major version, an unknown structural frame type, a forward term reference, and a malformed transform payload all return structured diagnostics/opaque nodes where applicable. These vectors pin the “never panic on input bytes” invariant and make cross-engine diagnostic drift visible in CI.
- Deterministic graph writer (§14.1): two equivalent folded graph states with different local term ids and row order produce byte-identical GTS via deterministic graph authoring. The frozen vector pins term remapping, row sorting, blob metadata retention, metadata output, and suppression target remapping across Python and Rust producers.
20. IANA considerations
This section registers one media type. It follows the registration
procedures of [RFC 6838] and the structured-syntax-suffix procedures of
[RFC 9277]. Pending formal registration, the type lives in the vendor
(vnd.) tree and is used provisionally.
20.1 Media type registration:
application/vnd.blackcat.gts+cbor-seq
- Type name:
application - Subtype name:
vnd.blackcat.gts+cbor-seq - Required parameters: none
- Optional parameters: none
- Encoding considerations: binary. A GTS file is a CBOR Sequence ([RFC 8742]) and is not restricted to 7-bit or 8-bit text; transports that are not 8-bit clean MUST apply a content-transfer-encoding (e.g. base64).
- Security considerations: see §18 of this specification. In summary: the content-id chain provides integrity but not confidentiality; truncation is undetectable without a head commitment; decompression and nested-GTS recursion MUST be bounded; and signatures attest a signer over bytes, not the truth of claims.
- Interoperability considerations: the
+cbor-seqstructured-syntax suffix ([RFC 8742]) signals that the payload is a CBOR Sequence, so generic sequence tooling can inspect the ordered data items before applying GTS-specific rules. The self-describe tag55799([RFC 8949] §3.4.6) MAY tag each segment header as a magic number. Conformance is defined by the shared test-vector corpus (§19). - Published specification: this document (GTS — Graph Transport Substrate — Specification).
- Applications that use this media type: content-addressed RDF 1.2 graph transport and archival; signed agent-memory and provenance artifacts; package distribution where the payload bundles a graph and the binaries it references.
- Fragment identifier considerations: none.
- Additional information:
- Magic number(s):
0xd9 0xd9 0xf7(the CBOR self-describe tag55799) when present at the start of the file (§16.1). This prefix is OPTIONAL because the first segment header MAY be untagged. - File extension(s):
.gts - Macintosh file type code(s): none
- Magic number(s):
- Person & email address to contact for further information: Patrick Audley paudley@blackcatinformatics.ca
- Intended usage: COMMON
- Restrictions on usage: none
- Author / Change controller: Blackcat Informatics® Inc.
21. Complete CDDL appendix
This appendix is the copyable schema surface for implementers. Inline CDDL fragments earlier in this document explain local context; this appendix collects the wire-level map shapes in one place.
21.1 Sequence grammar
A GTS file is a CBOR Sequence, not one enclosing CBOR item. CDDL describes the individual items in that sequence; the sequence grammar is defined in English and ABNF-like notation:
gts-file = 1*segment
segment = [ self-describe-tag ] header *frame
self-describe-tag is CBOR tag 55799 applied to the
header item only. It is a wire-level magic hint, not a member of the
Header map, and it is not part of the Header "id" preimage
(§22). Each segment begins with a Header and then zero or more frame
items until the next Header or EOF (§3.1).
21.2 Copyable CDDL
; GTS v1 item grammar. The top-level file is a CBOR Sequence (§21.1).
gts-item = header-item / frame
header-item = header / self-described-header
self-described-header = #6.55799(header)
term-id = uint
frame-index = uint
codec-id = uint
digest = bstr .size 32
content-id = digest
blake3-uri = tstr ; "blake3:" + 64 lowercase hex characters
digest-ref = digest / blake3-uri
profile-name = tstr
layout-state = "streamable" / tstr
extension-key = tstr ; any text key not defined by that map shape
header = {
"gts": "GTS1",
"v": 1,
"prof": profile-name,
"cat": { * codec-id => codec },
? "layout": layout-state,
? "dct": { * tstr => bstr },
? "meta": any,
"id": content-id,
* extension-key => any,
}
codec = {
"name": tstr,
"cls": "encode" / "compress" / "encrypt",
? "dct": tstr,
? "p": any,
* extension-key => any,
}
frame = {
"t": frame-type,
? "x": [+ codec-id],
? "pub": any,
? "to": [+ recipient],
? "d": frame-payload / bstr,
"prev": content-id,
"id": content-id,
? "sig": cose-sign1,
* extension-key => any,
}
frame-type = "terms" / "quads" / "reifies" / "annot" / "blob" / "suppress"
/ "snapshot" / "meta" / "index" / "opaque"
recipient = {
"kid": tstr,
? "alg": tstr,
* extension-key => any,
}
cose-sign1 = bstr ; serialized COSE_Sign1, detached payload = frame "id"
frame-payload = terms-payload / quads-payload / reifies-payload / annot-payload
/ blob-payload / suppress-payload / snapshot-payload / meta-payload
/ index-payload / opaque-node
terms-payload = [+ term]
term = {
"k": 0 / 1 / 2 / 3, ; 0=IRI, 1=literal, 2=bnode, 3=quoted triple
? "v": tstr,
? "dt": term-id,
? "l": tstr,
? "rf": term-id,
* extension-key => any,
}
triple-row = [term-id, term-id, term-id]
quad-row = [term-id, term-id, term-id] / [term-id, term-id, term-id, term-id]
quads-payload = [+ quad-row]
reifies-payload = { * term-id => triple-row }
annot-payload = [+ triple-row]
blob-payload = bstr
blob-pub = {
? "mt": tstr,
? "rep": tstr,
? "digest": digest-ref,
* extension-key => any,
}
suppress-payload = {
"targets": [+ suppress-target],
? "reason": tstr,
? "by": term-id,
* extension-key => any,
}
suppress-target = suppress-frame / suppress-blob / suppress-term
/ suppress-quad / suppress-reifier
suppress-frame = { "kind": "frame", "id": digest-ref, * extension-key => any }
suppress-blob = { "kind": "blob", "digest": digest-ref, * extension-key => any }
suppress-term = { "kind": "term", "id": term-id, * extension-key => any }
suppress-quad = { "kind": "quad", "q": quad-row, * extension-key => any }
suppress-reifier = { "kind": "reifier", "id": term-id, * extension-key => any }
snapshot-payload = {
"terms": terms-payload,
? "quads": quads-payload,
? "reifies": reifies-payload,
? "annot": annot-payload,
? "blobs": { * digest-ref => bstr },
? "meta": any,
* extension-key => any,
}
meta-payload = any
index-payload = {
"count": uint,
"head": content-id,
? "off": [+ uint],
? "ti": { * frame-type => [+ frame-index] },
? "dict": [+ frame-index],
? "mmr": content-id,
* extension-key => any,
}
opaque-node = {
"id": content-id,
"type": frame-type,
? "pub": any,
? "to": [+ recipient],
? "sigstat": sig-status,
"reason": opaque-reason,
* extension-key => any,
}
sig-status = "none" / "valid" / "invalid" / "unverified"
opaque-reason = "unknown-codec" / "missing-key" / "damaged"
/ "unknown-frame-type"
diagnostic = {
"code": diagnostic-code,
"detail": tstr,
? "frame_index": frame-index,
* extension-key => any,
}
diagnostic-code = "EmptyFile"
/ "TornAppendError" / "DamagedFrame" / "BrokenChain"
/ "TruncatedLog" / "UnknownCodec" / "MissingKey"
/ "KeyWrapFailed" / "ConflictingReifier" / "RecursionLimit"
/ "StreamableLayoutError" / "PositionConstraint"
/ "ForwardReference" / "SegmentBoundary" / "IndexMmrError"
/ "UnknownFrameType" / tstr
profile-status = "core-required" / "optional-standard" / "experimental"
/ "domain-specific"
profile-registration = {
"name": profile-name,
"status": profile-status,
? "owner": tstr,
? "spec": tstr,
? "namespace": [+ tstr],
? "requires": any,
? "validation": any,
? "security": any,
* extension-key => any,
}
When "x" is present and non-empty, the frame
"d" value is a byte string carrying the
encoded/compressed/encrypted payload. After reversing the transform
chain (§6.1), those bytes decode to the frame-type-specific payload
above, except blob, whose decoded payload is raw bytes.
When "x" is absent, "d" carries the
frame-type-specific payload directly.
blob-pub is the conventional shape of a blob frame’s
"pub" map; the frame envelope keeps "pub"
typed as any so profiles can layer additional public
metadata without changing the core frame grammar.
digest-ref accepts both the raw 32-byte digest and the
blake3:<hex> text form used by the reference
engines.
22. Hash, signature, and extension-key preimages
All preimages in this section use the deterministic CBOR rules in §4: definite lengths, shortest-form integers, and map keys sorted bytewise by their encoded CBOR form. Unless a row explicitly excludes a field, every key/value pair in the map participates, including unknown extension keys.
22.1 Preimage and subject table
| subject | bytes hashed or signed | excluded fields | included extension fields | verifier behavior |
|---|---|---|---|---|
Header "id" |
BLAKE3-256(deterministic-CBOR(header-map without "id")) |
"id" only. The optional CBOR self-describe tag 55799 is
outside the Header map and outside the preimage. |
All unknown Header keys participate. | Recompute before accepting the segment Header; mismatch is header tampering. |
Frame "id" |
BLAKE3-256(deterministic-CBOR(frame-map without "id" and "sig")) |
"id" and "sig" only. |
All unknown frame keys participate. | Recompute for every frame; mismatch is
DamagedFrame. |
Frame "prev" link |
The "prev" value is included in the frame
"id" preimage. |
None beyond the frame "id" exclusions. |
Unknown frame keys do not change "prev" semantics but
are still in the frame "id" preimage. |
Compare to the previous item’s "id" within the same
segment; mismatch is BrokenChain. |
| COSE frame signature | Detached COSE_Sign1 over the frame "id" bytes. The COSE
Sig_structure is ["Signature1", protected, h'', frame-id];
the COSE payload field is null/detached. |
The signature is not part of the frame "id" preimage
because "sig" is excluded there. |
Extension keys affect the signature indirectly by changing the frame
"id". |
Verify with the key resolved by kid; report
valid, invalid, or
unverified. |
| Inline blob digest | BLAKE3-256(decoded blob bytes), after reversing
transforms and decryption when available. |
Frame envelope fields are not part of the blob digest. | Blob-public extension keys do not affect the blob digest, but they
do affect the containing frame "id". |
Compare with pub.digest when present and with graph
references that name the blob. |
| External blob digest | pub.digest names bytes stored elsewhere; the digest
subject is those external bytes. |
The external bytes are absent from the GTS frame, so only the digest
claim participates in the frame "id" through
"pub". |
Unknown public metadata participates in the frame "id",
not in the external blob digest. |
A verifier can check only when it obtains the external bytes. |
Index "head" |
The content-id of the last covered frame, where "count"
is the number of frames covered by the index payload. |
Not applicable. | Unknown index-payload keys participate in the index frame
"id", not in the "head" subject. |
Compare "head" to the covered frame id; mismatch
invalidates the index/layout claim. |
Index "mmr" |
Merkle-Mountain-Range root over the ordered frame ids covered by the index, using the leaf/parent/root preimages in §6.2. | The index frame itself is not covered unless a later index covers it. | Unknown index-payload keys participate in the index frame
"id", not in the MMR root. |
Use as an optional whole-covered-region commitment and proof root;
mismatch is IndexMmrError. |
| Detached signature provenance | stream:sourceFrame names the original frame
"id"; stream:cose carries the original
COSE_Sign1 bytes. The signature still verifies over the original frame
id. |
The rewritten frame’s new "id" is not the old signature
subject. |
Provenance graph extension terms do not change the original signature subject. | Verify carried signatures against stream:sourceFrame;
do not treat them as signatures over the compacted frame. |
22.2 Unknown extension-key behavior
An extension key is a text-string map key not defined by that map’s
CDDL production. Defined reserved keys such as "id",
"sig", "prev", "t",
"d", "x", "pub", and
"to" are not extension keys and MUST NOT be repurposed by
profiles.
Readers MUST include unknown extension keys when recomputing Header and frame preimages. A reader MUST NOT reject a Header, frame, codec, recipient, term, payload, opaque-node, diagnostic, or profile-registration map solely because it contains an unknown extension key. Unknown keys have no core fold semantics unless a supported profile or extension defines them.
Re-emit behavior depends on the operation:
- Byte-preserving operations such as raw
cat, copying, mirroring, or serving preserve unknown keys naturally because they preserve the original bytes. - A tool that decodes and re-emits a Header or frame while claiming to
preserve the same logical item MUST copy unknown extension keys verbatim
before recomputing
"id"values. - A tool that cannot preserve unknown extension keys MUST treat the
operation as lossy re-authoring, MUST recompute affected
"id"and"prev"values, and MUST NOT claim existing frame signatures remain attached to the rewritten frames. - A compactor or other re-authoring tool MAY preserve old frame signatures only as detached provenance (§10.1), where the old frame id remains the explicit signature subject.
Because extension keys participate in preimages, extension authors can add tamper-evident metadata without changing core GTS grammar. They cannot change header/frame grammar, hash preimages, signature subjects, or fold semantics (§2.1, §13).
23. References
23.1 Normative references
- [RFC 2119] Bradner, S., “Key words for use in RFCs to Indicate Requirement Levels”, BCP 14, March 1997.
- [RFC 8174] Leiba, B., “Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words”, BCP 14, May 2017.
- [RFC 8949] Bormann, C. and P. Hoffman, “Concise Binary Object Representation (CBOR)”, STD 94, December 2020.
- [RFC 8742] Bormann, C., “Concise Binary Object Representation (CBOR) Sequences”, February 2020.
- [RFC 9052] Schaad, J., “CBOR Object Signing and Encryption (COSE): Structures and Process”, STD 96, August 2022.
- [RFC 9053] Schaad, J., “CBOR Object Signing and Encryption (COSE): Initial Algorithms”, August 2022.
- [RFC 9277] Bormann, C. and M. Nottingham, “On the Use of Structured Suffixes in Media Types”, June 2022.
- [RFC 6838] Freed, N., Klensin, J., and T. Hansen, “Media Type Specifications and Registration Procedures”, BCP 13, January 2013.
- [RFC 3339] Klyne, G. and C. Newman, “Date and Time on the Internet: Timestamps”, July 2002.
- [BCP 47] Phillips, A. and M. Davis, “Tags for Identifying Languages”, September 2009.
- [BLAKE3] O’Connor, J., Aumasson, J-P., Neves, S., and Z. Wilcox-O’Hearn, “BLAKE3: one function, fast everywhere” (256-bit output used here).
- [RDF 1.2] W3C, “RDF 1.2 Concepts and Abstract Data
Model”, Candidate Recommendation Snapshot, 07 April 2026, https://www.w3.org/TR/2026/CR-rdf12-concepts-20260407/ —
the RDF terms, dataset model, triple-term, and
rdf:reifiessubstrate imported by §7.
23.2 Informative references
- [RFC 7049] Bormann, C. and P. Hoffman, “Concise Binary Object Representation (CBOR)”, October 2013 (obsoleted by [RFC 8949]; cited only for its legacy length-first “canonical” ordering, §4).
- [RFC 8610] Birkholz, H., Vigano, C., and C. Bormann, “Concise Data Definition Language (CDDL)”, June 2019.
- [RFC 9111] Fielding, R., Nottingham, M., and J. Reschke, “HTTP Caching”, June 2022 (the caching directives of §16.4).
- [RFC 6906] Wilde, E., “The ‘profile’ Link Relation
Type”, March 2013 (the
Accept-Profilefuture extension noted in §16.4).
GTS is intentionally a transport format, not an ontology or graph store. A conforming implementation preserves the append-only, content-addressed fold so independent projections can be regenerated from the same bytes.