Text Extraction
- CURIE: gmeow:TextExtraction
- IRI: https://blackcatinformatics.ca/gmeow/TextExtraction
- Category: class
- Defined by: gmeow:slices/provenance
- Box roles: CBox role, TBox role
The text content extracted from a source object (e.g. a PDF attachment), linked to its source by gmeow:wasDerivedFrom. A faithful (ideally verbatim) rendering of the source's textual content, distinct from a gmeow:Summary, which condenses and is therefore lossy.
Structure
Subclass of: gmeow:Document
Practical Pattern
Use gmeow:TextExtraction as a specialized kind of gmeow:Document. Add statement metadata or a standpoint when the assertion needs provenance, confidence, or vantage.
Common Companion Terms
Usage Advice
Use when
- Use for text content pulled verbatim out of a source object (for example, by OCR or extraction from a PDF, image, or binary attachment) whose lineage to that source must be recorded.
Avoid when
- Avoid for a condensed or paraphrased account (use gmeow:Summary) and for the source document itself; a TextExtraction is the derived text view, not the original artifact.
How to use
- Type the extracted text as gmeow:TextExtraction, link it to its source with gmeow:wasDerivedFrom, and name the extracting activity via gmeow:wasGeneratedBy; record extraction confidence (OCR quality) on the statement layer.
Examples
- ex:pdfText a gmeow:TextExtraction ; gmeow:wasDerivedFrom ex:pdfAttachment .
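
The pattern under "How to use" can be sketched as a slightly fuller Turtle fragment that also names the extracting activity. This is an illustrative sketch, not a normative example: the gmeow: namespace IRI is inferred from the term IRI above, and the ex: namespace and ex:ocrRun are hypothetical names introduced here. No confidence property is shown, because this page does not name the statement-layer term for it.

```turtle
# Assumption: namespace IRI inferred from the term IRI on this page.
@prefix gmeow: <https://blackcatinformatics.ca/gmeow/> .
# Hypothetical example namespace.
@prefix ex:    <https://example.org/> .

# The extracted text, typed and linked to its source and to the
# activity that produced it (ex:ocrRun is a hypothetical activity).
ex:pdfText a gmeow:TextExtraction ;
    gmeow:wasDerivedFrom ex:pdfAttachment ;
    gmeow:wasGeneratedBy ex:ocrRun .
```

Keeping the source link and the activity link on the same subject means a consumer can answer both "where did this text come from?" and "what produced it?" from the extraction node alone.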