feat: add epub skill v2 #28

Merged
magnus merged 1 commit from feat/epub-skill into main 2026-05-23 18:10:53 -04:00

Summary

Complete EPUB creation, editing, validation, and knowledge extraction skill for the Agent Skills open format. Built from EPUB 3.3 spec research, real-world testing on a 2.1MB commercial Apress EPUB, and Apple Books compatibility verification on macOS 26.

Files

epub/
├── SKILL.md                         (474 lines, ~4,900 tokens)
├── references/                      (9 files)
│   ├── accessibility.md
│   ├── agent-capability-discovery.md
│   ├── apple-books-compatibility.md  ← NEW
│   ├── epub-format-internals.md
│   ├── fixed-layout-epub.md
│   ├── media-overlays.md
│   ├── python-libraries.md           ← updated with epublib API quirks
│   ├── spec-and-validation.md
│   └── tutorials-and-guides.md
└── scripts/                         (11 scripts + test suite)
    ├── epub-batch
    ├── epub-convert
    ├── epub-cover                    ← NEW
    ├── epub-edit
    ├── epub-extract-knowledge
    ├── epub-images
    ├── epub-info
    ├── epub-repair
    ├── epub-scaffold                 ← rebuilt with Apple Books fixes
    ├── epub-text
    ├── epub-validate
    └── test_epub_skill.sh           (46 tests)

Feature Highlights

  • 11 CLI scripts following cli-builder patterns (--json, --dry-run, lazy deps)
  • Apple Books compatibility verified on macOS 26 — cover XHTML wrapper, OEBPS/ directory structure, no deprecated CSS, spine ordering patterns
  • LLM env var convention — scripts auto-detect EPUB_LLM_URL + EPUB_LLM_KEY and enable LLM features automatically; fall back to heuristic mode silently
  • epublib API quirks documented — 9 real-world findings from testing on a 2.1MB commercial EPUB
  • Knowledge extraction pipeline — heuristic + LLM modes, output as vault atoms, memory entries, or structured JSON
  • Full authoring pipeline — scaffold → write XHTML → inject chapters → clean placeholders → validate
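
The LLM env var convention above can be sketched roughly as follows. This is a minimal illustration, not the scripts' actual code: only the variable names `EPUB_LLM_URL` and `EPUB_LLM_KEY` come from this PR, while the function names `detect_llm_config` and `choose_mode` are hypothetical, and the rule that both variables must be non-empty is an assumption.

```python
import os
from typing import Mapping, Optional


def detect_llm_config(env: Optional[Mapping[str, str]] = None) -> Optional[dict]:
    """Return LLM settings when both variables are set and non-empty, else None.

    Hypothetical helper: the real scripts may apply different rules.
    """
    env = os.environ if env is None else env
    url = env.get("EPUB_LLM_URL", "").strip()
    key = env.get("EPUB_LLM_KEY", "").strip()
    if url and key:
        return {"url": url, "key": key}
    return None


def choose_mode(env: Optional[Mapping[str, str]] = None) -> str:
    """Pick 'llm' when configured; otherwise fall back to 'heuristic' without raising."""
    return "llm" if detect_llm_config(env) is not None else "heuristic"


if __name__ == "__main__":
    print(choose_mode())
```

Passing the environment as a mapping keeps the detection testable without mutating `os.environ`.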

QA Gate (46/46 tests passing)

  • Syntax checks — all 11 scripts
  • epub-scaffold: creates valid EPUB with CSS
  • epub-validate: structural checks pass
  • epub-info: metadata/manifest/spine/TOC dump
  • epub-text: text extraction per-chapter
  • epub-extract-knowledge: heuristic + atoms format
  • epub-edit: 8 subcommands (info, metadata, add/remove/reorder/inject/rename)
  • epub-images: list + extract + cover detection
  • epub-batch: multi-file info + extract-text
  • epub-convert: EPUB2→3 conversion
  • epub-repair: diagnose + fix
  • Dry-run on every destructive command
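
For readers unfamiliar with what a structural check involves, here is a small sketch of the container-level rules a Python fallback validator might test. It is not the code in epub-validate: the function names are hypothetical, the package path `OEBPS/content.opf` is assumed for illustration, and the checks cover only the `mimetype` entry and `META-INF/container.xml`, not the full specification.

```python
import io
import zipfile

# Assumed package document location; a real EPUB may use a different path.
CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def check_epub_container(data: bytes) -> list:
    """Return a list of container-level problems (empty list means none found)."""
    problems = []
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        infos = zf.infolist()
        if not infos or infos[0].filename != "mimetype":
            problems.append("mimetype must be the first entry in the archive")
        else:
            first = infos[0]
            if first.compress_type != zipfile.ZIP_STORED:
                problems.append("mimetype must be stored uncompressed")
            if zf.read(first) != b"application/epub+zip":
                problems.append("mimetype content must be 'application/epub+zip'")
        if "META-INF/container.xml" not in zf.namelist():
            problems.append("META-INF/container.xml is missing")
    return problems


def build_minimal_container() -> bytes:
    """Build an archive that satisfies the checks above (no package document)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # Written first and stored without compression.
        zf.writestr(zipfile.ZipInfo("mimetype"), "application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", CONTAINER_XML,
                    compress_type=zipfile.ZIP_DEFLATED)
    return buf.getvalue()


if __name__ == "__main__":
    print(check_epub_container(build_minimal_container()))
```
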
Complete EPUB creation, editing, validation, and knowledge extraction
skill for the Agent Skills open format. Built from EPUB 3.3 spec
research, real-world testing on a 2.1MB commercial Apress EPUB, and
Apple Books compatibility verification on macOS 26.

Scripts (11):
  epub-scaffold    — Create valid EPUB3 with cover XHTML, Apple Books CSS
  epub-edit        — Surgical editing (8 subcommands, epublib)
  epub-info        — Structure/metadata dump as JSON
  epub-text        — Clean text extraction, per-chapter or single-file
  epub-extract-knowledge — Heuristic + LLM extraction (env var auto-detect)
  epub-validate    — EPUBCheck or Python fallback validation
  epub-images      — List/extract all images with cover detection
  epub-batch       — Multi-file processing (extract-text, validate, metadata)
  epub-convert     — EPUB2→EPUB3 conversion with validation
  epub-repair      — Diagnose & auto-fix common structural issues
  epub-cover       — Add cover XHTML wrapper for Apple Books compatibility

References (9):
  epub-format-internals.md, python-libraries.md, spec-and-validation.md,
  tutorials-and-guides.md, agent-capability-discovery.md,
  fixed-layout-epub.md, accessibility.md, media-overlays.md,
  apple-books-compatibility.md (NEW — verified on macOS 26)

Test: 46/46 passing (test_epub_skill.sh)
magnus merged commit 7c8b0ece9f into main 2026-05-23 18:10:53 -04:00
jasper left a comment

🤖 Jasper (automated review)

Summary

This PR adds the EPUB skill v2 with 10 CLI scripts and 8 reference files. Comprehensive documentation, good test coverage, and well-structured scripts. A few issues to address before merging:

Findings

1. 🟡 --toc-visible is a no-op flag (epub-scaffold:255)
With action='store_true' and default=True, --toc-visible can never produce a different outcome than the default. The actual logic is toc_visible = not args.toc_hidden, so only --toc-hidden has effect. Either remove --toc-visible or change to action='store_false' with appropriate dest.
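
A short reproduction may make this finding concrete. The sketch below assumes the flag definitions match the description above; it is not copied from epub-scaffold, and the parser-building function names are hypothetical. The fixed variant shows one of the two suggested remedies: a single `--toc-hidden` flag that writes to `toc_visible` via `store_false`.

```python
import argparse


def build_buggy_parser() -> argparse.ArgumentParser:
    """Reproduces the no-op: store_true with default=True can only ever yield True."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--toc-visible", action="store_true", default=True)
    parser.add_argument("--toc-hidden", action="store_true")
    return parser


def build_fixed_parser() -> argparse.ArgumentParser:
    """One flag, one destination: visible by default, hidden on request."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--toc-hidden",
        dest="toc_visible",
        action="store_false",
        default=True,
        help="omit the table of contents from the reading order",
    )
    return parser


if __name__ == "__main__":
    buggy = build_buggy_parser()
    # Both invocations produce the same value, so the flag carries no information.
    print(buggy.parse_args([]).toc_visible, buggy.parse_args(["--toc-visible"]).toc_visible)

    fixed = build_fixed_parser()
    print(fixed.parse_args([]).toc_visible, fixed.parse_args(["--toc-hidden"]).toc_visible)
```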

2. 🟡 TOC parsing misses Section objects (epub-info:146)
isinstance(item, tuple) targets an older EbookLib API. Modern Section objects fall through both branches — the tuple check and the Link check (hasattr(item, 'href')) — silently dropping hierarchical TOC structure. Add a branch for iterable objects with .title but no .href.
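
One way the extra branch could look is sketched below. To stay self-contained it uses stand-in `Link` and `Section` classes rather than EbookLib's own types, so the shapes here (a `Section` that is iterable and has no `href`) are assumptions taken from this finding's wording, and `flatten_toc` is a hypothetical name, not the function in epub-info.

```python
from dataclasses import dataclass, field
from typing import Any, Iterable, List


@dataclass
class Link:
    """Stand-in for a TOC leaf entry (hypothetical, not the library class)."""
    title: str
    href: str


@dataclass
class Section:
    """Stand-in for a titled container without an href (hypothetical)."""
    title: str
    children: List[Any] = field(default_factory=list)

    def __iter__(self):
        return iter(self.children)


def flatten_toc(items: Iterable[Any], depth: int = 0) -> List[dict]:
    """Walk a TOC and keep hierarchy as a depth value on each entry."""
    entries: List[dict] = []
    for item in items:
        if isinstance(item, tuple):
            # Older shape: (section, children)
            head, children = item
            entries.append({"title": head.title,
                            "href": getattr(head, "href", None),
                            "depth": depth})
            entries.extend(flatten_toc(children, depth + 1))
        elif hasattr(item, "href"):
            entries.append({"title": item.title, "href": item.href, "depth": depth})
        elif hasattr(item, "title"):
            # Added branch: titled object with no href; recurse if it is iterable.
            entries.append({"title": item.title, "href": None, "depth": depth})
            try:
                children = list(item)
            except TypeError:
                children = []
            entries.extend(flatten_toc(children, depth + 1))
    return entries


if __name__ == "__main__":
    toc = [
        Link("Intro", "intro.xhtml"),
        Section("Part I", [Link("Chapter 1", "ch1.xhtml")]),
        (Section("Part II"), [Link("Chapter 2", "ch2.xhtml")]),
    ]
    for entry in flatten_toc(toc):
        print("  " * entry["depth"] + entry["title"])
```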

3. 🟡 atoms_written reported when no output file (epub-extract-knowledge:415)
When --format atoms is used without --output, atoms go to stdout but summary['atoms_written'] is still set to len(atoms). Consider a separate counter like atoms_emitted for the stdout path.
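
The suggested split between the two counters might look like the sketch below. Only the key names `atoms_written` and `atoms_emitted` come from this finding; the `emit_atoms` function, its parameters, and the one-JSON-object-per-line atom format are assumptions made for illustration.

```python
import json
import sys
from typing import IO, List, Optional


def emit_atoms(atoms: List[dict],
               output_path: Optional[str] = None,
               stream: Optional[IO[str]] = None) -> dict:
    """Write atoms to a file or a stream and report which path was taken.

    atoms_written counts atoms saved to a file; atoms_emitted counts atoms
    sent to the stream (stdout by default). Exactly one of them is non-zero
    for a non-empty input.
    """
    summary = {"atoms_written": 0, "atoms_emitted": 0}
    if output_path is not None:
        with open(output_path, "w", encoding="utf-8") as fh:
            for atom in atoms:
                fh.write(json.dumps(atom) + "\n")
        summary["atoms_written"] = len(atoms)
    else:
        out = sys.stdout if stream is None else stream
        for atom in atoms:
            out.write(json.dumps(atom) + "\n")
        summary["atoms_emitted"] = len(atoms)
    return summary


if __name__ == "__main__":
    report = emit_atoms([{"title": "example atom"}])
    print(report, file=sys.stderr)
```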

Positive highlights

  • Excellent test coverage (460-line test suite, 15 sections, dry-run & syntax checks)
  • Comprehensive reference files covering format internals, Apple Books compatibility, accessibility, and real-world API quirks
  • Clean cli-builder patterns (--json, --dry-run, --help, errors to stderr)
  • Good Apple Books compatibility focus with verified rules
  • AGPL boundary properly documented in SKILL.md
Reference
magnus/agent-skills!28