Grove: A tool for human-AI collaborative writing
Here's the disclaimer: I'm writing this article in "real-time", which is to say I am typing it up as I am building this tool. Think of this as a lab visit where I show you my experimental setup, explain every decision, and let you draw your own conclusions.
If you're looking for a quick productivity hack, this isn't it.
This is a hands-on tutorial with downloadable command-line tools. You'll:
- Download a self-contained, statically linked Grove binary for your machine (Mac/Linux/Windows)
- Follow step-by-step examples with provided sample files
- See exactly how this workflow solves real writing problems
- Walk away with a working system you can use immediately
Download links: Mac | Linux | Windows
Note: There's also a web app for people who prefer not to use the command line. But this post is written for software engineers, so I'm focusing on the terminal-based workflow here.
Defining the Problem
I've been using LLMs to write everything for more than 6 months. Here are some pain points that existing writing tools don't really address.
- Effective working memory: ChatGPT and Claude both have a frustrating tendency to discard carefully negotiated nuance when you are generating anything more than a couple of paragraphs. When you request one edit, then another, then a third, the decisions from your first edit often get overwritten to satisfy the latest request.
- Context fragmentation: I need to work paragraph-by-paragraph with carefully curated context. Input context lengths may have gone up into the millions of tokens, but the size of reliably excellent output is way smaller. What I mean is: what's the largest chunk of output you can get that requires absolutely no editing? That's the real constraint I'm working within.
- Version archaeology: When an LLM generates a paragraph I don't like, I need to preserve both the output AND my reasoning about why it failed, because the LLM has no memory.
And that leads to...
- Context preparation overhead: Every LLM call requires assembling relevant context, and manually recreating this context each time is tedious and annoying. Going back and editing one message to branch the conversation does not cut it. I have started manually creating separate context files for different sections and pasting these chunks into a new conversation.
- Iterative refinement: I'm not looking to write an outline and have AI mechanically expand it. I want to take advantage of the capability to generate stupendous amounts of text in seconds. I actually want to sift through and see how different presentations of the same raw source material look and feel. I like mining the LLM for cool phrases. So I don't want to go in a straight line from outline to document. I want support for detours.
I'll work on ideas from the top down, or bottom up. I will go down reasoning chains that don't lead anywhere. I'll backtrack. I'll collect ideas from other authors. Basically I'm building an iceberg of information. The polished article you read represents only the tip: a carefully chosen subset of ideas floating above the surface. 90% of the iceberg is invisible: research notes, discarded drafts, supporting evidence, alternative phrasings, and the reasoning process that shaped each paragraph.
It's more accurate to say I'm using LLMs as the sequence-to-sequence translators they were intended to be: not fabricating ideas whole cloth, but going back to edit the input to this translation process, the source material, and then running the translator again.
The problem isn't that I have to do all this back and forth. Any serious writing requires extensive exploration and revision. The problem is that I have no good way to organize it. I want a system that preserves not just my content, but my editorial intelligence: the decisions, alternatives, and reasoning that transformed raw research into polished prose.
I need infrastructure for human-AI collaborative writing. I'm not just organizing thoughts; I'm organizing the iterative dialogue between human editorial judgment and AI generation capabilities. This is a fundamentally new writing workflow that existing tools aren't designed for.
My Proposal
- Every document you write gets a directory. One directory for a blog post, one directory for a roadmap, one directory for a to-do list.
- Every file in this directory starts with an output section at the very top. Everything below that output section is input: all the information, reasoning, alternatives, and research you used to decide what the output should be. The key difference from other transclusion-based tools is that the output is the only content that can be embedded in parent documents.
Each file looks like this:
---
id: "unique-identifier"
---
<output>
This content can be embedded in other files
</output>
# Regular markdown content below
Everything else - notes, reasoning, alternatives, research.
This is your workspace for developing the output section.
You can embed the output section of any document into a parent document with an embed tag.
<embed src="./introduction.md" id="EX5TcuGZYl8">
Content from the <output> section of introduction.md appears here
</embed>
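If you want a feel for what happens under the hood, here's a minimal sketch of embed resolution in Python. To be clear, this is not Grove's actual source; the regexes and function names are my own stand-ins based on the file format shown above.

```python
import re
from pathlib import Path

# <output>...</output> holds the only embeddable content in a file.
OUTPUT_RE = re.compile(r"<output>\s*(.*?)\s*</output>", re.DOTALL)
# <embed src="..." id="...">...</embed> marks where another file's output is spliced in.
EMBED_RE = re.compile(
    r'<embed src="(?P<src>[^"]+)" id="(?P<id>[^"]+)">.*?</embed>', re.DOTALL
)

def read_output(path: Path) -> str:
    """Return only the <output> section of a Grove-style file."""
    match = OUTPUT_RE.search(path.read_text())
    if match is None:
        raise ValueError(f"{path} has no <output> section")
    return match.group(1)

def resolve_embeds(path: Path) -> str:
    """Replace each embed block's body with the <output> of the file it points to."""
    text = path.read_text()

    def splice(match: re.Match) -> str:
        src = match.group("src")
        body = read_output((path.parent / src).resolve())
        return f'<embed src="{src}" id="{match.group("id")}">\n{body}\n</embed>'

    return EMBED_RE.sub(splice, text)

if __name__ == "__main__":
    print(resolve_embeds(Path("blog-post/main.md")))
```

Because only the <output> section ever gets pulled into a parent, the notes below it stay private to that file no matter how many documents embed it.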
The key insight is that the <output> section is not a summary of the file. It's the distilled result of all the thinking documented below it. As you move deeper into the tree, you find the complete reasoning process that produced each piece of output text.
You can create any structure you want this way. The system is very flexible. So far I have used two patterns:
- Imagine these embeds form a tree. The tree root file is a list of embed blocks containing paragraphs, figures, even titles. Yes, I might even have an embed block that contains a single sentence, then link to a file where I talk about trying several different versions.
- I'll also form a linked list that documents a chain of reasoning: prove A, then move on to if A then B, then if B then C, and so on.
Here's an example structure:
blog-post/
├── main.md # Your final article
├── introduction.md # Supporting file
├── examples.md # Supporting file
├── user-studies.md # Could be flat like this
└── research/ # Or in subdirectories if you prefer
└── competitor-analysis.md
One notable constraint in my implementation is that output blocks cannot contain embed blocks. This is intended to prevent cycles from forming.
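Enforcing that constraint is a one-pass check. Here's a rough sketch of what such a validation could look like; again, this is my own guess at an implementation, not Grove's code:

```python
import re
from pathlib import Path

OUTPUT_RE = re.compile(r"<output>(.*?)</output>", re.DOTALL)
EMBED_RE = re.compile(r"<embed\b")

def check_no_embeds_in_output(path: Path) -> None:
    """Reject files whose <output> section contains an embed block."""
    match = OUTPUT_RE.search(path.read_text())
    if match and EMBED_RE.search(match.group(1)):
        raise ValueError(f"{path}: <output> sections may not contain <embed> blocks")
```

With that rule in place, resolution never has to recurse into embedded content, so a chain of embeds can't loop back on itself.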
Meet Grove
To demonstrate that this organizational approach works, I built a command-line tool that automates the embedding process, plus a few more convenience features for working with this structure. But the core idea, splitting every file into input and output sections within a directory-based hierarchy, is independent of any specific tool. You can implement this structure with any text editor. There's nothing stopping you from organizing your files this way right now.
I just wrote a tool to automate the tedious copy-pasting.
The HN crowd will dismiss this as over-engineering because they're thinking about human writing, not AI-assisted writing. They don't understand that I'm trying to solve a workflow problem that only emerges when LLMs become your primary writing partner.
You're not just including text. You're creating a reusable context library that lets you efficiently make LLM calls without constantly rebuilding the same background information.
I get more use out of this structure than you might think, because I take it one step further: I'm using this file structure as part of a larger system where the AI becomes your research assistant AND your editor, working within constraints you define.
- The tree structure becomes a way to programmatically select context for LLM calls
- A standalone AI agent can navigate the tree itself to select relevant context
- The human reviews and edits the context selection
- Another LLM call combines the context with a writing focused prompt to generate edits to the specific output section
- The human reviews those edits
- The reasoning gets preserved in the input sections
By having a well-structured tree with clear input/output boundaries, I have created a human-AI collaborative loop where:
- AI selects relevant context from my tree
- I review/edit that selection
- AI generates content within defined boundaries (output sections)
- I review/edit the generated content
- Reasoning accumulates in input sections automatically
In short, the structure lets me:
- Let the AI help with context selection (step 1)
- Have clear boundaries for what the AI should be trying to change (output sections only)
- Preserve all the reasoning and alternatives in input sections
- Iterate efficiently without recreating context each time
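To make one turn of that loop concrete, here's a sketch in Python. Everything in it is hypothetical: list_tree, call_llm, and the prompts are stand-ins for whatever agent or API you actually wire up, not part of Grove.

```python
from pathlib import Path

def list_tree(root: Path) -> list[Path]:
    """Every markdown file in the document's directory is a candidate context source."""
    return sorted(root.rglob("*.md"))

def call_llm(prompt: str) -> str:
    """Placeholder for a real API call (Claude, GPT, a local model, ...)."""
    raise NotImplementedError

def one_revision_turn(root: Path, target: Path, instruction: str) -> str:
    # 1. AI proposes which files are relevant context for this edit.
    candidates = "\n".join(str(p) for p in list_tree(root))
    proposed = call_llm(
        f"Given these files:\n{candidates}\n"
        f"List the ones relevant to: {instruction}\nOne path per line."
    ).splitlines()

    # 2. Human reviews/edits the selection (here: accepted as-is).
    selected = [Path(line.strip()) for line in proposed if line.strip()]

    # 3. Combine the selected context with a writing-focused prompt, asking for
    #    changes to the target's <output> section only.
    context = "\n\n".join(p.read_text() for p in selected)
    draft = call_llm(
        f"Context:\n{context}\n\n"
        f"Current file:\n{target.read_text()}\n\n"
        f"Rewrite only the <output> section to satisfy: {instruction}"
    )

    # 4. Human reviews the draft; the reasoning behind accepting or rejecting it
    #    gets appended to the target file's input section by hand.
    return draft
```

The point isn't this particular code; it's that clear input/output boundaries make each step of the loop scriptable.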
Here's how I use it. Say I'm working on a blog post and want to revise the introduction:
- The section I'm working on is a separate file from the final article.
- I collect my reasoning and alternative versions in the input section.
- When I want to try another revision, I have all the context ready to send to Claude in a fresh conversation. That beats pushing a long chat thread forward: even with prompts asking the AI to "consider all previous edits", information further back in the conversation doesn't factor as strongly into predicting the next token.
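When I kick off that fresh conversation, the prompt is basically the whole section file. A tiny sketch of what I mean (the function name and wording are just illustrative):

```python
from pathlib import Path

def revision_prompt(section: Path, request: str) -> str:
    """Bundle a section file, input notes and all, with a fresh revision request."""
    return (
        f"{section.read_text()}\n\n"
        "Using the notes and alternatives above, rewrite only the <output> section.\n"
        f"Request: {request}"
    )

print(revision_prompt(Path("blog-post/introduction.md"), "make the opening hook sharper"))
```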