
How does AI change how we work? Everyone's asking that question right now. But thinking about it gets you nowhere. You only feel the difference by doing it -- experimenting, breaking things, adjusting, going again.
Since the ChatGPT moment in late 2022, I've been running that experiment across different fields: what changes when you don't just use AI, but actually rebuild the process around it? Better? Worse? Different? You have to find out.
The field this time: how I research and write technical articles in our domain. Not theory -- what has actually changed.
The tipping point for getting serious about it was Claude Code. I'd used it for months, purely for coding. Then sometime in November 2025 the thought landed: this isn't a coding tool. It's a workflow tool. For everything. The fact that Anthropic now calls it "Claude Code + Cowork" confirms exactly that shift. Three months in.
For 15 years I've been doing roughly the same thing: helping teams understand that the tool doesn't bring the change -- the workflow the tool enables does. That was true when we moved from wiki pages and shell scripts to Puppet and Chef. True with Terraform and Pulumi -- one workflow that works the same across AWS, Azure, Google, and your own data center. True with CI/CD and machine-readable BDD tests all the way to continuous deployment. Now AI is the new Swiss Army knife that can improve many of those workflows again.
And at some point I noticed: I've been helping teams with exactly that pattern for years -- and was still writing the way I did two years ago. Research: manual. Source-finding: manual. Fact-checking: manual. Solid. Slow. Limited by the hours in a day.
The Dark Factory series wouldn't have happened that way. Three parts, 30+ primary sources each, five countries. Too much work alongside the day job.
So I started applying the patterns from software development to writing. Write a spec, let the agent work, review the output critically. Sounds cleaner than it is.
Ethan Mollick writes every post himself. AI gives feedback on the finished draft. Simon Willison publishes commit hashes -- you can trace his AI-generated text back to the exact prompts. Tiago Forte says openly: Claude drafts 90% of his writing.
Three people, three completely different approaches. Mine is different again. And probably will be different again in six months.
Everything starts with a brief. Not a prompt. A document I write myself: what do we already know? What's missing? What sources would have to exist if the hypothesis holds?
Sounds trivial. It isn't.
Knowing what you don't know -- and formulating it precisely enough for a machine to work with -- is the part I can delegate least. A poor brief produces plausible-sounding results that miss the actual questions entirely. A good one saves hours.
The brief goes to Gemini Deep Research, ChatGPT, and Claude. All three. Gemini turns up recent studies. ChatGPT performs well on US sources. Claude on structured synthesis. What only one model finds, I treat with more skepticism than what all three return independently.
Important: raw material. Not truth.
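The skepticism rule -- trust what all three models return independently, verify what only one finds -- can be sketched as a tiny consensus check. The model names are real, but the claim strings and the data structure are purely illustrative, not actual model output:

```python
# Hedged sketch of the cross-model consensus rule, not a real pipeline.
# In practice, "claims" would be normalized statements extracted from
# each model's research report.

from collections import Counter

def consensus_rank(findings_by_model: dict[str, set[str]]) -> list[tuple[str, int]]:
    """Return (claim, model_count) pairs, most-corroborated first."""
    counts: Counter[str] = Counter()
    for claims in findings_by_model.values():
        counts.update(claims)
    return counts.most_common()

# Illustrative fixture -- these are not real research results.
findings = {
    "gemini":  {"junior postings declining", "eu data sparse"},
    "chatgpt": {"junior postings declining"},
    "claude":  {"junior postings declining", "eu data sparse"},
}

for claim, n in consensus_rank(findings):
    flag = "corroborated" if n == len(findings) else "verify manually"
    print(f"{n}/3  {claim}  -> {flag}")
```

A claim found by all three models is still raw material -- but a claim only one model surfaces goes straight to manual verification.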
Sometimes the research reveals a gap. For the Dark Factory series, we had plenty of US and UK data on the junior pipeline crisis but almost nothing from Central Europe. So: a follow-up brief, specifically for European labor market data -- Bitkom, ZEW, IAB, ICT-Berufsbildung Schweiz. That follow-up research substantially changed the third article. Without it, the European section would have been thin.
Then it gets manual. Going through research results, marking what holds up, what's shaky, what gets cut. And asking: which patterns can carry an article?
Pflichtenheft-style thinking -- the classic German requirements-specification discipline -- and Spec-Driven Development are essentially the same pattern, thirty years apart. Or dual apprenticeship training as a structural equivalent to Molly Kinder's residency model. Those connections I draw myself. They don't come from the machine. At least not yet.
From the patterns comes a structure brief: which arguments in which order, which data supports what, what tone. Then I write the article with Claude Code.
Yes. Exactly Level 4 from the Dark Factory article. Write a spec, let the agent work, review the output. The blog doesn't just describe this workflow -- it is this workflow.
No draft ships without multiple rounds. Does the number match the source? Primary source or secondary? Is a claim empirically backed or my interpretation -- and is that distinction clear? Do the links work?
For the Dark Factory series: five rounds.
One example. Gemini delivered "54% decline in junior job postings in Germany," citing Indeed Hiring Lab. The number couldn't be verified from the linked source. Cut. Replaced with a qualitative trend statement that can be backed up.
Fact-checking takes the most time. It also creates the most value. AI accelerates the research; verification stays manual. Definitely.
Hero images and infographics: Gemini Image Generation. Each image gets checked against the current text -- when text changes through fact-checking, an image can become invalid. For the Dark Factory series I generated 15 infographic variants. Still experimenting a lot there.
After five, six, seven iterations, the models are confident: everything checks out. Sources verified, numbers backed, no open points. Exactly then you need a check that isn't stuck in the same system.
I run the finished article through GPTZero. Not because of the AI score -- that sits at 95-100%, and that's honest. What I'm after: the plagiarism score as a proxy for faulty citations. And sure enough, even after multiple fact-checking rounds where I and the models were convinced everything was clean -- broken references. Links going nowhere. Attributions with one too many layers of indirection.
Since then, I also run a custom-built reference checker agent over all links. Each URL is checked: HTTP status, specificity, content match. For the Dark Factory series, that turned up over 40 problems. Across five articles. Most common: AI-hallucinated URL slugs. The domain is right, the path is invented. Looks fine at first glance. Goes nowhere.
Highest plagiarism score in GPTZero so far: 4%. Every match turned out to be correct citation, not unattributed content. But the path there showed something: the AI-assisted loops alone aren't enough. You need a completely independent external checkpoint.
Though -- even with the external check I'm not satisfied yet. The 1-4% plagiarism score is misleading. Some 90% of it is verbatim quotes, correctly attributed, with footnote and link. GPTZero finds a text match with the source and flags it. Fairly, from the tool's perspective: it sees identical text and can't distinguish between plagiarism and clean citation.
The problem runs deeper. My citations are Hugo footnotes -- prose text at the bottom of the article. Readable by humans. A blob for machines that requires parsing and interpretation to establish the connection between an inline marker and a source. There's no structured way for a tool to say: "The sentence on line 47 is a quote from source 3, correctly attributed, not plagiarism."
What's needed: machine-readable metadata. BibTeX, CSL-JSON, structured frontmatter -- something a plagiarism checker or fact-checking agent can consume without parsing prose. Hugo footnotes were the pragmatic choice for speed and simplicity. But they're a dead end if you want to make the citation process automatically verifiable. That piece is still missing from my pipeline.
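As a sketch of where this could go -- the field names below are my own assumption, not an existing Hugo convention or a standard CSL mapping:

```yaml
# Hypothetical citation frontmatter -- schema invented for illustration.
references:
  - id: ref-3
    type: article
    author: "Erik Brynjolfsson"
    title: "The Productivity J-Curve"
    url: "https://example.org/j-curve"
    accessed: "2026-01-15"
quotes:
  - ref: ref-3          # which source the inline marker points at
    verbatim: true      # tells a checker: identical text is a citation,
                        # not plagiarism
```

With something like this, a plagiarism checker could resolve a text match to a declared quote instead of flagging it.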
Everything versioned. Drafts, briefs, research results, editing rounds. The complete history is traceable, from the first brief to the final text. Simon Willison's idea, adapted to my workflow.
Research depth that was previously unrealistic. The Dark Factory series cites 30+ primary sources per article. Five countries. Two languages. Harvard, Brookings, McKinsey, Bitkom, ZEW, IAB. Every single source manually verified.
Without AI that would have been a full-time research project. With AI: intensive, but manageable alongside the day job. Wider reach in research, same quality standards in verification.
Three months in, some things work, others don't.
Multi-model research definitely delivers broader coverage than any single source. The fact-checking loops catch hallucinations reliably. Git makes the process traceable. So far, so good.
What still doesn't work: the transition from research to structure is too manual. Infographics need too many iterations. And I don't have a good workflow for the English version -- rewrite, not translation. Something's still missing there.
And then the things where I genuinely have no idea. How does all this change when the models are better in six months? What happens when colleagues use the same workflow -- does it scale? What kind of article definitively doesn't work this way? We'll see.
The writing process produces articles. But it also produces research results, briefs, source collections, patterns. And those now live in git repos and local files and my head.
Previously: write a Confluence page. Drop a link in, add an architecture example, capture the insight. Done. Out of date the moment you save it. And after three months nobody finds it anymore. Not because the page was bad -- but because the search doesn't keep up and the structure doesn't scale. Anyone who has worked in a wiki knows this.
So a new question appears: how do you build a knowledge base that can keep pace with the speed at which we now research?
I can see an architecture taking shape. Not finished yet, I'll say that. But the building blocks are becoming visible.
Taxonomy. Not the kind someone draws on a whiteboard in a workshop and nobody uses afterwards. Something living. I'm building a small application that evaluates my LinkedIn feed -- scoring, a first simple taxonomy for clustering trends, tools, and ideas. Rough around the edges. But it already shows what's noise and what's signal.
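The scoring core of such an app can be reduced to a few lines. The keyword weights below are invented for illustration -- my actual taxonomy and scoring are more involved and still changing weekly:

```python
# Toy feed scorer -- weights and keywords are illustrative assumptions,
# not the author's actual taxonomy.

SIGNAL = {"rag": 2, "terraform": 2, "spec-driven": 3, "vector": 2}
NOISE = {"congrats", "webinar", "hiring"}

def score(post: str) -> int:
    """Positive: worth reading. Negative: scroll past."""
    words = post.lower().split()
    s = sum(SIGNAL.get(w, 0) for w in words)
    s -= sum(1 for w in words if w in NOISE)
    return s

posts = [
    "spec-driven development with terraform modules",
    "congrats on the webinar",
]
for p in posts:
    print(score(p), "signal" if score(p) > 0 else "noise", "-", p)
```

Crude, yes. But even this level of scoring separates signal from noise well enough to see trends cluster.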
Vector databases and RAG. The whole toolchain to make a knowledge base not just searchable but capable of finding connections. Interesting what Redis is doing in the AI space right now -- I had a call with a Redis Solution Architect about their vector DB path. Or what people are building on retrieval patterns with other tools. Obsidian with taxonomies as personal knowledge management that embeds into a team workflow. All approaches I'm evaluating.
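The retrieval pattern underneath all of these tools is simple enough to show in miniature. The vectors below are toy numbers standing in for real embeddings; in practice an embedding model produces them and a vector DB (Redis, for instance) does the nearest-neighbor search:

```python
# Minimal retrieval sketch in pure Python -- toy embeddings, no vector DB.
# Illustrates the pattern only; real setups use an embedding model and
# an index instead of a linear scan.

import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy "embeddings" of knowledge-base notes (values invented).
notes = {
    "spec-driven development":        [0.9, 0.1, 0.0],
    "junior pipeline, eu labor data": [0.1, 0.8, 0.2],
    "terraform module patterns":      [0.0, 0.2, 0.9],
}

def retrieve(query_vec: list[float], k: int = 1) -> list[str]:
    """Return the k notes most similar to the query vector."""
    ranked = sorted(notes, key=lambda n: cosine(query_vec, notes[n]),
                    reverse=True)
    return ranked[:k]

print(retrieve([0.05, 0.15, 0.95]))
```

That is the whole trick: the query never has to match the note's words, only its meaning as the embedding model encodes it. Everything else -- chunking, reranking, the RAG prompt -- builds on top.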
Bot integration. The knowledge needs to go where people work. Slack. Teams. Not "go to the wiki and search."
And then the two questions that always come up first with customers.
Local AI versus frontier models. Customer research, internal strategy, confidential architecture patterns -- that can't go to OpenAI. For general research and synthesis I need the frontier models. The boundary isn't technical. It's organizational and regulatory. And different for every customer.
Cost. Vector DB hosting, API calls, embedding models, bot infrastructure. For my own workflow: manageable. But how does that scale in a customer project with 20 people? No idea. Still need to work through the numbers.
All of this is work in progress. But the direction is getting clearer: knowledge that emerges during research needs to flow into a structure that makes it reusable. Not as a Confluence page that dies after three months.
The EU AI Act requires disclosure for AI-generated content from August 2026. I'm not waiting for that.
Globally, there's no established approach. Most people stay quiet about their AI usage. Others slap a disclaimer at the bottom. I'm trying a third thing: describe the process. Including the parts that don't work yet.
The blog is my test field. But the real application is elsewhere.
If you know Team Topologies, you know the Enabling Team: a team that temporarily closes capability gaps in other teams -- then moves on. Training, coaching, pair programming. Transfer knowledge until the other team is self-sufficient. That's exactly what we do at Infralovers. We're an Enablement Team to Hire.
The core of that model is people. Curriculum design, coaching, reading the room, understanding where a team actually stands and what it needs next. No AI system replaces that. But the model has a structural problem: when the enabling team leaves, a large part of the context knowledge leaves with them. The training materials stay. The Confluence pages stay. But the ability to find the right information at the right moment -- that was bound to people.
And that's where it gets interesting.
What if the research results, the architecture patterns, the decision rationale from an enablement engagement don't disappear into a wiki, but flow into a knowledge base that's contextually retrievable? Not "search the wiki." Instead: you're working on a Terraform module, and the system knows which patterns apply in your organization, which decisions the enabling team discussed with your lead, which mistakes other teams already made.
RAG over the internal knowledge base. Adaptive learning paths that track what a team already knows and what's still missing. Bot integration in Slack or Teams that answers when the coach isn't there anymore. Those are the building blocks I described in the previous section -- vector databases, taxonomies, retrieval patterns. Just not for my blog workflow, but for knowledge transfer in organizations.
The idea isn't to replace the human part. The idea is to extend the half-life of an enablement engagement. The coaching ends. The knowledge base stays. And it answers.
Will that work? No idea. I'm assembling the pieces right now. First for my own workflow, then for our team, then we'll see where it holds up for customers. But the direction is clear: AI-assisted information retrieval will change how enabling teams operate. Not whether -- how, and how long the effect lasts.
Are you interested in our courses, or do you simply have a question that needs answering? You can contact us at any time! We will do our best to answer all your questions.
Contact us