
// @daneroo
We “Tinkered” with long-form text, attempting to answer questions that are not well suited to simple RAG pipelines.
Why do you think this would be a good talk for this audience?
It relates to direct, “hands-on” experimentation and development: using LangChain(.js) and local LLMs (llama2/mistral) to perform Map/Reduce operations on long-form text.
“Every Tinkerer needs a workbench”

Using LangChain(.js)
const chain = loadSummarizationChain(model, { type: "refine" });
// refine: fold each new chunk into the running summary
summary = summarize(chunk1)
summary = summarize(chunk2, summary)
summary = summarize(chunk3, summary)
...
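Fleshed out, a minimal runnable sketch (the mistral model name, chunk size, and input file are assumptions; import paths are the 2023-era LangChain.js ones):

```ts
import { readFile } from "node:fs/promises";
import { Ollama } from "langchain/llms/ollama";
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";
import { loadSummarizationChain } from "langchain/chains";

// Local model served by Ollama; "mistral" is an assumption, llama2 works the same way.
const model = new Ollama({ model: "mistral" });

const text = await readFile("book.txt", "utf8"); // hypothetical input file
const splitter = new RecursiveCharacterTextSplitter({ chunkSize: 4000 });
const docs = await splitter.createDocuments([text]);

// "refine" folds each chunk into the running summary: one sequential LLM call per chunk.
const chain = loadSummarizationChain(model, { type: "refine" });
const { output_text } = await chain.call({ input_documents: docs });
console.log(output_text);
```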
When this is performed on a large number of chunks (>30), the running summary becomes very forgetful.
Repeatedly split, summarize, concat
level0Chunks = split(OriginalText)
level0Summaries = [...level0Chunks].map(summarize)
level1Txt = concat(level0Summaries)
level1Chunks = split(level1Txt)
level1Summaries = [...level1Chunks].map(summarize)
level2Txt = concat(level1Summaries)
...
Until levelNTxt is small enough.
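The sketch below transcribes that loop into runnable form, under the same assumptions as above (naive fixed-size splitter, one LLM call per chunk):

```ts
import { Ollama } from "langchain/llms/ollama";

const model = new Ollama({ model: "mistral" }); // assumed local model

// Map step: summarize a single chunk with one LLM call.
async function summarize(text: string): Promise<string> {
  return model.call(`Write a concise summary of the following:\n\n${text}`);
}

// Naive fixed-size splitter; a real run would use a LangChain text splitter.
function split(text: string, size = 4000): string[] {
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += size) chunks.push(text.slice(i, i + size));
  return chunks;
}

// Repeatedly split, summarize, concat, until a single chunk remains.
async function hierarchicalSummary(text: string): Promise<string> {
  while (split(text).length > 1) {
    const summaries = await Promise.all(split(text).map(summarize)); // map level
    text = summaries.join("\n\n"); // concat -> next level's input
  }
  return summarize(text); // levelNTxt is small enough: one final call
}
```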
This turns out to be a very effective approach, and, unlike refine, it produces a good summary of the whole text.
~10:1 reduction per level
| Level | Documents | Size (kB) |
|---|---|---|
| Original | 89 | 1336.85 |
| Level 0 | 213 | 179.04 |
| Level 1 | 23 | 16.70 |
| Level 2 | 3 | 1.96 |
Same as with summarization, refine is not suited for long text. Instead, map over the chunks, extracting characters from each as JSON:
chunk1: [
  { "name": "Dr. Yamada", "description": "A scientist" },
  { "name": "Kaito", "description": "A hacker" }
]
chunk2: [
  { "name": "Kaito", "description": "invaluable to Dr. Yamada" }
]
...
chunkN:
Then reduce, merging descriptions by name:
{
  "Kaito": ["A hacker", "invaluable to Dr. Yamada"],
  "Dr. Yamada": ["A scientist"]
}
“Kaito is a hacker, invaluable to Dr. Yamada”
Kaito is a young street-smart individual with a reputation within the underground networks, … Kaito joins the trio on their quest to stop The Architect.
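A minimal sketch of that reduce step (the type and function names are illustrative, not the talk's actual code; the per-chunk extraction itself would be one LLM call per chunk):

```ts
type Character = { name: string; description: string };

// Reduce step: merge per-chunk character lists into a single map,
// accumulating every description seen for each name.
function mergeCharacters(chunks: Character[][]): Record<string, string[]> {
  const merged: Record<string, string[]> = {};
  for (const chunk of chunks) {
    for (const { name, description } of chunk) {
      (merged[name] ??= []).push(description);
    }
  }
  return merged;
}

// With the chunks from the example above:
const chunk1: Character[] = [
  { name: "Dr. Yamada", description: "A scientist" },
  { name: "Kaito", description: "A hacker" },
];
const chunk2: Character[] = [
  { name: "Kaito", description: "invaluable to Dr. Yamada" },
];
console.log(mergeCharacters([chunk1, chunk2]));
// { "Dr. Yamada": ["A scientist"], "Kaito": ["A hacker", "invaluable to Dr. Yamada"] }
```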
Hero Of Ages - Characters - llama2 ↗️
| Character | Mentioned in (documents) |
|---|---|
| Vin | 77 |
| Elend | 55 |
| Sazed | 47 |
| Spook | 33 |
| Ruin | 30 |
| Breeze | 30 |
| Kelsier | 23 |
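How these counts were computed isn't shown; one plausible derivation (an assumption, not the talk's code) is the size of each character's merged description list:

```ts
// Assumption: a character's "mentioned in" count is the number of documents
// (chunks) that yielded a description for them, i.e. the merged list's length.
function mentionCounts(merged: Record<string, string[]>): Record<string, number> {
  return Object.fromEntries(
    Object.entries(merged).map(([name, descriptions]) => [name, descriptions.length])
  );
}
```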
// @daneroo
\[ \text{Daniel Lauzon} = \sum_{\text{meetups}} \text{people}_{\text{met}} + \epsilon \]

\[ \text{length}_\text{tot} = \sum_{\text{doc} \in \text{docs}} \text{length}(\text{doc}) \]
A thin wrapper for MathJax
\[P(E) = {n \choose k} p^k (1-p)^{ n-k} \]
\[ \frac{1}{\Bigl(\sqrt{\phi \sqrt{5}}-\phi\Bigr) e^{\frac25 \pi}} = 1+\frac{e^{-2\pi}} {1+\frac{e^{-4\pi}} {1+\frac{e^{-6\pi}} {1+\frac{e^{-8\pi}} {1+\ldots} } } } \]
\[ {}_{\text{Ottawa}}^{\,\quad\text{AI}} \text{Tinkerer}_{\text{Hackathon}}^{2023} \]