
I have a tendency to save articles whenever I see an interesting title, telling myself that I'll be sure to read them later. But the truth is, if I didn't have time to read an article in the moment, there's a good chance I won't have time later either. I've tried chipping away at my reading list, but it only ever seems to grow.

The first thing I tried was a read-it-later app. They're convenient because they work across devices and let you organize your list. But those apps didn't help me at all, because they simply store the links, and I still need to figure out which ones are worth reading. These days, there are a million things competing for our attention, and most people are short on time.

The answer to the problem was to summarize each article, allowing me to read the main takeaways in only a few minutes. From there, I can decide if any of the summarized articles are worth my time to read completely. With LLMs at our fingertips, this is the perfect kind of task to automate. Turning your reading list into digestible summaries allows you to stay on top of the content you're interested in while giving up as little time as possible.


A workflow that changes how I read

I save time and still manage to read more

My reading list is just a plain text document full of URLs, one per line. This makes the summarization workflow dead simple, although you could still make it work with read-it-later apps, as long as they have an export option or expose an API. An offline document keeps my reading list private, and it's easy to read from Python.
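Because the list is only a text file with one URL per line, adding to it from Python takes a few lines. The following is a minimal sketch rather than part of the original setup: the helper name `add_url` is hypothetical, and it assumes the file is called `reading_list.txt`, as in the script further down.

```python
from pathlib import Path

def add_url(url, path="reading_list.txt"):
    """Append url to the reading list unless it is blank or already listed.

    Returns True if the URL was added, False otherwise.
    (Hypothetical helper, not part of the original script.)
    """
    url = url.strip()
    if not url:
        return False
    list_path = Path(path)
    current = list_path.read_text() if list_path.exists() else ""
    # Skip URLs that are already on the list
    if url in {line.strip() for line in current.splitlines()}:
        return False
    with list_path.open("a") as f:
        # Make sure the new URL starts on its own line
        if current and not current.endswith("\n"):
            f.write("\n")
        f.write(url + "\n")
    return True
```

Calling `add_url("https://example.com/some-article")` twice would add the line once and return False the second time.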

It's the same story with the LLM. I'm running a local one because it's easier to tie into the automation, plus it's free and private. You could technically use something like ChatGPT or Claude, but it'd require access to their API, which costs money. Keeping things local ensures that none of your data touches a third-party server, and there's not much advantage to using a big-name LLM for simple summarization tasks.

I personally found this setup easier to implement on Linux. If you're using Windows, WSL will be your friend if you want to set this up the way I did. This isn't a strict requirement, and all this software is also available on Windows. Adjust for your preferences accordingly.

How it all actually works

Here are the nuts and bolts of the process

The setup I deployed relies on Ollama, Python, and a plain text file of URLs called reading_list.txt. Here's how I installed the local LLM and the Python dependencies:

# Install Ollama (the framework that runs a local LLM)
curl -fsSL https://ollama.com/install.sh | sh

# Download the local LLM (I use Llama 3, but there are many choices)
ollama pull llama3

# You'll also need two Python dependencies
pip install requests beautifulsoup4

The last piece of the puzzle is the Python script. This script fetches each URL found in my reading list and uses the BeautifulSoup package to strip away the HTML markup, along with elements like scripts, styles, and navigation. What's left is just the actual text of the article, which is sent to the local LLM with a prompt that asks it to summarize the content. You can adjust the prompt to your liking if you want briefer output, bulleted lists, and so on.

Here's the Python script:

import requests
from bs4 import BeautifulSoup
from datetime import date

OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "llama3"
READING_LIST = "reading_list.txt"
OUTPUT_FILE = f"digest_{date.today()}.txt"

def fetch_article_text(url):
    """Download a page and return its visible text, or None if the fetch fails."""
    try:
        r = requests.get(url, timeout=15, headers={"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/146.0.0.0 Safari/537.36"})
        r.raise_for_status()  # Treat 4xx/5xx responses as failed fetches
        soup = BeautifulSoup(r.text, "html.parser")
        # Drop elements that aren't part of the article body
        for tag in soup(["script", "style", "nav", "footer", "header"]):
            tag.decompose()
        # Keep only the first 8,000 characters of text
        return soup.get_text(separator=" ", strip=True)[:8000]
    except requests.RequestException:
        return None

def summarize(text):
    """Ask the local model for a short summary of the article text."""
    prompt = f"Summarize the following article in 3-5 sentences. Be concise and focus on the key points:\n{text}"
    r = requests.post(OLLAMA_URL, json={"model": MODEL, "prompt": prompt, "stream": False})
    r.raise_for_status()  # Stop if Ollama returns an error (e.g. the model is missing)
    return r.json().get("response", "").strip()

# Read the reading list, skipping blank lines
with open(READING_LIST) as f:
    urls = [line.strip() for line in f if line.strip()]

# Write one summary per URL to today's digest
with open(OUTPUT_FILE, "w") as out:
    for url in urls:
        text = fetch_article_text(url)
        summary = summarize(text) if text else "Could not fetch article."
        out.write(f"## {url}\n{summary}\n---\n")

open(READING_LIST, "w").close()  # Clear the list when done

The script limits the articles to the first 8,000 characters, effectively truncating very long articles.
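A hard cut at 8,000 characters can end in the middle of a word or sentence. If that bothers you, one option is to trim back to the last complete sentence. This is only a sketch of that idea: `truncate_at_sentence` is a hypothetical helper, not something the script above uses, and its sentence detection is deliberately naive (it looks for ". ", "! ", or "? ").

```python
def truncate_at_sentence(text, limit=8000):
    """Trim text to at most `limit` characters, ending on a sentence if possible."""
    if len(text) <= limit:
        return text
    clipped = text[:limit]
    # Find the last sentence-ending punctuation mark that is followed by a space
    cut = max(clipped.rfind(". "), clipped.rfind("! "), clipped.rfind("? "))
    if cut == -1:
        # No sentence break found: fall back to the last whole word
        cut = clipped.rfind(" ")
        return clipped[:cut] if cut > 0 else clipped
    return clipped[: cut + 1]
```

In `fetch_article_text`, this could replace the `[:8000]` slice by wrapping the `soup.get_text(...)` call instead.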

By default, Ollama's API listens at localhost:11434, which is where Python sends each article's text to be summarized. The script also clears reading_list.txt after running, so I can continue populating it with new URLs the next day. The file with all the summaries is saved as digest_YYYY-MM-DD.txt in the same directory, but of course all those details are easy to change if you'd like to edit the script.
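If you want to switch between summary styles without editing the prompt each time, the request body can be built by a small function. Treat this as a sketch under assumptions: `PROMPT_STYLES` and `build_payload` are hypothetical names, and the "bullets" and "brief" prompts are illustrative wording rather than prompts I've tuned. The `model`, `prompt`, and `stream` fields are the same ones the script already sends to /api/generate.

```python
# Illustrative prompt variants; only "paragraph" comes from the original script
PROMPT_STYLES = {
    "paragraph": "Summarize the following article in 3-5 sentences. Be concise and focus on the key points:",
    "bullets": "Summarize the following article as 3-5 short bullet points, one key point per bullet:",
    "brief": "Summarize the following article in one sentence:",
}

def build_payload(text, style="paragraph", model="llama3"):
    """Return the JSON body for a non-streaming request to /api/generate."""
    if style not in PROMPT_STYLES:
        raise ValueError(f"unknown style: {style!r}")
    prompt = f"{PROMPT_STYLES[style]}\n{text}"
    return {"model": model, "prompt": prompt, "stream": False}
```

Inside `summarize`, the call would then become `requests.post(OLLAMA_URL, json=build_payload(text, "bullets"))`.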

To set up automation, I put a line in cron to do the job. This one runs the script every day at 2 AM, so my summaries are ready for me by the time I get on my PC in the morning:

0 2 * * * cd /path/to/scripts && python3 summarize.py
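The `cd` in that cron line matters because the script uses relative file names, and cron doesn't start jobs in your scripts directory. An alternative is to have the script locate its files relative to itself. The following is a sketch of that approach, not what my script does: `resolve_paths` is a hypothetical helper, and it assumes the reading list and digests live next to summarize.py.

```python
from datetime import date
from pathlib import Path

def resolve_paths(script_file, today=None):
    """Return (reading_list, output_file) as absolute paths next to the script."""
    base = Path(script_file).resolve().parent
    today = today or date.today()
    return base / "reading_list.txt", base / f"digest_{today}.txt"

# In summarize.py this would be called as:
# READING_LIST, OUTPUT_FILE = resolve_paths(__file__)
```

With paths resolved this way, the job should behave the same regardless of which directory it is launched from.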

It's the almost-perfect solution

I'm reading more articles and managing to consume more content than I was before, so I consider this setup a win. However, it's not quite perfect. Sometimes the LLM misses a detail that I feel it should've mentioned, though tuning the prompt can help with that. The other shortcoming is that some sites don't play as nicely as others do with scrapers. I can't guarantee this script will work across every website, although it works for all the sites I frequent.
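One consequence of the scraping problem is that the script clears the whole reading list even when some pages couldn't be fetched, so those URLs are lost. A possible refinement is to keep failed URLs on the list for the next run. This is a sketch only: `partition_results` and `rewrite_reading_list` are hypothetical helpers, and the 200-character threshold is an arbitrary assumption for spotting pages that returned almost no text.

```python
def partition_results(results, min_chars=200):
    """Split (url, text) pairs into summarizable articles and URLs to retry.

    `text` is None when the fetch failed. Very short text often means the
    site served a paywall, consent page, or JavaScript-only shell.
    """
    ready, retry = [], []
    for url, text in results:
        if text and len(text) >= min_chars:
            ready.append((url, text))
        else:
            retry.append(url)
    return ready, retry

def rewrite_reading_list(path, retry_urls):
    """Replace the reading list with only the URLs that still need a retry."""
    with open(path, "w") as f:
        for url in retry_urls:
            f.write(url + "\n")
```

The main loop would collect `(url, fetch_article_text(url))` pairs, summarize the `ready` ones, and call `rewrite_reading_list(READING_LIST, retry)` instead of emptying the file.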

This won't replace reading, and that wasn't even my initial goal with this setup. The idea is to help me decide which articles are worth reading, and this delivers those results almost flawlessly. If you're overwhelmed with your reading list and want an in-house solution, install the dependencies, grab my Python script, and get ahead of your list.

Ollama

Ollama is a framework that allows you to download and run various LLMs on your computer.


URL: https://www.xda-developers.com/automate-read-it-later-workflow-with-local-llm-to-summarize-articles/

I automated my entire read-it-later workflow with a local LLM so every article I save gets summarized overnight
