As a journalist, I often sit through hours of recorded interview audio, trying to catch nuance, but mostly just transcribing it for important quotes. Sure, there are services like Otter, but outsourcing transcription can be expensive, risky, or both. File privacy is a legitimate concern, and the accuracy doesn't always justify what these services charge. I've been looking for a transcription tool that I can control and that runs locally. If it came with a few extra features, that would be even better. That's how I came across FileWizard. This self-hosted, open-source app is astonishingly capable at wrangling files of all sorts, and it's easy to get up and running. Here's why FileWizard is a great choice not just for journalists, but for anyone in search of an easy transcription tool.
Self-hosting beats cloud services
Total ownership of your audio and transcripts
When you hand off your audio recordings to cloud-based transcription services, you basically cede control over your content. That is a nonstarter for sensitive interviews and private information. Self-hosting means you keep your audio files, transcripts, and metadata on your own machine or server. No third party has access to that information. FileWizard gives you exactly that control.
The app runs locally on your own home server using Docker. I wouldn't recommend installing it on a NAS if you plan to run OCR and Whisper-based transcription, as the experience can be a bit slow, but it's certainly doable. Once installed, the app is extremely straightforward to use. You drop in an audio or video file, and FileWizard extracts the audio, feeds it to a Whisper model, and returns the transcript as a text file. Simple.
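To make that flow concrete, here is a minimal Python sketch of the same three steps: extract the audio, run Whisper, write a text file. It is not FileWizard's actual code. It assumes ffmpeg is installed and uses the open-source openai-whisper package, and the function names (build_extract_command, transcribe_file) are hypothetical names chosen for illustration.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def build_extract_command(media_path: str, wav_path: str) -> list[str]:
    """Build an ffmpeg command that drops video and writes 16 kHz mono audio."""
    return [
        "ffmpeg", "-y",     # overwrite the output file if it already exists
        "-i", media_path,   # input audio or video file
        "-vn",              # ignore any video stream
        "-ac", "1",         # downmix to a single (mono) channel
        "-ar", "16000",     # resample to 16 kHz, the rate Whisper works with
        wav_path,
    ]


def transcribe_file(media_path: str, model_name: str = "small") -> Path:
    """Extract audio, run Whisper on it, and save the transcript as .txt."""
    # Imported here so the helper above works without the package installed.
    # Assumption: `pip install openai-whisper` has been run.
    import whisper

    source = Path(media_path)
    # Use a distinct name so a .wav input is never overwritten in place.
    wav_path = source.with_name(source.stem + ".16k.wav")
    subprocess.run(build_extract_command(str(source), str(wav_path)), check=True)

    model = whisper.load_model(model_name)    # downloads the model on first use
    result = model.transcribe(str(wav_path))  # dict that includes a "text" key

    transcript_path = source.with_suffix(".txt")
    transcript_path.write_text(result["text"].strip() + "\n", encoding="utf-8")
    return transcript_path
```

Calling transcribe_file("interview.mp4") would, under those assumptions, leave an interview.txt file next to the recording. FileWizard wraps this kind of pipeline in a web interface, so you never have to touch the command line.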
The app's capabilities are extensive: OCR for images, PDF text extraction, and conversion between formats such as PDF, plain text, Docs, and EPUB. In effect, it serves as a Swiss Army knife for content processing in one place. Because the app is entirely under your control, you also get to choose which Whisper model to use, depending on how much compute you have on hand or want to dedicate to the app. For example, you can pick a smaller Whisper model for faster results or for a low-powered server, or a larger model if you have the resources or need to process a long, complicated file.
Even better, the models are cached locally, so repeated transcriptions don't require downloading them again, and once a model is cached you can comfortably transcribe offline. FileWizard also keeps a full history of prior transcriptions, so you have an easily accessible log. You'll be surprised at how handy this is when working with a broad range of files that can quickly become difficult to track.
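The trade-off between smaller and larger Whisper models can be sketched as a simple selection rule. The memory figures below are the approximate requirements listed in the openai-whisper README, not measurements from FileWizard, and the function name choose_model and its parameters are hypothetical. FileWizard's own model list may differ.

```python
# Approximate memory needed per model size, in GB (openai-whisper README figures).
APPROX_MEMORY_GB = {"tiny": 1, "base": 1, "small": 2, "medium": 5, "large": 10}

# Model names ordered from fastest and least accurate to slowest and most accurate.
MODEL_ORDER = ["tiny", "base", "small", "medium", "large"]


def choose_model(available_gb: float, demanding_audio: bool = False) -> str:
    """Pick a Whisper model size for the given memory budget.

    Routine recordings are capped at "small" to keep transcription fast.
    Long or multi-speaker recordings get the largest model that fits.
    """
    fitting = [m for m in MODEL_ORDER if APPROX_MEMORY_GB[m] <= available_gb]
    if not fitting:
        raise ValueError("not enough memory for even the tiny model")
    if demanding_audio:
        return fitting[-1]
    cap = MODEL_ORDER.index("small")
    routine = [m for m in fitting if MODEL_ORDER.index(m) <= cap]
    return routine[-1]
```

With this rule, a machine with plenty of memory still uses the small model for a quick voice memo, and only moves up to the large model when the recording is long or has several speakers.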
Getting started with FileWizard
From quick installation to fast transcription in minutes
Getting FileWizard running is surprisingly painless. You can quickly spin up a container using Docker Compose. The app's GitHub page documents the installation process in detail, making it a cinch for anyone to get started. Once that's done, open the app's web interface in your browser. Drag in the file you want to transcribe or pick it from the file chooser, select the transcribe option, choose a model, and click Transcribe. That's all you need to do. FileWizard handles the rest: it runs Whisper and gives you a text file to download.
Here's how I use it. Most of my interview recordings happen either with a Plaud Notepin or an iPhone. The former, in particular, ties transcription to a subscription fee that I have no interest in paying. I export the audio files from either device, drop them into the interface, and hit Transcribe. Here's a pro tip: while the small model works well for most use cases, if your recording has multiple speakers or runs long, I'd highly recommend switching to the large model to avoid misrecognized words. I've also put FileWizard behind a reverse proxy, which lets me access it easily when I'm out and about or traveling for work.
FileWizard delivers fast and private transcriptions with zero subscription cost
FileWizard is a prime contender for journalists, students, or almost anyone who deals with interviews and lecture recordings. The versatile feature set makes it a must-install for anyone who regularly handles transcription, file conversion, or OCR. It certainly helps that the app is both open-source and self-hosted, which means no vendor lock-in, no surprise subscription fees, and no third-party services poking into your private data. In my own workflow, where accurate quotes matter and every word counts, FileWizard has become an indispensable tool. If you're looking for an alternative to paid transcription services, install FileWizard, try it on a short interview, tweak the model choice, and you'll see how well it fits into your workflow.
