Soundbites

March 2, 2026

I’ve been thinking of starting this new project, tentatively named “Soundbites”. Admittedly, my main use case for this is quite specific and the tool itself might not even be necessary for what I need.

Basically, I recently purchased this book called “German in 3 Months” by Sigrid-B. Martin, published by DK Hugo. So far, I think the book is pretty good (though I am an absolute beginner, so you are free to take whatever I say with a grain of salt). I remember looking at other German books in the language section of bookstores, but they commonly seem to:

Lack a substantial number of varied exercises, or
Provide little context, i.e., grammar.

The book gives a link to a free downloadable app, which provides a set of .mp3 files should you want to listen to them. Each audio file is essentially a sequence of pronunciation snippets of German words/phrases/lines, each separated by a pause.

Now, in addition to this book, I also use Anki as a practice tool, and I’d like to integrate these audio files, especially to help me solidify the pronunciation. So, I started thinking of building a tool that can take these audio files, find the points in time where speech “occurs”, and let the user enter the text for the corresponding snippet/s. This can then be used to generate a spreadsheet for Anki containing the data for each instance. An example of this is the one below:

der Tisch / die Tische, table/s [sound:der_Tisch_die_Tische.mp3]

I say “snippet/s” since the audio files can potentially contain different variants of a word, e.g., the singular and the plural, an adjective and its antonym, etc. Because of this, I also want to be able to control the exact rows this tool generates.

So if it “detects” speech at 00:00 to 00:02 and at 00:03 to 00:05, I, as a user, can either output one row for each of those snippets with the German + English labeling that I specified, or group those two snippets into one row, perhaps with a draggable UI or something. The CSV and the audio files can then be downloaded as one .zip file for importing into Anki.
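To make the detection and grouping steps above a bit more concrete, here is a minimal sketch in Rust. It is only one possible approach (a simple loudness threshold per frame), not a settled design. It assumes the .mp3 has already been decoded elsewhere into mono samples in the range -1.0 to 1.0, and the names Segment, detect_segments, and merge, as well as the threshold and pause parameters, are all hypothetical.

```rust
/// A detected stretch of speech, in seconds from the start of the audio.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Segment {
    pub start: f64,
    pub end: f64,
}

/// Finds speech segments in mono samples normalized to [-1.0, 1.0].
///
/// The audio is cut into frames of `frame_len` samples. A frame counts as
/// speech when its RMS level is at least `threshold`. A segment is closed
/// once the audio has stayed quiet for at least `min_silence` seconds.
/// All of these parameters are assumptions that would need tuning.
pub fn detect_segments(
    samples: &[f32],
    sample_rate: u32,
    frame_len: usize,
    threshold: f32,
    min_silence: f64,
) -> Vec<Segment> {
    if frame_len == 0 || sample_rate == 0 {
        return Vec::new();
    }
    let frame_secs = frame_len as f64 / sample_rate as f64;
    let mut segments = Vec::new();
    // The segment being built: its start and the end of its last loud frame.
    let mut current: Option<Segment> = None;

    for (i, frame) in samples.chunks(frame_len).enumerate() {
        let start = i as f64 * frame_secs;
        let end = start + frame.len() as f64 / sample_rate as f64;
        let rms = (frame.iter().map(|s| s * s).sum::<f32>() / frame.len() as f32).sqrt();

        if rms >= threshold {
            // Loud frame: open a new segment or extend the current one.
            let seg_start = current.map_or(start, |s| s.start);
            current = Some(Segment { start: seg_start, end });
        } else if let Some(seg) = current {
            // Quiet frame: close the segment once the pause is long enough.
            if end - seg.end >= min_silence {
                segments.push(seg);
                current = None;
            }
        }
    }
    // Audio may end while speech is still ongoing.
    if let Some(seg) = current {
        segments.push(seg);
    }
    segments
}

/// Groups the segments at `indices` into one clip spanning from the
/// earliest start to the latest end. Returns None if nothing was picked.
pub fn merge(segments: &[Segment], indices: &[usize]) -> Option<Segment> {
    let mut picked = indices.iter().filter_map(|&i| segments.get(i));
    let first = *picked.next()?;
    Some(picked.fold(first, |acc, s| Segment {
        start: acc.start.min(s.start),
        end: acc.end.max(s.end),
    }))
}
```

With something like this, two detected snippets can stay as two separate clips, or merge can collapse them into a single clip that includes the pause in between, which is roughly what the grouping UI would do behind the scenes.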

Just to keep it simple, I’ll be focusing on generating CSV files for now instead of integrating with something like AnkiConnect.
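As a rough idea of what the CSV generation could look like, here is a small Rust sketch using only the standard library. The Row type, the three-column layout (German, English, audio), and the file naming scheme are my own assumptions for illustration. The audio column uses Anki's [sound:filename] tag, which refers to a file by name in Anki's media folder.

```rust
/// One row of the export: the German text, its English meaning, and the
/// file name of the audio clip that goes with it. (Hypothetical layout.)
pub struct Row {
    pub german: String,
    pub english: String,
    pub audio_file: String,
}

/// Quotes a field when it contains a comma, a double quote, or a line
/// break, doubling any embedded quotes as CSV requires.
fn csv_field(value: &str) -> String {
    if value.contains(|c: char| matches!(c, ',' | '"' | '\n' | '\r')) {
        format!("\"{}\"", value.replace('"', "\"\""))
    } else {
        value.to_string()
    }
}

/// Derives a clip file name from the German label by joining its words
/// with underscores, e.g. "der Tisch / die Tische" becomes
/// "der_Tisch_die_Tische.mp3".
pub fn audio_file_name(german: &str) -> String {
    let words: Vec<&str> = german
        .split(|c: char| !c.is_alphanumeric())
        .filter(|w| !w.is_empty())
        .collect();
    format!("{}.mp3", words.join("_"))
}

/// Renders the rows as CSV text, one line per row.
pub fn to_csv(rows: &[Row]) -> String {
    rows.iter()
        .map(|r| {
            format!(
                "{},{},[sound:{}]\n",
                csv_field(&r.german),
                csv_field(&r.english),
                r.audio_file
            )
        })
        .collect()
}
```

For the table example, a row built from "der Tisch / die Tische" and "table/s" would come out as a single line ending in [sound:der_Tisch_die_Tische.mp3]. Quoting is handled by hand here to keep the sketch dependency-free; a real implementation would probably lean on a CSV library instead.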

In terms of the implementation: my initial idea, and what I’ve set up as a project so far, is a React frontend (built with Next.js, though I’m planning to use it just to generate a static site) and a Rust “backend”. At this point, it’s really just a service since I don’t need features such as persistence or auth for now. Plus, this gives me an excuse to use Rust for a project.
