My subscription to Good Tape, the transcription service I’d used for nearly three years, expired today. I pulled up the bills and did the math: €476 in total, roughly NT$17,000. I was on the Pro plan, and this month’s usage showed 20 hours still unused.
Why I Wanted to Build My Own
Because AI agents are advancing so fast! They’ve already changed how I work. I also have frequent cross-language meetings: Taiwan-Japan, Chinese-English, and occasionally with Southeast Asian partners.
To be fair, three years ago Good Tape was a good tool. Built by a Danish team, it emphasized security and accuracy. But it solves an “after the fact” problem: you record audio, upload it, then wait for the transcript to process. No real-time recognition, no translation, no summary.
Over three years I paid €476 (about NT$17,000) for transcription alone. Having that capability back then was already impressive, but what I want now is to have it “in the moment”: when someone speaks Japanese during a meeting, I see the Chinese in real time, instead of slowly organizing notes after the meeting.
Current real-time translation competitors on the market:
- Transync AI — $8.99/month (10 hours), the closest to what I wanted, with real-time voice translation + meeting summaries + 60 languages. But it requires installing an app, and exceeding your hours means buying additional hour packs (starting at $7.99/10hr). The more you use, the more expensive it gets.
- JotMe — $9-15/month, 107 languages, but tied to a Chrome Extension
- Wordly — enterprise pricing, hourly packs, starting at 10 hours
- KUDO — annual licensing, undisclosed pricing, targeting large enterprises
- Palabra — requires installing a desktop app, tied to specific meeting software
What I wanted was actually quite simple: open the browser and use it, no installation, works on phone or computer, lets me understand foreign colleagues with AI assistance during meetings, with transparent and controllable costs. As I shared the other day, I decided to build my own.
The Tool Is Called “Real-Time Meeting Notes | Agora,” Deployed on My Personal Website
- 🎙 Real-time voice recognition — text appears as you speak, not after recording finishes
- 🌍 Real-time translation in 12 languages — including Chinese, English, Japanese, Korean, Vietnamese, Thai, Indonesian, German, Spanish, French, and Portuguese
- 📋 AI meeting summaries — one click generates key points + action items + decisions
- 📖 Glossary — custom professional terminology mapping to ensure translation consistency
- 🖥 Subtitle mode — full-screen black background with large text, for projecting in meeting rooms
- ⬇️ Full-text export — TXT / CSV, can be imported into Excel
- 💰 Real-time cost tracking — how much each API call costs, transparent and visible
- 🔐 Three-tier authentication — Google / LINE / Facebook OAuth + invite code
The front end is 2,533 lines, the back end 2,148 lines. One HTML file plus one Cloudflare Worker.
The Most Interesting Technical Part: Three-Way Voice Recognition Routing
Voice recognition isn’t just a matter of picking one API. Different languages have different optimal solutions. The engine switches automatically based on language:

- 🇹🇼 Chinese → Qwen3-ASR (Alibaba Cloud’s Qwen team, WebSocket streaming)
- 🇺🇸 English → whisper-large-v3-turbo on Groq (LPU hardware acceleration, 200×+ real-time speed)
- 🌐 Other languages → Deepgram Nova-3 (WebSocket streaming)
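As a rough illustration (not the actual Agora code), the routing rule above could be sketched like this in TypeScript. The function name `pickAsrRoute`, the engine labels, and the assumption that languages arrive as tags such as "zh-TW" or "en-US" are all made up for the sketch. The transport for English is "chunked-http" because Groq is a REST API rather than a WebSocket.

```typescript
// Engine labels are illustrative names, not official API model identifiers.
type AsrEngine = "qwen3-asr" | "groq-whisper-large-v3-turbo" | "deepgram-nova-3";
type Transport = "websocket" | "chunked-http";

interface AsrRoute {
  engine: AsrEngine;
  transport: Transport;
}

// Assumption: the language is given as a tag like "zh-TW", "en-US" or "ja".
function pickAsrRoute(languageTag: string): AsrRoute {
  // Only the primary subtag ("zh", "en", ...) decides the route.
  const primary = languageTag.trim().toLowerCase().split("-")[0];
  switch (primary) {
    case "zh":
      return { engine: "qwen3-asr", transport: "websocket" };
    case "en":
      // Groq is called over REST, so audio is sent in chunks over HTTP.
      return { engine: "groq-whisper-large-v3-turbo", transport: "chunked-http" };
    default:
      // Every other language falls through to Deepgram.
      return { engine: "deepgram-nova-3", transport: "websocket" };
  }
}
```

Keeping the decision in one small pure function means the front end has a single place to change when an engine is swapped.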
All translation goes through Claude Haiku 4.5 (Anthropic) with streaming output, so the translation appears word by word instead of only after the whole passage is finished. Beyond output quality, cost also drove the choice of recognition engines:
- Groq: $0.02/hr, cheapest for English
- Qwen: ~$0.40/hr, 97%+ recognition rate for Chinese, accurate even for professional terminology (dialects also supported)
- Deepgram: $200 free credit, handles all languages
A one-hour Chinese-English meeting costs about US$0.50 in API fees, or NT$16. Put another way, the €476 I spent on Good Tape would cover over 950 meetings with my self-built tool. A year of Transync AI at $8.99/month comes to $108; the same money buys 216 meetings.
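The arithmetic can be checked with a few lines of TypeScript. This sketch uses only the figures quoted in this article; the constant and function names are invented for illustration. Working purely in NT$ gives 1,062 meetings for the Good Tape money, which is consistent with "over 950" (the exact count depends on the exchange rate used).

```typescript
// All figures are taken from the article; names are illustrative only.
const USD_PER_MEETING = 0.5;         // about one hour, Chinese-English
const TWD_PER_MEETING = 16;          // the article's conversion of US$0.50
const GOOD_TAPE_TOTAL_TWD = 17000;   // €476 over three years, as converted in the article
const TRANSYNC_USD_PER_MONTH = 8.99;

// Whole meetings that fit into a budget at a fixed cost per meeting.
function meetingsAffordable(budget: number, costPerMeeting: number): number {
  return Math.floor(budget / costPerMeeting);
}

// One year of Transync AI, rounded to whole dollars as in the article ($108).
const transyncYearUsd = Math.round(TRANSYNC_USD_PER_MONTH * 12);

const meetingsForGoodTape = meetingsAffordable(GOOD_TAPE_TOTAL_TWD, TWD_PER_MEETING); // 1062
const meetingsForTransync = meetingsAffordable(transyncYearUsd, USD_PER_MEETING);     // 216
```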
But What This Article Really Wants to Say Isn’t About Technology
Building this tool was really a process of learning “how to collaborate with AI.” I’m not an engineer. My background spans life sciences, theology, agricultural e-commerce, the circular economy, and more. Writing code is very difficult for me. My first startup did build Fintech SaaS, but the entire tool and service was built with the help of a seven-person team.
I have a feeling that collaborating with AI requires more than just coding ability—it should be a new kind of literacy (I still can’t quite articulate it).
Breaking Down Problems Matters More Than Writing Code
Integrating Groq doesn’t happen with a single “add Groq for me.” I broke it into two phases: In Phase A, the back end first builds, deploys, and verifies that the API endpoint works. In Phase B, the front end then handles language routing, switching engines automatically based on the selected language.
Each phase is independently verifiable, so when something breaks, only half breaks rather than everything. This breakdown wasn’t something AI taught me; I learned it from repeated failures: trying to do too much at once, then running out of tokens midway or having the context compressed, which damaged the parts that were already done correctly.
Asking the Right Questions Is More Effective Than Telling the AI to Just Write It
It’s not about saying “build me a translation tool.” Instead: “The existing WebSocket proxy pattern can’t be used with Groq, because it’s a REST API, not a WebSocket. The front end needs to switch to chunked HTTP mode, POSTing a segment of audio every 3 seconds. Will the onstop + restart cycle have a race condition?”
Only this kind of question gets useful answers.
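To make the race-condition question concrete, here is one possible shape for the stop-and-restart cycle, written as a sketch rather than as the code that actually runs in Agora. `RecorderLike` is a hypothetical minimal interface modeled on the parts of the browser's MediaRecorder the cycle needs, and `ChunkCycler`, `makeRecorder`, and `upload` are invented names. The idea is that each cycle owns its own recorder and buffer, and a restart happens only if nobody stopped the session or started another recorder in the meantime.

```typescript
// Hypothetical interface, modeled on a subset of the browser's MediaRecorder.
interface RecorderLike<T> {
  start(): void;
  stop(): void;
  ondataavailable: ((event: { data: T }) => void) | null;
  onstop: (() => void) | null;
}

class ChunkCycler<T> {
  private makeRecorder: () => RecorderLike<T>;
  private upload: (parts: T[]) => void;
  private current: RecorderLike<T> | null = null;
  private running = false;

  constructor(makeRecorder: () => RecorderLike<T>, upload: (parts: T[]) => void) {
    this.makeRecorder = makeRecorder;
    this.upload = upload;
  }

  start(): void {
    if (this.running) return;
    this.running = true;
    this.beginCycle();
  }

  // Meant to be called from a timer, for example every 3 seconds.
  rotate(): void {
    const rec = this.current;
    if (!this.running || rec === null) return;
    this.current = null; // a second rotate() before onstop fires is a no-op
    rec.stop();
  }

  stop(): void {
    this.running = false;
    const rec = this.current;
    this.current = null;
    if (rec !== null) rec.stop(); // onstop still fires and uploads the last segment
  }

  private beginCycle(): void {
    const rec = this.makeRecorder();
    const parts: T[] = []; // owned by this cycle only, never shared between recorders
    rec.ondataavailable = (event) => {
      parts.push(event.data);
    };
    rec.onstop = () => {
      if (parts.length > 0) this.upload(parts);
      // Restart only if the session is still running and no other recorder took over.
      if (this.running && this.current === null) this.beginCycle();
    };
    this.current = rec;
    rec.start();
  }
}
```

In a browser, `makeRecorder` would wrap a real MediaRecorder, `upload` would POST the segment to the back end, and a timer would call `rotate()` every 3 seconds. Because the handlers close over the local `rec` and `parts` instead of shared variables, a late `onstop` from an old recorder cannot touch the new one.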
Finishing a Feature Isn’t the End—You Need a Code Review
I asked Claude to review the code I’d just written from an engineering perspective. It actually caught three problems: Groq failing silently on consecutive failures, a closure safety issue with MediaRecorder, and animation effects not triggering on the new engine.
I wouldn’t have discovered these three bugs myself. But I knew to “ask this question.”
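The first of those findings, requests that keep failing without anyone noticing, is the kind of thing a small counter can surface. The following is only a sketch of that general idea, not the fix Claude proposed: `FailureTracker`, its threshold, and the alert callback are all hypothetical.

```typescript
// Hypothetical helper: reports once when failures pile up back to back.
class FailureTracker {
  private consecutive = 0;
  private threshold: number;
  private onAlert: (message: string) => void;

  constructor(threshold: number, onAlert: (message: string) => void) {
    this.threshold = threshold;
    this.onAlert = onAlert;
  }

  // Any success resets the streak.
  recordSuccess(): void {
    this.consecutive = 0;
  }

  // Alert exactly once per streak, at the moment the threshold is reached.
  recordFailure(reason: string): void {
    this.consecutive += 1;
    if (this.consecutive === this.threshold) {
      this.onAlert(`Transcription failed ${this.consecutive} times in a row: ${reason}`);
    }
  }
}
```

The front end could call `recordFailure` whenever a chunk upload fails and show the alert message in the interface, so a dead engine becomes visible instead of silently producing no text.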
AI Won’t Patrol for You
The Fitbit health data was broken for several days, and I only discovered it because I happened to ask. The root cause was a function missing a parameter, failing silently every time the scheduled task ran. AI won’t get up in the middle of the night to check whether your system has broken. You need to know what to ask and when to ask it.
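One way to turn "knowing when to ask" into something checkable is to record when each scheduled job last succeeded and flag anything that has gone quiet for too long. This is a minimal sketch under that assumption; the `JobStatus` shape, the job names, and the one-day limit in the usage note are hypothetical and not taken from my actual setup.

```typescript
// Hypothetical record of when a scheduled job last finished successfully.
interface JobStatus {
  name: string;
  lastSuccessMs: number | null; // null means it has never succeeded
}

// Returns the names of jobs whose last success is missing or older than maxAgeMs.
function findStaleJobs(jobs: JobStatus[], nowMs: number, maxAgeMs: number): string[] {
  return jobs
    .filter((job) => job.lastSuccessMs === null || nowMs - job.lastSuccessMs > maxAgeMs)
    .map((job) => job.name);
}
```

Called with `Date.now()` and a limit of, say, 24 hours, a check like this would have listed the broken sync on the first day instead of leaving it to a chance question.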
This Is a New Way of Working
In the past we talked about “information literacy,” meaning the ability to search and judge whether information is true or false. Now what we may need is “AI literacy”:
- Knowing how to break a big problem into smaller ones that AI can handle
- Knowing how to describe technical constraints so AI gives executable solutions
- Knowing when to trust AI’s output and when to verify it yourself
- Knowing where AI’s capability boundaries lie—it can help you write, search, and review, but it won’t proactively think for you about what should be done
This isn’t an engineer’s exclusive domain. It’s a capability everyone who wants to make good use of AI needs.
I can’t write code, but I wanted to collaborate with AI to build a real-time translation tool (and “whip up some software at the drop of a hat” has now become reality).