How an LLM Can Fix Your Posture

I stopped typing three months ago. Not completely, but for most of my work, I just talk.

The setup: I speak into my phone, and the text appears on my computer wherever the cursor is. No copy-paste, no window switching. I say a sentence, it gets typed. I press Enter.

This is how I'm writing this article right now.


The problem

I'm a systems engineer running a home server with dozens of services, AI agents, and dashboards. I spend 5-7 hours a day at my workstation after my full-time job. Most of that time goes to typing: commands, prompts, messages, notes.

My hands get tired. My back hurts from hunching over the keyboard. And the worst part: typing is the bottleneck between thinking and doing.

I wanted to give instructions the way I'd talk to a colleague. By speaking.


How it actually works

The solution turned out to be embarrassingly simple:

  1. Android app sends recognized text over WiFi to my workstation
  2. Workstation service receives the text and types it into the active cursor position
  3. That's it. No cloud. No server processing. No Whisper.

The key insight: Android's built-in speech recognition is better than anything I tried.

I experimented with Whisper (multiple model sizes), Faster Whisper, Vosk, and several other libraries. They all had problems. Whisper small was too slow on CPU, taking 3-4 seconds per utterance. Whisper medium ate 4 GB of RAM and was still slower than real time. Faster Whisper improved speed, but accuracy on mixed Russian/English was poor. Vosk worked offline, but the models were huge and recognition quality was inconsistent.

Android's native speech-to-text just works. It's fast, it's accurate, it runs on the phone's hardware, and it handles language switching naturally. Google has spent billions optimizing on-device recognition. I can't compete with that on a single server.


The workflow

My phone sits on the desk next to me. When I want to "type" something:

  1. Open the app (or it's already open)
  2. Speak naturally, text appears in real-time on my phone screen
  3. The text gets transmitted over WiFi to my workstation
  4. It's inserted wherever my cursor is: terminal, browser, IDE, chat
  5. I hit Enter (on the phone or keyboard)

There's also a PWA version that works in a browser, but I primarily use the Android app. The latency on the local network is negligible. It feels instant.

Language switching: Android auto-detects language from phonemes. I use three languages daily -- English, Russian, Ukrainian -- and it switches between them naturally. Code words like kubectl or xdotool require manual typing; the recognizer has no training data for them.


What changed

My productivity increased dramatically. I measured it informally: tasks that involved writing prompts, commit messages, or documentation took about a third of the time they used to. The bottleneck shifted from typing to thinking, which is where it should be.

Before: Think, position hands, type, fix typos, think again, type more.

After: Think, speak. Done.

The physical change was even more dramatic. I have a motorized standing desk. Before Voice to PC, I rarely used the standing position because it's hard to type while standing: your wrists sit at a weird angle, and the keyboard feels too low or too high.

Now I work standing half the day. Just talking.

If you could watch me over these three months, the progression would look like the classic "evolution of man" image: first hunched over a keyboard, then sitting upright, then standing tall with just a phone in hand. The irony is that, as a systems engineer, I fixed my posture not with ergonomics advice but by building a voice tool.


Why not just use dictation software?

Because existing solutions don't do what I need:

  • OS-level dictation (Windows Speech, macOS Dictation). Tied to one machine, mediocre accuracy, no cross-device support.
  • Dragon NaturallySpeaking. Expensive, Windows-only, overkill for what I need.
  • Web-based tools. Require internet, add latency, and raise privacy concerns. I don't want my prompts going through someone else's server.

What I needed was dead simple: phone recognizes speech (it's already good at this), sends text over local WiFi, text appears at cursor. No cloud round-trip, no subscription, no training period.


Technical details

Android app: Kotlin, uses Android's SpeechRecognizer API. Connects to the workstation via WebSocket over the local network. Sends recognized text as plain string messages. The app stays in foreground with a persistent notification so Android doesn't kill the WebSocket connection.
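
When testing the pipeline without the phone, anything that can open a socket can play the app's role. Here is a hypothetical desktop stand-in in Python; it uses a plain TCP line protocol instead of the app's WebSocket to stay dependency-free, and the host/port are made up:

```python
import socket
import threading


def send_utterance(text: str, host: str, port: int) -> None:
    """Send one recognized utterance as a newline-terminated UTF-8 line."""
    with socket.create_connection((host, port), timeout=2) as sock:
        sock.sendall(text.encode("utf-8") + b"\n")


def demo() -> str:
    """Loopback check: a throwaway listener stands in for the workstation."""
    srv = socket.socket()
    srv.bind(("127.0.0.1", 0))  # port 0 = pick any free port
    srv.listen(1)
    port = srv.getsockname()[1]
    received: list[str] = []

    def accept_once() -> None:
        conn, _ = srv.accept()
        with conn, conn.makefile("rb") as f:
            received.append(f.readline().decode("utf-8").rstrip("\n"))

    t = threading.Thread(target=accept_once)
    t.start()
    send_utterance("open the dashboard", "127.0.0.1", port)
    t.join()
    srv.close()
    return received[0]


if __name__ == "__main__":
    print(demo())  # open the dashboard
```

The real app does the same thing conceptually: recognized text in, one message per utterance out.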

Workstation service: Lightweight Python process, about 80 lines of code. Receives WebSocket messages, uses xdotool (Linux) to type the text at the current cursor position. Simulates keyboard input at the OS level, so it works with any application. No special integration needed. If you can type in it, Voice to PC can type in it too.
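
The service is small enough to sketch. This is a minimal stand-in, not the actual code: it assumes a newline-delimited plain-text protocol over a raw TCP socket (instead of WebSocket, to avoid third-party dependencies), and the `sink` hook lets the typing step be replaced during testing:

```python
import asyncio
import shutil
import subprocess


def sanitize(text: str) -> str:
    """Drop control characters so only printable text reaches the keyboard."""
    return "".join(ch for ch in text if ch.isprintable() or ch in "\n\t")


def type_at_cursor(text: str) -> None:
    """Simulate keystrokes with xdotool; text lands wherever the cursor is."""
    if shutil.which("xdotool"):  # only on an X11 desktop
        subprocess.run(["xdotool", "type", "--delay", "0", "--", text],
                       check=True)


async def handle(reader, writer, sink=type_at_cursor):
    """One connection from the phone: each line is one recognized utterance."""
    while True:
        line = await reader.readline()
        if not line:  # phone disconnected
            break
        sink(sanitize(line.decode("utf-8", "replace").rstrip("\n")))
    writer.close()


async def demo() -> list[str]:
    """Loopback round-trip: a fake 'phone' sends one utterance."""
    typed: list[str] = []
    server = await asyncio.start_server(
        lambda r, w: handle(r, w, sink=typed.append), "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    _, writer = await asyncio.open_connection("127.0.0.1", port)
    writer.write(b"git status\n")
    writer.close()
    await asyncio.sleep(0.1)  # let the server drain the line
    server.close()
    await server.wait_closed()
    return typed


if __name__ == "__main__":
    print(asyncio.run(demo()))  # ['git status']
```

In the real service, `sink` stays as the xdotool call and the transport is a WebSocket; the structure is the same.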

Network: Pure local WiFi. Phone and workstation are on the same network. Latency is under 50 ms. No internet required, no data leaves my home. I tested it with a WiFi analyzer: the round-trip from end of speech to text appearing on screen is about 200 ms total, including Android's recognition time.

Recognition accuracy: Android's speech recognition handles three languages I use daily: English, Russian, and Ukrainian. It switches between them naturally. Technical terms sometimes need correction, but for everyday commands and text, it's remarkably good. Better than any self-hosted solution I tested.

Reliability: The WebSocket connection drops maybe once a week. The app auto-reconnects in 2-3 seconds. Not a problem in practice.
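
The app's actual retry is a fixed short delay; a common alternative for this kind of reconnect loop is exponential backoff, sketched generically here (not the app's code -- `connect` is any callable that raises `OSError` on failure):

```python
import time


def connect_with_retry(connect, attempts=5, base=0.5, cap=8.0,
                       sleep=time.sleep):
    """Call `connect` until it succeeds, doubling the delay between tries.

    Raises the last OSError if all attempts fail. `sleep` is injectable
    so tests can record delays instead of waiting.
    """
    for n in range(attempts):
        try:
            return connect()
        except OSError:
            if n == attempts - 1:
                raise
            sleep(min(cap, base * 2 ** n))


if __name__ == "__main__":
    tries: list[int] = []

    def flaky() -> str:
        """Fails twice, then connects -- simulates a dropped WebSocket."""
        tries.append(1)
        if len(tries) < 3:
            raise OSError("connection refused")
        return "connected"

    slept: list[float] = []
    print(connect_with_retry(flaky, sleep=slept.append))  # connected
    print(slept)                                          # [0.5, 1.0]
```

Capping the delay keeps a long outage from turning into minutes-long waits once the network is back.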


What I use it for daily

  • Talking to Claude. This is the primary use case. I dictate prompts, describe bugs, give instructions. Probably 60% of my voice input goes into Claude conversations.
  • Writing notes and worklogs. Worklogs are timestamped notes documenting what I built that day -- part of my semantic memory system. I used to skip writing them because it felt tedious: open terminal, think about tags, type content. Now I just say what I did.
  • Git commit messages. Describe what changed, speak it. My commit messages got longer and more descriptive since I stopped typing them.
  • Slack and Telegram messages. Faster than thumb-typing on phone, more comfortable than keyboard.
  • Documentation. Like this article.

What doesn't work great

Code. I don't dictate code. Variable names, brackets, indentation. Voice is terrible for this. But honestly, I haven't written code manually in three months either -- Claude Code writes it for me. I dictate the intent, the model writes the code. The keyboard limitation stopped mattering.

Noisy environments. It works great in my home office, but accuracy drops significantly with background noise. I tried using it on a balcony with street noise and gave up after two minutes.

Very long dictation. After 2-3 minutes of continuous speech, Android's recognizer sometimes resets. I've learned to speak in shorter bursts, pausing between thoughts. This actually improved my clarity because it forces me to think in complete sentences.

Accented technical terms. When I say "xdotool" or "kubectl", Android has no idea what I mean. I keep a small dictionary of corrections for terms I use often, but honestly, for these I just type.
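
A correction pass like this can run on the workstation before the text is typed. The table entries below are hypothetical -- the article doesn't show the actual dictionary -- and the function replaces the longest known misrecognition first, ignoring case:

```python
import re

# Hypothetical entries; the real correction dictionary isn't shown.
CORRECTIONS = {
    "cube control": "kubectl",
    "x do tool": "xdotool",
    "get commit": "git commit",
}


def apply_corrections(text: str, table: dict = CORRECTIONS) -> str:
    """Replace known misrecognitions, longest phrase first, ignoring case."""
    for wrong in sorted(table, key=len, reverse=True):
        text = re.sub(re.escape(wrong), table[wrong], text,
                      flags=re.IGNORECASE)
    return text


if __name__ == "__main__":
    print(apply_corrections("run Cube Control get pods"))
    # run kubectl get pods
```

Matching longest-first avoids a short entry clobbering part of a longer phrase before the longer one gets a chance to match.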


Why local-only matters

No API keys, no prompts leaving my home network, no subscription, no account dependency. For builders handling sensitive work or client data, this eliminates a critical attack surface. The entire system lives on my network -- I own the data, the latency, the uptime.


What I learned

The biggest surprise was not the speed gain. It was the behavioral change.

I save more notes now. The friction of opening a terminal, thinking about tags, typing content killed most of my note-taking before. Now I just say it. My Mesh memory went from 5-10 entries per week to 30+.

I explain things better. When you speak, you naturally structure your thoughts differently than when you type. Speaking forces linear, clear explanations. My documentation improved.

And the standing desk thing was completely unexpected. I bought it two years ago and used it standing maybe 10% of the time. Now it's 50%. Building a voice input tool accidentally solved my ergonomics problem.


Was it worth building?

It took a weekend to build the first working version. Three months later, I use it every single day.

The total cost: one weekend of coding, zero ongoing costs. No API fees, no subscriptions. The phone I already had. The WiFi network I already had. Android's speech recognition is free.

Sometimes the most impactful tool isn't the most complex one. It's the one that removes friction from what you already do hundreds of times a day.

I type less. I think more. I stand up.
