kstonekuan · 2026-02-05T03:24:37 1770261877

Thanks for linking!

kstonekuan · 2025-12-29T13:17:52 1767014272

I built a web viewer for visualizing Gaussian Splat .ply files generated by Apple's ml-sharp, which converts a single photo into a 3D Gaussian Splat in under a second [1].

Features:
- Upload and view .ply files directly in the browser
- Multiple camera trajectory animations (rotate, swipe, shake, forward)
- Interactive orbit controls (drag to orbit, scroll to zoom, right-drag to pan)
- No installation required; runs entirely client-side

The original ml-sharp can render the same video trajectories but requires a CUDA GPU. This viewer lets you explore the 3D output on any device with a browser.

I also added cloud GPU inference via Modal so you can generate splats without a local GPU (free tier available) [2].

[1] https://github.com/apple/ml-sharp

[2] https://modal.com/

kstonekuan · 2025-12-16T15:21:16 1765898476

I started off trying Electron, but I am liking Tauri more. It seems Rust has better support for system-level integration, like controlling audio and keys. Are there other alternatives you are exploring?

james_marks · 2025-12-16T15:55:00 1765900500

Yeah, Tauri struck me as much more powerful and lighter than Electron.

My hobby project with Tauri died when I managed to set an OS-wide shortcut, which is amazing.

Except it broke many other apps, and I just never got back to it.

kstonekuan · 2025-12-17T07:41:47 1765957307

Tauri v2 has a global shortcut plugin, which I am using, and it works really well:

https://v2.tauri.app/plugin/global-shortcut/

kstonekuan · 2025-12-15T12:14:45 1765800885

Yup, the desktop app is built with Tauri, which is cross-platform, and I have personally tested it on macOS and Windows.

kstonekuan · 2025-12-15T07:05:33 1765782333

I have been working on a customizable AI voice dictation tool that uses Pipecat's framework to swap between many providers and models, both cloud and local.

It started off as an open source alternative to Wispr Flow for myself, as I wanted more control over the formatting rules and the model choice. After sharing it with friends and presenting it at my local Claude Code meetup, I was encouraged to share it more widely.

The desktop app uses Tauri, so it is cross-platform, and I have tested it on macOS and Windows.

https://github.com/kstonekuan/tambourine-voice

kstonekuan · 2025-12-15T01:10:30 1765761030

Thanks for the feedback. I probably should have been clearer in my original post and in the README as well. Local inference is already supported via Pipecat: you can use Ollama or any custom OpenAI-compatible endpoint. Local STT is also supported via Whisper, which Pipecat will download and manage for you.

popalchemist · 2025-12-15T09:02:56 1765789376

Rad. Put that front and center in the README.

kstonekuan · 2025-12-15T12:06:03 1765800363

Updated!

kstonekuan · 2025-12-15T01:05:56 1765760756

Yes, Pipecat already supports that natively, so this can be done easily with Ollama. I have also exposed it as an environment variable, `OLLAMA_BASE_URL`.

About Ollama in Pipecat: https://docs.pipecat.ai/server/services/llm/ollama

Also, any other provider Pipecat supports can be added in a few lines of code.
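For anyone wondering what the environment variable switch amounts to, here is a minimal sketch of resolving an Ollama endpoint from `OLLAMA_BASE_URL`. Only the variable name comes from the comment above. The helper name `resolve_ollama_base_url` and the fallback value are assumptions for illustration (the fallback is Ollama's usual local OpenAI-compatible address), and this is not necessarily how the project itself reads the variable.

```python
import os
from typing import Mapping

# Assumption: fall back to Ollama's usual local OpenAI-compatible endpoint.
DEFAULT_OLLAMA_BASE_URL = "http://localhost:11434/v1"


def resolve_ollama_base_url(env: Mapping[str, str] = os.environ) -> str:
    """Return the base URL from OLLAMA_BASE_URL, or the local default.

    Hypothetical helper: blank values are treated as unset, and a trailing
    slash is dropped so paths can be appended consistently.
    """
    url = env.get("OLLAMA_BASE_URL", "").strip()
    return (url or DEFAULT_OLLAMA_BASE_URL).rstrip("/")
```

Pointing the variable at another machine, for example `OLLAMA_BASE_URL=http://gpu-box:11434/v1` (a made-up host name), would then redirect inference there without code changes.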

kstonekuan · 2025-12-15T01:02:38 1765760558

Hey, sorry if the examples given were not robust, but because this is built on Pipecat, you can actually very easily swap to a local LLM if you prefer that, and the project is already set up to allow you to do that via environment variables.

The integration work (setting up the WebRTC connection, getting voice dictation to work seamlessly from anywhere, and typing the result into any app) took a long time to build out, and that's why I want to share this as open source.

kstonekuan · 2025-11-25T06:17:30 1764051450

I was curious how people were choosing between voice AI agent providers and was interested in comparing them by baseline network performance, but could not find any existing solution that benchmarks performance before STT/LLM/TTS processing. So I started building a benchmarking tool to compare Pipecat (Daily) vs LiveKit.

Unlike your average LLM benchmark, this benchmark focuses on location and time as variables since these are the biggest factors for networking systems (I was a developer for networking tools in a past life). The idea is to run benchmarks from multiple geographic locations over time to see how each platform performs under different conditions.

Basic setup: echo agent servers can create and connect to temporary rooms to echo back after receiving messages. Since Pipecat (Daily) and LiveKit Python SDKs can't coexist in the same process, I have to run separate agent processes on different ports. Benchmark runner clients send pings over WebRTC data channels and measure RTT for each message. Raw measurements get stored in InfluxDB, then the dashboard calculates aggregate stats (P50/P95/P99, jitter, packet loss) and visualizes everything with filters and side-by-side comparisons.
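To make the aggregate stats concrete, here is a minimal sketch of how P50/P95/P99, jitter, and packet loss could be derived from one run of raw RTT samples. The function names, the use of `None` for a ping that never came back, and the jitter definition (mean absolute difference between consecutive RTTs) are all assumptions for illustration, not necessarily what the dashboard does.

```python
import math
import statistics
from typing import Dict, Optional, Sequence


def percentile(sorted_values: Sequence[float], p: float) -> float:
    """Linearly interpolated percentile of pre-sorted values, p in [0, 100]."""
    if not sorted_values:
        raise ValueError("no samples")
    rank = (len(sorted_values) - 1) * p / 100.0
    lo, hi = math.floor(rank), math.ceil(rank)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (rank - lo)


def summarize(rtts_ms: Sequence[Optional[float]]) -> Dict[str, float]:
    """Summarize one benchmark run.

    rtts_ms holds one entry per ping sent, in send order; None marks a
    ping whose echo never arrived (an assumed encoding for loss).
    """
    received = [r for r in rtts_ms if r is not None]
    if not received:
        raise ValueError("no echoes received")
    ordered = sorted(received)
    # Assumed jitter definition: mean absolute change between consecutive RTTs.
    diffs = [abs(b - a) for a, b in zip(received, received[1:])]
    return {
        "p50_ms": percentile(ordered, 50),
        "p95_ms": percentile(ordered, 95),
        "p99_ms": percentile(ordered, 99),
        "jitter_ms": statistics.fmean(diffs) if diffs else 0.0,
        "packet_loss": 1 - len(received) / len(rtts_ms),
    }
```

Running `summarize` separately per platform and per location would give the rows for a side-by-side comparison.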

I struggled with creating a fair comparison since each platform has different APIs. Ended up using data channels (not audio) for consistency, though this only measures data message transport, not the full audio pipeline (codecs, jitter buffers, etc).

Latency is hard to measure precisely, so I'm estimating based on server processing time - admittedly not perfect. Only testing data channels, not full audio path. And it's just Pipecat (Daily) and LiveKit for now, would like to add Agora, etc.
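One way to read "estimating based on server processing time" is sketched below: subtract the time the echo server reports spending on the message from the client-observed round trip, then halve the remainder. The `PingSample` fields and function names are hypothetical, and the halving assumes a symmetric network path, which real routes often are not.

```python
from dataclasses import dataclass


@dataclass
class PingSample:
    """One ping/echo exchange; all field names are hypothetical."""
    sent_at_ms: float            # client clock when the ping was sent
    received_at_ms: float        # client clock when the echo arrived
    server_processing_ms: float  # duration reported by the echo server


def network_rtt_ms(sample: PingSample) -> float:
    """Round trip with server-side processing removed, floored at zero."""
    total = sample.received_at_ms - sample.sent_at_ms
    return max(total - sample.server_processing_ms, 0.0)


def one_way_estimate_ms(sample: PingSample) -> float:
    """Rough one-way latency, assuming both directions take equal time."""
    return network_rtt_ms(sample) / 2.0
```

Both timestamps come from the client clock, so no clock synchronization between client and server is needed. Only the server's reported duration crosses machines.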

The README screenshot shows synthetic data resembling early results. Not posting raw results yet since I'm still working out some measurement inaccuracies and need more data points across locations over time to draw solid conclusions.

This is functional but rough around the edges. Happy to keep building it out if people find it useful. Any ideas on better methodology for fair comparisons or improving measurements? What platforms would you want to see added?

Stack: Python, TypeScript (React), InfluxDB

focom · 2025-11-28T00:21:43 1764289303

Cool stuff. I preferred the experience with LiveKit, but I always wonder what the performance is like with Pipecat.

kstonekuan · 2025-10-18T18:23:59 1760811839

Think you know your subreddit community well? Test your knowledge in this short Reddit game!

Made this with Devvit at the LA Tech Week Reddit Hackathon last week. Currently, the questions are just based on metadata like upvotes, comments, and authors, but if you like this, I would be happy to work on more features like leaderboards and AI-generated answers.

You can also contribute to it directly here: https://github.com/kstonekuan/reddit-trivia-night