How the Glotix Translation Engine Works — A Technical Deep Dive
The Problem with Flat Text Translation
Most website translation tools extract text from your page, translate it, and paste it back. This "flat" approach breaks in three ways:
1. Inline elements get destroyed: a sentence containing an inline tag (a link or a bold word, for example) becomes one string, losing the tag
2. Context is lost — the word "Post" in a blog means something different than "Post" on a form button
3. Dynamic content is missed — SPAs, React apps, and JS-rendered text never get translated
Glotix solves all three.
DOM Tree Walking
When the Glotix SDK loads, it walks your DOM using the browser's native TreeWalker API with NodeFilter.SHOW_TEXT. This finds every Text node individually — not elements, not innerHTML, but the actual text content nodes.
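The walk described above might look roughly like the following. This is a minimal sketch, not the SDK's actual code: the function name collectTextNodes and the choice to skip whitespace-only nodes are assumptions made for illustration. The document object is passed in as a parameter so the sketch can also be exercised outside a browser.

```javascript
// Numeric value of NodeFilter.SHOW_TEXT in the DOM standard.
const SHOW_TEXT = 0x4;

/**
 * Collect every non-blank Text node under `root`.
 * `doc` is anything with a DOM-style createTreeWalker (normally `document`).
 * Hypothetical helper, not part of the Glotix API.
 */
function collectTextNodes(doc, root) {
  const walker = doc.createTreeWalker(root, SHOW_TEXT);
  const found = [];
  let node = walker.nextNode();
  while (node !== null) {
    // Assumption: whitespace-only nodes (indentation between tags) are skipped.
    if (node.nodeValue && node.nodeValue.trim() !== "") {
      found.push(node);
    }
    node = walker.nextNode();
  }
  return found;
}
```

In a browser this would be called as collectTextNodes(document, document.body).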
For each text node, we compute a contextual hash:
hash = FNV-1a(textContent + "|" + parentPath)
Where parentPath is the chain of ancestor tag names, for example BODY>DIV>NAV>A. This gives every text occurrence a fingerprint based on both its content and its position in the DOM tree. Identical text under an identical tag path shares a hash, and therefore a translation.
The same word "Home" gets two different hashes depending on where it appears:
fnv1a("Home|BODY>NAV>UL>LI>A") (navigation link)
fnv1a("Home|BODY>MAIN>H1") (page heading)
This means each occurrence translates independently with full context awareness.
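To make the hashing concrete, here is one way the contextual hash could be computed. The article does not say which FNV-1a width or string encoding Glotix uses, so the 32-bit variant over UTF-8 bytes is an assumption, as are the helper names parentPath and contextualHash. Stopping the path at BODY is also an assumption, chosen to match the examples above.

```javascript
// 32-bit FNV-1a over the UTF-8 bytes of `str` (width and encoding are assumptions).
function fnv1a(str) {
  let hash = 0x811c9dc5; // FNV offset basis
  for (const byte of new TextEncoder().encode(str)) {
    hash ^= byte;
    hash = Math.imul(hash, 0x01000193); // FNV prime, 32-bit wrapping multiply
  }
  return hash >>> 0; // normalise to an unsigned 32-bit integer
}

// Build "BODY>NAV>A" from a text node's ancestors (hypothetical helper).
function parentPath(textNode) {
  const tags = [];
  for (let el = textNode.parentElement; el; el = el.parentElement) {
    tags.unshift(el.tagName);
    if (el.tagName === "BODY") break; // assumption: path starts at BODY, as in the examples
  }
  return tags.join(">");
}

// hash = FNV-1a(textContent + "|" + parentPath)
function contextualHash(textNode) {
  return fnv1a(textNode.textContent + "|" + parentPath(textNode));
}
```

Because the path is made of tag names only, two text nodes with the same content under the same chain of tags produce the same hash.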
Translation Queue
Discovered text nodes are batched into a translation queue. The queue debounces for 300ms (to batch rapid DOM changes), then fires a single POST /api/translate request with up to 100 items.
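A debounced, size-capped queue of this kind could be sketched as follows. The class name TranslationQueue, the item shape { hash, text }, and the handling of overflow beyond 100 items (sent in a follow-up batch) are assumptions; the article only specifies the 300ms debounce and the 100-item cap. Timer functions are injectable so the behaviour can be checked without waiting on real timers.

```javascript
class TranslationQueue {
  constructor(send, options = {}) {
    this.send = send; // called with an array of { hash, text }
    this.delayMs = options.delayMs ?? 300;
    this.maxBatch = options.maxBatch ?? 100;
    // Wrapped so the browser's setTimeout is never called with a foreign `this`.
    this.setTimer = options.setTimer ?? ((fn, ms) => setTimeout(fn, ms));
    this.clearTimer = options.clearTimer ?? ((id) => clearTimeout(id));
    this.pending = new Map(); // hash -> text; a Map dedupes repeated hashes
    this.timer = null;
  }

  enqueue(hash, text) {
    this.pending.set(hash, text);
    // Debounce: every new item restarts the countdown.
    if (this.timer !== null) this.clearTimer(this.timer);
    this.timer = this.setTimer(() => this.flush(), this.delayMs);
  }

  flush() {
    if (this.timer !== null) this.clearTimer(this.timer);
    this.timer = null;
    const items = [...this.pending]
      .slice(0, this.maxBatch)
      .map(([hash, text]) => ({ hash, text }));
    for (const item of items) this.pending.delete(item.hash);
    if (items.length > 0) this.send(items);
    // Assumption: anything over the cap goes out in an immediate follow-up batch.
    if (this.pending.size > 0) {
      this.timer = this.setTimer(() => this.flush(), 0);
    }
  }
}
```

In the SDK's setting, send would presumably wrap a fetch call to POST /api/translate with the items serialised as JSON; the exact request body is not given in the article.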
The server checks its KV cache first. Cache hits return instantly. Cache misses go to GPT-4o-mini for AI translation, then get cached permanently.
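The cache-first flow on the server can be illustrated with a short sketch. Nothing here is taken from the Glotix server code: the kv object (with async get and put), the cacheKey format, and the translateWithModel callback standing in for the GPT-4o-mini call are all hypothetical.

```javascript
// Hypothetical key format: one entry per target language and contextual hash.
function cacheKey(targetLang, hash) {
  return targetLang + ":" + hash;
}

/**
 * Resolve a batch of { hash, text } items to translations.
 * `kv` needs async get(key) and put(key, value); `translateWithModel`
 * takes (texts, targetLang) and resolves to an array of translated strings.
 */
async function translateBatch(items, targetLang, kv, translateWithModel) {
  const results = new Map();
  const misses = [];

  // 1. Check the cache first.
  for (const item of items) {
    const cached = await kv.get(cacheKey(targetLang, item.hash));
    if (cached != null) {
      results.set(item.hash, cached);
    } else {
      misses.push(item);
    }
  }

  // 2. Only cache misses reach the model.
  if (misses.length > 0) {
    const translated = await translateWithModel(misses.map((m) => m.text), targetLang);
    for (let i = 0; i < misses.length; i++) {
      results.set(misses[i].hash, translated[i]);
      // 3. Store without an expiry, so the entry is effectively permanent.
      await kv.put(cacheKey(targetLang, misses[i].hash), translated[i]);
    }
  }
  return results;
}
```

With this shape, a second request for the same hashes and language never calls the model.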
Client-side, translations are stored in window.__glotix_translateMap, a Map keyed by contextual hash. Repeat encounters of the same hash (from re-renders or navigation) apply instantly without any API call.
MutationObserver
After the initial walk, a MutationObserver watches for DOM changes:
childList: true (new elements added)
characterData: true (text content changed)
subtree: true (deep watching)
When a mutation fires, the affected subtree is re-walked. Only NEW text nodes (hashes not in the known set) get enqueued. A requestAnimationFrame debounce prevents rapid-fire mutations from overwhelming the queue.
Critically, the observer ignores mutations caused by Glotix itself (translation application and restoration) using a WeakSet tracking system.
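One plausible way to combine the observer options, the requestAnimationFrame debounce, and the WeakSet-based self-mutation filter is sketched below. The article does not show how the WeakSet is used, so the marking scheme (mark a node just before Glotix writes to it, then skip and unmark it when its record arrives) is an assumption, as are all function names.

```javascript
// Text nodes that Glotix itself just wrote to (hypothetical name).
const selfWritten = new WeakSet();

// Every SDK write goes through here so the observer can recognise it later.
function writeText(node, text) {
  selfWritten.add(node);
  node.nodeValue = text;
}

// Reduce a batch of MutationRecords to the nodes the host page changed.
function rootsToRewalk(records) {
  const roots = new Set();
  const consumed = [];
  for (const record of records) {
    if (record.type === "characterData") {
      if (selfWritten.has(record.target)) {
        consumed.push(record.target); // our own write: skip it
        continue;
      }
      roots.add(record.target);
    } else if (record.type === "childList") {
      for (const added of record.addedNodes) roots.add(added);
    }
  }
  // Unmark after the batch so later host edits to these nodes are seen again.
  for (const node of consumed) selfWritten.delete(node);
  return [...roots];
}

// Coalesce all records that arrive before the next frame into one call.
function makeFrameDebouncer(handle, requestFrame) {
  let queued = [];
  let scheduled = false;
  return (records) => {
    queued.push(...records);
    if (scheduled) return;
    scheduled = true;
    requestFrame(() => {
      scheduled = false;
      const batch = queued;
      queued = [];
      handle(rootsToRewalk(batch));
    });
  };
}

// Browser wiring; `handle` would re-walk each root and enqueue unseen hashes.
function startObserving(root, handle) {
  const onMutations = makeFrameDebouncer(handle, (cb) => requestAnimationFrame(cb));
  const observer = new MutationObserver(onMutations);
  observer.observe(root, { childList: true, characterData: true, subtree: true });
  return observer;
}
```

A WeakSet suits this job because marking a node never keeps it alive once the page discards it.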
WeakRef for Memory Safety
Every registered text node is stored as a WeakRef. This means if the host page removes a DOM element, the SDK doesn't hold a reference that prevents garbage collection.
When applying a translation, the SDK calls nodeRef.deref() — if it returns undefined, the node was garbage collected and the translation is skipped silently.
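The WeakRef pattern described here is small enough to sketch in full. The registry layout (one entry per hash, holding the WeakRef and the original text) and the function names are assumptions for illustration; a real implementation would likely need to track several nodes per hash.

```javascript
// hash -> { ref: WeakRef<Text>, original: string } (hypothetical layout)
const registry = new Map();

function registerNode(hash, node) {
  // The WeakRef does not keep the node alive; the original text is a plain string copy.
  registry.set(hash, { ref: new WeakRef(node), original: node.nodeValue });
}

// Returns true if the translation was applied, false if there was nothing to apply it to.
function applyTranslation(hash, translated) {
  const entry = registry.get(hash);
  if (!entry) return false;
  const node = entry.ref.deref();
  if (node === undefined) {
    // The node was garbage collected: drop the stale entry and skip silently.
    registry.delete(hash);
    return false;
  }
  node.nodeValue = translated;
  return true;
}
```

Keeping the original text alongside the reference is what later allows a node to be restored before switching languages.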
Language Switching
The language switcher widget lives in a Shadow DOM to prevent style conflicts with the host page. When switching languages:
1. All nodes are restored to their original text (stored at registration time)
2. The in-memory translation cache is cleared
3. All known hashes are re-enqueued for the new target language
4. The MutationObserver stays paused throughout the restoration in step 1 to prevent feedback loops
5. New translations are fetched and applied
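The switching steps above can be sketched as a single function. The state object and its fields (registry, translateMap, observer, root, enqueue, targetLang) are hypothetical names, and pausing is modelled with disconnect() followed by observe(), which is one common way to pause a MutationObserver; the article does not say how Glotix does it.

```javascript
const OBSERVE_OPTIONS = { childList: true, characterData: true, subtree: true };

/**
 * state.registry:     Map of hash -> { ref: WeakRef, original: string }
 * state.translateMap: Map of hash -> translated text (in-memory cache)
 * state.observer:     a MutationObserver (or anything with disconnect/observe)
 * state.enqueue:      (hash, text) => void, feeds the translation queue
 */
function switchLanguage(state, newLang) {
  // Pause the observer so restoring text does not trigger re-walks.
  state.observer.disconnect();

  // Restore every live node to the text captured at registration.
  for (const [hash, entry] of state.registry) {
    const node = entry.ref.deref();
    if (node === undefined) {
      state.registry.delete(hash); // node was garbage collected
      continue;
    }
    node.nodeValue = entry.original;
  }

  // Clear the in-memory cache; it holds the previous language.
  state.translateMap.clear();
  state.targetLang = newLang;

  // Resume observing now that restoration is finished.
  state.observer.observe(state.root, OBSERVE_OPTIONS);

  // Re-enqueue all known hashes; fetching and applying happens via the queue.
  for (const [hash, entry] of state.registry) {
    state.enqueue(hash, entry.original);
  }
}
```

Disconnecting also discards any mutation records still waiting in the observer's queue, so the restore writes never reach the callback.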
In addition, a 2-second polling interval serves as a safety net, catching any text nodes that the MutationObserver might have missed.
The Result
An 11KB script that translates any website (static, dynamic, SPA, or server-rendered) while preserving DOM structure, understanding context, and handling real-time content changes. All DOM handling runs client-side; the translations themselves come from the cached API described above.