21 points by johnnyApplePRNG 1 day ago | 37 comments
I just caught this in an o3-pro thought process: "and customizing for low difficulty. কাজ করছে!"
That last set of chars is apparently Bengali for "working!".
I just find it curious that similar "errors" are appearing across multiple different models [0]... what is it about the training method or the reasoning process that lets these other languages creep in? Does anyone know?
[0] https://www.reddit.com/r/Bard/comments/18zk2tb/bard_speaking_random_languages/
mindcrime 1 day ago
[1]: By this I mean "whatever it is they do that can be thought of as sorta kinda roughly analogous to what we generally call thinking." I'm not interested in getting into a debate (here) about the exact nature of thinking and whether or not it's "correct" to refer to LLMs as "thinking". It's a colloquialism that I find useful in this context, nothing more.
puttycat 1 day ago
In other circumstances they might take a different path (in terms of output probability decoding) through other character sets, if the probabilities justify this.
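A toy sketch of what that different decoding path can look like at a single step; all of the tokens, probabilities, and the temperature value below are invented for illustration, not taken from any real model:

```python
import random

# Invented next-token distribution at one decoding step: the English and
# Bengali continuations carry comparable probability mass, so sampling can
# legitimately go either way.
candidates = {
    "working":  0.34,  # English continuation
    "কাজ করছে": 0.31,  # Bengali continuation meaning "working"
    "done":     0.20,
    "ok":       0.15,
}

def sample_next_token(dist, temperature=1.0):
    # Temperature scaling on probabilities (p ** 1/T); random.choices
    # renormalises the weights. Higher T flattens the distribution and
    # makes the lower-probability script more likely to win.
    tokens = list(dist)
    weights = [p ** (1.0 / temperature) for p in dist.values()]
    return random.choices(tokens, weights=weights, k=1)[0]

print(sample_next_token(candidates, temperature=1.0))
```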
daeken 1 day ago
pixl97 1 day ago
tehlike 1 day ago
My phrases switch to the language I learned them in very easily.
Computer terms are almost always English.
A lot of idioms I learned in my adult life are going to stay English, even if a Turkish equivalent exists and I later learned it.
BrandoElFollito 18 hours ago
I find that it is way easier for me to translate between English (not a native speaker) and any of the languages I am bilingual in than between those languages themselves. It is very hard for me to listen to one and speak the other.
BrandoElFollito 18 hours ago
To my French ear they sounded like they were sentencing me to terrible things (and they were always surprised they sounded like that :)), up until the random "router" or "framework" that was the core of the fight.
I love to listen to languages I do not understand (a great source is Radio Green) and try to get from the words what they are talking about.
Another one is one of my closest friends, a German, who speaks very soft English. That lasted until he described to me how to drive somewhere (pre-GPS era) and the names he was using were like lashes.
Speaking various languages is a blessing
tstrimple 1 day ago
outside1234 1 day ago
ASalazarMX 1 day ago
I assumed it knew I speak Spanish from other conversations, my Google profile, geolocation, etc. Maybe my English has enough hints that it was learned by a native Spanish speaker?
hiAndrewQuinn 24 hours ago
Vilian 24 hours ago
johnnyApplePRNG 1 day ago
Perhaps it's more common in the parts of the world where Bengali and English are more commonly spoken in general?
Why so much Bengali/Hindi then, and why not other languages?
epa 1 day ago
groby_b 1 day ago
yen223 1 day ago
latentsea 1 day ago
johnnyApplePRNG 1 day ago
Bjorkbat 1 day ago
diwank 1 day ago
For example, the DeepSeek team explicitly reported this behavior for R1-Zero: reasoning trained with pure RL (no SFT) emerges naturally but brings some "language mixing" along with it. Interestingly, they found that a cold-start SFT step plus a language-consistency reward during RL improved readability, though it came with a small performance trade-off [1].
My guess is OpenAI has typically used a smaller summarizer model to sanitize reasoning outputs before display (they mentioned summarization/filtering briefly at Dev Day), but perhaps lately they’ve started relaxing that step, causing more multilingual slips to leak through. It’d be great to get clarity from them directly on whether this is intentional experimentation or just a side-effect.
[1] DeepSeek-R1 paper that talks about poor readability and language mixing in R1-zero’s raw reasoning https://arxiv.org/abs/2501.12948
[2] OpenAI “Detecting misbehavior in frontier reasoning models” — explains use of a separate CoT “summarizer or sanitizer” before showing traces to end-users https://openai.com/index/chain-of-thought-monitoring/
ipsum2 1 day ago
The DeepSeek-R1 paper has a section on this, where they 'punish' the model if it thinks in a different language to make the thinking tokens more readable. Probably Anthropic does this too.
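The paper describes that reward as the proportion of target-language words in the chain of thought. A minimal sketch of the idea, using a character-level script check as a stand-in for the paper's word-level metric (the script heuristic and scoring below are my simplification, not DeepSeek's code):

```python
import unicodedata

def latin_fraction(cot_text: str) -> float:
    """Fraction of alphabetic characters that are Latin-script, a crude
    proxy for 'the trace stayed in English'."""
    letters = [ch for ch in cot_text if ch.isalpha()]
    if not letters:
        return 1.0
    latin = sum("LATIN" in unicodedata.name(ch, "") for ch in letters)
    return latin / len(letters)

def language_consistency_reward(cot_text: str) -> float:
    # Reward in [0, 1]: 1.0 for a fully Latin-script trace, lower when other
    # scripts creep in. DeepSeek-R1 computes the real reward over target-
    # language words; this character-level version is only illustrative.
    return latin_fraction(cot_text)

print(language_consistency_reward("customizing for low difficulty. কাজ করছে!"))
```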
janalsncm 1 day ago
One, the model is no longer being trained to output likely tokens or tokens likely to satisfy pairwise preferences. So the model doesn't care. You have to explicitly punish the model for language switching, which dilutes the reasoning reward (a rough sketch of that trade-off follows below).
Two, I believe there has been some research showing that models represent similar ideas from multiple languages in similar regions of their internal representations. Sparse autoencoders have shown this. So if the translated text makes sense, I think this is why. If not, I have no idea.
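On the dilution point: a toy illustration of how folding a language-consistency term into the RL reward shrinks the relative weight of the accuracy signal. The weighting and values are made up, not from any published recipe:

```python
LANG_WEIGHT = 0.2  # illustrative knob, not a published value

def total_reward(answer_correct: bool, lang_consistency: float) -> float:
    # Blend the task reward with the language-consistency reward. Once the
    # blend exists, a correct answer reasoned out in mixed languages scores
    # less than a correct answer reasoned out in one language: the accuracy
    # signal has been "diluted".
    accuracy = 1.0 if answer_correct else 0.0
    return (1.0 - LANG_WEIGHT) * accuracy + LANG_WEIGHT * lang_consistency

print(total_reward(True, lang_consistency=0.6))  # correct but mixed-language
print(total_reward(True, lang_consistency=1.0))  # correct and consistent
```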
neilv 1 day ago
(Inspired by movies and TV shows, when characters switch from English to a different language, such as French or Mandarin, to better express something. Maybe there's a compound word in German for that.)
dpiers 24 hours ago
Most people can only encode/decode a single language but an LLM can move between them fluidly.
jmward01 1 day ago
muzani 23 hours ago
atlex2 1 day ago
throwpoaster 16 hours ago
tough 1 day ago
NooneAtAll3 1 day ago
The main suspicion is that it's more compact?
rerdavies 1 day ago
CMCDragonkai 1 day ago
CMCDragonkai 1 day ago
Incipient 23 hours ago
One could even say that judging someone's level of worldly understanding by how many languages they speak shows a fairly limited world view.
ta20240528 23 hours ago
Is it linear (25% more understanding from the fifth language) or asymptotic? Does it increase across all domains equally (geology, poetry, ethics) or asymmetrically?
Seriously, explain it to me?
nsonha 1 day ago
drivingmenuts 18 hours ago
We are intentionally undoing one of the things that makes computers useful.