129 points by isaacdl 4 days ago | 74 comments
UmYeahNo 4 days ago
1) I find it interesting that the LLM rarely seems to be trained to understand its own features, your account, or how the LLM works. It seems strange that it has no idea about its own support.
2) Which leads me to the OpenAI support docs[0]. It seems pretty telling to me that they use old-school search, not an LLM, for their own help docs, right?
Terretta 4 days ago
It does say "beta" on the label, but the thing inside doesn't seem to know that, nor what it's supposed to know. Your point 1, for sure.
Point 2 is a SaaS chosen before LLMs+RAG beat the conventional options. Status page, a SaaS. API membership, metrics, and billing, a SaaS. These are all undifferentiated, and arguably they chose well given when the choices were made. Unless better help is going to win more users, they shouldn't spend time on undifferentiated heavy lifting.
varispeed 3 days ago
How do you know it hallucinated? Maybe your task was one too many and it is only able to handle zero tasks (which would appear to be true in your case).
reustle 3 days ago
Most likely it's just not a priority. Check out Mintlify's docs search for a very well-built implementation.
Example docs site that uses it: https://docs.browserbase.com
derefr 3 days ago
dgfitz 4 days ago
Can’t imagine why everyone doesn’t pay $200/mo for even more features. Eventually I bet they can clean out /tmp!
chairhairair 4 days ago
LLM heads will say “it’s not completely unreliable, it works very often”. That is completely unreliable. You cannot rely on it to work.
Please, product people, stop putting LLMs at the core of products that need reliability.
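To put rough numbers on the "works very often" point: assuming, purely for illustration, that each LLM step in a workflow succeeds independently with probability p, a task that chains n such steps succeeds end to end only with probability p^n. The 95% per-step rate and 10-step chain below are hypothetical figures, not measurements.

$$
P(\text{all } n \text{ steps succeed}) = p^{n}, \qquad 0.95^{10} \approx 0.60
$$

So a component that "works 95% of the time" would, under these assumptions, fail roughly 4 times in 10 on a ten-step task.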
kenjackson 3 days ago
dkjaudyeqooe 3 days ago
The chance that a flipped bit produces a new state that is both valid and actually damaging is astronomically small.
Meanwhile, LLM errors are common and directly affect the result.
kenjackson 2 days ago
If you don't understand the tolerance of your scenario, then all this talk about LLM unreliability is wasted. You need to spend time understanding your requirements first.
great_psy 3 days ago
mhitza 3 days ago
In practice I think it happens often enough. I remember a Black Hat conference talk from around a decade ago where the hacker registered variants of a popular Facebook game's domain that differed from it by a single bit, and caught requests from real end users. He based his attack on the random chance of bit flips during DNS lookups.
Related, but not the video I was referring to
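The attack described above is commonly called bitsquatting. As a rough sketch of the idea (not the speaker's actual tooling), the hypothetical helper `bitflip_variants` below lists hostnames that are one flipped bit away from a given domain, keeping only results that are still syntactically valid. It assumes lowercase ASCII hostnames, leaves the top-level domain untouched, and uses "example.com" as a placeholder rather than the domain from the talk.

```python
import string

# Characters allowed in a DNS hostname label: letters, digits, hyphen.
# DNS names are case-insensitive, so only lowercase forms are kept.
VALID_CHARS = set(string.ascii_lowercase + string.digits + "-")


def is_valid_hostname(name: str) -> bool:
    """Rough syntax check: every label non-empty, no leading/trailing hyphen."""
    return all(
        label and label[0] != "-" and label[-1] != "-"
        for label in name.split(".")
    )


def bitflip_variants(domain: str) -> list[str]:
    """Return hostnames that differ from `domain` by one flipped bit in a
    single character, leaving the top-level domain alone (a simplification)."""
    domain = domain.lower()
    name, dot, tld = domain.rpartition(".")
    variants = set()
    for i, ch in enumerate(name):
        if ch == ".":
            continue  # leave label separators alone in this sketch
        for bit in range(8):
            # Flip one bit of the character code, then fold case as DNS does.
            flipped = chr(ord(ch) ^ (1 << bit)).lower()
            if flipped == ch or flipped not in VALID_CHARS:
                continue  # a pure case flip, or not a legal hostname character
            candidate = name[:i] + flipped + name[i + 1:] + dot + tld
            if is_valid_hostname(candidate):
                variants.add(candidate)
    return sorted(variants)


if __name__ == "__main__":
    # "example.com" is a placeholder, not the domain from the talk.
    for variant in bitflip_variants("example.com"):
        print(variant)
```

Filtering to hostname-legal characters reflects the fact that a flip producing, say, a space or punctuation would not yield a name anyone could register, so only the remaining variants are candidates for squatting.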
rsynnott 3 days ago
theshrike79 3 days ago
Then it would translate that into cron commands in the background.
postsantum 3 days ago
headcanon 4 days ago
It seems like it would be good for summarizing daily updates against a search query, but all it would do is display them. I would probably want to connect it to some tools, at minimum, for it to be useful.
DeepYogurt 4 days ago
42lux 3 days ago
JTyQZSnP3cQGa8B 4 days ago