Hacker Remix

Scheduled tasks in ChatGPT

129 points by isaacdl 4 days ago | 74 comments

UmYeahNo 4 days ago

I tried this yesterday, asking it to create a simple daily reminder task, which it happily did. Then, when the time came and went, I simply got a chat message saying the task had failed, with no explanation of why or how. When I asked it why, it hallucinated that I had too many tasks (I only had the one). So now I don't know why it failed or how to fix it. Which leads to two related observations:

1) I find it interesting that the LLM rarely seems trained to understand its own features, or about your account, or how the LLM works. Seems strange that it has no idea about its own support.

2) Which leads me to the OpenAI support docs[0]. It seems pretty telling to me that they use old-school search and not an LLM for their own help docs, right?

[0] https://help.openai.com/

Terretta 4 days ago

Same experience except mine insisted I had no tasks.

It does say it's a beta on the label, but the thing inside doesn't seem to know that, nor what it's supposed to know. Your point 1, for sure.

Point 2 is a SaaS from before the LLMs+RAG beat normal things. Status page, a SaaS. API membership, metrics, and billing, a SaaS. These are all undifferentiated, but arguably they selected quite well for when the selections were made, and unless the help is going to sell more users, they shouldn't spend time on undifferentiated heavy lifting, arguably.

varispeed 3 days ago

> it hallucinated that I had too many tasks.

How do you know it hallucinated? Maybe your task was one too many and it is only able to handle zero tasks (which would appear to be true in your case).

reustle 3 days ago

> It seems pretty telling to me that they use old-school search and not an LLM for its own help docs, right?

Just not a priority most likely. Check out the search by Mintlify docs to see a very well built implementation.

Example docs site that uses it: https://docs.browserbase.com

derefr 3 days ago

Re: 2 — for the same reason that you shouldn't host your site's status page on the same infrastructure that hosts your site (if people want to see your status page, that probably means your infra is broken), I would guess that OpenAI think that if you're looking at the support docs, it might be because the AI service is currently broken.

dgfitz 4 days ago

New killer feature: cron

Can’t imagine why everyone doesn’t pay $200/mo for even more features. Eventually I bet they can clean out /tmp!

chairhairair 4 days ago

cron, but completely unreliable. How nice.

LLM heads will say “it’s not completely unreliable, it works very often”. That is completely unreliable. You cannot rely on it to work.

Please product people, stop putting LLMs at the core of products that need reliability.

kenjackson 3 days ago

It's all a matter of degree. Even in deterministic systems, bit flipping happens. Rarely, but it does. You don't throw out computers as a whole because of this phenomenon, do you? You just assess the risk and determine if the scenario you care about sits above or below the threshold.

dkjaudyeqooe 3 days ago

A bit flip is a rare occurrence in an array typically tens of billions large.

The chance that the flipped bit changes a bit that results in a new valid state and one that does something actually damaging is astronomically small.

Meanwhile, LLM errors are common and directly affect the result.

kenjackson 2 days ago

My point is that your confidence level depends on your task. There are many tasks for which I'll require ECC. There are other tasks where an LLM is sufficient. Just like there are some tasks where dropped packets aren't a big deal and others where they are absolutely unacceptable.

If you don't understand the tolerance of your scenario, then all this talk about LLM unreliability is wasted. You need to spend time understanding your requirements first.

great_psy 3 days ago

When’s the last time you personally had a bit flip on you?

mhitza 3 days ago

You generally cannot know, because we don't measure for it. Especially not on personal computers; maybe ECC RAM reports this information in some way?

In practice I think it happens often enough, and I remember a Black Hat conference talk from around a decade ago where the hacker squatted on typo variants of the domain of a popular Facebook game and caught requests from real end users, basing his attack on the random chance of bit flips during DNS lookups.

Related, but not the video I was referring to:

https://news.ycombinator.com/item?id=5446854
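To make the idea concrete, here is a minimal sketch of how the candidate domains for that kind of attack could be enumerated: flip each bit of each character in a hostname label and keep the results that are still valid hostname characters. The function name `bitflip_variants` and the example label are hypothetical, chosen only for illustration; this is not the code from the talk.

```python
import string

# Characters allowed in a DNS hostname label (letters, digits, hyphen).
VALID_CHARS = set(string.ascii_lowercase + string.digits + "-")


def bitflip_variants(label):
    """Return every label that differs from `label` by one flipped bit
    in one character and is still a plausible hostname label."""
    label = label.lower()
    variants = set()
    for i, ch in enumerate(label):
        for bit in range(8):
            # Flip a single bit, then normalise case (DNS is case-insensitive).
            flipped = chr(ord(ch) ^ (1 << bit)).lower()
            if flipped == ch or flipped not in VALID_CHARS:
                continue
            candidate = label[:i] + flipped + label[i + 1:]
            # Labels may not start or end with a hyphen.
            if candidate.startswith("-") or candidate.endswith("-"):
                continue
            variants.add(candidate)
    return sorted(variants)


if __name__ == "__main__":
    for name in bitflip_variants("example"):
        print(name + ".com")
```

An attacker would register some of these names and wait for traffic; how often a real bit flip actually lands on one is the open question in this subthread.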

rsynnott 3 days ago

Not just that, cron, only non-deterministic! The future is now.

theshrike79 3 days ago

An actual killer feature would be a system that lets me define repeating tasks with natural language.

Then it would translate that into cron commands in the background.
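A rough sketch of what the "translate to cron" half could look like, assuming a tiny fixed grammar ("every day|weekday|<day name> at HH:MM") instead of a real language model. The function `to_cron` and the phrase format are made up for illustration; the output is a standard five-field crontab schedule (minute, hour, day of month, month, day of week).

```python
import re

# Cron day-of-week numbers: 0 is Sunday, 6 is Saturday.
DAYS = {
    "sunday": 0, "monday": 1, "tuesday": 2, "wednesday": 3,
    "thursday": 4, "friday": 5, "saturday": 6,
}

PATTERN = re.compile(r"every (day|weekday|\w+day) at (\d{1,2}):(\d{2})")


def to_cron(phrase):
    """Translate a phrase like 'every monday at 17:05' into a cron schedule."""
    match = PATTERN.fullmatch(phrase.strip().lower())
    if match is None:
        raise ValueError(f"unsupported phrase: {phrase!r}")
    when, hour, minute = match.group(1), int(match.group(2)), int(match.group(3))
    if not (0 <= hour <= 23 and 0 <= minute <= 59):
        raise ValueError(f"invalid time in phrase: {phrase!r}")
    if when == "day":
        day_of_week = "*"
    elif when == "weekday":
        day_of_week = "1-5"
    elif when in DAYS:
        day_of_week = str(DAYS[when])
    else:
        raise ValueError(f"unknown day: {when!r}")
    return f"{minute} {hour} * * {day_of_week}"


if __name__ == "__main__":
    print(to_cron("every weekday at 08:00"))
```

The point of the split is that once the schedule is a cron line, the actual firing is done by ordinary deterministic machinery, and only the parsing step is fuzzy.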

postsantum 3 days ago

I feel like the obligatory comment about Dropbox is coming your way

headcanon 4 days ago

I'm trying to figure out how this would be useful with the existing feature set.

It seems like it would be good for summarizing daily updates against a search query, but all it would do is display them. I would probably want to connect it with some tools at minimum for it to be useful.

DeepYogurt 4 days ago

They're really trying to juice the usage numbers

42lux 3 days ago

"How ChatGPT reminders saved my life and made me more productive" videos on YouTube in 3, 2, 1.

JTyQZSnP3cQGa8B 4 days ago

As long as it’s generating hype and funding, it brings us closer to their own definition of AGI. It’s the perfect plan.