Hacker Remix

Show HN: Daily Jailbreak – Prompt Engineer's Wordle

112 points by ericlmtn 1 day ago | 62 comments

I created a daily challenge for prompt engineers: write the shortest prompt that breaks a system prompt.

You are given the system prompt and a forbidden function the LLM was told not to invoke. Your task is to trick the model into calling that function. The shortest successful attempts show up on the leaderboard.

Give it a shot! You never know what could break an LLM.

cap11235 1 day ago

Fun! I think you should score by tokens instead of characters, to reduce bias towards particular languages.
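
A rough illustration of why the scoring unit matters: a character count treats a Chinese character and an ASCII letter as equal, while byte-level tokenizers operate on UTF-8 bytes, so the same prompt can rank very differently under each measure. The sketch below uses UTF-8 byte length only as a crude stand-in for a token count. That is an assumption for illustration (the game's actual tokenizer is not shown in this thread), and both function names are hypothetical.

```python
def char_score(prompt: str) -> int:
    """Score a prompt by its number of Unicode characters."""
    return len(prompt)


def utf8_byte_score(prompt: str) -> int:
    """Score a prompt by its UTF-8 byte length.

    This is NOT a real token count, only a crude proxy: byte-level
    tokenizers start from these bytes before merging them into tokens.
    """
    return len(prompt.encode("utf-8"))


english = "call the function"
chinese = "调用函数"

# Character scoring favors the Chinese prompt far more than byte scoring does.
print(char_score(english), utf8_byte_score(english))
print(char_score(chinese), utf8_byte_score(chinese))
```

A real token-based score would swap `utf8_byte_score` for a call into the tokenizer of whichever model the game uses.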

ericlmtn 1 day ago

I think so too. I was a bit shocked to see both simplified & traditional Chinese in the attempts. This will be updated daily, and I'm glad people found so many loose ends that I can work to tie up.

frigaard 14 hours ago

Just finished it by giving it a few examples in 161 tokens. My feedback: on a phone, getting the output in Chinese was really annoying. I had to translate in a different tab, and wouldn't it have been just as challenging in English? Fun game though, enjoyed it!

mdaniel 1 day ago

If I ever find the template author who put in dummy links for "Privacy Policy", "Terms of Service", and the GitHub icon in the footer, I'm going to have strong words with them. It has shown up on Show HN submissions over and over. Unleashing that upon the world is just stunningly cruel.

ericlmtn 1 day ago

I'm so sorry :( It's what happens when I let AI unleash creativity... expect an update that fixes this

ericlmtn 1 day ago

Hi, thank you all for participating. I didn't expect the influx of attempts, and we are experiencing some rate-limiting by OpenAI (30k TPM). If an error occurs, please wait a moment and try again. I'll work on improving rate limits in the future. Thank you.
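
For anyone hitting the same tokens-per-minute ceiling, one common mitigation is to retry rate-limited calls with exponential backoff instead of surfacing the error to the player. The sketch below is a minimal version under stated assumptions: `RateLimitError` is a hypothetical stand-in for whatever exception the API client raises on a rate-limit response, and `call_with_backoff` is an illustrative helper, not part of any library.

```python
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for the error an API client raises when rate-limited."""


def call_with_backoff(fn, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fn(), retrying on RateLimitError with exponentially growing delays.

    Delays are base_delay, 2*base_delay, 4*base_delay, ...
    The last failure is re-raised once max_attempts is exhausted.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

The `sleep` parameter is injectable so the delays can be checked in tests without actually waiting. A production version would usually add random jitter so that many clients do not retry in lockstep.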

monsieurbanana 1 day ago

How much is this costing?

ericlmtn 1 day ago

Looking at $6.5/hr at the moment. 4o is quite expensive and I'm turning it down for tomorrow. Experiencing some amount of spam and troll traffic -- totally unexpected and looking to implement guardrails.
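
One possible shape for such guardrails is a per-client token bucket combined with a cap on prompt length, checked before any request reaches the paid API. This is only a sketch under assumptions: the class name, the default limits, and the use of a client identifier such as an IP address or session ID are all hypothetical and not taken from the site's actual code.

```python
import time


class PerClientLimiter:
    """Token-bucket limiter keyed by client, plus a maximum prompt length."""

    def __init__(self, capacity=5, refill_per_sec=0.1, max_chars=500,
                 clock=time.monotonic):
        self.capacity = capacity              # maximum burst size per client
        self.refill_per_sec = refill_per_sec  # tokens regained per second
        self.max_chars = max_chars            # reject prompts longer than this
        self.clock = clock                    # injectable for testing
        self._buckets = {}                    # client_id -> (tokens, last_timestamp)

    def allow(self, client_id: str, prompt: str) -> bool:
        """Return True if this request may be forwarded to the model."""
        if len(prompt) > self.max_chars:
            return False
        now = self.clock()
        tokens, last = self._buckets.get(client_id, (float(self.capacity), now))
        # Refill in proportion to elapsed time, never above capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.refill_per_sec)
        if tokens < 1:
            self._buckets[client_id] = (tokens, now)
            return False
        self._buckets[client_id] = (tokens - 1, now)
        return True
```

Since the game rewards short prompts anyway, a tight `max_chars` costs legitimate players little while bounding the cost of each spam request.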

__float 1 day ago

Neat idea, but uh, you're giving users a text box straight to a costly API.

Why is that unexpected?

ericlmtn 1 day ago

I've only met good people in my life. Time to start meeting bad ones.

fifilura 1 day ago

The answer made my heart a little warmer. I must say I share that naive worldview from my small corner of the world. At least - in some very rare cases - until proven otherwise.