Dumb feature on new reciprocity website causes privacy failure by EOY 2025?

Ṁ9755

Dec 31

11% chance

On my new Reciprocity website, I've added a feature that is a potential security vulnerability. This market resolves to Yes if either of the following occurs:

- The feature is exploited successfully to lead to a privacy failure.
- I remove this feature or substantially limit it because I'm afraid of it being exploited, or because it's too painful for me to monitor it to prevent exploitation. (I'll probably do this only if I believe that the feature is more than 20% likely to lead to a privacy failure.)

By "privacy failure", I mean that any user learns a fact about the checking preferences of any other user that they shouldn't have learned according to the normal rules of the site.

If there's controversy over how to resolve the market, @RyanGreenblatt will adjudicate.

Update 2025-06-30 (PST) (AI summary of creator comment): A privacy failure will be considered to have occurred if any user who was using the site sincerely is misled into making unintended selections that result in a match.
- A user acting sincerely is one who is not trying to cause this market to resolve Yes.
- This condition holds even if only a small number of users are affected.

Update 2025-06-30 (PST) (AI summary of creator comment): If a user is tricked, the market will resolve Yes if the creator believes it would be reasonable for the affected user to be annoyed at him for the feature that led to the incident.

Technology, Information security

7 Comments

For context, here are the security measures.

As people have observed, any user is allowed to submit instructions to an LLM (e.g. "green"), and the LLM then produces a new CSS file that (after security measures) is pushed to all users.

The security measures are:

- The model generating CSS is told not to do harmful things.
- A smarter model reviews instructions to see whether they seem like attempts to make something harmful happen.
- CSS is sanitized before being pushed globally (removing a wide variety of threat vectors related to e.g. executing arbitrary JavaScript).
- I look at the instructions that have been submitted sometimes, and will ban people or adopt improved security measures if I'm worried someone is getting close to breaking it.
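
To make the sanitization step concrete, here is a minimal sketch of what a deny-list CSS sanitizer could look like. This is not the site's actual code: the function name `sanitize_css` and the pattern list are assumptions for illustration, and a production sanitizer would more likely parse the stylesheet with a real CSS parser and allow-list properties, since regexes like these can be bypassed (for example with CSS escape sequences).

```python
import re

# Hypothetical deny-list; the site's real sanitizer is not described in detail.
DANGEROUS_PATTERNS = [
    re.compile(r"@import\b[^;]*;?", re.IGNORECASE),          # pulls in external stylesheets
    re.compile(r"url\s*\([^)]*\)", re.IGNORECASE),            # external fetches, javascript: URLs
    re.compile(r"expression\s*\([^)]*\)", re.IGNORECASE),     # legacy IE script execution
    re.compile(r"javascript\s*:", re.IGNORECASE),             # script URL scheme
    re.compile(r"behavior\s*:[^;}]*;?", re.IGNORECASE),       # legacy IE .htc behaviors
    re.compile(r"-moz-binding\s*:[^;}]*;?", re.IGNORECASE),   # legacy Firefox XBL bindings
]


def sanitize_css(css: str) -> str:
    """Remove constructs that can load external resources or run script."""
    # Strip comments first so they cannot be used to split a dangerous token.
    css = re.sub(r"/\*.*?\*/", "", css, flags=re.DOTALL)
    # Repeat until nothing changes, so that removing one match cannot
    # splice the surrounding text into a new dangerous token.
    previous = None
    while previous != css:
        previous = css
        for pattern in DANGEROUS_PATTERNS:
            css = pattern.sub("", css)
    return css
```

Note that a sanitizer of this kind only addresses script execution and external loads. It does nothing about purely visual changes, which is why the other measures in the list still matter.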

I think the main threat vector is that someone might be able to persuade the LLMs to adopt a change that confuses users about which checkbox corresponds to which user, e.g. by flipping all checkboxes left to right or rearranging rows. This might lead a user's information to be leaked, e.g. if they check the wrong person and then match with them, thus inadvertently revealing the other user's check to them, and perhaps revealing to the other user whom they had meant to check.
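
One way to guard against this kind of layout confusion would be to flag any generated CSS that touches properties capable of moving elements around on screen. The sketch below is purely illustrative and is not something the site is known to do: the property list and the name `find_reordering_declarations` are assumptions, the regex is a rough approximation of CSS declaration syntax, and the list is certainly incomplete.

```python
import re

# Hypothetical list of properties that can change where elements appear on screen.
REORDERING_PROPERTIES = {
    "transform", "order", "direction", "flex-direction", "flex-flow",
    "float", "position", "grid-area", "grid-row", "grid-column",
}

# Rough match for "property: value" pairs; not a full CSS parser.
DECLARATION = re.compile(r"([a-zA-Z-]+)\s*:\s*([^;{}]+)")
VENDOR_PREFIX = re.compile(r"^-(webkit|moz|ms|o)-")


def find_reordering_declarations(css: str) -> list[tuple[str, str]]:
    """Return (property, value) pairs that could visually rearrange the page."""
    flagged = []
    for match in DECLARATION.finditer(css):
        # Normalize case and drop vendor prefixes such as -webkit-.
        prop = VENDOR_PREFIX.sub("", match.group(1).lower())
        if prop in REORDERING_PROPERTIES:
            flagged.append((prop, match.group(2).strip()))
    return flagged
```

For example, a stylesheet containing `flex-direction: row-reverse` or `transform: scaleX(-1)` would be flagged for manual review, while a plain color change would pass through.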

I don't think this is going to happen, and if it happens it will only affect a small number of users.

If you're worried about your user data being leaked, consider changing the theme before using the site. This is especially true if the tagline of the site seems to be subtly hinting that something weird is happening.

> if it happens it will only affect a small number of users

I think this feature is overall fun and fine to leave in, but just to clarify, this market will resolve positively even if it does only affect a small handful of users?

@OliverHabryka Yes, that's right, if any user who was using the site sincerely (rather than trying to make this market resolve Yes) is misled by the website into clicking checks in ways that lead to them matching with (people, activity) pairs they didn't intend, that's a Yes.

@Buck what if I somewhat purposefully trick them? (Mostly to reduce collateral damage)

@OliverHabryka If I believe you actually tricked them, such that it would be reasonable for them to be annoyed at me for adding a feature that led to them seeing info inappropriately, then that will resolve as Yes.

@OliverHabryka And note that I will actively defend the website based on looking at the logs! (Though I guess I also DM'd you the prompts so you could red team more easily, which perhaps I'll regret...)

current mood
