5 Comments
yocoda

This is on par with what I've built over time on my own. Let me kindly ask you:

- does a lot of "I like to experiment" represent the amount you're spending on subscriptions (rather than you feeling they help you a lot)?

Secondly, the absence of "This whole system feels like it could/should be parallelized to run and cherry pick the best outcomes over 10,000 chats" makes me want to ask:

- if you have thought about the flywheel, which would make what you're doing right now a greedy search style; without the benefit of cherry picking through even 100 attempts?

Shrivu Shankar

> does a lot of "I like to experiment" represent the amount you're spending on subscriptions (rather than you feeling they help you a lot)?

There's helpful in terms of direct benefits (it goes and does work for me) and helpful in the sense that I learned a new skill that I can take elsewhere (agent/prompt/etc techniques). In this case, most of the budget is towards the latter.

> if you have thought about the flywheel, which would make what you're doing right now a greedy search style; without the benefit of cherry picking through even 100 attempts?

I like to think that this should be handled at the reasoning level of a model (built-in token-based search over scaffolded). I give it a set of constraints and it self-reflects on potential attempts before giving me a final answer.

Also, despite plenty of attempts at using custom UIs that could do something like that, I often find myself primarily using the existing app interfaces (Perplexity, ChatGPT, Gemini, Cursor, etc.) for my day-to-day usage, which also makes the N-attempts workflow harder to use.

yocoda

Really? Nothing spent on "I just want to try this and play around?" Most of mine is spent there; I think I am neglecting to consider the volume of work you might do.

> I like to think that this should be handled at the reasoning level of a model (built-in token-based search over scaffolded).

I was moved by the work in the AlphaEvolve paper. I'm curious whether you looked past it and held your opinion, or had not considered it?

> ... I often find myself primarily using the existing app interfaces ...

Gotcha, so you're of the mindset that "it'll probably happen soon, done well, and I'll use it when it comes."

Shrivu Shankar

> Nothing spent on "I just want to try this and play around?"

Hm, the opposite: most of my costs (that API budget) go to "I just want to try this and play around" (aka what I'm calling "experiments"). Just calling out that playing around != doing productive work that saves me time.

> I was moved by the work in the AlphaEvolve paper. I'm curious whether you looked past it and held your opinion, or had not considered it?

Definitely not making a statement overall on the effectiveness of token-reasoning and N-sampled scaffolding. Especially in scientific research, this type of setup is extremely promising (including based on the results of that paper). For the scope of this post, though, I'm focusing on my day-to-day usage, where that's probably overkill.

> Gotcha, so you're of the mindset that "it'll probably happen soon, done well, and I'll use it when it comes."

When there's a delta between what I think it can do and what I can do in the UI, I'll have specialized API-based side-projects (those "experiments") to handle it. Most of my personal usage, however, is prompt-level customization, which I can just copy into a normal UI.

yocoda

The smooth and crystal clear understanding you have is insightful; regarding the topic I felt on par at first but I see that my lack of depth stems from my obviously fragmented systems.

I hit walls at times. It's frustrating not understanding how to make the right adjustments. Whereas I saw your toggle customized system as too...

(much key turning? binary? Not sure how to describe it)

In reality it's more fluid than I thought.

Big Ups🙌🙌
