Great article Shrivu. I can relate to a lot of the things you're writing about, both in my daily job and in general in my personal and artistic life (I am a failed musician who still makes music).
I have two other interesting questions that I believe are adjacent to the set you created, and I would love to hear your opinion on them.
First: what should we read? As a manager, my job is to consume vast amounts of information and to make (hopefully) good decisions based on that. And of course to keep learning.
I heavily lean on AI to decide what to read and what not to read. I built my own private summarization tool that runs locally (so it's 100% private) and heavily lean on it to learn new things. I found this improved my learning speed significantly. What is your take on this?
Second: as a manager, I'd like to think I still have in-depth knowledge of the tech (I keep learning), and I constantly evolve my system design skills. I can still code, but of course my code is horrible, as I have not coded full time for many years now. I found AI coding liberating, as it allows me to use my skills to write good software.
My vision for the future is that AI will make me as good as a Sr. SWE and will radically change how I manage and run teams. What are your thoughts on this?
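The commenter's private summarization tool isn't shown, but the idea it describes can be sketched without any model at all: a minimal, fully local extractive summarizer that scores sentences by word frequency and keeps the top ones. This is a hypothetical stand-in for whatever the actual tool does, not a reconstruction of it:

```python
import re
from collections import Counter

def summarize(text: str, max_sentences: int = 2) -> str:
    """Keep the highest-scoring sentences, in their original order.

    Each sentence is scored by the document-wide frequency of its words,
    a classic extractive-summarization heuristic (no LLM, fully local).
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(re.findall(r"[a-z']+", text.lower()))
    # Rank sentence indices by total word-frequency score, best first.
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: -sum(freq[w] for w in re.findall(r"[a-z']+", sentences[i].lower())),
    )
    keep = sorted(ranked[:max_sentences])  # restore reading order
    return " ".join(sentences[i] for i in keep)
```

A real tool would likely call a local model instead of a frequency heuristic, but the privacy property (nothing leaves the machine) is the same.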
> what should we read?
My initial thought would be "whatever you want to", and if the raw text isn't that, use an LLM to transform it into what you'd like to read. For something poetic/personal you probably want to absorb the raw text, whereas 10+ docs you need to skim for work might be better suited to targeted LLM summarization.
Oftentimes I'm reading a document but I care about some key subcomponent: "in the book Life 3.0, how does the author describe each potential future -- provide a comparison table with direct quotes from the text". I wanted to know something particular and just extracted it in the way that was easiest for me to skim/ingest.
I've never really had the patience to get through a long book, even on a topic that interested me, and recently I've just used a GPT to ask targeted questions as a way to absorb the content. Sort of "critical reading" taken to the extreme, where everything I'm learning comes from my own questions rather than me just reading the underlying text.
> My vision for the future is that AI will make me as good as Sr. SWE and will radically change how I manage and run teams
I think this is sort of similar to "if I can generate images with AI then I can be as good as Picasso" -- which is both true and false. In terms of raw idea -> code outcomes, I think you will indeed become on par with a SWE3, but how you break down and frame more complex, novel problems (i.e. "engineering taste") could be limited in the short term, with the gap gradually decreasing over time.
AI has already heavily influenced how I run my current team, so I have no doubt that if you apply it (e.g. as mentioned in the post) it'll help you make more effective decisions than you could on your own.
This is on par with what I've built over time on my own. Let me kindly ask you:
- does a lot of "I like to experiment" represent the amount you're spending on subscriptions (rather than tools you feel help you a lot)?
Secondly, the absence of "this whole system feels like it could/should be parallelized to run and cherry-pick the best outcomes over 10,000 chats" makes me want to ask:
- have you thought about the flywheel? Without the benefit of cherry-picking through even 100 attempts, what you're doing right now amounts to a greedy search.
> does a lot of "I like to experiment" represent the amount you're spending on subscriptions (rather than tools you feel help you a lot)?
There's "helpful" in terms of direct benefits (it goes and does work for me) and "helpful" in the sense that I learned a new skill that I can take elsewhere (agent/prompt/etc. techniques). In this case, most of the budget goes towards the latter.
> have you thought about the flywheel? Without the benefit of cherry-picking through even 100 attempts, what you're doing right now amounts to a greedy search.
I like to think that this should be handled at the reasoning level of a model (built-in token-based search rather than scaffolded search). I give it a set of constraints and it self-reflects on potential attempts before giving me a final answer.
Also, despite plenty of attempts at using custom UIs that could do something like that, I often find myself primarily using the existing app interfaces (Perplexity, ChatGPT, Gemini, Cursor, etc.) for my day-to-day usage, which also makes an N-attempts workflow harder to adopt.
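The contrast being discussed here, scaffolded N-attempt search with cherry-picking versus a model's built-in reasoning, can be sketched as a toy best-of-N loop. Both `generate` and `score` below are hypothetical stand-ins (a real setup would call a model API and an eval, respectively); the point is only the shape of the scaffold:

```python
import random

def generate(prompt: str, seed: int) -> str:
    """Hypothetical stand-in for one model call: returns a candidate answer."""
    rng = random.Random(seed)  # deterministic per seed, for reproducibility
    return f"{prompt} -> candidate {rng.randint(0, 99)}"

def score(answer: str) -> int:
    """Hypothetical stand-in for an eval: here, just the candidate number."""
    return int(answer.rsplit(" ", 1)[-1])

def best_of_n(prompt: str, n: int = 10) -> str:
    """Sample n independent attempts and cherry-pick the highest-scoring one."""
    candidates = [generate(prompt, seed) for seed in range(n)]
    return max(candidates, key=score)
```

The "reasoning level" alternative folds this search into a single call's chain of thought instead of running the loop externally, which is exactly why it fits a normal chat UI better.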
Really? Nothing spent on "I just want to try this and play around"? Most of mine is spent there; I think I'm neglecting to consider the volume of work you might do.
> I like to think that this should be handled at the reasoning level of a model (built-in token-based search rather than scaffolded search).
I was moved by the work in the AlphaEvolve paper. I'm curious whether you've looked at it and still hold your opinion, or simply hadn't considered it?
> ... I often find myself primarily using the existing app interfaces ...
Gotcha, so you're of the mindset that "it'll probably happen soon, done well, and I'll use it when it comes."
> Nothing spent on "I just want to try this and play around"?
Hm, the opposite: most of my costs (that API budget) go to "I just want to try this and play around" (aka what I'm calling "experiments"). Just calling out that playing around != doing productive work that saves me time.
> I was moved by the work in the AlphaEvolve paper. I'm curious whether you've looked at it and still hold your opinion, or simply hadn't considered it?
I'm definitely not making an overall statement on the effectiveness of token-reasoning vs. N-sampled scaffolding. Especially in scientific research, this type of setup is extremely promising (including based on the results of that paper). But for the scope of this post I'm focusing on my day-to-day usage, where that's probably overkill.
> Gotcha, so you're of the mindset that "it'll probably happen soon, done well, and I'll use it when it comes."
When there's a delta between what I think it can do and what I can do in the UI, I'll build specialized API-based side projects (those "experiments") to handle it. Most of my personal usage, however, is customized at the prompt level, which I can just copy into a normal UI.
The smooth, crystal-clear understanding you have is insightful; on this topic I felt on par at first, but I see that my lack of depth stems from my obviously fragmented systems.
I hit walls at times. It's frustrating not understanding how to make the right adjustments. At first I saw your customized toggle system as too... (much key-turning? binary? Not sure how to describe it).
In reality it's more fluid than I thought.
Big Ups🙌🙌