No, I don't want to run the linter

Streamed 2024-10-24


Tags: stream vibecoding

Catching up on the Sorbet bugfix (merged!) and avatars (more Active Storage stumbling blocks!), then from 19:06 a demo of Aider, which I think is the best and fastest-improving LLM coding tool available. Where the risks are, the useful parts, the failure modes.

scratch


topics:
x fun little bugfix to sorbet https://github.com/sorbet/sorbet/pull/8220/
x avatar checkin https://bsky.app/profile/push.cx/post/3l74id6crpi2d
x llm coding tool demo


LLM coding tools:
x Socrates on LLMs
x simonw post: https://lobste.rs/s/2zyokg
x   legal risk
x https://aider.chat/
x linter demo
x   https://github.com/lobsters/lobsters/pull/1356
x   bundle gem lobsters_css_linter
x   git work loop
x   repo map
x   notes
x   test first and test last (and test never)
x   cost (lol)
x limitations
x   david_chisnall criticism: https://lobste.rs/s/nxqic4
x   sycophancy
x   feedback loop
x   attention window
x   doesn't reason
x   when does quality matter
x   training data :|
x good for
x   vim_bash, demo streaming layout, transcribe.rb
x   glue code, one-off - dual use of 'slop'
x   not really prototypes
x   API discovery
x   scripting - bash, ffmpeg, tar
x jr dev pairing - revisions, latency, pseudocode

post-stream:
  check out Moral Codes by Alan Blackwell - dpk0

title:
  I want bad code, fast
  No, I don't want to run the linter.

⊕ Transcripts are generated with whisperx, so they mistranscribe basically every username and technical term. They're OK but not great, advice appreciated.

Recording


03:07This is office hours for lobsters. So if you have any questions about the site or the code base, please feel free to pipe up about that. I am very happy to take questions, run queries against the production database. I appreciate it if you try and find a stopping point, but you can just kind of pipe in any time and I will throw it on my to-do list here in the scratch file if needed. I am a little bit scratchy, so the cat wanted to sleep on my face. We'll see how that goes. So the first thing I wanted to check in on was the commit I made to Sorbet to fix a bug. And I mentioned this on stream. This spins out of recheck, which is the gem I'm working on and will demo on Monday, I promise. I know I said I would get to it today, but I figured I just really liked a post on lobsters and I wanted to do a stream kind of responding to and building on it. So Sorbet is the type system, the gradual type system that builds on top of Ruby. It was started inside of Stripe. I used it when I was there. I have had a social chat with the maintainers, but don't really know them or anything. Never looked at the code base before besides reading the docs and realizing that there's a ton of interesting design decisions in Sorbet, actually. A lot of them are they made the decision early that they didn't want to add syntax so Okay, they do some clever stuff to. allow you to express signatures and in Ruby directly and I appreciate a good DSL. And I think this is one of the rare cases where a DSL is 100% warranted. And then. There's a whole bunch of really interesting working around limitations trying to add a gradual type system to Ruby. So I think if you like programming language design or PLT, programming language theory at all, it's worth checking out because it's not an enormous project, or at least the docs are not enormous. Anyways, this is all to say my favorite new feature in Ruby is the data class. 
It's like OpenStruct, except it is what OpenStruct would be if it was possible to break backwards compatibility. So it's a one-liner for defining data classes, small classes like Result and Success. I kind of demoed some code maybe three or four streams ago. And the objects you get back are immutable by default, in Ruby's way where, you know, atomic elements are immutable, but then if you put a list or a hash in it, the reference to that list or hash is immutable, but then the hash or the list is mutable by default. So you either want to freeze stuff going in or be real deliberate about how you, anyway, don't footgun yourself is what I'm saying, but it has so many fewer footguns than OpenStruct. So I'm in love with it. And I've gotten Recheck to a pretty good place, and I've been doing so much refactoring on Recheck as I've been developing and starting to alpha test it that it didn't make a lot of sense to lock things down with types until now. I've gotten to a fairly mature place where I'm ready to transition into beta testing. pushcx https://recheck.dev
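A minimal sketch of the shallow immutability described here, assuming Ruby 3.2 or newer (the release that added `Data`). `Result`, `status`, and `tags` are hypothetical names for illustration, not code from Recheck.

```ruby
# Requires Ruby 3.2+. Result and its members are made-up names for this sketch.
Result = Data.define(:status, :tags)

result = Result.new(status: :ok, tags: ["fast"])

# No setters are generated, and the instance itself is frozen.
puts result.frozen?
puts result.respond_to?(:status=)

# The reference held in `tags` can't be swapped out, but the array it points
# at is still mutable unless it was frozen on the way in.
result.tags << "cached"

# Freezing on the way in closes that gap.
strict = Result.new(status: :ok, tags: ["fast"].freeze)
begin
  strict.tags << "cached"
rescue FrozenError => e
  puts "rejected: #{e.class}"
end

# `with` returns a modified copy instead of mutating in place.
failed = result.with(status: :error)
puts failed.status
```

Freezing values at construction time is one way to avoid the footgun; the other is simply never mutating members after the fact.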
You can join the announcement list. Where is it? I don't know that it's linked anywhere, but I'll throw it in here. If you want to join the beta list, I'll send an email. It's on my to-do list for later today, but maybe more like tomorrow, about Monday's demo. And then when it's available for beta testing, I will call for beta testers. So I started adding Sorbet to the gem and I immediately ran into a bug in Sorbet where it thought that you could not create a Data class without having arguments, which is weird because in the documentation, so I believe Sorbet kind of copies these, I don't know what to call them. They feel like doc sigs. I can't remember the Ruby name for this style of writing comments that get extracted into API docs. There's a name for that tool and I'm just blanking on it. I do this in interviews too. I was actually just telling this story. I have some stress hang-up in interviews, like job interviews, where I forget proper nouns and it gets fairly ridiculous. I've learned at the very beginning of an interview to mention that this happens to me because I had one where I was like, yeah, so I gave this talk and it's all about the database library that Rails uses and some limitations of it, and how you can work around some, but you fundamentally can't work around others. The Rails library that does, and I could not for the life of me remember the name, ActiveRecord, even though I had stood on a conference stage and talked about ActiveRecord for an hour, 45 minutes, and written up slides and written blog posts. I've contributed a patch to ActiveRecord, a very little one. But, you know, in the moment of a job interview, I lose proper nouns, which is embarrassing. So anyways, I'm forgetting the name. I guess it's RubyDoc. So the extracted RubyDoc here in Sorbet shows an example of calling Data.define without any arguments. 
And the value of that is it's really nice for error types where there's no additional information you can give or you can think of like the maybe monad or i actually use it in the reverse and recheck because success you don't really need to try track stuff and you might have I mean ideally you will have several million successes to one failure so don't bother tracking data on all the successes but then Sorbet required that you have arguments and so that kind of stopped me dead on adding Sorbet to recheck and I went to report the bug and their little issue template was like hey wouldn't you like to contribute a patch to Sorbet? That is much more likely to get merged quick than an issue report. And I kind of looked at the list of, you know, the number of open pull requests versus the number of issue requests. And I was like, yeah, actually a PR is more likely to get merged. So I kind of hacked it out. This part is not my code, this line. because Sorbet is split between the Ruby section that I can read, you know, I don't know its APIs, but I can read it pretty clearly. And then the C++ side that I don't, you know, not only do I not know the API, I am way out of practice with writing C++. jangomandalorian Hey @pushcx 👋🏼 Hey chat!
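A small sketch of why a memberless `Data.define` is handy, assuming Ruby 3.2 or newer. `Success`, `Failure`, and `run_check` are hypothetical names for illustration; this is not Recheck's code, and it does not reproduce the Sorbet bug itself.

```ruby
# Requires Ruby 3.2+. All names here are made up for this sketch.
Success = Data.define                    # no members: nothing worth tracking
Failure = Data.define(:check, :message)  # failures carry the details

def run_check(name)
  passed = yield
  passed ? Success.new : Failure.new(check: name, message: "check returned false")
end

results = [
  run_check("row count") { 1 + 1 == 2 },
  run_check("no orphans") { [].any? },
]

# Only the failures carry data, so only they need inspecting.
failures = results.grep(Failure)
failures.each { |failure| puts "#{failure.check}: #{failure.message}" }
```

With millions of successes per failure, an empty success type keeps the common case as cheap as possible while still giving each outcome its own class.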
I haven't written non-trivial C++ in 20 years. Oh, hey, Django, welcome back. daveceddia howdy! 👋
Anybody else, feel free to say hi or sound off. I appreciate it. And I know there are a couple of folks who I know from bootstrapping who are going to drop in today. And there's one. Hey, Dave. Yeah, it's been really fun to see the way all of a sudden in the last three days, like all of the bootstrappers decided to move over to Bluesky. That's really been encouraging and a lot of fun. So anyways, this was a fun little patch. I've talked about it on stream, I think a little bit before, or maybe I just talked about using Sorbet on Recheck. And I gave that little demo of Data.define, but I made this patch and I had to say upfront, like, I have never used RBI before. I may have obvious errors or misunderstandings, because I wanted them to come in and understand that, like, I know what a Sorbet type signature is, but I sure don't know my way around your code base. And the couple of maintainers who piped in were really friendly and welcoming and gave some pointers for what to fix. I got it most of the way there. And then Jez, who I think is the primary maintainer, or at least the one that I've seen his name on stuff the most, pushcx https://github.com/sorbet/sorbe…
carried it over the line, and it was really nice. So I'll throw the link in chat in case anybody wants to read the details, but that's that whole bit. And then I wanted to check in on one other thing, which was the avatars that I've been coding on for, what, three or four streams? I said at the end of the last stream I was probably going to finish it off stream and. I got a thread of stuff that i'm mad about an active storage, which was kind of the theme of the last two streams, to be honest. pushcx https://bsky.app/profile/push.c…
So there's that limitation i'm not going to ride this hobby horse around but there's that rails limitation in. URL generation where it really only wants to do it in a request cycle. And then it has action mailer has a thing bolted on the side because action mailer is really often going to get called from background jobs. So it's not going to be in the request cycle. So it's not going to know the host name that was requested or the protocol or the port that was requested. daveceddia ohh yeah this sucks inside Channels too :/
And then Active Storage, like, picked up a copy of that, but it still points to, like, hey Rails, wouldn't it be good... This sucks inside Channels too, yeah, because you're not in the request cycle and you're using a slightly different protocol. It is one of my biggest gripes about Rails, honestly. After plurality, whether things are singular or plural, it's probably my biggest gripe about Rails, because it's a web app: reading and generating URLs is a fundamental feature. This should not be locked into the controller request-response cycle. And then I showed, I think I showed some of this on the last stream, where Active Storage does a bunch of N+1 queries that ticked off Prosopite, which I have set up to catch those in development. That's absolutely mandatory for Rails development. Since I posted this yesterday, I found that they have two methods you can add on to try and preload attachments. And one of them is for if you are directly loading the user. And then the one I found after I made this tweet, skeet, whatever, was about finding multiple records. And in this context, this is when listing all the stories: they all have authors on them, those authors have avatars, and so we need to preload multiple users' avatars. And I just was poking around in the Active Storage issue tracker because of other stuff and I found their preloader. And then I ran into another limitation that I did not talk about on stream because I hadn't seen it yet. But when you use the disk service, the files come out with a header that says don't cache them, which is a really reasonable default. 
It's conservative in the right ways where it just assumes you want this to be private and that doesn't just make sense for coming out of base camp it makes sense generally but there isn't a way to override it like there isn't an api for it and on the s3 and google cloud backends you can they sort of they sort of added a way you can write in config storage that when you store something You can set what the cache control header will be. There's a lot of complexity going on for those remote ones where they want to get Rails out of the request loop. But there isn't such a feature for the disk service, and I definitely can't do it at read time. So this one was just one was just frustrating and i would like to have nginx serving these files directly so maybe i can just punt on the whole thing and set it in my nginx config but it's weird and complicated and it's one more of those places where active storage feels a little bit half-baked and i tried to end on a little more positive well mostly positive note i think i really missed the tone I've only been seriously bothered by how Active Storage is like, yeah, you just give me whatever file and I'll persist it. And then I'll just trust the user to tell me the MIME type of the file. That's a very 90s kind of web design. And I get how it came out of Basecamp like that because... mbuckbee heyo
When your user is paying $20 or $10 or $30 per seat per month, they're really unlikely to put malicious files in to share with their coworkers. But I'm literally doing, oh, hey, Mike, nice to see you again. Or I guess here on stream for the first time. But the really ironical part of it is that the active storage guide talks through, Hey, here's how you store avatars, which are an image attachment to users. And it's like, that's, that's the exact thing I'm doing. And then I'm just constantly running into stuff. It can't do that is all the, you know, if you were actually implementing it instead of it being a toy in the guide. I immediately ran into all these issues. So that's been frustrating. I'm thinking about whether it's worth trying to engage with the active storage team and push patches for some of this stuff or push doc fixes for some of this stuff. And if I had more attention, it would be a lot easier, or if they had been more receptive to some of the bugs. Don't need that open yet. All right. So unless anybody has any questions about where active storage is or Lobster's office hour stuff, which you can pipe up with at any time, I will roll into ridiculous coding. It's kind of a combination of an absolutely trivial nitpicky pull request fix and where some tools are useful because I saw a really nice blog post. So let me show you that pull request and bring the browser back up here. There is a CSS issue that we have on lobsters where when you open the drop menu for flagging a story or getting the cached link for a story. Let me just show you that. We'll do whatever the top story is here. You can click caches and in some situations this menu kind of falls behind the text box and it's really frustrating. And the basic gist of it is we apply transparency to kind of fade out stories as they get heavily flagged and hidden. This happens in comments too, but less frequently. 
And as soon as you apply opacity in CSS, you get a new z-index stacking context, which means easeout HyperParkour sup
Long story short, this bug just keeps coming back over and over. I want to say we've fixed it about once a year for the last five years, maybe longer. It's just one of those. Ah, hey, easeout, welcome back. Nice of you to tune in while I'm griping about CSS. For context to anybody, easeout is the person who added dark mode to Lobsters, and occasionally I get to trick them into fixing fiddly CSS issues. So if you would like one to really sink your teeth into, there's this recurring bug where menus appear in the background because opacity creates a new stacking context. pushcx https://github.com/lobsters/lob…
Oh, come on. GitHub. Here, I'll throw you the link. Thank you for volunteering, which you have done by standing around in the channel for two seconds. This bug just keeps coming back. So someone came and fixed the latest version of it. And I'm going to show you what this looks like a second ago. So here was, I never remember which order these run in. This one. So the first version of it, so it writes a really specific selector, and it says have that opacity. It's fine. It's probably a correct fix. Actually, it may not be. Now that I look at it again. And this is super nitpicky, but there isn't a space between the selector and the opening brace. This is literally a one-character pull request fix. sfioritto Hey hey. 👋
And I took on, years ago, I took on the giant project of adding RuboCop to lobsters, and then I added standard RB because I was tired of having opinions about linting rules. I don't really care. Ah, hey, Sean, welcome. I don't really care what the linting rules are, and I have very little interest in painting a bike shed myself, but I do care that the bike shed is painted, and that there is some rule, and I hate wasting time and attention from volunteer contributors for this kind of incredibly nitpicky style issue. So on the ruby side, I was able to add a linter, and I'm really... enormously happy with standard rb i think the world of it on the css side we don't have a linter because they're all written in node and i don't want node to be a development time dependency of lobsters i don't want it to be a production time dependency either we currently uglify code with it and i could remove that anytime mbuckbee standard rb ++
minifying code is kind of a waste if you use gzip and we do use gzip via nginx if you use gzip the benefit you get from minifying code is very very small and not worth the hassle of installing node basically so i want to get rid of node entirely, and we don't use it in development mode, which means we don't lint CSS, which means I have to do these incredibly nitpicky kind of things like say, hey, in the CSS, please insert exactly one space character here. That is not a good use of contributor time and attention. My little stream health has gone to shit. If I sound terrible or the video is awful, please let me know. But otherwise, I will just assume it is the stream manager slipping out of sync with reality, which seems to happen about once a stream now. Although it's been a minute since I've gotten any best viewers on Twitch spam, so kudos to Twitch for that, Pix. So as you can see, I had to suggest this change to the committer. And they made the change. mbuckbee Is this all the stuff you're going to fix with aider?
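To make the one-space nitpick concrete, here is a minimal sketch of that single check in Ruby. It is not the linter written on stream and not the `lobsters_css_linter` gem; `brace_spacing_offenses` is a hypothetical helper, and it assumes at most one opening brace per line and ignores braces inside comments or strings.

```ruby
# Hypothetical helper: report lines where the opening brace is not preceded
# by exactly one space. Only the first "{" on each line is examined.
def brace_spacing_offenses(css)
  css.each_line.with_index(1).filter_map do |line, number|
    brace = line.index("{")
    next if brace.nil?

    before = line[0...brace]
    next if before.strip.empty?      # a brace on its own line is left alone
    next if before.match?(/\S \z/)   # non-space, then exactly one space

    "line #{number}: expected exactly one space before '{'"
  end
end

css = <<~CSS
  .story {
    opacity: 0.5;
  }
  .dropdown{
    z-index: 10;
  }
  .comment  {
    color: gray;
  }
CSS

puts brace_spacing_offenses(css)
```

A check this small is the kind of rule that belongs in a tool rather than in review comments, which is the point being made about contributor time.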
And then they wrote something that basically said they are new to coding. I think they probably are also one of the people that Ryzen 114, who was another junior dev new contributor. Yeah, Michael, I'm going to jump into writing a linter in Ruby. from scratch, live on stream. You are skipping ahead. mbuckbee video stopped a few times but came back
So there are these LLM coding tools, and the most popular one is Copilot, because Microsoft is pushing it in VS Code, which is kind of now the weirdly standard and dominating editor. But Copilot is pretty darn limited. All right. Please keep giving me feedback. My little graph is recovering, but We'll see how it goes. I don't get a lot of control over these things. It is not an encoding issue, according to OBS, but Twitch says it's sad. So we'll see how it goes. Luckily, this is not like a 60 FPS video game. This is a browser and me rambling and Vim.

26:49pushcx https://lobste.rs/s/2zyokg
So the other, yeah, let's go in a little different order. The other part of why I wanted to do this demo stream was this post from Simon Willison, who has written a lot about LLMs and is an Active Lobsters user, which is very nice. He's got a lot of, I share a lot of his opinions. And he wrote this post about everything he built with Claude artifacts in a week. artifacts this is not how i would describe artifacts so claude is like imagine your standard llm chatbot which is very funny to think of something that's what two years old and have the word standard in there but the common llm chatbot and artifacts are look i'm going to ask you to generate a bunch of code just give it to me in a file-like interface so I can click to download the whole thing instead of having to copy and paste in and out of the web or in and out of the iOS app. And Simon gave a bunch of examples of stuff that he's written in the last week, including his prompts and transcripts. So this is pretty darn useful. And I've found Some value in llm coding tools for really similar tasks, so I wanted to talk about what I see as the theme here. Which is a lot of this stuff is. glue code where. Like literally this first one of. Have a web page. mbuckbee stuff was pretty cool
And I just want to be able to take the web page and grab the content and drop it from my one tool, Mobile Safari, into my other tool, GenoReader. hejihyuuga Hello Mr pushcx
And this kind of one-off tool or glue code is where I see LLMs shining. So... hejihyuuga Hope you're doing well
my linter thing my linter concern or hassle was an example of it and i'll talk about it in a minute but first i want to talk about socrates who is obviously relevant to llms hey welcome back hedgy so for folks who like reading ancient greek literature this one's a catch-up but There's kind of a famous bit from ancient Greece where Socrates tells this story about the invention of writing. When it came to writing, it will make Egyptians wiser and improve their memory because they thought that writing came from Egypt. And Socrates goes on to say, in fact, writing will induce forgetfulness into the soul of those who learn it. They will not practice using their memory because they will put their trust in writing, which is external and depends on signs that belong to others instead of trying to remember from the inside completely on their own. You have not discovered a potion for remembering, but for reminding. You provide your students with the appearance of wisdom, not with its reality. And that is a big chunk of the criticism of LLMs in the last year or two that I've seen of if you are using these, you are not learning about code. And this might be odd given that I'm about to demo an AI coding tool for a minute, but I agree with Socrates on this one. I think arh68 ya but u didn't hear Socrates say it. u just read about it SeriousSloth
socrates is right about llms and also he is right about writing writing does introduce forgetfulness because you don't have to memorize everything and he didn't see how that could be good yeah i didn't hear socrates say it i just read about it that is a good point arh mbuckbee What if I use the LLms to explain my code to me?
hejihyuuga Writing introduces forgetfulness, but it frees up cycles that you can spend on other cerebral tasks
So that is one of the values of writing that he just straight up misses, which is fair because for him writing was, you know, a thing that a single digit percentage of the population could do and an individual probably papyrus based on the era would be very expensive. So it's not, you know, like modern paperback publish. What if you use LLMs to explain code to you? Yeah, I think that's part of it. arh68 it's an interesting quote HahaThink
Ah, y'all have very good points. Thank you, Michael and Hedgie. And my point is, yes, if you really want to learn something, you have to memorize it and you have to internalize it. And we've done incredibly well in the last 50 years with things like spaced repetition software for memorizing things. That's been enormously helpful for me in studying computer science because I've had the experience in computer science of hejihyuuga (For the record I agree with the point you're making))
Like I run into an explanation of a thing I don't use very often like what's the difference between a semi group and like on a another group term, but you know terms from group theory and category theory. easeout I like that they show me new things I can learn while verifying with real docs
And I can remember having learned them multiple times. But then I don't use them in day-to-day life, so I forget them. So if I use something like Anki to memorize them, I'm not going to forget that, oh, the difference between a semigroup and a monoid is whether or not they have a commutative operation. And I am almost verbatim reciting for you my Anki card about this.

33:25Yeah, easeout, you have another good point there. So, you know, I agree with Socrates. He's right, but it doesn't matter. It's up there with, does a boat swim? Like, fish swim and people swim. Does a boat swim? I don't know, man. That's kind of a category error. Like, it is right in one sense, but it is misleading in a lot more senses. Because, you know, a modern boat, whether it's a modern racing yacht or a container ship, does things that you cannot do while swimming.

34:09The other thing that caught my attention about Simon's post was not in his post, it was a thing I asked him about in the comments. easeout a boat exhibits the duck type
Where I mentioned, I know I wrote about this on lobsters before I can't find that old comment from a couple months ago but. Yeah, a boat exhibits the duck type. That is a good way to put it, actually, especially given Sorbet, right? There isn't any case law on who owns the copyright that comes out of LLM-produced material. There's also, like, is the training data copyright infringement in the first place that one seems probably yes the output i don't know like it might be transformative use it might not that one is really hard to say and i think there's an interesting an interesting quirk i think in my blue sky thread i called it the output of it is well maybe the output from these llm coding tools is kind of toxic waste if it turns out that Whoever owns the copyright on the training data owns the copyright on the output. That's, you know, you just have to delete the code at that point. You cannot rescue it if a million unfindable contributors might own partial copyright on your code. That's so weird. And the way the US legal system works without getting too hard into politics, please, not going to have anything on this for a couple years until it's settled and simon gave some great resources here for how the different providers are trying to handle this by saying basically if you get sued we will step up and defend you because that's an existential threat to our business if you can get sued for copyright infringement for using our tools but the actual experience of getting sued in federal court is pretty expensive and unpleasant, regardless if one of these big companies wants to step in and pay the bill for you. So I just wouldn't want to go there. And it kind of hybridizes well with where I see LLM coding tools as useful. 
I used to talk about, you know, I am just going to slop a script to do this, and now we call LLM output slop, and I think that's sort of fortuitous. The first thing I used Claude for was, I use fetchmail, you know, where is it? So fetchmail is an older Unix utility for grabbing mail from one server and plopping it in a local directory. And it's one of those that's been around for like 30, 35 years. And so its config file is its own weird syntax. I'm trying to see if there's a sample in here real quick in the man page. easeout I use copilot readily for code you don't ship—CI scripts, codegen scripts, local dev convenience
And yes, it's just a man page. There isn't like a great doc or anything. Oh, somebody want to talk about it? easeout tests
So the first thing I did was I had this chore. Here we go. So here's some examples of, oh, God, they're explaining how it handles whitespace.

37:50Yeah, I can see where Copilot would be useful there, easeout. So you just kind of list your accounts and you say to poll them and you give some settings like what's your user, what's your password. You can add, you know, oh, the user there is this user here. mbuckbee I'm a strong proponent of using AI tests vs AI gen code to "fight" and get a better overall codebase
And it has, as you can see by how long this scroll bar was and how long it took me to find a sample, it has accumulated plenty of options over the years because it is glue code. So the first thing I asked Claude to do was write me a Ruby function for parsing a fetchmailrc and give me back a hash where, for each poll section, I want a server, a user, and a pass. And in two seconds, it spit out a function that worked to do this. And that was kind of impressive. And it was, I don't think I have a copy anymore, but it was like just terrible junior dev code. Just all ifs and mutating and slamming strings together. And I had the feeling of like, well, what if this wasn't bad? And I said, okay, but rewrite that using map, filter, select instead of string manipulation. And three seconds later, it wrote basically the code I would have written to turn a fetchmailrc into a hash. And this is just one of those tedious one-off kind of scripts you have to do sometimes in coding, or in running a Linux desktop. I have so many of these little, like, oh, turn off the HDMI and toggle off the Bluetooth and switch all the sources over. And it's great to be able to script them, but it's really tedious to have to script them. So this kind of one-off script, gluing stuff together, I think is where LLMs really shine, because do I want to learn the minutiae of fetchmail? Do I want to write the normal-form parser for fetchmailrc? No, I'm going to use this once and throw it away. I want bad code fast. Oh, there's a title. For new viewers, sometimes I say silly stuff, and if I hear myself say something especially silly, I will use it as the title for the stream in the archive. So if you hear me say something silly, please do highlight the quote and tell me to use it in a title. And then at the end, I'll pick one. So I used Claude, the web interface, for a while. 
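For a sense of what that throwaway parser might look like, here is a rough sketch in the map/select style mentioned above. It is not the code Claude produced (no copy survives), and `parse_fetchmailrc` is a hypothetical name. It assumes each account starts with `poll HOST` and contains `user`/`username` and `pass`/`password` keywords, and it strips `#` comments naively, so a `#` inside a quoted value would break it.

```ruby
# Simplified sketch: handles `poll HOST ... user NAME ... pass[word] SECRET`
# entries only. Real fetchmailrc files allow far more than this.
def parse_fetchmailrc(text)
  text
    .lines
    .map { |line| line.sub(/#.*/, "").strip }
    .reject(&:empty?)
    .join(" ")
    .split(/(?=\bpoll\s)/)
    .select { |section| section.start_with?("poll") }
    .map do |section|
      {
        server: section[/\Apoll\s+(\S+)/, 1],
        user: section[/\buser(?:name)?\s+"?([^"\s]+)"?/, 1],
        pass: section[/\bpass(?:word)?\s+"?([^"\s]+)"?/, 1],
      }
    end
end

sample = <<~RC
  # work account
  poll mail.example.com protocol imap
    user "alice" password "s3cret"
  poll pop.example.org user bob pass hunter2
RC

parse_fetchmailrc(sample).each { |account| p account }
```

It is deliberately not a real parser: good enough to run once against a known file and then throw away.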
And this was before they added that artifacts feature that Simon talks about, and copying and pasting stuff into and out of Claude is really tedious. I hated doing it. I had the most irritating kinds of copy and paste bugs where, like, you missed the closing parenthesis at the end of a hundred lines and then you get a weird syntax error, like that kind of stuff all the time. And somebody pointed me to Aider, and I wanted to say it was, pushcx https://aider.chat/
sfioritto, who said hi earlier, but maybe he just mentioned it in passing, because I got the impression he hasn't used it. So I'll throw this link in chat for anybody who wants to play with it. It's a... so, I've seen Copilot. I don't use VS Code, and Copilot is very line-oriented. It's gotten better over the last few months, where it can start spitting out full functions and tests, as easeout mentioned, but I don't use VS Code and it's not working the way I was thinking, so I didn't get into it. And Cursor is a little bit of a step up from that. It's sort of a fork of VS Code that builds in more AI tools. And one of the things where it's nicer, certainly than copying and pasting into Claude's web interface, is it runs in your code base and it can introspect the files in your code base, which is nice because I write Rails all the time, although I really am reluctant to use it in a Rails app because of the open issue about copyright. But it's good for saying, like, oh, in Rails, hejihyuuga My other apprehension about LLM tools is that I prefer to not have to pay a subscription (I might be unaware of free options)
When we fix this bug, we're going to have to touch the user model and the controller and the view, and that's spread across four files. easeout code editors need a magnetic selection tool… automatically round off the starts and ends of selection ranges to something meaningful
And I don't want to paste four files into and out of Claude and then, like, have to go back for a fifth. That's just awful. And Aider, I think because it's terminal, like, number one, I just jive with terminal. Code editors need a magnetic selection tool that automatically rounds the start and end of selection ranges. I'm actually, I spend a lot of time mad at the iPad about that. hejihyuuga Helix editor has some syntax aware selection tools
Rather than use a laptop for the last couple of years, since I built a big desktop, I just have an iPad and I use it with, like, a keyboard cover as a remote terminal. And it's... hejihyuuga I'd imagine vim and emacs can do something similar
It's so painful that when I select things in the terminal, the iOS selection kind of finds like, oh, I assume that you want the whole line or you didn't intentionally select half of that word. I will change the selection to say the whole thing and working in code that's just... It's an endless irritation. And the most irritating thing about it is iOS will, if I adjust the end of it, that code kicks in and it will just slap it over to the side. easeout iOS fine text editing is a blowup
I hate tools that are like, oh, you tried to do something. Let me do what I thought you meant. It's weird that I don't get that annoyance for Aider very much. Yeah, I guess I would say I am mostly irritated at iOS. The idea of a magnetic selection tool is not bad. It's also not bad if it's like the first pass, but it listens to me when I correct it. hejihyuuga That's been google for me over the last year. I'll search something specific, and google will be convinced I mean something different
There's a neat Aider feature about that that I'll show off a little bit. All right, so there was that bug. I showed the bug. I don't need to show the PR anymore. So I wanted to just drop into a demo. And because I'm talking out loud and I will respond to chat, it's going to take me more than an hour probably. But I will tell you up front that when I knocked this out yesterday, it took 60 minutes. There was like a five minute bathroom and get a coffee refill in the coffee shop kind of thing. But yeah, it was right on 60 minutes. So given how much slower I code when I'm talking out loud and talking to chat, we'll probably come in more like 90 minutes to two hours. But on the flip side, I did this once yesterday, so I can kind of nudge a little faster. So I'm going to close out Vim because I want to have Vim open in the working directory.

45:34So if you haven't seen the stream before, I pretty much am in Vim the whole time, and you can watch the tabs across the top to make sure you understand what file or terminal I'm farting around in. So let's set up a gem. Because I thought, after that one character fix, I thought, what if I had a CSS linter? How fast could I write a CSS linter that was incredibly opinionated with my opinions and was not useful in a general sense. So like, I'm not calling this CSS Linter. I'm not giving this a friendly name. It is specifically just a me tool, just for lobsters. Really? Oh, it's bundle gem. See, can't remember this stuff. Oh, there's all of my random folks. All right. Let's hop back into Vim. Get my scratch file open so I don't lose my place. Let's see, what have we talked about? This, this, this, this. And now we're into the linter demo.

46:57All right, so bundle has given the initial Git repo setup. So I'm just going to commit that with the command I used.

47:16So one of the things I was really put off by but came to like is that Aider is really integrated into Git. The first time I used it and it made a commit in a repo, I was mad. Like, how dare you touch my stuff? In that, what is that? Oh, there's that great meme.

...48Yeah. Let's see if I can get rid of this. Yeah. There was a tweet a year and change ago. All robots and computers must shut the hell up. To all machines, you do not speak unless spoken to, and I will never speak to you. I do not want to hear thank you from a kiosk. I am a divine being. You are an object. You have no right to speak my holy tongue. The first time Aider touched my repo, I had this reaction of like, how dare you? You arrogant assumption. But then I played with it some more, and I was like, oh, actually, this is kind of great.

48:23So we'll show that. So here's the layout. Yeah, why don't I open it in Vim? I have NERDTree, right? So here's the layout of a bare Ruby gem that doesn't do anything. And I'm gonna go in the gemspec because you have to set a couple of things. Like this one's optional. You have to have a homepage. So I'm gonna just make one up. I may, lobsters. I may or may not actually publish this gem. I'm leaning towards, but we'll see how it goes. And then we will say the same for this, and then no changelogs. I'm just slopping things. And the summary is: an opinionated CSS linter just for lobsters. Demo code only for lobsters CSS. No feature requests accepted. Great. All right. So let's look. Git says I've changed this file. Let's commit that.

49:53So let's jump into Aider. So Aider is a command line tool. So I'm going to be jumping over here to this terminal a bunch, number three, and you give it the files you want it to edit on the command line. So I'm going to give it, like, the Gemfile and the gemspec. Oh, right. I've got it. Oh, where's my, ah, so there's a terminal thing happening where it's not in my source history. It's over in a Python virtual environment.

50:37Right, there's this weird limitation in the Git library that it uses where the Git library doesn't understand version 3 indexes. So this is how Git structures the .git directory. But you can just tell it, like, yeah, use the old one, and Git is fine, other tools are fine, and then Aider is happy. So there's, you know, a little bit of rough edges, but it basically works. And one of the nice things, one of the reasons I think Aider's state-of-the-art over Cursor and Copilot, is because it's a terminal app and it's ugly, and they don't have to spend a lot of time on GUI coding. In the terminal, you can just slap things together so fast because it's not like every single thing needs a tooltip and a layout and an event handler and nine other things. You're just like, I'm going to barf to standard out and read from standard in. So, give some stats. So I started it with a couple of file names and I poked around. Yeah, I'll just show you what I did. This was a couple of minutes of getting started. I looked here and I was like, is there a CSS linter for Ruby? Because I didn't want to code this, right? And, oh, that's encouraging. There's something named CSS Lint. But then on the other hand, it hasn't been released in 11 years. Okay. And I'm going to spare you scrolling through this, but they're all a similar age. So I was like, all right, let's take a step back. How about CSS parse? Are there any CSS tools for parsing, especially if they don't just shell out to Node? Because I ran into one or two of these that was like, yeah, you just install Node, and we're a front end to the Node thing. And I was like, yeah, that's what I'm trying to avoid. So in the table, there's one that really stands out for activity. So I checked out Crass. And I kind of like it. It's like, yeah, we're fully compliant with the CSS level three specification. And I love that the README goes right into, like, yeah, here's some things you don't wanna use it for.
It's fine by Ruby standards, but don't use it for millions of lines of code. And it parses, it doesn't try and figure anything out for you. This is just a tool for going from snippets of CSS or files of CSS into: I have a parse tree. And just as a sign of what to expect, it's like, yep, you're just getting plain Ruby objects back. You're not getting a custom class, there isn't an iterator involved, you don't have to learn, like, the Nokogiri XML traversing API. Just, here's your hash. And this is exactly the level of maturity I want for this project. So there's a thing when you're developing a gem, when you're developing a library: when you are adding a dependency on another gem, do you mean that to develop this gem, you need it? Like Minitest. So you need Minitest to develop the gem because it's going to have a test suite. But if you are installing and using the gem, you don't need Minitest involved. And so that separation is weird, and I can never remember how it works. So the first thing I told it was: we're starting a new Ruby gem for CSS linting. Add a dependency on the Crass gem for parsing CSS. So it's pretty conversational. It's pretty chatty. Later on, I will talk about why I'm kind of chatty and personal. But Aider comes back, and it always summarizes back what you say. It kind of irks me. Like, yes, I know, I just told you to do that. But it does serve a purpose. I'll talk more about that when we talk about limitations of LLMs. There are enough code examples online of how you add a dependency to a gem that it's just like, yeah, here's how you do it. And so Aider goes to Claude and says, give me the diff for this file, and Claude will generate a diff. And then also it knows, like, oh yeah, probably you're gonna have to run this command. And so it's like, hey, so I... When it's showing me this diff, it's showing me its work. It has, in fact, let me reload the gemspec. It has made that commit.
And when I say commit, if I look at the log, there's my initial commit, and then where I added those required fields to the gemspec, and then there is a new commit here that Aider wrote. And so it wrote this commit message of add the crass gem dependency. You see how this is slightly rephrased from what I actually said. And it's also, like, slightly weird. I wouldn't call it a gem dependency. But if I was looking at a junior dev's code, this is the kind of writing I would see. And then it's there. It works. It also threw away the original comment, which I kind of like.
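
The runtime-versus-development split mentioned here can be sketched with RubyGems' own Gem::Specification API. This is a minimal illustration, not the gemspec from the stream: the gem name, version, summary and author below are placeholders, and leaving off version constraints is an assumption made for brevity.

```ruby
require "rubygems"

# A throwaway spec built in memory; name, version, summary and author
# are placeholders, not values from the stream.
spec = Gem::Specification.new do |s|
  s.name    = "example_css_linter"
  s.version = "0.1.0"
  s.summary = "Demo spec showing the two kinds of dependency"
  s.authors = ["example"]

  # Needed by anyone who installs and runs the gem.
  s.add_dependency "crass"

  # Needed only by people working on the gem itself.
  s.add_development_dependency "minitest"
end

runtime_names     = spec.runtime_dependencies.map(&:name)
development_names = spec.development_dependencies.map(&:name)

puts "runtime:     #{runtime_names.inspect}"
puts "development: #{development_names.inspect}"
```

A gem skeleton may instead list its test framework in the Gemfile, which has the same effect for people working on the gem: it is installed for development but is not pulled in for users.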

56:58The

57:06So then back into Aider, it says, look, I know you're going to want to run bundle install. Do you want me to run it? So I can just hit enter and it runs bundle install. And then it asks, do you want to tell Aider the output of that command? I'm just going to say no, because it's not a thing to spend tokens on, but it just basically worked. mbuckbee Aider has a really interesting prompt built into it that repeatedly "takes notes" into a hidden tag in the prompt and restates what its goal is
So, okay. The point of this is... "Aider has a really interesting prompt that repeatedly takes notes into a hidden tag." Yeah. "And restates what its goal is." Yeah. I was going to talk a little about that, behind the scenes of how Aider works. Let me add that. I mentioned a little, I want to talk about repo map and then notes. I'm not going to, like, deep dive the Aider implementation. I have poked around its code base a little. There is a bunch of really clever stuff happening that I don't see happening or haven't heard of happening in other tools. And this stuff has been popping up rapidly over the last couple of months. And so if you last looked at LLMs, well, especially before Sonnet 3.5 or before any of these really nice assistants came along, and you just used the web interface, I would be similarly like, well, it sort of works for scripts, but it's kind of disappointing. And instead, this has jumped forward quite a bit in the last few months, last year. All right. So I'm going to say clear. I don't know that we'll be in, yeah, we're not going to be in the gemspec again. So you can kind of tell it to forget about files.

59:13Is it not remove? Tells you how often I do it.

...21Usually I bounce Aider, but I want to actually just show this. So what's the command?

...34It is, is it forget?

...45I'm peeking at the docs off screen. Drop, add and drop. All right.

01:00:07We'll provide a command for use in a build pipeline called lobster-css-linter. It takes no arguments. Please write a CLI skeleton for this.

...31So it may be a little weird that I wrote please because it's a robot. We'll talk about vector spaces. But I say basically, well, I'm going to provide a command, so what do I want? And it says, OK, well, I want to add some files. Can I add this file to the chat? So what it's prompting me is, by default, it doesn't send any file up to Claude without explicit permission. Kind of like that. It doesn't create files without explicit permission. I'm going to say yes to all of that. So let's scroll back up a little, look at that. So a lot of gems will have a version and maybe some errors. And then here's the start of a, this is a real common Ruby gem idiom where you have In your exe directory for executable, you have the little binary that the user is going to end up using if they install your gem. And it's going to call into your library. And then, yeah, this is a reasonable skeleton of we'll have some class called cli.run. That's all fine.
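
The gem idiom described here, a tiny executable that calls into a library class, might look roughly like the sketch below. The module name, version and messages are assumptions for illustration; the code generated on stream is not shown in the transcript.

```ruby
# lib/ side: the library owns the behaviour so it can be tested directly.
# Module name and version are hypothetical.
module LobstersCssLinter
  VERSION = "0.1.0"

  class CLI
    # Returns the process exit status instead of calling exit itself,
    # which keeps the class easy to call from tests.
    def self.run(argv = ARGV, out: $stdout)
      unless argv.empty?
        out.puts "this command takes no arguments"
        return 1
      end
      out.puts "css linter #{VERSION}: nothing to lint yet"
      0
    end
  end
end

# exe/ side: in a real gem this would live in its own executable file
# and be little more than `exit LobstersCssLinter::CLI.run`.
status = LobstersCssLinter::CLI.run([])
```

Returning a status rather than exiting inside the class is a design choice of this sketch; it lets a test call CLI.run without ending the test process.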

01:01:54So if I jump back to this and refresh, yeah, now there's an exe with that. I think it filled in a, yeah, it usually often fills in a version 1.0. There's an interesting thing. I didn't ask it to fill in a version, but it did. Or maybe that's the gem default. I honestly don't remember and kind of don't care, which is nice. And then it shoves stuff in a module. I don't think it did that for me last time. Eh, that's fine. The point of this is I want a really simple linter Yeah, the exe should be executable. So here's a fun one that's going to keep coming up. It gets a little... It really wants to be eagerly helpful about, hey, I know you're probably going to want to run this shell command, like set the executable bit on our executable. But then also it's like, oh, well, if we're writing a command, we'd probably want to run it. And it's like, no, I don't actually need to run the linter. I am developing the linter, but it's going to prompt me over and over to run the linter. And I'm going to say no, because I don't need to run the linter. Okay. Good start. The linter CLI should look in app assets style sheets. daveceddia I wonder if the "Don't ask again" option is file-specific there?
Or, man, to glob... oh yeah, let's say "or subdirectories" to grab any .css file and loop over them, pass them to Crass, and let's say... and pass them to Crass. Exit with code zero if all load and are valid; exit with code one if any are invalid or won't parse. Yeah, I'm kind of, like, slopping this description, where I might say, like, if it won't parse or there are parse errors, or I could talk about rescuing. This is good enough. So Dave, I don't know offhand if the don't ask again is file specific or all of them. It comes up infrequently enough that I don't really care. So here's its intro and its diffs and its outro. I'm just going to skip over to the file and see, like, well, what have you produced? All right, so we have a run method, and it says unless the directory exists, throw an error. I didn't ask for that, but I got it anyways. That's pretty reasonable, actually. CSS files, just go ahead and glob, and I know the Dir.glob API, so yeah, this will find any file that ends in .css that's in any subdirectory under the CSS path. I kind of like that it broke it out into its own little constant. If there are no CSS files, it prints a debugging message. Eh, it's wordy, but it's fine. And then for each one, print checking, put okay and failed. It's a little noisy, but yeah. All right, that's fine. So this basically works, right? Like, this is the outline of the thing we want. I didn't have to read the Crass docs about this. No, I don't want to run the linter. Sure, go ahead and add. I don't know why it wants to add gemspec, but knock yourself out, buddy.
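
The loop described above (glob for .css files under a directory, check each one, exit 0 or 1) can be sketched as follows. This is not the generated code: the method name is hypothetical, and the parser call is replaced by a block so the example runs without the Crass gem. The brace-counting check is a deliberately naive stand-in.

```ruby
require "tmpdir"
require "fileutils"

# Lint every .css file under root, including subdirectories.
# The block stands in for the real parser call and must return true
# when a file's contents are acceptable.
def lint_directory(root, out: $stdout)
  unless Dir.exist?(root)
    out.puts "No such directory: #{root}"
    return 1
  end

  paths = Dir.glob(File.join(root, "**", "*.css")).sort
  failed = paths.reject do |path|
    ok = yield File.read(path)
    out.puts "#{ok ? 'OK    ' : 'FAILED'} #{path}"
    ok
  end

  failed.empty? ? 0 : 1
end

# Usage with a naive stand-in check (balanced braces only).
balanced = ->(css) { css.count("{") == css.count("}") }

Dir.mktmpdir do |dir|
  FileUtils.mkdir_p(File.join(dir, "nested"))
  File.write(File.join(dir, "good.css"), "a { color: red; }")
  File.write(File.join(dir, "nested", "bad.css"), "a { color: red;")
  status = lint_directory(dir, &balanced)
  puts "exit status would be #{status}"
end
```

The `**` segment in the glob pattern matches zero or more directories, so files directly under the root are picked up along with nested ones.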

01:06:07So I'm going to grab, where am I here? So I've got Crass up here. Where's their home page? Here we go. I want the readme. I want the raw.

...39Let me talk about what I'm doing. So there is a method in Aider where you can get it to read web pages and suck in stuff from them. It wants to pull in Playwright and 900 Python dependencies, and I don't want to do that. So if I want to give it docs, I'll sometimes just wget them. So I'm going to add it. And we're going to say, oh, let's add a read-only file, Crass README. Because when you tell it to edit code, it doesn't reason. And so it's just like, oh, here's a source file. And every once in a while, I'll tell it, yeah, edit the linter to do this, to call the gem a different way. And it'll be like, well, let me just change the Crass documentation to work the way I want it to work. And it's like, no, you can't do that. That doesn't change the gem. You know, I got a pull request on lobsters two weeks ago that was very familiar, where I was like, oh, this function is bugged, and when you try and submit a story you see one error message, but if you click on it to, like, select any text out of it, it disappears and it's replaced with another error message. And the first pull request I got, from someone who said they were a junior developer, was like, well, let me just delete the function that shows you the error message. That's, yeah, that's not a fix. I'm not going to accept that PR. I actually want that functionality. But yes, deleting the functionality does fix the bug, right? Okay, so I looked at the Crass README, and in that one example I saw, you know, it gives you the big parse tree.

01:08:27How did I see this? I think it actually, so I'm trying to recreate my demo. And I don't remember how I wandered into that state, because I don't look at the transcript. Does this CLI match its API for turning? If a parse is valid, does crass raise exceptions we need to catch? Because I'm really familiar with crass, or I'm sorry, with parsers that want to throw exceptions rather than return some kind of partially valid object. Crass does not work like that. And the first time I ran this, I ran into a... It spit out some code, and it's... Code immediately checked for error nodes, and I was like, what the heck is this? Because that's not what parsers usually do. Yeah, go ahead and make a new file. I don't really care. No, I still don't want to run the linter. mbuckbee need to drop, thanks Peter!
I also don't know if don't ask again is going to... always or never do that thing. Oh, see you later, mbuckbee. So I saw this thing about if there is an error node, and I went to Crass. And I was like, is that a thing? RubyDoc. How did I get there? Here.

01:10:34And I kind of poked around. Somewhere in here. No, you want to be mean? I guess I don't even remember what I did yesterday. I found an example somewhere in its documentation where it just said, yeah, when we can't parse that thing, we just spit out error nodes. I checked the output, and I was like, oh, yeah, it does. Check the output kind of gets into a point of, well, maybe we want to start getting into tests. Maybe not. daveceddia Trying to find the actual code to confirm that it's per-file and per-session but from the aider changelog I see an item "Many confirmation questions can be skipped for the rest of the session with "(D)on't ask again" response." and a "- Don't ask again in current session about a file the user has said not to add to the chat."
There's kind of an open choice of do I want to do test first or test last or test never, honestly. All right, so the other thing is it'll pick up on stuff like if you're writing a parser, if you're writing a linter and you're using a parser, you probably want to spit out stuff like, hey, where was that error? So it just kind of volunteered to do this. That's a little odd and surprising, but fine. And that stuff I didn't ask for is valid. Oh, thanks for looking in, David. Yeah, I know what's up with that. All right. All right, so there's a complication here that was kind of a, I didn't highlight it in the pull request. And this is, I believe this is easeout's doing, if you're still present. That's the wrong pull request. I want our pull request.
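
Since the parser hands back plain hashes and reports problems as error nodes rather than raising, checking a parse comes down to walking the tree. The sketch below assumes a tree shape (a :node key naming the type, nested nodes under :children) and uses a hand-written sample instead of real parser output, so treat the key names as assumptions to verify against the Crass docs.

```ruby
# Collect every node tagged :error from a tree of plain hashes and arrays.
# The tree shape (a :node key naming the type, nested nodes under
# :children) is an assumption about what the parser returns.
def error_nodes(tree)
  case tree
  when Array
    tree.flat_map { |child| error_nodes(child) }
  when Hash
    found = tree[:node] == :error ? [tree] : []
    found + error_nodes(tree.fetch(:children, []))
  else
    []
  end
end

# Hand-written sample standing in for real parser output.
sample_tree = [
  { node: :style_rule, children: [
    { node: :property, name: "color", value: "red" },
    { node: :error, value: "invalid" }
  ] },
  { node: :whitespace }
]

errors = error_nodes(sample_tree)
puts "found #{errors.length} error node(s)"
```

With a walker like this, "valid" simply means the list of error nodes is empty, which is why no rescue clause is needed around the parse call.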

01:12:57We are not actually in application CSS. We are in application CSS ERB. Why is that?

01:13:15So this is the top of the file. In CSS, to support light and dark, we have this big heredoc variable for light and dark variables, and then we apply them to the root, and whether you prefer that color scheme. And the straightforward way to do this involves repeating styles, because media queries sort of operate a level up from regular selectors. And so we can nest a selector inside them, but we can't say: if it's media this or it is color scheme dark, use the dark theme. I actually just saw somebody post on Bluesky recently that they had a way to do this inline. Let me find that real quick, because again, maybe we can sucker easeout into fixing some code. But it was just here in my... I can't see replies, huh? I guess I can't see replies. I've not logged in, which probably makes sense for passing around with stuff. But easeout, if you go dig around in my Bluesky replies in the last three days, you will see me talking to someone who does a bunch of CSS about her very clever workaround for initializing. I don't know if it generalizes, and I asked for help or, like, do you have a longer explanation? And she was like, well, I have one that's outdated and I wouldn't do certain things this way anymore. She was not specific about what those certain things were. But yeah, it's over here. That would be a nice little improvement. So, okay, there's a complexity. Might have to find .css.erb files in that same directory. And then run ERB on them to get the CSS that we parse and lint. Please add this functionality. So this is a fairly high level description of the task.
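
The task as stated, rendering a .css.erb file to plain CSS before parsing it, can be sketched with the standard library's ERB class. The helper name and the template are made up for this example; the template only imitates the light/dark variable trick described above and is not the lobsters stylesheet.

```ruby
require "erb"

# Render a .css.erb source string to plain CSS; pass .css through untouched.
# The helper name is hypothetical.
def css_source(path, raw)
  return raw unless path.end_with?(".erb")
  ERB.new(raw).result
end

# Hypothetical template: one Ruby variable holds the dark theme so it can
# be emitted both inside a media query and under a class selector.
template = <<~CSS_ERB
  <% dark = "--bg: #111; --fg: #eee;" %>
  @media (prefers-color-scheme: dark) {
    :root { <%= dark %> }
  }
  html.dark { <%= dark %> }
CSS_ERB

puts css_source("application.css.erb", template)
```

A template that calls Rails helpers would need a binding that provides them; this sketch only covers templates that use plain Ruby.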

01:16:02Go ahead and bundle because you just added ERB. We don't need to add the bundle output because it worked. So let's see what code change it made. It said, okay, for the glob, we're going to add CSS ERB. That's correct. I would probably put the dot in there because it's nicer to read, even if it's repeated, but yeah, whatever. And then it says when we're looping over the files, if the file ends with erb, pass it to ERB to get the parsed output and then parse that. So that's actually, like, again, it's weirdly coupled, but it's very specifically fitted to the problem we're solving, which is lobsters could use a pure Ruby linter, not I want to make a general purpose linter. I kind of... I want to make it refactor a little, though, because it's... All right, so we've got one function that does everything. That's fine-ish, but I know I'm going to ask it to do more. So... Refactor to... move the parsing to its own function that returns success or failure. Keep it in this same file, please. I'm just saying same file because I'm kind of slopping it. I'll ask for a separate file when I want one, but I don't want to spend time paging between different files. And so I just want to include a test boundary, basically, that's not going to depend on the... No, I still don't want to run the linter. All right, so now we have this run function that grabs the files, loops them, and then the parser... Okay, it does this, and oddly enough, it's picked up a rescue, just a very general rescue, even though earlier it said that Crass doesn't raise. And I happen to know that Crass doesn't raise because I did a quick search on here. And I was like, all right, Crass, do you ever raise exceptions? Oh, look, there are zero code matches. If you don't raise, I'm not going to worry about it. So this one's kind of interesting. Lots of code that has this sort of structure is going to rescue stuff. I mentioned earlier the data class, and the code it's generated screams out for it, right?
Because now the return type is this weird data clump of an array of true or false and error, any error messages. I hate it. I hate seeing code that looks like this. I have written code like this and it always turns into a nightmare immediately. I don't like that data clump where the new function returns an array. Use the cool new data class to define a return type still in this file. Let's see what we get out of that. high-level description of what I want. As an experienced developer, I can immediately define the data class in my head, but I would like to see what it comes up with, and I would like to show it refactoring a little. No, I don't want to run the linter. Oh, man, there's a title. daveceddia Ok, from the code if you say "don't ask again" it is indeed specific to the (prompt, command) pair for the rest of the session
I'm going to say it 500 times.

01:20:29Ah, if I say Don't ask again. It's indeed specific to the prompt command pair for the rest of the session. That's promising, but how does it know whether I say don't ask again because it's always yes or don't ask again because it's always no?

...57All right, so we have a parse. And the reason I'm asking, Dave, is I would like to do this in one continuous Aider session. And if I say don't ask again and it doesn't do the thing I want, I'm going to have to close and reopen the session. And I think it's interesting to see the progression of it. All right, so this still works. All right, so we have a parse result with a success field and an errors field. That's pretty reasonable. Where's the definition of it? Data define success errors. Okay, that's fine. As long as we have that, let's clean this stuff up. I don't like that parsing a CSS file knows about ERB. So the function only takes a string of CSS.
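
The refactor from an array "data clump" to a Data class with success and errors fields might look like the sketch below. It needs Ruby 3.2 or newer. The method names and the brace-counting check are stand-ins for illustration; only the two field names come from the stream.

```ruby
# Requires Ruby 3.2 or newer, where Data.define was introduced.
ParseResult = Data.define(:success, :errors) do
  # Convenience predicate so callers do not compare against true.
  def success? = success
end

# Before: a positional clump the caller has to remember the order of.
# The brace count is a placeholder for the real parser call.
def parse_clump(css)
  balanced = css.count("{") == css.count("}")
  [balanced, balanced ? [] : ["unbalanced braces"]]
end

# After: the same information with named, immutable fields.
def parse_css(css)
  ok, errors = parse_clump(css)
  ParseResult.new(success: ok, errors: errors)
end

result = parse_css("a { color: red;")
puts result.success?
puts result.errors.inspect
```

Instances are value objects without setters, so a caller cannot quietly flip success after the fact, and the function takes only a string of CSS, with no knowledge of files or ERB.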

01:22:30It's really funny. For a while, a couple of months, it chirpily started every response with "Certainly!" Someone has beaten it out of it. One of the things that's especially nice about using Aider is it has some built-in prompts and they are continually revising those prompts. So, okay, it just lifted that function out and put it over there. That's fine. I kind of want more info here where instead of just I guess it's fine if only the outside function knows the file name and such. We're very coupled to the loop of are we parsing versus, like, when are we linting? So let's go ahead and say, you know, maybe now is a good time to get into testing. All right. Add some tests. make our CSS public and add a test. Now let's say add two tests, that of it, that string just kind of is valid and string

01:24:07How's that? Right? Quick little parallel tests. First, I need to add the test file. Could you add this test to the... All right, sure.

...23So it replaces the built-in gem test that just says assert false. And yeah, so there's my exact test data I gave it of one valid CSS string, one invalid CSS string. You can run the test with bundle exec rake. Do we want to do that? Yeah, we do. Cannot load such file data. Did you mean date? That's an odd one. Do I want to add the command output to chat? Yeah, sure. Why don't you figure it out, Aider?

01:25:08I have a hunch, but I'm not going to explain. Whoa. All right. So I was hoping this would happen at some point. Two things have happened. Number one, it made a bad edit. Data is part of the Ruby standard library. It is not a gem. easeout ⌘Z
It is definitely not data slash data. We do not need to call bundle install. And then also it tried to run my linter and it blew up. Yeah. I know command Z. So it asks like, do you want me to fix the lint errors? No, I do not. Do I want to run bundle install? No, I do not. Don't run that test either. So there's an undo command. That's just, look, you took the wrong path. Let's throw it away. And that's where the Git integration shines because as we have been going, if I pull up the Git log, and it has config options for, do you want Aider to put its username in here? Or how do you want it to prefix commits? Each one of these is me giving it a command. When it says chore, it ran the linter automatically because that's in my config file. And then it automatically committed that. So we can just kind of look back through the Git history. Maybe I should have run it before I hit undo, but there's one more commit on top of this. easeout I wonder, does that undo also rewind the message context?
It's just backed up the Git branch. And so it was able to just cleanly drop its work, even though it tried to edit multiple files and run the linter and all that other stuff. Yes, I believe that undo rewinds the message context. It's not something I have tested, but that's the impression that I get. So, all right, I got this one wrong. Am I using the wrong Ruby version here? I don't even have a Ruby version.

01:27:45The data.define isn't working here. No, it's in the standard lib or the Ruby version we're using. Please fix it.

01:28:10Yeah, it must be here. So Kevin, there's your question answer. It remembered a little about that edit. I'm not sure if I needed to run undo twice because it tried to run the linter, but oh, actually I should be able to see it here, right? Yeah, so that undo popped off the linter thing. I needed to run undo again. So instead it's just edited stuff back. Ah, so I was hoping for this. Another thing about LLMs... One sec. Another...

01:29:01I mentioned the cat slept on my face. That's catching up to me a little. LLMs are language models, and you can tell them to produce structured output, like, give me a patch, or give me a diff, like this. But occasionally it goofs. And the output that comes back from the LLM is not a valid patch. Or it does not conform to... Hang on. Either it doesn't conform to the valid patch syntax or it gives me back a patch that doesn't cleanly apply because either it got a character wrong or it didn't correctly read whatever the input is. And so it says, hey, we caught an error. This patch it gave isn't going to cleanly apply. And then Aider is actually very smart. It goes, oh, you know, I got back a couple of these blocks and only one of them failed.

01:30:25I'm going to catch back up instead of the scroll back. And so this is Aider redirecting the LLM. So Aider is in the loop here and it says, hey, if I got these patches to apply, I'll apply them to files. If they don't cleanly apply, I will pass them back to the LLM and say, hey, you goofed this. And so it sort of watches and revises itself. It supervises itself a bit where if you get back bad stuff, it fixes itself, which is kind of neat. I can imagine earlier versions you would have to interact and like basically say yes or no to, do you want to try again? And at this point it's reliable enough about it that it just automatically retries stuff. All right. So I ran the test. I'm gonna go look at the, let's see, so that's the same. I'm gonna look at the test. It looks like it ran an outdated version of the test. Test invalid CSS, expected true not to be, oh no, that's actually just a test failure. All right. So refuted result success, refute empty success errors. Like this test looks pretty reasonable.
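
The apply-or-send-back loop described here can be illustrated with a toy version. This is a sketch of the idea only, not Aider's code: the Edit struct, the exact-match rule and both method names are assumptions, and the block stands in for a round trip to the model.

```ruby
# One proposed edit: replace an exact snippet of a file's text.
# (Hypothetical shape, chosen for illustration.)
Edit = Struct.new(:search, :replace)

# Apply each edit whose search text appears in the source.
# Returns the new text plus the edits that could not be applied.
def apply_edits(source, edits)
  failed = []
  edits.each do |edit|
    if source.include?(edit.search)
      source = source.sub(edit.search) { edit.replace }
    else
      failed << edit
    end
  end
  [source, failed]
end

# Apply what applies; hand only the failures to the block for correction
# and try again, up to max_rounds attempts in total.
def apply_with_retries(source, edits, max_rounds: 3)
  failed = edits
  max_rounds.times do |round|
    failed = yield(failed) if round > 0
    source, failed = apply_edits(source, failed)
    break if failed.empty?
  end
  [source, failed]
end

code = "def greet\n  puts 'hi'\nend\n"
edits = [
  Edit.new("puts 'hi'", "puts 'hello'"),
  Edit.new("def great", "def greet(name)") # typo, will not match
]

fixed, leftover = apply_with_retries(code, edits) do |failures|
  # A real tool would send the failures back to the model here.
  failures.map { |e| Edit.new(e.search.sub("great", "greet"), e.replace) }
end

puts fixed
```

The point of the structure is that edits which applied cleanly are kept, and only the failed ones go around again, matching the "only one of them failed" behaviour described on stream.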

01:32:00So there's this natural back and forth between telling Aider to do stuff and looking at its output and thinking about it myself. And some of that is I'm demoing and I want you to see all that, but a lot of it is, this is kind of what it feels like, where I'm kind of poking it and prodding it and saying, like, do I understand what it's doing? I don't need the sig directory. And is it producing reasonable stuff?

...54And occasionally you get this kind of goofy stuff of, let me re-implement the parsing thing. I think it's mad that this exact parsing thing doesn't produce an error. Test passes, but this is pretty bad code now.

01:33:20Yeah, so one of the things that's happening is when the test fails, This is on my scratch list to talk about. When the test fails, it says expected true to not be truthy. So that's coming out of mini test. That's just a default thing. The LLM is a language model. So inside the LLM, this is a good time to get new to it. Everything is represented as a vector. So what that means in practice is it looks at what words appear next to each other most frequently. If I am constantly talking about Ruby and gems, it sort of moves it into an area of the vector space where there are lots of related terms like pull request, like rake, like parse, rescue. braces function method you know and some of these are closer some of these are farther away the reason i am polite to it is i mean anthropic hasn't talked about what it trained on but obviously they scraped all of github and almost certainly scraped all of reddit and other conversations and blog posts from all around the web And if I am polite to the bot and I say things like please and thank you and good and excellent, I am sort of nudging it into the vector space where people are being polite to each other. I have gotten noticeably better responses with more plausible fixes if I say please and thank you. easeout you don't want the rude stack overflow posts in the source material XD
because it is echoing online conversations and blog posts where people are polite and friendly to each other. And people who are polite and friendly to each other tend to write better code and better examples. Yeah, it's not just that I don't want rude Stack Overflow comments sneaking in, although Stack Overflow is not so bad about that, especially the last five years or so. It's when people are dunking on each other, they aren't collaborating. All right, so it passed, but we got bad, like bad code for it. So because it's a language model and I'm feeding the output of the tests back into it, you know, you've seen that where Aider asks, can I add the test output to chat? Yeah, you can, Aider, go ahead. One of the easiest ways to get better tests out of it is to give it better test output. So: please add a test helper for success or failure. It doesn't need any extra output for success, but on failure, I want to see the parse tree and all error messages from the, what do they call it, parse response. Oh yeah, so Aider, I never think about it because I type a little faster than I can stop and see it and read, but Aider does this autocomplete thing where it's picked up keywords from the various files. It doesn't cycle through them in the exact same way that bash and vim do, and so I don't use it because it doesn't quite fit my muscle memory, but it's handy when I'm like, what was the name of that thing? Was it parse result or parse response? All right, so let's see what it did before I hit yes on that. All right, so it wrote this assert parses function, and it knows whether it wants to see a success or a failure.
Then it says, oh, I expected it to parse but got errors, versus I expected it to fail but it succeeded. Hey, that's not bad. If Minitest throws an assertion because there's something weird, well, you know, print what you can and fail. And hey, now these tests got really narrow, which I like. This is something I like to see in test style: it's sort of table oriented. Unclosed brace fails parsing. It's interesting, I didn't ask it for an extra example, but it kind of inferred another one. Unclosed brace fails parsing versus, yeah, they differ by one semicolon. All right, so let's run the spec. Hey, they all pass, great.

01:38:29So I'm going to look back at this, and I don't love this brace thing.

...44I worry it's going to get confused by CSS comments. Can, what is it called, crass? Can we use the crass API to recognize that the braces are unmatched? You're right. So we'll talk about that one in a second. That might be a good time. Let's see what refactoring comes up with.

01:39:27Yeah, let's run the test.

...33So it printed out, so here's a good thing. Its refactoring is bad, and we know it's bad because the tests that are supposed to pass are failing, but then also it printed out some more of this stuff, and so seeing more of this is going to nudge it in the right direction.

01:40:00Let's see if it comes up with a fix. That's interesting. So now it's started collecting errors. Has error nodes. Check if it completely parsed. If we didn't consume all input, there must be a syntax error. Oh, that's a very positive looking output. Still fails. So I don't know. Maybe this is a false start and I should just undo a couple of times.

...32What are the actual errors? Incomplete or invalid CSS after position 0. It feels very spellcasting to be like, oh, my tone of voice is influencing the quality of output. But it is a fun, true thing.

01:41:06So I've tweaked the code a little. I got most of them passing. That's positive. Probably did not do it by deleting tests. Let's look at the test. So parses correctly. Oh, so I mentioned comments and it immediately created not only one where the braces are wrong in a comment, if it was just doing that naive counting, it also created one where a data attribute or a, what's the name for this in CSS? The attribute selectors include a brace, which is, I think that's valid CSS. Which test failed? Unclosed brace fails parsing. Man, there's something about this first example I picked that it doesn't like.

01:42:42daveceddia oh, it says assert_parses there
It went back to counting. Oh, it says assert parses there. Let me take a look. I know I'm going to undo that because I don't want that brace counting nonsense. Dave, which line are you looking at here?

01:43:12daveceddia line 39
39. daveceddia oh nevermind sorry, it's passing false, the function name threw me off
Yeah, it says assert parses, but then the API it made is it takes this Boolean for whether it should succeed or not. And I didn't want to make it immediately. Like, I hate taking Booleans. Yeah, I know, the function name threw me off. Yeah, it's bad. It's bad code. Like I said, you get bad code quickly. Yeah, so I could tell it, like: I don't like functions that take boolean args that flip their behavior, like assert parses. Refactor to an assert parses and refute parses.

01:44:14So that's neat. It found a diff for every one of the tests. We're in the same situation as before where it runs. I think this might actually just be a limitation in crass that it considers that input to be valid. Let's look at this. All right, Dave, so now that's a little clearer. The line number is shifted on you, but we have refute parses versus assert parses. I like this level of parallelism in tests where I would rather repeat stuff and be explicit and have fewer things to look at than have to read a function like that previous version of assert parses and maintain the state in my head. Oh, hey, didn't use it on this one. Let's grab that.
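
A minimal sketch of that split, assuming a `ParseResult` shaped like the one discussed on stream (a success flag plus errors). To stay runnable without the crass gem it uses a hypothetical stand-in `parse_css` that only counts braces, which is exactly the naive approach rejected above for the real linter, and plain `raise` instead of Minitest assertions. All names here are illustrative, not the code Aider wrote.

```ruby
# Hypothetical result type; the stream's version is a Data class.
ParseResult = Struct.new(:success, :errors, keyword_init: true)

# Stand-in parser so the sketch runs on its own. NOT the crass API, and
# brace counting is the naive approach the stream rejects for real use.
def parse_css(css)
  errors = []
  errors << "unbalanced braces" unless css.count("{") == css.count("}")
  ParseResult.new(success: errors.empty?, errors: errors)
end

class ParseAssertionError < StandardError; end

# One helper per expectation, instead of one helper with a Boolean flag.
def assert_parses(css)
  result = parse_css(css)
  return result if result.success

  raise ParseAssertionError,
        "expected CSS to parse, but got errors: #{result.errors.inspect}\ninput: #{css.inspect}"
end

def refute_parses(css)
  result = parse_css(css)
  return result unless result.success

  raise ParseAssertionError,
        "expected CSS to fail parsing, but it succeeded\ninput: #{css.inspect}"
end
```

At the call site, `refute_parses("a { color: red;")` reads without needing to remember what a trailing `false` means, and the failure message carries the input and errors that the earlier helper request asked for.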

01:45:27There we go. Don't know why I missed it, but okay. So looking at this, I get curious, right?

...50And where's my exact text?

...57All right, let's grab that. That is random crap. Bam. All right.

01:46:24What is the API? Is there a .valid? Or it's just if there's no error nodes. So here's Socrates kicking my ass. He's like, see, look, you haven't learned the API of the crass gem. And that's correct. I don't want to. So I think this is just a limitation of crass, maybe a bug in crass even, that it's not catching that I have an open brace and no closing brace.

01:47:02So I'm not going to keep chasing this one around. If there are no error nodes here in my output, eh, I don't want a rabbit hole. Maybe I can tweak that. Maybe I can just comment that out. Where's my helper? Here we go.

...31What if I get rid of that and say. So there's a command for running tests: /test. And when you set it up, Aider makes running the tests part of its edit loop.

01:48:11So it ran, it printed an error, which is, yeah, we got a node error that says it's invalid. We expected the CSS to fail parsing, but it succeeded. It automatically took in the input without asking. And it went searching for are there any node children with error in a different way? Yeah, it's not. Yeah, boy, that's ugly. Sure, run it again. Hey, look, it passed. Let's go jump back to that because I want to dwell on it a second. Check for explicit error nodes anywhere in the tree. Yeah, that's ugly.

01:49:30So the interesting thing here is it said yes. It always says yes. That's a little better. I see what it's doing. Let's run the test again. Hey, tests still pass. Good for us. At least now it's a little more succinct. One of the things that happens is there is a lot more tutorial code in the training data and a lot more beginner questions in forums in the training data than there is code from experienced coders. And so when you ask for code or when it edits, the junior dev stuff is overrepresented in the training data. And so you get it in the output. And so one of my most frequent nudges is, hey, don't give me that. Give me something a little nicer. And I made these nudges enough that I made my own little prompt file in addition. You are an expert senior developer writing code for testability and long-term maintainability. You have the personality of a slightly jaded long-term forum poster. I know how silly this one sounds, especially because I moderate lobsters, but I find the chipper tone, which was especially bad a few months ago, to be incredibly grating. easeout you're an expert at -- yep
Write in a terse functional style, preferring immutable values and inline assignment. Write idempotent functions and avoid side effects or code that is too clever. Only mock at network boundaries. Always use the latest versions of libraries and avoid deprecation warnings. Yeah, EaseOut, you were getting there too. Answer concisely and directly, assuming you are talking to another expert who doesn't need the basics or sugar-coated criticism. So there's a couple of things happening in this prompt like, I am saying things like, you're an expert senior developer, which is, it's so dumb that this works, right? easeout what grates on me is obsequiousness. I'm so sorry for that mistake let me polish your shoes
Like if I was talking to a coworker and I started out a conversation and said, you're an expert, would they give me a better answer? No, they would just be annoyed at me and they'd be like, yeah, I don't need you to tell me. But this is all about nudging what vector space it's in. And if somebody's blog says, I'm a senior developer versus I'm a junior developer looking for my first job, Well, I want more of the former code samples than the latter. Yeah, you call it obsequiousness. I call it sycophancy and I think sycophancy is the jargon for this. Oh man, before I wrote this prompt, I spent some time mad at it because like this is fairly succinct, but it used to be really verbose and I'm so sorry. Has it said sorry? No. Has it said apologize? I apologize for the confusion. You're right. I know I'm right. This one is pretty bad, or this one is pretty minor, but sometimes it's so bad. It's so grating. I don't want to spend, you know, an hour or two talking to my medieval court servant who is going to cater to my every whim. Oh, you're so wise and noble, sire. Of course. chamlis_ you're paying it to apologise, haha
So let's tell it to load that prompt, read only. And now that prompt is going to get read every time. I'm paying it to apologize? Yeah, a little bit, Chamlis. Nice to see you again, by the way. Thanks. I am sure you will be catching my bugs and researching weird stuff in a moment, although Dave has been doing that before you showed up: seeing me have a weird question I don't want to rabbit hole on, and then getting nerd sniped by it. So you've got competition. So I didn't want to add the prompt at first because I wanted a little more of that generic style to show it off, but the other thing about it is the sycophancy. Here, just rerun my test. Let's see if we can fool it. Hey, look, the tests run. The tests are failing. You need to fix, let's pick one. You need to fix the refute parses function, it has a bug that it's a no-op and will accept any input.

01:54:36Looking at that test file, I see the issue. The refute parses method needs to be fixed to properly check for parse failures. Here's the fix. No, it's not. That's not a fix. The correct answer to that is, dude, the test suite passes. What are you on about? Even with a prompt that says, and I've tried versions of this prompt that say, you are pairing with someone who may make errors that you need to push back on. It doesn't really help. I've tried, let's see, how long have I been tinkering with this? Maybe three months, something like that. And like every two weeks, I kind of check back and I revise this and I'm like, hey, have I been annoyed by any of these things coming up? And no matter what I've tried, I cannot weed out the sycophancy. It will always trust me and I can just gaslight the hell out of it on accident. That's lousy. I'm going to undo this because this is just a nonsense edit of rescue StandardError. Like, come on. It's syntactically valid, but that's not, like, it's hallucinated that maybe crass is going to raise StandardError. No, don't run that. Undo. So that was, oh, that's interesting. It immediately removed it on us. So occasionally when it runs the tests, like it revises, so it's two undoes. Yes. We removed, fixed, refute parses. As easeout calls it, obsequiousness. I call it sycophancy and I've seen a blog post or two that calls it sycophancy. And I think it is a major problem with using these tools. I had one example where, I don't remember the particulars of it, oh no, I do. I wrote a script, I had it slap out a Ruby script, slop out, really, and I said I want to have a progress bar on this because it might take a while to run and you're going to look at, you know, 1000 files. So I want a progress bar. Is there, you know, install the... And I did like a quick Ruby Toolbox search, and I was like, oh, there's a gem called, I don't know, like Bob Progress. Install Bob Progress and monitor the progress. And it went, okay. 
And it added instead one called, I think, Progress Bar. And I was like, hey, that's literally not the gem I asked for. But okay, it works. Fine. I'm not particular about the gem. If I was particular about this stuff, you wouldn't be writing it. I would. And I didn't run the script, but I looked at it and it had like this format string thing going on and I didn't know what was in the format string, but I ran it and I got a progress bar and I didn't look too hard at it. And I was like, okay, well, I want to have an ETA counting down, you know, whenever it updates the script or updates the progress bar, tell me, are there approximately, you know, five seconds remaining or 500 seconds. Just give me a countdown. And you can imagine that code in your head, right? We're looping over the thousand files. So we know the time we started. We know the current time. We divide that by how many files we've seen. We multiply by how many files we have left to go. And that's the number of seconds we put in it, right? You can write that code. It takes, what, a minute? Most of your time is going to be typing. So I told Aider, hey, add me an ETA onto the progress bar. And it made a minor edit to that status code and it did nothing. And I was looking for that like eight lines of timing code, and I didn't see it scroll by in the diff. I don't read those diffs super close all the time. I'm doing it now because I'm assuming a lot of people are new to this tool. But after doing it fine a hundred times, you just kind of skim. Or I'll go poke around in vim instead of here. And so I responded, hey, you ignored me. I really want an ETA. You got to track the time. And I don't think I explained the algorithm. I just said, give me an ETA. And it went, you're right, certainly. I will add a time. And then it slapped out the code that I expected, right? That code that you can write in your head that says, count the seconds, multiply, divide, blah. 
And I ran it and there were two ETAs on my progress bar. And the second one was the one that had my, the code I had told it to write. And the first one was the Ruby progress bar gem says, if you pass percent capital E, it has those eight lines of code in it. And it was automatically right out of the gate, counting all of the elements, and dividing by time, where is it?
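
The hand-rolled ETA from this anecdote (elapsed time divided by files seen, times files left) is small enough to sketch. The function name and keyword arguments below are made up for illustration; this is the arithmetic described on stream, not the progress bar gem's implementation.

```ruby
# Estimate seconds remaining from average time per finished item.
# Returns nil until at least one item is done, since there is no rate yet.
# started_at and now may be Time objects or plain numbers of seconds.
def eta_seconds(started_at:, now:, done:, total:)
  return nil if done.zero?

  elapsed = now - started_at        # seconds spent so far
  per_item = elapsed / done.to_f    # average seconds per item
  (per_item * (total - done)).round # seconds for the remaining items
end

# Example: 100 of 1000 files took 50 seconds, so 900 remain at 0.5s each.
eta_seconds(started_at: 0, now: 50, done: 100, total: 1000) # => 450
```

As the story goes on to say, the gem already ships this as a format option, so in practice the sketch is only worth writing when no library is in play.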

02:00:09I didn't know it. So yeah, this is the progress bar gem. Hey look, it's even there by, I think that's a count up. You can add additional output. And then, oh look, you can tell it, I want an ETA. I didn't know it. So the sycophancy is a major annoyance. You can ask it to explain stuff. So one of the things I like Aider for is tinkering with unfamiliar code. So I can say, ask: this test code has helper functions. Please explain why there are two of them.

02:01:07And so it can just introspect the code and say, oh, well, yeah, this split follows command query separation. Well, that's nonsense. That's a hallucination. But this part here about it has a single method with a Boolean flag. which we discussed earlier. See, so it does have some history, so maybe I shouldn't have asked it about a specific thing I told it about. And the other thing is you can ask it for more options. And this is useful because it doesn't think. It doesn't reason. easeout it's also just spending time on BS that amounts to interaction latency
When people say chain of thought or reasoning, those are incredibly misleading terms to apply. There is no thinking going on.

...58I disagree. It is not spending time on BS that amounts to interaction latency. It is priming. easeout amen
dpk0 moin
It is the reason... So I previously in my prompt said, shut the hell up and don't ever explain code to me because I know what the code is doing. Oh, hey, DPK. Are you happy that you posted something about politics to lobsters and then found out that people don't share your politics? I hope that discussion is staying polite. It's over in my browser of things to keep half an eye on, but yeah. So one of the ways to deal with the sycophancy is ask, dpk0 there was one comment i was especially disgusted by and flagged, but mostly it’s dudes trying to derive feminism from first principles, which is amusing to watch but not especially harmful
easeout I mean the sycophancy, but so called chain of thought is context loading yeah
Can you suggest some refactorings for the CLI class? What's worth cleaning up or introducing a helper. Can't type. Yeah. Yeah, DPK, so I left a comment in that thread where I was like, yeah, the point of this is not we get to pull 50 years of scholarship. And you could go ahead and disagree with whether there's interesting philosophy or politics you agree with or if it is a great application. But the point of that paper was let's apply 50 years of stuff. So this is interesting. It kind of wants to spit out markdown because it knows lots of times when people ask for options or are discussing this stuff, they use markdown blocks. You'll see things like this crop up where you're like, ah, you were scraped. You were formed on a lot of like readmes and a lot of GitHub code because this is the GitHub flavored markdown. So OK. Instead of saying just generally refactor the CLI class or generally saying fix this thing, I can ask it to give me a couple of options. dpk0 there are people saying ‘off-topic’ which i think is probably … problematically motivated. if you are working in programming languages, the paper should certainly fulfill the criteria of making your next project better, making you reflect in a new way on your old project, and i think it will still be interesting in five or ten years
And because it has a little bit of memory here in this conversation transcript and it does that clever notes thing that I think was it buck be brought up. Yeah, DPK, maybe I shouldn't have joked about that because I didn't. dpk0 yeah sorry to show up and derail your demo :D
I'm in the middle of a big demo, but you're right that that is on topic. One of the reasons I left an early comment on it was not just to point out that someone was asking for a thing that was on page five of the paper, which says you really didn't read it, but also as a, like, yes, a mod has looked at this. If I'm leaving a comment, well, if I thought it was off topic, I would have just removed it. So we're getting a little bit of that flagging to disagree. dpk0 oh, interesting
That is the reason I added the your flag doesn't actually change the story until you add, until you actually choose to hide the story, which is to say I am really checking out of this. I think that is generally positive. espartapalma Hi... I've been watching in the last 30 minutes while washing the dishes... and my eyebrow did exercise a lot with I don't think this LLM is any good for devs who are already in senior/expert level...
dpk0 yeah, i think that’s a good change
Yeah.

02:05:50Oh, hey, espartapalma.

...57I hope I didn't come off as saying that I don't think the LLM is good for senior expert level. dpk0 maybe not the perfect change to achieve the goal, but i think the front page has got better recently
Where I'm kind of working towards is it's pretty darn convenient as a senior developer that dpk0 maybe someone will come up with something even better in the future
what a lot of this feels like is that i'm working with a junior developer who is unusually good at googling things so like finding the readme of the crass gem because you don't always have to import it sometimes it just knows these things and finding their api and matching it and it comes up with all kinds of junior dev errors like that weird brace counting thing that I undid or just kind of slopping code together, not creating great boundaries. Like this should take a CSS string rather than a file name or an input that might be .erb. And so as a senior developer, I've worked with a lot of junior developers and this practice of giving someone an assignment, giving a junior developer a prompt and coming back and saying, yeah, that kind of works, but there's an opportunity to refactor it, like give me a little bit more functional code. espartapalma you didn't said that LLM is good or not, but yeah a Jr level with all the nuances I'd prefer a real Jr. developer
It feels very much like working with a bright junior developer. The biggest difference is when I ask for a revision, I get a revision in about four seconds instead of 40 minutes. And that's a huge, huge difference. The other huge difference is the cost. So Aider, the reason I didn't want to bounce Aider is it keeps a running total. So in the, what, hour and a half or so that I've been driving this, we have spent 68 cents on the Anthropic API. Pretty fucking cheap compared to an actual junior developer. I mean, it's hilarious. I've tried reading the subreddits or Discords for people using LLMs and I am convinced that they are overwhelmingly students who are shirking their homework, because they are constantly complaining about how expensive these things are. The other thing is it could be a lot of third world developers for whom the currency conversion is painful. And you know, I would be moving a lot faster if I wasn't explaining and chatting, but it seems to round to a dollar or two an hour. And it's like, come on, the fully loaded cost of a senior developer is more like $1,000 to $2,000 a day. We're talking about adding five to ten dollars to that daily cost. That doesn't move the needle for me. That's coffee budget, right? That's cheaper than lunch. So that's part of why these tools are really interesting to me. Let me drive this. We've kind of wandered around. I'm happy to jump around with stuff, so let's see, let me look at my to-do list to see what we've talked about. So I've shown the pull request, the gem, the general git work loop of using git. We've talked about the notes. Repo map is how it finds stuff. Very clever. So we know with LLMs that it's just a stream of tokens. Aider knows that it wants to work inside of a function, or I'm sorry, inside of a code base, and it has this concept called a repo map where internally it scans your code base using, I think, some LSP. And it extracts the structure of it. Like what are all your classes? 
What are their signatures or type signatures? What are the constants? What are the file names? And so instead of passing up the entire contents of every single file and blowing out the token window, it only does, you know, 1% of that. So it's not going to work in a huge code base, but you can get surprisingly long and meaningful scripts out of this where you have five or 10 files and you tell it, find this bug, and it will look across five or 10 files. It does have the issue where, we'll talk about that with limitations in a minute. Let's go back to that to-do. I want to do limitations mostly in a clump. Like, we've touched on a few, but I think it's worth hitting them. So there's this, like, test for, oh yeah, I did talk about cost. Lol, American developers are expensive. Test first, test last, test never. This practice of I told it to write some code and then I was like, yeah, how about you test that code, that tends to be a little better. Sometimes I write the tests first if I really know, or sometimes I command it, instruct it to write the tests first. It really depends on whether I feel like I have a great idea of the APIs involved and a high confidence in it. I'm making the same decision I am for when I'm writing my own code of will I write test first or test last? And if it's one-off code like that fetch mail, you know, I'm going to run that once or twice and then throw it away. I don't need tests. It is okay if the code is kind of bad. All right. So I talked about parsing and this was motivated by that one pull request where I had to ask for a space. Okay, let's add a linting step now. Have the CLI loop over only the files that successfully parse and lint them. For now, we'll start with just one linting rule. Cat, come off of the mouse. Not helpful, sir, thank you. And that linting rule is: between a selector and the opening curly brace of the rules, there must be exactly one space.
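
To make the rule concrete, here is a deliberately naive sketch of "exactly one space between the selector and the opening brace". It scans lines with a regex instead of walking a crass parse tree, so it is not the linter built on stream, and it will misfire on a `{` inside comments, strings, or attribute selectors, which is the concern raised earlier about brace counting. `LintError` and `lint_selector_spacing` are hypothetical names.

```ruby
# Hypothetical error record for the sketch.
LintError = Struct.new(:line, :message, keyword_init: true)

# Naive line-based check. A real implementation would inspect the parse
# tree, because this one is fooled by braces in comments and strings.
def lint_selector_spacing(css)
  errors = []
  css.each_line.with_index(1) do |line, number|
    brace = line.index("{")
    next if brace.nil?

    before = line[0...brace]
    next if before.strip.empty? # brace alone on its line: out of scope here

    # Valid only when the text before "{" ends in a non-space plus one space.
    next if before.match?(/[^ \t] \z/)

    errors << LintError.new(line: number, message: "expected exactly one space before '{'")
  end
  errors
end
```

For input like `"a{ }\nb  { }\nc { }\n"` this flags lines 1 and 2 and accepts line 3.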

02:13:52That's interesting, it added another data class just in parallel for linting results.

02:14:09Yeah, run the tests. Ooh, tests passed. Now, rather than scroll up and down the diff, I'm just going to go poke around. So, okay, so it calls parse, and then if it succeeds on that individual file, it calls lint CSS. Where's lint CSS? Down here. And it says, all right, loop the node. If it is a style rule, which is what we get out of those, looked at the selector, check the space between a selector and the opening brace, and then give it an error. Okay. All right, that's pretty reasonable. All right. Did it add a test? Did it add two tests? Not the test helper. Ooh, look at that. It even added a helper in parallel. So there's this example of it's using the context and it's priming itself off of the context of the code that I've commanded it to write. So this is where all of that please and thank you and let's refactor, as opposed to just a series of strict commands, starts paying off. I've moved into a vector space where we are pairing and refactoring together. And it is really tempting to imagine that it has some concept of the conversation we're having, or my personality, or my code style, when what's really just happened is, come on, it has billions of vectors embedding. It takes 19 GPUs to run. All that's happening is it's landing on a section of the vector space that lines up with my coding style, that is informed by, and shit, I've written blog posts. It's probably got my scraped blog posts in this thing, right? It's probably got my scraped code in it. And so here is that test that I wanted to write. Hey man, if you edit the CSS, I want to see two spaces between the selector, or I'm sorry, I want to see exactly one space between the selector and the curly brace. That's success, right? That is the linter I want. I could plug this into the Lobster's build pipeline right now, and that guy would have gotten a build failure. I don't know Guy. I use Guy pretty generically. It's a weird quirk of... 
I got it from my grandma, whose first language was not English, and so everything is a guy. Like the wrench is a guy. Give me that guy. Yeah, I mean a screwdriver, obviously. I'm trying to knock that one out because calling people guys is often rude. And so, yeah, there's a lot to change in this code. I don't love it, but you see the process: I can just nudge it along and be like, hey, I want something different. So like this parse result, this has that exact same behavior of the first thing in it is a Boolean in the same way that that assert parses took a Boolean on the end. Yeah, I could split that out immediately. I would make this more of a result type. Hey, maybe now I could even add a Sorbet type signature to it now that my pull request is merged. dpk0 i was at the Vintage Computing Festival here in Berlin at the weekend. someone explained to me some of the architecture of the PDP-11, delightfully always referring to the computer as ‘he’
All right. So if the cat will give me the mouse, sir. Thank you, sir. Oh, sir. We are not having a fight time.

02:18:30Let me add this thing. Oh, sir. So we have a Jeeves and Wooster thing going on with the cat where he does terrible dumb stuff and I call him sir. dpk0 i hope my mistakes in German are as endearing as these mistakes German speakers make in English
I'm gonna run and make a bio break and then we're gonna start talking about all of these, the limitations and then the successes. pushcx https://lobste.rs/s/nxqic4
And to give you something to read while I'm AFK, I will give you this link so you don't have to retype it. Oh, speaking other languages, yeah. yeah it's funny i had someone tell me very american sign language is a pretty blunt language culturally and they told me in a friendly but very strict way like boy when you get a little drunk your accent basically disappears i almost wouldn't think you're hearing which was like oh there's there's a lot to unpack there i'll be right back

02:20:38Oh, sir. Sir, you gotta move. You cannot fight my hand. Oh, big buddy. Oh, big buddy. Man. Cats. Wonderfully awful. So let's talk about the limitations. Some of them we've hit. I wanted to lead in with this post. Hmm, flagged as spam. I didn't think it was. And Bitfield is not the submitter. All right. In any case, there was a comment just this morning, six hours ago, by David Chisnall, who said, daveceddia ooh, yeah, this part "From what I’ve seen so far, bugs in LLM-generated code are far less likely to be caught in code review because they look much more like correct code than human-generated bugs."
basically a harsher version of some of the stuff I've said here in the last few minutes: that they don't know things. And I agree with the general point of David's criticism, that it is a mistake to anthropomorphize these things. There is a little bit of it, like the please and thank you are sort of a magic spell for getting into better interactions. But they don't actually know things. daveceddia it matches with how LLM-generated images look correct at a quick glance, but when you zoom in, the details don't make sense
And we keep hitting errors like... You know, we didn't really see this one, but... No, we did see this one, where it sort of made up API for crass, where... At the beginning, it understood that crass doesn't raise exceptions. And then kind of in the middle, it assumed that crass raised exceptions. And when I gaslit it about a bug in the test helper, it just assumed that crass raised exceptions. So there was never any concept. daveceddia s/LLM-generated/diffusion-generated/ but you get it
I could say it knows the crass API, because the first thing it produced, and then interacting with it more, it clearly has scraped the crass readme at some point. Not just I gave it to it on the command line there, because the very first time I did this, I hadn't. And it used crass correctly out the gate. But it doesn't know the crass API in any meaningful sense. It just sort of knows tokens that appear near the word crass and API. And Dave, yeah, I agree. The details are often wrong. If there's code that you're going to maintain, you have to drive it with tests. You have to have testing on the code and you have to review what it comes out as because occasionally, like you saw it generate tests that I didn't ask for, I have also seen the inverse where it removes tests that I liked. easeout yeah I wish I could use language-focused LLMs in a way that it didn't give the image of a person. you just can't do that while getting the best strengths of the training
And so I have to skim the code that it puts out. I'm watching the diff scroll up and down, or I have vim open in one terminal so I can look at code, and I have tig open in the other so I can look at the blames or review its commits as it's writing commits. And it is so tempting, it is constantly tempting to read LLM output and start anthropomorphizing it. Humans, you know, I mean, I anthropomorphized the cat 30 seconds ago, right? We will anthropomorphize anything. Like, oh, my car is really grumpy when it's cold in the morning and it grumbles a lot. No, come on, the car is not alive. But humans are really, really primed to imagine that they are seeing agentic beings. There's, you know, I'm going to mispronounce this one. Pareidolia? Pareidolia? That behavior where we imagine we see faces? Wikipedia's got to have a page on this, right? Pareidolia?

02:25:24Yeah, where, you know, you look at a rock and, well, obviously there's a face on Mars. Oh, I could have had a bigger version of that Wikipedia. dpk0 not a new phenomenon for AIs either (Weizenbaum eventually turning against his creation because of how people reacted to Eliza is something that’s come up in several of my uni lectures on digital humanities as we grapple with the arrival of LLMs and such)
And we, you know, you look at the top of a Parmesan container and it looks like a face. We're just primed to see meaning, and LLMs are really good at tripping this up. Yeah, Weizenbaum making Eliza. I can definitely see why that would come up constantly. And I'm really glad that people remember that historic accident, or historic example, of: we went through this 50 years ago. Humans are really primed to see agents, and for something that can interact so naturally, where, you know, if I tell it to do something, it's going to talk to me and do it, we get right into the sycophancy. Yeah, and so David's concern here is they make mistakes that look right. Along with the concerns that I talked about right up top about the legal status of the output, "looks right" is another big one. And I've talked about, well, the feedback loop is you sort of have to keep interacting with this. It's not a fire and forget. You can't one-shot it and say, make me an iOS app for finding where my friends are at on a Saturday night and meeting up with them. Like, you are not going to get an app out of that prompt. You could maybe feedback your way to it in a hundred steps, but it's going to take you a little while. It's not a magic wand. We've talked about this attention window. So I kept open... Where'd it go? Where's your stat, Aider? Ah, there it is. I just didn't spot the line. So it's saying, on this interaction, it sent... I think it's the whole... I don't remember. I think this is a sum. But it... The LLM works and it has an attention window. It can only consider so much input. If I kept going with this demo, especially if I was going at full speed instead of explaining things and playing and chatting, you know, I get about a half an hour or maybe an hour before the early stuff starts falling out of the context window and it just sort of loses the plot. And it starts making weird edits. 
So if you were here for the beginning, the very first thing I said to it was, hey, we're making a new Ruby gem. If I was going to stop and restart this, when I restarted, I would start with, hey, we're in the middle of implementing a new Ruby gem for linting.

02:28:43The quality window, as David points out about making mistakes that look right, the biggest sign that you've kind of blown out the attention window is you just start getting nonsense edits after an hour or so. And Aider, one of the reasons I think it's the best tool out there is it does a bunch of stuff like note-taking to try and keep important things in context. There are limitations to it. You will hit those limitations if you play with this. I've beaten "doesn't reason" to death. Went a little out of order, but it kind of comes to this question of when does quality matter. The point of this coding demo was not to write a CSS linter. It was to write a Lobsters CSS linter. Something that is very specific, like this .css.erb, and you've got to run ERB on it. This is a specific quirk of Lobsters. I know other Rails codebases do that, you know, nobody's got to gotcha me on that. It's nice for making a one-off tool that is not gonna matter too much. Like, I'm not really worried that someone is going to take a dependency on lobsters_css_linter.rb and I'm going to get dragged up on stage at RubyConf in November to explain this code and why I factored it this way. There's a saying that it's not engineering unless there's dollars in the equation, right? Or the other way, the way I really like putting it is, anyone can build a house. It takes an engineer to build a house that just barely doesn't fall down. That's sort of silly, right? But the idea is, if you actually know what you're doing, you know exactly how much quality you need in a given scenario. And I 100% share that programmer impulse that everything should be perfect all of the time. That I want all code to be well factored and well tested and well documented and to my coding style and functional and immutable and idempotent and side-effect free. But I know a lot of the time I am just writing a one-off script and we're going to run it. 
Every once in a while, those one-off scripts become a two-off or a three-off, or they become load-bearing. But if I'm really judicious about where I spend my time and my energy and my LOC, I can slop a lot of code and then focus on the things that actually live. The one-off script that all of a sudden becomes the Monday morning script and then takes 20 minutes to run and three hours to run and four hours to run. Well, if I'm deliberate about saving my time on all the other scripts, I can come back and refactor that script when it becomes a problem. Talked a little about this limitation. Training data is a limitation two ways. The way we talked about is training data limits or shapes the code that you get out of this because it is biased towards beginner code and sample code. And that is the easiest thing to deal with because you can nudge it, but it is a constant hassle. The other part of training data, and the reason there is the little flat-mouthed emoji here, is there are still big ethical questions around ingesting the entire web and all of GitHub as training data, and I don't have good answers for it. Like, I wish I could tell you, oh, we have solved the problem and now it is all free-range, ethically sourced training data where everyone involved assented, or we have an entire societal norm that it is safe, transformative use and this is wholesome and good and nobody is going to flame you about it. That is not the state of these tools right now. The state of these tools is there is sort of an open secret where their lawyers will never let them say it in public, but they've scraped the shit out of everything. They've even scraped the shit out of things that say, please don't scrape me, that say, hey, you know, like the robots.txt exclusion standard, they're kind of obviously ignoring those and working around rate limits in YouTube. And I don't have an answer there. I want that to be better and good. It's not. 
dpk0 i’m waiting for a major case on that in Germany … data mining for commercial purposes requires permission under an explicit provision of German copyright law
But there is enough value coming out the other end of these that there is going to be a lot of motivated reasoning happening. I mean, it already has over the last couple of months. There will be more over the next six months, year or two, as we kind of struggle through this. Oh, dpk, do you have a name of that case? The only one I'm familiar with is New York Times versus OpenAI, which, it's funny, I think it's going to be a pretty important case, but it doesn't even have its own Wikipedia page yet. Speaking of things that are scraped for training data. If you remember the name of that case, I would appreciate it because I think it's worth seeing how people are figuring this out in real time. But I think there's going to be a lot of motivated reasoning, because if I get a junior developer for a day and it costs me $5 to $10 instead of costing, I don't know, what, $800 to $1,600, that is a lot of dollars for someone to find a rationalization that what they're doing is OK. So my cynical guess is we will find some compromise or understanding that training is acceptable in some way or in some circumstances, because even if you threw out all of the spam tools that are coming out of it and you assumed they would never get better... You just saw it. Like, these are actually pretty fucking useful.

02:36:33So I think... I've shown... You know what? Why don't I... Let me do this out of order. I mentioned you're soaking in it. Because I'm middle-aged. Who are the youths?

...59"You're soaking in it" is a Reagan-era... Oh, it is a Reagan-era TV commercial. Actually, I think it was a whole, like, print series where these folks... Oh man, you love that 480p, right? daveceddia gotta run, thanks for the stream!
Or 360p. These ladies are in a nail salon and the one is soaking to soften her nails before she gets her nails done. And the other woman says, oh, you know, they are, as one does at the nail salon, discussing the qualities of dish detergent. And the woman says, oh, I hate how they dry my hands out. And the other says, oh, well, not only does Palmolive not, you're soaking in it right now. So that was a may-may, as the youths say. And to pull the curtain back, there's some of it happening here in the stream. gmem_ "You're soaking in it right now" is a sentence to tune in to
So let me open a terminal, and you just saw one run. I have this little debug to make sure, like, hey, buddy, you are in a stream terminal. You're not going to print all of your personal stuff on the screen. The way that works is I have an environment variable set and some custom files. So when Vim starts a bash session, it runs this script that says, hey, is there the environment variable stream? And if you have arguments... let's see. If you don't have arguments, run bash with the custom config. If you do, run it with the custom config and include the command. And if stream isn't set, just kind of pass through to bash. Aider wrote this script, this kind of one-off. It's weird, I'm doing a slightly odd thing, and this... I get what it's doing with rcfile and this idiom of piping a bunch of files in. I would never have come up with this one on my own. After I saw it from Aider, I could go Google it, and I saw people talking about it. And I was like, oh, that's pretty clever. I get how that works. But it's some API I would never have been able to find on my own. And maybe that's a little bit of a knock about the bash docs. Or the man pages. Man, digging through man pages I could do less of. There's one more here, right? Not just that bash script. Come here, put this away. Actually, let me show you what my screen looks like. I got to get the image up. So I just took a screenshot. And let me bring this over.
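The wrapper logic described above can be sketched roughly as follows. This is a Ruby model of the decision, not the actual bash script shown on stream: the variable name `STREAM`, the rcfile path, and the choice to add `-i` so bash reads the rcfile when running a command are all assumptions for illustration.

```ruby
# Hypothetical sketch of the wrapper's decision logic; the real script is bash.
# STREAM and the rcfile location are assumed names, not taken from the stream.
STREAM_RCFILE = File.join(ENV.fetch("HOME", "/tmp"), ".bashrc.stream")

# Returns the argv to exec when Vim asks for a shell.
# env:  a Hash-like environment (pass ENV in real use)
# args: whatever Vim passed, e.g. [] or ["-c", "ls"]
def shell_command(env, args, rcfile: STREAM_RCFILE)
  # Not streaming: pass straight through to plain bash.
  return ["bash", *args] unless env.key?("STREAM")

  if args.empty?
    # Interactive shell with the stream-safe config.
    ["bash", "--rcfile", rcfile]
  else
    # Keep the command, but force an interactive shell so the rcfile is read.
    ["bash", "--rcfile", rcfile, "-i", *args]
  end
end
```

In real use the result would be handed to `exec(*shell_command(ENV, ARGV))`. Keeping the decision in a pure function makes each of the three branches easy to check in isolation.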

02:40:09No, that's not helpful. One sec.

...18There we go. Just turn off UI. So it's scaled down, but this is what my desktop looks like when I'm streaming. This section up here is the section you see. It is a 1080p box. because that's what Twitch allows me to stream at. And usually that is just the terminal I'm using. I call it the streaming area. Whatever is in here in the upper left corner is what gets streamed. And then underneath it is the sideboard. dpk0 i have to run soon but on the subject of AI for coding, i recommend Alan Blackwell’s new book ‘Moral Codes’ (read most of it in early access online, picked up my hard copy today). it argues that rather than the sci-fi-inspired chatbots AI is focussed on developing today, we need more focus on designing programming systems that allow humans to better express their desires and intentions to computers … also a nice link to what the Felienne Hermans paper says, i suppose!
It's like when you're hosting a dinner party. On the sideboard, you put the dishes or the next bottle of wine. In this case, it's usually a browser or two. Ooh. dpk, thank you. I'm going to just grab that and put that in my scratch notes.

02:41:22So to swap that up. Thank you. I'll check out that book. And then on the right side is all stuff you don't see. And it would be really easy to have stuff drift over, right? Like any kind of pop-up window where if I pop up the file manager, say to open a screenshot on my desktop, or I click over to the picture of the back of a Wi-Fi box on accident.

...56Ugh. I think I, what did I just do? I get it.

02:42:10I deleted the file out of my home directory and, I don't know, the image viewer saw that I deleted it and was like, oh, let me just give you another file that's in your home directory. I'm glad it was the back of a Wi-Fi box as opposed to a picture I care about. So obviously this isn't perfect, but it helps keep me from throwing crap up on the stream that I don't mean to. So I use a window manager called awesome.

...52Awesome. And I had it write me... I had Aider, or no, this was before I saw Aider, I had Claude write me a streaming layout. So awesome is a tiling window manager where you can just pile up windows and it lays them out. I got in the habit when I was using Linux on a laptop and I was like, you know, about half the time I have one window full screen, and then 45% of the time I have two windows that are more or less equally split filling the screen. Why do I want to be dragging windows around all the time? So I switched over to awesome as a tiling window manager. And it's scriptable in Lua. And I said, all right, Claude, let's make a layout so that we have whatever I can stream up in the upper left corner, and a sideboard of stuff that is not currently visible, but can be visible. And then everything else goes off to the side. And then also... Where is it?

02:44:15There's some stuff about sizing things, but...

...26Where is the darn tag?

...34The window manager is set up to let me tag windows as whether they are acceptable on the stream or not. Oh, and I'm not even spotting it. See, I don't know this code. I just had the thing spit it out.

...56Ah, streamable. Thank you. Speaking of losing the attention window, the stream has been running long enough that I'm starting to fall off. But my window manager config lets me mark windows as streamable. And then when the layout runs, they can only be on the left side, streaming or sideboard, if they have that flag. If I took a floating window like... this calculator. I literally can't drag it over the midline onto screen unless I hit the key combination that marks it as this is acceptable to show on screen. And you saw how that snapped off. I unflagged it and the place function in combination with this just immediately that frame shoved it off screen. I don't really know Lua. Like I've read Lua. I could tweak this. I sure don't know the awesome API very well. And this took a couple of revisions and some tinkering. But this kind of one-off code where it's like, yeah, there's this API and I don't want to stop and learn the API and I don't want to stop and learn Lua. Like, yeah, yeah, I could spend a couple of days doing that. Or I could have that code in a couple of minutes. That's pretty fucking useful. I could read this code enough to give it instructions like, no, I want you to structure this. I want you to extract a function for that. I want you to divide the screen like this and like that. I can read Lua well enough. And if I got a syntax error, I would just paste the syntax error into Sonnet, and then it would be like, oops, let me revise that. Certainly, sir. So that's been on the stream since day one. That is just a huge amount of setup that I wouldn't... I probably would not be streaming if I could not have just slopped that out. Because it's... What is this? 141 lines of code, but it's the learn an API, learn a programming language that costs hours and days. The end result, right? You don't want to do the curse of knowledge thing where you look at this and you go, oh, this isn't much code. I could write that in 15 minutes. 
It's so tempting to say that, right? But we saw it with the... The linter demo where these little issues come up, like how does the parsing gem indicate errors? Is it raising? How do I wire up an executable from a gem into the gem's library code? How do I connect a test? All that fiddly little stuff gets paved out. And yeah, the LLMs will kind of goof and bring a little bit of fiddly stuff back in. We've seen some of its nonsense APIs, its junior code, its hallucinations. But it's manageable.
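The "streamable" rule from the window manager config can be modelled in a few lines. This is a language-neutral sketch written in Ruby, not the actual Lua for awesome; the `Window` struct, the `place` method, and the `midline` parameter are hypothetical names chosen for illustration.

```ruby
# Hypothetical model of the placement rule; the real layout is Lua for awesome.
Window = Struct.new(:x, :width, :streamable)

# Returns the x position a window is allowed to keep.
# midline: x coordinate separating the streaming/sideboard side (left)
#          from the private side (right).
def place(window, midline:)
  # Windows flagged as streamable may sit anywhere, including on stream.
  return window.x if window.streamable

  # Everything else is pushed back to the private side of the midline.
  [window.x, midline].max
end
```

For example, an unflagged calculator dragged to `x = 100` with `midline: 1920` snaps back to `1920`, while the same window with the flag set stays at `100`.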

02:48:42Let me show you one last example of code, and this one I wrote with Aider. So I have on my blog that streaming archive, right? So at the end of every stream, I run this transcribe.rb. So OBS records my stream to a local file. Let's see. And then this script basically makes those stream archive pages, grabs my AWS keys. It even has optparse for, like, letting me set the date in case I'm doing these things a day or two later. It finds my Markdown page. It lists all the video files that are in the folder where OBS dumps them. It grabs the chat transcript. It transcodes the images, or I'm sorry, transcodes the video, first from MKV to MP4. MKV is nicer to have OBS record to because if it crashes, you're more likely to get a file that you can read. But MP4 is what all the browsers want to read. And then it also transcodes to rip out the MP4 and shove them up to an S3 bucket for AWS Transcribe. And I don't think I've even hit the free tier limits on this. And then it waits for the jobs to finish and it grabs the subtitle files that come out and it parses them out and makes the transcript page that you see, which is like, loop it, figure out where the paragraph breaks are. I've shown this file once or twice on stream right at the beginning. And then it updates, it rsyncs off to my home NAS and to the video archive, runs git add for me, and then if I told it --publish, it automatically goes and publishes. Sometimes I want to edit the title and the tags before. I'm behind on writing titles and tags, because this part is so easy, I can just do it and forget it. 
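The "figure out where the paragraph breaks are" step could look something like the sketch below. How transcribe.rb actually decides is not shown in the transcript, so the rule here (break on a pause longer than a threshold), the `Cue` struct, and the two-second default are all assumptions.

```ruby
# Hypothetical subtitle cue: start/stop in seconds plus the spoken text.
Cue = Struct.new(:start, :stop, :text)

# Groups cues into paragraphs, starting a new paragraph whenever the
# silence between two consecutive cues is longer than max_gap seconds.
def paragraphs(cues, max_gap: 2.0)
  cues
    .slice_when { |prev, cur| cur.start - prev.stop > max_gap }
    .map { |group| { start: group.first.start, text: group.map(&:text).join(" ") } }
end
```

Each paragraph keeps the start time of its first cue, which is what a timestamped transcript page would need for its paragraph markers.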
But come on, this is 285 lines of Ruby to schlep around a bunch of files and handle a bunch of corner cases, like, are there the same number of chat transcripts as there are video files? Because if they don't match... Like, this is all manual stuff I was doing for the first couple of streams, and then I just added more and more to this, and I can just be like, yeah, just find me the lengths of things and split them up. If I hadn't had Aider, I would not have written this 300-line file. I would not have nice transcripts. I would not have bothered figuring out which of the 900 AWS services I want to use and how to call them and what their return errors are. dpk0 have you tried using Whisper for transcriptions? can run locally, and has pretty incredible accuracy in my experience, even the somewhat smaller models
Oh, come on, right? Do you want to write that one off code? No. It just, it's not can't. dpk0 the one OpenAI product i can actually abide
It's just, is the ROI there? Using an LLM tool to generate stuff against APIs I don't know is pretty darn useful. dpk, you ask if I have used Whisper for transcriptions. I have tried to. The Arch packages are in an odd state where they don't want to install on my box. I don't know if that's something about the package or something about my weird config, but it was the first thing I tried and it didn't work. And then I was like, well, there's got to be an AWS API for transcription, because there's an AWS API for every darn thing, and that worked out fine. So I keep saying glue code and one-off because it's not just that I see it as useful for those, it's also back to the legal concern. If this stuff turns out to be toxic waste, I can delete this code and move on with my life and not feel bad about it. I wouldn't put this in an app I cared about. I wouldn't put output from it either in... I certainly haven't put it in Lobsters or Recheck, because I just... it's that open-ended, what is this going to look like in a year or two? And no. No. So that's why I say, like, don't... You could sort of do it with prototypes, right? Prototypes are... I don't know, in my experience, prototypes are usually not as throwaway as you expect them to be. They are more v0.1 than they are toss it after you get some feedback. So I wouldn't use it for prototypes. API discovery, hey, that's handy. Like easeout said earlier, and I said about, what, transcribe, awesome, bash. All of these things, every API is like, okay, great. Here's first, you have to find the doc and then you have to read the doc and then you have to find the other doc that's not linked from it, where it actually talks about the errors and the sharp edges. And the, what do you get if you pass an empty string? Oh, it's really nice to zip past that kind of stuff. 
And you know, 95% of the time it doesn't just hallucinate an API. So five percent of the time I'm back in the situation I was in before, where I have to go find the README and dig around and, you know, like I did earlier on the stream, find out, search: crass, does crass raise? No. Okay. Because I saw that hallucination on my first pass with hacking out a linter. And I was like, well, there's a question, but I could look up the exact question I wanted rather than have to read the whole thing. You know, that linter, if I made the Lobsters linter, could I have written it in an hour? No. Like, I'm an experienced Ruby developer, but there's enough incidental complexity that comes up. No. Yeah. This one, scripting, like bash, ffmpeg, boy, that one has a big, broad command line. I've never met anybody who, you know, there's that joke about tar. I think there's even an XKCD of, can you write a tar command that's valid on the first try? Ffmpeg is like that for me. The overall experience feels a lot like pairing with a junior developer, where you have to ask for revisions. You occasionally get nonsense. The latency is a hell of a lot better. Oh, I mentioned pseudocode because one of the very useful strategies with it is to write out, like, to write out the function signature and say, fill this in, or to write out the, you know, I'll write errors dot each error, notice I left off the do. It would catch that and say, you know, if important, log, if syntax, print to standard error. Come here.

02:57:12You know, I could write something like this and tell Vim, like, implement this loop. Very much like pairing with the junior dev where I outline what the task is in pseudocode and they can fill it in. I mean, if you watch the stream regularly, you've seen me do it to myself. It's a great strategy for picking your level of abstraction.
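As a concrete illustration of the pseudocode-first strategy, here is the spoken outline next to one plausible filled-in version. The `LintError` struct, its `important` and `syntax` flags, and the `report` method are hypothetical names, not code from the stream or from the linter gem.

```ruby
require "logger"

# Hypothetical error shape; `important` and `syntax` are assumed flags.
LintError = Struct.new(:message, :important, :syntax, keyword_init: true)

# The outline handed to the tool might be nothing more than:
#   errors.each error
#     if important, log
#     if syntax, print to standard error
#
# One plausible filled-in version of that outline:
def report(errors, logger: Logger.new($stdout), err: $stderr)
  errors.each do |error|
    logger.warn(error.message) if error.important
    err.puts(error.message) if error.syntax
  end
end
```

The outline fixes the level of abstraction (iterate, branch on two conditions) and leaves the syntax and the plumbing to whoever fills it in, whether that is a junior developer or a tool.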

...46I don't have a killer summary besides it feels kind of like pairing with a junior developer with these, you know, the known limitations and the two big caveats of legal risk and, what's the other one? Training data is certainly not ethically sourced. But I see a lot of people calling it spicy autocomplete or saying that they get more trouble than it's worth out of Copilot. And yeah, I would probably be, even if Copilot was correct all the time, I would be distracted by trying to read its suggestions when I'm trying to type. It's a little bit of that meme I showed at the beginning where it's like, no, no, no, you don't talk when I'm talking. The Aider workflow of you command it and it does something and it gives it back, I find that very comfortable. I have tried to cover a lot of the criticisms that people make about LLMs in a positive way, instead of directly rebutting the stuff about they suck, they only produce bad code, they can't understand context, they can't find multiple files. If you see those criticisms, they were all valid last year, and maybe even valid seven or eight months ago. They are not so valid anymore. I like Aider because I think it is the state of the art. pushcx https://aider.chat/
It is the tool that is revising the fastest. It is putting out new releases about every week, two weeks. So I'm throwing a link in the chat there because if you haven't played with it, I think it is very much worth spending a morning, like you have just spent the morning with me, thank you, but playing with Aider yourself. It is worth the $5 to Anthropic to play with it and see what actually comes out of it because there are some big... It has limitations. I don't want you to kid yourself that it has no limitations or is a magic wand. But it's pretty darn neat. And I've written, you know, you saw that vim bash. I could probably have gotten there. The streaming layout and the transcribe, I would never have bothered. This stream would be a lot more plain and it sure wouldn't have archives if it wasn't for this tool. I don't know. And I would hope that the legal stuff can get cleared up in a year or two, because I would like to use this more directly in code I plan to keep. We'll see. Anyway, I am winding down. Oh, we're exactly at three hours. Yeah, so that was what, a half an hour of intro, a two hour demo? And then, yeah, it feels about right. Because I said it took me an hour originally, and I run at about half speed on streams. So yeah, it's pretty close. And then half an hour of wrap up covering all of this kind of stuff. I've been adding to this note for a couple of days since I decided I was going to do this, or I've been chewing on this idea of like, hey, this is worth demoing. So my plan for next stream on Monday is I'm going to show off Recheck, which is the database integrity tool that I've been working on over this summer into the fall. 
It is a Ruby tool inspired by one that I used at Stripe for checking production database integrity, because I'm not going to give you the talk I gave years ago about it, but The short version is every database with many hundreds of thousands or a couple million records or more in it has just weird one-off errors where there are impossible combinations of state. It is much nicer to find them than for them to lurk and throw 500s in production. dpk0 nice to hang out again. sadly i’ll probably not be around quite as much as the schedule doesn’t work well with my new $ork schedule, but i’ll try to drop in when i can anyway. always interesting to watch – wish i’d caught more today, your thoughts on LLMs as coding assistants were interesting (speaking as someone who is generally negative about the idea of applying such tools)
I have been developing by alpha testing it against the lobsters code base and database. Oh, we have a bunch of bad data. Some of it is bad validations. Some of it is legit bad data. So I will do another big live stream Monday. What is that? 2 p.m. Central time. So three on the East Coast, noon on the West. Who knows if you are overseas? pushcx https://recheck.dev
Showing off what that is like and talking about where that's going. And then, assuming I don't get embarrassed by any giant issues, I will start taking beta testers for that. So if you check out recheck.dev, you can sign up for the mailing list. Ah, dpk, well, I am sorry to miss you, but congratulations on your new work schedule. Good deal, get paid. And dpk, if you want, there will be the stream up. Thanks to transcribe.rb, there'll be a stream up, or an archive page up, in an hour or two. I have not done any kind of multi-core stuff on the things. I don't care if it takes an hour or two instead of, you know, multi-transcribing the chunks. Anyway. Or transcoding the video and audio. That would be another thing that would speed it up because it's CPU bound. Alrighty. So take care, everybody. I hope to see you back on Monday. As always, you can feel free to message me on IRC or Bluesky or email or Lobsters messages or smoke signals. If there is any of this stuff you want to talk more about, I'm happy to chat about it on stream or off. Have yourselves a good weekend. Take care.