the goldilocks diff

Streamed 2024-09-09

Streaming info and archive

Tags: Lobsters performance stream

Following up on the last stream, a viewer found a whole bunch of bugs in the performance-oriented code that sorts comment threads.
Work through those bugs and find several more.
Banning these stream pages from the site for inarticulable reasons.
A potential upgrade to the Domain model.
Should we have confidence in confidence, or drop it in favor of score? Channel VIP: dzwdz.

scratch


topics:
  https://lobste.rs/s/vmangd/are_we_living_simulation
  ban this stream from lobsters; meta note
    domain is not granular - github.com/foo is just 'github.com' is foo.github.io
    'origin' that understood some sites have author-controlled sections
  dzwdz's bugs
    https://github.com/lobsters/lobsters/issues/1318

0. n setup was edited wrong
0.5. deleted comments get score of -10
0.75 return 0 if score == 0
1. clamp to .2/1.2 -> order of operations bug
2. lpad w 030 -> \
3. 65536 replace with 65535
4. -1 -2 comments between upvoted comments -> using score alone
5. is confidence a good metric?
  redefine confidence_order?
  idea; shrink confidence to one byte and expand id to two
  not much precision lost; fixes collisions

  use around_action for 'new story' line to only update if printed
  a11y https://github.com/lobsters/lobsters/pull/1319
  enforce only admin can ban in controller
  a largesse of PRs + issues!
  running project: repaginate


https://masterbootrecord.bandcamp.com/

dzwdz's pastebins
  https://pastebin.com/y7E9hD46
  https://pastebin.com/fsckYv1B

https://en.wikipedia.org/wiki/Scunthorpe_problem


title: the goldilocks diff
title: five out of our four bugs would've disappeared

post-stream:
  fix reply_comment_spec.rb:104 - failed to break when hiding influenced story score

⊕ Transcripts are generated with whisperx, so they mistranscribe basically every username and technical term. They're OK but not great, advice appreciated.

Recording


04:01dzwdz oh no the chat already looks fucky
dzwdz you seem to be muted fyi
Hello, let's get started. shindakun hello pushcx
Yeah, just kind of getting set up at the last minute here. Deezy, I hope the chat wants to work out for you, especially since you are kind of the special guest this episode. Oh, hey again, Shindakun, and to any other returners. I actually, I was reading the Twitch Reddit, and apparently there are a lot of people who watch Twitch And they don't want to be seen in like to be called out and greeted by the streamer if they haven't said hello in chat. So there was like a whole norm around it. hejihyuuga hello pushcx hello chat!
I was oblivious to and I don't think I've stepped on anybody's toes. But hello. hejihyuuga !lurk have to go to a meeting
Speaking of stepped on somebody's toes, DZ here is our special guest and I keep wanting to Hey, Hedgy. dzwdz it is in fact derived from a polish word
I keep wanting to pronounce his username as if it was a fine Chicago Polish last name, because we have a lot of Polish people here in Chicago, especially in the 80s when I was growing up. And so to me, I look at it and I keep seeing something like but, you know, that's even for a Polish name that would need a vowel or two. Oh, it is de facto derived from a Polish word. Okay. I mean, I'm going to keep calling you DZ instead of but If you have a preference, I am happy to meet it. dzwdz naw do whatever
So DZ is going to be kind of our special guest because they found a whole bunch of bugs in the site and the performance code we looked at last week. hejihyuuga we love streams with guests
And I think they are officially the first person to really have dug into that performance code I showed. hejihyuuga lobste.rs has a ton of very interesting people in the community
Yeah, the guests are a lot of fun. I'm glad you enjoy them. hejihyuuga it's kinda intimidating sometimes haha
shindakun Guess I better learn rails
all you know and so far these two guests have been the people who are doing big contributions to the site so you know hint hint there's one great way if you would like to be a guest on a stream is do something big but yeah i think i think some of the interesting people on lobsters are kind of intimidating But in general, the overall effect is kind of anti-intimidating because it's easy to read through somebody's comments and be like, oh, they are just a person figuring it out like me. They are not having any kind of special brainwaves. Some of them are, of course, certainly much smarter than me, but... pushcx https://lobste.rs/s/vmangd/are_…
sometimes i see folks and i'm just like oh i see you like six months ago you mentioned you were kind of interested in this topic and now today you're posting fairly interesting stuff and i can see how it resonates with this thing you posted about years ago so it's kind of a one more argument against intimidation there's already lots of reasons not to idolize people but the fact that you can learn to do the same stuff is pretty interesting speaking of the site i wanted to look at this Story early on somebody submitted this and it didn't get a lot of attention. It's it's trying real hard to be a spicy blog post. I'm not going to pull it up, but I left a comment on it. And I think it was trying so hard to be spicy that it kind of turned off the site where it may have successfully attracted a lot more attention on Reddit and HN. I think I saw it on one of those and it had a fair number of comments, but The title seems to be as... Who is this? Epidemain? And I should try and pronounce usernames in my head before I realize I'm going to read them on stream. But this other commenter noted that the title is kind of distracting and gets at a major topic in philosophy, which is kind of distracting. And then on top of that, it was just kind of a... a mean version of a take on, boy, we're dealing with lots of really complicated systems. And I think there's a lot of interesting stuff that can be said around the idea that part of learning is not learning things. Part of learning is specializing and saying, I am going to avoid, so personally, I have deliberately avoided learning Docker. I run Linux on the desktop. I don't really have a strong need for Docker. It is a useful devopsy tool and many developers, especially senior web devs, know Docker because it's very handy for getting dependencies running on Windows and Mac. 
But I have deliberately said, well, I'm not going to spend a couple of hours or a couple of days getting really comfortable with a thing that has marginal benefit to me. dzwdz (sorry to be offtopic but can someone try dming me on twitch?)
And that has mostly been good, but I kind of have to stay on top of it. And Rails in the latest major update, where is it? In Rails 7 here. Oh, okay. Oh, not Kalam, Kamal.

09:29Rails has integrated support for this, like, there's DHH, so you know it's official Rails: this Docker-based tool for managing production deploys. And then of course it also has nice benefits for the... Oh, DZ, I will DM you real quick. There is a hello. Oh, I actually can't because it's Twitch's phone number privacy problem. Okay. dzwdz actually nevermind, i'm just not connected to irc at all
Too bad. shindakun yeah i'm hitting the same problem lol
dzwdz something's broken
Oh, Twitch. So I'm kind of reevaluating whether it's worth knowing Docker or worth not knowing Docker to be specific. And it's that kind of keeping an eye on it and being deliberate about there are literally so many hours in the day and so many days in the career. of not knowing things. So I thought this blog post was a little bit of a shame because it's getting at some really interesting topics, but it does it in a really moralizing, these guys are fake developers. I can't believe they don't know what I know and almost take for granted because I've known it for so long, which is a really unfortunate way to look at things and collaborate. Ah, a flawed but thoughtful piece. So that was the first topic. The second one is kind of a meta one. I think this is short enough. We can just kind of jump on it quick. pushcx https://lobste.rs/s/tbkogf/you_…
But I wanted to ban this stream from lobsters. And let me explain why. There's a longer comment on it here. There we go. We'll share that. Calvin, who is a prolific submitter to the site, submitted one of my streams and i think this is the first time any one of these stream archive pages yeah have been submitted since i started doing that twice weekly off yeah office hours and i would guess from having talked with calvin that he was especially interested in how it was a performance deep dive and dzwdz if you can see this then i've managed to join using irc
dzwdz ok great
pushcx we can see it
is meta and so there's a couple of reasons it was especially interesting more than the other streams to make it worth submitting but i'm really really leery of any kind of cult of personality kind of stuff and i don't want the home page to be i think i said the peter show here yeah and so i wanted to even though everything in the stream is very topical for the site

12:26yeah we see your chat if if every stream is topical that's kind of a lot of stuff to post to the site most sites do not get two posts a week unless it's something like all of github where you know we don't see two posts a week from one individual repo or one individual blogger so i figured I was going to special case the site to actually recognize my blog because I don't see the need to block the whole thing. We've talked a little around. Oh, right. I'm already in Vim. There is a domain model. And it is possible for domains to be banned. This was one of the primary reasons I added it actually was we had a very site-specific code for dealing with url shorteners which just came up in another meta discussion tangentially this morning and having a hard-coded list of known url shorteners was non-scalable and was kind of frustrating and so i turned it into this domain model to say oh if we see a link to remember the names off the top of my head like t.co was a popular one that was probably someone trying to there was a user who was deliberately doing stuff like that to evade our code that strips things like utm parameters that are useful for marketing attribution and in that case where it was a marketer who was deliberately targeting the site and trying to work around stuff for avoiding them getting good metrics, it was especially frustrating. So I built out this domain model and it's kind of, it took the philosophy of do the simplest first thing that could work. And the easiest thing was to say, all right, well, we're just going to extract the domain from the website and from like the link submitted and say, is it banned? And then otherwise it had the nice feature that you saw flicker by in my browser a second ago, where you can browse sites Browse links by domain, but then otherwise it's really just used for the banning. And since I wanted to ban my stream posts without banning my entire blog, cause you know, I will occasionally write blog posts that are topical. 
This is me maybe doing a little bit of special casing. It's in the same vein that we used to have. of blocking the tracking sites where story used to have this specific code. And I can see the domain model evolving so that it's smarter about origins. dzwdz you could maybe do stream.push.cx too
So when I say origin, I mean things like: domain is not granular. github.com/foo is just github.com, and so this is one way that a lot of new users, mostly peacefully, kind of step around the new user restriction on unseen domains. And I think we could actually lift that a lot if we had, like, a concept of an origin that understood some sites have author-controlled sections. And again, GitHub is the big primary one here, but there are other ones, like Mastodon instances. Something like stream.push.cx, yeah, that would be reasonable, but it's dzwdz "using my privileged position as an administrator to ban myself"
And I realized this is me slightly using my privileged position as the administrator, but it is much easier for me to land a five line patch on lobsters than it is redo my blog hosting setup, because that would be that would be so much more than five lines and and five minutes on stream. yeah and using the privilege position to ban myself like one of the reasons i feel pretty okay about this is i am trying to avoid giving personal benefit to myself rather than giving a personal benefit so maybe the direction that it's valid is or is limiting is a little bit better so where is the if you're all present In here, we have some stuff.
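A minimal sketch of the "origin" idea discussed above, under stated assumptions: the helper name `origin_for`, the `AUTHOR_PATH_HOSTS` list, and the tilde rule for multi-user sites are all hypothetical and are not part of the Lobsters code base. It only illustrates how an origin could be more granular than a bare domain when a site has author-controlled sections.

```ruby
require "uri"

# Hypothetical list of hosts where the first path segment names the author.
AUTHOR_PATH_HOSTS = ["github.com"].freeze

# Returns "host/first-segment" for sites with author-controlled sections,
# and just the host otherwise. Subdomains (foo.github.io) are already
# distinct hosts, so they need no special handling here.
def origin_for(url)
  uri = URI.parse(url)
  host = uri.host.to_s.downcase
  first_segment = uri.path.to_s.split("/").reject(&:empty?).first
  return host if first_segment.nil?

  # Assumption: a leading "~" marks a per-user directory on a shared host.
  if AUTHOR_PATH_HOSTS.include?(host) || first_segment.start_with?("~")
    "#{host}/#{first_segment}"
  else
    host
  end
end

puts origin_for("https://github.com/foo/bar")
puts origin_for("https://foo.github.io/post")
```

With something like this, a ban or a new-user restriction could target `github.com/foo` without touching the rest of `github.com`. A real list of such hosts would need curation.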

17:29There are a couple of sister sites that use this code base, and it's a little funny to ban my streams from all of them, but, well, they can cut out the code there.

...47So, if you haven't seen it, the link I gave in the chat with the comment talks about why I'm doing this a little bit more. The idea is also that we can just have meta discussions in the weekly "what are you up to" threads rather than directly on the stories, and that's really the only thing where it seems worthwhile to

18:17Yeah, to kind of work around this. dzwdz i mean what you've suggested is a pretty good idea too
So we're going to just say. btd_12 currently learning python any tips or mistakes that you made as a beginner in coding in general
This. And rather than say and domain. We will say and.

...39Where is it, starts_with? I never remember if it's starts_with or start_with, with the plural. I guess that's a reason to make a test.

...55You want parentheses? What are we, what is the linter mad about? byby42 Ruby: start_with?, Rails adds starts_with? as alias.
Unexpected token tSTRING. Yeah, it's parentheses. And then we will say... Ruby: start_with?, Rails adds starts_with?. Good memory. Thank you. I will go ahead and use just the basic Ruby one then. This is like how I couldn't remember there was a .second method the other day. There's so many of those very wide APIs in Rails. The other version of Rails adding this starts_with?, with a plural S, is over in bash: a bunch of my typos. So let's see, what do we have here? So if we say cat the readme, dzwdz iirc there's a multiuser site (/~asdf) that's banned because of a single user
I very often typo grep as gerp, and so I just aliased it in bash because, you know, it's just me. It's my typo. Why do I want to see an error? It's unambiguous what's going on. I'm a big fan of just aliasing your typos. dzwdz i'm trying to find it
dzwdz no, it's a smaller one
In that same way, when I have notes in a file, a multi-user site, ASDF. Let me look that one up. I don't remember this one.

20:22dzwdz no no
btd_12 my typo is sudo as sudi and nano as nani so when i edit a write protected file with nano i accidentaly type sudi nani
dzwdz yeah
Yeah, ASDF is, oh, I bet, oh, you're saying not a user named ASDF on our site. dzwdz bitcoin is like a- yeah
you are probably thinking of i'm going to set off auto mod this week you are probably thinking of the bitcoin is like a hand job story what was that that was data swamp yeah data swamp was it com io i don't remember the extension but it was something like data swamp dzwdz i was reminded of it because i saw "Programming is like s*x" posted from your site
dzwdz so hmm
yeah someone wrote a really weird and off-putting blog post with an extended metaphor about sex and i dzwdz also ffs automod
I had to ban the user because it was really clear that if I left them, they were going to be one of those people who plays the I'm not touching you game with any explicit boundary. And the whole thing was just super weird and off-putting. Maybe I'm being an American Puritan about it, but it just didn't feel good. Even more than it even vaguely mentioned sex, which as you can guess from... DZ's get thing about, I've made a programming sex joke before. A lot tamer than that one though. Hopefully not gross. And the only tool for the job is a domain ban there to prevent more stuff from their blog. And I remember looking through their blog and seeing a couple more things that were, they felt very much like trolling and not in the sense of, saying the most incendiary thing but in the older sense of saying deliberately wrong or slightly spicy things to try and prompt arguments and i later learned that it was a multi-user blog and i talked with the admin of data swamp and i was like is this normal on your site and they said that oh btd i just see your notice about pseudo sudi nano yeah that's great that is exactly the thing i do and sudi nanny is pretty funny so i talked with the admin of data swamp and they basically said it's fine by our moderation standards and i said well i have to expect more of this from your domain i don't want to unban it but it was dzwdz bandcamp?
not great because i don't think there really were any other off-putting and problematic users and i couldn't really see that author going for more than a few months because i don't know the the shock jocks tend to burn themselves out pretty quick so that was one more example of it would be nice if we had an origin model that was based on that could handle things like subdomains so example like almost nothing from bandcamp would be on Oh, actually, another one from GitHub is, you know, boo.github.io. Yeah. So these, all of the subdomains, we treat as different domains because we take the whole domain part. Yeah. dzwdz bootboot
And Bandcamp, if you want to visit an artist, all the artist pages are like masterbootrecord.com. Bandcamp.com, just to pick a fun one off the top of my head. No, I don't remember it. Oh, not BootBoot.

24:09pushcx https://masterbootrecord.bandca…
dzwdz no i was just wondering how bandcamp's relevant
Yeah, I like this guy. dzwdz new single out recently
He does a lot of... It's somewhere between chiptune, metal, and baroque era classical music. I really enjoy his stuff. And he covers a bunch of old video game music that I knew. New single. Ooh, Lucky Us. I will have to check back after. dzwdz not *that* recently, you've probably heard it
So without yet a more sophisticated idea of origin, I'm just going to put this in. dzwdz last week
If I had origin, that would be valuable, but I'm kind of waiting for, now we have had like two, two and a half places where it would be useful. I kind of wait for last week. No, I don't usually look for new music every single week. So I haven't seen a single in the last week. And so now that I've got like this and data swamp and kind of GitHub, I mean, maybe we can see our way toward the pattern, but I didn't want to... I don't know. It just doesn't feel ripe yet. Like there's quite enough need to justify the complexity of a feature. If somebody is feeling real interested in it, I don't remember if we even have an issue open about it, but... No. But that might be a thing worth breaking out at some point. So the reason this error is a sentence fragment is this is going to show up on the story submission form. So I want it to say something that appearing under the URL field would be useful. I am pausing a second to watch my mixer. Yeah, there's somebody nearby who is power sawing metal. It's an incredible noise, but I guess I have the noise suppression or the noise gate set up properly because the mixer isn't bouncing forward at all. So I get to hear the dulcet tones of steel on steel, but you just have to put up with me. Not so bad.

26:35So I am tempted to just say... You know, I have this longer explanation right here. So why not just link to it? It's not going to be hyperlinked in error messages, but we don't have to include all of that.

27:11So it's going to generate a sentence fragment that starts URL dot, dot, dot, and then this is the dot, dot, dot. And so I would really like to write a sentence, but I'm sort of constrained by this. And as much as I am happy having this five line hack, I don't want to have a further hack into the form because I am all kind of looking forward to when I break out the larger origin model or somebody volunteers to add it. It will be very easy to pull this out by just this one little hook.

28:04That's such a set of details. There we go. dr3ig you'd have to ban twitch.tv/pushcx as well ?
And before I add a validation, I should at least check that I can see it. We've had similar code. And you saw how I based this on the check not banned domain. So I feel really good about the overall logic and flow of this. But I would like to see it at least once in local development, even if it doesn't quite seem to be worth starting a test over. Because again, a test is just going to be more to pull out. And if I hit login, what's going to show up? Oh, so I've refreshed some of this test data. So let's go to... Oh, it refreshed back here. Okay, good. So the reason for that little bit of dancing around. Oh, yeah, it does make sense to ban the Twitch host of the stream as well. Yeah, let's pull that up real quick. That's a good memory. See, now it's starting to get complicated. I don't know. There's something about adding the conjunction that makes me wonder if I'm overthinking this or it's not quite as simple as I wanted. And I think for their video URLs they all start with the channel name, so that would probably suffice. I hope. Standard, what are you mad about? "Do not use mixed logical operators in an unless." All right, so let's go ahead and say return if there's no URL or, the De Morgan... Oh, I can pass multiple things to start_with?. All right, never mind, let's go back to the unless. standardrb actually solved it while I was starting to rewrite. So let's make sure it's refreshed. Yeah, I did not know that start_with? took multiple strings. That's super convenient. And there are probably places in the code base that standardrb has cleaned up when I installed it. Clever.
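A rough sketch of the kind of prefix check described above, using the fact that Ruby's `String#start_with?` accepts several prefixes at once. The constant, the method name, and the `example.com` blog path are placeholders for illustration. Only the Twitch channel name comes from chat, and this is not the validation that was committed on stream.

```ruby
# Placeholder prefixes: the example.com path is an assumption standing in
# for the stream archive pages; twitch.tv/pushcx is the channel from chat.
STREAM_URL_PREFIXES = [
  "https://example.com/stream",
  "https://www.twitch.tv/pushcx",
  "https://twitch.tv/pushcx"
].freeze

# True when the submitted URL begins with any of the stream prefixes.
def stream_url?(url)
  return false if url.nil? || url.empty?

  # start_with? takes multiple strings and returns true if any one matches,
  # which avoids chaining several conditions together with ||.
  url.start_with?(*STREAM_URL_PREFIXES)
end

puts stream_url?("https://www.twitch.tv/pushcx/videos")
puts stream_url?("https://example.com/blog/some-post")
```

Because there is no trailing slash on the prefixes, a channel whose name merely begins with `pushcx` would also match. That imprecision seems acceptable for a feature that only needs to catch most cases.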

30:49Hopefully that is good enough. I'm sure there's like a m.twitch.com and a hundred other things, but I'm okay with catching 95% of things where It's not like a commercial thing with money on the line so that I have to catch 100% of things lest it be a customer support burden. So this fetches the title. Yeah, that's fine. Let's just say blah preview. URL is tomato. We don't need twice a week details. Yeah, that's reasonable. Maybe it'd be better to link the exact comment. So let me grab that and I'll show you an undocumented feature of the site. I think it's shown up on stream once or twice here, but if you have a story, a comment URL, it has the full story URL with the title. And it's probably pretty obvious that you can chop out that to link to an individual comment, but also we have a redirect set up at slash C. And so that is a valid URL for a comment. So if I grab this and I reload, I should see that nice. I guess I have to refill it because I was in a funny state. Okay.

32:29Yeah. Yeah, it's not linked, but this is reasonable, right? I feel like programmers are comfortable with URLs. So it's not perfect, but this is like an 80% kind of feature of let's just keep the meta to a dull roar. byby42 is to meta -> is too meta
Let's avoid any kind of cult of personality thing. Yeah, I think that's good. Okay, standards.

33:05I'm going to just grab that to reuse.

...19dzwdz the overall phrasing is kinda weird too imo?
Like that, I can just grab that right off of the error message.

...29I almost don't know what to call it. Is the overall phrasing kind of weird? Yeah, it is kind of weird. Because it has to start with the URL. URL is to meta. We don't need twice a week. I'm struggling right now in the commit message of what do I call this? Because it's not self-promo because the submitter is not me and would not be me. But it's just this weird cult of personality kind of thing that I'm really uncomfortable with. And I don't... gtfrvz man i can't make a twitch stream called pushcxcy and post it
byby42 Ah. As in, it's posted to talk meta. I see.
totally know how to articulate. "Is too meta": oh, byby, I see what you're saying. I was saying meta the noun, like "this post is meta", as opposed to "it's too meta", the adjective. God, I could read that either way. Yeah, so adjective or noun. "Man, I can't make a stream called pushcxcy": yes, correct. Yeah, this has all kinds of bugs, and I deliberately didn't put the dzwdz gtfrvz: trademark enforcement
gtfrvz ))
byby42 I dount many people will read this anyways :p
trailing slash on there because you could have any kind of question mark blah afterwards hmm so what's the really i guess what's what's clunky here is that meta is both a noun and an adjective and then also i don't have a noun phrase for talking about what's wrong with it i doubt many people will read this anyways i hope not but On the other hand, Hey, the site owner did a thing and you know, on stream once or twice, we have pulled up the stories with the most comments or the stories with the most upvotes. And if you look at that stuff, any kind of meta or announced post is very heavily represented, which, which makes some sense. It's kind of the lowest common denominator. Everyone on lobsters to some small amount is interested in lobsters where dzwdz "If you see this message me" - also gets you out of having to write a real error message
Not everybody visiting the site is interested in Rust or Rails or... I don't know, something else interesting that starts with R. And so kind of everyone is a potential reader for announcement and meta things. "If you see this, message me." You know, I understand where you're going there, DZ, but that's one of those where, rather than get a message, someone will submit a meta story that says, why did I get this message? I don't know. I've seen that kind of thing happen before. So I tried to write about it here, of like, rule, but I didn't even have a... I mentioned that two posts a week is a lot, but I also don't totally know how to understand the thing that makes me uncomfortable about having promotion of myself on the page. So I almost don't know what to call this. And I don't want to totally rabbit hole and spend the entire stream figuring out one error message, especially because I already slept on it for a couple of days and I didn't come up with a better error message. Since we're not immediately getting one here (which, chatters, I appreciate you kind of spitballing some things here), I think I'm just going to put in something bad and wait for someone to complain and then I'll improve it some more. That's okay sometimes. See, yeah, adjectival is probably easier to read. dzwdz comma after meta?
Especially if it was like twice every week. Let's cancel that because instead submitted it. Okay. Because that was a partial thing. Yeah, comma or semicolon.

37:45Let's amend that commit. Cut down on odd metachroma. Let me just say uncomfortable. It's the best description I have.

38:01There we go. I will meet the tyranny of 72 columns or 78 columns, whatever that is, git. All right. That's reasonable. Enough with the meta. Let's talk about how the site is implemented. So on Thursday, we did that big deep dive into how the performance-oriented threading code works. Easy boy. The cat is on the desk, and he's startled when I move abruptly. He's kind of a big scaredy cat. And after the thread, DZ and I had a running conversation for that day about their reading of the code, because they went and did a deep dive into the code, more than just listening to me ramble about it. And there were a bunch of places that they asked very insightful questions like, what's up with this arbitrary number you're multiplying by to clamp to? And I said, well, you know, I saw numbers in this range of 0.2 to 1.2. dzwdz (it's a probability)
And DZ immediately said, I think the range is supposed to be 0 to 1, so there must be a bug. DZ ended up catching four independent bugs, which is... "it's a probability." Yes, from zero to one, that's why. So on one level, I had to do some emotional processing of the fact that this complicated code I had felt very proud of and clever for writing has a ton of bugs in it. And also, I was feeling good about the amount of tests that it had, because I wrote quite a few for this, and I spend a lot of time thinking on, like, test ROI: am I testing the right amount at the right level without burning too much time implementing extra tests that are just going to be dead weight I have to maintain? And I do this all the time when programming. dzwdz i've stated this a few times but your code was mostly fine
dzwdz iirc the only issue was in serializing the confidence to bytes
So like the scripts that power these stream archive pages, those have no testing in part because they're just all firing off shell commands and they get used exactly twice a week. And if they throw an exception, I'm right there to fix it. You know, it has no users. There's no big win there. But this stuff, I felt like I tested the heck out of it. And instead, there were a couple of places that were golden master tests that could have caught some of these bugs, but did not. So I think there were four bugs that were worth talking about. Let's pull them up.

40:56I'm going to just kind of fill in the structure. So the first one was clamp to .2, 1.2, which I don't remember if this was the same as the order of oper... Yeah, that pointed to the order of operations bug, which was very junior high algebra shaped. And then... Oh, and I don't feel like I said it enough, but I wanted to highlight that DZ did some excellent work spelunking on this and asking thoughtful questions. The way this code kind of jumps up and down in levels of complexity and then... dzwdz i got nerdsniped for 2 days lmao
hinges on doing a clever thing with treating a string as an array of bytes made this very complicated, and I was thinking it through and I realized you must have spent like two hours just reading and trying to understand all the things the code did before any of it made any kind of sense, why any of it worked the way it did. So I wanted to especially... Two days, oh yeah. I mean just two hours to get started, by the way, before any of it starts to mean anything. So I'm very glad that you gave in to that nerd snipe, because there's a whole bunch of fixes to make, and they've been hanging around since I originally wrote the clever threading code. And none of them are huge. Well, the mis-serializing of confidence to bytes, yeah, so that was the other one. LPAD was appending the wrong value. That one's really easy to fix. And that kind of contribution, where someone really thinks about what's happening in the code and reflects on it, is valuable. If it was simpler code, people could have found these earlier, but it had that big hill you have to climb before any of it makes any sense. So, worth calling out. So then I think your third bug was... DZ, which of the bugs is it that put negative one or negative two comments between upvoted comments? dzwdz that was the z= one i think
I'm not correctly summarizing that. I believe that was at the end of your first post. Yeah, here we are. It was the Z one.

43:53dzwdz but it depends, there are two reasons for that
Oh, that was the other one. Yeah, it wasn't just an order. That one was. Yeah, two reasons for that. That one was the. Using score. dzwdz z= underestimates the confidence, bringing it below 0
And then I was pretty sure there was a fourth one. dzwdz and deleted comments overestimate it iirc
Or maybe I'm thinking of four bugs posted overall, too. The number of these bugs made me wonder if all of this complexity was worth it. "z= underestimates the confidence, bringing it below zero, and deleted comments overestimate it." pushcx https://github.com/lobsters/lob…
dzwdz because they bring n<0 in the confidence calculation which empirically seems to give you a confidence > 1
All right, we can take a look. So there's the... I meant to share. Here's DZ's bug report.

44:53Yeah. And zero, which I'm not going to try and summarize DZ's comments here, but yeah, the math was bad.

45:14So let's look at the actual code at hand here. dzwdz you also probably meant to subtract from 65535, which i also mentioned in the issue
A lot of it is in the Comment model. There is this method, calculated confidence. "Probably also meant to subtract from 65..." I don't think I want it to subtract. So it's the question of, is it two to the 16th or two to the 16th minus one? dzwdz if you subtract from 65536 it's not monotonic
And I'm pretty sure I see you tweaked it in some of yours. dzwdz for a confidence of 0 you get 0000 instead of FFFF
but I'm pretty sure it wants to be two to the 16th so that a score of zero stays in the right place. We can double check that. Let's put it on the scratch.

46:18Ah, for a confidence of zero you get, yeah, there are like, dzwdz 0x10000 - 0 = 0x10000
How many places are there in this code base that we're having integer underflow or overflow? I think it's, what, four here? Three of these are sort of? dzwdz well in fact you get 0x0100 because lpad is stupid
Maybe not the order of operations one.
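The 65535-versus-65536 question from chat can be checked with a little arithmetic. What follows is a minimal Ruby sketch, not the site's SQL: `two_byte_order` is a hypothetical helper, and the `& 0xFFFF` mask is an assumption standing in for "only two bytes are kept". The real query may truncate differently, as dzwdz notes about LPAD.

```ruby
# Hypothetical helper: invert a confidence in 0.0..1.0 into a two-byte
# sort key, so that a higher confidence gives a smaller key.
# The mask models keeping only two bytes (an assumption for illustration).
def two_byte_order(confidence, top)
  (top - (confidence * top).floor) & 0xFFFF
end

[0.0, 0.5, 1.0].each do |c|
  printf("confidence %.1f  from 65535: %04X  from 65536: %04X\n",
    c, two_byte_order(c, 65_535), two_byte_order(c, 65_536))
end
```

Under these assumptions, subtracting from 65536 maps a confidence of 0 to the same key as a confidence of 1, which is the non-monotonic behavior described in chat, while subtracting from 65535 keeps the keys in order.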

...47Yeah, so DZ is doing the math here that, yeah, it is. We can take away that question mark.

...59dzwdz and truncates "from the left" for whatever reason
So right off there's... and then let's... I'm going to swap these around in order, because it makes more sense to fix these bugs, or at least look at these bugs... "And truncates from the left." Oh no, that's terrible. It's supposed to truncate from the right. I don't want to wrap that in another. So let's kind of look at this from the inside out, of what is confidence, and then back that out to the threading queries, because I think if we start from the math it'll be easier, he says, as there's like, you know, a nine-line complicated equation. So what's going on is, if we just sorted by score, how many upvotes minus how many flags there are, we would have a fairly naive ordering of comments. And there's the opportunity to do better by saying, well, if we count the number of upvotes and flags differently, we can get more information out. And this is that sort of thing where dzwdz no best viewers are here in fact
if you go on Amazon and you look at products and there's something with a five-star rating but only five ratings, versus something with like a four-star rating and 500 ratings, you're gonna choose the lower rating, because the average is not super important when there's not a lot of signal from the volume of reviews. And so Evan Miller wrote a very nice nerdy piece, ah, man, it must be 15, 20 years ago now, about that issue and similar issues with sorting by user ratings, whether that's a five-point scale or a thumbs up, thumbs down kind of thing. Yeah, I just have stopped noticing the best viewers. I assume that Twitch has the naive spam blocking that would catch all that kind of stuff, and they're doing something with Unicode. But we see one or two of those every stream. dzwdz yeah the message looks fucky in my irc client
And it's tempting to say, oh, why can't they get even that easy stuff? But I also assume that there are like 10,000 of those they're banning that I'm not even seeing.

49:28Where's their confidence function?

...35"Message looks fucky in your IRC client." Yeah. And there's Unicode canonicalization to catch some of that, if Twitch is doing the naive byte string comparison. And then there is also, what if we kind of smash it down to ASCII and then run the spam filter on it, as an option, where you get rid of lookalike characters. You do that. And then the other thing, and Twitch is international enough, you can't just be like, look, if there's weird Unicode code points in here, it's probably spam. So they don't want to do that, because it would ban anybody who uses a non-English or a non-popular Western language.

50:23So the first difference here was... you can see this implementation in Reddit, which is Python, which is very similar to Ruby, and this is what the original author, Joshua, or jcs, used. And it got copied over here and it got mutated a few times, and then eventually score went in place. But it's not supposed to be score, it's supposed to be the number of flags, or the number of votes plus the number of flags dzwdz votes + 2*flags, actually
or specifically, and let's kick it into float, the number of upvotes. dzwdz oh no yeah right nvm
And I don't remember if, votes plus two times flags, yeah.

51:25dzwdz but also using the score isn't correct at all
If we go up to the comments table, the way it's defined, it used to have... this is going back years, but the score used to be a field that was called upvotes, and then I don't remember that it even memoized the number of flags it had, or maybe it just memoized upvotes and downvotes. And since we want to have a count of the... This is sort of thinking of it as the count of the number of user interactions. So if we define score, the thing that is useful to us, as up minus flags, and what we want is up plus flags, we can get back to that by saying, how many are there total multiplied by two?

52:26And I hope the arithmetic there is obvious enough I don't have to beat it to death. What am I typing? Hard to type and talk at the same time.

...48And then this can slightly simplify to times 2.0.
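The arithmetic being typed here can be written out as a small Ruby sketch. The helper name `opinions` is hypothetical; the only assumption is the one stated on stream, that the cached score is upvotes minus flags.

```ruby
# Hypothetical helper: recover n, the number of user opinions, from the
# two cached counters, assuming score = upvotes - flags.
#   upvotes         = score + flags
#   upvotes + flags = score + flags * 2
def opinions(score, flags)
  score + flags * 2.0 # 2.0 also promotes the result to a Float
end

puts opinions(3, 2) # 5 upvotes and 2 flags are 7 opinions in total
```

This matches dzwdz's "votes + 2*flags" only when the cached score really is upvotes minus flags; a score that was overwritten for some other reason breaks the identity.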

53:04So we're kind of backing that out. And the reason I do that by these attributes, score and flags, rather than go hit the database, is score and flags are memoizing those things. And I would rather use that immediately on the record than have to go and ask the database for things. It is probably in memory already, but dzwdz ah yes, negative amount of opinions
Rails doesn't really have any abstractions for "I care about this object and its associated records." So we'll just use the cache. Everything is cache invalidation, which is a hard problem. dzwdz unless you're intentionally hinting at the bug
Yeah.

...57That was the fourth bug. dzwdz you didn't, n can still be negative
So I've dealt with it first, but I had tried to treat this calculated confidence as a black box where I assumed it was correct. dzwdz which it shouldn't be
Unless I'm intentionally hinting at the bug.

54:32"n can still be negative, which it shouldn't be." Have I reinserted a different version of the bug, DZ? What are you seeing here? Because I was going to go talk about the order of operations bug. dzwdz the score of deleted comments gets set to -10
I thought this was what it needed to be, because if we have the score plus the number... this needs to be, just to be explicit... "The score of deleted comments gets set to negative 10." Oh, that's a... All right, so we're going to have a 0.5 in here. Let's take a look at that code, because I haven't seen it in a minute.

55:26Did counters... where does score happen? Ah, so the goal of this was to shove deleted comments down in the sort order, and then that kind of

...53gets a little weird here because it's a score, but there are zero flags. So if we came in here with a score of negative 10, we would have negative 10. This is just a plus zero. Well, this could be any number.

56:17That's unfortunate. So there's two ways now to get to a score of negative 10. I suppose the primary way is deleting the comment because that is way more common than a comment being so heavily flagged it gets shoved down to negative 10. That's a rare thing. It only happens a couple of times a year. Deleting happens almost daily, I would say. So DZ, how do we want to handle that? Since we don't want N to be negative, but we want to have high confidence.

57:03dzwdz this seems like a nice opportunity to fix the way the score is handled
dzwdz remove the self.score = FLAGGABLE_MIN_SCORE line
And you know this, so this is here because users can upvote it, but that's another bug. Oh my God. This is like programming in BASIC where you have to reserve your line numbers. "Remove the self.score = FLAGGABLE_MIN_SCORE line." I see why you're saying that, but what's going to sort these to the bottom? dzwdz pretty much yeah
You're saying just do it in calculated confidence? arh68 fractional line numbers, why didn't I think of that HahaThink
That seems actually a heck of a lot cleaner now that you mention it.

...51Fractional line numbers. Yeah, man, in the 80s. So for anybody who didn't get that reference, in BASIC programs each line used to have a line number, and you would just manually type in the line. dzwdz so that's 8 bugs already
And so occasionally what you would learn to do is write your statements as 10, 20, 30, 40, or 100, 200, 300 to give yourself room to interpolate more numbers. And I didn't do this here. dzwdz i wonder how you will give me 8 vip badges :p
And so that's why I'm into this. And it feels very nostalgic for me. So that's eight bugs already. I don't know. By my count, it's five, right? Oh, right. Yes. dzwdz ah 5 badges will be easy
I meant to make... I was going to do it as a... So I had planned that whole big speech thanking DZ for their deep contributions. But, arh, I'm sorry to say, I think they have leaped past you in terms of how many bugs they have found. Although this last one here, I suppose I found. So if we got rid of this, up here, we could just express what we want, which is return 0 if deleted. What was that code? Is deleted.

59:12And so if we handled it there, I can already see the migration for this it's going to take like 20 minutes to run so. dzwdz you can be nasty and just return 0 if n < 0
We want to say... we want to just get rid of this. So there are two ways to get to a score of zero, and this was thinking no votes. But the other way that's there is... hmm. The other way that's there is if you have the one submitter upvote and one flag, that also hits zero. So then that drops you to confidence zero, but then, if you get a second flag, this would be minus one and then be scored above. So that's just out of order.

01:00:09dzwdz this is actually in the original code too
dzwdz but it is supposed to check if n isn't 0
dzwdz because now you have a division by zero bug
So you're suggesting to be nasty, we could just say return zero if n is less than zero. But I don't think it's possible anymore.

...32What combination of, well, I guess if you had no votes and yeah. dzwdz no i was just thinking about the comments currently in the db
dzwdz that will have an n<0
So if it's deleted or you were thinking about the comments currently in the database. Yeah. We're going to have to recalculate all of those. This is going to definitely involve a migration. dzwdz at least check n=0
that will have an n less than zero. I don't think any of them should be able to now. Because we're going to say, we'll just bail out and give zero. At least check n equals zero.

01:01:22Neither of these can be... Oh, well, score could be negative. Man, I almost want to test. All right, so... All right, let's go ahead and say return zero if n is zero.

...53That is not a great error message. You know, error message error. dzwdz "Incorrect cached score"?
But at least it captures the spirit of it.

01:02:26ArgumentError, like, it is not literally true in the sense that these are arguments, but it is figuratively true in that, you know, every object has itself as its sort of global variable. So I'm okay with that class of error. "Incorrect cached score", yeah, but I would rather explain, you know, what do we think this is.

01:03:02dzwdz i wrote that before you wrote this one, yours is much better
So that's what, one, two, three of these bugs? Yeah. So let's see the really pernicious one. This one's funny. It's so subtle, because DZ pointed it out to me and I had to have them stop and go back and explain it. So if you look, we have this in the Python, which, okay, you know, I looked at this and I was like, all right, the parentheses nest a little bit differently, but they're doing the same thing. And then DZ pointed out, no, they don't. So let's add those same external parentheses. Let's match some of this spacing. So, okay, it's getting clearer.

01:04:05So the difference here... let's say Reddit on top so we can keep track. And this is a no-op, so I'm going to go ahead and make that, because over on the Python side they've annotated that to be a float, which is why they don't have to do that here. And I've changed it here. I will note that it needs to be. So maybe rather than change the Python, I will get rid of it from ours. But if you are eagle-eyed, you can see this is 1 divided by this sum of twice n times z squared. But that's not what it's supposed to be. It is supposed to be this, and then the z squared. This one's so painful. So I mentioned earlier that there was odd clamping math, with me seeing in production a range of comment confidences from 0.2 to 1.2. And that's because there's just your basic algebra order of operations bug where, no, this actually wants to be the exact same as Reddit. dzwdz i'd say that both of those functions are just not written that well - (z*z)/(2*n) would be much better
dzwdz right= is even worse
So it's not 1 divided by this larger number, it's 1 divided by this number and then multiplied by z squared.
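The order of operations difference is easy to see with numbers plugged in. This is an illustrative Ruby sketch: the helper names are hypothetical, and the values of `z` and `n` are arbitrary picks for the demonstration, not values taken from the site.

```ruby
# Hypothetical helpers showing the three spellings discussed above.
def buggy_term(n, z)
  1 / (2 * n * z * z) # z squared ended up inside the denominator
end

def reddit_term(n, z)
  1 / (2 * n) * z * z # divide first, then multiply by z squared
end

def clearer_term(n, z)
  (z * z) / (2 * n) # dzwdz's suggested spelling of the same value
end

n = 10.0 # arbitrary number of opinions
z = 1.28 # arbitrary z, for illustration only
puts buggy_term(n, z)
puts reddit_term(n, z)
puts clearer_term(n, z)
```

The second and third spellings agree up to floating point rounding, while the first gives a different number, which is how an unnoticed parenthesis can push the final confidence outside 0 to 1.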

01:05:55Yeah, it's possible that both of these functions could be written better. This is one of those places where programming notation, though inspired by math, does not exactly line up with math. Because he shows the formula as this, and then a big chunk of what happens is Reddit just called it... why did they call the right side the left? But they just referred to where it is in the equation physically, is it on the left or the right. Neither of those are especially meaningful names. And he gives the formula implemented in Ruby, and I am not stealing this, because this is just equally unreadable. And so, given my choice of unreadable Ruby functions, I am leaning towards the one with slightly less code churn. It's not a principled decision in any way.
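For readers without the formula on screen, here is a minimal Ruby sketch of the lower bound of the Wilson score interval, using the left, right, and under names mentioned on stream. It is not the site's code: the method name and the default `z` are assumptions chosen for illustration, and the early return for zero opinions follows the "return zero if n is zero" decision made earlier.

```ruby
# Sketch of the Wilson score lower bound for ups and downs.
# The default z is an assumed constant, not confirmed by the stream.
def wilson_lower_bound(ups, downs, z: 1.281551565545)
  n = (ups + downs).to_f
  return 0.0 if n.zero? # no opinions, and avoids dividing by zero

  p_hat = ups / n # observed fraction of positive opinions
  left = p_hat + (z * z) / (2 * n)
  right = z * Math.sqrt(p_hat * (1 - p_hat) / n + (z * z) / (4 * n * n))
  under = 1 + (z * z) / n
  (left - right) / under
end

few = wilson_lower_bound(5, 0)
many = wilson_lower_bound(450, 50)
puts format("5 up, 0 down:    %.3f", few)
puts format("450 up, 50 down: %.3f", many)
```

With these inputs the heavily voted item ranks above the perfect but thinly voted one, which is the Amazon ratings intuition from earlier in the stream.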

01:07:09Yeah. So this section is under, and then this is left and this is right, but then we're kind of glossing over the plus or minus somewhere along the way. Yeah. Yeah. albynton Hello!
dzwdz the +- is you deciding if you want the upper or lower bound
You almost wish that I could just take the LaTeX for this and paste that into the code, but I mean, Evan Miller here... Hey, albynton, welcome. dzwdz oh lmao i thought that was fat phonetically
Evan Miller here comes up with some bad variable names, like this is called p-hat, which I understand, that this is like p prime, but p-hat... It doesn't mean anything. It's just referring to his typographic convention, in the same way that left, right, and under do. Oh, you thought it was fat? Yeah, like it's an early 90s rap song. It's fat, yo. Was that referring to people? I don't know. I'm not going to bring up the Urban Dictionary for PHAT right now. Because, you know, Automod might slap me for bringing up obscenities. Automod is a little bit schoolmarmish, but I kind of tuned it that way because I couldn't see us getting into... you know, this is not an advice podcast with frank discussion of sexual topics or something. So I figured it was okay to leave it on a fairly prudish setting. dzwdz Potential Hail At Twilight
I didn't think it would be so prudish that it would get mad about rails. I understand this line. All right. So that is... That is, what, four of our five bugs down? dzwdz also there's a less sfw definition
And by five, we mean eight? In the same way that Hitchhiker's Guide to the Galaxy is a five-book trilogy, this is a half of our five bugs having done four? Yeah, let's not share the less-safe-for-work definition, please. We don't need to go there. All right, so... This calculated confidence, if you didn't look at the error message, what it's trying to do is generate a number from 0 to 1 that is a confidence in how good this comment is. albynton I agree with dzwdz, z*z/2*n would be more obvious in how to read the formula
And if lots of people are upvoting something, the value should be close to 1. If lots of people are flagging something, the value should be close to 0, or actually at 0. And we just want to put generally better comments first, where better is judged by the readers. And then we are going to... albynton, I think you are probably right, and I agree as well. It connected a second after I said it, but I am trying to not churn this code too much, because it's already painful enough to spelunk the history, and I don't see that it gets us to a place where this formula is super clear. There is something to be said for this expression of it in Ruby, that there is no attempt to decompose it, it's just one incredibly dense line. I mean, I appreciate that. Yeah, look, it is the code style equivalent of "you are not expected to understand this." albynton okay, that's understandable
And I would like to go back to treating calculated confidence like a black box, which I guess is one more reason that I'm trying to not edit a bunch of stuff and generally just leave it alone. A benefit we've actually had here by leaving it alone is we caught... DZ caught this particular bug because they compared it against the Reddit implementation and realized that this specific line was wrong. So I think mirroring their slightly clunky variable names has actually worked out well for us. If we had put this in and there was some small difference because I decided to rename p-hat, or a typo, or I added parentheses, or RuboCop added parentheses somewhere along the way, we would never have caught that. Not in a million years, right? Well, maybe. Because part of this came out of DZ noticing the domain for the function. The output was wrong. When I say domain, I'm definitely not referring to domain names. I mean in the mathematical sense of what is the range of possible outputs from the function. So that leads us to our next bug, this clamping to 0.2 to 1.2. Let's set that aside, because this has a couple of long lines. So there's this other method. I swear everything is cache invalidation. This other method, update score and recalculate, goes and fetches the... No, it doesn't go and fetch anything. It's intended to act in a very lightweight manner. When individual users upvote comments, this function should run once each, and it tries to do a single update query to the database rather than have the expensive round trip of doing multiple queries. If it were selecting the score and the number of flags rather than taking a delta, it would be a little bit more robust, but it would be a lot slower on a thing we do, I don't know offhand, I'll say thousands of times per day. dzwdz i'd try throwing an exception if the confidence isn't in like [-0.1; 1.1] too
And thousands of times per day is not a lot to do things, but it does tend to be very unevenly distributed throughout the day where someone makes a very good comment on a very popular post and we get a ton of votes all, you know, a half a second after each other rather than rather than them being evenly distributed throughout the day. That's one of those big things of any time there is user behavior, you cannot just divide by number of seconds because everything will clump up. Everything in user behavior is power laws. It's kind of like a couple of streams ago, we noticed that on the story hiding feature, there was someone who has hidden literally half of the stories on the site, something like 60,000 stories out of the 100-odd thousand stories submitted. dzwdz or maybe even some more margin for the floating point inaccuracies
Everything in user behavior is power laws. So DZ has made another good point, which is, as long as we don't have a lot of trust in calculated confidence right now, it would probably be good to make it a little more paranoid. So let's assign this to a temp variable. dzwdz and then return a clamped value
And Ruby is not one of those languages. Yeah, we can just say if.

01:14:38I never remember the range syntax. So if I said 0.1, is it include? dzwdz (i did see out of range values for correct code)
Okay. And then if I said 1.5, it would say false. If I said 1, 1 is true. But with dot, dot, dot, it's not. So I always have to double check these range kind of things. So I'm going to say if the range 0.1,

01:15:11That is, boy, Ruby, that could be clearer. There's so much going on there that I'm going to break it back out to its own line.

...28And this isn't... I don't know why I copied this. It's not that I need to know the syntax for raise. So I want to say raise, well, really, it's not ArgumentError. Implementation error, but... Oh, is there a... Is StandardRB catching that there's a nicer way? "Use Range#cover? instead of Range#include?." Okay. Why don't you correct that for me? No?
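The range syntax being double-checked here comes down to two dots versus three. A small Ruby sketch, with hypothetical helper names:

```ruby
# Two dots: the end value is part of the range.
def in_closed_unit_interval?(value)
  (0.0..1.0).cover?(value)
end

# Three dots: the end value is excluded.
def in_half_open_unit_interval?(value)
  (0.0...1.0).cover?(value)
end

puts in_closed_unit_interval?(1.0)
puts in_half_open_unit_interval?(1.0)
puts in_closed_unit_interval?(1.5)
```

`Range#cover?` only compares the value against the two endpoints, which is all a numeric bounds check needs.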

01:16:07dzwdz at least i think i did, not sure now
All right. You did see out of range values for correct code. Do you think it would be better to return a clamped value than throw an error? Boy, it shouldn't be possible. dzwdz i would raise an exception or at least log values that are "too" out of range
This is one of those places where I don't have a lot of trust in this code. dzwdz and then clamp it
And so I'm trying to write this post condition. However, I don't want to blow up production. And it is rude to blow up production. On the other hand, if I just log the error, we'll kind of never see it. So the two options are clamp or raise.

...58dzwdz if out of range and user == "pushcx": break site
Oh, the answer is raise. The answer is clearly raise, because one of the things we're going to do is write a migration to reload, recalculate the confidences for all comments on the site. And if none of our however many hundred thousand, half a million-ish comments break this, well, I've just built half a million pieces of evidence in favor of "it works." So I think I'm okay with raise. And if the migration cannot be run in prod, then we can come back and think about clamping.
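The two options weighed here, raise or clamp, can be sketched side by side in Ruby. Both method names are hypothetical, and the error class simply follows the ArgumentError choice discussed on stream.

```ruby
# Hypothetical postcondition: fail loudly on an impossible confidence.
def confidence_or_raise(value)
  return value if (0.0..1.0).cover?(value)

  raise ArgumentError, "confidence #{value} is outside 0..1"
end

# Hypothetical alternative: quietly pull the value back into range.
def confidence_clamped(value)
  value.clamp(0.0, 1.0)
end

puts confidence_clamped(1.2)
puts confidence_or_raise(0.5)
```

Raising turns a recalculation over every comment into a test of the formula, while clamping keeps production up at the cost of hiding the evidence.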

01:18:09dzwdz you're putting a lot of trust in the accuracy of float math
Yeah.

...18It's less that I'm... So DZ says I'm putting a lot of confidence in the accuracy of trust, or excuse me, a lot of trust in the accuracy of float math. I'm kind of not, because over here... Yeah, we have 518,000 comments in the database. If none of them throw an exception, I have a lot of confidence that this formula is correctly implemented.

...54And really, the fear here, I guess, is that the float errors are going to accumulate and either underflow or overflow. And it's... I don't think it's going to get rounded. I don't remember if float rounding biases in a particular direction. I almost said that a little more confidently than I needed to. No pun intended. Let me leave raise in. I think if there's an error, we're going to see it. All right, so many lines of code. This is a long line, but... Yeah, I'm almost tempted to put the variable assignment inline, because Ruby would let me, but no, that would be bad.

01:20:02I wonder if standard will let me do this. Nope. Yes? Because this is what I'm trying to say. If our output is reasonable, that's fine. All right. dzwdz that negation seems unnecessary
dzwdz could swap the branches
So with that in mind, confidence then shows up in SQL, and it shows up... "The negation seems unnecessary." dzwdz but this is such a nitpick
The exclamation point. "Could swap the branches." Oh, that's true. That might be a little clearer. Yeah. Standard would remove this as an implicit return. Yeah. So I'm just going to leave that off. It is okay to nitpick this code a bit, especially if I'm getting ready to move on and you're nitpicking just as I'm moving on. That's kind of the perfect time to do it. If you did it while I was implementing, it might drive me up the wall, but you're hitting a great amount of, and a great timing of, nitpicking right now. Confidence is in the range 0 to 1. What is this confidence order thing with the giant comment explaining it on top? And the answer is... I almost want to say the answer is our daily double, but I'm trying to get to this part at the end.

01:21:51In SQL, when we sort by things, we can sort by numbers, greater than, less than, right? Ascending or descending. But we're not sorting at one level. We have comments where, at each level... the comments are a tree. So let's... And do I have a spare? Come here. Please don't be too spicy. All right. So we want all of the comments that are on the same level of reply, like this first one and this one from pop-tart, to be sorted based on their confidence, but inside of them, these child comments do not get compared to each other when they are being sorted. And that's where confidence order becomes confidence order path. So you see this confidence order. MariaDB does not have an array type. And so to represent the tree of... and when I say tree, to represent the path of each comment's parent, grandparent... so like PopTart down here has a grandparent comment that is also by them, the sorting of this comment depends on the sorting of this comment. And we need to know the path, the unique series of all direct parents of the comment. And so I called that confidence order path. And confidence order here is the three-byte value that includes the confidence and tie-breaks a little, in a way we'll talk about in a minute, I suppose. But it has to do all of this... What's the right term here? All this kludging into and out of a string. And all of this conversion is because a string is an array of bytes, and I needed an array, and so I just used string. And there are... Let's see. DZ caught two bugs in this? Yeah, so the simpler one is where I pad numbers with LPAD. It is padding with the ASCII value of the character 0, which is going to be 0x30, instead of the ASCII value of the null character. So let's show that. So if I said select zero, I get the integer. Let's put a column heading on that. Say backslash zero, I get the null byte, right? So ASCII zero. So let's add to that. Let's LPAD. What's the order?
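The path idea described above, sorting a tree by concatenating each ancestor's fixed-width key, can be sketched in plain Ruby. Everything here is hypothetical scaffolding rather than the site's schema or SQL: the `Comment` struct, the helper names, and the exact byte layout (two bytes of inverted confidence plus one byte of id as a tie-break) are assumptions based on the description on stream.

```ruby
# Hypothetical in-memory stand-in for a comment row.
Comment = Struct.new(:id, :parent_id, :confidence)

# Three bytes: inverted confidence (big-endian, two bytes), then the low
# byte of the id as a tie-break. Higher confidence sorts first.
def order_bytes(comment)
  inverted = 65_535 - (comment.confidence * 65_535).floor
  [inverted, comment.id & 0xFF].pack("nC")
end

# A comment's path is its ancestors' keys followed by its own key.
def order_path(comment, by_id)
  parent = comment.parent_id && by_id.fetch(comment.parent_id)
  prefix = parent ? order_path(parent, by_id) : "".b
  prefix + order_bytes(comment)
end

# Sorting the byte strings gives depth-first thread order.
def thread_order(comments)
  by_id = comments.to_h { |c| [c.id, c] }
  comments.sort_by { |c| order_path(c, by_id) }.map(&:id)
end

comments = [
  Comment.new(1, nil, 0.5),
  Comment.new(2, nil, 0.9),
  Comment.new(3, 1, 0.99),
  Comment.new(4, 2, 0.1)
]
p thread_order(comments)
```

Because a reply's path starts with its parent's path, the reply with confidence 0.99 still sorts under its 0.5 parent instead of jumping ahead of the 0.9 top-level comment, so siblings are only ever compared with siblings.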

01:25:15So LPAD A out to two characters.

...28So the difference here is, am I getting padded with that or am I getting padded with null? And this becomes real obvious if you look at it with hex, which, if you're not specifically picking values that will show up nicely on screen, becomes obvious. And where this is a bug is, as I'm transforming it into a string, if instead of putting in 00 I'm putting in 0x30, comments with an especially low score in their confidence are going to be sorted above comments with low but reasonable scores. So to state it another way, if a story is at like minus one or minus... excuse me, if a comment has a net score of minus one or minus two because it's gotten a bunch of flags, it was getting sorted between comments that had a score of one or two, which is the exact opposite of what I want. So that was a rough one. There's something especially dispiriting about these single-character bugs, where it's literally, there is a missing backslash. dzwdz actually i wonder if mariadb has a function for this
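The same padding mistake can be reproduced outside the database. In this Ruby sketch, `String#rjust` is a stand-in for SQL's LPAD, and the helper names and sample byte values are hypothetical picks for illustration.

```ruby
# Left-pad a binary string to two bytes, standing in for SQL LPAD.
def pad2(bytes, pad)
  bytes.b.rjust(2, pad.b)
end

# Show a binary string as hex digits.
def hex(bytes)
  bytes.unpack1("H*")
end

low = [0x05].pack("C")        # a key that fits in one byte
mid = [0x10, 0x00].pack("C*") # a larger key that already fills two bytes

puts hex(pad2(low, "0"))  # padded with the character '0', byte 0x30
puts hex(pad2(low, "\0")) # padded with the null byte, 0x00
puts pad2(low, "0") > mid
puts pad2(low, "\0") < mid
```

Padding with the character 0 turns the small key into a larger one than `mid`, so it lands on the wrong side of it in a bytewise sort. Padding with the null byte keeps the numeric order.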
So let's see if there are any other instances, because this query, there are slight variations of it.

01:27:02dzwdz turning an integer into a short
And ag is doing something totally weird.

...12So I was tinkering with my Vim setup and I clearly got some kind of weird mode on ag setup where it was dropping my argument and we are finding a not very useful thing with this lever. Yeah, I can't actually change focus out of that window. That's no bueno. I will have to unbreak that later. dzwdz i think it does? https://mariadb.com/kb/en/conve…
So let's go over here and run ag. All right, so we only have a couple of instances. We have that one, we have the migration, the bitpacking spec, which is literally a test that should have caught it, and that's it.

01:28:08dzwdz ignore me losing the badge for a sec that's on me
You wonder if MariaDB has a function for turning an integer into a short. dzwdz BINARY(2)
Let's take a look at this. I'm not sure what this function is going to be.

...39I saw this function when I was implementing this. And I think there is a reason I used char instead. So char n.

01:29:13dzwdz oh wow now i have two badges
I don't remember what the distinction was between casting and using char.

...27Hmm. Yeah, DZ, you do now also have like a little... looks like a Cherry MX keycap. I don't know what the GLHF pledge is. I mean, good luck, have fun, but that's not a channel thing, and I'm pretty ignorant of Twitch. All right. dzwdz chat seems quiet today
dzwdz except me
And so the reason I didn't catch these earlier is the specs are testing something slightly different. These say char 0 using binary, and I used what I thought was a slightly shorter... Yeah, chat actually is really quiet. It could be that people are doing their actual jobs, because we're in the US work week. And so I didn't catch that when I changed to the shorter format. dzwdz also: re CHAR: the docs for CONVERT said that CHAR doesn't do the padding
That's kind of painful.

01:30:47And so these are a couple of golden master tests. "Docs for CONVERT said that CHAR doesn't do the padding." Oh yeah, it could be that CONVERT pads on the wrong side for us, I don't recall. So these golden master tests didn't catch it, because it's not like I did this math by hand, and so I just enshrined the wrong values. And I used three different things, where the output should be low for a high-value comment. dzwdz okay if it's the wrong endian it's hilarious lemme check
'Cause we've got a "sorts lexicographically middle" for a zero. If I had had an extra spec in here that, you know, it is slightly below middle for a lightly flagged comment, that might've popped out, but it didn't. And so these didn't touch that bug.

01:31:52And probably all of these should use the same syntax. So I want to say that. All right.

01:32:21"The docs for CONVERT said that CHAR doesn't do padding." "If it's the wrong endian it's hilarious." Yeah, that might be the kind of thing that showed up. All right. So I've changed a couple of things. Let's see if the specs pass. And I think they roughly will, because again, I tried to treat confidence as a black box rather than care about specific values that come out of it. These two fails, maybe four fails, maybe a golden master test or two. "Numerical argument is out of domain", square root. Oh, a divide by zero snuck in. Or I'm passing a negative along the way. Yeah, so this is PR 1308, that touched off a lot of this deep dive last week, actually touched on the bug. It's odd that it's off by 42. What is the hex value for 42?

01:33:592a? That doesn't pop out at me.

01:34:07Seeing 154 at the end. dzwdz yes no this is to be expected
All of the bytes changed? No, just the first two. Right, because the confidence is different. dzwdz and "yes no" is a confusing way to start a sentence
So that one is fine-ish, assuming those values are correct. Replying comment is not listed when it's on a story with a negative score. That one's surprising, and I'll come back to it.

...43Yes, no is a confusing way to start a sentence. This is a quirk of native English speakers, and

01:35:03There are a couple of these. I saw somebody make a chart. And I think it's also... one of the things there might also be kind of Midwestern. That saying "yeah, no" or "no, yeah" is a way of dzwdz guess i pass for a native speaker
acknowledging what someone said, where you are, like, superficially agreeing with somebody else's binary statement and then going on to disagree. It's... yeah, I'm from the Midwest, I talk like a Midwesterner occasionally. Let's figure out what's up with this square root. This horrific equation is occasionally giving zero or negative numbers. Math domain error, hmm.

01:36:12dzwdz oh hey i got that too
dzwdz i "just" had to catch exceptions
So if I didn't break any bit packing stuff, I'm going to go ahead and... No, I did, but it's just the one that we want the new golden master values for. So instead of that, it's getting 132, 154. I'm okay with replacing that. And then elsewhere, we're clear about the ID byte and things. So that's okay. Let's run that real quick. Just make sure I didn't typo. Yeah. And then let's close that.

01:37:02dzwdz brb
So once again, we're back to calculated confidence, but now with a fairly horrific test failure: this square root is catching a negative number. I say horrific because there are, what, one, two, three inputs, but there are like nine functions here, and so this way of breaking out the code by where it physically is in the equation is a little bit concerning. You know what? This is probably one more place where this upvotes equals self.score is probably just flat wrong. Let's grab the Reddit implementation again, because we can compare against that, and that's probably where the bad value is coming in, because score can certainly be negative. dzwdz back, did we figure out why sqrt is breaking yet
So I think what I would like to do is back out from where they have ups versus downs. I would like to back out to that from the values we have rather than implicitly do it. The number of upvotes is the score minus the number of flags. And the downs is the number of flags. Yes. And then here, we can just say that we're going into float times And then this comment can go away because what the code is actually doing is pretty clear. So this is just wrong. And this I called it ups. And that might be enough to fix this bug or these test failures specifically.

01:39:28Didn't mean for that alarm to be on. I have a little timer near here just to kind of... help me keep an eye on stream progress. And so I reset it for an hour or so, but usually I turn that off. I just keep it in the corner of my eye.

...51Oh, numerical argument is still out of domain. So dzwdz, we did not yet figure it out. What I thought was most likely was ups equals score, which jumped out at me because we don't actually have upvotes. So I was passing the wrong number there. It might've been as simple as that. dzwdz i looked at the actual formula and there's no reason it would be ever negative
arh68 what is N supposed to represent here ?
So let me restore ups and downs; we are matching the Reddit implementation by backing these numbers out of what we cache. What is N supposed to represent here? N is the number of user opinions. dzwdz you could assert 0 <= p <= 1
So in the statistic sense of we have N samples, we have each upvote or downvote, and downvote in our context is a flag, is one of the N.

01:41:02dzwdz i think this is the only thing that can be failing here
I could assert that 0, I don't know that it's p that is the part of this formula that's wrong.

...16Let's take a look. I mean, that's worth a shot.

...41arh68 so score already counts flags? as like a positive component ?
dzwdz @arh68 as a negative one
dzwdz score = upvotes - flags
So score already counts flags. Yes, score is a total memoization. It was intended to be exactly what the score is printed on the page. And then flags is how many times has something been flagged because we care about that as more come in for hitting the flagable min score.

01:42:07arh68 so we're like double subtracting ? lemme reread it
Oh, dzwdz, good guess.

...15So if p is less than 1, we're double subtracting. So I'm also kind of looking back and forth. dzwdz arh is right
dzwdz ups = score + flags
dzwdz wait no
p plus 1, 1 divided by 2n times z times c. I mean, I dzwdz nvm
Ups is score plus flags? No, it's not. Ups is score minus flags. I can increase font size if you need, just say so. The only difference I'm seeing is this set of parentheses, but that's just a safe, let's make sure we're doing order of operations correctly. If that affects anything, I'm going to die of shock. Yeah. So let's keep those there, because I do like that level of clarity.

01:43:33dzwdz wrong range
dzwdz p < 0, not p < 1
The other possibility here, besides... P is less than 0. You're right. I incorrectly translated your little double predicate there. We're still there. Well, let's just see what p is. Let's see if we are a little out or a lot out.

01:44:13dzwdz can you show the entire function on screen?
minus four. So that seems like a math error. I'm also suspicious of it being a round number. Yeah, I can show the whole function on screen. That makes me think that we have accidentally been doing integer math at some point. So let's change this to be more explicit. This is fine. Has to be a float. P, we say it has to be a float. And it should be anyways, because dividing by n, which is a float, would force it. So all of these should be floats. dzwdz this looks very weird
It is super suspicious that exactly 4 came out of that.

01:45:15dzwdz n = ups + downs = score - flags + flags = score
What value would I have to be putting in here for n and z to get 4? Actually, how am I getting a negative number? n equals ups plus downs equals score minus flags plus flags equals score.

...46dzwdz i think ups = score + flags
You think ups is score... Oh yes, you're right. Upvotes must be score plus flags. Yeah. Yeah, man. Okay. And the reason it was 4.0 is because all of these tests tend to deal with small numbers where there is one flag or one vote or two votes. And so all of the inputs were going to be one and two. All right. Let's see if the whole suite wants to run. At this point I'm expecting everything to pass except that one really odd one. dzwdz i think we did actually
We didn't address that other one, and I'm not seeing the connection between that and this unless something in the story scoring was catching the square root exception.
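The fix worked out above (ups = score + flags, downs = flags) can be sketched outside the codebase. This is a minimal sketch of the Wilson lower bound in Python rather than the Ruby on stream. The z-score constant and the names `confidence` and `ups_and_downs` are assumptions for illustration, not the actual Lobsters code.

```python
import math

# z-score for an 80% confidence level; assumed here because the Reddit
# implementation uses it. Treat it as an illustrative constant.
Z = 1.281551565545


def confidence(ups: int, downs: int) -> float:
    """Lower bound of the Wilson score interval, in the range 0..1."""
    n = ups + downs  # number of user opinions
    if n == 0:
        return 0.0
    p = ups / n  # observed fraction of positive opinions, 0 <= p <= 1
    left = p + Z * Z / (2 * n)
    right = Z * math.sqrt(p * (1 - p) / n + Z * Z / (4 * n * n))
    under = 1 + Z * Z / n
    return (left - right) / under


def ups_and_downs(score: int, flags: int) -> tuple[int, int]:
    """Back out vote counts from the cached columns: score = ups - flags."""
    return score + flags, flags


# Correct mapping: score 1 with 2 flags means 3 upvotes and 2 flags.
ups, downs = ups_and_downs(score=1, flags=2)
print(confidence(ups, downs))

# The buggy mapping (ups = score - flags) yields ups = -1, so p < 0 and
# the square root receives a negative argument.
try:
    confidence(1 - 2, 2)
except ValueError as error:
    print("math domain error:", error)
```

With the corrected mapping p always stays between 0 and 1, which is the invariant dzwdz suggested asserting.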

01:46:46dzwdz nevermind
You think we did actually what? dzwdz, I would really appreciate it if you could try to include more nouns, because sometimes I don't see chat for a second. And also the delay between when I speak and when you see it is pretty variable. Sometimes it's as low as two or three seconds. On some streams, it's as high as 20 seconds. And so it's really hard to know what you're referring to implicitly. Let's go look at this spec that's failing.

01:47:21I would guess this was accidentally depending on slightly bad data. So let's drop down to 104. So, replying comment. I would have expected an error in the other direction. Replying comment tries to not show you replies under certain circumstances where there are lots of flags happening or you are maybe getting into a back and forth with someone that's escalating in a bad way. I wonder if what's happening here is this function is a little more thorough, or one of these ifs at the beginning is allowing a story to appear because it's saying, let's see, so the story is fine. We don't expect reply R. R is going to be, so this reply to is a helper to create another comment, right? So it is visible on a story with a negative score. So this is trying to express that if a story is flagged, we do not see replies on it. It takes a lot for a story to hit a negative score.

01:49:30This is one of those places where with voting, It might have been nicer to model comment vote and story vote as separate, but they're in the same table.

...48It's not something I want to try to normalize here. I wonder if this was falsely passing because the wrong number was coming out of calculated confidence.

01:50:14I closed the terminal. Why did I do that?

...28one and zero so yeah the replying comment should have been visible and should previously have been visible this code is redundant now although this would never fire because of float issues. Well, let's check that. arh68 so is `p` a story, or a comment ? i can't tell
Yeah, but how about... All right, let's just trust float then. So P is a story or a comment, you can't tell. P is the parent comment. And R is the reply. arh68 so we're voting on p.story_id? why we passin that
Let's put that aside so I have a little more breathing room. I'll make this tall. And then I'll go ahead and open it up a second time. So I can get some of these factory methods on screen.

01:51:54So this is saying

01:52:13Let's see if that passes. It should. That's what the spec description claims? No. So the bug is actually up here.

...35It's trying to say that the story has a negative score. This looks like the spec was badly edited previously and broken, and then falsely passing. Why are we passing story ID? Because we're voting on the story. This is trying to knock the story down to a negative score, and the spec happens to know that update score and recalculate is going to get called on the story at some point, probably by the creation of other comments. And so if we had just reached into the database to change score, the test would become very brittle, as that test data would get overwritten depending on whether there are actually flags in the database or not. Yeah, so this is trying to say the story score should be negative. And if that's not passing, that's going to be the source of the issue here. It's not passing. Expected 1 to be less than 0. Why isn't it counting this?

01:53:50Well, I know what that value is. It's the 1.

...58Why? Hmm. So that should be calling the updateScore and recalculate method on story twice. Let's go look at that.

01:54:17And then that should go and count score. This hiding stuff, we won't get into that because it's the hiding code. This is all related to the hiding code. Okay, that's... Because the users have to also hide to... All right. So previously, what, three weeks ago on stream, I changed the way flags and hides upvote, and this test was falsely passing since then because...

01:55:10I want to track all of this freaking state.

...33Because this test is coupled so closely to the database, I have to do a ton of dzwdz since we've got sql on screen i'll mention that i've checked what CONVERT(1, BINARY(2)) does
dzwdz it returns "1\0"
setup for it, and now it's getting kind of burdensome. But this should have broken weeks ago and didn't because of that bad edit related to what was getting memoized. CONVERT one to BINARY two does... okay, so it's either big endian or it's putting it in the wrong place, but dzwdz notice how that's an ascii 1
Yeah, that's painful and you've figured out why I didn't do that. What would be really good is if you could figure out a good place to document that so that nobody comes back and looks at that again.
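What dzwdz reports for CONVERT(1, BINARY(2)) is the decimal text of the number padded with a NUL byte, not the integer as two raw bytes. The sketch below emulates both results in plain Python to make the difference visible. It does not call MariaDB, and both function names are made up for illustration.

```python
# Emulates the two results being contrasted; this is plain Python, not a
# call into MariaDB.

def convert_to_binary(value: int, width: int) -> bytes:
    """Mimic the reported behavior: decimal text, right-padded with NULs."""
    return str(value).encode("ascii").ljust(width, b"\x00")


def pack_big_endian(value: int, width: int) -> bytes:
    """What a bit-packing column actually needs: the integer's raw bytes."""
    return value.to_bytes(width, "big")


print(convert_to_binary(1, 2))  # ASCII "1" followed by a NUL byte
print(pack_big_endian(1, 2))    # a zero byte followed by byte value 1
```

The first result starts with byte 0x31 (the character "1"), which is why it sorts nothing like the number 1 packed into two bytes.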

01:56:27So...

...34Now does this spec... Oh, pass. What are you mad about? What syntax error?

...48Oh, you expect this to say story?

...57Notice how that's an ASCII one. Oh, yeah, that's especially not what was desired. Why is the score not going negative? Update score and recalculate should have fired. Come here. Wrap less. So there should be a hidden story created recently where the user flagged and didn't comment.

01:57:47I am tempted to xit this spec, just comment it out for now and come back to it, because this is a broken test from the other thing. And since it's kind of a rabbit hole here, I'm going to just go ahead and do that if I haven't gotten it in a minute.

01:58:25I'd rather come back to it and focus on it than keep kicking it around. Because it's correctly failing now.

...50I need to see the whole suite run because it's been a minute, and I expect the whole suite to run, especially because I marked that spec as pending. This almost certainly will be all green dots, but I'm mostly running it to make sure I didn't, like, typo and break the code in some subtle way, or accidentally hit dd on a line when I was trying to scroll. Fine. So we were at

01:59:24fixing this lpad, and we fixed it. I'm going to sort these again a little different. So this is just another arithmetic error. And either I already corrected this instance of it, or it wasn't incorrect.

...56Yeah, dzwdz, I think... you are correct that it should be 2 to the 16th minus, and I didn't have 2 to the 16th minus 1 in here as the value. So that part is fine. So that, we're down to only 7 out of 5 bugs.

02:00:26Yeah. So there's this scaling in here of this dzwdz wait, can you repeat that? because that didn't sound correct
dzwdz the 65536 thing
minus 0.2. Ah, this whole stream I've said 0.2 instead of minus 0.2; I misremembered. This was me scaling for values I saw in the confidence column in the production database. And so if I grab the min and the max, and this is going to be like 1.3 because of one of the other changes to the confidence value. 1.38. You were asking if this value should be 65,536 or 65,535. I believe 65,536 is the correct value. And you had said you did a couple of tests. I can scroll chat up a little.

02:01:32If I subtract, it's not monotonic. For a confidence of 0, I would get all 0 instead of all f. So I was going to work this. I was going to pull this confidence stuff out, or the this minus 0.2, 0.2 clamping out, because it's dead code now. dzwdz if nothing is subtracted, you're left with 0x10000 which is three bytes long
So if we have the confidence, so confidence is going to be a value like 0.8 for a highly voted comment. Let's multiply it to put it into the range.

02:02:27Yeah, if I'm at 0 flat,

...36You're right. You're right. That is a bug. I wrote it backwards here. Or I wrote it correctly and then we've jumped through enough things that I misread on the second read. Where else did that value appear? In the bitpacking spec. Which would be a great place for a test, wouldn't it?
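The 65,535 versus 65,536 question can be checked in a few lines. This is a sketch under the assumption that the two confidence bytes hold an inverted, scaled value, so the highest confidence sorts first in ascending byte order. The function name and the `ceiling` parameter are hypothetical.

```python
MAX_TWO_BYTES = 65535  # 2**16 - 1, the largest value two bytes can hold


def confidence_bytes(confidence: float, ceiling: int = MAX_TWO_BYTES) -> bytes:
    """Invert and scale a 0..1 confidence so ascending byte order puts the
    most confident comment first."""
    scaled = ceiling - int(confidence * ceiling)
    return scaled.to_bytes(2, "big")


print(confidence_bytes(1.0).hex())  # highest confidence -> smallest key
print(confidence_bytes(0.0).hex())  # zero confidence -> largest key

# With 65536 as the ceiling, a confidence of 0 subtracts nothing and leaves
# 0x10000, which does not fit in two bytes.
try:
    confidence_bytes(0.0, ceiling=65536)
except OverflowError as error:
    print("does not fit:", error)
```

Python raises on the overflow. A database expression may instead truncate or spill into a third byte, which is the failure dzwdz describes in chat.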

02:03:27Let's put this one last because it's.

...45dzwdz are those 65530 now
Yes, it's a typo. Good reading. I must have just. I must have somehow hit it wrong on the first one, and then I used period to repeat my command in vim, so I propagated the typo. So this one is actually going to be hard to test with scaling.

02:04:12So scaling is the other bug, and it just needs to get ripped out of all of these. It's not a bug, but it was trying to work around a previous bug, and I didn't catch that that bug existed.

...41So all these need just some string editing to drop that. This one was kind of frustrating to see because I had if I had thought about what confidence was instead of treating it as a black box, I would have realized that there was a fairly. These are going to change. This isn't a zero score comment, this is a zero confidence comment.

02:05:27It's a little concerning. I think I'm going to struggle to get these. Yeah, so a low for a heavily flagged comment. So this is the one that's going to be close to zero. There's no way I'm going to get these kinds of values out again. let's hold on let's check this so if i have a comment with a score of 10 and flags of zero and i ask what's your confidence what's your calculated confidence

02:06:30I get a nice high value. And if I say way more upvotes, I get a very high value. That's good. And if I say the score is bad, like minus two, because you have three flags, we get a very low number. That's good. And if I say the score is minus 10, because you have 11 flags, you get a very small value. So these numbers coming out of calculated confidence are directionally correct. And so the kind of value that I would see for a comment with one flag, 0.164, all right. And all of these specs are going to arh68 the test could instead like, order the 4 kinds o comment
fail, because there is no way I am guessing these correct output values. And that's one of the dangers of a golden master test: you have to have a lot of spare brain power to think about what all of these different values mean, or it's very easy to get a false sense of confidence out of them, a false pass. All right, which one of you has the wrong number of parentheses? First and third. arh68 but ya byte matching pretty specific
Test could instead order the four kinds of comments.

02:08:05Yeah, I think one of the things that I'm going to do for a comment test is write some tests of calculated confidence. Because if I pull up the comment spec, There is literally one test that implicitly mentions confidence, and that's confidence order path. Calculated confidence has never had any specs on it. Oh, I wanted to highlight, we had the order of operations bug.
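arh68's suggestion, asserting the relative order of a few kinds of comment instead of exact bytes, might look like the sketch below. It is Python rather than RSpec. The z-score, the four sample comments, and the function name are assumptions chosen for illustration.

```python
import math

Z = 1.281551565545  # assumed z-score, as in the Reddit implementation


def confidence(score: int, flags: int) -> float:
    """Wilson lower bound from cached score and flags (ups = score + flags)."""
    ups, downs = score + flags, flags
    n = ups + downs
    if n == 0:
        return 0.0
    p = ups / n
    spread = Z * math.sqrt(p * (1 - p) / n + Z * Z / (4 * n * n))
    return (p + Z * Z / (2 * n) - spread) / (1 + Z * Z / n)


# Four hypothetical kinds of comment, listed from best to worst.
comments = {
    "heavily upvoted": confidence(score=30, flags=0),
    "freshly posted": confidence(score=1, flags=0),
    "one flag": confidence(score=0, flags=1),
    "heavily flagged": confidence(score=-10, flags=11),
}

ranked = sorted(comments, key=comments.get, reverse=True)
assert ranked == list(comments), ranked
assert all(0.0 <= value <= 1.0 for value in comments.values())
print(ranked)
```

A test in this shape survives small floating point shifts like 154 becoming 153, because it only pins the ordering and the 0 to 1 range.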

...46The one that dzwdz caught here with left. This was in the code from the very first version it was implemented in. The parentheses were always in the wrong spot. arh68 LUL i mean i assume it's intentional at that point
So this slightly changed the range of the output from 0 to 1 to, what was that, minus 0.2 to plus 1.2. It's intentional at that point? No, no. I do think it was always a bug. And it's a little frustrating when you see those kinds of, oh, this is a 12-year-old bug. dzwdz it worked surprisingly well fwiw
But that was.

02:09:35So rather than do bbytes, it worked surprisingly well. Yeah, in part because that particular error dzwdz i wish i understood the formula well enough to understand why
just sort of broadened the output range, or it had the effect of basically broadening the output range rather than reordering comments. It was lucky in that way. So as long as I'm breaking these specs, I was scrolling down to peek at these and lost my train of thought here. There we are. byroot had a nicer way of checking this by saying, well, let's expect the .bytes coming back, which Standard wants to, bam, don't fight me. We can test as an array, which gives nicer outputs. I didn't want to have that code churn of touching it in all of these tests, but as long as I am touching all of these tests and getting syntax errors, come on. Making easy syntax errors. It's funny, I slept well last night. Usually when I start doing that kind of thing, I think I'm tired.

02:11:22You say you wish you understood the formula enough to understand why. I think it's just because p was in the denominator and so it has the effect of scaling things.

...36I have an error in my SQL syntax. That's going to be too many closed parentheses. So this one, this one. Let's just run this one individual spec and look at it.

02:12:04So this is the This one is spurious. dzwdz it only affected one part of the sum though, so it didn't just scale the result
I was just kind of counting the number of parentheses, and I was like, all right, there's definitely one extra close, then open. So this, if I run this one spec, it should at least be, it'll be the wrong value, but it wouldn't be a syntax error. Using binary two. So take the output of char.

...50What am I getting wrong here?

02:13:07I removed the wrong parentheses. That's what it is. It should have been this one. It's char using binary to get the right char set, right?

...26Good.

...37So let's run those four specs again. It's not dash n. I don't know why. It must have been dash n with minitest, and I imprinted on it like a baby bird, and now I never remember that it's dash e in RSpec.

02:14:03This one, I'm putting in the wrong value here.

...18A couple more Fs. I mean, I'm taking an F on arithmetic. Why not these? All right, so that's good.

...37And this one was 254, 4351.

...44Zero score comment. We're alone. We had 531. Zero score comment. You know, it's not middle. Because a zero score comment is actually pretty negative. would just say for a low score commented because what i really want to test is that i'm seeing reasonable values all of these let's go ahead and run those again we should just see this one spec failing yeah 137.55 i just kind of wanted to see something roughly in the middle of the byte range that's good

02:15:57See if the whole suite's green. I think there might have been one more spec failing in bit packing that we didn't touch because I focused in to just have those, with the bytes 154 changed to 153. Yeah, that's the kind of fiddly floating point math error. Our floating point math very slightly changed. Or I typoed. 149. Yeah. So rather than 154, it changed to 153. I don't even want to think about that one. But I should. So this is the initial value. And by default, you should be roughly in the middle. That seems fine. Good. So one of the reasons I wanted to drive through all of the bugs that dzwdz found, and then I guess the extra bonus bugs that we found and tracked in the joy of basic line numbers, was to see how big the diff was. Is this going to be shotgun surgery where I'm touching a ton of code points? Or is this going to be just confidence in two or three things? And this has actually landed much smaller than I thought. So there were the many fixes we made in the formula here. And then one small fix in... I didn't remove that clamping.

02:18:00Then I have a point.

...35Kind of eyeballing these parentheses so I don't have to play whack-a-mole with syntax errors again. This one's here. This one's here. That one's there. And then that one's the first one. Okay, I think that's reasonable. We'll let the specs tell us if I avoided them. So I was wondering how big this syntax error, something, something failed. If it's that 153 going back to 154, I'm going to scream at the sky a little. Oh, it's that same spec. Because I fixed the scaling, or I removed the scaling that was here. So that's actually, I could have predicted that if I was thinking ahead. And so it said the new default values are 159 and 30. I scrolled up too far, and I got the wrong syntax on the output. There we go. So the suite is green. And the diff is not so bad. A lot of the length is trying to be more explicit about this so that the differences between this and the Reddit implementation pop out a little bit more. One of my concerns with this that came out of seeing dzwdz's very big bug report was there were so many bugs that I was wondering if confidence order path was now too clever. And so I wanted to try fixing these bugs to see, do we get a small-ish diff out or do we get something sprawling? And getting something small-ish or at least right-sized, you know, the Goldilocks diff. There we go, there's a stream title.

02:21:07Is that... I was wondering if this confidence was really worth it at all. pushcx https://github.com/lobsters/lob…
And I left a longer comment about this on dzwdz's bug, which I am going to throw into the chat again, just in case anybody who's joined in the last hour or so that I've been bug fixing hasn't seen it. But I kind of mused in there, does this point to all of this stuff with confidence being overkill? Even though I could fix confidence, and I have, we've worked through it in enough detail and we've seen enough, and we had those ideas for, oh, I could add a couple more tests to comment to make sure that we're seeing smallish numbers for low scores. Basically this confidence order, but tested at the level of comment where things are in the range zero to one.

02:22:18What I had informally tested at the Rails console It had me wondering just how valuable is confidence? Is it really giving us more than sorting by score? So this Evan Miller article, which if you weren't online in early 2009, this article was so dang influential because basically everybody did what he describes as wrong solution number one or wrong solution number two. These were pretty ubiquitous. And I almost am tempted to go back to what he describes as wrong solution one. dzwdz notice how we don't have comments with 600 upvotes and 400 flags
Because if the score is positive ratings versus minus negative ratings, yeah, it's more than just having 600 upvotes and 400 flags. Which it's very strange to me now looking at Reddit, because when I joined Reddit, oh, forever ago, I would have to look at my, we can look at my user account. Does it have the join date or maybe it's in the JSON? Redditor for eight years. Oh, this is my second one. I originally started under my maiden name. I was using that as a username for a while before I settled on pushcx. So yeah, 18 years ago. There we go. When did... Oh, December 28. I must have done it on Christmas break. This seems like a filler time. There is no way I joined exactly zero seconds after UTC midnight. I was at a meetup at the Google Chicago office. They had exactly two developers, because for a long time Google didn't want to hire outside of the Googleplex for developers. They had a very common Silicon Valley arrogance that all the best coders have already moved to San Francisco because they pay more, things are more challenging. And so Google didn't want to hire anywhere outside of... where were they? Wherever their first headquarters is. I can't recall the exact city name offhand. And the more charitable version of that is that they felt that they could lure all those best coders there themselves, and doing so would allow them to have a more cohesive engineering culture. But then at some point they admitted they scaled out of it, or there were a couple of exceptional coders that they wanted to hire. And in Chicago they wanted to hire, I think it's Brian Fitzpatrick and Ben Collins-Sussman. Both of them were, again, this is a mid-2000s thing, especially famous for their work in open source at the time. And so Google very quietly let them be the two exceptions in an otherwise sales office, and then opened a larger office where they were going to start hiring. And so they held an engineering kind of meetup there.
And I ran into Aaron Swartz, who's one of the three Reddit co-founders. And he was like, you should check out this site I'm working on. And eventually it just became a habit to visit. dzwdz rip aaron
And I don't know, six months later, I finally signed up for a comment or to upvote some story, which is a very, very long and rambling way of saying it blows my mind looking at the Reddit homepage now and regularly seeing comments and stories with tens of thousands, hundreds of thousands of votes. Yeah, it is tragic that we lost Aaron. And lobsters is nowhere near that scale, deliberately. But more than not being at that scale, the codebase originally started with upvotes and downvotes. But programmers really love to disagree with each other. And something very wonderful can happen when people disagree productively, where they're trying to understand each other better and learn from each other's different experience or figuring out, have we seen different things? Do we weight our experiences differently? Do we judge these risks to be more concerning or less concerning? What is it that... has caused us to have different opinions on, I don't know, C++ versus Rust. And it's very possible to have those kind of deep conversations and productive disagreements. But when there are downvotes or thumbs up and thumbs down, it's very tempting to just click downvote on someone you don't necessarily that you disagree with or just dislike or just don't want to deal with and then move on. 
And that's fine individually, but collectively what happens is the submitter says, oh, well, I put all this time and thought into a comment i wrote 300 words maybe i should make a blog post out of that and then they go look at it and it has a score of zero which is especially true on a small young site like lobsters was for its first couple of years and that feeling sucks that's like oh oh my opinion is worth zero well i guess i'm not going to leave another long comment again and we kind of have seen this effect on reddit where some of this is lowest common denominator and humor humor is the lowest common denominator online but there's not a lot of point in writing a substantive comment if you are going to get a very shallow response and so lobsters has changed over the years to remove the downvote button that it used to have the there's a little fossil in the css if we inspect this This is called the voters, plural, even though there is only one voting control. This is called the upvoter because there used to be a downvoter. This stuff is still there. You know, it's just beneath the surface. And flags. So downvote moved over to the right here and it became the flag button with the menu, which used to be separate. Or maybe it was just on stories. I don't recall off the top of my head right now. And it's encouraged us to have really long, thoughtful, actually, it's funny. It's lucky. This story I happen to have where someone said, expressed a experience. They express their opinion based on their deep experience. And someone asked why honestly, and said, Hey, yeah, it mostly worked pretty well for us. And then they left a very long substantive comment as opposed to If Zodvik had come on and just said, well, that's not my experience, downvote, Pop-Tart is not going to leave this long, thoughtful comment. And if you look at the formula that comes up, so, you know, wrong solution one is sort just by score. 
But if you look at this comment and the number of downvotes is pretty much always zero, which is almost true in our code base. So this is, I'm using a data here that I dumped from production. That's what we've been seeing. And so that's why I'm not like running SSH before I poke the production database. So let's say... And then let's limit the scores, because this is gonna be very vivid with just a cap. dzwdz -31 holy shit
Where score less than or equal to 10. there's a very strong fall off that happens here, but what's even more vivid is if we instead look at the number of flags on comments minus 31 yeah that's. It is possible to get more flags than you do upvotes if people flag you and you don't hit flagable min because... And then people later come along and remove their upvotes. This is why I'm careful about worst of lists. I don't want to celebrate what these worst five comments are by finding their IDs on stream or anything.

02:32:11dzwdz that's at least 21 people that removed their upvote though, wow
So we saw earlier that there were a little over half a million comments in the database, and 484,000 of them have zero flags. So for 484,000 of them, that confidence function is just a really long way of saying how many upvotes do you have. Are we getting much out of that? You know, this, what is this, arh68 this could be a lookup table lol
arh68 small domain, it seems
10% of comments, maybe, because that's something like 60,000 out of 540,000. Are we really getting much by sorting that 10% of comments better? This could be a lookup table. Well, but confidence isn't just number of flags. Small domain. Yeah, it is. There are not a lot of unique things. And so especially, like dzwdz said, we do not have 600 upvotes and 400 flags. We have 600 upvotes and zero flags. And so I was wondering, like, is all of this stuff with confidence worth it? Do I have confidence in confidence? Are we getting any signal out of here? If there are only 10% of comments that would be sorted differently, should we just remove the complexity and use score directly? Which is another way of saying, do we want to directly implement the thing that is very explicitly labeled wrong solution number one in this highly influential 15 year old post? There's a benefit to having simpler code. So many of these bugs would have, well, you know what? arh68 worse is better ? HahaHide
Five out of our four bugs. Oh, didn't mean to minimize that. Five out of our four bugs would have disappeared. Hmm. Actually, I think that's a pretty good title.
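The claim that confidence is "just a really long way of saying how many upvotes do you have" for unflagged comments can be checked on paper. Assuming the usual Wilson lower bound from the Evan Miller article, with n votes, zero flags (so p = 1), and a fixed z > 0, the square root term collapses:

```latex
\[
\text{confidence}(n, 0)
  = \frac{1 + \frac{z^2}{2n} - z\sqrt{\frac{z^2}{4n^2}}}{1 + \frac{z^2}{n}}
  = \frac{1 + \frac{z^2}{2n} - \frac{z^2}{2n}}{1 + \frac{z^2}{n}}
  = \frac{n}{n + z^2}
\]
```

That expression increases strictly with n, so among comments with no flags, sorting by confidence gives the same order as sorting by upvotes. Only comments with at least one flag can end up ranked differently.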

02:34:33If you ever hear me ramble on and say something that is especially pithy or funny out of context like these, please do suggest titles for these streams. dzwdz lmao that's a good one
And I am totally stealing this bit from, I think it's Money Stuff, is the column by Matt Levine, which if you're interested in finance and business, he's very funny about those topics in a way that usually they are painfully dry.

02:35:06So that got me to, do we just wanna use score? And I look back at that, the diff. Do I feel good about these changes? What is the risk that we touch calculated confidence at some point in the future and break it? Oh, that's so high. So many of these edits were really painful and we had to add like five what one, two, and then previously we had like three or four more assertions, depending on how you count, whether you think of the range as one or two. dzwdz i'm wondering what would be a good way to quantify the change in sorted order
Oh, and then, you know, more up here, even adding all of this guard stuff, this is brittle code and it has tests that don't give us high confidence because a golden master test like these, They tell you when something changed, but they don't tell you that the value is good in the first place and the value was not good in the first place. So a bunch of these bugs slipped in. And whenever I have bugs, I like to stop and think like what test could I have written that would have helped me find this bug immediately to never have written it in the first place. And there are so few seams and the outputs of these are dzwdz you also still haven't mentioned the major benefit of freeing up a byte
dzwdz for a two byte id
opaque numbers, as opposed to "the score is three, the score is two, the score is minus five." Like, that's very obvious. The score is... oh no, the test would be "the confidence is 0.142," or it would be, you know, these two random bytes. I almost can't write tests that give me more trust in the code.

02:37:10Yeah, DZ brings in another point. Yeah, I made this in your issue, but there is another benefit if we don't do confidence. Although at this point, we could get that byte back. So confidence is two bytes. And you can see it here as we transform the 0 to 1 number into 0 to 65,535. And the reason it gets 16 bits, even though there are so few values... ARH, as you noticed, this could basically be a lookup table. With so few unique values for score, occasionally we have to tie break. And the third byte on confidence order is the low byte of the comment ID. And it's the low byte to try to say, well, let's tie break in favor of the earlier comment. But we don't really need to worry too much about it being perfect, and a little rollover doesn't hurt. albynton It would be interesting as well to see which proportion of top-level comments would have their score changed with a simpler formula, since child comments aren't ranked this way (if I got it right)
And the reason it doesn't hurt so bad is we see roughly one byte worth of comments posted every day as seen on the previous stream. Like we're around 200 to 250 comments most days. dzwdz score wouldn't be changed
dzwdz only the relative position to other comments
So rollover doesn't happen most days and it would have to happen on sibling comments that have identical confidence, identical score.
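
A minimal Python sketch of the three-byte layout described here, not the Lobsters Ruby/SQL itself. The function name and the rounding are assumptions for illustration, and the real code may invert the value so that higher confidence sorts first. It shows the 0 to 1 confidence scaled to 16 bits, the low byte of the comment ID as tiebreaker, and what rollover looks like.

```python
def confidence_order(confidence: float, comment_id: int) -> bytes:
    """Pack a 0..1 confidence and the low byte of the id into 3 bytes."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be within 0..1")
    # Scale by 65535, not 65536, so confidence == 1.0 still fits in 16 bits.
    scaled = round(confidence * 65535)
    # Big-endian, so comparing the bytes compares the numbers.
    return scaled.to_bytes(2, "big") + bytes([comment_id & 0xFF])


a = confidence_order(0.142, 1000)
b = confidence_order(0.142, 1256)  # 256 ids later: same low byte
print(a.hex(), b.hex(), a == b)
```

Two comments with identical confidence whose IDs are a multiple of 256 apart get the same three bytes, which is the rollover case the tiebreaker cannot resolve.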

02:39:00albynton, you have a really good comment I'll get to in a moment. And confidence order is only three bytes wide because, as I said, it becomes confidence order path, which is an array that kind of traces through all of the direct parents of a comment. And if that becomes too long, we lose the performance benefits of all of this, because the row becomes so wide that MariaDB is internally switching over to some other strategy or having to allocate more RAM to sort these queries. And so it blows up. So we really would love to keep it in this two to three byte kind of range. So, albynton, you say it would be interesting to see what proportion of top-level comments would have their score changed with a simpler formula, since child comments aren't ranked this way. What's happening is all comments are ranked this way. However, they are effectively ranked only among their siblings. So all of these top-level comments are ranked together, in this case, Federico and Pop-Tarts. And then if Pop-Tarts' comment here had multiple replies, those replies would also be sorted by confidence.

02:40:35So a lot of the complexity of this is having sorting that happens at each level of depth. And if you've cut your teeth on SQL, you know that SQL and trees don't mix very well, and that's why there's all of this very complicated stuff smooshing things into strings and building up the array recursively. We never landed on the comment sorting query, but it leans on this. So there is a bug lurking here. albynton Oh, I see, i missed the recursive part. Makes sense
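
The path idea can be sketched in a few lines of Python. This is an illustration under assumptions, not the Lobsters query: the `Comment` class, the `order` field, and the sample keys are hypothetical, and here a smaller key sorts first. Each comment's sort key is its parent's path followed by its own fixed-width key, so one flat sort yields the nested tree order.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Comment:
    id: int
    parent_id: Optional[int]
    order: bytes  # hypothetical fixed-width per-comment sort key


def tree_order(comments: List[Comment]) -> List[int]:
    by_id: Dict[int, Comment] = {c.id: c for c in comments}

    def path(c: Comment) -> bytes:
        # A comment's path is its parent's path followed by its own key.
        if c.parent_id is None:
            return c.order
        return path(by_id[c.parent_id]) + c.order

    # Bytes compare lexicographically, and a prefix sorts before anything
    # that extends it, so every parent lands directly above its subtree.
    return [c.id for c in sorted(comments, key=path)]


comments = [
    Comment(1, None, b"\x00\x10\x01"),
    Comment(2, None, b"\x00\x05\x02"),  # sorts before comment 1
    Comment(3, 1, b"\x00\x20\x03"),
    Comment(4, 1, b"\x00\x08\x04"),     # sorts before its sibling 3
    Comment(5, 2, b"\x00\x01\x05"),
]
print(tree_order(comments))  # [2, 5, 1, 4, 3]
```

This only works if sibling keys are unique: two siblings with identical keys produce identical path prefixes, and their replies can no longer be told apart.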
If we get the same confidence values for two comments, which is very common, especially around low scores, we have to lean on this tiebreaker. And a problem when we lean on this tiebreaker is that this is the path to each comment. We are using the score to sort, not the ID to sort. And we're trying to get these comments out nested in a tree order, sorted at each level of depth by their confidence. So let's say Pop-Tart had a sibling comment here. Let's see. No, let's say Zodvik had a sibling comment here. And I'm going to go ahead and say that this comment has a score of two and zero flags. That's a really common value. If we had another comment right here that also had a score of two and zero flags, it would have the exact same confidence value. And I hope we've spent enough time beating up calculated confidence that that's obvious. And that's where that third byte, the ID, comes in. However, we're looking at the low byte and it can roll over. Comments are open for, I think, 90 days on stories. So someone who comes back roughly this time tomorrow and leaves a comment that gets one vote actually has pretty good odds of producing a story or comment with a duplicate value to Zodvik's comment. And then when Pop-Tarts comes along to reply, it's not clear which comment is its parent anymore. When there's that kind of collision, it is possible to see child comments get sorted under the wrong parent. And that's very bad. And confidence order has already had a whole bunch of edits to try to minimize that collision chance. That's exactly what this was for. And the math I removed, with that minus 0.2 and times 1.2, that was attempting to say, oh, you know, I've seen that there are a bunch of duplicate confidence values, but I don't totally understand why I'm seeing that. Because again, I didn't twig to the fact that that was telling me that confidence pulled out incorrect values. It should have always responded within the domain of 0 to 1, right?
And so I said, well, let's use our full range as best as possible to try to avoid collisions. And that helped, but it didn't get rid of it. And so at this point, with the modifications... when I first released this tree code, I want to say it happened once a day. That was pretty lousy. Very confusing. A lot of bug reports. And they're very frustrating bugs because they vanish the moment somebody votes on one of the parent comments. And it's weird to see a child move because a parent got voted on. It's very unintuitive. So one of the benefits of using score instead of confidence would be, well, what if that part was one byte instead of two? That would mean, if I kept confidence order three bytes wide, instead of having two bytes of confidence and one byte of ID, I could flip that: have one byte of score and two bytes of ID. And if there are... Where's my keystroke? If there are roughly 250 comments a day, and it takes 65,536 comments to roll over two bytes, it is roughly 262 days before we roll over.

02:45:57but stories are only commentable for 90 days. And so we have gotten rid of the possibility of collisions and no child will be left behind. No child will be sorted under the wrong parent because the parent had a sibling that also had one upvote or two upvotes. dzwdz this would be /fun/ to debug if someone used this codebase to run another much larger site
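
The rollover arithmetic and the collision can be checked with a short Python sketch. The helper names and the sample IDs and values are hypothetical; the 250 comments per day and the 90-day window are the rough figures quoted on stream.

```python
COMMENTS_PER_DAY = 250    # rough figure from the stream
COMMENT_WINDOW_DAYS = 90  # stories stay commentable about this long


def days_until_rollover(id_bytes: int, per_day: int = COMMENTS_PER_DAY) -> float:
    """Days of new comments before the stored id bytes wrap around."""
    return (256 ** id_bytes) / per_day


def key(level: int, comment_id: int, level_bytes: int, id_bytes: int) -> bytes:
    """Sort key: `level` (confidence or score) followed by the low id bytes."""
    id_mask = (1 << (8 * id_bytes)) - 1
    low_id = comment_id & id_mask
    return level.to_bytes(level_bytes, "big") + low_id.to_bytes(id_bytes, "big")


# Two sibling comments 256 ids apart with identical confidence/score.
first, second = 540_005, 540_005 + 256
old_collides = key(9306, first, 2, 1) == key(9306, second, 2, 1)
new_collides = key(36, first, 1, 2) == key(36, second, 1, 2)

print(days_until_rollover(1))      # about one day with a 1-byte id
print(days_until_rollover(2))      # about 262 days with a 2-byte id
print(old_collides, new_collides)  # True False
```

With two ID bytes the wraparound period is longer than the commenting window, so two siblings on the same story cannot share a key unless the daily comment volume grows by roughly a factor of three.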
Wouldn't that be nice? And when I originally explained this in DZ's issue... Oh, this would be brutal to debug if someone used this to run a much larger site, DZ. Yes, it would be. arh68 a larger site would dump this work client-side LUL
That's why there's this like giant comment and like, oh, there's like a race condition. There's so many things that are like, well, it's good that it's not just us using this. So I had said... arh68 ya let's not speak of the new reddit Kappa
"Larger site would do this work client side," ARH. Yeah, that's why you saw me pull up old.reddit.com. new.reddit.com takes like 30 seconds to load a page, and even then it only loads like a third of the comments, if that. Single page apps. So I was thinking about it since that comment of, what if I got rid of... yeah, and you know, I want to be careful there because I don't mean to talk smack about especially Reddit and HN, because Lobsters looks very similar to them and frequently gets compared to them. And almost every problem they have is a problem of scale. You know, on any metric you care to use, Hacker News is a thousand times bigger and larger than ours. They get, you know, 2,000 comments a day, 20,000 comments a day, not 200. That causes so many issues, and I don't mean technical issues. I mean, especially social and moderation issues. And then they have open signups. Lobsters is very much playing in the shallow end of the pool, by design, but almost any comparison between the sites is meaningless because everything works differently if you can stay at small scale. Single page apps are always kind of a foot gun though, so I feel okay criticizing that. So there's a third approach. Rather than leaving this as is, or replacing confidence with score... the reason I suggested that was I was like, well, if we look at the unique values of score, they roughly, you know, this bell curve is, if we chop the tails off, there are 256 values here, basically. That's fine. That could be one byte. But that is also true of confidence. dzwdz https://pastebin.com/y7E9hD46 btw
dzwdz ran some math assuming no flags
Instead of transforming confidence from the domain of 0 to 1, what if we transformed it to the domain of 0 to 255? Confidence is going to have duplicates anyways, especially at the values for scores like 1, 2. I could just have one byte for confidence, and then this gets back to two bytes for ID. That's not bad at all. That, I think, kind of squares the circle where... DZ, I'm going to pull your pastebin up and compare it. But I think that gets us the small benefits of confidence for the situation where there are flags on stories, but then the significant benefit of not having duplicate comments.
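
A hedged Python sketch of the one-byte idea. It assumes confidence is the standard Wilson score lower bound with an 80% z-value, which is an assumption about the formula and not a copy of the Lobsters code, and it is not a reproduction of dzwdz's paste. It bins zero-flag scores into one byte and reports which scores end up sharing a bin.

```python
import math
from collections import defaultdict

Z = 1.281551565545  # assumed: z-score for an 80% confidence level


def wilson_lower_bound(upvotes: int, flags: int, z: float = Z) -> float:
    """Standard Wilson score lower bound on the upvote proportion."""
    n = upvotes + flags
    if n == 0:
        return 0.0
    p = upvotes / n
    centre = p + z * z / (2 * n)
    margin = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, (centre - margin) / (1 + z * z / n))


def one_byte(confidence: float) -> int:
    """Map a 0..1 confidence onto 0..255."""
    return int(confidence * 255)


bins = defaultdict(list)
for score in range(1, 501):  # zero flags, so score == upvotes here
    bins[one_byte(wilson_lower_bound(score, 0))].append(score)

shared = {b: scores for b, scores in bins.items() if len(scores) > 1}
print(f"{len(bins)} distinct byte values for scores 1..500")
print("first shared bin:", min(shared.items(), key=lambda kv: kv[1][0]))
```

Low scores keep their own bins and only high, rare scores start to share, which matches the shape of the argument on stream; the exact boundaries depend on the formula and rounding actually used.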

02:50:35These are the unique values of, or is this collisions? dzwdz collisions yeah
256 last.

02:51:03dzwdz the displayed values are the first that would land in each bin
dzwdz so it's accurate for 0-22
dzwdz then you get 2 values per bin
The first that would land in each bin. Yeah. So you're kind of saying it's accurate for zero to 22. dzwdz then you have a bin with comments with scores of 59-68
Oh, I see what you're saying. You're saying what's the chance of collision given our typical range of scores. Yeah.

...31Then you have a bin with comments with scores of 59 to 68. Oh, that's clever. dzwdz then 210-418 but realistically that doesn't matter
I get it. dzwdz but is BIG
I think that's very nice. Yeah.

02:52:07So I think this is the direction I want to head. I'm going to back this off for a second. Because this is actually a good place to say, hey, Sprek. Maybe we should have a stream contest for Peter's best typo on stream. Sprek is a pretty good one today. I don't have that R move two characters over. This feels like a good place for a commit. I don't know if that came across on the mic. That was the cat yawning and snorting at the same time. Sir, you are an elegant, elegant beast.

02:53:06What is this about conflict?

...14Why do you think I have something named single apostrophe here? I must have accidentally saved a copy of a file. Shell quoting. There we go. So there we go. So it's roughly correct. I just want to add the spec that we talked about to comments.

02:54:25I almost wonder if I need these first two because they replicate things in the bitpacking spec, but I'm going to express a very different... Rather than do a golden master, I'm going to do a fairly lax spec. So I'm going to say... And I'm calling Comment.new rather than FactoryBot because I don't need this round trip to the database.

02:55:11dzwdz https://pastebin.com/fsckYv1B squaring the confidence gives you more resultion but i'm not sure what's the effect on flagged comments
So we're just going to have kind of generous error bars on that.

...48dzwdz accurate up to 30 and the largest bins up to 100 are 9 votes big
And then the really interesting spec here is the one I think it was ARH, you suggested this. Let's compare them. Let's say at the same score,

02:56:16That's a little extreme. Let's just say four, which is, that's darn unusual, but it's not, you know, making my month. So that gets us a little more confidence, at the level of abstraction that we want, about confidence. That's great. Nice to have that succeed. So let's go ahead and bring that in. And DZ, what's your bug number? Was that 1318? Yeah. In case you see the mouse jumping around a bunch, the cat is on the desk, and he likes to lean against something when he sleeps. I know the mouse is right there, so why not lean against the mouse?

02:57:24Do I want to be cute here? So let's see. Hold on, let's jump over here. There was... We did all of these, right? One, two... Yeah.

...44Fix. Yeah. So let's go ahead and say we're going to fix seven out of four bugs reported in 1318. One of the nice things about lobsters not being a commercial project is I can occasionally write silly messages.

02:58:21Do I want to get into describing them? Probably not.

...31dzwdz link to the stream, maybe?
dzwdz for the post stream notes
Probably not. This is going to need a link to the stream. Yeah, actually, why not? I mean, it's not like it's going to be blocked by the story submitter. So 2024-09-09. The blog is set up so that it wants to have the title in there, but it just uses the date otherwise. arh68 HahaThink lol it would be nice if you could pull up VODs in comments and commit messages
And it redirects correctly. That's fine.

02:59:10It'd be nice if I could pull up VODs in comments and commit messages. I am not going to make anybody sit and watch a three-hour video to understand a bug fix. Oh, that would be cruel and unusual. dzwdz reportex btw
And that is part of why I don't want people submitting my stream to the site like dzwdz aw i could've only said it after you pushed
Yeah, it's interesting because it's kind of meta. And yeah, I talk about design issues or things. But report, did I typo that? Report, yep. Man, I'm so good at typos. The streams are really very low density compared to the things that exist on the site now. And that's kind of, the things I like about streams, DZ, you're already a VIP. Oh, you just want to make me push dash F. I like that about streams where they have that grit to them, that chaos of, well, there's some typos, and you can see the workflow. dzwdz would you actually have force pushed
It's real. It's unvarnished. But for discussing things, it's really helpful to have very polished and very thoughtful and very well... Would I have force pushed for a single typo in a commit message? No, probably not. It doesn't help that in open GitHub pull requests, of which there are several that we have not gotten to on stream because all of this ran a little longer than I expected, anytime I force push to master, the open pull requests get a line in them that tattles on me, that says pushcx force-pushed to master. And it's there to make it clear that the pull request might be in a kind of confused state, but I just feel ratted out. Like, oh, I was getting away with something. Like, come on, let me just, yeah, I broke the build, but I fixed it before anybody saw, and I did a push dash F, and then, you know, all of the pull requests contain a record of my sins. No fun. dzwdz past tense as if there aren't two migrations still to do
Where are we at? Oh, this was just a nice idea. Yeah.

03:02:07So I said this was the third idea of what if we just shrank confidence to one byte and expanded ID to two. dzwdz you might want to link the two pastes i've sent?
There is, of course, a fourth path where we split the third byte between confidence and ID. But that's pretty painful. dzwdz oh right
Oh, DZ, if you haven't seen a stream archive, your two pastes will show up next to things in the transcript. I should pull them into the scratch. Ow. Cat, you do not get to attack my hand while I am streaming. He occasionally gets playful. So as I was moving the mouse, he reached out and put his paws and claws on me. That is not helpful. There's only once now that I've been streaming and he has retired to his favorite spot on top of the filing cabinet. And if he hangs out up there, I'm going to just put it down here just to have a place.

03:03:18arh68 i'm sure cats mourn the loss of CRTs
If he hangs out up there, I have a little cat cam for him that I can turn on, but he's only done it once.

...35All right. dzwdz oh hey that link begins with fsck
So let's grab back. Cats mourn the loss of CRTs. Yeah, you know, I only briefly had a cat as a kid back when I had a CRT. fsck, oh yeah, good roll. I have occasionally, I think there was one, I don't remember what it was. But there was like one Lobsters story that generated a naughty term. And I want to say it was shit. I think in the short story ID, I want to say it had the substring SHIT. It's been years now. arh68 our very own scunthorpe? never knew
And someone submitted it as a bug report because the nanny program at work wouldn't let them load a URL with a curse in it? Yeah. Yeah, the Scunthorpe problem. That's...

03:04:43dzwdz you could run that query
I should throw that one into the link. That one's a fun one. dzwdz throw in other swears while you're at it
But basically, the substring matched a naughty filter. And I felt bad about it, so I re-rolled that story ID, because it was brand new. It was like 30 minutes or an hour. dzwdz there's probably a lot of fuck comments
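
Re-rolling an ID that trips a filter can be sketched like this in Python. Everything here is hypothetical: the base-36 alphabet, the placeholder blocklist, and the function names are illustrative, and this is not how Lobsters or any named company generates IDs.

```python
import secrets
import string

ALPHABET = string.digits + string.ascii_lowercase  # base 36, assumed
BLOCKLIST = {"bad", "ugh"}  # hypothetical placeholder words


def is_clean(slug: str, blocklist=BLOCKLIST) -> bool:
    """True if no blocklisted word appears anywhere in the slug."""
    return not any(word in slug for word in blocklist)


def new_short_id(length: int = 6, blocklist=BLOCKLIST) -> str:
    """Draw random slugs until one passes the blocklist check."""
    while True:
        slug = "".join(secrets.choice(ALPHABET) for _ in range(length))
        if is_clean(slug, blocklist):
            return slug


print(new_short_id())
```

A substring check like this is exactly what produces Scunthorpe-style false positives, so the blocklist has to stay short enough that rejected IDs remain rare.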
Yeah, I could run that query, and there are various other swears and naughty words and slurs to throw in. There are nice little naughty lists online. But the odds of the collision, especially with only six-character IDs, are pretty low. So I haven't, and I haven't wanted to. I know of, actually, I think all of the sophisticated big techie kind of companies that generate URLs that have, you know, long alphanumeric IDs, they all have their own internal naughty lists to avoid generating URLs or IDs with them. It's especially problematic, though, if dzwdz no fun allowed
your slug (that's the term I've always used for these), if your slug that you generate includes a timestamp, like, you know, it's a UUIDv7 and then you're base encoding to base 36 or something. The real problem is when the timestamp portion of that generates a naughty word. Well, then for whatever the resolution of that is, you know, which might be a full second, which is a lot in a big distributed system at this kind of scale. But then the scary part is it will certainly eventually happen at the higher bytes. And so it'll be up for minutes, days, weeks. Then you have to throw out all of those IDs. That's painful. And so that's a... One of the things you want to do there is... I can't talk about it on stream. Yeah. There are other things you have to do to slugs that are more sophisticated. albynton I imagine they only check for bad words in english?
arh68 isn't base32 more efficient than base64 anyway?
And I have nothing to disclose about how Stripe does it. But I have hassled many current Stripes to please write up how Stripe generates its IDs. arh68 base64 is a weird middleground
Stripe is very smart about it. Oh, I don't want to rabbit hole on base 32 being efficient versus base 64 in URLs. Yeah, there's UI concerns there and UX convenience when users might not be copying and pasting. Programmers are pretty good about copying and pasting and storing things in the database, and users, non-technical users, often end up reading each other URLs and reading each other IDs. And so if you have capitals like Pastebin does here, everything gets harder. So you really want to do like base 36. Well, really you want to do like base 34 because one and L and O and zero are just so painful. Anyway. Yeah, so I still have to do that migration. Actually, is there... There's a method in... No, there isn't here. There's a method in story called recalculateAllHotness that does the loop, but it's unsophisticated and should probably just get dropped at this point, because it was written in the first weeks or months of the site when loading all stories and iterating over them was an affordable thing to do. But now that there are 100,000, it's a good way to exhaust RAM, especially because story is a very wide model.

03:09:06So DZ, you said there are two migrations. What's the other one?

...20dzwdz recalculating the score
This find_each method loads them 1,000 at a time.

...31dzwdz of deleted comments
Recalculating the score. Ah, no. Score of deleted comments. No, it's fine: this will hit all of them, and by calling update score and recalculate, it will update their score column based on the number of votes they have rather than what it was. So this is just blowing away their score and their confidence
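
Loading rows 1,000 at a time can be sketched with Python's built-in `sqlite3` module. This is an illustration of batching by primary key, not Rails' `find_each` itself; the table, its columns, and the "score = upvotes minus flags" rule are hypothetical stand-ins.

```python
import sqlite3


def each_in_batches(conn: sqlite3.Connection, batch_size: int = 1000):
    """Yield rows in primary-key order, one batch at a time."""
    last_id = 0
    while True:
        rows = conn.execute(
            "SELECT id, upvotes, flags FROM comments "
            "WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not rows:
            return
        yield rows
        last_id = rows[-1][0]  # resume after the last id seen


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE comments "
    "(id INTEGER PRIMARY KEY, upvotes INTEGER, flags INTEGER, score INTEGER)"
)
conn.executemany(
    "INSERT INTO comments (id, upvotes, flags, score) VALUES (?, ?, ?, ?)",
    [(i, i % 7, i % 3, -10) for i in range(1, 2501)],  # stale scores of -10
)

batches = 0
for rows in each_in_batches(conn):
    conn.executemany(
        "UPDATE comments SET score = ? WHERE id = ?",
        [(upvotes - flags, cid) for cid, upvotes, flags in rows],
    )
    batches += 1
    print(".", end="", flush=True)  # one progress dot per batch
conn.commit()
print()
```

Only one batch is held in memory at a time, which is the point when the table is wide and has hundreds of thousands of rows.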

03:10:05is probably going to take a second to run. So one thing you may have noticed: at the left edge of my prompt there's always this number with s after it. That is horrific prompt hackery for the number of seconds the previous command took to run. So, like, here is five seconds for something, that took a minute, two seconds for running that generated. So we will automatically have a timestamp of how long my desktop computer happened to take to hit all, what did we say, 540,000 comments? Recalculate all of them. Look over at htop. It's a shame it's single-threaded because it's eminently parallelizable. Huh, don't even see MariaDB here. It must be doing the initial query, which is going to be IO bound. So if you have any last questions or comments, now is the time to get them in, because I have to get a shot in, yeah, in about 5, 10 minutes. I've got to roll out of here and go get my updated flu and COVID shot. So I am not going to deploy on stream, because I don't like the idea of deploying a migration that's going to touch a giant table when I'm not going to be sitting here to keep half an eye on it. mjiig My prompt has the current time, and I have confused myself several times assuming the time difference between two prompts is how long the command in between ran for
So probably it'll run in half an hour, 40 minutes when I'm back. arh68 no further questions HahaGingercat have fun y'all
And then all that other stuff that I had considered for topics, I'm going to have to punt off to Thursday.

03:11:48MJ, yeah, I used to have the time in my prompt like you currently do. I haven't sanitized that bashrc for showing on stream, so I can't really pull up that prompt stuff safely, but basically, when the prompt runs, it invokes a bash function. And I think it just shoves the timestamp into a global environment variable. And then it compares against the previous one to print that time. I think that's where I ended up landing. There's so much nonsense for a simple prompt to get this Git thing working. And then there was some old Nix hackery I had in there. There's a lot. Boy, we may not even get to see this. I didn't think to do a print dot in there to show progress. And I don't want to bounce it to stops. We may not get to see the end of this migration on stream. Probably the best place for me to close it out, if nobody's going to ask any more questions or come up with any more ideas, is to once again say thanks to DZ for the deep dive on this. It's been a lot of fun. And it's been an unexpected benefit of streaming. I did this to improve transparency on the site, and getting so many code contributions from DZ and Beirut, but then also a bunch of other people have reported issues and submitted PRs. I think actually we've had a little bit of a traffic jam and two people have submitted PRs to bump us up to Rails 7.2, so I'm going to have to, you know, cut that baby in half somehow and merge them. It's a wonderful problem to have, to have more contributions than I can get to on stream. So thanks very much to the folks who have contributed, whether that's PRs and issues or whether that's hanging out and chatting, because this has been a lot of fun. And I do need a second to pick up before I head out the door. dzwdz seems like the two prs were made by the same person though
So I'm going to go ahead and hit stop streaming here. Thanks very much, everybody, for hanging out with me. dzwdz cya
Take care.