That method is entirely vibes

Streamed

Comment lines PR and open graph images PR. Account takeover spam. Advice on upgrading Rails from 5.2.1. Headlines model story merging: circular reference, hotness, migration. Audio and video tags getting fewer votes and comments.

scratch


topics
  PRs
    comment lines https://github.com/lobsters/lobsters/pull/1811
    open graph images https://github.com/lobsters/lobsters/pull/1812
  takeover pwned https://github.com/lobsters/lobsters/issues/1478
  headlines model
    migration
    run migration against a prod restore
    compare hotness with prod data bc fixed modifiers


story
  headline_id roll up under

headline
  story_id pointing at primary

life of a merge:
  story A submitted
  creates headline A
  story B is submitted
  create headline B
  B merged into A
  story B gets headline A
  headline B???
    don't want to show it ever
    but do want to keep that record so we can 302 the links to A
    ok, delete it
  if we then unmerge story B
    can create new headline B2

use cases:
  viewing headlines
    select from headlines.... where headline has stories
  submitting a new story
    persist to stories table
    create headline pointing at story
    update stories to point at headline as well
  merging stories
    update headline_id in stories
    maybe change primary story on old headline

  changing primary story
    update headlines.story_id
  unmerging stories


title

post-stream
    

Transcripts are generated with whisperx, so they mistranscribe basically every username and technical term. They're OK but not great, advice appreciated.

Recording



03:06graefchen Hewu do ~ limesHi
corbob zachmrLegoHYPE zachmrHype moreamHype zachmrPandaHype
I don't know what all those emojis mean, corbob, but how do you do. All right, so let's see. So this is Lobsters office hours. graefchen It's nearly night for me. limesSit while observing me cat that she does not do bad stuff. limesSit
This is Lobsters on the left here, and me getting the stream notes on the right. corbob They're really all just hype emotes LUL
I know we've got PRs for comment lines and... Ah, supervising the cat. Yeah, we'll see if my stream supervisor decides to put in an appearance. Now that it's cold, he's a lot less likely to get up on the filing cabinet where he's visible to the webcam. But we'll see. No predicting them, right? What's the other PR? It is... Well, let's just go look.

04:06That's... Two kinds of highlight there right open graph images. graefchen That is me problem. Its cold. So she likes to hop onto the pc. limesHeck
I was hoping someone would take this one on. corbob ooooh.... it's working... things are working!!! (it's a little harrowing to have 5 minutes of just a blank screen though...)
So put these things in the. To do list.

...32Oh yeah. I learned about that one, the little... corbob, you're talking about this? I have learned that it is normal to start your stream with a little title card like that because people who choose to follow you on Twitch get a notification, but then it often takes people a minute or two to notice the notification and join. And it also is just kind of part of my checklist at this point.

05:12I can type script that's.

...30corbob Sorry, I should have been clearer, I was talking about a system build I'm running. the screen went black about when you started the stream, and wasn't showing any progress at all
All right. So. Oh, a system build. Are you talking about, like, flashing your BIOS or something? That always feels kind of high stakes for me because of the risk that you have a power cut or something and you manage to break your computer. A little bit of high stakes there. Or are you thinking more of the black triangle method of progress, where you're working on low-level GUI stuff and you get nothing for a long time? All right. So this one. Yeah, so let's find the issue here for this.

06:34OK, so no. pushcx https://github.com/lobsters/lob…
No other activity besides these two I'm thinking of. So the gist of this one, I'll share the link here in the stream.

...50corbob Fortunately nothing that high stakes. Just building a VM with packer, and all I get is output saying it's starting things, but no idea when the output happened because why would you include a timestamp with your output moreamLUL
is that we have done a lot of layout work to comments over the last year, and the nice little lines for indicating sibling comments have been kind of broken for, like, a couple of months. When did I file this? July. Yeah, so five months. That's a lot of time. Well, and I say five instead of four because I know it took me a minute to file this bug. And the idea is it's supposed to guide the eyes between siblings where it's hard to see. Let's say this and this are at the same level without a little line to guide your eye, especially when there's a bunch of replies in between. So we don't need the lines when there's only a single child comment with no higher level sibling. OK. Does this level 1? The way they say that makes me think the top level comments would have lines, and they don't need to have that. So that's one thing to check on. Building a VM with Packer. I don't know what Packer is. I haven't done a lot of VM stuff. Is it some kind of container tool? Packer DM. Why did I say DM? Container. Sure. Ah, it is a Docker tool. I think Docker and VM overlaid in my head, and that turned into DM.

08:45This doesn't say what it is. This also doesn't say what it is. Packer is a tool that lets you create identical machine images for multiple platforms from a single source template. Oh, okay.

09:15corbob yeah, what that said LUL
Epic_Ninja_Elephant Oh hai.
Well, good luck with your containers. Ah, hey, elephant. So this one, if it has a subtree and not, it has the form in it for, okay. So this does need like another li comment subtree at the top. graefchen I like vm's. Moreso old chips or early virtual gaming consoles or chips that teach some low level stuff. limesSit
So I had looked at this when he first opened it, when they first opened it a couple days ago, I don't know their gender, but I really liked this for how much code was getting deleted. Yeah.

10:25Boston_Mass o/ Anyone here had an experience upgrading Rails versions? At work we have a few apps using Rails 5.2.1 and I am considering an upgrade to 8 but am wondering if it would be easier to just start over and copy in as needed
Let's grab this slide. Ah, Boston_Mass, yeah, I have that experience upgrading Rails versions, and have on this app a couple of times, and once you get past 5 they're all pretty easy. I would just, unless the app is, well, yeah, if the app is tiny, your strategy of recreating is fine, but Epic_Ninja_Elephant I get grumpy when people who don't understand PXE and operating systems talk about containers.
otherwise, no, it would be better to step through the minor versions. So I don't think there was a Rails 5.3, but I would go 6.0, 6.1, 7.0, 7.1, 8.0, 8.1, and there's not going to be a ton of breakage because, like, 5, 6, well, 6 and 7 were pretty small releases. 8 is bigger, but it's mostly optional features. So I think you'll be okay.

11:32Elephant, do you spend a lot of time managing containers?

...51Boston_Mass I did check the update logs and it doesn't seem too bad to be fair. The huge issue is no tests whatsoever LUL
Epic_Ninja_Elephant I have avoided containers entirely. So far.
No tests whatsoever. Yeah, Boston_Mass, that would be a good place to throw on a couple of high-level integration tests, you know, request specs if you're using RSpec, and stuff like: can you log in, can you do the happy path. And depending on your app size, you might get away with, I don't know, literally 10 tests for whatever your core functionality is, right? So whatever your God object is. In the case of Lobsters, where we all submit links and then write comments about it, I would test, I mean, everybody has to sign up and log in, so there's two. And then I would have creating a story, posting a top-level comment, posting a reply, viewing the homepage. What does that get me up to, about six? Lobsters is not a giant Rails app. Boston_Mass Sounds like a good idea
So just whatever the top 10 absolutely core workflows are, because as long as you are diligent about breaks, right? Because you know you don't have complete test coverage. So when you see a failure that's like, "This method that you tried to call doesn't exist," well, don't just fix the one test. Figure out every place in the codebase that uses that method. I mean, and if you and your workplace like automated, what is it, like, LLM coding, getting them to write some request specs can be a pretty fast way to get more than that kind of initial 10 and get a little bit broader coverage. Because golden master tests are pretty straightforward to write, where it's: look at what the app gives you now and assume that is correct. And so if it changes, you have a bug. And that kind of test is very fast to write. You know, even if it is only like 95% correct, where 5% of the time you see a bug and you enshrine it in a test, it's still so fast to write that it's worth it.
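
The golden master idea described here can be sketched in plain Ruby. This is a minimal illustration, not Lobsters code: `render_homepage` is a hypothetical stand-in for whatever the app returns today, and `golden_master_check` is an invented helper name. In a Rails app the same comparison would more likely live inside request specs.

```ruby
require "tmpdir"

# Hypothetical stand-in for "what the app gives you now".
def render_homepage(stories)
  stories.map { |s| "<li>#{s}</li>" }.join("\n")
end

# Golden master check: the first run records the output as the master copy;
# later runs compare against it and report whether anything changed.
def golden_master_check(path, actual)
  unless File.exist?(path)
    File.write(path, actual)
    return :recorded
  end
  (File.read(path) == actual) ? :match : :changed
end

Dir.mktmpdir do |dir|
  master = File.join(dir, "homepage.golden")
  puts golden_master_check(master, render_homepage(["a", "b"])) # first run records
  puts golden_master_check(master, render_homepage(["a", "b"])) # same output again
  puts golden_master_check(master, render_homepage(["a"]))      # output differs
end
```

A `:changed` result only says the output differs from the recorded copy; deciding whether that is a regression or a fixed bug is still a manual step.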

14:45Vanderbilt.

15:22Boston_Mass A lot wrong with the way this was built years ago. No test and no Dev DB :) Need to fix a few things prior to trying the upgrade but will definitely make some tests and try the upgrade path. Thanks!
Ah, yeah, I mean, test and dev DBs are pretty straightforward to add, right? You add one entry to your database.yml and off you go. It sounds like you inherited something that was put together in a hurry, so that might mean, well, I try not to read too much into these kinds of signals, but sometimes that means the thing is especially valuable, and sometimes it means it's not particularly valuable. Boston_Mass It's the entire business hah
I don't know what to say here, but yeah, I definitely would say try to upgrade rather than redo from scratch because the upgrades are not too bad. You know, you joke that it's the entire business, but I think it's the whole world. Like, you can kind of half-ass just about everything and do fine. I don't know. It's getting into personal philosophy, but I think there's a lot of room to do just okay at things. I mean, you can look at the Lobsters source code, and there is plenty of stuff, like access control is just kind of handwritten "if user, if moderator" stuff shotgunned through the app, and we don't have a real clear separation that Boston_Mass For sure. This app has been chugging along for YEARS and making pretty good money. Just time to take it to the next level for growth plans ;)
controllers are for HTTP concerns and we have a, not a process, a service object layer. You know, none of that kind of stuff. That's okay, it works fine. Nice. Yeah, the painful thing with upgrades is they force you to confront a whole lot of tech debt at once, but you'll probably be fine.

17:34Boston_Mass Happy to have found the stream though, will be lurking a lot
Yeah, well, thanks for hanging out. So I typically stream on Monday afternoons, that's Monday afternoon U.S. Chicago time, for about three hours from two to five, and then on Thursday mornings from nine to noon. And if you are indeed in Boston, Massachusetts, just roll those times an hour earlier.

18:06Boston_Mass In TX currently. CST as well
Oh, so this issue, I don't think anybody left any comments, but let's put it in the scratch because it's worth talking about. Ah, Texas. pushcx https://github.com/lobsters/lob…
So let's look at this PR that also shared the link. So I love when people do these before and afters. Don't love when Google is so aggressive about, or Microsoft is so aggressive about that. So we just serve the site logo as our Open Graph image. And I put in a feature request to say, hey, let's grab the existing one and just kind of add our logo to it. This looks really promising. The logo could be a lot smaller, I think, but yeah, this is great. So let's go look at the issue. Yeah, I was trying to say, let's make our thing little. You love that ASCII art, right? I'm an artiste. Is this one the avatars?

19:35They think about avatar caching. Are they doing these live on request?

...49And then they have a couple of build failures, yeah. All right, so let's take a look at the code. I have not peeked at this one, but we've been talking about ruby-vips for a couple of years now, because we have an avatar feature on the site. You see everybody's got a little avatar. If we click on a random person, they have a bigger version. We host those ourselves, but we only proxy off of Gravatar, and it would be nice to just wholly take over the feature. And when the site started... vips, which is a pretty secure image processing library written by Google. I say pretty, but I mean very secure. And it was written to replace ImageMagick, which was the default everybody used. But unfortunately, it predates modern security coding practices and was not designed with hostile input in mind. Oh, that's a good fix. So we didn't want to do our own image processing at first, but now that vips is available, we could definitely do that. Story image, that's reasonable.

21:14story images present.

...38open where do we get the generated image okay so this is going to land on the story model which is getting wider and wider just like me after Thanksgiving

22:11Are these called cards? Where would I put that issue?

...21Are they called link previews? What's the right? Okay, it just calls them image. And then I'm thinking they're called cards because back in the Twitter days, they were called cards. Just trying to think of a more meaningful name because images. The type rather than the purpose.

...58corbob social sharing cards?
Yeah, I think that name card is pretty common. So I'm probably going to end up suggesting it.

23:33Don't love setting empty string instead of proper nil. We don't have an image.

...59graefchen People do love cards limesSit
And then this doesn't handle the empty string case. It's okay, we knew this was a draft PR.
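
The preference for nil over an empty string can be sketched as a small normalizing helper. This is an assumption-laden illustration: the PR's code is not shown in the transcript, and `normalize_image_url` is a hypothetical name, not a method from Lobsters.

```ruby
# Hypothetical helper: treat missing, empty, or whitespace-only values as
# "no image" and represent that as nil rather than "".
def normalize_image_url(value)
  stripped = value.to_s.strip
  stripped.empty? ? nil : stripped
end

puts normalize_image_url("   ").inspect
puts normalize_image_url("https://example.com/card.png").inspect
```

With a single "no image" representation, callers only need one check (`url.nil?`) instead of separately handling nil and the empty string.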

24:28Yeah, speaking of code organization, We do have the story model making HTTP requests, which I don't love. But the contributor is following our existing pattern, so can't blame them.

25:17This wants to be a telebugs exception. And honestly, I don't think there's a value to rescuing these.

...36But I can't quite picture where in the flow that gets called. So

...49So I think pretty much all of this should get shoved off to a background job. Yeah. Oh, that's not it.

26:43OK. And what did they want? So. So extra fetch.

27:59to submit story controller stories controller i always get that wrong yeah right here pretty much not with that but

30:06need to reference this because we're not going to touch it. I'm trying to not give them the job of refactoring the existing stuff.

...56Thank you.

31:31sit in a Caddyfile somewhere in here. You have a couple of Caddyfiles, yes? No. Somewhere here.

32:02so when they link to avatars they're under the public directory wait this is public cache where's the rest of

...38It must be in the Hatchbox-provided Caddyfile.

...58yeah.

33:08How do I see that Caddyfile again? With curl, right?

...33I don't think this is going to have anything sensitive, but let me just do it off screen.

...48I'm looking for the setting related to the public directory. I'm not seeing it.

34:12That's wired up.

35:21Okay.

...58Let's break up this run on sentence.

37:26then let's take a look at these two things what's happening with because this one's a test for this schema huh okay so that that's not and then this yeah right so

38:33Thank you.

39:11Okay.

40:53it's a dash capital i break man dash yeah capital

41:23This one. Probably something in the GitHub action, right?

43:17going to be encouraging, because this one that's like, yeah, we're gonna take stories, the God object, and we're going to go fetch some data from them, then we're going to fetch an image from them, then we're going to save it to the file system. And then other code you don't know is going to shove that out the door. And then you're going to also process that image along the way like that's, it's a lot of moving parts. So I really appreciate that they've taken on a big, ambitious feature. I think it's only their like third PR to the project too.

44:02So on my to-do list in the scratch file, yeah, so these two are done. There's no other pulls open that have been updated since the last stream. pushcx https://github.com/lobsters/lob…
And then this Have I Been Pwned issue is still hanging open. This has been a minute. So in March, we got what I called garbage-tier spam, where it's not at all graefchen People loosing there account is always unfortunate. limesFeels
vaguely related to Lobsters. Where we've seen, like, "check out my startup" spam because people think we're Hacker News, or people who can't stop doing self-promo, and it's like, look at my new project, or look at version 1.14 of my project, look at 1.15, 1.16, 1.17. And it's a little frustrating. People are overeager. But then there's garbage-tier spam, where it's the, like, make money fast online, buy best cryptocurrency. And what we saw in March was a user that had logged in for the first time in years and just put total garbage in. And that was a new pattern of abuse for us. We have seen, like, garbage-tier spam. There was once during the April Fool's where I turned the site into phpBB, I actually did enable the signup form, and I guess there were still bots out there that knew what the old phpBB signups looked like, and they started posting crap. And then I think one time we saw garbage spam from a brand new account, and I don't remember it offhand. And Irene realized, oh, this is account takeover. And I looked on Have I Been Pwned, which is an excellent, excellent free service: has your email address shown up in a breach? So let's throw in mine. And the answer is going to be yes, because I've used my email for a few. No, let's use an old email of mine. There's no way. Oh, can't type. Not Raj. Or are they going to be like, yeah, so there's confetti when you're not in a breach. What there needs to be is like garbage gets dumped onto the page when you're in a breach. Boston_Mass we're all in that list lol
So if an email address has been around for more than a couple of years, it is almost certainly going to be in one or more data breaches. Yeah, we are all in some of these. Verifications.io. That sounds like something important. Wow. Had a lot of personal info. Apollo. Do I know Apollo? Yeah, this isn't something I signed up for. This is something that scraped data on me. A lot of these will be. It's funny how many of these breaches will say things like marketing and... Look at the stuff they collected on me. They had stuff like date of birth, ethnicity, family structure, gender, home ownership status, marital status, name. Yeah. This is the kind of thing that will radicalize you about online privacy. Not to get too political, but you look at it and it's like, why are these folks allowed to make these databases about us with such incredibly sensitive info? corbob That's not necessarily that they had that on you, but that's the kind of things that were in the breach... yours could have been as simple as the email address
So, for the spam that we saw, I threw the accounts into Have I Been Pwned. And there was a particular breach called Stealer Logs January 25. Yeah, corbob, it's true, but because they'll never give you more info, you have to work these as a worst case scenario. And even if those kinds of data weren't leaked in the breach, people don't think about the fact that, with these kinds of enrichment marketing and other data broker services, there are, what's the polite way to put this, slightly more mature versions of these services that haven't gotten popped, that are selling your data literally every day. Like religion. And, you know, Facebook claims they don't sell this info, they just sell access to this info. So I don't remember if Facebook still allows you to make ads targeting, like, let me target an ad towards everyone interested in this particular religion. And that's kind of Facebook's, what's the word, euphemism for "worships that religion". They just say "interested in". graefchen I am surprised that I am not pwned, even if me older e-mail address. But I am also EU, so that might influence it. limesNoted
And then if you are the ad buyer, you know that everyone that clicks on that ad has that religion or has that job or has that income level. And so Facebook didn't directly sell you that info, but they sold you that info. And I pick on Facebook there because they kind of infamously do this and have for a long while, but there are lots of these data brokers. Yeah, I don't know if EU influences it. A lot of the EU regulation pushes companies to more aggressively delete data, especially on closed accounts. I don't know if that's why you're not showing up. Or maybe you're just lucky. I mean, my peter@push.cx email that you saw me throw in there, it said zero data breaches. graefchen Probably both.
And I was thinking about how long I've used that. It's got to be like 15 years. I wonder if the breach lists are throwing away the odd TLDs or something, because I really don't believe that my email made it 15 years without showing up in a breach. I don't know. So there's a fair amount of chatter here in the issues as we kick around and try and understand how to design this feature. And then on October 27, it happened again. And two days ago, it happened again. So that's now three times it's happened this year. And that's just a sign that we're getting older and people have had these email addresses for a few years. But this one is a great issue if somebody wants to take this on. pushcx https://github.com/lobsters/lob…
You'll have to skim a little bit to put back together what the plan is because it emerged through discussion. The big important comment here is this one from Uber Nostrum about how they have implemented it before.

52:10That's pretty much the plan if somebody wants to take that on. If this starts happening once a month, I'll start doing it. But you know, right now, at worst, we are every other month. So that got prioritized, right?

...35bsandro VoHiYo hello cyberpals
Oh, so I didn't put my message in. pushcx This is Lobsters Office Hours, ask questions about the site anytime!
Hey, bsandro. I was just going to put a... This is Lobsters office hours. Ask questions about the site anytime. I don't think I did my usual intro at the start of the stream. So this is Lobsters office hours, and folks can ask questions about the site, the community, the codebase, anytime. Or just look over my shoulder as I work on maintenance. graefchen Hello bsandro limesHi
And I was working on username stuff, and I kind of... that merged, right? I'm trying to think of what I'm working on if there's not a lot of activity over in the issues and pull requests, which are my first priority because contributions are such a multiplier.

53:37pushcx https://github.com/lobsters/lob…
Yeah, okay, so there's the username stuff I was thinking of. So what do I want to get back to? I guess the story merging UI. So that's 1456. Yep, I've been working on this a lot over the last year, improving what merged stories look like. Let's just peek back at PRs to make sure. Okay, so nobody happened to be free and immediately see my comments. So this one is about improving the story merging feature and the UI and the documentation. It's a whole big to-do list. Ah, right, headline. That's where I was. And I had started that on stream. And then there were a couple of streams where, very wonderfully, we had so many PRs and issues that I didn't get to write code. I just got to review code.

54:58All right. So...

55:10So let's get back to this story merging UI if nobody has questions. I believe I had a branch hanging out about this.

...27Yes, headline current. So let's grab that.

...42Bring that up to current.

...54And edit it.

56:01Alright, so this was going to start creating the headline model and fixing these counts. And I was trying to find a way to do this very incrementally, especially because I keep stopping and starting my work on this. So if I can do stuff like, let's just get this into the database and then let's slowly revamp the UI to use it, you know, that kind of incremental progress is really called for here, so I don't just have one giant branch that is hanging out for another year. I say another year, but the branches haven't lasted that long. All right.

...58Right, so this was this was biting off a bunch by also starting to split out code from the story model.

57:18Okay, so that doesn't exist yet.

...25We can see if the migration runs. So it's gonna have its own token. It's gonna have a, oh right, this one. Yeah, where I left off, there was a circular reference. Did I not leave myself a to-do about this?

58:06I did not. Okay, so the hassle here is, if I have a circular reference, I'm really limited in working with these models. It has to always be in a transaction and Rails doesn't love that. And I think the direction to break that is gonna have to be, I'll remove it from headline. Well, so we're going to have a story model, and we're going to have a headlines model. This wants to have a headline ID to roll up under. This is going to fix a bunch of structural issues with the database. So stories, merged stories and non-merged stories, always have the same structure. And this wants to have a story ID pointing at primary so that it always knows which title and such to display.

59:28And the way I break this cycle is going to have to be, and I really want both of them not null, but I'm not going to get it.

...48that I think I'm going to remove. Make this one nullable on story, because otherwise I have to insert a headline. And I would rather insert a story and have a couple tens of milliseconds where it's not visible on the site, because the site is going to be displaying headlines, than have the headline handle it. Yeah. All right, so I already made that null true. How does this handle the use cases?

01:01:02Viewing headlines, submitting a new story, merging stories. When we're viewing headlines, that's fine. We'll just select out of the headline table. When we're submitting a new one, we'll create a story model. Save it. Yeah, so wait, that looks like create story. Let's call it persist to stories table. Create headline pointing at story. Update stories to point at headline as well. These really have to happen in a transaction, because if this is select from headlines and a page loads and the story doesn't point back at its headline, that is a weird state to be in.
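
The three-step submission flow can be sketched with plain Ruby structs standing in for the two tables. Everything here is illustrative: `Db`, `submit_story`, and the toy `transaction` are hypothetical and are not the Lobsters models or ActiveRecord. The sketch only shows why the story's `headline_id` has to be nullable (the story is inserted first) and why the steps are grouped.

```ruby
Story = Struct.new(:id, :title, :headline_id) # headline_id starts out nil
Headline = Struct.new(:id, :story_id)         # story_id points at the primary story

class Db
  attr_reader :stories, :headlines

  def initialize
    @stories = {}
    @headlines = {}
    @next_id = 0
  end

  def next_id
    @next_id += 1
  end

  # Toy transaction: restore both tables if the block raises.
  def transaction
    saved = [@stories.transform_values(&:dup), @headlines.transform_values(&:dup)]
    yield
  rescue StandardError
    @stories, @headlines = saved
    raise
  end

  # "Viewing headlines": only headlines that have at least one story.
  def headlines_with_stories
    @headlines.values.select { |h| @stories.values.any? { |s| s.headline_id == h.id } }
  end
end

def submit_story(db, title)
  db.transaction do
    story = Story.new(db.next_id, title, nil)     # 1. persist to stories
    db.stories[story.id] = story
    headline = Headline.new(db.next_id, story.id) # 2. create headline pointing at story
    db.headlines[headline.id] = headline
    story.headline_id = headline.id               # 3. point the story back at its headline
    story
  end
end

db = Db.new
story = submit_story(db, "Story A")
puts db.headlines_with_stories.map(&:story_id).inspect
puts story.headline_id.inspect
```

Because the story row exists before its headline does, a reader selecting from headlines never sees a half-built submission; the story is simply invisible until step 3 completes.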

01:02:26And then merging

01:03:09AnakimLuke makrplTurbo
Those are, no, those are not cached. Okay, and then maybe change the primary story. And then we unmerge, but what happens to your old headline model

...40if there are no stories left for that headline.

...51I guess I delete it? No, I want to hang on to the short ID, the token, because I want to keep those links. Ugh. Fuck, I'm reinventing the existing story merge model one layer up.

01:04:19Right, because this wants to be able to say where headline has stories, I guess. Because the flow is going to be like story A is submitted, which creates headline A. Story B is submitted, which creates headline B. And then B is merged into A. So story B gets headline A. Headline B, what happens to you?

01:05:24because then it's just going to be hanging out without a primary story.

...36But we don't want to delete that record because we want to be able to keep, yeah, don't want to show it ever, but do want to keep that record so we can 302 the links to B to A.

01:06:04How does it know that it should forward the links to A?

...24That's ugly.

...35AnakimLuke what u working on?
I thought when I designed this model that I was handling all these use cases, but I didn't think through this part of it. Hey, AnakimLuke. I'm working on the story merging UI, specifically the improved database modeling for it. And I realized it's not actually much improved. So if you click on that issue 1456, you will see that the first unchecked to-do item is refactoring the DB. I'm starting to refactor the DB and realizing it doesn't quite handle that. So, like, this number will fall to zero on headline B, and this will become zero, and this will become NaN, and this will become zero or NaN. NaN, right, because none of these things will be valid when it doesn't have any stories.

01:08:06So do I just delete it then? Well, because I wasn't going to add a short ID to this table. And the token is there for later expansions. Maybe there just isn't an ID to maintain.

...25Because if the point of the headline is it's just metadata, it doesn't have any data itself. The links are going to be story A's short ID.

01:09:19So then let's imagine if we then unmerge story B, we can recreate a new headline B2. And it's going to have a new token, but it won't have any data that we can't recreate. I think we're good.
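
The merge lifecycle worked out here (submit A, submit B, merge B into A, delete the empty headline B, unmerge into a fresh headline B2) can be sketched in plain Ruby. All names (`MergeSketch`, `Site`, `primary_short_id`) are hypothetical and only mirror the scratch notes; changing the primary story when a headline keeps other stories is deliberately left out.

```ruby
require "securerandom"

module MergeSketch
  Story = Struct.new(:short_id, :headline_id)
  Headline = Struct.new(:id, :token, :primary_short_id)

  class Site
    attr_reader :stories, :headlines

    def initialize
      @stories = {}
      @headlines = {}
      @next_headline_id = 0
    end

    # Submitting a story creates its own headline.
    def submit(short_id)
      story = Story.new(short_id, nil)
      @stories[short_id] = story
      story.headline_id = create_headline(short_id).id
      story
    end

    # Merge `from` into `into`: the story moves under the other headline, and
    # its old headline is deleted once no stories are left under it, because
    # it is only metadata that can be recreated.
    def merge(from, into)
      old_headline_id = @stories[from].headline_id
      @stories[from].headline_id = @stories[into].headline_id
      @headlines.delete(old_headline_id) if stories_under(old_headline_id).empty?
    end

    # Unmerging gives the story a brand new headline (new id, new token).
    def unmerge(short_id)
      @stories[short_id].headline_id = create_headline(short_id).id
    end

    def stories_under(headline_id)
      @stories.values.select { |s| s.headline_id == headline_id }
    end

    private

    def create_headline(short_id)
      id = (@next_headline_id += 1)
      @headlines[id] = Headline.new(id, SecureRandom.hex(4), short_id)
    end
  end
end

site = MergeSketch::Site.new
site.submit("aaa111")
site.submit("bbb222")
site.merge("bbb222", "aaa111")
puts site.headlines.keys.inspect # headline B is gone after the merge
site.unmerge("bbb222")
puts site.headlines.keys.inspect # a new headline (B2) exists for story B
```

Links keep working in this sketch because they are keyed by story short ID, not by headline, so deleting headline B loses nothing that needs a redirect.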

01:10:48Let me check on this.

...57Because I don't want to actually issue a query here.

01:11:15I'm in a I don't have real data.

...34Yeah, all right. I was trying to avoid hitting the database. So I guess I could say, static 2a.

...53order sort, sort by even

01:12:30Okay.

01:13:04So I'm calling a reload here because if that initial creation of the headline touches those collections after or before this, they won't get saved. So this really does want to just be a create!.

...31And the story is mandatory.

...48I think this wants to call refresh story before save.

...59I don't love that caching. It feels like that's going to bite me later, but I don't get a choice.

01:14:20So once those are in.

...35See this run, but just a little.

...49What are you mad about? Oh, it's this references hassle. I think it's got to go type and then name, right? No, that doesn't sound right.

01:15:19Yeah, it's name and then type. Why can't I say foreign_key: true? Wrong number of arguments, given four, expected three. Oh, because I did not edit correctly.

01:16:03OK. Can I just call that good for now? I guess since it's nullable, I don't have to make it a references.

...53Okay, progress.

01:17:07Arguments, given zero, expected one, all right. Who's failing? Where did I?

...31Story is not deleted. Count is failing.

...42Oh, I bet this is the deleted is going to be the one that takes a user. Yep.

...54So let's explicitly pass a nil there, because these things have to be cached for all users, and I would rather that mods see slightly odd numbers than users.

01:18:22I'm going to have to drop the table, aren't I? Yep.

...45And about 66, which has the same issue. OK.

01:19:02So that started working.

...16Story isn't merged under this headline. That's a weird one.

...29Story headline ID equals ID. What line is failing? 24?

...52this one needs to also say where Story.update_column. Or what was it? update_attributes. Headline ID to be h.id.

01:20:24Use update instead of update_attributes. Does that? I want to make sure that'll bypass validations. Not validations, timestamps. Let's check.

...54So update_attribute, singular, skips validation, calls callbacks. No, that's what I don't want. I don't want to touch those timestamps. Why? Because I won't have any, but I only really use those for caching now. So that's okay, I guess.

01:21:26But in a migration, I'm really wanting to do less. So let's just call update_columns.

...56Can't quote story. What does that mean?

01:22:07And then 25. When I save the headline, what is it quoting?

...33That's a weird one. I got to see the full trace back, I think.

01:23:09All right, so create headlines 25. And then we're going into all of this active record stuff.

...22What are you trying to quote?

...30And I think we're on the first one still, right? If I scroll way up. We find the stories, see if we have a headline.

...49No, we must be on the second or third one, because I'm seeing the click. Let's count things up. So it looks like the third one.

01:24:23Let's get this one up. I sure don't know this error.

01:25:07Got a minor moderation issue. Let me handle this real quick.

...39What was that name? Now your C is painful. There we go. Just a troll in the chat room.

01:26:53Alright. Tedious little trolls. Always the same shit.

01:27:21So what's our last line in our code? 25? Yeah, so that makes it hard to know which one. Oh, this one. It's the last comment to that.

01:28:16Hang on. Let's just make the database do it then.

...48Is this going to be max? Yeah.

01:29:07That's promising. No exceptions. Let's...

...21Let's look at these. So now we've got three headlines. The last one has 17 comments. Let's select count(*) from stories where headline_id = 3. So that matches.

01:30:0817 versus 18. OK, yeah, so the difference there is just that one comment is deleted. Why are all those hotness values the same?

...44On stories, hotness is a decimal(20,10). And here it's a decimal(10,0). So that's not good.

01:31:21Let's get that three out of there. I mean, I still only have test data, so there's only a couple of hundred stories, but still just kind of see these things.

01:32:29All right. So if that works, the to-do list for this is the headlines model. So there's a migration. And is that enough to ship? Because we're copying all of this data up.

01:33:10Feels like that's enough.

...36Okay.

...49If all that works, that might be just what I wanted of decomposing this into migration. Yeah, I don't need to keep this log.

01:34:35So there's that. Let's look at the to-do list.

...50Yeah, so the to-do list had refactoring these. And then the next thing is the count. These fields can get migrated off one at a time.

01:35:08I feel like I'm in pretty good shape there. The next thing I really want to do is run this, restore a backup to my local dev.

...34If I do that, it's going to take a while because restoring a backup to local dev takes more than 20 minutes. And I don't have something I want to be working on besides this. I guess I could start that next commit.

01:36:32twitchtd hi pushcx, I'm back from Europe :)
All right, so that's starting in the background. Let's see how that goes. Oh, hey, TD. How's it going? Speaking of the database, I'm doing some refactoring for the merge story model.

01:37:11This was a thing I had been tinkering with. So let's see how that looked out. So.

...43Headlines join stories on headlines.story_id and stories.id.

...57OK, so we are getting wrong numbers. Well, I changed the hotness algorithm a little.

01:38:12I want to see that one against real data too. So let's make a list. Run the migration against a prod restore. I want to compare hotness from the data because I fixed, what was it, the modifiers.

...53So I liked this change.

01:39:02Side by side.

...18twitchtd I'm going to try to dedicate some time this week for sqlite to brush up on the remaining work that's left and probably will leave some questions for you by next office hours
Oh, cool. It's great that you're getting back to that.

...26So I changed this one to say base because I renamed that variable from base to modifier.

...41Where not? It's not by the submitter. It says score plus one.

...57This one isn't helping anything.

01:40:10twitchtd to be honest, after the trip, I forgot what I needed to do, I remember we were talking about performance testing but I'm completely out of it, luckily there are github comments to remind me
Let's see what it's running. We've had a good... Yeah, I was thinking of performance because twice now in the last couple of weeks we've had slowdowns on the site because of bots aggressively scraping it, and I added a cache. We have these short IDs for comments that just 301 over to the full comment URL, so you can go to /c/abc123 and it takes you to, you know, the story's full URL and the anchor of the comment. The overhead of that one lookup was enough that we were getting spidered aggressively enough to slow down the site. So I added a cache on that, but it felt a little ridiculous to have to. And I couldn't help but think, you know, if the database was right here locally on the machine, I wouldn't need this cache.
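The cache described here can be sketched in plain Ruby. This is a minimal, hypothetical illustration and not the site's code: `ShortIdCache`, the lookup block, and the URL shape are all made up for the example, and a Rails app would more likely lean on its configured cache store.

```ruby
# Hypothetical sketch: memoize a short-ID -> full-URL lookup so repeated
# requests for the same short ID do not repeat the expensive lookup.
class ShortIdCache
  attr_reader :misses

  def initialize(&lookup)
    @lookup = lookup # stands in for the database query
    @store = {}
    @misses = 0
  end

  # Return the cached URL, running the lookup only on the first request.
  def fetch(short_id)
    return @store[short_id] if @store.key?(short_id)

    @misses += 1
    @store[short_id] = @lookup.call(short_id)
  end
end

# Made-up URL shape, purely for illustration.
cache = ShortIdCache.new { |id| "/s/story1/example#c_#{id}" }
3.times { cache.fetch("abc123") }
puts "lookups performed: #{cache.misses}"
```

A scraper hitting the same short ID repeatedly would only reach the lookup once in this sketch; the trade-off is that the in-memory hash grows without bound and never expires.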

01:42:01I started the database restore. So that's going to run for a couple of minutes off screen here. So I can't run that command. I can do it through the other version though.

...37Where not. Stories doesn't exist because I'm restoring it. Can I just say story? Because I just want to see the SQL that it wants to run. Story.new, merged comments. Sum comments' score plus one. It's not adding one to each, is it? It doesn't show me.

01:43:16Let's check docs. That restore says it's got four and a half minutes on it, and I don't believe it.

01:44:06This one not.

...33So I've got that cursor on a comment that says that if a story has many comments but few votes, it's probably a bad story, so cap the comment points at the number of upvotes. Hacker News actually has an even stronger version of this where, if individual comments start getting voted up above the story score, it assumes that those comments are thoroughly debunking the story and it sharply lowers the score, or the hotness, of the story. On the other hand, I don't think we need to be that aggressive. We have had a number of really good discussions come out of stories where there's kind of a mediocre blog post that gets a really thoughtful discussion.
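The capping rule from that code comment is small enough to sketch on its own. This is only an illustration under assumed names: `capped_comment_points` and its arguments are hypothetical, not the method in the codebase.

```ruby
# Hypothetical helper: cap the points contributed by comments at the story's
# own upvote count, so a heavily discussed but lightly voted story cannot
# ride on its comments alone.
def capped_comment_points(comment_points, story_upvotes)
  [comment_points, story_upvotes].min
end

puts capped_comment_points(40, 3)  # many comment points, few upvotes: capped
puts capped_comment_points(5, 30)  # fewer comment points than upvotes: unchanged
```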

01:46:17I don't know that I believe this don't immediately kill stories comment. I've been looking at that for a decade. I suspect it's not doing what it thinks it's doing.

...46Looks like the database finished restoring. That wasn't so bad. Why am I not seeing my query, though?

01:47:12Yeah, that plus one is just wrong.

...35It's not pi 1. It's 2 1.

...49And the big change here is this. Replacing the

01:48:00very small change from the modifiers by having them multiply the score.

...17Okay. Let's see if this migration runs.

...28Off to the races.

...37This will probably take a second. So we look at the database.

...51We need 119,000. We have three. We have four. You are taking your sweet time. Five. Why is that so slow?

01:49:16Oh, my God.

...28What is the show process list?

...38You are really chugging away.

...49Why is that so slow? Especially locally. Like, I know these things are all huge, hitting the database, and it's making like eight round trips, but that's Active Record.

01:50:13I mean, if the query has to run overnight, that's not the end of the world. It's just ridiculous. I always feel a little weird doing performance work on migrations. How fast does a program that you only run once need to be?

01:51:22Yeah, so it's working, it's just slow. It's going to be like an 18-hour migration.

...50I don't know if those numbers are reasonable. So where was that hotness query? Yes, we're getting slightly different numbers as expected.
ghost_user_1984 isn’t the hotness math is just a mess?
But the thing about such an opaque number is it's hard to know if it's wrong or not.

01:52:22ghost_user_1984 I think it’s always wrong by design?
Yeah, the hotness math is a mess and I'm tidying it, and I feel like I've caught two bugs in it by going through it. Always wrong by design? Could you elaborate on that? So here, let me swap these windows around so they're in a more useful order. Let me get rid of that. Here we go.
ghost_user_1984 back when I did that exercise in poking at it, we model a curve and don’t care about the exact points on the line
So on the left is the existing story model hotness, and on the right is the version I'm touching for the new headline model. There's a couple of small changes: instead of calling this base, I called it modifier, because that's what it is, the sum of the tag modifiers. And then there was this comment score plus one, and there shouldn't be a one. Hmm. That's been a few years and I don't remember the particulars of your work.

01:53:52ghost_user_1984 so if you want to change it, I think you need to start by plotting it?
ghost_user_1984 because that was the best tool I had to explain it at the time
That's probably fair. I wasn't yet changing it. I was making this new column.

01:54:07The thing that really caught my eye when I was reading was we take this base, which is the sum of the tag modifiers, and it's just added in.
ghost_user_1984 yeah that was by design
So it really, like, as soon as the story has three or four votes, that modifier doesn't matter. Because the modifiers are all in the range of zero to one. And seeing them in the range of zero to one has me think they should have been multiplying rather than adding. I'm curious why you say that was by design. Because this is... still mostly JCS code.
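To make the adding-versus-multiplying point concrete, here is a toy comparison. It is a sketch under loud assumptions: it deliberately ignores the log and time terms of the real hotness calculation, and `additive`, `multiplicative`, and `relative_change` are hypothetical names, so it only illustrates the two ways of combining a score with a modifier in the range zero to one.

```ruby
# Toy model only: the real hotness math applies a log and a time term.
def additive(score, modifier)
  score + modifier
end

def multiplicative(score, modifier)
  score * (1.0 + modifier)
end

# How much the modifier moved the score, as a fraction of the score.
def relative_change(base, adjusted)
  (adjusted - base) / base.to_f
end

[1, 4, 20].each do |score|
  add = relative_change(score, additive(score, 0.5))
  mul = relative_change(score, multiplicative(score, 0.5))
  puts format("score=%2d additive=%+.3f multiplicative=%+.3f", score, add, mul)
end
```

In this toy model the added modifier's share of the total shrinks as votes accumulate, while the multiplied modifier keeps the same proportional effect at every score.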

...55And that says round seven, but then this thing has a precision 10. So we're just kind of wasting three. So we might as well round to match the actual column.

01:55:27ghost_user_1984 the idea was that you didn’t want to have the hotness overpower the community votes until it really ages out
ghost_user_1984 it’s also been a few years since I looked at it
Hmm.

...35ghost_user_1984 I’ll see if I still have those notes and sent it to you
I've been thinking for a while that the hotness modifiers are just not strong enough. Yeah, please do.
ghost_user_1984 that method is entirely vibes
It's been a long time and I sure don't remember. Actually, I'm going to poke in my notes folder. It really is at this point.

01:56:44All right. Yeah.
ghost_user_1984 also if I’m remembering this was copied from Reddit
And this sign can probably get dropped.

01:57:19It's 52. Yeah, so the changes they made are clearly just nibbling around the edges, but these are all old stories rather than recent ones, because it's starting from story ID 1. Yeah, it is from Reddit, but boy, has it changed.

01:58:11Difference in time in seconds. Difference between upvotes and downvotes. And then here's that sign math, right? And z is the maximal value, which decreases the score as time goes by.
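For reference, the Reddit-style ranking being read aloud here is commonly described along the following lines. This is a sketch from memory of the widely circulated formula, not from the stream: the epoch offset and the 45,000-second divisor are assumptions, and `reddit_hot` is a hypothetical name.

```ruby
# Assumed constants, recalled from commonly cited descriptions of the formula.
REDDIT_EPOCH = 1_134_028_003
SECONDS_PER_POINT = 45_000.0

def reddit_hot(ups, downs, submitted_at)
  x = ups - downs                        # difference between upvotes and downvotes
  sign = x <=> 0                         # 1, 0, or -1
  z = [x.abs, 1].max                     # never take the log of zero
  seconds = submitted_at.to_i - REDDIT_EPOCH  # difference in time, in seconds
  (sign * Math.log10(z) + seconds / SECONDS_PER_POINT).round(7)
end

now = Time.at(REDDIT_EPOCH + 1_000_000)
puts reddit_hot(10, 0, now)
puts reddit_hot(1, 0, now + 45_000)
```

Under these assumptions a tenfold increase in net votes is worth the same as being submitted 45,000 seconds (12.5 hours) later, which is why the two printed values come out nearly equal.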

01:59:15Yeah, this is definitely what our hotness is based on. It even has the odd log and absolute value in here. And ours is kludged up to include comment score a bit.

...58Let's go look at something on because my local database isn't funny. No, actually, no. I don't have to go look at it on prod. One of the things I've had a feeling is that the video tag and the audio tag don't tend to start discussions. So I wanted to investigate that because that's kind of the primary motivation for changing the way that hotness modifier applies. Because if audio and video are not starting good conversations, they should be pretty heavily penalized. And they're definitely not now.

02:00:58So what do I want? I want to know if they get fewer votes and fewer comments than stories not tagged either of those. So score, come on.

02:01:43And I wanna exclude. There are few enough of them. It's the fact that there's two and I know I'm gonna goof this group.

02:02:20I'd love a distribution, but let's start there.

...35Where?

...48All right, so there's 115K. That's about right. And then where ID is in those, about five. Yeah, so that adds up correct. Okay, so I partitioned them correctly. So take the average score and the average comment count. Will I get anything meaningful? 12 and five without? Seven and two with? That actually sounds pretty plausible. Is there a quick way I can histogram these? Yes. Let's say score divided by five.

02:03:49Can't stop her. 4 divided by 5. Group by 2. Group by 2 ascending. Group by 10. So there's a bunch of stories. Most stories get a few votes. We're over here. And then if I say the stories that don't have the audio video tag, obviously there's more, so it's a much wider range. That's a weird distribution. Oh no, it's not a distribution. It's fine. So then the median is around here, 0.7.

02:05:10I'm not bucketing well. These are just, I gotta,

...24There we go. That's what I wanted.
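The quick histogram trick here, dividing the score by a bucket width and grouping on the result, looks like this in plain Ruby. It is a sketch with made-up data: `histogram` and the sample scores are hypothetical, and on stream the grouping was done in SQL.

```ruby
# Bucket values by integer division and count how many land in each bucket.
def histogram(values, bucket_width)
  counts = Hash.new(0)
  values.each { |v| counts[v / bucket_width] += 1 }
  counts.sort.to_h # order buckets ascending
end

scores = [0, 1, 3, 4, 5, 7, 12, 12, 26] # made-up sample scores
histogram(scores, 5).each do |bucket, n|
  low = bucket * 5
  puts "#{low}..#{low + 4}: #{'#' * n}"
end
```

Empty buckets simply do not appear, the same way they would be missing from a grouped query's result.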

...34Yeah. So the median is zero for both of these, but it is right-shifted. So there's just more here. So what's this, call it 38 to 800, so that's falling off by 6x, and here it's only falling off by 3 from 0 to 1, and then from 1 to 2 by 4, and then by half, and this one only by half, and then by half again. So yeah, where it doesn't have the tags, it is noticeably lumpier to the right. And then let's do comments count. It's not good comments count, but most of our comments are good. So here's our base.

02:06:46Here's the one where it has the tags. So drops by a tenth, drops by, what is that? Drops, no, I was going to say drops to. Drops about 90 percent, and this one drops about 95%, 98. Can't do the percentages in my head fast, but there are just so many fewer discussions proportionally when it's audio and video. This is a pretty stark difference. And, you know, this one's got a longer tail because there's so many thousands more, but...

02:07:47It just falls off really hard, really fast. So in bucket three. Can we normalize? How many did I say there were? Was it 115 or something? So there's 51, 36.

02:08:28gtfrvz compare audio/video tags against random other two tags
That is a good idea. Let me grab this other number first, and then we can do that. I just want to see these proportionally.

02:09:09gtfrvz 23, 42
Yeah, so I'm not seeing things. It is a pretty stark difference here. OK, so any idea for any particular two tags? You could pick two programming languages. No, see, the thing is, we are going to see a difference.

...42So like, right off the top of my head, Rust stories attract more comments than Perl stories. And that's just the language is so much more popular, right? So what is, if we compare against two random other tags, I'm not sure that gets us anything.

02:10:22Because if we pick two tags, so 23 and 42, sure, nice familiar numbers: if one of those is meta, it's going to have way more comments. If one of those is Perl, it's going to have way fewer comments. But that doesn't tell us anything about audio or video or these comparisons. Could you say some more about what value you see in comparing against random tags? And I'm happy to run the query. I just want to get a feel for it. What are we expecting to see before we run the query? Let's pre-register our experiment. I have to get rid of this division because I don't know how popular these tags are.

02:11:2823 and 42.
gtfrvz not as smoothed out as the global view
I'm guessing it's taking you a second to type up what you're thinking about the value of this comparison being.

...51Not as smoothed out.

02:12:04Yeah, but even accounting for that. Like, in this case there's going to be more buckets because there's going to be, you know, a longer tail. But this query, where I did things on a percentage basis, so I normed everything to be in the range zero to one, you know, it tells me, OK, you can just ignore these buckets, right? But here, like from 95 to 84, 308, 835. You can just see that not having the tags is lower at zero and bumped way the heck to the right. So I think I've already seen the useful thing I wanted to see.
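Norming each histogram into the range zero to one is what makes two groups of very different sizes comparable. A minimal sketch, with invented counts and a hypothetical `proportions` helper:

```ruby
# Convert raw bucket counts into each bucket's share of the total.
def proportions(counts)
  total = counts.values.sum
  return {} if total.zero?

  counts.transform_values { |n| n.to_f / total }
end

# Made-up bucket counts, only to show the shape of the comparison.
with_tags    = { 0 => 80,   1 => 15,   2 => 5 }
without_tags = { 0 => 6000, 1 => 2500, 2 => 1500 }

proportions(with_tags).each_key do |bucket|
  a = proportions(with_tags)[bucket]
  b = proportions(without_tags)[bucket]
  puts format("bucket %d: with=%.2f without=%.2f", bucket, a, b)
end
```

With shares instead of counts, a smaller share in bucket zero and larger shares further out reads directly as the distribution being shifted to the right, regardless of how many stories each group contains.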

02:13:02If I had done this as 1 minus and times 100, it would be even clearer. All right. So I'll run this one because I said I'd run it. So here's the one for comments count. And here's the one for score. And let me spell score correctly.

...37Yeah, so 4318, that drop is not as wide. This has a shape very much like the overall one. Where'd it go? Yeah, where it went. 96 to 906. So I'm just looking at this and it's like, OK, so here's a 90% drop, a 66% drop. It has the same basic shape as the overall one. I don't know, did we learn something from that?

02:14:34Go and release. Yeah, release is another one that I would bet gets fewer comments.

02:15:11154, man, that's going to run forever.

...27It's just ticking along. It's a different query every time, so it's not stalling on anything. It's just doing so many round trips.

02:16:16Save on these. That's effectively never going to finish. All right, no. Because it got that far, I can just roll back.

02:18:37So there's a couple of round trips saved.

...58gtfrvz `ssum`
Thanks for catching that.

02:19:05So this is going to hit it twice, but that's probably OK. Start that again. If I don't instantly get an exception, I should be ticking along faster. One, one, one, not really faster, huh? That's not great. This is a poor man's sampling. Are you still selecting the score? Because up here... Now it's using the existing score, it should not be

02:20:32doing that. Sum comment score. No, okay, it's this part.

...48And sum hotness mod, yeah. I don't think there's much performance to pick up here anymore.

02:21:01Because I'm going through these associations, even if I tried to preload stuff, you can't preload an aggregate. So I can preload the tags, but the sum is still going to create a query. I can preload the stories, but summing them is still going to create a query. I can preload the active comments. Well, I can preload the comments, but going into the active scope and calling count are both things that would run a new query regardless.

...42Well, it's faster. All right. I'm going to make this a short stream today because this is going to take forever to run in the background. And to feel confident in the score changes, I want to see ghost_user_1984's notes. And I want to see this migration finish, and I'm not going to see that anytime soon. And I got family I got to talk to on the phone. Yeah, so unless somebody has any great ideas, I'm going to roll up the stream now, I think. And we'll see how this comes out. This one, I may deploy this before the next stream because this really just adds a table that isn't used. And it makes sure that it gets filled with data. Actually, does it? No, it doesn't, does it?

02:23:18Yeah. So this has to create a story.

...46Once that story is created, this needs to move into a helper.

02:24:13What do we call it, though? What I really want to say is something like headline create or story.

...29And then that calls. And that calls story.

...44It's the.

...51Honestly, it's this stuff. Yes. The rest of it should all be updated automatically.

02:25:25It's got to do these roll ups twice. That's part of why the migration is slow. All right. So this logic also really wants to exist over in the merging code. Yeah.

02:26:38All right. That's from the column. And of course, since I changed the migration, I have to restart it.

...58I'll let you know how long that takes to run. All right, everybody. Thanks for tuning in.
gtfrvz b2wGG
A little bit shorter stream today, but nice hanging out, working on stuff. Take care.