Let's do a different kind of evil
Streamed
- Reviewed PR #1360 for link shortener detection - merged after confirming non-ASCII domains are allowed; so many different URL validations.
- Reviewed PR #1359 about UI/accessibility improvements for story disowning - needs changes to handle CSRF and no-JS fallback properly.
- Started on self-promo detection improvements - exploring query performance for checking if users have well-received comments (score > daily avg) in last 6 months. Added date-based index but still hitting performance issues with correlated subqueries.
- Discussed challenges with GitHub’s limited UI customization forcing use of bots like Zulip to manage contributor interactions. Really don’t want to keep nodejs in prod.
- LLM slop HaikuOS post.
scratch
topics:
* GH inspires more cookie-licking, https://push.cx/stream/2024-10-21
* https://lobste.rs/s/avgrck/html_link_button_is_question
* PR review
* improved the description of code I want: https://github.com/lobsters/lobsters/pull/1359
* merging ban code, asking for a migration: https://github.com/lobsters/lobsters/issues/1261
* self-promo detection
self-promo:
* users: recent well-received stories and comments
* expand 'is_author' to 'benefit from'
* reminds me of the new FinCEN beneficial ownership req in the US
* complexity: author might be commenting on their own piece but submitted by someone else
* authorship model?
dependent subquery on average comment score is the killer slow part
average_comment_score table (date, avg) for last 180 days
Broken Code, Jeff Horwitz
- Zuckerberg famously implied acting different in different social spaces is improper
- Book is about thoughtless incentives
post-stream:
look up: 'No ARIA is better than bad ARIA’
update my google search to ?utm=16 to remove AI "summaries"
title:
just a wafer thin method
let’s do a different kind of evil
Transcripts are generated with whisperx, so they mistranscribe basically every username and technical term. They're OK but not great, advice appreciated.
Recording
01:21dpk0 morning campers
dpk0 so, a topical discussion point to really derail things before we even get going
dpk0 on the front page right now there is a post which acquired a couple of early spam flags and was greyed out for a while
dpk0 its author has only used Lobsters to post his own blog posts … but he posted three of them over the course of a whole year
dpk0 a tiny subset of all his blog posts
Good morning.
Good afternoon.
dpk0 he’s clearly only posting ones he think are on topic
Hi, DPK.
dpk0 yes
Is this the... Just based on your description, are you thinking of the one about GCP, Google Cloud?
We had a big discussion.
Yeah, we had a big discussion about this guy in the chat room.
dpk0 oh, i’ll go and check that out
dpk0 your screen is not shared yet btw
looking at this one by tony met yeah a so moderator sent him a note about self promo and well you unfortunately oh my screen's not shared yet excuse me thank you i meant to click in as i unmuted
But the cat came up and meowed at me and I was wondering if he was going to leap up and distract me right as I was getting started.
But wondering if I was getting distracted is also a distraction.
So...
So yeah, you're probably looking at this GCP post and I believe you said yes.
dpk0 i actually looked to see if he was a Google employee or Google consultancy
A moderator contacted him to remind him about the guideline on self-promo.
You say you're going to go and check out Chet.
Inconveniently, yeah, a bunch of people looked to see if he was an employer or a consultancy because it was really not well written in a way that I wondered if it was LLM slop or...
I suspect he's a non-native speaker.
His comments on the story has some space around the punctuation, which is really characteristic of native French authors.
And this was a, you know, you say that you're really derailing things before we get going.
But a big thing from the last stream was talking about how the new user stuff was inexpressive.
And this story is actually a really good example of it where
the author had no recent experience or activity before they restarted some self-promo.
So I think it's an entirely topical, oh, I didn't clear out notes from last time I did these things.
Yeah, so that's not at all a distraction.
Okay, big boy, how about you settle?
04:14yeah so i actually asked the chat if they saw hallucinations in the post where there was just kind of plausible nonsense about gcp because it the intro and outro were so generically structured very five paragraph essay my best guess for that post would be
author's a non-native speaker they started from commanding an llm to generate slop about their topic and then they edited it because there are a couple of typos in there and for all their irritation llms actually don't make a lot of terrible typos or misspellings and that post had a couple but
They were all concentrated by the beginning or, well, one in the beginning, one a third way in, as opposed to towards the end, which was the part that was especially just bad writing.
dpk0 oh yeah. spaces before commas are not usual in French, though (thin non-breaking speak before question marks and exclamation marks and inside of « guillemets », but not before commas or full stops …)
dpk0 that’s a bit odd
I don't remove stuff from lobsters just because I don't like it and think it's bad writing.
That kind of quality issue is,
Not before commas are full stops.
Maybe I'm thinking of a different language then.
But I've seen it in other authors.
The spaces around commas and periods is so odd.
And when I see it in English text, it usually means that someone is on a phone keyboard that's not set to an English locale.
And if it's not French that's doing that with spaces in front of commas and periods, I don't know what it is.
06:16So yeah, I don't usually remove stuff for being poor quality.
That's handled with people click hide on it and don't interact.
And mostly it doesn't get upvotes.
So let me swap these order around.
pushcx https://push.cx/stream/2024-10-…
dpk0 and yeah it’s not LLM stuff because LLMs don’t make punctuation and spelling errors like this
On a previous stream, the 10.21 stream, for anybody who wants to dig in the archive, and I will throw the link in here, I talked about cookie licking, which has been kind of an increasing issue for the Lobster's repo this year, where people show up and want issues assigned to them.
dpk0 i think some of those have been spam bots
yeah their comments are certainly not llms but parts of the post the very generic so in a american education or you know maybe i'm dating myself and they don't call it this anymore but we used to call it a hamburger essay and a hamburger essay has an introduction a conclusion and three paragraphs in the middle of points a b and c why my argument is wrong
This has got to have a Wikipedia page, right?
Yeah.
The more formal term for it is five paragraph essay.
pushcx https://en.wikipedia.org/wiki/F…
I do actually love that Americans are like, no, let's name it after our characteristic food hamburger essay.
dpk0 (spam bots comment was directed at saying people asking to assign GH issues to them)
And the structure of that post is very close to a five paragraph essay where there's
an introduction to the topic of, I'm going to talk about AWS first GCP and why GCP is better.
dpk0 shrug. i think some people just never really learned to write any better
And then there's, I think he actually had four points as to why GCP is better.
And then there was a conclusion that wrapped up and was like, and so I told you for these three reasons, the GCP is better.
So it is.
And it just read very much like,
this kind of beginner essay they're very easy to criticize because they come off as formulaic i actually think they're a great practice for teaching rhetoric to students i think it is a great way to give students a formula to follow to talk about well write a five paragraph essay write
a short news article in inverted pyramid style.
And they kind of learn to organize their thought and work.
It's such a cliche structure that they always come out pretty clunky, but that's okay for beginners.
It was just striking to see that on the submitted to the lobsters today.
Yeah, spam bots, I honestly have no idea
I haven't clicked into the profiles and I don't tend to try and judge people in that topic.
Where was it?
There's one of these just this morning.
I think it was on this one.
Yeah.
10:00Yeah, so it says this person is a non-native speaker and a beginner.
...10dpk0 yeah, that doesn’t look like a spam bot, maybe?
But then also they have a very new GitHub profile.
I don't know that it's very useful to try and figure out if this kind of comment is a spam bot.
But we've gotten, I put GitHub griping on my list because
We've gotten a rash of these, and while I am thrilled to get potential contributors, having that expressed primarily as cookie-licking is pretty frustrating.
And I can't fix the GitHub UI issues that lead to it.
There is this assignee column, and I can't turn it off, even though we don't use it.
and when you come to an issue is there one that doesn't have comments so like this one i don't know what it's about but i can't insert any kind of direction to potential contributors here even if i write a contributing.md there is no point here for me to say hey potential contributors before you lick this cookie please come and read our policy document or just don't lick our cookie let me get rid of the
Assignees button.
And it's especially frustrating because cookie licking is a very common problem in open source communities, I would say.
It's sort of a good problem in that old school open source communities didn't have it because they just ran on a, yeah, they did, but they ran on SourceForge and mailing lists, which have such a high barrier to entry that you just get fewer contributions and the few that show up are, you have to be kind of committed to want to sign up to a mailing list.
And one of the reasons GitHub won big was if you make it easy for people to get involved, they get involved.
And the experience of using GitHub is overall so much more pleasant than SourceForge or if I never see Piper Mail again, I wouldn't mind.
And so it's very strange that
They've sort of never addressed cookie licking and everything in the UI points, especially newbies to do it, which is very frustrating.
And it's not huge changes.
It's if I could remove the assignees field or remove it from the column.
Well, that actually is a huge change at GitHub scale and complexity.
But they have things like if you go to open an issue,
I can at least put some text in here, even if it is out of date, because nowadays I mostly do it on stream.
But I can, you know, there's this sort of, it's not a lot of UI.
It is kind of, I can't customize this form.
I can't say there are these fields.
I can't add validations.
I am sure somewhere in GitHub, someone has written up the white paper for a whole giant feature set for allowing people to customize the new issue form or the new pull request form.
And that's such a huge topic that opens an enormous can of worms and it's difficult to implement at GitHub scale that kind of what comes out is that, well, at least we could have some placeholder text.
for the one big text area field.
And if you have a placeholder text, you can do anything.
You can explain something, you know?
I say do anything.
It's not a do, but I can freely write anything and I can give people a link.
I mean, I contributed to Sorbet last month because I went to file the bug I saw and Sorbet's bug tracker said,
yeah i mean it said it more politely and a little more circumspectly but it was basically like if you file a bug about the standard library it's going to be here for a while hanging out but you could just take a stab at a pull request it'll probably get merged pretty quick
And I'm kind of glossing.
That was what I took away from it.
But it prompted me to say, all right, well, let me actually try and fork the repo and see how easy this would be.
And it was easy enough I could get that one started and they were able to finish it.
So that worked out really well.
But here, there are no points of customization, even placeholder or, you know, like a placeholder comment here where I could write some text that says,
This is how our project works.
And so to avoid being up on too much of a high horse, I'm actually reading what this bug is.
It is also a UI bug.
I know what that one is.
But one that I followed from, filed a couple of weeks ago, I believe I,
filed this while we were on stream because I saw it and I was like, oh, right, I forgot that we had those and we shouldn't.
It's very old code.
The UI that appears above comments is a series of links.
And we do the a href equals anchor and attach some JavaScript to make it work, but they just flatly don't work if you don't have JavaScript enabled.
and I would like them to work with JavaScript-enabled.
I saw we have one open poll request.
Eduardo and I talked a little bit about revising it.
We saw this one on stream a week ago, and it was not the right approach, so I explained to myself some more in the comments and then directly with Eduardo in chat.
And aside from doing PR review,
I wanted to talk about it because there was a nice post this morning.
Oh, I meant to post in the Monday, what are you doing this week thread about streaming?
And once again, I forgot.
I am super bad about that one for some reason.
I don't know why as a chore, I'm not managing that very well.
pushcx https://lobste.rs/s/avgrck/html…
Sorry about that.
So there was an article this morning
about in html do you want to have a link or a button and if you have a thing i'm going to summarize it myself i know they wrote a tldr but in summary if you have something that looks like a link it should work like a link which allows things like middle click to open a new tab or control now shoot as a control click or control shift to open in a new window it's been
forever since I've done that because tabs.
Which is exactly the open thing we have now.
And we got a real helpful comment that we should probably add role equals button as an accessibility issue.
I've been trying to catch more of these role equal things.
And while I have found
Where is it?
18:25MDN has a pretty, as you would expect, has a pretty comprehensive documentation for ARIA. It's
...41I'm not sure how much of this is what matters to folks who are using screen readers in practice.
dpk0 i recently learned that using ARIA is considered something of an anti-pattern
I have hoped to find an ARIA guide of, hey, here's the 101 of the most important things for especially screen reader accessibility written by someone who uses a screen reader every day.
I have not yet found that guide.
ARIA is an anti-pattern.
That breaks my heart.
dpk0 https://webaim.org/projects/mil…
I thought it was a best practice.
If you have a linked DPK about it being an anti-pattern, I would be real curious to see that.
dpk0 i think you may have just scrolled past it on that page
I've had a couple of interactions over the years with folks who've said they're using screen readers for lobsters, and they have pointed me at ARIA stuff.
What are we looking at?
Okay, so ARIA has been steadily increasing over the years.
19:54albynton Hello!
But it's associated with more accessibility errors.
That's really counterintuitive.
dpk0 the reason is that ARIA doesn’t automatically give the actual functionality of the role you assign it
Hey, Albinton, welcome back.
The reason is ARIA doesn't automatically give the actual functionality of the rule.
Yeah.
dpk0 ‘For instance, native elements have built-in keyboard accessibility, roles and states. However, if you choose to use ARIA, you are responsible for mimicking the equivalent browser behavior in script.’ (the MDN page)
You know, I wonder if this association that they found, just this first paragraph, I wonder if what's happening is
Goodhart's law.
Native elements have built in everything.
If you choose to use ARIA, you're responsible.
Yeah.
That's a big issue and why I lean towards native elements as much as possible.
I wonder if this increased ARIA usage being associated with higher detected errors could be Goodhart's law where maybe corporate development says we have to be accessible.
If we have ARIA attributes, we have checked the box to be accessible.
And so perhaps people are being incentivized to do the easy to track thing instead of the very hard work of thinking about what all of these mean and what it's going to be like to use them in practice.
And the places that have, yeah, the places that have thoughtless metrics driven development are going to tend to have more errors.
I don't know.
dpk0 yeah, that sounds plausible …
This isn't,
I don't know that this quite meets what you said, DPK, about ARIA being an anti-pattern.
dpk0 that was probably overstated
Because it doesn't caution against implementing ARIA.
dpk0 the phrase is ‘No ARIA is better than bad ARIA’
It seems like a very odd survey response that perhaps overstated.
Yeah.
It's just such a weird correlation.
22:09Hmm.
And it's not...
Okay.
All right.
albynton From what I remembered about the Aria anti-pattern argument (I'm not an expert about this) : ARIA is hard to do well, its is safer to only use the browser's native elements, unless you know better.
So sliding from GitHub's UI issues to our UI issues to, hey, let's fix this explicit issue.
Let's take a look and see how the pull request is looking now.
Oh, yeah.
Speaking of UI confusion, I was confused about Site Features.
I was pretty puzzled when I last reviewed it and I said, hey, this data confirm thing won't work.
And I said that very confidently and very incorrectly, where, yes, it's Rails UJS that we deleted years ago,
but I wrote the code to sort of carry over that one feature from Rails UJS.
So this is my little five lines of code, and I totally forgot about it.
I usually have a pretty good memory for code, so embarrassed by that one, but mistakes happen.
I appreciate Eduardo pointing it out.
So let's see the current version of this.
23:41one of the i mentioned it over here but i want to talk about one of the so tostal mentioned the set role equals button on these links although it really sounds like we want to do the reverse of use a button and style it to look like a link that might be better semantically and again That's my understanding of the semantics. I would really love to hear from somebody who uses a screen reader in practice. I've tinkered with... Oh, the phrase is, no ARIA is better than bad ARIA. That's interesting. I'm going to look that one up. Thank you for the pointer. I'm grabbing my... Come here. That is my pre-streamed tweet. Welcome to the Peter struggles with Linux clipboards. portion of the stream, I just wanted to make sure I looked that up. Alright. Alright, it's hard to do well safer to use the browser's native elements, unless you know better yeah in the case of lobsters. only the story submission form could be the tags field on the story submission form could be considered as re-implementing a native element and i believe without js it cleanly falls back to a multi-select it should be the last time we touched it i tried to make sure it did and then otherwise we use native ui everywhere which is small part of that is aesthetic but mostly it's i have used so many broken forms and in part because i use vim and then i use a text area and all of the keyboard shortcuts are wrong i think about how there's just a tremendous amount of functionality in browser ui like text editing fields and buttons and then i guess i use web apps pretty regularly that are broken in that exact way. So I try not to do anything except use native elements. It's been ages since I worked on UI that tried to be super fancy about these things. It's better to be functional. And also, you get to be done in about 5% of the effort.
26:27All right. this is an improvement right off of instead of the javascript knowing how to construct this url it just grabs it and then this i don't like at all every time we've had any kind of i actually don't think we have Any remaining, where are we?
27:11Oh, there's one. Oh, is this a round action? Yeah, all right, that's fine. So this is a general purpose callback that hooks for authentication. This is the only place I would use request get or request host. Otherwise, it is much more railsy to handle this in the router. So Eduardo has moved the existing code. Is this whitespace detection that I'm not?
...59Yeah, now that's a much clearer diff.
28:08Yeah.
30:00So there's that issue I raised of CSRF, although this comment, or I'm sorry, this code was written before I wrote this comment. I just hadn't seen it yet. And then there's the Rails practice. And then I want to leave the, yeah.
...52I didn't mention as long as I'm happy with something. Let's also say that it's a real easy pattern to slip into as a code reviewer that you only ever talk about the negative instead of, oh, I actually liked this thing, so I want to call it out. All right, so these are repeated because of the heinous inline partial. So this really is just the one code change in list detail that's replicated. That's fine. and saying link to disowned story disowned path. That's fine. It's the old style hash syntax, but we really don't care about that much, especially in views.
31:44Oh, yeah. And this is having to list the action here, the Rails router would have probably given Eduardo an error message without that. And that's one little hint that you shouldn't send both verbs to one endpoint like that.
32:37And I think there's a place where, so one of the, hold on. Minus four to allow the user to, to someone's story without jazz.
33:49And I think we had somewhere in one of these controllers. And I want to say, I'm not remembering. I would really like to give them the reference because somewhere we have an example of saying if the user asked for JSON or if the incoming request comes in as JSON, we will just render the partial and give it back to them to redisplay. But if they didn't, we will give them the whole page and I that's the pattern I want them to repeat. And that's not super clear here. Actually, let me look at this the other direction because it'll be up here where something that uses fetch with CSRF If I find who does it, not that one.
35:22Not that one. I think that's always JSON. Might be hide story, or it might be post comment.
...53anshulxyz hey how's it going
Yes, it's post comment, yeah.
Hey, Anshul, welcome.
Leaving some pull requests and then getting into coding.
Oh, hey, I realized I totally have forgotten my standard spiel that, hello, welcome, this is Lobster's office hours.
If you have questions about the site, the code base, the community, or you would like to run queries, we can do that on stream.
Just pipe up anytime.
And if nobody pipes up with meta discussion, I tend to review pull requests and write code.
And today I slipped right into reviewing pull requests, but that's the standard intro.
37:17created without JS.
38:17And so I wrote a comment, but a bunch of this is on me because so let's copy that link I wrote in the or did I do with the issue. I wrote a really light description of what I wanted here, I should have said more of this.
...58So hopefully, if anybody, I suspect once we do this first one, Eduardo may quickly zip through the others, which would be wonderful. Eduardo has been a pretty active contributor recently. But maybe it'll end up being somebody else. So I figured I should link that so that I don't waste somebody's time on another pull request. so these other two pull requests let me grab the link for the notes and come back here all right so on this one i
40:01all right so this one oh this one was previously reviewed on stream and i was a little distracted in that i looked at it and i thought it was correct and had a minor style issue. And I fixated on the style issue to the point of giving a demo, what, two weeks ago of a code tool that could have helped write a linter. I may still pick that up, depending on what happens in the next couple of days for me. But when I rereviewed, I realized the code was wrong. And this is still wrong. So the actual code change that this contributor has made is to make the menu transparent and the menu should never be transparent.
41:10This is. Okay, so they've left a comment saying they'll take a look at it. Ah, only 11 hours ago. All right, I was a little puzzled because I was like, wait, this code is still wrong, but the blue activity indicator was just their comment rather than a code revision. So hopefully they can take a look at that. The bug here. 601 is what was getting reintroduced that the menu was becoming opaque again there is a just a couple of css gotchas that interact here and this is one of those places where it just keeps tripping my my gut feeling that we are using css in a slightly odd or unsafe way because this is one of those bugs that every year or so we reintroduce one of these two bugs that either the menu becomes transparent or it goes underneath or both or to put it another way i know our css is not robust because we keep re-implementing this bug but i don't know any better way to do this.
42:52So this pull request. Let me look at something real quick. This pull request has some changes since I looked. Okay, so there was this odd. Yeah, this odd reject that I had seen. And they tested on that. So I pulled up the domain model here. Oh, it's
43:41So I see what's puzzling me here.
Let's come over here and I'll show it on screen.
So I knew that it picks up the domain from the URL.
Where is its regular expression?
Utils URL re.
We have a couple very similar.
We have
like three or four places that we parse URLs in slightly different ways.
And so I was misremembering where something happened.
And it's very easy with Rails to have less of a validation than you probably want.
So this is what I was thinking of, that when we grab the domain out of a
specialku I can't read regular expressions :(
url we filter down i had assumed that it doesn't even do what i remember so the thing that puzzled me here was at the end of the last stream when reviewing this pr i saw there were a couple of shorteners at the end of the list
that had emoji names, and I assumed that we had a validation here that domains had to be ASCII, that we were doing the, at some layer, we were doing that puny code normalization, because I know I have seen URLs that have been XN dash dash, where you know it's encoded to be in a different character set.
specialku exactly, xn-- is still ascii
Perhaps we do have... Yeah.
However, I thought we had code forcing that because I know I've seen them in the database, but it's possible we permit an emoji URL and we've just not had one submitted.
46:09What's the best way to select this?
Well, you know, it's over in story.
We have a regular expression for finding.
specialku :D
Oh, cat, cat, you don't Vim very well, cat.
Oh, thank you, sir.
Okay.
Ooh.
All right.
The cat is even late for Halloween with all of these oohs, sir.
So he was sleeping on the desk and he decided he should be in my lap.
specialku aww hi cat
So he just came and stepped on the keyboard.
All right.
So I could either, I don't wanna search for these specific code points.
What I really wanna do is throw away anything,
any domain that is ASCII only.
And I want to, I'll say what I would really like to do is reuse this, this URL.
So let's do this in Ruby instead of thinking of my development.
Come here.
47:43All right, so let's say story dot where, what is it? Story dot not where URL is nil. This is our long way of saying URL is not null. And let's just ballpark that. I think I have to say all to get into the, Oh, okay. I'm misremembering the order. I have to say .where or .not. The method chaining an active record doesn't stick in my head. 109,000. All right, that's about the right number. Let's pluck the URL field rather than load 110,000 story objects. At least this would just be an array. And let's go ahead and say limit 5. just so I can test with a reasonable number and then I will do on all of them. Oh, OK. So I said we're not nil and I got a bunch of empties.
49:05OK, that's a little more plausible.
...15specialku I found this by asking Microsoft copilot SELECT * FROM your_table WHERE your_column REGEXP '[^\x00-\x7F]';
It's my utils, colon, colon.
...28You asked Copilot.
Yeah, that's...
So I specifically want to find any URL where the domain includes non-ASCII characters.
And yours is going to find any URL that has any low ASCII characters.
Well, to 7F?
Yeah, I don't think Copilot understood what you were asking there, because that's going to be any ASCII
character.
specialku oh domain sorry
Even the non-printable ones that are not valid, that's not so useful, unfortunately.
So let's match the URL and then look at the matches.
I'm just going to trust.
And so instead of each, I want a map.
Let's see if this works.
specialku :(
Not even valid syntax, missing bracket.
Undefined matches for an instance of matched data.
Okay, so I just go directly into square brackets.
Okay, so this is using the URL to grab out the domain.
And now, I guess what I want to say is,
select or well let's write it as select just so i can see it and then i'll swap it to say reject but let's select any domain where the domain zero to nine
and a dot.
So let's say from start to end, the domain is ASCII.
So that's all of these.
And now let's throw away where from start to end, they're entirely ASCII.
And now let's throw away that limit and see if any of our 110,000 include non-ASCII characters.
I knew this was coming.
There's one in here or two in here that somehow have odd domains that don't parse.
So let's also throw in a compact because this is going to insert nils.
So I want to call, I'm just doing this the easy way, public send.
Oh, no, please don't put me in read line hell.
Thank you.
I was hoping to avoid.
Come here.
52:45I'm going to blame this one on the cat, too, because I kind of have to reach over him.
53:03dpk0 readline hell?
And I added that compact.
...12Oh, it's not domain.
It's got to be colon domain.
And we're in read line hell.
Read line hell is my cursor is not lining up anymore with the text I'm editing because something weird happens between being on the, when it line wraps,
specialku I am guessing that is what you have to ctrl + C to get out of
There's something about VIM's built-in terminal that doesn't read line correctly.
All right, so I didn't permit hyphen.
So we just found every domain that has a hyphen.
dpk0 oh weird
So let's say those are okay.
And then we have a couple with underscores and then one with, I'm gonna bet that this one with lobsters colon is a username and password.
dpk0 i’ve had that happen with badly-implemented line editing libraries but not with proper readline or libedit etc
And then, OK, we have a couple here that have non-Latin characters.
So that's acceptable.
OK.
dpk0 oh, right, maybe your terminal emulator
I thought we had forced those to be encoded.
Open NSFW.
specialku hardin.se
I'm not loading that one on stream.
Yeah.
One of my least favorite things in the world is figuring out whether a terminal bug, what layer it's happening at.
Is it Alacrity?
Is it setting Xterm 256?
albynton Well, they are latin characters, just not ascii
Is it the term info files on the remote host?
Is it running all of this through Vim?
Is it, there's just like nine layers and something like,
Yeah, they are Latin.
When I say Latin, I mean Latin in the character sense, character set sense, which is not matching the general purpose term there.
All right.
All right, so now that I understand correctly that we do permit and want to continue permitting non-ASCII, do we want to continue?
Should we force these to be encoded
I'm gonna say no, and not just because I know there are sister sites that are non-English.
I've seen sister sites in Arabic, in I think Greek, in a Chinese alphabet, so I don't know, not alphabet obviously, but character set, but I assume
Mandarin or Cantonese, I can't really tell them apart at a glance.
So I don't really want to include that validation.
I would rather let them keep working and let our domains keep working.
All right, so let me just back this up.
All right, let me explain what I was going on about there and look at his code.
All right, where's GitHub?
56:25albynton You're right, š is not latin-1
so all he did was drop that or i should say they i don't know kernel 53. let me look at this again if i'm going to demand other changes i want to write one comment demanding things rather than write one to explain my confusion and then five minutes later that's fine
...55Elbington, yeah, there's a really good distinction. So when I was saying Latin, I should have been saying Latin 1 to be real specific that I was talking the character set, not the general idea of Latin letters, which you are correct that they are.
57:24Okay. Okay.
...30That's totally solid. Yeah, it's a little noisy for a cron job, but that's fine. I can hack that stuff out. I don't need to make this person polish every last nuance that I might want. So that's resolved. And that's resolved. And that's resolved.
58:27specialku I mean when I hear latin letter, I think of US english A-Za-z :) I tried searching härdin dot se on the production lobste.rs but couldn't find it as a submission
What am I saying?
Prohibit saving a domain model with a domain attribute, including non-ASCII characters.
...51specialku so it would ban this härdin dot se domain?
espartapalma you may want to compare with the current state of the art in other places and their usage. For example, Heroku need to be a valid domain because their usage in their custom domains: https://devcenter.heroku.com/ar…
The harden.se, it's possible it was removed, but I didn't, because I,
pushcx https://github.com/lobsters/lob…
No, it wouldn't ban this hardened.se.
If you look at... Oh, hey, Spark Palma.
Special coup, if you follow the link in here back to the issue, I give a link to the list of URL shorteners and... Or actually, I believe...
I believe kernel here did here somewhere.
yeah if you go look at this list of link shorteners and scroll all the way to the end the last couple are emoji domains so that was my concern not that and if i see non-ascii characters in the database like i did with this harden not that i'm pronouncing that a correctly but that harden.se then we know so
Yeah, let me check out
Espart Poma's link before I finish that thought and leave a comment that is wrong in a different way.
01:00:13Puny code. That was the name of the format I should have said earlier just to give the jargon. So that's useful right off. I talked about it being encoded as ASCII and this is what I was referring to.
...39Custom root domain.
...50Okay.
espartapalma so Heroku demand the user input the right punycode, otherwise it's just rejected. But that's because they are going to use it
All right, I feel like we're in good shape here.
01:01:31specialku oh so this is about not allowing people to obfuscate like literal spam by putting it behind bit.ly or something silly like that
Yeah, special coup, that is the big part of this.
So the major part is obfuscating banned domains, or also attributing lobsters traffic, because
The professional marketers want to get statistics like, oh, I spammed lobsters and that got me 10,000, 50,000, 500,000 visitors who came in with a URL that says lobsters in it.
And we don't want that to work.
We don't want to look like an attractive target and we don't want to make it easy for professionals to get professional recognition for spamming us.
01:03:36I can't type with the cat in this position. Shorters.
01:05:05So this, like I said, I have to click in here and then here, and then I can say it's approved. And then I can merge.
...29So this, I'm going to reopen this.
01:06:06specialku I don't know ruby or rails so I am just curious, how often would the application fetch this text file like if there was a new domain added, would it be on the website the next day or after an application restart?
specialku oh wait, it is a task so I guess you can configure how often it runs?
Forgot my thought over in the task.
Meant to say.
...44I just wanted to make sure that I explicitly said, you did the thing that I asked for. It's only when I reviewed your thing that I realized that I want more, but I wanted to merge their contribution. So let me glance over. So Heroku requires puny code. I can kind of understand that one. That's what puny code is for. We're operating at a different level of abstraction now.
01:07:17Yeah, Special Coup, it's a standalone task, so I'm going to have to schedule it to run on the regular, and probably I will just do it once a week. I looked at this list a little bit. It doesn't have a huge amount of churn, and we don't get too many people attempting to use link shorteners, especially, you know, the 2000th most popular link shortener.
01:09:46specialku yeah, the emails I get where the story is gone is usually just some people sending blog spam which this wouldn't catch anyway
specialku sorry, I guess spam is too strong a word :D
The emails you get where the story is gone, yeah.
The emails run on a five minute cron job.
I've been waiting for active job to mature a little bit.
specialku The emails are fine as they are, I am not complaining at all :)
But I would like to set the emails to be a little bit farther behind so that there's a chance for users to flag the bad spam or mods to remove the bad spam before it gets emailed out.
That was part of the motivation for...
I showed this on stream last...
I showed this on the last stream for the...
other reason yeah people are always curious at what gets deleted i understand that so a small thing i showed last stream that comes into work here for a reason that i didn't mention which was so that the emails are more useful and the rss readers are more useful is logged in users
get to see the moderation log message rather than just a 404.
So if you're not logged in, you get a 404 for a story that's been removed.
If you're a user, you get kind of a placeholder page where at least you get the mod log entry.
I was just trying to make that a little more useful without making a
specialku there was a self post or something I read that didn't make sense at all but someone later flagged it as llm generated blog spam :D
vector for abuse unfortunately so let's grab this url i just posted or just posted to reviewed
01:11:57specialku yes!
specialku that is the one
self post that didn't make sense at all was it the one about haiku os i remember there was a site that somebody submitted and it was a post about how oh somebody yeah that one is the llm slop that i remember banning over because i clicked through to that site and i looked at the last couple of stories because
Nobody writes just one LLM slop post.
And they had one that was very obviously...
specialku oh I didn't read the other posts
They took an announcement post that somebody made on a forum for Haiku.
And they threw that into the slop engine and they said, summarize it.
And that came back and was like, other commenters, blah, blah.
Yeah, no, it wasn't...
The other posts weren't submitted.
I went to the blog and poked around a little.
And it was a...
Just a painfully obvious bit of slop.
And that was when I felt pretty good about it.
specialku it was basically something like iirc haiku os now has a virus and that is the best thing ever zomg :D
It's kind of frustrating, actually, to now have this new task to be suspicious when reading stuff.
And just before you connected, DPK was talking about another one where I strongly have to suspect slop was involved in the writing.
It feels like...
The first draft of the post was slop and then the user came and tweaked some things and added some detail and added some typos too.
All right.
01:13:42So was there another one open? So this one just got a comment and this one I had to have more changes. And these two are kind of stalled. So earlier I was saying, or no, I think I just tweeted it, but because GitHub doesn't really give us an opportunity to customize things and explain our practice to potential contributors, we keep getting these cookie-licking comments on issues. And Sumina, who is the one who reminded me of the phrase cookie looking, pointed out that there are a couple of bots people have written to plug into GitHub to work around this issue. And so it's a chat bot that can come and post comments and say things like, hey, if you wanted to assign this, fine, we will use the assigning. And now the bot can manage. If you're request goes stale, we will go ahead and unassign you and remove you from the issue and we'll close your PR. And all that stuff is a little bit, it would be harsh if it came from a person, but it's fairly low energy if it comes from a bot. So I think I am just going to have to burn a bunch of time to set up probably Zulip bot. It looks pretty straightforward to customize it. Even though I have said that I really don't want to put more JavaScript into production and Zulip bot.
01:15:30Hold on.
I see.
GitHub has ignored the thing I searched for to search for.
Come on, Google.
Christ.
specialku I think angular uses some kind of thing like this where it auto closes old discussions :(
It's gone to hell the last year and a half.
Boy, I am a little sharp today.
I knew I didn't sleep well, but I'm coming off a little rantier, so please forgive the occasional interjection.
And if I say anything spicy, please just turn it down about 10 or 15% because I am not trying to come off as mean.
It is just a little frustrating to have
To have Google that used to work so well, there's something particularly about its, I know you typed X, but I threw it away and searched for Y instead, that is particularly annoying.
And then the AI summary on top after we were just talking about slop is bad timing.
So Zulipbot looked pretty full-featured.
I poked around its docs a bit, and it looks quite nice.
I don't want to put JavaScript into production, but the cookie looking is irritating enough that maybe I will do that.
I don't know.
01:16:58Yeah, special coup.
We're thinking of the same Haiku story.
it was less of vulnerability.
Somebody made like a proof of concept vulnerability, and then they wrote that up as, or it was like a bootloader hack or a rootkit or something.
And they wrote up a readme of this is mostly just a proof of concept.
And this proves that Haiku OS is worth paying attention to in a roundabout sort of way.
And then the blog post was just nonsense LLM rewrite of that readme.
The actual,
specialku I just remember it for not recognizing it as slop :D
underlying github repo where someone did that work that was actually legitimately pretty interesting and i don't think that's been independently submitted it would be totally topical and interesting it was a shame that the lead into that was just garbage yeah angular i assume every large open source
project has to have some bot on top of GitHub because the edit points are so strictly limited.
And I've used GitHub Enterprise at a job.
Yeah, just the one for about four years, four and a half.
And it was striking to me that
specialku I used Github Enterprise at Microsoft as a v dash and my understanding is github enterprise is literally just an older version of github
even for GitHub Enterprise, which presumably has an enterprise scale price tag attached to it, almost all of the customization was done by having a GitHub Enterprise version of Zulip bot.
dr__ey hello! its cool to learn about lobste.rs >3<!!! new to rails in the past 4 months or so,, been in tech for a few years. will poke around and see where i can contribute,, looking at some good first issues :,)))
It wasn't actually Zulip bot, but it was, let's just plug a chat bot in and the chat bot can manage things.
Ah,
Hello, DRA, Dr. A, welcome.
dr__ey AHAH thats funny,, my name is drey
Our very puritanical auto mod here on Twitch.
When I talk about rails and people, your name is Dre?
dr__ey this is true
Okay, well, you could be Dr. A.
dr__ey i am canadian
dr__ey WOAH
specialku in my org, there was a mandate to move everything off of Github Enterprise and on github dot com proper
Or maybe you could be a Canadian doctor, Dr. A.
specialku hi Cat
yeah very often thank you cat yeah everybody needs to hear you scratching your collar sorry about the the noise there the cat is right next to the mic and if i throw him out he will sit at the edge of the desk or the door and scream at me which is even more annoying
So in case you're wondering who runs the house, the answer is in fact Raz.
And if you dig back in the stream archives, he appears in the, maybe the third or fourth stream.
I noted it in the summary.
So you can see who the boss is.
All right.
gtfrvz KOT
So yeah, the end of that thought was Twitch auto mod is not great for writing Ruby on Rails because every time a new commenter says Rails, it thinks it's something sexual.
specialku el gato
Thanks.
That's okay.
In your org, there was a mandate to move off a GHE to github.com proper.
specialku I am no longer at Microsoft
I wonder if that was just expense.
I never knew the amount, but I got the impression that GitHub Enterprise was not cheap.
specialku Microsoft owns Github though?
Couldn't tell you.
You are no longer at Microsoft.
Oh, well, that's really weird that Microsoft would want to be off of GitHub Enterprise because they own it.
I don't know.
That's a strange one.
However, I learned very long ago in my career that I don't want to spend a lot of time trying to guess why Microsoft does things.
specialku yes, they got off of Github Enterprise and on to github dot com
That'll just make me mad.
specialku they aren't leaving github
The biggest answer that's frustrating for technical people is a lot of the reasons it feels like they never fix bugs is they have a astounding, almost unique commitment to backwards compatibility.
They see it as a
And it probably is a very effective business strategy.
It's very frustrating from a technical perspective.
I wonder if that moving off of GHE and onto .com was for Microsoft dogfooding.
Oh, see, there I go, wondering what Microsoft's motivations are for things.
It's so tempting.
specialku haha yeah, there is no one Microsoft, basically there are multiple companies with one owner
speaking of people's motivations for things let's talk about self promo so on the last stream i did a big deep dive into what are our issues with self promo when is it okay versus bad a previous attempt to mitigate content marketing that was the title of the post and what was learned from that mostly that
to summarize real quick, that it was hard to automatically characterize out of the database.
And there is some of it that's totally okay.
And in fact, can be some of the best material on the site.
Really the important thing to look at is our north star of, are we generating interesting discussions?
01:22:48And
Where I end up leaning with a design of more features for detecting and reducing bad self promo was.
let's start out and find users who.
are not participating on the site outside of their own things, and when I say their own things I mean.
Both.
things that they have submitted that they've authored, but then also kind of this idea of, do you benefit from it?
One place that the is author checkbox falls down is people who are submitting content marketing from their company blogs.
specialku I don't know if I can say this out loud but I type comments on reddit pretty much without thinking, and I think a little when I comment on hn, and I pretty much never comment on lobsters. I basically only read.
And it's like, no, Alice the marketer wrote it, but I'm just Bob the coder, so I don't have to check authored by.
And so the is authored checkbox is freighted with a couple of meanings.
So I threw this second bullet point on before this stream.
Out loud, you type comments on Reddit without thinking too much.
And when I comment on HN, pretty much never on lobsters.
You basically only read.
Okay.
Yeah, the...
01:24:17I think there are enormous cultural differences between Reddit, HN, and Lobsters, which is kind of funny because Lobsters, you could call it a copy of a copy.
HN was inspired by Reddit and Lobsters was, in a funny way, inspired by HN.
specialku yes, also it is funny because I am the same person but I behave differently based on different sites
And so they have very, very similar displays and functionality, but the end result is very different.
And some of that is...
focus and some of that is nudges of the ui and a big chunk of it is just scale like reddit is staggeringly popular huge and then hacker news is very popular and then lobsters is a small community and there are just so many problems we don't have by virtue of being small
Yeah.
Well, I think to your point about being the same person, but behaving differently on different sites, I think humans are pretty fundamentally social animals and we only really make sense in groups.
people can act very differently in very different contexts and see no contradiction.
And I don't mean to imply that I see a contradiction, I just see the power of community.
And that's why I spend a lot of time thinking about the effects and the second order effects and the responses and the rebuttals to what kind of changes we make because
If those work out very positive, we end up making a very wonderful community, but it is easy to incentivize the wrong things and end up with a place where people post spicy hot takes because it rewards argument.
There's a... Actually, a really good take on that might be the book Broken Code.
What is it?
Shoot, I don't remember the author.
Or did it have two authors?
01:26:42Jeff Horowitz.
specialku exactly, I don't want ten thousand comments on a single story on lobsters :D
So this is a really nice book because it's a very deeply researched look inside of how Facebook works.
And Zuckerberg very famously in the late 2000s made some comments that implied that there was something inherently dishonest about acting differently in different social spaces and having almost multiple identities in the way that
have an identity as a knitter in your knitting circle and you have an identity as a parishioner at your church and he sort of implied that having two identities like that was improper and that someone was hiding something about themselves or acting nefarious when really it doesn't they're just different social groups with different norms and the reason this book comes to mind is is what i was just saying about incentives matter
a recurring theme of the book.
Wow.
01:28:11Is,
...30A recurring theme about broken code is implementing features and judging them by the most naive possible metrics and ending up with very, very negative results.
Whoa, cat.
Whoa.
OK, buddy.
albynton Maybe Zuck wants us to have our 'real' identity on FB?
specialku I wouldn't read too much about what tech CEOs say publicly. My understanding is they have to follow whatever the company has decided
You're OK.
Hold on.
I'm going to throw up.
I don't know if I can do this on my phone.
albynton For whatever 'real' means
this idea of a real identity you know to to carry on with my comment there i don't think it is unreal in any way to act differently in your knitting circle as your church as your job you know in my knitting circle i might tell a very personal story with people i've known for years whereas at work
I would never mention I was having a personal problem.
Neither of these is less real than the other.
They're just different.
They're contextual.
01:29:42And I think our social software needs to handle that. Oh, broken codes. Man, I am full of typos today.
01:30:09arh68 I'm pretty much the same everywhere. besides, all my like identities can 99% be tied together
you know i understand your caution against reading too much into what tech ceos say but he kind of famously has so low control of his company and so he can say anything he likes and very clearly has been personally involved in fairly low-level decisions as very well documented in this book broken code that i mentioned
So I do think it is appropriate to look at his actual statements and read into them.
...49arh68 but everyone's different around friends v family
Yeah, ARH, it's not a question of anonymity or pseudonymity, it's just a question of fitting in.
specialku oh ok if he is in control
You know, so I am streaming under my same name, but
I talk to y'all quite differently than I would around, say, my friends.
There's an example of it where I talked about, well, I'm kind of tired, so I'm coming off as snarkier than I want to be in public, or a little bit cutting.
And I'm more comfortable doing that around friends who've known me for a very long time.
And I'm aware that online,
It's very possible that every interaction or any given interaction somebody has, any comment I write might be literally the first thing someone has seen of me.
And they will judge me on it in a way that friends I've known for 30 years, we are building on 30 years of context.
So let's take a look at
01:32:02Let's take a look at figuring out if users have left comments that are well received recently.
...17And I'm not sure. I think this is going to end up being a scope on users or a Boolean method on users. Let's say Boolean method. Let's start there. All right, where's my tags? They got a recent threads method. Let's just make the got object one more method bigger, just a wafer thin method.
...58There's a Monty Python sketch I'm quoting where they have diners and the waiter forces a wafer thin mint on a very overfed guest and it goes poorly.
01:33:30So I joked about
writing the most ridiculously long method like recent well received activity.
And I haven't come up with a better name for this idea of are they actually leaving comments, participating in stuff that's not theirs.
specialku so like recent well received means it gets upvotes?
So let's say and this is
I'm going to be kind of sketching in code here.
And I know already that for some of these clauses, it's going to be like a pause to do kind of thing.
So we're going to start out by doing these sketches and then run them against production data and see, are we getting something like what we want?
Does it make sense?
Do I want to add more complexity?
then refine from there so don't look at this and go oh it's this is how it's going to be in two seconds we're talking about big hairy social problems really so we're going to say comments where that are written in the last
six months so that's six months ago i'm going with a fairly wide window because i want to be pretty conservative about this in the direction of it is much better to have false negatives and miss that someone is being a little bad than it is to have false positives and block a potentially really nice contributor
So I want to say they have comments in the last six months.
We're not comment.
What is it?
comments story.
01:35:45All right, so comments that are not on stories they've submitted. So here's that to do. Yeah. That's probably pretty good. The thing it keeps reminding me of for complexity is the new beneficial owners for companies.
01:36:26What is it it's like utm equals 16 or something to turn off these yeah all right, I will have to update my.
...46See, there's me being snarky.
...55Heck, even those quotes. So this beneficial owner information is a change to US law. Just, you know, it takes a hot minute for it to come into effect, but it came into effect mostly this year where The United States for the first time is requiring that all companies report their beneficial owners. So in the US, you might've heard of the phrase a shell company. They're not automatically bad, although occasionally they can be created for nefarious purposes, but anyone can own a company, say a lawyer at Shady Law Co, and then the actual person who is using the company might be somebody who has been sanctioned by the US or is from a region that's considered at very high risk of certain financial crimes. And so for the first time, the US has required that these companies report, that all companies report who benefits from ownership, which is a slightly different question than who actually owns. So I keep thinking of that because we have this checkbox for, did you write it? But the question in this context that I want to answer is, do you benefit from posting? Is it promotional? So that's why the to-do is hanging out there. And that's kind of a, the complexity here is, so hold on.
01:38:54If we look at an author, let's say, I've already used Alice and Bob, so let's say Carol, but an author might be commenting on their own piece, but submitted by someone else.
And it would be nice to recognize that when they are leaving comments on that,
That's not really participating in the community.
They did author that post.
And I'm not so worried about someone trying to be sneaky.
Usually people pop up and say, oh, hey, I wrote this post.
Thanks for submitting it.
dpk0 that’s problematic … i don’t feel like i benefit from posting my own stuff that i wrote and didn’t get paid for on my personal website that i don’t have ads or subscriptions on. would a user who checks ‘i benefit from this’ get the positive hotness mod like they do from ‘i made this’ at the moment?
I'm not worried at all about sneakiness in this context, really, because it shows up in such different ways.
It's...
01:39:59Right. So what I'm thinking is that this is, so I read your comment here, DPK, and I'm thinking this is fairly orthogonal to whether you get a bonus from checking the author box. I don't want to remove that bonus. I want to keep incentivizing authors from submitting, or I'm sorry, keep incentivizing authors to submit good work. But I also want to know if your personal website that doesn't have this stuff on it, if someone else submits that and you show up in the comment, we kind of want to consider your comments there promotional. And even if you don't directly make money off of ads or subscriptions or linking to a product, We have had self promo issues with people who just want attention. I mean, it happened literally last week. There was one of these where the blog was entirely non-commercial and that was never a concern that they were connected to a commercial activity. but they were using lobsters to draw attention to their work in an exploitative way where the only reason that they were posting on the site was to post their blog and try and get readers rather than because they were in the lobsters community and wanted to participate and show off a thing they'd made. And it's that exclusive, I'm just here for the clicks, that's the problem I'm trying to isolate.
01:42:02specialku actually I didn't know that there was some kind of ratio rule like not 100% of what you post should be promotional or something until I read a comment
So with that to do hanging out there, if I grab all their comments, I also care about making sure that whatever this code looks like, it's something that you can
Special Koo, yeah, I don't know how long you've been on lobsters.
I wrote up the guidelines.
We've used the rule of thumb and posted it in comments for years, but I only wrote it up as guidelines and put it on the About page, I want to say a year ago.
I would have to look at the Git log for About to know.
So it's very possible that people don't see it right off, and that's part of why
We lead into it pretty gently and the users who help bring it up will leave fairly anodyne comments that say things like, hey, you should check out the guidelines rather than you're a bad person.
We actually tend to lead into things fairly calmly.
And I mentioned earlier on the stream, somebody got one of those notes from a mod about self-promotion and it was basically just
hey, please take a look at this rule of thumb and understand why this is here and follow it rather than, you know, it wasn't a castigating note.
It wasn't saying you're a bad person, none of that kind of stuff.
01:43:54You know this, it's already on the comments table. So that one is redundant.
01:44:06So here, what I want to say is, so I talked about how I kind of can't write a naive rule. So what I want to say is something like, score is four, dot, dot. Like, if your comment got three upvotes, because of course there's the one that the author puts on it, so that makes it four, well, great. You're at least getting some kind of positive response on stuff that's not yours. But that's really sensitive to site activity and I know right off that's broken for sister sites they're not a major design criteria for us, but I do think of them. And what I would like to say is.
01:45:15And this is just going to be a hideous join to express in Active Record.
I mean, it would be painful even in raw SQL.
So let me just run this just to get the query out.
What I mean is in the Rails console, so what did I call this?
recent well-received activity i just wanted the query all right so it's finding my user comments that are created in the last six months that's correct author is true all right and score is greater than or equal to four all right great let's just let's puts to remove the layer of quoting
I think I'm going to take it over to the database console to revise, and then I will try and translate it back into ActiveRecord.
I just find that a lot more comfortable as a workflow.
What am I doing?
I can just say Rails DB.
I'm thinking I'm on production.
Yeah, I'm clearly more tired than I realized because I'm making kind of basic errors.
So it's possible that this ends up being a very short stream because if I can't write this join, you know what, let me turn that around.
Let me phrase it more positively.
This is an excellent time to participate, chat.
Won't you help me write this join?
specialku hahaha :)
What I wanna say is, I wanna say that the score is better than comments, the average score of comments posted for the week around the creation.
either side, the week before, the week after.
So that's going to be an aggregate query, and it's going to be a join.
So it's just going to murder performance, but it's only really running infrequently, and we can cache the heck out of it.
So we'll see where that goes.
Let's not rabbit hole there.
So select the average score.
from comments and then let's name it separately or i'll never remember it around i don't think that's a keyword where what do i want to say i want to say around created at is greater than or equal to
arh68 where a.created_at is between ...
Is there a MariaDB function for saying that created at isBetween these two timestamps?
Can I say that?
Because if I can say that, then I can add isBetween.
All right, thank you.
Where isBetween?
arh68 between is better to write, AFAIK mysql/maria supports it
And do I have to do, what is it, date add?
So I want to say like a week before and a week after.
So comments.createdAt.
Oh, as far as you know.
I better double check.
So I grabbed zeal.
01:48:54And if I bring that over, wrong one.
Yeah, so MariaDB does support a isBetween.
And I just lost it to scrolling when I changed the font size.
Come here.
So I can say between min and max.
specialku how did you get the "is"?
Examples, if you compare a date time to two date values, convert.
OK, so I can do it like that.
So that's my syntax.
Between this and... How did I get the is?
I made it up.
arh68 between (created_at - interval 1 week) and (created_at + interval 1 week) or so
No, ARH said is, and I just blindly copied it.
specialku :D
So let's do that.
Comments.createdAt, and then... What is it?
Date.
Oh, interval.
That's the syntax I'm trying to remember.
Yes, thank you.
between created
dpk0 do you lose stream VIP status for causing a bug? 🤔
at minus interval one week and comments created at plus inter read line one week.
All right.
specialku can't lose vip if you don't have vip
Unknown column story is author because it should be stories plural.
That's an error over here.
Please be good, read line as I wrap back a line.
Good.
Do you lose VIP for causing a bug?
I think if you lost VIP for causing a bug, I would be demoted out of streaming entirely because all I do is hop on twice a week and write bugs for three hours.
So we're going to be pretty generous about that.
I think it is normal and low key.
So it's not called stories.it.
What am I doing around here?
Oh, is it called submitter user underscore is underscore author?
How did that query run in the Rails console?
Because I ran it, and I saw a whole bunch of records go by, and then I added the to SQL to get the query.
arh68 we're only selecting comments .. where do we join stories
What are you doing to me?
User is author.
Stories user is author.
We're only selecting comments.
When do we join stories?
Oh, we don't.
Yeah, it's probably important to join stories.
How did this not cause a bug in the Rails console?
All right, so from comments, join stories.
Oh, that's a,
Helps to have an on there.
I figured that one out.
01:52:15Okay. So this is fairly plausible. I don't remember the indexes off of comments. We might not actually index created at. So this might just be slow as heck. Let's see the explain. So this part of user ID is fine. And then this row where we're comparing against 405,000 is basically the comments. Yeah. So there's not an index getting used for the around. Should we create table comments? So comments. does not or no show indexes yeah on comments from comments we do not have an index on created at okay so that previous version it's pretty good i control seated because it would take roughly forever to run so
01:53:42arh68 would you dump that average into a temp table ? hmmm
Let's make a migration and run it.
...53arh68 if you're just exploring
I would not put it into a temp table.
Oh, for exploring?
Yeah.
Maybe?
Let's just see a rough version of it first.
I think that's all I need.
I was going to say there are roughly half a million comments in this particular copy of the database.
We saw that number go by.
So let's jump back into the console.
01:54:49And now if I explain font size, but it does actually say we are not using that key. Oh no, possible key, but it's not going to use it.
01:55:08That feels weird.
specialku see you later guys
I would also expect it to be used up here, but I guess the, oh, so it knows that this additional index is there, but it gets much better cardinality out of user ID.
All right.
arh68 gn @specialku
Let's see if that wants to run in a reasonable amount of time.
Maybe not because if it doesn't realize that it wants, it really should be using this.
All right.
Take care of special coup.
Nice chatting with you.
...59Yeah, this is definitely running in, not doing anything better to use the index. I wonder if this is... Well, the index should be in good shape having just created it. I'm wondering if I need to run the optimize table command for MariaDB to generate stats for the index. I'm at the limits of my knowledge there where I wonder if... Now let's go ahead and just swap this over so that I don't have width issues.
01:57:05Optimize table. All right. So I just killed that query. Let's try. Quick optimize table. It's going to optimize. I probably already just deleted the indexes. I don't remember if this So what I am remembering just after I whack enter is, don't we have a full text index on this that's gonna take forever to rebuild?
...48Apparently not.
...55arh68 uh the bottom one
Comment story ID, so there's the foreign keys.
The bottom one, index comments on comments.
Yeah, there it is.
I should have just scanned.
Like I said, I'm missing stuff.
I guess I had a good weekend.
All right, so it did a recreate and analyze.
I was worried that was going to take like a half an hour.
I can vamp a little when commands take a few seconds to run.
Can't vamp for a half an hour.
No, even with that, it still doesn't want to use the key.
01:58:46And it's this dependent subquery that's killing me.
...58So I would believe that this is something like the number of comments I've submitted.
01:59:13Yeah. And then this created ad is going to filter it down. I'm puzzled that I don't see it in here, but I assume it's going to be using a much lower number here of 100 or 200. But it's this dependent subquery that's killing us two ways. It's got to run for each of those couple hundred rows, but then also it's aggregating a bunch of data.
...55arh68 so maybe suppose avg(score) is like 3.5 for now, unless you wanna explore it
Yeah, so 119.
arh68 could check a couple dates, all dates
But then if we're 119 scanning all half million comments... Well, ARH, I kind of...
If we look back here, I'm revising this subquery because I was just guessing at an average.
And it felt like four-ish, but that's going to be my gut feeling based on recent stuff.
arh68 LUL good point
And the reason I want this averaging around the times is because our activity changes significantly recently.
So you're kind of, I don't remember exactly where you tuned in, but
I sketched out exactly the thing you are saying.
And then I was like, all right, now let's do it for real.
arh68 oh i got here a lil late myb
And then I'll bring it back to the active record query interface.
So how do I get it to, I know that there are like the force keys.
No worries.
I'm also running a little slow.
So it recognizes that that's a possible key, but it doesn't want to use it.
I guess another filter here is I want to say, I almost want to add a comments that are not yours, but I know that's just going to push it to use index comments on user ID like this first row.
And it's not going to have any cardinality, is it?
Yeah, it just ignored that entirely.
02:02:15I wonder if it's something around between.
...25What if I just said no math on this comments created? No, that didn't tweak anything. We get on one line. That's nicer. Let me know if that font size nudge is too small.
02:03:05If comments came at a particular rate, I would just, well, maybe I can nudge it.
...26What is it? We've had roughly 200 to 250 comments in a day, so if I said. Just to try and knock the cardinality down a little here or the the row count down a little here. How about a really easy index to use.
02:04:08And I'm only doing this math just because I want to express that I'm seeing that we have around 250 comments and then 10 days is a little more than seven. Yeah, that doesn't get me anything. That's weird. What if I drop out the whole date thing in favor of this ID nonsense? And I say nonsense just because it's not a good proxy at all. Oh, here we go. Now it's just a sub query. And it's indexing on score because it knows that I'm taking the average of the score. Oh, that's interesting because I'm using the primary key of comments here. Then it can just go hit the index. because the index has all the info it needs. So it doesn't actually have to touch the table. That's compelling. However, it's still looking at every comment in the database.
02:05:22So if I run this, it's not going to finish in like 10 or 15 seconds, right? No, it's going to give me an error.
...43arh68 lol unsigned bigint ??
Can't minus because maybe the ID number is going to be less than zero.
That tells me it's looking at way too many rows.
Yeah, I'm not overflowing undersigned bigint.
I'm underflowing unsigned bigint.
02:06:11I don't have the syntax correct.
...30This is some fiddly, I put a parent in the wrong place somehow. All right, let's get back to the error.
...46What is it?
...51Oh, OK. How do I get your max?
02:07:06greatest great i just didn't remember the name that's i'm not underflowing anymore
...39That's painful.
...49So let's leave all this ID nonsense behind and get back to the version that just takes an eternity to run. Think about ways to improve that.
02:08:08Could just say and around.id is less than or equal to. Yeah, it doesn't even show up in the query. Doesn't meaningfully improve anything.
...40I wonder if I could do this with a common table expression. Because the big improvement in filtering is this initial, let's just look in the last six months. Well, it's doing, let's look in the user ID, but also let's just look in this last six months is also going to be a huge filter here because
02:09:17If I do that, we knock an order of magnitude off. But interestingly, it's going to be the same order of magnitude. So like with recent comments as.
...56Let's make this one up.
02:10:07recent comments. I'm aware the window isn't going to exactly line up here. We'll get there in a second.
...39Let's explain. Where did I use comments? Over in the external. All right, so it's doing a ref. I thought this. first one was going to be using index on created at. So MariaDB has a pretty important optimization for common table expressions where it pushes clauses down. So this external queries I guess, external expressions where clause about the user ID is getting sort of pushed down into the comment table expression, which is overriding the key I wanted it to use, which is fine. Like, that's correct for what I wrote, but it means that I have to port more before I might see what I want.
02:12:18So one of the reasons I wanted to do that a week on either side is kind of averaging out the effects of holidays, because if I just said the same day, traffic can be really swingy weekdays versus weekends. What if I just averaged over the six months?
...54That's interesting. We got down to 78K, which is, I mean, I'm just eyeballing, but that looks like exactly double this number.
02:13:15It's not quite, but it's close enough. I'm suspicious of it. All right. So this is a little different than I wanted. Instead of saying, show me the comments with a score better than the average comment for the two weeks it was posted in the middle of, it's for the average of the last six months. On the other hand, this query will probably actually execute
...55I'm going to put a count on this rather than dump 100 comments. What are we mad about? That count? Yeah. Just have to say star. Or I can say 1.
02:14:23dlamz explain provides exact row counts? (not used to mysql)
yep and that's going to take longer than i want it to and 10 seconds that's a little rough explain provides exact row counts i don't think these are exact row counts i think this is the query planner estimating in the case of this index it's an exact amount because
...55the data it needs to estimate all appears in the index where this created ad is getting used in a conditional that's based on the current timestamp. And that's why it's estimating. That's my understanding. This is just a, all right, well, at least it caches, but still Spending 543 milliseconds even in the cache view is a little high. Now the cache is full. If this was all the time, 40 milliseconds, I would think about it, but that initial 10 and a half seconds is pretty painful. And what if I just change the ID to something like
02:16:06Oh, okay. So I was just trying to bust the cache and see if I just give it another kind of arbitrary user ID, how long does it take? So maybe it's just that my database had that whole table rebuilt and is running on my machine where it hasn't been iterating that whole table. Hmm.
...46All right, well, it's pretty clean to express this in Active Record query syntax, especially if I extracted this average score section out. The Active Record is not great at common table expressions, When I say not great, what I mean is they're supported in arbitrary queries, but I don't think that there's a way to use them that's not just pulling the escape hatch to get to active record, or I'm sorry, to raw SQL. And I'd prefer not to. We have a with method. oh oh there's something new in rails 7-1 yeah so look at that the feature is only two and a half years old so it's basically brand new it's got that new code smell on it to me which also says i should maybe read the rails changelogs a little more closely but it's hard to just memorize all that So if I said,
02:18:56So I get that with.
02:19:31Okay, so that does generate the width I'm expecting.
...41And then what am I doing with it? Am I just saying... That is a small scroll bar lozenge. I'm kind of looking for the add description for CTE. to the querying guide great i was looking for an example i guess it was back there in the stack overflow answer so let's jump back to that sample Book with, books with review, from, or joins, okay.
02:21:23joints.
...56all right so now i'm whoa this is correlating in the opposite order than i'm expecting yeah i was trying to write the external query not the internal so what if i just say where
02:22:23just to filter. OK, now I'm writing the external. I see how that works. OK.
...43So if I have this.
02:23:28I want to say the score is better than the recent comments score. Just say that is this.
...43I'm not sure how flexible the range is going to be here. Expected. Come here.
...57So this, it doesn't want to make a hash? All right. Ooh. Come here. Make me a hash. With recent comments pointing to score.
02:24:19And I will pass that to form the range.
...28And I'm missing a parent somewhere. But luckily at the end. And Cominstead score is greater than null. Yeah, you did nonsense. I wonder if I just passed the fragment.
...57I don't love this because every time I use a fragment like this, I end up inserting a joins or an includes, and then the table name changes to the aliasing. I'm getting what I want, aren't I?
02:25:30So if I took this, I lost the fact that I'm only looking at these in recent comments.
02:26:02In which case, why do I even need the where? So I don't have the six months ago predicate on this external query.
...19But it does have to be two separate things. Yeah, so. We will just duplicate it and see what comes out. And I will run explain on this to see if I'm getting something reasonably like what I was typing at the MariaDB console. Get none? That's not right. Oh, I just said to SQL with puts. That's why.
...55if i ran it with count invalid use of group function great there it is with the my issue is as soon as i introduce a table alias it just blows up in a frustrating way right
02:27:32I don't think width is getting me anything here. I think I just have a... At this point, I don't even have a join. Not in a real sense. I was just trying to reuse that width, but I don't need it. So...
02:28:02Yep, that's not what I wanted. Wrong clipboard.
...12Let's find it. Where user is the admin. Let's grab that where from up here so I don't have to re-type it. And then this wants to be It wants to have a nice little subquery.
...40Oh, they're even using AVG. Oh, it did two queries. Oh, I see what they're saying.
02:29:24OK, let's bring that in. Where?
02:30:17All right, so it's printing those.
52.
Does this query look right?
Oh, it returned fast.
Select account star from comments.
And score is better than average.
So yes, I've got the reasonably.
Good.
Let's bring this over to the database console and see what the explain looks like.
because I want to make sure it's going to run in a reasonable amount of time.
That's weird to paste a query.
I wonder if it's grumpy that something here went wrong with the.
No, it just took 13.8 seconds to run the exact same query and get the same number.
That's wild.
arh68 what's the 1 warning ?
This was roughly instant, and then this took 13 and a half seconds?
Were you taking a nap, MariaDB?
No, you were saying, I don't know why I would use a key on this correlated subquery.
And then I'm looking at the whole dang table again.
02:31:53What's the one warning? That's a really good question. Oh, it doesn't want to do it on the explain. Got to run the query again. Oh, it's the new line in the date.
02:32:22Yeah, so it still took 927 milliseconds. I wonder if I could prompt it to lean more into the easy key by saying, because we know that the ID is an auto increment one, where
02:33:12So many layers of nesting, I am never going to get these parents right.
...21Wait. This one? No. All right. Color might have saved me there. So we're created a. It's going to be six months ago, dot dot.
...43this dot first dot id yeah ah that's okay so this one yeah the user admin is going to be a separate thing but then this show me the first idea of a comment from six months ago and this took 900 milliseconds even with the index on created app and then this took 446 this kind of pushes for getting back to a, let's just imagine 3.5. I guess if I did this aggregation of selecting the average score from comments in the last six months, That's very cashable. And then I can stick it in key store. And then this query, when it executes is going to be, and score is greater than, you know, 2.14 or whatever the heck that average is really being pushed to make some serious compromises here.
02:35:41arh68 just call em "first-order approximations" lol
If we had nicer background jobs, I could just say, when you load the form for submitting a story, we will throw in a background job to fill the cache.
arh68 aka two-point-two LUL
Just call them first order approximations.
Yeah.
I mean, everything is approximately equal when your epsilon is eight, right?
AKA 2.2, yeah.
02:36:21Offhand, what is this number? I don't actually know.
...35And this is going to be an approximation because this development database is weeks out of date.
It didn't evaluate.
Oh, no.
So four and a half.
You see the E1 hanging out there for scientific.
So four and a half.
That guess of three and a half wasn't too far off there, ARH.
arh68 your original 4 wins
I mean, if you just hand wave and say, oh, I was discounting the user's own upvote, you're dead on, right?
Totally plausible.
Little retcon.
Your original four wins?
Oh, yeah, I guess.
02:37:23It's the sort of...
arh68 so other days ? 1 y ago ? &c
Given that I stare at the site a couple of times every day and I poke around the data, I would hope that my intuition for the average is pretty close and being within, you know, and now I'll hand wave, being within 10% where I said four and it's 4.5.
That's good.
I should be able to be accurate to within 10 or 15% on those kind of gut feeling questions.
Other days, a year ago, et cetera.
Yeah, let's check.
12 months ago, basically the same number.
Let's say we'll just stair step it 24.
Yeah, now it's dropping some more.
So this is we're seeing the effects of site activity growing over time that the average is being forced down.
02:38:34And like, it's tempting to say, all right, well, let's go with my gut instinct and just put the numeral four in there and I'll revisit in a year. But realistically, nothing is going to force me to revisit in a year. And I started this stream pointing out that I'm not doing a good job remembering to revisit the, what are you doing this week to post about the stream every Monday. So the idea that I will remember every November 4th is pretty unlikely.
02:39:22So if we counted, you know the correlated subquery or dependent subquery itself is cacheable. So what I mean is for each day of the last six months, so for 180 days, The average score each day is going to have a value of the average score of comments in the two week period around. it that is something i could pre-calculate and stick in you know so then i would have 180 records in keystore that's such a bad place to put it keystore is kind of the junk drawer of our database and it gets used for these odd caching jobs, which is why I'm mentally reaching for it. Maybe this wants to be, I mean, ARH, you mentioned a temp table. I don't want it to be a temp table so that it drops whenever the database does. But if there was a table that was like called, yeah,
02:41:08arh68 date could be bucketed by like iso-week even
if there was a average comment score table that had the keys date comma average for last 180 days, well then I would be joining against that table
And instead of saying, let's go select a half a million comments and compare them, data can be bucketed by ISO week even.
Could it?
...58I guess it could.
arh68 in the sense you could check vs last-week's avg
you know the this look a week on either side is actually slightly different than i previously wrote i think i previously even said last two weeks on either side because i'm trying to get at whether and since you could check first last week's average
arh68 but it seems the table, and the index, really do need to exist 1st
Well, I'd have to, I still need the join because over six months, it's not so much that we swing very much over six months.
It's that there's some outliers and the two outliers that immediately come to mind are
american holidays that are big so our traffic especially when they are not floating holidays when they land on weekdays so the fourth of july is a big one because people often end up taking the day before or the day after
arh68 personal rant: they should make all holidays monday/friday
Thanksgiving is a big one because it's set to fall on Thursday, and our traffic is down enormously on that Thursday, and then also the Friday because almost everybody... A significant chunk of our user base also takes that vacation.
And then the holiday week, what do you call it?
There's got to be a name for that week between Christmas and New Year's.
arh68 festivus, i would call it
It must have a proper name, but so many places either take it off or people choose to take vacation.
Should make all holidays Monday or Friday.
Yeah, that would be nice.
Festivus?
Yeah, that's the cute Seinfeld holiday.
I am sure there are so many little sub-names for everything before and after a Christian holiday that I assume there's one for the week from Christmas to New Year's.
I just don't happen to have it loaded in my head.
02:44:30dpk0 octave of the nativity
dpk0 in liturgical speak
just kind of zooming out to context check octave of the nativity is that that sounds fancy enough that's probably it come on oh there are even two senses the eighth day after a feast okay
02:45:06The term is applied to the whole eight-day period during which certain major feasts came to be observed.
Well, this is certainly the... Yeah.
Exactly what I meant.
arh68 mannn they were doin week-long fests ? sheeeeesh
I wasn't thinking specifically as an eight-day period and 25 to 31 is a six-day period or seven if we're counting to the first.
But I see how you get to octave of the nativity.
dpk0 eight day period including both weekdays on each end
dpk0 so it’s Christmas Day to New Year’s Day inclusive
were doing week-long fests oh yeah the historical christian holiday calendar actually had a an enormous number of holidays by the end like around the time of the renaissance the number of festival days was pretty darn high on an annual basis
And I sure am not enough of a historian to know the import of them, of were they all treated with, you get time off work for this level of seniority, or they just had other attachments?
arh68 calendars are so silly. also happy boomtime yall. espically if it is boomtime
Anyways, we're rabbit-holing a little on Christian holiday history, although it's a good rabbit hole.
Thank you, DPK, for getting that term.
I am trying to say is the point of this averaging is just how darn much comment scores can swing based on holidays and you know if they swing that much for like the high holidays of Christian of
Christmas, New Year's, Thanksgiving, Independence Day.
I probably don't want to be looking a week on either side because if you post on Christmas, you are never going to make the average for that week because the week leading up to it is pretty normal traffic week.
02:47:17arh68 how bout 52 weeks ago
No, ARH, I regret to inform you it is not boom time.
It is still not boom time.
...2852 weeks ago god i almost just want to say average for that day thinking about holidays so much because I mean, at that point, if I was thinking of narrowing it to just one day either side, well, then Monday and Friday, your comment almost always is going to make it because activity is so much lower on the weekends.
02:48:23All right, let's jump. back to the database console.
...39What's the?
...46So it's just date.
02:49:34yeah we're right back into dependent subquery i'm wondering if this is going to be like 15 seconds or 15 minutes i'll give it a couple more seconds then i want to cancel and run the explain on it yeah even just saying the same day huh nope you want to look at the whole darn table
02:50:15I wasn't even getting into the issue that our table timestamp is in UTC, but then almost all of our activity is in America time zone, just to speak with a broad brush. And so I was actually cutting a day in a third. And since we're so correlated to weekday traffic, you kind of want to do that time zone adjustment, but I can't get away from this darn dependent subquery that is not using the key.
02:51:23All right. Not bad. But it's not going to change this. No. All right. Let's do a different kind of evil. What's the MariaDB to force an index?
...47dpk0 stream title: let’s do a different kind of evil
Because the optimizer is doing a table scan, even though I know that using an index would be better.
Because surely I have never been bitten every single time.
I think I know better than the optimizer that hundreds of very experienced engineers have poured millions of dollars of time into.
This always works out for me when I try it.
Force index and name, OK.
02:52:35I guess I have to put it after the table name directly. I thought it just kind of came generally after the query. All right.
02:53:00So even if I tell you to use the index, you ignore it. Which is fair, I would ignore me too. What does that schmuck know about indexes? He couldn't even create this index correctly so that the database would recognize that it's exactly the thing it wants to use for the dependent subquery.
...29Did I, did I create this index wrong?
...47Index comments on created at is looking at the created at field.
...59Oh, it still has a cardinality of 512,000. That's why it's not using it. It's not using it because even though it's a B-tree that is ordering all of these, the cardinality is the number of rows in the table or, you know, like three less than the number of rows in the table because none of the comments have exactly duplicate timestamps. Right? So if I could... Drop index. Can I index just the date value?
02:54:48What's the drop index? Do I have to say the table?
02:55:05Where are you going to be?
...15From the table, table name, drop index blank on. All right. All right, we'll put you in the other order.
...30No, we won't. You still don't like it? Index, did I type this name wrong? Thought I pasted it.
...50This statement is mapped to an alter table. All right. So then. All right. You're happy with that one. Great.
02:56:08Let's go look at the.
...16Did I say that phrase, let's do a different kind of evil? That's actually pretty metal.
...28All right. Create index. Let's use, does it say it maps to an alter? Yes.
...46OK, index that.
...54Let's go look at the alter table syntax, because that's what I'm about to use. So I can say add index. There we go. I don't think MariaDB will actually let me do the thing I want to do, which is say it's on the date like extract the date here just to get a higher cardinality on the index but i want to try it on index index call name so if i said date created at yeah can't do that if i said created at you'd be happy with it yeah I don't want to materialize that and add a date column. I mean, maybe I do want that.
02:58:07This is a lot of hassle to differentiate between 2.2 and 4.54, whatever that was.
02:59:10I can't have a null false until I have a default. Rails, does it update all? Which one are you?
...32Okay.
...52I'll think about that null in a minute after I deal with whatever syntax error I get out of this.
03:00:13arh68 so are you going with actual column? or virtual column
And here's that hassle that comment is such a wide table that even though it only has a half a million records in it, it is many, many megabytes.
Am I going with an actual column or virtual?
I'm going with actual because I don't think I can make a index on a virtual column.
...43Or at least not MariaDB.
...53OK, so we've done that. We've created it.
03:01:11looks valid to me and yes there's still a time zone issue but that's pretty minor and then where's my explain so instead of extracting i will say comments dot created at date equals c2 dot created at date And let's get rid of this force index. Well, that came back in a reasonable amount of time.
03:02:07Yeah, this dependent subquery could almost be a join.
...20But at least it's only looking at 126 rows instead of half a million. So if I run this again, well, now it's just in the query cache.
...51So let's grab a comment.
03:03:04Yeah. Missing a parent or something. Come here. Let me edit that. Where did I miss a quote? at the start of, here we go, start here.
...33That's a lot worse than half a second. Even though this should be a independent subquery that doesn't take much time at all to run.
...55But uncacheable subquery. OK, that's all right. My little clever bit was too clever and is absolutely blowing up.
03:04:23OK. All right. It's a reasonable amount of time. So what I'm doing with this limit offset is just trying to grab user IDs that have posted comments in the last six months. And I'm just introspecting on the database in the most direct single row lookup shape I can. Yeah, that's this subquery. It mostly doesn't matter. But since these are coming back briskly, you know, I say briskly, 60 milliseconds. It's not as nice as the 30 I had here. And 134 is a little painful. But now we're getting into, we're working in reasonable territory here.
03:05:44arh68 i like talking directly to the DB, to know what it can answer quickly HahaGingercat and what it can't
arh68 doing it Ruby-side seems lik. 3x harder
Yeah, ARH, that was part of why I switched over from Active Record to SQL was, like, number one, with all these experiments, it's so much easier for me to iterate on direct SQL syntax, but then also because I didn't have a great intuition for where performance was going to end up.
I didn't want to pay the cost of iterating in Active Record and not knowing where this is or...
printing the SQL and bringing it over and pasting it to see the explain and going back.
I'm trying to iterate as fast as possible here.
And I've gotten pretty far from my initial idea of, well, let's look at the two weeks around.
Actually, I wonder how this would do with the two weeks around idea.
I'm not sure I like that idea anymore, but let's just see.
where comments created at between c2.created at date minus interval one week and c2.created at date plus interval one week.
03:07:13arh68 is this like casting datetime to date or wat
so that that blows up the dependent subquery so i've lost the cardinality on the rows and i've lost the use of the key and it's going back to doing i mean it doesn't say the word table scan but that really smells like a table scan
...41feels like a table scan given the delay here.
...52arh68 idk whatever. brains tired
Yeah.
...58Yeah, ARH, you may have tabbed away for a second, but I rewrote the migration to create a new column called created at date.
And then I
stripped out or filled it with the date flex to here was where I was going to say, you know, TZ offset and also a set column, not, no, that was the kind of the to do's that were left over there.
But I just wanted to see this, like, tell me about average of votes for the day stick.
arh68 all good. 20 sec query noooooo
And it got a good value.
03:08:44That came out of the query cache. Let's get a new user ID.
...54See, there's another nice one. So there's something about the, oof, 440 milliseconds. There's something about the, oh, but it's someone who's left 128 comments that were above average in the last month. Either I've got some weird distribution of test data, which is possible. It's been a hot minute since I refreshed the data. Or that's like me. There's not a lot of people who will, well, I can think of two or three commenters who would leave 128 comments in six months. That's almost a comment today. Assuming they are always above average. So that's almost a comment today that is above average to say it a different way, just to move my predicate around a little in the sentence.
03:09:57So am I happy with this idea? of considering it well received if it's on the same day? I think so. I think that gets at it. So let me finish this migration.
03:10:27Is it alter?
...35It is change column. I never remember that. I always want to call it alter column.
...53All right, so there's that. Why are you yellow? Symbol with a Boolean name. You probably want to use false. yes, I do. Good catch. And then now what do you highlight? Change column is not reversible. That's right.
03:11:34All right. That'll take the index with it. It's not drop column. It's remove column, right? Or delete column. Remove column. This is one of those places I constantly struggle with it, because I would expect Rails to mirror the SQL syntax, and instead it's like, no, we'll slightly change that. And it doesn't feel like a large enough... I think the Rails terms are slightly better, but not better enough to justify the difference.
03:12:33I'm committing crimes. I'm committing crimes. It's times on math and I'm committing crimes. Okay. So I mentioned that most of our activity happens on US hours. And plus or minus. So Chicago time most of the years is about five hours off of utc i might be swapping plus for minus here because i'm doing it off the top of my tired head but if i say i want to adjust to average america time i could almost say with a straight face that the population weighted average time zone difference of america from utc is central time. That has nothing to do with the fact that I live in US central time. And most of the site runs on US central time and it is hard coded. It took 11 years for someone to notice that and be like, hey, see, yes, we are UTC minus five. so that's legit let's rerun the migration you fibber did I not roll back I did not roll back I thought I had before I started editing try to be good about do your rollback before you edit because otherwise i edit and then the rollback doesn't want to apply cleanly and i spend some time mad at rails database migrations that can't be totally magic and just read my mind and do what i mean but it is fair that they can't do i want you to roll back to the version before i saved the file or based on the version before I save the file.
03:15:12There are limits to the magic.
...38actually story has the story text that is the extracted text of the story using diffbot so like the full article text that we link to and i moved them over to their other to another table called story underscore texts and i'm occasionally tempted to do it for comment but the The big driver of it was that running a migration against the stories table took forever, just locked the whole site for minutes at a time because the table is just so darn wide and now comments is catching up to it. Comments are typically much shorter than story texts, but there's so many more of them that they're getting up into the same neighborhood of bytes on disk for the table. And it does slow other operations where, although mostly MariaDB is very, very smart about it and the query planners, you know, see our nonsense coming and they do the joins on the indexes rather than having to load the whole row into RAM. All right. But maybe comments is getting to the point that it wants to normalize out text.
03:17:07It's like 79. Ooh, only 30? That was the one that was slow before. So we're seeing just some fairly significant swings in these. But now they're all nice, like that first one. how would i no that's not it that's i wanted to optimize so i'm assuming that first one was because the comments table just got rewritten to add the comment or the column it wasn't in some cache somewhere. That first one was just kind of a gimme. So again, high number, and then these other ones are coming in low. All right. So that's pretty reasonable. I like that. So where score is. And I almost don't care if it's a second query. Except it does have to be because it's dependent. Yeah. So we'll say where score is This is that other syntax we saw a few minutes ago. Score is better than AVG. And then we define AVG to be. Come here. Toss that comment. That's distracting.
03:19:46All right. I was going to see if I could do it just by memory, but I can't. So let's get it up on the screen. Let's get a simpler one up on the screen. Our score is greater than AVG, and then AVG is... Yeah, there's another one that's easier. This. Oh, I didn't correlate them, did I? All right. Let's see if we can get there.
03:20:42See, I know that's not going to work.
...51How do I tell Rails that it's doing this join? Let's see what I get out of this. Reload. And then what's it called?
03:21:15Well received activity recent well received. Syntax error.
...45Missing parentheses. All right, try that again. I before E, E before I.
03:22:13And we're just bound to... Why do you think this method doesn't exist? Oh, because it's an instance method, not on the class. And we got back a nonsense query that does not do what I want.
...46Let's just jump in the escape hatch.
03:23:08So if I say that, reload is not a method. It's sort of a standalone command, so it has to be on its own line. I always forget that. That's reasonable. Strip off the extra quoting and run it by hand.
...41stories is author oh right let's fix that in here this is the thing that keeps unknown columns stories oh it's still missing the join how does this how does rails let me get away with this
03:24:11so if i say story singular which is the name of the association or the has many or no i'm sorry the belongs to comments has belongs to story that should get interpolated correctly or transformed correctly i'm not sure the right word there comments join stories on You called it story singular. All right, that's fine that you aliased the table like that. I don't know that I've ever caught Rails aliasing a table to singular like that before. Score is ambiguous because stories has score C2. So we want to see the average of C2 score. That's a mechanical enough transformation. I'm not going to make Rails reprint the query. Column and score. Oh, the other score column, that one's less mechanical. I know what it should be, but it's not. Look at me, mix and matching Ruby and SQL.
03:25:42All right. Well, we've got a bunch. This becomes count. And this becomes comments.score. I'm going to dump that again just to make sure that I did the Ruby version back. Oh, you can't call to SQL on a count. You just get the count. Good.
03:26:25Okay. Okay. This feels pretty good. So once we've loaded a user, we're hitting the database and saying, how many comments have they left that were above average for the day in the last six months? And for me, it happens to be 25. Fine. It's a realistic number. And really where I'm getting at is I want to compare this number against how much self promo they're doing.
03:27:12because if a user is occasionally submitting something and then just knocking it out of the park on their comments off of their stories, that's great. Like there probably isn't a need for anything to fire. And if we have both numbers, still with that authorship caveat, So do I want to pair against stories submitted? Or do I just want to fire recent above average comments? It's a better name. It says a little more about what it's doing. This might be good enough that it could plug in in place of some of the is new calls.
03:28:36Because the thing about is new is there's going to be about zero. And if I said, well, you have to have Couple. So I'm thinking of if the user on their first day submits a single comment that's well received, I don't want to totally take the training wheels off. I almost care about the number of separate days that they submitted stuff on more than the number of raw number of comments.
03:29:24I think what I want to do is play with this some more and look at it on real data and say, for the last 100 submitters to lobsters, what's this number?
...56I think it would be a good time to kind of gut check.
that this is producing something that is directionally accurate.
Yeah.
dpk0 so much for a short stream today … i’m tired myself and off for now. have fun!
Because the hypothesis, the working theory of all of this is that people who are acting poorly and exploitatively will have, I mean, often I can imagine a zero number here or a very low number.
DPK, have a good evening.
arh68 take it easy @dpk0
arh68 a good day so far SeemsGood
think i'm gonna wrap up rather than try and do that on stream especially because it might produce a worst of list and i try not to create lists of people to get mad at i would rather celebrate the people who are doing best at stuff but a lot of this work on self promo is people who are acting the most exploitatively so
I think maybe I just want to follow DPK and call it an evening.
And then on the next stream, I'll have results from this for recent users.
And if it doesn't feel like it's going to be a public shaming, I will show those results.
I'm just trying to avoid
really nasty social effects of, look at these 20 people, let's get mad at them.
So if that means I have to anonymize things a little, or just talk in general, I'll do that.
We'll see if I just get back absolute nonsense, because if this is not, I don't want to say predictive, if this is not post-dictive, where
That rough relationship of the people we've had to chide for self promo have low values and the people who are very valued community members still have high values.
Well, then that says this whole thing was a false start.
So we'll see how that goes.
Yeah.
So I will come back on Thursday for the next regularly scheduled stream that is Thursday at 9am.
Chicago time aka best time.
And we are in the middle of.
Daylight savings time changes which.
shake your fist at the sky or 1970s congress about that but do note if you are not in the usa that i believe that you are having your time change on a very different week i saw something go by about this year it seems like they are especially far apart
So you can follow me on BlueSky or Mastodon, where I mostly remember to tweet a couple hours before I do streams.
And that'll be specific.
arh68 daylight savings so silly. peace yall actionerWiggle
Otherwise, take care.
Thank you for chipping in.
Yeah, we're in a good place.
All right.
Yeah, no kidding, silly.
That's a good word for it.
All right.
Take care, everybody.
Hope to see you Thursday.