CLAIRE: To those of you in the audience and those listening to the recording later, welcome to Path to Citus Con. It is a new live show on Discord, and this is episode number one. The text chat is happening in the #cituscon channel, specifically in the Path to Citus Con episode 1 (e01) thread. We've got a Code of Conduct, as you might expect: there's a Code of Conduct for this Discord server, but in addition we follow the Citus Con Code of Conduct, which you can find at aka.ms/cituscon-conduct. My name is Claire Giordano. I'm a Citus open source champion here at Microsoft, and I'm co-chair of Citus Con: An Event for Postgres, which is virtual and is happening in a couple of weeks. I'm here with my co-host Pino de Candia, who is an engineering manager at Microsoft working on Postgres, and we're super excited to introduce our two guests. I didn't mean to cut you off there, Pino. PINO: No, no, that's all right. Actually, I wanted to ask: do you want to give a shout-out to the producers now? CLAIRE: That's a great point. In the background, Aaron Wislang and Carol Smith are producing. This show wouldn't be happening without them and all their work behind the scenes, so you can say hello to them on chat or give them any feedback during or afterwards as well. And I think Teresa Giacomini is also here. Teresa is my co-chair for the bigger Citus Con event happening in a couple of weeks. Cool, all right, so without further ado, let's get started. Simon Willison is here with us to talk about working in public on open source. Hey Simon. I just have a few things that I want people to know about you. Obviously you're a keynote speaker for the Americas live stream of Citus Con happening later in April, and officially I think people would say you're an independent researcher and developer. When I think of you, I think back, what is it, 20 years ago, to the fact that you were a co-creator of Django.
SIMON: Wow, yeah. 20 years ago, 19 and a half years ago at this point, I think. CLAIRE: And more recently you created Datasette, and I'm sure we'll talk about that in a few minutes so you can explain to people what it is and why they might care. You're on the PSF Board of Directors, and most recently you've been talking a lot about large language models, ChatGPT, and even the new Bing. SIMON: Yeah, they're beguiling. I can't pull myself away from them. They're just too fascinating and incredibly distracting. CLAIRE: I have been following you for years online, and one of the things you've been talking about for a few years is this concept of working in public and the benefits of working in public. That's part of what inspired us to choose that topic for today. But before we dive in, we should introduce Marco. PINO: And I have the honor of doing that. So first of all, hi Marco. MARCO: Hey, how's it going? PINO: For those of you who don't know him, Marco Slot is our second guest today and the keynote speaker for the EMEA live stream of Citus Con: An Event for Postgres. The EMEA live stream will be happening in Europe on the morning of April 19th. Marco is the lead architect for the Citus open source project. Citus is a Postgres extension that allows you to scale Postgres to multiple nodes, so you can start small with one or a few nodes and scale to many, many nodes, to a large distributed cluster. He's also the creator of the popular pg_cron extension, and all of Marco's Citus engine work is done out in the open, in public on GitHub, in the Citus open source repo.
MARCO: Yeah, definitely. You can pretty much see anything I did in the last few years on GitHub. And thanks also for the pg_cron shout-out. That is my life project. It's much simpler than Citus and I spend way less time on it, but it's funny that these simple pieces of software that you build and then just put out there in the open can have a huge amount of impact if they solve a specific problem that people have. So I'm always happy to hear about people using pg_cron. CLAIRE: So if we want to dive into today's topic, I guess I'll just start with a super open-ended question: what do you both see as the benefits of working in public? SIMON: Okay, so the biggest thing for me is that I never want to have to solve the same problem twice, ever. The most frustrating thing is when you sit down to solve a problem and think: wow, I solved this before, and I don't have that work any more, so I'm going to have to waste my time figuring it out all over again. A lot of the problems that I solve when I'm engineering are problems that can be captured in some kind of form. Maybe it's a commit message with a commit that updates something, maybe it's a few notes, maybe it's just a sketch in an issue description of the approach that I was going to take. I found that just having those in a system massively increases my productivity. And defaulting to putting them in public is partly a sort of insurance scheme. I've worked for companies where I did everything in private, and then I left those companies and lost all of that work, so I'm going to have to reinvent things and solve the same problems again that I've already solved. Everything that I do in public, that has an open source license attached to it, is just out there. I will never have to think about those things ever again. That's a
problem that I've solved once and will never have to go back and revisit, and I love that. I feel like the work that I'm doing is constantly adding up to me having more capabilities and more tools in my toolbelt. CLAIRE: So that's a really Simon-centric perspective. SIMON: It's very selfish, absolutely. CLAIRE: I kind of expected you to talk about the benefits of sharing your learnings with others, how we can build on top of each other's learnings, and how other people benefit when you share your, you know, "today I learned." SIMON: Yeah, no, again, it's very selfish. So I have this website, my TIL website, which I'll drop a link to in the chat, and I just published my 400th note there. On the one hand it is for other people, so that if somebody else needs to figure out how to copy a table from one SQLite database to another and they do a Google search, they'll land on my site and it'll solve the problem for them. But mainly it's for me. The fact that I'm publishing causes me to increase the quality of the notes a little bit so they make more sense to other people, but that also means they make more sense to me when I come back in a year's time and I've forgotten everything. So yeah, I feel like you can actually be very selfish in your motivations and still do all of this stuff in public in a way that benefits other people. PINO: I really like that, because I hadn't thought of open source beyond code, documentation, and design. You're actually preserving your thought process so that you can look at it later, and that is a huge part of what we lose over time.
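As an illustration of the kind of small, reusable note Simon describes, here is a minimal sketch of copying a table from one SQLite database to another using Python's built-in sqlite3 module. It is not taken from Simon's TIL site. The function name copy_table and the file and table names in the usage note are assumptions made up for this example, and the sketch copies only the table's schema and rows, not its indexes or triggers.

```python
import sqlite3


def copy_table(src_path, dest_path, table):
    """Copy one table (schema and rows) from src_path into dest_path.

    Hypothetical helper for illustration only: indexes, triggers and views
    are not copied, and the destination must not already have the table.
    """
    conn = sqlite3.connect(dest_path)
    try:
        # Make the source database visible on this connection as "src".
        conn.execute("ATTACH DATABASE ? AS src", (src_path,))

        # Look up the original CREATE TABLE statement in the source schema.
        row = conn.execute(
            "SELECT sql FROM src.sqlite_master WHERE type = 'table' AND name = ?",
            (table,),
        ).fetchone()
        if row is None:
            raise ValueError(f"no table named {table!r} in {src_path}")

        # Recreate the table in the destination (the unqualified name
        # resolves to the main database), then copy every row across.
        conn.execute(row[0])
        quoted = '"' + table.replace('"', '""') + '"'
        conn.execute(f"INSERT INTO main.{quoted} SELECT * FROM src.{quoted}")
        conn.commit()

        conn.execute("DETACH DATABASE src")
    finally:
        conn.close()
```

Calling copy_table("source.db", "dest.db", "notes") would recreate a notes table in dest.db, assuming both file names are placeholders and dest.db does not already contain a table with that name.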
CLAIRE: Well, there's this concept that comes up a lot in recent years. Maybe people talked about this 10 years ago and I just missed it and it went right over my head, but it's the concept of doing something for future Claire, not just present Claire. I definitely think about that. I had conversations with people just this week about: you know what, let's go update that document, so next year when we're planning for Citus Con we don't have to re-solve this problem. We'll remember this bug, and it will have been addressed. Sorry, go ahead, Simon. SIMON: I was just going to say that's really important. You're writing these notes anyway, right? To be productive in our lives we need to make meticulous notes about things. The cost of publishing them is pretty tiny compared to the effort of putting them together in the first place, so you may as well default to publishing them so they benefit other people, which benefits you, because you get a reputation as a useful person and all of that kind of stuff. It's a very incremental cost: you should be keeping meticulous notes anyway, so why not publish them? CLAIRE: That reputation as a useful person is an interesting concept. So Marco, I'm going to turn to you: do you think your work in open source gives you a reputation as a useful person?
MARCO: Well, I work in more of a team capacity on Citus, I guess, so I don't know if it would single me out as a useful person. And if you're just maintaining a very actively used open source repo by yourself, you're mostly just going to get a lot of complaints a lot of the time, to be honest. But I do agree with what Simon says. Especially if you've found a good way of doing things, it's good to just put it out there. It can be on a blog, it can be on GitHub. One thing I try to do a lot, especially in writing code: I feel like the main job of code is to explain the problem. I would say good code explains the problem and solves it as a side effect. You want to structure it and comment on it in a way that future Marco or a future team member will look at it and say: okay, this makes sense, I can work with this. It's a bit more of a narrow scope to do that within, let's say, the code comments and code structure of a particular project. But for me, a big part of working in public is the feedback mechanism. It's hard to write good software that's reliable and solves the right problem for people. If you're working on a proprietary product, eventually your customers might complain, or maybe you don't get customers and there's no one to complain. But in open source it can go really quickly. It happens sometimes that we push out a new version and then 15 minutes later someone says "Hey, something broke for me," and that's extremely useful. But you also get this constant feedback of "man, I really wish we could do this" on GitHub issues or on our Slack channel, and that feedback really
also just helps you do better work. And there's also a developer experience side to it, which is part selfish and part almost promotional: it's just easier to use stuff that's open source, because you don't have to set up a lot of authentication mechanisms and VPNs. You just do a git clone and you compile and it works. That developer experience of open source is, I think, a very important aspect. PINO: Marco, when you said earlier that you get a lot of complaints, I thought you were going to point out that that's difficult, but then you pointed to it as valuable. Does it also have a negative side, something you have to deal with? MARCO: I'm not sure there are many negatives. You shouldn't take it personally, I guess. If many people are complaining about the same thing, then you know it's a good thing to fix. If one person is complaining about the same thing over and over again with no one else complaining about it, that can, I guess, be a bit annoying. But most of the time it's just useful, and most people are well intentioned anyway. Sometimes there's also the challenge that you get too much feedback, and you have to start saying no to 90% of what comes in. Then the project becomes more popular, so you get even more feedback, and you need to say no to 99% of requests. That can get a little bit difficult sometimes. SIMON: I've got a workaround for that, which is kind of fun. One of the big features of my main project, Datasette, is a plug-in system, so you can write plug-ins for it. The joy of plug-ins is that people can add features to my software without me being involved at all, even if I think those features are a terrible idea. And this actually plays well for me, because I come up with features that I think are a terrible idea and I can still build them. Like,
I can go and build a plug-in that does something kind of ridiculous and silly, because it doesn't harm the core project; it's a separate thing. So when people ask how they can contribute to my software, I tell them: write plug-ins for it. I won't even have to review their pull requests. They can work completely independently of me and explore new things, and maybe my software gets a really cool new feature as a result. CLAIRE: Well, that's really interesting, because that connects to a capability that Postgres has. If you go back to the very first paper that was published about Postgres (when was that, Marco, 1985? 86?), one of the primary design constructs in the database was that it be extensible. So there's this ability to create Postgres extensions, and Citus is in fact a Postgres extension. It's enabled all this innovation to flourish that wouldn't have been able to be put into Postgres core. It gives people runway to go off and make things happen without having to get them in. And yeah, pg_cron is another fantastic example of how well that kind of thing works. PINO: What about the community fragmentation aspect of that?
In particular for Datasette, Simon, I wanted to ask you: if someone goes off and writes a plug-in, do they continue to have the conversation in the context of Datasette, or do they end up splintering off? And how do you bring those conversations back together? SIMON: Right now the community is small enough that we have a Discord that we hang out on, and that's not been a problem yet. But also the project is young. Datasette itself is five years old now, and the plug-in system is maybe four years old, but it's only in the past year that people have really started building quite elaborate plug-ins on top of it, which is super exciting, and it's the position I've wanted the project to get to. I feel like there are going to be growing pains going forward that we haven't encountered yet. I do worry about things like wanting to make changes to Datasette that could break plug-ins. That's fine when I wrote the plug-ins, because I can upgrade them, but now that I've got external volunteer maintainers building their own plug-ins, I have to think more carefully about that kind of thing. But I feel like it's a well-trodden path. My inspiration was WordPress, where plug-ins have been around for 15 years, and I feel that's the reason WordPress has been so successful. The design and architecture challenges are really interesting. Designing a plug-in system that lets people do flexible things without binding your hands in terms of the future of the project itself takes a lot of practice and work, and I feel like I'm still figuring out those patterns as I go along. It's difficult; there's not much guidance out there on how to design your plug-in or extension model for maximum power and minimum friction. So what about the Postgres community? How has that been? Obviously the extensions are
massively successful; there are lots of really popular extensions. Was that a problem? I'm fairly new to the Postgres community, so I'd love to know a little bit about the history of extensions and how the community avoided fragmentation. MARCO: Yeah, I wouldn't know what the first extensions were. I think the first really major one was PostGIS, which adds geospatial data types and functions to Postgres without changing a line of Postgres code. It's very interesting: you can have these polygons, like on maps, that you store in your database, but you can then also create spatial indexes on them, because the indexing system in Postgres is also very pluggable. I think in most places where you can run Postgres, you usually also have PostGIS. Adding a new data type is a very clean interface in Postgres. There are certain other types of extensions, which don't really have a name; I refer to them as deep extensions, that really alter the behavior. Citus is one of those, and TimescaleDB, and there's this new graph extension called AGE, or I think AgensGraph previously, which is basically a graph database on top of Postgres. They go a lot deeper into the extension interfaces that Postgres offers. I think sometimes it happens in the Postgres community that someone wants to add a new feature, but it's unclear how to do it and what the design should look like, so they just add a function pointer and say: okay, you build your own extension, and you do it your way. That gives an enormous amount of power. There's nothing like it in any other database, where you can just change the planner into something completely different. But that part is also a little less cleanly layered, so it's hard to layer certain extensions that mess with the
planner on top of each other. Sometimes it works, sometimes it doesn't. You also get into binary compatibility issues where they mess with the data structures in incompatible ways. But it hasn't really become a huge problem so far. Most of the time people use one or two of the deep extensions, but not the combination, and the vast majority are more like new data types and new functions, and those extensions usually compose really nicely. CLAIRE: When we were brainstorming topics for today's first episode, Postgres extensions was absolutely one of the things we considered. We could spend a whole hour talking about it. In fact, next week in episode two, I think we're going to be back here Wednesday at 10 a.m. Pacific time as well, with some different guests, and our topic is something like how to get Postgres ready for the next 100 million users. I'm sure Postgres extensions will come up in that conversation too. I want to circle us back to working in public for a second. Simon and Marco, you've both talked about some of the benefits of doing it, but what I wanted to drill into, because one of you kind of planted the seed in my mind: what has been surprising about working in public? Has there been anything surprising about working in public?
SIMON: I think the surprises are the nice little surprises, when somebody says, oh, I really appreciated that note that you put out, and it's something that you threw out six months ago and promptly forgot about and honestly didn't think anyone would ever read. That's the real delight of it: when somebody says that thing that you wrote helped me solve a problem or was useful to me. Somebody will come talk to you at a conference or whatever, and that's always delightful, because honestly I publish so much stuff. On any given day I probably publish one or two TILs, half a dozen issues, a bunch of commits. There's a massive volume of it, so I assume that nobody is going to see most of it, or any of it, because who's got the time to look at my latest GitHub commits or whatever? So when people do, that's kind of lovely. Like I said earlier, I do this selfishly, it's mainly for me, but anytime it's valuable to someone else it's kind of a treat. It's always delightful to hear that it did have an impact in the world beyond just me having my own notes. CLAIRE: Well, the reason you're here is because I've been following you for years and have learned so many different things from you that I finally mustered the courage to introduce myself and invite you to be a keynoter at Citus Con. So yeah, I think oftentimes people do appreciate the work that gets shared publicly, but we're not necessarily wired to express that gratitude or say thank you, so when you do get a compliment it is kind of cool. Think about it: on GitHub there's a tab for issues. There's no tab for gratitude or appreciation or compliments or accolades. It's not there. We're all focused on what's wrong and how we can make it better. MARCO: Well, there's a star. The star button.
CLAIRE: Oh, that's true, there are stars. Yeah, I love it when people star the Citus GitHub repo, but I try not to ask for that too much because it seems so shameless. MARCO: Could be nice. But yeah, you were asking whether there were any downsides, or rather, what was surprising. I think one of the surprising things is that if you just push some software out there, a lot of the usage is actually very silent for a long time. Just like Simon says, someone comes out of the blue and has read his note; the same thing happens with software, where you suddenly realize there's this enormous user that has been doing very interesting things with what you've built. The funniest anecdote we have to share is on pg_cron. A lot of our team is in Turkey, and supposedly the Turkish government uses pg_cron to schedule the street lights turning on and off. Amazing. I don't know the exact mechanics of it, but it seemed brilliant. I totally love it. Those are also always the nice surprises, because they tend to happen quietly. It turns out that for years your project has been used. I also remember this: I come from a university where we had this professor, Andy Tanenbaum, and he had this operating system called Minix, which before Linux was kind of the main open operating system that people were using. He had these long debates with Linus about operating system architecture. In the end Linux became much bigger and Minix became much less important, but then at some point it turned out that Intel had put Minix in one of their chips, and it was one of the most widely deployed operating systems in the world. Often these things come really quietly. It turns out that there can be massive
impact from the things you've done in public, so that's one of the really nice things, I think. SIMON: One thing I'll say there is that people often ask, how can I contribute to open source? And there's this idea that, oh well, you need to fork the code base and fix bugs and submit pull requests. Totally forget about that. There is so much you can do for an open source project that comes way before you're actually sending in patches and trying to commit code. One of the most valuable initial things is just to tell the people who built the project what you did with it, because I can guarantee that for the vast majority of open source projects, weeks will go by with maybe a bug report or two but no real evidence that it's being used, because people can take it and use it for free, independently. So if somebody says to me, hey, I used Datasette to build this thing, honestly it doesn't matter what the thing is, that will make my day. I will be absolutely thrilled to see evidence that people are engaging with it and using it. So yeah, telling people that you use their stuff is great. Even better than that: write about the thing that you did. If you tweet a screenshot of something you've built with my software, that's social proof, like an endorsement. I can promote that to people, and I get to see what's going on. So there are very tiny things you can do to support an open source project, just in terms of talking about what you're doing with it, that are way more valuable than you might expect. CLAIRE: About 10 years ago Josh Berkus gave a talk, and I think the title was something like 50 Ways to Love Your Project. It was about all those non-code ways that people can contribute to open source, and I love the two that you just mentioned: tell the people what you did with it, and write about it, tweet about it. I ended up doing a reprise of Josh's talk at a couple of conferences
in the last year, like Fibonacci spirals and, you know, all these ways to contribute to Postgres beyond code. But telling people what you did with it is probably one of my favorite ones, because it helps other prospective users, right? It helps other people who are thinking about using it in that way to learn from your experience, and then it makes the creators' day. I mean, your dog probably got an extra walk that day. Or maybe not? Does that not happen when you're really excited, do you not take the dog out for a walk? I don't know, it's what I do. MARCO: We have a cat. He walks himself. CLAIRE: Marco, what's a two-flight project, since you were talking about pg_cron earlier? MARCO: Sorry, a two-flight project? CLAIRE: Yeah, isn't that how you describe pg_cron? MARCO: No... oh, two-flight. Oh yes, a two-flight project. Oh yeah, you asked me this before. I don't really travel by plane much any more, but I used to fly a lot for work, and that was always the best place for me to write code, just disconnected from the internet and people and email and chats. So I guess writing pg_cron initially took me two flights, probably to the US, to get it done. But I had an intention there. It's not part of my day job, and I really tried to think carefully: if people are going to rely on this, how do I make it as reliable as possible without me doing a lot of work? That begins with keeping it very simple and small and focused on a specific problem. There are quite a few feature requests, which I'm not ignoring; I'm just weighing them extremely carefully. For a few years I didn't add, for example, the ability to schedule jobs in seconds, because it's not something that cron does. But recently there were so many people asking for it that I thought: okay, well, this solves a
real problem, I'll pay the cost of doing it. But you also have to be careful about what you sign up for. If you put something out there that people then start relying on, you have the choice of either building a community around it, or being on the hook for it, or kind of abandoning it. Currently I'm keeping myself on the hook for it, but keeping it as simple as possible as well. SIMON: I feel like what you're describing there is the hardest problem in software development, which is prioritization: deciding what to build next, deciding what features are worth paying the ongoing maintenance tax for, just figuring out what is the most valuable thing that I could be building. I find that incredibly difficult because I'm independent. I don't have a boss or investors or anything, so I don't really have a forcing function to help me make those decisions. I can get to the end of the day and I've built a new thing, which is fine, but it wasn't the thing I intended to do with my time to reach my larger goals. PINO: I was going to ask you about that before. You publish weeknotes, and you clearly have discipline and a habit of doing certain things. You explained the motivation before as selfishness, but then there's also this aspect of prioritization. Since you've got to decide across multiple projects, what's your day like? Do you sit down and, I don't know, pre-prioritize everything you could potentially do? Do you do that weekly? I'd like to hear about your habits. SIMON: On a good day, when things are going well, I have a slot between 9 and 9:30 in the morning where I make my plan for the day. I try to go for one big thing and two small things that I want to get achieved, and then I'll check in at the
end of the day and see if I did those. And then once a week I look at my larger goals and try to use that. That's when things are going well. The past couple of months things have been going disastrously wrong, because every sodding morning some new AI thing has happened which distracts me for half an hour, and I miss my planning session, and so forth. So honestly, there are periods of time when it's all working really well and I'm prioritizing well, and then there are periods of time where it's complete dumb luck if I get something useful achieved by the end of the day. The weeknotes are a good cover for that, because every week or every two weeks I publish a thing with notes on what I've been doing, and it always looks like I've been super productive. But if you actually look at the strategy and say, hang on a second, were the things he wrote about the things he most wanted or needed to get done? They often aren't. So yeah, it's an ongoing struggle for me. I've learned that sometimes I do this stuff well and sometimes I don't, and that seems to be a cycle that I can't break out of. As long as I don't have six months of complete unproductivity, I'll be okay. But at the moment I really need to bust away from the AI research world and get the next alpha of the Datasette 1.0 release out. That's been top of my priority list for like a week and a half, and I still need to push forward and get it done. CLAIRE: Are you saying that it's distracting when Elon Musk tweets out a link to a blog post that you've written?
SIMON: Yeah, that was a few weeks ago. I wrote a story about Bing when Bing had just launched and it was going completely off the rails, threatening people and blackmailing people and all of that kind of stuff. So I wrote a blog entry about what had been happening, and Elon Musk tweeted a link to my blog entry, and I got a million readers in the next 24 hours, because it was two days after he'd tweaked the Twitter algorithm so that everything he said was shown to everybody. That was, what, four weeks ago, and I've been distracted by AI stuff ever since, because stuff just keeps on building on top of that. On the one hand it's fascinating; on the other hand it is definitely delaying my ability to get a whole bunch of stuff done that I wanted to get done. CLAIRE: Okay, I want to circle back to working in public again and ask a different question to each of you, Marco and Simon. What do you think makes engineers who are new to working in public, maybe they're fresh out of college, or maybe they've been working in a proprietary context for a number of years, uncomfortable with working in public on open source?
I can speak for myself here, actually, and the biggest one is when I'm employed, I feel like the work I'm doing is private to that company. So I've had periods of my career where I've done very little stuff in public, because I'm working for an organization who pays for my time and they pay for my code, and I signed a thing when I signed up that they had the intellectual property and so forth, and that really held me back. And then there was a point a few years ago where I realized, hang on a second, I am allowed to work on things on weekends and stuff, you know, I don't have to stick as hard. And also, these companies will never say, you know... approve it with your manager, and the vast majority of managers, you say, hey, I want to do this thing, they'll say yes, because why would they say no? So for me, I think it's probably because I'm a habitual rule follower. I sort of stifled myself by just going too close to the feeling that, no, my employer should get all of my intellectual output. But when I relaxed that, I was way happier and I was way more productive.
Makes sense. Marco, what about you?
Yeah, I mean, I think it partly depends on the culture of the company and how you experience that. So for example, a long time ago I worked for Amazon, and they had a very, I would say, anti-open source policy, like, using open source but not contributing. And there was a time when, if you wanted to contribute to open source, you had to ask VP approval. I mean, that has radically changed at Amazon. But Microsoft has a little bit more of a... well, I mean, the new Microsoft, let's say in the past few years, has a very pro-open source policy. But not everyone necessarily feels comfortable making the decision to say, oh, I wrote some useful code and I'm going to push it out there on GitHub. Like, what if someone somewhere in the company will disagree with that? But actually, in practice, I mean, if it concerns, you know, high-stakes software that customers are paying a lot of money for, it is obviously not a good idea, but the company is very open to it, and it's not hard to get approval, and for many things you don't really need to ask. Microsoft actually encourages it. So yeah, I mean, it really depends a bit on the company, and also on how you've experienced the company so far.
Is it, like, in your team, in your organization, a common practice to just put stuff on GitHub or work on other projects?
I think in our immediate teams it's pretty common. You see a bug in, let's say, PgBouncer, you go fix the bug. You're not going to ask anyone, should I ask permission to fix this bug? But I don't know if that's the case for all teams at Microsoft. Probably not. Or, I mean, I think they could, but I don't think they necessarily feel comfortable doing that.
So I do have a suggestion for things to write about that I feel are safe no matter what, and I just dropped a link into the
chat to this, but basically TILs: this idea of writing about things that you have learned, I feel, is the sort of lowest-risk form of online publishing that there is. Because the great thing about saying 'today I learned how to do a for loop in bash' or whatever is that you're setting expectations up front that you're not going to rock somebody's world and give them new insight. This is just: I learned to do this thing today and I'm writing about it, and if that's useful to you then fine, and if it's not useful to you then that's okay, this wasn't for you. And so I started publishing those myself a few years ago, and I love it. It's so liberating, because I don't get that writer's block anymore, where I'm like, wow, do I really have something unique and interesting to say about this topic? You're like, no, I just figured out for loops in bash, I'm going to write up two paragraphs of text and a sample of code, and I'm going to publish it, and I'm going to move on. I love that. And I feel like if I was working for a company with a very stringent sort of 'no, you can't release code' and things, I'd still feel okay writing about things I'd learned. You know, that feels like a very safe category of notes to be making and sharing with the world. And then the other thing is, I've set myself a rule that anytime I do a project, the price for doing that project is I have to write about it. And this is good for me because, like I said earlier, I'm very easily distracted, and, you know, I can get to the end of the day and I had an idea for a project and I built it, and that wasn't on my list, but at least now I have to pay for it, and the payment is I have to write it up. And the write-up can be, like, a README in a GitHub repo just explaining what the thing is in four paragraphs of text. And then always add screenshots. I feel like anything you build, you should take a screenshot, because the code won't work in 10 years' time, but the screenshot will
last forever. So I'm a huge fan of screenshotting your work as a way to illustrate it. But yeah, I feel like if that's all you ever do online, any time you learn something you write up notes about what you learned, and anytime you do a project or build something you put up a quick post saying what it was you built and adding a screenshot, that will put you in the top one percent of internet users in terms of sort of quantity of content that you're producing. And it's great content, and I feel like it's very low risk. Like, if you put out a blog post saying, hey, this is the way Agile should be done, lots of horrible people are going to tell you that you're wrong about it. If you put out a blog post saying 'I figured out for loops in bash', I kind of feel like it's a waste of their time for people to be super critical of that.
Simon, you make it sound so easy. Could I just ask, how much time does that take you, each of those examples? You said sort of, you know, two paragraphs on the TIL.
Yeah, so I've been writing online for 20 years, and so I'm very fast at it. So a TIL post will take me between 5 and 20 minutes generally, and that's partly as well because anytime I'm figuring something out, I'm actually making notes as I go along. I use GitHub issues for this. I've got public issues in public repos; I also have a private repo called 'notes' which I just use for when I'm figuring something out. And so often when I get to a write-up, it's literally copying and pasting Markdown from my issue notes into a TIL document and hitting go. So a lot of the time I've kind of written the notes already, and the public write-up is just cleaning it up a tiny bit and adding a little bit of extra context. But yeah, I feel like for the vast majority of people it's not going to take 5 to 15 minutes at first, because you've got to get into the swing of it and find your voice and sort of
learn how to write productively. But over time it just keeps on getting faster. And these are crucial skills. When you talk about becoming a senior engineer, the path to a senior engineer, I think, is through writing, through written communication. That's the difference between seniors and juniors: the seniors are better at communicating about their work. So developing writing skills is a crucial professional skill anyway.
And thanks for writing up about for loops in bash, because it's one of those things that I have to Google every single time, along with for loops in PL/pgSQL. That's one I also cannot... yeah. Do you have a TIL on that?
I have to admit, for loops in bash, I will never write one ever again, because ChatGPT writes my bash scripts for me. So I'm just like, hey, write a script that loops through every file in this folder, runs FFmpeg to extract some frames, and puts them out as a GIF, and it does it. And so no, I no longer feel like I need to dedicate even a corner of my brain to thinking about bash, because ChatGPT knows all of the bash that I'll ever need to know.
So this brings up a topic for me. Marco earlier talked about writing code on flights and not flying as much these days, and now we've touched on ChatGPT. I wanted to ask about changes in working in public, working on open source, in the last five years, both technological changes and cultural changes. What comes first to mind? Maybe, Marco, I'll go to you first, if that's okay.
Yeah, obviously ChatGPT and AI is going to constitute a huge change. And even, like, I've used Vim and bash for most of my career, but I'm starting to think I should probably be using VS Code, because that's kind of where all the good integrations will be happening for things like Copilot. So that's where, personally, it's going to be a big change. I mean, less technically, I guess a big change is that large organizations, both tech companies but also just enterprises, are kind of massively embracing open source, both in terms of usage but also in terms of contribution.
Are you cutting out just for me or for everybody?
Yeah, that cut out for me.
Yeah, you're cutting out, Marco, I don't know why. Okay, we'll try again, and then we'll circle back to you if it doesn't work this time.
Yeah, it's still not working. Okay, I'm gonna jump in and say that one of the changes I've observed is more of a long-term change. If we go back to when I first started in my career, the way people wrote was a little bit different. There was almost an expectation that their audience, whether it was a paper they were writing or just a lengthy email, that people would read every word. That may not have been true, but the way they wrote felt like that, right?
There were these long chunky paragraphs, and maybe things were written at the 16th grade level or something like that. And I know that when I advise people about how to blog, and when I write my own stuff now, I think about scan-ability, browse-ability. I assume people are not going to read the whole thing. I assume they might jump to the screenshot, right? Like, just read a couple of the section headlines and then go to the screenshot and read the caption underneath that. And so, at least when I think about writing for people, I think about the fact that they're busy and how to make it easier for them to digest and to scan, and I didn't used to think about that 20 years ago.
Yeah, there's definitely something very important in that. I mean, that's part of why you want to write a lot: the more you write, the more you develop those instincts for what's actually going to work. And yeah, I had a sort of moment of crisis professionally a few years ago, when I'd gotten into the habit of writing these enormously detailed documents about project proposals and stuff, and then I kind of had a hunch that maybe nobody was reading them. And I started polling around, and I couldn't find anyone who'd read these documents. And yeah, it made me rethink and think, okay, actually, screenshots and illustrating things more, animated demos... I love having a live demo. I feel like a quick live prototype of an idea speaks a thousand text documents, because the moment people start playing with it, they can have a much richer conversation about it. Another thing I found, getting back to the sort of ChatGPT AI side of things: a realization I had the other day is that there will never be documentation that is better in quality than what I can do with ChatGPT and tools like that, provided they have the underlying knowledge. My favorite example is FFmpeg, where I did this project the other day where I had a video and I wanted to
spit out, for every 10 seconds, a JPEG frame of that video. It was a video of a thermometer over time, and I wanted to do OCR on it to extract out the readings. And so how do you use FFmpeg to spit out one JPEG for every 10 seconds? I cannot tell you that, but I have done it, because I said to ChatGPT, 'use FFmpeg to spit out a JPEG every 10 seconds', and it gave me this incomprehensible sort of set of DSLs and scripts and all of this stuff, which just worked. And I cannot imagine FFmpeg documentation that would be good enough that it would answer that question for me as quickly as a chat bot that has been trained on that documentation. And so on the one hand it's weird, there's this new world we live in where a chat bot provides better documentation than the best possible crafted documentation. But it also speaks to skills we need to develop as writers: we need to almost write our documentation so that chat bots can interpret it correctly and accurately to help answer people's questions. And then, as users of this stuff, the skill that we need to develop is getting really good at getting these language models to spit out the right information for us to help us solve problems, and spotting all of the times that they make stuff up, which is a huge problem. And then just lose that fear. I'm no longer afraid of FFmpeg, because I know that something can show me how to use it, whereas previously I very rarely used it, because it's a notoriously complicated piece of software.
So, as I was preparing... I want to pivot from the 'how have things changed' back to the working in public thing again. I reached out to Scott Hanselman last night, and he says hello, by the way, Simon. And one of the questions he suggested I ask, and actually Olaf on Mastodon suggested something very similar, is: as you work in public, how do you stay positive in the face of [ __ ]? Like, right, there are critics out there on the internet, there
are haters, there are complainers. How do you stay positive when faced with that? Simon?
I think part of it is that you develop a very thick skin. You know, if you're online for 20 years, you get to the point where somebody's mean to you and I kind of think, they're probably in their early twenties, they wouldn't be that overconfident and mean if they had actual, real life experience. So that helps me to a certain extent, I'm quite good. But also, I think a lot of it's about self-confidence. Like, I am confident enough now that I know my stuff that if somebody says, no, you're clearly an idiot because you got this wrong, I'll be like, yeah, but I'm better at Django than you are, you know? So that helps me a lot. But it very much comes down to a personality thing as well. You know, I think if you want to really expose yourself on the internet in this kind of way, it does help to have quite a robust ego, you know, and to have that sort of confidence in your own abilities, because yeah, people can knock you down, and they will, and if they succeed, it's miserable.
Does it happen much?
I mean, I feel like people have become more aware of the consequences that even words can have. Is it getting better? Does it still happen?
Maybe it is getting better, yeah, maybe.
Oh, it still happens. At least I see it. Like, I share information on Reddit, because, you know, I espouse the philosophy of 'meet developers where they are', right? And so when Marco writes some brand new brilliant Citus-related blog post that I want Postgres users to see, in case it's useful to them, I will share it on Reddit. And yeah, a lot of times comments are supportive and positive, but sometimes they're definitely not.
I mean, also, I'm in the most privileged position you can be, you know, I'm a white male. There are all sorts of aspects of, like, sexism and racism and stuff that I just don't see. So, you know, it's a lot easier, I think, in that respect.
Marco, are you back?
Yeah, let's see, can you hear me?
Yeah, we can hear you. Perfect.
That question to you too: how do you stay positive in the face of critics on the internet? Has that been an issue for you?
Well, I don't spend a lot of time on Twitter and those kind of things. But I mean, the main thing for me is just, like, focus on what you're doing and believe in what you're doing. So if someone comes and sort of criticizes your project, it's like, you know, I've already thought about this much more, and I kind of know we're doing the right thing, or that we're just working within the constraints that we've had. So I mean, it doesn't really bother me in that case. But yeah, probably like Simon, I'm also in this kind of privileged position, I guess, so I probably also see less of it.
Yeah, of maybe junior developers or people that are new to open source. And some people in the open source community have reputations for being harsh and quick to critique a new piece of code or an idea. How...
Yeah, I think the pgsql-hackers list is kind of quite interesting in that. Well, it has a particular style, but it can be very critical of patches and designs, but it's kind of, in some sense, for the good cause of making Postgres as good as possible. And it's never on the person, it's just always critiquing code. But it can be very tough for a new person to come in and say, oh, I have this nice patch, and then it kind of gets criticized, and that can be a little bit tough.
Speaking of junior engineers, I went back and did a search on Twitter, Simon, for all the instances where you tweeted about working in public, and there is something that you tweeted back in July of 2021 where you said: if you want to stand out from other candidates, having even one piece of writing or published piece of code that shows something you've built is a great way to do that. So would you still agree with that, and do you still offer that?
100%. So I do mentoring for code boot camps occasionally, and yeah, one of the things... because in these boot camps, often the students, they'll have a final project that they do, and they'll put that up on GitHub, and I always tell them: put up screenshots in your README, because the people evaluating these things are not going to click on the demo link, and the demo will be broken in six months anyway, because that's just what happens. But if your README has multiple paragraphs of text with interspersed screenshots and all of that kind of thing, that right there will be your resume for the next, like, three years, and it'll work incredibly well. Because yeah, when I've been interviewing candidates
for work, most candidates don't have... and, like, inevitably you're gonna cyber-stalk your candidates a little bit, you're gonna check and see if they've got a GitHub repository and look at their LinkedIn and that kind of stuff. For the vast majority of candidates, you won't find anything that helps you answer the question: can this person do the job? When you stumble across a candidate who's got one project on GitHub with some screenshots that shows that they can code, now I can skip the FizzBuzz interview question, you know, because I've seen their work, I've seen that they can do that. And if they've got a blog entry from five years ago with, like, six paragraphs discussing the internals of React or something, they're now in my mind an expert on this one subject. So yeah, if there are 100 people applying for a job and you're the only one with a blog, and your blog hasn't been updated in five years, and it has one article on it and a screenshot of something, that's still a leg up, that still makes you stand out from the crowd.
Marco, do you cyber-stalk your candidates when you're talking to them?
Yeah, sometimes. I mean, it's nice to basically do a code review before you work with someone, rather than after hiring them. But yeah, no, it definitely is a leg up. Like, if you have some great projects on GitHub or a very technical blog, yeah, I mean, it helps. It's much more... I mean, the resume format is... like, I always remember that one of the most senior and sort of the best engineers I've ever worked with, before he got into software during the dot-com bubble, he was a forest firefighter. So, I mean, your background matters a little bit, but if you can display your skill, it's just, you know, 100% better than anything else.
So before we wrap today, I wanted to give each of you a chance to talk a little bit about your upcoming keynote. Now, I don't know if you've written your slides yet and if you're ready, but Simon, the title of your keynote at Citus Con, which is on Tuesday the 18th at nine o'clock Pacific time, live streamed, virtual, is 'Big Opportunities in Small Data', and I just thought that the back story to why you're giving that talk and why you think it matters might be interesting for people to hear about.
So this is something... so my day job, and I say job, I'm self-employed, so it's the thing I try to spend most of my time on, is building this open source project called Datasette. But really the theme is tools for data journalism specifically. So I have a bit of a journalism background. I've worked with a couple of newspapers; Django came out of a newspaper 20 years ago. And data journalism is the bit of journalism... I think it's the most interesting thing in the world, right? It's where you work with journalists to try and tell stories with data. And, you know, anytime you see an infographic in the newspaper, anytime you see a chart or a map or something, somebody went out and collected the data for that. That's a data-driven story. And when I worked at the Guardian newspaper in London, like 13, 14 years ago, we realized that there was a reporter at the Guardian called Simon Rogers who was the data expert. He knew who to call at which government department to get data on any story that you wanted, and then he had all of these meticulous spreadsheets that he kept on a hard drive under his desk. And we got talking, we're like, we should do something with these meticulous spreadsheets about the world. So we ended up starting a blog, we started a thing called the Guardian Datablog, and the idea was to publish the data behind the stories. And we ended up doing that just using Google Sheets, because Google Sheets was free, and, you know, it worked
and you could put data in it and people could copy that data back out again. But I always felt like there should be a more effective way of publishing data, like a way of putting data online so people can browse and explore it, but also do API integrations with it and export it as different formats and all of that kind of stuff. And so that was the initial idea for Datasette. It's a Python web application for publishing data online, and it's built on top of SQLite, because SQLite is tiny and fast and you can actually package a database with your underlying code when you deploy it, so you don't have to think about even running a separate database server. And this led me to this whole world of small data, where I realized that there's been lots of fuss in the industry in the past sort of five to ten years about big data, which is measured in petabytes and which you need a giant data warehouse for. But actually, for the vast majority of people and organizations, what matters is the small data. It's the data that fits on a USB stick. As an individual, I care about things like my blood sugar levels over time and my step count and my tweets and emails and so on. As an organization, maybe I want to know who my customers are, which is probably like 50,000 rows of data, you know, it's absolutely tiny. And there's this space where I don't feel like people are investing enough effort in building these tools for small data. It shocks me that Microsoft Access has been kind of frozen in time for the past 20 years, when it should be one of the most powerful pieces of software in the Office suite. But yeah, so I've been looking at small data, building tools for that, thinking about it from a sort of data reporting and journalism point of view, and then watching how all of these governments are now releasing open data through these open data portals. So name a city in the United States: it probably has a data portal with CSV files full of trees and parking
meters and all sorts of things like that. And it's kind of just sat there, because the tooling isn't good enough for regular individuals and, you know, reporters, journalists, who I'm building for, to take that data and turn it into stories. So yeah, I feel like there's a huge opportunity there. I would like to use Postgres for this stuff more than I do, and so part of my keynote is going to be talking about the ways I've been solving these problems outside of the Postgres ecosystem. And then I also want to try to inspire things that Postgres could do better, or that the Postgres community could build, that would make it even more applicable to solving these much sort of smaller problems.
Oh, you just answered my question, which is: what does this small data talk, rooted in Datasette and SQLite, have to do with Postgres? And there's the answer. So hopefully people will tune in for that. Obviously it'll be recorded and online after the fact, but we'd love for people to join live too and ask you questions. Okay, Marco, your keynote for the EMEA live stream on Wednesday, April 19th, at nine o'clock Central European Summer Time, is 'The distributed Postgres problem and how Citus solves it'. So in a nutshell?
Yeah, and I kind of will update it in my slides to 'how Citus sometimes solves it', because it's actually not an entirely solvable problem. So I want to talk a bit about... I mean, there's kind of different implementations appearing for distributed Postgres, but it's also been this thing that, you know, people have dreamed of for many years, but it seems to never quite happen. And I want to talk a bit about, like, what's the technical problem behind it? Why hasn't someone come in and submitted a patch to Postgres, and now it's distributed? And, you know, there's new implementations appearing, distributed implementations of Postgres, but in my mind, fundamentally, if you
spread data across many machines, the first thing that happens is everything gets slow, because now you have to go to a different machine to get the data, so it's no longer nicely compacted into one place. And this is a very important thing to understand, because Postgres is a relational database, so there are a lot of relationships within the data, and if all those relationships start spanning over a network, distribution doesn't help very much. So it's worth understanding that problem and then seeing, well, when does it help? Like, when does distribution make a lot of sense, for what kind of applications, and what kind of patterns do they use to get there? So that's what I'm going to talk about.
Awesome. We are a little bit over time, not that we're set in stone to end at the top of the hour. Pino, are there more questions that we should have asked? Are there more things you wanted us to cover?
Not for me. I think we covered everything I thought of. But I want to say thank you to our guests, this was really interesting.
Yes, Simon and Marco, I have been a fan of your work, both of you, for quite a number of years now, and yeah, it's totally an honor to have you be the first guests for Pino and me here on Path to Citus Con. So thank you. And again, big shout out to Aaron and Carol in the background, without whom, you know, this thing wouldn't even have happened today. And to everybody who came in the audience, who participated in the chat and joined us and tweeted about this and let their friends know, really appreciate your support.
And Claire, do we have another event coming up?
We do, we do. So next Wednesday, April 12th, same time of day, 10 o'clock a.m. Pacific time, which is a nice time slot for those of you in Europe, because hopefully it's like right before or right after dinner. Doesn't work for my friends in New Zealand, unfortunately. But we have a number of guests: Melanie Plageman will be here, Samay Sharma, Abdullah Ustuner,
whose last name I probably mispronounced, and Barak Y will be here also. And the topic is, again, how to get Postgres ready for the next 100 million users. And so we've got some people who work on Postgres open source amongst that group, and others who work on Citus open source, and we thought that would make an interesting discussion too. I think somebody could probably drop the calendar invite in the chat for that episode too, in case any of you want to put it on your calendars. I think it's aka.ms / Path to Citus Con hyphen ep02 hyphen cal or something like that. Anyway, Simon, Marco, thank you, thank you so much. This has been really fun.
Yeah, it's been great, and thanks everyone in the audience for tuning in.
Yeah, hope we get more of these for next year's Citus Con. It was very cool.
And Pino, I love collaborating with you, this is cool.
Same here, I had a lot of fun. I'm looking forward to doing it again.
All right, see you next Wednesday, everybody, and Marco and Pino, we'll see you at Citus Con on the 18th and 19th.
Can't wait.
Yeah, see you outside.
That's a wrap. Bye everyone.
Cool, thanks a lot. Bye everyone!