[Music] Nationwide sequel or death yo son what are you going to choose yo check it this seminar is filmed at carneg melon University thanks a lot Google this is Jeff he's been in Google for 20 years he's touched every single database you know about a Google he's phenomenal he's super smart he waterl it's the best school in Canada for computer science talk about pipe seel Jeff Flo is yours go for it thank you yeah thanks for the do over on the intro um happy to be here um happy to share some of what we've been doing at Google to try to make SQL better for everyone um so I'm gonna start with this idea that SQL has some problems and I don't think that this is a a new idea or uh particularly controversial um I like this Stonebreaker quote from a couple years ago he's talking about SQL and talking about it as a very old language a lot of annoying things that nobody's ever gone back and cleaned up we've kind of assumed that we can't um here's another one from this talk Don Chamberlain one of the original inventors of SQL did a couple years ago a very interesting talk on the whole history and evolution of SQL um Cherry Picked a few things here that found interesting he sort of contrasts sqls English style syntax with what you might expect more as a programmer like he he describes it as a functional language where you've got like operators as functions that are orthog orthogonal that don't have side effects and like interestingly sequels not like that one of the things he says like had they known all those decades ago how things might have evolved and that we'd be using them for this long maybe he would have made some different choices so the problem from my perspective is just SQL is way too hard to use this affects everyone like for beginners uh SQL can be hard to learn and even for expert users who know it really well it's still much more awkward and difficult to use both when you're writing it and when you're reading it I think fundamentally this stems from the syntax where like you write a query in SQL you've got this list of Clauses select from whereare Etc you can only write them in that order and it's very rigid and very arbitrary and doing anything else or anything non-trivial requires you to use subqueries or some other workaround this structure creates this strange Inside Out data flow where the query starts in the Middle with the from claws or with nested subqueries and flows out from there logic both AB above and below the starting point there also a lot of rocity and repetition where you end up listing the same columns over and over again and select group by Etc just the general complexity in the language like the interaction of Select and group by which are written far apart in the query so after 50 years I think it's time that we fix this in SQL and I don't think the fix needs to be replacing SQL we can actually fix it so there are a lot of things that are really good in SQL that we don't want to lose um the declarative semantics uh the very fundamental level works really well um the the oper relational operators are nice like it's sort of the right set of operators and the right set of operations and the table level composability things with like views and subqueries that works really well maybe even more importantly is the ecosystem around SQL there's so many databases query engines and other tools that all speak SQL uh it's a very familiar language with a huge user base who knows it well there's a ton of existing SQL code and we don't want to give up any of that um doing migrations is always really painful both to learn and then to rewrite in new tools so meanwhile we've seen this pattern in many other languages and in apis um that have designed more recently that they use this piped data flow syntax and and structure it works very much like Unix pipes where you have a collection of operators you chain them together with some kind of pipe connector output from one flows to the input of the next um we see this in many query languages that were designed more recently you see it in apis like data frames or Flume um and users working in these systems generally find these fairly easy to understand and easy to use so our solution is that we can do the same thing in SQL um we take all of the operators that you can do in SQL we make a pipe operator equivalent of them like using the same syntax as much as possible is allow chaining them together in any order arbitrarily any number of times so you get query logic that flows from the top of the query to the bottom um it's very simple to understand what's going on it's important to note here it's still declarative so we still expect optimizers to go in and reorder it to an optimal execution path there's this detail like we use this two character pipe symbol this slightly unfortunate it's a bit of a compromise because the single pipe character which we would have liked to use but it's used for bit wisor in our dialect and many others um it's not so bad once you get used to it and we've seen the same exact same symbol actually showing up in many other languages for similar purposes sorry maybe you'll get into this going back to the last slide like you have the the two wear Clauses can you still put a an Clause to combine them in yeah yeah you you can do that this is just for illustration that you can do wear multiple times okay awesome thanks so here's an example from uh one of the queries from The tpch Benchmark in this case it's one that's doing a aggregation in two steps if you look at that query on the left kind of shows this weird Inside Out data flow pattern you get in standard SQL start in the very middle of that with the inner from Clause you got logic above it logic below it and to trace through what's going on you have to like start in the middle Walk Up Walk down match up a lot of things if you look at the pipe syntax on the right it basically just expresses exactly what you want to do in the order you want to do it like start scanning a table do a join aggregate it aggregate it again sort and you're done it ends up being a very straightforward and easy to understand so here's a list of a most of the operators we've got you can start a query with anything that any normal from Clause including doing joins in the from Clause if you want to um many of the standard SQL Clauses just get a pipe version of that Clause with exactly the same syntax it includes doing the select list as an operator you also have some short hands for uh adding updating changing columns in the select list without having to list out every other column you're keeping we've made aggregation a separate operator separating it from projection and making it a distinct thing in the flow which we've found to be convenient for several reasons including readability um a bunch of other operators so you might have seen this picture this comes from the paper from cider last year and from the speakers from a couple of weeks ago uh it's just a very clear illustration of the strangeness of standard C equal where the order you write the operators in the syntax is very disconnected from the semantic order of what's happening when you run that query and if we compare that we write this in pipe syntax basically we go in and uncross all those lines it's very clean and simple because the syntax exactly matches the semantics of what you're doing there there's this sort of match between the relational algebra the set of operators you're applying and the syntax um you go back and forth and translate in either direction yes it's still important to note like both of these syntaxes are declarative so they it's not specifying execution order it's just semantics and you'd expect uh to get the same performance and the same results writing queries either way so interoperability is an important Point um and one of the nice things about doing this inside SQL um we allow adding pipe syntax anywhere that a normal query would work you can mix and match inside the same query uh across views um commentable expressions in a WID Clause uh any query including a queri and standard syntax you can add pipe operators on the end to do additional computation and you can do this uh using all the same tools so that example on the right is just showing uh mixing some operators in both syntaxes so just briefly about our implementation um we've implemented in in Google SQL which is our shared component for SQL parsing SQL analysis it's used in uh all of our SQL related tools inside Google and in our CL Cloud products big query Banner F1 and others um and doing it in that uh query front end analyzer place like we we are able to generate the same intermediate representation for a query written in pipe syntax as we would in standard syntax so for a query engine uh they receive the same thing feed it through their Optimizer and execution then uh are able to basically get support for pipe syntax for free just by abling a flag without having to implement any new execution support so that that's great and that's enabled us to support this in several tools so we had this paper last year in BLB it has a lot more details on the language and some of the choices we made and some of the analysis I'll talk a bit more here about like what we've been seeing from actual usage um so here here's a graph showing usage in F1 um users doing pipe queries like F1 is one of the main query engines users inside Google are using to query data and um important thing is to see the shape of the graph it's just uh growing fairly quickly and accelerating what we're what we're seeing is that users see the syntax uh once they see it they they able to learn it pretty quickly and uh want to use it and it tends to be uh pretty sticky and to spread virally um like showing that the the users are uh quite happy with it you have do you have numbers that could maybe say like would you know whether something like the the pipe syntax is being generated by like a tool or I guess in the very beginning there are no tools everything's handwritten yeah so this this would mostly be uh queries people have written um there are lot various tools that generate queries uh I'm not aware of any that are yet generating a lot of queries in pipe syntax I think that is one of the advantages that as a generating queries in code that Genera them this form is a lot easier or should be easier so yeah I think this represents actual usage by people for the most part so we get the question of who who's this for who uses it I my answer is really I think it's for everyone who's writing SQL or at least everybody doing non-trivial queries in SQL um for the experts or people who know SQL well already um it's really easy to learn this it just takes a few minutes to show a few of the details there of how it works and a few examples and um it's a big benefit that it's the same operator you can do with mostly the same syntax and it's just a better and more flexible structure for applying them then these users uh find themselves immediately more productive writing and editing SQL uh for more beginner users and also that the set of users who have been exposed to SQL and don't like it um this fixes many of the more difficult and more annoying parts that cause a lot of users to not want to use SQL um so here's a few samples of the kind of feedback we're seeing from users so I don't go through all of it this experience yeah it's overwhelmingly positive and like I see I've seen these comments like this like multiple times that this is maybe the most useful change they've ever seen in SQL and that level of excitement is kind of amazing um you might yeah you might look at that and think that it seems a bit overblown like is it really that big a difference um I think there's I I can show you examples side by side and like yeah this looks like a nicer way to write that query it doesn't really capture the full benefit here it's more than just a nicer syntax to look at and like being able to work in this syntax actually changes the way you think think and the way you can use SQL it's very like freeing that um whatever you want to do in a query pretty much can just do it and not feel like you're fighting against the language so like Beyond hearing that it's faster defitely easier even like see users saying that it's been more fun working in SQL this way when you go back to regular cql after using this you can really feel the awkwardness in standard SQL where there's so many things that feel like workarounds are more difficult than necessary think about like how exactly it helps to have queries written in this structure um when you're building editing queries um it's very nice that you can build them incrementally and you generally like start with a from clause and just add more operators on the end as you're building queries at any point the query is executable and you can run it and see what you've got so far just add another operator on the end and generally you're adding operators they're independent of most of what's going on in the query and you don't have to do these kind of global edits about keeping selects and group buys and subqueries all in sync with each other things like autocomplete and suggestion generally work better because the context comes from above where you're writing which you don't really have that property in standard SQL I think it's a good area for future exploration but lot the AI C- Pilots for SQL I think you could really take advantage of this and do something really smart and and like really helpful uh when you're reading and debugging queries with trying to understand performance there's also a lot of benefits like these queries have this really nice prefix property where the query up to a pipe operator is also a valid query so you can grab that prefix of a query and run it and see the intermediate results or see the results before and after applying some operator like aggregation um at any point in one of those immediate queries you can stick in another operator like an aggregation to see the count or the breakdown of values in a query which tends to be quite quite helpful when trying to debug I haven't done it yet but I think this s There's an opportunity here to build an amazing IDE for working with SQL where you could think of doing things like a a debugger that single steps through a query you can't even really think about doing that in standard SQL because just the syntax doesn't make sense to step through we also seen a lot of advantages of doing this in a query engine that in in in the SQL language and in a query engine that people are already using because it's not like some new product some new language people have to learn or make a big decision to start using and a feature that's just there and it's really easy to try it with no commitment and no setup it also lets like one user just go try it first maybe just try it through their ad hoc queries and then like as they see value in it they can spread it across the rest of their team um it sort of enables this viral spread where you can get somebody to try it and like it and it goes from there um versus like other approaches that might require bigger decision or some kind of migration or there's like you give up a lot if your existing queries don't work in a new syntax or it's kind of All or Nothing versus here it's much more incremental you continue doing what you were doing before and use the new syntax only where it helps well and keep full interoperability with everything you had before there's also this you avoid some of the downsides that would come with extra systems and proxies or translation layers so add some challenges with debugging where you can't tell what's really running when your quer is in a translated other language um you don't have to worry about any issues of cost or latency when going through an extra proxy layer so now I'm gonna focus a bit on extensibility which I think is a very interesting area um I break this uh talk here about so three categories of extensibility when you're doing it in the quer engines are doing it when users are doing it or doing it in language design so let start with talking about table valued functions and if you think about a table valued function it's basically a generic relational operator which takes one or more tables as input and produces hey Jeff do you mind if I ask you a question real quick before we get into this I I was just curious have you seen more use in sort of like olap style readon queries like you know in big query for example or you're also seeing like you know same viral growth in like you know oltp style sort of like read write Ops and like spanners yeah is definitely much more interesting in the readon cases that when it comes to doing updates uh there's not nearly as much value in the syntax it's more about the read side of it we don't really have a update or delete syntax to do this we could do at insert where you can produce rows this way and like pipe them into an insert operator but uh okay not really using it for transaction processing and it's also like us a kind of like omm style read modify write transactions there's not as much value there it's more sure when you're exploring or analyzing thanks yeah so I was saying like tvfs are basically generic relational operators and my you can add tvfs in a query engine you can call them in standard syntax the problem is that the syntax is really bad really painful to use because sort of forces you into this nested subquery pattern of beating the input into the tvf as an argument it's especially bad if you're trying to chain multiple tvfs together um we've made a pipe operator form of calling a tvf where is BAS just it's just a top level pipe operator um to do this transformation which is like the this argument in an object oriented call so the input table gets past as the first argument to the tvf and then that allows calling them in a natural way very much like a built-in pipe operator so here's an example uh big query has a bunch of functions for machine learning operations uh several of them are implemented as tvfs um the example on the left that comes from their documentation showing an example of calling uh two ml model lookups like calling an embedding model and then a classifier model sequentially uh really shows this Inside Out pattern or like bottom to top thing that you get uh particularly with tvfs and standard SQL you look at the pipe syntax on the right it's obviously much more straightforward where You' actually just call these operators uh call those tvfs basically with um close to First Class syntax except we didn't have to add any language or support in the grammar we allowed the engine to plug in more more things just as functions so I'll talk about uh extensibility by users uh with SQL um so here's an example there's been discussion going on about adding streaming operators in SQL paper from a couple years ago there has been discussion on this it's kind of stalled a bit in terms of figuring out I haven't reached agreement on how to do uh streaming how to do these operations in SQL um so these three fundamental operators here are sliding Windows hopping windows and sessionization U this paper was talking about how to do them as tvfs which is a really nice approach semantically because it makes them work like a relational operator you can plug in um but the call syntax using them as tvfs is quite awkward um the pipe call syntax would be nice for that uh other ways of doing it have been pretty messy but this hasn't been added in SQL yet so what if you want to do it in SQL um here's an example uh trying Computing sliding windows in SQL uh so you see a query Computing uh active users over 7day sliding windows so it takes records with dates and spreads them across uh seven consecutive days um this block of logic in the middle here is the part that uh smears like duplicates rows across seven days uh with seven consecutive dates and then you can Aggregate and get that uh sort of sliding window behavior um I won't walk through the query I think if you read it it's fairly easy in this form in pipe syntax to kind of figure out what it does um the point here though is like if I end up doing this a bunch of times in a bunch of queries there this block in the middle or something similar might be repeated um you could you could copy paste that in every query but it's actually uh much better to make it a reusable function so I'll just grab that block of sequence of operators out of a query and I'll make a table valued function out of it like this it's taking basically the same set of operators just en encapsulating it into a function and then when I go to call it it's like a on line call to do the logic that was inlined in the query previously effectively it's a user built extension to do sliding windows that then becomes very easy to plug into many queries in a way that looks basically like a first class operator so like this kind of encapsulation makes it possible to build libraries of extensions and write them in SQL but make reusable operators now technically you could do this in standard syntax with tvf too but the syntax is so bad to call them that uh users don't really do it also like you don't really in standard syntax you wouldn't really even have this property of like okay here's a thing in the middle of the query that I could just snip out and run as a self contain thing but is in this example you have an implicit schema inside the extended extend dates function yeah so the function I I I wrote it this way like assuming that there's a column called Date which is why I renamed a column to be called date when I called it um there a few like sort of unrelated features but like some sort of reflection features where you could pass in a column name and resolve it would be really nice and a little bit more powerful templating functionality would make this like even more powerful can I ask join on Nest uh that's our join to an array so like this is generating an array of zero through six and then uh the unest join is basically a cross product with that array that makes seven rows so my question is can you do that in a nested way like if you're in the from world with joins and SQL land can you also where you have like maybe you have an array of objects or records with arrays inside of that and you want to join across the hierarchy can you do that in the pipeline world or do you have to go back to equ world uh I think I might miss like you could write this joint in the from Clause too so that you you could do this query in in standard syntax it'd be a little bit different but it's it's not doing anything new that's not possible actually I'm asking about the other way I could let this go but uh if you could scope inside of the joint un Nest to deal with nested hierarchies inside of the pipe context rather than the SQL from world oh like rather than flattening it out yeah just if you had like that complex example array object array kind of thing yeah you definitely could build the array and yeah there's lots of other things you could do like I'm not even saying that this is the optimal way to compute that operation is you could figure out a way to with window functions it's a way that is actually it was one I used when making the dashboards for the graph I showed earlier so like it's easy enough way to understand and the example is more like once I do something I can snip it out as a tvf okay cool thanks so now get into the language design aspect of extensibility if we talk about extending the SQL language it's actually really difficult to add new operators to standard SQL which is maybe why it doesn't happen very quickly um there's a lot of syntax challenges firstly like where do you even put something into that select whereare from structure um like it's often difficult to even figure out where you would put something to make sense um and once you figure out what you're going to do to fit something into a query uh you have to figure out how to get it to work in the parser and parsing standard sequals very dependent on reserved keywords um which makes everything really difficult because adding new reserved keywords is always a breaking change on some query so there's often a lot of compromises of trying to figure out some way to reuse something that you already have as a reserved keyword to avoid having to add new ones there's also challenges in the semantics about how a new operator would interact with everything else that might be happening in the flow of a query like whether there's aggregation or window functions correlated subqueries everything adds complexity and it sort of creates this quadratic complexity of all of the interactions of other features of the language um and then like because of the syntax compromises you have end up something with poor usability and often the new extensions require using even more subqueries to get something to work now if we contrast adding something in pipe syntax The Operators are independent and orthogonal to each other uh parsing is generally really easy the so one of the advantages of having the pipe symbol is that it makes a natural separator between operations so that they can all parse independently and generally they don't need Reserve keywords and at least they don't interact with each other or cause conflicts the semantics are also simple and local to each individual operator there's no Global interactions or state across operators other than an input table and an output table so it's much more like linear complexity when you're adding features you don't you just add the feature you don't have to think about interactions and for users features added this way are are great because you can just use the new operator anywhere and it's fully flexible uh so here's an example of a feature that has been added to standard SQL long time ago that the idea of recursive queries like when you've got a tree structured tree structured data and want to do a tree traversal or graph traversal um it's really nice to be able to do recursive queries um the way they work in standard SQL is fairly bizarre um it's quite difficult to explain or understand how that query works like it uses this with recursive syntax it's not really recursion in a normal sense it's written as a union but it's not really a union um a lot of complicated requirements on exact query shape that's supported and execution rules where it's like grabbing one half of the Union running it and grabbing the other half of the union and running it multiple times um it also requires to actually use it you have to structure your query around this uh with recursive clause which means once you want to put something recursive into a query you have to often rewrite it so here's our take on how to do recursive queries in a pipe operator uh this is new we haven't put it out to users yet but we've got an implementation so um starts just with a base query producing your input data and then recursive Union as an operator takes the input query and runs the subquery repeatedly adding more more rows into the Union until it doesn't get any more rows just Returns the output of that Union and then the rest of the query continues so you get both a nicer syntax that clearly separates the base query from the re cursive query it's really easy to explain the behavior with simple pseudo code you get something that's composable um you can use it anywhere in the query like in a sequence of end joins if you want your third join to be a recursive traversal down a tree you can just mix that into the middle of a query without having to restructure the whole query around uh with recursive Clause so I think it's a example of with more flexibility in what we can do in the language we can provide uh much nicer Syntax for users so why does language extensibility like this matter um as lots of people probably seen this paper uh it shows this describes this recurring pattern of uh new things being created often new languages new uh data processing systems to solve problems in a new domain and the argument is like eventually we always find that we want to add those things back into SQL and back into their relational model um and we find in the end we didn't need other systems but it takes a long time we end up creating those other things first I think there's some the way that like this pattern shows up when there's these new ideas are always getting developed in new systems and in new languages if if they're proven useful over time then much much later they get folded back into SQL it's a very indirect path I think a big part of the reason this happens is that standard SQL is such a bad platform for Innovation where it's just very hard to experiment or add things into SQL umly for these quadratic complexity issues I described earlier so pipe syntax I think can change this because it becomes easy to add a new operator and you can do it independently of everything else without breaking anything about existing queries I think this is a potential way to unlock a lot of innovation like we can look at what are the good ideas that have come up in other places in other systems um and we can add those operators into SQL and we can do it fairly easily and then the more forward-looking piece like Innovations for the future um what's the next big thing we should be able to do like I don't think we need to invent language for it and people to use a new language we can just add operators and features in SQL so I've got a case study here or an example of I think this happening and showing it it working well um problem area I'm talking about here I'm calling it operational analytics sort of the um production monitoring production alerting where like Production Services are uh generating a lot of metrics system collecting those metrics time series there a lot of streaming systems that are uh capturing those metrics from Production Services srees and Engineering manag Services do alerting and and Etc um and generally SQL isn't used in this space um I just assess the the current state-of-the-art in this space there are a bunch of languages and tools out there but um compared to SQL they're fairly primitive when it comes to query and understanding data um this is a problem we've got at Google too like obviously we've got a lot of cloud services Production Services a lot of uh monitoring and reliability issues and we've got a lot of tools that have been built up over the whole history of Google and generally like our sres or Engineers aren't very happy with the state of those tools and I've been studying our options in the space for what we should do the first option is like try again invent another language for this uh I strongly argue against that for uh lots of reasons inventing new languages isn't a great idea um there's not really a lot of reason to think that our third try at this will be better than previous attempts so the next branches is there anything out there in industry or open source that we like and would want to use uh we haven't found anything that seems compelling so what about SQL um there actually quite interesting because this has been a recurring discussion I've had with teams in the space every couple of years over the last 10 years it's always seemed like yeah we could do this with SQL but like would it really work and would anybody really want to use it uh with pipe syntax now you see something is flipped and it's actually like now it makes sense now now it seems like something people would actually want to use it seems appealing um so what we need to do is just add a few more time series operators for the kinds of things people do with a style data aligning interpolating time series um and then we can make this whole space like fold it together into the same ecosystem as SQL analysis and we get a lot of benefits from that so that's a active area fairly early but active area of work that we're uh trying to do next so I think there's a whole lot of other areas where the same thing applies I'm not going to go through the list here there some of some areas that I'm aware and and thought some about and like things I'd like to add to SQL um sure everybody else probably has their own list of things they'd like to be able to do in SQL or some other system um I think there's a lot of opportunity for the future so um I'll start wrapping this up here a little bit um this is something I saw recently in some headlines um it's talking about C++ and asking the same kind of question that C++ is another language that's been around for almost 50 years and asking if it's going to stay relevant in the future um this case like is it going to be displaced by languages like rust um I think C++ is a more a challenge and a more difficult starting point because the problems it needs to solve are about like Safety and Security issues which are much more require much more fundamental changes language if we ask the same question about SQL what does SQL need for the future I think our our starting point is is actually great and like first 50 years have been a great success and there's not a lot in SQL we actually uh need to remove or change it's not a lot that's broken um most users aren't going anywhere else so far like maybe the most competitive thing would be people using python but it's not really quite the same thing I'd say SQL is SQL syntax is holding us back it's a it's a burden for users and it's an obstruction to Future growth and Innovation so my take on sequel for the next uh this Century next 50 Years first we need to fix the syntax and pipe syntax is a really helpful way to do that we don't need to replace SQL we don't need to give up all the good things or give up the ecosystem um if we just fix the syntax like this for users everything is still SQL it's just a better version of SQL with the syntax change alone like it's already a great win for users they love using it and it's a um great change for the future evolution of the language I I'm quite optimistic here about the the set of things that we could do like there's a lot of promise for the future of SQL if we take some of these steps uh so that's sort of the end here um we've got yeah I think the the best way to really get a feel for this like you can read our paper it's got a lot more details the best way to get a feel for this is actually to try it um just as of February it's open to everybody to use in big query uh it's been implemented in uh data bricks and Spark uh first release of this has just come out recently um so there's like places you can try it on real data with real workloads um we've got uh some of our code available as open source people can play with that uh really I think the thing that's interesting for the community is like how do we go forward or do we support uh SQL pipe syntax in more systems or uh eventually as a standard thing I would hope um so uh that's what I got uh thanks for listening and do some questions I so I will clap on behalf of everyone Jeff that was fantastic uh we have time for questions so if you have any questions for Jeff just unmute yourself and fire away hey Jeff this is jish good to see you after a long time uh this is fantastic the PIP is awesome one of the things that you often see with u SQL in in practices some of the SQL queries are 500 thousands of lines long and besides the syntactical ease of use which you get from what you described the other complimentary component is this cognitive overload right there might be 50 CTS that are defined before you get to the main query you're trying to figure out like which uh which CT is this SP referring to do you see problems like that which I said are complimentary to the pipe syntax and are you thinking about how you might be able to solve that cognitive overload component yeah I there's a bunch of things connected there like obviously at some point when you get to thousands of lines of logic it gets overwhelming um people can like break something up with CTE that helps a little bit Yeah there one of the like some of the things that you can remove subqueries by using CTE but I still consider a CTE to be a subquery you still have to like Chase the names and follow the links and you get like out of line logic so like and being able to write the logic linearly is helpful it's still at some point it's like still useful to break things up into multiple statements you can still use cdes give things names um I think that example I showed with tvfs of like pulling things out as encapsulated blocks of logic can be really helpful I think that's something we can do more of so another parallel direction we're working on is more like modularization of SQL so you have more like reusable building blocks like more like code sharing more code reuse I call this like making SQL more more like a real programming language where you can actually have libraries and share them so a a variety of techniques and it starts with fixing the syntax but then you do more about like managing the complexity of some of the logic that people do that's great one one follow-up question do you think text to se gets easier if you had of course enough training data but if that was in pipe SQL as opposed to the ugly SQL syntax that we have right now yeah I think like generating SQL from any representation in this form I think is easier like if you're doing it in code you generally have some data structure expression of what you want to do like basically that's a sequence of operators and here you can basically translate your operators one to one into syntax which is much easier than having to figure out like how to wrap things up into subqueries or CTE how to like what can combine into one query versus being a subquery like a lot of that goes away and you can directly translate into operators you don't have to have like cross referencing nearly as much um so and I think this applies both to humans generating queries and also if it at some point is AI generating queries the same benefits apply uh I think a particular benefit beond that with AI generating queries like depending how much you trust what would come out of a AI model that I think human validation of the query is still really important like if you've looked at generated code that come out of other systems it's usually a nightmare um the fact that you can generate something that is much more concise and much more readable in this form like if you're generating something that you expect a human to validate it's really nice to have it in a nice readable syntax great thank you other questions from the audience Yeah you mentioned standardization how do you see that unfolding is this ansy or an industry standard what the z a SQL repos say it g work uh yeah I think like ultimately like doing this in the SQL standard would be great and that' be the best way to get this into uh as many engines as possible and like if we start like yeah soon might be the time to start that discussion is if we're going to have this picked up in more engines the more consistently we could do it uh would be great um I mean the standards committee is gonna take at least five years right yeah that's prob like we could have tried to start that way I think like I don't think it would have worked to start that way because just like writing this idea on paper without having tried it or demonstrated I think it's like a bit sounds like too much and sounds like we're crazy and also like by actually doing it we learned a lot and like what we have now is informed a lot by like trying to build it trying to use it and like figuring out how it worked and also like it's s of speculative that yeah like it seems like it would work but actually trying it and finding out how compelling it is once it exists it's like it's been useful but uh figuring out the next step of like getting interest across the industry and with standard boards and uh figuring out that process that that is somewhere in the future I mean if if you can get it into like the SEL pars libraries in like rust python I don't know what the equivalent of C++ is right but there's a I mean basically every new DAV system bitten in Ross is using SQL parts so if you can get it in that that's like get you at least in the newer generation like oracle's not going to put this in right yeah I've seen this in some other like sqlite or postgres or some others have said yeah this looks cool but like are we willing to be that aggressive on strange Innovative new features like if it's in the standard or like used more more widely then it gets like more permission for more places to do it um and I we like R did it in our in our implementation like directly in our tools which is like I think the ideal way to get it as a user like the the proxy approach of something that you can put in front that would like receive a query in this way translate it into something that you could run on multiple engines that still may make sense as a approach to get it working and usable in like much more broadly across other existing systems but do you even want to try to get into a standard this is a layer above SQL right we don't have standards for Assembly Language so does SQL become like an assembly language and this just sits above it or do you think that's not a good way to think about it um yeah if you think of it as the like this rewrites into SQL then you take that view of it's a it's another language in front like I think of it much more as a feature of Sequel and much better as integrated into the language like both because it should share syntax with like something you can do in standard syntax you can do it in this syntax in exactly the same syntax um but I think that the interoperability or like if you make a view in one syntax you can use it in the other or you can incrementally add things into queries that this way it's like really useful to support that incremental adoption and that interoperability which is it works best if it's a feature of the language rather than another language in front all right Nick Nick from the Nick from the chat asks uh can you talk more about what's coming next for pipe SQL Maybe by showing the slide again what do you see are the priorities domain specific time series streaming code fragment reuse semantic model pre-join measures fan outware um I mean or is the answer all the above yeah it's kind of all all of the above like some of these have more complexity than other things and some of them are like somewhat parallel tracks that uh happen at different speeds like the time series and streaming that's very connected to the sort of monitoring use case which we definitely like we're kind of early in like fig out the operators we need but that's one we're definitely working on um graph query is more to me like hypothetical like we've been working on gql and this uh SQL pgq the embedded graph language inside SQL um kind of some thoughts of like if you could just do the graph matching operator so if it's a kind of shorthand for recursive queries inside SQL like that would be neat that's more exploratory um semantic data modeling and olop query areas uh that's the area of a lot of active work and we're trying to figure out like what a SQL integrated solution where you can just do SQL queries over things that are semantic data models like like more like a research project figuring this out but I think that will be really neat if we get it working um working with structured data uh I there's a talk a couple weeks Malloy I think one of the things that's really nice that Malloy does is how it works with structured data and returns structured output um we've had in Google SQL working with structured data and protocol buffers forever um but every all the SQL operators basically flat take a structured thing and flatten it out as a relational table um SQL Plus+ is that works primarily on Json it's kind of the same approach is that very very similar to how protocol buffers or Json work in Google SQL um yeah most of these SQL approaches basically turn things into flat tables as you query them and I would like to be able to do more queries that like build structure and preserve the structure as you query um so that might be one we do more work on soon uh the SQL has a real programming language modularity code reviews testability like that's say that's another track and we're doing a lot of work at Google in that area and uh probably we should at some point publish some of what we've got we've got several pieces there that are uh quite interesting I think um maybe in another six months or a year or something we'll have enough of a complete picture that it's worth doing like a overall paper on what we've got in that area but yeah it was my S question to the audience what are the next 10 things you'd like to fold into SQL there there's a lot to do all right other questions from the audience yeah I have a follow on to the standardization stuff so try let me lay out this Vision so no offense to people who like old SQL but if uh your vision of this is like the new way for the next 50 years pans out and you really nail like the clean semantics of the pipe syntax is there a way to prevent the backward compatibility of of some of the cfty sequel stuff seeping into the PIP land or could you keep those worlds like very deliberately maybe you have some backward compatibility but there's like the canonical way you should write your pipe queries one example is Json arrays in many sqls are zerob based and SQL arrays are one based and so the same index operator Works two ways and so is there a way to like head toward the the right outcome by writing your vision of this pipe syntax Tex is the way to go yeah I think there's a bunch of things there like a few s of more exotic operators that we've s of experimented with a bit and some of them probably come out as operators that only work in pipe syntax just because they don't fit anywhere into a standard SQL query without like a lot of difficulty and compromise so make get to the point of that there's certain things that um sort of require pipe syntax to do some of the newer features um we get to a point of like we really want this to be the future and want to S consider old syntax deprecated in some way like that's kind of just an option that you could put in warning or just like at parsing time or whenever make it an error to do something in the old syntax um actually like there's a particular tool inside Google that was going to start using SQL and they actually asked for the feature of could we put an option in so that only pipe syntax is allowed we hadn't thought of doing that so far but like that's the kind of thing we could do um I guess another area of work I didn't mention is like we want to do a we're working on a translation tool to take queries in existing syntax and convert them to pipe syntax um not just as a like yeah we've got something that can convert for consumption of query engines but we want something to convert in a way that produces idiom idiomatic output that is like similar to what a human would have done as translation and like produce something that's actually nice to look at preserves comments and preserves all the aliases and so if you were if you wanted to migrate your code to the new syntax I think we can make tools that will do that translation for you U something we're working on it's basically like a series of refactoring steps to like fairly algorithmic we just have to implement as basically like take a subg and pull it out and put it in front of a pipe and like take the basically just explode the standard query into pipe operators and then start simplifying it with a set of rules cool thanks quick question uh Jeff does big query Zeta SQL or does it have its own paral implementation uh it's the same implementation like we have Google SQL as a component underneath all of our systems and Z sqls the same code as that thank you all right any last question um my last question would be like what was the sort of the the the lineage the progression of BU building this out meaning did you just sit down at once say hey let's try this and and you generated the pipe SQL sort of the first go or was there sort of like an earlier prototype you tried some things out that didn't quite work or make sense or didn't map to exactly maybe the semantics you wanted to achieve uh guess what was anything was a predecessor to pipe sequel um yeah it's interesting because it was kind of like the idea originally came was I think like five years ago there some docs sitting around that I wrote that long ago it's kind of like a brainstorming exercise at that point and we s talking about some of these other languages like kql and Splunk that like people use that have these sort of pipe Behavior or P pipe pipe structure and the users really like them in certain domains it's just kind of this thought exercise of like what would it look like in SQL to do that I like at that point I like yeah this seems kind of cool it's kind of radical like do we want to do that much would it really uh would we really want to launch it if we built this we kind of just sat on it for several years and it's more like year or two ago kind of coming back to it again part of it was talking about this monitoring use case that I mentioned and so trying to figure out the future of monitoring and that was one of the motivations to like really do pipe syntax because we could see like well there there are people who uh do those kind of things in custo or Splunk and and and really like it and like would like to do the same there's a lot of things in that space if like once you get past the first few time series operators basically you do aggregation queries to make your dashboards like being able to query that data like do the do the time series stuff and then do regular SQL after join to all the other data you've got would be really interesting so that was one of the motivations to like okay yeah look let's actually try this and then it was like we started implementing it adding more operators it just more further we got into to it the more compelling it seemed and like sort of I think I say exceeded our expectations and realized like more benefits than we even s of thought of ahead of time also like realized some of the things to like tweak or figure out of what operators we needed or we want to have inquiry so that evolved a bit more as we were trying it out trying to use it and seeing others try to use it [Music] he Back To Top