Friday, May 20, 2022

Christopher Forno | Data engineering vs. software engineering

Thanks, guys. Let's get started. Today we're super excited to welcome Chris Forno. Chris is the co-founder and CTO at Tentmon, a company that builds data engineering products. Chris has a wide variety of experience in software engineering. For the past 20 years he's developed and led teams in distributed systems and computer vision, and he's worked in DevOps, cybersecurity and front-end development as well. He's worked across the U.S., he's worked at private equity funds, not just tech companies, and he's also worked at Lazada.

Impressively, Chris has written several articles and thought leadership pieces in the area of software engineering and data engineering. His GitHub projects have a lot of stars, and he's made YouTube tutorials that have been viewed over half a million times. So, quite a distinguished software engineer. Chris is going to be sharing a little bit about the field of data engineering, how data engineering is done at Tentmon and at various companies in the industry, and some of his experiences as well. So without further ado, I'll let Chris take it away for a short presentation, and then we'll open up the Q&A. I will also post the Q&A link for everyone in the chat, so when you have questions you can ask them there.

Thanks, Kai. I hope you can hear me.

Yes.

All right, great. Hi everyone. Just to give a little bit of background: I came here to Singapore almost 10 years ago, back when Zalora and Lazada forked off and Zalora was building a data science department, so that's what I came for. Prior to that I was working in California at a company that some of you may know, called DeviantArt. At the time it was one of the top 100 websites in the world, and we had a very small team running the whole thing: anything from writing software tools to running into the server room when the water main above the servers broke and shorted a bunch of the servers, and so on. That was a really interesting, formative time, and it turns out you often learn a lot more on the job, in practical terms, than you do in courses and textbooks. I have a few slides to go through, sort of on the data side of things. Let me see if I can share those first before I talk about them.

Oh, Chris, I think your audio might have disconnected.

Oh yeah. It disconnects whenever I share a tab. Okay, so hopefully you can see my slides here.

Yes.

Okay, great. So I'm a software engineer first, and I work on data second. That's a lot of what exists, because sometimes when you're working with data you really need to understand the fundamental concepts, the algorithms, how to manipulate the data and so on. So I have some slides and I'll talk about that, but I'm also happy to talk about anything else in data engineering or working with data in general in the field. In the interest of time, why don't I just start here.

I call this a software developer's secret weapon, because not a lot of people learn about this until much later in their careers, and then it's like a light bulb goes off. Some never do. The relational model is the foundation for a lot of what happens in professional software engineering when you're working with data. Almost every database out there is based on its concepts, but a lot of people miss the foundation. So I just want to talk about this, since it's what I almost always talk about when I'm talking to new developers or someone new to their career.

So who's this for? When you're learning about algorithms, they're always applicable in any software building that you do, and the relational model is probably only 99% applicable. There are certain things you can build as an engineer that don't involve needing to structure data. If you just want to make art, interactive animations, or certain types of games, then you don't need to worry about data. But this really applies to a lot more than just a pure data engineering role or a role that has data in the title. Software engineers across the spectrum will end up working with this at some point, whether they know it or not.

This was invented, or discovered, or published, back in 1969 or 1970 by this guy, Edgar F. Codd, while he was working at IBM. IBM worked on a lot of very large data systems, and he was seeing the same patterns emerge. He was a mathematician who phrased this model formally. By the way, if you do enjoy math (I know not everybody does), his paper is quite clear even 50 years on. The paper is "A Relational Model of Data for Large Shared Data Banks". It sounds a little antiquated, but it's completely applicable nowadays.

He also wrote a book that's a little bit more approachable and goes into more examples, called The Relational Model for Database Management. I also recommend that. Even though I think it was written in 1980, it's still applicable today, and it's still what I would recommend as a first book to anybody getting into data anywhere related to the software engineering field.

So briefly (I won't go into this too much or make this a class), what is a relation? You may have seen this; it looks a lot like a spreadsheet or a table, but it has a very specific mathematical definition. You can go to Wikipedia to learn more. I won't go into all the mathematical details; you can take a look, or I can answer questions if you're interested. But here are two big misconceptions about this, because you will run into it at some point in the field, especially if you're working on business applications. You'll have relational databases. Now they're usually called SQL databases, pronounced "S-Q-L" or "sequel", but they're generally relational databases.

On the left is a relation and on the right is a relationship. This is a mistake that even some of the biggest, most popular books and people in the data science community make. It's a very easy mistake to make: you see "relational" and you think "relationships", but that's not what it is.

It's actually a mathematical term that describes a type of set. Misconception number two is that it's like a spreadsheet. There's a very important concept that will make everything you build a lot more robust, predictable and performant if you understand the distinction between these two. I won't go into it here, but I'd be happy to talk about it more. If you're curious, just look at the definition of a relation, and I'll give you the hint that it's uniqueness.
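As a rough illustration of the uniqueness hint, here is a minimal Python sketch. The row data and the `customer` table are made up for the example and are not from the talk. A spreadsheet-style list happily holds the same row twice, while a set of tuples (the mathematical idea behind a relation) and a SQL table with a key do not.

```python
import sqlite3

# A spreadsheet-like table: a plain list of rows, duplicates allowed.
spreadsheet_rows = [
    ("alice", "alice@example.com"),
    ("bob", "bob@example.com"),
    ("alice", "alice@example.com"),  # exact duplicate row
]

# A relation is a set of tuples, so the duplicate collapses into one.
relation = set(spreadsheet_rows)

# A SQL table only behaves like a relation if a key enforces uniqueness.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (name TEXT, email TEXT, PRIMARY KEY (name, email))"
)
conn.execute("INSERT INTO customer VALUES ('alice', 'alice@example.com')")
try:
    conn.execute("INSERT INTO customer VALUES ('alice', 'alice@example.com')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    # The key constraint refuses the second identical row.
    duplicate_rejected = True

print(len(spreadsheet_rows), len(relation), duplicate_rejected)
```

Without the `PRIMARY KEY` clause the table would accept the duplicate, which is one way a SQL table can drift away from being a true relation.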

So if you're making a table of data in a spreadsheet you can make duplicate rows, whereas in a relation you cannot, based on the identity.

Now, these are terms you're likely to encounter if you get into data; you may have even heard of them before: SQL, NoSQL, NewSQL. Something interesting is that NoSQL is actually older than SQL, but the name is new. I love these counterintuitive things, so I like sharing this. It's a sort of age regression that the industry is going through, because they don't want to be responsible adults; they want to just make things work at large scale. There is an application for it, and I'm not trying to purely slam NoSQL, but you need to understand the trade-offs when you're looking at a NoSQL solution versus SQL. SQL has a lot of mathematics behind it that scientists and mathematicians have worked on for decades to iron out all these complex logical problems of keeping data in consistent states. NoSQL says, "Hey, you know what, we want to build a massive social media site and we don't care about the rest. We're going to throw it all out and go back to the 60s again." It has some applications, but you should at least be aware of the trade-offs.

The last thing I recommend, if you're interested in this, is to go through an interactive tutorial. If you haven't already learned SQL or something like it, go to somewhere like SQLBolt where you can have an interactive tutorial, so that you don't get lost in all the steps of installing a database, setting it up, and creating the data and the tables. It's a faster way to get your feet wet with writing queries, and queries are really the bread and butter of what data roles inside companies work with. You'll have analysts who are writing queries every day. You have software developers for whom writing queries can make up a large amount of their time: either they're writing them in an ORM (object-relational mapping, which we can talk about), or they're writing them in a different way, or writing them literally in strings.

But also designing these tables, or designing these relations, is a foundational activity. In what I've seen, between successful systems and unsuccessful ones (often ones that have a lot of bugs, or that fail at large scale or under load), the biggest dividing factor is how well the data model was designed up front. It's very difficult to change if you don't get it right in the beginning, but if you get the data model right, the code sort of naturally flows from it. It gives you a path that you have to follow. So hopefully this whets your appetite to go learn more. I'm happy to talk about any of these concepts, or more generally outside of the relational model, even though I think the relational model is 80 percent of what you'll deal with in data in a professional context. I'm happy to talk about other aspects of data, like data cleaning or any of the data science topics. So Kai, I guess back to you. That's my intro.

Awesome, thanks so much, Chris. I'm going to dig a little bit into Chris's experience in this space. If anyone has any questions on Chris's presentation so far, please post them in the Q&A link in the chat. I will resend that in case anyone who recently joined hasn't seen it.

Chris, there are a few more topics that I would like to dig into, and I'm just thinking about where I should start. I wanted to discuss the different areas of data engineering. Our students go through SQL in our course; they use PostgreSQL and build relational databases for their applications. They don't have that much exposure to NoSQL, so it could be interesting to learn a bit about that. But maybe we can start more broadly. When we talk about data engineering, we know that most software engineers use SQL, or at least need to engage with some database, whether it's SQL like Postgres or MySQL, or NoSQL like MongoDB. So what's the point of having a specialist role called data engineering? What do they do? Is that different from software engineering? What is their role?

I guess you can think of data engineers, increasingly the way I think of them, as sort of the janitors. As software engineering gets more specialized, the senior software engineers want to pass off the stuff they don't want to do, and oftentimes that goes to data engineers, because so much of the value of software to companies is the data that it collects or generates. Think beyond the basic role of a system. I have a lot of experience in e-commerce, so say you're running an e-commerce site. The core function is that you have to accept payments and ship products, send the orders to the warehouse, and everything else you need to keep the system running. That's the baseline, but everything beyond that is usually a data request. Executives need reporting, they may want dashboards, all kinds of analysis.

Even beyond the core data there's supplemental data. You have telemetry data, which is how the system is performing. The system will usually be instrumented with a lot of tracing logic that times how long things take, structures this data, and pushes it out to another application that people will go through, and that application may be queried through SQL itself. The data engineers will often get a lot of these odd jobs. For example, if developers have set up a new feature and they want to migrate the data over to it (it's a different structure or a different version and so on), the data engineer may be tasked with writing a migration script. That's one.

Very often data engineers are working for a BI or data science department to clean the data for them. There's a saying that 90% of data science is actually data cleaning. I think it's higher than that, and it's also data sourcing and everything else. Increasingly, the fancy, sexy parts of data science have all been automated with tools, because computing power is cheap enough that you can throw, say, a random forest algorithm at something, or a tool that automatically selects the best-performing data science algorithms. But what's important is making sure the data is clean and making sure that you're augmenting the data with as much external or extra data as you can, because data science succeeds or fails based on the quality of the data, period. Not the algorithms; the algorithms make very little difference. I mean, for certain things like machine learning, vision and so on, the algorithm is important. But for most data science that you're going to see in a normal business context (regressions, such as linear regression, which is fitting a curve to some data, or support vector machines, or any of these things which find coefficients of a model, so solving some equation as well as possible to build a predictive model), the data quality outranks everything else, and so data engineers always get involved.

I always say that data engineering takes a certain type of personality. Somebody who's detail-oriented, somebody who's a little bit of a perfectionist, who gets annoyed at, and can feel, when the data is not clean: those make excellent data engineers. That's even more important than the skills you have, because the skills you can learn. In fact you can clean data and engineer data in many different ways, using all kinds of different tools, techniques and algorithms. The most important part is thinking about the data and caring about that data. I don't want to make that answer any longer than it already is.

Yeah, that was great. So data engineering is a lot about cleaning the data and managing the data so that we can perform analytics on it. In Rocket Academy's bootcamp we mostly do app building: we build applications and we also practice algorithms. For someone who is detail-oriented, like you mentioned, and is keen on getting into this area of data engineering, how would you recommend we do that?

Sure. If you're interested in it, first of all get the concept of SQL down well. If you really get into the relational model, some people will complain about SQL because it compromises on some mathematical elegance and so on. But effectively SQL is king right now in the industry, and it has been for 50 years, ever since it came on the scene, because it's such an effective way to not just store data but to make sure it remains consistent and that you don't get problems. Especially in something like the finance space, you just cannot afford to have data go missing or have data anomalies. So SQL is the foundation in the current market. I mentioned SQLBolt, and there are other interactive tutorials, and whatever else you're learning at Rocket Academy and so on. But you really need to go into practical applications.
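As one small practical illustration of the consistency point above, here is a sketch using Python's built-in `sqlite3` module. The `account` table, the amounts and the `transfer` helper are hypothetical, chosen only to show a constraint plus a transaction keeping data from going missing or becoming inconsistent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " id INTEGER PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"  # no overdrafts allowed
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst; both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected the debit; the credit was rolled back.
        return False


ok = transfer(conn, 1, 2, 30)       # valid transfer
failed = transfer(conn, 1, 2, 500)  # would overdraw account 1
balances = conn.execute("SELECT id, balance FROM account ORDER BY id").fetchall()
print(ok, failed, balances)
```

The credit is deliberately applied before the debit so that the failed transfer shows the rollback undoing a partial update, rather than money appearing from nowhere.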

From there, if you are building your own applications, just practicing with a database is important. But I'd say the best thing to do is learn from well-engineered databases out there, because it takes time to understand why you would put in the effort to design a data model or do any of this other work. That's if you're a software engineer who wants to do some data engineering.

Now, if you want to go purely into data engineering, the skills you've already learned are probably sufficient, because data engineering is usually seen as sort of an entry-level thing. Like I said, a lot of software engineers don't want to do that type of work because it can be mundane, or it's just not sexy. You will need to have some tools that you can rely on. For some people, if they like Python, that's pandas or another one of the Python libraries for managing data frames. Oftentimes the only tool you need is a SQL database, and you can do everything you want through SQL queries. There's also R, which is on the mathematics or statistics side of things, or almost any general programming language. If you know the fundamentals and you study a little bit of the relational model or NoSQL, that's enough to get started in the industry and start learning.

It depends on the company. Some companies will be very particular about what they're hiring for; they might ask certain tricky questions and so on. But from what I've seen in my own experience, there's always a shortage of data engineers who care and who have the right sort of approach to it. I don't know how helpful that is.

Yeah, that's great. It gives us a sense of what data engineering work is. How about Tentmon specifically? I know Tentmon builds a product to help companies manage their data. I'm not sure about the specifics, but I would love to hear what kind of data engineering work, or engineering work in general, is done at Tentmon.

Yeah, that one's interesting, because we formed out of a data engineering and data science services company. We now build a product. The product is a SQL interface to disparate data sources: databases, or Google Sheets, or Excel or CSV files on Amazon S3, or something like that.

We're seeing a shift in the industry away from traditional data warehouses. With the traditional data warehouse you have your data sources, and you either have change capture, where you're capturing the changes as they happen, or you have batch processing, where you're pulling all the data out of your data sources and running an ETL process on them before you put them into a data warehouse.

Not everyone here knows what ETL is. It might be worth just sharing.

Sure. ETL stands for extract, transform and load. You extract the data, you transform it in some way, and then you load it into the data warehouse. ETL is a very broad definition. It can be any tool, it can be any transformation process. Sometimes there's no transformation at all, and ETL just means select from one and insert into another. Most frequently it is pre-aggregation, which means that you're taking some raw data and aggregating it along some dimension, like time. In the e-commerce example, you're going to be aggregating total revenue per day, per week, per month and so on. These aggregations are usually done so that the data is quicker to work with inside the data warehouse. And the ETL process, in even medium-sized companies, can be a full team of engineers, just depending on how complex all the different data sources are.

I used to work on an ETL team when I was in the U.S., and we had a team of engineers working on it.

Yeah, it's one of those things where, I mean, you know, Kai, you have to get it right. It has to run reliably. If it goes down, it affects everybody who's consuming the data. And the most important thing is that if it breaks in a non-obvious way, it can steer the whole company in the wrong direction. So there will usually be a team servicing requests and making a lot of changes, either in a visual workflow tool or sometimes as Python scripts or something else.
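The extract, transform, load steps described above, with daily revenue as the pre-aggregation, can be sketched roughly as follows. This is a toy example under stated assumptions: two in-memory SQLite databases stand in for the source system and the warehouse, and the `orders` and `daily_revenue` schemas are invented for illustration.

```python
import sqlite3

# Source system: raw orders (hypothetical schema).
source = sqlite3.connect(":memory:")
source.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, ordered_at TEXT, amount_cents INTEGER)"
)
source.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, "2022-05-19 09:15:00", 1500),
        (2, "2022-05-19 17:40:00", 2500),
        (3, "2022-05-20 11:05:00", 4000),
    ],
)

# Warehouse: holds the pre-aggregated result.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE daily_revenue (day TEXT PRIMARY KEY, revenue_cents INTEGER)"
)

# Extract: pull the raw rows out of the source.
rows = source.execute("SELECT ordered_at, amount_cents FROM orders").fetchall()

# Transform: aggregate revenue along the time dimension (per day).
totals = {}
for ordered_at, amount_cents in rows:
    day = ordered_at[:10]  # keep only the YYYY-MM-DD part
    totals[day] = totals.get(day, 0) + amount_cents

# Load: write the aggregates into the warehouse in one transaction.
with warehouse:
    warehouse.executemany(
        "INSERT INTO daily_revenue VALUES (?, ?)", sorted(totals.items())
    )

daily = warehouse.execute(
    "SELECT day, revenue_cents FROM daily_revenue ORDER BY day"
).fetchall()
print(daily)
```

A real pipeline would also have to handle reruns, late-arriving orders and failures part-way through, which is a large part of why the work can occupy a whole team.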

And so Tentmon's product is helpful there?

So Tentmon's product sort of removes that. Instead you go in, you enter the database details, and when you write a SQL query, our product breaks that query down and comes up with a query plan that runs queries against, or extracts data from, the data sources at query time. It's often called a virtual database. Others exist out there; I think the concept has been around for maybe five or ten years. That sort of eliminates ETL, or I'd say what it does is make the E and the L automated, and then the T shifts from a Python script or something else into a SQL query or a view.

When you're working with a SQL database (I don't know if you've been exposed to views in your coursework), often you'll write a query and then you can save that query as a view, so that query appears as another table, or another relation if you want to use the math term. That's really the only way of abstraction inside of SQL. You learn about abstraction when you're building algorithms, where the foundation of abstraction is a function or in some cases an object. On the data side, the basis of abstraction for SQL is a view. You can build a view, and then another view can build off of that view. So our product takes the approach that we automate the extraction and the loading, and then everything gets put into views for doing the transform.
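To make the idea of views as the unit of abstraction concrete, here is a minimal sketch using SQLite from Python. The table, the view names and the data are assumptions made up for the example, and this is not a description of how Tentmon's product works internally. One view saves a query, and a second view builds on the first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (id INTEGER PRIMARY KEY, ordered_at TEXT, amount_cents INTEGER);
    INSERT INTO orders VALUES
        (1, '2022-05-19', 1500),
        (2, '2022-05-19', 2500),
        (3, '2022-05-20', 5000);

    -- First view: a saved query that can be read like a table.
    CREATE VIEW daily_revenue AS
        SELECT ordered_at AS day, SUM(amount_cents) AS revenue_cents
        FROM orders
        GROUP BY ordered_at;

    -- Second view: built on top of the first, like a function calling a function.
    CREATE VIEW best_day AS
        SELECT day, revenue_cents
        FROM daily_revenue
        ORDER BY revenue_cents DESC, day
        LIMIT 1;
    """
)

best = conn.execute("SELECT day, revenue_cents FROM best_day").fetchone()
print(best)
```

Because a view is just a stored query, the "transform" step lives in SQL next to the data, and changing the definition of `daily_revenue` automatically changes what `best_day` sees.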

Okay, that was helpful. For those of us who might be keen on working more with data: is every company hiring data engineers, or should we be looking at specific kinds of companies? If most of the work is ETL, it probably requires some large amount of data to have data-specific engineers. I'm curious to hear your thoughts on the different options in the industry for data engineering.

Sure, yeah.

I'd say any sufficiently large business or tech company, or any large company, will have it, and any medium-sized tech company will have the need for it. Now, it depends on what your definition of medium-sized is. I think the only tech companies that are not in need of data engineers tend to be video game companies, if they are just producing single-player games. If they're hosting a multiplayer or online game, then they probably need data engineers. Other than that, I'd say almost every company needs them. Whether they're hiring for them or not depends on a few things: if the engineering team is very strong, or if the data sources are all very regular and come from clean places. For example, a small e-commerce site may have very clean data because it's all entered by customers. But other than that, I'd say pretty much any company has the need, and probably many of them are hiring.

Nice.

Now, even if they're not hiring specifically for data engineering, if they're hiring for data science, for AI, or for machine learning, they will be putting somebody in the role of the data engineer, even if it's not listed in the job title.

Right, the job might be software engineer and they're actually doing data engineering.

Exactly. Or the job might be data scientist and they're doing data engineering, which is quite often the case.

And the common denominator is software engineering. They need those skills to be able to write scripts.

Exactly. The way that I think about SQL is that it's a restricted programming language. The only inputs are relations or tables, and the only outputs are relations or tables, but it is a type of programming language. If you work with PostgreSQL, the idea of recursion is in there.
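As a hedged sketch of the recursion point: the talk mentions PostgreSQL, but the example below uses SQLite through Python's standard library so that it runs without a server. Both databases accept the `WITH RECURSIVE` form used here. The `category` table and its rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE category (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        parent_id INTEGER REFERENCES category(id)
    );
    INSERT INTO category VALUES
        (1, 'Electronics', NULL),
        (2, 'Phones', 1),
        (3, 'Smartphones', 2),
        (4, 'Clothing', NULL);
    """
)

# Walk the hierarchy below one root category with a recursive query.
rows = conn.execute(
    """
    WITH RECURSIVE subtree(id, name, depth) AS (
        -- Base case: the starting category.
        SELECT id, name, 0 FROM category WHERE id = ?
        UNION ALL
        -- Recursive step: children of rows already in the result.
        SELECT c.id, c.name, s.depth + 1
        FROM category AS c
        JOIN subtree AS s ON c.parent_id = s.id
    )
    SELECT name, depth FROM subtree ORDER BY depth
    """,
    (1,),
).fetchall()
print(rows)
```

The input is a relation and the output is a relation, yet the query has a base case and a recursive step just like a recursive function would.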

Looping and a lot of these things are implicit. It's declarative, but all the same concepts are there. That's why it really is an engineering role. Even if it's just writing queries all day, you still think like an engineer. You still need to understand the algorithms even if you're not writing algorithms: the understanding of algorithms helps you understand why a certain query produces a certain result, or why it's very fast or very slow.

Nice. Okay, I would like to solicit questions from the group, because you guys are quite quiet, both in here and on the chat.

What areas would you like us to dig deeper into? I see a few questions showing up on the Q&A board. Okay, a quick question from Justin: are there good examples of good database designs available publicly that you would recommend?

Yeah, that's a good question. I wish somebody compiled those, or I wish I had myself. What I would look for is database diagrams for open source projects. I'm not claiming that they're the best, but they're probably the only ones you're going to find published, because any data diagrams inside a business are tightly guarded. Not as a trade secret, but it's just not something they're going to publish. If you're interested in e-commerce, for example (a field I keep going back to because I have experience with it), you can look at the big systems like Magento, where people have published database diagrams. They're not perfect; every software system has grown and accumulated a lot of cruft on the edges, but they're often well studied. Another thing you can do is look at, for example, the database backing Wikipedia and how that works, or any of the larger open source projects. I don't know if Stack Overflow publishes its code, but something like Discourse, or a lot of these larger projects, will at some point get documentation on how their whole data model works, and that would be a good thing to study. But be warned that it's very large. It's often hard to understand some data models unless you also understand the code, but a well-designed data model should be understandable on its own.

Of course Codd, the guy who came up with relational databases, in the book I mentioned does go through some smaller toy examples. I think they're excellent examples of the concepts that you may not see if you're just looking at a big database system out there. You may not see why certain primary keys were chosen and why certain things were done, and I think the books will explain those concepts. So that's where I'd start.

Oh, that was really comprehensive. Justin and Q, does that answer your question?

Yeah, it does. Thank you.

Awesome, thanks, Justin. There's a question around positioning. Mike mentioned that, Chris, you said you were a software engineer first and a data engineer second. Why do we make a distinction between software engineer and data engineer? Is it an advantage to position ourselves as either a software engineer, a data engineer, or a software engineer with experience in data science? How should we position ourselves as software engineers that can work with data?

So the reason I think of myself as a software engineer is that once you've had enough exposure to data engineering and software engineering, the line between them gets quite blurry. There's a certain number of algorithms, and you will learn most algorithms at some point. There are a lot of algorithms in the art of computer science and some of these encyclopedias of algorithms, but you will never have the need to apply them. Certain algorithms will show up over and over in your programming, and those algorithms eventually get abstracted into libraries or features of the language. Increasingly, what I've noticed over time is that in the software engineering that I do, I'm always thinking about the data. Data engineer as a title generally seems to be applied to people who are cleaning data, adding features to it, or finding aspects of it. It's usually applied to ETL-type roles or these types of scripting tasks.

If you want to position yourself, there's an advantage to positioning yourself as a data engineer. Anything that you do that makes you more specific increases your hireability, and possibly the salary that you could command, because you're a specialist.

That said, data engineering can be considered more of a junior or commodity role, so it may not give you the same sort of salary as, say, a full-stack developer.

Now, as far as an advantage to being a software engineer with experience in data: yeah, I think any exposure to data, data engineering, data science, all of that will translate quite well into your software engineering. It will change the way that you write programs, definitely.

Why is that?

I've heard this anecdote that master chess players can look at a chess board and instantly know what's going on with it. They know who's in an advantageous position, they know roughly how the game has played out to this point, and they can remember the positions of everything on the board very quickly. Whereas if we're not master chess players, we have to look at all the pieces individually, and there's no framework in our mind that we can quickly hang everything on. The more you work with data over time, and the more you work with the mathematical concepts underneath it, the more you're able to generalize the data you're working with and fit it into something. And a lot of software engineering is combining algorithms with data structures.

I think of most software engineering as having two types of data structures that are used 90-plus percent of the time. One is a tree, and the other is a relation, or table, or something like it.

When you look when you look at when you look at when you're going through and learning about algorithms you're going to learn about lists linked lists arrays you know all these all these kind of things right and you may look at some like really fancy data structures like red black trees and you know tries and all this .

Other kind of stuff and you may have some chance to use those at some point in your career but in general if data is hierarchical it's going to be some type of tree and if it's not hierarchical it's usually going to be in a relational format and once you've .

Seen how data fits into those then you can focus on the algorithms and once you know the algorithm abstractions and the data structure abstractions then programming is more of just kind of combining and getting things to flow from input to output rather than worrying at the .

Individual level of the chess pieces what we're going to do with it that's helpful cool thanks chris there we had a question about specific dialects of sql we just learned postgresql we chose postgresql at rocket academy what do you think of .

Mysql and oracle which one do you prefer um i think mysql is owned by oracle right if i'm not wrong uh yes they ended up purchasing it so yeah but they are three distinct databases i love postgresql um i think .

They have a very rigorous development process and they always prioritize correctness over features and performance and so for a long time they were behind because of that but now they seem to be ahead even in a lot of the performance areas as well right because once you get the system performing correctly you have .

A rigorous process in place then you can start making optimizations without worrying about how you're going to break things so i use postgresql for a lot i have a lot of experience with mysql and mysql has its advantages so there's historical mysql which used .

Myisam as the engine and i think that's inferior in most ways there's modern mysql which uses innodb as the engine and that is much closer to postgresql innodb was developed heavily by some guys inside of facebook that facebook hired when sun i think was being .

Bought over and they were getting the mysql guys because i think sun bought mysql before oracle did so mysql has some advantages but in general i think unless you really know what those are and they're very narrow advantages i would go with postgresql there's less surprises things .

Are just generally more correct and the performance is on par the one major advantage that mysql has even up until recently is that replication is better so replication is when you have data that's inserted into one database being automatically copied into another .

Database right usually within seconds or less than a second and the reason for doing this is that as you scale up a service as it comes under more load eventually you're going to have too much load for a single server right this is when you hit the point uh .

Where if you're processing thousands of queries per second eventually a single server may have an issue now servers have gotten a lot better especially with ssds i think you're all familiar with ssds and also beyond ssds you can get .

One million two million iops i/o operations per second so i'm getting into a little bit of the operations in the devops side but essentially you can scale a single database only so far .

Now most companies will only need a single database if they structure it well and optimize their queries but if you're at facebook size or something else like that right you need replication and that's where mysql has historically been stronger in terms of oracle in the past i'd say in the 80s and 90s .

Oracle had advantages but postgresql has made up all or almost all of that ground and then exceeded them in many ways i can't i can't imagine i mean oracle from what i understand if you're willing to pay the licensing on it uh there's a lot of documentation and the performance should be pretty good but .

It's you're usually going to see it in legacy systems and very large companies only and i just don't have enough experience with it who is maintaining postgres it is a group uh you can actually meet most of .

The postgresql maintainers at conferences you know once you can get to international conferences again the guys are very approachable they show up at some of the database conferences um i've met a number of them and usually what it is is they are employed .

By consultancies that position themselves as experts and want to have influence over the development of it there are some university uh staff that that are contributors and so on but the core is pretty small at least .

So when i worked with them it was less than 10 guys um yeah and so it's like linux right you have a broad base of contributors and then you have a number of generals that are sort of .

Um processing their areas of expertise and making sure that the contributors are sending in something that that follows the sort of the guidelines and so on the difference between linux development and postgres is i don't think you have like a single guy at the top like linus so .

Actually oh yeah yeah the person named after himself right um oh there are some new questions in the thing uh okay brian had a question specifically .

For boot camp students do you have any recommendations for a learning roadmap post-graduation maybe not not even not even just within data engineering uh how would we continue to grow even after we start working um yeah i think .

So when i started making youtube videos about software development i thought there's a problem in our industry that people don't read um specifically code they don't read other people's code and learn from it right we read books we watch videos and .

So on so i often like i i like anybody who's doing any coding on camera increasingly you know you can find these people on twitch even writing code on camera i'm not sure they're the best right like a lot of these game developers are self-taught they've been working not on a team for years and .

Years um but you can learn something uh if you can find anything that's interactive that's best but i know those resources are pretty few um .

But i'd say try not to get too focused on algorithms um i know that google specifically really pushed the industry in that direction for a long time and for google it's important right because everything for .

Them is latency you know they have to deliver things as quickly as possible so they had this real focus on algorithms and all these companies were copying their interviewing style and so a whole generation of engineers got obsessed with algorithms they're important but i'd say just avoid going too far down into algorithms and work more on .

Whatever you're interested in really i could give more specific recommendations if you want to be a front-end web developer or something else like that but in general i don't really have any good general recommendations sorry .

Brian do we have any follow-ups were there any particular areas you were thinking about oh i think you're muted uh but uh sounds like okay okay yeah yes i was just asking um because in the context of us just graduating and we're all going to get our individual jobs but then i was wondering if .

There was like a way that we could set a direction and continue what's important in the industry or how do you view um where the tech industry is moving towards and how we can actually better position ourselves as software engineers for the future and not just for the present yeah and this is mike's question as well .

So good good up too that's a good question um all i can say with confidence is to tie your learning to what you're doing because if you're just trying to learn abstract concepts i don't know how .

Well it sticks um for me personally i've always learned the most trying to solve a problem i'm working on and that problem just looking at it and looking around at what people are discussing will reveal a lot of topics that i can study more but always in the service of .

Solving a problem so if you're in a job i think just paying attention to what you're doing you're going to have plenty of motivation to branch out from there but if you're not in a job yet and you're going through you know the interviewing phase um .

I mean obviously having a personal project if you have the time for it would be the best you know something that interests you and there's so much information available you'll have no problem you'll find the style of you know book or video that .

Works for you i think we touched on this briefly earlier uh when we were talking about uh postgres and the dialects that was that was actually really insightful you were talking about um i can't remember in what context it came out where we mentioned devops and i realized we had we hadn't talked about .

That uh so devops you know people know about devops as this you know it's kind of this uh devops and security right it's a bit of this kind of niche and sometimes sexy part of software engineering that but i guess maybe didn't used to be but now it's more sexy because those people get paid .

A lot what should we know as bootcamp graduates or general software engineers coming in at entry level how much do we need to know about devops how much is specific to data engineering how much should we know as a general engineer yeah i think that also ties into mike's .

Question as well about where the tech world is going and what you need to know to be relevant and the reason i say that ties in is because from where i see this sort of thinking about systems on a larger scale is becoming increasingly important and building individual programs is .

Less so i think decades back writing a program was a large amount of the work and now there are so many open source systems available that a lot of the work is connecting things together um .

Now you know there was there was this vision of a glorious future for software engineers where objects are reusable and everything has this well-typed interface and they can all plug together like legos it never happened but i think being aware of which systems .

Solve which problems well i mean obviously postgres is is an important one for for data but also for other things um uh knowing knowing these systems and knowing the connectivity between them is is important and this is a lot of where devops is right so you can think of .

Devops either as the very strict definition of the code building and deployment pipeline or you can think of it more as a developer who's working at a higher level than an individual system and that's how i think of devops so once you once you can move up and .

Connect things together that makes you very valuable in the company and i don't know if it's always the case but generally devops should be commanding a higher salary i know it's always in demand almost every tech company here in singapore needs .

More devops um that's how it feels to me at least the ones that i come into contact with yeah so i say two things right that's sort of a long term goal don't try and learn all the systems out there and everything else in the short term be .

Specialized right so if you know postgresql really well that will make you quite valuable or if you know how the browser and the dom work or you know css very well or something like that those skills will be valuable if you're a generalist and an inexperienced generalist it's .

I mean you can still find work right this industry is very hot um but i think it's harder to really stand out and even once you're in the job it's harder to add value when you don't have a specialty so potential specialties like i said postgresql really understanding the browser and the dom .

And and the javascript execution model and the css and all this this is all very important you know front end almost everything has a web front end nowadays um uh the areas that i think are less important are really the performance uh the the esoteric algorithms and so on i .

Know i've downplayed them before um but knowing you know it used to be popular the patterns movement and knowing all these different patterns and so on .

I think is not as useful as knowing a specific tech to solve a specific problem well that's really helpful and so devops when i think of devops i think a lot of the cloud providers aws gcp not to mention azure alibaba cloud there are so many different ones .

Ibm has one how much of those cloud systems should we try and familiarize ourselves with or should we just go with that problem-based approach that you were mentioning earlier yeah i tend to think of a few services .

That all of the platforms have in common as a foundation that you should probably know about right so on aws this is ec2 and then all the equivalents right which gives you a virtual machine to operate with then object storage right so this is s3 on aws .

And all the others have an equivalent i mean these are the two cores this is what amazon launched with in the beginning um everything else is sort of you know you can build everything out of those two right i mean technically you can build everything out of ec2 just a bunch of virtual machines but the object .

Storage is actually a useful abstraction it's very applicable the other things tend to be sort of automated services that will depend on the problem domain that you're going after right like so rds well that's just a managed database that's effectively an ec2 instance they're just managing the installation .

For you um a lot of these services you know route 53 that's dns you may never work with dns it may be very important for you to know how dns works at a higher level though i would recommend .

Essentially because you are software engineers you have the skills to learn some abstraction so what's very popular is a tool like terraform i think there are other tools out there like pulumi or specific to the cloud platform there's one that aws puts out .

But all of these systems have apis and uh so get get out of the web interface and then be working with the command line tools or the apis some way that you can actually abstract do that as soon as possible now obviously in the beginning you're going to have to work in the in the web .

Console to kind of understand how everything is working but the sooner you can sort of abstract the more useful it is if you want to specialize in devops the web console of like aws for example i mean get into the automation and the abstraction of yeah of working with those services through .

The command line okay yeah so with aws the first step to operating at a higher level again if you're really going into devops and want to manage large networks of services is for example cloudformation right you design a template of how something gets deployed and you apply .

That so instead of clicking through everything you know yeah those those skills are are in very large demand that was very insightful do we have any other questions from the group that was super insightful chris maybe just one final question before we wrap .

Up uh would you have any parting advice for the for the group who are becoming software engineers for the first time you know it just goes back to the slides at the beginning don't underestimate the data take a little bit of time to understand the mathematics behind things i know i mean .

I don't know everybody individually how they feel about math but sometimes just learning these concepts to have something to hang things on in your mind helps a lot and this one i think is the biggest bang for your buck in the software world um .

And so that's that's that's my uh that's my advice it really is it really does it it will if you do master it it will feel like a secret uh secret superpower right yeah awesome chris i love you so much for the next question oh go on mike is that okay .

Please yeah my mic was on mute i was talking and then anyway okay what's your tech stack nowadays what do you use every day so personally i write most general code in haskell so i'm a big functional programming guy if you have an interest in going that way .

It has a very steep learning curve but if you want to get to a high level of abstraction and do more with less code that's where i would go as far as the full stack you know javascript for anything sorry typescript if you do any work .

In the browser use typescript do not use javascript don't waste your time typescript is strictly better on every metric because it's just a superset of javascript that prevents a lot of silly errors so use typescript if you do any web browser stuff um so typescript haskell postgresql linux those are my go-to tools for almost everything .
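
A small sketch of the kind of silly error the type checker can catch, assuming a made-up cart example. The `CartItem` type and `cartTotal` function are hypothetical and are not from the talk.

```typescript
// Hypothetical types and names, for illustration only.
interface CartItem {
  price: number;
  quantity: number;
}

// Sums price * quantity over every item in the cart.
function cartTotal(items: CartItem[]): number {
  return items.reduce((sum, item) => sum + item.price * item.quantity, 0);
}

const total = cartTotal([
  { price: 10, quantity: 2 },
  { price: 5, quantity: 1 },
]);

// Plain JavaScript would accept both calls below. The first only works
// through implicit string-to-number coercion, and the second returns NaN
// because item.quantity is undefined. TypeScript rejects each one at
// compile time, so they are left commented out here:
// cartTotal([{ price: "10", quantity: 2 }]); // price is a string
// cartTotal([{ price: 10, qty: 2 }]);        // misspelled field name
```

Because the same code is still valid JavaScript once the type annotations are removed, the checks cost nothing at runtime.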

All right thanks everyone

