Database vs Data Warehouse vs Data Lake | What is the Difference?

  690,240 views

Alex The Analyst

1 day ago

Database vs Data Warehouse vs Data Lake | Today we take a look at these 3 different ways to store data and the differences between them.
Check out Analyst Builder! www.analystbuilder.com/
____________________________________________
SUBSCRIBE!
Do you want to become a Data Analyst? That's what this channel is all about! My goal is to help you learn everything you need in order to start your career or even switch your career into Data Analytics. Be sure to subscribe to not miss out on any content!
____________________________________________
RESOURCES:
Coursera Courses:
Google Data Analyst Certification: coursera.pxf.io/5bBd62
Data Analysis with Python - coursera.pxf.io/BXY3Wy
IBM Data Analysis Specialization - coursera.pxf.io/AoYOdR
Tableau Data Visualization - coursera.pxf.io/MXYqaN
Udemy Courses:
Python for Data Analysis and Visualization- bit.ly/3hhX4LX
Statistics for Data Science - bit.ly/37jqDbq
SQL for Data Analysts (SSMS) - bit.ly/3fkqEij
Tableau A-Z - bit.ly/385lYvN
Please note I may earn a small commission for any purchase through these links - Thanks for supporting the channel!
____________________________________________
SUPPORT MY CHANNEL - PATREON/MERCH
Patreon Page - / alextheanalyst
Alex The Analyst Shop - teespring.com/stores/alex-the...
____________________________________________
Websites:
Website: AlexTheAnalyst.com
GitHub: github.com/AlexTheAnalyst
Instagram: @Alex_The_Analyst
____________________________________________
All opinions or statements in this video are my own and do not reflect the opinion of the company I work for or have ever worked for

COMMENTS: 280
@o.nature 2 years ago
Alex, I don't know if you remember me. Over the past 2 years, I've been on your streams, commented on your videos, and emailed you my resume for help. I finally got a data analyst position 2 weeks ago and I love it. Thank you for everything.
@AlexTheAnalyst 2 years ago
I do! Congratulations!! That's so awesome to hear!
@lukehening5915 10 months ago
Love this!
@tjpradeep9217 1 month ago
Congrats man, how's it going ❤
@wesley8190 1 month ago
amazing
@dorins3787 1 month ago
Congrats!
@torontothomas8689 2 years ago
Thanks for this! I would love to see you make a video on ETLs and the automation used in them, breaking it down for rookie data analysts!
@gangaadhikari9491 2 years ago
Yes! ETL is something I've been trying to learn and I got a new job where I will be able to learn more but having a source to learn the foundations would be great!
@brittanyholloman5693 2 years ago
I would like to see this as well! Thank you!
@jonahjohnbaba 2 years ago
I would love to have you do this for us at a granular level. Many thanks.
@TheSchmed 1 year ago
I write a lot of my own ETLs with T-SQL, PowerShell, cmd, etc., mostly loading very large amounts of delimited text-based data into a star-schema type of data structure. I always build them to tolerate changing file structures using a “stage then load” method: bulk load and stage into all-varchar columns, then dynamically build and merge (upsert) from the stage table to the prod/fact table by matching stage and fact column names via the system catalog, or via some type of metadata definition tables that map columns and name the join and key columns to use for merging (mostly time series) data. It’s important to use features that define batch sizes (e.g. Batchsize) so as not to cause excessive T-log or resource usage on the back end, especially on a shared relational data server. Many of these vendor products cannot handle files that change, whether a data type or a column/header name, without the load failing or requiring a new definition/format setup each time. I’ve actually come across one or two products that attempt a single transaction when loading very large data files to tables, which caused excessive T-log growth and brought the DB server to a halt. I write mine so they always attempt the load to “known” columns, then report any that were new or missing. I also define processes to “transform” based on a staging-to-fact column data type match; one example is that many files from Excel sheets have a “serial” date value that needs to be cast to datetime from a 5-digit numerical value. I need to take the next step, though, and learn how to incorporate AI with it. We get tons of loan payment data to be analyzed, probably 40-50 GB a month, with 30 or so years currently stored in SQL tables, one table 4-5 TB in size, both page compressed and partitioned, used for both very selective queries and queries that process 2-4 year data samples. Fun stuff. I would love to learn new methods that successfully replace pre-aggregating, that work fast and are very flexible.
I’ve worked in the past with products like Hyperion, Cognos, SSIS, etc., as well as some vendor DW products like Paladyne, but they are always an 80% solution, with the remaining 20% being 80% of the work.
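Editor's sketch: the “stage then load” idea in the comment above could look roughly like this in Python. This is not the commenter's T-SQL implementation, only an illustration under stated assumptions: every field is staged as text, only columns the target already knows are loaded, new and missing columns are reported, and an Excel serial date is cast to a datetime. The column names, the `target_types` mapping, and all helper names are hypothetical.

```python
import csv
import io
from datetime import datetime, timedelta

# Excel's 1900 date system counts days from 1900-01-01 as serial 1 and wrongly
# treats 1900 as a leap year, so for modern dates day zero is 1899-12-30.
EXCEL_DAY_ZERO = datetime(1899, 12, 30)


def excel_serial_to_datetime(serial: str) -> datetime:
    """Cast an Excel serial date (staged as text) to a datetime."""
    return EXCEL_DAY_ZERO + timedelta(days=float(serial))


def stage_rows(text: str) -> list:
    """Stage: read every field as a plain string, whatever columns the file has."""
    return list(csv.DictReader(io.StringIO(text)))


def load_known_columns(staged: list, target_types: dict):
    """Load: keep only columns the target knows, cast by target type,
    and report columns that are new in the file or missing from it."""
    header = list(staged[0].keys()) if staged else []
    new_cols = [c for c in header if c not in target_types]
    missing_cols = [c for c in target_types if c not in header]
    casts = {
        "int": int,
        "float": float,
        "text": str,
        "excel_date": excel_serial_to_datetime,
    }
    loaded = [
        {c: casts[target_types[c]](row[c]) for c in header if c in target_types}
        for row in staged
    ]
    return loaded, new_cols, missing_cols


# Hypothetical file: "branch" is unknown to the target, "rate" is absent from the file.
sample = "loan_id,paid_on,amount,branch\n17,44927,250.50,North\n"
target = {"loan_id": "int", "paid_on": "excel_date", "amount": "float", "rate": "float"}
rows, new_cols, missing_cols = load_known_columns(stage_rows(sample), target)
```

A production version would also batch the load, as the comment stresses, rather than processing the whole file in one pass.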
@ItsJustTooRed 2 years ago
This is great content, thanks. I've been working as an analyst/developer in anti-money laundering for over two years now, with zero tech experience going in. There are a lot of things where I've learned how to work with them without actually learning much about them, like the differences between databases and warehouses. This sort of short-form content is useful for quickly covering those questions I didn't even realise I had. Will definitely be referring to it in the future.
@KahanDataSolutions 2 years ago
Great breakdown of such critical concepts. I find that it's also common to get tripped up when companies decide to use such goofy internal names for their databases/lake/warehouse. To the point where you can lose sight of what it actually "is" you're being asked to work on. Being able to relate back to these foundational concepts is always a helpful exercise. Big fan of the channel btw!
@traetrae11 2 years ago
Thanks for this. I already knew what a database and a data warehouse were but had never heard of a data lake.
@MartijnVos 2 years ago
Nice breakdown. From your explanation, I get the impression that the neo4j graph database we used on my previous project was actually a data warehouse for us, because we filled it with data we drew from many other databases, and structured it in a specific way (relations between the many different data elements from the different systems) for the reporting tool we built on top of it. And much of our data didn't come directly from those other databases, but was drawn daily from an intermediary portal that contained all sorts of different kinds of data from different systems, which I guess was a data lake of some sort.
@jhewitthunt 9 months ago
Nice simple video. Good job. Only negative comment is, there's no need to constantly show your face which blocks part of what you're trying to show - diagram, title, description etc...
@jenniferbell514 2 years ago
This was awesome! I'd been googling for answers to these very questions, and the visuals helped bring it into perspective. I see "data warehouse/ing" a lot in job descriptions, so it's great to get a handle on what it actually is.
@abdulrehmancheema5121 5 months ago
The way you break it down and make it simple is great. Thank you for such a quick and insightful video.
@JoshuaJMorley 2 years ago
And now we see the latest iteration of a data topology, the lakehouse :) Great video! I really like that you mention summarisation as a point of differentiation. Analytics on a data warehouse is typically done by a data analyst with traditional data analyst skillsets (SQL, R, etc.). Analytics on a data lake is typically done by a data scientist (Python, ML, etc.). An important point to note on "if you have all this data and you have no idea what to do with it" for a data lake: a vital thing to focus on when creating a data lake is the structure of your data in the lake (as in directory and file structure). If you just dump it all in, it will become a data swamp and a waste of money.
@AlexTheAnalyst 2 years ago
For sure - I'm implementing an Azure Data Lake at my company right now and that's exactly what we are trying to avoid lol
@JoshuaJMorley 2 years ago
@AlexTheAnalyst ADLS Gen2 using hierarchical namespaces? Good technology :)
@MartijnVos 2 years ago
A data swamp is a great addition to the list, although it's of course the thing you want to avoid.
@xMastJedi 2 years ago
Better lakehouse than leakhouse :D
@andrij.demianczuk 2 years ago
Lakehouse is a term that marries the data lake and the data warehouse. It does this by adding an abstraction layer to help organize and normalize data in the data lake with a combination of a Hive metastore and the Delta format.
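Editor's sketch: the point earlier in this thread about directory and file structure can be made concrete with a fixed, partitioned path convention. Everything here is an assumption for illustration: the zone/source/dataset layout, the `year=/month=/day=` partition folders, and the `lake_path` helper are hypothetical, not something the commenters or any particular data lake product prescribe.

```python
from datetime import date
from pathlib import PurePosixPath


def lake_path(zone: str, source: str, dataset: str, day: date, filename: str) -> PurePosixPath:
    """Build a predictable object path: zone/source/dataset/date partitions/file.

    A single convention like this keeps files findable by source and by date,
    instead of landing everything in one flat folder.
    """
    return PurePosixPath(
        zone,
        source,
        dataset,
        f"year={day:%Y}",
        f"month={day:%m}",
        f"day={day:%d}",
        filename,
    )


# Hypothetical example: a raw CRM payments file received on 5 January 2024.
example = lake_path("raw", "crm", "payments", date(2024, 1, 5), "part-0001.csv")
```

Whatever the exact layout, the design choice is that every writer goes through one function, so the structure cannot drift file by file.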
@umarahmed6853 2 years ago
Short and to the point, just like I wanted. Thanks!
@russellchase4544 1 month ago
This was perfect. Short, sweet, and to the point. Thanks!
@n19ence 1 year ago
Whoa, Alex, with this clarity and instruction you're going to get my university Ph.D. instructors "fired".
@niftyoptionslivetradingand7231 1 year ago
There are so many big universities around the world, but this guy made it so clear for me. You deserve to have your own university, brother. Thanks for enlightening me 🙏👍😊 All the best 💐💐
@user-vb1gl7tn8y 2 years ago
Alex, this is such a good concise description for folks! I’d love to learn more about data lakes, in my current role we have so many streams of data but apart from its initial use the data just gets lost in Excel files on shared drives. I’m wondering if we could leverage it better by having a centralized way of storing it. But I just don’t know that much about data lakes. Would love more content exploring this!
@AlexTheAnalyst 2 years ago
Yeah that's usually what a data lake is for - some type of centralized system
@svenhohlfeld3483 6 months ago
Thank you so much. Already having an understanding of OLTP and OLAP systems, I was always struggling to understand what this data lake thing is! Now I know: It’s simply a file system. It’s a synonym for storage. Thank you for finally explaining this. I will always point to your video as a reference. 🎉
@Vucci_Mane 2 years ago
Appreciate the video. I needed to know the difference between the data platforms since at some point I’ll be transitioning to data engineering once I've studied enough. Also wanna take the time to say thank you so much for your vids! Been watching you since last year, and your vids helped me prepare a lot for DA. As of now, I’ll soon be starting my 1st DA role with a great (I think lol) starting salary of $65k in SaaS. Would’ve been completely lost if it hadn’t been for your career insights and projects. Cheers from your HOU neighbor!!
@AlexTheAnalyst 2 years ago
I'm so honored to hear that! Keep up the good work!
@JoeCMath 2 years ago
Timestamps for chapters:
0:00 Introduction
0:35 What is a Database?
1:13 What is a Data Warehouse?
2:34 Key Differences between Database and Data Warehouse
3:15 What is a Data Lake?
4:10 Database vs. Data Warehouse vs. Data Lake
In my previous job we worked with a Data Lake, which ended up being amazing for building general SQL skills as cleanup was needed to join the disjoint tables and get valuable results. I never looked at the exact definitions so thank you for this video!
@jal6008 2 years ago
Thank you very much for this video. Actually I am currently doing the Coursera Google Data Analytics Course and in there I found these words databases and data warehouse and I was really curious about the difference between them and just found your video on it.
@arielspalter7425 8 months ago
Super useful video. Much appreciated!
@user-fl1ip3gr1f 11 months ago
Thank you. I am making a similar transition into becoming a Data Analyst so this background is extremely helpful.
@mzkhan1576 8 months ago
thank you Sir Alex. Great and concise video.
@blankdevs 1 year ago
Understood the difference and I now know what I need for my own purposes. Amazing content kudos 🥂
@namanbhayani1016 9 months ago
Very well explained!
@jenbac.s1858 1 year ago
Great video! I just understood the differences between these key terms, thanks to your video. Something I did not grasp from the very long texts written for the same purpose. Well done Alex 👏
@jamiethesailor 1 year ago
Great explanation! Been struggling with understanding the differences, this really cleared it up!
@GuruDanny 3 months ago
Thank you - for me, I learned new concepts - had no idea about data lake. Once again thanks for sharing your knowledge.
@katkyle8169 2 years ago
This is a great video and you are a great communicator. I was having a lot of trouble communicating this to my leadership team and this was really helpful!
@NdubisiOnuora 1 year ago
Simple and easy to use. Great voice and extremely friendly and humble.
@justindorsey8922 11 days ago
This was a great video. Concise and very clear. Excellent job on keeping it brief and giving all the information as well. Could not have asked for a better video.
@_incarnate 1 year ago
Nothing but awesome!! This is very nice Alex. You won't ever know how much you have come through for me. 👏
@tund3_ 7 months ago
This is an amazing summary, clear and easy to understand, especially for someone with a networking and security background. Thanks a lot.
@user-tc8sm4bq4x 2 years ago
So simple and so fast!) Thank you so much)
@allsin03 11 days ago
this helped me get a better grasp on the 3. great video
@Vikram_8621 2 years ago
Thanks for simplifying, great video! 👍
@TrickData 2 years ago
Hi Alex, would love to hear your thoughts on the subject matter of adjusting to a new analyst position. I’m a new senior data analyst and going through a steep learning curve. Would love to hear your advice. How long does it take? Tips on being successful, your personal experience, etc. I think it would be a great video.
@veronicab2096 2 years ago
I’m new to a higher level data position than I’m used to. I would love this kind of content as well.
@j.nakajima9070 2 years ago
Thank you Alex, this was a question I had in a recent interview for a MN company in Singapore, I did not know how to answer.
@rajkumar-xg3iy 2 years ago
Oh. Much needed. Learning these concepts now on the job.
@homaiphuonganh131 8 months ago
thanks Alex, love your channel and very clear explanation of critical concepts :) - can you also cover data lakehouse?
@dominiqueingrid7086 1 year ago
Short, helpful, well explained. Thanks!❤
@shadowitself 2 years ago
clear, straightforward, fantastic ;) thx
@jesselima_dev 2 years ago
For the first time, those concepts are clear to me. Thanks!
@user-zy7dh7mn8v 9 months ago
Clear and short, thank you!
@JH-py9wf 2 years ago
Thanks Alex. Would love a video on how you query data from data warehouses/data lakes and how that method differs from SQL
@janpedersen5780 1 year ago
A data warehouse is queried using SQL, if the data warehouse is built using a relational database technology, such as MySQL, Oracle RDBMS, SQL Server. You usually don't query data lakes. They are mainly used for AI/Data Science purposes, and since they also contain non-structured data, you would not use SQL (Structured Query Language) which is designed for relational databases.
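Editor's sketch: to illustrate the first point in the reply above, a warehouse built on a relational engine is queried with ordinary SQL, typically summarising many rows. The example uses Python's built-in `sqlite3` module with an in-memory database as a stand-in for a real warehouse; the `fact_sales` table, its columns, and the figures are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for a relational data warehouse here.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE fact_sales (sale_date TEXT, region TEXT, amount REAL);
    INSERT INTO fact_sales VALUES
        ('2024-01-05', 'East', 100.0),
        ('2024-01-09', 'East', 50.0),
        ('2024-01-07', 'West', 75.0);
    """
)


def sales_by_region(connection: sqlite3.Connection) -> list:
    """A typical analytical query: aggregate the fact table per region."""
    query = """
        SELECT region, SUM(amount) AS total_amount
        FROM fact_sales
        GROUP BY region
        ORDER BY region
    """
    return connection.execute(query).fetchall()


totals = sales_by_region(conn)
```

Against a real warehouse only the connection would change; the `SELECT ... GROUP BY` pattern stays the same.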
@toshioikene8200 2 years ago
Thanks for that clear concise explanation man.
@JHatLpool 11 months ago
A nice, clear presentation and nice explanations of the key terms. Thanks !
@mosesvarghese4566 2 years ago
Alex, it is awesome to see these videos man! Very informative.
@AlexTheAnalyst 2 years ago
Is this THE Moses Varghese?? Thank you man! :D
@raj_kundalia 5 months ago
It was helpful and made sense in terms of my current project too. Thank you!
@mangaart3366 1 year ago
Dope video, thanks!
@rvipinkumar 2 years ago
Super easy explanation. Thanks Alex.
@oyindamolatomoye6520 2 years ago
Super helpful. Thanks Alex
@jessechichi5609 2 months ago
Wow, this is nice, so much clarity and simplicity, thank you.
@myyntisuurvisiiri 1 year ago
This was the best explanation in UKposts. Thanks :)
@mohamedeliwa1380 6 months ago
Thank you. That helped me to understand and take care of opportunities I'm handling.
@richbashaw9240 10 months ago
thank you. great comparison and explanation
@majidrasouli2841 1 year ago
Much appreciated keep up the great actions
@thrilled2bits 9 months ago
Very helpful video for basic understanding of differences between them, thank you.
@keifer7813 2 years ago
Awesome video. I got a question about a data lake on an interview recently and was stumped, so this was helpful
@umjain2007 8 months ago
Excellent explanation. Thank you! Also wanted to know about Delta lake
@vijayhpune 3 months ago
The way you explained was very simple and easy to understand Thanks
@burinyuybongfen7857 2 months ago
Very well explained. Thank you
@NomNomCactus 2 years ago
Such a good and to the point explanation. Thanks
@JaeHoYun 2 years ago
Thanks Alex. This is simple and useful for me
@guilboy 5 months ago
You're really good at walking people through concepts 👏👏👏👏👏👏
@stephenjones6260 1 year ago
Very well done Alex!
@alainb4734 1 year ago
Very well explained. Thank you for sharing.
@damianztone 1 month ago
amazing video- concise and very well explained
@yochai4561 2 years ago
Thanks for the explanation! I would expect to see some examples for each one, it makes the terms much more 'close' to us.. Keep going with this channel, it's super helpful! Thanks again
@ethanlipson1637 2 years ago
ur welcome
@E_Gaks 2 years ago
Really instructive video ! Thank you Alex for these details. I learned a lot.
@AlexTheAnalyst 2 years ago
Glad it was helpful!
@out_of_ends 2 years ago
Very informative. Thanks for sharing
@durgaprasadvadlamoodi1271 9 months ago
Thanks for explaining, very clear now
@ruchaj.5550 1 year ago
Very informative, simple, easy to understand. Thanks a lot
@meryemLux 1 year ago
Thanks, that was really well explained :)
@LLhawksley 5 months ago
best explanation of these 3 that I've seen
@bigstupidgrin 3 months ago
I don't know if I'll ever be ready for a Data Ocean...
@nikhilgoyal007 1 year ago
wow! thank you!!!
@ifechukwumaduabuchi6455 2 years ago
Thanks for this video
@harshalgavali 1 year ago
very helpful! thanks :)
@Shoto_UK 1 year ago
Super helpful thanks.
@kevinmcinturf8976 1 year ago
This is great!
@JustNavika.k 1 year ago
Awesome explanation. Thank you.
@ben_jammin242 1 month ago
Hi Alex, thank you for the concise and informative video. I was wondering if you do any online mentoring? Thanks
@dmitrya9435 1 year ago
Cool stuff, short and informative.
@amitavapaul118 2 years ago
very comprehensive video..Thanks!
@oscarparadajr 1 year ago
Clear cut explanation!!! Thank you!!!
@AlexTheAnalyst 1 year ago
Glad it was helpful!
@I_am_smooth_as_butter 1 year ago
Awesome explanation thanks
@lazyGirl014 1 year ago
Thanks for your video, it is really clear and helpful information.
@marieriokeme3010 1 month ago
Amazing explanation
@AnuEletu 6 months ago
You just saved me soo much time
@subhi_sadiyev 1 year ago
Best explanation ever 👍
@whoopinyou 1 year ago
Great video. Subscribed.
@mehmetkaya4330 1 year ago
Great explanation! Thanks
@AlexTheAnalyst 1 year ago
You're welcome!
@Studio-oy6cu 2 years ago
Such a good recommendation by youtube ! - Thnx Alex.
@AlexTheAnalyst 2 years ago
Glad to hear it!
@rajibroy1170 2 years ago
Like your Explanation Sir.
@drew315ful 2 years ago
Hi Alex... Great video... I have a question. Is it right to say that Hadoop or HDFS is a data lake?
@surfh3r0 7 months ago
I'm really interested in the data lakehouse. Are you considering explaining it too?
@codingstream572 2 years ago
You could have included some examples of each, that would be of great help!! And also maybe a real-life usage of these. I know you discussed it for database and data warehouse, but for data lake as well, in a little bit more detail. Data lake was very vaguely covered.
@andrij.demianczuk 2 years ago
Also have a look at Delta Lake. It’s about providing warehousing capabilities on a data lake, relying on a Hive metastore and Parquet normalization for column features. Delta Lake is OSS and becoming more widely supported :)
@danielblair1916 1 year ago
Like a Data Lake-house then? :)
@sammail96 3 months ago
Very great explanation
@is.3846 2 years ago
Alex always presents points in clear fashion.