Database Sharding and Partitioning

54,583 views

Arpit Bhayani

1 day ago

System Design for Beginners: arpitbhayani.me/sys-design
System Design for Experienced Engineers: arpitbhayani.me/masterclass
Become a member for exclusive in-depth videos: / arpitbhayani
Redis Internals: arpitbhayani.me/redis
Sharding and partitioning come in very handy when we want to scale our systems. These concepts operate on the database and help us improve the overall throughput and availability of the system.
In this video, we take a detailed look at how a database is scaled and evolved through different stages, what sharding and partitioning are, how they differ, at which stage we should introduce this complexity, and a few advantages and disadvantages of adopting them.
Outline:
00:00 Introduction and Agenda
03:05 How is a database progressively scaled?
08:10 Scaling beyond the limit of vertical scaling
11:57 Sharding vs Partitioning
12:43 Example of Data Partitioning
17:15 Sharding and Partitioning together
20:20 Advantages and Disadvantages of Sharding and Partitioning
Arpit's System Design Masterclass
I teach a course on System Design where you'll learn how to intuitively design scalable systems. The course will help you
- become a better engineer
- ace your technical discussions
- get acquainted with a massive spectrum of topics, ranging from storage engines and high-throughput systems to the super-clever algorithms behind them.
I have compressed my ~10 years of work experience into this course, and aim to accelerate your engineering growth 100x. To date, the course is trusted by 500+ engineers from 9 different countries and here you can find what they say about the course.
Together, we will build some of the most amazing systems and dissect them to understand the intricate details. You can find the week-by-week curriculum and topics, benefits, testimonials, and other information here
arpitbhayani.me/masterclass.
Other links
CS Engineering and Software Development books that I have read
arpitbhayani.me/bookshelf
Research papers that I have read
arpitbhayani.me/papershelf
Newsletter: arpit.substack.com
LinkedIn: / arpitbhayani
Twitter: / arpit_bhayani
Things I use to make these videos
Apple iPad Pro 12.9 Inch: amzn.to/3jb4keI
Apple Pencil 2nd Generation: amzn.to/3DJ1Gq8
Boat Airdopes 621: amzn.to/3DIqGO6
GoodNotes Notetaking App: www.goodnotes.com/
Until next time, stay awesome :)
Yours truly,
Arpit
arpitbhayani.me
#AsliEngineering

COMMENTS: 124
@Jamsessions0 3 days ago
One of the best explanations on the internet, well done sir.
@shishirchaurasiya7374 10 months ago
I was literally confused and struggling to get clarity until you came to the point where you turned the theory into understanding through tables and the references to SQL queries. Thanks a lot for your efforts and this lovely, beautiful explanation, Arpit sir.
@ranjithpals 1 year ago
Thanks a lot! That was a clear and concise explanation. Looking forward to enrolling in your complete system design course.
@jaskiratwalia 2 months ago
Wonderfully explained! Cleared all my doubts. Please keep making such videos. These are also well timed, neither too short nor too long.
@timamet 1 year ago
Amazing explanations, thank you.
@nimitkanani1691 1 year ago
Very beautifully and simply explained. The content of the video flowed so smoothly. Thank you @ArpitBhayani
@mohitkumartoshniwal 1 year ago
A very clear and detailed explanation. ♥️
@chaitanyawaikar382 1 year ago
One of the best videos explaining the nuances between partitioning and sharding. Thank you @ArpitBhayani
@___vandanagupta___ 1 year ago
The amount of knowledge in this video is tremendous!!! Extremely helpful 👍👍👍 Thank you sir!!
@kritibindra4232 1 year ago
Wow, this was really really helpful! Thank you for posting this. ✨
@zeyuli53 1 year ago
Well explained, thank you.
@jithinb7047 9 months ago
Awesome content Arpit! Thanks a lot, and please continue to post more on concepts as well as analysis of real use cases.
@neerajdixit7102 1 year ago
Awesome Arpit, thanks. Truly admire your way of teaching.
@AlokMehta24 8 months ago
Excellent video Arpit. Coming from no software or systems engineering background, this was the best video explaining data sharding and partitioning. I am a Tech PM for AWS Supply Chain, and data partitioning and sharding is a real deal for us. Thanks for making this extremely easy-to-understand video.
@Sharmasurajlive 1 year ago
Simple and efficient explanation 👍🏻
@AqibJavaid-zl7vc 29 days ago
Excellent video ❤. Finally, I got a good grasp of the whole concept.
@hanzalasiddique6313 1 year ago
Mind-blowing ❤
@vamsidharvemuluri3817 1 month ago
Best explanation so far. Thanks brother.
@nuclearniraj 8 months ago
One video and all the clutter on sharding and partitioning is clear. Thank you so much Arpit.
@aditijalaj5036 8 months ago
This is an amazing video and your explanations are very clear.
@kalinduabeysinghe8917 9 months ago
Such a clean explanation 🙌
@vijaymunavalli335 1 year ago
It's a very practical explanation... cool one.
@DEEPAKKUMAR-wk5pk 1 year ago
Wow, great explanation.
@letsexplorewithanika2642 1 year ago
Very clear explanation.
@KriszSch 1 month ago
Great explanation!
@sameer1571 4 months ago
Bro, your diagram example made my day. Such a clear and concise explanation of this topic. Bro, love you from the heart ❤❤ for making this video.
@iMakeYoutubeConfused 2 months ago
Very clear explanation, thanks!
@shintojoseph9166 1 year ago
Clear explanation
@jasper5016 2 months ago
Thanks so much Arpit!!
@prashantkamble898 9 months ago
Greatly explained
@anandahs6078 1 month ago
Very good explanation with the right examples. Hats off to you. Thanks for the great content. I always thought shards and partitions were the same, but you clarified it very well.
@KishoreThatavarthi 3 months ago
Thanks a lot Arpit sir, really enjoyed it and got full clarity.
@heykalyan 1 year ago
Kudos to you ❤
@kaal_bhairav_23 1 month ago
Thanks a lot Arpit for an awesome explanation, as always.
@varshard0 3 months ago
Thank you. I always assumed that they were the same thing. This cleared things up for me.
@nikhilrajput8696 1 month ago
Wow... really nice. Nowadays a lot of people are selling and talking about system design, and they always try to build some optimistic solution straight away without going into the internals; in fact, many have not even worked on a lot of systems. I strongly feel your way of explaining is very, very nice, and I am going to buy your system design plan to improve mine.
@AsliEngineering 1 month ago
Thanks. Looking forward to having you enrolled 🙌
@lazry1773 1 year ago
Dude, this was amazing.
@TechSpot56 1 month ago
Nice explanation, Arpit.
@akshayrahangdale8511 5 months ago
Very nice video, I just loved the explanation.
@dhaanaanjaay 1 year ago
One question: at 21:00 the matrix shows what it looks like when we have both sharding and partitioning. How is that different from having two databases on two different EC2 instances for two applications?
@pixiedustdreams 9 days ago
I think I'm in love with this guy. 😢
@ranjithpals 1 year ago
Thanks!
@PoojaDurgi 6 months ago
Amazing!!
@pramodpatil-ue8sm 6 months ago
Great explanation, as always. Please post a link if you have recorded any video on partitioning strategies.
@ryan-bo2xi 10 months ago
Very good, brother... outstanding.
@shreyanshsinha37 1 year ago
When we say Shard1 or Shard2, do we mean the SQL server hosted on the EC2 instance, taken together, as a shard?
@aneksingh4496 7 months ago
Super video Arpit
@anshujaiswal5622 3 days ago
Simple and to-the-point explanation. Thanks Arpit, liked & subscribed :)
@sarthaknarayan2159 1 year ago
Awesome!!!!
@pranjalchoudhury1670 3 months ago
Nicely explained. :)
@ankitmaheshwari2341 10 months ago
Do we use sharding when we have better options available, like Oracle RAC, where the database can be scaled horizontally?
@jivanmainali1742 2 years ago
Arpit sir, I need your help clarifying a few doubts. In an e-commerce platform like Shopify, either each merchant is given their own collections for orders, carts, and accounts, differentiated by some merchant identifier (projectId-order), or there is a single orders table indexed by the merchant identifier, i.e. projectId. So we can't apply sharding in the first case. Also, is it a wise idea to deploy each merchant's application separately, given that we would have to maintain each merchant app separately? What do you suggest in those cases?
@hemsagarpatel8992 1 year ago
If we have horizontal partitioning and one partition is getting a lot of traffic in real time, how can we load-balance the traffic? Is it possible?
@codecspy3479 5 months ago
Two important points which I felt could be discussed more: 1) When you said the choice of partitioning depends on the load, use case, and access patterns, can you please give an example of each case? 2) When you were talking about the advantages and disadvantages of sharding, did you write these points considering only sharding without partitioning, or considering both sharding and partitioning?
@amananurag07 6 days ago
@arpit Thanks for such dense information in such a short and simple video. However, I have a query on a corner case: how can we have replicas when there are multiple shards with partitioning? In this case, is replication local to the shard, or can data also be replicated on other shards for high availability across availability zones or for DR (like the Kafka architecture)?
@sumeetsingh1729 2 months ago
How is it decided which shard a request hits? Is there a router in front that routes the requests?
@user-dq8sg4ik5k 9 months ago
Literally one of the best videos I have ever seen on this topic.
@tawseefbhat977 1 year ago
How do we know which partition or shard our data is located in when we make a query? Any detailed explanation?
@vikasbhutra9400 2 years ago
Thanks a lot Arpit for explaining in such a simple way. One request: can you please make a video on sharding strategies, and also on how composite indexes are stored on disk?
@AsliEngineering 2 years ago
Soon.
@hc90919 1 year ago
@asli engineering - Brother, any update on the sharding strategies? Also, one more request: examples of scenarios that explain shard key selection. And how is the data replicated behind the scenes, please?
@rahulpanjwani1887 1 year ago
Beautiful
@rahulpanjwani1887 1 year ago
It makes you understand the value of a unified data platform team when scale increases.
@aditigupta6870 3 months ago
Hello Arpit, at 5:49, why did you mention that the new resources are allocated to the EC2 machine? I think they should be allocated to the DB server running on the EC2 machine, right?
@AsliEngineering 3 months ago
I meant the server running the database. The database is eventually running on some VM.
@aditigupta6870 3 months ago
@AsliEngineering thanks Arpit
@geekmuralin 8 months ago
Wow
@GaneshSrivatsavaGottipati 25 days ago
What if we have read replicas and still have partitioning?
@kritibindra4232 1 year ago
Also, which software did you use in this video to create pictures and write content?
@AsliEngineering 1 year ago
GoodNotes
@eatajerkpal99 1 month ago
Hey Arpit, can you drop a link to the notes that you presented in this video? Thanks!
@eatajerkpal99 1 month ago
Found them on your GitHub, I won't spam anymore. Thanks!!
@Bluesky-rn1mc 2 years ago
How are foreign key constraints managed when two tables are in different shards?
@AsliEngineering 2 years ago
Foreign keys are dropped when you adopt sharding. You cannot maintain FKs when data is partitioned across multiple shards.
@Bluesky-rn1mc 2 years ago
@AsliEngineering thanks
@abhigujjar7439 10 months ago
Can you please share the notes?
@ohmygosh6176 11 months ago
Cross-shard queries are very, very expensive. It's best to use tools to find out how the database is being used before making these decisions. I use the PG Analyzer tool for PostgreSQL.
@arbazadam3407 1 year ago
When you say we can have these partitions on the same server, that confuses me. On my Linux server I installed MySQL, which runs on port 3306. I have one MySQL process in this situation, so how can I spread the partitions on this server?
@AsliEngineering 1 year ago
Multiple databases within the same MySQL server.
@sachinjindal4921 2 years ago
Awesome, can you give some practical examples?
@AsliEngineering 2 years ago
These are as practical as they can get while keeping it generic and not touching upon the SRE side of things :) Every database comes with its own partitioning and sharding strategy, and we need to go through its documentation to apply it. I talked about using a database proxy to bifurcate the requests in one of the earlier videos, in case you are looking for that. I would recommend picking a database and seeing how you can actually create shards and manage them. Elasticsearch can be a great start.
@aditigupta6870 3 months ago
One shard must also have replicas, right? I mean, if a shard is handling the first 2 partitions, then all data from those first 2 partitions will go to this shard, but what if the shard is down?
@AsliEngineering 3 months ago
A shard can have replicas to scale the reads. If the shard goes down, then you either auto-promote a replica to take over, or take the downtime.
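The two options in the reply above (promote a replica, or take the downtime) can be sketched in a few lines. This is only a toy model: the `Node` and `Shard` classes, the `failover` method, and the node names are hypothetical, and a real failover also has to account for replication lag and for fencing off the old primary.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    name: str
    healthy: bool = True


@dataclass
class Shard:
    primary: Node
    replicas: List[Node] = field(default_factory=list)

    def failover(self) -> bool:
        """Promote the first healthy replica if the primary is down.

        Returns True if the shard has a healthy primary afterwards,
        False if no replica can take over (the shard takes the downtime).
        """
        if self.primary.healthy:
            return True
        for replica in self.replicas:
            if replica.healthy:
                self.replicas.remove(replica)  # safe: we return right after
                self.primary = replica
                return True
        return False


# A shard with one replica survives the loss of its primary.
shard = Shard(primary=Node("db-1"), replicas=[Node("db-1-replica")])
shard.primary.healthy = False  # simulate the primary going down
recovered = shard.failover()

# A shard without replicas has nothing to promote.
lonely = Shard(primary=Node("db-2", healthy=False))
lonely_recovered = lonely.failover()
```

Here `recovered` is true and the replica becomes the new primary, while `lonely_recovered` is false because there is nothing to promote.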
@dbads 1 year ago
💯
@imperfecto7734 8 months ago
@arpit What's the benefit of partitioning the data but not sharding it? Can you give me a use case please?
@AsliEngineering 8 months ago
Partitioning allows your database to read/access/move the required subset of data easily and efficiently.
1. Imagine you partition data by time and create one partition for every hour. If someone queries how many events happened in the last 10 hours, you would just need to access the last 10 partitions to fulfil this query. The others do not even need to be read.
2. In a distributed setup, instead of moving individual rows/elements, we can easily and efficiently move partitions across the cluster to balance the load.
@imperfecto7734 8 months ago
Understood! Thanks 🙏
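The hourly-partition example from the reply above can be illustrated with a small in-memory sketch. The `HourlyPartitionedEvents` class and its `partitions_read` counter are made up for this illustration; real databases do this pruning inside the query planner. The sketch assumes the query window starts on an hour boundary; an unaligned window would touch one extra partition.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def hour_bucket(ts: datetime) -> datetime:
    """Return the start of the hour that contains ts (the partition key)."""
    return ts.replace(minute=0, second=0, microsecond=0)


class HourlyPartitionedEvents:
    def __init__(self):
        self.partitions = defaultdict(list)  # hour start -> list of (ts, payload)
        self.partitions_read = 0             # how many partitions queries have touched

    def insert(self, ts: datetime, payload: str) -> None:
        self.partitions[hour_bucket(ts)].append((ts, payload))

    def count_between(self, start: datetime, end: datetime) -> int:
        """Count events with start <= ts < end, reading only overlapping partitions."""
        total = 0
        bucket = hour_bucket(start)
        while bucket < end:
            if bucket in self.partitions:  # membership test does not create a partition
                self.partitions_read += 1
                total += sum(1 for ts, _ in self.partitions[bucket] if start <= ts < end)
            bucket += timedelta(hours=1)
        return total


# One event per hour for 48 hours, each at half past the hour.
store = HourlyPartitionedEvents()
first = datetime(2024, 1, 1, 0, 30)
for i in range(48):
    store.insert(first + timedelta(hours=i), f"event-{i}")

now = datetime(2024, 1, 3, 0, 0)
last_10_hours = store.count_between(now - timedelta(hours=10), now)
```

With 48 hourly partitions stored, the query for the last 10 hours reads 10 of them and never looks at the other 38.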
@gigachad400 11 months ago
One of the biggest disadvantages of sharding a SQL database is that you lose ACID guarantees, so you have to be careful while doing it with SQL databases.
@ankitmaheshwari2341 10 months ago
I think that's not true.
@ManojYadav-ls6wo 26 days ago
12:10 20:12 👍👍
@jineshbagrecha6278 1 year ago
When should we use master-master and master-candidate master replication?
@AsliEngineering 1 year ago
Master-master: scaling writes beyond one machine. Master-replica: scaling reads.
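To make the master-replica half of that answer concrete, here is a minimal read/write splitting sketch. `ReplicatedRouter` and the server names are hypothetical, and detecting reads by a leading `SELECT` is a deliberate simplification (it mishandles statements such as `SELECT ... FOR UPDATE`). Replicas may also lag behind the master, so reads routed this way can be slightly stale.

```python
import itertools


class ReplicatedRouter:
    """Send writes to the master and spread reads across replicas (round-robin)."""

    def __init__(self, master, replicas):
        self.master = master
        # cycle() yields the replicas in order, forever; None means "no replicas".
        self._replicas = itertools.cycle(replicas) if replicas else None

    def route(self, sql: str) -> str:
        is_read = sql.lstrip().upper().startswith("SELECT")
        if is_read and self._replicas is not None:
            return next(self._replicas)
        return self.master


router = ReplicatedRouter("master", ["replica-1", "replica-2"])
targets = [
    router.route("SELECT * FROM orders WHERE id = 1"),
    router.route("SELECT * FROM orders WHERE id = 2"),
    router.route("INSERT INTO orders (id) VALUES (3)"),
    router.route("SELECT * FROM orders WHERE id = 3"),
]
```

Adding replicas raises read capacity only; every write still lands on the single master, which is the limit that master-master setups or sharding try to remove.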
@GaganJain2508 9 months ago
Does it mean sharding and replication are the same? 22:16
@aadimanchekar1032 1 year ago
How do we know which partition the data lies in?
@AsliEngineering 1 year ago
That's the partitioning strategy.
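One common partitioning strategy is hash-based: hash the partition key, take it modulo the number of partitions, then look up which shard (database server) holds that partition. The sketch below assumes 8 partitions split across two shards; `NUM_PARTITIONS`, `PARTITION_TO_SHARD`, and the function names are illustrative and not taken from the video.

```python
import hashlib

NUM_PARTITIONS = 8  # assumed to be fixed up front

# Hypothetical placement: partitions 0-3 on shard-1, partitions 4-7 on shard-2.
PARTITION_TO_SHARD = {
    p: ("shard-1" if p < 4 else "shard-2") for p in range(NUM_PARTITIONS)
}


def partition_for(key: str) -> int:
    """Hash-based strategy: stable hash of the key, modulo the partition count."""
    # hashlib is used instead of the built-in hash(), which is salted per
    # process for strings and would route the same key differently after a restart.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS


def shard_for(key: str) -> str:
    """Resolve the shard that holds the partition for this key."""
    return PARTITION_TO_SHARD[partition_for(key)]


target = shard_for("user:42")
```

A router or database proxy in front of the shards would run this lookup for every request. Range-based strategies replace the hash with a comparison against key ranges but keep the same two-step mapping.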
@sachthecool 1 year ago
Hi Arpit... You have nice videos. I like the interviews with people involved in growing high-scale systems. However, the concept explained in this video is wrong. Partitions and shards are the same (the terms are used interchangeably). What you are referring to as a shard is a node (or host container). You may want to correct this. Hope this helps.
@AsliEngineering 1 year ago
I agree the terms are used interchangeably, but overall what I explained is correct, and I clarified this in the video as well.
@shrad6611 6 months ago
Finally I understand what sharding is, thanks a ton.
@anupkut 4 months ago
I think we should not consider read replicas alone as a sharding concept.
@iHariPatel 6 months ago
In my view, partitioning is more complex because you have to work with the partition key! A wrong query can accidentally scan all partitions.
@mudassarh4268 2 years ago
Sharding strategies such as range-based and hash-based sharding could have been covered, along with their use cases.
@AsliEngineering 2 years ago
Sir, the video would have been too long. No one would have watched it. But I am definitely planning it for the next one.
@mudassarh4268 2 years ago
Definitely sirji, that could have added another 30 mins of content. Awesome content as always, and looking forward to further stuff 👍
@pranavnadimpalli4929 1 year ago
22:34 cross-shard queries are expensive
@kumarshubham4640 1 month ago
Why did the course price go up by 20k in 1 year?
@AsliEngineering 1 month ago
In 2 years, not one. The course has changed completely: I go much more in-depth, and the sessions now run for 4 hours each. Earlier they used to be 2.5.
@akshatreddy9870 2 months ago
Hi
@abhishekdhillon7110 5 months ago
Dude, the way you have explained higher availability as an advantage of sharding is not right. When you have a sharded DB and the shards live on different servers, if one of the shards goes down, availability is not an advantage, since you can't perform any operations on the shard that is unavailable. For example, if you have two shards named A and B and shard A is down, you can't read anything from it, so all of the queries that are expected to read from shard A would fail unless you have a read replica of that shard. I feel that there is a better way to explain it. However, thanks for all your efforts; your content is helpful to a large extent.
@AsliEngineering 5 months ago
Yes, we cannot perform operations on that shard, but we can still serve the requests that can be served from the other shards. Hence the system still remains partially available.
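The "partially available" point in this exchange can be shown with a toy two-shard store. Everything here is an assumption made for illustration: the `ShardedStore` class, the `ShardUnavailable` exception, and the range rule that sends keys starting with a to m to shard A and the rest to shard B.

```python
class ShardUnavailable(Exception):
    """Raised when a request needs a shard that is currently down."""


class ShardedStore:
    def __init__(self):
        self.data = {"A": {}, "B": {}}  # shard name -> key/value pairs
        self.down = set()               # names of shards that are down

    def _shard_for(self, key: str) -> str:
        # Toy range-based strategy: keys starting with a-m go to A, the rest to B.
        return "A" if key[:1].lower() <= "m" else "B"

    def put(self, key: str, value) -> None:
        shard = self._shard_for(key)
        if shard in self.down:
            raise ShardUnavailable(shard)
        self.data[shard][key] = value

    def get(self, key: str):
        shard = self._shard_for(key)
        if shard in self.down:
            raise ShardUnavailable(shard)
        return self.data[shard].get(key)


store = ShardedStore()
store.put("alice", 1)  # lands on shard A
store.put("zoe", 2)    # lands on shard B

store.down.add("A")    # simulate shard A going down

zoe = store.get("zoe")  # still served by shard B
try:
    store.get("alice")
    alice_failed = False
except ShardUnavailable:
    alice_failed = True
```

Requests for keys on shard B keep succeeding while shard A is down, which is the partial availability the reply describes. Requests for shard A fail until it recovers or a replica takes over, which is the commenter's point.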
@arun10071990 5 months ago
I think sharding has specific use cases; not every solution requires sharding. The way he arrives at the sharding solution is totally absurd. If one really wants to scale the writes, one can also upscale the master DB servers. Why shard then?
@AsliEngineering 5 months ago
When did I not consider vertical scaling?
@arun10071990 5 months ago
@AsliEngineering It's not about vertical scaling. It's that we can scale the database horizontally, and that too without using sharding, for example with multiple master servers for writes and multiple slave servers to handle reads.
@sharoonaustin551 1 year ago
Small suggestion: please don't put ads in the middle, bro, it breaks concentration.
@AsliEngineering 1 year ago
UKposts puts them in. I just enable them. It is up to their algorithm to decide where to place them.
@AsliEngineering 1 year ago
And I totally understand your frustration with ads, but the world runs on them. Can't do much without them.
@luisdanielmesa 7 months ago
We both worked for Amazon and you know nobody there would have taken this course... So you're either lying or... nah, you're lying.
@AsliEngineering 7 months ago
15 SDE-2s, 3 SDE-3s, 1 PE and 1 HoE took my course. If you do not want to believe it, that is up to you.
@AsliEngineering 7 months ago
Fun fact: after I replied to your comment I went on a 1:1 call, and it was with an SDE-2 at Amazon working in the CCF org :D
@jose000 2 years ago
Iio
@akshatreddy9870 2 months ago
Very bad. Hindus never shave off the moustache while keeping a beard. Do you intend to become a Muslim? Please understand that you are Sanatani.
@akshatreddy9870 2 months ago
Either shave both beard and moustache or keep both moustache and beard. Don't shave only the moustache and keep the beard.
@iMakeYoutubeConfused 2 months ago
He's put so much effort into the content of this video and this is all you've got to say?
@amogu_07 2 months ago
Thank you so much, clearly understood!!