Solving real world data science tasks with Python Pandas!

  Views: 1,524,056

Keith Galli

1 day ago

Practice your Python Pandas data science skills with problems on StrataScratch!
stratascratch.com/?via=keith
In this video we use Python Pandas & Python Matplotlib to analyze and answer business questions about 12 months' worth of sales data. The data contains hundreds of thousands of electronics store purchases broken down by month, product type, cost, purchase address, etc.
Setup!
Github source code & data: github.com/KeithGalli/Pandas-...
Installing Jupyter Notebook: jupyter.readthedocs.io/en/lat...
Installing Pandas library: pandas.pydata.org/pandas-docs...
Check out the first video I did on Pandas:
• Complete Python Pandas...
Check out the videos I did on Matplotlib:
• Intro to Data Visualiz...
• Python Plotting Tutori...
Detailed video description! (timeline can be found in comments)
We start by cleaning our data. Tasks during this section include:
- Drop NaN values from DataFrame
- Removing rows based on a condition
- Change the type of columns (to_numeric, to_datetime, astype)
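The three cleaning tasks above can be sketched roughly as follows. This is a minimal illustration, not the notebook from the video: the tiny `raw` DataFrame and its values are made up, and the column names are assumed to match the sales data.

```python
import pandas as pd

# Hypothetical miniature of the sales data (values are made up for illustration).
raw = pd.DataFrame({
    "Order Date": ["04/19/19 08:46", None, "Order Date", "04/07/19 22:30"],
    "Quantity Ordered": ["2", None, "Quantity Ordered", "1"],
    "Price Each": ["11.95", None, "Price Each", "99.99"],
})

# 1. Drop rows in which every value is NaN.
cleaned = raw.dropna(how="all")

# 2. Remove rows based on a condition (here: header text that ended up inside the data).
cleaned = cleaned[cleaned["Order Date"] != "Order Date"].copy()

# 3. Change the column types with to_numeric, astype and to_datetime.
cleaned["Quantity Ordered"] = pd.to_numeric(cleaned["Quantity Ordered"])
cleaned["Price Each"] = cleaned["Price Each"].astype(float)
cleaned["Order Date"] = pd.to_datetime(cleaned["Order Date"], format="%m/%d/%y %H:%M")

print(cleaned.dtypes)
```

The `.copy()` after filtering is a design choice: it makes `cleaned` an independent DataFrame, so the later column assignments do not touch `raw`.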
Once we have cleaned up our data a bit, we move to the data exploration section. In this section we explore 5 high level business questions related to our data:
- What was the best month for sales? How much was earned that month?
- What city sold the most product?
- What time should we display advertisements to maximize the likelihood of customers buying product?
- What products are most often sold together?
- What product sold the most? Why do you think it sold the most?
To answer these questions we walk through many different pandas & matplotlib methods. They include:
- Concatenating multiple csvs together to create a new DataFrame (pd.concat)
- Adding columns
- Parsing cells as strings to make new columns (.str)
- Using the .apply() method
- Using groupby to perform aggregate analysis
- Plotting bar charts and line graphs to visualize our results
- Labeling our graphs
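As a rough sketch of how several of these methods fit together (pd.concat, adding columns, .str, .apply() and groupby), here is a small self-contained example. The two in-memory "files" and their rows are invented stand-ins for the monthly CSVs, and `get_city` is a hypothetical helper; plotting is left out.

```python
import io
import pandas as pd

# Two in-memory "files" stand in for the monthly CSVs (contents are made up).
header = "Product,Quantity Ordered,Price Each,Order Date,Purchase Address\n"
april = io.StringIO(
    header
    + 'USB-C Charging Cable,2,11.95,04/19/19 08:46,"917 1st St, Dallas, TX 75001"\n'
)
may = io.StringIO(
    header
    + 'Wired Headphones,1,11.99,05/03/19 12:10,"333 8th St, Los Angeles, CA 90001"\n'
    + 'iPhone,1,700.00,05/09/19 19:02,"12 Elm St, Dallas, TX 75001"\n'
)

# Concatenate several CSVs into one DataFrame.
all_data = pd.concat([pd.read_csv(f) for f in (april, may)], ignore_index=True)

# Parse cells as strings to make a new column (.str).
all_data["Month"] = all_data["Order Date"].str[0:2].astype("int32")

# Add a column computed from two others.
all_data["Sales"] = all_data["Quantity Ordered"] * all_data["Price Each"]


def get_city(address):
    """Hypothetical helper: return the city from 'street, city, state zip'."""
    return address.split(",")[1].strip()


# Use .apply() to run the helper on every cell of a column.
all_data["City"] = all_data["Purchase Address"].apply(get_city)

# Use groupby to aggregate.
monthly_sales = all_data.groupby("Month")["Sales"].sum()
city_sales = all_data.groupby("City")["Sales"].sum()

print(monthly_sales)
print(city_sales)
```

Selecting the "Sales" column before calling `sum()` keeps the aggregation to the one numeric column of interest instead of summing every column.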
If you enjoy this video, make sure to leave it a like and subscribe to not miss any future similar tutorials :).
Check out the new "solving real world data science tasks" video I posted!
• Solving real world dat...
---------------------------------------------
Follow me on social media!
Instagram | / keithgalli
Twitter | / keithgalli
---------------------------------------------
Video Timeline!
0:00 - Intro
1:22 - Downloading the Data
2:57 - Getting started with the code (Jupyter Notebook)
Task #1: Merging 12 csvs into a single dataframe (3:35)
4:25 - Read single CSV file
5:44 - List all files in a directory
7:06 - Concatenating files
11:00 - Reading in Updated dataframe
Task #2: Add a Month column (12:48)
14:12 - Parse string in Pandas cell (.str)
Cleaning our data!
17:31 - Drop NaN values from df
21:25 - Remove rows based on condition
Task #3: Add a sales column (24:58)
25:58 - Another way to convert a column to numeric (ints & floats)
Question #1: What was the best month for sales? (29:20)
30:35 - Visualizing our results with bar chart in matplotlib
Question #2: What city sold the most product? (34:17)
35:32 - Add a city column
36:10 - Using the .apply() method (super useful!!)
40:35 - Why do we use the lambda x?
40:57 - Dropping a column
46:45 - Answering the question (using groupby)
47:34 - Plotting our results
Question #3: What time should we display advertisements to maximize the likelihood of purchases? (52:13)
53:16 - Using to_datetime() method
56:01 - Creating hour & minute columns
58:17 - Matplotlib line graph to plot our results
1:00:15 - Interpreting our results
Question #4: What products are most often sold together? (1:02:17)
1:03:31 - Finding duplicate values in our DataFrame
1:05:43 - Use transform() method to join values from two rows into a single row
1:08:00 - Dropping rows with duplicate values
1:09:39 - Counting pairs of products (itertools, collections)
Question #5: What product sold the most? Why do you think it did? (1:14:04)
1:15:28 - Graphing data
1:18:41 - Overlaying a second Y-axis on existing chart
1:23:41 - Interpreting our results
---------------------
If you are curious to learn how I make my tutorials, check out this video: • How to Make a High Qua...
Join the Python Army to get access to perks!
UKposts - / @keithgalli
Patreon - / keithgalli
*I use affiliate links on the products that I recommend. I may earn a purchase commission or a referral bonus from the usage of these links.

COMMENTS: 1,700
@KeithGalli 3 years ago
Posted a new "Solving real world data science tasks" video! Check it out here: ukposts.info/have/v-deo/faeYrWN-cJmew5s.html
@Trazynn 3 years ago
This is awesome. Learning Python is so much easier when there's something tangible and grounded to work towards.
@colorways518 3 years ago
Hi Keith!!! I am getting an error after this line.
CODE:
for file in files:
    current_data = pd.read_csv(path + "/" + file)
ERROR:
ParserError: Error tokenizing data. C error: Expected 1 fields in line 4, saw 2
Please can you help me solve this error? I tried to find a solution online but didn't find any.
@larrywang1983 3 years ago
@@colorways518 Just thinking out loud: aren't we able to find the kind of info below from Amazon tools like Jungle Scout, Helium10, or Sellics? We are Amazon sellers; do we also need to go through Python and data science for Amazon? There are third-party SaaS plug-ins that answer these questions. Correct me if I am wrong. - What was the best month for sales? How much was earned that month?
@ismaeelaileru4612 3 years ago
For the problem of finding the city with the highest sales, we ran into an ordering problem while plotting the cities. I think we can also use result.index as our xticks. That way it takes the values straight from the DataFrame in the right order, rather than using df.unique and rearranging.
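The ordering problem mentioned here can be shown with a small sketch. The data is made up, and `result` is assumed to be the grouped sum from the video (a Series here for brevity): `unique()` returns cities in order of first appearance, while the groupby result is sorted by city, so only the index lines up with the summed values.

```python
import pandas as pd

# Made-up sales rows; "City" and "Sales" are assumed column names.
all_data = pd.DataFrame({
    "City": ["Dallas", "Austin", "Dallas", "Boston"],
    "Sales": [100.0, 50.0, 25.0, 75.0],
})

result = all_data.groupby("City")["Sales"].sum()

first_seen = list(all_data["City"].unique())  # order of first appearance in the data
grouped = list(result.index)                  # sorted order, aligned with result's values

print(first_seen)
print(grouped)
print(list(result))

# With matplotlib, the aligned labels could then be used as
# plt.bar(result.index, result) and plt.xticks(result.index, rotation="vertical").
```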
@rodrigodasilva9176 3 years ago
This red warning is displayed because you didn't make a copy of the original DataFrame; do that and the warning goes away.
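A minimal sketch of the copy-first approach described above, assuming the warning in question is pandas' SettingWithCopyWarning raised when a column is assigned on a filtered slice (the video's warning may have another cause). The data is made up.

```python
import pandas as pd

# Made-up rows, including one where the header text appears inside the data.
all_data = pd.DataFrame({
    "Order Date": ["04/19/19 08:46", "Order Date"],
    "Price Each": ["11.95", "Price Each"],
})

# Filtering returns a subset of all_data; .copy() makes it an independent DataFrame,
# so assigning a column below does not depend on the original.
filtered = all_data[all_data["Order Date"] != "Order Date"].copy()
filtered["Price Each"] = pd.to_numeric(filtered["Price Each"])

print(filtered)
```

The original `all_data` keeps its string values; only the copy is converted.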
@billyjorrosh9394 3 years ago
"I don't know how to do it, but I know how to google it." This guy knows how things go in the real world haha
@thanhnhando3070 3 years ago
Googling is, indeed, one of the most important skills for coding.
@indexima6517 3 years ago
Hahaha! We invite you to take a look at our videos which deal with the same topics :)
@carlurbananimals 3 years ago
He's very fast too, like I would need to know it, 'cause once I go to Google I'm there for 4 hours :/
@samirvinchurkar8226 2 years ago
I did the exact same process be it R, Matlab or Py
@samirvinchurkar8226 2 years ago
@@carlurbananimals that's coz your question isn't exactly right ;)
@justapugontheinternet 1 year ago
As a programmer/data analyst/systems administrator I can safely say that this is exactly how we solve problems in real life. Good job!
@pasha7293 1 year ago
You wouldn't have watched this video if you were
@justapugontheinternet 1 year ago
@Pasha people who think they know it all are a bore. 🙄 You could always learn something new from other people, it never hurts to learn new perspectives. Good luck with that mindset. I learn everyday. 😌
@saugatjarif8272 11 months ago
@@justapugontheinternet love your mindset on🎉🎉🎉🎉
@terrymaverick580 3 years ago
The best part was watching someone google the answer and seeing how they implement the solution instead of just acting like they know everything. Man, your tutorials are the best and down to earth.
@Amir-tv4nn 1 year ago
Hahahahaaha, you think this kid knows what he is doing? For your information, we all google, no matter what position we hold. 🤣 We built websites for a reason: to always look back at them when needed. Google provides a faster search than going to the source and looking through it. Get your mind straight about Google 🤣 This kid is clearly looking around for code he has already written, and you are assuming that googling is a bad example for a programmer 😂 That tells me you are expecting movie-type hackers, hahahaahahaha. Come to reality.
@dragonmateX 1 year ago
It honestly makes it feel more real, like, I am studying data science now and I google stuff all the time, the fact that even someone well versed in data science still googles stuff constantly is reassuring.
@Amir-tv4nn 1 year ago
@@dragonmateX People who work at Google google stuff 😂 Get back to reality about what Google is meant for 🤣
@buak809 1 year ago
@@Amir-tv4nn and? what the fuck is your problem? so far you didn't write anything valuable here
@Diabolic9595 1 year ago
@@Amir-tv4nn Come to reality. Man, come to reality. Could you please come to reality? Btw you should come to reality
@olajiireolajide 2 years ago
Love how realistic and down to earth all your videos are! Makes data analysis way more approachable. What a guy!
@helmialfath9897 4 years ago
This situation is so realistic. The mistakes, the solving... great video!
@Pidamoussouma 4 years ago
Yes, liked it... it was so realistic
@user-ok5xb8sf3q 3 years ago
is this sarcasm?
@ipshie 3 years ago
Юрій Черній pretty sure no it's not
@billyjorrosh9394 3 years ago
Not only does he teach us about pandas, he also gives us the confidence that "if this guy can be so successful in data science, then why shouldn't I?"
@89DerChristian 2 years ago
@@user-ok5xb8sf3q no
@ujjawaljani6731 3 years ago
He is like my friend who teaches one day before exams. 😂😅
@sushiplatter5540 2 years ago
Keith, you're literally the most underrated and one of the best teachers on YouTube. This exercise cleared most of my doubts about data science and I fell in love with it because of you. Thank you so much for this, you're the best!
@H99x2 2 years ago
Dude, this is by far one of the best real-life tutorials on YT. Subbed for more like this!
@devmrin 4 years ago
Hands down one of the most useful I've seen. Insights galore. Thank you!
@Magmatic91 3 years ago
I love how this guy is explaining, I really enjoyed learning from you.
@karimkhatib8569 3 years ago
Really interesting to go through the entire process, including looking up solutions and solving errors!
@imdadood5705 3 years ago
Thank you, Keith. I haven’t got enough words to thank you for this work. This is a great project for a beginner. Thanks again! 😊
@user-ci1oj3xo6h 4 years ago
Content of this quality deserves far more recognition. Thank you!
@mid_paulownia 3 years ago
This is the most practical Python tutorial video I've ever watched.
@akosasuke5128 1 year ago
I get the feeling in this video that you know more than you're letting on but you're just trying to make things as basic as possible and I love it. I hope to teach others in this same manner. God bless you
@hoiying-chan 3 years ago
Your assignments are harder than Coursera's. I'm actually learning something. Major thanks all the way from Holland! 🙏
@Random_dudebro 4 years ago
I just finished your two videos demonstrating numpy and pandas, finally feeling a good grasp of python basics (y) Thank you for everything you do!
@aphotos2284 4 years ago
This is one of the best videos out there. Please do more of these. It's great to learn about the mindset as well as the technique!
@abdulqadirtinwala1296 3 years ago
Dude, literally I have never seen anyone solving real-world problems on YouTube. Your way of teaching is quite impressive. Many YouTubers just showcase basic problems. But hats off to you!!!
@mikeyu6347 8 months ago
I was absolutely blown away by the fantastic lectures. The best teacher I've ever had!
@francescofaccia 4 years ago
Hi Keith, you're great! Thanks to you we can be introduced to a hell of a lot of useful pandas tools! Keep up the good work!
@DarshanMalu 4 years ago
You are awesome! Thanks for patiently explaining everything, also teaching how to google what you want! Thanks man!
@deeplysuperficial8132 2 years ago
By far one of the best tutorials I've seen in a long time. I'll be watching all your content. You explain things in a way that I'm able to perfectly keep up with.
@jeffmiller7010 2 years ago
I enjoyed working through this real world data analysis problem with you. I look forward to more, please do more problems like this. It helps me to work out problems in Python.
@royvivat113 3 years ago
This is the most informative video I've ever seen on what data science actually is! I keep looking for actual applications and I loved seeing your thought process, comments, and method of asking and answering questions.
@rezap1356 4 years ago
The best graph type for correlation is 'scatter graph', looks like a constellation. Great video Keith. Thanks.
@Jordanptheone 2 months ago
Watching this 4 years after you published it, and you're still a legend ! Thank you !!!
@KeithGalli 2 months ago
Thank you for watching and the kind words!!
@matty5ps444 1 year ago
Just to add to what most people are saying: this is, in my opinion, the best way to do a tutorial. You showed me that even though I'm a super beginner, not long out of learning basic Python, I'm able to pick something up really easily, while realising that I don't have to feel bad thinking everyone else is better than me, and that even experienced programmers google stuff and are not gods sitting on pedestals acting like they are better than us haha. Great work.
@rafacardenas8783 4 years ago
Great job Keith! Keep up the walk-through-style tutorials; hands-on is the best, and even better when you have the feedback.
@yaswanthfinds 4 years ago
So nice, I was searching for this kind of tutorial. It has real-time mistakes and solutions. I hope you do this kind of video regularly.
@chineduezeofor2481 3 years ago
Thank you so much for this Keith! Beginners like me appreciate this a lot.
@stefanlasek3256 3 years ago
Honestly, one of the best videos I have seen. From mistakes to how to look for answers to little tips & tricks. You have got a new subscriber in me.
@ijbarraza 4 years ago
As a new learner of python I found this to be one of the best videos on youtube for beginners. How he managed to deal with the problems and solve them on the go (not knowing it all, but knowing how to consult google for the right answer). Way to go! Loved the approach and how easy you made it look
@anthonygonsalvis121 3 years ago
Love how this cool dude researches solutions on the fly and explains things as he goes even when he commits minor unforced errors. He is so relatable. His other tutorials on Pandas, Numpy, Matplotlib, etc. are equally helpful. I wish him all the success and hope that he continues to share his knowledge for decades to come.
@chineduezeofor2481 3 years ago
He's such a GREAT tutor!!!
@indrajeetsinghyadav876 2 years ago
Agreed, totally relatable and helpful videos for beginners, giving them a chance to learn which errors can result from which syntax mistakes. Thanks for the informative guide.
@muskankaushik5628 3 years ago
This was a great video. You covered a lot of pandas and also showed real work, which includes learning by making mistakes and looking things up. Exactly what I was looking for. Thanks a lot!!
@manhaabdellah2682 1 year ago
I'm new to data analysis. My instructor always tells us to search our questions on Google and get help from Stack Overflow. I didn't understand it till now, and got stuck on my second project, a sales analysis. This helped me big time!!! I'm so thankful to you for showing all those shortcuts. The date-time split had such long, tricky code online.
@Account-fi1cu 3 years ago
Great tutorial! Thank you for sharing. At 50:26, for the cities: you can always use the index values from the 'results' DF, cities = results.index.values, instead of a for loop.
@dawnfantasy 4 years ago
50:47 cities = result.Sales.keys() works as expected. great tutorial, tks!
@user-jw5tk2ef2f 6 months ago
This was the best python tutorial video I have ever watched! Thank you for taking the time to go into depth about the process of data science. You're awesome!!
@TanayaAmar 3 years ago
Loved this video - especially the real-world approach! Please keep creating more such content! Thank you so much!!
@SaulOjeda 3 years ago
This video was amazing, I can't believe I actually sat through the whole thing past my bedtime
@exploringwithdave5926 2 years ago
If you are a coder, there is no such thing as "bedtime". Just, awake, and not awake.
@FrancisBaconthe3rd 3 years ago
Didn't watch more than a few minutes since I already know how to do most of this stuff but loved how the dude straight up tells us to google it. SO TRUE!!! I've had professors who tell me the same thing. Thumbs up.
@Jack-xy4fy 3 years ago
fantastic video, thank you so much! showing your mistakes and working out the solutions is absolute gold, that is something many other tutorials are missing.
@sathirasilva4958 2 years ago
Great tutorial! 55:00 When parsing a column into datetime, specifying the format manually will decrease the execution time significantly: all_data['Order Date'] = pd.to_datetime(all_data['Order Date'], format='%m/%d/%y %H:%M')
@rotan90 1 year ago
on google colab it was like 30 sec vs 2 sec. Great tip !
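A small sketch of the tip in this thread: passing an explicit `format` to `pd.to_datetime` so pandas does not have to infer the layout. The two sample strings are made up in the `%m/%d/%y %H:%M` layout quoted above; no timing claim is made here.

```python
import pandas as pd

# Made-up order timestamps in the month/day/two-digit-year layout.
order_dates = pd.Series(["04/19/19 08:46", "12/30/19 00:01"])

# An explicit format tells pandas exactly how to read each string.
parsed = pd.to_datetime(order_dates, format="%m/%d/%y %H:%M")

# Once parsed, the .dt accessor exposes parts such as hour and minute.
hours = parsed.dt.hour
minutes = parsed.dt.minute

print(parsed)
print(hours.tolist(), minutes.tolist())
```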
@edric7552 1 year ago
Hi Keith, I feel obligated to personally thank everyone that helps in pursuing my data career and of course, you included. I've used your project (and learned a LOT) and modify/add codes here and there with my own styling for my online portfolio. Moreover, you're a fantastic teacher and you deserve all the credits you should get for helping others like me. Thank you for doing this, may God return the favor and always bless you. Rock on Keith!
@KeithGalli 1 year ago
Thank you so much for the kind words! :)
@kelvingitari 1 year ago
Best data analysis video I have watched so far! I also love how most people in the comment sections have outlined alternative ways of approaching some of the tasks.
@rafaelmachado7666 1 year ago
Amazing video ! All the mistakes and the searching process make the beginners in data science realize that it's possible to do a lot of things since the start of the journey. Thanks
@oluwadamilaretijani1777 1 year ago
Your courses are very great as you delve into practical content. Your course helped me to pass data analysis test in Turing. Thank you so much
@akosasuke5128 1 year ago
Congrats oludamire, I'm guessing you're a Nigerian. I'm a Nigerian too and recently got into Exploratory Data Analysis through the udacity Nanodegree program. I'm currently on my second project which is an Investigation of WeRateDogs Twitter dataset. I think I have learnt a thing or two so far. Do you think I'm ready for Turin?..i hear it's like going to the big leagues lol.
@kyledawes9593 3 years ago
As a business major with very limited internship experience, I am teaching myself python and data analytics from scratch. This video is literal gold to me because this is one of the few that actually shows the entire wrangling process! Thanks for the great vid!
@vilw4739 2 years ago
If I use only fd=pd.read_csv("./Sales_Data/Sales_April_2019.csv") I get a file-not-found error. I have to use the whole path starting from the C drive. How does he not get the error?
@ashiksrinivas 2 years ago
@@vilw4739 He is using Jupyter Notebook, where a relative path like "./Sales_Data/Sales_April_2019.csv" is resolved against the folder the notebook is running in, so fd=pd.read_csv("./Sales_Data/Sales_April_2019.csv") works as long as the Sales_Data folder sits next to the notebook. In a local Python IDE like PyCharm or VS Code the working directory may be a different folder, in which case you need to specify the whole path, like fd=pd.read_csv("C:/Data Science/Sales_Data/Sales_April_2019.csv"), to import the file.
@vilw4739 2 years ago
@@ashiksrinivas thank you
@muhsintabatabayee8592 1 year ago
@@vilw4739 did you ever figure it out? getting the same error
@vilw4739 1 year ago
@@muhsintabatabayee8592 They should be in the same folder. Otherwise you need to put the whole path.
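One way to investigate a file-not-found error like the one in this thread is to print the current working directory and see what a relative path resolves to. This is a generic sketch; the folder and file names are taken from the comments above and may not exist on your machine.

```python
import os
from pathlib import Path

# Relative paths are resolved against the current working directory.
print("Working directory:", os.getcwd())

# Relative path as used in the comments above (may not exist on your machine).
data_file = Path("Sales_Data") / "Sales_April_2019.csv"

# resolve() shows the absolute location Python would actually look in.
absolute_path = data_file.resolve()
print("Resolves to:", absolute_path)
print("Exists:", data_file.exists())
```

If "Exists" prints False, either move the data next to the script or notebook, change the working directory, or pass the absolute path to `pd.read_csv`.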
@dp6736 1 year ago
Hi Keith, Even after three years, this video is very useful. You are very good at explaining the concepts. Thank you very much
@mikshubhatt1175 3 years ago
This is really an example of real world data analysis. Appreciate your efforts.
@jenn6997 4 years ago
You are always so passionate and enthusiastic even if there're errors haha :) Love your positive attitude! Look forward to more great videos!! :)
@masthanjinostra2981 3 years ago
I get tensed like in hell..
@geekyprogrammer4831 3 years ago
he purposely introduced those errors for us to have real-life problem-solving experience :)
@ANKITRAJ-fe8dh 4 years ago
Heyy, machine learning would be awesome
@luuminhvuong 4 years ago
I have very big data in xlsx format. Reading the Excel file takes like forever...
@mberoakoko24 4 years ago
I am on holiday and have started data science for fun to see what the buzz is all about. I have to say I love it, and I would appreciate it if you'd upload more videos like this. I have learnt a TON
@kulpreetsingh9064 4 years ago
Hey man, are you gonna do more such videos anytime soon?
@mohammedyounis7207 4 years ago
Thank you so much, it is very useful to me
@GunHolsters 2 years ago
i really appreciate your approach to these tutorials. Allowing the problem to drive the programming solution (while learning some of it on the fly) is how i do most everything.
@TheMaltesemania 3 years ago
I feel like I struck gold with this video. It's helping me learn a lot quicker than online tutorials. Thank you!
@a.yashwanth 4 years ago
Checking the length of dataframe helps instead of storing in csv file and verifying.
@arnopisspot5115 4 years ago
this video was super interesting. I can certainly watch 10 more of these!
@pranavkrishna9137 2 years ago
Keith! Thank you so much! I honestly mean it when I say, this is one of the best videos I've ever watched, trying to learn Data science. Thank you so much for this wonderful piece of content!!
@priyalarunnile7981 3 years ago
This is awesome. Thank you so much @Keith. Would love to go through more videos in the future. Please do post.
@anubhkumar8824 4 years ago
34:34 Pro tip: go to command mode (press Esc) and press 'b' to make cells below current cell or 'a' to make cells above
@KeithGalli 4 years ago
Thanks for the tips! Love when people comment helpful stuff like this :). Just started using command mode to easily switch cells from code to markdown, will have to add these two commands to the arsenal as well!
@FlyingMonkeis 4 years ago
k and j will move focus to the cell above or below, and you can pair these with Shift to extend the selection and then press Shift+M to merge the highlighted cells. So Shift+J followed by Shift+M will merge the current cell with the one below it. 'dd' will delete a cell also! (These bindings are very vim-like.)
@christopherlyons7613 4 years ago
Think that's reversed. Use 'b' to make cells above and 'a' to make cells below.
@OK-Computer 4 years ago
Great video! At the beginning it is much more concise to concatenate all the CSV files into one like this (better to put the IPython notebook and the CSV files in the same directory first):
files = [f for f in os.listdir("./") if f.endswith('.csv')]
df = pd.concat(pd.read_csv(i) for i in files)
THAT'S IT!
@muhammadbashirmuhammad5529 3 years ago
That's better, thanks
@subho1766 3 years ago
monthly_dataframes = [pd.read_csv(file) for file in glob.glob(filePath + "*.csv")]
merged_dataframe = pd.concat(monthly_dataframes)
@bartproffitt5240 3 years ago
thank you so much i have been battling no such directory all morning
@jeisonsanchez4842 2 years ago
Also consider adding a condition to skip the first row of each subsequent file - to avoid duplicate headers.
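The one-liners in this thread can be tried end to end with a self-contained sketch. It writes two tiny, made-up CSV files to a temporary folder and concatenates them with glob and pd.concat. As far as I know, `pd.read_csv` treats the first line of each file as that file's header by default, so headers at the top of each file are not carried into the data; header text that appears in the middle of a file would still need filtering.

```python
import glob
import os
import tempfile

import pandas as pd

with tempfile.TemporaryDirectory() as folder:
    # Write two tiny, made-up monthly files, each with its own header row.
    for name, row in [("Sales_April_2019.csv", "A,1"), ("Sales_May_2019.csv", "B,2")]:
        with open(os.path.join(folder, name), "w") as handle:
            handle.write("Product,Quantity Ordered\n" + row + "\n")

    # Collect every CSV in the folder and concatenate them into one DataFrame.
    paths = sorted(glob.glob(os.path.join(folder, "*.csv")))
    all_data = pd.concat((pd.read_csv(path) for path in paths), ignore_index=True)

print(all_data)
print(len(all_data), "rows")
```

Printing the row count is also a quick way to check the merge without writing the result to a new CSV first.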
@Abdullahkbc 1 year ago
You are great Keith. You are doing it in a manner that most students can understand better.
@iunknown563 3 years ago
All the errors that were driving me nuts are resurfacing here and being handled nicely! Such a treat :)!
@KeithGalli 1 year ago
I'm launching a data analytics bootcamp! goto.masterschool.com/5wn3sw Some highlights of the program: - Fully remote (with flexible working hours) - No tuition fees until after you land a job in tech - Open to applicants anywhere in the world! This is a 7-month long program kicking off in June. To learn more and get your application started, click the link above ⬆
@anthonycampos4673 1 year ago
cool, greetings to you from Lima/Perú
@berkayozkan2631 3 years ago
I love how he freaks out whenever there is a small warning lol
@vickyzhang820 2 years ago
Sooooo fantastic!!! This is definitely the best Data Project video I've seen on UKposts!
@MashiroRedo 3 years ago
Thoroughly enjoyed this! I don't get a ton of practice at work so this makes me more confident!
@Yayaloy9 3 years ago
At 50:10, for anyone who wants to use .unique(): when you calculate the sales for each city, make sure to throw in a .reset_index(). It will reset the index and your bars are going to be all right.
cityy = all_data.groupby("City").sum().reset_index()
Then you do the rest like him. You can also sort the result; just follow the rest of his instructions.
cityy = all_data.groupby("City").sum().reset_index().sort_values("Sales", ascending=False)
xxx = cityy["City"].unique()
plt.bar(xxx, cityy["Sales"])
plt.ylabel("$$$")
plt.xlabel("Cities")
plt.xticks(xxx, rotation='vertical', size=8)
plt.show()
@smackedup7657 9 months ago
thanks a lot
@rezwanmehedad2095 8 months ago
Unfortunately, I am getting a ValueError. Any idea how I can solve this? ValueError: shape mismatch: objects cannot be broadcast to a single shape. Mismatch is between arg 0 with shape (10,) and arg 1 with shape (12,). I haven't found a proper answer on Google, or maybe I'm not expert enough to understand :p
@Scratchmex 4 years ago
22:00 I think it is more reliable to parse the date column as a datetime type to avoid all these problems
@stevejuso 3 years ago
pd.to_datetime did not work for me on this data. How did you use it? I get an error
@SiIentFire 2 years ago
@@stevejuso Really late reply, but just in case it helps someone: you can tell the read_csv function to read a column as a date by passing in parse_dates=['col1', 'col2'] for any number of columns. You can tell it to use the European format with dayfirst=True, and if you need a specific format you can use date_parser to supply your own parser. So in my case it was df = pd.read_csv('filepath', parse_dates=[datecols], dayfirst=True) to get the columns I needed parsed in European date format. One key thing is that it converts the dates to pandas Timestamps, but they are interchangeable with Python datetimes almost all of the time. They can also be converted with .apply(lambda x: x.to_pydatetime()) if you need.
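The `parse_dates` and `dayfirst` options described above can be tried on a tiny in-memory CSV. The sample rows are made up and use a day-first layout, which is an assumption for illustration (the sales data in the video is month-first). The `date_parser` option is left out because, as far as I know, newer pandas releases deprecate it in favour of `date_format`.

```python
import io

import pandas as pd

# Made-up CSV text with day-first dates (19 April and 7 May 2019).
csv_text = "Order ID,Order Date\n1,19/04/2019 08:46\n2,07/05/2019 22:30\n"

# parse_dates names the columns to convert; dayfirst=True reads 07/05 as 7 May.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Order Date"], dayfirst=True)

# Values in the column are pandas Timestamps; to_pydatetime() gives a plain datetime.
first_as_python = df["Order Date"].iloc[0].to_pydatetime()

print(df.dtypes)
print(first_as_python)
```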
@bloosea123 3 years ago
Awesome video Keith! Finding the pairs of items is a problem I had in a previous project and now it is nice to know how to solve it! It is also nice to see the entire data discovery process in the video, complete with plenty of Stack Overflow questions. Quick tip: plots of specific columns in a data frame can be completed in one line of code, for instance df["ColumnName"].plot()
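For the "pairs of items" question mentioned here, a rough sketch with itertools and collections looks like the following. It is a variant, not the video's exact code: the video joins product names with transform() and then splits them again, while this version iterates over the groupby directly. The orders are made up.

```python
from collections import Counter
from itertools import combinations

import pandas as pd

# Made-up order lines: rows that share an Order ID were bought together.
orders = pd.DataFrame({
    "Order ID": [1, 1, 2, 3, 3, 3],
    "Product": [
        "iPhone", "Lightning Charging Cable",
        "Wired Headphones",
        "iPhone", "Lightning Charging Cable", "Wired Headphones",
    ],
})

pair_counts = Counter()
for _, products in orders.groupby("Order ID")["Product"]:
    # Sorting makes (A, B) and (B, A) count as the same pair.
    pair_counts.update(combinations(sorted(products), 2))

for pair, count in pair_counts.most_common(3):
    print(pair, count)
```

Orders with a single product contribute no pairs, because `combinations` of one item yields nothing.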
@florenthoti9101 3 years ago
As a beginner in Data science with Python, I find you as the best youtuber in this field. Good Job!
@abhishek_raj 2 years ago
Keith: I am gonna snatch the first two digits and make it the month. The data: Hold my NaNs !
@MicahJohns 3 years ago
23:39 That duplication was because of the header rows in each of the files. I've dealt with this a lot. You would have had to exclude those header rows from each file before concatenating them all to resolve this. Great video course man, thank you for making all of that content
@vertik3895 2 years ago
I just did what he did and all I am getting is the header rows, what's the solution?
@oscardyremyhr5948 2 years ago
@@vertik3895 load the first df as normal and the following dfs with pd.read_csv('file2.csv', skiprows=1) before concat
@eduardosa9658 1 year ago
@@vertik3895 The solution is to call read_csv(..., header=None) on each iteration
@dakafranklin1786 2 years ago
This tutorial is wonderful bro. I especially like the fact that you google some of these problems, unlike other YouTubers who make it feel like they do it all from their head, which makes it more difficult because the audience may think they have to memorize everything.
@qiaochow6668 3 years ago
Thanks so much for making the video! Like your style! Love how you teach and the way you solve problems! The imperfection makes the video perfect!
@JoaoOliveira-wh1tp 3 years ago
Great video. Just a few suggestions. At 4:25, os.listdir("./") already returns a list, so wrapping it in [file for file in os.listdir(...)] is redundant. At 40:50 you don't need the lambda function, even if you want to access a cell's content. If you simply pass a reference to the function, each cell value will be passed to it as the argument. Example:
def modify(a):
    return 'CHANGED ' + a + ' CHANGED'
df['Column'].apply(modify)  # modify without parentheses is a reference to the function
@mahermonirify 3 years ago
Could you please help: why am I getting a path error when I try to use os.listdir, but not when I open a specific file to read?
@kafaayari 2 years ago
When passing a function to apply, you could have just passed the function name, there's no need to do apply(lambda x:get_city(x)). This is just enough and better => apply(get_city)
@MattHuisman 2 years ago
Came here to make sure someone said this! As long as the function you pass only takes a single argument. Otherwise lambda x: my_func(x, other_arg)
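A short sketch of the three call styles discussed in this thread. The get_city name comes from the comment above, but its body, the get_field helper, the 'Purchase Address' column name, and the sample addresses are all assumptions made for illustration.

```python
import pandas as pd


def get_city(address):
    """Return the city from a 'street, city, state zip' string (assumed format)."""
    return address.split(",")[1].strip()


def get_field(address, position):
    """Hypothetical two-argument helper: return the comma-separated part at position."""
    return address.split(",")[position].strip()


df = pd.DataFrame(
    {"Purchase Address": ["917 1st St, Dallas, TX 75001", "682 Chestnut St, Boston, MA 02215"]}
)

# One-argument function: pass the function itself, no lambda needed.
df["City"] = df["Purchase Address"].apply(get_city)

# Extra argument: a lambda works...
df["Street"] = df["Purchase Address"].apply(lambda x: get_field(x, 0))

# ...and so does the args parameter of Series.apply.
df["State Zip"] = df["Purchase Address"].apply(get_field, args=(2,))

print(df)
```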
@arpangoyal7337 1 year ago
LOVED the entire video and how raw it was, along with his explanation!
@hamishdosiad5764 2 years ago
mate, you're a legend! not only did I learn matplotlib and pandas but now I know my pokemon too, tip of the hat!
@Doorshlak 4 years ago
This channel is the best thing I've encountered in a while. Thank you for helping the desperate ;-; Would do 5 likes if I could
@JohnnyRottenest 4 years ago
50:00, use result.index as x values and x ticks.
@jasonwong8315 4 years ago
yes that would be easier.
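A sketch of the tip above, assuming that `result` is a grouped Series whose index holds the x values. The numbers, the 'Hour' index name, and the axis labels are invented for illustration. The non-interactive Agg backend is selected only so the snippet runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display

import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical grouped result: number of orders per hour of the day.
result = pd.Series([120, 340, 560], index=[9, 12, 19], name="Orders")
result.index.name = "Hour"

fig, ax = plt.subplots()

# Use the index for both the x values and the tick positions,
# so the labels can never drift out of step with the data.
ax.plot(result.index, result.values)
ax.set_xticks(result.index)
ax.set_xlabel("Hour")
ax.set_ylabel("Number of orders")
```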
@geetanjalimisra4676 7 months ago
Keith you came through!! This is the kind of tutorial I was literally looking for to hone my data analysis/preprocessing skills. Thank you!!!
@fthxperia 2 years ago
Thanks a lot Keith for this style of teaching, trying to solve the problem on my own is the best way to learn.
@omrieliyahulevy7985 4 years ago
Great tutorial, I've learned a lot! A suggestion for your first question about the best month for sales: instead of creating the extra 'month' and 'sales' columns, we can use the pandas resample method, which does the grouping by month for us. Just like with groupby, we finish with sum and get the same table: all_data.resample('M', on='Order Date').sum().sort_values(by='Price Each', ascending=False)
@Yayaloy9 3 years ago
But here's the problem: Order Date is not a datetime type, so you have to convert it first. all_data["Order Date"] = pd.to_datetime(all_data["Order Date"], format="%m/%d/%y %H:%M")
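Putting the two comments above together, a minimal sketch on made-up rows. Two things differ from the one-liner in the thread and are my assumptions: the sketch uses the 'MS' (month start) alias instead of 'M', because 'M' was deprecated in pandas 2.2 in favor of 'ME', and it sums a 'Sales' column built from a hypothetical 'Quantity Ordered' column, since summing 'Price Each' alone would ignore quantities.

```python
import pandas as pd

# Made-up rows; 'Quantity Ordered' is an assumed column name.
all_data = pd.DataFrame(
    {
        "Order Date": ["01/05/19 10:00", "01/20/19 18:45", "02/07/19 12:30"],
        "Quantity Ordered": [1, 2, 1],
        "Price Each": [11.95, 700.0, 149.99],
    }
)

# resample needs a real datetime column, so convert the strings first.
all_data["Order Date"] = pd.to_datetime(all_data["Order Date"], format="%m/%d/%y %H:%M")

# Revenue per row, so that quantities are taken into account.
all_data["Sales"] = all_data["Quantity Ordered"] * all_data["Price Each"]

# Group by calendar month ('MS' labels each bin with the month's first day).
monthly = (
    all_data.resample("MS", on="Order Date")["Sales"]
    .sum()
    .sort_values(ascending=False)
)

print(monthly)
```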
@nishantbanjade920 4 years ago
I like the way you say "AAAAh, what did I do" at every mistake lol :D xD
@Jack-xy4fy 3 years ago
hahaa it made me laugh because i do the exact same thing
@zewduwereta302 2 years ago
I have been enjoying your videos (a few) recently but this one is just superb! I am eager to chase more videos. Thanks for your styles and tricks! !!!
@rachrach9871 1 year ago
Great tutorial! So much I’ve learned in this video! Thanks so much Keith. Looking forward to learning more useful stuff here
@vikram3297 4 years ago
32:15 You created the months list passed to plt.bar() out of thin air. In this case our data comes out sorted by month, so there is no issue; otherwise it would have plotted sales against the wrong month. Instead I tried this, please let me know if I'm wrong about it: all_data.groupby('Month')['Daily Sale'].sum().plot(kind='bar'); plt.show()
@naishkiteboarder 4 years ago
The groupby function sorts by month, I think, so the keys will be 1 through 12, the same as the new months variable
@naishkiteboarder 4 years ago
Monthss = [month for month, df in All_Data.groupby('Month')]
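A small sketch of the point made in this thread: taking the x labels from the grouped result keeps them aligned with the values even when some months are missing, which a hard-coded range(1, 13) would not. The data is invented, and 'Sales' stands in for whatever the value column is called ('Daily Sale' in the comment above).

```python
import pandas as pd

# Made-up data with only three distinct months, deliberately unsorted.
all_data = pd.DataFrame(
    {
        "Month": [12, 1, 4, 12, 1],
        "Sales": [100.0, 20.0, 35.0, 50.0, 5.0],
    }
)

# groupby sorts the group keys by default (sort=True).
monthly_sales = all_data.groupby("Month")["Sales"].sum()

# Labels taken from the result itself always match the values one to one.
months = list(monthly_sales.index)

# Equivalent to the list comprehension in the reply above.
months_from_loop = [month for month, _ in all_data.groupby("Month")]

print(months, monthly_sales.tolist())
# These two sequences could then be passed to plt.bar(months, monthly_sales.values).
```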
@ng4logic 4 years ago
58:22 I heard that
@diogoledermann7393 4 years ago
LOOOOLLLL
@katherinenavarrohansen2748 3 years ago
I write from Denmark, but I'm Chilean, I followed all the steps and really everything is very clear, I loved your explanations of each task and each question
@calvinwijaya9706 3 years ago
Hi Keith, first time seeing your video, this kind of 'tutorial' format is just perfect. Thank you!
@tomasrubinstein4889 2 years ago
Great video! I'm currently doing these kinds of exercises for my Business Analytics degree and I found a lot of useful tips in this video! Thank you very much! :)
@MrDviratis 3 years ago
Really enjoyed coding along following this video. Nicely done, Keith!
@andre__442 2 years ago
if every human being on earth had the will and disposition to teach like Keith... the world would be a 99% better place
@dana6006 3 years ago
thank you for this tutorial! it helped me retain a ton of information in a more practical, applicable way than other tutorials
@realtor__edward 2 years ago
your tutorials have really been very helpful, thank you so much Keith.
@jacksonwagner9254 2 years ago
Watched the whole thing thanks man great content and I will certainly check out some of your other videos as well.
@komalkaursasan5425 2 years ago
New to data science and this was my first ever analysis using python. Thanks a lot. Do make more videos of this kind.
@DataScienceMAHAMAT 17 hours ago
This is the most practical Python tutorial video I've ever watched. Thanks for sharing!