Compiled Python is FAST

79,034 views

Doug Mercer

1 day ago

Sign up for 1-on-1 coaching at dougmercer.dev
-----------------------------------------
Python has a bit of a reputation -- fast to write, but slow to run.
In this video, we focus on a simple-to-understand dynamic programming problem that would be terribly slow in native Python or numpy. We show that Python can achieve (and actually exceed) C++-level performance with the help of just-in-time and ahead-of-time compilers such as mypyc, Cython, numba, and taichi.
Also, I finally got a camera, so, uh... face reveal, I guess.
#python
Chapters
---------------
00:00 Intro
01:07 The Problem
02:38 numpy
03:08 mypyc
04:08 cython
06:46 numba
07:58 taichi
09:47 Results
11:48 Final Thoughts

COMMENTS: 515
@dougmercer 1 month ago
If you're new here, be sure to subscribe! More Python videos coming soon =]
@thesnedit5406 1 month ago
You're very underrated
@FabianOtavo 19 days ago
Mojo and Codon (Exaloop)?
@flutterwind7686 1 month ago
Numba and cython are an easy way to improve performance beyond what most people require for python, and they don't require much boilerplate either.
@dougmercer 1 month ago
Absolutely!
@megaspazos1496 1 month ago
Great video, I enjoyed it! In my eyes the video actually shows how fast C++ is. An unoptimized line-by-line translation from Python to C++ can be as fast as compiled Python optimized with an HPC library.
@dougmercer 1 month ago
Absolutely. C/C++ and gcc -O3 is basically magic.
@BartekLeon-jx5jv 1 month ago
@dougmercer I am pretty convinced that taichi creates a 1D array under the hood, not a 2D one. Using a vector of vectors hurts performance quite a bit (not the most reliable test, but changing the vector of vectors to a plain vector gave a ~10% boost), although both C++ versions were faster than taichi for me (compiled with MSVC in release mode). There are still some minor things, but they shouldn't influence anything, since in my case ~40-50% of the time went to std::max and 20-30% to creating the vector. All in all, nice video showcasing the tools.
@BartekLeon-jx5jv 1 month ago
Ah, also... just out of curiosity:

    import numba

    @numba.njit
    def lcs2(a, b):
        m, n = len(a), len(b)
        dp = [0] * (n + 1)
        prev_row = [0] * (n + 1)  # Temporary storage for the previous row
        for i in range(1, m + 1):
            for j in range(1, n + 1):
                if a[i - 1] == b[j - 1]:
                    dp[j] = prev_row[j - 1] + 1
                else:
                    dp[j] = max(prev_row[j], dp[j - 1])
            # Copy the finished row so the next iteration can read it
            for j in range(1, n + 1):
                prev_row[j] = dp[j]
        return dp[n]

Less memory allocation than with the 2D array. Testing this against C++ / taichi would be a nice one :) [and there is some vectorisation you could throw in there]
@ruroruro 1 month ago
@BartekLeon-jx5jv It's not a 1D array, but a homogeneous ND array. It's somewhere between vector and int[A][B]. It is represented as a flat array in memory, but unlike int[A][B], the data type, number of dimensions, sizes of these dimensions and the iteration strides are dynamic. Also, it's not just taichi that's using ndarrays; numpy and numba are also using ndarrays here.
@BartekLeon-jx5jv 1 month ago
@ruroruro That's what I meant, in a sense. Although it all still boils down to this: are you allocating once, or are you allocating N times (as with a vector of vectors)?
@mr_voron 11 months ago
This channel is highly underrated. Excellent analysis.
@dougmercer 11 months ago
Thanks for the support Maks! =]
@s8r4 6 months ago
I've also had some fun using various methods to speed python up, and this video is a great overview of the major ways of going about it, but while it's a big departure, I've found nim to have the most python-like syntax while being as fast as things get (compiles to c, among many other languages). I've seen that you know about the true power of python already, but James Powell did a great talk about this exact topic titled "Objectionable Content", big recommend. Thanks for the video!
@dougmercer 6 months ago
I'll check it out! Also, I have looked at Nim in the past. It seems nice. Eventually I may do another video on this topic, and branch out to other languages (Nim, Julia, and now Mojo). Thanks for the idea, the video rec, and thoughtful comment =]
@dhrubajyotipaul8204 1 month ago
Thank you for making this. Trying out mypyc, cython, and numba right now! :D
@dougmercer 1 month ago
Enjoy! And good luck =]
@Masterrex 5 months ago
Subbed, nicely done. I can tell you were having fun. IMO, don't worry so much about the glitzy graphics - your storytelling is great!
@dougmercer 5 months ago
Thanks so much =]
@ethanymh 10 months ago
Love this video so much! The quality of content, animation, and visualization is unmatched...
@dougmercer 10 months ago
Thank you so much!
@stereoplegic 2 months ago
After reading the other comments while thinking up my own, I feel compelled to echo this sentiment first. Fantastic job, @dougmercer - both technically and visually - I loved it all.
@dougmercer 1 month ago
Thanks @stereoplegic! That means a lot =]
@jcldc 5 months ago
Nice video. I have just learned Cython and achieved a speedup of 500x vs pure Python (+numpy) in one of my programs. It's worth mentioning that with Cython you can automatically parallelize a loop by using prange instead of range.
@dougmercer 5 months ago
500x is great! And good point on prange-- I should have covered the parallel aspect more of all the solutions (numba, Taichi, and cython) but I glossed over it due to the serial nature of the example problem. Thanks for the comment =]
@onogrirwin 1 month ago
damn, this is a high effort channel. your stock footage game is especially on point. hope you pop off big time :)
@dougmercer 1 month ago
That's so nice! thanks =] 🤞
@matswikstrom7453 6 months ago
Wow! Really informative and interesting - Thank You! I am now a subscriber 😊👍
@dougmercer 6 months ago
Thanks so much =]
@Finnnicus 11 months ago
good content, great presentation. love the style!
@dougmercer 11 months ago
Thanks Finnnicus! Much appreciated =]
@josebarria3233 5 months ago
Gotta love mypyc, I've been using it in my project and never felt disappointed
@YuumiGamer1243 1 month ago
I was already aware of numba, but it's good to see all the others like this. Enjoyable video, and I was happy you showed most of the code, while somehow making it feel like a documentary
@dougmercer 1 month ago
That's an awesome compliment-- I'm gonna put "Code Documentarian" on my resume. Thanks for watching and commenting =]
@pietraderdetective8953 9 months ago
This is very high-quality content, mate! Well done! A question: for a gamedev use case, can we just use the tools mentioned to speed things up? I've seen horrible performance when someone is using a Python-based game engine (like pygame etc.).
@dougmercer 9 months ago
Thanks! =] Yes, you should be able to accelerate a pygame-based game with these tools. You can't speed up pygame functions and methods, but you can speed up your code between those calls. It'll be best suited to larger, number-crunchy parts between method calls rather than quick little one-off operations. Let me know if you end up tweaking something and seeing a boost in performance!
@giannisic1544 6 months ago
Brilliant video and useful content. It's a pity there's so few of us... Glad the algorithm suggested this video
@dougmercer 6 months ago
Thanks! Glad you found it helpful =]
@billyhart3299 1 month ago
Great video man. I'm going to try this on my web server project that uses numpy quite a lot.
@dougmercer 1 month ago
Numba should work great! You may just need to tweak your implementation slightly to use the subset of numpy features supported by Numba.
@billyhart3299 1 month ago
@dougmercer Have you tried anything that helps with matplotlib?
@dougmercer 1 month ago
Hmm. Hard to say. Could try mypyc-- maybe it'll just magically work. Alternatively, though this might be a bit disruptive, you could swap out CPython with PyPy (a JIT compiled replacement for the CPython interpreter). In the video I'm working on now, PyPy was shockingly convenient and fast.
@dougmercer 1 month ago
What are you plotting, out of curiosity? Maybe do a quick sanity check on whether the amount of data you're plotting has exceeded what matplotlib is useful for. If it's a scatter plot with millions of points, maybe you should use something like datashader or similar.
@billyhart3299 1 month ago
@dougmercer I'm using it to do histograms for images that have been turned black and white and then converted to 8-bit PNG files, in order to convert them to stippling.
@EdeYOlorDSZs 1 month ago
crazy good video! I'm gonna check out Taichi for sure
@dougmercer 1 month ago
Thanks =]
@enosunim 1 month ago
Thanks! This is really great info!
@dougmercer 1 month ago
Glad it was helpful!
@alexsere3061 12 days ago
Dude, the quality and depth of this video is insane. I feel like I have a deeper understanding of the strengths and limitations of python, and I have been using it for about 7 years. Thank you
@dougmercer 12 days ago
Glad it was helpful =]
@dar1e08 1 month ago
Easily the best video I have seen on performance Python, subbed.
@dougmercer 1 month ago
Thanks so much! I should have another performance related video out in mid April so see ya then =]
@MrXav360 9 months ago
I learned C++ in the last month (came from a Python background!) and tried my luck at coding real-time animations of fractals. I wanted to compare with Python's performance, but now I am scared I learned C++ for nothing... Thanks! (Just kidding I loved learning C++ and I am glad I did. It's super impressive however to see that we can achieve similar performances with these packages in Python! Thanks for the video).
@dougmercer 9 months ago
Taichi is great for fractals! I like that it has good built in infrastructure for plotting to a canvas. That said, I'm sure you'll find a use for your new-found C++ knowledge =]
@user-yk8yb5xy8r 1 month ago
My favourite was numba, as we were able to achieve our goal with very little code; there are certain shortcut algorithms that can be applied to make up for the functions it doesn't support.
@ManuelBorges1979 1 month ago
Excellent video. 👏🏼 Subscribed.
@dougmercer 1 month ago
Thanks Manuel! Glad to have you =]
@mariuspopescu1854 1 month ago
So, I'm not a big Python guy, so I was curious. I repeated your experiment for C++ vs numba. The only real difference: for the C++, I rewrote it just a bit (used auto and changed the indexing a bit to be more C-like), and I wrote the function as a template with the sizes m and n as template parameters. This allowed me to change from a vector to a stack-allocated array, the main benefit I believe being that the whole memory is contiguous, which allows for better caching. The C++ version was about 1.5x faster than numba on my machine. I really enjoyed this video though! It made me question my biases, and I think there's a lot to be said for letting compilers/optimizers do the thinking for you. I think this was really insightful, and I think I'm gonna give numba a go for many of my future quick projects.
@dougmercer 1 month ago
Oh, that's awesome! I think that's the fastest anyone has gotten it so far! Someone else in the comments encouraged me to try a 1D vector of size (m+1)(n+1) and index into it with arithmetic -- that gave me a roughly 1.1-1.2ish x speedup over the original C++ . So, I guess much of the remaining speedup came from data locality-- very cool that it was another 0.3x-ish boost. I'm glad you found the video interesting =]
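The 1D layout mentioned in this reply can be sketched in plain Python. This is only an illustration of the indexing idea, not the C++ from the video, and the name `lcs_flat` is made up here. The whole table lives in one list of length (m+1)(n+1), and cell (i, j) sits at offset i * (n + 1) + j:

```python
def lcs_flat(a, b):
    """Length of the longest common subsequence, using one flat table."""
    m, n = len(a), len(b)
    width = n + 1                      # number of columns per logical row
    dp = [0] * ((m + 1) * width)       # single allocation for the whole table
    for i in range(1, m + 1):
        row = i * width                # offset of logical row i
        prev = row - width             # offset of logical row i - 1
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                dp[row + j] = dp[prev + j - 1] + 1
            else:
                dp[row + j] = max(dp[prev + j], dp[row + j - 1])
    return dp[m * width + n]


print(lcs_flat("AGGTAB", "GXTXAYB"))
```

In pure Python this mainly shows the arithmetic; any speed benefit from the contiguous layout would only be expected in a compiled version.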
@beaverbuoy3011 21 days ago
Super enjoyable video, thank you this was very helpful!
@dougmercer 21 days ago
Thanks! Glad it was helpful!
@user-np9il4is1t 9 months ago
Love this video! It was amazing and useful!
@dougmercer 9 months ago
Thanks so much!
@sdmagic 1 month ago
That was exceptional. Thank you very much.
@dougmercer 1 month ago
Thanks for watching and commenting!
@pranavswaroop4291 1 month ago
Just excellent in every way. Subbed.
@dougmercer 1 month ago
=]
@guowanglin4537 3 months ago
Well, I use numba in my research, concerning the human genome, it was really fast!
@dougmercer 3 months ago
That's awesome! I love numba-- super convenient and fast
@cmleibenguth 6 months ago
Interesting results!
@dougmercer 6 months ago
Thanks! I was surprised too
@famaral42 7 months ago
Thanks for the analysis, I got motivated to look at numba and cython more carefully. Taichi looked cool, but not having it in the anaconda repo is a negative point for me. Have you tried running this code with TORCH?
@dougmercer 7 months ago
Oh interesting, I didn't realize taichi wasn't on conda-forge. I wonder if they'd accept a PR 🤔. For what it's worth, you can pip install it (and that's possible even if you're using an environment.yml). I did not try torch, but I suspect it would be very slow. Reason being-- the main use case for torch is parallel computing via tensors. Since this problem is inherently not parallelizable, my guess is it'd be super slow in torch.
@famaral42 7 months ago
@dougmercer Thx for the insights
@abhisheks5882 8 months ago
This channel is a hidden gem
@dougmercer 8 months ago
Thanks 💎 =]
@miriamramstudio3982 1 month ago
Text on the screen was definitely engaging ;) Thanks
@dougmercer 1 month ago
Yay! Success =]
@jamesarthurkimbell 1 month ago
Nice video! Well done
@dougmercer 1 month ago
Thanks for watching!
@ThisRussellBrand 3 days ago
Beautifully done!
@dougmercer 3 days ago
Thanks Russell =]
@NicolauFernandoFerreiraSobrosa 1 month ago
Very cool video! Did you consider compilation time in the C++ tests? I use Numba daily, and the first run is always slow due to the JIT feature.
@dougmercer 1 month ago
I did not count compilation time for the C++ times, but did include JIT time for the first run of Numba. However, it doesn't have a big impact, because we are typically doing hundreds or thousands of runs and adding up their times (so the slow first run only accounts for a small part of the overall time).
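One way to check how much a slow first call contributes to a summed benchmark is to time every call separately. The sketch below uses only the standard library; `time_calls` and `make_lazy_square` are hypothetical helpers, and the 50 ms sleep is merely a stand-in for JIT compilation, not a measurement of Numba:

```python
import time


def time_calls(fn, args, repeats):
    """Call fn(*args) `repeats` times; return (first_call_s, later_calls_s)."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        timings.append(time.perf_counter() - start)
    return timings[0], timings[1:]


def make_lazy_square():
    """Stand-in for a JIT-compiled function: one-time setup cost on first call."""
    state = {"ready": False}

    def square(x):
        if not state["ready"]:
            time.sleep(0.05)          # pretend this is compilation
            state["ready"] = True
        return x * x

    return square


first, rest = time_calls(make_lazy_square(), (12,), repeats=200)
total = first + sum(rest)
print(f"first call: {first:.4f}s, share of total time: {first / total:.1%}")
```

The share printed here depends entirely on how long the steady-state calls take; with a real workload that runs for milliseconds per call, the first-call share shrinks as the number of runs grows.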
@atharv9924 6 months ago
@Dough: Your channel's popularity should be at least 100x more!!!
@dougmercer 6 months ago
Thanks so much! Fingers crossed the channel does grow 100x 🤞. At that point I prob could make videos full time 🤯
@cmilkau 1 month ago
pypy is a jit for full python with special bindings for numpy and scipy. you can use it for any python code, but for max performance might need to write critical parts of your code in rpython, a subset of python that can be statically compiled to native binary. The example subsequence code is valid rpython btw.
@dougmercer 1 month ago
PyPy is fantastic -- I'm actually going to cover it in my next video!
1 month ago
Very useful. A quick question: what was the optimization level for compiling the C++ code? It can really make a difference.
@dougmercer 1 month ago
I used -O3. Another commenter recommended using a 1D array and handling indexing through arithmetic, and that does speed up the C++ by about 1.1-1.2x. (still pretty similar to the ndarray approach from Taichi) Here's the c++ code and build script if you want to play around with it yourself =] gist.github.com/dougmercer/1a0fab15abf45d836c2290b98e6c1cd3
@luaguedesc 1 month ago
Great video! Did you compile the C++ code with optimization flags?
@dougmercer 1 month ago
Yup! You can check out the C++ code/compile command here, gist.github.com/dougmercer/1a0fab15abf45d836c2290b98e6c1cd3
@chkone007 10 months ago
That was funny. I've done both C++ and Python, but now I'm more on the C++ side. I had in mind the meme "look what they need to mimic a fraction of our power". I haven't tested it, but I bet that with the proper compilation options C++ will be faster again. To my understanding, this is what taichi does: it generates SIMD code for your current hardware, under the hood via the LLVM optimizer, based on the data structure (taichi is tailored for sparse data structures). As you work with dense data, Halide would [maybe] give you better results. In all cases, the code generated by the Python front end can also be generated from C++; Python will always have an overhead. This is what machine learning people do: they don't care about Python performance, because all the computation, which takes 90% of their frame, is implemented in CUDA and C++; Python is only there to feed data to the lower-level system.
@dougmercer 10 months ago
> "look what they need to mimic a fraction of our power"
Haha, true! In another comment, I said I loved that even if I write terrible C++ it still turns out pretty fast. That said, the same argument could be reversed if we consider productivity and third-party library access. If an application is 95% high-level glue and one hot spot, I'd rather write the majority in Python and the hot spot in an AOT- or JIT-compiled variant of Python than write my entire app in a low-level language. The overhead would be worthwhile from a productivity perspective.
> Proper compilation flags
Do you have flags you want me to try in particular? I did -std=c++11 -O3, but maybe I'm missing something.
> SIMD
Since this is all sequential, can SIMD help? I thought SIMD was for packing multiple of the same operations in a single instruction (but again, I'm not a C++ dev).
> the Python just provides an interface to a lower level language.
True! And I'm OK with that! I def agree that well-written native code in a lower-level language will outperform generated code from Python. That said, for all but the most trivial algorithms, I can't write well-written C++. So, if I can get even a 95% solution for free from these high-level LLVM interfaces, then I'm stoked!
@chkone007 10 months ago
@dougmercer ( : That reminds me of a benchmark done by Microsoft, Debug C++ /NoSIMD vs Release C# SIMD, where they noticed that C# was faster :D Yeah, sure... The point of Python is not to be faster; it's mostly to be gentle with people who aren't long-bearded engineer programmers. The users are mostly scientists and data analysts.
> Productivity
For this example I see no productivity difference between C++ and Python. But personally I'm more productive in C++ with Eigen and a few other libs, just as an experienced Python developer will be faster with numpy and their other favorite libs.
> Proper compilation flags
I don't know what your compiler is, but for Visual Studio: /Ot {favor speed}, /Oi {enable intrinsics}. To increase STL speed: disable C++ exceptions and "Basic Runtime Checks", /GS-, /GR- ... To help intrinsic generation: /Zp8 or /Zp16 (here you're processing int), but we can process
And based on your hardware: /arch:AVX, ...
> SIMD
There are gather and scatter instructions that could help; you'd need to profile ( :
> Improve
On both sides, I bet we can improve performance by using only the types we need. If your numbers cannot go higher than 100, just use a byte/uint8_t, etc.
As I said, the video was funny. The point is not to say Python is faster than C++, but more "if you're careful, you can get performance higher than or close to baseline C++".
@dougmercer 10 months ago
I'm using g++, I'll try to find the analogs for the compiler flags you recommended. And true, a uint8 is enough. I'll mess around with that too. In any case, thanks for the comments! I'd def like to learn more about C++ but I don't get the opportunity very often
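A rough Python analogue of the "a uint8 is enough" idea, as a sketch only: it assumes the shorter input has at most 255 elements, so every table entry fits in one unsigned byte. The function name `lcs_bytes` is hypothetical, and the standard-library `array` module stands in for a `uint8_t` buffer:

```python
from array import array


def lcs_bytes(a, b):
    """LCS length using two rows of unsigned bytes (typecode 'B')."""
    m, n = len(a), len(b)
    if min(m, n) > 255:
        # The LCS can be as long as the shorter input, which would overflow.
        raise ValueError("result may not fit in one unsigned byte")
    prev = array("B", bytes(n + 1))    # n + 1 zero-initialised bytes
    cur = array("B", bytes(n + 1))
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
            else:
                cur[j] = max(prev[j], cur[j - 1])
        prev, cur = cur, prev          # reuse both buffers, no reallocation
    return prev[n]


print(array("B").itemsize, "byte per entry;", lcs_bytes("AGGTAB", "GXTXAYB"))
```

In interpreted Python the narrower type mostly saves memory; whether it helps speed in a compiled version would need to be measured.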
@user-zi2zv1jo7g 9 days ago
@chkone007 OK, I get the point, but there's a lot of production code written in Python. Most code does not require performance, and for the few bits that do, you can write a C extension or simply use C++ and Python together.
@chkone007 9 days ago
@user-zi2zv1jo7g I kind of strongly disagree. Have you ever experienced a slow UI, a stuttering app, a lagging game, ...? If yes, you have already met a programmer who said "most code writing does not require performance". Saying that code does not require performance just means you consider your time more valuable than the user's time. As developers we don't own the time; it's not ours, it's the user's. That's what makes the difference between a smooth app and slow, memory-heavy software, like everything web-based: Slack, etc., and all the Chromium stuff. Most devs say "it's just a chat app, I don't need C++, Chromium is fine". The consequence: my Mac/PC uses 8 GiB for doing nothing, just running a VM. And from an industrial point of view, you can launch your startup with Python code and say "I don't care, it's CUDA under the hood". You just expose yourself to a competitor who implements the same thing directly in C++/CUDA, and this competitor's profitability will explode because their AWS bill will be much cheaper. We always need memory-efficient and fast code. If none of those arguments convinces you, consider the CO2 argument: it's more eco-friendly for your PC, your server, or the N instances of your program running on AWS. I love Python for prototyping and for speeding up my exploration of ideas, but I cannot seriously offer that to my clients. I know a lot of "AI startups" are like that: download the model from the researcher, create a Docker image, build a website => step 2 => profit. Most of them rely on Python, but any competitor with cheaper infrastructure can scale more and be more efficient. I have in mind Facebook: it was developed in PHP, fine, cool, but at the beginning each new user cost more than the previous one... FB wasn't able to scale. They created "HipHop", a compiler from PHP to C++, and the company became profitable as each new user became cheaper than the previous one. Conclusion => performance always matters.
Don't get me wrong, that doesn't mean I over-engineer everything to save 1 byte or 1 picosecond in the median. But keep in mind that the quote "early optimization is the root of evil" was written at a time when everybody was writing C and assembly code... The context is different today: with Python, JavaScript, ... "early non-optimization is the root of evil".
@Iejdnx 1 month ago
5k subs? I swear I thought you had like 1 million because of how good this video was. I'm subscribing.
@dougmercer 1 month ago
Thanks =] I appreciate it. It's been a slow grind, but the past few days the algorithm has blessed me with some impressions, so I hope it keeps going 🤞
@JohnMitchellCalif 1 month ago
interesting and useful! Subscribed.
@dougmercer 1 month ago
Thanks! And welcome =]
@abc_cba 18 days ago
If you don't keep your content consistently uploaded, you'd be committing a felony. Subbed!!
@dougmercer 17 days ago
I'm gonna try! Hahaha Thanks for subbing =]
@sageunix3381 23 days ago
Limited-branch C code will usually be faster in most applications, but if you want code to be ridiculously fast, use assembly. Inline assembly is cool too; it works directly with C. However, speed often comes at the cost of convenience.
@BaselSamy 3 months ago
Wonderful video, even for a beginner like myself! I wonder if you could share the animation tool you used? I feel it would be awesome for my presentations :))
@dougmercer 3 months ago
Thanks! I primarily used DaVinci Resolve, but used the Python library `manim` (community edition) for the code animations.
@BaselSamy 3 months ago
Thanks! @dougmercer
@khawarshehzad487 10 months ago
Amazing content, engaging presentation and sadly, underrated channel. Subbed!
@dougmercer 10 months ago
Thanks so much! Be sure to share with friends/coworkers you think might enjoy this, and hopefully the channel will grow over time 🤞
@khawarshehzad487 10 months ago
@dougmercer Keep up the good work, it sure will 🙌
@MaxShapira2real 11 months ago
You should put out an advanced Python course. Great job buddy!
@dougmercer 11 months ago
Maybe one day! Thanks Max!
@user-by8fp5uw2o 1 month ago
Consider using Golang if you want speed + simple to learn (mostly, ofc). Python is fantastic at some tasks, but if you’re really trying to get the best of both worlds (fast to write and fast to run), then Golang could be a great fit
@dougmercer 1 month ago
I do plan to do a project in Go sometime soon
@lchunleo 8 months ago
Good work
@dougmercer 8 months ago
Thanks =]
@roshan7988 11 months ago
Great video! Super underrated channel. Love the graphics
@dougmercer 11 months ago
Thanks Roshan! Means a ton to hear that =]
@ivolol 11 months ago
Would be interested to see what Pypy and nuitka do for it as well.
@dougmercer 11 months ago
If this video ends up getting some more views, maybe I'll do another pass at adding other options. I have a *guess* though... PyPy would speed this up significantly, probably on par with numba. I've heard good things about it *but* it didn't install first try when using conda on my M1 Mac, so I skipped it ¯\_(ツ)_/¯ Nuitka would only speed things up a little bit. From what I've read, nuitka is more so about compatibility (supports *all* python language constructs) and for making standalone, portable builds. For nuitka, speed is secondary to those concerns
@ianposter2161 4 months ago
Hey, thanks for an amazing video! Which one would you suggest so that I can just grab my regular python code with dataclasses and get a performance boost with no tweaks whatsoever?
@dougmercer 4 months ago
Thanks for watching! =] I'd try mypyc first. The others are way more disruptive and would probably require changes to your code
@ianposter2161 4 months ago
@dougmercer Thanks for your answer! I was thinking of something. Nowadays we almost always use type hints because they are great, but only for clarity and for type checkers like mypy. So we are not getting any performance benefit out of them, although I think we could! Cython translates Python to C and forces us to write statically typed Python for that, which type hints could also be used for... Turns out that Cython supports type hints as well! Then we have tools like MonkeyType that allow us to automatically type-hint code based on runtime behavior, which is nice for annotating legacy code.
1) We write Python code with type hints
2) If needed, we use MonkeyType to apply them everywhere
3) We compile with Cython
4) We get C-like performance
I wonder why this isn't actually practiced. Do you have any idea?
@dougmercer 4 months ago
Mmm, for using type hints to achieve better performance through compilation, I think there's a high level design question: "should your code (1) look/feel like vanilla Python, or (2) are you OK with using non-standard Python features, or (3) are you willing to use syntax that only works in your special language, as long as it still vaguely resembles Python and interoperates with it"? I think mypyc is the closest to achieving the goal of speeding up vanilla Python. Cython's Python mode is pretty OK, but you need to add extra metadata to make it performant (e.g., the locals decorator). Cython also has its own type system rather than using Python's built-in types (e.g., cython.int vs int). Cython as a language (in non-Python mode) isn't really Python any more, but interoperates with it well. Some other languages (e.g., Mojo) claim to have a "python-like" syntax and support interacting with Python, but the code isn't really Python.
@ianposter2161 4 months ago
@dougmercer Yeah, it would be amazing if we could just write vanilla Python with standard type hints and compile it with Cython. Apparently Cython somewhat supports it. UKposts blocks my comment if I paste a link, but you can search for this on Google: Can Cython use Python type hints? Because today type hints are everywhere and we don't get any performance benefit out of them at all, which feels weird.
@dougmercer 4 months ago
It's hard to say-- when I was experimenting with this problem I remember not observing any speed up when adding vanilla Python typehints, and it wasn't until I started adding things like the @locals decorator that I really noticed any improvement. Let me know if you do any testing that shows a meaningful speed up!
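For the dataclass question that opened this thread, here is a sketch of the kind of fully annotated, otherwise vanilla Python that a type-hint-driven compiler such as mypyc is designed to accept. The `Point` class and `path_length` function are made up for illustration, and the sketch makes no claim about how much faster (if at all) a compiled build would run:

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Point:
    x: float
    y: float


def path_length(points: List[Point]) -> float:
    """Sum of straight-line distances between consecutive points."""
    total: float = 0.0
    for i in range(1, len(points)):
        dx = points[i].x - points[i - 1].x
        dy = points[i].y - points[i - 1].y
        total += math.hypot(dx, dy)
    return total


# Runs unchanged under the normal interpreter. Assuming the file is saved as
# geometry.py (a hypothetical name), `mypyc geometry.py` would be the command
# to try compiling it.
print(path_length([Point(0.0, 0.0), Point(3.0, 4.0), Point(3.0, 0.0)]))
```

Because the source stays ordinary Python, the same file can be benchmarked interpreted and compiled to see whether the annotations buy anything for a given workload.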
@lapppse2764 1 month ago
10:48 I think it would be nice to define on the left that lower is better (I've usually seen it done in benchmarks). Thank you for the video! About CPP, I think you might've used SIMD instructions.
@dougmercer 1 month ago
Good point, I def could have made the metrics interpretation clearer. As for SIMD, it's hard to parallelize this because it's an inherently serial problem (everything requires previous solutions)
@RobertLugg 1 month ago
How did you make those amazing looking bar charts?
@dougmercer 1 month ago
Hah, *very carefully* in DaVinci Resolve (Fusion page) =P I manually drew the graph using rectangles, then applied (noise + displace) to make it more irregular + (fade it out with noise + the "painterly" effect from Krokodove) to give it the watercolor appearance + paper texture + lens blur. One of my favorite animations I've made =]. Thanks for commenting on it
@etiennetiennetienne 1 month ago
There are also ways to write C++ directly in Python, I think, for instance cppyy or a torch extension.
@dougmercer 1 month ago
True! Through C/C++ extension libraries, you can directly write/link C/C++ libraries and write your own Python interface to it. Cppyy, ctypes, cffi, pybind11, and Cython are all fair game for this.
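As a small illustration of the ctypes route from this list, the sketch below calls the C library's `strlen` directly from Python. It assumes a POSIX system (Linux or macOS), where `ctypes.CDLL(None)` exposes the symbols already loaded into the Python process; on Windows the library would have to be loaded differently. The wrapper name `c_strlen` is made up:

```python
import ctypes

# Assumption: POSIX. CDLL(None) gives access to the C library that the
# interpreter itself is linked against.
libc = ctypes.CDLL(None)

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(data: bytes) -> int:
    """Length of a NUL-terminated byte string, computed by C's strlen."""
    return libc.strlen(data)


print(c_strlen(b"hello"))
```

The same pattern (load a shared library, declare argument and return types, call) applies to a library you compile yourself; the other tools listed trade this manual declaration step for more automation.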
2 months ago
Nice. Thanks!
@dougmercer 2 months ago
No prob! Glad it was helpful
@timlambe8837 6 months ago
Really interesting video. I'd love to learn more about it. Maybe I will be laughed at for this statement, but even with this video I feel like bringing Python to C-level performance seems to be quite a bit of an effort. Isn't it worth learning C/C++ for special tasks? How would you evaluate the developer experience, comparing "make everything possible with Python" with "learning C/C++ or Rust"? Thanks a lot!
@dougmercer 6 months ago
You're right! It's not easy to get C++ performance in Python. I think these tools are appropriate when there are a few "hot spots" in your code, but the majority of your application benefits from Python's ecosystem. It's possible to directly build C extensions and call them from python, but I think these tools are way easier. For some (new) projects, it might make sense to write the whole thing in Rust from the start. In practice, most of my projects use a lot of Python libraries, and my team is not very flexible (they mostly only know Python), so it'd be pretty disruptive if I wrote a critical component in a different language and with different tooling. Good question! (Sorry I don't have a good answer =P)
@timlambe8837 6 months ago
@dougmercer That is indeed a good answer, thanks. Since I am working in the data analysis field (geospatial), I love Python for its possibilities. I was wondering if it makes sense to learn another language like C++ for intensive calculations. But I think I will try your tools 😊 Many thanks!
@ButchCassidyAndSundanceKid 4 months ago
Was your Taichi arch set to CPU or GPU when you ran the benchmark?
@dougmercer 4 months ago
The LCS dynamic program was on CPU. The visualization I showed at the beginning of the section of a kind of warping fractal was on GPU.
@ButchCassidyAndSundanceKid 4 months ago
@dougmercer Thanks. Taichi certainly looks promising, but I still prefer Numba for its simplicity, i.e. adding a couple of decorators without altering the code too much. Have you tried Spark and Dask? They're both parallel programming libraries.
@dougmercer 4 months ago
Yup, both are great! Since this problem couldn't be easily parallelized, I didn't mention them. And I agree, in general Numba will be easier than Taichi by a long shot. I just thought Taichi was kind of neat so I included it in the video ¯\_(ツ)_/¯
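To illustrate the "add a decorator" workflow described above, here is a rough sketch of an LCS length function wrapped in Numba's `njit`. It assumes NumPy is installed and uses the textbook recurrence, not necessarily the code from the video. If Numba is missing, a no-op fallback decorator (an addition for this sketch) keeps it runnable as plain Python.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Fallback so the sketch still runs (slowly) without Numba installed.
    def njit(func):
        return func


@njit
def lcs_length(a, b):
    # a and b are 1D integer arrays, e.g. encoded characters.
    n, m = len(a), len(b)
    dp = np.zeros((n + 1, m + 1), dtype=np.int32)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                dp[i, j] = dp[i - 1, j - 1] + 1
            else:
                dp[i, j] = max(dp[i - 1, j], dp[i, j - 1])
    return dp[n, m]


a = np.array([1, 3, 4, 1, 2, 3], dtype=np.int32)
b = np.array([3, 4, 1, 2, 1, 3], dtype=np.int32)
length = lcs_length(a, b)
print(length)
```

The first call includes compilation time, so any timing comparison should measure a second call.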
@gorrofrigio5570 18 days ago
Thank you Doug for this awesome video! Btw, just curious: has anyone tried some of this with Pygame? I know Python isn't a common language in the videogame industry, but maybe some of this could do it some justice (and bring good surprises).
@dougmercer 18 days ago
You can definitely use Cython or Numba to help speed some things up with pygame. I found a few old reddit threads that included demos and discussions by searching "Numba pygame reddit".
@Petch85 6 months ago
Great video. I will give Numba a try... I use NumPy all the time, and that is super fast for my work. But I always end up needing to plot some numbers and save the result as a PNG file or something. I use matplotlib, and most of the time I can read and manipulate my data in less than 0.1 sec. But then making the plot takes maybe 1 sec, and saving the PNG file also takes 1 sec. Is there anything I could do? (I have more than one file of data, and need more than one plot saved... I know 3 sec does not seem like a long time, but it adds up.)
@dougmercer 6 months ago
Hmm, I don't have any sure-fire recommendations. Could potentially try using multiprocessing if your plotting function is easy to map over an iterable of inputs? That way you can maybe speed up by the number of cores your CPU has.
@lbgstzockt8493 1 month ago
Are you showing the plot? There is a way to save to a file without showing the plot window; it is still slow, but takes much less than two seconds.
@thesnedit5406 1 month ago
The theme, info, ambience and the whole vibe of the video are so good. Subscribed!
@dougmercer 1 month ago
That's like the best compliment =] thanks!
@user-up8fm3vb1r 21 days ago
Amazing work. As someone who has to use Python against my will, I enjoy your videos.
@dougmercer 21 days ago
Thanks =]. What's your preferred language if Python is against your will?
@user-up8fm3vb1r 20 days ago
@dougmercer Haskell is my love and I like lambda calculus, so I am writing an interpreter and compiler for my own LC implementation for fun (in Haskell).
@dougmercer 20 days ago
@user-up8fm3vb1r very cool. I haven't touched Haskell much, but I've been learning OCaml for fun recently and enjoying it
@user-up8fm3vb1r 20 days ago
@dougmercer glad to see you join the functional land.. enjoy!!
@helkindown 1 month ago
Great video! From what I've tested, your C++ code is good enough. The main bottleneck seems to be the dp result variable. I was able to double the speed (from 3.78832 to 1.77546 seconds) by replacing the 2D dp array with two 1D arrays, one "current row" and one "previous row", and swapping the references at each iteration. This is probably because the code has fewer cache misses when it doesn't fetch new rows of the dp array, which are filled with zeros anyway. I did not test this with the Python code, but the same speedup should be obtainable by using two variables (or a tuple of 2 arrays) to keep up with C++.
@dougmercer 1 month ago
Good point! I may have to re-run this experiment at some point-- I wonder how Numba/Cython would perform with that more memory efficient approach 🤔
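The two-row idea above can be sketched in plain Python as follows. This assumes the textbook LCS recurrence and only demonstrates the memory layout; the function name is illustrative, and no claim is made about how much faster it runs in any particular tool.

```python
def lcs_length_two_rows(a, b):
    # Keep only the previous and current rows of the DP table.
    prev = [0] * (len(b) + 1)
    curr = [0] * (len(b) + 1)
    for ch_a in a:
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr[j] = prev[j - 1] + 1
            else:
                curr[j] = max(prev[j], curr[j - 1])
        # Swap references instead of copying; curr is fully overwritten next pass.
        prev, curr = curr, prev
    return prev[len(b)]


print(lcs_length_two_rows("AGGTAB", "GXTXAYB"))
```

Memory drops from O(n*m) to O(m), at the cost of no longer being able to backtrack through the table to recover the subsequence itself.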
@Daekar3 1 month ago
I feel like this is one reason why my PC is literally god-tier compared to what I went to college with, but the day-to-day experience really isn't any different. My games are prettier and my SSD is bigger, but the mechanics of using the OS are NOT orders of magnitude better.
@ThatJay283 1 month ago
With the C++ version, did you compile it with -O3 optimisations enabled?
@dougmercer 1 month ago
Yup! gist.github.com/dougmercer/1a0fab15abf45d836c2290b98e6c1cd3
@ThatJay283 1 month ago
@dougmercer thanks! I just managed to get it 169% faster (see fork). Still, the speed improvements offered by Numba, pyx, and Taichi are really impressive :)
@dougmercer 1 month ago
Very cool! Yesterday I implemented the 1D index approach (not nearly as cleverly-- just hand jammed the indexing arithmetic in line) and I got about 1.1-1.2x speed up. Does the noexcept make a difference in performance? Or is there something else causing the extra 0.4ish speed up 🤔
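For readers unfamiliar with the 1D index approach mentioned above, here is a sketch of the indexing arithmetic in Python. It mirrors the idea only: the (n+1) x (m+1) table is stored in one flat list and cell (i, j) lives at `i * width + j`. The function name is illustrative, and the speedups quoted in the thread refer to C++, not to this Python version.

```python
def lcs_length_flat(a, b):
    n, m = len(a), len(b)
    width = m + 1
    # One contiguous buffer instead of a list of row lists.
    dp = [0] * ((n + 1) * width)
    for i in range(1, n + 1):
        row = i * width        # offset of row i
        above = row - width    # offset of row i - 1
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                dp[row + j] = dp[above + j - 1] + 1
            else:
                dp[row + j] = max(dp[above + j], dp[row + j - 1])
    return dp[n * width + m]


print(lcs_length_flat("AGGTAB", "GXTXAYB"))
```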
@rm9050 5 months ago
Is it useful to use Taichi to load CSV files, like pandas? I discovered Dask and it is fantastic.
@dougmercer 5 months ago
Hmm, I might be wrong, but I don't believe Taichi has any filesystem support. I believe the simple thing to do would be to read data in Python and pass it to Taichi for processing. That said, I love Dask and Pandas! They rock!
@stereoplegic 2 months ago
Polars is faster than Pandas with an almost identical API, right?
@dougmercer 2 months ago
Yes, it is. I'm actually working on a video that talks about trying to read a very large CSV file and do some basic number crunching with it. (The one billion rows challenge, 1brc, but in Python) Spoiler alert, Polars and DuckDB are great choices.
@incremental_failure 1 month ago
Polars is by far the fastest to load CSV. It might even be faster to load in Polars and convert to pandas.
@legion_prex3650 2 months ago
Love your channel! Nice '80s sound!
@dougmercer 2 months ago
Thanks! I had fun choosing music for this one =]
@cleteblackwell1706 1 month ago
Can you do these kinds of comparisons for building Flask apps?
@dougmercer 1 month ago
Hmm, what specifically did you have in mind? As an aside, I typically use FastAPI for Python web projects, but have used Flask in the past
@cleteblackwell1706 1 month ago
Either is fine. Maybe an API that calls a couple of other APIs and reads from a database. That would be your typical business API.
@system64_MC 2 months ago
What happens if you use the -O2 or -O3 optimisation flag for the C++ implementation?
@dougmercer 2 months ago
I did compile with -O3 for my C++ test
@dougmercer 2 months ago
gist.github.com/dougmercer/1a0fab15abf45d836c2290b98e6c1cd3
@system64_MC 2 months ago
@dougmercer Oh, you did. It is surprising that Python can be faster than C++!
@dougmercer 2 months ago
Definitely surprising! That said, I'm sure someone could write faster C++! But, it did beat my first attempt at translating the code into C++ ¯\_(ツ)_/¯
@IamusTheFox 1 month ago
I'm enjoying the video, serious question though. How can JIT be faster than C++? Did you have the C++ optimizer on? Never mind, found a comment where you said that you used -O3. Great work. I feel like anyone who complains about your C++ isn't being fair. While I may have done it another way, it's valid.
@dougmercer 1 month ago
Probably means that I left some performance on the table in the C++, or the JIT pulled some tricks that most people wouldn't pull when writing it natively. Someone else in the comments found that using a flat 1D array gave the C++ a 1.1-1.2x speedup. That probably puts it on par with the Numba/Taichi ndarray approaches. That said, the point of the video still stands-- for at least this particular problem, there are several approaches for getting performance on par with native C++
@IamusTheFox 1 month ago
Absolutely! Fantastically well done. I'm really quite impressed by what you did.
@dougmercer 1 month ago
Thanks =]
@Caspar__ 1 month ago
But most of the time I use Python libraries. Can I just-in-time compile those as well?
@dougmercer 1 month ago
I do not believe any of these options will compile or JIT third party libraries. If I'm wrong, hopefully someone will correct me. That said, you can try using a different Python interpreter altogether. PyPy would JIT whatever code it runs (but you need to use the PyPy interpreter instead of CPython)
@Caspar__ 1 month ago
@dougmercer Thanks a lot : )
@BrunoGallant 1 month ago
Great production value. Thanks for the tips. Grumpy Linux sysadmin here, definitely does not want to learn C++. With good speed, Python is perfect.
@dougmercer 1 month ago
Glad it was helpful! And definitely, I'm a big fan of "good enough" speed, and generally I can get that with Python
@PySnek 20 days ago
What about Nim?
@ethan91372 1 month ago
4:00 where do you get this footage?
@dougmercer 1 month ago
Storyblocks
@OliverBatchelor 2 months ago
Taichi for the win. You didn't even use GPU programming with it, which is all I do - the inter-op with torch is excellent and works the same way as the ndarray.
@dougmercer 2 months ago
Taichi was super fun. I did use GPU (well, Metal) for rendering the fractal animation. Was pleasantly surprised at how easy it was.
@OliverBatchelor 2 months ago
@dougmercer Sorry, that possibly came out the wrong way - I meant that you did a great job demonstrating it *even without* using the GPU!
@dougmercer 2 months ago
Oh, I see now-- hah! Thanks =] I definitely would like to try using Taichi for an ML project. Taichi + Torch seems like a great fit. Do you have any open source projects you've done with it? (I have skimmed through the docs section involving torch, but haven't looked at real projects). I also thought it might be fun to make a "shader" to process video (but I can't for the life of me figure out how to extend Davinci Resolve with Python code, so that's kind of an unrelated blocker).
@OliverBatchelor 2 months ago
@dougmercer Yep! A few now - most of them are for bits and pieces I do at work, and largely undocumented, e.g. for an HDR camera ISP pipeline or a spatial subdivision grid for distance queries. By far the biggest one so far is a Taichi library for Gaussian Splatting rasterization. I called it taichi-splatting (distinct from the original taichi_3d_gaussian_splatting, which it was originally derived from but is very different now!). It has a few rough edges but I think it has enabled quite a clean yet performant implementation. I replied yesterday but I see my comment is nowhere to be seen, I think because I put a link in it, so I haven't this time! I must admit that before watching this video I did not realise that the CPU implementation in Taichi performed so well, especially with the outer loop serialised!
@dougmercer 2 months ago
Oh right! I saw your posts in the discord-- I read through the readme a bit. It looks very interesting-- I'll take another look at the source code sometime tomorrow. And sorry about the link issue! For some reason it's not showing up in comments held for review on the mobile app. I'll check in a browser tomorrow and hopefully approve it (if not, UKposts totally ate it-- sorry)
@marcelobravo3074 4 months ago
this is gold
@dougmercer 4 months ago
Thanks! Glad you liked it =]
@Zeioth 1 month ago
I'm missing Nuitka in that comparison, but very cool.
@dougmercer 1 month ago
I've never tried it! Does it work well? I'll have to mess with it sometime 🤔 That said, I am working on a video where I cover one library that I wanted to include in this video (PyPy).
@jimmysaxblack 2 months ago
fantastic thanks a lot
@dougmercer 2 months ago
Glad it was helpful =]
@MrNolimitech 1 month ago
Once you reach a 100x speedup, I don't think it really matters that you could do better (except maybe in some cases). Most of the time, code is slow only because it is written badly. People who are new to Python (or even pros) think it's slow because they heard it somewhere. But in fact, it's only because they can't write better code. They duplicate everything. They initialize the same thing multiple times. They repeat themselves. They use multiprocessing or threads on one huge function (method) that does everything inside, instead of separating things out and using the CPU/GPU for specific calculations. These are good libraries, but I hope people will try to optimize their code by writing better lines before reaching for them.
@dougmercer 1 month ago
I agree! There is usually a lot of room to make your algorithm/implementation better
@overbored1337 9 days ago
Python is super slow by default. The only skill issue is actually the choice of Python when performance matters, because it was never designed for speed or power draw, and optimizing it goes against its fundamentals. If it does not fit as is, then use another language instead of a shoehorn.
@varunbhaaskar3338 19 days ago
How many of them are production ready? Is there anything like this that is production ready?
@dougmercer 19 days ago
I would say Cython and Numba are definitely "production ready"
@markkim5117 4 months ago
WOW I'm impressed!
@dougmercer 4 months ago
Thanks! =]
@imadlatch7206 3 months ago
We just use PyPy as the interpreter, no need for anything else
@dougmercer 3 months ago
Yeah, PyPy is a great option
@arta6183 4 months ago
Can you also share the C++ code? It's very easy to write slow C++ code. If the code involves vectors, then AVX optimizations can drastically improve performance on x86 CPUs.
@dougmercer 4 months ago
Hey @arta6183 - Sure! Here's a link to the code and compile command in a gist - gist.github.com/dougmercer/1a0fab15abf45d836c2290b98e6c1cd3 Note-- this algorithm is inherently *not* parallelizable unless you do some really wonky stuff (wavefront optimization). So, I'm not sure if AVX will help. That said, I would love to see you squeeze 10x more performance out of it and share a gist back to me. Like I said in the video -- I only know the absolute basics of C++, so my C++ code is *bad*.
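The wavefront idea mentioned above can be sketched as a change of traversal order. Assuming the textbook LCS recurrence, each cell (i, j) depends only on cells from the two previous anti-diagonals, so all cells with the same i + j are independent of each other. The sketch below walks the table in that order in plain Python; it runs serially and only illustrates the dependency structure, not an actual parallel or SIMD implementation.

```python
def lcs_length_wavefront(a, b):
    n, m = len(a), len(b)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    # k = i + j identifies one anti-diagonal of the table.
    for k in range(2, n + m + 1):
        # Every (i, j) on this diagonal reads only diagonals k - 1 and k - 2,
        # so this inner loop is the part that could be parallelized.
        for i in range(max(1, k - m), min(n, k - 1) + 1):
            j = k - i
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[n][m]


print(lcs_length_wavefront("AGGTAB", "GXTXAYB"))
```

The awkward part in practice is that diagonals have varying length and stride through memory, which is likely why this is described as "wonky" in the thread.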
@AndersonPEM 21 days ago
[Tries with Rust] the result shows up before you even start the program 😂
@janAkaliKilo 1 month ago
Another option - learn Nim. It is an easy-to-learn language with a Pythonic syntax. Because Nim is a compiled language, its speed is on par with C, C++ and Rust.
@dougmercer 1 month ago
I've been meaning to give it a shot... It definitely seems very approachable
@codingwithmagic 6 months ago
Can you give me the name of the website for making coding animations like the ones where you change words? ❤❤
@dougmercer 6 months ago
Can you link to a time code in the video to point to which animation you're specifically interested in? In this video, I used
1. Davinci Resolve (main editor).
2. manim's code object, docs.manim.community/en/stable/reference/manim.mobject.text.code_mobject.Code.html
Manim is kind of a pain, though, and you can't automatically animate between two pieces of code.
@codingwithmagic 6 months ago
@dougmercer Yes, like the code writing itself automatically at 3:47 in the video
@dougmercer 6 months ago
Gotcha. I made that animation with manim. For a quick gist,
1. Create two code objects, a and b, where a is the code before the change and b is the code after.
2. Initialize the manim scene with code a.
3. Animate the movement of the comma and parenthesis in code a to their corresponding position in code b.
4. I (believe I) used AddTextLetterByLetter to write on the text.
@dougmercer 2 months ago
Hey @codingwithmagic-- you can also look into reveal.js, which supports "auto-animate" for animating the difference between code. It seems to work well for small snippets, but doesn't have an easy way to display a large blob of code. Would also be a little annoying to capture into video, cause it basically generates a webpage.
@budidarmawan6959 1 month ago
this is a very nice video.
@dougmercer 1 month ago
Thanks =]
@sootguy 1 month ago
What about PyPy?
@dougmercer 1 month ago
I'm working on a video that uses it right now =]
@Uveryahi 1 month ago
Came for the video, stayed for the stock footage inserts x)
@dougmercer 1 month ago
=] I also used Nosferatu in my other video called "Your code is almost entirely untested"... I wonder what it means that I keep putting horror movie clips into my Python explainers 🤔
@Erros 1 month ago
The speed up at 2:26 is a funnier number than 100x, but also much lower: 2:56 minutes -> 176 seconds / 2.56 seconds
@dougmercer 1 month ago
Ah, the clock visualization is confusing. The vanilla Python approach did take 256 seconds, not 2 minutes 56 seconds.
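The arithmetic behind the two readings above, written out as a short check. The variable names are illustrative; the numbers are the ones quoted in this thread.

```python
baseline_seconds = 256.0          # vanilla Python runtime, per the reply above
optimized_seconds = 2.56          # faster runtime quoted in the thread

speedup = baseline_seconds / optimized_seconds

# Misreading the clock as 2 min 56 s gives a smaller ratio.
misread_seconds = 2 * 60 + 56     # 176 seconds
misread_speedup = misread_seconds / optimized_seconds

print(round(speedup), misread_speedup)
```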
@demonman1234 1 month ago
Yk.. I’m not a software dev or anything for companies and I code for either myself or requests from friends… I’ll wait or they can wait for my program to finish (: I get enough headaches as it is for absolutely no logical reason… no need for another (:
@dougmercer 1 month ago
Hah, I think that's a perfectly good approach =]. I'll take fast to write over fast to run on most days
@demonman1234 1 month ago
@dougmercer Exactly… plus this just seems like too much of a hassle for me. LOL my programs aren’t typically big enough for it to matter.
@francescotomba1350 1 month ago
Did you compile with -O3 in C++?
@dougmercer 1 month ago
Yup! gist.github.com/dougmercer/1a0fab15abf45d836c2290b98e6c1cd3 Some people in the comments have gotten between 1.1-1.7x speed up through other improvements, but it doesn't really change the narrative much: these compiled Python tools frequently give good enough performance
@francescotomba1350 1 month ago
@dougmercer thank you! I think it is really problem dependent. In some codebases I worked on, I got, for example, a 40x speed up over Cython or Numba by embedding very, very small pure C functions using ctypes.
@dougmercer 1 month ago
Oh definitely agree. Squeezing out performance is always "it depends" and "did you profile it?"
@francescotomba1350 1 month ago
Yes, in my case there were two issues. The first was that Cython relies on the Python interpreter for some things if data and objects are not managed in the most Cythonic way; the second was cache misses. I was working on a kd-tree implementation, and a tiny detail in how nodes are managed let me cut down on cache misses during tree traversal. For that purpose I used perf to sample from the process, but I know for sure that there are many other options for doing that.
@francescotomba1350 1 month ago
@dougmercer Moreover, Numba is a life saver if you need performance on the fly without many refactors.
@monza8844 4 months ago
Meh... I'm just going to learn a language that is fast, instead of dealing with this hassle.
@B_a_s_t_e_r_b_i_n_e 3 months ago
Well, if you work with special cases like I do, it's better to deal with this hassle instead of learning another language lmao.
@JonitoFischer 2 months ago
Not everything has to run fast. Maybe running slower and spending less time on development is the way to go.
@HuxleysShaggyDog 2 months ago
Is it academics you work with or libraries nobody ported to other languages or made into DLLs for interop?
@anandsuralkar2947 1 month ago
Good luck making a website in C++ or JS compared to Python
@HuxleysShaggyDog 1 month ago
@anandsuralkar2947 It's really not that hard, MVC frameworks exist for everything.
@ShanilPanara 7 months ago
So good
@dougmercer 7 months ago
Thanks =]
@Angel33Demon666 27 days ago
How does this compare with Julia? I found that it's fast right out of the box
@dougmercer 26 days ago
I didn't try Julia, but I've used it a bit in the past and it is quite fast. In a future video, I'd like to throw Julia and Nim into the mix
@dearheart2 20 days ago
I wish all videos (not just YouTube) had voice and music as separate channels. I hate music in educational videos.
@Gardenmonkey78 4 months ago
Numba is super cool, you can also parallelize super easily
@dougmercer 4 months ago
Absolutely, I ❤️ Numba
@encapsulatio 11 months ago
Glad you are back! And then there's Mojo, the one that will swallow Python in a serpently fashion. It's basically Python++, the Python superset.
@dougmercer 11 months ago
Thanks! =] I think Mojo is very cool. That said, from what I know, I believe their license was restrictive for commercial use? Maybe I'll eventually do a follow-up video on it and the other proprietary Python superset that I'm failing to recall the name of if this video does well. I also skipped over PyPy, for the sole reason that it failed to install/run on my laptop. ¯\_(ツ)_/¯
@eliavrad2845 11 months ago
If Python++ was that good, Cython would already be the big thing. I feel like this approach suffers from the worst of both worlds: it's harder to understand how a program works compared to Python, so nobody uses it instead of Python, and it's harder to optimize than C++, so nobody uses it instead of C++.
@SENYSENofficial 6 months ago
Excellent! I will learn C++
@dougmercer 6 months ago
Can't go wrong with C++!
@cucen24601 1 month ago
"Numba is so much easier than Cython" In reality it is so much more painful to code in Numba, and it doesn't really work very well. At least with Cython, if I code it correctly, it works correctly. Numba doesn't let me do things the way I expect...
@dougmercer 1 month ago
That is fair. It works well when you already mostly (or entirely) use supported features, and is incredibly painful when you don't.
@cucen24601 1 month ago
@dougmercer Yup. Thank you for the great content by the way. I wasn't paying much attention to Numba and thought Cython was as fast as it could get, but the results are shocking.
@dougmercer 1 month ago
There might have been some Cython tricks that I missed. I'm definitely not an expert, but I found it really hard to get up to Numba speeds.