Spark + Parquet In Depth: Spark Summit East talk by Emily Curtin and Robbie Strickland

69,095 views

Spark Summit

7 years ago

COMMENTS: 24
@stephaniedatabricksrivera 3 years ago
Emily's Parkay butter pics made me laugh. Really enjoyed this. Great job Emily!!
@flwi 6 years ago
Wow, great presentation!
@manjunath15 5 years ago
Very informative and nicely articulated.
@maa1dz1333q2eqER 5 years ago
Great presentation, touched a lot of important areas, thanks
@HasanAmmori 2 years ago
Fantastic talk! I wish there was a little more info on the format spec itself.
@Tomracc 2 years ago
this is wonderful, enjoyed start to end :)
@TheAjit1111 4 years ago
Great talk, Thank you
@tianzhang3120 3 years ago
Awesome presentation!
@gmetrofun 4 years ago
AWS S3 supports random-access reads (i.e., the HTTP Range header), so predicate pushdown also works against Parquet files stored on S3.
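To illustrate the point: because S3 serves byte ranges, a Parquet reader can fetch just the footer plus the column chunks that survive pruning, instead of the whole file. A minimal sketch of checking that pushdown kicks in when reading from S3 (the bucket path and column names are hypothetical):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().appName("pushdown-demo").getOrCreate()

// Hypothetical dataset on S3; the s3a connector issues HTTP Range
// requests, so only the footer and needed column chunks are fetched.
val df = spark.read.parquet("s3a://my-bucket/events/")

// Column pruning keeps only `userId`; row groups whose min/max
// statistics exclude the predicate are skipped entirely.
df.select("userId")
  .filter(col("eventDate") === "2017-02-08")
  .explain()  // inspect the plan for PushedFilters: [...]
```

The `explain()` output is the quickest way to confirm which filters were actually pushed down to the Parquet scan rather than applied after the read.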
@bnsagar90 3 years ago
Can you please share some text or a link where I can read more about this? Thanks.
@betterwithrum 5 years ago
Where are the slides?
@bogdandubas3978 3 years ago
Amazing speaker!
@amitbhattacharyya5925 2 years ago
Good explanations. It would be great if they could also point to some example code in a Git repo.
@djibb.7876 6 years ago
Great talk!!! I set up a Spark cluster with 2 workers. I save a DataFrame using partitionBy("column x") in Parquet format to a path on each worker. I'm able to save it, but if I want to read it back I get these errors: "Could not read footer for file FileStatus...", "unable to specify Schema...". Any suggestions?
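A likely cause of those footer errors is writing to a worker-local path: each executor then writes its part-files to its own local disk, so no single machine sees a complete dataset, and the read finds directories with missing footers. A minimal sketch of the usual fix, writing to shared storage instead (paths and the partition column name are hypothetical):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("partitioned-write").getOrCreate()

// Hypothetical input; any DataFrame works here.
val df = spark.read.parquet("hdfs:///data/input")

// Write to storage every node can reach (HDFS, S3, NFS), not file:///...
// With a local path, each worker keeps only its own part-files, and a
// later read from the driver sees an incomplete, unreadable dataset.
df.write
  .partitionBy("columnX")  // hypothetical partition column
  .parquet("hdfs:///data/output")

// Reading back from the shared path rebuilds the partition column
// from the date=... style directory names.
val restored = spark.read.parquet("hdfs:///data/output")
```

If shared storage really isn't available, the data has to be collected to one filesystem before it can be read back as a single Parquet dataset.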
@clray123 6 years ago
Eh, so basically any sort of growing data can only be partitioned one way (along the dimension of growth, which for many use cases will be some meaningless "autoincrement" id). That defeats all the push-down filtering for any other dimension. Not to mention that if your data keeps growing in small increments and you need access to the latest of it, you will have to jump through hoops to somehow integrate all those small increments into bigger files, because scanning 20,000 tiny files isn't going to be efficient. And that means lots of constant rewriting, which is why write speed DOES matter; it's not "write-once" but write-many...
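The small-increments problem described above is usually handled with a periodic compaction job: read one partition's many tiny files and rewrite them as a few large ones. A hedged sketch of that pattern (paths and the target file count are hypothetical and workload-dependent):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("compaction").getOrCreate()

// Read the many small incremental files for one partition...
val tiny = spark.read.parquet("s3a://bucket/events/date=2017-02-08")

// ...and rewrite them as a handful of larger files. coalesce(4)
// reduces the file count without a full shuffle; pick the count so
// each output file holds at least one full Parquet row group.
tiny.coalesce(4)
  .write
  .mode("overwrite")
  .parquet("s3a://bucket/events_compacted/date=2017-02-08")
```

Writing to a separate compacted path (rather than overwriting the path being read) avoids Spark's read-overwrite-same-location pitfall; readers can be switched over once the rewrite succeeds.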
@HughMcBrideDonegalFlyer 7 years ago
Great talk on a very important (and too often overlooked) topic
@ardenjar7942 7 years ago
Awesome thanks!
@thomasgong5538 4 years ago
Quite useful as a learning guide.
@deenadayalmuli2756 6 years ago
In my experience, ORC supports nesting too...
@pradeep422 5 years ago
The only thing I liked is the way Emily executed it.
@mikecmw8492 5 years ago
Why is everyone a "spark expert"?? Get real and just show us how to do it...
@betterwithrum 5 years ago
There are Spark experts, just few and far between. I've hired a few, but they were unicorns.