Spark + Parquet In Depth: Spark Summit East talk by Emily Curtin and Robbie Strickland

69,095 views

Spark Summit

7 years ago

COMMENTS: 24
@stephaniedatabricksrivera 3 years ago
Emily's Parkay butter pics made me laugh. Really enjoyed this. Great job Emily!!
@flwi 6 years ago
Wow, great presentation!
@manjunath15 5 years ago
Very informative and nicely articulated.
@maa1dz1333q2eqER 5 years ago
Great presentation, touched a lot of important areas, thanks
@HasanAmmori 2 years ago
Fantastic talk! I wish there was a little more info on the format spec itself.
@Tomracc 2 years ago
this is wonderful, enjoyed start to end :)
@TheAjit1111 4 years ago
Great talk, Thank you
@tianzhang3120 3 years ago
Awesome presentation!
@gmetrofun 4 years ago
AWS S3 supports random-access reads (i.e., the HTTP Range header), so predicate pushdown also works against Parquet files stored on S3.
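To illustrate the point: because S3 serves byte ranges, a Parquet reader can fetch just the footer plus the column chunks that survive pruning, instead of the whole file. A minimal sketch of checking that pushdown kicks in when reading from S3 (the bucket path and column names are hypothetical):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().appName("pushdown-demo").getOrCreate()

// Hypothetical dataset on S3; the s3a connector issues HTTP Range
// requests, so only the footer and needed column chunks are fetched.
val df = spark.read.parquet("s3a://my-bucket/events/")

// Column pruning keeps only `userId`; row groups whose min/max
// statistics exclude the predicate are skipped entirely.
df.select("userId")
  .filter(col("eventDate") === "2017-02-08")
  .explain()  // inspect the plan for PushedFilters: [...]
```

The `explain()` output is the quickest way to confirm which filters were actually pushed down to the Parquet scan rather than applied after the read.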
@bnsagar90 3 years ago
Can you please share some text or a link where I can read more about this? Thanks.
@betterwithrum 5 years ago
Where are the slides?
@bogdandubas3978 3 years ago
Amazing speaker!
@amitbhattacharyya5925 2 years ago
Good explanations. It would be great if they could also point to some example code in a Git repo.
@djibb.7876 6 years ago
Great talk!!! I set up a Spark cluster with 2 workers. I save a DataFrame using partitionBy("column x") in Parquet format to a path on each worker. I'm able to save it, but if I want to read it back I get these errors: "Could not read footer for file FileStatus...", "unable to specify Schema...". Any suggestions?
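A likely cause of those footer errors is writing to a worker-local path: each executor then writes its part-files to its own local disk, so no single machine sees a complete dataset, and the read finds directories with missing footers. A minimal sketch of the usual fix, writing to shared storage instead (paths and the partition column name are hypothetical):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("partitioned-write").getOrCreate()

// Hypothetical input; any DataFrame works here.
val df = spark.read.parquet("hdfs:///data/input")

// Write to storage every node can reach (HDFS, S3, NFS), not file:///...
// With a local path, each worker keeps only its own part-files, and a
// later read from the driver sees an incomplete, unreadable dataset.
df.write
  .partitionBy("columnX")  // hypothetical partition column
  .parquet("hdfs:///data/output")

// Reading back from the shared path rebuilds the partition column
// from the date=... style directory names.
val restored = spark.read.parquet("hdfs:///data/output")
```

If shared storage really isn't available, the data has to be collected to one filesystem before it can be read back as a single Parquet dataset.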
@clray123 6 years ago
Eh, so basically any sort of growing data can only be partitioned one way (along the dimension of growth, which for many use cases will be some meaningless "autoincrement" id). That defeats all the push-down filtering for any other dimension. Not to mention that if your data keeps growing in small increments and you need access to the latest of it, you will have to jump through hoops to somehow integrate all those small increments into bigger files, because scanning 20,000 tiny files isn't going to be efficient. And that means lots of constant rewriting, which is why write speed DOES matter; it's not "write-once" but write-many...
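The small-increments problem described above is usually handled with a periodic compaction job: read one partition's many tiny files and rewrite them as a few large ones. A hedged sketch of that pattern (paths and the target file count are hypothetical and workload-dependent):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("compaction").getOrCreate()

// Read the many small incremental files for one partition...
val tiny = spark.read.parquet("s3a://bucket/events/date=2017-02-08")

// ...and rewrite them as a handful of larger files. coalesce(4)
// reduces the file count without a full shuffle; pick the count so
// each output file holds at least one full Parquet row group.
tiny.coalesce(4)
  .write
  .mode("overwrite")
  .parquet("s3a://bucket/events_compacted/date=2017-02-08")
```

Writing to a separate compacted path (rather than overwriting the path being read) avoids Spark's read-overwrite-same-location pitfall; readers can be switched over once the rewrite succeeds.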
@HughMcBrideDonegalFlyer 7 years ago
Great talk on a very important (and too often overlooked) topic
@ardenjar7942 7 years ago
Awesome thanks!
@thomasgong5538 4 years ago
Quite useful as a learning guide.
@deenadayalmuli2756 6 years ago
In my experience, ORC supports nesting too...
@pradeep422 5 years ago
The only thing I liked is the way Emily executed it.
@mikecmw8492 5 years ago
Why is everyone a "spark expert"?? Get real and just show us how to do it...
@betterwithrum 5 years ago
There are Spark experts, just few and far between. I've hired a few, but they were unicorns.