Discover proven techniques to create testable, immutable, and easily parallelizable Spark jobs
Video Details
ISBN 9781789801125
Course Length 2 hours 26 minutes
Table of Contents
• TRANSFORMATIONS AND ACTIONS
• IMMUTABLE DESIGN
• AVOID SHUFFLE AND REDUCE OPERATIONAL EXPENSES
• SAVING DATA IN THE CORRECT FORMAT
• WORKING WITH SPARK KEY/VALUE API
• TESTING APACHE SPARK JOBS
• LEVERAGING SPARK GRAPHX API
Video Description
Apache Spark has been around for quite some time, but do you really know how to get the most out of it? This course explores many aspects of Spark, including some you may never have heard of.
In this course you'll learn to implement practical, proven techniques that improve particular aspects of programming and administration in Apache Spark. The 7 sections address different aspects of Spark through 5 specific techniques, with clear, hands-on instructions for carrying out different Apache Spark tasks. The techniques are demonstrated using practical examples and best practices.
By the end of this course, you will have learned some exciting tips, best practices, and techniques with Apache Spark. You will be able to perform tasks and get the best data out of your databases much faster and with ease.
Style and Approach
This fast-paced, step-by-step guide takes a practical approach: you will learn different techniques for optimizing your testing time, speed, and results, take your skills to the next level, and get up and running with Spark.
What You Will Learn
• Compose Spark jobs from actions and transformations
• Create highly concurrent Spark programs by leveraging immutability
• Avoid the most expensive operation in the Spark API: the shuffle
• Save data for further processing by picking the proper format for data saved by Spark
• Parallelize keyed data; learn how to use Spark's Key/Value API
• Re-design your jobs to use reduceByKey instead of groupBy
• Create robust processing pipelines by testing Apache Spark jobs
• Solve repeated problems by leveraging the GraphX API
Authors
Tomasz Lelek
Tomasz Lelek is a Software Engineer, programming mostly in Java and Scala. He is a fan of microservice architectures and functional programming. He dedicates considerable time and effort to getting better every day. He is passionate about nearly everything associated with software development, and believes that we should always try to consider different solutions and approaches before solving a problem. Recently he spoke at the Confitura and JDD (Java Developers Day) conferences in Poland, and at the Krakow Scala User Group. He has also conducted a live coding session at the Geecon conference.
FILE LIST
Filename
Size
01.Transformations and Actions/0101.The Course Overview.mp4
7.6 MB
01.Transformations and Actions/0102.Using Spark Transformations to Defer Computations to a Later Time.mp4
17.3 MB
01.Transformations and Actions/0103.Avoiding Transformations.mp4
14 MB
01.Transformations and Actions/0104.Using reduce and reduceByKey to Calculate Results.mp4
18.4 MB
01.Transformations and Actions/0105.Performing Actions That Trigger Computations.mp4
16.9 MB
01.Transformations and Actions/0106.Reusing the Same RDD for Different Actions.mp4
13.4 MB
02.Immutable Design/0201.Delve into Spark RDDs ParentChild Chain.mp4
22.3 MB
02.Immutable Design/0202.Using RDD in an Immutable Way.mp4
12.1 MB
02.Immutable Design/0203.Using DataFrame Operations to Transform It.mp4
13.6 MB
02.Immutable Design/0204.Immutability in the Highly Concurrent Environment.mp4
16.9 MB
02.Immutable Design/0205.Using Dataset API in an Immutable Way.mp4
11.8 MB
03.Avoid Shuffle and Reduce Operational Expenses/0301.Detecting a Shuffle in a Processing.mp4
16.8 MB
03.Avoid Shuffle and Reduce Operational Expenses/0302.Testing Operations That Cause Shuffle in Apache Spark.mp4
15 MB
03.Avoid Shuffle and Reduce Operational Expenses/0303.Changing Design of Jobs with Wide Dependencies.mp4
11.7 MB
03.Avoid Shuffle and Reduce Operational Expenses/0304.Using keyBy() Operations to Reduce Shuffle.mp4
15.3 MB
03.Avoid Shuffle and Reduce Operational Expenses/0305.Using Custom Partitioner to Reduce Shuffle.mp4
12.9 MB
04.Saving Data in the Correct Format/0401.Saving Data in Plain Text.mp4
20.8 MB
04.Saving Data in the Correct Format/0402.Leveraging JSON as a Data Format.mp4
15.3 MB
04.Saving Data in the Correct Format/0403.Tabular Formats – CSV.mp4
13.2 MB
04.Saving Data in the Correct Format/0404.Using Avro with Spark.mp4
15.6 MB
04.Saving Data in the Correct Format/0405.Columnar Formats – Parquet.mp4
13.6 MB
05.Working with Spark KeyValue API/0501.Available Transformations on KeyValue Pairs.mp4
18 MB
05.Working with Spark KeyValue API/0502.Using aggregateByKey Instead of groupBy().mp4
17.8 MB
05.Working with Spark KeyValue API/0503.Actions on KeyValue Pairs.mp4
13.2 MB
05.Working with Spark KeyValue API/0504.Available Partitioners on KeyValue Data.mp4
23.1 MB
05.Working with Spark KeyValue API/0505.Implementing Custom Partitioner.mp4
18.3 MB
06.Testing Apache Spark Jobs/0601.Separating Logic from Spark Engine – Unit Testing.mp4
17.4 MB
06.Testing Apache Spark Jobs/0602.Integration Testing Using SparkSession.mp4
12.5 MB
06.Testing Apache Spark Jobs/0603.Mocking Data Sources Using Partial Functions.mp4
15.4 MB
06.Testing Apache Spark Jobs/0604.Using ScalaCheck for Property-Based Testing.mp4
14.3 MB
06.Testing Apache Spark Jobs/0605.Testing in Different Versions of Spark.mp4
12.7 MB
07.Leveraging Spark GraphX API/0701.Creating Graph from Datasource.mp4