The code block shown below should set the number of partitions that Spark uses when shuffling data for joins or aggregations to 100. Choose the answer that correctly fills the blanks in the code block to accomplish this.
spark.sql.shuffle.partitions
__1__.__2__.__3__(__4__, 100)
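For reference, one way to set this property at runtime is sketched below; it is only an illustration and assumes an active SparkSession bound to the name spark.
# Set the number of partitions used when shuffling data for joins or aggregations.
spark.conf.set("spark.sql.shuffle.partitions", 100)
# Optionally verify the new value (returned as a string).
print(spark.conf.get("spark.sql.shuffle.partitions"))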
Which of the following describes a difference between Spark's cluster and client execution modes?
The code block displayed below contains an error. The code block should write DataFrame transactionsDf as a parquet file to location filePath after partitioning it on column storeId. Find the error.
Code block:
transactionsDf.write.partitionOn("storeId").parquet(filePath)
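For comparison, a working variant of this write is sketched below; it assumes transactionsDf and filePath are defined as in the question and uses the DataFrameWriter method partitionBy, since partitionOn does not exist.
# Partition the output on storeId, then write it to filePath in parquet format.
transactionsDf.write.partitionBy("storeId").parquet(filePath)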
Which of the following code blocks performs an inner join between DataFrame itemsDf and DataFrame transactionsDf, using columns itemId and transactionId as join keys, respectively?
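One possible shape of such a join is sketched below; the DataFrame and column names are taken from the question, and "inner" is spelled out even though it is the default join type.
# Inner join itemsDf and transactionsDf on itemsDf.itemId == transactionsDf.transactionId.
joinedDf = itemsDf.join(transactionsDf, itemsDf.itemId == transactionsDf.transactionId, "inner")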
Which of the following code blocks prints out the number of rows in which the string "Inc." appears in the string-type column supplier of DataFrame itemsDf?
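A minimal sketch of one way to compute such a count is shown below; itemsDf and the column supplier come from the question.
from pyspark.sql.functions import col

# Count the rows whose supplier value contains the substring "Inc.", then print the result.
print(itemsDf.filter(col("supplier").contains("Inc.")).count())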
Which of the following code blocks performs an inner join of DataFrames transactionsDf and itemsDf on columns productId and itemId, respectively, excluding columns value and storeId from DataFrame transactionsDf and column attributes from DataFrame itemsDf?
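One way to express this is sketched below; it drops the unwanted columns before joining and is only an illustration, not necessarily the exam's expected answer.
from pyspark.sql.functions import col

# Drop value and storeId from transactionsDf and attributes from itemsDf,
# then perform the (default) inner join on productId == itemId.
resultDf = transactionsDf.drop("value", "storeId").join(
    itemsDf.drop("attributes"),
    col("productId") == col("itemId"),
)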
Which of the following code blocks displays various aggregated statistics of all columns in DataFrame transactionsDf, including the standard deviation and minimum of values in each column?
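As context, describe() produces exactly this kind of summary (count, mean, stddev, min, and max per column); the sketch below assumes transactionsDf from the question.
# Compute summary statistics for all columns and print them.
transactionsDf.describe().show()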
Which of the following describes the most efficient way to reduce the number of partitions of a DataFrame from 16 to 8?
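The key idea is that reducing the partition count does not require a full shuffle; a minimal sketch, assuming transactionsDf currently has 16 partitions:
# coalesce() merges existing partitions without a full shuffle,
# which makes it cheaper than repartition() when shrinking a DataFrame.
resizedDf = transactionsDf.coalesce(8)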
Which of the following code blocks produces the following output, given DataFrame transactionsDf?
Output:
root
 |-- transactionId: integer (nullable = true)
 |-- predError: integer (nullable = true)
 |-- value: integer (nullable = true)
 |-- storeId: integer (nullable = true)
 |-- productId: integer (nullable = true)
 |-- f: integer (nullable = true)
DataFrame transactionsDf:
+-------------+---------+-----+-------+---------+----+
|transactionId|predError|value|storeId|productId|   f|
+-------------+---------+-----+-------+---------+----+
|            1|        3|    4|     25|        1|null|
|            2|        6|    7|      2|        2|null|
|            3|        3| null|     25|        3|null|
+-------------+---------+-----+-------+---------+----+
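For context, output in this tree format is what printSchema() produces; a minimal sketch using the DataFrame from the question:
# Print the schema of transactionsDf as the tree shown in the expected output.
transactionsDf.printSchema()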
Which of the following statements about Spark's configuration properties is incorrect?
The code block shown below should write DataFrame transactionsDf as a parquet file to path storeDir, using brotli compression and replacing any previously existing file. Choose the answer that correctly fills the blanks in the code block to accomplish this.
transactionsDf.__1__.format("parquet").__2__(__3__).option(__4__, "brotli").__5__(storeDir)
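One way the completed write might look is sketched below; it is only an illustration (the blank layout may differ) and assumes the brotli codec is available on the cluster.
# Overwrite any existing data at storeDir and compress the parquet files with brotli.
(transactionsDf.write
    .format("parquet")
    .mode("overwrite")
    .option("compression", "brotli")
    .save(storeDir))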
Which of the following code blocks writes DataFrame itemsDf to disk at storage location filePath, making sure to substitute any existing data at that location?
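A minimal sketch of such a write, assuming parquet as the output format (the question does not name one), is shown below.
# mode("overwrite") replaces any data already present at filePath.
itemsDf.write.mode("overwrite").parquet(filePath)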
The code block displayed below contains an error. The code block should arrange the rows of DataFrame transactionsDf using information from two columns: first by column value in ascending order (smaller numbers at the top, greater numbers at the bottom), and then by column predError in the opposite (descending) order. Find the error.
Code block:
transactionsDf.orderBy('value', asc_nulls_first(col('predError')))
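For comparison, one way to express the described ordering is sketched below; it is an illustration, not necessarily the exam's expected answer.
from pyspark.sql.functions import asc, desc

# Sort ascending by value, then descending by predError.
sortedDf = transactionsDf.orderBy(asc("value"), desc("predError"))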
The code block shown below should return a copy of DataFrame transactionsDf without columns value and productId and with an additional column associateId that has the value 5. Choose the answer that correctly fills the blanks in the code block to accomplish this.
transactionsDf.__1__(__2__, __3__).__4__(__5__, 'value')
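One way to accomplish the described transformation is sketched below; it does not necessarily mirror the blank layout above.
from pyspark.sql.functions import lit

# Add a constant column associateId with the value 5, then drop productId and value.
resultDf = transactionsDf.withColumn("associateId", lit(5)).drop("productId", "value")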
Which of the following code blocks returns a new DataFrame in which column attributes of DataFrame itemsDf is renamed to feature0 and column supplier to feature1?
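A minimal sketch of one way to perform both renames, using the names from the question:
# Rename attributes to feature0 and supplier to feature1.
renamedDf = (itemsDf
    .withColumnRenamed("attributes", "feature0")
    .withColumnRenamed("supplier", "feature1"))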
The code block shown below should return an exact copy of DataFrame transactionsDf that does not include rows in which values in column storeId have the value 25. Choose the answer that correctly fills the blanks in the code block to accomplish this.
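One possible filter is sketched below; it keeps every row whose storeId is not 25 and is only an illustration.
from pyspark.sql.functions import col

# Keep only the rows in which storeId does not equal 25.
filteredDf = transactionsDf.filter(col("storeId") != 25)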
The code block displayed below contains an error. The code block should configure Spark so that DataFrames up to a size of 20 MB will be broadcast to all worker nodes when performing a join. Find the error.
Code block:
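The question's code block is not included in this excerpt. As general context only (this is not the missing code block), the relevant setting can be configured as sketched below, assuming an active SparkSession named spark.
# Broadcast DataFrames up to 20 MB (the value is interpreted in bytes) when joining.
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", 20 * 1024 * 1024)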