Summer Special Sale - Limited Time 60% Discount Offer - Ends in 0d 00h 00m 00s - Coupon code: 575363r9

Welcome To DumpsPedia

MLS-C01 Sample Questions Answers

Questions 4

A data engineer at a bank is evaluating a new tabular dataset that includes customer data. The data engineer will use the customer data to create a new model to predict customer behavior. After creating a correlation matrix for the variables, the data engineer notices that many of the 100 features are highly correlated with each other.

Which steps should the data engineer take to address this issue? (Choose two.)

Options:

A.

Use a linear-based algorithm to train the model.

B.

Apply principal component analysis (PCA).

C.

Remove a portion of highly correlated features from the dataset.

D.

Apply min-max feature scaling to the dataset.

E.

Apply one-hot encoding category-based variables.

Buy Now
Questions 5

A Data Scientist needs to migrate an existing on-premises ETL process to the cloud The current process runs at regular time intervals and uses PySpark to combine and format multiple large data sources into a single consolidated output for downstream processing

The Data Scientist has been given the following requirements for the cloud solution

* Combine multiple data sources

* Reuse existing PySpark logic

* Run the solution on the existing schedule

* Minimize the number of servers that will need to be managed

Which architecture should the Data Scientist use to build this solution?

Options:

A.

Write the raw data to Amazon S3 Schedule an AWS Lambda function to submit a Spark step to a persistent Amazon EMR cluster based on the existing schedule Use the existing PySpark logic to run the ETL job on the EMR cluster Output the results to a "processed" location m Amazon S3 that is accessible tor downstream use

B.

Write the raw data to Amazon S3 Create an AWS Glue ETL job to perform the ETL processing against the input data Write the ETL job in PySpark to leverage the existing logic Create a new AWS Glue trigger to trigger the ETL job based on the existing schedule Configure the output target of the ETL job to write to a "processed" location in Amazon S3 that is accessible for downstream use.

C.

Write the raw data to Amazon S3 Schedule an AWS Lambda function to run on the existing schedule and process the input data from Amazon S3 Write the Lambda logic in Python and implement the existing PySpartc logic to perform the ETL process Have the Lambda function output the results to a "processed" location in Amazon S3 that is accessible for downstream use

D.

Use Amazon Kinesis Data Analytics to stream the input data and perform realtime SQL queries against the stream to carry out the required transformations within the stream Deliver the output results to a "processed" location in Amazon S3 that is accessible for downstream use

Buy Now
Questions 6

A Machine Learning Specialist is developing a custom video recommendation model for an application The dataset used to train this model is very large with millions of data points and is hosted in an Amazon S3 bucket The Specialist wants to avoid loading all of this data onto an Amazon SageMaker notebook instance because it would take hours to move and will exceed the attached 5 GB Amazon EBS volume on the notebook instance.

Which approach allows the Specialist to use all the data to train the model?

Options:

A.

Load a smaller subset of the data into the SageMaker notebook and train locally. Confirm that the training

code is executing and the model parameters seem reasonable. Initiate a SageMaker training job using the

full dataset from the S3 bucket using Pipe input mode.

B.

Launch an Amazon EC2 instance with an AWS Deep Learning AMI and attach the S3 bucket to the

instance. Train on a small amount of the data to verify the training code and hyperparameters. Go back to

Amazon SageMaker and train using the full dataset

C.

Use AWS Glue to train a model using a small subset of the data to confirm that the data will be compatible

with Amazon SageMaker. Initiate a SageMaker training job using the full dataset from the S3 bucket using

Pipe input mode.

D.

Load a smaller subset of the data into the SageMaker notebook and train locally. Confirm that the training

code is executing and the model parameters seem reasonable. Launch an Amazon EC2 instance with an

AWS Deep Learning AMI and attach the S3 bucket to train the full dataset.

Buy Now
Questions 7

A company is building a line-counting application for use in a quick-service restaurant. The company wants to use video cameras pointed at the line of customers at a given register to measure how many people are in line and deliver notifications to managers if the line grows too long. The restaurant locations have limited bandwidth for connections to external services and cannot accommodate multiple video streams without impacting other operations.

Which solution should a machine learning specialist implement to meet these requirements?

Options:

A.

Install cameras compatible with Amazon Kinesis Video Streams to stream the data to AWS over the restaurant's existing internet connection. Write an AWS Lambda function to take an image and send it to Amazon Rekognition to count the number of faces in the image. Send an Amazon Simple Notification Service (Amazon SNS) notification if the line is too long.

B.

Deploy AWS DeepLens cameras in the restaurant to capture video. Enable Amazon Rekognition on the AWS DeepLens device, and use it to trigger a local AWS Lambda function when a person is recognized. Use the Lambda function to send an Amazon Simple Notification Service (Amazon SNS) notification if the line is too long.

C.

Build a custom model in Amazon SageMaker to recognize the number of people in an image. Install cameras compatible with Amazon Kinesis Video Streams in the restaurant. Write an AWS Lambda function to take an image. Use the SageMaker endpoint to call the model to count people. Send an Amazon Simple Notification Service (Amazon SNS) notification if the line is too long.

D.

Build a custom model in Amazon SageMaker to recognize the number of people in an image. Deploy AWS DeepLens cameras in the restaurant. Deploy the model to the cameras. Deploy an AWS Lambda function to the cameras to use the model to count people and send an Amazon Simple Notification Service (Amazon SNS) notification if the line is too long.

Buy Now
Questions 8

A data scientist is training a large PyTorch model by using Amazon SageMaker. It takes 10 hours on average to train the model on GPU instances. The data scientist suspects that training is not converging and that

resource utilization is not optimal.

What should the data scientist do to identify and address training issues with the LEAST development effort?

Options:

A.

Use CPU utilization metrics that are captured in Amazon CloudWatch. Configure a CloudWatch alarm to stop the training job early if low CPU utilization occurs.

B.

Use high-resolution custom metrics that are captured in Amazon CloudWatch. Configure an AWS Lambda function to analyze the metrics and to stop the training job early if issues are detected.

C.

Use the SageMaker Debugger vanishing_gradient and LowGPUUtilization built-in rules to detect issues and to launch the StopTrainingJob action if issues are detected.

D.

Use the SageMaker Debugger confusion and feature_importance_overweight built-in rules to detect issues and to launch the StopTrainingJob action if issues are detected.

Buy Now
Questions 9

A company is converting a large number of unstructured paper receipts into images. The company wants to create a model based on natural language processing (NLP) to find relevant entities such as date, location, and notes, as well as some custom entities such as receipt numbers.

The company is using optical character recognition (OCR) to extract text for data labeling. However, documents are in different structures and formats, and the company is facing challenges with setting up the manual workflows for each document type. Additionally, the company trained a named entity recognition (NER) model for custom entity detection using a small sample size. This model has a very low confidence score and will require retraining with a large dataset.

Which solution for text extraction and entity detection will require the LEAST amount of effort?

Options:

A.

Extract text from receipt images by using Amazon Textract. Use the Amazon SageMaker BlazingText algorithm to train on the text for entities and custom entities.

B.

Extract text from receipt images by using a deep learning OCR model from the AWS Marketplace. Use the NER deep learning model to extract entities.

C.

Extract text from receipt images by using Amazon Textract. Use Amazon Comprehend for entity detection, and use Amazon Comprehend custom entity recognition for custom entity detection.

D.

Extract text from receipt images by using a deep learning OCR model from the AWS Marketplace. Use Amazon Comprehend for entity detection, and use Amazon Comprehend custom entity recognition for custom entity detection.

Buy Now
Questions 10

A machine learning (ML) specialist wants to secure calls to the Amazon SageMaker Service API. The specialist has configured Amazon VPC with a VPC interface endpoint for the Amazon SageMaker Service API and is attempting to secure traffic from specific sets of instances and IAM users. The VPC is configured with a single public subnet.

Which combination of steps should the ML specialist take to secure the traffic? (Choose two.)

Options:

A.

Add a VPC endpoint policy to allow access to the IAM users.

B.

Modify the users' IAM policy to allow access to Amazon SageMaker Service API calls only.

C.

Modify the security group on the endpoint network interface to restrict access to the instances.

D.

Modify the ACL on the endpoint network interface to restrict access to the instances.

E.

Add a SageMaker Runtime VPC endpoint interface to the VPC.

Buy Now
Questions 11

A Machine Learning Specialist is building a supervised model that will evaluate customers' satisfaction with their mobile phone service based on recent usage The model's output should infer whether or not a customer is likely to switch to a competitor in the next 30 days

Which of the following modeling techniques should the Specialist use1?

Options:

A.

Time-series prediction

B.

Anomaly detection

C.

Binary classification

D.

Regression

Buy Now
Questions 12

A Data Scientist is working on an application that performs sentiment analysis. The validation accuracy is poor and the Data Scientist thinks that the cause may be a rich vocabulary and a low average frequency of words in the dataset

Which tool should be used to improve the validation accuracy?

Options:

A.

Amazon Comprehend syntax analysts and entity detection

B.

Amazon SageMaker BlazingText allow mode

C.

Natural Language Toolkit (NLTK) stemming and stop word removal

D.

Scikit-learn term frequency-inverse document frequency (TF-IDF) vectorizers

Buy Now
Questions 13

A Data Scientist needs to analyze employment data. The dataset contains approximately 10 million

observations on people across 10 different features. During the preliminary analysis, the Data Scientist notices

that income and age distributions are not normal. While income levels shows a right skew as expected, with fewer individuals having a higher income, the age distribution also show a right skew, with fewer older

individuals participating in the workforce.

Which feature transformations can the Data Scientist apply to fix the incorrectly skewed data? (Choose two.)

Options:

A.

Cross-validation

B.

Numerical value binning

C.

High-degree polynomial transformation

D.

Logarithmic transformation

E.

One hot encoding

Buy Now
Questions 14

A manufacturing company uses machine learning (ML) models to detect quality issues. The models use images that are taken of the company's product at the end of each production step. The company has thousands of machines at the production site that generate one image per second on average.

The company ran a successful pilot with a single manufacturing machine. For the pilot, ML specialists used an industrial PC that ran AWS IoT Greengrass with a long-running AWS Lambda function that uploaded the images to Amazon S3. The uploaded images invoked a Lambda function that was written in Python to perform inference by using an Amazon SageMaker endpoint that ran a custom model. The inference results were forwarded back to a web service that was hosted at the production site to prevent faulty products from being shipped.

The company scaled the solution out to all manufacturing machines by installing similarly configured industrial PCs on each production machine. However, latency for predictions increased beyond acceptable limits. Analysis shows that the internet connection is at its capacity limit.

How can the company resolve this issue MOST cost-effectively?

Options:

A.

Set up a 10 Gbps AWS Direct Connect connection between the production site and the nearest AWS Region. Use the Direct Connect connection to upload the images. Increase the size of the instances and the number of instances that are used by the SageMaker endpoint.

B.

Extend the long-running Lambda function that runs on AWS IoT Greengrass to compress the images and upload the compressed files to Amazon S3. Decompress the files by using a separate Lambda function that invokes the existing Lambda function to run the inference pipeline.

C.

Use auto scaling for SageMaker. Set up an AWS Direct Connect connection between the production site and the nearest AWS Region. Use the Direct Connect connection to upload the images.

D.

Deploy the Lambda function and the ML models onto the AWS IoT Greengrass core that is running on the industrial PCs that are installed on each machine. Extend the long-running Lambda function that runs on AWS IoT Greengrass to invoke the Lambda function with the captured images and run the inference on the edge component that forwards the results directly to the web service.

Buy Now
Questions 15

A company uses sensors on devices such as motor engines and factory machines to measure parameters, temperature and pressure. The company wants to use the sensor data to predict equipment malfunctions and reduce services outages.

The Machine learning (ML) specialist needs to gather the sensors data to train a model to predict device malfunctions The ML spoctafst must ensure that the data does not contain outliers before training the ..el.

What can the ML specialist meet these requirements with the LEAST operational overhead?

Options:

A.

Load the data into an Amazon SagcMaker Studio notebook. Calculate the first and third quartile Use a SageMaker Data Wrangler data (low to remove only values that are outside of those quartiles.

B.

Use an Amazon SageMaker Data Wrangler bias report to find outliers in the dataset Use a Data Wrangler data flow to remove outliers based on the bias report.

C.

Use an Amazon SageMaker Data Wrangler anomaly detection visualization to find outliers in the dataset. Add a transformation to a Data Wrangler data flow to remove outliers.

D.

Use Amazon Lookout for Equipment to find and remove outliers from the dataset.

Buy Now
Questions 16

A machine learning specialist needs to analyze comments on a news website with users across the globe. The specialist must find the most discussed topics in the comments that are in either English or Spanish.

What steps could be used to accomplish this task? (Choose two.)

Options:

A.

Use an Amazon SageMaker BlazingText algorithm to find the topics independently from language. Proceed with the analysis.

B.

Use an Amazon SageMaker seq2seq algorithm to translate from Spanish to English, if necessary. Use a SageMaker Latent Dirichlet Allocation (LDA) algorithm to find the topics.

C.

Use Amazon Translate to translate from Spanish to English, if necessary. Use Amazon Comprehend topic modeling to find the topics.

D.

Use Amazon Translate to translate from Spanish to English, if necessary. Use Amazon Lex to extract topics form the content.

E.

Use Amazon Translate to translate from Spanish to English, if necessary. Use Amazon SageMaker Neural Topic Model (NTM) to find the topics.

Buy Now
Questions 17

A company that manufactures mobile devices wants to determine and calibrate the appropriate sales price for its devices. The company is collecting the relevant data and is determining data features that it can use to train machine learning (ML) models. There are more than 1,000 features, and the company wants to determine the primary features that contribute to the sales price.

Which techniques should the company use for feature selection? (Choose three.)

Options:

A.

Data scaling with standardization and normalization

B.

Correlation plot with heat maps

C.

Data binning

D.

Univariate selection

E.

Feature importance with a tree-based classifier

F.

Data augmentation

Buy Now
Questions 18

A Machine Learning Specialist works for a credit card processing company and needs to predict which

transactions may be fraudulent in near-real time. Specifically, the Specialist must train a model that returns the

probability that a given transaction may fraudulent.

How should the Specialist frame this business problem?

Options:

A.

Streaming classification

B.

Binary classification

C.

Multi-category classification

D.

Regression classification

Buy Now
Questions 19

When submitting Amazon SageMaker training jobs using one of the built-in algorithms, which common parameters MUST be specified? (Select THREE.)

Options:

A.

The training channel identifying the location of training data on an Amazon S3 bucket.

B.

The validation channel identifying the location of validation data on an Amazon S3 bucket.

C.

The 1AM role that Amazon SageMaker can assume to perform tasks on behalf of the users.

D.

Hyperparameters in a JSON array as documented for the algorithm used.

E.

The Amazon EC2 instance class specifying whether training will be run using CPU or GPU.

F.

The output path specifying where on an Amazon S3 bucket the trained model will persist.

Buy Now
Questions 20

A Machine Learning Specialist is working for an online retailer that wants to run analytics on every customer visit, processed through a machine learning pipeline. The data needs to be ingested by Amazon Kinesis Data Streams at up to 100 transactions per second, and the JSON data blob is 100 KB in size.

What is the MINIMUM number of shards in Kinesis Data Streams the Specialist should use to successfully ingest this data?

Options:

A.

1 shards

B.

10 shards

C.

100 shards

D.

1,000 shards

Buy Now
Questions 21

A Data Scientist is developing a machine learning model to predict future patient outcomes based on information collected about each patient and their treatment plans. The model should output a continuous value as its prediction. The data available includes labeled outcomes for a set of 4,000 patients. The study was conducted on a group of individuals over the age of 65 who have a particular disease that is known to worsen with age.

Initial models have performed poorly. While reviewing the underlying data, the Data Scientist notices that, out of 4,000 patient observations, there are 450 where the patient age has been input as 0. The other features for these observations appear normal compared to the rest of the sample population.

How should the Data Scientist correct this issue?

Options:

A.

Drop all records from the dataset where age has been set to 0.

B.

Replace the age field value for records with a value of 0 with the mean or median value from the dataset.

C.

Drop the age feature from the dataset and train the model using the rest of the features.

D.

Use k-means clustering to handle missing features.

Buy Now
Questions 22

A company is running an Amazon SageMaker training job that will access data stored in its Amazon S3 bucket A compliance policy requires that the data never be transmitted across the internet How should the company set up the job?

Options:

A.

Launch the notebook instances in a public subnet and access the data through the public S3 endpoint

B.

Launch the notebook instances in a private subnet and access the data through a NAT gateway

C.

Launch the notebook instances in a public subnet and access the data through a NAT gateway

D.

Launch the notebook instances in a private subnet and access the data through an S3 VPC endpoint.

Buy Now
Questions 23

A company wants to create an artificial intelligence (Al) yoga instructor that can lead large classes of students. The company needs to create a feature that can accurately count the number of students who are in a class. The company also needs a feature that can differentiate students who are performing a yoga stretch correctly from students who are performing a stretch incorrectly.

...etermine whether students are performing a stretch correctly, the solution needs to measure the location and angle of each student's arms and legs A data scientist must use Amazon SageMaker to ...ss video footage of a yoga class by extracting image frames and applying computer vision models.

Which combination of models will meet these requirements with the LEAST effort? (Select TWO.)

Options:

A.

Image Classification

B.

Optical Character Recognition (OCR)

C.

Object Detection

D.

Pose estimation

E.

Image Generative Adversarial Networks (GANs)

Buy Now
Questions 24

A bank's Machine Learning team is developing an approach for credit card fraud detection The company has a large dataset of historical data labeled as fraudulent The goal is to build a model to take the information from new transactions and predict whether each transaction is fraudulent or not

Which built-in Amazon SageMaker machine learning algorithm should be used for modeling this problem?

Options:

A.

Seq2seq

B.

XGBoost

C.

K-means

D.

Random Cut Forest (RCF)

Buy Now
Questions 25

A company that runs an online library is implementing a chatbot using Amazon Lex to provide book recommendations based on category. This intent is fulfilled by an AWS Lambda function that queries an Amazon DynamoDB table for a list of book titles, given a particular category. For testing, there are only three categories implemented as the custom slot types: "comedy," "adventure,” and "documentary.”

A machine learning (ML) specialist notices that sometimes the request cannot be fulfilled because Amazon Lex cannot understand the category spoken by users with utterances such as "funny," "fun," and "humor." The ML specialist needs to fix the problem without changing the Lambda code or data in DynamoDB.

How should the ML specialist fix the problem?

Options:

A.

Add the unrecognized words in the enumeration values list as new values in the slot type.

B.

Create a new custom slot type, add the unrecognized words to this slot type as enumeration values, and use this slot type for the slot.

C.

Use the AMAZON.SearchQuery built-in slot types for custom searches in the database.

D.

Add the unrecognized words as synonyms in the custom slot type.

Buy Now
Questions 26

A Machine Learning Specialist trained a regression model, but the first iteration needs optimizing. The Specialist needs to understand whether the model is more frequently overestimating or underestimating the target.

What option can the Specialist use to determine whether it is overestimating or underestimating the target value?

Options:

A.

Root Mean Square Error (RMSE)

B.

Residual plots

C.

Area under the curve

D.

Confusion matrix

Buy Now
Questions 27

A company wants to enhance audits for its machine learning (ML) systems. The auditing system must be able to perform metadata analysis on the features that the ML models use. The audit solution must generate a report that analyzes the metadata. The solution also must be able to set the data sensitivity and authorship of features.

Which solution will meet these requirements with the LEAST development effort?

Options:

A.

Use Amazon SageMaker Feature Store to select the features. Create a data flow to perform feature-level metadata analysis. Create an Amazon DynamoDB table to store feature-level metadata. Use Amazon QuickSight to analyze the metadata.

B.

Use Amazon SageMaker Feature Store to set feature groups for the current features that the ML models use. Assign the required metadata for each feature. Use SageMaker Studio to analyze the metadata.

C.

Use Amazon SageMaker Features Store to apply custom algorithms to analyze the feature-level metadata that the company requires. Create an Amazon DynamoDB table to store feature-level metadata. Use Amazon QuickSight to analyze the metadata.

D.

Use Amazon SageMaker Feature Store to set feature groups for the current features that the ML models use. Assign the required metadata for each feature. Use Amazon QuickSight to analyze the metadata.

Buy Now
Questions 28

A web-based company wants to improve its conversion rate on its landing page Using a large historical dataset of customer visits, the company has repeatedly trained a multi-class deep learning network algorithm on Amazon SageMaker However there is an overfitting problem training data shows 90% accuracy in predictions, while test data shows 70% accuracy only

The company needs to boost the generalization of its model before deploying it into production to maximize conversions of visits to purchases

Which action is recommended to provide the HIGHEST accuracy model for the company's test and validation data?

Options:

A.

Increase the randomization of training data in the mini-batches used in training.

B.

Allocate a higher proportion of the overall data to the training dataset

C.

Apply L1 or L2 regularization and dropouts to the training.

D.

Reduce the number of layers and units (or neurons) from the deep learning network.

Buy Now
Questions 29

A Machine Learning Specialist is required to build a supervised image-recognition model to identify a cat. The ML Specialist performs some tests and records the following results for a neural network-based image classifier:

Total number of images available = 1,000 Test set images = 100 (constant test set)

The ML Specialist notices that, in over 75% of the misclassified images, the cats were held upside down by their owners.

Which techniques can be used by the ML Specialist to improve this specific test error?

Options:

A.

Increase the training data by adding variation in rotation for training images.

B.

Increase the number of epochs for model training.

C.

Increase the number of layers for the neural network.

D.

Increase the dropout rate for the second-to-last layer.

Buy Now
Questions 30

A company has video feeds and images of a subway train station. The company wants to create a deep learning model that will alert the station manager if any passenger crosses the yellow safety line when there is no train in the station. The alert will be based on the video feeds. The company wants the model to detect the yellow line, the passengers who cross the yellow line, and the trains in the video feeds. This task requires labeling. The video data must remain confidential.

A data scientist creates a bounding box to label the sample data and uses an object detection model. However, the object detection model cannot clearly demarcate the yellow line, the passengers who cross the yellow line, and the trains.

Which labeling approach will help the company improve this model?

Options:

A.

Use Amazon Rekognition Custom Labels to label the dataset and create a custom Amazon Rekognition object detection model. Create a private workforce. Use Amazon Augmented AI (Amazon A2I) to review the low-confidence predictions and retrain the custom Amazon Rekognition model.

B.

Use an Amazon SageMaker Ground Truth object detection labeling task. Use Amazon Mechanical Turk as the labeling workforce.

C.

Use Amazon Rekognition Custom Labels to label the dataset and create a custom Amazon Rekognition object detection model. Create a workforce with a third-party AWS Marketplace vendor. Use Amazon Augmented AI (Amazon A2I) to review the low-confidence predictions and retrain the custom Amazon Rekognition model.

D.

Use an Amazon SageMaker Ground Truth semantic segmentation labeling task. Use a private workforce as the labeling workforce.

Buy Now
Questions 31

A company is observing low accuracy while training on the default built-in image classification algorithm in Amazon SageMaker. The Data Science team wants to use an Inception neural network architecture instead of a ResNet architecture.

Which of the following will accomplish this? (Select TWO.)

Options:

A.

Customize the built-in image classification algorithm to use Inception and use this for model training.

B.

Create a support case with the SageMaker team to change the default image classification algorithm to Inception.

C.

Bundle a Docker container with TensorFlow Estimator loaded with an Inception network and use this for model training.

D.

Use custom code in Amazon SageMaker with TensorFlow Estimator to load the model with an Inception network and use this for model training.

E.

Download and apt-get install the inception network code into an Amazon EC2 instance and use this instance as a Jupyter notebook in Amazon SageMaker.

Buy Now
Questions 32

An ecommerce company sends a weekly email newsletter to all of its customers. Management has hired a team of writers to create additional targeted content. A data scientist needs to identify five customer segments based on age, income, and location. The customers’ current segmentation is unknown. The data scientist previously built an XGBoost model to predict the likelihood of a customer responding to an email based on age, income, and location.

Why does the XGBoost model NOT meet the current requirements, and how can this be fixed?

Options:

A.

The XGBoost model provides a true/false binary output. Apply principal component analysis (PCA) with five feature dimensions to predict a segment.

B.

The XGBoost model provides a true/false binary output. Increase the number of classes the XGBoost model predicts to five classes to predict a segment.

C.

The XGBoost model is a supervised machine learning algorithm. Train a k-Nearest-Neighbors (kNN) model with K = 5 on the same dataset to predict a segment.

D.

The XGBoost model is a supervised machine learning algorithm. Train a k-means model with K = 5 on the same dataset to predict a segment.

Buy Now
Questions 33

A data engineer needs to provide a team of data scientists with the appropriate dataset to run machine learning training jobs. The data will be stored in Amazon S3. The data engineer is obtaining the data from an Amazon Redshift database and is using join queries to extract a single tabular dataset. A portion of the schema is as follows:

...traction Timestamp (Timeslamp)

...JName(Varchar)

...JNo (Varchar)

Th data engineer must provide the data so that any row with a CardNo value of NULL is removed. Also, the TransactionTimestamp column must be separated into a TransactionDate column and a isactionTime column Finally, the CardName column must be renamed to NameOnCard.

The data will be extracted on a monthly basis and will be loaded into an S3 bucket. The solution must minimize the effort that is needed to set up infrastructure for the ingestion and transformation. The solution must be automated and must minimize the load on the Amazon Redshift cluster

Which solution meets these requirements?

Options:

A.

Set up an Amazon EMR cluster Create an Apache Spark job to read the data from the Amazon Redshift cluster and transform the data. Load the data into the S3 bucket. Schedule the job to run monthly.

B.

Set up an Amazon EC2 instance with a SQL client tool, such as SQL Workbench/J. to query the data from the Amazon Redshift cluster directly. Export the resulting dataset into a We. Upload the file into the S3 bucket. Perform these tasks monthly.

C.

Set up an AWS Glue job that has the Amazon Redshift cluster as the source and the S3 bucket as the destination Use the built-in transforms Filter, Map. and RenameField to perform the required transformations. Schedule the job to run monthly.

D.

Use Amazon Redshift Spectrum to run a query that writes the data directly to the S3 bucket. Create an AWS Lambda function to run the query monthly

Buy Now
Questions 34

A medical imaging company wants to train a computer vision model to detect areas of concern on patients' CT scans. The company has a large collection of unlabeled CT scans that are linked to each patient and stored in an Amazon S3 bucket. The scans must be accessible to authorized users only. A machine learning engineer needs to build a labeling pipeline.

Which set of steps should the engineer take to build the labeling pipeline with the LEAST effort?

Options:

A.

Create a workforce with AWS Identity and Access Management (IAM). Build a labeling tool on Amazon EC2 Queue images for labeling by using Amazon Simple Queue Service (Amazon SQS). Write the labeling instructions.

B.

Create an Amazon Mechanical Turk workforce and manifest file. Create a labeling job by using the built-in image classification task type in Amazon SageMaker Ground Truth. Write the labeling instructions.

C.

Create a private workforce and manifest file. Create a labeling job by using the built-in bounding box task type in Amazon SageMaker Ground Truth. Write the labeling instructions.

D.

Create a workforce with Amazon Cognito. Build a labeling web application with AWS Amplify. Build a labeling workflow backend using AWS Lambda. Write the labeling instructions.

Buy Now
Questions 35

A Data Scientist is developing a binary classifier to predict whether a patient has a particular disease on a series of test results. The Data Scientist has data on 400 patients randomly selected from the population. The disease is seen in 3% of the population.

Which cross-validation strategy should the Data Scientist adopt?

Options:

A.

A k-fold cross-validation strategy with k=5

B.

A stratified k-fold cross-validation strategy with k=5

C.

A k-fold cross-validation strategy with k=5 and 3 repeats

D.

An 80/20 stratified split between training and validation

Buy Now
Questions 36

A Data Scientist is building a model to predict customer churn using a dataset of 100 continuous numerical

features. The Marketing team has not provided any insight about which features are relevant for churn

prediction. The Marketing team wants to interpret the model and see the direct impact of relevant features on

the model outcome. While training a logistic regression model, the Data Scientist observes that there is a wide

gap between the training and validation set accuracy.

Which methods can the Data Scientist use to improve the model performance and satisfy the Marketing team’s

needs? (Choose two.)

Options:

A.

Add L1 regularization to the classifier

B.

Add features to the dataset

C.

Perform recursive feature elimination

D.

Perform t-distributed stochastic neighbor embedding (t-SNE)

E.

Perform linear discriminant analysis

Buy Now
Questions 37

A manufacturing company wants to use machine learning (ML) to automate quality control in its facilities. The facilities are in remote locations and have limited internet connectivity. The company has 20 ТВ of training data that consists of labeled images of defective product parts. The training data is in the corporate on-premises data center.

The company will use this data to train a model for real-time defect detection in new parts as the parts move on a conveyor belt in the facilities. The company needs a solution that minimizes costs for compute infrastructure and that maximizes the scalability of resources for training. The solution also must facilitate the company’s use of an ML model in the low-connectivity environments.

Which solution will meet these requirements?

Options:

A.

Move the training data to an Amazon S3 bucket. Train and evaluate the model by using Amazon SageMaker. Optimize the model by using SageMaker Neo. Deploy the model on a SageMaker hosting services endpoint.

B.

Train and evaluate the model on premises. Upload the model to an Amazon S3 bucket. Deploy the model on an Amazon SageMaker hosting services endpoint.

C.

Move the training data to an Amazon S3 bucket. Train and evaluate the model by using Amazon SageMaker. Optimize the model by using SageMaker Neo. Set up an edge device in the manufacturing facilities with AWS IoT Greengrass. Deploy the model on the edge device.

D.

Train the model on premises. Upload the model to an Amazon S3 bucket. Set up an edge device in the manufacturing facilities with AWS IoT Greengrass. Deploy the model on the edge device.

Buy Now
Questions 38

A manufacturing company has a production line with sensors that collect hundreds of quality metrics. The company has stored sensor data and manual inspection resultsin a data lake for several months. To automate quality control, the machine learning team must build an automated mechanism that determines whether the produced goods are good quality, replacement market quality, or scrap quality based on the manual inspection results.

Which modeling approach will deliver the MOST accurate prediction of product quality?

Options:

A.

Amazon SageMaker DeepAR forecasting algorithm

B.

Amazon SageMaker XGBoost algorithm

C.

Amazon SageMaker Latent Dirichlet Allocation (LDA) algorithm

D.

A convolutional neural network (CNN) and ResNet

Buy Now
Questions 39

A data scientist uses an Amazon SageMaker notebook instance to conduct data exploration and analysis. This requires certain Python packages that are not natively available on Amazon SageMaker to be installed on the notebook instance.

How can a machine learning specialist ensure that required packages are automatically available on the notebook instance for the data scientist to use?

Options:

A.

Install AWS Systems Manager Agent on the underlying Amazon EC2 instance and use Systems Manager Automation to execute the package installation commands.

B.

Create a Jupyter notebook file (.ipynb) with cells containing the package installation commands to execute and place the file under the /etc/init directory of each Amazon SageMaker notebook instance.

C.

Use the conda package manager from within the Jupyter notebook console to apply the necessary conda packages to the default kernel of the notebook.

D.

Create an Amazon SageMaker lifecycle configuration with package installation commands and assign the lifecycle configuration to the notebook instance.

Buy Now
Questions 40

An ecommerce company wants to use machine learning (ML) to monitor fraudulent transactions on its website. The company is using Amazon SageMaker to research, train, deploy, and monitor the ML models.

The historical transactions data is in a .csv file that is stored in Amazon S3 The data contains features such as the user's IP address, navigation time, average time on each page, and the number of clicks for ....session. There is no label in the data to indicate if a transaction is anomalous.

Which models should the company use in combination to detect anomalous transactions? (Select TWO.)

Options:

A.

IP Insights

B.

K-nearest neighbors (k-NN)

C.

Linear learner with a logistic function

D.

Random Cut Forest (RCF)

E.

XGBoost

Buy Now
Questions 41

A company uses camera images of the tops of items displayed on store shelves to determine which items

were removed and which ones still remain. After several hours of data labeling, the company has a total of

1,000 hand-labeled images covering 10 distinct items. The training results were poor.

Which machine learning approach fulfills the company’s long-term needs?

Options:

A.

Convert the images to grayscale and retrain the model

B.

Reduce the number of distinct items from 10 to 2, build the model, and iterate

C.

Attach different colored labels to each item, take the images again, and build the model

D.

Augment training data for each item using image variants like inversions and translations, build the model, and iterate.

Buy Now
Questions 42

A data scientist is building a linear regression model. The scientist inspects the dataset and notices that the mode of the distribution is lower than the median, and the median is lower than the mean.

Which data transformation will give the data scientist the ability to apply a linear regression model?

Options:

A.

Exponential transformation

B.

Logarithmic transformation

C.

Polynomial transformation

D.

Sinusoidal transformation

Buy Now
Questions 43

A real-estate company is launching a new product that predicts the prices of new houses. The historical data for the properties and prices is stored in .csv format in an Amazon S3 bucket. The data has a header, some categorical fields, and some missing values. The company’s data scientists have used Python with a common open-source library to fill the missing values with zeros. The data scientists have dropped all of the categorical fields and have trained a model by using the open-source linear regression algorithm with the default parameters.

The accuracy of the predictions with the current model is below 50%. The company wants to improve the model performance and launch the new product as soon as possible.

Which solution will meet these requirements with the LEAST operational overhead?

Options:

A.

Create a service-linked role for Amazon Elastic Container Service (Amazon ECS) with access to the S3 bucket. Create an ECS cluster that is based on an AWS Deep Learning Containers image. Write the code to perform the feature engineering. Train a logistic regression model for predicting the price, pointing to the bucket with the dataset. Wait for the training job to complete. Perform the inferences.

B.

Create an Amazon SageMaker notebook with a new IAM role that is associated with the notebook. Pull the dataset from the S3 bucket. Explore different combinations of feature engineering transformations, regression algorithms, and hyperparameters. Compare all theresults in the notebook, and deploy the most accurate configuration in an endpoint for predictions.

C.

Create an IAM role with access to Amazon S3, Amazon SageMaker, and AWS Lambda. Create a training job with the SageMaker built-in XGBoost model pointing to the bucket with the dataset. Specify the price as the target feature. Wait for the job to complete. Load the model artifact to a Lambda function for inference on prices of new houses.

D.

Create an IAM role for Amazon SageMaker with access to the S3 bucket. Create a SageMaker AutoML job with SageMaker Autopilot pointing to the bucket with the dataset. Specify the price as the target attribute. Wait for the job to complete. Deploy the best model for predictions.

Buy Now
Questions 44

A company wants to segment a large group of customers into subgroups based on shared characteristics. The company’s data scientist is planning to use the Amazon SageMaker built-in k-means clustering algorithm for this task. The data scientist needs to determine the optimal number of subgroups (k) to use.

Which data visualization approach will MOST accurately determine the optimal value of k?

Options:

A.

Calculate the principal component analysis (PCA) components. Run the k-means clustering algorithm for a range of k by using only the first two PCA components. For each value of k, create a scatter plot with a different color for each cluster. The optimal value of k is the value where the clusters start to look reasonably separated.

B.

Calculate the principal component analysis (PCA) components. Create a line plot of the number of components against the explained variance. The optimal value of k is the number of PCA components after which the curve starts decreasing in a linear fashion.

C.

Create a t-distributed stochastic neighbor embedding (t-SNE) plot for a range of perplexity values. The optimal value of k is the value of perplexity, where the clusters start to look reasonably separated.

D.

Run the k-means clustering algorithm for a range of k. For each value of k, calculate the sum of squared errors (SSE). Plot a line chart of the SSE for each value of k. The optimal value of k is the point after which the curve starts decreasing in a linear fashion.

Buy Now
Questions 45

A Machine Learning Specialist built an image classification deep learning model. However the Specialist ran into an overfitting problem in which the training and testing accuracies were 99% and 75%r respectively.

How should the Specialist address this issue and what is the reason behind it?

Options:

A.

The learning rate should be increased because the optimization process was trapped at a local minimum.

B.

The dropout rate at the flatten layer should be increased because the model is not generalized enough.

C.

The dimensionality of dense layer next to the flatten layer should be increased because the model is not complex enough.

D.

The epoch number should be increased because the optimization process was terminated before it reached the global minimum.

Buy Now
Questions 46

A retail company uses a machine learning (ML) model for daily sales forecasting. The company’s brand manager reports that the model has provided inaccurate results for the past 3 weeks.

At the end of each day, an AWS Glue job consolidates the input data that is used for the forecasting with the actual daily sales data and the predictions of the model. The AWS Glue job stores the data in Amazon S3. The company’s ML team is using an Amazon SageMaker Studio notebook to gain an understanding about the source of the model's inaccuracies.

What should the ML team do on the SageMaker Studio notebook to visualize the model's degradation MOST accurately?

Options:

A.

Create a histogram of the daily sales over the last 3 weeks. In addition, create a histogram of the daily sales from before that period.

B.

Create a histogram of the model errors over the last 3 weeks. In addition, create a histogram of the model errors from before that period.

C.

Create a line chart with the weekly mean absolute error (MAE) of the model.

D.

Create a scatter plot of daily sales versus model error for the last 3 weeks. In addition, create a scatter plot of daily sales versus model error from before that period.

Buy Now
Questions 47

A data scientist needs to identify fraudulent user accounts for a company's ecommerce platform. The company wants the ability to determine if a newly created account is associated with a previously known fraudulent user. The data scientist is using AWS Glue to cleanse the company's application logs during ingestion.

Which strategy will allow the data scientist to identify fraudulent accounts?

Options:

A.

Execute the built-in FindDuplicates Amazon Athena query.

B.

Create a FindMatches machine learning transform in AWS Glue.

C.

Create an AWS Glue crawler to infer duplicate accounts in the source data.

D.

Search for duplicate accounts in the AWS Glue Data Catalog.

Buy Now
Questions 48

A Machine Learning Specialist was given a dataset consisting of unlabeled data The Specialist must create a model that can help the team classify the data into different buckets What model should be used to complete this work?

Options:

A.

K-means clustering

B.

Random Cut Forest (RCF)

C.

XGBoost

D.

BlazingText

Buy Now
Questions 49

A company wants to create a data repository in the AWS Cloud for machine learning (ML) projects. The company wants to use AWS to perform complete ML lifecycles and wants to use Amazon S3 for the data storage. All of the company’s data currently resides on premises and is 40 ТВ in size.

The company wants a solution that can transfer and automatically update data between the on-premises object storage and Amazon S3. The solution must support encryption, scheduling, monitoring, and data integrity validation.

Which solution meets these requirements?

Options:

A.

Use the S3 sync command to compare the source S3 bucket and the destination S3 bucket. Determine which source files do not exist in the destination S3 bucket and which source files were modified.

B.

Use AWS Transfer for FTPS to transfer the files from the on-premises storage to Amazon S3.

C.

Use AWS DataSync to make an initial copy of the entire dataset. Schedule subsequent incremental transfers of changing data until the final cutover from on premises to AWS.

D.

Use S3 Batch Operations to pull data periodically from the on-premises storage. Enable S3 Versioning on the S3 bucket to protect against accidental overwrites.

Buy Now
Questions 50

A social media company wants to develop a machine learning (ML) model to detect Inappropriate or offensive content in images. The company has collected a large dataset of labeled images and plans to use the built-in Amazon SageMaker image classification algorithm to train the model. The company also intends to use SageMaker pipe mode to speed up the training.

...company splits the dataset into training, validation, and testing datasets. The company stores the training and validation images in folders that are named Training and Validation, respectively. The folder ...ain subfolders that correspond to the names of the dataset classes. The company resizes the images to the same sue and generates two input manifest files named training.1st and validation.1st, for the ..ing dataset and the validation dataset. respectively. Finally, the company creates two separate Amazon S3 buckets for uploads of the training dataset and the validation dataset.

...h additional data preparation steps should the company take before uploading the files to Amazon S3?

Options:

A.

Generate two Apache Parquet files, training.parquet and validation.parquet. by reading the images into a Pandas data frame and storing the data frame as a Parquet file. Upload the Parquet files to the training S3 bucket

B.

Compress the training and validation directories by using the Snappy compression library Upload the manifest and compressed files to the training S3 bucket

C.

Compress the training and validation directories by using the gzip compression library. Upload the manifest and compressed files to the training S3 bucket.

D.

Generate two RecordIO files, training rec and validation.rec. from the manifest files by using the im2rec Apache MXNet utility tool. Upload the RecordlO files to the training S3 bucket.

Buy Now
Questions 51

A financial services company wants to adopt Amazon SageMaker as its default data science environment. The company's data scientists run machine learning (ML) models on confidential financial data. The company is worried about data egress and wants an ML engineer to secure the environment.

Which mechanisms can the ML engineer use to control data egress from SageMaker? (Choose three.)

Options:

A.

Connect to SageMaker by using a VPC interface endpoint powered by AWS PrivateLink.

B.

Use SCPs to restrict access to SageMaker.

C.

Disable root access on the SageMaker notebook instances.

D.

Enable network isolation for training jobs and models.

E.

Restrict notebook presigned URLs to specific IPs used by the company.

F.

Protect data with encryption at rest and in transit. Use AWS Key Management Service (AWS KMS) to manage encryption keys.

Buy Now
Questions 52

An engraving company wants to automate its quality control process for plaques. The company performs the process before mailing each customized plaque to a customer. The company has created an Amazon S3 bucket that contains images of defects that should cause a plaque to be rejected. Low-confidence predictions must be sent to an internal team of reviewers who are using Amazon Augmented Al (Amazon A2I).

Which solution will meet these requirements?

Options:

A.

Use Amazon Textract for automatic processing. Use Amazon A2I with Amazon Mechanical Turk for manual review.

B.

Use Amazon Rekognition for automatic processing. Use Amazon A2I with a private workforce option for manual review.

C.

Use Amazon Transcribe for automatic processing. Use Amazon A2I with a private workforce option for manual review.

D.

Use AWS Panorama for automatic processing Use Amazon A2I with Amazon Mechanical Turk for manual review

Buy Now
Questions 53

A data scientist is using an Amazon SageMaker notebook instance and needs to securely access data stored in a specific Amazon S3 bucket.

How should the data scientist accomplish this?

Options:

A.

Add an S3 bucket policy allowing GetObject, PutObject, and ListBucket permissions to the Amazon SageMaker notebook ARN as principal.

B.

Encrypt the objects in the S3 bucket with a custom AWS Key Management Service (AWS KMS) key that only the notebook owner has access to.

C.

Attach the policy to the IAM role associated with the notebook that allows GetObject, PutObject, and ListBucket operations to the specific S3 bucket.

D.

Use a script in a lifecycle configuration to configure the AWS CLI on the instance with an access key ID and secret.

Buy Now
Questions 54

A monitoring service generates 1 TB of scale metrics record data every minute A Research team performs queries on this data using Amazon Athena The queries run slowly due to the large volume of data, and the team requires better performance

How should the records be stored in Amazon S3 to improve query performance?

Options:

A.

CSV files

B.

Parquet files

C.

Compressed JSON

D.

RecordIO

Buy Now
Questions 55

An office security agency conducted a successful pilot using 100 cameras installed at key locations within the main office. Images from the cameras were uploaded to Amazon S3 and tagged using Amazon Rekognition, and the results were stored in Amazon ES. The agency is now looking to expand the pilot into a full production system using thousands of video cameras in its office locations globally. The goal is to identify activities performed by non-employees in real time.

Which solution should the agency consider?

Options:

A.

Use a proxy server at each local office and for each camera, and stream the RTSP feed to a unique

Amazon Kinesis Video Streams video stream. On each stream, use Amazon Rekognition Video and create

a stream processor to detect faces from a collection of known employees, and alert when non-employees

are detected.

B.

Use a proxy server at each local office and for each camera, and stream the RTSP feed to a unique

Amazon Kinesis Video Streams video stream. On each stream, use Amazon Rekognition Image to detect

faces from a collection of known employees and alert when non-employees are detected.

C.

Install AWS DeepLens cameras and use the DeepLens_Kinesis_Video module to stream video to

Amazon Kinesis Video Streams for each camera. On each stream, use Amazon Rekognition Video and

create a stream processor to detect faces from a collection on each stream, and alert when nonemployees

are detected.

D.

Install AWS DeepLens cameras and use the DeepLens_Kinesis_Video module to stream video to

Amazon Kinesis Video Streams for each camera. On each stream, run an AWS Lambda function to

capture image fragments and then call Amazon Rekognition Image to detect faces from a collection of

known employees, and alert when non-employees are detected.

Buy Now
Questions 56

A Machine Learning Specialist is configuring Amazon SageMaker so multiple Data Scientists can access notebooks, train models, and deploy endpoints. To ensure the best operational performance, the Specialist needs to be able to track how often the Scientists are deploying models, GPU and CPU utilization on the deployed SageMaker endpoints, and all errors that are generated when an endpoint is invoked.

Which services are integrated with Amazon SageMaker to track this information? (Select TWO.)

Options:

A.

AWS CloudTrail

B.

AWS Health

C.

AWS Trusted Advisor

D.

Amazon CloudWatch

E.

AWS Config

Buy Now
Questions 57

A data scientist is training a text classification model by using the Amazon SageMaker built-in BlazingText algorithm. There are 5 classes in the dataset, with 300 samples for category A, 292 samples for category B, 240 samples for category C, 258 samples for category D, and 310 samples for category E.

The data scientist shuffles the data and splits off 10% for testing. After training the model, the data scientist generates confusion matrices for the training and test sets.

What could the data scientist conclude form these results?

Options:

A.

Classes C and D are too similar.

B.

The dataset is too small for holdout cross-validation.

C.

The data distribution is skewed.

D.

The model is overfitting for classes B and E.

Buy Now
Questions 58

A Machine Learning Specialist is working with a media company to perform classification on popular articles from the company's website. The company is using random forests to classify how popular an article will be before it is published A sample of the data being used is below.

Given the dataset, the Specialist wants to convert the Day-Of_Week column to binary values.

What technique should be used to convert this column to binary values.

Options:

A.

Binarization

B.

One-hot encoding

C.

Tokenization

D.

Normalization transformation

Buy Now
Questions 59

A retail company stores 100 GB of daily transactional data in Amazon S3 at periodic intervals. The company wants to identify the schema of the transactional data. The company also wants to perform transformations on the transactional data that is in Amazon S3.

The company wants to use a machine learning (ML) approach to detect fraud in the transformed data.

Which combination of solutions will meet these requirements with the LEAST operational overhead? {Select THREE.)

Options:

A.

Use Amazon Athena to scan the data and identify the schema.

B.

Use AWS Glue crawlers to scan the data and identify the schema.

C.

Use Amazon Redshift to store procedures to perform data transformations

D.

Use AWS Glue workflows and AWS Glue jobs to perform data transformations.

E.

Use Amazon Redshift ML to train a model to detect fraud.

F.

Use Amazon Fraud Detector to train a model to detect fraud.

Buy Now
Questions 60

A machine learning (ML) specialist needs to extract embedding vectors from a text series. The goal is to provide a ready-to-ingest feature space for a data scientist to develop downstream ML predictive models. The text consists of curated sentences in English. Many sentences use similar words but in different contexts. There are questions and answers among the sentences, and the embedding space must differentiate between them.

Which options can produce the required embedding vectors that capture word context and sequential QA information? (Choose two.)

Options:

A.

Amazon SageMaker seq2seq algorithm

B.

Amazon SageMaker BlazingText algorithm in Skip-gram mode

C.

Amazon SageMaker Object2Vec algorithm

D.

Amazon SageMaker BlazingText algorithm in continuous bag-of-words (CBOW) mode

E.

Combination of the Amazon SageMaker BlazingText algorithm in Batch Skip-gram mode with a custom recurrent neural network (RNN)

Buy Now
Questions 61

A power company wants to forecast future energy consumption for its customers in residential properties and commercial business properties. Historical power consumption data for the last 10 years is available. A team of data scientists who performed the initial data analysis and feature selection will include the historical power consumption data and data such as weather, number of individuals on the property, and public holidays.

The data scientists are using Amazon Forecast to generate the forecasts.

Which algorithm in Forecast should the data scientists use to meet these requirements?

Options:

A.

Autoregressive Integrated Moving Average (AIRMA)

B.

Exponential Smoothing (ETS)

C.

Convolutional Neural Network - Quantile Regression (CNN-QR)

D.

Prophet

Buy Now
Questions 62

A company is running a machine learning prediction service that generates 100 TB of predictions every day A Machine Learning Specialist must generate a visualization of the daily precision-recall curve from the predictions, and forward a read-only version to the Business team.

Which solution requires the LEAST coding effort?

Options:

A.

Run a daily Amazon EMR workflow to generate precision-recall data, and save the results in Amazon S3 Give the Business team read-only access to S3

B.

Generate daily precision-recall data in Amazon QuickSight, and publish the results in a dashboard shared with the Business team

C.

Run a daily Amazon EMR workflow to generate precision-recall data, and save the results in Amazon S3 Visualize the arrays in Amazon QuickSight, and publish them in a dashboard shared with the Business team

D.

Generate daily precision-recall data in Amazon ES, and publish the results in a dashboard shared with the Business team.

Buy Now
Questions 63

An aircraft engine manufacturing company is measuring 200 performance metrics in a time-series. Engineers

want to detect critical manufacturing defects in near-real time during testing. All of the data needs to be stored

for offline analysis.

What approach would be the MOST effective to perform near-real time defect detection?

Options:

A.

Use AWS IoT Analytics for ingestion, storage, and further analysis. Use Jupyter notebooks from within

AWS IoT Analytics to carry out analysis for anomalies.

B.

Use Amazon S3 for ingestion, storage, and further analysis. Use an Amazon EMR cluster to carry out

Apache Spark ML k-means clustering to determine anomalies.

C.

Use Amazon S3 for ingestion, storage, and further analysis. Use the Amazon SageMaker Random Cut

Forest (RCF) algorithm to determine anomalies.

D.

Use Amazon Kinesis Data Firehose for ingestion and Amazon Kinesis Data Analytics Random Cut Forest

(RCF) to perform anomaly detection. Use Kinesis Data Firehose to store data in Amazon S3 for further

analysis.

Buy Now
Questions 64

A trucking company is collecting live image data from its fleet of trucks across the globe. The data is growing rapidly and approximately 100 GB of new data is generated every day. The company wants to explore machine learning uses cases while ensuring the data is only accessible to specific IAM users.

Which storage option provides the most processing flexibility and will allow access control with IAM?

Options:

A.

Use a database, such as Amazon DynamoDB, to store the images, and set the IAM policies to restrict access to only the desired IAM users.

B.

Use an Amazon S3-backed data lake to store the raw images, and set up the permissions using bucket policies.

C.

Setup up Amazon EMR with Hadoop Distributed File System (HDFS) to store the files, and restrict access to the EMR instances using IAM policies.

D.

Configure Amazon EFS with IAM policies to make the data available to Amazon EC2 instances owned by the IAM users.

Buy Now
Questions 65

A retail company wants to combine its customer orders with the product description data from its product catalog. The structure and format of the records in each dataset is different. A data analyst tried to use a spreadsheet to combine the datasets, but the effort resulted in duplicate records and records that were not properly combined. The company needs a solution that it can use to combine similar records from the two datasets and remove any duplicates.

Which solution will meet these requirements?

Options:

A.

Use an AWS Lambda function to process the data. Use two arrays to compare equal strings in the fields from the two datasets and remove any duplicates.

B.

Create AWS Glue crawlers for reading and populating the AWS Glue Data Catalog. Call the AWS Glue SearchTables API operation to perform a fuzzy-matching search on the two datasets, and cleanse the data accordingly.

C.

Create AWS Glue crawlers for reading and populating the AWS Glue Data Catalog. Use the FindMatches transform to cleanse the data.

D.

Create an AWS Lake Formation custom transform. Run a transformation for matching products from the Lake Formation console to cleanse the data automatically.

Buy Now
Questions 66

A Data Engineer needs to build a model using a dataset containing customer credit card information.

How can the Data Engineer ensure the data remains encrypted and the credit card information is secure?

Options:

A.

Use a custom encryption algorithm to encrypt the data and store the data on an Amazon SageMaker

instance in a VPC. Use the SageMaker DeepAR algorithm to randomize the credit card numbers.

B.

Use an IAM policy to encrypt the data on the Amazon S3 bucket and Amazon Kinesis to automatically

discard credit card numbers and insert fake credit card numbers.

C.

Use an Amazon SageMaker launch configuration to encrypt the data once it is copied to the SageMaker

instance in a VPC. Use the SageMaker principal component analysis (PCA) algorithm to reduce the length

of the credit card numbers.

D.

Use AWS KMS to encrypt the data on Amazon S3 and Amazon SageMaker, and redact the credit card numbers from the customer data with AWS Glue.

Buy Now
Questions 67

A Machine Learning Specialist is preparing data for training on Amazon SageMaker The Specialist is transformed into a numpy .array, which appears to be negatively affecting the speed of the training

What should the Specialist do to optimize the data for training on SageMaker'?

Options:

A.

Use the SageMaker batch transform feature to transform the training data into a DataFrame

B.

Use AWS Glue to compress the data into the Apache Parquet format

C.

Transform the dataset into the Recordio protobuf format

D.

Use the SageMaker hyperparameter optimization feature to automatically optimize the data

Buy Now
Questions 68

A Machine Learning Specialist needs to move and transform data in preparation for training Some of the data needs to be processed in near-real time and other data can be moved hourly There are existing Amazon EMR MapReduce jobs to clean and feature engineering to perform on the data

Which of the following services can feed data to the MapReduce jobs? (Select TWO )

Options:

A.

AWSDMS

B.

Amazon Kinesis

C.

AWS Data Pipeline

D.

Amazon Athena

E.

Amazon ES

Buy Now
Questions 69

A company is launching a new product and needs to build a mechanism to monitor comments about the company and its new product on social media. The company needs to be able toevaluate the sentiment expressed in social media posts, and visualize trends and configure alarms based on various thresholds.

The company needs to implement this solution quickly, and wants to minimize the infrastructure and data science resources needed to evaluate the messages. The company already has a solution in place to collect posts and store them within an Amazon S3 bucket.

What services should the data science team use to deliver this solution?

Options:

A.

Train a model in Amazon SageMaker by using the BlazingText algorithm to detect sentiment in the corpus of social media posts. Expose an endpoint that can be called by AWS Lambda. Trigger a Lambda function when posts are added to the S3 bucket to invoke the endpoint and record the sentiment in an Amazon DynamoDB table and in a custom Amazon CloudWatch metric. Use CloudWatch alarms to notify analysts of trends.

B.

Train a model in Amazon SageMaker by using the semantic segmentation algorithm to model the semantic content in the corpus of social media posts. Expose an endpoint that can be called by AWS Lambda. Trigger a Lambda function when objects are added to the S3 bucket to invoke the endpoint and record the sentiment in an Amazon DynamoDB table. Schedule a second Lambda function to query recently added records and send an Amazon Simple Notificati

C.

Trigger an AWS Lambda function when social media posts are added to the S3 bucket. Call Amazon Comprehend for each post to capture the sentiment in the message and record the sentiment in an Amazon DynamoDB table. Schedule a second Lambda function to query recently added records and send an Amazon Simple Notification Service (Amazon SNS) notification to notify analysts of trends.

D.

Trigger an AWS Lambda function when social media posts are added to the S3 bucket. Call Amazon Comprehend for each post to capture the sentiment in the message and record the sentiment in a custom Amazon CloudWatch metric and in S3. Use CloudWatch alarms to notify analysts of trends.

Buy Now
Questions 70

A company’s data scientist has trained a new machine learning model that performs better on test data than the company’s existing model performs in the production environment. The data scientist wants to replace the existing model that runs on an Amazon SageMaker endpoint in the production environment. However, the company is concerned that the new model might not work well on the production environment data.

The data scientist needs to perform A/B testing in the production environment to evaluate whether the new model performs well on production environment data.

Which combination of steps must the data scientist take to perform the A/B testing? (Choose two.)

Options:

A.

Create a new endpoint configuration that includes a production variant for each of the two models.

B.

Create a new endpoint configuration that includes two target variants that point to different endpoints.

C.

Deploy the new model to the existing endpoint.

D.

Update the existing endpoint to activate the new model.

E.

Update the existing endpoint to use the new endpoint configuration.

Buy Now
Questions 71

A Mobile Network Operator is building an analytics platform to analyze and optimize a company's operations using Amazon Athena and Amazon S3

The source systems send data in CSV format in real lime The Data Engineering team wants to transform the data to the Apache Parquet format before storing it on Amazon S3

Which solution takes the LEAST effort to implement?

Options:

A.

Ingest .CSV data using Apache Kafka Streams on Amazon EC2 instances and use Kafka Connect S3 to

serialize data as Parquet

B.

Ingest .CSV data from Amazon Kinesis Data Streams and use Amazon Glue to convert data into Parquet.

C.

Ingest .CSV data using Apache Spark Structured Streaming in an Amazon EMR cluster and use Apache

Spark to convert data into Parquet.

D.

Ingest .CSV data from Amazon Kinesis Data Streams and use Amazon Kinesis Data Firehose to convert

data into Parquet.

Buy Now
Questions 72

A medical device company is building a machine learning (ML) model to predict the likelihood of device recall based on customer data that the company collects from a plain text survey. One of the survey questions asks which medications the customer is taking. The data for this field contains the names of medications that customers enter manually. Customers misspell some of the medication names. The column that contains the medication name data gives a categorical feature with high cardinality but redundancy.

What is the MOST effective way to encode this categorical feature into a numeric feature?

Options:

A.

Spell check the column. Use Amazon SageMaker one-hot encoding on the column to transform a categorical feature to a numerical feature.

B.

Fix the spelling in the column by using char-RNN. Use Amazon SageMaker Data Wrangler one-hot encoding to transform a categorical feature to a numerical feature.

C.

Use Amazon SageMaker Data Wrangler similarity encoding on the column to create embeddings Of vectors Of real numbers.

D.

Use Amazon SageMaker Data Wrangler ordinal encoding on the column to encode categories into an integer between O and the total number Of categories in the column.

Buy Now
Questions 73

A machine learning specialist is running an Amazon SageMaker endpoint using the built-in object detection algorithm on a P3 instance for real-time predictions in a company's production application. When evaluating the model's resource utilization, the specialist notices that the model is using only a fraction of the GPU.

Which architecture changes would ensure that provisioned resources are being utilized effectively?

Options:

A.

Redeploy the model as a batch transform job on an M5 instance.

B.

Redeploy the model on an M5 instance. Attach Amazon Elastic Inference to the instance.

C.

Redeploy the model on a P3dn instance.

D.

Deploy the model onto an Amazon Elastic Container Service (Amazon ECS) cluster using a P3 instance.

Buy Now
Questions 74

A company's machine learning (ML) specialist is building a computer vision model to classify 10 different traffic signs. The company has stored 100 images of each class in Amazon S3, and the company has another 10.000unlabeled images. All the images come from dash cameras and are a size of 224 pixels * 224 pixels. After several training runs, the model is overfitting on the training data.

Which actions should the ML specialist take to address this problem? (Select TWO.)

Options:

A.

Use Amazon SageMaker Ground Truth to label the unlabeled images

B.

Use image preprocessing to transform the images into grayscale images.

C.

Use data augmentation to rotate and translate the labeled images.

D.

Replace the activation of the last layer with a sigmoid.

E.

Use the Amazon SageMaker k-nearest neighbors (k-NN) algorithm to label the unlabeled images.

Buy Now
Questions 75

A company wants to use automatic speech recognition (ASR) to transcribe messages that are less than 60 seconds long from a voicemail-style application. The company requires the correct identification of 200 unique product names, some of which have unique spellings or pronunciations.

The company has 4,000 words of Amazon SageMaker Ground Truth voicemail transcripts it can use to customize the chosen ASR model. The company needs to ensure that everyone can update their customizations multiple times each hour.

Which approach will maximize transcription accuracy during the development phase?

Options:

A.

Use a voice-driven Amazon Lex bot to perform the ASR customization. Create customer slots within the bot that specifically identify each of the required product names. Use the Amazon Lex synonym mechanism to provide additional variations of each product name as mis-transcriptions are identified in development.

B.

Use Amazon Transcribe to perform the ASR customization. Analyze the word confidence scores in the transcript, and automatically create or update a custom vocabulary file with any word that has a confidence score below an acceptable threshold value. Use this updated custom vocabulary file in all future transcription tasks.

C.

Create a custom vocabulary file containing each product name with phonetic pronunciations, and use it with Amazon Transcribe to perform the ASR customization. Analyze the transcripts and manually update the custom vocabulary file to include updated or additional entries for those names that are not being correctly identified.

D.

Use the audio transcripts to create a training dataset and build an Amazon Transcribe custom language model. Analyze the transcripts and update the training dataset with a manually corrected version of transcripts where product names are not being transcribed correctly. Create an updated custom language model.

Buy Now
Questions 76

A Machine Learning Specialist is building a model that will perform time series forecasting using Amazon SageMaker The Specialist has finished training the model and is now planning to perform load testing on the endpoint so they can configure Auto Scaling for the model variant

Which approach will allow the Specialist to review the latency, memory utilization, and CPU utilization during the load test"?

Options:

A.

Review SageMaker logs that have been written to Amazon S3 by leveraging Amazon Athena and Amazon OuickSight to visualize logs as they are being produced

B.

Generate an Amazon CloudWatch dashboard to create a single view for the latency, memory utilization, and CPU utilization metrics that are outputted by Amazon SageMaker

C.

Build custom Amazon CloudWatch Logs and then leverage Amazon ES and Kibana to query and visualize the data as it is generated by Amazon SageMaker

D.

Send Amazon CloudWatch Logs that were generated by Amazon SageMaker lo Amazon ES and use Kibana to query and visualize the log data.

Buy Now
Questions 77

A Machine Learning Specialist is working with multiple data sources containing billions of records that need to be joined. What feature engineering and model development approach should the Specialist take with a dataset this large?

Options:

A.

Use an Amazon SageMaker notebook for both feature engineering and model development

B.

Use an Amazon SageMaker notebook for feature engineering and Amazon ML for model development

C.

Use Amazon EMR for feature engineering and Amazon SageMaker SDK for model development

D.

Use Amazon ML for both feature engineering and model development.

Buy Now
Questions 78

A machine learning (ML) specialist is using Amazon SageMaker hyperparameter optimization (HPO) to improve a model’s accuracy. The learning rate parameter is specified in the following HPO configuration:

During the results analysis, the ML specialist determines that most of the training jobs had a learning rate between 0.01 and 0.1. The best result had a learning rate of less than 0.01. Training jobs need to run regularly over a changing dataset. The ML specialist needs to find a tuning mechanism that uses different learning rates more evenly from the provided range between MinValue and MaxValue.

Which solution provides the MOST accurate result?

Options:

A.

Modify the HPO configuration as follows:

Select the most accurate hyperparameter configuration form this HPO job.

B.

Run three different HPO jobs that use different learning rates form the following intervals for MinValue and MaxValue while using the same number of training jobs for each HPO job:

[0.01, 0.1]

[0.001, 0.01]

[0.0001, 0.001]

Select the most accurate hyperparameter configuration form these three HPO jobs.

C.

Modify the HPO configuration as follows:

Select the most accurate hyperparameter configuration form this training job.

D.

Run three different HPO jobs that use different learning rates form the following intervals for MinValue and MaxValue. Divide the number of training jobs for each HPO job by three:

[0.01, 0.1]

[0.001, 0.01]

[0.0001, 0.001]

Select the most accurate hyperparameter configuration form these three HPO jobs.

Buy Now
Questions 79

A retail company wants to update its customer support system. The company wants to implement automatic routing of customer claims to different queues to prioritize the claims by category.

Currently, an operator manually performs the category assignment and routing. After the operator classifies and routes the claim, the company stores the claim’s record in a central database. The claim’s record includes the claim’s category.

The company has no data science team or experience in the field of machine learning (ML). The company’s small development team needs a solution that requires no ML expertise.

Which solution meets these requirements?

Options:

A.

Export the database to a .csv file with two columns: claim_label and claim_text. Use the Amazon SageMaker Object2Vec algorithm and the .csv file to train a model. Use SageMaker to deploy the model to an inference endpoint. Develop a service in the application to use the inference endpoint to process incoming claims, predict the labels, and route the claims to the appropriate queue.

B.

Export the database to a .csv file with one column: claim_text. Use the Amazon SageMaker Latent Dirichlet Allocation (LDA) algorithm and the .csv file to train a model. Use the LDA algorithm to detect labels automatically. Use SageMaker to deploy the model to an inference endpoint. Develop a service in the application to use the inference endpoint to process incoming claims, predict the labels, and route the claims to the appropriate queue.

C.

Use Amazon Textract to process the database and automatically detect two columns: claim_label and claim_text. Use Amazon Comprehend custom classification and the extracted information to train the custom classifier. Develop a service in the application to use the Amazon Comprehend API to process incoming claims, predict the labels, and route the claims to the appropriate queue.

D.

Export the database to a .csv file with two columns: claim_label and claim_text. Use Amazon Comprehend custom classification and the .csv file to train the custom classifier. Develop a service in the application to use the Amazon Comprehend API to process incoming claims, predict the labels, and route the claims to the appropriate queue.

Buy Now
Questions 80

The Chief Editor for a product catalog wants the Research and Development team to build a machine learning system that can be used to detect whether or not individuals in a collection of images are wearing the company's retail brand The team has a set of training data

Which machine learning algorithm should the researchers use that BEST meets their requirements?

Options:

A.

Latent Dirichlet Allocation (LDA)

B.

Recurrent neural network (RNN)

C.

K-means

D.

Convolutional neural network (CNN)

Buy Now
Questions 81

A Machine Learning Specialist is configuring automatic model tuning in Amazon SageMaker

When using the hyperparameter optimization feature, which of the following guidelines should be followed to improve optimization?

Choose the maximum number of hyperparameters supported by

Options:

A.

Amazon SageMaker to search the largest number of combinations possible

B.

Specify a very large hyperparameter range to allow Amazon SageMaker to cover every possible value.

C.

Use log-scaled hyperparameters to allow the hyperparameter space to be searched as quickly as possible

D.

Execute only one hyperparameter tuning job at a time and improve tuning through successive rounds of experiments

Buy Now
Questions 82

While working on a neural network project, a Machine Learning Specialist discovers thai some features in the data have very high magnitude resulting in this data being weighted more in the cost function What should the Specialist do to ensure better convergence during backpropagation?

Options:

A.

Dimensionality reduction

B.

Data normalization

C.

Model regulanzation

D.

Data augmentation for the minority class

Buy Now
Questions 83

A Machine Learning Specialist is deciding between building a naive Bayesian model or a full Bayesian network for a classification problem. The Specialist computes the Pearson correlation coefficients between each feature and finds that their absolute values range between 0.1 to 0.95.

Which model describes the underlying data in this situation?

Options:

A.

A naive Bayesian model, since the features are all conditionally independent.

B.

A full Bayesian network, since the features are all conditionally independent.

C.

A naive Bayesian model, since some of the features are statistically dependent.

D.

A full Bayesian network, since some of the features are statistically dependent.

Buy Now
Questions 84

A retail company is ingesting purchasing records from its network of 20,000 stores to Amazon S3 by using Amazon Kinesis Data Firehose. The company uses a small, server-based application in each store to send the data to AWS over the internet. The company uses this data to train a machine learning model that is retrained each day. The company's data science team has identified existing attributes on these records that could be combined to create an improved model.

Which change will create the required transformed records with the LEAST operational overhead?

Options:

A.

Create an AWS Lambda function that can transform the incoming records. Enable data transformation on the ingestion Kinesis Data Firehose delivery stream. Use the Lambda function as the invocation target.

B.

Deploy an Amazon EMR cluster that runs Apache Spark and includes the transformation logic. Use Amazon EventBridge (Amazon CloudWatch Events) to schedule an AWS Lambda function to launch the cluster each day and transform the records that accumulate in Amazon S3. Deliver the transformed records to Amazon S3.

C.

Deploy an Amazon S3 File Gateway in the stores. Update the in-store software to deliver data to the S3 File Gateway. Use a scheduled daily AWS Glue job to transform the data that the S3 File Gateway delivers to Amazon S3.

D.

Launch a fleet of Amazon EC2 instances that include the transformation logic. Configure the EC2 instances with a daily cron job to transform the records that accumulate in Amazon S3. Deliver the transformed records to Amazon S3.

Buy Now
Exam Code: MLS-C01
Exam Name: AWS Certified Machine Learning - Specialty
Last Update: May 24, 2024
Questions: 281
$64  $159.99
$48  $119.99
$40  $99.99
buy now MLS-C01