AWS Certified Machine Learning Specialty MLS-C01 Q1-Q10

This is post 1 of 15 in the series “AWS Certified Machine Learning Specialty MLS-C01 Series”

1. A machine learning (ML) specialist is using the Amazon SageMaker DeepAR forecasting algorithm to train a model on CPU-based Amazon EC2 On-Demand instances. The model currently takes multiple hours to train. The ML specialist wants to decrease the training time of the model.

Which approaches will meet this requirement? (Choose two.)

A. Replace On-Demand Instances with Spot Instances.
B. Configure model auto scaling dynamically to adjust the number of instances automatically.
C. Replace CPU-based EC2 instances with GPU-based EC2 instances.
D. Use multiple training instances.
E. Use a pre-trained version of the model. Run incremental training.

Answer

C, D

2. A chemical company has developed several machine learning (ML) solutions to identify chemical process abnormalities. The time series values of independent variables and the labels are available for the past 2 years and are sufficient to accurately model the problem.

The regular operation label is marked as 0. The abnormal operation label is marked as 1. Process abnormalities have a significant negative effect on the company’s profits. The company must avoid these abnormalities.

Which metrics will indicate an ML solution that will provide the GREATEST probability of detecting an abnormality?

A. Precision = 0.91, Recall = 0.6
B. Precision = 0.61, Recall = 0.98
C. Precision = 0.7, Recall = 0.9
D. Precision = 0.98, Recall = 0.8

Answer
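
Recall is the fraction of actual abnormalities that a model flags, so it is the metric that corresponds to the probability of detecting an abnormality. A minimal sketch, using only the precision and recall values listed in the options above (the dictionary, the function name, and the example counts are illustrative assumptions), selects the option with the highest recall:

```python
# Precision/recall pairs copied from options A-D above.
options = {
    "A": {"precision": 0.91, "recall": 0.60},
    "B": {"precision": 0.61, "recall": 0.98},
    "C": {"precision": 0.70, "recall": 0.90},
    "D": {"precision": 0.98, "recall": 0.80},
}


def recall(true_positives: int, false_negatives: int) -> float:
    """Recall = TP / (TP + FN): share of real abnormalities that are detected."""
    return true_positives / (true_positives + false_negatives)


# Hypothetical counts: 98 of 100 abnormal periods are detected.
example = recall(true_positives=98, false_negatives=2)

# The option most likely to catch an abnormality is the one with the highest recall.
best = max(options, key=lambda name: options[name]["recall"])
print(best, options[best]["recall"], example)
```

Lower precision means more false alarms, which is the trade-off accepted when missing an abnormality is the costlier error.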

3. An agriculture company wants to improve crop yield forecasting for the upcoming season by using crop yields from the last three seasons. The company wants to compare the performance of its new scikit-learn model to the benchmark.

A data scientist needs to package the code into a container that computes both the new model forecast and the benchmark. The data scientist wants AWS to be responsible for the operational maintenance of the container.

Which solution will meet these requirements?

A. Package the code as the training script for an Amazon SageMaker scikit-learn container.
B. Package the code into a custom-built container. Push the container to Amazon Elastic Container Registry (Amazon ECR).
C. Package the code into a custom-built container. Push the container to AWS Fargate.
D. Package the code by extending an Amazon SageMaker scikit-learn container.

Answer

4. A telecommunications company has deployed a machine learning model using Amazon SageMaker. The model identifies customers who are likely to cancel their contract when calling customer service. These customers are then directed to a specialist service team. The model has been trained on historical data from multiple years relating to customer contracts and customer service interactions in a single geographic region.

The company is planning to launch a new global product that will use this model. Management is concerned that the model might incorrectly direct a large number of calls from customers in regions without historical data to the specialist service team.

Which approach would MOST effectively address this issue?

A. Enable Amazon SageMaker Model Monitor data capture on the model endpoint. Create a monitoring baseline on the training dataset. Schedule monitoring jobs. Use Amazon CloudWatch to alert the data scientists when the numerical distance of regional customer data fails the baseline drift check. Reevaluate the training set with the larger data source and retrain the model.
B. Enable Amazon SageMaker Debugger on the model endpoint. Create a custom rule to measure the variance from the baseline training dataset. Use Amazon CloudWatch to alert the data scientists when the rule is invoked. Reevaluate the training set with the larger data source and retrain the model.
C. Capture all customer calls routed to the specialist service team in Amazon S3. Schedule a monitoring job to capture all the true positives and true negatives, correlate them to the training dataset, and calculate the accuracy. Use Amazon CloudWatch to alert the data scientists when the accuracy decreases. Reevaluate the training set with the additional data from the specialist service team and retrain the model.
D. Enable Amazon CloudWatch on the model endpoint. Capture metrics using Amazon CloudWatch Logs and send them to Amazon S3. Analyze the monitored results against the training data baseline. When the variance from the baseline exceeds the regional customer variance, reevaluate the training set and retrain the model.

Answer

5. A company builds computer-vision models that use deep learning for the autonomous vehicle industry. A machine learning (ML) specialist uses an Amazon EC2 instance that has a CPU:GPU ratio of 12:1 to train the models.

The ML specialist examines the instance metric logs and notices that the GPU is idle half of the time. The ML specialist must reduce training costs without increasing the duration of the training jobs.

Which solution will meet these requirements?

A. Switch to an instance type that has only CPUs.
B. Use a heterogeneous cluster that has two different instance groups.
C. Use memory-optimized EC2 Spot Instances for the training jobs.
D. Switch to an instance type that has a CPU:GPU ratio of 6:1.

Answer

6. An ecommerce company discovers that the search tool for the company’s website is not presenting the top search results to customers. The company needs to resolve the issue so the search tool will present results that customers are most likely to want to purchase.

Which solution will meet this requirement with the LEAST operational effort?

A. Use the Amazon SageMaker BlazingText algorithm to add context to search results through query expansion.
B. Use the Amazon SageMaker XGBoost algorithm to improve candidate ranking.
C. Use Amazon CloudSearch and sort results by the search relevance score.
D. Use Amazon CloudSearch and sort results by the geographic location.

Answer

7. A company uses sensors on devices such as motor engines and factory machines to measure parameters such as temperature and pressure. The company wants to use the sensor data to predict equipment malfunctions and reduce service outages.

A machine learning (ML) specialist needs to gather the sensor data to train a model to predict device malfunctions. The ML specialist must ensure that the data does not contain outliers before training the model.

How can the ML specialist meet these requirements with the LEAST operational overhead?

A. Load the data into an Amazon SageMaker Studio notebook. Calculate the first and third quartile. Use a SageMaker Data Wrangler data flow to remove only values that are outside of those quartiles.
B. Use an Amazon SageMaker Data Wrangler bias report to find outliers in the dataset. Use a Data Wrangler data flow to remove outliers based on the bias report.
C. Use an Amazon SageMaker Data Wrangler anomaly detection visualization to find outliers in the dataset. Add a transformation to a Data Wrangler data flow to remove outliers.
D. Use Amazon Lookout for Equipment to find and remove outliers from the dataset.

Answer
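
For background on the quartile-based approach mentioned in option A: a common statistical rule keeps values between Q1 - 1.5 × IQR and Q3 + 1.5 × IQR, where IQR is the interquartile range. Dropping everything outside the first and third quartiles themselves would discard roughly half of the readings. The sketch below assumes made-up pressure readings and the conventional 1.5 multiplier; it illustrates the rule only and makes no claim about how Data Wrangler implements its outlier handling.

```python
import statistics


def quartiles(values):
    """Return (Q1, Q3) using the statistics module's default quantile method."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    return q1, q3


def iqr_fences(values, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Made-up sensor readings with one obvious spike.
pressure = [101.2, 100.8, 101.5, 100.9, 101.1, 101.3, 100.7, 250.0]

lower, upper = iqr_fences(pressure)
cleaned = [v for v in pressure if lower <= v <= upper]

# For comparison: keeping only values between Q1 and Q3 drops half the data.
q1, q3 = quartiles(pressure)
inside_quartiles = [v for v in pressure if q1 <= v <= q3]

print(len(cleaned), len(inside_quartiles))
```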

8. A data scientist obtains a tabular dataset that contains 150 correlated features with different ranges to build a regression model. The data scientist needs to achieve more efficient model training by implementing a solution that minimizes impact on the model’s performance. The data scientist decides to perform a principal component analysis (PCA) preprocessing step to reduce the number of features to a smaller set of independent features before the data scientist uses the new features in the regression model.

Which preprocessing step will meet these requirements?

A. Use the Amazon SageMaker built-in algorithm for PCA on the dataset to transform the data.
B. Load the data into Amazon SageMaker Data Wrangler. Scale the data with a Min Max Scaler transformation step. Use the SageMaker built-in algorithm for PCA on the scaled dataset to transform the data.
C. Reduce the dimensionality of the dataset by removing the features that have the highest correlation. Load the data into Amazon SageMaker Data Wrangler. Perform a Standard Scaler transformation step to scale the data. Use the SageMaker built-in algorithm for PCA on the scaled dataset to transform the data.
D. Reduce the dimensionality of the dataset by removing the features that have the lowest correlation. Load the data into Amazon SageMaker Data Wrangler. Perform a Min Max Scaler transformation step to scale the data. Use the SageMaker built-in algorithm for PCA on the scaled dataset to transform the data.

Answer
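
PCA picks directions of maximum variance, so a feature measured on a much larger range can dominate the components unless the data is scaled first. The toy sketch below is an illustration under stated assumptions: two hypothetical features, min-max scaling, and the closed-form eigenvalues of a 2×2 covariance matrix in plain Python. It does not use the SageMaker built-in PCA algorithm or Data Wrangler.

```python
import math
import statistics


def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def covariance(x, y):
    """Sample covariance of two equally long sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def first_component_share(x, y):
    """Share of total variance captured by the first principal component."""
    a, c = statistics.variance(x), statistics.variance(y)
    b = covariance(x, y)
    # Largest eigenvalue of the symmetric matrix [[a, b], [b, c]].
    largest = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    return largest / (a + c)


# Hypothetical features on very different ranges.
income = [30000, 45000, 52000, 61000, 75000, 88000]
rating = [2.0, 4.5, 1.5, 3.0, 5.0, 2.5]

unscaled = first_component_share(income, rating)
scaled = first_component_share(min_max_scale(income), min_max_scale(rating))
print(f"unscaled: {unscaled:.6f}, scaled: {scaled:.3f}")
```

Without scaling, the first component is almost entirely the large-range feature; after scaling, both features contribute to the components.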

9. A data scientist is trying to improve the accuracy of a neural network classification model. The data scientist wants to run a large hyperparameter tuning job in Amazon SageMaker. However, previous smaller tuning jobs on the same model often ran for several weeks. The data scientist wants to reduce the computation time required to run the tuning job.

Which actions will MOST reduce the computation time for the hyperparameter tuning job? (Choose two.)

A. Use the Hyperband tuning strategy.
B. Increase the number of hyperparameters.
C. Set a lower value for the MaxNumberOfTrainingJobs parameter.
D. Use the grid search tuning strategy.
E. Set a lower value for the MaxParallelTrainingJobs parameter.

Answer

A, C

10. A car company has dealership locations in multiple cities. The company uses a machine learning (ML) recommendation system to market cars to its customers.

An ML engineer trained the ML recommendation model on a dataset that includes multiple attributes about each car. The dataset includes attributes such as car brand, car type, fuel efficiency, and price.

The ML engineer uses Amazon SageMaker Data Wrangler to analyze and visualize data. The ML engineer needs to identify the distribution of car prices for a specific type of car.

Which type of visualization should the ML engineer use to meet these requirements?

A. Use the SageMaker Data Wrangler scatter plot visualization to inspect the relationship between the car price and type of car.
B. Use the SageMaker Data Wrangler quick model visualization to quickly evaluate the data and produce importance scores for the car price and type of car.
C. Use the SageMaker Data Wrangler anomaly detection visualization to identify outliers for the specific features.
D. Use the SageMaker Data Wrangler histogram visualization to inspect the range of values for the specific feature.

Answer
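
A histogram summarizes how the values of one numeric feature are distributed, which is what the question asks about for car prices within a single car type. As a rough sketch outside Data Wrangler, with made-up records and a hypothetical bin width, the idea is to filter to one type and then count prices per bin:

```python
from collections import Counter

BIN_WIDTH = 5000  # hypothetical bin size in currency units

# Made-up records; field names mirror attributes mentioned in the question.
cars = [
    {"type": "sedan", "price": 21500},
    {"type": "sedan", "price": 23900},
    {"type": "suv", "price": 38200},
    {"type": "sedan", "price": 27400},
    {"type": "sedan", "price": 24100},
    {"type": "suv", "price": 41000},
    {"type": "sedan", "price": 33800},
]


def price_histogram(records, car_type, bin_width=BIN_WIDTH):
    """Count prices per bin for one car type; keys are each bin's lower edge."""
    prices = [r["price"] for r in records if r["type"] == car_type]
    return Counter((p // bin_width) * bin_width for p in prices)


hist = price_histogram(cars, "sedan")
for lower_edge in sorted(hist):
    upper_edge = lower_edge + BIN_WIDTH - 1
    print(f"{lower_edge:>6}-{upper_edge}: {'#' * hist[lower_edge]}")
```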
