You developed a Vertex AI pipeline that trains a classification model on data stored in a large BigQuery table. The pipeline has four steps, where each step is created by a Python function that uses the KubeFlow v2 API. The components have the following names:
dt = datetime.now().strftime("%Y%m%d%H%M%S")
f"export-{dt}.yaml", f"preprocess-{dt}.yaml", f"train-{dt}.yaml",
f"calibrate-{dt}.yaml"
You launch your Vertex AI pipeline as follows:
from google.cloud import aiplatform as aip

job = aip.PipelineJob(
    display_name="my-awesome-pipeline",
    template_path="pipeline.json",
    job_id=f"my-awesome-pipeline-{dt}",
    parameter_values=params,
    enable_caching=True,
    location="europe-west1",
)
You perform many model iterations by adjusting the code and parameters of the training step. You observe high costs associated with the development, particularly the data export and preprocessing steps. You need to reduce model development costs. What should you do?
A. Change the components' YAML filenames to export.yaml, preprocess.yaml, f"train-{dt}.yaml", f"calibrate-{dt}.yaml".
B. Add the {"kubeflow.v1.caching": True} parameter to the set of params provided to your PipelineJob.
C. Move the first step of your pipeline to a separate step, and provide a cached path to Cloud Storage as an input to the main pipeline.
D. Change the name of the pipeline to f"my-awesome-pipeline-{dt}".
Answer
A
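Why A works: with enable_caching=True, Vertex AI reuses a step's previous output only when the step's compiled component specification and inputs are unchanged between runs. Because the export and preprocess component YAML files are regenerated with a fresh timestamp on every compile, those expensive steps never hit the cache, even though their code and data have not changed. Giving the stable steps static filenames lets the cache match them across iterations, while the steps you actively modify (train, calibrate) keep timestamped names. The helper below is a hypothetical sketch of that naming scheme (the function name `component_yaml_names` is not part of the question):

```python
from datetime import datetime
from typing import List, Optional


def component_yaml_names(dt: Optional[str] = None) -> List[str]:
    """Filenames for the four compiled components.

    Steps whose code rarely changes (export, preprocess) get static
    names so Vertex AI's execution cache can match them across runs;
    the steps being iterated on (train, calibrate) keep a timestamp
    so each code revision compiles to a fresh component spec.
    """
    dt = dt or datetime.now().strftime("%Y%m%d%H%M%S")
    return [
        "export.yaml",           # stable: cached across runs
        "preprocess.yaml",       # stable: cached across runs
        f"train-{dt}.yaml",      # iterated: recompiled each revision
        f"calibrate-{dt}.yaml",  # iterated: recompiled each revision
    ]
```

With this scheme, repeated development runs skip the costly BigQuery export and preprocessing steps and only re-execute the training and calibration stages.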