Maximum parallel throughput for training workloads

Best OCI Instances for ML Training

CPU-based machine learning training, such as gradient boosted trees (XGBoost, LightGBM), scikit-learn pipelines, and data preprocessing, scales with parallel CPU throughput, and close to linearly when the work splits into independent tasks. The relevant metric here is therefore peak PassMark at maximum vCPU count, not value per vCPU. Bare Metal shapes with 128–192 vCPUs let you parallelize cross-validation folds and hyperparameter search on a single machine, without the network overhead of a distributed cluster. Flex shapes give you precise cost control: scale up for training runs, then scale down for inference serving on the same shape family.
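As a rough illustration of what parallelizing a hyperparameter search on one machine can look like, here is a minimal sketch using only the Python standard library. The `evaluate` function is a hypothetical stand-in for a real train-and-validate run, and the parameter names and grid values are assumptions chosen for the example, not recommendations.

```python
from concurrent.futures import ProcessPoolExecutor
from itertools import product
import os


def evaluate(params):
    """Hypothetical stand-in for one train-and-validate run.

    Returns (score, params). Replace the body with real model fitting.
    """
    depth, lr = params
    score = -((depth - 6) ** 2) - 100 * (lr - 0.1) ** 2
    return score, params


def grid_search(grid, executor_cls=ProcessPoolExecutor, max_workers=None):
    """Score every parameter combination in parallel and return the best."""
    candidates = list(product(*grid.values()))
    # Use one worker per core, but never more workers than candidates.
    workers = min(max_workers or os.cpu_count() or 1, len(candidates))
    with executor_cls(max_workers=workers) as pool:
        results = list(pool.map(evaluate, candidates))
    return max(results)  # highest score wins


if __name__ == "__main__":
    # Assumed example grid: 3 x 3 = 9 independent evaluations.
    grid = {"max_depth": [4, 6, 8], "learning_rate": [0.05, 0.1, 0.2]}
    best_score, best_params = grid_search(grid)
    print(best_params, best_score)
```

Each candidate is independent, so on a shape with many vCPUs the evaluations can run side by side without any inter-node communication. Libraries such as scikit-learn expose the same idea through their own parallelism settings.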

What to look for

ML training jobs of this kind can spread work across all available cores. Peak PassMark, the estimated total score at maximum vCPU count, is therefore a practical ceiling on how fast your training loop runs. Bare Metal shapes avoid hypervisor CPU steal, which reduces training-time variability across runs.

  • Peak PassMark (at max vCPU config)
  • Max vCPUs
  • PassMark per vCPU
  • RAM (GB)
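One way to apply these four criteria is as a multi-key sort. The sketch below assumes the list order is the priority order (peak PassMark first, RAM last), which this page does not state explicitly. The shape names and all numbers are hypothetical placeholders, not real OCI figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shape:
    name: str
    peak_passmark: int      # estimated total score at max vCPU config
    max_vcpus: int
    passmark_per_vcpu: int
    ram_gb: int


def rank_for_training(shapes):
    """Sort best-first: peak PassMark, then max vCPUs, per-vCPU score, RAM."""
    return sorted(
        shapes,
        key=lambda s: (s.peak_passmark, s.max_vcpus, s.passmark_per_vcpu, s.ram_gb),
        reverse=True,
    )


# Hypothetical values for illustration only.
SAMPLE_SHAPES = [
    Shape("shape-a", 95000, 128, 742, 2048),
    Shape("shape-b", 120000, 192, 625, 2304),
    Shape("shape-c", 95000, 160, 594, 1024),
]

if __name__ == "__main__":
    print([s.name for s in rank_for_training(SAMPLE_SHAPES)])
    # -> ['shape-b', 'shape-c', 'shape-a']
```

Here shape-c outranks shape-a because the two tie on peak PassMark and the tie is broken by max vCPUs, even though shape-a scores higher per vCPU.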

Ranked instances — 49 shapes

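# | Shape | $/vCPU/hr
1 | | $0.0150
2 | | $0.0125
3 | | $0.0125
4 | | $0.0150
5 | | N/A
6 | | $0.0150
7 | | $0.0125
8 | | N/A
9 | | $0.0125
10 | | $0.0125
11 | | $0.0375
12 | | $0.0100
13 | | $0.0125
14 | | N/A
15 | | $0.0150
16 | | $0.0150
17 | | $0.0319
18 | | N/A
19 | | N/A
20 | | $0.0100
21 | | N/A
22 | | $0.0270
23 | | $0.0070
24 | | N/A
25 | | N/A
26 | | $0.0319
27 | | N/A
28 | | N/A
29 | | N/A
30 | | $0.0069
31 | | $0.0069
32 | | $0.0319
33 | | N/A
34 | | $0.0150
35 | | N/A
36 | | N/A
37 | | $0.0319
38 | | N/A
39 | | $0.0150
40 | | N/A
41 | | $0.0319
42 | | N/A
43 | | $0.0150
44 | | N/A
45 | | $0.0319
46 | | N/A
47 | | $0.0150
48 | | $0.0150
49 | | N/A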
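To turn a $/vCPU/hr figure into a budget estimate, multiply it by the vCPU count and the run length. The sketch below assumes billing is linear in vCPU-hours and ignores memory, storage, and network charges, which may be billed separately. The function names are illustrative, and the 128 vCPU and 8 vCPU configurations are example sizes.

```python
from decimal import Decimal
from typing import Optional


def hourly_cost(price_per_vcpu_hr: str, vcpus: int) -> Optional[Decimal]:
    """Hourly compute cost, or None when the price is listed as N/A."""
    text = price_per_vcpu_hr.strip()
    if text.upper() == "N/A":
        return None
    return Decimal(text.lstrip("$")) * vcpus


def run_cost(price_per_vcpu_hr: str, vcpus: int, hours) -> Optional[Decimal]:
    """Total compute cost of a run lasting `hours` at a fixed vCPU count."""
    rate = hourly_cost(price_per_vcpu_hr, vcpus)
    return None if rate is None else rate * Decimal(str(hours))


if __name__ == "__main__":
    # Training: 128 vCPUs at $0.0150 per vCPU-hour for 3 hours.
    print(hourly_cost("$0.0150", 128))   # 1.9200
    print(run_cost("$0.0150", 128, 3))   # 5.7600
    # Inference on the same family, scaled down to 8 vCPUs.
    print(hourly_cost("$0.0150", 8))     # 0.1200
```

`Decimal` is used instead of floats so that small per-vCPU prices multiply without rounding noise. Rows priced N/A return `None` rather than a misleading zero.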
