Package org.apache.sysds.resource
Class CloudUtils
- java.lang.Object
-
- org.apache.sysds.resource.CloudUtils
-
public class CloudUtils extends Object
Class providing static utilities for cloud-related operations. Some of the utilities depend on the cloud provider, but currently only AWS is supported and set as the default provider. The method setProvider() is to be used for setting the target provider once more providers become supported.
-
-
Nested Class Summary
Modifier and Type Class Description
static class
CloudUtils.CloudProvider
static class
CloudUtils.InstanceFamily
static class
CloudUtils.InstanceSize
-
Field Summary
Modifier and Type Field Description
static double
DEFAULT_CLUSTER_LAUNCH_TIME
static int
EBS_DEFAULT_ROOT_SIZE_EC2
static int
EBS_DEFAULT_ROOT_SIZE_EMR
static String
EC2_REGEX
static double
JVM_MEMORY_FACTOR
static double
MINIMAL_EXECUTION_TIME
static String
SPARK_VERSION
-
Method Summary
Modifier and Type Method Description
static int
calculateAmCores(int totalExecutorCores)
static int
calculateAmMemoryMB(int totalExecutorCores)
static double
calculateClusterPrice(EnumerationUtils.ConfigurationPoint config, double time, CloudUtils.CloudProvider provider)
Calculates the cluster price based on the estimated execution time and the cluster configuration.
static long
calculateEffectiveDriverMemoryBudget(long driverMemory, int totalExecutorCores)
static long
GBtoBytes(double gb)
static void
generateEC2ConfigJson(CloudInstance instance, String filePath)
Generates a json file storing the instance type and relevant characteristics for single-node executions.
static void
generateEMRConfigurationsJson(EnumerationUtils.ConfigurationPoint clusterConfig, String filePath)
Generates a json file with the configurations attribute for launching an AWS EMR cluster with Spark.
static void
generateEMRInstanceGroupsJson(EnumerationUtils.ConfigurationPoint clusterConfig, String filePath)
Generates a json file with the instance groups argument for launching an AWS EMR cluster.
static int[]
getEffectiveExecutorResources(long memory, int cores, int numExecutors)
Calculates the effective resource values for a Spark cluster managed by YARN.
static CloudUtils.InstanceFamily
getInstanceFamily(String instanceName)
static CloudUtils.InstanceSize
getInstanceSize(String instanceName)
static HashMap<String,CloudInstance>
loadInstanceInfoTable(String instanceTablePath, double emrFeeRatio, double ebsStoragePrice)
Reads a csv file filled with VM instance characteristics.
static double[]
loadRegionalPrices(String feeTablePath, String region)
Reads a csv file filled with the relevant AWS fees/prices per region.
static void
setProvider(CloudUtils.CloudProvider provider)
Static provider initialization method.
static boolean
validateInstanceName(String instanceName)
-
-
-
Field Detail
-
JVM_MEMORY_FACTOR
public static final double JVM_MEMORY_FACTOR
- See Also:
- Constant Field Values
-
EC2_REGEX
public static final String EC2_REGEX
- See Also:
- Constant Field Values
-
EBS_DEFAULT_ROOT_SIZE_EMR
public static final int EBS_DEFAULT_ROOT_SIZE_EMR
- See Also:
- Constant Field Values
-
EBS_DEFAULT_ROOT_SIZE_EC2
public static final int EBS_DEFAULT_ROOT_SIZE_EC2
- See Also:
- Constant Field Values
-
SPARK_VERSION
public static final String SPARK_VERSION
- See Also:
- Constant Field Values
-
MINIMAL_EXECUTION_TIME
public static final double MINIMAL_EXECUTION_TIME
- See Also:
- Constant Field Values
-
DEFAULT_CLUSTER_LAUNCH_TIME
public static final double DEFAULT_CLUSTER_LAUNCH_TIME
- See Also:
- Constant Field Values
-
-
Method Detail
-
setProvider
public static void setProvider(CloudUtils.CloudProvider provider)
Static provider initialization method.
- Parameters:
provider
- target provider
-
GBtoBytes
public static long GBtoBytes(double gb)
-
validateInstanceName
public static boolean validateInstanceName(String instanceName)
-
getInstanceFamily
public static CloudUtils.InstanceFamily getInstanceFamily(String instanceName)
-
getInstanceSize
public static CloudUtils.InstanceSize getInstanceSize(String instanceName)
-
calculateClusterPrice
public static double calculateClusterPrice(EnumerationUtils.ConfigurationPoint config, double time, CloudUtils.CloudProvider provider)
Calculates the cluster price based on the estimated execution time and the cluster configuration. The calculation considers the extra storage price for a Spark cluster because of the HDFS dependency, but the cost of the root storage is not accounted for here.
- Parameters:
config
- the cluster configuration for the calculation
time
- estimated execution time in seconds
provider
- cloud provider for the instances of the cluster
- Returns:
- price for the given time
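The relationship between execution time and price described above can be sketched as follows. This is only an illustration under stated assumptions, not the formula used by CloudUtils: the class ClusterPriceSketch, its methods, and the idea of one driver node plus identical executor nodes billed at an hourly rate are hypothetical, and the extra (non-root) storage cost is assumed to be already expressed per node and per hour.

```java
public class ClusterPriceSketch {

    /**
     * Hypothetical hourly cost of one node: the instance price plus the
     * cost of extra (non-root) storage, both already expressed per hour.
     */
    public static double nodePricePerHour(double instancePricePerHour,
                                          double extraStoragePricePerHour) {
        return instancePricePerHour + extraStoragePricePerHour;
    }

    /**
     * Hypothetical cluster price: one driver node plus numExecutors
     * identical executor nodes, scaled from an hourly rate to the
     * estimated execution time given in seconds.
     */
    public static double clusterPrice(double driverPricePerHour,
                                      double executorPricePerHour,
                                      int numExecutors,
                                      double timeSeconds) {
        double pricePerHour = driverPricePerHour + numExecutors * executorPricePerHour;
        return pricePerHour * (timeSeconds / 3600.0);
    }

    public static void main(String[] args) {
        // Illustrative placeholder prices, not real AWS prices.
        double driver = nodePricePerHour(0.9, 0.1);
        double executor = nodePricePerHour(0.45, 0.05);
        System.out.println(clusterPrice(driver, executor, 4, 1800.0));
    }
}
```

The time argument is kept in seconds to match the unit documented for the time parameter; the conversion to hours happens in one place only.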
-
loadRegionalPrices
public static double[] loadRegionalPrices(String feeTablePath, String region) throws IOException
Reads a csv file filled with the relevant AWS fees/prices per region. Each record in the csv should carry the following information (including header):
- Region - AWS region abbreviation
- Fee Ratio - Ratio of EMR fee per instance to EC2 price per instance per hour
- EBS Price - Price for EBS per month per GB
- Parameters:
feeTablePath
- csv file path
region
- AWS region abbreviation
- Returns:
- static array of doubles with 2 elements: [EMR fee ratio, EBS price]
- Throws:
IOException
- in case of invalid file format
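A minimal sketch of reading such a fee table is shown below. It is not the CloudUtils implementation: the class RegionalPricesSketch and the method readRegionalPrices are hypothetical, the input is taken from a Reader instead of a file path so the example is self-contained, a plain comma separator without quoting is assumed, and the prices in the sample data are placeholders.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class RegionalPricesSketch {

    /**
     * Scans a csv with the columns Region, Fee Ratio, EBS Price and
     * returns [fee ratio, EBS price] for the requested region.
     * Throws IOException for an empty table, a malformed row, or a
     * region that is not listed.
     */
    public static double[] readRegionalPrices(Reader source, String region) throws IOException {
        try (BufferedReader br = new BufferedReader(source)) {
            String line = br.readLine(); // the first line is the header
            if (line == null) {
                throw new IOException("Empty fee table");
            }
            while ((line = br.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue; // tolerate blank lines
                }
                String[] values = line.split(",");
                if (values.length != 3) {
                    throw new IOException("Expected 3 columns but found " + values.length);
                }
                if (values[0].trim().equals(region)) {
                    try {
                        return new double[] {
                            Double.parseDouble(values[1].trim()),
                            Double.parseDouble(values[2].trim())
                        };
                    } catch (NumberFormatException e) {
                        throw new IOException("Non-numeric value in row: " + line, e);
                    }
                }
            }
        }
        throw new IOException("Region not found: " + region);
    }

    public static void main(String[] args) throws IOException {
        // Placeholder values for illustration only.
        String csv = "Region,Fee Ratio,EBS Price\n"
                   + "us-east-1,0.25,0.08\n"
                   + "eu-central-1,0.25,0.0952\n";
        double[] prices = readRegionalPrices(new StringReader(csv), "eu-central-1");
        System.out.println("EMR fee ratio: " + prices[0] + ", EBS price: " + prices[1]);
    }
}
```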
-
loadInstanceInfoTable
public static HashMap<String,CloudInstance> loadInstanceInfoTable(String instanceTablePath, double emrFeeRatio, double ebsStoragePrice) throws IOException
Reads a csv file filled with VM instance characteristics. Each record in the csv should carry the following information (including header):
- API_Name - naming for VM instance used by the provider
- Price - price for instance per hour
- Memory - floating number for the instance memory in GBs
- vCPUs - number of physical threads
- Cores - number of physical cores (not relevant at the moment)
- gFlops - FLOPS capability of the CPU in GFLOPS (Giga)
- memoryBandwidth - memory bandwidth in MB/s
- NVMe - flag if NVMe storage volume(s) are attached
- storageVolumes - number of NVMe or EBS (to be additionally configured) volumes
- sizeVolumes - size of each NVMe or EBS (to be additionally configured) volume
- diskReadBandwidth - disk read bandwidth in MB/s
- diskWriteBandwidth - disk write bandwidth in MB/s
- networkBandwidth - network bandwidth in MB/s
- Parameters:
instanceTablePath
- csv file path
emrFeeRatio
- EMR fee as fraction of the instance price (depends on the region)
ebsStoragePrice
- EBS price per GB per month (depends on the region)
- Returns:
- map with filtered instances
- Throws:
IOException
- in case of a problem reading the csv file
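The column layout listed above can be illustrated with a small header-driven parser. This is a sketch under assumptions, not the CloudUtils code: the class InstanceTableSketch and its parse method are hypothetical, rows are kept as plain string maps instead of CloudInstance objects, no filtering or fee calculation is applied, the column for the disk write bandwidth is assumed to be named diskWriteBandwidth, and the sample row holds placeholder values.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class InstanceTableSketch {

    /**
     * Parses csv text whose first line is the header and returns the
     * rows keyed by the value of the API_Name column. Each row maps
     * column name to the raw string value.
     */
    public static Map<String, Map<String, String>> parse(String csv) {
        String[] lines = csv.split("\\R"); // split on any line break
        String[] header = lines[0].split(",");
        Map<String, Map<String, String>> table = new LinkedHashMap<>();
        for (int i = 1; i < lines.length; i++) {
            if (lines[i].trim().isEmpty()) {
                continue; // tolerate blank lines
            }
            String[] values = lines[i].split(",");
            if (values.length != header.length) {
                throw new IllegalArgumentException("Row " + i + " has " + values.length
                        + " fields, expected " + header.length);
            }
            Map<String, String> row = new HashMap<>();
            for (int c = 0; c < header.length; c++) {
                row.put(header[c].trim(), values[c].trim());
            }
            table.put(row.get("API_Name"), row);
        }
        return table;
    }

    public static void main(String[] args) {
        // Placeholder characteristics for illustration only.
        String csv = String.join("\n",
            "API_Name,Price,Memory,vCPUs,Cores,gFlops,memoryBandwidth,NVMe,"
                + "storageVolumes,sizeVolumes,diskReadBandwidth,diskWriteBandwidth,networkBandwidth",
            "m5.xlarge,0.192,16.0,4,2,160,21328,false,1,32,143.72,143.72,156.25");
        Map<String, String> row = parse(csv).get("m5.xlarge");
        System.out.println(row.get("Memory") + " GB, " + row.get("vCPUs") + " vCPUs");
    }
}
```

Looking columns up by header name rather than by position keeps the sketch independent of the column order in the file.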
-
generateEC2ConfigJson
public static void generateEC2ConfigJson(CloudInstance instance, String filePath)
Generates a json file storing the instance type and relevant characteristics for single-node executions. The resulting file is to be used only for parsing the attributes and is not suitable as direct options input to the AWS CLI.
- Parameters:
instance
- EC2 instance object (always set one)
filePath
- path for the json file
-
generateEMRInstanceGroupsJson
public static void generateEMRInstanceGroupsJson(EnumerationUtils.ConfigurationPoint clusterConfig, String filePath)
Generates a json file with the instance groups argument for launching an AWS EMR cluster.
- Parameters:
clusterConfig
- object representing EMR cluster configurations
filePath
- path for the output json file
-
generateEMRConfigurationsJson
public static void generateEMRConfigurationsJson(EnumerationUtils.ConfigurationPoint clusterConfig, String filePath)
Generates a json file with the configurations attribute for launching an AWS EMR cluster with Spark.
- Parameters:
clusterConfig
- object representing EMR cluster configurations
filePath
- path for the output json file
-
getEffectiveExecutorResources
public static int[] getEffectiveExecutorResources(long memory, int cores, int numExecutors)
Calculates the effective resource values for a Spark cluster managed by YARN. It considers the resource limits for scheduling containers by YARN and the need to fit an Application Master (AM) container in addition to the executor ones.
- Parameters:
memory
- total node memory in bytes
cores
- total node available virtual cores
numExecutors
- number of available worker nodes
- Returns:
- array of length 5: [executor memory in MB, executor cores, number of executors, AM memory in MB, AM cores]
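Because the result is a positional int array, callers may find it helpful to name the slots when reading it. The sketch below shows one way to do that; the class ExecutorResourcesSketch, its constants, and the describe method are hypothetical helpers, and the sample array is a placeholder rather than the output of an actual call.

```java
public class ExecutorResourcesSketch {

    // Hypothetical index names following the documented order of the result array.
    public static final int EXECUTOR_MEMORY_MB = 0;
    public static final int EXECUTOR_CORES = 1;
    public static final int NUM_EXECUTORS = 2;
    public static final int AM_MEMORY_MB = 3;
    public static final int AM_CORES = 4;

    /** Formats a 5-element resource array as a readable one-line summary. */
    public static String describe(int[] resources) {
        if (resources == null || resources.length != 5) {
            throw new IllegalArgumentException("Expected an array of length 5");
        }
        return "executors: " + resources[NUM_EXECUTORS]
                + " x (" + resources[EXECUTOR_MEMORY_MB] + " MB, "
                + resources[EXECUTOR_CORES] + " cores); AM: "
                + resources[AM_MEMORY_MB] + " MB, "
                + resources[AM_CORES] + " cores";
    }

    public static void main(String[] args) {
        // Placeholder values standing in for a returned result.
        int[] resources = {4096, 4, 8, 1024, 1};
        System.out.println(describe(resources));
    }
}
```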
-
calculateAmMemoryMB
public static int calculateAmMemoryMB(int totalExecutorCores)
-
calculateAmCores
public static int calculateAmCores(int totalExecutorCores)
-
calculateEffectiveDriverMemoryBudget
public static long calculateEffectiveDriverMemoryBudget(long driverMemory, int totalExecutorCores)
-
-