Package org.apache.sysds.resource
Class CloudUtils
- java.lang.Object
-
- org.apache.sysds.resource.CloudUtils
-
public class CloudUtils extends Object
Class providing static utilities for cloud-related operations. Some of the utilities depend on the cloud provider, but currently only AWS is supported and set as the default provider. The method setProvider() is to be used for setting the target provider once more providers become supported.
-
-
Nested Class Summary
Nested Classes (Modifier and Type, Class, Description):
- static class CloudUtils.CloudProvider
- static class CloudUtils.InstanceFamily
- static class CloudUtils.InstanceSize
-
Field Summary
Fields (Modifier and Type, Field, Description):
- static double DEFAULT_CLUSTER_LAUNCH_TIME
- static int EBS_DEFAULT_ROOT_SIZE_EC2
- static int EBS_DEFAULT_ROOT_SIZE_EMR
- static String EC2_REGEX
- static double JVM_MEMORY_FACTOR
- static double MINIMAL_EXECUTION_TIME
- static String SPARK_VERSION
-
Method Summary
Methods (Modifier and Type, Method, Description):
- static int calculateAmCores(int totalExecutorCores)
- static int calculateAmMemoryMB(int totalExecutorCores)
- static double calculateClusterPrice(EnumerationUtils.ConfigurationPoint config, double time, CloudUtils.CloudProvider provider) - This method calculates the cluster price based on the estimated execution time and the cluster configuration.
- static long calculateEffectiveDriverMemoryBudget(long driverMemory, int totalExecutorCores)
- static long GBtoBytes(double gb)
- static void generateEC2ConfigJson(CloudInstance instance, String filePath) - Generates json file storing the instance type and relevant characteristics for single node executions.
- static void generateEMRConfigurationsJson(EnumerationUtils.ConfigurationPoint clusterConfig, String filePath) - Generates json file with configurations attribute for launching AWS EMR cluster with Spark.
- static void generateEMRInstanceGroupsJson(EnumerationUtils.ConfigurationPoint clusterConfig, String filePath) - Generates json file with instance groups argument for launching AWS EMR cluster.
- static int[] getEffectiveExecutorResources(long memory, int cores, int numExecutors) - Calculates the effective resource values for Spark cluster managed by YARN.
- static CloudUtils.InstanceFamily getInstanceFamily(String instanceName)
- static CloudUtils.InstanceSize getInstanceSize(String instanceName)
- static HashMap<String,CloudInstance> loadInstanceInfoTable(String instanceTablePath, double emrFeeRatio, double ebsStoragePrice) - Reads a csv file filled with VM instance characteristics.
- static double[] loadRegionalPrices(String feeTablePath, String region) - Reads a csv file filled with relevant AWS fees/prices per region.
- static void setProvider(CloudUtils.CloudProvider provider) - Static provider initialization method.
- static boolean validateInstanceName(String instanceName)
-
-
-
Field Detail
-
JVM_MEMORY_FACTOR
public static final double JVM_MEMORY_FACTOR
- See Also:
- Constant Field Values
-
EC2_REGEX
public static final String EC2_REGEX
- See Also:
- Constant Field Values
-
EBS_DEFAULT_ROOT_SIZE_EMR
public static final int EBS_DEFAULT_ROOT_SIZE_EMR
- See Also:
- Constant Field Values
-
EBS_DEFAULT_ROOT_SIZE_EC2
public static final int EBS_DEFAULT_ROOT_SIZE_EC2
- See Also:
- Constant Field Values
-
SPARK_VERSION
public static final String SPARK_VERSION
- See Also:
- Constant Field Values
-
MINIMAL_EXECUTION_TIME
public static final double MINIMAL_EXECUTION_TIME
- See Also:
- Constant Field Values
-
DEFAULT_CLUSTER_LAUNCH_TIME
public static final double DEFAULT_CLUSTER_LAUNCH_TIME
- See Also:
- Constant Field Values
-
-
Method Detail
-
setProvider
public static void setProvider(CloudUtils.CloudProvider provider)
Static provider initialization method.
- Parameters:
provider - target provider
-
GBtoBytes
public static long GBtoBytes(double gb)
-
validateInstanceName
public static boolean validateInstanceName(String instanceName)
-
getInstanceFamily
public static CloudUtils.InstanceFamily getInstanceFamily(String instanceName)
-
getInstanceSize
public static CloudUtils.InstanceSize getInstanceSize(String instanceName)
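The three methods above all operate on an instance type name. AWS EC2 instance type names have the general form family.size (for example m5.xlarge). The following is a minimal sketch of validating such a name and splitting it into its two parts. The pattern used here is a simplified assumption and is not the value of EC2_REGEX, which this page does not show; the class and method names are hypothetical, and the real methods return CloudUtils.InstanceFamily and CloudUtils.InstanceSize values rather than strings.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class InstanceNameSketch {
    // Assumed, simplified pattern: letters, generation digits, optional
    // attribute letters, a dot, then the size (e.g. "c5n.2xlarge").
    private static final Pattern NAME =
        Pattern.compile("^([a-z]+[0-9]+[a-z]*)\\.([a-z0-9]+)$");

    /** Returns true if the name matches the assumed family.size pattern. */
    static boolean isValid(String name) {
        return name != null && NAME.matcher(name).matches();
    }

    /** Returns the family part, e.g. "c5n" for "c5n.2xlarge". */
    static String family(String name) {
        return match(name).group(1);
    }

    /** Returns the size part, e.g. "2xlarge" for "c5n.2xlarge". */
    static String size(String name) {
        return match(name).group(2);
    }

    private static Matcher match(String name) {
        if (name == null) {
            throw new IllegalArgumentException("Instance name is null");
        }
        Matcher m = NAME.matcher(name);
        if (!m.matches()) {
            throw new IllegalArgumentException("Invalid instance name: " + name);
        }
        return m;
    }

    public static void main(String[] args) {
        String name = "c5n.2xlarge";
        System.out.println(isValid(name) + " " + family(name) + " " + size(name));
    }
}
```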
-
calculateClusterPrice
public static double calculateClusterPrice(EnumerationUtils.ConfigurationPoint config, double time, CloudUtils.CloudProvider provider)
This method calculates the cluster price based on the estimated execution time and the cluster configuration. The calculation considers the extra storage price for a Spark cluster because of the HDFS dependency, but the cost of the root storage is not accounted for here.
- Parameters:
config - the cluster configuration for the calculation
time - estimated execution time in seconds
provider - cloud provider for the instances of the cluster
- Returns:
- price for the given time
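Since instance prices are given per hour and the time argument is in seconds, the calculation involves a unit conversion. The following is a minimal sketch of that arithmetic only, assuming one driver node plus a number of identical executor nodes and assuming the hourly prices passed in already include any fees and extra storage costs. The class, method, and parameter names are hypothetical, and the actual calculation in CloudUtils may differ.

```java
final class ClusterPriceSketch {
    /**
     * Price for running one driver and numExecutors executors for
     * timeSeconds, given per-instance prices per hour.
     */
    static double clusterPrice(double driverPricePerHour,
                               double executorPricePerHour,
                               int numExecutors,
                               double timeSeconds) {
        double pricePerHour = driverPricePerHour + numExecutors * executorPricePerHour;
        // Convert seconds to hours before applying the hourly price.
        return pricePerHour * (timeSeconds / 3600.0);
    }

    public static void main(String[] args) {
        // Made-up prices: one driver, four executors, half an hour.
        System.out.println(clusterPrice(0.2, 0.1, 4, 1800));
    }
}
```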
-
loadRegionalPrices
public static double[] loadRegionalPrices(String feeTablePath, String region) throws IOException
Reads a csv file filled with relevant AWS fees/prices per region. Each record in the csv should carry the following information (including header):
- Region - AWS region abbreviation
- Fee Ratio - Ratio of EMR fee per instance to EC2 price per instance per hour
- EBS Price - Price for EBS per month per GB
- Parameters:
feeTablePath - csv file path
region - AWS region abbreviation
- Returns:
- array of doubles with 2 elements: [EMR fee ratio, EBS price]
- Throws:
IOException - in case of invalid file format
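The record layout described above can be illustrated with a small reader. This is a sketch under stated assumptions, not the SystemDS implementation: it assumes comma-separated columns in the listed order, a single header line, and an IOException when the region is missing or a record is malformed. The class and method names are hypothetical, and the region name and values in main are made up.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

final class RegionalPricesSketch {
    /** Returns [EMR fee ratio, EBS price] for the given region. */
    static double[] readRegionalPrices(Reader source, String region) throws IOException {
        try (BufferedReader reader = new BufferedReader(source)) {
            String line = reader.readLine(); // header line, skipped
            if (line == null) {
                throw new IOException("Empty fee table");
            }
            while ((line = reader.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue; // tolerate blank lines
                }
                String[] cols = line.split(",");
                if (cols.length != 3) {
                    throw new IOException("Expected 3 columns but got: " + line);
                }
                if (cols[0].trim().equals(region)) {
                    try {
                        return new double[] {
                            Double.parseDouble(cols[1].trim()),
                            Double.parseDouble(cols[2].trim())
                        };
                    } catch (NumberFormatException e) {
                        throw new IOException("Non-numeric value in: " + line, e);
                    }
                }
            }
        }
        throw new IOException("Region not found: " + region);
    }

    public static void main(String[] args) throws IOException {
        // Made-up table content for illustration only.
        String csv = "Region,Fee Ratio,EBS Price\nxx-test-1,0.25,0.1\n";
        double[] prices = readRegionalPrices(new StringReader(csv), "xx-test-1");
        System.out.println(prices[0] + " " + prices[1]);
    }
}
```

With the real API, the two returned elements map onto the emrFeeRatio and ebsStoragePrice parameters of loadInstanceInfoTable.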
-
loadInstanceInfoTable
public static HashMap<String,CloudInstance> loadInstanceInfoTable(String instanceTablePath, double emrFeeRatio, double ebsStoragePrice) throws IOException
Reads a csv file filled with VM instance characteristics. Each record in the csv should carry the following information (including header):
- API_Name - naming for VM instance used by the provider
- Price - price for instance per hour
- Memory - floating number for the instance memory in GBs
- vCPUs - number of physical threads
- Cores - number of physical cores (not relevant at the moment)
- gFlops - FLOPS capability of the CPU in GFLOPS (Giga)
- memoryBandwidth - memory bandwidth in MB/s
- NVMe - flag if NVMe storage volume(s) are attached
- storageVolumes - number of NVMe or EBS (to be additionally configured) volumes
- sizeVolumes - size of each NVMe or EBS (to be additionally configured) volume
- diskReadBandwidth - disk read bandwidth in MB/s
- diskWriteBandwidth - disk write bandwidth in MB/s
- networkBandwidth - network bandwidth in MB/s
- Parameters:
instanceTablePath - csv file path
emrFeeRatio - EMR fee as fraction of the instance price (depends on the region)
ebsStoragePrice - EBS price per GB per month (depends on the region)
- Returns:
- map with filtered instances
- Throws:
IOException - in case of a problem reading the csv file
-
generateEC2ConfigJson
public static void generateEC2ConfigJson(CloudInstance instance, String filePath)
Generates json file storing the instance type and relevant characteristics for single node executions. The resulting file is to be used only for parsing the attributes and is not suitable for direct options input to AWS CLI.
- Parameters:
instance - EC2 instance object (always set one)
filePath - path for the json file
-
generateEMRInstanceGroupsJson
public static void generateEMRInstanceGroupsJson(EnumerationUtils.ConfigurationPoint clusterConfig, String filePath)
Generates json file with instance groups argument for launching AWS EMR cluster.
- Parameters:
clusterConfig - object representing EMR cluster configurations
filePath - path for the output json file
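For orientation, the instance groups argument of the AWS CLI command aws emr create-cluster accepts a JSON array with one object per instance group, using keys such as InstanceGroupType, InstanceType and InstanceCount. The following sketch builds such an array as a string. It is an illustration under assumptions: the class, method, and parameter names are hypothetical, the instance types in main are arbitrary examples, and this page does not state which keys the SystemDS method actually writes (it may include more, such as storage settings).

```java
final class InstanceGroupsJsonSketch {
    /** Builds a JSON array with one MASTER group and one CORE group. */
    static String instanceGroupsJson(String driverType, String executorType, int numExecutors) {
        return "[\n"
            + group("MASTER", driverType, 1) + ",\n"
            + group("CORE", executorType, numExecutors) + "\n"
            + "]";
    }

    // Formats one instance group object; values are assumed to need no escaping.
    private static String group(String groupType, String instanceType, int count) {
        return "  {\"InstanceGroupType\": \"" + groupType + "\", "
            + "\"InstanceType\": \"" + instanceType + "\", "
            + "\"InstanceCount\": " + count + "}";
    }

    public static void main(String[] args) {
        System.out.println(instanceGroupsJson("m5.xlarge", "m5.2xlarge", 3));
    }
}
```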
-
generateEMRConfigurationsJson
public static void generateEMRConfigurationsJson(EnumerationUtils.ConfigurationPoint clusterConfig, String filePath)
Generates json file with configurations attribute for launching AWS EMR cluster with Spark.
- Parameters:
clusterConfig - object representing EMR cluster configurations
filePath - path for the output json file
-
getEffectiveExecutorResources
public static int[] getEffectiveExecutorResources(long memory, int cores, int numExecutors)
Calculates the effective resource values for a Spark cluster managed by YARN. It considers the resource limits for scheduling containers by YARN and the need to fit an Application Master (AM) container in addition to the executor ones.
- Parameters:
memory - total node memory in bytes
cores - total node available virtual cores
numExecutors - number of available worker nodes
- Returns:
- array of length 5 - [executor mem. in MB, executor cores, num. executors, AM mem. in MB, AM cores]
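Because the result is a positional array, callers may find it convenient to unpack it into named fields. The following is a minimal sketch of such a wrapper, based only on the element order documented above. The class and its members are hypothetical and not part of SystemDS, and the values in main are made up.

```java
final class ExecutorResourcesSketch {
    final int executorMemoryMB;
    final int executorCores;
    final int numExecutors;
    final int amMemoryMB;
    final int amCores;

    private ExecutorResourcesSketch(int[] r) {
        // Order follows the documented return value:
        // [executor mem. in MB, executor cores, num. executors, AM mem. in MB, AM cores]
        executorMemoryMB = r[0];
        executorCores = r[1];
        numExecutors = r[2];
        amMemoryMB = r[3];
        amCores = r[4];
    }

    /** Wraps an array of length 5 as returned by getEffectiveExecutorResources. */
    static ExecutorResourcesSketch fromArray(int[] resources) {
        if (resources == null || resources.length != 5) {
            throw new IllegalArgumentException("Expected an array of length 5");
        }
        return new ExecutorResourcesSketch(resources);
    }

    /** Total memory over all executors, widened to long to avoid int overflow. */
    long totalExecutorMemoryMB() {
        return (long) executorMemoryMB * numExecutors;
    }

    public static void main(String[] args) {
        // Made-up values standing in for a real method result.
        ExecutorResourcesSketch r = fromArray(new int[] {4096, 4, 3, 1024, 1});
        System.out.println(r.totalExecutorMemoryMB() + " MB across " + r.numExecutors + " executors");
    }
}
```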
-
calculateAmMemoryMB
public static int calculateAmMemoryMB(int totalExecutorCores)
-
calculateAmCores
public static int calculateAmCores(int totalExecutorCores)
-
calculateEffectiveDriverMemoryBudget
public static long calculateEffectiveDriverMemoryBudget(long driverMemory, int totalExecutorCores)
-
-