Class CloudUtils


  • public class CloudUtils
    extends Object
    Class providing static utilities for cloud-related operations. Some of the utilities depend on the cloud provider; currently only AWS is supported, and it is set as the default provider. Use setProvider() to set the target provider once more providers are supported.
    • Method Detail

      • setProvider

        public static void setProvider(CloudUtils.CloudProvider provider)
        Static provider initialization method.
        Parameters:
        provider - target provider
      • GBtoBytes

        public static long GBtoBytes(double gb)
      • validateInstanceName

        public static boolean validateInstanceName(String instanceName)
      • calculateClusterPrice

        public static double calculateClusterPrice(EnumerationUtils.ConfigurationPoint config,
                                                   double time,
                                                   CloudUtils.CloudProvider provider)
        Calculates the cluster price based on the estimated execution time and the cluster configuration. The calculation includes the price of extra storage for Spark clusters because of the HDFS dependency, but the cost of the root storage is not accounted for here.
        Parameters:
        config - the cluster configuration for the calculation
        time - estimated execution time in seconds
        provider - cloud provider for the instances of the cluster
        Returns:
        price for the given time
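
        The time-based pricing described above can be illustrated with a minimal sketch. It is not the CloudUtils implementation: the class ClusterPriceSketch, its method, and the flat parameters are hypothetical, and the hourly prices are assumed to already include any per-instance fees and extra storage costs.

```java
// Illustrative sketch only; not the CloudUtils implementation.
final class ClusterPriceSketch {

    /**
     * Price of running one driver plus numExecutors executors for timeSeconds.
     * Hourly prices are assumed to already include per-instance fees and
     * extra storage costs (a hypothetical simplification).
     */
    static double clusterPrice(double driverPricePerHour,
                               double executorPricePerHour,
                               int numExecutors,
                               double timeSeconds) {
        if (numExecutors < 0 || timeSeconds < 0) {
            throw new IllegalArgumentException(
                "numExecutors and timeSeconds must be non-negative");
        }
        double clusterPricePerHour =
            driverPricePerHour + numExecutors * executorPricePerHour;
        // The documented time unit is seconds; prices are per hour.
        return clusterPricePerHour * (timeSeconds / 3600.0);
    }

    public static void main(String[] args) {
        // Made-up prices: driver at 0.50/h, four executors at 0.25/h, 30 minutes.
        System.out.println(clusterPrice(0.50, 0.25, 4, 1800));
    }
}
```

        The only point the sketch relies on from the documentation is that time is given in seconds while instance prices are quoted per hour.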
      • loadRegionalPrices

        public static double[] loadRegionalPrices(String feeTablePath,
                                                  String region)
                                           throws IOException
        Reads a CSV file containing the relevant AWS fees and prices per region. The file must start with a header row, and each record should carry the following information:
        • Region - AWS region abbreviation
        • Fee Ratio - Ratio of EMR fee per instance to EC2 price per instance per hour
        • EBS Price - Price for EBS per GB per month
        Parameters:
        feeTablePath - csv file path
        region - AWS region abbreviation
        Returns:
        array of two doubles: [EMR fee ratio, EBS price]
        Throws:
        IOException - in case of invalid file format
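
        As a rough illustration of the lookup described above, the following sketch reads a three-column fee table and returns the two values for one region. It is an assumption about how such a reader could look, not the CloudUtils code: the class RegionalPricesSketch and its method are hypothetical, it reads from a Reader rather than a file path, and the sample figures are made up.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Illustrative sketch only; not the CloudUtils implementation.
final class RegionalPricesSketch {

    /** Returns [EMR fee ratio, EBS price] for the given region. */
    static double[] readRegionalPrices(Reader source, String region) throws IOException {
        try (BufferedReader reader = new BufferedReader(source)) {
            // The first line is the header: Region, Fee Ratio, EBS Price.
            if (reader.readLine() == null) {
                throw new IOException("Empty fee table");
            }
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue; // tolerate blank lines
                }
                String[] fields = line.split(",", -1);
                if (fields.length != 3) {
                    throw new IOException("Expected 3 columns in record: " + line);
                }
                if (fields[0].trim().equals(region)) {
                    try {
                        return new double[] {
                            Double.parseDouble(fields[1].trim()),
                            Double.parseDouble(fields[2].trim())
                        };
                    } catch (NumberFormatException e) {
                        throw new IOException("Non-numeric value in record: " + line, e);
                    }
                }
            }
        }
        throw new IOException("Region not found: " + region);
    }

    public static void main(String[] args) throws IOException {
        // Made-up figures, not real AWS prices.
        String csv = "Region,Fee Ratio,EBS Price\n"
                   + "us-east-1,0.25,0.1\n"
                   + "eu-west-1,0.2,0.11\n";
        double[] prices = readRegionalPrices(new StringReader(csv), "eu-west-1");
        System.out.println(prices[0] + " " + prices[1]);
    }
}
```

        Reporting a missing region as an IOException is a choice made for this sketch; the documentation only states that an IOException signals an invalid file format.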
      • loadInstanceInfoTable

        public static HashMap<String,CloudInstance> loadInstanceInfoTable(String instanceTablePath,
                                                                                double emrFeeRatio,
                                                                                double ebsStoragePrice)
                                                                         throws IOException
        Reads a CSV file containing VM instance characteristics. The file must start with a header row, and each record should carry the following information:
        • API_Name - naming for VM instance used by the provider
        • Price - price for instance per hour
        • Memory - floating number for the instance memory in GBs
        • vCPUs - number of physical threads
        • Cores - number of physical cores (not relevant at the moment)
        • gFlops - FLOPS capability of the CPU in GFLOPS (Giga)
        • memoryBandwidth - memory bandwidth in MB/s
        • NVMe - flag if NVMe storage volume(s) are attached
        • storageVolumes - number of NVMe or EBS (to be additionally configured) volumes
        • sizeVolumes - size of each NVMe or EBS (to be additionally configured) volume
        • diskReadBandwidth - disk read bandwidth in MB/s
        • diskWriteBandwidth - disk write bandwidth in MB/s
        • networkBandwidth - network bandwidth in MB/s
        Parameters:
        instanceTablePath - csv file path
        emrFeeRatio - EMR fee as fraction of the instance price (depends on the region)
        ebsStoragePrice - EBS price per GB per month (depends on the region)
        Returns:
        map with filtered instances
        Throws:
        IOException - if a problem occurs while reading the CSV file
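
        The header-driven parsing described above could look roughly like the sketch below. It is not the CloudUtils implementation: Instance is a hypothetical, reduced stand-in for CloudInstance that keeps only four of the listed columns, the data comes from a Reader instead of a file path, any filtering step is omitted, and adding the EMR fee on top of the instance price is an assumption about how emrFeeRatio is applied.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; Instance is a hypothetical stand-in for CloudInstance.
final class InstanceTableSketch {

    static final class Instance {
        final String name;
        final double pricePerHour; // instance price plus EMR fee (assumed additive)
        final double memoryGb;
        final int vCpus;

        Instance(String name, double pricePerHour, double memoryGb, int vCpus) {
            this.name = name;
            this.pricePerHour = pricePerHour;
            this.memoryGb = memoryGb;
            this.vCpus = vCpus;
        }
    }

    static Map<String, Instance> readInstances(Reader source, double emrFeeRatio)
            throws IOException {
        Map<String, Instance> instances = new HashMap<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String header = reader.readLine();
            if (header == null) {
                throw new IOException("Empty instance table");
            }
            // Resolve columns by header name so that column order does not matter.
            List<String> columns = Arrays.asList(header.split(",", -1));
            int nameIdx = columnIndex(columns, "API_Name");
            int priceIdx = columnIndex(columns, "Price");
            int memoryIdx = columnIndex(columns, "Memory");
            int vCpusIdx = columnIndex(columns, "vCPUs");

            String line;
            while ((line = reader.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue;
                }
                String[] fields = line.split(",", -1);
                if (fields.length != columns.size()) {
                    throw new IOException("Malformed record: " + line);
                }
                try {
                    String name = fields[nameIdx].trim();
                    double price = Double.parseDouble(fields[priceIdx].trim());
                    double memoryGb = Double.parseDouble(fields[memoryIdx].trim());
                    int vCpus = Integer.parseInt(fields[vCpusIdx].trim());
                    instances.put(name,
                        new Instance(name, price * (1.0 + emrFeeRatio), memoryGb, vCpus));
                } catch (NumberFormatException e) {
                    throw new IOException("Non-numeric value in record: " + line, e);
                }
            }
        }
        return instances;
    }

    private static int columnIndex(List<String> columns, String name) throws IOException {
        for (int i = 0; i < columns.size(); i++) {
            if (columns.get(i).trim().equals(name)) {
                return i;
            }
        }
        throw new IOException("Missing column: " + name);
    }

    public static void main(String[] args) throws IOException {
        // Made-up record, not a real price.
        String csv = "API_Name,Price,Memory,vCPUs\nm5.xlarge,0.2,16,4\n";
        Instance m5 = readInstances(new StringReader(csv), 0.25).get("m5.xlarge");
        System.out.println(m5.name + " " + m5.memoryGb + " GB, " + m5.vCpus + " vCPUs");
    }
}
```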
      • generateEC2ConfigJson

        public static void generateEC2ConfigJson(CloudInstance instance,
                                                 String filePath)
        Generates a JSON file storing the instance type and relevant characteristics for single-node executions. The resulting file is intended only for parsing the attributes and is not suitable as direct options input to the AWS CLI.
        Parameters:
        instance - EC2 instance object (must always be set)
        filePath - path for the json file
      • generateEMRInstanceGroupsJson

        public static void generateEMRInstanceGroupsJson(EnumerationUtils.ConfigurationPoint clusterConfig,
                                                         String filePath)
        Generates a JSON file with the instance groups argument for launching an AWS EMR cluster.
        Parameters:
        clusterConfig - object representing EMR cluster configurations
        filePath - path for the output json file
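
        For orientation, the sketch below builds an instance groups document in the shape the AWS CLI accepts for the --instance-groups option of aws emr create-cluster (an array of objects with InstanceGroupType, InstanceType and InstanceCount). Which fields CloudUtils actually emits is not stated here, so the class InstanceGroupsJsonSketch, its parameters, and the choice of one MASTER group plus one CORE group are assumptions.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Illustrative sketch only; not the CloudUtils implementation.
final class InstanceGroupsJsonSketch {

    /** Builds a two-group (MASTER + CORE) instance groups JSON array. */
    static String instanceGroupsJson(String driverType, String executorType, int executorCount) {
        if (executorCount < 1) {
            throw new IllegalArgumentException("executorCount must be >= 1");
        }
        return "[\n"
            + group("MASTER", driverType, 1) + ",\n"
            + group("CORE", executorType, executorCount) + "\n"
            + "]\n";
    }

    // Instance type names are assumed to need no JSON escaping (letters, digits, dots).
    private static String group(String groupType, String instanceType, int count) {
        return "  {\"InstanceGroupType\": \"" + groupType + "\", "
            + "\"InstanceType\": \"" + instanceType + "\", "
            + "\"InstanceCount\": " + count + "}";
    }

    /** Writes the JSON to the given path as UTF-8. */
    static void write(String json, String filePath) throws IOException {
        Files.write(Paths.get(filePath), json.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        System.out.print(instanceGroupsJson("m5.xlarge", "m5.2xlarge", 3));
    }
}
```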
      • generateEMRConfigurationsJson

        public static void generateEMRConfigurationsJson(EnumerationUtils.ConfigurationPoint clusterConfig,
                                                         String filePath)
        Generates a JSON file with the configurations attribute for launching an AWS EMR cluster with Spark.
        Parameters:
        clusterConfig - object representing EMR cluster configurations
        filePath - path for the output json file
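
        EMR configurations are expressed as an array of objects with a Classification and a Properties map. The sketch below renders a single spark-defaults classification from a property map. The class EmrConfigurationsJsonSketch is hypothetical, and which classifications and Spark properties CloudUtils writes is an assumption left open here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only; not the CloudUtils implementation.
final class EmrConfigurationsJsonSketch {

    /** Renders one "spark-defaults" classification with the given properties. */
    static String sparkDefaultsJson(Map<String, String> properties) {
        StringBuilder sb = new StringBuilder();
        sb.append("[\n  {\n");
        sb.append("    \"Classification\": \"spark-defaults\",\n");
        sb.append("    \"Properties\": {\n");
        int written = 0;
        for (Map.Entry<String, String> entry : properties.entrySet()) {
            // Keys and values are assumed to need no JSON escaping.
            sb.append("      \"").append(entry.getKey()).append("\": \"")
              .append(entry.getValue()).append("\"");
            written++;
            sb.append(written < properties.size() ? ",\n" : "\n");
        }
        sb.append("    }\n  }\n]\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        // Example values only; insertion order is kept by LinkedHashMap.
        Map<String, String> props = new LinkedHashMap<>();
        props.put("spark.executor.memory", "4096m");
        props.put("spark.executor.cores", "4");
        System.out.print(sparkDefaultsJson(props));
    }
}
```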
      • getEffectiveExecutorResources

        public static int[] getEffectiveExecutorResources(long memory,
                                                          int cores,
                                                          int numExecutors)
        Calculates the effective resource values for a Spark cluster managed by YARN. It considers the resource limits YARN applies when scheduling containers and the need to fit an Application Master (AM) container in addition to the executor containers.
        Parameters:
        memory - total node memory in bytes
        cores - total virtual cores available on the node
        numExecutors - number of available worker nodes
        Returns:
        array of length 5 - [executor mem. in MB, executor cores, num. executors, AM mem. in MB, AM cores]
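
        Because the result is a positional array, callers may find it convenient to give the five slots names. The following sketch wraps the documented layout in a small value class. ExecutorResourcesView is a hypothetical helper that is not part of CloudUtils, and the numbers in the usage example are made up rather than computed by getEffectiveExecutorResources.

```java
// Illustrative sketch only; a hypothetical wrapper around the documented array layout.
final class ExecutorResourcesView {
    final int executorMemoryMb;
    final int executorCores;
    final int numExecutors;
    final int amMemoryMb;
    final int amCores;

    private ExecutorResourcesView(int[] r) {
        this.executorMemoryMb = r[0];
        this.executorCores = r[1];
        this.numExecutors = r[2];
        this.amMemoryMb = r[3];
        this.amCores = r[4];
    }

    /** Expects [executor mem. in MB, executor cores, num. executors, AM mem. in MB, AM cores]. */
    static ExecutorResourcesView fromArray(int[] resources) {
        if (resources == null || resources.length != 5) {
            throw new IllegalArgumentException("Expected an array of length 5");
        }
        return new ExecutorResourcesView(resources);
    }

    /** Memory summed over all executors, in MB (long to avoid int overflow). */
    long totalExecutorMemoryMb() {
        return (long) executorMemoryMb * numExecutors;
    }

    public static void main(String[] args) {
        // Made-up values standing in for a real result array.
        ExecutorResourcesView view = fromArray(new int[] {4096, 4, 3, 1024, 1});
        System.out.println(view.totalExecutorMemoryMb() + " MB across "
            + view.numExecutors + " executors");
    }
}
```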
      • calculateAmMemoryMB

        public static int calculateAmMemoryMB(int totalExecutorCores)
      • calculateAmCores

        public static int calculateAmCores(int totalExecutorCores)
      • calculateEffectiveDriverMemoryBudget

        public static long calculateEffectiveDriverMemoryBudget(long driverMemory,
                                                                int totalExecutorCores)