This documentation is for an older version of HVR.
Requirements for AWS
AWS (Amazon Web Services) is Amazon's cloud platform. It provides the following services relevant to HVR:
- EC2 (Elastic Compute Cloud) instances are virtual machines (VMs) in the AWS cloud. These VMs can be either Linux or Windows-based. This is "Infrastructure as a Service" (IaaS). HVR can run on an EC2 instance provided the OS is supported by HVR (Linux, Windows Server). This scenario is identical to running HVR on-premises in a data center.
- Amazon Redshift is Amazon's highly scalable clustered data warehouse service. HVR supports Redshift as a target database, both for initial load/refresh and in Change Data Capture mode. For more information, see Requirements for Redshift.
- Amazon RDS is Amazon's Relational Database Service. HVR supports MariaDB, MySQL, Aurora, Oracle, PostgreSQL, and Microsoft SQL Server running on Amazon RDS. Note that log-based capture is not supported for Microsoft SQL Server on Amazon RDS.
- Amazon EMR (Elastic MapReduce) is Amazon's implementation of Hadoop. It can be accessed by using HVR's generic Hadoop connector. For more information, see Requirements for HDFS.
- Amazon S3 storage buckets can serve as a staging area for loading data into Redshift, as a file location target (optionally with Hive external tables on top), or as a staging area for other databases (Hive ACID, Snowflake).
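To illustrate what staging through S3 means in practice, the following minimal sketch builds a Redshift `COPY` statement that loads staged files from a bucket. It is not HVR's own implementation, and the function name, bucket, prefix, and IAM role ARN are hypothetical placeholders chosen for illustration only.

```python
def build_redshift_copy(table: str, bucket: str, prefix: str, iam_role_arn: str) -> str:
    """Return a Redshift COPY statement that loads CSV files staged in S3.

    All argument values are supplied by the caller; nothing here is
    taken from an actual HVR configuration.
    """
    # Very conservative check so the table name cannot carry extra SQL.
    if not table.replace("_", "").replace(".", "").isalnum():
        raise ValueError(f"unexpected characters in table name: {table!r}")

    # Normalize the prefix so the URI has exactly one slash after the bucket.
    s3_uri = f"s3://{bucket}/{prefix.lstrip('/')}"

    return (
        f"COPY {table}\n"
        f"FROM '{s3_uri}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        f"FORMAT AS CSV;"
    )


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(build_redshift_copy(
        "sales.orders",
        "example-staging-bucket",
        "/hvr/orders/",
        "arn:aws:iam::123456789012:role/example-redshift-load",
    ))
```

The statement would then be executed against the Redshift cluster with any SQL client. Loading from S3 in bulk is generally much faster than row-by-row inserts, which is why a bucket is used as the intermediate staging area.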
Architecture
HVR supports several configuration topologies when working with AWS. The following are the most commonly used:
- A: Connecting to an AWS resource with the HVR hub installed on-premises. To avoid poor performance due to low bandwidth and/or high latency on the network, the HVR Agent should be installed in AWS. An instance of any size is sufficient for this use case, including the smallest type available (t2.micro).
- B.1: Hosting the HVR hub in AWS to pull data from an on-premises source into AWS. For this use case, the hub database can be a separate RDS database of a type that HVR supports as a hub. The HVR Agent may be installed on an AWS EC2 instance and configured to connect to the hub database. In this topology, using the HVR Agent on EC2 is optional, but it may perform better than connecting HVR remotely to RDS over the Internet. When the HVR Agent on EC2 connects to RDS, communication with the HVR hub runs over the HVR protocol, which is fast and less sensitive to network latency.
- B.2: Alternatively, the hub database can be installed on the EC2 VM.
- C: Performing cloud-based real-time integration. HVR can connect exclusively to cloud-based resources, either in AWS or at other cloud providers.
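The agent placement guidance in the topologies above can be summarized as a small lookup. This is only a sketch of the text, not an HVR API: the dictionary, function name, and the "unspecified" value are assumptions introduced for illustration, and topologies B.2 and C return "unspecified" because the list does not state agent guidance for them.

```python
# Topology labels as used in the list above.
KNOWN_TOPOLOGIES = {"A", "B.1", "B.2", "C"}

# Guidance on installing the HVR Agent in AWS, as stated in the text.
# Topologies missing from this mapping have no explicit guidance there.
AGENT_ADVICE = {
    "A": "recommended",  # hub on-premises; agent in AWS avoids slow WAN access
    "B.1": "optional",   # hub in AWS with hub database on RDS
}


def agent_advice(topology: str) -> str:
    """Return the agent guidance for a topology label (hypothetical helper)."""
    if topology not in KNOWN_TOPOLOGIES:
        raise ValueError(f"unknown topology: {topology!r}")
    return AGENT_ADVICE.get(topology, "unspecified")


if __name__ == "__main__":
    for label in sorted(KNOWN_TOPOLOGIES):
        print(label, agent_advice(label))
```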