AWS Certification: Redshift Questions

AWS Redshift

Overview
Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud. You can start with just a few hundred gigabytes of data and scale to a petabyte or more. This enables you to use your data to acquire new insights for your business and customers.
1. A company currently hosts a Redshift cluster in AWS. For security reasons, it should be ensured that all traffic from and to the Redshift cluster does not go through the Internet. Which of the following features can be used to fulfill this requirement in an efficient manner?

A. Enable Amazon Redshift Enhanced VPC Routing.

B. Create a NAT Gateway to route the traffic.

C. Create a NAT Instance to route the traffic.

D. Create a VPN Connection to ensure traffic does not flow through the Internet.

Answer

A. Enable Amazon Redshift Enhanced VPC Routing.

AWS Documentation mentions the following: When you use Amazon Redshift Enhanced VPC Routing, Amazon Redshift forces all COPY and UNLOAD traffic between your cluster and your data repositories through your Amazon VPC. If Enhanced VPC Routing is not enabled, Amazon Redshift routes traffic through the Internet, including traffic to other services within the AWS network. For more information on Redshift Enhanced Routing, please visit the following URL: https://docs.aws.amazon.com/redshift/latest/mgmt/enhanced-vpc-routing.html
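As a minimal sketch (not part of the quoted documentation), Enhanced VPC Routing can be switched on for an existing cluster with the boto3 Redshift client's modify_cluster call. The cluster identifier "example-cluster" is a hypothetical placeholder, and boto3 is imported only inside the function that actually sends the request.

```python
def enhanced_vpc_routing_request(cluster_id: str) -> dict:
    """Build the keyword arguments for the Redshift modify_cluster call."""
    return {"ClusterIdentifier": cluster_id, "EnhancedVpcRouting": True}


def enable_enhanced_vpc_routing(cluster_id: str) -> dict:
    """Send the request; this needs boto3 installed and AWS credentials."""
    import boto3  # imported lazily so the builder above runs without it

    client = boto3.client("redshift")
    return client.modify_cluster(**enhanced_vpc_routing_request(cluster_id))


# "example-cluster" is a hypothetical identifier used only for illustration.
params = enhanced_vpc_routing_request("example-cluster")
print(params)
```

Keeping the request builder separate from the network call makes the setting easy to inspect before anything is changed on a live cluster.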


2. A company currently uses Redshift in AWS. The Redshift cluster is required to be used in a cost-effective manner. As an architect, which of the following would you consider to ensure cost-effectiveness?

A. Use Spot Instances for the underlying nodes in the cluster.

B. Ensure that unnecessary manual snapshots of the cluster are deleted.

C. Ensure VPC Enhanced Routing is enabled.

D. Ensure that CloudWatch metrics are disabled.

Answer

B. Ensure that unnecessary manual snapshots of the cluster are deleted.

AWS Documentation mentions the following: Amazon Redshift provides free storage for snapshots that is equal to the storage capacity of your cluster until you delete the cluster. After you reach the free snapshot storage limit, you are charged for any additional storage at the normal rate. Because of this, you should evaluate how many days you need to keep automated snapshots and configure their retention period accordingly, and delete any manual snapshots that you no longer need. For more information on working with Redshift Snapshots, please visit the following URL: https://docs.aws.amazon.com/redshift/latest/mgmt/working-with-snapshots.html

Note: Redshift pricing is based on the following elements:

Compute node hours

Backup storage

Data transfer: There is no data transfer charge for data transferred to or from Amazon Redshift and Amazon S3 within the same AWS Region. For all other data transfers into and out of Amazon Redshift, you will be billed at standard AWS data transfer rates.

Data scanned

There is no additional charge for using Enhanced VPC Routing. You might incur additional data transfer charges for certain operations, such as UNLOAD to Amazon S3 in a different region or COPY from Amazon EMR or SSH with public IP addresses. Data transfer to a different region incurs a cost whether or not Enhanced VPC Routing is enabled.

For storage, increasing your backup retention period or taking additional snapshots increases the backup storage consumed by your data warehouse. There is no additional charge for backup storage up to 100% of your provisioned storage for an active data warehouse cluster. Any storage exceeding this limit incurs a cost.
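The free-storage rule for snapshots can be worked through as simple arithmetic. This is an illustrative sketch only: the storage sizes and the price per GB-month below are hypothetical numbers, not AWS prices.

```python
def billable_backup_gb(provisioned_gb: float, snapshot_gb: float) -> float:
    """Snapshot storage beyond 100% of provisioned storage is billable."""
    return max(0.0, snapshot_gb - provisioned_gb)


def monthly_backup_cost(provisioned_gb: float, snapshot_gb: float,
                        price_per_gb_month: float) -> float:
    """Monthly charge for the billable part of snapshot storage."""
    return billable_backup_gb(provisioned_gb, snapshot_gb) * price_per_gb_month


# Hypothetical cluster: 2,000 GB provisioned, 2,600 GB held in snapshots.
excess_gb = billable_backup_gb(2000, 2600)
# Hypothetical price of 0.02 per GB-month, chosen only for the example.
cost = monthly_backup_cost(2000, 2600, 0.02)
print(excess_gb, round(cost, 2))
```

Deleting manual snapshots until snapshot storage drops to the provisioned size brings the billable amount back to zero, which is why option B reduces cost.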


3. A company has a Redshift Cluster defined in AWS. The IT Operations team have ensured that both automated and manual snapshots are in place. Since the cluster is going to be run for a long duration of a couple of years, Reserved Instances have been purchased. There has been a recent concern on the cost being incurred by the cluster. Which of the following steps can be carried out to minimize the costs being incurred by the cluster?

A. Delete the manual snapshots.

B. Set the retention period of the automated snapshots to 35 days.

C. Choose to use Spot Instances instead of Reserved Instances.

D. Choose to use Instance store volumes to store the cluster data.

Answer

A. Delete the manual snapshots.

AWS Documentation mentions the following: Regardless of whether you enable automated snapshots, you can take a manual snapshot whenever you want. Amazon Redshift will never automatically delete a manual snapshot. Manual snapshots are retained even after you delete your cluster. Because manual snapshots accrue storage charges, it’s important that you manually delete them if you no longer need them. Automated snapshots are deleted automatically when their retention period ends, which can be set from 1 day (minimum) to 35 days (maximum). Manual snapshots are therefore the ones that need attention: unlike automated snapshots, Amazon Redshift never deletes them automatically. For more information on working with Snapshots, please visit the following URL: https://docs.aws.amazon.com/redshift/latest/mgmt/working-with-snapshots.html
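A cleanup of old manual snapshots might be sketched as follows. The records mimic the SnapshotIdentifier, SnapshotType and SnapshotCreateTime fields returned by the boto3 describe_cluster_snapshots call; the sample data, the 90-day threshold and the function name are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone


def stale_manual_snapshots(snapshots, max_age_days, now):
    """Return identifiers of manual snapshots older than max_age_days."""
    cutoff = now - timedelta(days=max_age_days)
    return [
        s["SnapshotIdentifier"]
        for s in snapshots
        if s["SnapshotType"] == "manual" and s["SnapshotCreateTime"] < cutoff
    ]


# Hypothetical snapshot records, shaped like describe_cluster_snapshots output.
now = datetime(2024, 1, 31, tzinfo=timezone.utc)
sample = [
    {"SnapshotIdentifier": "manual-old", "SnapshotType": "manual",
     "SnapshotCreateTime": datetime(2023, 6, 1, tzinfo=timezone.utc)},
    {"SnapshotIdentifier": "manual-new", "SnapshotType": "manual",
     "SnapshotCreateTime": datetime(2024, 1, 20, tzinfo=timezone.utc)},
    {"SnapshotIdentifier": "auto-old", "SnapshotType": "automated",
     "SnapshotCreateTime": datetime(2023, 6, 1, tzinfo=timezone.utc)},
]
stale = stale_manual_snapshots(sample, 90, now)
print(stale)
```

Each returned identifier could then be passed to the client's delete_cluster_snapshot call after a human review; automated snapshots are left alone because their retention period already removes them.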


4. A company has a Redshift cluster for petabyte-scale data warehousing. The data within the cluster is easily reproducible from additional data stored on Amazon S3. The company wants to reduce the overall total cost of running this Redshift cluster. Which scenario would best meet the needs of the running cluster, while still reducing the total cost of ownership of the cluster? Choose the correct answer from the options below.

A. Instead of implementing automatic daily backups, write a CLI script that creates manual snapshots every few days. Copy the manual snapshot to a secondary AWS region for disaster recovery situations.

B. Enable automated snapshots but set the retention period to a lower number to reduce storage costs.

C. Implement daily backups, but do not enable multi-region copy to save data transfer costs.

D. Disable automated and manual snapshots on the cluster.

Answer

D. Disable automated and manual snapshots on the cluster.

Snapshots are point-in-time backups of a cluster. There are two types of snapshots: automated and manual. Amazon Redshift stores these snapshots internally in Amazon S3 by using an encrypted Secure Sockets Layer (SSL) connection. If you need to restore from a snapshot, Amazon Redshift creates a new cluster and imports data from the snapshot that you specify. Since the question already mentions that the cluster is easily reproducible from additional data stored on Amazon S3, you do not need to maintain snapshots. For more information on Redshift Snapshots, please visit the below URL: http://docs.aws.amazon.com/redshift/latest/mgmt/working-with-snapshots.html


5. A company has a requirement to store 100TB of data to AWS. This data will be exported using AWS Snowball and needs to then reside in a database layer. The database should have the facility to be queried from a business intelligence application. Each item is roughly 500KB in size. Which of the following is an ideal storage mechanism for the underlying data layer?

A. AWS DynamoDB

B. AWS Aurora

C. AWS RDS

D. AWS Redshift

Answer

D. AWS Redshift

AWS Documentation mentions the following on AWS Redshift:

Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud. You can start with just a few hundred gigabytes of data and scale to a petabyte or more. This enables you to use your data to acquire new insights for your business and customers.

The first step to create a data warehouse is to launch a set of nodes, called an Amazon Redshift cluster. After you provision your cluster, you can upload your data set and then perform data analysis queries. Regardless of the size of the data set, Amazon Redshift offers fast query performance using the same SQL-based tools and business intelligence applications that you use today.

For more information on AWS Redshift, please refer to the URL below.

https://docs.aws.amazon.com/redshift/latest/mgmt/welcome.html

Option A is incorrect because the maximum item size in DynamoDB is 400KB.

Option B is incorrect because Aurora supports 64TB of data.

Option C is incorrect because we can create MySQL, MariaDB, SQL Server, PostgreSQL, and Oracle RDS DB instances with up to 16 TiB of storage.
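The elimination of options A to C can be checked with quick arithmetic. This sketch uses only the limits quoted in the explanations above (which may have changed since they were written) and assumes decimal units for TB and KB.

```python
KB = 1000
TB = 1000 ** 4
TIB = 1024 ** 4

total_bytes = 100 * TB   # data set size from the question
item_bytes = 500 * KB    # approximate size of each item

# Number of items implied by the question.
item_count = total_bytes // item_bytes

# Limits as quoted in the explanations above.
DYNAMODB_MAX_ITEM_BYTES = 400 * KB
AURORA_MAX_BYTES = 64 * TB
RDS_MAX_BYTES = 16 * TIB

fits = {
    "DynamoDB": item_bytes <= DYNAMODB_MAX_ITEM_BYTES,  # per-item limit
    "Aurora": total_bytes <= AURORA_MAX_BYTES,          # total storage limit
    "RDS": total_bytes <= RDS_MAX_BYTES,                # total storage limit
}
print(item_count, fits)
```

None of the three alternatives passes its check under these figures, which leaves Redshift as the remaining option.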


6. A company is generating large datasets with millions of rows to be summarized column-wise. Existing business intelligence tools will be used to build daily reports from these datasets.

Which storage service meets these requirements?


A. Amazon Redshift

B. Amazon RDS

C. ElastiCache

D. DynamoDB

Answer

A. Amazon Redshift

AWS Documentation mentions the following:

Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud. You can start with just a few hundred gigabytes of data and scale to a petabyte or more. This enables you to use your data to acquire new insights for your business and customers.

For more information on AWS Redshift, please visit the following URL:

https://docs.aws.amazon.com/redshift/latest/mgmt/welcome.html

Columnar storage for database tables is an important factor in optimizing analytic query performance because it drastically reduces the overall disk I/O requirements and reduces the amount of data you need to load from disk.

Amazon Redshift uses a block size of 1 MB, which is more efficient and further reduces the number of I/O requests needed to perform any database loading or other operations that are part of query execution.

More information on how redshift manages the columnar storage is available here:

https://docs.aws.amazon.com/redshift/latest/dg/c_columnar_storage_disk_mem_mgmnt.html
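To see why columnar storage cuts disk I/O for column-wise summaries, here is a deliberately simplified model that counts 1 MB blocks read when summing one column. It ignores compression, headers and zone maps, and the table dimensions are hypothetical.

```python
BLOCK_BYTES = 1024 * 1024  # 1 MB block size, as mentioned above


def blocks_needed(n_bytes: int) -> int:
    """Number of whole blocks required to hold n_bytes (ceiling division)."""
    return -(-n_bytes // BLOCK_BYTES)


def row_store_blocks(rows: int, columns: int, bytes_per_value: int) -> int:
    """A row store must read every column of every row."""
    return blocks_needed(rows * columns * bytes_per_value)


def column_store_blocks(rows: int, columns_read: int, bytes_per_value: int) -> int:
    """A column store reads only the blocks of the columns it needs."""
    return columns_read * blocks_needed(rows * bytes_per_value)


# Hypothetical table: 1,000,000 rows, 50 columns, 8 bytes per value.
rows, columns, width = 1_000_000, 50, 8
row_blocks = row_store_blocks(rows, columns, width)
col_blocks = column_store_blocks(rows, 1, width)
print(row_blocks, col_blocks)  # 382 8
```

In this toy model the single-column summary touches 8 blocks instead of 382, which is the effect the documentation describes for analytic queries.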


7. A company is planning on using the AWS Redshift service. The Redshift service and data on it would be used continuously for the next 3 years as per the current business plan. Which of the following would be the most cost-effective solution in this scenario?

A. Consider using On-demand instances for the Redshift Cluster.

B. Enable Automated backup.

C. Consider using Reserved Instances for the Redshift Cluster.

D. Consider not using a cluster for the Redshift nodes.

Answer

C. Consider using Reserved Instances for the Redshift Cluster.

AWS Documentation mentions the following: If you intend to keep your Amazon Redshift cluster running continuously for a prolonged period, you should consider purchasing reserved node offerings. These offerings provide significant savings over on-demand pricing, but they require you to reserve compute nodes and commit to paying for those nodes for either a one-year or three-year duration. For more information on Reserved Nodes in Redshift, please visit the following URL: https://docs.aws.amazon.com/redshift/latest/mgmt/purchase-reserved-node-instance.html
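The trade-off between on-demand and reserved nodes over three years can be sketched with a small calculation. All rates below are hypothetical values chosen for easy arithmetic, not AWS prices.

```python
HOURS_PER_YEAR = 8760


def on_demand_cost(hourly_rate: float, nodes: int, years: int) -> float:
    """Total cost when every node-hour is billed at the on-demand rate."""
    return hourly_rate * nodes * HOURS_PER_YEAR * years


def reserved_cost(upfront_per_node: float, hourly_rate: float,
                  nodes: int, years: int) -> float:
    """Total cost with an upfront fee plus a discounted hourly rate."""
    return nodes * (upfront_per_node + hourly_rate * HOURS_PER_YEAR * years)


# Hypothetical: 4 nodes for 3 years; 1.00/hour on demand versus
# 5,000 upfront per node plus 0.25/hour when reserved.
od = on_demand_cost(1.0, 4, 3)
rs = reserved_cost(5000.0, 0.25, 4, 3)
print(od, rs, od - rs)
```

The saving only materializes if the cluster really runs for the whole term, which matches the "used continuously for the next 3 years" condition in the question.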


8. A company is using a Redshift cluster to store their data warehouse. There is a requirement from the Internal IT Security team to encrypt data for the Redshift database. How can this be achieved?

A. Encrypt the EBS volumes of the underlying EC2 Instances.

B. Use AWS KMS Customer Default master key.

C. Use SSL/TLS for encrypting the data.

D. Use S3 Encryption.

Answer

B. Use AWS KMS Customer Default master key.

AWS documentation mentions the following:

Amazon Redshift uses a hierarchy of encryption keys to encrypt the database. You can use either AWS Key Management Service (AWS KMS) or a hardware security module (HSM) to manage the top-level encryption keys in this hierarchy. The process that Amazon Redshift uses for encryption differs depending on how you manage keys.

For more information on Redshift encryption, please visit the following URL:

https://docs.aws.amazon.com/redshift/latest/mgmt/working-with-db-encryption.html
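As a hedged sketch, the encryption-related part of a boto3 create_cluster request could be assembled like this. Encrypted and KmsKeyId are parameters of that call; the helper name and the key alias shown are hypothetical, and the remaining required cluster settings are left out on purpose.

```python
from typing import Optional


def encryption_settings(kms_key_id: Optional[str] = None) -> dict:
    """Encryption-related keyword arguments for a create_cluster request.

    When kms_key_id is omitted, only Encrypted=True is set and the
    service is left to pick its default KMS key.
    """
    settings = {"Encrypted": True}
    if kms_key_id is not None:
        settings["KmsKeyId"] = kms_key_id
    return settings


default_key = encryption_settings()
# "alias/example-redshift-key" is a hypothetical key alias.
custom_key = encryption_settings("alias/example-redshift-key")
print(default_key, custom_key)
```

These settings would be merged with the other create_cluster arguments (identifier, node type, credentials) before the request is sent.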


9. A company needs a columnar database storage service suitable for performing complex analytic queries against petabytes of structured data. Which of the following options can meet this requirement?

A. Amazon Redshift

B. Amazon RDS

C. ElastiCache

D. DynamoDB

Answer

A. Amazon Redshift

AWS Documentation mentions the following: Amazon Redshift is a column-oriented, fully managed, petabyte-scale data warehouse that makes it simple and cost-effective to analyze all your data using your existing business intelligence tools. Amazon Redshift achieves efficient storage and optimum query performance through a combination of massively parallel processing, columnar data storage, and very efficient, targeted data compression encoding schemes. For more information on columnar database in AWS, please refer to the below URL: https://aws.amazon.com/nosql/columnar/


10. A Redshift cluster currently contains 60TB of data. There is a requirement that a disaster recovery site is put in place in a region located 600km away. Which of the following solutions would help ensure that this requirement is fulfilled?

A. Take a copy of the underlying EBS volumes to S3, and then do Cross-Region Replication.

B. Enable Cross-Region snapshots for the Redshift Cluster.

C. Create a CloudFormation template to restore the Cluster in another region.

D. Enable Cross Availability Zone snapshots for the Redshift Cluster.

Answer

B. Enable Cross-Region snapshots for the Redshift Cluster.

For more information on managing Redshift snapshots, please visit the following URL: https://docs.aws.amazon.com/redshift/latest/mgmt/managing-snapshots-console.html
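A minimal sketch of turning on cross-region snapshot copy with the boto3 Redshift client's enable_snapshot_copy call follows. The cluster identifier, destination region and retention period are hypothetical values; the request itself is only shown as a comment so that the example runs without AWS access.

```python
def snapshot_copy_request(cluster_id: str, destination_region: str,
                          retention_days: int) -> dict:
    """Build keyword arguments for the Redshift enable_snapshot_copy call."""
    if retention_days < 1:
        raise ValueError("retention_days must be at least 1")
    return {
        "ClusterIdentifier": cluster_id,
        "DestinationRegion": destination_region,
        "RetentionPeriod": retention_days,
    }


# Hypothetical values for illustration only.
params = snapshot_copy_request("example-cluster", "us-west-2", 7)
print(params)

# With boto3 and credentials available, the request would be sent as:
# boto3.client("redshift").enable_snapshot_copy(**params)
```

Once enabled, snapshots are copied to the destination region, where the cluster can be restored if the source region becomes unavailable.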


11. A retailer exports data daily from its transactional databases into an S3 bucket in the Sydney region. The retailer’s Data Warehousing team wants to import this data into an existing Amazon Redshift cluster in their VPC at Sydney. Corporate security policy mandates that data can only be transported within a VPC.

What combination of the following steps will satisfy the security policy? Choose 2 answers from the options given below.


A. Enable Amazon Redshift Enhanced VPC Routing.

B. Create a Cluster Security Group to allow the Amazon Redshift cluster to access Amazon S3.

C. Create a NAT gateway in a public subnet to allow the Amazon Redshift cluster to access Amazon S3.

D. Create and configure an Amazon S3 VPC endpoint.

Answer

A. & D.

Amazon Redshift Enhanced VPC Routing forces COPY and UNLOAD traffic between the cluster and its data repositories through the VPC.

Redshift will not use the S3 VPC endpoint unless Enhanced VPC Routing is enabled, so neither option satisfies the policy on its own; both must be selected.

The NAT gateway in option C likewise cannot be reached by Redshift without Enhanced VPC Routing, and it would send the traffic outside the VPC.

https://aws.amazon.com/about-aws/whats-new/2016/09/amazon-redshift-now-supports-enhanced-vpc-routing/
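The S3 VPC endpoint from option D could be created with the boto3 EC2 client's create_vpc_endpoint call. The sketch below only builds the request; the VPC and route table identifiers are hypothetical, and ap-southeast-2 is used because it is the Sydney region.

```python
def s3_gateway_endpoint_request(vpc_id: str, region: str,
                                route_table_ids: list) -> dict:
    """Build keyword arguments for the EC2 create_vpc_endpoint call."""
    return {
        "VpcEndpointType": "Gateway",
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.s3",
        "RouteTableIds": list(route_table_ids),
    }


# Hypothetical resource identifiers for illustration only.
params = s3_gateway_endpoint_request(
    "vpc-0123456789abcdef0", "ap-southeast-2", ["rtb-0123456789abcdef0"]
)
print(params["ServiceName"])  # com.amazonaws.ap-southeast-2.s3

# With boto3 and credentials available, the request would be sent as:
# boto3.client("ec2", region_name="ap-southeast-2").create_vpc_endpoint(**params)
```

The endpoint only helps once Enhanced VPC Routing is enabled on the cluster, which is why the answer pairs options A and D.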


12. You have set up a Redshift cluster in AWS and are trying to access it, but are unable to do so. What should be done so that you can access the Redshift Cluster?

A. Ensure the Cluster is created in the right Availability Zone.

B. Ensure the Cluster is created in the right region.

C. Change the security groups for the cluster.

D. Change the encryption key associated with the cluster.

Answer

C. Change the security groups for the cluster.

AWS Documentation mentions the following:

When you provision an Amazon Redshift cluster, it is locked down by default so nobody has access to it. To grant other users inbound access to an Amazon Redshift cluster, you associate the cluster with a security group.

For more information on Redshift Security Groups, please refer to the below URL:

https://docs.aws.amazon.com/redshift/latest/mgmt/working-with-security-groups.html
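Opening inbound access might look like the following sketch, which assumes the cluster runs in a VPC, uses a VPC security group, and listens on the default Redshift port 5439. The group ID and CIDR range are hypothetical, and the request is built for the boto3 EC2 client's authorize_security_group_ingress call.

```python
REDSHIFT_DEFAULT_PORT = 5439


def redshift_ingress_request(group_id: str, cidr: str,
                             port: int = REDSHIFT_DEFAULT_PORT) -> dict:
    """Build keyword arguments for authorize_security_group_ingress."""
    return {
        "GroupId": group_id,
        "IpPermissions": [
            {
                "IpProtocol": "tcp",
                "FromPort": port,
                "ToPort": port,
                "IpRanges": [{"CidrIp": cidr}],
            }
        ],
    }


# Hypothetical group ID and a documentation-only CIDR range.
params = redshift_ingress_request("sg-0123456789abcdef0", "203.0.113.0/24")
print(params)

# With boto3 and credentials available, the request would be sent as:
# boto3.client("ec2").authorize_security_group_ingress(**params)
```

Restricting the CIDR range to known client addresses keeps the cluster locked down for everyone else.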


13. Your company currently has an entire data warehouse of assets that needs to be migrated to the AWS Cloud. Which of the following services should this be migrated to?

A. AWS DynamoDB

B. AWS S3

C. AWS RDS

D. AWS Redshift

Answer

D. AWS Redshift

AWS Documentation mentions the following: Redshift is a fully managed, petabyte-scale data warehouse service in the cloud. You can start with just a few hundred gigabytes of data and scale to a petabyte or more. This enables you to use your data to acquire new insights for your business and customers. For more information on AWS Redshift, please visit the following URL: https://docs.aws.amazon.com/redshift/latest/mgmt/welcome.html