AWS Glue is an ETL service designed to simplify preparing and loading data for storage and analytics. With Glue Studio and CData Connect AI, you can build ETL jobs with little or no code. These jobs work with data through the CData Glue Connector. This article walks through connecting to Twitter Ads via CData Connect AI and using the CData Glue Connector to create and run an AWS Glue job that works with live Twitter Ads data.
CData Connect AI provides a cloud-to-cloud interface for Twitter Ads, giving AWS Glue jobs direct access to live Twitter Ads data. You use the AWS Glue Connector and select a table (or write a custom SQL query). Connect AI pushes all supported query operations, including filters and JOINs, directly to Twitter Ads, so server-side processing returns the data for your ETL jobs quickly.
This setup requires a CData Connect AI instance and the CData AWS Glue Connector. To get started, sign up for a free trial of Connect AI and subscribe to the free Glue Connector for Connect AI.
Connectivity to Twitter Ads from AWS Glue is made possible through CData Connect AI. To work with Twitter Ads data from AWS Glue, we start by creating and configuring a Twitter Ads connection.
All tables require authentication. You must use OAuth to authenticate with Twitter. OAuth requires the authenticating user to interact with Twitter using the browser. For more information, refer to the OAuth section in the Help documentation.
Figure: Configuring a connection (Salesforce is shown).
When connecting to Connect AI through the REST API, the OData API, or the Virtual SQL Server, a Personal Access Token (PAT) is used to authenticate the connection to Connect AI. It is best practice to create a separate PAT for each service to maintain granularity of access.
With the connection configured and a PAT generated, you are ready to connect to Twitter Ads data from AWS Glue.
When you create the AWS Glue job, you specify an AWS Identity and Access Management (IAM) role for the job to use. The role must grant access to all resources used by the job, including Amazon S3 for any sources, targets, scripts, temporary directories, and AWS Glue Data Catalog objects. The role must also grant access to the CData Glue Connector for Twitter Ads from the AWS Glue Marketplace.
The following policies should be added to the IAM role for the AWS Glue job, at a minimum:
If you will be accessing data found in Amazon S3, add:
And lastly, if you will be using AWS Secrets Manager to store confidential connection properties (see more below), you will need to add an inline policy similar to the following, granting access to the specific secrets needed for the Glue Job:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "secretsmanager:GetResourcePolicy",
                "secretsmanager:GetSecretValue",
                "secretsmanager:DescribeSecret",
                "secretsmanager:ListSecretVersionIds"
            ],
            "Resource": [
                "arn:aws:secretsmanager:us-west-2:111122223333:secret:aes128-1a2b3c",
                "arn:aws:secretsmanager:us-west-2:111122223333:secret:aes192-4D5e6F",
                "arn:aws:secretsmanager:us-west-2:111122223333:secret:aes256-7g8H9i"
            ]
        }
    ]
}
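If you manage several jobs, it may be convenient to generate this inline policy rather than edit it by hand. The following is a minimal sketch, not part of the CData or AWS tooling: the helper name `build_secret_policy` is hypothetical, and the ARN in the usage line is the sample value from the policy above.

```python
import json

# The four read-only Secrets Manager actions from the sample policy.
SECRET_READ_ACTIONS = [
    "secretsmanager:GetResourcePolicy",
    "secretsmanager:GetSecretValue",
    "secretsmanager:DescribeSecret",
    "secretsmanager:ListSecretVersionIds",
]


def build_secret_policy(secret_arns):
    """Return an inline IAM policy (as a dict) limited to the given secret ARNs."""
    arns = list(secret_arns)
    if not arns:
        # An empty Resource list would produce a policy that grants nothing useful.
        raise ValueError("at least one secret ARN is required")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": list(SECRET_READ_ACTIONS),
                "Resource": arns,
            }
        ],
    }


policy = build_secret_policy(
    ["arn:aws:secretsmanager:us-west-2:111122223333:secret:aes128-1a2b3c"]
)
policy_json = json.dumps(policy, indent=4)  # paste-ready JSON text
```

Listing specific ARNs, rather than a wildcard, keeps the job's role limited to the secrets it actually needs.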
For more information about granting access to AWS Glue Studio and Glue Jobs, see Setting up IAM Permissions for AWS Glue in the AWS Glue documentation.
For more information about granting access to the Amazon S3 buckets, see Identity and access management in the Amazon Simple Storage Service Developer Guide.
For more information on setting up access control for your secrets, see Authentication and Access Control for AWS Secrets Manager in the AWS Secrets Manager documentation and Limiting Access to Specific Secrets in the AWS Secrets Manager User Guide. The credential retrieved from AWS Secrets Manager (a string of key-value pairs) is inserted into the JDBC URL that the CData Glue Connector uses to connect to the data source.
To safely store and use your connection properties, you can save them in AWS Secrets Manager.
Note: You must host your AWS Glue ETL job and secret in the same region. Cross-region secret retrieval is not currently supported.
For more information about creating secrets, see Creating and Managing Secrets with AWS Secrets Manager in the AWS Secrets Manager User Guide.
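As an alternative to the console, a secret can also be created programmatically. The sketch below is an illustration under assumptions: the key names (`Username`, `Password`, `defaultCatalog`) are taken from the placeholders in the sample JDBC URL in this article, the helper names are hypothetical, and `create_connection_secret` requires `boto3` plus AWS credentials with permission to call `secretsmanager:CreateSecret`.

```python
import json


def build_secret_string(username, password, default_catalog):
    """Serialize the connection properties as the key-value JSON stored in the secret."""
    return json.dumps(
        {
            "Username": username,
            "Password": password,
            "defaultCatalog": default_catalog,
        }
    )


def create_connection_secret(name, secret_string, region):
    """Create the secret in the given region (use the same region as the Glue job)."""
    import boto3  # imported lazily so build_secret_string works without boto3 installed

    client = boto3.client("secretsmanager", region_name=region)
    return client.create_secret(Name=name, SecretString=secret_string)


# Placeholder values only; substitute your own Connect AI user, credential, and connection name.
payload = build_secret_string("user@example.com", "EXAMPLE_PAT", "TwitterAds1")
```

Passing the region explicitly makes it harder to create the secret in a different region from the job by accident.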
To work with the CData Glue Connector for Twitter Ads in AWS Glue Studio, you need to subscribe to the Connector from the AWS Marketplace. If you have already subscribed to the CData Glue Connector for Twitter Ads, you can jump to the next section.
To use the CData Glue Connector for Twitter Ads in AWS Glue, you need to activate the subscribed connector in AWS Glue Studio. The activation process creates a connector object and connection in your AWS account.
Under Connection access, select the JDBC URL format and configure the connection. Below you will find sample connection string(s) for the JDBC URL format(s) available for Twitter Ads. You can read more about authenticating with Twitter Ads in the Help documentation for the Connector.
If you opted to store properties in AWS Secrets Manager, leave the placeholder values (e.g. ${Property1}) in place. Otherwise, the values you enter in the AWS Glue Connection interface will appear in the (read-only) JDBC URL below the properties.
jdbc:cdata:Connect:AuthScheme=Basic;User=${Username};Password=${Password};defaultCatalog=${defaultCatalog}
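To make the placeholder mechanism concrete, the sketch below fills the `${...}` placeholders in the sample URL from a dictionary of properties. It is only an illustration of the mapping, written with Python's standard `string.Template`; it does not show how AWS Glue performs the substitution internally, and the property values are made up.

```python
from string import Template

# The sample JDBC URL from this article, with its ${...} placeholders intact.
JDBC_URL_TEMPLATE = Template(
    "jdbc:cdata:Connect:AuthScheme=Basic;User=${Username};"
    "Password=${Password};defaultCatalog=${defaultCatalog}"
)


def render_jdbc_url(properties):
    """Fill every placeholder; raises KeyError if a property is missing."""
    return JDBC_URL_TEMPLATE.substitute(properties)


# Hypothetical values, e.g. the key-value pairs stored in a secret.
example_url = render_jdbc_url(
    {
        "Username": "user@example.com",
        "Password": "EXAMPLE_PAT",
        "defaultCatalog": "TwitterAds1",
    }
)
```

Because `substitute` fails on a missing key, a secret that lacks one of the expected properties is caught immediately instead of yielding a URL with a leftover placeholder.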
If you want to log the functionality from the CData Glue Connector for Twitter Ads you will need to append two properties to the JDBC URL:
Once you have configured a Connection, you can build a Glue Job.
The visual job editor appears. A new Source node, derived from the connection, is displayed on the Job graph. In the node details panel on the right, the Source Properties tab is selected for user input.
You can configure the access options for your connection to the data source in the Source properties tab. Refer to the AWS Glue Studio documentation for more information. Here we provide a simple walk-through.
SELECT EntityId, Entity FROM TwitterAds1.TwitterAds.AdStats WHERE Entity = 'ORGANIC_TWEET'
NOTE: Use the fully qualified name for the source table, where the name of the connection in CData Connect AI is the catalog name and the name of the data source is the schema. For example: TwitterAds1.TwitterAds.AdStats.
Figure: Configuring the Source node.
See "Use the Connection in a Glue job using Glue Studio" for more information about these options.
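The naming rule in the note (connection name as catalog, data source as schema) can be sketched as a small helper that assembles the fully qualified table name and a filter query. The function names are hypothetical, the string value is quoted as a SQL literal, and the identifiers are assumed to be trusted values you chose yourself rather than user input.

```python
def qualified_table(connection_name, schema, table):
    """Build catalog.schema.table, where the catalog is the Connect AI connection name."""
    return ".".join([connection_name, schema, table])


def sql_literal(value):
    """Quote a string as a SQL literal, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_query(columns, table, filter_column, filter_value):
    """Assemble a simple SELECT with one equality filter."""
    column_list = ", ".join(columns)
    return (
        f"SELECT {column_list} FROM {table} "
        f"WHERE {filter_column} = {sql_literal(filter_value)}"
    )


table_name = qualified_table("TwitterAds1", "TwitterAds", "AdStats")
query = build_query(["EntityId", "Entity"], table_name, "Entity", "ORGANIC_TWEET")
```

If you rename the connection in Connect AI, only the first argument to `qualified_table` changes.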
Edit the job by adding and editing the nodes in the job graph. See Editing ETL jobs in AWS Glue Studio for more information.
After you complete editing the job, enter the job properties.
At any point in the job creation, you can click on the Script tab to review the script being created by Glue Studio. If you create a simple job to write Twitter Ads data to an Amazon S3 bucket, your script will look similar to the following:
import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args["JOB_NAME"], args)

# Script generated for node CData AWS Glue Connector for CData Connect
CDataAWSGlueConnectorforCDataConnect_node1 = (
    glueContext.create_dynamic_frame.from_options(
        connection_type="marketplace.jdbc",
        connection_options={
            "tableName": "TwitterAds1.TwitterAds.AdStats",
            "dbTable": "TwitterAds1.TwitterAds.AdStats",
            "connectionName": "cdata-cloud-connector",
        },
        transformation_ctx="CDataAWSGlueConnectorforCDataConnect_node1",
    )
)

job.commit()
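The script above reads from the connector, and a job that writes to Amazon S3 also needs a target step. The sketch below shows one plausible form of that step using `write_dynamic_frame.from_options` from the AWS Glue API. The function name, the bucket path, the JSON output format, and the `transformation_ctx` label are assumptions for illustration; the script Glue Studio generates for your target node may differ.

```python
def write_frame_to_s3(glue_context, frame, s3_path, fmt="json"):
    """Write a DynamicFrame to an S3 path through the given GlueContext."""
    return glue_context.write_dynamic_frame.from_options(
        frame=frame,
        connection_type="s3",
        connection_options={"path": s3_path},  # hypothetical bucket/prefix
        format=fmt,
        transformation_ctx="S3Target_node",  # hypothetical node label
    )
```

In a job script, this would be called before `job.commit()`, for example `write_frame_to_s3(glueContext, CDataAWSGlueConnectorforCDataConnect_node1, "s3://example-bucket/twitterads/")`, where the bucket name is a placeholder.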
Using CData Connect AI and the AWS Glue Connector for Connect AI in AWS Glue Studio, you can create ETL jobs that load Twitter Ads data into an S3 bucket or any other destination.
To get live data access to hundreds of SaaS, Big Data, and NoSQL sources directly from your cloud applications, try CData Connect AI today!