Using the REST API connector as a data source in Informatica Cloud
From my archive - originally published on 15 October 2015
It can be pretty straightforward to read REST API resources using Informatica Cloud – so long as you know which levers to pull and your data is in a sympathetic format.
Formatting an API payload for Informatica
If you can control the output from the REST API then you should aim to return completely flat, de-normalised payloads. Informatica is a row-based batch processing engine at heart so is easier to work with if fed this form of data. More involved integration jobs tend to involve putting data into a flat, de-normalized structure so it can be mapped onto target data more easily.
If you try and feed in a nested data structure then Informatica may struggle to read past the second level of data. Even if you can model the hierarchy in a data source then Informatica will immediately flatten any data it reads.
It’s also worth considering what transformations you’ll want to perform on a regular basis and including this data in the payload. For example, you can save yourself a lot of unnecessary fiddling with aggregations and joins by returning commonly-used grouped totals in your results.
The size of payload can also be significant when setting up integrations. Informatica will call the underlying service at least twice when you are configuring mappings and tasks. If the method takes a long time to execute then this can become a pretty tedious process, though you can specify a sample response file that is picked up when configuring mappings and tasks.
Note that larger payloads can cause memory problems with the REST connector where jobs fail with the following error:
[ERROR] java.lang.OutOfMemoryError: Java heap space
This indicates that you need to increase the memory allocated to Apache Tomcat which the secure agent runs under. Don’t try and fiddle around with Tomcat configuration files to do this, you can change the settings in the Informatica Cloud UI so they are propagated to the agent. This is documented in the Informatica community pages here.
Setting the connection parameters
The REST connector is designed to provide a generic connection that can be controlled through a set of configuration files that sit on the runtime environment. These configuration files let you control the URL, query string, headers and credentials each time the connection is used in a task or mapping.
The idea is that you have different sets of these configuration files sitting on the server and select them via a data filter in a mapping or task. Setting this up can be a little fiddly and the documentation isn’t very explicit about how this is done. The following walk though shows how to set up a basic data synchronisation that reads from a parameterised REST connection.
There are three main steps involved:
- Setting up a connection that can be used for any attempt to read from the API
- Setting up a connection for the customer destination file
- Create a data synchronisation task that picks up a customer-specific API configuration and maps it to the destination file
Step one: setting up the REST API connection
To set up a REST connection you need to enter some example connection details and create a configuration file template. You can then make different copies of this configuration when using the connection in mappings and tasks. This example just sets the URL templates but the process for setting headers and authentication is much the same.
To create the new connection, select Configure -> Connections from the menu and click on “New”:
- Enter a name and description (you always add descriptions to your objects, right…?)
- Select “ReST (Informatica Cloud Labs)” as the connection type
- Select a run-time environment whose secure agent can access the API – the configuration file will be copied onto this machine
- Enter the base URL for the resource without any query string arguments
- Ensure that “Is BaseUrl Dynamic” is CHECKED – this will let you set parameterize the URL in a configuration file
- Add any query string parameters to the “Url request parameters” field using the following format:
Remember to include the final semi-colon.
- Don’t forget to select the correct media type – Informatica recommend that you use “application/ison” where you have a choice.
- Select the request and authentication type – we will use GET with no authentication for this example.
- In the “Response folder path” field enter the full path on the run-time environment where you want to store payloads fetched from the API (and beware of the data that accumulates here as Informatica never culls it…)
- If you have a sample response file then you should enter the full path in the “Sample Response XML or JSON File” field – this will save you a lot of time when creating mappings and tasks
- In the “URL Input Parameters Config File Name” field enter the full path of the configuration file you want to create
- Ensure that “Create the config CSV file” is set to “YES”.
If you click on the “TEST” button then a configuration file will be created in the location you specified in the “URL Input Parameters Config File Name” field.
To save the connection, set the “Create the config CSV file” field to “NO” and click on the “OK” button.
Note that setting up the connection does not actually contact the remote service. This doesn’t happen until you try and use the connection in a mapping or task.
Step two: Setting up the target file
A Flat File connection creates a mapping to a nominated directory on a runtime environment. This directory is fixed, though you can select different files to use when using the connection in different mappings and tasks.
To create the new connection, select Configure -> Connections from the menu and click on “New“:
- Enter a name and description
- Select “Flat File” as the connection type.
- Select a run-time environment for the connection.
- Enter the full path for the files that this connection will use in the “Directory” field. Note that this path cannot be parameterized in mappings and tasks.
Click on the “Test” button to ensure that you have entered the details correctly and “OK” to save the connection.
Step three: Using a synchronisation task to read from the API
Firstly, create a copy of the configuration file created in step one. We will use this to set the URL to use in the synchronisation task.
To create the new task, select Apps -> Data Synchronization from the menu and click on “New”:
- Enter a name and description
- Select “Insert” as the operation – i.e. the file will be re-created with all the data found in the source.
- Click on “Next” and select the REST connector created in step one as the source (note that the API will actually be called for the first time at this point)
- Ensure “Single” is selected for the “Source type” and select “JsonRoot” as the “Source object“. This is the default name that Informatica gives the payload. If you have a nested payload you can select “multiple” and set up the structure of the payload here. However, Informatica may struggle to read deep hierarchies so it’s best to use flat payloads where possible.
- Note that Informatica will query the API again at this point to fetch a data preview.
- Click on “Next” and select the flat file connector as the target.
- Select the file that you want to export the data to as the “Target object” and click on “Next” to go to the “Data filters” step on the wizard.
The “Data filters” step is where you can specify the new configuration file by creating a new data filter:
- Click on “New…” and the data filter dialog will appear
- Select “JsonRoot” as the object and the “Filter by” drop-down should be pre-populated with all the filter options and their slightly arcane codes
- To specify a new URL configuration file, select “_FLT_URL_Input_Parameters_Config_File_Path“from “Filter by” and enter the full path of the file you want to use in the text box.
- Click on “OK” to save the setting – it should appear in the list of data filters.
- Click on “Next” to move to the data mappings screen.
From here you should be able to set up the field mappings, save the task and run it. When the task runs it will use the URL settings specified in the new configuration file.