Scripting

An Agenty script is a C# program written for a special run-time environment. It automates the modification of an agent's output result, or of the input data entered by the user, whether manually or by selecting a URL list or a source_agent. Scripts are often interpreted (rather than compiled). There are two types of scripts that can be added to each agent:

  1. Pre-Processing
  2. Post-Processing

Debugging and Testing

To let developers test and debug their pre-processing and post-processing scripts locally, there is an open-source replica of the Agenty scripting library in C#. The code is available on GitHub, and the package can be installed from NuGet so you can test your scripts before using them in Agenty to modify an agent's input or result on the fly:

PM> Install-Package Agenty.scripting

Pre-Processing

Script to modify input on the fly: this script runs before the agent job starts and should be used to manipulate the agent input:

//Get the agent default input as DataTable

var table = Agenty.Cloud.GetAgentInput("YOUR AGENT ID HERE");


// Write your code here to modify the input as DataTable
// Programming language : C#


table.SetAgentInput(); //Set the modified input

The pre-processing script lets you modify the agent input automatically. For example, suppose we want to extract data from an airline website that requires a check-in date (the current date) and a check-out date (the current date plus 7 days) in the query string. Instead of using the static input URL http://example.com/flight-search, we can write a pre-processing script that generates the URL dynamically on every manual or scheduled job, in a format like: http://example.com/flight-search?check-in=01/05/2018&check-out=01/12/2018
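A minimal sketch of the date logic behind this example. The base URL and the check-in/check-out parameter names come from the example; the MM/dd/yyyy date format and the BuildFlightSearchUrl helper are assumptions for illustration. It uses a fixed date so the result is predictable; in a pre-processing script you would pass DateTime.UtcNow.Date and write the URL into the input DataTable before calling SetAgentInput().

```csharp
using System;
using System.Globalization;

// Hypothetical helper (not an Agenty API): builds the search URL for a stay
// that starts on checkIn and ends `nights` days later.
// The MM/dd/yyyy format is an assumption; use whatever the target site expects.
string BuildFlightSearchUrl(string baseUrl, DateTime checkIn, int nights)
{
    DateTime checkOut = checkIn.AddDays(nights);
    // InvariantCulture keeps "/" as the date separator regardless of server locale.
    string inDate = checkIn.ToString("MM/dd/yyyy", CultureInfo.InvariantCulture);
    string outDate = checkOut.ToString("MM/dd/yyyy", CultureInfo.InvariantCulture);
    return $"{baseUrl}?check-in={inDate}&check-out={outDate}";
}

string url = BuildFlightSearchUrl("http://example.com/flight-search", new DateTime(2018, 1, 5), 7);
Console.WriteLine(url);
// prints http://example.com/flight-search?check-in=01/05/2018&check-out=01/12/2018
```

Because DateTime.AddDays handles month and year boundaries, a check-in on 12/28 correctly yields a check-out in the following January.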

Modify URLs

In this example, I use the pre-processing script to append the current date to the manually entered input URLs before the scraping agent runs:

// Get the current agent input as DataTable
var table = Agenty.Cloud.GetAgentInput("<Your Agent Id Here>");
// The input URLs are assumed to be in the existing "Field1" column, so no
// column is added here (adding an existing column would throw).
// Each URL is also assumed to already contain a query string, hence "&date=".
foreach (DataRow row in table.Rows)
{
	// Append today's date (UTC) as a query parameter, e.g.
	// http://www.example.com/flight-search/?location=NYC&date=2019-04-21
	// http://www.example.com/flight-search/?location=GB&date=2019-04-21
	row["Field1"] = $"{row["Field1"].ToString()}&date={DateTime.UtcNow.ToString("yyyy-MM-dd")}";
}
// Set the table back to agent input
table.SetAgentInput();
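The script above always adds "&date=", so it assumes every input URL already has a query string. A slightly more defensive sketch follows. AppendQueryParameter is a hypothetical helper (not part of the Agenty library), the input URLs are assumed to be in the Field1 column, and an in-memory DataTable stands in for the table that Agenty.Cloud.GetAgentInput returns.

```csharp
using System;
using System.Data;
using System.Globalization;

// Hypothetical helper (not an Agenty API): appends name=value to a URL,
// choosing "?" or "&" depending on whether a query string already exists.
string AppendQueryParameter(string url, string name, string value)
{
    string separator;
    if (url.EndsWith("?") || url.EndsWith("&"))
        separator = "";      // the URL already ends with a separator
    else if (url.Contains("?"))
        separator = "&";     // extend the existing query string
    else
        separator = "?";     // start a new query string
    return $"{url}{separator}{Uri.EscapeDataString(name)}={Uri.EscapeDataString(value)}";
}

// Stand-in for Agenty.Cloud.GetAgentInput(...); URLs assumed to be in "Field1".
var table = new DataTable();
table.Columns.Add("Field1");
table.Rows.Add("http://www.example.com/flight-search/?location=NYC");
table.Rows.Add("http://www.example.com/flight-search/");

// A fixed date keeps the example reproducible; a real script would use DateTime.UtcNow.
string date = new DateTime(2019, 4, 21).ToString("yyyy-MM-dd", CultureInfo.InvariantCulture);
foreach (DataRow row in table.Rows)
{
    row["Field1"] = AppendQueryParameter(row["Field1"].ToString(), "date", date);
}
```

Escaping the name and value with Uri.EscapeDataString matters once the appended value can contain spaces or reserved characters such as "&".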

Post-Processing

Script to modify output on the fly: this script runs after the agent job has completed and before any trigger runs, and should be used to manipulate the output table:

//Get the agent default output as DataTable

var table = Agenty.Cloud.GetAgentResult("YOUR AGENT ID HERE");


// Write your code here to modify the result from DataTable
// Programming language : C#


table.SetAgentResult(); //Set the modified output

The post-processing script lets you modify the output automatically. For example, suppose the agent result has an Amount column whose values contain the word USD, and we want to replace USD with the $ symbol. A simple post-processing script can do that replacement.
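A minimal sketch of that replacement, assuming the values are text in a column named Amount. An in-memory DataTable with made-up values stands in for the result of Agenty.Cloud.GetAgentResult, so the logic can be run and checked outside Agenty.

```csharp
using System;
using System.Data;

// Stand-in for the table returned by Agenty.Cloud.GetAgentResult(...);
// the "Amount" column name comes from the example, the values are made up.
var table = new DataTable();
table.Columns.Add("Amount");
table.Rows.Add("USD 25.00");
table.Rows.Add("USD1,200");
table.Rows.Add(DBNull.Value);

foreach (DataRow row in table.Rows)
{
    // Leave empty cells untouched.
    if (row.IsNull("Amount")) continue;

    // Swap the currency code for the symbol, then drop the space that
    // "USD 25.00" would otherwise leave after the "$".
    string amount = row["Amount"].ToString();
    row["Amount"] = amount.Replace("USD", "$").Replace("$ ", "$");
}
// In Agenty you would now call table.SetAgentResult();
```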

Add a logical field

In this example, we use the post-processing script to add a logical (computed) field to an agent's output result: we add a column to the DataTable, then loop over the rows and set that field's value on each one.

For example, I have an agent with 3 fields (ProductName, Price, Tax) and want to add a 4th field, TotalPrice, whose value is Price + Tax:

// Get the current agent result as DataTable
var table = Agenty.Cloud.GetAgentResult("<Your Agent Id Here>");
// Add the TotalPrice field in DataTable
table.Columns.Add("TotalPrice");
// Set TotalPrice = Price + Tax on each row. Both values are converted the
// same way; Convert.ToInt32 would fail on (or round) a fractional Tax value.
foreach (DataRow row in table.Rows)
{
    row["TotalPrice"] = Convert.ToDecimal(row["Price"]) + Convert.ToDecimal(row["Tax"]);
}
// Set the table result back to agent result
table.SetAgentResult();
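The Convert calls in the loop above throw a FormatException when a scraped cell holds text such as "N/A", and they parse numbers using the current culture. If the scraped values are not guaranteed to be clean, a sketch along these lines may be safer: it parses with decimal.TryParse and the invariant culture, and leaves TotalPrice empty for rows it cannot parse (a design choice; you might prefer to log or drop them). The sample rows are made up, and the in-memory table stands in for the agent result.

```csharp
using System;
using System.Data;
using System.Globalization;

// Stand-in for the agent result; scraped values are text and may be dirty.
var table = new DataTable();
table.Columns.Add("ProductName");
table.Columns.Add("Price");
table.Columns.Add("Tax");
table.Rows.Add("Widget", "10.50", "0.84");
table.Rows.Add("Gadget", "N/A", "1.00");

table.Columns.Add("TotalPrice");
foreach (DataRow row in table.Rows)
{
    // TryParse never throws; InvariantCulture always treats "." as the decimal point.
    bool priceOk = decimal.TryParse(row["Price"].ToString(), NumberStyles.Number,
        CultureInfo.InvariantCulture, out decimal price);
    bool taxOk = decimal.TryParse(row["Tax"].ToString(), NumberStyles.Number,
        CultureInfo.InvariantCulture, out decimal tax);

    // Leave TotalPrice empty when either value cannot be parsed.
    row["TotalPrice"] = priceOk && taxOk
        ? (price + tax).ToString(CultureInfo.InvariantCulture)
        : "";
}
```

decimal is used rather than double so that money values such as 10.50 + 0.84 add up exactly.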

Remove duplicates (extension)

In this example, we remove duplicate rows from a web scraping agent result, using the 2 columns URL and TITLE as the key. It uses the extension method RemoveAllDuplicatesRows, which removes duplicate rows when passed a List<string> of column names, or null if all columns should be used to find and remove duplicates.

// Get the current agent result as DataTable
var table = Agenty.Cloud.GetAgentResult("<Your Agent Id Here>");

// Key columns used to detect duplicates (pass null to use all columns)
List<string> keyColumns = new List<string> {"URL", "TITLE"};

// RemoveAllDuplicatesRows extension method
table.RemoveAllDuplicatesRows(keyColumns);

// Set the table back to agent result
table.SetAgentResult();
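For readers wondering how de-duplication on key columns can work, here is a minimal sketch. RemoveDuplicateRows is a hypothetical helper written for illustration, not the Agenty implementation of RemoveAllDuplicatesRows; it keeps the first row for each key, which may or may not match the library's behavior. The sample data is made up.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

// Hypothetical helper, NOT the Agenty RemoveAllDuplicatesRows implementation.
// Keeps the first row for each key and removes later rows with the same key.
void RemoveDuplicateRows(DataTable target, List<string> keyColumns)
{
    // null means "use every column as part of the key".
    List<string> columns = keyColumns
        ?? target.Columns.Cast<DataColumn>().Select(c => c.ColumnName).ToList();
    var seenKeys = new HashSet<string>();
    var duplicates = new List<DataRow>();

    foreach (DataRow row in target.Rows)
    {
        // Composite key; \u001F (unit separator) is unlikely to occur in scraped text.
        string key = string.Join("\u001F", columns.Select(c => row[c].ToString()));
        if (!seenKeys.Add(key))
            duplicates.Add(row);
    }

    // Remove after enumerating, since changing Rows inside the foreach would throw.
    foreach (DataRow row in duplicates)
        target.Rows.Remove(row);
}

// Stand-in for the agent result.
var agentResult = new DataTable();
agentResult.Columns.Add("URL");
agentResult.Columns.Add("TITLE");
agentResult.Columns.Add("PRICE");
agentResult.Rows.Add("http://example.com/a", "A", "1");
agentResult.Rows.Add("http://example.com/a", "A", "2");
agentResult.Rows.Add("http://example.com/b", "B", "3");

RemoveDuplicateRows(agentResult, new List<string> { "URL", "TITLE" });
Console.WriteLine(agentResult.Rows.Count); // prints 2
```

A HashSet gives a constant-time lookup per row, so the cost grows linearly with the number of rows rather than quadratically.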

Remove duplicates (code)

In this example, we eliminate duplicate rows from a scraper result using the single column PRODUCT_ID. Unlike the previous example, it doesn't use the extension method; instead, we write our own code to eliminate duplicates:

var table = Agenty.Cloud.GetAgentResult("<Your Agent Id Here>");

// PRODUCT_ID values seen so far, and the rows that repeat one of them
HashSet<string> seenProductIds = new HashSet<string>();
List<DataRow> duplicateRows = new List<DataRow>();

// HashSet.Add returns false when the PRODUCT_ID has already been seen,
// so every later row with that ID is collected as a duplicate
foreach (DataRow dRow in table.Rows)
{
	if (!seenProductIds.Add(dRow["PRODUCT_ID"].ToString()))
		duplicateRows.Add(dRow);
}

// Remove the duplicates after the loop; changing table.Rows while
// enumerating it would throw an InvalidOperationException
foreach (DataRow dRow in duplicateRows)
{
	table.Rows.Remove(dRow);
}

table.SetAgentResult(); //Set the modified output

Delete rows conditionally

In this example, we delete rows that match an if condition. Remember, we can't delete rows inside a foreach loop over the collection, because modifying a collection while enumerating it throws an InvalidOperationException. So we use a for loop instead, iterating from the last row to the first so that removing a row never shifts the rows that haven't been checked yet:

var table = Agenty.Cloud.GetAgentResult("<Your Agent Id Here>");
// Iterate backwards so removing a row doesn't skip the next one
for (int i = table.Rows.Count - 1; i >= 0; i--)
{
	DataRow row = table.Rows[i];
	string emailAddress = row["email_address"].ToString().Trim();
	// Delete the row if the email_address field length is less than 10
	if (emailAddress.Length < 10)
	{
		// RemoveAt takes the row out of the table immediately
		table.Rows.RemoveAt(i);
	}
}
table.SetAgentResult();
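The backwards loop generalizes to any condition. In the sketch below, RemoveRowsWhere is a hypothetical helper (not an Agenty API) that takes the condition as a delegate, and an in-memory table with made-up addresses stands in for the agent result. It removes rows with Rows.RemoveAt, which takes a row out of the collection immediately; for rows whose RowState is Unchanged or Modified, DataRow.Delete only marks them as Deleted and leaves them in the Rows collection until AcceptChanges is called.

```csharp
using System;
using System.Data;

// Hypothetical helper (not an Agenty API): removes every row matching
// shouldRemove and returns how many rows were removed. Iterating from the
// last index down means removing row i never shifts unvisited rows.
int RemoveRowsWhere(DataTable target, Func<DataRow, bool> shouldRemove)
{
    int removed = 0;
    for (int i = target.Rows.Count - 1; i >= 0; i--)
    {
        if (shouldRemove(target.Rows[i]))
        {
            target.Rows.RemoveAt(i);
            removed++;
        }
    }
    return removed;
}

// Stand-in for the agent result.
var agentResult = new DataTable();
agentResult.Columns.Add("email_address");
agentResult.Rows.Add("someone@example.com");
agentResult.Rows.Add("a@b.co");
agentResult.Rows.Add("  x@y  ");

// Same rule as the script: drop rows whose trimmed address is shorter than 10 characters.
int removedCount = RemoveRowsWhere(agentResult,
    row => row["email_address"].ToString().Trim().Length < 10);
Console.WriteLine($"Removed {removedCount} row(s)"); // prints "Removed 2 row(s)"
```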

Find keywords on webpage

We wrote this simple script for a client who wanted to create a web crawling report. They sent us 5,000 website URLs containing millions of pages, and all they needed was to set the HAS_KEYWORD value to Yes or No for each crawled page, depending on whether any of the provided keywords was found on it.

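var table = Agenty.Cloud.GetAgentResult("<Your Agent Id Here>");

// Add the HAS_KEYWORD column if the agent does not already output it
if (!table.Columns.Contains("HAS_KEYWORD"))
	table.Columns.Add("HAS_KEYWORD");

// List of keywords to find on webpage
List<string> searchTerms = new List<string> {"Keyword 1", "Keyword 2", "Keyword 3"};
foreach (DataRow row in table.Rows)
{
	string pageContent = row["PAGE_BODY"].ToString();
	// Case-sensitive check: true if the page contains any of the keywords
	if (searchTerms.Any(pageContent.Contains))
	{
		row["HAS_KEYWORD"] = "Yes";
	}
	else
	{
		row["HAS_KEYWORD"] = "No";
	}
}
table.SetAgentResult();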

Scripts like this can be used for many purposes, such as online brand monitoring and trademark monitoring.
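The Contains check in the keyword script is case-sensitive, so a page that writes "KEYWORD 1" in capitals is reported as No. Below is a minimal sketch of a case-insensitive variant that also records which terms matched; FindKeywords and the MATCHED_KEYWORDS column are hypothetical additions, and the sample pages are made up.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

// Hypothetical helper: returns the terms found in the page text, ignoring case.
List<string> FindKeywords(string pageContent, IEnumerable<string> terms) =>
    terms
        .Where(term => pageContent.IndexOf(term, StringComparison.OrdinalIgnoreCase) >= 0)
        .ToList();

// Stand-in for the crawled agent result.
var agentResult = new DataTable();
agentResult.Columns.Add("PAGE_BODY");
agentResult.Columns.Add("HAS_KEYWORD");
agentResult.Columns.Add("MATCHED_KEYWORDS"); // hypothetical extra column
agentResult.Rows.Add("Welcome to KEYWORD 1 headquarters", "", "");
agentResult.Rows.Add("Nothing relevant here", "", "");

var searchTerms = new List<string> { "Keyword 1", "Keyword 2", "Keyword 3" };
foreach (DataRow row in agentResult.Rows)
{
    List<string> matches = FindKeywords(row["PAGE_BODY"].ToString(), searchTerms);
    row["HAS_KEYWORD"] = matches.Count > 0 ? "Yes" : "No";
    row["MATCHED_KEYWORDS"] = string.Join(", ", matches);
}
```

Recording the matched terms costs little extra and makes a Yes result easy to audit in a brand- or trademark-monitoring report.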