Peter Bell


Stream Processing (Part 1)

In this series, I’ve mentioned the Stream class, which is the way GlideQuery users process multiple records and groups returned by the select method. Stream is similar to Java’s Stream class and C#’s IEnumerable interface.

In all the examples so far, I’ve only used Stream’s forEach method, as that’s one of the most common Stream methods available. There are, however, other powerful Stream methods available that can enhance your code. In this article, I’ll go through a few of the most popular ones and discuss the difference between lazy and eager methods and how Stream’s laziness is a good thing.

map

map is used to transform each record in the Stream into something else. Transforming data is a fundamental part of writing software, and Stream takes a functional approach to data transformations. Stream’s map method is very similar to Array’s map:

var names = ['Bob', 'Sue', 'Joe'];
names
    .map(function (name) { return name.toUpperCase(); })
    .forEach(gs.info);

// BOB
// SUE
// JOE

new GlideQuery('sys_user')
    .whereNotNull('first_name')
    .select('first_name')
    .map(function (user) { return user.first_name.toUpperCase(); })
    .forEach(gs.info);

// MICHAEL
// FRED
// SARAH

map is given its name because it takes a function which “maps” a given value to another value. In our GlideQuery example, we’re converting sys_user objects (each with a first_name field) into uppercased strings. map returns a new Stream object and follows the fluent syntax you saw in many of GlideQuery’s methods (like where). This means you can chain multiple mappings together. As a side note, you may have noticed I passed gs.info directly to forEach instead of wrapping it in an anonymous function like

.forEach(function (name) {
    gs.info(name);
});

This is because info is already a function on the gs object, so wrapping it in an anonymous JavaScript function is redundant.
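To make the point about chaining mappings concrete, here is a minimal sketch. It uses a plain Array of hypothetical user objects as a stand-in for a Stream so that it can run outside of an instance; getFirstName and toUpper are illustrative names, and the same two callbacks could be passed to Stream’s map in turn.

```javascript
// Hypothetical records, shaped like the objects select('first_name') yields
var users = [
    { first_name: 'Michael', sys_id: 'a1' },
    { first_name: 'Fred', sys_id: 'b2' }
];

// Each mapping does one small job
var getFirstName = function (user) { return user.first_name; };
var toUpper = function (name) { return name.toUpperCase(); };

// Chained mappings: object -> string -> uppercased string
var upperNames = users
    .map(getFirstName)
    .map(toUpper);
// upperNames is ['MICHAEL', 'FRED']
```

Splitting the work into two small named functions keeps each step easy to reuse and test on its own.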

filter

Filtering data is a typical pattern every developer uses regularly, and Stream supports filtering, just as Array does. filter requires a predicate function, a function that takes a parameter and returns true or false.

Warning! Stream’s filtering is performed in JavaScript after the SQL query has already been executed. This means that, whenever possible, you should filter with where at the GlideQuery level (before calling select) rather than with filter at the Stream level. Doing otherwise can hurt performance, because records that the database could have excluded are fetched and then discarded in JavaScript. However, there are times when where (or having) can’t be used. Three examples are:

  1. having can only filter groups by a numeric value, so if you wanted to filter groups by date, for example, you could use filter.
  2. When using flatMap (more on this in the future), you may need to filter data based on a Stream and its child Stream.
  3. You may need to use filter when special logic is needed, which can’t be expressed in a where clause.

To demonstrate that last example:

var hasBadPassword = function (user) {
    return !user.user_password
        || user.user_password.length < 10
        || user.user_password === user.last_password
        || !/\d/.test(user.user_password) // no numbers
        || !/[a-z]/.test(user.user_password) // no lowercase letters
        || !/[A-Z]/.test(user.user_password); // no uppercase letters
};

new GlideQuery('sys_user')
    .select('email', 'user_password', 'last_password')
    .filter(hasBadPassword)
    .forEach(notifyUserToChangePassword);

Here I created a predicate function, hasBadPassword, which returns true when a user’s password is missing or fails one of the checks. Like map, filter can be chained, as it returns a Stream containing only the items for which the predicate returned true.
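Because a predicate is just a function, it can be exercised on its own before it’s handed to filter. The sketch below repeats hasBadPassword and runs it over a few made-up user objects using Array’s filter as a stand-in; the sample data and the flagged variable are assumptions for illustration only.

```javascript
var hasBadPassword = function (user) {
    return !user.user_password
        || user.user_password.length < 10
        || user.user_password === user.last_password
        || !/\d/.test(user.user_password) // no numbers
        || !/[a-z]/.test(user.user_password) // no lowercase letters
        || !/[A-Z]/.test(user.user_password); // no uppercase letters
};

// Made-up sample records (not real data)
var sampleUsers = [
    { email: 'a@example.com', user_password: 'short1A', last_password: 'x' }, // too short
    { email: 'b@example.com', user_password: 'LongEnough123', last_password: 'LongEnough123' }, // reused
    { email: 'c@example.com', user_password: 'LongEnough123', last_password: 'OlderPass456' }, // passes every check
    { email: 'd@example.com', user_password: null, last_password: null } // missing
];

var flagged = sampleUsers
    .filter(hasBadPassword)
    .map(function (user) { return user.email; });
// flagged is ['a@example.com', 'b@example.com', 'd@example.com']
```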

limit

To simply limit the number of records returned by a Stream, you can call limit. As with filter, some care must be taken. The GlideQuery class has a limit method as well, and GlideQuery performs the limiting in the underlying SQL query (as does GlideRecord), which is more performant. You should therefore prefer to call limit before you call select, unless that’s not an option. For example:

new GlideQuery('task')
    .orderBy('priority')
    .limit(10) // Good: calling GlideQuery's limit method
    .select('assigned_to', 'priority', 'description')
    .forEach(generateReport);

new GlideQuery('task')
    .orderBy('priority')
    .select('assigned_to', 'priority', 'description')
    .limit(10) // Bad: calling Stream's limit method
    .forEach(generateReport);

In the first example, the limiting is done by the SQL database (e.g., MySQL). In contrast, in the second example, the limiting is done in JavaScript, which is considerably slower when dealing with large data sets. So why have limit at all if it can be misused like this? There are special cases when Stream’s limit method should be used:

  1. When using flatMap, you may have more records than exist in the main table you’re querying. Again, more on flatMap in the future.
  2. You may be using Stream generically (without GlideQuery). This is a more advanced topic that we’ll cover in the future.

find

Like Array, Stream has a find method, which returns the first item in the Stream which matches a given predicate. Unlike Array’s find:

  • find returns an Optional, which is empty if no item in the Stream could be found that matches the predicate, or the Stream is empty.
  • The first item in the Stream, wrapped in an Optional, is returned if no predicate is given.

new GlideQuery('task')
    .select('description')
    .find(descriptionIsInSpanish) // returns an Optional
    .ifPresent(assignToSpanishTeam); // This is an Optional method

In the above example, we have some function named descriptionIsInSpanish, which somehow determines whether the task’s description is in Spanish (perhaps through a third-party API). This function takes a task object containing the description. If it returns true for a task, then we call assignToSpanishTeam, which takes the same task object (which also contains the sys_id of the task, as that’s included by default by select). However, if no task is found with a Spanish description, then find will return an empty Optional, and assignToSpanishTeam will never get called. We’ll talk more about Optional in a future blog post.
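The find-then-ifPresent flow can be sketched with a toy model. ToyOptional and toyFind below are hypothetical stand-ins written for this illustration (they are not GlideQuery’s Optional or Stream, and only ifPresent is modelled), and the language check is deliberately naive.

```javascript
// Toy stand-in for Optional: holds a value, or nothing
function ToyOptional(value, hasValue) {
    this.value = value;
    this.hasValue = hasValue;
}

// Calls fn with the value only when one is present
ToyOptional.prototype.ifPresent = function (fn) {
    if (this.hasValue) {
        fn(this.value);
    }
};

// Toy find: wraps the first match (or the first item when no
// predicate is given); returns an empty ToyOptional otherwise
function toyFind(items, predicate) {
    for (var i = 0; i < items.length; i++) {
        if (!predicate || predicate(items[i])) {
            return new ToyOptional(items[i], true);
        }
    }
    return new ToyOptional(undefined, false);
}

// Made-up task records
var tasks = [
    { sys_id: '1', description: 'Printer is broken' },
    { sys_id: '2', description: 'La impresora no funciona' }
];

// Hypothetical and naive: a real check would be far more involved
var descriptionIsInSpanish = function (task) {
    return /\bimpresora\b/i.test(task.description);
};

var assigned = [];
toyFind(tasks, descriptionIsInSpanish).ifPresent(function (task) {
    assigned.push(task.sys_id);
});
// assigned is ['2']

var calls = 0;
toyFind([], descriptionIsInSpanish).ifPresent(function () { calls++; });
// calls is still 0: the callback never runs on an empty result
```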

Intermediate/Terminal Functions and Laziness

Stream has two categories of methods: intermediate and terminal. Intermediate functions are functions that return another Stream, allowing a fluent style of syntax where we chain calls together. Terminal functions are functions that don’t return a Stream; they stop the chain of Stream method calls. Examples of intermediate functions are map, filter, and limit. Examples of terminal functions are forEach, find, and reduce.

It’s important to realize that intermediate functions are lazy. That is, they won’t actually do any mapping, filtering, or limiting (for example) until a terminal function is called at the end of the function chain.

new GlideQuery('sys_user')
    .whereNotNull('name')
    .select('name')
    .map(function (user) { return user.name.toUpperCase(); })
    .filter(function (name) { return name.length > 20; })
    // no processing will occur until this next line is uncommented
    // .forEach(gs.info);

Intermediate methods build up the “plan” of what we wish to occur, but until we finally use a terminal function, nothing will happen. In fact, we won’t even query the database until reduce, forEach, or some other terminal function completes the function chain. If you’re using select but find that nothing seems to be happening, ensure that you’re using a terminal function at the end.
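The “plan first, run later” idea can be shown with a small toy. LazyStream below is a hypothetical class written only for this illustration, not GlideQuery’s actual Stream implementation; it records every item pulled from its source in a log so we can see when work really happens.

```javascript
// Toy lazy stream: wraps a function returning { done, value }
function LazyStream(nextFn) {
    this.next = nextFn;
}

// Builds a stream over an array, logging each item as it is pulled
LazyStream.fromArray = function (items, log) {
    var i = 0;
    return new LazyStream(function () {
        if (i >= items.length) {
            return { done: true };
        }
        log.push('fetch ' + items[i]);
        return { done: false, value: items[i++] };
    });
};

// Intermediate: returns a new stream, does no work yet
LazyStream.prototype.map = function (fn) {
    var source = this;
    return new LazyStream(function () {
        var item = source.next();
        return item.done ? item : { done: false, value: fn(item.value) };
    });
};

// Intermediate: skips items that fail the predicate, on demand
LazyStream.prototype.filter = function (predicate) {
    var source = this;
    return new LazyStream(function () {
        var item = source.next();
        while (!item.done && !predicate(item.value)) {
            item = source.next();
        }
        return item;
    });
};

// Terminal: actually pulls items through the chain
LazyStream.prototype.forEach = function (fn) {
    var item = this.next();
    while (!item.done) {
        fn(item.value);
        item = this.next();
    }
};

var log = [];
var plan = LazyStream.fromArray(['bob', 'sue', 'joe'], log)
    .map(function (name) { return name.toUpperCase(); })
    .filter(function (name) { return name !== 'SUE'; });

var fetchedBeforeTerminal = log.length; // 0: nothing has been pulled yet

var results = [];
plan.forEach(function (name) { results.push(name); });
// results is ['BOB', 'JOE'], and log now holds three 'fetch ...' entries
```

Each intermediate call only wraps the previous stream in another function, which is why building the chain costs next to nothing until a terminal method starts pulling items.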

This is where Stream and JavaScript’s Array differ. Array’s versions of map and filter are eager, not lazy. When map is called on an Array, JavaScript immediately loops through the Array, calling the mapping function on each item, and putting the results into a newly allocated array. All the items within a JavaScript Array are located entirely in memory, whereas records in a Stream are only fetched as they are needed. This property of laziness is what allows us to process a Stream as a single collection, much like an Array, without having to hold thousands (possibly billions?) of records in memory all at once. Doing so would kill our performance and possibly the entire instance. Streams can actually be infinitely long (we’ll get into that in part 2), which isn’t possible with an Array, unless you had an infinite amount of memory and processing power.
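The idea that a lazy sequence can be unbounded can be illustrated without GlideQuery at all. In this rough sketch, naturals and take are hypothetical helpers (not part of the Stream API): the source can produce values forever, yet only the ones we ask for are ever computed.

```javascript
// Toy infinite source: each call returns the next whole number
function naturals() {
    var n = 0;
    return function () { return n++; };
}

// Toy limit: pulls at most `count` values from a source function
function take(next, count) {
    var out = [];
    for (var i = 0; i < count; i++) {
        out.push(next());
    }
    return out;
}

var firstFive = take(naturals(), 5);
// firstFive is [0, 1, 2, 3, 4]
```

An eager Array could never hold “all whole numbers”, but a source that produces values on demand only ever does as much work as the consumer requests.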

You may have noticed that the GlideQuery class itself also has intermediate and terminal functions. Methods like where, orderBy, and disableWorkflow are intermediate functions which return a new GlideQuery object. Likewise, select and selectOne, GlideQuery’s most popular terminal functions, are called when our GlideQuery is done being configured and we’re ready to start processing record(s):

new GlideQuery('task')
    .where('active', true) // returns GlideQuery
    .orderBy('priority') // returns GlideQuery
    .select('description', 'assigned_to') // Terminal function: returns Stream
    .filter(descriptionIsInSpanish) // returns Stream
    .map(translateDescription) // returns Stream
    .forEach(notifyAssignedUser); // Terminal function (we're done)

Conclusion

We’ve gone through some Stream methods for processing data and discussed the difference between lazy and eager methods. It’s important to remember:

  • Only use filter when where isn’t an option
  • Prefer GlideQuery’s limit over Stream’s
  • Stream’s intermediate functions are lazy, but terminal functions aren’t

In the following article, we’ll cover reduce, chunk, and flatMap!

