Advanced Analytics – Presto Functions and Operators Quick Review

This post is a lot different from our earlier entries. Think of it as a reference flag post for people interested in a quick lookup for advanced analytics functions and operators used in modern data lake operations based on Presto. So you could, of course, use it in Presto installations, but also in some other commercial products such as AWS Athena that is used widely these days to facilitate analytic operations on Enterprise Data Lakes built on top of Amazon S3.

Without further ado, let’s dive straight into the nuts and bolts of these queries for advanced analytics:

JSON Functions

is_json_scalar(json) → boolean

  • Determines if json is a scalar (i.e. a JSON number, a JSON string, true, false or null).
  • Example:
QueryOutput
SELECT is_json_scalar(‘1’)true
SELECT is_json_scalar(‘[1, 2, 3]’)false

json_array_contains(json, value) → boolean

  • Determines if value exists in json (a string containing a JSON array)
  • Example:
QueryOutput
SELECT json_array_contains(‘[1, 2, 3]’, 2)true

json_array_get(json_array, index) → json

  • Example:
QueryOutput
SELECT json_array_get(‘[“a”, [3, 9], “c”]’, 0)a
SELECT json_array_get(‘[“a”, [3, 9], “c”]’, 10)null
SELECT json_array_get(‘[“c”, [3, 9], “a”]’, -2)JSON ‘[3,9]’

json_array_length(json) → bigint

  • Returns the array length of json (a string containing a JSON array)
  • Example:
QueryOutput
SELECT json_array_length(‘[1, 2, 3]’)3

json_extract(json, json_path) → json

  • Evaluates the JSONPath-like expression json_path on json (a string containing JSON) and returns the result as a JSON string
  • Example:
QueryOutput
SELECT json_extract(json_parse(‘{“email”: {“abcd”: “xyz@yahoo.com”}, “phone_numbers”: [5678908, 587578575, 668798]}’), ‘$.email.abc’)“xyz@yahoo.com”

json_extract_scalar(json, json_path) → varchar

  • Just like json_extract(), but returns the result value as a string (as opposed to being encoded as JSON). The value referenced by json_path must be a scalar (boolean, number or string)
  • Example:
QueryOutput
SELECT json_extract_scalar(json_parse(‘{“email”: {“abcd”: “xyz@yahoo.com”}, “phone_numbers”: [5678908, 587578575, 9999999]}’), ‘$.phone_numbers[0]’)9999999
SELECT json_extract_scalar(json_parse(‘{“email”: {“abcd”: “xyz@yahoo.com”}, “phone_numbers”: [{“home”: 5678908}, {“mob”: 587578575}, {“cell”: 9999999}]}’), ‘$.phone_numbers[1].mob’)587578575
SELECT json_extract_scalar(json_parse(‘{“email”: “xyz@yahoo.com”, “phone_numbers”: [{“home”: 5678908}, {“mob”: 587578575}, {“cell”: 9999999}]}’), ‘$[email]’)xyz@yahoo.com

json_parse(string) → json

  • Returns the JSON value deserialized from the input JSON text. This is an inverse function to json_format()
  • Example:
QueryOutput
SELECT json_parse(‘{“email”: {“abcd”: “xyz@yahoo.com”}, “phone_numbers”: [{“home”: 5678908}, {“mob”: 587578575}, {“cell”: 9999999}]}’){“email”: {“abcd”: “xyz@yahoo.com”}, “phone_numbers”: [{“home”: 5678908}, {“mob”: 587578575}, {“cell”: 9999999}]}

json_size(json, json_path) → bigint

  • Just like json_extract(), but returns the size of the value. For objects or arrays, the size is the number of members, and the size of a scalar value is zero.
  • Example:
QueryOutput
SELECT json_size(json_parse(‘{“email”: “xyz@yahoo.com”, “phone_numbers”: [{“home”: 5678908}, {“mob”: 587578575}, {“cell”: 9999999}]}’), ‘$.phone_numbers’)3
SELECT json_size(‘{“x”: {“a”: 1, “b”: 2}}’, ‘$.x.a’)0

Date and Time Functions and Operators

current_date → date

  • Returns the current date as of the start of the query.

current_time → time with time zone

  • Returns the current time as of the start of the query. (with UTC)

current_timestamp → timestamp with time zone

  • Returns the current timestamp as of the start of the query.

current_timezone() → varchar

  • Returns the current time zone in the format defined by IANA (e.g., America/Los_Angeles) or as fixed offset from UTC (Example: +08:35)

date(x) → date

  • This is an alias for CAST(x AS date).
  • Example:
QueryOutput
SELECT DATE(‘2019-08-07’)SELECT CAST(‘2019-08-07’ AS DATE);

now() → timestamp with time zone

  • This is an alias for current_timestamp.
  • SELECT now();

date_trunc(unit, x) → [same as input]

  • Returns x truncated to unit.
  • Example:
QueryOutput
SELECT date_trunc(‘second’, current_timestamp)2019-08-16 06:51:29.000 UTC (returns value upto unit)
SELECT date_trunc(‘minute’, current_timestamp)2019-08-16 06:54:00.000 UTC

Interval Functions

UnitDescription
millisecondMilliseconds
secondSeconds
minuteMinutes
hourHours
dayDays
weekWeeks
monthMonths
quarterQuarters of a year
yearYear

date_add(unit, value, timestamp) → [same as input]

  • Adds an interval value of type unit to timestamp. Subtraction can be performed by using a negative value.
  • Example:
QueryOutput
SELECT date_add(‘month’, 1, current_timestamp)2019-09-16 06:59:55.425 UTC
SELECT date_add(‘day’, 1, current_timestamp)2019-08-17 07:01:26.834 UTC

date_diff(unit, timestamp1, timestamp2) → bigint

  • Returns timestamp2 – timestamp1 expressed in terms of unit.
  • Example:
QueryOutput
SELECT date_diff(‘day’, current_timestamp, date_add(‘day’, 10, current_timestamp))10

parse_duration(string) → interval

  • Parses string of format value unit into an interval, where value is fractional number of unit values
  • Example:
QueryOutput
SELECT parse_duration(‘42.8ms’)0 00:00:00.043
SELECT parse_duration(‘3.81 d’)3 19:26:24.000
SELECT parse_duration(‘5m’)0 00:05:00.000

date_format(timestamp, format) → varchar

  • Formats timestamp as a string using format (converts timestamp to string)
  • Example:
QueryOutput
SELECT date_format(current_timestamp, ‘%Y-%m-%d’)2019-08-16

date_parse(string, format) → timestamp

  • Parses string into a timestamp using format. (converts string to timestamp)
  • Example:
QueryOutput
SELECT date_parse(‘2019-08-16’, ‘%Y-%m-%d’)2019-08-16 00:00:00.000

day(x) → bigint

  • Returns the day of the month from x.
  • Example:
QueryOutput
SELECT day(date_parse(‘2019-08-16’, ‘%Y-%m-%d’))16

day_of_month(x) → bigint

  • This is an alias for day().

day_of_week(x) → bigint

  • Returns the ISO day of the week from x. The value ranges from 1 (Monday) to 7 (Sunday).
  • Example:
QueryOutput
SELECT day_of_week(date_parse(‘2019-08-16’, ‘%Y-%m-%d’))5

year(x) → bigint

  • Returns the year from x.

Aggregate Functions

array_agg(x) → array<[same as input]>

  • Returns an array created from the input x elements
  • The array_agg() function is an aggregate function that accepts a set of values and returns an array in which each value in the input set is assigned to an element of the array.
  • Syntax:
    array_agg(expression [ORDER BY [sort_expression {ASC | DESC}], […])
  • Example:
Query
SELECT title, array_agg (first_name || ‘ ‘ || last_name) actors FROM film
SELECT title, array_agg (first_name || ‘ ‘ || last_name ORDER BY first_name) actors FROM film

avg(x) → double

  • Returns the average (arithmetic mean) of all input values

bool_and(boolean) → boolean

  • Returns TRUE if every input value is TRUE, otherwise FALSE.

bool_or(boolean) → boolean

  • Returns TRUE if any input value is TRUE, otherwise FALSE.

count(*) → bigint

  • Returns the number of input rows.

count(x) → bigint

  • Returns the number of non-null input values.

count_if(x) → bigint

  • Returns the number of TRUE input values. This function is equivalent to count(CASE WHEN x THEN 1 END).

arbitrary(x) → [same as input]

  • Returns an arbitrary non-null value of x, if one exists.
  • Arbitrary chooses one value out of a set of values. arbitrary is useful for silencing warnings about values neither grouped by or aggregated over.

max_by(x, y) → [same as x]

  • Returns the value of x associated with the maximum value of y over all input values.
  • The max_by takes two arguments and returns the value of the first argument for which the value of the second argument is maximized.
  • If multiple rows maximize the result of the second value, and arbitrary first value is chosen from among them. max_by can be used with both numeric and non-numeric data.
  • Example:
QueryOutput
SELECT max_by(close_date, close_value) as date_of_max_sale FROM sales_pipelinequery returns the date where close_value is maximum

max_by(x, y, n) → array<[same as x]>

  • Returns n values of x associated with the n largest of all input values of y in descending order of y.

min_by(x, y) → [same as x]

  • Returns the value of x associated with the minimum value of y over all input values.

min_by(x, y, n) → array<[same as x]>

  • Returns n values of x associated with the n smallest of all input values of y in ascending order of y.

max(x, n) → array<[same as x]>

  • Returns n largest values of all input values of x.

min(x) → [same as input]

  • Returns the minimum value of all input values.

min(x, n) → array<[same as x]>

  • Returns n smallest values of all input values of x.

reduce_agg(inputValue T, initialState S, inputFunction(S, T, S), combineFunction(S, S, S)) → S

  • Reduces all input values into a single value. inputFunction will be invoked for each input value. In addition to taking the input value, inputFunction takes the current state, initially initialState, and returns the new state. combineFunction will be invoked to combine two states into a new state. The final state is returned
  • Example:
QueryOutput
SELECT id, reduce_agg(value, 0, (a, b) -> a + b, (a, b) -> a + b) FROM (VALUES (1, 2) (1, 3), (1, 4), (2, 20), (2, 30), (2, 40) ) AS t(id, value) GROUP BY id(1, 9), (2, 90)

Array Functions and Operators

Subscript Operator: [ ]

  • The [ ] operator is used to access an element of an array and is indexed starting from one:
  • If a column has values like [1, 3], [45, 46]
  • Example:
QueryOutput
SELECT column[1]1, 45

Concatenation Operator: ||

  • The || operator is used to concatenate an array with an array or an element of the same type.
  • Example:
QueryOutput
SELECT ARRAY [1] || ARRAY [2][1, 2]
SELECT ARRAY [1] || 2[1, 2]
SELECT 2 || ARRAY [1][2, 1]

array_distinct(x) → array

  • Remove duplicate values from the array x.
  • Example:
QueryOutput
SELECT array_distinct(ARRAY[1, 1, 1, 2])[1, 2]

array_intersect(x, y) → array

  • Returns an array of the elements in the intersection of x and y, without duplicates.
  • Common elements in two arrays without duplicates.
  • Example:
QueryOutput
SELECT array_intersect(ARRAY[1, 1, 1, 2], ARRAY[10, 15, 1, 100])[1]

array_union(x, y) → array

  • Returns an array of the elements in the union of x and y, without duplicates.
  • Example:
QueryOutput
SELECT array_union(ARRAY[1, 1, 1, 2], ARRAY[10, 15, 1, 100])[1, 2, 10, 15, 100]

array_except(x, y) → array

  • Returns an array of elements in x but not in y, without duplicates.
  • Example:
QueryOutput
SELECT array_except(ARRAY[1, 1, 1, 2], ARRAY[10, 15, 1, 100])[2]

array_join(x, delimiter, null_replacement) → varchar

  • Concatenates the elements of the given array using the delimiter and an optional string to replace nulls.
  • Example:
QueryOutput
SELECT array_join(ARRAY[1, 71, 81, 92, null], ‘/’, ‘abcd’)1/71/81/92/abcd

array_max(x) → x

  • Returns the maximum value of input array.
  • Example:
QueryOutput
SELECT array_max(ARRAY[1, 71, 81, 92, 100])100
SELECT array_max(ARRAY[1, 71, 81, 92, null])null

array_min(x) → x

  • Returns the minimum value of input array.
  • Example:
QueryOutput
SELECT array_min(ARRAY[1, 71, 81, 92, 100])1

array_position(x, element) → bigint

  • Returns the position of the first occurrence of the element in array x (or 0 if not found).
  • Example:
QueryOutput
SELECT array_position(ARRAY[1, 71, 81, 92, 100], 100)5

array_remove(x, element) → array

  • Remove all elements that equal element from array x.
  • Example:
QueryOutput
SELECT array_remove(ARRAY[1, 71, 81, 92, 100, 1], 1)[71, 81, 92, 100]

array_sort(x) → array

  • Sorts and returns the array x. The elements of x must be orderable. Null elements will be placed at the end of the returned array.
  • Example:
QueryOutput
SELECT array_sort(ARRAY[1, null, 8, 9, 71, 81, 92, 100, 12, 51, 10, 7, 1, null])[1, 1, 7, 8, 9, 10, 12, 51, 71, 81, 92, 100, null, null]

array_sort(array(T), function(T, T, int)) → array(T)

  • Sorts and returns the array based on the given comparator function. The comparator will take two nullable arguments representing two nullable elements of the array. It returns -1, 0, or 1 as the first nullable element is less than, equal to, or greater than the second nullable element. 
  • If the comparator function returns other values (including NULL), the query will fail and raise an error.
  • Example:
QueryOutput
SELECT array_sort(ARRAY [3, 2, 5, 1, 2], (x, y) -> IF(x < y, 1, IF(x = y, 0, -1)))[5, 3, 2, 2, 1] 

cardinality(x) → bigint

  • Returns the cardinality (size) of the array x.
  • Example:
QueryOutput
SELECT cardinality(ARRAY[1, 81, 92, 100, 12, 51, 10])7

arrays_overlap(x, y) → boolean

  • Tests if arrays x and y have any any non-null elements in common. Returns null if there are no non-null elements in common but either array contains null.
  • Example:
QueryOutput
SELECT arrays_overlap(ARRAY[1, 81, 92, 100, 12, 51, 10], ARRAY[101])false
SELECT arrays_overlap(ARRAY[1, 81, 92, 100, 12, 51, 10], ARRAY[101, 51])true

concat(array1, array2, …, arrayN) → array

  • Concatenates the arrays array1, array2, …, arrayN. This function provides the same functionality as the SQL-standard concatenation operator (||).

contains(x, element) → boolean

  • Returns true if the array x contains the element.
  • Example:
QueryOutput
SELECT contains(ARRAY[1, 81, 92, 100, 12, 51, 10], 1)true
SELECT contains(ARRAY[‘abcd’, ‘test’, ‘xyz’], ‘xyz’)true

element_at(array(E), index) → E

  • SQL array indices start at 1
  • Returns element of array at given index. If index > 0, this function provides the same functionality as the SQL-standard subscript operator ([]). If index < 0, element_at accesses elements from the last to the first.
  • Example:
QueryOutput
SELECT contains(ARRAY[1, 81, 92, 100, 12, 51, 10], 2)81

filter(array(T), function(T, boolean)) → array(T)

  • Constructs an array from those elements of array for which function returns true.
  • Example:
QueryOutput
SELECT filter(ARRAY [5, -6, NULL, 7],  x -> x > 0)[5, 7]

flatten(x) → array

  • Flattens an array(array(T)) to an array(T) by concatenating the contained arrays.

reduce(array(T), initialState S, inputFunction(S, T, S), outputFunction(S, R)) → R

  • Returns a single value reduced from array. inputFunction will be invoked for each element in array in order. 
  • In addition to taking the element, inputFunction takes the current state, initially initialState, and returns the new state. outputFunction will be invoked to turn the final state into the result value.
  • It may be the identity function (i -> i).
  • Example:
QueryOutput
SELECT reduce(ARRAY [5, 20, NULL, 50], 0, (s, x) -> s + COALESCE(x, 0), s -> s)75
SELECT reduce(ARRAY [5, 20, NULL, 50], 1, (s, x) -> s * COALESCE(x, 1), s -> s)5000

repeat(element, count) → array

  • Repeat element for count times

reverse(x) → array

  • Returns an array which has the reversed order of array x.
  • Example:
QueryOutput
SELECT reverse(ARRAY[2, 5])[5, 2]

sequence(start, stop) → array(bigint)

  • Generate a sequence of integers from start to stop, increments by 1 if start is less than or equal to stop, otherwise decrements by 1.
  • Example:
QueryOutput
SELECT sequence(2, 7)[2, 3, 4, 5, 6, 7]
SELECT sequence(7, 1)[7, 6, 5, 4, 3, 2, 1]

sequence(start, stop, step) → array(bigint)

  • Generate a sequence of integers from start to stop, incrementing by step.

sequence(start, stop) → array(date)

  • Generate a sequence of dates from start date to stop date, incrementing by 1 day if start date is less than or equal to stop date, otherwise -1 day.

sequence(start, stop, step) → array(date)

  • Generate a sequence of dates from start to stop, incrementing by step. The type of step can be either INTERVAL DAY TO SECOND or INTERVAL YEAR TO MONTH.

shuffle(x) → array

  • Generate a random permutation of the given array x.
  • Example:
QueryOutput
SELECT shuffle(ARRAY[1, 7, 2])[7, 1, 2]

slice(x, start, length) → array

  • Subsets array x starting from index start (or starting from the end if start is negative) with a length of length.
  • Example:
QueryOutput
SELECT slice(ARRAY[1, 7, 2, 87, 12, 09], 2, 4)[7, 2, 87, 12]

transform(array(T), function(T, U)) → array(U)

  • Returns an array that is the result of applying function to each element of array
  • Example:
QueryOutput
SELECT transform(ARRAY [5, NULL, 6], x -> COALESCE(x, 0) + 1)[6, 1, 7]
SELECT transform(ARRAY [‘x’, ‘abc’, ‘z’], x -> x || ‘0’)[‘x0’, ‘abc0’, ‘z0’]

zip(array1, array2[, …]) → array(row)

  • Merges the given arrays, element-wise, into a single array of rows. The M-th element of the N-th argument will be the N-th field of the M-th output element. If the arguments have an uneven length, missing values are filled with NULL.
  • Example:
QueryOutput
SELECT zip(ARRAY[1, 2], ARRAY[‘1b’, null, ‘3b’])[ROW(1, ‘1b’), ROW(2, null), ROW(null, ‘3b’)]

zip_with(array(T), array(U), function(T, U, R)) → array(R)

  • Merges the two given arrays, element-wise, into a single array using function. If one array is shorter, nulls are appended at the end to match the length of the longer array, before applying function:
  • Example:
QueryOutput
SELECT zip_with(ARRAY[1, 3, 5], ARRAY[‘a’, ‘b’, ‘c’], (x, y) -> (y, x))[ROW(‘a’, 1), ROW(‘b’, 3), ROW(‘c’, 5)]

Window functions

Window functions perform calculations across rows of the query result. They run after the HAVING clause but before the ORDER BY clause. Invoking a window function requires special syntax using the OVER clause to specify the window. A window has three components:

  1. The partition specification, which separates the input rows into different partitions. This is analogous to how the GROUP BY clause separates rows into different groups for aggregate functions.
  2. The ordering specification, which determines the order in which input rows will be processed by the window function.
  3. The window frame, which specifies a sliding window of rows to be processed by the function for a given row. If the frame is not specified, it defaults to RANGE UNBOUNDED PRECEDING, which is the same as RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. This frame contains all rows from the start of the partition up to the last peer of the current row.

rank() → bigint

  • Returns the rank of a value in a group of values. The rank is one plus the number of rows preceding the row that are not peer with the row. Thus, tie values in the ordering will produce gaps in the sequence. The ranking is performed for each window partition.
  • As shown below, the rank function produces a numerical rank within the current row’s partition for each distinct ORDER BY value, in the order defined by the ORDER BY clause. rank needs no explicit parameter, because its behavior is entirely determined by the OVER clause.
  • Query:
WITH dataset AS 
    (SELECT 'name' AS rows, ARRAY['a', 'a', 'a', 'b', 'b', 'c', 'c', 'd', 'e'] AS sample )
SELECT rows,
         value,
         rank()
    OVER (ORDER BY value) AS rank
FROM dataset
CROSS JOIN UNNEST(sample) AS t(value)
  • Output:
 rowsvaluerank
1namea1
2namea1
3namea1
4nameb4
5nameb4
6namec6
7namec6
8named8
9namee9

row_number() → bigint

  • returns a unique, sequential number for each row, starting with one, according to the ordering of rows within the window partition.
  • Query:
WITH dataset AS 
    (SELECT 'name' AS rows, ARRAY['a', 'a', 'a', 'b', 'b', 'c', 'c', 'd', 'e'] AS sample )
SELECT rows,
         value,
         row_number()
    OVER (ORDER BY value) AS row_num
FROM dataset
CROSS JOIN UNNEST(sample) AS t(value)
  • Output:
 rowsvaluerow_num
1namea1
2namea2
3namea3
4nameb4
5nameb5
6namec6
7namec7
8named8
9namee9

dense_rank() → bigint

  • returns the rank of a value in a group of values. This is similar to rank(), except that tie values do not produce gaps in the sequence.
  • Query: 
WITH dataset AS 
    (SELECT 'name' AS rows, ARRAY['a', 'a', 'a', 'b', 'b', 'c', 'c', 'd', 'e'] AS sample )
SELECT rows,
         value,
         dense_rank()
    OVER (ORDER BY value) AS dense_ran
FROM dataset
CROSS JOIN UNNEST(sample) AS t(value)
  • Output: 
 rowsvaluedense_ran
1namea1
2namea1
3namea1
4nameb2
5nameb2
6namec3
7namec3
8named4
9namee5

Map Functions and Operators

map(array(K), array(V)) → map(K, V)

  • Returns a map created using the given key/value arrays.
  • Example:
QueryOutput
SELECT map(ARRAY[1,3], ARRAY[2,4]) AS a{1=2, 3=4}
SELECT map(ARRAY[‘k1′,’k2’], ARRAY[‘value1’, ‘value2’]){k1=value1, k2=value2}

Subscript Operator: [ ]

  • The [ ] operator is used to retrieve the value corresponding to a given key from a map.
  • Example:
QueryOutput
SELECT map(ARRAY[1,3], ARRAY[2,4]) AS a{1=2, 3=4}
SELECT a[1] FROM (SELECT map(ARRAY[1,3], ARRAY[2,4]) AS a)2

element_at(map(K, V), key) → V

  • Returns value for given key, or NULL if the key is not contained in the map.
  • Example:
Query
SELECT element_at(a, ‘k1’) FROM (SELECT map(ARRAY[‘k1′,’k2’], ARRAY[‘value1’, ‘value2’]) AS a)

cardinality(x) → bigint

  • Returns the cardinality (size) of the map x.

map() → map<unknown, unknown>

  • Returns an empty map.
  • Example:
QueryOutput
SELECT map(){}

map_from_entries(array(row(K, V))) → map(K, V)

  • Returns a map created from the given array of entries.
  • Example:
QueryOutput
SELECT map_from_entries(ARRAY[(1, ‘x’), (2, ‘y’)]){1 -> ‘x’, 2 -> ‘y’}

map_agg(key, value) → map(K, V)

  • Returns a map created from the input key/value pairs.

multimap_from_entries(array(row(K, V))) → map(K, array(V))

  • Returns a multimap created from the given array of entries. Each key can be associated with multiple values.
  • Example:
QueryOutput
SELECT multimap_from_entries(ARRAY[(1, ‘x’), (2, ‘y’), (1, ‘z’)]){1 -> [‘x’, ‘z’], 2 -> [‘y’]}

map_concat(map1(K, V), map2(K, V), …, mapN(K, V)) → map(K, V)

  • Returns the union of all the given maps. If a key is found in multiple given maps, that key’s value in the resulting map comes from the last one of those maps.

map_filter(map(K, V), function(K, V, boolean)) → map(K, V)

  • Constructs a map from those entries of map for which function returns true.
  • Example:
QueryOutput
SELECT map_filter(MAP(ARRAY[], ARRAY[]), (k, v) -> true){}
SELECT map_filter(MAP(ARRAY[10, 20, 30], ARRAY[‘a’, NULL, ‘c’]), (k, v) -> v IS NOT NULL){10 -> a, 30 -> c}

map_keys(x(K, V)) → array(K)

  • Returns all the keys in the map x.
  • Example:
QueryOutput
SELECT map_keys(map(ARRAY[‘k1′,’k2’], ARRAY[‘value1’, ‘value2’]))[k1, k2]

map_values(x(K, V)) → array(V)

  • Returns all the values in the map x.
  • Example:
QueryOutput
SELECT map_values(map(ARRAY[‘k1′,’k2’], ARRAY[‘value1’, ‘value2’]))[value1, value2]

transform_keys(map(K1, V), function(K1, V, K2)) → map(K2, V)

  • Returns a map that applies function to each entry of map and transforms the keys.
  • Example:
QueryOutput
SELECT transform_keys(MAP(ARRAY[], ARRAY[]), (k, v) -> k + 1){}
SELECT transform_keys(MAP(ARRAY [1, 2, 3], ARRAY [‘a’, ‘b’, ‘c’]), (k, v) -> k + 1){2 -> a, 3 -> b, 4 -> c}
SELECT transform_keys(MAP(ARRAY [‘a’, ‘b’], ARRAY [1, 2]), (k, v) -> k || CAST(v as VARCHAR)){a1 -> 1, b2 -> 2}

transform_values(map(K, V1), function(K, V1, V2)) → map(K, V2)

  • Returns a map that applies function to each entry of map and transforms the values.
  • Example:
QueryOutput
SELECT transform_values(MAP(ARRAY[], ARRAY[]), (k, v) -> v + 1){}
SELECT transform_values(MAP(ARRAY [1, 2, 3], ARRAY [10, 20, 30]), (k, v) -> v + k){1 -> 11, 2 -> 22, 3 -> 33}

Thanks for the read. Please feel free to reach out with your comments.

This story is authored by PV Subbareddy. Subbareddy is a Big Data Engineer specializing on Cloud Big Data Services and Apache Spark Ecosystem.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.