This post is a lot different from our earlier entries. Think of it as a reference flag post for people interested in a quick lookup for advanced analytics functions and operators used in modern data lake operations based on Presto. So you could, of course, use it in Presto installations, but also in some other commercial products such as AWS Athena that is used widely these days to facilitate analytic operations on Enterprise Data Lakes built on top of Amazon S3.
Without further ado, let’s dive straight into the nuts and bolts of these queries for advanced analytics:
JSON Functions
is_json_scalar(json) → boolean
- Determines if json is a scalar (i.e. a JSON number, a JSON string, true, false or null).
- Example:
Query | Output |
SELECT is_json_scalar(‘1’) | true |
SELECT is_json_scalar(‘[1, 2, 3]’) | false |
json_array_contains(json, value) → boolean
- Determines if value exists in json (a string containing a JSON array)
- Example:
Query | Output |
SELECT json_array_contains(‘[1, 2, 3]’, 2) | true |
json_array_get(json_array, index) → json
- Example:
Query | Output |
SELECT json_array_get(‘[“a”, [3, 9], “c”]’, 0) | a |
SELECT json_array_get(‘[“a”, [3, 9], “c”]’, 10) | null |
SELECT json_array_get(‘[“c”, [3, 9], “a”]’, -2) | JSON ‘[3,9]’ |
json_array_length(json) → bigint
- Returns the array length of json (a string containing a JSON array)
- Example:
Query | Output |
SELECT json_array_length(‘[1, 2, 3]’) | 3 |
json_extract(json, json_path) → json
- Evaluates the JSONPath-like expression json_path on json (a string containing JSON) and returns the result as a JSON string
- Example:
Query | Output |
SELECT json_extract(json_parse(‘{“email”: {“abcd”: “xyz@yahoo.com”}, “phone_numbers”: [5678908, 587578575, 668798]}’), ‘$.email.abc’) | “xyz@yahoo.com” |
json_extract_scalar(json, json_path) → varchar
- Just like json_extract(), but returns the result value as a string (as opposed to being encoded as JSON). The value referenced by json_path must be a scalar (boolean, number or string)
- Example:
Query | Output |
SELECT json_extract_scalar(json_parse(‘{“email”: {“abcd”: “xyz@yahoo.com”}, “phone_numbers”: [5678908, 587578575, 9999999]}’), ‘$.phone_numbers[0]’) | 9999999 |
SELECT json_extract_scalar(json_parse(‘{“email”: {“abcd”: “xyz@yahoo.com”}, “phone_numbers”: [{“home”: 5678908}, {“mob”: 587578575}, {“cell”: 9999999}]}’), ‘$.phone_numbers[1].mob’) | 587578575 |
SELECT json_extract_scalar(json_parse(‘{“email”: “xyz@yahoo.com”, “phone_numbers”: [{“home”: 5678908}, {“mob”: 587578575}, {“cell”: 9999999}]}’), ‘$[email]’) | xyz@yahoo.com |
json_parse(string) → json
- Returns the JSON value deserialized from the input JSON text. This is an inverse function to json_format()
- Example:
Query | Output |
SELECT json_parse(‘{“email”: {“abcd”: “xyz@yahoo.com”}, “phone_numbers”: [{“home”: 5678908}, {“mob”: 587578575}, {“cell”: 9999999}]}’) | {“email”: {“abcd”: “xyz@yahoo.com”}, “phone_numbers”: [{“home”: 5678908}, {“mob”: 587578575}, {“cell”: 9999999}]} |
json_size(json, json_path) → bigint
- Just like json_extract(), but returns the size of the value. For objects or arrays, the size is the number of members, and the size of a scalar value is zero.
- Example:
Query | Output |
SELECT json_size(json_parse(‘{“email”: “xyz@yahoo.com”, “phone_numbers”: [{“home”: 5678908}, {“mob”: 587578575}, {“cell”: 9999999}]}’), ‘$.phone_numbers’) | 3 |
SELECT json_size(‘{“x”: {“a”: 1, “b”: 2}}’, ‘$.x.a’) | 0 |
Date and Time Functions and Operators
current_date → date
- Returns the current date as of the start of the query.
current_time → time with time zone
- Returns the current time as of the start of the query. (with UTC)
current_timestamp → timestamp with time zone
- Returns the current timestamp as of the start of the query.
current_timezone() → varchar
- Returns the current time zone in the format defined by IANA (e.g., America/Los_Angeles) or as fixed offset from UTC (Example: +08:35)
date(x) → date
- This is an alias for CAST(x AS date).
- Example:
Query | Output |
SELECT DATE(‘2019-08-07’) | SELECT CAST(‘2019-08-07’ AS DATE); |
now() → timestamp with time zone
- This is an alias for current_timestamp.
- SELECT now();
date_trunc(unit, x) → [same as input]
- Returns x truncated to unit.
- Example:
Query | Output |
SELECT date_trunc(‘second’, current_timestamp) | 2019-08-16 06:51:29.000 UTC (returns value upto unit) |
SELECT date_trunc(‘minute’, current_timestamp) | 2019-08-16 06:54:00.000 UTC |
Interval Functions
Unit | Description |
millisecond | Milliseconds |
second | Seconds |
minute | Minutes |
hour | Hours |
day | Days |
week | Weeks |
month | Months |
quarter | Quarters of a year |
year | Year |
date_add(unit, value, timestamp) → [same as input]
- Adds an interval value of type unit to timestamp. Subtraction can be performed by using a negative value.
- Example:
Query | Output |
SELECT date_add(‘month’, 1, current_timestamp) | 2019-09-16 06:59:55.425 UTC |
SELECT date_add(‘day’, 1, current_timestamp) | 2019-08-17 07:01:26.834 UTC |
date_diff(unit, timestamp1, timestamp2) → bigint
- Returns timestamp2 – timestamp1 expressed in terms of unit.
- Example:
Query | Output |
SELECT date_diff(‘day’, current_timestamp, date_add(‘day’, 10, current_timestamp)) | 10 |
parse_duration(string) → interval
- Parses string of format value unit into an interval, where value is fractional number of unit values
- Example:
Query | Output |
SELECT parse_duration(‘42.8ms’) | 0 00:00:00.043 |
SELECT parse_duration(‘3.81 d’) | 3 19:26:24.000 |
SELECT parse_duration(‘5m’) | 0 00:05:00.000 |
date_format(timestamp, format) → varchar
- Formats timestamp as a string using format (converts timestamp to string)
- Example:
Query | Output |
SELECT date_format(current_timestamp, ‘%Y-%m-%d’) | 2019-08-16 |
date_parse(string, format) → timestamp
- Parses string into a timestamp using format. (converts string to timestamp)
- Example:
Query | Output |
SELECT date_parse(‘2019-08-16’, ‘%Y-%m-%d’) | 2019-08-16 00:00:00.000 |
day(x) → bigint
- Returns the day of the month from x.
- Example:
Query | Output |
SELECT day(date_parse(‘2019-08-16’, ‘%Y-%m-%d’)) | 16 |
day_of_month(x) → bigint
- This is an alias for day().
day_of_week(x) → bigint
- Returns the ISO day of the week from x. The value ranges from 1 (Monday) to 7 (Sunday).
- Example:
Query | Output |
SELECT day_of_week(date_parse(‘2019-08-16’, ‘%Y-%m-%d’)) | 5 |
year(x) → bigint
- Returns the year from x.
Aggregate Functions
array_agg(x) → array<[same as input]>
- Returns an array created from the input x elements
- The array_agg() function is an aggregate function that accepts a set of values and returns an array in which each value in the input set is assigned to an element of the array.
- Syntax:
array_agg(expression [ORDER BY [sort_expression {ASC | DESC}], […]) - Example:
Query |
SELECT title, array_agg (first_name || ‘ ‘ || last_name) actors FROM film |
SELECT title, array_agg (first_name || ‘ ‘ || last_name ORDER BY first_name) actors FROM film |
avg(x) → double
- Returns the average (arithmetic mean) of all input values
bool_and(boolean) → boolean
- Returns TRUE if every input value is TRUE, otherwise FALSE.
bool_or(boolean) → boolean
- Returns TRUE if any input value is TRUE, otherwise FALSE.
count(*) → bigint
- Returns the number of input rows.
count(x) → bigint
- Returns the number of non-null input values.
count_if(x) → bigint
- Returns the number of TRUE input values. This function is equivalent to count(CASE WHEN x THEN 1 END).
arbitrary(x) → [same as input]
- Returns an arbitrary non-null value of x, if one exists.
- Arbitrary chooses one value out of a set of values. arbitrary is useful for silencing warnings about values neither grouped by or aggregated over.
max_by(x, y) → [same as x]
- Returns the value of x associated with the maximum value of y over all input values.
- The max_by takes two arguments and returns the value of the first argument for which the value of the second argument is maximized.
- If multiple rows maximize the result of the second value, and arbitrary first value is chosen from among them. max_by can be used with both numeric and non-numeric data.
- Example:
Query | Output |
SELECT max_by(close_date, close_value) as date_of_max_sale FROM sales_pipeline | query returns the date where close_value is maximum |
max_by(x, y, n) → array<[same as x]>
- Returns n values of x associated with the n largest of all input values of y in descending order of y.
min_by(x, y) → [same as x]
- Returns the value of x associated with the minimum value of y over all input values.
min_by(x, y, n) → array<[same as x]>
- Returns n values of x associated with the n smallest of all input values of y in ascending order of y.
max(x, n) → array<[same as x]>
- Returns n largest values of all input values of x.
min(x) → [same as input]
- Returns the minimum value of all input values.
min(x, n) → array<[same as x]>
- Returns n smallest values of all input values of x.
reduce_agg(inputValue T, initialState S, inputFunction(S, T, S), combineFunction(S, S, S)) → S
- Reduces all input values into a single value. inputFunction will be invoked for each input value. In addition to taking the input value, inputFunction takes the current state, initially initialState, and returns the new state. combineFunction will be invoked to combine two states into a new state. The final state is returned
- Example:
Query | Output |
SELECT id, reduce_agg(value, 0, (a, b) -> a + b, (a, b) -> a + b) FROM (VALUES (1, 2) (1, 3), (1, 4), (2, 20), (2, 30), (2, 40) ) AS t(id, value) GROUP BY id | (1, 9), (2, 90) |
Array Functions and Operators
Subscript Operator: [ ]
- The [ ] operator is used to access an element of an array and is indexed starting from one:
- If a column has values like [1, 3], [45, 46]
- Example:
Query | Output |
SELECT column[1] | 1, 45 |
Concatenation Operator: ||
- The || operator is used to concatenate an array with an array or an element of the same type.
- Example:
Query | Output |
SELECT ARRAY [1] || ARRAY [2] | [1, 2] |
SELECT ARRAY [1] || 2 | [1, 2] |
SELECT 2 || ARRAY [1] | [2, 1] |
array_distinct(x) → array
- Remove duplicate values from the array x.
- Example:
Query | Output |
SELECT array_distinct(ARRAY[1, 1, 1, 2]) | [1, 2] |
array_intersect(x, y) → array
- Returns an array of the elements in the intersection of x and y, without duplicates.
- Common elements in two arrays without duplicates.
- Example:
Query | Output |
SELECT array_intersect(ARRAY[1, 1, 1, 2], ARRAY[10, 15, 1, 100]) | [1] |
array_union(x, y) → array
- Returns an array of the elements in the union of x and y, without duplicates.
- Example:
Query | Output |
SELECT array_union(ARRAY[1, 1, 1, 2], ARRAY[10, 15, 1, 100]) | [1, 2, 10, 15, 100] |
array_except(x, y) → array
- Returns an array of elements in x but not in y, without duplicates.
- Example:
Query | Output |
SELECT array_except(ARRAY[1, 1, 1, 2], ARRAY[10, 15, 1, 100]) | [2] |
array_join(x, delimiter, null_replacement) → varchar
- Concatenates the elements of the given array using the delimiter and an optional string to replace nulls.
- Example:
Query | Output |
SELECT array_join(ARRAY[1, 71, 81, 92, null], ‘/’, ‘abcd’) | 1/71/81/92/abcd |
array_max(x) → x
- Returns the maximum value of input array.
- Example:
Query | Output |
SELECT array_max(ARRAY[1, 71, 81, 92, 100]) | 100 |
SELECT array_max(ARRAY[1, 71, 81, 92, null]) | null |
array_min(x) → x
- Returns the minimum value of input array.
- Example:
Query | Output |
SELECT array_min(ARRAY[1, 71, 81, 92, 100]) | 1 |
array_position(x, element) → bigint
- Returns the position of the first occurrence of the element in array x (or 0 if not found).
- Example:
Query | Output |
SELECT array_position(ARRAY[1, 71, 81, 92, 100], 100) | 5 |
array_remove(x, element) → array
- Remove all elements that equal element from array x.
- Example:
Query | Output |
SELECT array_remove(ARRAY[1, 71, 81, 92, 100, 1], 1) | [71, 81, 92, 100] |
array_sort(x) → array
- Sorts and returns the array x. The elements of x must be orderable. Null elements will be placed at the end of the returned array.
- Example:
Query | Output |
SELECT array_sort(ARRAY[1, null, 8, 9, 71, 81, 92, 100, 12, 51, 10, 7, 1, null]) | [1, 1, 7, 8, 9, 10, 12, 51, 71, 81, 92, 100, null, null] |
array_sort(array(T), function(T, T, int)) → array(T)
- Sorts and returns the array based on the given comparator function. The comparator will take two nullable arguments representing two nullable elements of the array. It returns -1, 0, or 1 as the first nullable element is less than, equal to, or greater than the second nullable element.
- If the comparator function returns other values (including NULL), the query will fail and raise an error.
- Example:
Query | Output |
SELECT array_sort(ARRAY [3, 2, 5, 1, 2], (x, y) -> IF(x < y, 1, IF(x = y, 0, -1))) | [5, 3, 2, 2, 1] |
cardinality(x) → bigint
- Returns the cardinality (size) of the array x.
- Example:
Query | Output |
SELECT cardinality(ARRAY[1, 81, 92, 100, 12, 51, 10]) | 7 |
arrays_overlap(x, y) → boolean
- Tests if arrays x and y have any any non-null elements in common. Returns null if there are no non-null elements in common but either array contains null.
- Example:
Query | Output |
SELECT arrays_overlap(ARRAY[1, 81, 92, 100, 12, 51, 10], ARRAY[101]) | false |
SELECT arrays_overlap(ARRAY[1, 81, 92, 100, 12, 51, 10], ARRAY[101, 51]) | true |
concat(array1, array2, …, arrayN) → array
- Concatenates the arrays array1, array2, …, arrayN. This function provides the same functionality as the SQL-standard concatenation operator (||).
contains(x, element) → boolean
- Returns true if the array x contains the element.
- Example:
Query | Output |
SELECT contains(ARRAY[1, 81, 92, 100, 12, 51, 10], 1) | true |
SELECT contains(ARRAY[‘abcd’, ‘test’, ‘xyz’], ‘xyz’) | true |
element_at(array(E), index) → E
- SQL array indices start at 1
- Returns element of array at given index. If index > 0, this function provides the same functionality as the SQL-standard subscript operator ([]). If index < 0, element_at accesses elements from the last to the first.
- Example:
Query | Output |
SELECT contains(ARRAY[1, 81, 92, 100, 12, 51, 10], 2) | 81 |
filter(array(T), function(T, boolean)) → array(T)
- Constructs an array from those elements of array for which function returns true.
- Example:
Query | Output |
SELECT filter(ARRAY [5, -6, NULL, 7], x -> x > 0) | [5, 7] |
flatten(x) → array
- Flattens an array(array(T)) to an array(T) by concatenating the contained arrays.
reduce(array(T), initialState S, inputFunction(S, T, S), outputFunction(S, R)) → R
- Returns a single value reduced from array. inputFunction will be invoked for each element in array in order.
- In addition to taking the element, inputFunction takes the current state, initially initialState, and returns the new state. outputFunction will be invoked to turn the final state into the result value.
- It may be the identity function (i -> i).
- Example:
Query | Output |
SELECT reduce(ARRAY [5, 20, NULL, 50], 0, (s, x) -> s + COALESCE(x, 0), s -> s) | 75 |
SELECT reduce(ARRAY [5, 20, NULL, 50], 1, (s, x) -> s * COALESCE(x, 1), s -> s) | 5000 |
repeat(element, count) → array
- Repeat element for count times
reverse(x) → array
- Returns an array which has the reversed order of array x.
- Example:
Query | Output |
SELECT reverse(ARRAY[2, 5]) | [5, 2] |
sequence(start, stop) → array(bigint)
- Generate a sequence of integers from start to stop, increments by 1 if start is less than or equal to stop, otherwise decrements by 1.
- Example:
Query | Output |
SELECT sequence(2, 7) | [2, 3, 4, 5, 6, 7] |
SELECT sequence(7, 1) | [7, 6, 5, 4, 3, 2, 1] |
sequence(start, stop, step) → array(bigint)
- Generate a sequence of integers from start to stop, incrementing by step.
sequence(start, stop) → array(date)
- Generate a sequence of dates from start date to stop date, incrementing by 1 day if start date is less than or equal to stop date, otherwise -1 day.
sequence(start, stop, step) → array(date)
- Generate a sequence of dates from start to stop, incrementing by step. The type of step can be either INTERVAL DAY TO SECOND or INTERVAL YEAR TO MONTH.
shuffle(x) → array
- Generate a random permutation of the given array x.
- Example:
Query | Output |
SELECT shuffle(ARRAY[1, 7, 2]) | [7, 1, 2] |
slice(x, start, length) → array
- Subsets array x starting from index start (or starting from the end if start is negative) with a length of length.
- Example:
Query | Output |
SELECT slice(ARRAY[1, 7, 2, 87, 12, 09], 2, 4) | [7, 2, 87, 12] |
transform(array(T), function(T, U)) → array(U)
- Returns an array that is the result of applying function to each element of array
- Example:
Query | Output |
SELECT transform(ARRAY [5, NULL, 6], x -> COALESCE(x, 0) + 1) | [6, 1, 7] |
SELECT transform(ARRAY [‘x’, ‘abc’, ‘z’], x -> x || ‘0’) | [‘x0’, ‘abc0’, ‘z0’] |
zip(array1, array2[, …]) → array(row)
- Merges the given arrays, element-wise, into a single array of rows. The M-th element of the N-th argument will be the N-th field of the M-th output element. If the arguments have an uneven length, missing values are filled with NULL.
- Example:
Query | Output |
SELECT zip(ARRAY[1, 2], ARRAY[‘1b’, null, ‘3b’]) | [ROW(1, ‘1b’), ROW(2, null), ROW(null, ‘3b’)] |
zip_with(array(T), array(U), function(T, U, R)) → array(R)
- Merges the two given arrays, element-wise, into a single array using function. If one array is shorter, nulls are appended at the end to match the length of the longer array, before applying function:
- Example:
Query | Output |
SELECT zip_with(ARRAY[1, 3, 5], ARRAY[‘a’, ‘b’, ‘c’], (x, y) -> (y, x)) | [ROW(‘a’, 1), ROW(‘b’, 3), ROW(‘c’, 5)] |
Window functions
Window functions perform calculations across rows of the query result. They run after the HAVING clause but before the ORDER BY clause. Invoking a window function requires special syntax using the OVER clause to specify the window. A window has three components:
- The partition specification, which separates the input rows into different partitions. This is analogous to how the GROUP BY clause separates rows into different groups for aggregate functions.
- The ordering specification, which determines the order in which input rows will be processed by the window function.
- The window frame, which specifies a sliding window of rows to be processed by the function for a given row. If the frame is not specified, it defaults to RANGE UNBOUNDED PRECEDING, which is the same as RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. This frame contains all rows from the start of the partition up to the last peer of the current row.
rank() → bigint
- Returns the rank of a value in a group of values. The rank is one plus the number of rows preceding the row that are not peer with the row. Thus, tie values in the ordering will produce gaps in the sequence. The ranking is performed for each window partition.
- As shown below, the rank function produces a numerical rank within the current row’s partition for each distinct ORDER BY value, in the order defined by the ORDER BY clause. rank needs no explicit parameter, because its behavior is entirely determined by the OVER clause.
- Query:
WITH dataset AS
(SELECT 'name' AS rows, ARRAY['a', 'a', 'a', 'b', 'b', 'c', 'c', 'd', 'e'] AS sample )
SELECT rows,
value,
rank()
OVER (ORDER BY value) AS rank
FROM dataset
CROSS JOIN UNNEST(sample) AS t(value)
- Output:
rows | value | rank | |
1 | name | a | 1 |
2 | name | a | 1 |
3 | name | a | 1 |
4 | name | b | 4 |
5 | name | b | 4 |
6 | name | c | 6 |
7 | name | c | 6 |
8 | name | d | 8 |
9 | name | e | 9 |
row_number() → bigint
- returns a unique, sequential number for each row, starting with one, according to the ordering of rows within the window partition.
- Query:
WITH dataset AS
(SELECT 'name' AS rows, ARRAY['a', 'a', 'a', 'b', 'b', 'c', 'c', 'd', 'e'] AS sample )
SELECT rows,
value,
row_number()
OVER (ORDER BY value) AS row_num
FROM dataset
CROSS JOIN UNNEST(sample) AS t(value)
- Output:
rows | value | row_num | |
1 | name | a | 1 |
2 | name | a | 2 |
3 | name | a | 3 |
4 | name | b | 4 |
5 | name | b | 5 |
6 | name | c | 6 |
7 | name | c | 7 |
8 | name | d | 8 |
9 | name | e | 9 |
dense_rank() → bigint
- returns the rank of a value in a group of values. This is similar to rank(), except that tie values do not produce gaps in the sequence.
- Query:
WITH dataset AS
(SELECT 'name' AS rows, ARRAY['a', 'a', 'a', 'b', 'b', 'c', 'c', 'd', 'e'] AS sample )
SELECT rows,
value,
dense_rank()
OVER (ORDER BY value) AS dense_ran
FROM dataset
CROSS JOIN UNNEST(sample) AS t(value)
- Output:
rows | value | dense_ran | |
1 | name | a | 1 |
2 | name | a | 1 |
3 | name | a | 1 |
4 | name | b | 2 |
5 | name | b | 2 |
6 | name | c | 3 |
7 | name | c | 3 |
8 | name | d | 4 |
9 | name | e | 5 |
Map Functions and Operators
map(array(K), array(V)) → map(K, V)
- Returns a map created using the given key/value arrays.
- Example:
Query | Output |
SELECT map(ARRAY[1,3], ARRAY[2,4]) AS a | {1=2, 3=4} |
SELECT map(ARRAY[‘k1′,’k2’], ARRAY[‘value1’, ‘value2’]) | {k1=value1, k2=value2} |
Subscript Operator: [ ]
- The [ ] operator is used to retrieve the value corresponding to a given key from a map.
- Example:
Query | Output |
SELECT map(ARRAY[1,3], ARRAY[2,4]) AS a | {1=2, 3=4} |
SELECT a[1] FROM (SELECT map(ARRAY[1,3], ARRAY[2,4]) AS a) | 2 |
element_at(map(K, V), key) → V
- Returns value for given key, or NULL if the key is not contained in the map.
- Example:
Query |
SELECT element_at(a, ‘k1’) FROM (SELECT map(ARRAY[‘k1′,’k2’], ARRAY[‘value1’, ‘value2’]) AS a) |
cardinality(x) → bigint
- Returns the cardinality (size) of the map x.
map() → map<unknown, unknown>
- Returns an empty map.
- Example:
Query | Output |
SELECT map() | {} |
map_from_entries(array(row(K, V))) → map(K, V)
- Returns a map created from the given array of entries.
- Example:
Query | Output |
SELECT map_from_entries(ARRAY[(1, ‘x’), (2, ‘y’)]) | {1 -> ‘x’, 2 -> ‘y’} |
map_agg(key, value) → map(K, V)
- Returns a map created from the input key/value pairs.
multimap_from_entries(array(row(K, V))) → map(K, array(V))
- Returns a multimap created from the given array of entries. Each key can be associated with multiple values.
- Example:
Query | Output |
SELECT multimap_from_entries(ARRAY[(1, ‘x’), (2, ‘y’), (1, ‘z’)]) | {1 -> [‘x’, ‘z’], 2 -> [‘y’]} |
map_concat(map1(K, V), map2(K, V), …, mapN(K, V)) → map(K, V)
- Returns the union of all the given maps. If a key is found in multiple given maps, that key’s value in the resulting map comes from the last one of those maps.
map_filter(map(K, V), function(K, V, boolean)) → map(K, V)
- Constructs a map from those entries of map for which function returns true.
- Example:
Query | Output |
SELECT map_filter(MAP(ARRAY[], ARRAY[]), (k, v) -> true) | {} |
SELECT map_filter(MAP(ARRAY[10, 20, 30], ARRAY[‘a’, NULL, ‘c’]), (k, v) -> v IS NOT NULL) | {10 -> a, 30 -> c} |
map_keys(x(K, V)) → array(K)
- Returns all the keys in the map x.
- Example:
Query | Output |
SELECT map_keys(map(ARRAY[‘k1′,’k2’], ARRAY[‘value1’, ‘value2’])) | [k1, k2] |
map_values(x(K, V)) → array(V)
- Returns all the values in the map x.
- Example:
Query | Output |
SELECT map_values(map(ARRAY[‘k1′,’k2’], ARRAY[‘value1’, ‘value2’])) | [value1, value2] |
transform_keys(map(K1, V), function(K1, V, K2)) → map(K2, V)
- Returns a map that applies function to each entry of map and transforms the keys.
- Example:
Query | Output |
SELECT transform_keys(MAP(ARRAY[], ARRAY[]), (k, v) -> k + 1) | {} |
SELECT transform_keys(MAP(ARRAY [1, 2, 3], ARRAY [‘a’, ‘b’, ‘c’]), (k, v) -> k + 1) | {2 -> a, 3 -> b, 4 -> c} |
SELECT transform_keys(MAP(ARRAY [‘a’, ‘b’], ARRAY [1, 2]), (k, v) -> k || CAST(v as VARCHAR)) | {a1 -> 1, b2 -> 2} |
transform_values(map(K, V1), function(K, V1, V2)) → map(K, V2)
- Returns a map that applies function to each entry of map and transforms the values.
- Example:
Query | Output |
SELECT transform_values(MAP(ARRAY[], ARRAY[]), (k, v) -> v + 1) | {} |
SELECT transform_values(MAP(ARRAY [1, 2, 3], ARRAY [10, 20, 30]), (k, v) -> v + k) | {1 -> 11, 2 -> 22, 3 -> 33} |
Thanks for the read. Please feel free to reach out with your comments.
This story is authored by PV Subbareddy. Subbareddy is a Big Data Engineer specializing on Cloud Big Data Services and Apache Spark Ecosystem.
Comments