URL: https://en.wikipedia.org/wiki/Jaql

From Wikipedia, the free encyclopedia
Functional data processing and query language
Jaql
Paradigm: Functional
Designed by: Vuk Ercegovac (Google)
First appeared: October 9, 2008
Stable release: 0.5.1 / July 12, 2010
Implementation language: Java
OS: Cross-platform
License: Apache License 2.0
Website: code.google.com/p/jaql/m
Major implementations: IBM BigInsights

Jaql (pronounced "jackal") is a functional data processing and query language most commonly used for JSON query processing on big data.

It started as an open-source project at Google,[1] and its latest release dates from 2010-07-12. IBM[2] adopted it as the primary data processing language for its Hadoop software package BigInsights.

Although it was developed for JSON, it supports a variety of other data sources, such as CSV, TSV, and XML.

A comparison[3] with other big data query languages such as Pig Latin and HiveQL illustrates the performance and usability aspects of these technologies.

Jaql supports[4] lazy evaluation, so expressions are only materialized when needed.
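
The idea of lazy evaluation can be illustrated with a minimal sketch. It is written in Python rather than Jaql, uses Python generators as a stand-in for Jaql's evaluation strategy, and all names in it (source, evaluated, pipeline) are hypothetical. It shows that building a pipeline computes nothing, and that only the inputs needed for the requested results are produced:

```python
from itertools import islice

evaluated = []  # records which source items were actually produced


def source():
    """Yield the numbers 1..5, noting each one as it is produced."""
    for n in range(1, 6):
        evaluated.append(n)
        yield n


# Building the pipeline computes nothing yet.
pipeline = (n * n for n in source() if n % 2 == 1)
assert evaluated == []

# Materializing two results pulls only as many inputs as are needed.
first_two = list(islice(pipeline, 2))
print(first_two)  # [1, 9]
print(evaluated)  # [1, 2, 3]
```

The source items 4 and 5 are never produced because no downstream consumer asks for them.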

Syntax

The basic concept of Jaql is

source -> operator(parameter) -> sink ;

where a sink can be a source for a downstream operator. A Jaql program therefore typically has the following structure, expressing a data processing graph:

source -> operator1(parameter) -> operator2(parameter) -> operator2(parameter) -> operator3(parameter) -> operator4(parameter) -> sink ;

For readability, Jaql programs are most commonly broken across lines before each arrow, an idiom that is also common in Twitter Scalding:

source -> operator1(parameter)
-> operator2(parameter)
-> operator2(parameter)
-> operator3(parameter)
-> operator4(parameter)
-> sink ;
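
The source -> operator -> sink structure can be mimicked in other languages by threading a value through a list of functions. The following is a rough Python analogue, not Jaql; the helper pipe and the operators keep_even and double are hypothetical names chosen for illustration:

```python
from functools import reduce


def pipe(source, *operators):
    """Feed `source` through each operator in order, like `source -> op1 -> op2`."""
    return reduce(lambda data, op: op(data), operators, source)


# Hypothetical operators; each takes a list and returns a new list.
def keep_even(xs):
    return [x for x in xs if x % 2 == 0]


def double(xs):
    return [2 * x for x in xs]


result = pipe([1, 2, 3, 4], keep_even, double)
print(result)  # [4, 8]
```

Because each operator's output is the next operator's input, any intermediate result can itself serve as a source for a further pipeline.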

Core operators

Source:[5]

Expand

Use the EXPAND expression to flatten nested arrays. This expression takes as input an array of nested arrays [[T]] and produces an output array [T], by promoting the elements of each nested array to the top-level output array.
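
The flattening described above, from [[T]] to [T], corresponds to the following Python sketch. It is an analogue of the semantics, not Jaql code, and the function name expand is only borrowed from the operator for readability:

```python
from itertools import chain


def expand(nested):
    """Promote the elements of each nested list to one top-level list."""
    return list(chain.from_iterable(nested))


flat = expand([[1, 2], [3], [], [4, 5]])
print(flat)  # [1, 2, 3, 4, 5]
```

Only one level of nesting is removed, and empty nested arrays contribute nothing to the output.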

Filter

Use the FILTER operator to remove elements from the specified input array. This operator takes as input an array of elements of type T and outputs an array of the same type, retaining those elements for which a predicate evaluates to true. It is the Jaql equivalent of the SQL WHERE clause. Example:

data=[
{name:"Jon Doe",income:20000,manager:false},
{name:"Vince Wayne",income:32500,manager:false},
{name:"Jane Dean",income:72000,manager:true},
{name:"Alex Smith",income:25000,manager:false}
];

data -> filter $.manager;

[
{
"income":72000,
"manager":true,
"name":"Jane Dean"
}
]

data -> filter $.income < 30000;

[
{
"income":20000,
"manager":false,
"name":"Jon Doe"
},
{
"income":25000,
"manager":false,
"name":"Alex Smith"
}
]
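
For readers more familiar with general-purpose languages, the two filters above behave like the following Python list comprehensions. This is an analogue of the semantics rather than Jaql; the variable names managers and low_income are introduced here for illustration:

```python
data = [
    {"name": "Jon Doe", "income": 20000, "manager": False},
    {"name": "Vince Wayne", "income": 32500, "manager": False},
    {"name": "Jane Dean", "income": 72000, "manager": True},
    {"name": "Alex Smith", "income": 25000, "manager": False},
]

# Keep only the records for which the predicate is true.
managers = [r for r in data if r["manager"]]
low_income = [r for r in data if r["income"] < 30000]

print([r["name"] for r in managers])    # ['Jane Dean']
print([r["name"] for r in low_income])  # ['Jon Doe', 'Alex Smith']
```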

Group

Use the GROUP expression to group one or more input arrays on a grouping key and apply an aggregate function per group.
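
As a rough illustration of grouping with a per-group aggregate, here is a Python sketch for the single-input case only. It is not Jaql; the helper group_by, the dept field, and the sample records are assumptions made for this example:

```python
from collections import defaultdict


def group_by(records, key, aggregate):
    """Group records by key(record), then apply aggregate(key, members) per group."""
    groups = defaultdict(list)
    for record in records:
        groups[key(record)].append(record)
    return [aggregate(k, members) for k, members in groups.items()]


# Hypothetical sample data.
employees = [
    {"dept": "sales", "income": 20000},
    {"dept": "sales", "income": 25000},
    {"dept": "hr", "income": 32500},
]

totals = group_by(
    employees,
    key=lambda r: r["dept"],
    aggregate=lambda dept, members: {
        "dept": dept,
        "total": sum(m["income"] for m in members),
    },
)
print(totals)
# [{'dept': 'sales', 'total': 45000}, {'dept': 'hr', 'total': 32500}]
```

Grouping several input arrays on a shared key, which the GROUP expression also covers, is not shown in this sketch.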

Join

Use the JOIN operator to express a join between two or more input arrays. This operator supports multiple types of joins, including natural, left-outer, right-outer, and outer joins.
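
The difference between an inner join and a left-outer join can be sketched in Python as follows. This is a simplified two-input equi-join on one shared field, not Jaql's JOIN; the function join, its how parameter, and the sample records are hypothetical:

```python
def join(left, right, key, how="inner"):
    """Equi-join two lists of dicts on `key`; how is "inner" or "left"."""
    # Index the right-hand side by join key for constant-time lookup.
    index = {}
    for rrow in right:
        index.setdefault(rrow[key], []).append(rrow)

    out = []
    for lrow in left:
        matches = index.get(lrow[key], [])
        for rrow in matches:
            out.append({**lrow, **rrow})
        # A left-outer join keeps unmatched left rows as they are.
        if not matches and how == "left":
            out.append(dict(lrow))
    return out


# Hypothetical sample data.
people = [{"id": 1, "name": "Jane Dean"}, {"id": 2, "name": "Jon Doe"}]
depts = [{"id": 1, "dept": "sales"}]

inner = join(people, depts, "id")
left = join(people, depts, "id", how="left")
print(inner)  # [{'id': 1, 'name': 'Jane Dean', 'dept': 'sales'}]
print(left)
# [{'id': 1, 'name': 'Jane Dean', 'dept': 'sales'}, {'id': 2, 'name': 'Jon Doe'}]
```

A right-outer join is the same operation with the inputs swapped, and a full outer join additionally keeps unmatched rows from both sides.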

Sort

Use the SORT operator to sort an input by one or more fields.
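
Sorting by more than one field can be pictured with Python's sorted and a tuple-valued key. This is an analogue of the behavior, not Jaql; the choice of fields and directions (manager ascending, then income descending) is an assumption made for the example:

```python
data = [
    {"name": "Jon Doe", "income": 20000, "manager": False},
    {"name": "Vince Wayne", "income": 32500, "manager": False},
    {"name": "Jane Dean", "income": 72000, "manager": True},
    {"name": "Alex Smith", "income": 25000, "manager": False},
]

# Primary key: manager flag (False sorts before True).
# Secondary key: income, negated so that higher incomes come first.
ordered = sorted(data, key=lambda r: (r["manager"], -r["income"]))

print([r["name"] for r in ordered])
# ['Vince Wayne', 'Alex Smith', 'Jon Doe', 'Jane Dean']
```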

Top

The TOP expression selects the first k elements of its input. If a comparator is provided, the output is semantically equivalent to sorting the input, then selecting the first k elements.
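
The stated equivalence between a top-k selection with a comparator and a sort followed by taking the first k elements can be checked with a small Python sketch. It is not Jaql; the function top and its parameters are hypothetical, and Python's heapq.nlargest stands in for whatever selection strategy a real implementation might use:

```python
import heapq
from itertools import islice


def top(items, k, key=None):
    """Return the first k items, or the k largest by `key` if one is given."""
    if key is None:
        return list(islice(items, k))
    return heapq.nlargest(k, items, key=key)


data = [
    {"name": "Jon Doe", "income": 20000},
    {"name": "Vince Wayne", "income": 32500},
    {"name": "Jane Dean", "income": 72000},
    {"name": "Alex Smith", "income": 25000},
]

by_income = lambda r: r["income"]
top_two = top(data, 2, key=by_income)

# Same result as sorting the whole input and slicing off the first k.
assert top_two == sorted(data, key=by_income, reverse=True)[:2]
print([r["name"] for r in top_two])  # ['Jane Dean', 'Vince Wayne']
```

Selecting the top k directly avoids fully sorting the input, which matters when k is small relative to the input size.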

Transform

Use the TRANSFORM operator to realize a projection or to apply a function to all items of an input.
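
Both uses, projecting fields and applying a function to every item, map onto a Python list comprehension. The sketch below is an analogue, not Jaql; the output field names and the monthly computation are invented for illustration:

```python
data = [
    {"name": "Jon Doe", "income": 20000, "manager": False},
    {"name": "Jane Dean", "income": 72000, "manager": True},
]

# Projection: keep only the name field of each record.
names = [{"name": r["name"]} for r in data]

# Function application: derive a new value from each record.
monthly = [{"name": r["name"], "monthly": r["income"] // 12} for r in data]

print(names)    # [{'name': 'Jon Doe'}, {'name': 'Jane Dean'}]
print(monthly)
# [{'name': 'Jon Doe', 'monthly': 1666}, {'name': 'Jane Dean', 'monthly': 6000}]
```

The output always has exactly one element per input element, which distinguishes a transform from a filter or an expand.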

References

  1. "Google Code Archive - Long-term storage for Google Code Project Hosting". code.google.com. Retrieved 2025-10-30.
  2. Initial Publication
  3. Stewart, Robert J.; Trinder, Phil W.; Loidl, Hans-Wolfgang (2011). "Comparing High Level MapReduce Query Languages". Advanced Parallel Processing Technologies. Lecture Notes in Computer Science. Vol. 6965. pp. 58–72. doi:10.1007/978-3-642-24151-2_5. ISBN 978-3-642-24150-5.
  4. "jaql Archives - Matouš Havlena". 2013-07-18. Retrieved 2025-10-30.
  5. IBM BigInsights Documentation