VQL Survival Guide
Author: Husam Gameel (Oppenheim3r)
Table of Contents
1. VQL Design Philosophy & Core Concepts
Before writing queries, you must understand three crucial architectural concepts in VQL:
A. Dynamic Data Sources (Plugins vs. Tables)
In SQL, you query static tables (FROM users). In VQL, the data sources are plugins—which are actual Go functions running inside the Velociraptor agent (FROM pslist(), FROM glob()).
- Plugins return rows dynamically.
- Plugins accept keyword arguments (e.g.,
pslist(pid=1234)).
B. Lazy Evaluation (Doing Less Work)
VQL goes to extreme lengths to postpone evaluation. When you specify a column expression, VQL wraps it in a Lazy Evaluator.
- If a row is filtered out by a
WHEREclause before a column needs to be printed, that column's function is never evaluated. - This is critical when using expensive functions like
hash()orupload().
💡 Performance Tip: Order yourWHEREconditions from cheapest to most expensive (left-to-right). If a cheap string match fails, VQL aborts early and won't run an expensive file hash function on that row!
Bad (Expensive):WHERE hash(path=FullPath).SHA256 = "..." AND FullPath =~ "system32"
Good (Cheap First):WHERE FullPath =~ "system32" AND hash(path=FullPath).SHA256 = "..."
C. The Scope
The scope is a stack-like data structure containing active variables, plugins, and helper functions.
- When a query executes, nested child scopes are created (e.g., inside
foreach()loops). - If a variable or column name inside a subquery matches a variable in the parent scope, the inner one masks the outer one. You can use
ASto rename variables and avoid collisions.
2. Level 1: Basic Queries & Syntax
Let's master the basic SQL-like skeleton of a VQL query:
SELECT Column1, Column2, Column3 AS CustomName
FROM plugin(arg1="value", arg2=123)
WHERE Column1 =~ "regex_pattern"
ℹ️ Note: Unlike SQL, VQL does not require or allow a semicolon (;) at the end of statements!
2.1 Basic System Information Query
Let's get basic system facts using the info() plugin:
SELECT Hostname, OS, Platform, KernelVersion, Architecture, IsAdmin
FROM info()
Output (JSON format):
[
{
"Hostname": "Rick",
"OS": "windows",
"Platform": "Microsoft Windows 11 Pro",
"KernelVersion": "10.0.22631",
"Architecture": "amd64",
"IsAdmin": true
}
]
2.2 Filtering and the Regex Operator (=~)
The regex match operator (=~) is one of the most frequently used operators in DFIR queries. Let's list running processes using pslist(), filtering for Microsoft Edge or Chrome:
SELECT Pid, Name, CommandLine, Exe
FROM pslist()
WHERE Name =~ "chrome|msedge"
2.3 Defining Stored Queries (LET Variable Binding)
You can declare queries as variables using the LET keyword. Stored queries are lazy and will only be executed when you reference them!
-- Declare the stored query (not evaluated yet!)
LET FindBrowsers = SELECT Pid, Name, Exe
FROM pslist()
WHERE Name =~ "chrome|msedge|firefox"
-- Execute the query and limit the results to 3
SELECT * FROM FindBrowsers
LIMIT 3
2.4 Multi-line Raw Strings (''')
Windows file paths use backslashes (\). Standard strings require escaping ("C:\\Windows\\System32"). To avoid escaping nightmares, VQL provides raw strings using triple single-quotes ('''). Inside a raw string, everything is literal!
-- Escaping required:
SELECT * FROM glob(globs="C:\\Windows\\System32\\*.exe")
-- No escaping required (Raw string):
SELECT * FROM glob(globs='''C:\Windows\System32\*.exe''')
3. Level 2: Foreach, Memoize & Subqueries
In standard SQL, you combine two tables using JOIN. Because VQL has no static tables or static indexes, it does not have a JOIN operator. Instead, VQL uses the foreach() plugin and memoize() function to combine dynamic data.
3.1 The Mighty foreach() Loop
The foreach() plugin runs an inner query for every row returned by an outer query.
Syntax:
SELECT * FROM foreach(
row={ SELECT Exe, Pid FROM pslist() },
query={ SELECT Pid, ModTime, Size, FullPath FROM stat(filename=Exe) }
)
In this query:
rowquery runs first, yielding process executables (Exe).- For each row, VQL creates a nested scope containing
Exeand runs thequerysubquery. - The
stat()plugin takes theExevariable from the outer row and returns file details.
3.2 Dynamic Join: Processes and Network Connections
Let's write a query to map active network connections (netstat()) to their respective process names (pslist()).
SELECT * FROM foreach(
row={
SELECT Family, Type, Laddr.IP AS LocalIP, Laddr.Port AS LocalPort,
Raddr.IP AS RemoteIP, Raddr.Port AS RemotePort, Pid
FROM netstat()
WHERE RemoteIP != "0.0.0.0" AND RemoteIP != "127.0.0.1"
},
query={
SELECT Name AS ProcessName, Exe AS ProcessPath, LocalIP, LocalPort, RemoteIP, RemotePort, Pid
FROM pslist(pid=Pid)
}
)
⚠️ Performance Alert: This query is extremely slow on busy systems! Why? Because for every single socket returned bynetstat(), thepslist()plugin runs again to find the process name. If there are 300 sockets,pslist()runs 300 times!
3.3 Solving Performance with Materialization (<=)
To solve the slow nested-query problem, we can materialize the process list into memory using the <= operator.
LET x = query(Lazy - runs the query every time it is referenced).LET x <= query(Materialized - runs the query once, caches the results in an in-memory array, and accesses it instantly).
-- Step 1: Cache the process list in memory (runs exactly ONCE)
LET ProcessCache <= SELECT Pid AS ProcessPid, Name AS ProcessName, Exe AS ProcessPath
FROM pslist()
-- Step 2: Query netstat and map using a fast in-memory subquery lookup
SELECT Laddr.IP AS LocalIP, Laddr.Port AS LocalPort,
Raddr.IP AS RemoteIP, Raddr.Port AS RemotePort, Pid,
{
SELECT ProcessName
FROM ProcessCache
WHERE ProcessPid = Pid
} AS ProcessName
FROM netstat()
WHERE RemoteIP != "0.0.0.0" AND RemoteIP != "127.0.0.1"
This materialized lookup runs in milliseconds instead of seconds!
3.4 Speeding up with memoize()
For large lookups, you can use the memoize() function. It builds an indexed hash map in memory on a specific key:
-- Re-cache every 10 seconds (10000ms)
LET ProcessLookup <= memoize(
key="ProcessPid",
query={ SELECT Pid AS ProcessPid, Name, Exe FROM pslist() },
period=10000
)
-- Access the cached process details using get() in O(1) time!
SELECT Laddr.IP AS LocalIP, Laddr.Port AS LocalPort, Pid,
get(item=ProcessLookup, field=Pid).Name AS ProcessName
FROM netstat()
4. Level 3: Log Parser (Grok, CSV, and Aggregate Functions)
Log parsing is a fundamental task for detection engineers. VQL handles CSVs, JSONs, XMLs, and unstructured text files beautifully.
4.1 Parsing Structured CSV Logs
Let's parse a CSV log file on the fly using parse_csv().
LET RawLogs = '''Timestamp,User,Action,IPAddress
2026-05-19T10:00:00Z,admin,login_success,192.168.1.50
2026-05-19T10:01:05Z,attacker,login_failed,45.2.3.9
2026-05-19T10:02:10Z,attacker,login_failed,45.2.3.9
2026-05-19T10:03:00Z,attacker,login_success,45.2.3.9
'''
SELECT Timestamp, User, Action, IPAddress
FROM parse_csv(accessor="data", filename=RawLogs)
WHERE Action = "login_success"
4.2 Grouping and Aggregate Analysis (GROUP BY)
Let's group the failed logins by user and calculate counts using the aggregate count() function:
SELECT User, count() AS TotalFailedAttempts
FROM parse_csv(accessor="data", filename=RawLogs)
WHERE Action = "login_failed"
GROUP BY User
Output:
[
{
"User": "attacker",
"TotalFailedAttempts": 2
}
]
4.3 Parsing Unstructured Logs with grok()
When logs are unstructured text files (like web IIS logs or Linux syslogs), you can extract fields using Grok expressions (named regex patterns).
Let's parse raw web server log lines:
LET WebServerLogs = '''192.168.1.100 - - [19/May/2026:10:00:01 -0700] "GET /index.html HTTP/1.1" 200 452
192.168.1.150 - - [19/May/2026:10:01:45 -0700] "POST /login.php HTTP/1.1" 401 234
192.168.1.180 - - [19/May/2026:10:02:15 -0700] "GET /admin/cmd.aspx?cmd=whoami HTTP/1.1" 404 125
'''
-- Parse lines and extract fields using a custom Grok expression
LET parsed_lines = SELECT grok(
grok="%{IP:ClientIP} - - \\[%{HTTPDATE:Timestamp}\\] \"%{WORD:Method} %{NOTSPACE:URI} HTTP/%{NUMBER:Version}\" %{NUMBER:StatusCode:int} %{NUMBER:ResponseBytes:int}",
data=Line
) AS LogRecord
FROM parse_lines(accessor="data", filename=WebServerLogs)
-- Use foreach to deconstruct the parsed dictionaries into individual columns
SELECT * FROM foreach(row=parsed_lines, column="LogRecord")
WHERE StatusCode >= 400
Output:
[
{
"ClientIP": "192.168.1.150",
"Timestamp": "19/May/2026:10:01:45 -0700",
"Method": "POST",
"URI": "/login.php",
"Version": "1.1",
"StatusCode": 401,
"ResponseBytes": 234
},
{
"ClientIP": "192.168.1.180",
"Timestamp": "19/May/2026:10:02:15 -0700",
"Method": "GET",
"URI": "/admin/cmd.aspx?cmd=whoami",
"Version": "1.1",
"StatusCode": 404,
"ResponseBytes": 125
}
]