github microsoft/msticpy v2.6.0
v2.6.0 Parallel Queries, Velociraptor data

15 months ago

The three big changes in this release are:

  • Executing MS Sentinel and Kusto queries in parallel across multiple instances
  • Threaded (parallel) execution of time-split queries
  • Addition of data provider to query local (exported) Velociraptor logs

Many thanks to @d3vzer0 for inspiration and early work on the threaded query feature.
Many thanks to @juju4 for inspiration and work on the Velociraptor support.

Support for running a query across multiple connections (with optional threaded operation)

It is common for data services to be spread across multiple tenants or workloads, for example multiple Sentinel workspaces, Microsoft Defender subscriptions or Splunk instances. You can use the MSTICPy QueryProvider to run a query across multiple connections and return the results in a single DataFrame.

To create a multi-instance provider:

  • Create an instance of a QueryProvider for your data source and execute the connect() method to connect to the first instance of your data service.
  • Then use the add_connection() method to add further instance connections. This takes the same parameters as the connect() method (these parameters vary by data provider).

add_connection() also supports an alias parameter to allow you to refer to the connection by a friendly name.

    qry_prov = QueryProvider("MSSentinel")
    qry_prov.connect(workspace="Workspace1")
    qry_prov.add_connection(workspace="Workspace2", alias="Workspace2")
    qry_prov.list_connections()

When you now run a query for this provider, the query will be run on all of the connections and the results will be returned as a single dataframe.

    test_query = '''
        SecurityAlert
        | take 5
        '''

    query_test = qry_prov.exec_query(query=test_query)
    query_test.head()

Some of the MSTICPy drivers support asynchronous execution of queries against multiple instances, so that the time taken to run the query is much reduced compared to running the queries sequentially. Drivers that support asynchronous queries will use this automatically. The initial set of multi-threaded drivers are:

  • MSSentinel_New (the new version of the MSSentinel driver)
  • Kusto_New (the new version of the Kusto/Azure Data Explorer driver)

By default, the queries will use at most 4 concurrent threads. You can override this by initializing the QueryProvider with the max_threads parameter set to the number of threads you want. Be cautious about using too many simultaneous connections, however, because of the potential impact on cluster performance.

    qry_prov = QueryProvider("MSSentinel", max_threads=10)
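
To illustrate the general idea of running one query against several connections with a capped thread pool, here is a minimal standard-library sketch. It is not MSTICPy's implementation: run_query and run_on_all are hypothetical stand-ins, and the "results" are plain dictionaries rather than DataFrames.

```python
from concurrent.futures import ThreadPoolExecutor


def run_query(workspace: str, query: str) -> list[dict]:
    """Hypothetical stand-in for running one query against one connection."""
    # A real driver would call the data service here.
    return [{"workspace": workspace, "query": query.strip()}]


def run_on_all(workspaces: list[str], query: str, max_threads: int = 4) -> list[dict]:
    """Run the query against every workspace, using at most max_threads threads."""
    with ThreadPoolExecutor(max_workers=max_threads) as executor:
        # executor.map preserves the order of the input workspaces.
        per_workspace = executor.map(lambda ws: run_query(ws, query), workspaces)
        # Flatten the per-connection results into one combined list.
        return [row for rows in per_workspace for row in rows]


results = run_on_all(["Workspace1", "Workspace2"], "SecurityAlert | take 5")
```

The max_threads argument in this sketch plays the same role as the cap described above: a larger value allows more queries in flight at once, at the cost of more simultaneous load on the service.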

Multi-threaded support for split/sharded queries

MSTICPy has supported splitting large queries by time-slice for a while. This is documented in "Splitting a Query into time chunks". With this release, we've added asynchronous support for this (if the driver supports threaded/async operation) so that multiple chunks of the query will run in parallel.

    qry_prov.SecurityAlert.list_alerts(start=start, end=end, split_by="1d")

Use the parameter split_query_by or split_by to specify a time range. The time unit uses the same syntax as pandas time intervals (e.g. "1D", "4h"); see the pandas documentation for more details.
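
As a rough picture of what splitting by time means, the following sketch divides a start/end range into consecutive chunks using only the standard library. It is an illustration of the concept, not MSTICPy's code: split_time_range is a hypothetical helper, and timedelta(days=1) stands in for a split_by value of "1D".

```python
from datetime import datetime, timedelta


def split_time_range(start, end, delta):
    """Split [start, end) into consecutive chunks no longer than delta."""
    if delta <= timedelta(0):
        raise ValueError("delta must be a positive time interval")
    chunks = []
    chunk_start = start
    while chunk_start < end:
        # The final chunk is shortened so that it never runs past `end`.
        chunk_end = min(chunk_start + delta, end)
        chunks.append((chunk_start, chunk_end))
        chunk_start = chunk_end
    return chunks


chunks = split_time_range(
    start=datetime(2023, 7, 1),
    end=datetime(2023, 7, 3, 12),
    delta=timedelta(days=1),  # analogous to split_by="1D"
)
for chunk_start, chunk_end in chunks:
    print(chunk_start, "->", chunk_end)
```

Each (start, end) pair could then be used as the time bounds of one sub-query, which is what makes the chunks suitable for running in parallel.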

In this release sharding is also supported for ad hoc queries as long as you add "start" and "end" parameters to the query (this is still experimental, so let us know if you have issues with this).

Velociraptor Local Data Provider

The Velociraptor data provider can read Velociraptor log files and provide convenient query functions for each data set in the output logs.

The provider can read files from one or more hosts, stored in separate folders. The files are read, converted to pandas DataFrames and grouped by table/event. Multiple log files of the same type (when reading in data from multiple hosts) are concatenated into a single DataFrame.
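
The grouping step can be pictured with a small standard-library sketch. This is not the provider's actual code: group_logs_by_table is a hypothetical helper, and it assumes, purely for illustration, that each host folder holds one ".json" file per table and that the file name (without extension) identifies the table.

```python
import tempfile
from collections import defaultdict
from pathlib import Path


def group_logs_by_table(data_paths):
    """Map each table name (file stem) to the log files found for it."""
    tables = defaultdict(list)
    for folder in data_paths:
        # Only the top level of each folder is searched in this sketch.
        for log_file in sorted(Path(folder).expanduser().glob("*.json")):
            tables[log_file.stem].append(log_file)
    return dict(tables)


# Build two throwaway "host" folders to demonstrate the grouping.
with tempfile.TemporaryDirectory() as tmp:
    for host in ("host1", "host2"):
        host_dir = Path(tmp) / host
        host_dir.mkdir()
        # Hypothetical file name, used only for this example.
        (host_dir / "Windows.Forensics.ProcessInfo.json").write_text("{}\n")
    grouped = group_logs_by_table([Path(tmp) / "host1", Path(tmp) / "host2"])
    file_counts = {table: len(files) for table, files in grouped.items()}
```

In this sketch, files with the same name from different hosts end up in the same list, which mirrors the idea of concatenating same-type logs into one DataFrame per table.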

To use the Velociraptor provider, you need to create a QueryProvider instance, passing the string "Velociraptor" (or "VelociraptorLogs") as the data_environment parameter. You also need to add the data_paths parameter to specify the folders that you want to search for log files (although you can set these paths in msticpyconfig.yaml, if you do this frequently).

You can specify multiple folders to load the logs from different hosts.

    qry_prov = mp.QueryProvider("VelociraptorLogs", data_paths=["~/my_logs"])

Calling the connect method triggers the provider to read the locations of the
log files (although the contents are not read until a query function is run).

    qry_prov.connect()


Listing Velociraptor tables

    qry_prov.list_queries()

    ['velociraptor.Custom_Windows_NetBIOS',
     'velociraptor.Custom_Windows_Patches',
     'velociraptor.Custom_Windows_Sysinternals_PSInfo',
     'velociraptor.Custom_Windows_Sysinternals_PSLoggedOn',
     ....

Each query returns a table (DataFrame) of the data of that type retrieved from the logs.

    qry_prov.velociraptor.Windows_Forensics_ProcessInfo()

| Name | PebBaseAddress | Pid | ImagePathName | CommandLine | CurrentDirectory | Env |
|---|---|---|---|---|---|---|
| LogonUI.exe | 0x95bd3d2000 | 804 | C:\Windows\system32\LogonUI.exe | "LogonUI.exe" /flags:0x2 /state0:0xa3b92855 /state1:0x41c64e6d | C:\Windows\system32\ | {'ALLUSERSP |
| dwm.exe | 0x6cf4351000 | 848 | C:\Windows\system32\dwm.exe | "dwm.exe" | C:\Windows\system32\ | {'ALLUSERSP |
| svchost.exe | 0x6cd64d000 | 872 | C:\Windows\System32\svchost.exe | C:\Windows\System32\svchost.exe -k termsvcs | C:\Windows\system32\ | {'ALLUSERSP |
| svchost.exe | 0x7d18e99000 | 912 | C:\Windows\System32\svchost.exe | C:\Windows\System32\svchost.exe -k LocalServiceNetworkRestricted | C:\Windows\system32\ | {'ALLUSERSP |
| svchost.exe | 0x5c762eb000 | 920 | C:\Windows\system32\svchost.exe | C:\Windows\system32\svchost.exe -k LocalService | C:\Windows\system32\ | {'ALLUSERSP |

What's Changed

Full Changelog: v2.5.3...v2.6.0
