What is xCyclopedia?

The xCyclopedia project attempts to document all executable binaries (and eventually scripts) that reside on a typical operating system. It provides a web page to view the data as well as a machine-readable format (JSON and CSV) that can be immediately usable in other systems such as SIEMs to enrich observed executions with contextual data.

What data points are available?

  • Runtime data (Standard Out, Standard Error, Children Processes, Screenshots, Open Handles, Loaded Modules, Window Title)
  • File metadata (File Description, Original File Name, Product Name, Comments, Company Name, File Version, Product Version, Copyright)
  • Digital signature validity and associated metadata (Serial, Thumbprint, Issuer, Subject)
  • File hashes (MD5, SHA1, SHA256, SHA384, SHA512)
  • Fuzzy file hash (ssdeep)
  • Similar files* (available on xCyclopedia web page only)
  • External References* (available on xCyclopedia web page only)
    • Examples of misuse (e.g. malicious use of legitimate executable)
    • Microsoft Documentation

How is this done?

A powershell script iterates recursively through all directories and starts any executables found. It then gathers a multitude of artifacts (which is slowly being improved). For example, it grabs the command line output, in search of helpful syntax messages. And if a window is visible, it will take a screenshot.

Where is this data stored?

JSON/CSV

For the machine-readable data (JSON & CSV):

Web Page (Markdown)

For a web-based view of the data click here: strontic.github.io/xcyclopedia. Note: the web view includes a few bonus features that the JSON/CSV files do not currently include; namely the following:

Can I collect this data myself?

Sure! The powershell scripts are here! See syntax/usage section below.

Collector Script Usage

Syntax

  Get-Xcyclopedia
  #Synopsis: Iterate through all executable files in a specified directory (default target is .EXE). Gather CLI usage/syntax, screenshots, file hashes, file metadata, signature validity, and child processes.
    -save_path                  #path to save output
    -target_path                #target path for enumerating files (non-recursive). Comma-delimited for multiple paths.
    -target_path_recursive      #target path for enumerating files (recursive). Comma-delimited for multiple paths.
    -target_file_extension      #File extension to target (default = ".exe")
    -execute_files    [bool]    #Execute each for gathering syntax/usage info (stdout/stderr)
    -take_screenshots [bool]    #Take a screenshot if a given process has a window visible. This requires execute_files to be enabled.
    -minimize_windows [bool]    #Minimizing windows helps with screenshots, so that other windows do not get in the way. This only takes effect if execute_files and $take_screenshots are both enabled.
    -xcyclopedia_verbose [bool] #Verbose Output
    -transcript_file  [bool]    #Write console output to a file (job.txt)

  Coalesce-Json
    #Synopsis: Combine JSON files into a single file. Only works with PowerShell-compatible JSON files.
    -target_files          #List of JSON files (comma-delimited) to combine.
    -save_path             #Path to save the combined JSON file.
    -verbose_output [bool]
    -save_json      [bool] #Save file as JSON
    -save_csv       [bool] #Save file as CSV

Example

Get-Xcyclopedia -save_path "c:\temp\strontic-xcyclopedia" -target_path "$env:windir\system32" -target_file_extension ".exe"

Optional Dependencies:

  • ssdeep: For obtaining ssdeep fuzzy hashes (useful for finding similar files). You must extract the ssdeep ZIP file (available here) into a subfolder called “bin/ssdeep-2.14.1”.
  • Sysinternals Handle: For obtaining the open handles of a given process. You must place handle64.exe (available here) in a subfolder called “bin/sysinternals/handle”.

How can I contribute?

  • Share it with friends
  • Provide feedback

TODO

  • Use a more reliable method for determining children processes (and for stopping them)
  • Add other hashing algorithms (e.g. Imphash, vHash, Authentihash)
  • Use Logman.exe (or equivalent) to determine which ETW providers are being populated by a given process.
  • Use SilkETW (or equivalent) for vastly improved runtime metadata gathering.
  • Identify runtime deltas in different executable versions. (e.g. when a new command-line switch is added to the standard output)