
HiGraph: A Large-Scale Hierarchical Graph Dataset

Hierarchical Graph Dataset for Malware Analysis with Function Call Graphs and Control Flow Graphs

Han Chen¹, Hanchen Wang¹, Hongmei Chen², Ying Zhang¹, Lu Qin¹, Wenjie Zhang³
¹University of Technology Sydney, ²Yunnan University, ³University of New South Wales

Interactive Graph Visualization

Explore the hierarchical structure of malware samples through our interactive visualization tool.

Dataset Overview

  • Function Call Graphs: 499K
  • Control Flow Graphs: 200M+
  • Compressed Size: 6.17GB
  • Time Span: 11 years (2012-2022)

Hierarchical Graph Structure

HiGraph models each application as a hierarchical graph, preserving both local (function-level) and global (program-level) structural information.

  • Program Level: Function Call Graphs (FCGs) capture global program structure and inter-function relationships.
  • Function Level: Control Flow Graphs (CFGs) represent detailed intra-function logic and control flow.
  • Malware Analysis: Rich semantic information enables advanced malware detection and classification.
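
The two levels described above can be pictured with a small in-memory sketch. This is only an illustration of the idea (one program-level call graph whose function nodes each carry a control flow graph), not the dataset's actual storage format or loading API. The class names, field names, and the toy sample are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ControlFlowGraph:
    """Function level: nodes are basic blocks, edges are control transfers."""
    blocks: list[str]
    # Edges are (source_block_index, target_block_index) pairs.
    edges: list[tuple[int, int]] = field(default_factory=list)


@dataclass
class HierarchicalGraph:
    """Program level: nodes are functions, edges are calls.

    Every function node owns a ControlFlowGraph, which is what makes the
    structure hierarchical. All names here are illustrative assumptions.
    """
    cfgs: dict[str, ControlFlowGraph] = field(default_factory=dict)
    calls: set[tuple[str, str]] = field(default_factory=set)

    def add_function(self, name: str, cfg: ControlFlowGraph) -> None:
        self.cfgs[name] = cfg

    def add_call(self, caller: str, callee: str) -> None:
        # Only allow call edges between functions that are already known.
        for name in (caller, callee):
            if name not in self.cfgs:
                raise KeyError(f"unknown function: {name}")
        self.calls.add((caller, callee))

    def callees(self, name: str) -> list[str]:
        """Functions called directly by `name`, in sorted order."""
        return sorted(callee for caller, callee in self.calls if caller == name)

    def total_blocks(self) -> int:
        """Number of basic blocks summed over all function-level CFGs."""
        return sum(len(cfg.blocks) for cfg in self.cfgs.values())


def build_example() -> HierarchicalGraph:
    """Build a toy three-function application (made-up data)."""
    graph = HierarchicalGraph()
    graph.add_function(
        "main",
        ControlFlowGraph(blocks=["entry", "check", "exit"], edges=[(0, 1), (1, 2)]),
    )
    graph.add_function(
        "decrypt",
        ControlFlowGraph(blocks=["entry", "loop"], edges=[(0, 1), (1, 1)]),
    )
    graph.add_function("send", ControlFlowGraph(blocks=["entry"]))
    graph.add_call("main", "decrypt")
    graph.add_call("main", "send")
    return graph


if __name__ == "__main__":
    app = build_example()
    print(app.callees("main"))
    print(app.total_blocks())
```

Keeping the call edges and the per-function CFGs in separate containers lets a model or analysis work at either level on its own, or combine them by first summarizing each CFG and then propagating over the call graph.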

Download Dataset

Access the complete HiGraph dataset through Hugging Face.

  • Dataset Size: 6.17GB (compressed)
  • Time Span: 2012-2022 (11 years of samples)
  • License: CC-BY-NC-SA (Creative Commons)

Updates

Changelog

Latest updates and improvements to the HiGraph dataset.

Aug 17, 2025

Dataset update with new malware threshold.

  • Raised the malware detection threshold from 10 to 15, which reduced the dataset to 499K Function Call Graphs (previously 595K+).
  • Removed abstract section from the project page.

May 16, 2025

Initial release of the HiGraph dataset.

HiGraph, a novel large-scale dataset that models each application as a hierarchical graph, is now publicly available. The initial version includes more than 200 million Control Flow Graphs (CFGs) and more than 595,000 Function Call Graphs (FCGs).

Future Plans

Continued development and expansion of the HiGraph dataset.

  • Regular updates with new samples and features.
  • Integration of more advanced graph analysis tools.
  • Community contributions and collaborations.