Graph Mining for HPC Analytics – Introduction

Hello, my name is Luis Bobadilla. I will be writing up my findings here as I conduct the research for my master's thesis. In this post I will lay out the background and related work that my thesis builds on.

As a graduate research assistant in the Laboratory for Knowledge Discovery in Databases (KDD Lab), a machine learning research lab, I am tackling one problem within an ongoing project that many students are working on. The project is called the HPC Analytics project.

At Kansas State University we have a high-performance computing cluster called Beocat. Our goal is to make the cluster's system utilization and resource allocation as efficient as possible. Beocat users come from disciplines across the university, including biology, statistics, and a range of engineering departments. When users submit jobs to Beocat, they specify the number of nodes, the number of CPUs, the amount of memory, and a time limit. This can be a problem for less experienced users: they may underestimate or overestimate some of those parameters, causing their job to fail or to use up system resources unnecessarily.
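To make the underestimate/overestimate problem concrete, here is a minimal sketch in Python. It is not the project's code: the `JobRecord` fields, the `classify_request` helper, and the 50% `slack` threshold are all assumptions I made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class JobRecord:
    """Hypothetical record of one finished job (field names are illustrative)."""
    requested_mem_gb: float
    used_mem_gb: float
    requested_hours: float
    used_hours: float


def classify_request(requested: float, used: float, slack: float = 0.5) -> str:
    """Label one resource request by comparing it with actual usage.

    `slack` is an assumed threshold: a job that uses less than this
    fraction of what it asked for counts as an overestimate.
    """
    if used >= requested:
        # The job reached its limit, so it likely failed or was cut off.
        return "underestimated"
    if used < slack * requested:
        # Most of the reserved resource sat idle.
        return "overestimated"
    return "reasonable"


# Made-up example: far too much memory requested, too little time.
job = JobRecord(requested_mem_gb=64.0, used_mem_gb=4.0,
                requested_hours=2.0, used_hours=2.0)
memory_label = classify_request(job.requested_mem_gb, job.used_mem_gb)
time_label = classify_request(job.requested_hours, job.used_hours)
print(memory_label, time_label)
```

A real analysis would need to use whatever usage fields the cluster's job logs actually record, and the threshold would have to be chosen from the data rather than fixed by hand.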

As a team, we have tried a few approaches to predicting the memory and CPU needed for a given job. These approaches are described in the following two papers.


What I'm working on specifically is adding a few more features to the dataset using role extraction. In the next post we will discuss the set-up of the graph database used (Neo4j).
