Project Overview
Tracer.bio is a sophisticated platform designed to track and visualize the execution of system binaries. The current implementation relies on the procfs (process file system) for data collection. However, this approach has limitations, particularly in capturing short-duration process executions, which are common in bioinformatics workflows.
Current Limitations
The procfs-based system fails to consistently capture short-lived processes, leading to incomplete tracking of bioinformatics shell scripts. Specific examples of missed processes include:
featureCounts
bamCompare
multiBamSummary
plotPCA
These limitations are especially evident when running scripts such as the RNA-seq workflow, which is available after signing up and installing the client on develop.app.tracer.bio.
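To make the failure mode concrete, here is a simplified model, assuming procfs is sampled at a fixed interval: a process is observed only if at least one sample falls inside its lifetime. The function name, the millisecond units, and the example intervals are illustrative assumptions and do not describe the daemon's actual polling loop.

```rust
/// Simplified model (an assumption, not the daemon's real logic):
/// the poller samples at t = 0, interval, 2 * interval, ...
/// A process alive during [start_ms, end_ms) is observed only if
/// one of those samples falls inside that window.
pub fn observed_by_polling(start_ms: u64, end_ms: u64, interval_ms: u64) -> bool {
    if interval_ms == 0 || end_ms <= start_ms {
        return false;
    }
    // First sample time at or after the process start (ceiling division).
    let first_tick = ((start_ms + interval_ms - 1) / interval_ms) * interval_ms;
    first_tick < end_ms
}
```

Under this model, with a 1000 ms interval, a process that lives from 1010 ms to 1050 ms is never sampled, while one that lives from 900 ms to 1100 ms is.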
Proposed Solution: eBPF Integration
To address these limitations, we propose integrating eBPF (extended Berkeley Packet Filter) technology into our Rust daemon. eBPF will allow us to trace system calls at the kernel level, providing a more robust and comprehensive method for capturing process executions, including short-lived ones.
Task Description
Objective
Implement an eBPF-based solution within our Rust daemon to accurately trace all process executions, including short-duration ones, and integrate this data into our existing system.
Repository
The current implementation can be found at: https://github.com/davincios/tracer-daemon/tree/main/src
Implementation Steps
Fork the Repository: Fork the repository and create a new branch for the eBPF integration work.
eBPF Program Development:
– Develop an eBPF program that efficiently traces sys_execve system calls.
– Ensure the program captures all relevant information about executed binaries.
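As a rough illustration of what "relevant information" might look like on the daemon side, the sketch below decodes one fixed-layout exec record from raw bytes, such as a record handed over by an eBPF map or ring buffer. The `ExecEvent` struct, its fields, and the byte layout are assumptions made for this example; the real layout would be whatever the eBPF program emits.

```rust
/// Hypothetical exec event. The field set and layout are assumptions
/// for illustration, not taken from the tracer-daemon repository.
#[derive(Debug, PartialEq)]
pub struct ExecEvent {
    pub pid: u32,
    pub ppid: u32,
    pub timestamp_ns: u64,
    pub comm: String,
}

/// Length of the kernel's task command name buffer (TASK_COMM_LEN).
const COMM_LEN: usize = 16;
/// Assumed record size: pid + ppid + timestamp + comm.
const EVENT_LEN: usize = 4 + 4 + 8 + COMM_LEN;

fn read_u32(buf: &[u8]) -> u32 {
    let mut a = [0u8; 4];
    a.copy_from_slice(&buf[..4]);
    // Native endianness: producer and consumer run on the same host.
    u32::from_ne_bytes(a)
}

fn read_u64(buf: &[u8]) -> u64 {
    let mut a = [0u8; 8];
    a.copy_from_slice(&buf[..8]);
    u64::from_ne_bytes(a)
}

/// Decode one record; returns None if the buffer is too short.
pub fn decode_exec_event(buf: &[u8]) -> Option<ExecEvent> {
    if buf.len() < EVENT_LEN {
        return None;
    }
    let raw_comm = &buf[16..16 + COMM_LEN];
    // The command name is NUL-terminated unless it fills the buffer.
    let end = raw_comm.iter().position(|&b| b == 0).unwrap_or(COMM_LEN);
    Some(ExecEvent {
        pid: read_u32(&buf[0..4]),
        ppid: read_u32(&buf[4..8]),
        timestamp_ns: read_u64(&buf[8..16]),
        comm: String::from_utf8_lossy(&raw_comm[..end]).into_owned(),
    })
}
```

A real event would likely also carry the full executable path and arguments, which need variable-length handling that this sketch leaves out.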
Rust Integration:
– Implement an interface in the existing Rust daemon to load and communicate with the eBPF program.
Process Filtering:
– Use the existing src/config_manager/target_process/targets_list.rs to filter captured process names.
– Implement efficient filtering to minimize overhead.
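One low-overhead way to filter could be a hash-set lookup on the executable's basename, sketched below. `TargetFilter` and its methods are hypothetical names for this example; how targets are actually represented in targets_list.rs is not assumed here.

```rust
use std::collections::HashSet;
use std::path::Path;

/// Hypothetical filter type; it does not mirror the real contents
/// of targets_list.rs.
pub struct TargetFilter {
    names: HashSet<String>,
}

impl TargetFilter {
    /// Build the filter from any collection of target process names.
    pub fn new<I, S>(names: I) -> Self
    where
        I: IntoIterator<Item = S>,
        S: Into<String>,
    {
        Self {
            names: names.into_iter().map(Into::into).collect(),
        }
    }

    /// True if the basename of `exe_path` is one of the target names.
    /// The lookup is a single hash probe, so cost does not grow with
    /// the number of targets.
    pub fn matches(&self, exe_path: &str) -> bool {
        Path::new(exe_path)
            .file_name()
            .and_then(|name| name.to_str())
            .map_or(false, |name| self.names.contains(name))
    }
}
```

Matching on the exact basename is a deliberate simplification; prefix or pattern matching would need a different structure.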
Data Integration:
– Modify src/process_watcher.rs to incorporate data from the eBPF program.
– Update the add_new_process function to handle eBPF-sourced data or create a similar new one.
Concurrent Operation:
– Ensure the eBPF solution can run concurrently with the existing procfs-based system for comparison and gradual transition.
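For the comparison phase, one possible bookkeeping approach is to record which source reported each process and then list what procfs missed. The types below (`Source`, `Seen`, `Comparison`) are hypothetical and exist only for this sketch. Keying on PID alone is a simplification, since PIDs can be reused.

```rust
use std::collections::HashMap;

/// Which collector reported a process (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Source {
    Procfs,
    Ebpf,
}

/// Per-process record of which collectors saw it.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct Seen {
    pub procfs: bool,
    pub ebpf: bool,
}

/// Tracks sightings from both collectors so they can be compared.
/// Keyed on PID only, which ignores PID reuse (a simplification).
#[derive(Default)]
pub struct Comparison {
    seen: HashMap<u32, Seen>,
}

impl Comparison {
    /// Record that `source` reported the process with this PID.
    pub fn record(&mut self, pid: u32, source: Source) {
        let entry = self.seen.entry(pid).or_default();
        match source {
            Source::Procfs => entry.procfs = true,
            Source::Ebpf => entry.ebpf = true,
        }
    }

    /// PIDs reported by eBPF but never by procfs, in ascending order.
    pub fn missed_by_procfs(&self) -> Vec<u32> {
        let mut pids: Vec<u32> = self
            .seen
            .iter()
            .filter(|(_, s)| s.ebpf && !s.procfs)
            .map(|(pid, _)| *pid)
            .collect();
        pids.sort_unstable();
        pids
    }
}
```

A report built from `missed_by_procfs` could also serve the test suite as evidence of improved capture of short-duration processes.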
Testing:
– Develop unit and integration tests for the new eBPF functionality and ensure that they pass the GitHub CI/CD pipeline.
Deliverables:
– Fully functional eBPF-integrated Rust daemon capable of tracking all process executions.
– Comprehensive test suite demonstrating improved capture of short-duration processes.
Budget: $1,000
Posted On: August 11, 2024 07:09 UTC
Category: Back-End Development
Skills: API, RESTful API, C++, Software Architecture & Design
Country: United States
