I'm a data engineer and environmental data enthusiast, passionate about empowering scientists with tools to acquire and analyze data. My largest side projects focus on:
- Remote sensing data analysis and publishing related research
- Using infrastructure-as-code tools to orchestrate and containerize workflows (MLOps)
- Web scraping and archiving
📫 Questions? Connect with me at:
LinkedIn • [email protected]
| Section | Technologies | Project |
|---|---|---|
| Infrastructure-as-Code | Ansible, Terraform, Bash | ml_ops_tree_learn: Object detection MLOps |
| | Prometheus, Grafana, NodeJS, Docker | Robust Observation IaC: Distributed compute observation stack |
| Geospatial & Remote Sensing | Open3D, PyTorch, OpenCV, Rasterio | pyqsm: Image processing and spatial algorithms |
| | NumPy, Matplotlib, GeoPandas, GDAL | canopyHydrodynamics: Simulating water movement within tree canopies |
| Data Engineering / DevOps | DLT, DuckDB, Web Scraping, Streamlit | LinkedInScraper: Automated data acquisition |
| | GitOps, Pandocs, PyPI | canopyHydrodynamics: Robust GitOps CI/CD workflows |
> [!NOTE]
> Detailed descriptions of each project follow.
canopyHydrodynamics
GitOps, NumPy, Matplotlib, GeoPandas, GDAL, Pandocs, PyPI
Simulates water movement within tree canopies under varied meteorological conditions.
Identifies key structural traits:
- Stemflow- and throughfall-generating areas of the canopy
- The 'drip points' to which throughfall is directed, complete with their relative volumes
- 'Divides' and 'confluences' within the canopy that dictate how water flows through it
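The kind of routing behind these traits can be illustrated with a small graph sketch. It is not canopyHydrodynamics' implementation: it assumes a hypothetical canopy graph of nodes with elevations, branch segments whose `area` stands in for intercepted rain, steepest-descent routing from node to node (an assumption), and the trunk base as the only stemflow outlet. Other sinks are reported as drip points, nodes fed by two or more upstream nodes as confluences, and local high points as divides.

```python
from collections import defaultdict


def route_canopy_flow(heights, segments, trunk_base):
    """Route intercepted rain through a canopy graph (illustrative model only).

    heights: dict of node -> elevation in meters
    segments: list of (u, v, area) tuples; `area` is a hypothetical proxy for
        the rain volume a branch segment intercepts
    trunk_base: node where water becomes stemflow
    Returns (stemflow, drip_points, confluences, divides).
    """
    neighbors = defaultdict(set)
    for u, v, _ in segments:
        neighbors[u].add(v)
        neighbors[v].add(u)

    # Assumption: water leaving a node follows the steepest descent,
    # i.e. moves to its lowest neighbor that is lower than itself.
    receiver = {}
    for node, nbrs in neighbors.items():
        lower = [n for n in nbrs if heights[n] < heights[node]]
        receiver[node] = min(lower, key=heights.get) if lower else None

    # Water intercepted by each segment runs to the segment's lower end.
    volume = defaultdict(float)
    for u, v, area in segments:
        volume[u if heights[u] < heights[v] else v] += area

    # Visit nodes from highest to lowest so every upstream contribution is
    # accumulated before a node passes its water downhill.
    upstream = defaultdict(int)
    for node in sorted(neighbors, key=heights.get, reverse=True):
        nxt = receiver[node]
        if nxt is not None:
            volume[nxt] += volume[node]
            upstream[nxt] += 1

    # Nodes with no lower neighbor are sinks: stemflow at the trunk base,
    # drip points everywhere else.
    sinks = {n: volume[n] for n, r in receiver.items() if r is None}
    stemflow = sinks.pop(trunk_base, 0.0)
    drip_points = {n: v for n, v in sinks.items() if v > 0}
    confluences = sorted(n for n, count in upstream.items() if count >= 2)
    # Local high points split water between branches that drain different ways.
    divides = sorted(
        n for n, nbrs in neighbors.items()
        if len(nbrs) >= 2 and all(heights[m] < heights[n] for m in nbrs)
    )
    return stemflow, drip_points, confluences, divides
```

Dividing each drip point's volume by the total segment area gives relative volumes of the kind described above.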
Leverages GitOps for robust CI/CD:
- Automated linting and testing for all changes
- Dynamically created version-upgrade branches
- Auto-generated method documentation
- Automated, versioned deployment of release branches
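Version-upgrade branches typically start from a semantic-version bump. The helpers below are a minimal sketch of that step; the `release/v` branch prefix and both function names are hypothetical, not taken from the project's CI configuration.

```python
import re

# MAJOR.MINOR.PATCH with an optional leading "v", e.g. "v1.4.2".
SEMVER = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)$")


def next_version(current: str, bump: str) -> str:
    """Return the version after a 'major', 'minor' or 'patch' bump."""
    match = SEMVER.match(current.strip())
    if match is None:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {current!r}")
    major, minor, patch = (int(part) for part in match.groups())
    if bump == "major":
        return f"{major + 1}.0.0"
    if bump == "minor":
        return f"{major}.{minor + 1}.0"
    if bump == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown bump type: {bump!r}")


def upgrade_branch_name(current: str, bump: str, prefix: str = "release/v") -> str:
    """Hypothetical naming scheme for the branch a CI job opens for an upgrade."""
    return f"{prefix}{next_version(current, bump)}"
```

For example, `upgrade_branch_name("1.4.2", "minor")` returns `"release/v1.5.0"`.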
Robust Observation IaC
Prometheus, Grafana, NodeJS, Docker, Bash
Ansible roles/playbooks for deploying a horizontally scalable cluster of multi-tenant worker nodes. Centralizes and visualizes telemetry data (Prometheus, Grafana, etc.) for monitoring multi-container worker nodes. Includes an example use case: contributing to the community archiving project ArchiveTeam.
Consists of:
- Docker containerization for cross-platform compatibility
- Prometheus for node metric collection and aggregation
- Grafana dashboards for visualization
- Dozzle for Docker log aggregation
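Prometheus collects node and container metrics by scraping HTTP endpoints (by default `/metrics`) that serve its plain-text exposition format. As a rough sketch of what a worker node exposes, the function below renders one gauge in that format; the metric and label names are hypothetical, and only finite numeric values are handled.

```python
def _escape_label(value: str) -> str:
    # Label values escape backslash, double quote and line feed.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def _escape_help(text: str) -> str:
    # HELP text escapes only backslash and line feed.
    return text.replace("\\", "\\\\").replace("\n", "\\n")


def render_gauge(name, help_text, samples):
    """Render one gauge metric in Prometheus' text exposition format.

    samples: iterable of (labels_dict, value) pairs; values must be finite numbers.
    """
    lines = [f"# HELP {name} {_escape_help(help_text)}", f"# TYPE {name} gauge"]
    for labels, value in samples:
        # Sort labels so the output is deterministic.
        pairs = ",".join(
            f'{key}="{_escape_label(str(val))}"' for key, val in sorted(labels.items())
        )
        selector = f"{{{pairs}}}" if pairs else ""
        lines.append(f"{name}{selector} {float(value)!r}")
    return "\n".join(lines) + "\n"
```

For a running container labeled `node="worker-1"` and `container="warrior"`, the sample line reads `worker_container_up{container="warrior",node="worker-1"} 1.0`.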
🌲 pyQSM
SciPy, Open3D, OpenCV, Rasterio
Image processing and spatial algorithms to clean and segment trees and their components within terrestrial LiDAR point clouds.
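Cleaning a terrestrial LiDAR cloud commonly starts with statistical outlier removal, which Open3D offers as `PointCloud.remove_statistical_outlier(nb_neighbors, std_ratio)`. The pure-Python sketch below shows the idea, assuming a small cloud: each point's mean distance to its nearest neighbors is compared against the cloud-wide mean plus `std_ratio` standard deviations. It illustrates the technique and is not pyQSM's own cleaning code.

```python
import math
from statistics import mean, pstdev


def remove_statistical_outliers(points, nb_neighbors=8, std_ratio=2.0):
    """Return indices of points kept by statistical outlier removal.

    points: sequence of (x, y, z) tuples. Neighbors are found by brute force
    (O(n^2)), which is fine for a sketch but far too slow for real LiDAR scans,
    where a KD-tree would be used instead.
    """
    if len(points) <= nb_neighbors:
        return list(range(len(points)))  # too few points to judge
    mean_dists = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        mean_dists.append(mean(dists[:nb_neighbors]))
    # Keep points whose mean neighbor distance is at most
    # std_ratio standard deviations above the cloud-wide average.
    threshold = mean(mean_dists) + std_ratio * pstdev(mean_dists)
    return [i for i, d in enumerate(mean_dists) if d <= threshold]
```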
ml_ops_tree_learn
Laspy, Terraform, PyTorch, Open3D
An MLOps pipeline for configuring and deploying a convolutional neural network on GPU-enabled, cloud-hosted clusters.
Automates the provisioning of DigitalOcean GPU Droplets so users can leverage CUDA-capable compute. Designed as a 'one-click' solution that lets researchers without specialized hardware process LiDAR data at minimal cost.
LinkedInScraper
DLT, DuckDB, Web Scraping, Streamlit
A DLT pipeline that uses LinkedIn's 'hidden' Voyager API to retrieve job and company data.
- Built on DLT, which provides a UI for viewing pipeline status and exploring data
- A custom DLT source automatically handles REST requests, pagination, data extraction, and relational DB storage
- Predefined endpoints/available datasets:
  - `get_companies`: scrape followed companies via GraphQL profile components
  - `get_job_urls`: fetch job cards per company
  - `get_descriptions`: fetch job descriptions and details
- Extensible: additional resources can be configured via JSON
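Pagination over a JSON-configured resource can be sketched as an offset-paginated generator driven by a config entry. The config schema, the `start`/`count` parameter names, and the injected `fetch` callable are assumptions for illustration; they are not LinkedIn's documented API or the project's actual configuration. In dlt, a generator like this is the kind of function a `@dlt.resource` decorator can wrap.

```python
import json
from typing import Any, Callable, Dict, Iterator

# Hypothetical resource definition; field names and path are placeholders.
EXAMPLE_CONFIG = """
{"name": "job_urls", "path": "/example/jobCards", "page_size": 2, "items_key": "elements"}
"""

# fetch(path, params) -> decoded JSON response; injected so the sketch stays
# independent of any HTTP client or authentication scheme.
Fetch = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def paginate(
    resource: Dict[str, Any], fetch: Fetch, max_pages: int = 100
) -> Iterator[Dict[str, Any]]:
    """Yield items from an offset-paginated endpoint described by `resource`.

    Stops at the first short or empty page, or after `max_pages` requests.
    """
    page_size = resource["page_size"]
    start = 0
    for _ in range(max_pages):
        payload = fetch(resource["path"], {"start": start, "count": page_size})
        items = payload.get(resource["items_key"], [])
        yield from items
        if len(items) < page_size:
            break  # a short page means there is nothing further to request
        start += page_size
```

With a `fetch` that wraps an authenticated HTTP session, `list(paginate(json.loads(EXAMPLE_CONFIG), fetch))` collects every item for one configured resource.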

![LI Scraper Streamlit UI](/wischmcj/wischmcj/raw/main/imgs/li_scraper_ui.png)