**Doctoral Thesis (Prof. Martin Swany) - Runtime System and Programming Model for Scalable Graph Processing (C++, Python, SST)**
- Developed a programming model (Gmap) and runtime system (Gmachine) for scalable graph processing.
- Created graph primitives, such as traversals and set intersections, with hardware support for accelerating graph processing.
- Engineered dynamic object relocation for dynamic load balancing and contention mitigation.
- Architected hardware-accelerated routing mechanisms that use TCAM support in routers for longest prefix matching.
- Created a global address space using object-based addressing rather than conventional byte addressing.
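To illustrate the longest-prefix-matching idea behind the routing bullet above, here is a minimal software sketch in Python. It is not the thesis implementation: a TCAM performs this lookup in hardware in a single step, while this version scans a dictionary. The function name `longest_prefix_match`, the bit-string address format, and the port names are all hypothetical.

```python
def longest_prefix_match(table, address_bits):
    """Return the next hop whose prefix is the longest match for address_bits.

    `table` maps bit-string prefixes (e.g. "1011") to next-hop labels.
    Hypothetical layout, chosen only for illustration.
    """
    best_hop = None
    best_len = -1
    for prefix, next_hop in table.items():
        # A prefix matches when the address starts with it; keep the longest.
        if address_bits.startswith(prefix) and len(prefix) > best_len:
            best_hop, best_len = next_hop, len(prefix)
    return best_hop


# Hypothetical routing table: the empty prefix acts as the default route.
routing_table = {
    "": "default",
    "10": "port_a",
    "1011": "port_b",
}

print(longest_prefix_match(routing_table, "10110001"))
```

The address `10110001` matches the empty prefix, `10`, and `1011`; the lookup picks the most specific entry. In an object-addressed system, the same rule could in principle route a message by the leading bits of an object identifier instead of a byte address.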
**Novel Computer Architecture for Graph Processing - Continuum Computer Architecture (CCA) - PilotCRIS**
- Project Name: AGILE | Funding Agency: IARPA and U.S. Army. Competing as one of six performers nationwide, alongside Intel, AMD, Qualcomm, and others, to develop a novel architecture simulation for the world’s fastest supercomputer using currently available enabling technology, targeting a speedup of about 200x over conventional supercomputers.
- Developing a proof-of-concept behavioral emulator in Python and the Structural Simulation Toolkit (SST), along with a cycle-accurate FPGA-based emulator, for an active memory architecture. The experiments aim to reduce the architectural overheads prevalent in conventional general-purpose hardware, and the design is specialized for dynamic graph processing with applications in AI and ML, n-body simulations, and adaptive mesh refinement.
- Enhancing performance by reducing starvation, latency, overheads, and contention via a ParalleX-based execution model, hardware mechanisms for global namespace translation, adaptive routing and reordering in a message-based runtime system, and graph primitive operations.
- Projected an instance of CCA to yield a 600x peak performance improvement, a 300x increase in memory bandwidth, and a 95% reduction in physical footprint compared to Sunway TaihuLight.
**High Performance Computing (OpenMP, MPI, C/C++); Master’s Thesis**
- Reduced time to solution by 90% with a message-driven runtime system (like Charm++) or Graph500, based on a comparative analysis of scaling results for graph processing algorithms such as single-source shortest path (SSSP).
- Cut graph processing execution time by 39% by creating a parallel variant of Dijkstra’s algorithm for shared-memory processors using OpenMP, MPI, and the Parallel Boost Graph Library on a 100 GB graph.
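For readers unfamiliar with SSSP, here is a minimal sequential sketch of Dijkstra's algorithm in Python, the baseline that a parallel variant starts from. It is not the thesis code, which used OpenMP, MPI, and the Parallel Boost Graph Library in C/C++. The function name `dijkstra`, the adjacency-list layout, and the sample graph are assumptions made for illustration, and edge weights are assumed to be non-negative.

```python
import heapq


def dijkstra(graph, source):
    """Return shortest distances from source to every reachable vertex.

    `graph` maps a vertex to a list of (neighbor, weight) pairs.
    Weights are assumed non-negative.
    """
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        # Skip stale heap entries left behind by earlier relaxations.
        if d > dist.get(u, float("inf")):
            continue
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


# Small hypothetical graph used only to exercise the function.
sample_graph = {
    "a": [("b", 4), ("c", 1)],
    "c": [("b", 2)],
    "b": [("d", 5)],
}

print(dijkstra(sample_graph, "a"))
```

The priority queue forces vertices to be settled one at a time in distance order, which is the main obstacle to parallelizing the algorithm and the reason parallel variants relax that ordering.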
**Natural Language Processing (C++11, Python, NLTK, SciKit, Alexa Skills Kit, spaCy, CoreNLP, Stardog, Neo4j)**
- Developed a speech-aided, NLP-based artificial intelligence bot with an end-to-end response time of ~300 ms that stores information from simple English sentences and answers questions about the stored information via keyword search, using Stardog and Neo4j.