About me

Hello! I'm Marco, a current MS Robotics student at the University of Michigan. My previous exploits include a BS in Engineering from Harvey Mudd and a one-year stint as a Systems Engineer at Trellisware Technologies. I'm interested in AI and controls, and I hope to contribute to the development of human-centered AI or intelligent robots. For a quick look into my background, check out my resume! Or, for a much more thorough look at my technical work over the years, check out my projects.

On a less academic note, I'm a big fan of swimming, cooking, cats and riddles. Feel free to send me a message!

Ready for a swim at NCAAs 2022:

Cooking with close ones:

Family cats! Max on the left and Leo on the right:

Projects

Here is a running list of my completed projects, ranging from silly programs to research at HMC/UMich and personal projects. For an example of code I built and managed solo, I recommend checking out Style Transfer with Transformers. It is from my senior year of undergrad (2 years ago), but my more recent research is not public (yet) and my more recent projects (like ARMLAB) were done in a group. Each item links to a small project summary page, and bolded projects are ones I'm most proud of:

Classical Robotics/Controls

  • ROAHM research - RTD-Suction (current)
  • Fun with Filters! EKF, UKF, GHKF, PF, PMCMC
  • ARMLAB - Programming a Robotic Manipulator for Basic Tasks
  • BOTLAB - Making a Mini Forklift with the UMich MBot Platform
  • BRG research - 3D Additive Printing with Control
  • Extended Kalman Filter Blimp tracking
  • Door Security System - Syntiant Clinic
  • Particle Filter Localization of a simple robot
  • State Space Inverted Pendulum Simulation in Simulink
  • PID-based Aquatic Robot
  • Spatial Navigation - Millennium Clinic

Deep Learning/NLP

  • LEMMA-LLaVA
  • Style Transfer with Transformers (HMC MIR research)
  • Low-Resource Pretraining with Jukebox (HMC MIR research)
  • LSTM Seq to Seq model for Translation
  • Codified Audio Language Modeling for Instrument Detection
  • CNNs and Autoencoders
  • RNN for Language Classification
  • Numpy Neural Net
  • Making a Pytorch Rundown

Reinforcement Learning

  • RL and LQR controllers
  • Recreating Alphazero
  • RL with David Silver

Signal Processing

  • Building Analog and Digital Filters
  • Tracking a Musical Performance in Realtime
  • Implementing the Shazam paper for Music Recognition

Programming

  • Using A* search to design a city
  • Implementing and speed testing some array search algorithms
  • Solving a riddle with Dynamic Programming
  • Datastructures @HMC: Trees, HashTables, and More!


  • Also, about half of the project pages have tangentially related, AI-generated art in them, which I find pretty fun

    Chopin Style Transfer

    My code for this project is here. This was my second project working with Prof Tsai in the MIR lab.

    This project aimed to generate Chopin-esque piano scores. In my literature review, I found that state-of-the-art score generation models struggled to emulate the long-term structure and style of human-composed pieces, resulting in forgotten motifs, over-repetition, jarring jumps, sparse sections, and distasteful chords. To maintain structure, we constrained our generative process, building our scores from a database of left- and right-hand Chopin measures. At a high level, our approach started with the right-hand part from a random Chopin piece, built a left-hand score from the database of left-hand measures, and then repeated the process to replace the original right hand. Because we were using Chopin's measures directly, musical style would be maintained, and the structure of the original right hand should persist through generation.

    I compiled the left/right hand measure database, created a BERT model with a custom encoder for note events, and trained two such BERT models: one to predict right/left hand compatibility and one for next-measure prediction. My custom transformer embedding treated each measure as a sequence of note events, each defined by the hand configuration along with the octave and pitch of the lowest note. This project was put on hold when I graduated in the spring, but it is being resumed this fall by a new group of students. Future work will use the two transformers in concert to generate new scores based on continuity between measures (next-measure prediction) and compatibility between hands.
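    As a rough illustration, here is a minimal sketch of how a measure could be embedded as a sequence of note events before being fed to a BERT-style encoder. The class name, vocabulary sizes, and feature split are illustrative assumptions, not the project's actual code:

    # Illustrative sketch only; vocabulary sizes and names are assumptions.
    import torch.nn as nn

    class NoteEventEmbedding(nn.Module):
        """Embed each note event (hand configuration, octave, lowest pitch) and sum the pieces."""
        def __init__(self, n_hand_configs=32, n_octaves=8, n_pitch_classes=12, d_model=256):
            super().__init__()
            self.hand = nn.Embedding(n_hand_configs, d_model)
            self.octave = nn.Embedding(n_octaves, d_model)
            self.pitch = nn.Embedding(n_pitch_classes, d_model)

        def forward(self, hand_ids, octave_ids, pitch_ids):
            # Each input: (batch, events_per_measure) integer tensor.
            return self.hand(hand_ids) + self.octave(octave_ids) + self.pitch(pitch_ids)

    # The resulting (batch, events, d_model) tensor is what a BERT-style encoder would consume.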

    ARX, RL and LQR

    Access my report here for more details.

    This project involved applying the AutoRegressive with eXogenous inputs (ARX) model to estimate the dynamic coefficients of the classic Lotka-Volterra model, which describes the predator-prey dynamics between species, such as rabbits and foxes. By accurately modeling these interactions, we aimed to implement effective control strategies to manage population levels.

    I developed two distinct control approaches for regulating the rabbit population around a desired setpoint of 15 rabbits. First, a Linear Quadratic Regulator (LQR) was employed, which involved linearizing the model at each timestep to apply control inputs effectively.
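    For illustration, here is a minimal sketch of that per-timestep linearization and LQR gain computation. The coefficient values, control channel, and cost weights below are placeholder assumptions, not the values from my report:

    import numpy as np
    from scipy.linalg import solve_continuous_are

    alpha, beta, delta, gamma = 1.0, 0.1, 0.075, 1.5   # example Lotka-Volterra coefficients

    def linearize(x, y):
        # Jacobian of [x' = alpha*x - beta*x*y, y' = delta*x*y - gamma*y] at (x, y).
        A = np.array([[alpha - beta * y, -beta * x],
                      [delta * y,        delta * x - gamma]])
        B = np.array([[1.0], [0.0]])                    # assume the input acts on the rabbit population
        return A, B

    def lqr_gain(A, B, Q=np.eye(2), R=np.array([[1.0]])):
        P = solve_continuous_are(A, B, Q, R)
        return np.linalg.solve(R, B.T @ P)              # K = R^{-1} B^T P

    x, y, setpoint = 20.0, 5.0, 15.0
    A, B = linearize(x, y)
    u = (-lqr_gain(A, B) @ np.array([x - setpoint, y])).item()   # drive rabbits toward 15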

    Additionally, I explored a Reinforcement Learning (RL) strategy, where the model was rewarded for maintaining the rabbit population at the setpoint. To this end, I implemented and trained a DDQN agent in Pytorch. Access my code at the bottom of the report, but note that it isn't my most organized work, as this wasn't a formal project.

    Pendulum Filter stuff

    Access my reports here and here for full details. The first report implements and compares EKF, UKF, GHKF, and PF on a simple pendulum. Their mathematical formulation is explained in detail, and performance is compared over a variety of measurement noise levels and time intervals. The second report extends these filter techniques, implementing ERTSS and PMCMC for both system identification and smoothing tasks.
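    For reference, here is a minimal EKF sketch for the simple pendulum with angle-only measurements. The constants, noise levels, and measurement model are illustrative assumptions; the reports cover the actual setup and the other filters:

    import numpy as np

    g, L, dt = 9.81, 1.0, 0.01          # illustrative constants
    Q = np.diag([1e-5, 1e-4])           # process noise
    R = np.array([[1e-2]])              # measurement noise
    H = np.array([[1.0, 0.0]])          # measure the angle only

    def f(x):
        theta, omega = x
        return np.array([theta + omega * dt, omega - (g / L) * np.sin(theta) * dt])

    def F_jac(x):
        theta, _ = x
        return np.array([[1.0, dt],
                         [-(g / L) * np.cos(theta) * dt, 1.0]])

    def ekf_step(x, P, z):
        # Predict with the nonlinear dynamics, linearized for the covariance.
        x_pred = f(x)
        F = F_jac(x)
        P_pred = F @ P @ F.T + Q
        # Correct with the angle measurement z (a length-1 array).
        S = H @ P_pred @ H.T + R
        K = P_pred @ H.T @ np.linalg.inv(S)
        x_new = x_pred + (K @ (z - H @ x_pred)).ravel()
        P_new = (np.eye(2) - K @ H) @ P_pred
        return x_new, P_new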

    LVLM Misinformation Detection

    Explore our project's full report here and access our code here. This project was part of a collaborative effort within the AI Ethics Initiative group.

    In May 2024, our team embarked on developing a robust system powered by Large Vision-Language Models (LVLMs) to detect multimodal misinformation, inspired by the foundational work in Xuan et al.'s LEMMA study. We adapted the original LEMMA system to use open-source models only, specifically transitioning from GPT-4V to LLaVA, aligning our tools with accessible technologies.

    The LLaVA-based model showed a dramatic performance decrease from LEMMA, with accuracy in detecting Twitter misinformation falling from 82.4% to 52.1%. This prompted us to explore advanced techniques such as fine-tuning and Chain-of-Thought (CoT) prompting, guided by insights from Kovach et al.'s Blur paper. These strategies significantly enhanced our model's effectiveness, ultimately boosting accuracy back up to 76.5%.

    We additionally ran experiments on time-restricted retrieval, model finetuning, and modifications to the LEMMA architecture itself, with full details in the report.

    Low-Resource Pretraining

    My code for this project is here. This was my first main project in the HMC MIR lab. It involved building upon the work of Stanford's Liang et al. in their Codified Audio Language Modeling paper; I also built upon my previous experience with Liang et al.'s code from my instrument detection project. They had demonstrated that audio features extracted from the middle layer of OpenAI's music generation model, Jukebox, contained rich information for downstream MIR tasks. We sought to improve feature extraction by finetuning the output layers of a Jukebox model on benchmark MIR datasets. To this end, I created a modified Jukebox-style model in Pytorch with a flexible number of unfrozen layers and an MIR-specific training objective. I successfully implemented the modified model, but the additional finetuning had no significant effect on our targeted benchmarks.
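    As a rough sketch of the finetuning mechanic (not Jukebox's actual module layout), selectively unfreezing the last few blocks of a pretrained Pytorch model looks roughly like this:

    import torch.nn as nn

    def unfreeze_last_layers(model: nn.Module, blocks: nn.ModuleList, n_unfrozen: int):
        # model: any pretrained network; blocks: its stack of transformer blocks (illustrative).
        # Freeze everything, then re-enable gradients for the last n_unfrozen blocks.
        for p in model.parameters():
            p.requires_grad = False
        for block in blocks[-n_unfrozen:]:
            for p in block.parameters():
                p.requires_grad = True
        # Only these parameters go to the optimizer for the MIR-specific objective.
        return [p for p in model.parameters() if p.requires_grad]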

    Instrument Detection

    Building upon the work of Stanford's Liang et al., I worked with Alec Vercruysse to demonstrate that codified audio language modeling can learn useful representations for instrument identification. To this end, we used representations from layers within Jukebox, a generative model for music, as input features to train shallow probes; each probe is trained to identify the presence of a specific instrument. As input data, we used the OpenMIC-2018 dataset, which contains 10-second audio excerpts partially labeled for the presence or absence of 20 instrument classes. Our results indicated that our probes were unable to match state-of-the-art performance. Further investigation also revealed that, while some of the best-performing models had few to no hidden layers, the probes still learned a complicated representation of the Jukebox output features.
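    For a sense of scale, a probe in this setup is just a small network on top of frozen features; something like the following sketch, where the feature dimension and hidden size are placeholders:

    import torch.nn as nn

    class InstrumentProbe(nn.Module):
        """Shallow MLP predicting presence of one instrument from a frozen feature vector."""
        def __init__(self, feat_dim=4800, hidden=512):   # placeholder sizes
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(feat_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, 1),
            )

        def forward(self, features):               # features: (batch, feat_dim) pooled Jukebox activations
            return self.net(features).squeeze(-1)  # logit for "instrument present"

    probe = InstrumentProbe()
    loss_fn = nn.BCEWithLogitsLoss()               # trained against the partial OpenMIC labels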

    Our full report is here and our code is here.

    Embedded report:

    Marco's Resume

    If the embedding is broken, here is a link to it!

    Numpy Neural Net

    Full disclosure, this "project" is the result of homework in Prof Tsai's deep learning for engineers course. For the first four weeks of the course, we slowly built the components of a neural net in numpy. This assignment involved putting those pieces (weight initialization, activation functions, loss function, forward/back prop, and gradient descent) together into one notebook. The neural net was then used as an autoencoder for representing 8 digits in a 3D space. My code is here. Also, the larger dlStems folder has other neat assignments from Prof Tsai's course if you found this interesting!
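    For flavor, here is a tiny numpy sketch in the same spirit: a one-hidden-layer sigmoid autoencoder with hand-derived gradients. The sizes and learning rate are illustrative, not the assignment's:

    import numpy as np

    rng = np.random.default_rng(0)
    W1, b1 = rng.normal(scale=0.5, size=(3, 8)), np.zeros(3)   # 8-dim one-hot digit -> 3-dim code
    W2, b2 = rng.normal(scale=0.5, size=(8, 3)), np.zeros(8)   # 3-dim code -> 8-dim reconstruction
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

    x = np.eye(8)[3]                        # toy input: the one-hot encoding of "digit 3"
    for _ in range(2000):
        h = sigmoid(W1 @ x + b1)            # forward: encode
        y = sigmoid(W2 @ h + b2)            # forward: decode
        dy = (y - x) * y * (1 - y)          # backprop through MSE loss and output sigmoid
        dh = (W2.T @ dy) * h * (1 - h)      # backprop through hidden sigmoid
        W2 -= 1.0 * np.outer(dy, h); b2 -= 1.0 * dy   # gradient-descent updates
        W1 -= 1.0 * np.outer(dh, x); b1 -= 1.0 * dh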

    Learning Pytorch

    During the pandemic, I wanted to learn more about how neural nets are built in practice. In researching this, I found the Pytorch libraries. While studying Pytorch, I made this document. While my document generally follows the official Pytorch tutorials, it is very informal and more in line with my thought process while learning. This manifests as much less organization and frequent detours to fill in gaps (like why GPUs are good for deep learning):

    Recreating AlphaZero

    As a learning experience during the pandemic, I undertook a project with Natalia Orbach-Mandel and Ari Conati (my brother) to implement the AlphaZero paper. This paper had revolutionized AI chess and Go engines in 2017, convincingly beating the strongest existing programs in each game. Our main goals were to learn about Pytorch, gain practical Reinforcement Learning experience, and have fun. Here is the code! I am proud of this work because of the context; it represents us using the lockdown to better ourselves.

    Self-studying RL

    In preparing to recreate AlphaZero, I completed David Silver's RL course on YouTube (he is one of the AlphaZero researchers). While completing this course, I took pretty extensive notes, which I am transferring to digital form here.

    ROAHMLab research (current)

    I am working with Professor Vasudevan's ROAHM lab, with the goal of enhancing the dynamic behavior of his robotic manipulators. The code I am building is based off of armour-dev, which provides provably safe, real-time manipulator motion (I recommend reading the paper to see how it's guaranteed!). However, this motion tends to be conservative, with joint velocities leveling off well below their limits. I am designing and running experiments to identify and correct this behavior. Possible causes and remedies include overly strict constraints in the trajectory planner, introducing soft velocity constraints into the optimization problem, and reworking the high-level planner to pick more aggressive waypoints.

    ARMLAB

    This project was completed in a group during my first half-semester at UMich. We programmed a 5-DOF arm to recognize and localize blocks, plan paths, and complete pick-and-place and stacking tasks. I wrote all of our forward and inverse kinematics code for this project, and worked in a pair to develop our simple motion planner. Our code for this project is available here.
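    As an illustration of the kinematics piece (not our actual code), forward kinematics for an arm like this boils down to chaining one homogeneous transform per joint:

    import numpy as np

    def dh_transform(theta, d, a, alpha):
        """Homogeneous transform for one joint from standard DH parameters."""
        ct, st, ca, sa = np.cos(theta), np.sin(theta), np.cos(alpha), np.sin(alpha)
        return np.array([[ct, -st * ca,  st * sa, a * ct],
                         [st,  ct * ca, -ct * sa, a * st],
                         [0.0,      sa,       ca,      d],
                         [0.0,     0.0,      0.0,    1.0]])

    def forward_kinematics(joint_angles, dh_table):
        """Chain the per-joint transforms to get the base-to-end-effector pose."""
        # dh_table: one (d, a, alpha) tuple per joint; values would come from the arm's spec.
        T = np.eye(4)
        for q, (d, a, alpha) in zip(joint_angles, dh_table):
            T = T @ dh_transform(q, d, a, alpha)
        return T          # position is T[:3, 3], orientation is T[:3, :3]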

    BOTLAB (current)

    In a group, I am building a mini forklift using the UMich MBot platform. MBot is unique in that it can be built from relatively inexpensive parts (Jetson Nano, Raspberry Pi, RPLidar, DC motors) and is open source. We will enhance the platform with a low-level position controller, a SLAM system, a computer vision system for identifying mini crates, and a 3D-printed gripper for grasping and moving said crates.

    Barton Research Group (current)

    Working for professors Dawn Tilbury and Kira Barton, I am developing software and a controller for a 3D additive manufacturing printer specializing in conductive materials. The printer uses air pressure to extrude a silver-based ink, and an A3200 motion controller to move the substrate. I am developing a process model to relate printing parameters, such as air pressure and stage speed, to the width and resistivity of the resulting line. The controller will then incorporate visual feedback to ensure proper printing.

    EKF Blimp tracking

    This project involved tracking a blimp using recorded (but very noisy) compass, GPS position, and thrust input data. The underlying, smooth path was determined using an EKF:

    The EKF uses the vehicle dynamics to predict the position, and updates are made with the measurement values (with the measurement noise accounted for in the correction step). Here is the code, with thorough comments!

    PID-based Aquatic Robot

    Sorry for the grainy picture; this project is from 2018, and at the time of making this site I have lost a lot of the materials. For this project, I worked in a team to build an aquatic robot with homemade sensors. The robot was meant to navigate around Dana Point and collect water clarity, turbidity, and wind speed measurements. Here is our report detailing the homemade sensors:

    Embedded report:

    I was primarily responsible for the navigation. Unfortunately, I lost the code at some point, but our robot was generally able to navigate well. The PID control was purely GPS-based (and our GPS had 3 m uncertainty), so the robot's movement was pretty crude. Here is an example of it heading to a waypoint and back:

    Particle filter localization

    This project involved localizing a simple robot using its onboard LIDAR and IMU. The prediction step was conducted using the robot's IMU data and system dynamics, while the LIDAR was used for localization corrections. Our predicted path closely mirrored the actual, "desired" path, and outperformed pure GPS localization. I worked on this project in a group (with Evan Hassman and Bowen Jiang) for the State Estimation course (E205) at Harvey Mudd. E205 was unique in that there were only four labs during the semester, so each of them was quite intensive (and that's why I justified including this as a project). Here is our report and code!
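    The core loop of a particle filter like this is short; here is a minimal sketch, where the motion noise and especially the LIDAR likelihood are stand-ins for the models in our report:

    import numpy as np

    rng = np.random.default_rng(1)
    N = 500
    particles = np.zeros((N, 3))                 # each row: x, y, heading
    weights = np.full(N, 1.0 / N)

    def predict(particles, v, omega, dt, sigma_v=0.05, sigma_w=0.02):
        # Propagate every particle with the IMU-derived motion, plus noise (noise levels assumed).
        particles[:, 2] += omega * dt + rng.normal(0, sigma_w, N)
        step = v * dt + rng.normal(0, sigma_v, N)
        particles[:, 0] += step * np.cos(particles[:, 2])
        particles[:, 1] += step * np.sin(particles[:, 2])

    def correct(weights, likelihoods):
        # likelihoods: how well each particle's predicted LIDAR scan matches the real one.
        weights *= likelihoods
        weights /= weights.sum()

    def resample(particles, weights):
        idx = rng.choice(N, size=N, p=weights)
        return particles[idx], np.full(N, 1.0 / N)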

    Embedded report:

    Door Security-Syntiant

    This was my senior capstone project, where I served as Team Lead for a group of five students in collaboration with Syntiant Corporation. We developed an entrance security system composed of a battery-powered tag, which can be deployed on doors in secured areas to track their position, along with software to identify anomalous activity. During this capstone, I wrote the event and state estimation code, while the rest of my team collected training data and designed a custom PCB for deployment.

    My code comprised two main components: a Kalman Filter for estimating the angle of a swinging door and a Convolutional Neural Net (CNN) for identifying anomalous events. The Kalman Filter was built in C++ and processed 9-axis Inertial Measurement Unit (IMU) data in real time to localize the sensor. I used the IMU's gyroscope to predict changes in angle and its magnetometer for error corrections. My event detection system passed frames of 1-D sensor data to Syntiant's NDP120 deep learning processor, where an onboard CNN identified key door events like knocking and opening. We were able to successfully deliver a prototype to Syntiant for further development into a marketable product. Unfortunately, as this project was completed for a company, the code is closed due to an NDA. But this poster (pdf) about our project was signed off on for public viewing:

    Embedded poster:

    Spatial Navigation-Millennium

    This was my junior clinic project, where I worked in conjunction with four other Harvey Mudd students and liaisons from Millennium Space Systems. We were tasked with augmenting a star tracker system to localize a Low Earth Orbit satellite in the event of a GPS lockout. We developed two means of doing this. The first used the Earth's horizon as a reference, which could then localize the satellite in conjunction with the star tracker's attitude quaternion. Our second method determined position by observing the refraction of stars behind the Earth's horizon. As this project was completed in collaboration with Millennium, the code is closed due to an NDA.

    Analog and Digital Filters

    As a final project in my junior year, I worked with a lab partner to design and implement two low-pass filters for isolating a 10 Hz sine wave from a 10 Hz pulse train. This requires a low-pass filter that removes the pulse train's higher harmonics while preserving the 10 Hz fundamental. One of the filters was an analog 4th-order Butterworth filter, implemented directly on a breadboard using the Sallen-Key topology. The other filter was implemented digitally as a 10th-order comb filter. Both filters successfully isolated the 10 Hz sine wave. Here is our report (pdf). It has our Arduino and Matlab code at the end.
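    As a quick modern-day illustration of the idea (scipy instead of the original Arduino/Matlab code, with an assumed sample rate and cutoff), low-passing a 10 Hz pulse train recovers something close to the 10 Hz fundamental:

    import numpy as np
    from scipy import signal

    fs = 1000.0                                        # assumed sample rate
    t = np.arange(0, 1.0, 1.0 / fs)
    pulse_train = signal.square(2 * np.pi * 10 * t)    # 10 Hz pulse train (fundamental + harmonics)

    b, a = signal.butter(4, 15.0, btype="low", fs=fs)  # 4th-order Butterworth low-pass near 15 Hz
    fundamental = signal.filtfilt(b, a, pulse_train)   # harmonics removed; ~10 Hz sine remains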

    Embedded report:

    Musical Performance Tracking

    In conjunction with Brandon Apodaca, I created the backbone of a music tracking application. This system could ideally be adapted into an application that follows a performance and advances the sheet music accordingly.

    Our system utilized Dynamic Time Warping (DTW) to align query audio to the reference music score in real time. It reliably converged for monophonic audio performances. Here is our paper and our code!
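    The heart of the system is the DTW recurrence; here is a minimal offline sketch of it. The real-time version and the audio feature extraction are more involved, and the inputs here are placeholder feature sequences:

    import numpy as np

    def dtw_cost(query, reference):
        """Accumulated-cost matrix for aligning two feature sequences (placeholder inputs)."""
        n, m = len(query), len(reference)
        D = np.full((n + 1, m + 1), np.inf)
        D[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                cost = np.linalg.norm(query[i - 1] - reference[j - 1])
                D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
        return D[1:, 1:]    # backtracking through this matrix gives the alignment path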

    Embedded report:

    A* search City Design

    I completed this project as a fun way to familiarize myself with A* search (and graph traversals in general). The motivation is that a city planner wants to add trails, gravel roads, and paved roads between n settlements such that the longest travel time in the network doesn't exceed d days. In this scenario, travel speed increases with connection quality (no trail < trail < gravel < paved), but so does cost. So, the program tries to avoid making the more expensive connections unless necessary. It takes n and d as parameters and plots a possible network that meets the requirements. Since this was just a learning experience, the program is not very smart. It works with the following pseudocode:

    Given settlement locations [x,y], an equation f(distance, trail type) that gives travel time in days for a certain distance and trail type, and a maximum travel time d days.

    Loop:
    1. Use A* to find the longest path in the current network in days
    2. If it is longer than d days, upgrade the longest path (nothing<trail<gravel<paved)
    3. Else (the longest path takes at most d days), return the network!

    The code isn't the smartest, since it will start making trails that might not end up getting used once better roads are built. But it does a good job of avoiding roads until they are necessary, which could be valuable assuming roads are more costly. An easy improvement would be to make one final pass and remove unnecessary connections at the end. Here is an example of a generated network with n=15 settlements and d=15 days: it does exhibit some good behaviors, like saving roads for long connections, but it is certainly not ideal.

    Shazam Paper

    I implemented the Shazam music recognition algorithm in numpy as a project for Prof Tsai's DSP course. My system was able to reliably create a database of songs and then match noisy clips of those songs to the originals. The general algorithm is:

    For creating a database:
    1. Take the Short-Time Fourier Transform (STFT) of a song. The STFT contains information about which frequencies are present at different points in time. The cover art for this project is an STFT.
    2. Identify the peaks in the STFT with the highest magnitude. These points represent the "loudest" notes in the music and are thus most likely to represent the desired signal instead of background noise.
    3. Pair peaks into hashes. A pair of peaks is defined by the frequencies of the peaks and the time between them. Hashes are more likely to be unique to a song (many songs will share the same notes, but pairs of notes are more distinctive).
    4. Store a collection of hashes for each song. The greater the number of hashes, the more space is needed, but accuracy also increases.

    For identifying a song:
    1. Use the same process from creating the database to find hashes for the query.
    2. For each song in the database:
      1. Find all matching hashes between the query and the database song. For each match, calculate the time difference = (time of hash in query) - (time of hash in database song).
      2. Group the hash time differences into a histogram.
      3. The highest value in the histogram is the score for that database song. The intuition here is that the matching song will have many hashes with the same time offset (the offset comes from the clip not necessarily starting at the beginning of the song).
    3. Return the highest-scoring database song (a small sketch of this matching and scoring step follows below).
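    Here is that matching/scoring step as a small sketch, assuming hashes have already been extracted into {hash: [times]} dictionaries for the query and for each database song (the bin width is an illustrative choice):

    from collections import Counter

    def score_song(query_hashes, song_hashes, bin_width=0.1):
        """Histogram the query-vs-song time offsets; the tallest bin is the song's score."""
        offsets = Counter()
        for h, q_times in query_hashes.items():
            for s_time in song_hashes.get(h, []):
                for q_time in q_times:
                    offsets[round((q_time - s_time) / bin_width)] += 1
        return max(offsets.values(), default=0)

    def identify(query_hashes, database):
        # database maps song name -> that song's hash dictionary.
        return max(database, key=lambda name: score_song(query_hashes, database[name]))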

    CNNs and Autoencoders

    To get practical experience with autoencoders, I solved the MNIST handwritten digit classification problem in two ways:

    1. Direct classification of handwritten digit images with a Convolutional Neural Net (CNN)
    2. Training an autoencoder on digit images, and using it as the backbone for a CNN

    I used a small dataset to make training more difficult. As a result, transfer learning from the autoencoder created a much smoother and quicker training curve. Here is my notebook!
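    A minimal sketch of the second approach, with layer sizes that are illustrative rather than the notebook's exact architecture: train the encoder as part of an autoencoder first, then reuse it under a small classification head.

    import torch.nn as nn

    encoder = nn.Sequential(                          # trained first inside an autoencoder
        nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        nn.Flatten(),
    )

    classifier = nn.Sequential(                       # encoder weights carried over, then fine-tuned
        encoder,
        nn.Linear(32 * 7 * 7, 10),                    # 28x28 MNIST digits -> 10 classes
    )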

    LSTM Numerical Translation

    As my first introduction to LSTMs and Sequence-to-Sequence models, I took on a numerical-to-word translation problem. Given a numerical symbol (e.g., "8"), my LSTM would translate it to the word representation (e.g., "eight"). My notebook is split into three parts:

    1. Data Preparation and Research: First, I had to create the numerical/word dataset. I also had to research the sequence-to-sequence problem so that I could implement it in Pytorch.
    2. Implement and Train the Model: I trained the model using my dataset, plotting my loss as training progressed. I also experimented with different model hyperparameters.
    3. Analysis: I tried to identify what failure modes the model experienced.

    My notebook is thoroughly commented; have fun checking it out!

    RNN Sentence Classification

    For my first project with Recurrent Neural Networks (RNNs), I created an RNN model to classify sequences of characters (sentences, words, letters) as Spanish or English. My notebook does the following:

    1. Create a dataset: I started with the Spanish Billion Word Corpus and torchtext's English Wikipedia text files, and created a dataset of 100-character-long chunks in each language.
    2. Train the model: My RNN had a simple classifier, which used the RNN cell state at the end of a sequence to predict whether the sequence was Spanish or English.
    3. Experimentation: I experimented with various RNN hidden sizes and numbers of RNN layers.
    4. Analysis: I tried to get intuition for which letters/phrases led to the RNN's decision.

    Inverted Pendulum Simulation

    As the final project in Harvey Mudd's advanced signals and systems course, I designed and simulated an inverted pendulum on a cart subject to disturbances. Attached and embedded is my project report! My Simulink diagrams and Matlab code are at the end of the report.

    Testing array searches

    When I was learning about array-sorting algorithms, I wanted to cement my understanding of them. To this end, I implemented bubble, selection, quick, and merge sort. Each program makes a randomized array of size 10,000 and then measures and prints the sorting runtime. The code is not too well commented, since this was done as a learning experience.

    Solving a riddle

    A fellow Mudder gave me the following riddle:

    Imagine you are a scientist with 2 rats, 9 bottles, and one hour. One bottle contains poison, which will kill a rat in exactly 30 minutes if consumed. How can you feed the rats from the bottles to determine which bottle is poisoned? What if you had r rats and n minutes?

    I thought this was pretty fun to solve, so I made a dynamic-programming solution for the r rat, n minute case here!
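    For intuition, the counting side of the answer can be written as a tiny recurrence over rats: each rat's outcome is one of (rounds + 1) symbols (died after round 1, ..., died after round k, or survived every round). The sketch below only computes that capacity and is not necessarily how the linked solution is structured:

    def max_bottles(rats, minutes, poison_time=30):
        rounds = minutes // poison_time            # feed-and-wait rounds that fit in the time limit
        bottles = [1] * (rats + 1)                 # bottles[r]: distinguishable bottles with r rats
        for r in range(1, rats + 1):
            bottles[r] = (rounds + 1) * bottles[r - 1]
        return bottles[rats]

    assert max_bottles(rats=2, minutes=60) == 9    # the original riddle: 2 rats, 1 hour, 9 bottles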

    Datastructures

    For Harvey Mudd's Data Structures course, we completed a variety of assignments to build comfort with C++ and familiarity with common data structures. This page highlights a few of my favorites. Each element in the list links to my code for the assignment:

    1. A linked list implementation
    2. A spellchecker built from a randomized tree!
    3. ChunkyString: a modified string stored as linked chunks (i.e., arrays) of n characters. This gives insert and erase well-defined costs (unlike a standard string, where an edit can require shifting everything after it).
    4. A hash table spellchecker
