When I started at Sunday, I was still getting up to speed. The teams I managed were finishing up some projects and delivering well with no issues. I wasn't fully engaged on the next project yet and was looking for something to occupy my time and provide value. I noticed that our support process was pretty ad hoc: all communication happened in Slack chat rooms, and I had already heard complaints from the engineers and the operations teams that issues were ignored, priorities weren't acknowledged, and that it was overall a mess for everyone. Since I had some availability, I started researching tools that could help. The company was already paying for Jira for its software teams, and it also had Jira's helpdesk functionality, but wasn't using it. I drew on my experience to set up a Jira project and issue workflow, then found several Slack plugins that could take issue requests and submit them as tickets in Jira. After some experimentation, I enabled one plugin that let teams continue to communicate in Slack while all conversation and information flowed into Jira. This let the teams use Jira for status and statistics tracking, let me create dashboards for tracking and categorization, and enabled us to report back to the operations team and even identify which issues were client-related or training-related. It was a huge boost to data measurement and productivity and didn't significantly impact the day-to-day operations of either the engineers or the operations teams. While I was there the daily volume of incidents increased by 30%, but because we used Jira to manage and track them, we improved turnaround time and reduced the support backlog from an average of 50 open issues per day down to 20, and cut the time issues stayed open from weeks to days. This bolstered the morale of both the engineering and operations teams. We also used the captured data and root cause analysis to define product enhancements to prevent new issues during future growth.
At Feedzai, one of the major issues was scaling the teams along with the projects. Feedzai developed a core product, but each implementation was highly customized and the Customer Success team was heavily project-based. A team would be established for a project (1 PM, 1-2 data scientists, 2-3 software engineers) and would work on that implementation and client for 6-18 months. As a project wound down you could pull a few people off to put on other projects, but eventually that left a single engineer doing everything for the project. That person had intimate knowledge and could work quickly with the client, but they were also the only person who could support that client, and if they went on vacation or left the company they took all that knowledge with them; they also had little motivation to share it. This buildup of knowledge silos was a major issue for us: each project eventually needed at least one dedicated engineer, and that wasn't really scalable. Within my first 3 months at Feedzai I proposed forming teams to handle multiple projects at once. Using 2 project managers, 3 data scientists, and up to 6 engineers, we were able to easily manage 4 concurrent implementation projects, and that same size team could handle 10 projects in maintenance mode. With shared resources the engineers could move between projects and tasks quickly, standardize practices, and share resources and knowledge. It also helped with onboarding, since a new team member could pair with several engineers across a variety of projects. In the three years I was at Feedzai the number of clients we had doubled; without this team structure I would have needed about 200 engineers in my organization to handle that work, but with it we only needed to increase staffing by 30%.
When working at Feedzai, one of the issues our teams had was setting up and implementing our software packages at client sites. The core product was simple to install, but adding solution packages and custom modules and tuning several hundred configuration settings was difficult, time-consuming, and laborious. Working with my lead engineers and managers, we adopted Ansible as our default packaging and push-deployment solution, developed standard installation packages for it, and worked to train and educate the engineers and clients on its advantages. But we still had issues in that the core product did not allow easy automation. Two engineers had worked on installation and configuration tools, but those tools required a Java application to be compiled because they used low-level API calls into the core product to handle settings. I worked with those engineers and convinced them that external configuration files were needed to improve installation maintenance; they refactored their tool and developed a YAML-based configuration system that could be integrated with the existing Ansible scripts. The tool continued to evolve and add features, and it was used not only to configure the system but also to export and import settings from one environment to another, which allowed us to make changes and promote them easily, saving weeks of time for some of our data science teams. With this tool, we were also able to work with the product engineering team to provide new features that made this work easier and simpler. The configuration tool and the use of Ansible also let us add a new category of technology worker, the technical analyst: a DevOps-focused role covering installation, configuration, and deployment, which freed the software engineers to focus on software customization. Seeing the problem, working with my team to find a solution, and guiding how that solution could be simplified was a great experience, and I'm proud of the efficiency and effectiveness it brought to our delivery.
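As a rough sketch of the idea (illustrative only, not the actual Feedzai tool; the file layout and key names are hypothetical), an external YAML file can be flattened into property keys that deployment scripts or an API client then apply, so a configuration change means editing a file instead of recompiling a Java application:

```java
import org.yaml.snakeyaml.Yaml;

import java.io.FileReader;
import java.io.Reader;
import java.util.LinkedHashMap;
import java.util.Map;

/*
 * Minimal sketch of external YAML-driven configuration, e.g. a file like:
 *
 *   engine:
 *     threadPool: 32
 *     cache:
 *       ttlSeconds: 600
 *
 * is flattened into dotted keys (engine.threadPool, engine.cache.ttlSeconds)
 * that a deployment script or configuration API call could then push to the
 * product. The key names above are placeholders, not real product settings.
 */
public class YamlConfigLoader {

    public static void main(String[] args) throws Exception {
        String path = args.length > 0 ? args[0] : "settings.yaml";
        try (Reader reader = new FileReader(path)) {
            Map<String, Object> raw = new Yaml().load(reader);
            Map<String, String> flat = new LinkedHashMap<>();
            flatten("", raw, flat);
            // In a real tool each entry would become an API call or an Ansible
            // variable; here we just print what would be applied.
            flat.forEach((key, value) -> System.out.println(key + " = " + value));
        }
    }

    @SuppressWarnings("unchecked")
    private static void flatten(String prefix, Map<String, Object> node, Map<String, String> out) {
        for (Map.Entry<String, Object> entry : node.entrySet()) {
            String key = prefix.isEmpty() ? entry.getKey() : prefix + "." + entry.getKey();
            if (entry.getValue() instanceof Map) {
                flatten(key, (Map<String, Object>) entry.getValue(), out);
            } else {
                out.put(key, String.valueOf(entry.getValue()));
            }
        }
    }
}
```

Because the settings live in a plain file, they can be version-controlled, diffed between environments, and shipped by the same Ansible playbooks that deploy the packages.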
When I was working at Honeywell Intelligrated I was part of an engineering team developing a product called the warehouse execution system. It was a planning and routing system for containers in a distribution center and helped online merchants get deliveries to customers on short time frames and notice. The system comprised about 40 microservices: some tied to routing and scheduling, some to hardware interfaces with picking systems and conveyor controls, and other supporting services. I had 5 engineering teams working on various parts of it. The product had been several years in development and we were installing it and going live with the first client. During testing and pre-activation the system worked well and we were confident it would perform as needed. However, once the system went live, as daily processing continued the system introduced greater and greater lag. We had blocked requests and at least one memory leak in the system. The client was getting upset and had threatened legal action if the software wasn't corrected. Because this was the first client, we knew we had to get everything back on track. I established a war room in our engineering headquarters where I pulled all engineers, product, and the implementation team in to review, categorize, and prioritize known issues. We worked nights and weekends to fix as many bugs as possible and add needed features; within 6 weeks we delivered 6 weekly releases with an average of 40 fixes/features in each. In parallel, I managed a tiger team of senior engineers, DevOps, and architects to review the performance issues. We installed AppDynamics to help trace calls between services, and after 3 weeks of intense work we found and corrected 3 memory leaks in services, but more importantly found a thread queue that was filling up and blocking. Once we found that, we redesigned the service and not only brought system performance back within specification but boosted it by 40%. This allowed that first client to meet their shipping and delivery needs and prevented them from canceling our contract, which would have led to a significant revenue penalty.
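For context on that kind of bottleneck (a generic sketch, not the actual service code), a fixed worker pool with a bounded queue behaves exactly this way: once tasks arrive faster than they complete, the queue fills, callers start waiting, and end-to-end latency climbs even though nothing is crashing:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Minimal sketch of queue saturation: two worker threads, a bounded queue,
// and tasks that take longer than the arrival rate allows. The overflow
// policy pushes work back onto the caller, so submitters start blocking --
// which shows up externally as steadily growing lag.
public class QueueSaturationDemo {

    public static void main(String[] args) throws InterruptedException {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                2, 2,                                        // only 2 worker threads
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(10),                // bounded work queue
                new ThreadPoolExecutor.CallerRunsPolicy());  // overflow backs up onto the caller

        for (int i = 0; i < 100; i++) {
            pool.execute(() -> {
                try {
                    Thread.sleep(200);                       // simulate a slow downstream call
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            // Watching queue depth over time is what exposes the problem:
            // it only ever grows, so latency grows with it.
            System.out.printf("submitted=%d queued=%d active=%d%n",
                    i, pool.getQueue().size(), pool.getActiveCount());
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
    }
}
```

Tracing tools like AppDynamics surface this as requests spending most of their time waiting rather than executing, which is what pointed the tiger team at the saturated queue instead of the code inside the services.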
When I was hired at Sunday, one of the main issues with the teams I inherited was that they did not feel “safe” in the environment. The previous Director had been a “my way or the highway” type of person. They had reorganized the team without getting input from the team or others and had caused a significant amount of chaos. Additionally, company leadership had been overly aggressive in pushing the team to get new products released, and though the team performed well and released on time, it was a challenging and stressful period. When I first joined and interviewed my peers in the organization, I was told several times that the team was underperforming, that they were unsure of themselves and inexperienced. I took that feedback and then interviewed my managers and all the engineers in my organization. The first thing I said to everyone was that I was there to help them, that I could not succeed unless they did, and that I was always open to input. I did this very deliberately to let them know that I was their advocate at all levels. What I found as I talked to members of the team was that they were performing well: one team was heavily loaded with senior engineers, another had more junior members, and people expected the two teams to perform equally. I took a few weeks to embed with each team to understand their work and their processes, worked with the managers to align practices across both teams, and baselined product delivery. I joined meetings with product and leadership and would help defend timelines or highlight challenges. Several things came out of this: first, the teams understood that I was working to help them and give them room to develop; second, I gained others' confidence in planning and aligning the teams' work and priorities; third, I could present data on the teams' performance showing not only that it was higher than everyone expected, but that with reduced stress and the meetings I took off their plates, delivery increased. Several people in my org were very appreciative of that and commented that I was someone who helped them feel more comfortable and proud of the company.
At Cox Media, I spent much of my time building and hiring a team, but also honing my agile project management practices and experience. I became more familiar with Kanban-style agile delivery (it was a main reason to join CMG at the time). Later the organization moved back to Scrum for delivery and I was able to develop and practice the SAFe framework for program management. I became a vocal proponent of agile practices in the organization, including regular backlog grooming, estimation, multi-team KPI metrics, and standardized practices. I was a leader in the organization on this and was asked to join the PMO, along with two other engineering managers and several project and program managers, to develop an agile practices handbook. We spent weeks looking at practices, defining the minimal requirements all teams and projects should use (e.g. tools, formats, practices, etc.) and, most importantly, defining what was optional or what each team could choose for itself. Some teams wanted Scrum, some wanted Kanban; we stated either was fine, but there were practices to follow for each. When we released the playbook for the CMG group, it was picked up by Cox Enterprises, the parent organization, and used as a base framework for additional projects across the larger enterprise. It's really good to know the work I did there continued to be used even after I left.
When I was at Surgical Information Systems, I started out as an implementation engineer. We handled installation, configuration, and training of the software for the client IT department. When I started, each person did their own thing and the practices weren't very consistent. Over a year or so, I started writing information and knowledge guides; we used them to help onboard new people, to give to clients so they could self-install and maintain, and to provide consistency. I wrote over 30 separate guides on installation, configuration, disaster recovery, and server and client maintenance. At one point a new support manager position opened up. I applied for it, and during the interview with the CEO for the role I talked about the need for consistency and training and how I had developed these guides to help others. I talked about the need for technical leadership and how important training was for a technical team. I wasn't given the support manager role, but the CEO liked my thoughts and ideas enough that he worked with the head of the implementation team to create a new Technical Solutions Manager role. They asked me to take on that role, which at the time had 5 other implementation engineers and a DBA under it. Being proactive in standardizing processes is what started me in my first role as a people manager, a path I stayed on for several years and continue to this day.
One of my later projects at Feedzai was with one of our primary clients: they were going through a software upgrade, and the original engineering team was no longer at the company. The way Feedzai worked, the product we installed usually accounted for 80% of the client's needs, but every project had a portion of code that was modified and inserted into the product to meet specific client functionality. This particular client was one of our largest and highest-volume clients: they operated 24 production servers in two data centers, averaged 1,200 transactions per second, and had a response timeout of 240 milliseconds. One particular item this client had was a custom cache built on a Bloom filter. It was used to hold a blacklist of credit card numbers known to be bad; a Bloom filter is a probabilistic structure that will tell you a number MAY be in the list, or that a number definitely IS NOT in the list. This customization had been in place for many years and had little documentation. When we did the upgrade, our engineers had not realized that a column needed to be added to a database table, or that some of the code around the filter would automatically reject entries that were not matched properly. This caused the system to have a high error rate when we went live. It took our team two days of combing through logs to isolate and narrow down the issue, and once we had, we needed to provide a solution. During this time the engineers learned more about how the filter worked, and I worked to present to the client what the impact really was (a high error rate of false negatives). I worked with the engineers to develop the fix plan; we coded, internally tested, and released two separate fixes within 2 days, and demonstrated them in the client's staging environment a day later. I was able to explain what the filter was, how it worked, where the issue was introduced in our process, how we had corrected the process, and the path to providing a fix for the client. Because of this, the client realized it was an issue, but not as serious as first thought, and they were happy with the response time and the effort I had the team put into getting the fix in. I spent a significant amount of time not only understanding the problem but also looking at the code and working to understand how it was created in the first place.
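For reference, the core of a Bloom filter is just a bit array and a handful of hash functions, which is why it can say “definitely not blacklisted” cheaply while occasionally saying “maybe blacklisted” for a clean number. This is a generic sketch, not the client's implementation; the card numbers are placeholders:

```java
import java.util.BitSet;

// Minimal Bloom filter sketch (illustrative only). It answers "possibly in
// the set" with some false-positive probability, but "definitely not in the
// set" with certainty, which makes it a fast first check against a blacklist.
public class BloomFilter {

    private final BitSet bits;
    private final int size;
    private final int hashCount;

    public BloomFilter(int size, int hashCount) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    public void add(String value) {
        for (int i = 0; i < hashCount; i++) {
            bits.set(indexFor(value, i));
        }
    }

    /** Returns false only if the value is definitely not in the set. */
    public boolean mightContain(String value) {
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(indexFor(value, i))) {
                return false;   // a single missing bit proves absence
            }
        }
        return true;            // all bits set: the value MAY be present
    }

    // Simple derived hashing on top of String.hashCode(); a production filter
    // would use stronger, independent hash functions.
    private int indexFor(String value, int i) {
        int h = value.hashCode() ^ (i * 0x9E3779B9);
        return Math.floorMod(h, size);
    }

    public static void main(String[] args) {
        BloomFilter blacklist = new BloomFilter(1 << 20, 4);
        blacklist.add("4111111111111111");   // placeholder "bad" card number
        System.out.println(blacklist.mightContain("4111111111111111")); // true: may be blacklisted
        System.out.println(blacklist.mightContain("5500000000000004")); // almost certainly false: definitely not listed
    }
}
```

The filter itself never produces false negatives; in our case the misses came from the broken upgrade path around it, where entries never made it into the filter correctly in the first place.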