COFFEE & DATA SCIENCE

Coffee??

You may be wondering why there's a dedicated page for coffee on this website. There is a perfectly good reason for that. TL;DR - You're not missing much if you don't reach the end of this page. This is just me ranting about my data science journey with coffee on the side.

Anyways, I didn't start out as an avid coffee addict, nor did I pay much attention to anything coffee related. I was always busy working on my Jupyter notebooks for yet another hot-off-the-press data science project. Coffee was just there to keep me going. Nothing more. Nothing less.

I'd usually start the day out with an iced Americano. A cold brew for the afternoon and an iced latte with extra double shots to close out the night. One day, as many aspiring data scientists and machine learning engineers would, I ran into a wall. I found myself trapped.

Aside from the numerous programming sins I'd committed setting up my code, I couldn't bear the sight of another convoluted error message. But I wanted to push forward because that's what we do as data scientists and engineers. Completing one project after another, I was in the zone, just not content with the outcomes.

The thing was, the moment I would push a project over to the next step (let's call it a real-life implementation), none of it would really work the way it was supposed to. What meaning does any of this have then? Do we mindlessly create this perfect sandbox and simulate scenario models for academic purposes? And no, I'm not talking about the Iris project.

All of these datasets were nicely cleaned up and ready for smooth execution. What could possibly be the issue here? These datasets were impeccable, yet they lacked so many aspects of what one would consider wholesome data (in its distribution, balance, attributes, and so on and so forth). Then again, why and where would I search for an alternative when even my hastily stacked 6-layer CNN model could easily reach 99.8% accuracy in ~50 epochs? Still, only a few are willing to openly discuss their models performing at an abysmal accuracy on actual real-world input data. Perhaps the train and validation sets were split incorrectly. Perhaps the scope the model was trained on was just not feasible for real-world inputs.

Well, I too moved on.

At work (in my previous role), the situation was quite the opposite. The data I was given, or rather the data I helped create under great restrictions and constraints, was nothing short of a disaster. As most of us realize (you will soon if you haven't already), real-world data is dirty and full of imperfections. Are we to just make general imputations and drop NA rows? At what cost?
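To put a rough shape on that "at what cost" question, here is a minimal sketch in plain Python. The toy records and the field names (`temp`, `humidity`) are made up purely for illustration, and the two helper functions are my own hypothetical stand-ins, not any library's API. It only shows the trade-off: dropping rows throws data away, while mean imputation keeps every row but fills the gaps with values that were never observed.

```python
from statistics import mean

# Hypothetical toy records; None marks a missing value.
rows = [
    {"temp": 21.0, "humidity": 40.0},
    {"temp": None, "humidity": 42.0},
    {"temp": 23.5, "humidity": None},
    {"temp": 22.0, "humidity": 45.0},
    {"temp": None, "humidity": 47.0},
]


def drop_na(records):
    """Keep only the records that have no missing fields."""
    return [r for r in records if all(v is not None for v in r.values())]


def impute_mean(records):
    """Fill each missing field with the mean of that column's observed values."""
    columns = list(records[0].keys())
    fill = {c: mean(r[c] for r in records if r[c] is not None) for c in columns}
    return [{c: fill[c] if r[c] is None else r[c] for c in columns} for r in records]


kept = drop_na(rows)            # rows that survive dropping
filled = impute_mean(rows)      # same row count, gaps replaced by column means
rows_lost = len(rows) - len(kept)

print(f"dropped {rows_lost} of {len(rows)} rows")
print(f"imputed temp for row 2: {filled[1]['temp']:.2f}")
```

On this toy data, dropping loses more than half of the rows, while imputing gives two rows the exact same made-up temperature. Neither is free, which is the whole point of the question.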

Between keeping to a reasonable timeline and bringing management's expectations back down to Earth, I would spend most of my days scrubbing data from its source and weaving in missing conditional data points from another remote source. Should I have named it ETLTLELT at this point? How would this change in data affect the model training and its behavior?

Am I a full-time data engineer in disguise now? When will I have time to glean any insights using my data science skills within the given 9-5 work hours? Working under uncertainty with a tight timeline definitely has its characteristics (I tend to drink more coffee when I'm pressed). Nonetheless, I made it work. I met deadlines. I made sure I brought more ideas to the table on what the next step should be. Also, I made sure the work put in was well reflected in the outcomes and vice versa. You know, I was still missing something. Oh, it could be documentation.

I know I must sound quite silly to you by now. Perhaps, you've already come up with potential solutions and mitigation tactics of your own in the midst of reading my complaints. Haven't you also wondered at least once where and how data science really shines in our lives? We've established that data science is not as easy to implement in production or in real life per se.

The culprit? I'm gonna say it out loud and point the finger at the data (We knew that though). Garbage data. Garbage data science. Garbage models. No recyclables. Just a full stack of disappointments. I digress. So, where does coffee come into play in all this? I'll try to make the background story as short and bearable as possible.

There was this one coffee shop in Koreatown that I used to consider my second home (A lot has changed now. I have not been there in a long time). I spent a considerable amount of time there coding and drinking coffee. I eventually became friends with many of the baristas and I knew everyone's weekly schedule at one point. And as an unforeseen result, I could tell when the sugar level for their signature cold brew latte was changed. I could also tell when the espresso wasn't fully dialed in. I guess all of this could be seen as a consolation prize for spending a lot of time and money at the place.

The point is, I naturally started to take notice of coffee and the coffee shop itself.

Well, I lied.

I'm gonna make a confession early. I didn't purposely start paying attention to coffee because I saw something in it. It was more that I found myself procrastinating from coding and getting caught up in it. When you've been involuntarily looking at code chunks for hours, anything, I mean anything, can be interesting. That's the truth (deep breath). Making coffee seemed so much more invigorating than my broken Python script. I started to observe the coffee bar more often as a poor excuse to take a break.

Weeks and months went by, and now I could figure out what's what (at least that's what I thought). My conceited self decided that a practical implementation was in order (I mean, no surprise there). So I started to do my research on the equipment and tools. I was like a kid planning his Christmas present list. They say men never grow up, only their toys get bigger. I am a living testament to that quote.

In my head, I was already halfway to becoming a full-fledged barista. How difficult could it be? I've seen them leisurely mix drinks and call out people's names countless times. My kitchen countertop began losing its real estate inch by inch as packages started rolling in at my doorstep. To name a few: a Mahlkönig EK43 grinder, a La Marzocco GS3 Manual Paddle Espresso Machine, a Sanremo YOU Automatic Espresso Machine, Loveramics mug sets, and a handful of JibbiJug and WPM pitchers. Not to mention, a dedicated commercial water filter system under the kitchen sink. At one point, it became a sport to collect these tools (thank you Instagram...). I would swoon over these shiny objects that I barely knew how to use. I just got a kick out of having something to do other than coding, to be honest. Have I gone overboard? Of course I have, and I didn't care.

And... Voilà!

I had unmistakably turned my kitchen into a semi-commercial home cafe.

Now what...

One fateful Saturday morning,

I drove down to the nearest coffee shop and asked for their bulk espresso package. Uh... a medium-dark roast eh? Whatever that means. I rushed home with an $85 coffee bag in my hand. Upon quick inspection, the beans seemed fine. It was what I was used to seeing over at my second home. I thought to myself, maybe I'll pick up theirs on my next visit. Do I use a single-dose or a double-dose basket? How many grams of ground coffee should I put in it? How coarse? How fine? With a deep groan, I reached for my MacBook Pro and started reading as the aroma from the coffee bag filled my kitchen. About 30+ Chrome tabs later, I had gathered enough information to go on.

Next, I called Brian, who was the coffee shop manager at the time, for help. We had become good friends over the years. Notably, it was Brian who helped me gather most of the larger equipment. He had agreed to come over and do a shotgun training on espresso. The more I listened to him explain how to extract espresso, the more my confidence diminished. His demonstration was graceful to say the least. My imitation was mediocre at best. I was reminded with each sip how bitter it was. I managed to butcher what was supposedly a full-bodied, high-quality espresso. We were using identical beans and equipment. His espresso tasted so much better than mine. Is this the coffee equivalent of "it works on my machine"?

What in the world...

I agonizingly suppressed my frustration in front of Brian as our session was nearing its end. He was a great teacher. I knew that he'd trained hundreds of people over his career. I must've been somewhere out on the left tail of his trainee distribution. Again, I digress. I made sure I wrote down and memorized all of the key points he emphasized. Honestly, not everything made sense, but I took his word for it.

The following Monday, and for the rest of the year, I made it my mission to practice pulling shots. Rest assured, I did not drink all of them. Otherwise, I might not be alive to tell this little story of mine. I kept a tally sheet on how each shot was different (the differences came more from my execution and habitual mistakes than from other elements). If I wasn't coding for work, I was pulling shots. I can tell you, my caffeine tolerance was getting immensely and progressively higher.

By the end of the week, I had already exhausted the 5 lb bag. I would frequent the well-known coffee shops in LA and OC for their coffee beans, with great expectations for what I was about to experience with espresso extraction. Some I detested for their crappy quality. Others I was infatuated with for how complex and beautiful the beans were (Some of them can be very pricey). A few months down this road, I became good at extracting shots with palatable taste, timing, stripes, volume, and complexity. It was simply amazing and I finally found myself having fun. Naturally, when you're having fun (with money spending involved), you willingly expand on it. Next, I set my eyes on Latte Art. It couldn't be much harder than espresso, right?

I wanted to accomplish this part without Brian's help. I had made up my mind to figure this one out alone. By alone, I mean with YouTube, Instagram, Reddit, and other remote sources of information. Just not a real person. Not yet. That's how I also learned to code. I was familiar with doing things alone. I mean, I've seen them pour latte art. You just steam milk and wiggle the pitcher while you pour. Heck, I've even been to WLAC a few times in the past. I was there witnessing many of the greats from all over the world competing with their pours. I mustered up the energy to take this head-on.

During my last trip to Costco, I had remembered to pick up 6 gallons of whole milk, anticipating that the initial trial and error would need that much. And to no one's surprise, my hunch was on point. What an awful screeching sound I was making trying to steam milk! Was I killing the milk along with the pitcher? The microfoam splashed all over my kitchen seemed to indicate my utter incomprehension.

Okay. YouTube it is. I quickly searched for how-to videos. I must've watched some of the videos more times than I can remember. Pause on one frame, rewind, and repeat. Ugh, these people make pouring look so effortless and simple. Why doesn't mine come out like that? Yet, the blame was all on me, nothing else. After all, I'd already paid for the best, what the pros use, what the coffee shops use. This so-called discovery stage was getting quite frustrating.

I was glued to my phone. My eyes were fixated on Instagram reels with some guys doing Latte Art. The unsightly scene must've looked similar to my dog waiting for scraps from dinner, agitated and uncomfortable. I often fell asleep unknowingly with my phone on my chest and found it out of battery in the morning. About a week went by. I had found a short break during work and decided I'd give it a go one more time before heading back into validating my sentiment model and whatnot.

Whoa. There it was - an almost perfectly symmetrical heart, but I didn't know what I did or how I managed to pull it off. I took a quick picture for the record. I hurriedly rinsed the cup and the pitcher. I had to replicate it. Had I finally found my random state or minimal learning rate for latte pouring? I needed to make sure.

I held my breath as I started pouring my second cup of the day. There it was. The foam began to surface on top of the espresso. I carefully started to wiggle the pitcher. What a marvelous sight it was. I was finally getting the gist of pouring! Again, it wasn't picture perfect, but this was proof that I was getting somewhere. The last time I was in a pinch like this was in my machine learning engineering class at Berkeley.

We weren't given much information to go on and were asked to set up a Horizontal Pod Autoscaler for our FastAPI model in a Kubernetes deployment. We understood that we had to tweak our .yaml file for the change in deployment to take effect. For some odd reason, the pods weren't being replicated. Did we set the resource threshold target incorrectly? Each of us was trying different combinations to see if we could get the pods to replicate in the local environment, but to no avail.
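For context on why we kept second-guessing the target, here is a simplified sketch of the scaling rule described in the Kubernetes documentation: the desired replica count is the current count scaled by the ratio of the observed metric to the target, rounded up, with no change when that ratio is within a tolerance of 1.0 (0.1 by default). The function name and the numbers are mine, for illustration only. The real controller also handles min/max replica bounds, missing metrics, and stabilization windows, none of which is modeled here.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric, tolerance=0.1):
    """Simplified HPA rule: ceil(current_replicas * current_metric / target_metric).

    If the metric ratio is within `tolerance` of 1.0, no scaling happens.
    This is an illustrative sketch, not the actual controller logic.
    """
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


# Hypothetical numbers: 2 pods averaging 90% CPU against a 50% target.
print(desired_replicas(2, 90, 50))
# Close enough to the target (52% vs 50%), so nothing changes.
print(desired_replicas(2, 52, 50))
```

In other words, a wrong target changes how many pods you get, but it shouldn't stop replication altogether once the load is well above it. That was part of what made our situation so confusing.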

We'd take turns taking time off our work to attend available TA sessions, only to get confirmation that we had done everything right (theoretically, that is). Was our .yaml file cursed beyond repair? The submission deadline was approaching and our frayed nerves were visible. We didn't lash out at each other.

The silence over the Slack call was deafening. Sheer will to pass this class was holding us together by a thread until one of us exclaimed "It's the damn space!". If you've ever worked on a .yaml file, you learn at some point that indentation will make or break your file. What a complete waste of all of our time. The culprit was a tab that looked like a space.
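In hindsight, a few lines of Python would have saved us hours. YAML does not allow tab characters for indentation, so a check like the minimal sketch below flags any line whose leading whitespace contains a tab. The helper name and the sample manifest fragment are hypothetical, not our actual file.

```python
def find_tab_indents(text):
    """Return (line_number, line) pairs whose leading whitespace contains a tab."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        # Slice off everything after the leading run of spaces and tabs.
        indent = line[: len(line) - len(line.lstrip(" \t"))]
        if "\t" in indent:
            hits.append((number, line))
    return hits


# Hypothetical fragment: the third line is indented with a tab instead of spaces.
sample = "spec:\n  minReplicas: 1\n\tmaxReplicas: 5\n"

for number, line in find_tab_indents(sample):
    print(f"line {number}: tab in indentation -> {line!r}")
```

To run it against a real file, read the file's contents into a string and pass that to `find_tab_indents`. Most editors can also render whitespace characters, which does the same job visually.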

In the near future, I would also learn that when pouring latte art, if the pitcher spout is even an eighth of an inch too far above the rim of the cup, it ruins your canvas and ultimately the pour. I laugh now thinking about how closely these two unrelated, painstaking processes parallel each other. A fun fact: Both pods and latte art are ephemeral.