The pythonic solution to balance the imbalanced

Photo by 青 晨 on Unsplash

Imbalanced data is a very common occurrence in real-world domains, especially when the subject of interest for a decision-making system is a rare but important case. This can be a problem when a future decision is to be made based on insights from historical data. Inadequate data from the minority case can hamper the robustness of the new decision being made.

Imbalanced data shows up in almost every real-life application. For instance, the average customer churn rate for wireless carriers in the US is somewhere between 1–6%. …
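
To make the imbalance concrete, here is a minimal sketch of one simple way to rebalance a dataset: random oversampling of the minority class. This is an illustrative stand-in written with the standard library only, not necessarily the approach the full article takes. The function name `random_oversample`, the toy rows, and the 3% churn share (picked from within the 1–6% range above) are all assumptions made for the example.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority rows until every class
    has as many samples as the largest class (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    counts = Counter(labels)
    majority_size = max(counts.values())

    out_rows, out_labels = list(rows), list(labels)
    for label, count in counts.items():
        # All original rows that belong to this class
        pool = [row for row, lab in zip(rows, labels) if lab == label]
        # Top the class up with random duplicates from its own pool
        for _ in range(majority_size - count):
            out_rows.append(rng.choice(pool))
            out_labels.append(label)
    return out_rows, out_labels


# Toy data: 3 churners (label 1) among 100 customers, i.e. a 3% churn rate
labels = [1] * 3 + [0] * 97
rows = [[i] for i in range(100)]

balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))
```

Duplicating minority rows is the bluntest option and can encourage overfitting, so treat it as a baseline to compare against rather than a recommendation.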


Reimagining your surroundings with the new iPhone

Photo by UNIBOA on Unsplash

You want to furnish your room with a new piece of furniture but are unsure whether it would look nice there? Before AR, you would probably have had to measure its dimensions and imagine how it would fit into the space. Now, what if I told you that you can just lift your iPhone and project an augmented version of the furniture right onto that corner of your room to see whether you like it or not? Spot on, the era of Augmented Reality aka. …


How to select the right predictors using ML algorithms

Edited by the author based on a photo by Markus Spiske on Unsplash

In the first part of this series, we discussed what feature selection is about and walked through some examples using the statistical method. This article follows up on the original by explaining the other two common approaches to feature selection for Machine Learning (ML), namely the wrapper and embedded methods. Explanations will be accompanied by sample code in Python.

To recap, feature selection means reducing the number of predictors used to train an ML model. The main goals are to improve predictive accuracy (by removing redundant predictors), to reduce computation time (fewer predictors, less time needed to compute), and to improve the interpretability of the model (dependencies between predictors are easier to study when there are fewer of them). The filter method, which is based on statistical techniques, can generally be applied independently of the algorithm used for an ML model. …
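
As a rough illustration of the filter idea recapped above, the sketch below scores each predictor against the target and keeps the top k, without involving any ML model. It assumes Pearson correlation as the statistic, which is only one of several possible choices; the helper names `pearson` and `filter_select`, the feature names, and the numbers are all made up for the example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / sqrt(var_x * var_y)


def filter_select(features, target, k):
    """Rank predictors by |correlation with the target| and keep the top k."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k]


# Hypothetical toy data: one informative predictor, one noisy, one constant
features = {
    "ad_spend": [1.0, 2.0, 3.0, 4.0, 5.0],
    "noise": [3.0, 1.0, 4.0, 1.0, 5.0],
    "constant": [2.0, 2.0, 2.0, 2.0, 2.0],
}
target = [2.1, 3.9, 6.2, 8.0, 9.9]

selected = filter_select(features, target, k=1)
print(selected)
```

Because the score depends only on the data, the same ranking can be reused whichever model is trained afterwards, which is what sets the filter method apart from the wrapper and embedded methods.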


A lesson of business model innovation and what you can learn from Apple

Photo by Hugo Agut tugal on Unsplash

Under the shadow of the corona pandemic, Apple has staged its second major online event of 2020, after the WWDC in June. However, unlike a typical Apple fall event, no iPhone was announced at the “Time Flies” event this year. While we did get shiny hardware upgrades with the new Apple Watch and two variants of the iPad, there was a subtle message at every turn of the event that could easily go unnoticed: a silent flex on strengthening Apple’s ecosystem and its growing services business.

This article looks at the new direction Apple has been quietly taking amid a global economic downturn caused by the pandemic and geopolitical tension. In the heated battlefield of digital subscription businesses, against the likes of Netflix and Spotify, Apple has also repositioned its strategy to leverage its ecosystem in fending off competition. …


How to select the right predictors using statistical measures

Photo by Maarten van den Heuvel on Unsplash

Too many cooks spoil the broth.

Even back in 1575, George Gascoigne already knew that a sumptuous bowl of broth can’t be achieved with too many cooks in the kitchen. The wisdom of that proverb still holds today, yes, even in Machine Learning.

Have you ever wondered why the performance of your model hits a plateau no matter how you fine-tune those hyperparameters? Or, even worse, why you only see a mediocre improvement in performance after using the most accurate set of data you could find? …


A comprehensive guide to understanding Neuralink and Brain-Computer Interface (BCI)

Illustration by Phonlamai Photo on Shutterstock

Elon Musk struck again!

Remember how, in 1996, Dolly the sheep became the first cloned mammal in the world? Some twenty years later, Gertrude the pig became the first animal to get a neural implant in her skull, one which could potentially revolutionize how we communicate with computers and machines. “That’s one small step for piglet, one giant leap for mankind.”

In this article, we will explore the neurotechnology behind Neuralink, how Musk could potentially unveil a new world of bionic brains, what it means for us in the future, and, of course, the possible ethical ramifications.

Understanding the jargon

Neurotechnology: an assembly of methods and instruments that enable a direct connection of technical components with the nervous system [1], so that people can understand the brain and various aspects of consciousness, thought, and higher order activities in the brain. …


Hint: Do science and numbers speak louder than your ego?

Photo by Marie Jo on Pinterest

In early February 2020, I was catching a flight back from Singapore to Germany after briefly attending my best friend’s wedding in Kuala Lumpur. I kept a mask on at the airport, during the flight, and all the way until I touched down in Germany. While I was commuting home by train from the airport, I could see people throwing that stigmatizing glance at me. It was not hard to guess the association they were making between a mask, an Asian look, and the coronavirus.

Fast forward a few months, and masks are now compulsory in Germany on public transport and in enclosed areas like supermarkets or shopping malls. The same rule applies in most parts of Europe, and it is now a common sight to see people masked in public. But across the Atlantic, people are getting more polarized over the debate: to mask or not to mask. …


The secret to building a better model.

Photo by Franki Chamaki on Unsplash

The ultimate goal of every data scientist or Machine Learning evangelist is to create a better model with higher predictive accuracy. However, in the pursuit of fine-tuning hyperparameters or improving modeling algorithms, the data might actually be the culprit. There is a famous Chinese saying, “工欲善其事,必先利其器”, which literally translates to: to do a good job, an artisan must first sharpen the tools. So if the data are generally of poor quality, the results will be subpar at best, regardless of how good the Machine Learning model is.

Why is data preparation so important?

Photo by Austin Distel on Unsplash

It is no secret that data preparation is ‘an essential but unsexy’ task in data analytics, and more than half of data scientists regard cleaning and organizing data as the least enjoyable part of their work. …


Or just a geopolitical tug-of-war between the U.S. and China.

Photo by Bernard Hermant on Unsplash

As the world gears up for 5G, the fifth-generation technology standard for cellular networks, Huawei is touted as a market-leading vendor of both equipment and infrastructure around the world. This dominant market position, however, has not come without friction. In October 2012, the U.S. House Intelligence Committee released a report concluding that Huawei is a threat to national security and recommending that it be banned in the US. In the following years, the Trump administration has actively and consistently pressured American allies to exclude Huawei from their 5G roll-out plans.

Some argue that the accusation against Huawei is unfounded. Is Huawei really endangering the security of 5G networks? Is the accusation simply a bargaining chip for the U.S. amid rising geopolitical tension with China? Or is it even a protectionist attempt to remedy Washington's realization that it has failed to develop a strategically important technology? …


Cracking a data scientist’s dilemma

Photo by Clarisse Croset on Unsplash

Recently I have been working on my thesis on predicting online marketing conversion rates. In case you don’t know, the conversion rate is the percentage of visitors to your website that complete a desired goal. [1] Basically, what I have to do is crunch through a database of customer interactions with the corporate website, then predict how many of those customers will convert, i.e. successfully submit a Request for Quote (RFQ), a first sign that the customer is interested in certain products or service offerings.
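
For readers who prefer to see the definition as arithmetic, here is a tiny sketch of the conversion rate as defined above. The function name `conversion_rate` and the visitor and RFQ counts are made up for illustration; they are not figures from the thesis.

```python
def conversion_rate(visitors, conversions):
    """Percentage of visitors who completed the desired goal."""
    if visitors == 0:
        return 0.0  # avoid dividing by zero when there was no traffic
    return 100.0 * conversions / visitors


# Hypothetical month: 2000 website visitors, 50 submitted RFQs
rate = conversion_rate(visitors=2000, conversions=50)
print(f"{rate:.1f}%")
```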

My first instinct was to immediately deploy the plethora of machine learning (ML) models I was aware of, to find out which had the lowest classification error and could predict conversions most accurately. I presented the preliminary results to my supervisor and it all seemed to go pretty well until he asked me…

About

Jack Tan

A Bitcoin aficionado, a tech enthusiast, an engineer, an entrepreneur wannabe, a world traveler. Find me at https://www.linkedin.com/in/jack-yee-tan-13221196/.
