
Machine Learning isn't Kaggle Competition

Kaggle is a machine learning contest platform where participants are given a machine learning problem, a training dataset, and an evaluation criterion, and they have to build the best model. But when doing machine learning in production, these things come much later. You first have to define the business problem very clearly and determine whether it is even the right kind of problem for ML. Then you have to get the right data, clean it up, determine the right metrics, do a lot of engineering to put the model into the product, and keep it running. The Kaggle competition part is much smaller and much less significant than people imagine.

"On Thinking and Stimulated Thinking"

There is a distinction between two modes of working. One is the "explore mode", where you need to put in a lot of effort and be conscious of what you are doing. This occurs while you are learning something, say learning to drive a car or ride a bike. The other is the "habit mode", where you are able to do something effortlessly, without thinking, because you have done it so many times, like riding a bike after you have learned it, or even writing or typing, which I am doing now.

The premise of the article is that "real thinking" happens in the "explore mode": you are deliberate, you are explicative, you are curious, you (at least mentally) try a lot of things out and reach a conclusion. What a lot of people end up doing instead is "stimulated thinking", which often looks and acts like thinking but actually happens in the "habit mode".

"Stimulated thinking" is dangerous because it is not flexible: it cannot understand things deeply or see how they will change if the context changes. Simply put, "stimulated thinking" is not thinking and should not pass as such.

Explore mode requires action and reaction in an environment, where the person exploring something decides how to act and then modifies her behavior based on how the environment reacts. Habit "knows" what the reaction will be, so it behaves accordingly in fixed patterns.

The author claims that so much of our learning today is about broadcast, not action and reaction or learning by trying something out. Education is formal broadcast, and so are TV and social media.

I see parallels here with "Amusing Ourselves to Death", where Neil Postman (in part) argues that TV consumption is passive and flows from the few to the many, leaving no creativity or interactivity for the consumers of television.

Noam Chomsky also said "You just don't let a book pass from the front of your eyes". Well, to avoid stimulated thinking I am here again writing webnotes. It takes effort, but I guess after some time I'll be in the habit of putting in this effort.

But just to clarify, there is nothing wrong with habits per se. I'm able to type only because I deliberately learned the habit of typing, and without it a lot of my effort would go into the act of typing itself instead of thinking about what I am writing. The problem is thinking that you are thinking when you actually are not.

Currents of Fear

This news transcript was about fears of cancer from electromagnetic radiation from power lines. It was pretty long, but interesting. I came here from the article on the "Texas Sharpshooter fallacy". The fallacy goes like this: if you take a lot of random points, some of them will cluster, so a Texan can shoot randomly in his backyard, draw a circle around a cluster (which he is very likely to find), and say "yay, I'm a sharpshooter". The fallacy is more general, though. If you take a lot of random data, you will definitely find some pattern. Assuming cancer is randomly distributed, some of the people getting cancer will be in a small geographic area, and near some of these clusters you'll find power lines.
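The clustering claim is easy to check with a small simulation. This is only a rough sketch under my own assumptions: the "backyard" is a unit square cut into a 10 by 10 grid, the shots are uniformly random, and the names `densest_cell`, `n_points` and `grid` are made up for illustration.

```python
import random

def densest_cell(n_points=1000, grid=10, seed=0):
    """Scatter uniformly random points over a unit square split into
    grid x grid cells; return (average count per cell, largest count)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    counts = [[0] * grid for _ in range(grid)]
    for _ in range(n_points):
        x, y = rng.random(), rng.random()  # both in [0, 1)
        counts[int(x * grid)][int(y * grid)] += 1
    flat = [c for row in counts for c in row]
    return n_points / (grid * grid), max(flat)

mean_count, max_count = densest_cell()
print(f"average per cell: {mean_count:.1f}, densest cell: {max_count}")
```

Even though no cell is special, the densest cell ends up holding more points than the average one. Drawing the circle around it after the fact is exactly what the sharpshooter does.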

What the Swedish study did was test for 800 different types of ailments that might be linked to radiation from power lines. They found that leukemia was correlated, and a news agency published it. Subsequent studies in Sweden and elsewhere found no such effect.
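To get a feel for why testing 800 things is a problem, here is a back-of-the-envelope calculation. It assumes the tests are independent and that each one uses the conventional 5% significance threshold; neither assumption comes from the transcript, and the function name is my own.

```python
def chance_of_false_alarm(n_tests, alpha=0.05):
    """Probability that at least one of n_tests independent tests looks
    'significant' purely by chance, when no real effect exists."""
    return 1 - (1 - alpha) ** n_tests

n_tests = 800   # number of ailments tested, from the notes above
alpha = 0.05    # assumed per-test false positive rate

expected_false_positives = n_tests * alpha
p_any = chance_of_false_alarm(n_tests, alpha)

print(f"expected chance findings: {expected_false_positives:.0f}")
print(f"probability of at least one: {p_any:.6f}")
```

Under these assumptions you would expect about 40 spurious "findings" even if power lines did nothing at all, so a single correlated ailment is not much evidence by itself.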

There are many speakers in the discussion, and there is one particular guy who speaks for protecting people from radiation from power lines. His argument is similar to Taleb's precautionary principle: the human body is complex and we don't understand it well enough, hence we should be careful. I compare it with Taleb's argument on GMO, though the important difference is that with GMO the harm is large and systemic. The harm is not that someone will fall ill from eating GMO but that GMO can disrupt the whole ecosystem, and we don't know enough biology to be sure that it won't. Another difference is the size of the effect. In the power lines case, the health effects, if present, are likely to be small, because otherwise we would already know about them. With GMO the disruptions can be very, very large. I remember some episodes from the Discovery Channel where the introduction of a new species into a habitat completely destroyed the habitat. A GMO plant can grow quickly, interbreed with non-GMO plants nearby, and spread pretty fast.

The effect size part is very important. There can be millions of things which "can be" negatively affecting your health. According to VastuShastra, the architecture of your house affects your health (maybe through light and ventilation). The material of your utensils might make a difference, the lighting you use at home might make a difference, the method of preparing food makes a difference. There are all sorts of things, and all sorts of reasons you might think these things make a difference. Because there are so many things, it is impossible to take care of every one of them, and you cannot know if the effects are real (without the humongous effort of carrying out very long, longitudinal experimental studies). So maybe just ignore all of these and focus on the things which have a stronger effect.

Community vs Compliance

This is a very interesting discussion about GPL lawsuits and GPL enforcement. The main point of debate is whether or not to use lawsuits to enforce the GPL. Both parties agree that upstream contributions and user freedoms are important, but they disagree about the particular methods to pursue that goal. The side which pushes for lawsuits or threats argues that otherwise companies won't comply, we don't get the code back, and users lose their freedom. The other side argues that lawsuits or their threats turn companies into enemies. They become less likely to cooperate, meaning worse software. Instead, the way forward should be internal negotiation and changing the culture from the inside. Companies eventually do contribute upstream, both because they benefit from it and because the law requires them to.

My personal opinion is that the GPL is important, and so is the possibility of a lawsuit. If a company is being very antagonistic there needs to be a lawsuit, because once companies know that there is no realistic chance of a lawsuit, the GPL is powerless. Stallman himself gave examples of cases where they were able to get GPL compliance simply by asking companies to comply and telling them that they are legally required to do so. So to me the question is really about the threshold, not about whether there should be lawsuits at all.

Anyway, as Linus says, it is often easier to change things by working with your opponents rather than against them, but working against them should remain a realistic scenario for when things don't work out.

The Streetlight Effect: A metaphor for knowledge and ignorance

This was a fascinating piece about the Streetlight Effect and using it as a metaphor to describe knowledge and ignorance. The really interesting bit was where the author described totalitarianism as something where there is only one view point, be it boilerplate Marxism or free market capitalism or anything else. The totalitarian view point assumes that its framework can explain everything. Totalitarianism is like the streetlight where you try to find all your answers, but they don't exist there. The opposite of totalitarianism isn't another view point but darkness, where good and bad aren't well defined and where the difference between truth and falsehood is not known.