Posts Tagged ‘editorial’

Kaggle – my brief shining moment in the top 10

I started playing with the (all too addictive) Kaggle competitions this past December, on and off.

This past week I reached a personal high point by making the top 10 in a featured competition for the first time.


Since then, my ranking has dropped a bit, but there’s still time for me to take first! 😉 Just don’t hold your breath…

Magile Manifesto: Deprecating over- and mis-applied “Agile” concepts

I wrote this after working in a couple of “Agile shops” that embodied the typical misapplication, misinterpretation, and commonly correlated (though technically unrelated) evils associated with the mutated forms of Agile, Scrum, and Lean now reaching Business Intelligence and other non-software business units en masse.

Magile* Data Science Principles:

– Interactions over buzzwords and fluff
– Accurate information over false-but-compelling “high level” simplified reporting
– Collaboration over cutting throats
– Adaptive planning over planless adaptation
– Transparency over secrecy
– Individuals over groups

*  Miller’s rebooted Agile

Editorial: Notes v. aRticles and Tuts

My original plan was for this to be a blog composed of code snippets and miscellaneous notes, taken from my own Evernote notes with only minimal editing.

This accomplished a couple of goals — it gave me a practical use for my notes and allowed me to contribute knowledge to “the community” (by which I could mean the Data Science “community”, but by which I effectively just mean the Internet) in a time-effective manner. It also used to provide me with some small amount of ad revenue until

  1.  some visitors complained that the ads detracted from the blog’s UX and
  2. Google AdSense froze my account anyway, because spammer-hackers*** promoting a network of Russian international dating websites used a clever referrer-bombing hack: they sent zombie traffic to non-existent URLs appended to my domain (and to thousands of other folks’ domains like mine).

However, I found myself in a bit of a dilemma. The short code snippets and brief programming notes gave me a fair amount of search engine traffic, but they were burying the smaller number of higher-quality, longer tutorials and articles (“aRticles”, for you R devotees) that folks from the R-Bloggers crowd come looking for.

My initial solution was to link those tuts and articles from the homepage, but this seemed insufficient. I could solve the issue with tags, except that, to follow the protocol used by R-Bloggers and others, only my article-style posts should have “R” tags, whereas I have lots of snippets and quick tips that need that tag to be properly indexed. Even if I did conform to that standard, it would only solve the problem for R-related material.

I could create completely separate blogs for the 2 types of content, but that would waste precious time duplicating logistical tasks, would require me to share even more links for such a humble amount of content, and, since I love this domain name, I don’t want to detract from it with some other closely related but different domain.

Let me know if you have any thoughts. I haven’t decided, but I think I may try forcing users entering through the homepage to choose between the 2 distinct sections of the site, each directing to a subdomain with a separately indexed blog.


*** More power to them! I think it’s pretty funny and clever, though I figured the folks back in Mountain View (i.e. Google) were smart enough to be able to just adjust the ad revenue calculation to remove that part of the traffic. Oh well, this blog wasn’t exactly about to buy me a luxury yacht with Google’s cash anyway.

All Hail the Data Science Venn Diagram

Forged by the Gods, the ancient data science Venn diagram is the oldest, most sacred representation of the field of data science.


I’ve been in love with this simple diagram since I first began working as a data scientist. I love it because it so clearly and simply represents the unique skillset that makes up data science. I’ll write more on this topic and how my own otherwise eclectic skillset coalesced into the practice of professional data science.

I wish I could take credit for creating this simple-but-totally-unsurpassed graphic. Over the past couple of years I’ve often used it as an avatar, and if you look closely enough you’ll even find it in the background of my (hacked) WordPress header image. While I like to think that it was immaculately conceived, the word on the street is that it was created for the public domain by Drew Conway, who is the co-author of Machine Learning for Hackers*, a private market intelligence and business consultant, my fellow recovering social scientist, a recent PhD grad from NYU, and a fast-rising name in data science (yeah, he wants to be like me).

*It’s an O’Reilly book on ML in R, which I kept with me at all times for at least a year; the code is on GitHub and I highly recommend it, though it’s a little basic and its social network analysis section is based on the deprecated Google Social Graph API.

Protected: Another Take on library() v. require()
