Jul 23 2014

Disclaimer: There might be a better solution for this scenario. I've only been working with Ruby and Rails for 6 months. You're welcome to drop me a tweet or a message with better approaches.

 

At CartoDB, as one of the required steps for the 3.0 version, we recently needed to change the URLs from the "classic" format of USER.cartodb.com/whatever to ORG.cartodb.com/u/USER/whatever.

This is the kind of change that usually causes lots of headaches. At a previous job a similar change required a huge refactor of the MVC engine and the hyperlink building system. At another it was quicker, but only because the only solution was an (ugly) regex hack deep inside the MVC framework.

Rails is initially all unicorns and rainbows. A magical routing system that lets you write less code, that automatically maps verbs to controller actions, that even differentiates between acting upon items or collections... A developer's heaven when it comes to ease of MVC configuration.

Except that for advanced scenarios this magic fades away and you need to fall back to a more traditional and detailed configuration.

This is great for the majority of typical websites:

scope '/admin' do
  resources :posts, :comments
end

 

But now imagine these new rules:

  • Any URL may or may not come with an additional fragment that includes a variable
  • This fragment might be optional, or might be mandatory

 

How do you specify an optional parameter in the Rails routes file? Like this:

get '(/u/:user_domain)/dashboard/tables' => ...

 

Looks easy... but remember that the param is optional. It might not be present, so we need to make sure it is always either sent or nil (but defined) so the code doesn't break. For this I implemented a before_filter in the base ApplicationController, so the param is always present.
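
To make the "sent or nil, but defined" idea concrete, here is a minimal plain-Ruby sketch that imitates the optional segment with a regular expression instead of the Rails router. The method name recognize_tables_path is made up for this illustration; it is not part of Rails or of the CartoDB codebase.

```ruby
# Plain-Ruby imitation of the optional '(/u/:user_domain)' segment.
# Hypothetical helper: in the real app the Rails router does the matching.
# Returns a params-like hash when the path matches, or nil otherwise.
# The :user_domain key is always present, even when its value is nil.
def recognize_tables_path(path)
  pattern = %r{\A(?:/u/(?<user_domain>[^/]+))?/dashboard/tables\z}
  match = pattern.match(path)
  return nil unless match

  { user_domain: match[:user_domain] }
end

p recognize_tables_path('/u/acme/dashboard/tables') # new, organization-style path
p recognize_tables_path('/dashboard/tables')        # classic path, user_domain is nil
p recognize_tables_path('/somewhere/else')          # not this route at all
```

The point is the second call: the key exists but holds nil, which is the state the before_filter guarantees for the rest of the code.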

 

Then everything looked ready... until I checked the code and found a myriad of links built in every possible Rails way: relative hardcoded, full path hardcoded, xxxx_path, xxxx_url, redirect_to xxxx, link_to xxxx...
And not only that: I also had to support that optional :user_domain parameter in most URLs, plus other ids or params, as in this sample route:

(/u/:user_domain)/dashboard/visualizations/shared/tag/:tag/:page

 

In the end, I decided to take the following new (and mandatory from now on) approach:

  • Full "literal-based" route descriptions. They don't look fancy or magical anymore, but they are practical, they work, and even somebody who doesn't know Rails can tell what each one points to.
  • Always given a name/alias ("as xxxx"), so they can be called from views and controllers with the _url/_path helpers without collisions, ambiguity or Rails' magical autogeneration of URLs (which has given us some problems and, for example, doesn't allow parameters).
  • Every application link has to be built with _url/_path. No more handcrafted URLs.

This makes the code that builds links a bit longer than usual:

<%= link_to tag_name, public_tag_url(user_domain: params[:user_domain],tag: tag_name) %>

 

But we can be 100% sure where that public_tag URL will go just by taking a quick look at the routes file, we support the optional parameter, and we make any future refactor easier, because searching for every link is now much simpler than it was (it took 2 days and some bugfixes to make the codebase uniform).
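
As a rough illustration of why a single named helper copes well with the optional parameter, here is a framework-free sketch of a path builder. It only imitates the idea of a generated _path helper: the '/tag/:tag' pattern and the method body are assumptions made for this example, not the actual CartoDB route.

```ruby
require 'erb'

# Hypothetical stand-in for a named route helper such as public_tag_path.
# Assumes a route shaped like '(/u/:user_domain)/tag/:tag' for illustration.
# When user_domain is nil or empty, the optional prefix is simply left out.
def public_tag_path(tag:, user_domain: nil)
  prefix = ''
  unless user_domain.nil? || user_domain.empty?
    prefix = "/u/#{ERB::Util.url_encode(user_domain)}"
  end
  "#{prefix}/tag/#{ERB::Util.url_encode(tag)}"
end

puts public_tag_path(tag: 'maps')
puts public_tag_path(tag: 'maps', user_domain: 'acme')
```

Every caller passes the same arguments whether or not the user belongs to an organization, so the decision about the URL shape lives in one place.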

 

As for the "might be mandatory" part, what I did was add another before_filter by default in the base Admin-scoped ApplicationController, and then disable it (with skip_before_filter) in those controllers where the parameter was optional.
Wherever it is mandatory, if the user is logged in and belongs to an organization, the app will redirect them to the organization-based URL. But for things like APIs we keep both the old and new format to preserve compatibility with existing third-party applications and projects.
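
The redirect rule can be sketched as a small pure function. This is only an approximation under stated assumptions: the method name and its keyword arguments are hypothetical, the target follows the ORG.cartodb.com/u/USER/whatever format described earlier, and the real check lives in a Rails before_filter rather than in a standalone method.

```ruby
# Hypothetical, framework-free version of the "mandatory" check.
# Returns the organization-based URL to redirect to, or nil when
# no redirect is needed.
def organization_redirect_target(path, username:, organization:)
  # Anonymous visitors and users without an organization keep the classic URL.
  return nil if username.nil? || organization.nil?
  # Paths already in the new format are left alone.
  return nil if path.start_with?('/u/')

  "#{organization}.cartodb.com/u/#{username}#{path}"
end

puts organization_redirect_target('/dashboard/tables',
                                  username: 'alice', organization: 'acme')
```

Controllers that must accept both formats, such as the API ones, would simply never run this check, which is the role skip_before_filter plays.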

 

 

Overall, I don't blame Rails for making it so easy to reduce code. I understand that for the majority of scenarios it really simplifies code and speeds up this kind of configuration, and nobody could have foreseen these routing changes... But it is good practice to be consistent and, even if you have 5 ways of generating URLs/links, to settle on just one that is flexible enough for hypothetical future changes.

 

Magic has a problem: to use it, you need to be a mage. Right now we're a team of (mostly) non-Ruby experts that needs to build lots of features, and we cannot rely (at least in the short term) on everybody having deep knowledge of Rails, so we'll go for more traditional ways instead and make the first steps with the platform easier.

Jul 20 2014


Title: The Art of LEGO Design
Author: Jordan Schwartz
Publisher: O'Reilly

 

There are many books with step-by-step instructions and easy tutorials. This title goes the opposite way, providing general techniques, hints and ideas while showcasing the works of many authors throughout its 13 chapters.

From the basic foundations of how to create smooth surfaces, how to build angles, or always keeping scale in mind, to advanced and great-looking mosaics, scenarios or even monsters and spaceships, this book is not only an eye-opener on how pros create such amazing LEGO structures, but also a nice source of inspiration thanks to its abundant photos.

One might actually wish for more details on some specific models. For many, the close-up images hint at how they might have been built, but for the most complex ones you can only imagine how hard they were to construct. Still, as there are many short interviews with expert designers (apart from the author himself), you at least get some tips from them on how they approach new projects.

I really liked the theme-based chapter structure, which allows for quick reference checks. The book includes some (but few) detailed instructions for joints and techniques like studded spheres, and for amateurs like me it really gives the imagination a great boost to try more advanced designs.

 

As always, you can find all my book reviews in their own section.

Jul 10 2014

I logged into LinkedIn to clean up some contacts and noticed that I've passed the mark of 12 years working, so why not a small recap to see how badly life has treated me regarding programming languages?

 

I've worked at 8 companies since 2001: Alhambra-Eidos, Surfernet, Grupo Raxon, ilitia, Navteq, Tuenti, Minijuegos and CartoDB (current one).

At two of those I did consulting services, the rest being product development. That makes a bit under 5 years of quite varied and interesting consulting projects, though sometimes with the feeling that Dilbert strips are all too accurate.

 

Lowest record is 4 months at Alhambra-Eidos, because it was during the summer and they couldn't offer me a part-time job afterwards.

 

Highest record would be Tuenti or ilitia with 4 years each, though ilitia was the one that came closest to the exact mark, as I left the company just 3 days before my 4th anniversary.

 

I've coded in*: Visual Basic 6.0, Visual Basic .NET, ASP 3.0, ASP.NET, C++, C#, Java, JavaScript, PHP, OO PHP 5.3 **, Ruby.
I'm still in love with C#. Windows might not be the best platform, but the language itself is so good that even Java copies its features now.

 

I've used more or less extensively the following DBs: SQL Server, MySQL, PostgreSQL.
None of them is perfect, all have caveats.

 

I've used quite a few SCM: Visual SourceSafe, CVS, Subversion, Mercurial, Git.
All of them screw up merges sometimes, but I like Git now that I know a bit and can compare it to Hg.

 

I've given 18 different talks, a total of 23 times.
And I still get quite nervous every time I have to do one.

 

 

Let's hope that 12 years from now we're not all coding in JavaScript. Or maybe we will be?


 

* : Only counting professional work of at least a few weeks

** : Coding in PHP is one thing; writing proper, object-oriented, namespaced code is another

Jun 22 2014

Python is a language that is slowly awakening my curiosity. Widely used, generally praised, and apparently powerful yet easy to use.

After poking at it at work last Friday to convert some JSON data to GeoJSON, I decided to build a small tool using only Python (2.7): a script that, given a list of URLs, tells me if any of them has changed since the last time it ran.

The resulting source code can be found on my GitHub; 83 lines of code including the (few) comments. Nothing really cool, in fact it is really simple, but it's a nice exercise as it touches quite a few areas:

  • Methods
  • Managing Arrays (Lists in Python) and Hashmaps
  • Nulls handling ("None" here)
  • Reading and writing files, detecting if a file exists
  • Exception handling
  • JSON parsing and dumping
  • Basic HTTP requests/responses
  • Colored output *

 

I need to learn more about this language, maybe during my imminent vacation...

 

* Couldn't resist adding some colors, even if it meant adding a library (colorama). And thanks to this I've also learned how to install Python libs globally.

May 14 2014

Computer RAM grows quite fast but, unlike hard disk space, no matter how much you have, your applications or services will always require (or benefit from) more.

When you handle big datasets (between a few hundred megabytes and a few gigabytes) you have to be very careful with how you handle the data, because it is easy to create a point of failure due to out-of-memory errors. Specifically, you have to check that your code never loads a whole dataset into memory if it can be big.

A real world scenario: CartoDB's Import API and a growing list of customers who upload datasets near or above the 1GB threshold. Monit was killing the worker jobs of those big uploads so we had to fix it.

 

First, diagnostics: at what points was the code fully loading the file into memory?

- It wasn't when importing the data into PostgreSQL, because that's done from the command line via ogr2ogr.

- It could be Rails, because its documentation only includes a basic "dump uploaded contents" example and doesn't even mention that the uploaded file has actually already been saved at the location specified by Pathname.new(params[:file]) (or :filename).

- It could be the Typhoeus Ruby gem, because we have a Downloader class that fetches contents from URLs and writes them into a file. We were doing a single full response_body dump, while Typhoeus allows streaming the response in chunks.

- It could also be the AWS S3 Ruby SDK, because we upload the import files there so that workers can fetch them no matter which server they are spawned on. In this case the documentation is great, and streaming a file into an S3 object is a one-liner.

 

All three "could be" suspects were actual failure points, so I applied the fixes and job done. Now I have to spend some time (and bandwidth) uploading some multi-GB datasets to benchmark and find where the new limits of the platform are :)
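
The common idea behind the three fixes is to move data in bounded chunks instead of holding the whole file in one variable. Here is a minimal sketch of that technique using only the Ruby standard library. It is not the actual CartoDB code and it doesn't use Typhoeus or the S3 SDK; copy_in_chunks is a name made up for this example, and StringIO objects stand in for the real file, HTTP or S3 streams.

```ruby
require 'stringio'

# Copies everything from source to destination in fixed-size chunks, so at
# most chunk_size bytes of file data are held in memory at any moment.
# Both arguments only need to behave like IO objects (read / write).
# Returns the total number of bytes copied.
def copy_in_chunks(source, destination, chunk_size = 1024 * 1024)
  total = 0
  # read(n) returns nil once the end of the stream is reached.
  while (chunk = source.read(chunk_size))
    destination.write(chunk)
    total += chunk.bytesize
  end
  total
end

# In-memory streams used as stand-ins for an upload and its target file.
upload = StringIO.new('x' * 2_500)
target = StringIO.new
copied = copy_in_chunks(upload, target, 1_000)
puts "copied #{copied} bytes"
```

For plain file-to-file copies the standard library's IO.copy_stream does the same job; the explicit loop is shown because it is the shape that a chunk-by-chunk download callback also takes.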

 

Bonus point: when uploading a file through Rails, anybody who hasn't set AWS S3 credentials on their CartoDB installation gets a different code execution path, in which the file is indeed loaded into memory and written once. That's acceptable, but after deactivating my credentials and testing that path I noticed that, even once the HTTP request had been fully processed, my Linux machine still had around 1GB of RAM in use by the Ruby process, suspiciously close to the size of the file I had uploaded.

After some debugging I found I had to force MRI 1.9.3's shitty garbage collector to recognize the variable holding the file data as destroyed (filedata = nil) in order to regain my GB of RAM when the request ended. It's fun and sad at the same time that you move away from unmanaged languages only to end up needing the same resource management techniques.

 

If you want, you can check all the changes I made to the Ruby code in this pull request.
