# ScapeToad Cartogram Tutorial (formerly Cartogram Crash Course)

This post is a tutorial on creating a cartogram with ScapeToad v1.1. It also describes how to work with a few common GIS file formats. By the end you will have created a cartogram that shows the per-state population of the United States, and you will have learned a bit about the DBase and shape file formats. Along the way some simple Python programming is required. All of the data files used for this tutorial, as well as the Python script, can be found on GitHub here.

However, before we start it might be useful to get an idea of how cartograms help to visualize geographic information. Mark Newman’s pages are particularly good for understanding the importance of this data visualization method. Have a look at the 2008 U.S. Presidential Election Results, and also at World Mapper.

To begin the tutorial we will need a shape file that describes the state by state geometry of the United States. This can be downloaded at the Census Bureau’s website. Click the above link, then select “States (and equivalent)”, click “submit”, and then from the 2010 box, select the “all in one national file” option. Clicking on the download button will give you a zip file with the relevant information in it. Explore the other options in order to see what additional shape files are available.

Now that you have the zip file downloaded, unpack it. Assuming the zip file was named “tl_2010_us_state10.zip” you should have a single directory with five files in it. Each of the five files has the same base name as the directory itself, but each has its own file extension. For our purposes here we care about the shape file and the DBase file, which have extensions “shp” and “dbf” respectively.

The shape file itself contains geometric information, and can be thought of as a list of geometric entities, where each item corresponds to a particular state’s geometry. Wikipedia has a write-up worth reading. The detailed technical specification for the file format is here. Arc Explorer and Shape Viewer are two free (as in beer) programs for viewing shape files.

The DBase file is a table of properties where, by convention, each row in the table contains the attributes of the item in the shape file with the same index. For example, the 10th shape in the shape file is presumed to have attributes given by the 10th row in the DBase file.

In order to create a cartogram with ScapeToad we will have to supply an appropriate DBase file. In this case our DBase file will contain two columns. The first will be the state’s two-letter postal abbreviation and the second will be its population. ScapeToad will ignore the first column, but will allow us to create a cartogram using the data in the second column.

Note that DBase files can be opened with Excel for viewing, and that there is also a Python library for manipulating them.

At this point you should have the following software installed.

• Python (to create DBase files) (optional if you got the dbf files from GitHub)
• dbfpy (to create DBase files) (optional if you got the dbf files from GitHub)
• ScapeToad (to view shape files and create cartograms)
• Shape Viewer (to view shape files – slightly better UI than ScapeToad)
• Excel (to view DBase files) (optional)

Next we’ll create a DBase file that contains the U.S. population data using the following Python script.

#!/usr/bin/env python

from dbfpy import dbf

POP ={
"CA" : 37691912, "TX" : 25145561, "NY" : 19465197, "FL" : 19057542,
"IL" : 12869257, "PA" : 12742886, "OH" : 11544951, "MI" : 9876187,
"GA" : 9815210, "NC" : 9656401, "NJ" : 8821155, "VA" : 8096604,
"WA" : 6830038, "MA" : 6587536, "IN" : 6516922, "AZ" : 6482505,
"TN" : 6403353, "MO" : 6010688, "MD" : 5828289, "WI" : 5711767,
"MN" : 5344861, "CO" : 5116769, "AL" : 4802740, "SC" : 4679230,
"LA" : 4574836, "KY" : 4369356, "OR" : 3871859, "OK" : 3791508,
"PR" : 3706690, "CT" : 3580709, "IA" : 3062309, "MS" : 2978512,
"AR" : 2937979, "KS" : 2871238, "UT" : 2817222, "NV" : 2723322,
"NM" : 2082224, "WV" : 1855364, "NE" : 1842641, "ID" : 1584985,
"HI" : 1374810, "ME" : 1328188, "NH" : 1318194, "RI" : 1051302,
"MT" : 998199, "DE" : 907135, "SD" : 824082, "AK" : 722718,
"ND" : 683932, "VT" : 626431, "DC" : 617996, "WY" : 568158,
}

# The backup dbf. We'll need it because we need to
# preserve the state by state order of the rows
# in the new file.
olddb = dbf.Dbf("tl_2010_us_state10-orig.dbf")

# Our new DB file.
newdb = dbf.Dbf("tl_2010_us_state10.dbf", new=True)
newdb.addField(
    ("STATE", "C", 15),
    ("POPULATION", "N", 25, 0),
)

for oldrec in olddb:
    # STUSPS10 is the key for the two letter state abbreviation
    # in the old file.
    abbrev = oldrec['STUSPS10']

    # Create a new record in our new db file
    # and assign the columns
    newrec = newdb.newRecord()
    newrec['STATE'] = abbrev

    if abbrev in POP:
        newrec['POPULATION'] = POP[abbrev]
    else:
        # Print a message if we cannot find the population
        # for a given record.
        print("No population found for " + abbrev)
        newrec['POPULATION'] = 0

    newrec.store()

olddb.close()
newdb.close()

Run the script in the directory that contains the shape and DBase files. Before running it, rename the file “tl_2010_us_state10.dbf” to “tl_2010_us_state10-orig.dbf”. The script reads the original DBase file to determine the order in which to write records into the new file, and it writes the new file under the original name, because the DBase file used with a particular shape file must have the same base name as the shape file itself. Edit the script to account for any differences in file names.

Alternatively, you can skip running the script and download the appropriate DBase file from my GitHub page.

At this point, if you have Excel you might also want to open both the original and new DBase files and see for yourself what is in them.

Now we can fire up ScapeToad. When it comes up, click the “add layer” button in the toolbar. Navigate to the shape file and select it. If the shape file came in correctly you should see something like this on your screen.

Note that the DBF file you created must have the exact same base name as the shape file and that they must both be in the same directory. Otherwise we won’t be able to create a cartogram.
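The naming requirement is easy to check before launching ScapeToad. The following is a minimal sketch; `expected_dbf_path` and `is_matching_pair` are hypothetical helper names introduced here for illustration and are not part of ScapeToad or dbfpy. The comparison is on path strings only and treats file names as case-sensitive.

```python
import os


def expected_dbf_path(shp_path):
    """Return the path where the DBase file for shp_path is expected."""
    base, ext = os.path.splitext(shp_path)
    if ext.lower() != ".shp":
        raise ValueError("not a shape file: " + shp_path)
    # Same directory, same base name, only the extension differs.
    return base + ".dbf"


def is_matching_pair(shp_path, dbf_path):
    """True if dbf_path has the same directory and base name as shp_path."""
    expected = os.path.normpath(expected_dbf_path(shp_path))
    return os.path.normpath(dbf_path) == expected


shp = os.path.join("data", "tl_2010_us_state10.shp")
print(expected_dbf_path(shp))
print(is_matching_pair(shp, os.path.join("data", "tl_2010_us_state10-orig.dbf")))
```

The second call prints `False`, because the renamed backup file no longer shares the shape file's base name.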

Next click the “Create cartogram” icon in the toolbar. Click “next”, “next”, and then ensure that POPULATION is selected in the drop down menu. Click “next” again. And again. And then “compute”. Now wait…

After the computation is finished you should see a cartogram that looks something like this on your screen.

Unfortunately ScapeToad has no zoom feature, so to get a close-up look at the cartogram you’ll want to export it as a shape file and bring it up in Shape Viewer. There you will lose the legend and be left with just the distorted shapes. C’est la vie.

If you have gotten this far then congratulations! You have succeeded in creating a simple cartogram that shows how the population of the United States is spread across its geography.

# Dish VIP722 Samsung UN46D7000 Remote Code

My old remote stopped working, so I got a new one and needed to get it to work with my TV. The TV is a Samsung UN46D7000. The Dish Network remote code for the VIP722 is 738. To program the remote, hold the “TV” button at the top of the remote until the remote’s buttons flash. Then enter “738#” on the remote. And that’s it.

Manuals can be found here.

# Python Keyword Arguments Exploration

This post by Eliot at Salty Crane gives a nice introduction to how to work with *args and **kwargs in Python. Here we expand on the use of **kwargs, exploring a variety of its behaviors.

I know all of this to work only with Python 3.2.2, but assume that it works with other Python 3.x versions, and maybe Python 2.x versions.

This first example is just a restatement of Eliot’s last example. Let’s say we have a function:

def foo(a, b, c):
    print("a = " + str(a))
    print("b = " + str(b))
    print("c = " + str(c))

and we have:

x = {"a" : 1, "b" : 2, "c" : 3}
foo(**x)

then as expected we get:

a = 1
b = 2
c = 3

Now let’s say we have:

y = {"a" : 1, "b" : 2}
z = {"a" : 1, "c" : 3}
w = {"a" : 1, "b" : 2, "c" : 3, "d" : 4}
v = {"a" : 1, "b" : 2, "d" : 4}
u = {"a" : 1, "b" : 2, "d" : 4, "e" : 5}
foo(**y)
foo(**z)
foo(**w)
foo(**v)
foo(**u)

We’re missing “c” and “b” in the first two calls respectively and in both cases we get:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: foo() takes exactly 3 arguments (2 given)

As for the calls with “w” and “v” as input, for both we get:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: foo() got an unexpected keyword argument 'd'

So in this last case Python knows that “d” is not an argument to “foo” and reports it. Unfortunately for the first two cases it does not report which argument we are missing.

Finally for the last call, where we use “u” as the input we get:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: foo() got an unexpected keyword argument 'e'

Here Python reports an issue with “e”, but not with “d”. So it looks like Python simply bails when it detects the first problem.

This is all a bit unfortunate. When arguments are not present it would be nice to know which ones are missing. Similarly, when there are too many arguments given, it would be nice to know which ones those are. If we have missing arguments and bad arguments, it would be nice to know that both cases exist. This makes fixing your program much easier.

Happily by using the inspect module, and in particular the getfullargspec function, we can pretty easily write a wrapper that does the appropriate error checking and crafts an outstanding error message. Unfortunately, due to contractual issues, I must leave this as an exercise for the reader.
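For readers who want a starting point, here is one possible sketch of such a check. The names `check_kwargs` and `call_checked` are hypothetical and chosen only for this illustration; the sketch handles plain, defaulted, and keyword-only parameters, and it treats every keyword as acceptable when the target function declares its own `**kwargs`.

```python
import inspect


def check_kwargs(func, kwargs):
    """Return (missing, unexpected) argument names for func(**kwargs)."""
    spec = inspect.getfullargspec(func)
    # Positional-or-keyword parameters without a default are required.
    n_defaults = len(spec.defaults or ())
    required = spec.args[:len(spec.args) - n_defaults]
    # Keyword-only parameters without a default are required as well.
    kw_defaults = spec.kwonlydefaults or {}
    required += [name for name in spec.kwonlyargs if name not in kw_defaults]
    missing = [name for name in required if name not in kwargs]
    if spec.varkw is None:
        accepted = set(spec.args) | set(spec.kwonlyargs)
        unexpected = sorted(name for name in kwargs if name not in accepted)
    else:
        # func declares **kwargs itself, so no keyword is unexpected.
        unexpected = []
    return missing, unexpected


def call_checked(func, **kwargs):
    """Call func(**kwargs), reporting every missing and unexpected name."""
    missing, unexpected = check_kwargs(func, kwargs)
    if missing or unexpected:
        raise TypeError("%s(): missing arguments %s; unexpected arguments %s"
                        % (func.__name__, missing, unexpected))
    return func(**kwargs)


def foo(a, b, c):
    return (a, b, c)


u = {"a": 1, "b": 2, "d": 4, "e": 5}
print(check_kwargs(foo, u))  # (['c'], ['d', 'e'])
```

Unlike the built-in error, the result names the missing argument and both unexpected ones in a single report.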

# C++ Function to Time the Execution of Another Function

This is just a simple function to time the execution of an incoming function with arbitrary arguments. I’ve used it successfully with gcc 4.6.1.

#include <ctime>
#include <utility>

//! Return the number of clock ticks a function takes to execute.
//! \see http://www.cplusplus.com/reference/clibrary/ctime/clock/
template<typename F, typename ...Args>
clock_t timeFunction(F f, Args&& ...args)
{
    auto begin = clock();
    f(std::forward<Args>(args)...);
    auto end = clock();
    return end-begin;
}

# The Exponential Distribution

This is a derivation of the cumulative distribution function, characteristic function, moment generating function, first moment, expected value, second moment, and variance of the exponential distribution given its probability density function.

The probability density function of the exponential distribution is:

$f_X(x) = \lambda e^{-\lambda x}$

Thus the cumulative distribution function is:

$F_X(a) = P(X \leq a) = \displaystyle\int_{0}^{a} f_X(x) dx = \int_0^a \lambda e^{-\lambda x} dx = -e^{-\lambda x} \Big |_0^a = 1 - e^{-\lambda a}$

And the characteristic function:

$\displaystyle \phi_X(u) = E[e^{i u X}] = \int_{0}^{\infty} e^{i u x} f_X(x) dx = \int_0^{\infty} e^{i u x} \lambda e^{- \lambda x} dx = \lambda \int_0^{\infty} e^{-(\lambda - iu) x} dx = - \frac{\lambda}{\lambda-iu} e^{-(\lambda-iu)x} \Big |_0^{\infty} = \frac{\lambda}{\lambda - iu}$

Now the moment generating function, which is obtained from the characteristic function:

$M_X(t) = \phi_X(-it) = \displaystyle\frac{\lambda}{\lambda - (i(-it))} = \frac{\lambda}{\lambda - t}$

Now the first moment:

$M_X'(t) = \displaystyle\frac{d}{dt} \lambda(\lambda-t)^{-1} = -\lambda (\lambda -t)^{-2} \frac{d}{dt} (\lambda - t) = \frac{\lambda}{(\lambda - t)^2}$

Thus:

$E[X] = M_X'(0) = \displaystyle\frac{1}{\lambda}$

And the second moment:

$M_X''(t) = \displaystyle\frac{d}{dt} M_X'(t) = \frac{d}{dt} \lambda (\lambda - t)^{-2} = -2 \lambda ( \lambda - t)^{-3} \frac{d}{dt} (\lambda - t) = \frac{2 \lambda}{(\lambda - t)^3}$

So:

$E[X^2] = M_X''(0) = \displaystyle\frac{2}{\lambda^2}$

And finally:

$\displaystyle Var(X) = E[X^2] - E[X]^2 = \frac{2}{\lambda^2} - \Big(\frac{1}{\lambda}\Big)^2 = \frac{1}{\lambda^2}$

Of course the expected value and the variance can also be computed directly from the integrals:

$\displaystyle E[X] = \int_0^\infty x \lambda e^{-\lambda x} dx$

and

$\displaystyle E[X^2] = \int_0^\infty x^2 \lambda e^{-\lambda x} dx$
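As a sanity check, the two integrals can be evaluated numerically and compared with the closed forms derived above. This is a minimal sketch using a composite midpoint rule; the helper names `integrate` and `exponential_moments` are introduced here for illustration, and truncating the infinite range at $50/\lambda$ is an assumption that leaves only a negligible tail.

```python
import math


def integrate(f, a, b, n=100000):
    """Composite midpoint rule for the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


def exponential_moments(lam, upper_multiple=50.0):
    """Return (E[X], E[X^2], Var(X)) computed from the defining integrals."""
    def pdf(x):
        return lam * math.exp(-lam * x)

    # Truncate the infinite upper limit; the tail beyond it is negligible.
    upper = upper_multiple / lam
    first = integrate(lambda x: x * pdf(x), 0.0, upper)
    second = integrate(lambda x: x * x * pdf(x), 0.0, upper)
    return first, second, second - first ** 2


lam = 2.0
first, second, variance = exponential_moments(lam)
print(first, 1 / lam)            # numerical value next to 1/lambda
print(second, 2 / lam ** 2)      # numerical value next to 2/lambda^2
print(variance, 1 / lam ** 2)    # numerical value next to 1/lambda^2
```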

Next we’ll see how the exponential distribution is memoryless, that is:

$\displaystyle P(X > t+s|X>t) = P(X > s)$

First apply Bayes’ Rule:

$\displaystyle P(X>t+s|X>t) = \frac{P(X>t|X>t+s)P(X>t+s)}{P(X>t)} = \frac{1 \cdot P(X>t+s)}{P(X>t)} = \frac{1-(1-e^{-\lambda (t+s)})}{1-(1-e^{-\lambda t})} = \frac{e^{-\lambda (t+s)}}{e^{-\lambda t}} = e^{-\lambda s} = P(X>s)$
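The memoryless property can also be observed empirically. The sketch below draws samples with the standard library's `random.expovariate` and compares the conditional frequency of $X > t+s$ among samples with $X > t$ against $e^{-\lambda s}$. The function name `conditional_survival`, the sample size, and the seed are arbitrary choices made for this illustration.

```python
import math
import random


def conditional_survival(lam, t, s, n=200000, seed=0):
    """Estimate P(X > t + s | X > t) for X ~ Exponential(lam) by sampling."""
    rng = random.Random(seed)
    samples = [rng.expovariate(lam) for _ in range(n)]
    # Keep only the samples that already survived past t.
    survivors = [x for x in samples if x > t]
    beyond = sum(1 for x in survivors if x > t + s)
    return beyond / len(survivors)


lam, t, s = 1.0, 1.0, 0.5
print(conditional_survival(lam, t, s))  # estimate of P(X > t+s | X > t)
print(math.exp(-lam * s))               # P(X > s) from the derivation above
```

The two printed values should agree to roughly two decimal places for this sample size.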

Next, let $X_1, X_2$ be independent exponential random variables with parameters $\lambda_1$ and $\lambda_2$ respectively, then:

$\displaystyle P(X_1 < X_2) = \int_0^\infty P(X_1 < X_2 | X_1 = x) f_{X_1}(x) dx = \int_0^\infty P(X_1 < X_2 | X_1 = x) \lambda_1 e^{-\lambda_1 x} dx = \int_0^\infty P(x < X_2) \lambda_1 e^{-\lambda_1 x} dx = \int_0^\infty P(X_2 \geq x)\lambda_1 e^{-\lambda_1 x} dx = \int_0^\infty e^{-\lambda_2 x} \lambda_1 e^{-\lambda_1 x} dx = \lambda_1 \int_0^\infty e^{-(\lambda_1 + \lambda_2)x} dx = \lambda_1 \Big( -\frac{1}{\lambda_1+\lambda_2} e^{-(\lambda_1 + \lambda_2)x} \Big) \Big|_0^\infty = \frac{\lambda_1}{\lambda_1+\lambda_2}$
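A quick simulation illustrates this result. The sketch below draws independent pairs with `random.expovariate` and counts how often the first variable is the smaller one. The function name `estimate_race`, the sample size, and the seed are assumptions made for this example only.

```python
import random


def estimate_race(lam1, lam2, n=200000, seed=0):
    """Estimate P(X1 < X2) for independent exponentials by sampling."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n):
        # Draw one independent pair and record whether X1 came first.
        if rng.expovariate(lam1) < rng.expovariate(lam2):
            wins += 1
    return wins / n


lam1, lam2 = 1.0, 3.0
print(estimate_race(lam1, lam2))  # simulated frequency
print(lam1 / (lam1 + lam2))       # closed form lambda_1 / (lambda_1 + lambda_2)
```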

References:

Ross, Sheldon M. Introduction to Probability Models, 9th edition. Academic Press. 2007.
Bremaud, Pierre. An Introduction to Probabilistic Modeling, 3rd printing. Springer. 1997.

# Defining an Embedded Module in Python3

Recently I had need to embed Python 3 into an existing C++ code base, and along the way I realized that I needed to embed a simple Python module into my code. After spending way too much time surfing the Web, reading documentation, and attempting to find an example, I finally figured things out.

The code below is a distillation of my efforts.

In particular Python bug 4592 was helpful. It pointed the way toward using PyImport_AppendInittab.

My example is also inspired in part by the code here, but as you will see the link does not use PyImport_AppendInittab (a key missing ingredient), and it also attempts to support Python 2.x.

We create a module named “hello” with a single method “sayHello” and invoke the interpreter on it.

#include <iostream>
#include "Python.h"

static PyObject* sayHello( PyObject*, PyObject *args )
{
    std::cout << "Hello, World!" << std::endl;
    Py_RETURN_NONE;
}

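static PyMethodDef HELLO_METHODS[] =
{
    {"sayHello", sayHello, METH_NOARGS, "A simple example of an embedded function."},
    {NULL, NULL, 0, NULL}  // Sentinel marking the end of the table.
};

static struct PyModuleDef HELLO_MODULE = {
    PyModuleDef_HEAD_INIT,
    "hello",        // Module name.
    NULL,           // Module docstring.
    -1,             // The module keeps its state in global variables.
    HELLO_METHODS,
    NULL, NULL, NULL, NULL
};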

PyMODINIT_FUNC
PyInit_hello_module(void)
{
    std::cout << "Initializing module!" << std::endl;
    PyObject* m = PyModule_Create(&HELLO_MODULE);
    if ( ! m )
    {
        std::cerr << "Failed to create module..." << std::endl;
        return NULL;
    }
    return m;
}

int main( )
{
    // The module must be registered before the interpreter is initialized.
    PyImport_AppendInittab("hello", PyInit_hello_module);
    Py_Initialize();

    PyRun_SimpleString("import hello; print( dir(hello) ); hello.sayHello()\n");
    Py_Finalize();
    return 0;
}

# Dumping Binaries on Windows

I rarely need to inspect the contents of binaries on Windows, but every time I do I forget the name of the command for doing so! This is a command akin to objdump in the GNU/Linux world, and it is called dumpbin.

It is located in the directory: “C:/Program Files (x86)/Microsoft Visual Studio 9.0/VC/bin/amd64/”.

The option that prints whether the executable is 32- or 64-bit is /HEADERS. The desired information appears near the top of the output and reads x86 or x64 respectively.
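When dumpbin is not at hand, the same field can be read directly from the file. The following is a rough sketch rather than a substitute for dumpbin: it relies on the PE layout, in which the 4-byte value at offset 0x3C of the DOS header locates the "PE\0\0" signature and the 16-bit Machine field follows it (0x014C for x86, 0x8664 for x64). The function names `pe_machine` and `pe_machine_of_file` are hypothetical.

```python
import struct

# Machine field values for the two cases discussed above.
MACHINE_NAMES = {0x014C: "x86", 0x8664: "x64"}


def pe_machine(data):
    """Return 'x86', 'x64', or a hex string for the Machine field of a PE image."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # Offset 0x3C of the DOS header holds the file offset of the PE signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # The 16-bit Machine field immediately follows the 4-byte signature.
    (machine,) = struct.unpack_from("<H", data, pe_offset + 4)
    return MACHINE_NAMES.get(machine, hex(machine))


def pe_machine_of_file(path):
    """Read a binary from disk and report its Machine field."""
    with open(path, "rb") as handle:
        return pe_machine(handle.read())
```

Calling `pe_machine_of_file` on an executable or DLL returns the same x86 or x64 label that appears in the /HEADERS output.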

# Reload Dish VIP722 Program Guide

One would think that there would be an obvious option in the menu to get your VIP722 to reload its program guide. Unfortunately this is not the case, but you can trick it.

Press menu. Go to “system setup” (6), then “installation” (1), and then “point dish” (1). Choose “check switch” and then “test”. Wait for the tests to complete. Then choose “done”. Then cancel out of all menus. Wait for the receiver to acquire the signal. Wait for the program guide to download.

The whole rigmarole takes about 15 minutes or so.

# Test Driven Development and Remembering Where You Left Off

A nice side effect of writing your tests before your business logic is that if you have to step away from your work for a while, you only need to compile and run your test suite to determine where you left off. When compared to recovering your context from inspecting source code, TDD is a big winner.

# Adding a Context Menu Item in the Windows Shell for Editing with Emacs

This post outlines how to create an “Edit with Emacs” context menu item in the Windows Shell for all files.

We will add registry keys so that when the “Edit with Emacs” menu option is clicked, the selected file(s) will be opened in Emacs using emacsclient. This means that if there is a running Emacs session, the file will be opened there. Otherwise, a new Emacs session will be started.

Before getting started it might be worth reviewing the emacsclient documentation.

Okay then, to begin ensure that the line:

(server-start)

[HKEY_CLASSES_ROOT\*\shell\Emacs]
@="Edit with Emacs"
[HKEY_CLASSES_ROOT\*\shell\Emacs\command]
@="c:\Program Files (x86)\emacs-23.2\bin\emacsclientw.exe" --no-wait --alternate-editor="c:/Program Files (x86)/emacs-23.2/bin/runemacs.exe" "%1"