Perfect is the Enemy of Started

We could blame the lack of progress on my outdoor oven on the seasons – sure, I haven’t been out there in the mud while it rains constantly for three months and is also uncomfortably cold.

But it’s not just that. Some of it was indecision – exactly how big do I want this oven to be?

And when I went to buy my first round of supplies, I realized there was a lot more to my reluctance.

I don’t know how to do this. Like, at ALL. Home Depot turns out to be a special land of imposter syndrome – you walk up thinking “I’m going to buy 18 cinder blocks and load them in my ancient car,” and then you find the right area, figure out how to get one of the big carts, and realize – 18 is a LOT of cinder blocks.

But the book says to try to make the base a reasonable height to work from! and also to make it almost 4 foot square to support a 22″ diameter oven interior! Just trying to do the right thing!

But the book also says to make a mess and do some trial runs. It suggests making a tiny clay oven just to get a feel for the clay and how to work with it. It says you can opt out of making rain protection and just patch it up as it falls apart.

I’m having a hard time not trying to do it all perfectly on the first try. I want a plan and reassurances, but it turns out that I’m not getting so much of that with this project. I knew this would be a stretch of my make-it skills, but I didn’t realize how much. Sewing never required this many pounds of supplies!

I bought 8 cinder blocks, and a couple of landscaping stones (in a perfect world, I would face the exterior with landscaping blocks so it’s not hideous. perfect is insidious.) That was as much as I could get on the flat cart, and nearly as much as my car would reasonably hold (there was more space, but I could feel the weight while driving.)

I felt so dumb in the store – moving cinder blocks with no gloves, having to figure out how to get a flat cart. I both don’t know what I’m doing and don’t really need the kind of help that a salesperson can give. And wow there are a lot of people at the hardware store on a sunny weekend day.

Anyway, I started.

[photo, 2017-03-05]

Not pictured: the terra cotta plant saucers that I had put out to mark the approximate spot were completely filled with slugs when I turned them over. Nature is gross.


Visit to a Jewelry Workshop

I recently got to spend some quality time being underfoot at Mimosa Handcrafted. It was during the Christmas rush, so I tried not to interrogate them too much, but I was completely fascinated by the process of how designs became jewelry. So I’m writing it up here to make sure I have it straight and to share with the curious.

A little background first – my cousin Madeline created Mimosa Handcrafted several years ago, first as a side project. Over the years it has grown to be a full-time job for her and her husband Dawson, plus they have an employee, Courtney. Madeline and Dawson met studying landscape architecture at LSU. As you might imagine, they have great aesthetic sense and are very good at making it reality.


Madeline makes the designs and most of the custom pieces. Dawson has taken over the process of making them into metal. They make their jewelry through lost wax casting.

Here’s my quick version of how it works:

Every piece of jewelry is created in wax first. A bunch of them are attached together so they can be cast efficiently. Surround the wax in plaster, let the plaster set, remove the wax. Boom, now you can pour hot metal into the plaster mold and get a clump of attached jewelry! It still needs to be detached and cleaned up, but you’ve gone from a piece of wax to a piece of metal.

And because I am a huge nerd for processes, here’s each step, with pictures and rambling:

1. Pieces are Created in Wax

There are two branches to this – new and custom pieces are created directly in wax, repeats get made in a silicone mold.

Madeline hand carves her designs, starting from a sketch.


Dawson creates a mold from the finished pieces, and then any of the team can squirt hot wax into it from this magic machine, the Injectomatic II.


2. Wax Pieces Get Attached to a “Tree”


This workbench is where it happens. The two open bowls are full of liquid wax. Red wax is extra sticky; the blue that you see everywhere is extra sturdy.

Off to the left, you see a black disc with a red stick of wax coming out of it – this is a bare tree. Each piece gets attached by a connection of wax called a “sprue.” For molded pieces the sprue is the bit where the wax gets injected into the mold. For custom pieces, someone has to glue a little wax sprue onto the shaped piece.

Then using dabs of the hot wax, each piece is glued to the tree.

[photo, 2016-12-13]

Cast tomorrow morning #lostwaxcasting #lostwax #riojeweler #mimosahandcrafted

A post shared by Dawson Ellis (@manmosahandcrafted)

The trees go into containers so that plaster can go around them.

3. Mix Plaster, Remove Bubbles in a Bell Jar

4. Pour Plaster, Remove Bubbles


Bubbles in the plaster near the tree would leave extra bits of metal attached to the pieces. Bubbles can also weaken the plaster, letting the metal break through – a blowout. Looks cool, but you have to start over completely.

I didn’t realize that generating a vacuum in your workshop was so common and so useful, but here’s a bell jar.

5. Allow Plaster to Set

I expected this to take days, but they told me it was more like a couple of hours.

6. Melt the Wax out in a Steam Bath


Late night de-waxing. #mimosahandcrafted #riojeweler #lostwax #lostwaxcasting


This is Louisiana – so that’s a crawfish boil setup.

7. Burn off Rest of Wax and Set Plaster in Kiln


8. Melt Metal for a Given Tree

Since you know how much wax made up a tree, you can convert to find out how much metal it takes to replace that space. Dawson has a system where he marks each tree’s plaster and precalculates the grams of bronze or silver needed to fill that tree.
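I don’t know Dawson’s actual numbers, so here’s a hedged sketch of the arithmetic with rule-of-thumb values: casting wax has a specific gravity near 1, so wax weight times the metal’s specific gravity gets you close. The specific gravities are approximate and the button allowance is my guess.

```python
# Approximate specific gravities; Dawson's precalculated figures would be
# tuned to his actual alloys and sprue sizes.
SPECIFIC_GRAVITY = {"silver": 10.4, "bronze": 8.7}

def metal_needed(wax_grams, metal, button_grams=10.0):
    """Grams of metal to melt for a tree that weighed wax_grams in wax."""
    return wax_grams * SPECIFIC_GRAVITY[metal] + button_grams

# A 6 g wax tree in bronze:
print(round(metal_needed(6, "bronze"), 1))  # -> 62.2
```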

Meltdown #lostwaxcasting #lostwax #mimosahandcrafted #riojeweler


They have two melters (there’s probably a fancy word) for the metal, each has a graphite flask that will withstand about 3000 degrees – the metal is melted at about 2000 degrees.


9. Pour Metal into the Plaster Mold

Beautiful day for Casting #mimosahandcrafted #riojeweler #lostwaxcasting #lostwax


Be careful. It’s hot.

10. Pull the Air out with a Vacuum

This thing is a vacuum table.


It pulls the air out through the (porous) plaster. This sucks the metal into all the tiny spaces.

Back to being amazed at how standard it is to generate a vacuum in a workshop.

11. Let the Metal Set

This takes several minutes, which I found surprisingly short.

12. Quench the Mold, Retrieve the Metal Tree


It’s still real hot; the tongs are important.

Most of the plaster will fall right out. Hose off as much of the rest as you can.

The plaster gets discarded at this stage. Dawson tries to dry out a tub’s worth so that it’s not as annoying to move, but it’s still heavy.

Whoa – we have jewelry made of metal now!!!

13. Clip the Pieces off the Tree

Gotta separate allllll of these from each other.

14. Pop Them in a Tumbler to Remove the Rest of the Plaster


15. Grind off the Sprue and Any Other Metal Sticky-Out Bits

They were a little worried that the grind wheel would wear down too much before they got through the Christmas orders – but they managed!

16. Clean, Buff, and Assemble the Final Piece

Some of the pieces have extra bits, like the Pelican Cuff can have a turquoise eye, or the Diffuser line all include terra cotta disks to hold essential oils. But all of them get lovingly polished and packaged before going on to new homes.

I hope it seems straightforward here, even though there are a lot of steps and a lot of places it can go wrong. It took two days of following Dawson around to see all of these steps – because they are all going on all the time! Madeline will be working on custom pieces while Dawson mixes plaster and Courtney grinds down new pieces. Or Madeline will be buffing jewelry while Dawson melts wax and Courtney handles packaging and shipping. Even in between each of these stages, more stuff gets fit in: waiting for the metal to cool gives Dawson a few minutes to set turquoise eyes in pelican cuffs. I’m so grateful to all of them for finding time to answer my questions while they were so busy. As you can tell, I really learned a lot!

Starting My Art Grant: Proposal

I work at Big Cartel and we have one amazing perk that I’ve never seen anything quite like: the Art Grant. Combined with a more than generous vacation policy, it’s a real incentive to stretch yourself.

Step one of having an Art Grant is writing up a proposal for it. I got mine in this week, and to kick off documenting progress on the grant, I’m sharing it. I’ll keep sharing progress in this category of my blog.


I’ve been a bread-making nerd off and on for at least 10 years now. One of the first books I got and skimmed cover to cover, Bernard Clayton’s Complete Book of Breads, has a little appendix where he talks about how satisfying it is to build and use your own oven. (I remembered it as having a few more details, but instead it says to write care of the publisher to ask him more. It was published before I was born, so I opted to Google instead.)

Side story: Growing up, one of the rules on sweets was “if you make it, you can eat (a reasonable amount of) it,” and this is why I was really good at making both brownies and apple pie as a teenager. My apple pie recipe comes straight out of Bernard Clayton’s Complete Book of Pastry, and it’s one of the first cookbooks I bought myself a copy of as an adult. That led directly to his bread book being the first book on bread I bought.

I really like this as an art grant project because it’s adjacent to something I know how to do – bread – but it gets me to stretch into things I don’t know at all – construction!

I love that it’s something big, fairly permanent, and usable. I love that I have no idea what I’m doing. (Well, I have more idea after researching this proposal!)

Winning Condition: Bake a loaf of sourdough bread in my own oven.

Milestones along the way:

– Build a base

– Attempt to make clay out of my own dirt

– Build the oven on the base

– Learn to use the oven

With the rainy months coming, there’s a good chance that this project will sit under a protective tarp a lot of the time. And there are probably higher-priority things I should be doing to this house. With that in mind, I’m aiming to finish by May.

I spent some time with the budget yesterday, and came up with about $560 for the whole thing.

That was using Home Depot prices for many of the supplies, and I’d like to seek out some locally owned businesses to source things from when I can, which may add to the total.

My research settled on the design used in Build Your Own Earth Oven, and buying that book is step 1.

(Most of the well documented blog posts I found use his method, and it is reassuring to see several different but similar results.)

I’ll document this on my personal blog, including publishing the accepted version of the proposal.

References: seems like a good guide

Topic Modeling Example with Support Conversations

Big ups to Kara Woo for pointing me to this blog post tutorial. I followed it very closely; I’m mostly just adding more words around it.

What even is Latent Dirichlet Allocation?

I’m going to try to explain what’s going on, but a) I’m going super high level and aiming for the big idea b) I’m mostly basing this on the Wikipedia entry, so you may just want to read that.

Our Model is: every document is a mixture of multiple topics. (The sum of the weights of the topics = 1.) Within each topic, each word has a certain probability of appearing next. Some words are going to appear in all topics (“the”), but we think of a topic as being defined by which words are most likely to appear in it. We pick the number of topics.

We only see which words appear in each document – the topics, the probability of each topic, and the probability of each word in a topic are all unknown and we estimate them.
“Latent” because we can’t directly see or estimate any of these.

We can describe Our Model as the interaction of a bunch of different probability distributions. We tell the computer the shape of Our Model, and what data we saw, and then have it try lots of things until it finds a good fit that agrees with both of those.
The Beta distribution is what you assume you have when you know that something has to be between 0 and 1, but you don’t know much else about it. The Beta is super flexible.
Turns out, the Dirichlet distribution is the multi-dimensional version of this, so it’s a logical fit for both the distribution of words in a topic and the distribution of topics in a document.
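To make the generative story concrete, here’s a toy sketch in numpy (the vocabulary and the topic probabilities are all made up): draw a document’s topic mixture from a Dirichlet, then for each word, pick a topic from the mixture and a word from that topic.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up vocabulary and two made-up topics (each row: word probabilities)
vocab = ["refund", "shipping", "domain", "paypal", "login"]
topics = np.array([
    [0.05, 0.05, 0.45, 0.05, 0.40],  # a "domains/accounts"-ish topic
    [0.30, 0.40, 0.05, 0.20, 0.05],  # a "payments/shipping"-ish topic
])

# The Dirichlet draw: this document's topic mixture (weights sum to 1)
doc_topic_mix = rng.dirichlet([0.5, 0.5])

# Generate a 10-word "document": pick a topic per word, then a word from it
words = []
for _ in range(10):
    topic = rng.choice(2, p=doc_topic_mix)
    words.append(str(rng.choice(vocab, p=topics[topic])))

print(doc_topic_mix, words)
```

LDA runs this story in reverse: given only the words, it estimates the mixtures and the topics.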

This is a pretty common/well-understood/standard model, so the hard part – describing the shape and telling the computer to look for fits – is already done in sklearn for Python (and in many other languages; I’ve definitely done this in R before).

Getting the Computer to Allocate some Latent Dirichlets

High Level:
1. we turn each document into a vector of words
2. we drop super common words (since we know “the” won’t tell us anything, just drop it)
3. we transform it to use term frequency-inverse document frequency as the vector weights
4. we choose a number of topics
5. we hand that info over to sklearn to make estimates
6. we get back: a matrix with num topics rows by num words columns. Entry i, j is the probability that word j comes up in topic i.
7. which we can: describe the topics and make sure they make sense to us, see how each document breaks down as a mixture of topics

Let’s step through doing all this in python.

Get you some libraries:
import pandas as pd
import os, os.path, codecs
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn import decomposition
from sklearn.feature_extraction.stop_words import ENGLISH_STOP_WORDS
import numpy as np

ENGLISH_STOP_WORDS is a known list of super common English words that are probably useless.

I have a “data” dataframe that has an “id” column and a “clean” column with my text in it.

With the help of these libraries, we do steps 1-3:

tfidf = TfidfVectorizer(stop_words=ENGLISH_STOP_WORDS, lowercase=True, strip_accents="unicode", use_idf=True, norm="l2", min_df = 5) 
A = tfidf.fit_transform(data['clean'])

A has a row for each conversation I’m looking at, and a column for each word. Entry i,j is the frequency of word j in conversation i, downweighted by how many conversations j appears in (that’s the inverse-document-frequency part), with each row normalized.

Steps 4 and 5 – we set the number of topics in the “n_components” argument. (One note: the tutorial I followed fits the topics with sklearn’s NMF – non-negative matrix factorization – rather than an LDA estimator, but you read the resulting topics and weights the same way:)
model = decomposition.NMF(init="nndsvd", n_components=9, max_iter=200)
W = model.fit_transform(A)
H = model.components_    

fit_transform is where we actually tell it to use the data in A to choose good parameters for our model.

H is that topics by words matrix in step 6. We can look at the largest values of any row in H to see which words are most important to the topic represented by that row.

In Python:

First we save the actual words from our tfidf transform:
num_terms = len(tfidf.vocabulary_)
terms = [""] * num_terms
for term in tfidf.vocabulary_.keys():
    terms[ tfidf.vocabulary_[term] ] = term

Then we look at what appears in H:

for topic_index in range( H.shape[0] ):
    top_indices = np.argsort( H[topic_index,:] )[::-1][0:10]
    term_ranking = [terms[i] for i in top_indices]
    print ("Topic %d: %s" % ( topic_index, ", ".join( term_ranking ) ))
This prints out the top terms by topic:
Topic 0: com, bigcartel, https, rel, nofollow, href, target, _blank, http, www
Topic 1: paypal, account, business, verified, express, required, steps, login, isn, trouble
Topic 2: plan, billing, store, amp, admin, clicking, close, corner, gold, downgrade
Topic 3: shipping, products, product, options, scroll, select, costs, set, admin, add
Topic 4: domain, custom, domains, www, provider, instructions, use, need, cartel, big
Topic 5: help, hi, basics, thing, sure, need, https, store, instructions, close
Topic 6: stripe, account, payment, checkout, bank, paypal, payments, transfer, order, orders
Topic 7: know, thanks, page, hi, let, br, code, add, just, like
Topic 8: duplicate, people, thread, error, service, yup, diamond, shows, able, thats

I started with 4 topics and kept increasing the number until it looked like some of them were too close together. Yep, picking the number is an art. Also, you’ll get a different (but hopefully similar) set of topics every time you generate these.

I don’t want to go into toooo much detail about what these are about, but when I inspected conversations that matched each topic strongly, they were very similar, so I feel pretty good about this set of topics. I especially like that even from just the terms, you can tell that there are two distinct main types of payment support conversations: getting Paypal Express set up correctly, and general how-do-payments questions. We can also clearly see a topic for helping people set up their custom domains.
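The other half of step 7 – seeing how each document breaks down as a mixture of topics – comes from the W matrix, which the snippets above never print. Here’s a self-contained toy version, with four made-up mini-conversations standing in for the real support data:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn import decomposition

# Tiny stand-in corpus: two "paypal" docs and two "domain" docs
docs = [
    "paypal account verified paypal express",
    "custom domain provider domain instructions",
    "paypal express login account verified",
    "domain www provider custom instructions",
]
tfidf = TfidfVectorizer()   # min_df=5 would empty out a corpus this small
A = tfidf.fit_transform(docs)

model = decomposition.NMF(init="nndsvd", n_components=2, max_iter=200)
W = model.fit_transform(A)  # rows: documents, columns: topics

# Normalize each row so it reads as a mixture, then report the top topic
weights = W / W.sum(axis=1, keepdims=True)
for i, row in enumerate(weights):
    print("Doc %d: topic %d carries %.0f%% of the mix" % (i, np.argmax(row), 100 * row.max()))
```

On the real data, sorting a topic’s column of W is how I pulled out the conversations that matched each topic strongly.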

What might I use this for? So far I’m thinking:

  1. See which topics come up a lot and use that to decide which documentation to polish.
  2. Look at topics over time, especially as we make relevant changes – do the custom domain questions slack off after we partnered with Google to set them up right in your shop admin?
  3. How long does it take to answer questions that come from a certain topic?

Data Infrastructure Services, Part 5: Web Analytics

We feed both our web analytics tools, Google Analytics and GoSquared, via Segment. This should mean that things like page views and definitions of new users match up across tools. It also means we’re not running separate tracking code for the web analytics tools – just the one set of javascript for Segment.

Google Analytics is fine. I mean, it’s big and confusing but it probably does most everything you need somewhere in there, and if there’s a huge change in your traffic, you’ll be able to quickly see it.

Here’s one cool thing it does that I’ve seen people try to build independently:

[screenshot: Google Analytics Users Flow]

That particular view is Acquisition -> Social -> Users Flow. People are often interested to see both how people enter their sites and where they go after the first page.

Behavior -> Behavior Flow gives a similar view but lets you choose the starting segment from things like landing page and country.

You can scroll further right and add more levels of depth if you have really long user sessions that cover many pages.

Here’s a wacky one that I learned about at this job: when you go to Acquisition -> Search Console -> Queries you’ll get something like this

[screenshot: Google Analytics queries]

(Oh man, it looks like that one is from Acquisition -> All Traffic -> Channels -> Click Organic Search in the table… this is fine. And a nice example of how complex the tool is.)

ANYWAY you’ll notice that your top search term is (not provided). If I understand correctly, when someone is logged into a Google account, their search terms don’t get passed along to all our analytics.

Google also has a tool called Webmaster Tools, which is all about the Search Console. When you look at your queries there, you’ll get something like:

[screenshot: Search Console queries]

What? All the searches are in this one. No big surprises in our case, but it was good to see that. You can get a few other search-related metrics in the Webmaster Tools view, but the full queries are the main thing that you can’t find in Google Analytics. (Having access to one tool doesn’t magically get you access to the other.)

By the way, I realized recently that the timezone for your Google Analytics is set at the “View” level:

[screenshot: View Settings, 2016-08-17]

I get the difference between Account and Property, but I don’t really understand why you would have “View”s under that. But you do, and your timezone is set in the View Settings link right there.

(I found this out because our numbers weren’t matching with GoSquared despite both getting data from Segment. Pretty sure it was just that GoSquared was on Mountain Time and Google Analytics was on Eastern.)

Hey let’s talk about this other web analytics tool we use: GoSquared. I think they also want to be in the customer communication and analysis space, but we’re just using them for the web analytics.

They only do a few views, but they’re nice to look at and easy to read:

[screenshot: GoSquared dashboard]

They’re all a variation on this for different time frames. It does quickly show you comparative traffic, referrals, pages viewed, and user stats. The Now and Today views can be really fun to watch right after you post a tweet or send a newsletter – seeing the traffic roll in in real time is a thrill, and incidentally validates that you got the links right.

They also send Daily, Weekly, and Monthly emails to show how you’re doing:

[screenshot: GoSquared email]

Nitpicky statistician here: the red and green numbers are only about direction, not the size of the change – I’m not really excited about a +0.1% change in pages per visit, or worried about a +0.1% change in bounce rate. There’s also no accounting for variation – I’d love to see some std. error bars around that trend line! Comparing to the previous period is great, but it’s not enough.
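The error bars I’m wishing for are cheap to compute. A sketch with made-up daily visit counts (not our real numbers), stdlib only:

```python
import math

# Made-up daily pageview counts for two weeks (not real numbers)
daily_views = [120, 135, 118, 140, 160, 90, 85, 125, 130, 122, 150, 155, 95, 88]

n = len(daily_views)
mean = sum(daily_views) / n
variance = sum((x - mean) ** 2 for x in daily_views) / (n - 1)
std_error = math.sqrt(variance / n)

# A rough 95% interval for the average day; a "+0.1%" change that sits
# comfortably inside this band isn't worth a red or green arrow.
low, high = mean - 2 * std_error, mean + 2 * std_error
print("%.1f visits/day, 95%% interval (%.1f, %.1f)" % (mean, low, high))
```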

That said, I do skim all these emails – looking for major changes in the shape of the trend, or any weird referrers that brought us traffic. (It’s fun to see someone linked to you in a blog post!)

We’ve recently added their Ecommerce emails which show in a similar way how much money you’re making at what time and from what sources – definitely nice for understanding the dynamics of your business.

What do you find most useful in your web analytics? What other tools should I be checking out?


Data Infrastructure Services, Part 4: Monitoring

tl;dr – We use New Relic, and we have ELK but aren’t relying on it much.

Despite having been to several meetups at the New Relic offices, I didn’t really understand what they do until I took this job. New Relic is a suite of tools fed by little agent programs on the servers running your Ruby app(s); it can tell you how the apps are performing. This results in lots and lots of kinds of graphs, from a weekly email telling you how all your apps are performing and when they’re busiest, to things like this:

[screenshot: New Relic overview]

I feel like there’s a lot you can DO with the New Relic data, but we basically use it for two things: Investigating a perceived slow page, and receiving alerts on performance issues.

[screenshot: New Relic events]

All of our apps have time-out and error-rate thresholds that they alert on. New Relic is always getting the data from our servers, so it can notice when these things happen and page/alert in Slack/send smoke signals as appropriate.

When we do have a slowdown for whatever reason, the New Relic alerts are usually tied with the support emails for letting us know quickly. And are more likely to be noticed in the middle of the night.


We also have the ELK stack collecting data for most of our apps. (ELK: ElasticSearch, Logstash, Kibana. Logstash sits on your servers and ships logs to ElasticSearch, ElasticSearch indexes ’em, Kibana queries ElasticSearch and makes graphs and stuff.)

Unlike the web event logging, the ELK stack data is near real-time, allowing us to look into weirdness and debug a little quicker. We haven’t been relying on it much, and I have barely used it at all. It’s definitely been a nice-to-have investigative tool.

I played with it a bit this week, so while it’s on my mind, here’s what I’ve learned:

  • you can’t download search results from Kibana
  • you CAN download graph data from Kibana
  • you can directly query the ElasticSearch instance using curl(?) and get json
  • you can post json to the endpoint or do simple queries in the uri
  • and if you’re me, then you do terrible things to json in your Python Notebook
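For the “post json to the endpoint” bullet, here’s roughly what a query body looks like (the host, index pattern, and field names are made up – ours differ):

```python
import json

# Hypothetical: POST this to http://your-es-host:9200/logstash-*/_search
query = {
    "query": {"match": {"message": "timeout"}},
    "sort": [{"@timestamp": {"order": "desc"}}],
    "size": 50,
}
body = json.dumps(query, indent=2)
print(body)
```

ElasticSearch answers with json – which is exactly what ends up having terrible things done to it in my Python Notebook.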


Data Infrastructure Services, Part 3: Other Data Sources, especially Web Events

This is an area where I feel like there’s GOT to be a ton of other solutions out there, but I only know of the one we happen to be using. (1) Please fill me in on other tools that do some of this and what you like/hate about them!

We use a tool called Segment for web events – getting a record of everything people do on your site. (I remember when we’d just look at web requests, but it’s all javascript these days. And ideally you want to see both together.) As well as giving us libraries to log anything we want to Redshift, they also integrate with a bazillionity other services as sources and destinations of data. The only other source we are using is Intercom, our customer service platform. (We’re also sending data to it, which is real cool for people answering questions so they can have context on the account.) But we’re also sending the data to our web analytics tools. (I promise to talk more about them later.)

Segment lets us log events from the Ruby backend of our app and from the Javascript that’s running locally in the user’s browser. They provide a nice debugger view into the real-time events they’re receiving:

[screenshot: Segment debugger]

this is key for seeing if you have correctly added logging for a new event.

They also provide logging for our iOS app. To be nitpicky, it is occasionally frustrating that many of the web events I look at are in the “pages” table, but every iOS event is in its own table, each with exactly the same schema. (Write ALLLL the UNIONs) (Maybe I need to get the devs to use a screen_viewed event that has which screen as an attribute? )
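Since every per-event table has exactly the same schema, at least the UNIONs can be generated instead of hand-written – a sketch with hypothetical table and column names:

```python
# Hypothetical iOS event tables; the real names come from the event names
ios_tables = ["ios_screen_home", "ios_screen_product", "ios_screen_checkout"]

selects = [
    "SELECT user_id, received_at, '%s' AS event FROM ios.%s" % (t, t)
    for t in ios_tables
]
union_query = "\nUNION ALL\n".join(selects)
print(union_query)
```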

Because of realistic engineering limits and Redshift loadtimes, Segment only promises to get your data to you within 24 hours. This is totally fine for me, but maybe you need something fancier? (Someone please tell me a really useful story of what you’re doing with real-time data?)

Big Caveat for those who haven’t fought this yet: because a lot of web event logging is in Javascript, you’re going to have missing data. Ad blockers will keep out your handy cookies and stop the Javascript from running. Slower computers or slower network connections will make it so that the Javascript doesn’t get a chance to actually send the info back to you. You will lose data and that data loss will be biased to people on mobile, and people with slower computers.

Maybe I just have too much trauma from past work, but I’m really happy to not own this system. There’s nothing I can do if somehow data gets lost – but there generally isn’t when it’s an internal system either. It’s just one more huge set of details to get right that I don’t want to specialize in.

(1) I have heard of precisely one other answer for web events, but haven’t yet found anyone who use(s|d) it: Snowplow.