Toshi is an implementation of the Bitcoin protocol, written in Ruby and built by Coinbase in response to their fast growth and need to build Bitcoin infrastructure at scale. This post will cover:
- How to deploy Toshi to an Amazon AWS instance with Redis and PostgreSQL using Docker.
- How to query the data to gain insights into the blockchain.
If you don’t like reading, I have created a tutorial video on deploying Toshi here:
The original inspiration for this post came from Soroush Pour. I decided to add some extra detail to what he did and perform certain steps differently.
To get the most out of this post you will need some basic familiarity with Linux, SQL and AWS.
Most Bitcoin nodes run “Bitcoin Core”, which is written in C++ and serves as the de-facto standard implementation of the Bitcoin protocol. Its advantages are that it is fast for light-medium use and efficiently stores the transaction history of the network (the blockchain) in LevelDB, a key-value datastore developed at Google. It has wallet management features and an easy-to-use JSON RPC interface for communicating with other applications.
However, Bitcoin Core has some shortcomings that make it difficult to use for wallet/address management in at-scale applications. Its database, although efficient, makes certain queries on the blockchain very difficult or impossible. For example, if you wanted to get the balance of an arbitrary bitcoin address, you would have to write a separate script to parse the blockchain and find the answer. Additionally, Bitcoin Core slows down significantly when it has to manage and monitor a large number of addresses (> ~10^7). For a web app with hundreds of thousands of users, each regularly generating new addresses, Bitcoin Core is not ideal for monitoring transactions and updating balances.
Toshi attempts to address the flexibility and scalability issues facing Bitcoin Core by parsing and storing the entire blockchain in an easily-queried PostgreSQL database. Here is a list of tables in Toshi’s DB:
We will see the direct benefit of this structure when we start querying our data to gain insights from the blockchain. Since Toshi is written in Ruby, it has the added advantage of being developer friendly and easy to customize. The main downside of Toshi is that it needs ~10x more storage than Bitcoin Core, because storing the blockchain in a well-indexed relational DB requires significantly more disk space.
First we will create an instance on Amazon AWS. You will need at least 300GB of storage for the Postgres database.
Be sure to auto-assign a public IP and allow incoming TCP connections on port 5000, as this is how we will access the Toshi web interface.
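If you prefer the command line to the AWS console, the port rule can be sketched with the AWS CLI. The security group ID and source CIDR are placeholders rather than values from this setup, and the command is wrapped in a hypothetical function so that nothing runs until you call it.

```shell
# Sketch: open TCP port 5000 on an existing security group.
open_toshi_port() {
  group_id="$1"   # placeholder, e.g. sg-0123456789abcdef0
  cidr="$2"       # placeholder, e.g. 203.0.113.10/32; prefer your own IP over 0.0.0.0/0
  aws ec2 authorize-security-group-ingress \
    --group-id "$group_id" \
    --protocol tcp \
    --port 5000 \
    --cidr "$cidr"
}
```

Restricting the CIDR to your own address keeps the Toshi web interface off the open internet while the node syncs.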
Once you get your instance up and running, SSH into the instance using the commands given by Amazon. First we will set up a user for Toshi:
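A minimal sketch of this step, assuming a Debian or Ubuntu instance and a user named `toshi` (the name is an assumption; any name works). It is defined as a function, so call `create_toshi_user` to actually run it.

```shell
TOSHI_USER="${TOSHI_USER:-toshi}"   # assumed user name, override if you like

create_toshi_user() {
  # adduser is the interactive Debian/Ubuntu helper; it prompts for a password.
  sudo adduser "$TOSHI_USER"
}
```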
Then we will add the new user to the sudoers group and switch to that user:
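One way to do this, assuming Ubuntu (where the administrative group is called `sudo`) and the assumed user name `toshi`. The function name is hypothetical; call it to run both commands.

```shell
TOSHI_USER="${TOSHI_USER:-toshi}"   # assumed user name

grant_sudo_and_switch() {
  # Append the user to the "sudo" group (on some other distros the group is "wheel").
  sudo usermod -aG sudo "$TOSHI_USER"
  # Open a login shell as the new user.
  sudo su - "$TOSHI_USER"
}
```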
Next we will install Docker and all of its dependencies through an automated script available on the Docker website. This will provision our instance with the necessary software packages.
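A sketch of the install, assuming the convenience script served at get.docker.com. Saving it to a file first (the file name here is arbitrary) lets you read it before running it as root. The wrapper function is hypothetical and does nothing until called.

```shell
install_docker() {
  # Download the convenience script, then run it with root privileges.
  curl -fsSL https://get.docker.com -o get-docker.sh
  sudo sh get-docker.sh
}
```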
Then we will clone the Toshi repo from Github and move into the new directory:
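A sketch of the clone step, assuming the repository lives at github.com/coinbase/toshi. The function name is hypothetical; because it is a function rather than a script, the `cd` takes effect in your current shell when you call it.

```shell
clone_toshi() {
  # Only change directory if the clone succeeded.
  git clone https://github.com/coinbase/toshi.git && cd toshi
}
```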
Next, build the coinbase/toshi Docker image from the Dockerfile located in the /toshi directory. Don’t forget the dot at the end of the command!!
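A sketch of the build command, assuming you are inside the cloned repository. The wrapper function is hypothetical; call `build_toshi_image` to run it.

```shell
build_toshi_image() {
  # -t tags the image as coinbase/toshi; the trailing "." is the build
  # context, i.e. the current directory containing the Dockerfile.
  sudo docker build -t coinbase/toshi .
}
```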
Note, you might see ‘Error getting container’ when this runs. If so don’t worry about it at this point.
Next we will build and run our Redis and Postgres containers.
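A sketch of the two commands, wrapped in a hypothetical function. One assumption to flag: recent official postgres images refuse to start unless a password or an auth method is supplied, so the sketch sets `POSTGRES_HOST_AUTH_METHOD=trust`, which may not have been needed when this was first written.

```shell
start_backing_services() {
  # "trust" disables password checks; tolerable here only because the
  # Postgres port is not published outside the Docker host.
  sudo docker run -d --name toshi_db -e POSTGRES_HOST_AUTH_METHOD=trust postgres
  sudo docker run -d --name toshi_redis redis
}
```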
This will create and run Docker containers named toshi_db and toshi_redis based on the standard postgres and redis images pulled from Docker Hub. The ‘-d’ flag indicates that the container will run in the background (daemonized). If you see an ‘Error response from daemon: Cannot start container’ error while running either of these commands, simply start the container again with ‘sudo docker start toshi_redis’ (or ‘sudo docker start toshi_db’).
To ensure that our containers are running properly, run:
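Plain `sudo docker ps` prints the full table. As a slightly stricter sketch, the hypothetical helper below checks for the two container names used in this post and reports each one.

```shell
check_running() {
  for name in toshi_db toshi_redis; do
    # List only the names of running containers and look for an exact match.
    if sudo docker ps --format '{{.Names}}' | grep -qx "$name"; then
      echo "$name: running"
    else
      echo "$name: NOT running"
    fi
  done
}
```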
You should see both containers running, along with their port numbers.
When we run our Toshi container we need to tell it where to find the Postgres and Redis containers, so we must find the IP addresses of toshi_db and toshi_redis. Remember that we have not run a Toshi container yet; we have only built the image from the Dockerfile. You can think of a container as a running instance of an image. To learn more about Docker see the docs.
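One way to look up the addresses is `docker inspect` with a format template. This sketch assumes the containers sit on Docker's default bridge network, where `.NetworkSettings.IPAddress` is populated; the helper name is hypothetical.

```shell
container_ip() {
  # $1 is the container name, e.g. toshi_db or toshi_redis.
  sudo docker inspect --format '{{ .NetworkSettings.IPAddress }}' "$1"
}
```

Calling `container_ip toshi_db` and `container_ip toshi_redis` prints one address each.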
Now we have everything we need to get our Toshi container up and running. To do this run:
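A sketch of the run command. The environment variable names (`REDIS_URL`, `DATABASE_URL`, `TOSHI_ENV`), the passwordless `postgres` user and the startup command are assumptions based on how Toshi is commonly configured, so check them against the repository's README. The IP addresses in the usage comment are examples only.

```shell
run_toshi() {
  db_ip="$1"      # IP address of the toshi_db container
  redis_ip="$2"   # IP address of the toshi_redis container
  sudo docker run --name toshi_main -d -p 5000:5000 \
    -e "REDIS_URL=redis://${redis_ip}:6379" \
    -e "DATABASE_URL=postgres://postgres:@${db_ip}:5432" \
    -e TOSHI_ENV=production \
    coinbase/toshi \
    sh -c 'bundle exec rake db:create db:migrate; foreman start'
}

# Usage (example addresses): run_toshi 172.17.0.2 172.17.0.3
```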
Be sure to replace the IP addresses in the above command with your own.
This creates a container named ‘toshi_main’, runs it as a daemon (-d) and sets three environment variables in the container (-e) which are required for Toshi to run. It also maps port 5000 inside the container to port 5000 of our host (-p). Lastly it runs a shell script in the container (sh -c) which creates and migrates the database, then starts the Toshi web server.
To see that it has started properly run:
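A sketch that narrows `docker ps` to the Toshi container and prints only its status. The helper name is hypothetical; empty output means the container is not running.

```shell
toshi_status() {
  # --filter restricts the listing by name; --format prints just the status column.
  sudo docker ps --filter name=toshi_main --format '{{.Status}}'
}
```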
If you have set your AWS security settings properly, you should be able to see the syncing progress of Toshi in your browser. Find your instance’s public IP address from the AWS console and then point your browser there using port 5000. For example:
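The address takes the shape below. The IP is a documentation-only placeholder, not a real instance, so substitute your own.

```shell
# 203.0.113.10 is a reserved example address; use your instance's public IP.
PUBLIC_IP="203.0.113.10"
TOSHI_URL="http://${PUBLIC_IP}:5000"
echo "$TOSHI_URL"
```

From your own machine, `curl -I "$TOSHI_URL"` is a quick way to confirm the port is reachable before opening a browser.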
You can also see the logs of our Toshi container by running:
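A sketch using `docker logs`, wrapped in a hypothetical helper. The `--tail 100` limit is an arbitrary choice to avoid replaying the whole backlog.

```shell
follow_toshi_logs() {
  # -f streams new log lines as they arrive; press Ctrl-C to stop following.
  sudo docker logs -f --tail 100 toshi_main
}
```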
That’s it! We’re all up and running. Be prepared to wait a long time for the blockchain to finish syncing. This could take more than a week or two, but you can start playing around with the data right away through the GUI to get a sense of the power you now have. We’ll discuss how to run custom queries in the DB and gain insight into blockchain data in a future post!