Data Science on Blockchain with R. Part I: Reading the blockchain

Blockchains hash

By Thomas de Marchin (Senior Manager Statistics and Data Sciences at Pharmalex) and Milana Filatenkova (Manager Statistics and Data Sciences at Pharmalex)




What is the Blockchain: A blockchain is a growing list of records, called blocks, that are linked together using cryptography. It is used for recording transactions, tracking assets, and building trust between participating parties. Primarily known for Bitcoin and cryptocurrencies application, Blockchain is now used in almost all domains, including supply chain, healthcare, logistic, identity management… Hundreds of blockchains exist with their own specifications and applications: Bitcoin, Ethereum, Tezos…

What are NFTs: Non-Fungible Tokens are used to represent ownership of unique items. They let us tokenize things like art, collectibles, even real estate. They can only have one official owner at a time and they’re secured by the Blockchain, no one can modify the record of ownership or copy/paste a new NFT into existence. You’ve probably heard of the artist Beeple who sold one of his NFT art for $69 million.

What is R: R language is widely used among statisticians and data miners for developing data analysis software.

Why doing data science on blockchain: As Blockchain technology is booming, there is a growing need in tools and people able to read, understand and summarize this type of information.

What is an API: An API is a software intermediary that allows two applications to talk to each other. APIs are designed to help developers make requests to another software (i.e. downloading information) and getting results in predefined easy to read format, without having to understand how this software works.

There are already several articles dealing with blockchain data analysis in R, but most of them focus on price forecasting. Obtaining data on the cryptocurrencies price is quite straightforward, there are many databases available on internet. But how to actually read the blockchain? In this article, we will focus on reading blockchain transactions. Not all transactions but specifically, transactions related to NFTs. We will read the Ethereum blockchain, probably the top one used to trade NFTs. There exist several market places that that serve as trading platforms for NFTs: OpenSea, Rarible… We will focus here on OpenSea, the largest NFT trading platform at the moment.

Reading the raw blockchain is possible but it is hard. First of all, you have to setup a node and download the content of the blockchain (approximately 7TB at the time of writing). It may take a while to synchronize… Secondly, the data are stored sequentially which requires developing specific tools to follow a transaction. Third, the structure of the block is particularly difficult to read. As an example, a typical transaction on the Ethereum network is shown below. There are between 200 and 300 transactions per block and at the time of writing, we are at block 12586122.

Figure 1: Each block in the chain contains some data and a ‘hash’ – a digital fingerprint that is generated from the data contained within the block using cryptography. Every block also includes the hash from the previous block. This becomes part of the data set used to create the newer block’s hash, which is how blocks are assembled into a chain. Image from

Figure 2: Structure of a transaction.

Fortunately for us, there are APIs which facilitate our work.

OpenSea API

OpenSea provides an API for fetching non-fungible ERC721 assets based on a set of query parameters. Let’s have a look:

There is not a lot of explanations on the content of this dataset on the OpenSea website. I thus selected a few columns which seemed to contain interesting information (at least the ones I could understand).

My guess is – here we have:

  • collection_slug: The collection to which the item belongs
  • contract_address: All the sales are managed by a contract (a piece of code / a software) which sends the NFT to the winner of the bid. This is the address of the OpenSea contract. We can see that there is only one address for all the sales, which means that all sales are managed by the same contract.
  • id: A unique identifier for each sale
  • quantity: The number of items sold per transaction (see fungible / semi fungible below). As in the supermarket, you can buy 1 apple or 20.
  • The cryptocurrency used to buy the item.
  • total_price: The cost paid by the winner. For Ether, this is expressed in Wei, the smallest denomination of ether. 1 ether = 1,000,000,000,000,000,000 Wei (10^18).
  • seller.address: The address of the seller
  • transaction.timestamp: Date of the transaction
  • winner_account.address: The address of the buyer
  • payment_token.usd_price: The price of one token used to make the transaction in USD

Let’s have a look at the distribution of currencies:

We see that most sales are made in Ether (note that Wrapped Ether can be considered the same as Ether), let’s focus on these Ether sales for the rest of the article.

Figure3: Histogram of the price per sale. Data from OpenSea API.

Figure 4: Pie chart of the price per sale. Data from OpenSea API.

Figure 5: Waffle chart of the price per sale. Data from OpenSea API.

This looks pretty nice but there is a big drawback… OpenSea API limits the number of events to the last 300 transactions. There is not that much we can do about it if we use their API. Also, the downloaded transactions are data pre-processed by OpenSea and not the blockchain itself. This is already a good start to have the information about transactions, but what if we wanted to read the the blocks? We have previously seen that retrieving data directly from the blockchain can be quite complex. Hopefully, there are services like Etherscan that allow you to explore Ethereum Blocks in an easy way. And guess what? They also developed an API!

EtherScan API

EtherScan is a block explorer, which allows users to view information about transactions that have been submitted to the Ethereum blockchain, verify contract code, visualize network data… We can therefore use it to read any transaction involving NFTs! EtherScan limits the number of transactions to 10000 in its free version, which is much better than with the OpenSea API and you can still subscribe if you need more.

Where do we start? Let’s focus again on OpenSea: from the data we extracted above, we saw that the address of their contract is “0x7be8076f4ea4a4ad08075c2508e481d6c946d12b”. If we enter this address in EtherScan and filter on the completed transaction (i.e. the transactions validated by the network, not the one waiting to be approved),, we see an incredible amount of them (848,965 at the time of writing.) Of course, not all of them are related to sales.

Let’s see what we can do with these data.

Figure 6: Histogram of the price per sale. Data from EtherScan API.

Figure 7: Pie chart of the price per sale. Data from EtherScan API.

Figure 8: Waffle chart of the price per sale. Data from EtherScan API.

Note that the price conversion from ETH to USD is not entirely correct. We use current ETH/USD price while some transactions were done some time ago. Even within a single day, there can be substantial price variation! This can be easily solved by retrieving historical ETH/USD price, but this requires an EtherScan Pro account.


This article is an introduction into how to read the blockchain and obtain transactions data that can be analyzed and visualized. Here, we have shown an example of a simple analysis of blockchain transactions – distribution charts of NFT sales prices. This is a good start but there a lot more we could do! In Part II, we will investigate how to go further. We can, for example, follow specific NFTs and explore the duration of them being in possession after release and the rate of their subsequent reacquisition. Please let me know which attributes you may find interesting.

Note that the code used to generate this article is available on my Github:

If you want to help me to subscribe to an EtherScan Pro account and be able to retrieve more transactions, don’t hesitate to donate to my ETH address: 0xf5fC137E7428519969a52c710d64406038319169


Contact us now


This blog is intended to communicate PharmaLex’s capabilities which are backed by the author’s expertise. However, PharmaLex US Corporation and its parent, Cencora, Inc., strongly encourage readers to review the references provided with this article and all available information related to the topics mentioned herein and to rely on their own experience and expertise in making decisions related thereto as the article may contain certain marketing statements and does not constitute legal advice. 

Contact us for more information

Scroll to Top