Solana Validators Performance Research, part 1: Downtime Analysis

Welcome to the first article in a series of publications on Solana validator performance research by P2P Validator. In our opinion, the performance of Solana network validators is one of the most important factors determining the network’s growth and sustainability. Our team has done a deep dive into this topic, and we want to share the insights gained for the benefit of the Solana community.

The research is devoted to the analysis of the two most important metrics reflecting Solana network validators’ performance: downtime duration (node delinquency/unavailability duration) and skip rate (measuring how frequently a node fails to produce a transaction block which is subsequently confirmed by consensus on the network).

In this article we present the first part of the research findings: the analysis of downtime. The next article, which we plan to publish in the coming weeks, will cover the results of the block skip rate analysis.

Preface

All data used in the research were obtained from publicly available sources (the Solana JSON RPC API, the Solanabeach API and the Validators.app API) and cover Mainnet Beta epochs №194-236 unless another epoch or time period is explicitly specified.

Introduction

Solana is a relatively new (it went live in March 2020) public high-performance distributed blockchain platform curated by the Solana Foundation (a non-profit organization headquartered in Geneva, Switzerland) along with professional blockchain developers, organizations and individuals running validator and RPC nodes, and DevOps specialists from all over the world who are dedicated to the decentralization, growth, and security of the Solana network.

Solana is one of the blockchains that aim to be fast and scalable without compromising security or decentralization. Its theoretical throughput limit is 50k transactions per second (TPS), twice VISA’s limit, which means it can be used for many real-time applications in various business areas. Solana mainnet has already handled more than 35 billion transactions, with current throughput exceeding 2000 TPS (see Figure 1), due to high demand for its capabilities and various use cases, including ultra-fast on-chain payments, token creation and distribution, staking through delegation to network validators, smart contracts, and NFT issuance and trading. The Solana ecosystem also provides many DeFi services such as decentralized exchanges, token swaps, liquidity farming and bridging that ensures cross-chain interoperability with other blockchains.

Figure 1. Solana’s TPS on 27.10.2021 (see explorer.solana.com for live data).

There are currently more than 1000 independent validators and 800 RPC nodes (see Figure 2), which form the physical layer for the above-mentioned functionality while making the network highly secure and decentralized. Each validator supports the network's operation by providing high-end hardware resources and properly configuring its systems to keep the network running as fast and smoothly as possible.

Figure 2. Solana validator nodes map (see https://solanabeach.io/ for live data).

Validators receive delegated SOL tokens from stakers, participate in the consensus-based process of transaction validation, get rewards proportional to the delegated stake amount and distribute these rewards to stakers (proportionally to their staked shares), charging a variable commission. The more stake is delegated to a validator, the more the validator (and its delegators) earns and the more frequently it is chosen to process new transactions on the ledger, which exposes it to greater hardware and network load. Validators are therefore economically motivated, on the one hand, to keep their hardware and software running without interruptions and, on the other, to update Solana software in a timely manner and to improve their nodes and network connection as their stake and the Solana network load increase.
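
The proportional split described above can be illustrated with a small Python sketch. It is a simplified illustration, not the exact on-chain reward calculation: the function name, the input values and the idea of a single pooled reward per validator are assumptions made for the example.

```python
def split_rewards(total_reward, commission_pct, stakes):
    """Split a reward between a validator and its delegators.

    total_reward:   reward attributed to the validator's stake (hypothetical input)
    commission_pct: validator commission in percent (0-100)
    stakes:         mapping of delegator name to delegated stake
    Returns (validator_commission, {delegator: payout}).
    """
    commission = total_reward * commission_pct / 100
    distributable = total_reward - commission
    total_stake = sum(stakes.values())
    # Each delegator receives a share proportional to their stake.
    payouts = {
        name: distributable * stake / total_stake
        for name, stake in stakes.items()
    }
    return commission, payouts


commission, payouts = split_rewards(100.0, 10, {"alice": 300, "bob": 100})
print(commission, payouts)
```

With a 10% commission on a reward of 100, the validator keeps 10 and the remaining 90 is divided 3:1 between the two hypothetical delegators.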

Solana validators downtime

It is normal for a node to be temporarily unavailable or offline from time to time, as every technical system needs periodic maintenance and reconfiguration. Typical reasons for server unavailability are quite simple: a planned reboot to update host configuration or software, an emergency (such as a power outage), or network problems in the data center or at the provider.

The longer a node is unavailable, the fewer staking rewards and transaction fees it receives. Staking rewards are paid in proportion to the number of vote transactions a node posts, and it cannot post them while it is offline or functioning incorrectly. Validator downtime negatively affects its delegators’ rewards, which is why one should consider checking a validator’s recent downtime history before delegating to it.

During periods of downtime an unavailable validator is assigned the “delinquent” status, which can be checked using the Solana CLI command solana validators or by parsing the corresponding JSON response (solana --output json validators). By constantly fetching the statuses of all validators on the network, it is possible to measure the durations of delinquency periods, which are a good approximation of downtime duration for further quantitative analysis. The downtime data analyzed is available through the public Redash dashboard.
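
As a rough sketch of this polling approach, the Python snippet below turns a series of status snapshots into delinquency periods. It is not the pipeline used in the research. The JSON field names ("validators", "identityPubkey", "delinquent") are assumed to match the CLI output and should be checked against your Solana version; the function names and sample data are hypothetical.

```python
import json


def delinquent_identities(cli_json):
    """Return the identities flagged as delinquent in one CLI JSON response.

    Assumes the response has a "validators" list whose items carry
    "identityPubkey" and "delinquent" fields (an assumption, see above).
    """
    data = json.loads(cli_json)
    return {
        v["identityPubkey"]
        for v in data.get("validators", [])
        if v.get("delinquent")
    }


def delinquency_periods(snapshots):
    """Convert polled snapshots into per-validator delinquency periods.

    snapshots: list of (unix_time, set_of_delinquent_identities), time-ordered.
    Returns {identity: [(start, end), ...]}. A period ends at the first poll
    where the validator is no longer delinquent; periods still open at the
    last poll are closed at the last poll time.
    """
    open_since = {}
    periods = {}
    for ts, delinquent in snapshots:
        for ident in delinquent:
            open_since.setdefault(ident, ts)  # keep the earliest start
        for ident in list(open_since):
            if ident not in delinquent:
                periods.setdefault(ident, []).append((open_since.pop(ident), ts))
    if snapshots:
        last_ts = snapshots[-1][0]
        for ident, start in open_since.items():
            periods.setdefault(ident, []).append((start, last_ts))
    return periods


# Hypothetical polls taken every 60 seconds.
snaps = [(0, set()), (60, {"A"}), (120, {"A"}), (180, set()), (240, {"B"})]
print(delinquency_periods(snaps))
```

The resolution of the measured durations is limited by the polling interval, so outages shorter than one interval may be missed or rounded.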

Factors influencing downtime duration

There are many factors influencing downtime duration and these are typical ones:

Although most of these factors cannot be measured directly, we managed to collect and analyse some important on-chain data related to the topic. This allowed us to quantitatively describe several aspects of Solana network node unavailability, such as downtime duration statistics over time, its variability across nodes and the duration of node software updates.

Downtime data analysis

Here we illustrate retrospective downtime statistics of Solana nodes that were active in the period from epoch №209 (5th of August, 2021) to epoch №236 (17th of October, 2021). Historical data make it possible to reveal trends in the dynamics of downtime, which makes it easier to understand the normal behavior of the metric and to identify abnormal fluctuations.

Nodes downtime duration by epochs

The descriptive statistics for downtime duration by epoch are presented in Figure 3 below. The 5% and 95% quantiles reflect the maximum downtime among the best 5 and best 95 percent of validators, respectively, for each epoch. Average downtime is the simple arithmetic mean, and the median is the downtime duration that separates the best 50 percent of validators from the worst 50 percent.

Figure 3. Downtime mean (cyan line), median (lilac), 5%-quantile (red), 95%-quantile (green) and its actual values for each node (black transparent dots) over epochs.
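
For readers who want to reproduce statistics of this kind, here is a minimal Python sketch using only the standard library. The function name and the sample data are hypothetical, and the "inclusive" quantile method is one reasonable choice among several; the article does not state which quantile definition was used.

```python
from statistics import mean, median, quantiles


def epoch_summary(downtimes):
    """Summarise per-validator downtime durations for a single epoch.

    downtimes: list of downtime durations (one value per validator,
    zero for validators that were never delinquent).
    """
    # n=20 yields cut points at 5%, 10%, ..., 95%.
    cuts = quantiles(downtimes, n=20, method="inclusive")
    return {
        "mean": mean(downtimes),
        "median": median(downtimes),
        "q05": cuts[0],
        "q95": cuts[-1],
    }


# Hypothetical epoch: most validators have zero downtime, a few have long outages.
sample = [0] * 90 + [1, 2, 3, 4, 5, 6, 8, 10, 20, 40]
print(epoch_summary(sample))
```

When most validators have zero downtime and a few have long outages, the median stays at zero while the mean is pulled upward by the outliers.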


As can be seen from the chart above, the typical average downtime duration for a node is around 1.5 hours, which is quite low, while the median downtime duration is almost always zero (which means that most nodes usually don’t experience shutdowns). There were also several epochs (№214, 223 and 234) with sharp upticks in downtime duration, mainly due to simultaneous upgrades of the Solana software version. Epoch №223 is especially interesting: on the 14th of September, 2021, the Solana network experienced a severe overload which led to a network halt, and after a successful network restart almost all nodes had to update to a new Solana version with the necessary fixes.

Dispersion of downtime duration

As many factors affect downtime duration, it varies greatly across validators within the same epoch. Dispersion measures indicate the magnitude of the metric’s spread, which changes slightly over time, as shown in Figure 4.

Figure 4. Measures of dispersion of downtime duration over time: interquartile range (red line), representing the difference between the 75% and 25% quantiles of the metric, and standard deviation (cyan), representing the average deviation from the mean.


It can be seen from the chart above that the dispersion of downtime duration across validator nodes is decreasing slightly over time, which indicates that validators, on average, have both lower downtime durations and smaller deviations of the metric from the mean.
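
The two dispersion measures from Figure 4 can be computed as in the sketch below. The function name and data are hypothetical, and the choice of the sample standard deviation and the "inclusive" quantile method is an assumption, since the article does not specify either.

```python
from statistics import quantiles, stdev


def dispersion(downtimes):
    """Return the interquartile range and standard deviation of downtimes."""
    # n=4 yields the 25%, 50% and 75% cut points.
    q1, _, q3 = quantiles(downtimes, n=4, method="inclusive")
    return {"iqr": q3 - q1, "std": stdev(downtimes)}


# Hypothetical downtime durations for one epoch.
print(dispersion([0, 0, 0, 0, 5, 10, 15, 30, 60, 240]))
```

The interquartile range ignores the extreme tails, so a single very long outage affects the standard deviation much more than the interquartile range.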

Supermajority and superminority validators comparison

Since the leading validators with large stakes take on much more financial risk than smaller ones, the technical characteristics of their nodes are far better than those of the majority. Therefore, it makes sense to compare the performance of the superminority set of validators (the minimal set of validators that together control more than 33.33% of the total stake) with the rest, which fall into the supermajority set holding the remaining 66.66% of the total stake (see Figure 5).
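
The superminority definition above translates directly into code. The following Python sketch is illustrative only: the function name and the stake values are hypothetical, and integer stakes are assumed so that the one-third comparison is exact.

```python
def split_by_stake(stakes):
    """Split validators into superminority and supermajority sets.

    stakes: mapping of validator identity to active stake (integers).
    The superminority is the smallest set of top-staked validators whose
    combined stake exceeds one third of the total stake.
    """
    total = sum(stakes.values())
    ranked = sorted(stakes, key=stakes.get, reverse=True)
    superminority = []
    cumulative = 0
    for validator in ranked:
        if cumulative * 3 > total:  # already above one third, stop adding
            break
        superminority.append(validator)
        cumulative += stakes[validator]
    supermajority = ranked[len(superminority):]
    return superminority, supermajority


# Hypothetical stake distribution.
print(split_by_stake({"a": 40, "b": 30, "c": 20, "d": 10}))
```

In this made-up example validator "a" alone holds more than a third of the stake, so the superminority contains a single validator and the other three form the supermajority.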

Figure 5. Average downtime duration by epochs for superminority validators set (blue line), supermajority (green) and P2P Validator (red).


As the chart above shows, the supermajority is usually much worse in terms of average downtime duration, especially after epoch №220 and particularly during hard times like epoch №223, when the Solana network halted and most validators had to perform major software updates.

In contrast, superminority validators (especially P2P Validator) have an average downtime duration and an average number of downtimes (see Figure 6 below) that are much lower than for the supermajority, and there is a much smaller probability that a validator from the superminority set is offline for more than 5% of total epoch duration (see Figure 7 below).

Figure 6. Average number of downtimes over epochs №209-236 for the superminority (cyan line) and supermajority set of validators (red).

Figure 7. Share of validators with downtime duration greater than 5% of total epoch duration over epochs №209-236 for superminority (cyan line) and supermajority set of validators (red).
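
The two group-level metrics shown in Figures 6 and 7 can be sketched as follows. The function name, the input layout and the epoch length used in the example are assumptions made for illustration; actual epoch durations vary and should be taken from the data.

```python
def group_metrics(downtimes_by_validator, epoch_seconds, threshold=0.05):
    """Compute group metrics for one epoch and one group of validators.

    downtimes_by_validator: {validator: [downtime_seconds, ...]};
        validators without any downtime map to an empty list.
    epoch_seconds: total duration of the epoch in seconds.
    Returns (average number of downtimes per validator,
             share of validators offline for more than `threshold` of the epoch).
    """
    n = len(downtimes_by_validator)
    avg_count = sum(len(d) for d in downtimes_by_validator.values()) / n
    limit = threshold * epoch_seconds
    over_limit = sum(1 for d in downtimes_by_validator.values() if sum(d) > limit)
    return avg_count, over_limit / n


# Hypothetical group of four validators in a 200,000-second epoch.
group = {"v1": [], "v2": [600], "v3": [6000, 6000], "v4": [100]}
print(group_metrics(group, epoch_seconds=200_000))
```

Here only "v3" exceeds the 5% limit of 10,000 seconds, so the share is one in four.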


Downtime duration distribution for updates and other causes

As described previously, downtimes may happen due to Solana node software updates as well as hardware upgrades and unexpected halts. The available on-chain data make it possible to distinguish between downtimes related to software updates and those related to other causes, and to compare downtime duration distributions for the supermajority and superminority groups of validators.
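
The article does not spell out the exact rule used to separate the two kinds of downtime. One plausible heuristic, shown here purely as an assumption, is to treat a downtime as update-related when the software version reported by the node differs before and after the outage. The function name and dictionary keys below are hypothetical.

```python
from collections import defaultdict


def split_by_cause(events):
    """Group downtime durations into update-related and other downtimes.

    events: iterable of dicts with hypothetical keys
        "duration_min", "version_before" and "version_after".
    Assumed heuristic: a version change across the outage means an update.
    """
    buckets = defaultdict(list)
    for event in events:
        changed = event["version_before"] != event["version_after"]
        buckets["update" if changed else "other"].append(event["duration_min"])
    return dict(buckets)


# Made-up events for illustration only.
events = [
    {"duration_min": 5, "version_before": "vA", "version_after": "vA"},
    {"duration_min": 90, "version_before": "vA", "version_after": "vB"},
    {"duration_min": 12, "version_before": "vB", "version_after": "vB"},
]
print(split_by_cause(events))
```

A rule like this would misclassify an outage where the operator restarts without changing the version, so it should be read as an approximation.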

Figure 8. Downtime duration distribution (in logarithmic scale) unrelated to software updates for supermajority (green line), superminority (blue) set of validators and P2P Validator (red). Dashed lines of corresponding colors show the average downtime duration written nearby.


According to the distributions of downtime duration not related to software updates (see Figure 8), the validator groups are quite similar, apart from the fact that supermajority validators are more likely to have very long outages that greatly increase the average downtime duration (69 vs. 34 minutes for the superminority group). It should be noted that even when the P2P Validator goes down (or becomes delinquent), on average it happens for an extremely short time of 1.5 minutes.

Figure 9. Downtime duration distribution (in logarithmic scale) related to software updates for supermajority (green line), superminority (blue) set of validators and P2P Validator (red). Dashed lines of corresponding colors show the average downtime duration written nearby.


Concerning downtimes due to software updates (see Figure 9), the distributions for the groups differ considerably: the supermajority group shows much more variability in downtime duration than the superminority, and supermajority validators again frequently have much longer update times, leading to a higher average (195 vs. 76 minutes for the superminority group). Superminority validators, including the P2P Validator, demonstrate high consistency in update duration, presumably due to specific administration standards developed by the professional engineers who operate these validators.

Average update time by Solana software versions

Different Solana node software versions vary significantly in the complexity and duration of the installation process, which directly affects the downtime duration associated with updates. Figure 10 below shows the average update time of Solana node software versions by validators from the supermajority and superminority groups.

Figure 10. Average update time for different Solana node software versions.


Of the most used versions of Solana node software, the update to version 1.6.25 took the longest for both supermajority (4.5 hours on average) and superminority (3.5 hours on average) validators. Long updates to versions 1.7.11 and 1.7.15 were performed only by validators from the supermajority group and took approximately 2-3 hours to complete. Overall, validators from the superminority group usually perform updates significantly faster, which means smaller reward losses for them and their delegators.
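
An aggregation of the kind shown in Figure 10 can be sketched in a few lines of Python. The function name, the record layout and the sample rows are hypothetical; the durations below are invented and are not the figures reported in this article.

```python
from collections import defaultdict
from statistics import mean


def average_update_time(updates):
    """Average update duration per (version, group) pair.

    updates: iterable of (version, group, duration_hours) records.
    """
    grouped = defaultdict(list)
    for version, group, hours in updates:
        grouped[(version, group)].append(hours)
    return {key: mean(values) for key, values in grouped.items()}


# Invented records with placeholder version labels.
records = [
    ("vA", "supermajority", 1.0),
    ("vA", "supermajority", 2.0),
    ("vA", "superminority", 0.5),
    ("vB", "supermajority", 3.0),
]
print(average_update_time(records))
```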

Summary

Downtime duration is a very important metric, as it reflects the efficiency of Solana validator operators and influences the rewards received by validators and their delegators as well as the overall stability and security of the network. The Solana Foundation and the network validators do everything they can to improve the performance of nodes and the quality of the software that controls node operation, and we can say with confidence that they do it very well, especially validators from the superminority group, thanks to the experience and professionalism of their DevOps engineers.

Acknowledgements

Authors of the report would like to express gratitude and appreciation for the P2P Validator team whose guidance, support and encouragement have been invaluable throughout the research. We would also like to thank Stephen Akridge, co-founder of Solana, Ruud van Asseldonk, software engineer at Chorus One, and Robert Dörzbach, product manager of the Solana Beach, for helpful advice, comments and corrections.

Disclaimer

Information presented in this report and the referenced sources is for educational purposes only. It is not financial or investment advice. Seek a licensed professional for any financial advice. The authors of the report made every reasonable effort to ensure the accuracy and validity of the information provided. However, as price points, conditions, and information are continually changing, the authors reserve the right to change the information contained in the report at any time without notice and make no warranties or representations as to its accuracy or up-to-dateness.

The authors of the report are employees of P2P Validator, a company which provides professional services and consulting for highly secure non-custodial staking across more than 25 blockchain networks, including the Solana network, where it runs mainnet and testnet validator nodes as well as RPC nodes. Therefore, P2P Validator is not a neutral party and has its own business interests in the Solana ecosystem. Nevertheless, the authors did their best to make the report as objective as possible, with the main purpose being to educate and inform the community.

Sources of data

  1. https://app.swaggerhub.com/apis-docs/V2261/solanabeach-backend_api/0.0.1
  2. https://docs.solana.com/developing/clients/jsonrpc-api
  3. https://www.validators.app/api-documentation
  4. https://redash.p2p.org/public/dashboards/ZEW9RvuBXPdYHUU5aC7DfVjY8DJaXQrGenG2HUmo?org_slug=default

About P2P Validator

P2P Validator is a world-leading non-custodial staking provider securing more than 4 billion USD value from over 20,000 delegators across 25+ high-class networks. We are early investors in Solana and have supported the network from the first block taking part in all stages of testing and voting.


Web: p2p.org
Stake SOL with us: p2p.org/solana
Twitter: @p2pvalidator
Telegram: t.me/P2Pstaking

Pavel Marmalyuk

Data analyst @ p2p.org (previously dentsu inc.), R programmer, born & live in Moscow, graduated from Moscow State University of Psychology and Education, PhD, father, blockchain enthusiast & investor.
