Paginating and ordering accounts in Solana
If you’ve ever tried to get all accounts from a Solana program, you’ve probably noticed that you get everything in one go without being able to limit or offset the accounts retrieved. Additionally, we cannot control the order in which we receive them.
This makes it almost impossible for us to paginate and/or order accounts from a given program. Almost impossible.
In this article, we will go through different techniques to learn how to optimise our calls to the Solana cluster to support pagination and account ordering.
The tools at our disposal
Before diving through these techniques, let’s have a quick look at the tools available for us to query the Solana cluster. The rest of the article will focus on how to use and arrange these tools in a way that benefits us.
RPC Methods
Let’s start by having a look at the RPC methods we’ll use to query the cluster.
- `getProgramAccounts`. This RPC method allows us to fetch all accounts owned by a given program. For instance, if you’ve followed the series on how to create a Twitter dApp in Solana, this could fetch all the `Tweet` accounts of our program.
- `getAccountInfo`. This RPC method allows us to get the account information and data from a given public key.
- `getMultipleAccounts`. This RPC method does the same thing as `getAccountInfo` except that it retrieves multiple accounts from a provided list of public keys. This enables us to retrieve a bunch of accounts in only one API call, which improves performance and helps avoid rate limiting. Note that the maximum number of accounts we can retrieve in one go using this method is 100.
RPC filtering and slicing
Some of the RPC methods above support additional parameters to either filter or slice the accounts retrieved.
- `dataSlice`. This parameter limits the data retrieved for each account. It expects an object containing an `offset` (where the data should start) and a `length` (how long the data should be). For example, providing `{ offset: 32, length: 8 }` will only retrieve 8 bytes of data starting at byte 32 for every account. This parameter is available on both the `getProgramAccounts` and `getMultipleAccounts` RPC methods.
- `dataSize`. This parameter is a filter that only selects accounts whose data is of the given size in bytes. This filter is only available on the `getProgramAccounts` RPC method. You can read more about this filter here.
- `memcmp`. This parameter is a filter that only selects accounts whose data matches the provided buffer at a given position. This filter is only available on the `getProgramAccounts` RPC method. You can read more about this filter here.
Alright, these are the tools at our disposal, now let’s use them!
The naive approach
Let’s start by trying to fetch all `CandyMachine` accounts from the Candy Machine V2 program from Metaplex.
This is a particularly interesting exercise because there are thousands of accounts in that program and some of them have huge amounts of data.
If we wanted to fetch all accounts within the Candy Machine V2 program, that’s how we could do it.
import { Connection, clusterApiUrl, PublicKey } from '@solana/web3.js';
const candyMachineV2Program = new PublicKey('cndy3Z4yapfJBmL3ShUp5exZKqR3z33thTzeNMm2gRZ');
const connection = new Connection(clusterApiUrl('mainnet-beta'))
const accounts = await connection.getProgramAccounts(candyMachineV2Program)
I don’t recommend running this piece of code. If you do, chances are your browser tab will stop working after having downloaded hundreds of megabytes of data.
Also, note that we didn’t need to provide a `dataSize` or a `memcmp` filter to ensure we retrieve `CandyMachine` accounts because that’s the only type of account available in the Candy Machine V2 program. That being said, this won’t always be the case and it’s good practice to be explicit about which accounts we’re looking for. So let’s add a filter anyway.
We can’t use a `dataSize` filter here because the size of a `CandyMachine` account is not static and depends on its content. So we need to use the `memcmp` filter on the first 8 bytes to compare the hash of the account type, called the discriminator.
Since this is an Anchor program, account discriminators should be the first 8 bytes of the SHA-256 hash of `"account:CandyMachine"`. So let’s compute that discriminator and ensure it is present on the first 8 bytes of every account we retrieve using a `memcmp` filter.
import { sha256 } from "js-sha256";
import bs58 from 'bs58';
// ...
const candyMachineDiscriminator = Buffer.from(sha256.digest('account:CandyMachine')).slice(0, 8);
const accounts = await connection.getProgramAccounts(candyMachineV2Program, {
filters: [
{ memcmp: { offset: 0, bytes: bs58.encode(candyMachineDiscriminator) } }, // Ensure it's a CandyMachine account.
],
})
Note that Anchor adds this filter automatically for you when using the API it provides, e.g. `program.account.candyMachine.all()`.
Okay, that’s all nice but we haven’t fixed our issue as running the code above will still try to fetch all the data from all candy machines ever created in this program.
It’s time we explore how to paginate this.
Pre-fetching without data
The key is to pre-fetch the accounts once without any data.
You might need the account data later on, but pre-fetching them without data will allow us to scan all the accounts we need and paginate them by fetching their data page by page.
So how do we do this?
Easy! All we need to do is provide a `dataSlice` parameter with a length of zero.
const accounts = await connection.getProgramAccounts(candyMachineV2Program, {
dataSlice: { offset: 0, length: 0 }, // Fetch without any data.
filters: [
{ memcmp: { offset: 0, bytes: bs58.encode(candyMachineDiscriminator) } }, // Ensure it's a CandyMachine account.
],
})
Note that I kept our previous `memcmp` filter but this will work with any filters you want.
And that’s it! Now we should have the entire list of `CandyMachine` accounts within the program. Because we didn’t ask to retrieve their data, this call is much faster and requires much less memory than the previous one. The gain becomes even more significant when dealing with heavy accounts and/or as the program accumulates more and more accounts.
Now what?
Well, first of all, we get the total count of our filtered accounts for free.
const accountsInTotal = accounts.length
This will be helpful when paginating the accounts as we’ll know when we’ve reached the last page.
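For instance, that total lets us compute the number of pages upfront. A minimal sketch (the `getTotalPages` helper name is made up for illustration):

```javascript
// Derive the number of pages from the total account count and the page size.
const getTotalPages = (accountsInTotal, perPage) =>
  Math.ceil(accountsInTotal / perPage);
```

For example, 103 accounts at 6 per page yield 18 pages, with the last page holding a single account.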
Additionally, and most importantly, although we don’t get any account data, we do get the public key of each account.
const accountPublicKeys = accounts.map(account => account.pubkey)
That means we now have enough information to fetch the missing data page by page.
Fetching data page by page
Now that we have the exhaustive list of public keys that interest us, let’s implement a `getPage` function which will return all accounts in a given page with their data.
// ...
const accountPublicKeys = accounts.map(account => account.pubkey)
const getPage = async (page, perPage) => {
// TODO: Implement.
}
First, we need to slice all public keys within the requested page. We can achieve this by using the `slice` method on the `accountPublicKeys` array. Additionally, if the given page is out of bounds, `slice` will return an empty array, so let’s return early if that’s the case.
const getPage = async (page, perPage) => {
const paginatedPublicKeys = accountPublicKeys.slice(
(page - 1) * perPage,
page * perPage,
);
if (paginatedPublicKeys.length === 0) {
return [];
}
// TODO: Continue implementing.
}
Next, we can use the `getMultipleAccounts` RPC method to fetch all of the accounts within the page. This method is called `getMultipleAccountsInfo` on the JavaScript client.
const getPage = async (page, perPage) => {
const paginatedPublicKeys = accountPublicKeys.slice(
(page - 1) * perPage,
page * perPage,
);
if (paginatedPublicKeys.length === 0) {
return [];
}
const accountsWithData = await connection.getMultipleAccountsInfo(paginatedPublicKeys);
return accountsWithData;
}
And just like that, we can fetch our accounts data page by page! 🥳
const perPage = 6
const page1 = await getPage(1, perPage)
const page2 = await getPage(2, perPage)
// ...
Remember that the `getMultipleAccounts` RPC method can only accept a maximum of 100 public keys. If you need more accounts within a page, you will need to split the `paginatedPublicKeys` array into chunks of 100 and make a `getMultipleAccounts` call for each of these chunks.
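That splitting step could be sketched as follows. The `chunkArray` and `getLargePage` helper names are hypothetical, not part of the article’s code; `getMultipleAccountsInfo` is the web3.js client method mentioned earlier:

```javascript
// Hypothetical helper: split an array into chunks of at most `size` items.
const chunkArray = (array, size) =>
  Array.from({ length: Math.ceil(array.length / size) }, (_, index) =>
    array.slice(index * size, (index + 1) * size),
  );

// Sketch: fetch an oversized page by issuing one getMultipleAccountsInfo
// call per chunk of 100 public keys, then flattening the results.
const getLargePage = async (connection, publicKeys) => {
  const chunks = chunkArray(publicKeys, 100);
  const results = await Promise.all(
    chunks.map((chunk) => connection.getMultipleAccountsInfo(chunk)),
  );
  return results.flat();
};
```

Issuing the chunked calls through `Promise.all` keeps them concurrent, but beware that very large pages may still hit your RPC provider’s rate limits.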
Ordering accounts
So far, we’ve seen how to paginate and/or access subsets of all the accounts in a program. However, no assertions can be made on the order in which we receive these accounts. In fact, calling the exact same endpoint twice can return the accounts in different orders.
So how do we get some control over the order in which we retrieve our accounts?
The key here is to add a little bit of data in the pre-fetch call that scans all available accounts. We need just enough data to successfully reorder our array of public keys before we paginate them or select a subset of them.
Let’s go back to our previous example. This time, we’ll want to fetch all `CandyMachine` accounts ordered by descending price, meaning we’ll have the most expensive candy machine first and the cheapest last.
To achieve this, we need to slice the `price` property of every account before paginating them. If we look at the following `CandyMachine` structure inside the program, we can find out where exactly that `price` property resides in the array of bytes.
#[account]
#[derive(Default)]
pub struct CandyMachine { // 8 (discriminator)
pub authority: Pubkey, // 32
pub wallet: Pubkey, // 32
pub token_mint: Option<Pubkey>, // 33
pub items_redeemed: u64, // 8
pub data: CandyMachineData, // See below
}
#[derive(AnchorSerialize, AnchorDeserialize, Clone, Default)]
pub struct CandyMachineData {
pub uuid: String, // 4 + 6
pub price: u64, // 8
// ...
}
From the code above, we can see that the `price` property is located at byte 123 (8 + 32 + 32 + 33 + 8 + 4 + 6) and that it uses 8 bytes, or 64 bits. If you’re struggling to understand how I came up with the number of bytes for each property, you might benefit from reading this article and this one too.
Now that we know where the `price` property is located, we can use this in our `dataSlice` parameter to only fetch that price and nothing else.
const accounts = await connection.getProgramAccounts(candyMachineV2Program, {
dataSlice: { offset: (8 + 32 + 32 + 33 + 8 + 4 + 6), length: 8 }, // Fetch the price only.
filters: [
{ memcmp: { offset: 0, bytes: bs58.encode(candyMachineDiscriminator) } }, // Ensure it's a CandyMachine account.
],
})
Trouble in candy paradise
Unfortunately, in this particular case, it doesn’t quite work as expected. I wish it did in order to keep things simple in this article but the truth is, things like that happen frequently when reading accounts whose structure we don’t control and it’s good to know how to tackle these issues.
Okay, so what’s wrong here?
Take a look at the `token_mint` property on the account.
#[account]
#[derive(Default)]
pub struct CandyMachine { // 8 (discriminator)
pub authority: Pubkey, // 32
pub wallet: Pubkey, // 32
pub token_mint: Option<Pubkey>, // 33
pub items_redeemed: u64, // 8
pub data: CandyMachineData, // See below
}
We can see it defines an optional public key. It uses one byte to determine whether there is a public key and 32 bytes to store the public key itself, hence the 33 bytes required in total.
So when `token_mint` contains a public key, it writes the number “1” on the first byte and the public key on the other 32 bytes. However, when `token_mint` doesn’t contain a public key, it writes the number “0” on the first byte and that’s it! It will not use any more storage than that because it does not need to. And that’s where the problem is!
Because the `price` property is located after the `token_mint` property, whether or not `token_mint` contains a public key will affect the location of the `price` property!
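To make that shift concrete, here is a tiny sketch (the helper name is made up) computing the `price` offset in both scenarios, using the byte sizes from the struct above:

```javascript
// Offsets from the CandyMachine struct: discriminator (8), authority (32),
// wallet (32), token_mint (1 + 32 when set, 1 when empty),
// items_redeemed (8), and the uuid string prefix and content (4 + 6).
const getPriceOffset = (hasTokenMint) =>
  8 + 32 + 32 + (hasTokenMint ? 1 + 32 : 1) + 8 + 4 + 6;
```

With a token mint, `price` starts at byte 123; without one, it starts at byte 91.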
Note that other programs may tackle optional properties differently in order to get around this issue and make their other properties more “searchable”. For example, some programs only use 32 bytes for optional public keys and store `PublicKey::default()` to indicate an empty state. Additionally, some programs might store their fixed-size properties first to ensure they are not shifted by properties of variable length. That’s not to say this program didn’t have valid reasons to structure its accounts that way, but just to let you know that there are other options, each with their own trade-offs.
Okay enough chitchat, how do we fix this?
One way would be to fetch all the data between the `token_mint` property and the `price` property (included). That way we can analyse the first byte of the `token_mint` property and slice the `price` property at the right location. This can be problematic when the two properties are far away from each other, as you’ll end up slicing much more data than you actually need.
Whilst the two properties are relatively close here, let’s have a look at another way to fix this. Since we’ve only got two scenarios to handle, why not make two different calls to the `getProgramAccounts` RPC method? One where the `token_mint` is empty and one where it’s not. We know that the first byte of the `token_mint` property (which determines whether there is a public key) is located at byte 72 (8 + 32 + 32). Therefore, we just need a `memcmp` filter that compares that byte with “0” and “1”. Then all we need to do is merge the two arrays together.
import bs58 from 'bs58';
import BN from 'bn.js';
const accountsWithTokenMint = await connection.getProgramAccounts(candyMachineV2Program, {
dataSlice: { offset: 8 + 32 + 32 + 33 + 8 + 4 + 6, length: 8 }, // Fetch the price only.
filters: [
{ memcmp: { offset: 0, bytes: bs58.encode(candyMachineDiscriminator) } }, // Ensure it's a CandyMachine account.
{ memcmp: { offset: 8 + 32 + 32, bytes: bs58.encode((new BN(1, 'le')).toArray()) } }, // Ensure it has a token mint public key.
],
});
const accountsWithoutTokenMint = await connection.getProgramAccounts(candyMachineV2Program, {
dataSlice: { offset: 8 + 32 + 32 + 1 + 8 + 4 + 6, length: 8 }, // Fetch the price only.
filters: [
{ memcmp: { offset: 0, bytes: bs58.encode(candyMachineDiscriminator) } }, // Ensure it's a CandyMachine account.
{ memcmp: { offset: 8 + 32 + 32, bytes: bs58.encode((new BN(0, 'le')).toArray()) } }, // Ensure it doesn't have a token mint public key.
],
});
const accounts = [...accountsWithoutTokenMint, ...accountsWithTokenMint];
There are a few things to notice here.
- The `dataSlice` parameter varies between the two calls to account for the byte shift created by the `token_mint` property. Notice how `33` becomes `1` on the second call.
- We convert numbers into a Base-58 encoded array of bytes using the bn.js library, which is heavily used in the Solana frontend world as numbers tend to exceed the JavaScript limits. We provide `'le'` as a second parameter to the `BN` class so it knows the number should be encoded using little endian.
- Here we can safely assume that the first byte of the `token_mint` property is always either 0 or 1 because the program enforces that constraint. However, if this wasn’t the case, then we would need to use the first approach mentioned above and slice more data.
Phew! That was quite a detour! I’m glad we went through it though, because this is typically the sort of exercise you can expect when trying to make optimised calls to a decentralised cluster.
Continuing ordering by price
Right, let’s go back to ordering our accounts by descending price. So far, we’ve managed to use `getProgramAccounts` (twice) to list all `CandyMachine` accounts in the program whilst including the 8 bytes that store their price (in lamports).
Now, all we need to do is parse these 8 bytes and sort them in descending order to reorder our array of public keys. Let’s start by parsing the price of each account using the `map` method.
const accountsWithPrice = accounts.map(({ pubkey, account }) => ({
pubkey,
price: new BN(account.data, 'le'),
}));
Here again, we use the bn.js library to parse an array of little-endian bytes into a `BN` object. We won’t try to convert this into a JavaScript number because 8 bytes have the potential of creating an integer bigger than `Number.MAX_SAFE_INTEGER`.
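As an aside, runtimes that support `BigInt` can parse those 8 little-endian bytes without any library. This is purely an illustrative sketch; the article’s code sticks with bn.js, which remains the convention in the Solana JavaScript ecosystem:

```javascript
// Sketch: parse 8 little-endian bytes into a BigInt, which can represent
// a full u64 safely (unlike a JavaScript Number).
const parseU64Le = (bytes) =>
  new DataView(Uint8Array.from(bytes).buffer).getBigUint64(0, true);

// 1,000,000 lamports encoded as a little-endian u64.
const price = parseU64Le([0x40, 0x42, 0x0f, 0, 0, 0, 0, 0]);
```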
Next, we will order this `accountsWithPrice` array using the `sort` method. This method expects a callback that, given two array items, returns -1, 0 or 1 based on whether the first item should come before, stay level with, or come after the second item respectively. Fortunately, the `BN` object provides a `cmp` (compare) method that does just that.
const sortedAccountsWithPrice = accountsWithPrice.sort((a, b) => b.price.cmp(a.price));
Here we compare the price of `b` with the price of `a` in order to get the descending order. Returning `a.price.cmp(b.price)` would generate the opposite order.
Finally, we can now extract the public keys of this ordered array of accounts.
const accountPublicKeys = sortedAccountsWithPrice.map((account) => account.pubkey);
With that sorted `accountPublicKeys` array in hand, we can reuse our `getPage` asynchronous method to fetch accounts in the desired order.
const top20ExpensiveCandyMachines = await getPage(1, 20);
Conclusion
Querying the Solana cluster can be tricky but, with the right tools and the right techniques, we can achieve more complex and/or performant queries.
This is a terrible comparison on many levels but, coming from web 2, it does help me to compare the tools and techniques we’ve seen in this article with SQL clauses to summarise their purpose at a higher level. Please take the table below and the analogy it represents with a tablespoon of salt.
| Solana tools and techniques | SQL clauses analogy |
| --- | --- |
| `getProgramAccounts(programId)` | `SELECT * FROM programId` |
| `dataSlice` | `SELECT` less data |
| Filters (`dataSize` and `memcmp`) | `WHERE` |
| Pre-fetch with no data + `getMultipleAccounts` | `LIMIT` and `OFFSET` |
| Pre-fetch with some data + sort data + `getMultipleAccounts` | `ORDER BY` |
See you soon for more Solana adventures! 🏕