Paginating and ordering accounts in Solana
If you’ve ever tried to get all accounts from a Solana program, you’ve probably noticed that you get everything in one go without being able to limit or offset the accounts retrieved. Additionally, we cannot control the order in which we receive them.
This makes it almost impossible for us to paginate and/or order accounts from a given program. Almost impossible.
In this article, we will go through different techniques to learn how to optimise our calls to the Solana cluster to support pagination and account ordering.
The tools at our disposal
Before diving through these techniques, let’s have a quick look at the tools available for us to query the Solana cluster. The rest of the article will focus on how to use and arrange these tools in a way that benefits us.
RPC Methods
Let’s start by having a look at the RPC methods we’ll use to query the cluster.
- `getProgramAccounts`. This RPC method allows us to fetch all accounts owned by a given program. For instance, if you’ve followed the series on how to create a Twitter dApp in Solana, this could fetch all the `Tweet` accounts of our program.
- `getAccountInfo`. This RPC method allows us to get the account information and data from a given public key.
- `getMultipleAccounts`. This RPC method does the same thing as `getAccountInfo` except that it retrieves multiple accounts from a provided list of public keys. This enables us to retrieve a bunch of accounts in only one API call, which improves performance and helps avoid rate limiting. Note that the maximum number of accounts we can retrieve in one go using this method is 100.
RPC filtering and slicing
Some of the RPC methods above support additional parameters to either filter or slice the accounts retrieved.
- `dataSlice`. This parameter limits the data retrieved for each account. It expects an object containing an `offset` (where the data should start) and a `length` (how long the data should be). For example, providing `{ offset: 32, length: 8 }` will only retrieve 8 bytes of data starting at byte 32 for every account. This parameter is available on both the `getProgramAccounts` and `getMultipleAccounts` RPC methods.
- `dataSize`. This parameter is a filter that only selects accounts whose data is of the given size in bytes. This filter is only available on the `getProgramAccounts` RPC method. You can read more about this filter here.
- `memcmp`. This parameter is a filter that only selects accounts whose data matches the provided buffer at a given position. This filter is only available on the `getProgramAccounts` RPC method. You can read more about this filter here.
Alright, these are the tools at our disposal, now let’s use them!
The naive approach
Let’s start by trying to fetch all `CandyMachine` accounts from the Candy Machine V2 program from Metaplex.
This is a particularly interesting exercise because there are thousands of accounts in that program and some of them have huge amounts of data.
If we wanted to fetch all accounts within the Candy Machine V2 program, that’s how we could do it.
import { Connection, clusterApiUrl, PublicKey } from '@solana/web3.js';
const candyMachineV2Program = new PublicKey('cndy3Z4yapfJBmL3ShUp5exZKqR3z33thTzeNMm2gRZ');
const connection = new Connection(clusterApiUrl('mainnet-beta'))
const accounts = await connection.getProgramAccounts(candyMachineV2Program)
I don’t recommend running this piece of code. If you do, chances are your browser tab will stop working after having downloaded hundreds of megabytes of data.
Also, note that we didn’t need to provide a `dataSize` or a `memcmp` filter to ensure we retrieve `CandyMachine` accounts because that’s the only type of account available in the Candy Machine V2 program. That being said, this won’t always be the case and it’s good practice to be explicit about which accounts we’re looking for. So let’s add a filter anyway.
We can’t use a `dataSize` filter here because the size of a `CandyMachine` account is not static and depends on its content. So we need to use the `memcmp` filter on the first 8 bytes to compare the hash of the account type, called the discriminator.
Since this is an Anchor program, account discriminators should be the first 8 bytes of the SHA-256 hash of `"account:CandyMachine"`. So let’s compute that discriminator and ensure it is present on the first 8 bytes of every account we retrieve using a `memcmp` filter.
import { sha256 } from "js-sha256";
import bs58 from 'bs58';
// ...
const candyMachineDiscriminator = Buffer.from(sha256.digest('account:CandyMachine')).slice(0, 8);
const accounts = await connection.getProgramAccounts(candyMachineV2Program, {
filters: [
{ memcmp: { offset: 0, bytes: bs58.encode(candyMachineDiscriminator) } }, // Ensure it's a CandyMachine account.
],
})
Note that Anchor adds this filter automatically for you when using the API it provides, e.g. `program.account.candyMachine.all()`.
Okay, that’s all nice but we haven’t fixed our issue as running the code above will still try to fetch all the data from all candy machines ever created in this program.
It’s time we explore how to paginate this.
Pre-fetching without data
The key is to pre-fetch the accounts once without any data.
You might need the account data later on, but pre-fetching them without data will allow us to scan all the accounts we need and paginate them by fetching their data page by page.
So how do we do this?
Easy! All we need to do is provide a `dataSlice` parameter with a length of zero.
const accounts = await connection.getProgramAccounts(candyMachineV2Program, {
dataSlice: { offset: 0, length: 0 }, // Fetch without any data.
filters: [
{ memcmp: { offset: 0, bytes: bs58.encode(candyMachineDiscriminator) } }, // Ensure it's a CandyMachine account.
],
})
Note that I kept our previous `memcmp` filter but this will work with any filters you want.
And that’s it! Now we should have the entire list of `CandyMachine` accounts within the program. Because we didn’t ask to retrieve their data, this call is much faster and requires much less memory than the previous one. The gain becomes even more significant when dealing with heavy accounts and/or as the program accumulates more and more accounts.
Now what?
Well, first of all, we get the total count of our filtered accounts for free.
const accountsInTotal = accounts.length
This will be helpful when paginating the accounts as we’ll know when we’ve reached the last page.
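For instance, that total lets us compute the number of pages upfront. A minimal sketch (the `getTotalPages` helper name is made up for illustration):

```javascript
// Derive the number of pages from the total account count and the page size.
const getTotalPages = (accountsInTotal, perPage) =>
  Math.ceil(accountsInTotal / perPage);
```

For example, 103 accounts at 6 per page yield 18 pages, with the last page holding a single account.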
Additionally, and most importantly, although we don’t get any account data, we do get the public key of each account.
const accountPublicKeys = accounts.map(account => account.pubkey)
That means we now have enough information to fetch the missing data page by page.
Fetching data page by page
Now that we have the exhaustive list of public keys that interest us, let’s implement a `getPage` function which will return all accounts in a given page with their data.
// ...
const accountPublicKeys = accounts.map(account => account.pubkey)
const getPage = async (page, perPage) => {
// TODO: Implement.
}
First, we need to slice all public keys within the requested page. We can achieve this by using the `slice` method on the `accountPublicKeys` array. Additionally, if the given page is out of bounds, `slice` will return an empty array, so let’s return early if that’s the case.
const getPage = async (page, perPage) => {
const paginatedPublicKeys = accountPublicKeys.slice(
(page - 1) * perPage,
page * perPage,
);
if (paginatedPublicKeys.length === 0) {
return [];
}
// TODO: Continue implementing.
}
Next, we can use the `getMultipleAccounts` RPC method to fetch all of the accounts within the page. This method is called `getMultipleAccountsInfo` on the JavaScript client.
const getPage = async (page, perPage) => {
const paginatedPublicKeys = accountPublicKeys.slice(
(page - 1) * perPage,
page * perPage,
);
if (paginatedPublicKeys.length === 0) {
return [];
}
const accountsWithData = await connection.getMultipleAccountsInfo(paginatedPublicKeys);
return accountsWithData;
}
And just like that, we can fetch our accounts data page by page! 🥳
const perPage = 6
const page1 = await getPage(1, perPage)
const page2 = await getPage(2, perPage)
// ...
Remember that the `getMultipleAccounts` RPC method can only accept a maximum of 100 public keys. If you need more accounts within a page, you will need to split the `paginatedPublicKeys` array into chunks of 100 and make a `getMultipleAccounts` call for each of these chunks.
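That splitting step could be sketched as follows. The `chunkArray` and `getLargePage` helper names are hypothetical, not part of the article’s code; `getMultipleAccountsInfo` is the web3.js client method mentioned earlier:

```javascript
// Hypothetical helper: split an array into chunks of at most `size` items.
const chunkArray = (array, size) =>
  Array.from({ length: Math.ceil(array.length / size) }, (_, index) =>
    array.slice(index * size, (index + 1) * size),
  );

// Sketch: fetch an oversized page by issuing one getMultipleAccountsInfo
// call per chunk of 100 public keys, then flattening the results.
const getLargePage = async (connection, publicKeys) => {
  const chunks = chunkArray(publicKeys, 100);
  const results = await Promise.all(
    chunks.map((chunk) => connection.getMultipleAccountsInfo(chunk)),
  );
  return results.flat();
};
```

Issuing the chunked calls through `Promise.all` keeps them concurrent, but beware that very large pages may still hit your RPC provider’s rate limits.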
Ordering accounts
So far, we’ve seen how to paginate and/or access subsets of all the accounts in a program. However, no assertions can be made on the order in which we receive these accounts. In fact, calling the exact same endpoint twice can return the accounts in different orders.
So how do we get some control over the order in which we retrieve our accounts?
The key here is to add a little bit of data in the pre-fetch call that scans all available accounts. We need just enough data to successfully reorder our array of public keys before we paginate them or select a subset of them.
Let’s go back to our previous example. This time, we’ll want to fetch all `CandyMachine` accounts ordered by descending price, meaning we’ll have the most expensive candy machine first and the cheapest last.
To achieve this, we need to slice the `price` property of every account before paginating them. If we look at the following `CandyMachine` structure inside the program, we can find out where exactly that `price` property resides in the array of bytes.
#[account]
#[derive(Default)]
pub struct CandyMachine { // 8 (discriminator)
pub authority: Pubkey, // 32
pub wallet: Pubkey, // 32
pub token_mint: Option<Pubkey>, // 33
pub items_redeemed: u64, // 8
pub data: CandyMachineData, // See below
}
#[derive(AnchorSerialize, AnchorDeserialize, Clone, Default)]
pub struct CandyMachineData {
pub uuid: String, // 4 + 6
pub price: u64, // 8
// ...
}
From the code above, we can see that the `price` property is located at byte 123 (8 + 32 + 32 + 33 + 8 + 4 + 6) and that it uses 8 bytes, or 64 bits. If you’re struggling to understand how I came up with the number of bytes for each property, you might benefit from reading this article and this one too.
Now that we know where the `price` property is located, we can use this in our `dataSlice` parameter to only fetch that price and nothing else.
const accounts = await connection.getProgramAccounts(candyMachineV2Program, {
dataSlice: { offset: (8 + 32 + 32 + 33 + 8 + 4 + 6), length: 8 }, // Fetch the price only.
filters: [
{ memcmp: { offset: 0, bytes: bs58.encode(candyMachineDiscriminator) } }, // Ensure it's a CandyMachine account.
],
})
Trouble in candy paradise
Unfortunately, in this particular case, it doesn’t quite work as expected. I wish it did in order to keep things simple in this article but the truth is, things like that happen frequently when reading accounts whose structure we don’t control and it’s good to know how to tackle these issues.
Okay, so what’s wrong here?
Take a look at the `token_mint` property on the account.
#[account]
#[derive(Default)]
pub struct CandyMachine { // 8 (discriminator)
pub authority: Pubkey, // 32
pub wallet: Pubkey, // 32
pub token_mint: Option<Pubkey>, // 33
pub items_redeemed: u64, // 8
pub data: CandyMachineData, // See below
}
We can see it defines an optional public key. It uses one byte to determine whether there is a public key and 32 bytes to store the public key itself, hence the 33 bytes required in total.
So when `token_mint` contains a public key, it writes the number “1” on the first byte and the public key on the other 32 bytes. However, when `token_mint` doesn’t contain a public key, it writes the number “0” on the first byte and that’s it! It will not use any more storage than that because it does not need to. And that’s where the problem is!
Because the `price` property is located after the `token_mint` property, whether or not `token_mint` contains a public key will affect the location of the `price` property!
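To make that shift concrete, here is a tiny sketch (the helper name is made up) computing the `price` offset in both scenarios, using the byte sizes from the struct above:

```javascript
// Offsets from the CandyMachine struct: discriminator (8), authority (32),
// wallet (32), token_mint (1 + 32 when set, 1 when empty),
// items_redeemed (8), and the uuid string prefix and content (4 + 6).
const getPriceOffset = (hasTokenMint) =>
  8 + 32 + 32 + (hasTokenMint ? 1 + 32 : 1) + 8 + 4 + 6;
```

With a token mint, `price` starts at byte 123; without one, it starts at byte 91.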
Note that other programs may tackle optional properties differently in order to get around this issue and make their other properties more “searchable”. For example, some programs only use 32 bytes for optional public keys and store `PublicKey::default()` to indicate an empty state. Additionally, some programs might store their fixed-size properties first to ensure they are not shifted by properties of variable length. That’s not to say this program didn’t have valid reasons to structure its accounts that way, but just to let you know that there are other options, each with their own trade-offs.
Okay enough chitchat, how do we fix this?
One way would be to fetch all the data between the `token_mint` property and the `price` property (included). That way we can analyse the first byte of the `token_mint` property and slice the `price` property at the right location. This can be problematic when the two properties are far away from each other, as you’ll end up slicing much more data than you actually need.
Whilst the two properties are relatively close here, let’s have a look at another way to fix this. Since we’ve only got two scenarios to handle, why not make two different calls to the `getProgramAccounts` RPC method? One where the `token_mint` is empty and one where it’s not. We know that the first byte of the `token_mint` property (which determines whether there is a public key) is located at byte 72 (8 + 32 + 32). Therefore, we just need a `memcmp` filter that compares that byte with “0” and “1”. Then all we need to do is merge the two arrays together.
import bs58 from 'bs58';
import BN from 'bn.js';
const accountsWithTokenMint = await connection.getProgramAccounts(candyMachineV2Program, {
dataSlice: { offset: 8 + 32 + 32 + 33 + 8 + 4 + 6, length: 8 }, // Fetch the price only.
filters: [
{ memcmp: { offset: 0, bytes: bs58.encode(candyMachineDiscriminator) } }, // Ensure it's a CandyMachine account.
{ memcmp: { offset: 8 + 32 + 32, bytes: bs58.encode((new BN(1, 'le')).toArray()) } }, // Ensure it has a token mint public key.
],
});
const accountsWithoutTokenMint = await connection.getProgramAccounts(candyMachineV2Program, {
dataSlice: { offset: 8 + 32 + 32 + 1 + 8 + 4 + 6, length: 8 }, // Fetch the price only.
filters: [
{ memcmp: { offset: 0, bytes: bs58.encode(candyMachineDiscriminator) } }, // Ensure it's a CandyMachine account.
{ memcmp: { offset: 8 + 32 + 32, bytes: bs58.encode((new BN(0, 'le')).toArray()) } }, // Ensure it doesn't have a token mint public key.
],
});
const accounts = [...accountsWithoutTokenMint, ...accountsWithTokenMint];
There are a few things to notice here.
- The `dataSlice` parameter varies between the two calls to account for the byte shift created by the `token_mint` property. Notice how `33` becomes `1` on the second call.
- We convert numbers into a Base-58 encoded array of bytes using the bn.js library, which is heavily used in the Solana frontend world as numbers tend to exceed the JavaScript limits. We provide `'le'` as a second parameter to the `BN` class so it knows the number should be encoded using little endian.
- Here we can safely assume that the first byte of the `token_mint` property is always either 0 or 1 because the program enforces that constraint. However, if this wasn’t the case, then we would need to use the first approach mentioned above and slice more data.
Phew! That was quite a detour! I’m glad we went through it though, because this is typically the sort of exercise you can expect when trying to make optimised calls to a decentralised cluster.
Continuing ordering by price
Right, let’s go back to ordering our accounts by descending price. So far, we’ve managed to use `getProgramAccounts` (twice) to list all `CandyMachine` accounts in the program whilst including the 8 bytes that store their price (in lamports).
Now, all we need to do is parse these 8 bytes and sort them in descending order to reorder our array of public keys. Let’s start by parsing the price of each account using the `map` method.
const accountsWithPrice = accounts.map(({ pubkey, account }) => ({
pubkey,
price: new BN(account.data, 'le'),
}));
Here again, we use the bn.js library to parse an array of little-endian bytes into a `BN` object. We won’t try to convert this into a JavaScript number because 8 bytes have the potential of creating an integer bigger than `Number.MAX_SAFE_INTEGER`.
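As an aside, runtimes that support `BigInt` can parse those 8 little-endian bytes without any library. This is purely an illustrative sketch; the article’s code sticks with bn.js, which remains the convention in the Solana JavaScript ecosystem:

```javascript
// Sketch: parse 8 little-endian bytes into a BigInt, which can represent
// a full u64 safely (unlike a JavaScript Number).
const parseU64Le = (bytes) =>
  new DataView(Uint8Array.from(bytes).buffer).getBigUint64(0, true);

// 1,000,000 lamports encoded as a little-endian u64.
const price = parseU64Le([0x40, 0x42, 0x0f, 0, 0, 0, 0, 0]);
```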
Next, we will order this `accountsWithPrice` array using the `sort` method. This method expects a callback that, given two array items, returns -1, 0 or 1 based on whether the first item should come before, stay level with, or come after the second item respectively. Fortunately, the `BN` object provides a `cmp` (compare) method that does just that.
const sortedAccountsWithPrice = accountsWithPrice.sort((a, b) => b.price.cmp(a.price));
Here we compare the price of `b` with the price of `a` in order to get the descending order. Returning `a.price.cmp(b.price)` would generate the opposite order.
Finally, we can now extract the public keys of this ordered array of accounts.
const accountPublicKeys = sortedAccountsWithPrice.map((account) => account.pubkey);
With that sorted `accountPublicKeys` array in hand, we can reuse our `getPage` asynchronous method to fetch accounts in the desired order.
const top20ExpensiveCandyMachines = await getPage(1, 20);
Conclusion
Querying the Solana cluster can be tricky but, with the right tools and the right techniques, we can achieve more complex and/or performant queries.
This is a terrible comparison on many levels but, coming from web 2, it does help me to compare the tools and techniques we’ve seen in this article with SQL clauses to summarise their purpose at a higher level. Please take the table below and the analogy it represents with a tablespoon of salt.
| Solana tools and techniques | SQL clauses analogy |
| --- | --- |
| `getProgramAccounts(programId)` | `SELECT * FROM programId` |
| `dataSlice` | `SELECT` less data |
| Filters (`dataSize` and `memcmp`) | `WHERE` |
| Pre-fetch with no data + `getMultipleAccounts` | `LIMIT` and `OFFSET` |
| Pre-fetch with some data + sort data + `getMultipleAccounts` | `ORDER BY` |
See you soon for more Solana adventures! 🏕