# IPNI (InterPlanetary Network Indexer)
InterPlanetary Network Indexer (IPNI), also referred to as the Network Indexer or simply the indexer, enables quick and efficient search of content-addressable data available on the InterPlanetary File System (IPFS) and Filecoin networks.
Using IPNI, IPFS nodes can publish the content IDs (CIDs) of their data to an indexer, and clients can query the indexer to learn where to retrieve the content associated with those CIDs. Opting into indexing by IPNI can be done from:
- Lotus, the reference implementation for the Filecoin network.
- Boost, a tool for Filecoin storage providers to manage Filecoin data onboarding and retrieval.
- A self-hosted IPFS server with the someguy caching proxy for routing lookups to IPNI (read path).
- A production-grade IPFS deployment configured to support the IPNI index-provider sidecar for publishing to IPNI (write path).
IPNI is designed to create an alternate routing and discovery infrastructure outside and independent of the Kademlia Distributed Hash Table (DHT).
For a deeper dive into the technical specification for IPNI, see https://github.com/ipni/specs/blob/main/IPNI.md.
# What use-cases IPNI serves
While in-protocol routing and discovery have advanced by leaps and bounds in recent versions of Kubo and Helia, with well-tuned test servers benchmarked in the high tens of millions of CIDs announceable within the 24-hour re-announcement window, there continue to be use-cases where keeping announcements live on the DHT is onerous, or swarms where that volume of messages would not propagate before the next re-announcement cycle.
By comparison, announcements to an IPNI indexer only need to be made once, making them particularly attractive to announcers of large volumes of infrequently-sought CIDs, like large-scale providers of "cold storage" in the Filecoin economy or archivers of public open data.
To support performant retrievals of unsealed Filecoin and IPFS pinned data with a speed comparable to a CDN, a reliable, distributed index must be assembled that maps data to the peer(s) hosting or caching it, and this index must be replicated to be geographically near the lookups. Comparable lookup and time-to-first-byte metrics are quite difficult to achieve on the DHT.
One advantage of being completely orthogonal to DHT-based announcing and discovery is operational flexibility. Over the life of a gateway or other data provider service, one or more IPNI-style indexer systems can be opted into or out of without affecting DHT performance. The reverse is also true: DHT announcing can be taken down and brought back up independently. Many users have found value in opting into IPNI indexing for long-tail discovery while still announcing all new content to peers in the DHT at time of publication, allowing for resilience and caching in real time. This mixed approach also sidesteps the complexity of re-announcing and keeping DHT announcements circulating over time for content that is not expected to be cached and dynamically distributed peer-to-peer over the course of its lifecycle.
# How IPNI benefits IPFS
The indexer offers several benefits to IPFS, including:
- Faster data retrieval: By maintaining an additional layer of information on top of the DHT, the indexer can help speed up data location and retrieval.
- Reduced resource consumption: The indexer can help reduce the amount of bandwidth and processing power expended circulating and re-announcing CID location records. This is particularly beneficial for nodes in commercial data centers where peer-to-peer network traffic is costly. Unlike DHT announcements which must be repeated every 24 hours, IPNI announcements only need to be made once.
- Improved scalability: With the indexer, IPFS can reduce the portion of network traffic used by large-volume publishers, allowing the rest of the network to scale more effectively and support broader and heterogeneous network topologies.
# The IPNI ecosystem
The IPNI ecosystem consists of three main actors:
- IPNI provider nodes - IPFS peers that advertise content to IPNI.
- IPNI nodes - Nodes that ingest announcements about content-addressable data and serve lookup queries.
- Retrieval clients - Peers that find content via indexer nodes and fetch it from the providers.
# IPNI provider nodes
IPNI Provider nodes are responsible for cataloging and maintaining the latest list of content they host, as well as the protocols over which the content can be retrieved. The list of content is represented as a chain of immutable "advertisements" that are signed by the content provider's identity. Each advertisement can represent either the addition or removal of content. This property, combined with the chaining of advertisement entries, effectively captures a "diff" of content hosted by the provider over time. When a change in content occurs, the advertisement chain is updated as follows:
- The provider captures the change as a new advertisement.
- The provider announces its existence to the network.
- An IPNI node receives and stores the advertisement.
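For illustration, the shape of an advertisement can be sketched as a Go struct. The field names follow the advertisement schema in the IPNI specification, but this is a simplified model, not the exact types used by any existing implementation, and the values below are hypothetical placeholders.

```go
package main

import "fmt"

// Advertisement is a simplified model of an IPNI advertisement as described
// in the IPNI specification. Real implementations encode this as IPLD and
// sign it with the publisher's libp2p identity.
type Advertisement struct {
	PreviousID string   // CID of the previous advertisement in the chain ("" for the first one)
	Provider   string   // peer ID of the host the content can be retrieved from
	Addresses  []string // multiaddrs of the provider
	Entries    string   // CID of the linked multihash entry blocks
	ContextID  []byte   // provider-chosen key that groups these multihashes
	Metadata   []byte   // retrieval protocol metadata (e.g. transport-bitswap)
	IsRm       bool     // true when this advertisement removes previously advertised content
	Signature  []byte   // signature over the advertisement by the publisher identity
}

func main() {
	// A hypothetical "addition" advertisement followed by a "removal" one,
	// showing how the chain captures a diff of hosted content over time.
	add := Advertisement{
		PreviousID: "",
		Provider:   "12D3KooW...provider",
		Addresses:  []string{"/ip4/203.0.113.7/tcp/24001"},
		Entries:    "bafy...entries",
		ContextID:  []byte("deal-1234"),
		Metadata:   []byte{0x90, 0x12}, // multicodec varint prefix identifying the transport
	}
	rm := Advertisement{
		PreviousID: "bafy...cid-of-add",
		Provider:   add.Provider,
		Addresses:  add.Addresses,
		ContextID:  add.ContextID, // a removal refers to the same context ID
		IsRm:       true,
	}
	fmt.Println("chain head removes context:", string(rm.ContextID))
}
```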
# IPNI Nodes
IPNI nodes are responsible for continuously listening to provider announcements. Once they receive an announce message, they fetch and parse the advertisement to construct the current list of content hosted by the provider. Because the advertisements themselves are immutable, IPNI nodes can recognize known advertisements and only parse ones they've not seen before. This allows IPNI nodes to handle and scale with very long ad chains, as long as they continuously listen for advertisements and keep up with the latest.
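A minimal sketch of that incremental ingestion, reusing the Advertisement type from the sketch above: starting from the advertisement CID named in an announce message, an indexer walks back through PreviousID links, stops at the first advertisement it has already seen, and then applies the unseen advertisements oldest-first. The fetchAd helper is a hypothetical placeholder for retrieving an advertisement by CID.

```go
// syncChain walks an advertisement chain from its head toward older entries,
// collecting advertisements that have not been seen before, then applies
// them oldest-first so additions and removals take effect in order.
func syncChain(head string, seen map[string]bool, fetchAd func(cid string) (Advertisement, error)) error {
	var pending []Advertisement
	for cid := head; cid != "" && !seen[cid]; {
		ad, err := fetchAd(cid)
		if err != nil {
			return err
		}
		pending = append(pending, ad)
		seen[cid] = true
		cid = ad.PreviousID // continue toward older advertisements
	}
	// Apply the collected advertisements from oldest to newest.
	for i := len(pending) - 1; i >= 0; i-- {
		ad := pending[i]
		if ad.IsRm {
			// Remove the multihashes previously advertised under ad.ContextID.
		} else {
			// Index the multihashes reachable from ad.Entries under ad.ContextID.
		}
	}
	return nil // stopped at either the chain start or a known advertisement
}
```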
# Retrieval clients
Once advertisements are processed, retrieval clients can look up the resulting index records via a query to an API exposed by IPNI nodes. Given a Content Identifier (CID) or multihash, the API provides a list of index records corresponding to it. Each index record captures the identity of the content provider, its address, and the protocols over which the data can be retrieved from that provider. A retrieval client can then further filter the providers list, e.g., by protocol, and retrieve the content directly from the providers.
# How IPNI is used by IPFS
The indexer works in conjunction with the existing DHT to improve data location and retrieval in IPFS. It maintains an up-to-date index of the network's content which has been advertised to it, and provides an additional layer of information that can be used to quickly locate and retrieve data.
When a user searches for a piece of data using a CID or multihash, the indexer is consulted first. If the data is found in the index, the user is directly connected to the node hosting the data, resulting in faster retrieval. If the data is not found in the index, the user falls back to the traditional DHT-based search, ensuring that the data can still be located even if it's not in the indexer.
By providing this additional layer of information, the indexer helps to speed up data location and retrieval, reduce resource consumption, and improve the overall scalability of IPFS for use-cases where DHT performance, resiliency, or security characteristics alone are insufficient.
# Example: finding providers via /routing/v1
Most of the time, IPFS implementations interact with IPNI by querying the HTTP endpoint compatible with the Delegated Routing V1 API Specification.
```shell
$ curl https://cid.contact/routing/v1/providers/bafybeigdyrzt5sfp7udm7hu76uh7y26nf3efuylqabf3oclgtqy55fbzdi
```
The endpoint produces a human-readable JSON response:
```json
{
  "Providers": [
    {
      "Addrs": [
        "/dns4/bitswap.filebase.io/tcp/443/wss"
      ],
      "ID": "12D3KooWGtYkBAaqJMJEmywMxaCiNP7LCEFUAFiLEBASe232c2VH",
      "Protocols": [
        "transport-bitswap"
      ],
      "Schema": "peer",
      "transport-bitswap": "gBI="
    },
    {
      "Addrs": [
        "/ip4/212.6.53.27/tcp/80/http"
      ],
      "ID": "12D3KooWHEzPJNmo4shWendFFrxDNttYf8DW4eLC7M2JzuXHC1hE",
      "Protocols": [
        "transport-ipfs-gateway-http"
      ],
      "Schema": "peer",
      "transport-ipfs-gateway-http": "oBIA"
    },
    {
      "Addrs": [
        "/ip4/72.52.65.166/tcp/26101"
      ],
      "ID": "12D3KooWHKChM2uYi4EXREaCGtaxersCsp7hbFiMqMUK8o7CgV6Q",
      "Protocols": [
        "transport-graphsync-filecoinv1"
      ],
      "Schema": "peer",
      "transport-graphsync-filecoinv1": "kBKjaFBpZWNlQ0lE2CpYKAABgeIDkiAg1l4zEzA4zeGlv3N7u4iMbysxTBMRUrquyMVzQejTeh9sVmVyaWZpZWREZWFs9W1GYXN0UmV0cmlldmFs9Q=="
    }
  ]
}
```
Each result follows PeerSchema and uses the same format as every other delegated routing endpoint in the IPFS ecosystem. This allows client code reuse across all compatible routing systems.
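As a sketch of what a client might do with this endpoint, the following Go program queries cid.contact for the providers of a CID, decodes the JSON shown above, and keeps only results that advertise transport-bitswap. The struct mirrors the response shape and is not tied to any particular client library.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"slices"
)

// providersResponse mirrors the JSON returned by /routing/v1/providers/{cid}.
type providersResponse struct {
	Providers []struct {
		Schema    string   `json:"Schema"`
		ID        string   `json:"ID"`
		Addrs     []string `json:"Addrs"`
		Protocols []string `json:"Protocols"`
	} `json:"Providers"`
}

func main() {
	cid := "bafybeigdyrzt5sfp7udm7hu76uh7y26nf3efuylqabf3oclgtqy55fbzdi"
	resp, err := http.Get("https://cid.contact/routing/v1/providers/" + cid)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var out providersResponse
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		panic(err)
	}

	// Keep only providers that advertise Bitswap retrieval.
	for _, p := range out.Providers {
		if slices.Contains(p.Protocols, "transport-bitswap") {
			fmt.Println(p.ID, p.Addrs)
		}
	}
}
```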
TIP
To start receiving results immediately, enable streaming responses by passing the Accept: application/x-ndjson HTTP header.
```shell
$ curl -H 'Accept: application/x-ndjson' https://cid.contact/routing/v1/providers/bafybeigdyrzt5sfp7udm7hu76uh7y26nf3efuylqabf3oclgtqy55fbzdi
```
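In streaming mode, each line of the response body is a standalone provider record, so results can be handled as they arrive rather than after the full response. A minimal Go sketch, under the same assumptions as the example above:

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"net/http"
)

func main() {
	cid := "bafybeigdyrzt5sfp7udm7hu76uh7y26nf3efuylqabf3oclgtqy55fbzdi"
	req, err := http.NewRequest("GET", "https://cid.contact/routing/v1/providers/"+cid, nil)
	if err != nil {
		panic(err)
	}
	req.Header.Set("Accept", "application/x-ndjson") // ask for newline-delimited JSON

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// Each line is one provider record; print peer IDs as they stream in.
	scanner := bufio.NewScanner(resp.Body)
	scanner.Buffer(make([]byte, 0, 1024*1024), 1024*1024) // allow long lines
	for scanner.Scan() {
		var rec struct {
			ID    string   `json:"ID"`
			Addrs []string `json:"Addrs"`
		}
		if err := json.Unmarshal(scanner.Bytes(), &rec); err != nil {
			continue // skip lines that are not provider records
		}
		fmt.Println(rec.ID, rec.Addrs)
	}
}
```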
FYI
Light IPFS clients may prefer to query the delegated-ipfs.dev/routing/v1 endpoint, which is a someguy caching proxy for both the Amino DHT and the IPNI instance at cid.contact.
# Example: finding providers via IPNI-specific /cid endpoint
IPNI also provides its own HTTP API, which may be preferable when IPNI-specific information is desired.
To demonstrate the practical application and usage of IPNI, this section walks through a hands-on example using the cid.contact indexer, which uses IPNI to return provider records for a given CID.
Open a browser. In the search field, enter the URL below:
```
https://cid.contact/cid/bafybeigvgzoolc3drupxhlevdp2ugqcrbcsqfmcek2zxiw5wctk3xjpjwy
```
The tool uses IPNI to return provider information for CID bafybeigvgzoolc3drupxhlevdp2ugqcrbcsqfmcek2zxiw5wctk3xjpjwy. Output similar to the following (formatted for the purpose of this example) is displayed:
```json
{
  "MultihashResults": [
    {
      "Multihash": "EiDVNlzli2ONH3OslRv1Q0BRCKUCsERWs3RbthTVu6Xptg==",
      "ProviderResults": [
        {
          "ContextID": "AXESIAqACNwDTPpjRLuNw0rCwP4z5ge8p2p+mceS0hjDQdBl",
          "Metadata": "kBKjaFBpZWNlQ0lE2CpYKAABgeIDkiAgjdNAYM8PDCDyhgEIJKlEGElVgqkxlecqZA+2aJrX8CdsVmVyaWZpZWREZWFs9W1GYXN0UmV0cmlldmFs9Q==",
          "Provider": {
            "ID": "12D3KooWHbYfcXCUzxCCCkfppiJgvD7eAqhbZTXEMu66EYdqTwCQ",
            "Addrs": ["/ip4/195.26.70.31/tcp/24001"]
          }
        },
        {
          "ContextID": "AXESIIDDfCF2O9gTlTCW1jsS94di679rBaiNW2wYuudllV8n",
          "Metadata": "kBKjaFBpZWNlQ0lE2CpYKAABgeIDkiAgOpmhnBIQKNzyU6ehjrfzmEA+e++NQ+z5mBjI6C7y1B5sVmVyaWZpZWREZWFs9W1GYXN0UmV0cmlldmFs9Q==",
          "Provider": {
            "ID": "12D3KooWDMJSprsuxhjJVnuQQcyibc5GxanUUxpDzHU74rhknqkU",
            "Addrs": ["/ip4/89.20.96.58/tcp/24001"]
          }
        },
        {
          "ContextID": "AXESIBD01Ud5R2aNm17hy5POqaKeNmIzfSNMhnAGzhvNCfK/",
          "Metadata": "kBKjaFBpZWNlQ0lE2CpYKAABgeIDkiAg1bFuob1/knnbN6PTonjf6wUGeB/qc2hJb4oriOwRjTNsVmVyaWZpZWREZWFs9W1GYXN0UmV0cmlldmFs9Q==",
          "Provider": {
            "ID": "12D3KooWDMJSprsuxhjJVnuQQcyibc5GxanUUxpDzHU74rhknqkU",
            "Addrs": ["/ip4/89.20.96.58/tcp/24001"]
          }
        },
        {
          "ContextID": "AXESID1YhQwxum55WMSHXI6EQbtVpnhm7QwGpDPYCm5bjwbr",
          "Metadata": "kBKjaFBpZWNlQ0lE2CpYKAABgeIDkiAg7H0Gb8ZK4LC8aijKk56XS4diZvoLv9hcDz6iiE0gJhNsVmVyaWZpZWREZWFs9W1GYXN0UmV0cmlldmFs9Q==",
          "Provider": {
            "ID": "12D3KooW9yi2xLhXds9HC4x9vRN99mphq6ds8qN2YRf8zks1F32G",
            "Addrs": ["/ip4/149.5.22.10/tcp/24002"]
          }
        },
        {
          "ContextID": "AXESIMGxu6/414seq9d+YrGEwonTcCDwNzookG69eGph7cQK",
          "Metadata": "kBKjaFBpZWNlQ0lE2CpYKAABgeIDkiAgOpmhnBIQKNzyU6ehjrfzmEA+e++NQ+z5mBjI6C7y1B5sVmVyaWZpZWREZWFs9W1GYXN0UmV0cmlldmFs9Q==",
          "Provider": {
            "ID": "12D3KooWM4wsQ3kdd8CDHiVDQthU9JZ9KqsxSdSQT2xj6TAdDth5",
            "Addrs": ["/ip4/61.38.42.252/tcp/20000"]
          }
        },
        {
          "ContextID": "AXESIPM2bykkesWamkYUx5lDVUhDSMnaZ10zi3Fk7+5TBCcC",
          "Metadata": "kBKjaFBpZWNlQ0lE2CpYKAABgeIDkiAgOpmhnBIQKNzyU6ehjrfzmEA+e++NQ+z5mBjI6C7y1B5sVmVyaWZpZWREZWFs9W1GYXN0UmV0cmlldmFs9Q==",
          "Provider": {
            "ID": "12D3KooWLDf6KCzeMv16qPRaJsTLKJ5fR523h65iaYSRNfrQy7eU",
            "Addrs": ["/ip4/141.138.64.21/tcp/11337", "/ip4/149.6.102.102/tcp/11337"]
          }
        },
        {
          "ContextID": "AXESIO50esRu0SvbUfSGOzLTWfIff1S54seFI/PtyDuPNkzZ",
          "Metadata": "kBKjaFBpZWNlQ0lE2CpYKAABgeIDkiAg+IcjHXCOHHUf8wNiVtnvhYTPwL5Fqnnr7GyOLZp48R5sVmVyaWZpZWREZWFs9W1GYXN0UmV0cmlldmFs9Q==",
          "Provider": {
            "ID": "12D3KooWPNbkEgjdBNeaCGpsgCrPRETe4uBZf1ShFXStobdN18ys",
            "Addrs": ["/ip4/76.219.232.45/tcp/24001"]
          }
        },
        {
          "ContextID": "AXESIO50esRu0SvbUfSGOzLTWfIff1S54seFI/PtyDuPNkzZ",
          "Metadata": "gBI=",
          "Provider": {
            "ID": "12D3KooWSoSgVaUvoguDQZu1doytze9RgnnANwJoiLw7KUcAXq8i",
            "Addrs": ["/ip4/76.219.232.45/tcp/24888"]
          }
        },
        {
          "ContextID": "AXESIIVMIJ+VCHTZGl8Io8JebgortiwZPeGdWjG7/PMqQedI",
          "Metadata": "kBKjaFBpZWNlQ0lE2CpYKAABgeIDkiAgOpmhnBIQKNzyU6ehjrfzmEA+e++NQ+z5mBjI6C7y1B5sVmVyaWZpZWREZWFs9W1GYXN0UmV0cmlldmFs9Q==",
          "Provider": {
            "ID": "12D3KooWEkQFhSUc17MNC4gimbRYakSSCmDiQwMLhcvToh7bsXbN",
            "Addrs": ["/ip4/112.216.168.43/tcp/8999"]
          }
        },
        {
          "ContextID": "YmFndXFlZXJha3ppdzRwaWxuZmV5ZGFtNTdlZ2RxZTRxZjR4bzVuZmxqZG56emwzanV0YXJtbWltdHNqcQ==",
          "Metadata": "gBI=",
          "Provider": {
            "ID": "QmQzqxhK82kAmKvARFZSkUVS6fo9sySaiogAnx5EnZ6ZmC",
            "Addrs": ["/dns4/elastic.dag.house/tcp/443/wss"]
          }
        },
        {
          "ContextID": "YmFndXFlZXJhNWpnZWF6eXRhbWVpZnNwbmlocWk2NnFxejNlNnRzazRuM255Nmo3emxjeGFqcnh2YTNlcQ==",
          "Metadata": "gBI=",
          "Provider": {
            "ID": "QmQzqxhK82kAmKvARFZSkUVS6fo9sySaiogAnx5EnZ6ZmC",
            "Addrs": ["/dns4/elastic.dag.house/tcp/443/wss"]
          }
        },
        {
          "ContextID": "YmFndXFlZXJhd3pjeDJ1YnF6M2E0eTJ3anRoZW90bmR1NGFiZDR2NGt6dWxlNzR4dWNvNjZyMmNkeWRycQ==",
          "Metadata": "gBI=",
          "Provider": {
            "ID": "QmQzqxhK82kAmKvARFZSkUVS6fo9sySaiogAnx5EnZ6ZmC",
            "Addrs": ["/dns4/elastic.dag.house/tcp/443/wss"]
          }
        }
      ]
    }
  ]
}
```
# Response explained
This response returns multiple provider records, which indicates that the data identified by this CID was found at multiple providers. For example, the first provider is specified by the following record:
```json
{
  "ContextID": "AXESIAqACNwDTPpjRLuNw0rCwP4z5ge8p2p+mceS0hjDQdBl",
  "Metadata": "kBKjaFBpZWNlQ0lE2CpYKAABgeIDkiAgjdNAYM8PDCDyhgEIJKlEGElVgqkxlecqZA+2aJrX8CdsVmVyaWZpZWREZWFs9W1GYXN0UmV0cmlldmFs9Q==",
  "Provider": {
    "ID": "12D3KooWHbYfcXCUzxCCCkfppiJgvD7eAqhbZTXEMu66EYdqTwCQ",
    "Addrs": [
      "/ip4/195.26.70.31/tcp/24001"
    ]
  }
}
```
This indicates that the data can be:
- Found at a provider identified by the peer ID 12D3KooWHbYfcXCUzxCCCkfppiJgvD7eAqhbZTXEMu66EYdqTwCQ.
- Retrieved as specified by the multiaddress /ip4/195.26.70.31/tcp/24001.
Additional information is also included:
- The Metadata field contains data that the provider uses to locate and deliver the content to a client (decoded in the sketch below).
- The ContextID is used by IPNI to update metadata, add multihash mappings to a provider record, or delete a provider record and the multihash mappings to it.
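The Metadata value is a base64-encoded byte string that, per the IPNI specification, begins with an unsigned varint multicodec identifying the retrieval transport (for example 0x0900 for transport-bitswap or 0x0910 for transport-graphsync-filecoinv1), followed by protocol-specific payload. A small Go sketch that peeks at that prefix, using values copied from the responses above:

```go
package main

import (
	"encoding/base64"
	"encoding/binary"
	"fmt"
)

func main() {
	// Metadata values copied from the responses shown earlier.
	samples := []string{
		"gBI=", // plain Bitswap retrieval
		"kBKjaFBpZWNlQ0lE2CpYKAABgeIDkiAgjdNAYM8PDCDyhgEIJKlEGElVgqkxlecqZA+2aJrX8CdsVmVyaWZpZWREZWFs9W1GYXN0UmV0cmlldmFs9Q==",
	}
	// Multicodec values for the transports seen in this document.
	names := map[uint64]string{
		0x0900: "transport-bitswap",
		0x0910: "transport-graphsync-filecoinv1",
		0x0920: "transport-ipfs-gateway-http",
	}
	for _, s := range samples {
		raw, err := base64.StdEncoding.DecodeString(s)
		if err != nil {
			panic(err)
		}
		// The metadata starts with an unsigned varint multicodec identifying
		// the retrieval transport; the rest is protocol-specific payload
		// (for graphsync, a CBOR map with PieceCID, VerifiedDeal, FastRetrieval).
		code, n := binary.Uvarint(raw)
		fmt.Printf("transport=%s payload=%d bytes\n", names[code], len(raw)-n)
	}
}
```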
# Glossary
# Advertisement
A record available from a publisher that contains a link to a chain of multihash blocks, the CID of the previous advertisement, and provider-specific content metadata that is referenced by all the multihashes in the linked multihash blocks. The provider data is identified by a key called a context ID.
# Announce Message
A message that informs indexers about the availability of an advertisement. This is usually sent via gossip pubsub, but can also be sent via HTTP. An announce message contains the advertisement CID it is announcing, which allows indexers to ignore the announcement if they have already indexed the advertisement. The publisher's address is included in the announcement to tell indexers where to retrieve the advertisement from.
# Context ID
A key that, for a provider, uniquely identifies content metadata. This allows content metadata to be updated or deleted on the indexer without having to refer to it using the multihashes that map to it.
# Gossip Pubsub
Publish/subscribe communications over a libp2p gossip mesh. This is used by publishers to broadcast Announce Messages to all indexers that are subscribed to the topic that the announce message is sent on. For production publishers and indexers, this topic is /indexer/ingest/mainnet.
# Indexer
A network node that keeps mappings of multihashes and CIDs to provider records.
# Metadata
Provider-specific data that a retrieval client gets from an indexer query and passes to the provider when retrieving content. The provider uses this metadata to identify and find the specific content and deliver it via the indicated protocol.
# Provider
The node from which content can be retrieved by a retrieval client. When multihashes are looked up on an indexer, the responses contain the providers that serve the content referenced by those multihashes. A provider is identified by a libp2p peer ID.
# Publisher
The node that publishes advertisements and index data to an indexer. It is usually, but not always, the same as the data provider. A publisher is identified by a libp2p peer ID.
# Retrieval Client
A node that queries an indexer to find where content is available, and retrieves that content from a provider.
# Sync
Operation that synchronizes the content indexed by an indexer with the content published by a publisher. A sync is initiated when an indexer receives an Announce Message, by an administrative command to sync with a publisher, or by the indexer itself when there have been no updates for a provider for some period of time (24 hours by default).