Telegram-iOS/docs/instantpage-richtext.md
isaac 0050cc7a08 Rich-message media in gallery/shared-media/preview pipelines via Message.effectiveMedia
Add Message/EngineMessage.effectiveMedia (= message.media when non-empty, else
richText.instantPage.allMedia()) and route the media-consuming sites through it
so a rich message's instant-page media participates in the same pipelines as
normal message.media: shared-media grids/file-rows, search media grid, gallery
open + item nodes + footer, the peer audio/voice playlist, secret-media preview,
resource-by-id resolution, recent downloads, downloaded-media store, delete-time
resource cleanup, cache-usage stats, the in-chat download manager, and the
context-menu / share actions (Save to Camera Roll, copy image, save audio/music
to files). For normal messages effectiveMedia == message.media, so each swap is
behavior-preserving; rich messages render their own bubble via
ChatMessageRichDataBubbleContentNode (not the text/file bubbles), so those paths
are deliberately untouched, as are the forward path (the attribute travels with
the forward) and the markdown-based rich-edit path. First-media scope for now.

See docs/instantpage-richtext.md for the full architecture + invariants.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-04 23:46:56 +02:00


InstantPage V2 & rich-text message rendering

This file documents the rich-text message pipeline and the InstantPage V2 renderer that backs it.

A rich message is a RichTextMessageAttribute carrying an InstantPage (sent with text: ""), produced when typed markdown contains structure the regular message-entity set can't represent (headings, lists, tables, formulas, nested blockquotes) and drawn by ChatMessageRichDataBubbleContentNode via the InstantPage V2 layout/renderer — including AI-streaming progressive reveal, inline custom emoji, and entity (mention / hashtag / …) cases. It also covers the send / edit / copy / paste round-trips between markdown and InstantPage.

These are detailed, non-obvious invariants — read the relevant section before touching the corresponding code. (Moved out of CLAUDE.md to keep that file focused; CLAUDE.md retains a brief pointer back to here.)

AI streaming animation (rich-text bubbles)

ChatMessageRichDataBubbleContentNode progressively reveals InstantPage V2 content while TypingDraftMessageAttribute is on the message. Mirrors the older animation in ChatMessageTextBubbleContentNode, adapted to the heterogeneous V2 layout. The "Thinking…" indicator is now server-sent as InstantPageBlock.thinking rendered inside the pageView (see "InstantPage thinking blocks" section).

Where things live

| File | Responsibility |
| --- | --- |
| submodules/TelegramUI/Components/StreamingTextReveal/Sources/TextRevealController.swift | Pacing controller, shared by both bubbles. EWMA inter-arrival → velocity-smoothed cursor. |
| submodules/InstantPageUI/Sources/InstantPageRenderer.swift (InstantPageV2TextView) | Drawing split: private TextRenderView does draw(_) inside a renderContainer whose layer carries a revealMaskLayer; new chars spawn cropped SnippetLayer siblings of the render container that animate in (blur + alpha + scale + position) and are absorbed into the mask on completion. Ported from InteractiveTextComponent. |
| submodules/InstantPageUI/Sources/InstantPageV2RevealCost.swift | InstantPageV2RevealCostMap + InstantPageV2View.applyReveal(revealedCount:costMap:animated:). Bridges the global width-based cursor to per-text-view char counts (via charCountForWidthBudget) and per-item visibility / table-row pop-in. |
| submodules/InstantPageUI/Sources/InstantPageV2Layout.swift | InstantPageTextLine.characterRects (line-local CT coords, baseline-relative positive-up) populated when computeRevealCharacterRects: true is passed to layoutInstantPageV2(...). Uses CTFontGetBoundingRectsForGlyphs for actual glyph ink, not advance widths. |
| submodules/TelegramUI/Components/Chat/ChatMessageRichDataBubbleContentNode/... | Streaming detection (TypingDraftMessageAttribute), display-link wiring, container sizing. The hardcoded "Thinking…" header was removed; thinking is now rendered by the pageView via InstantPageBlock.thinking. |
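
The table describes the pacing controller only as "EWMA inter-arrival → velocity-smoothed cursor". The following is a minimal sketch of what such pacing could look like, not the actual TextRevealController: the type name `RevealPacer`, the `alpha` default, and the fallback/minimum intervals are all assumptions made for illustration. It keeps an exponentially weighted moving average of the time between chunk arrivals and moves the cursor fast enough to clear the current backlog in roughly one smoothed interval.

```swift
/// Hypothetical pacing sketch (names and constants are assumptions,
/// not taken from TextRevealController).
struct RevealPacer {
    /// Weight given to the newest inter-arrival sample (assumed value).
    let alpha: Double
    /// Smoothed seconds between chunk arrivals; nil until two chunks have arrived.
    private(set) var smoothedInterval: Double?
    /// Total cost (points of width) that may be revealed so far.
    private(set) var target: Double = 0
    /// Cost already revealed.
    private(set) var cursor: Double = 0
    private var lastArrival: Double?

    init(alpha: Double = 0.3) {
        self.alpha = alpha
    }

    /// Records a chunk arriving at `time` that raises the revealable total to `newTarget`.
    mutating func chunkArrived(at time: Double, newTarget: Double) {
        if let last = lastArrival {
            let interval = time - last
            if let previous = smoothedInterval {
                // Standard EWMA update.
                smoothedInterval = alpha * interval + (1 - alpha) * previous
            } else {
                smoothedInterval = interval
            }
        }
        lastArrival = time
        target = max(target, newTarget)
    }

    /// Advances the cursor by `dt` seconds at a velocity that would clear
    /// the remaining backlog in one smoothed interval.
    mutating func tick(dt: Double) {
        let remaining = target - cursor
        guard remaining > 0 else { return }
        // Assumed fallback (0.5s) before any interval is known, and a floor (0.05s).
        let interval = max(smoothedInterval ?? 0.5, 0.05)
        let velocity = remaining / interval
        cursor = min(target, cursor + velocity * dt)
    }
}
```

Because the cursor is expressed in points of width, the same pacer would drive text, media, and table rows at a uniform visual speed.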

Non-obvious invariants

  • Cost unit is points of width, not characters. Each item's cost = its width in points along the reading direction. Text contributes sum of glyph ink widths; non-text items contribute frame.width. Table cells are floored at cell.frame.width so narrow- or empty-cell tables don't race through the cursor. Reveal pace becomes "points per second" — uniform across content types.
  • Mask uses per-glyph ink bounds, unioned per line. Each revealed glyph's mask rect comes from CTFontGetBoundingRectsForGlyphs (not advance widths) so italics, accents, descenders are covered exactly. Per line, glyphs are unioned into one mask rect; consecutive fully-revealed lines union further — fully-revealed prefix is always one CALayer.
  • containerNode does ALL the clipping. During streaming, containerNode is sized to revealedItemsMaxY (no header offset, no closing pad; streamingHeaderOffset is 0.0). The bubble itself is taller (revealedContentSize.height + 2) — the strip below containerNode is empty bubble background. pageView keeps its full pageLayout.contentSize; anything past containerNode's bottom is clipped at containerNode (clipsToBounds = true set in init). Do NOT shorten the pageView or set pageView.clipsToBounds.
  • The pageView is REUSED across stableVersion bumps for the same message id. ensurePageView calls existing.renderContext?.updateContent(webpage:) (where webpage is now a public private(set) var with an updateContent mutator) and returns the existing view; update(layout:) then diffs item views by stable id, tearing down only views whose block was removed. The pageView is rebuilt only when the bubble is recycled with a different message or webpage. The reveal cursor on TextRevealController persists across chunks; the seed re-apply (applyReveal(revealedCount: previousAnimateGlyphCount, …, animated: false)) is now a continuation from the reused views' state, eliminating the per-chunk flash-of-full-text-then-mask that required the earlier from-scratch re-seed.
  • Layout cache key includes message.stableVersion. Each AI chunk bumps stableVersion; without this the cached layout would shadow newly-arrived content.
  • TypingDraftMessageAttribute is the streaming gate. Same trigger TextBubble uses. The InstantPage's isComplete flag is informational only.
  • Width-based cost → char count bridge. Mask APIs (updateRevealCharacterCount) still take character counts. applyRevealEntry calls charCountForWidthBudget(textItem:widthBudget:) to translate the width-based local cursor into the per-text-view character count.
  • The hardcoded "Thinking…" header was removed. streamingStatusTextNode, streamingStatusShimmerView, and the header-layout machinery no longer exist. streamingHeaderOffset is now a constant 0.0 — the pageView starts at the top of the bubble. The "Thinking…" indicator is now server-sent as InstantPageBlock.thinking and rendered inside the pageView (see "InstantPage thinking blocks" section below).
  • Display-link tick re-layouts on extent change. Tick reads revealedContentSize at the new cursor; if the height differs from the previous cursor, calls requestFullUpdate. So the bubble grows in flight when the cursor crosses a line/item boundary, not just between chunks. Tick passes animated: true to applyReveal to fire the snippet pop-in.
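
The width-based cost model and the width → character-count bridge described above can be sketched in plain Swift. This is an illustration under stated assumptions, not the code in InstantPageV2RevealCost.swift: the function names and the flat `[Double]` representation of glyph ink widths are hypothetical, and the rule that a character counts as revealed only once the budget fully covers it is an assumption.

```swift
/// Cost of a text item: the sum of its glyph ink widths, in points.
func textCost(glyphInkWidths: [Double]) -> Double {
    return glyphInkWidths.reduce(0, +)
}

/// Cost of a table cell: its content cost, floored at the cell's frame width
/// so narrow or empty cells do not race through the cursor.
func tableCellCost(contentCost: Double, cellFrameWidth: Double) -> Double {
    return max(contentCost, cellFrameWidth)
}

/// Splits a global width cursor into a local budget per item, in reading order.
/// Fully revealed items get their whole cost; items past the cursor get 0.
func localBudgets(globalCursor: Double, itemCosts: [Double]) -> [Double] {
    var remaining = globalCursor
    return itemCosts.map { cost in
        let local = min(max(remaining, 0), cost)
        remaining -= cost
        return local
    }
}

/// Hypothetical stand-in for charCountForWidthBudget(textItem:widthBudget:):
/// translates a local width budget into a revealed character count.
func charCount(forWidthBudget widthBudget: Double, glyphInkWidths: [Double]) -> Int {
    var spent = 0.0
    var count = 0
    for width in glyphInkWidths {
        // Assumption: a character is revealed only when the budget covers it entirely.
        if spent + width > widthBudget { break }
        spent += width
        count += 1
    }
    return count
}
```

For example, with three items costing 20 points each and a global cursor of 30, the first item is fully revealed, the second receives a 10-point budget that is then converted to a character count, and the third stays hidden.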

Status node (date/time/checks) positioning

The ChatMessageDateAndStatusNode mirrors TextBubble's placement, adapted to the heterogeneous V2 layout. The node is a child of self (the content node), not of the clipping containerNode, so it is never clipped — the bubble height must be grown to contain it.

  • X is a fixed left edge, not the last line's minX. Anchor x = pageHorizontalInset (10pt, the page layout's text inset; pageView sits at self-x 0). The status layout is measured with boundingWidth - 2·pageHorizontalInset (mirrors TextBubble's boundingWidth - sideInsets) so the right-aligned date lands at the right inset instead of off the bubble. Using lastTextLineFrame.minX (which is large for nested/indented last lines) shoved the date off to the right.
  • Trail the last line only when the bottom-most item is text. lastTextLineFrameIfLastItemIsText(in:) (in InstantPageV2Layout.swift) returns the last line frame only when the bottom-most top-level item (max maxY) is a .text; otherwise nil, so the date wraps below all content (anchored at contentSize.height). For tables/images/etc. the date must not trail text buried above the final item.
  • InstantPage draws the baseline at the line frame's maxY (InstantPageRenderer draws each line at lineOrigin.y + lineFrame.height), so the visible text of a plain line sits ~5pt below maxY. A date that trails on the line (statusHeight == 0) adds trailingBottomPadding (5pt) to align with the text; a date that wraps onto its own line below (statusHeight > 0) sits at the bare maxY. The pad is 0 for lines taller than their font line height (a tall inline attachment, e.g. a formula, already pushes maxY down). lastTextLineFrameIfLastItemIsText returns (frame, trailingBottomPadding); the bubble applies the pad only in the trailing case.
  • Bubble height leaves ~6pt below the date. One unified formula for all cases: boundingSize.height = max(boundingSize.height, statusBottomEdge + 6.0), where statusBottomEdge = statusAnchorY + max(1, statusHeight). The statusAnchorY in the measure (continue) closure must mirror the statusFrameY in the apply closure exactly, or the date will be clipped/misplaced. (streamingHeaderOffset is 0.0 — there is no header offset to add.) 6pt matches TextBubble's bottom bubble inset.
  • hasDraft adds the same 6pt at the streaming site. The status max() above is gated by !hasDraft, so during streaming (status hidden, alpha=0) it can't supply the bubble's bottom inset. A separate boundingSize.height += 6.0 inside if hasDraft in the SizeBlock closure does it instead — same 6pt, so the streaming bubble's bottom breathing room matches its post-stream height and there's no 6pt grow-pop when the status node fades in at finalize. The hadDraft && !hasDraft finalize pass doesn't need it because !hasDraft re-enables the status max(). If you ever refactor the +6.0 constant out of the status max() into a bottomInset (TextBubble's pattern), kill this separate term at the same time — they're two ends of the same invariant.
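
The height rules above fit in a few lines. The sketch below restates them with hypothetical names (`LastTextLine`, `statusAnchorY`, `bubbleHeight`); it is a simplified reading of the bullets, assuming the anchor is the last line's maxY plus the pad in the trailing case, the bare maxY in the wrapped case, and the content height when the bottom-most item is not text.

```swift
/// Hypothetical mirror of the (frame, trailingBottomPadding) pair; only maxY is needed here.
struct LastTextLine {
    var maxY: Double
    var trailingBottomPadding: Double
}

/// Shared bottom inset: the same 6pt in the status and streaming cases.
let bubbleBottomInset = 6.0

/// Y anchor for the date/status node.
/// `lastTextLine` is nil when the bottom-most item is not text.
func statusAnchorY(lastTextLine: LastTextLine?, contentHeight: Double, statusHeight: Double) -> Double {
    guard let line = lastTextLine else {
        return contentHeight
    }
    // statusHeight == 0 means the date trails on the last text line.
    return statusHeight == 0 ? line.maxY + line.trailingBottomPadding : line.maxY
}

/// Bubble height after accounting for the status node or the streaming inset.
func bubbleHeight(contentHeight: Double, anchorY: Double, statusHeight: Double, hasDraft: Bool) -> Double {
    if hasDraft {
        // Status is hidden while streaming, so the inset is added directly.
        return contentHeight + bubbleBottomInset
    }
    let statusBottomEdge = anchorY + max(1.0, statusHeight)
    return max(contentHeight, statusBottomEdge + bubbleBottomInset)
}
```

Keeping the inset in one constant, as here, is one way to avoid the two 6pt terms drifting apart if either site is refactored.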

InstantPage V2 table — flush frame, inset borders, rounded corners

A V2 .table block's item frame is full-width / flush with the bubble interior (so a horizontally-scrollable wide table's scroll container bleeds edge-to-edge), but the actual grid borders start at the body-text side inset — matching the V1 renderer. The grid card also has a 10pt rounded outer border.

Non-obvious invariants

  • InstantPageV2TableItem.contentInset (= page horizontalInset) is the linchpin. layoutTable (InstantPageV2Layout.swift) sizes columns against contentBoundingWidth = boundingWidth - horizontalInset·2 (so a fitting table aligns with body text on both sides) and stores contentInset on the item; the item frame.width is the flush boundingWidth, and contentSize.width stays the bare grid width (totalWidth, no inset).
  • The renderer (InstantPageV2TableView) realizes the inset as a view shift, not baked coordinates. In init AND update it shifts the grid contentView to x: contentInset, sets scrollView.contentSize.width = contentSize.width + contentInset * 2.0 (margin on both sides, mirroring V1's InstantPageScrollableNode), and scrollView.clipsToBounds = true. Cells, inner border lines, and the title stay x=0-relative inside contentView, so the single shift carries them all; the rounded outer border is contentView.layer's own border (see below), which wraps the shifted layer automatically.
  • Scrollable tables clip to the full width with no inset on the clip. The inset lives inside the scroll content as a symmetric margin on both sides (contentInset * 2.0): a fitting table (grid + 2·inset ≤ boundingWidth) doesn't scroll and shows both-side inset; an overflowing table rests with its left border at the inset and scrolls until its right border reaches a matching trailing inset (it does not jam flush against the screen edge, which matches V1). The scroll-indicator threshold and contentSize.width use the same + contentInset * 2.0, so "does it scroll" is exactly grid > boundingWidth - 2·inset.
  • Manual cell-coordinate helpers MUST add contentInset. Because the shift is a real contentView frame change, UIKit hitTest and self.convert(_:to:) paths (propagateVisibilityRect, the row-reveal mask) handle it automatically — but the manual coordinate helpers findTextItem / collectSelectableTextItems (the live tap / URL / text-selection path) compute cell/title positions arithmetically and must add table.contentInset to the x-offset, or in-cell hit-testing is off by the inset. (These helpers still do not account for the table's live horizontal scrollView.contentOffset — a pre-existing limitation, so in-cell hit-testing is only correct at scroll offset 0.) The dead-but-symmetric lastTextLineFrame(in:) table branch has the same omission but has no callers.
  • The 10pt rounded outer border is contentView.layer's own border, NOT sublayers. v2TableCornerRadius = 10.0 (InstantPageV2Layout.swift). The renderer sets contentView.layer.cornerRadius/borderColor/borderWidth = bordered ? v2TableBorderWidth : 0.0 in BOTH init and update (the four straight outer-edge rect layers were removed; lineLayers now holds only inner grid lines). Border-only — deliberately no masksToBounds: cornerRadius rounds the layer's border without clipping contents (filled corner cells round their own fills separately — see next bullet), and there is zero interaction with the streaming reveal mask (contentView.layer.mask, set only during AI streaming) — the border reveals row-by-row with the rows and is part of the masked layer. The rounded card belongs to the grid (scrolls with it). For a non-empty-title table (never produced by markdown/AI), the border wraps title+grid since contentView includes the title region — an accepted, approved nuance.
  • Filled corner cells round their own fills to match the border. A header/striped cell's background is a stripe CALayer; tableStripeCornerMask(cellFrame:gridWidth:gridHeight:effectiveBorderWidth:) detects which grid corners the cell's (grid-local) frame touches — firstCol/firstRow via frame.min{X,Y} <= effectiveBorderWidth/2 + 0.5, lastCol/lastRow via frame.max{X,Y} >= grid{Width,Height} - … (gridWidth = item.contentSize.width, gridHeight = item.contentSize.height - gridOffsetY) — and rounds only those corners: stripe.cornerRadius = max(0, v2TableCornerRadius - effectiveBorderWidth) (the -borderWidth leaves an even border ring; borderless → full radius) + stripe.maskedCorners, in BOTH init and update. A CALayer's backgroundColor honors cornerRadius+maskedCorners with no masksToBounds. A full-width (colspan) header rounds both top corners; a one-row filled table rounds all four; bottom corners round only when the last row is filled. The empty-mask branch resets cornerRadius = 0 and maskedCorners = [] so reused stripes (persist across streaming chunks) don't keep stale rounding. Detection is grid-local, so it's independent of the contentInset shift / horizontal scroll.
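
The scroll predicate and the corner detection described above can be sketched without UIKit. Everything here is illustrative: `CornerMask` stands in for CACornerMask, `GridRect` for a grid-local CGRect, and the function names are hypothetical. The source elides the tolerance on the last-column/last-row test, so the sketch assumes it is the same `effectiveBorderWidth / 2 + 0.5` used for the first-column/first-row test.

```swift
/// Stand-in for CACornerMask (assumption: the real code uses the CoreAnimation type).
struct CornerMask: OptionSet {
    let rawValue: Int
    static let topLeft = CornerMask(rawValue: 1 << 0)
    static let topRight = CornerMask(rawValue: 1 << 1)
    static let bottomLeft = CornerMask(rawValue: 1 << 2)
    static let bottomRight = CornerMask(rawValue: 1 << 3)
}

/// Grid-local cell rectangle.
struct GridRect {
    var minX, minY, maxX, maxY: Double
}

let tableCornerRadius = 10.0

/// A table scrolls exactly when the grid plus a symmetric inset exceeds the bounding width.
func tableScrolls(gridWidth: Double, boundingWidth: Double, contentInset: Double) -> Bool {
    return gridWidth + contentInset * 2.0 > boundingWidth
}

/// Which grid corners a filled cell touches.
func stripeCornerMask(cellFrame: GridRect, gridWidth: Double, gridHeight: Double, effectiveBorderWidth: Double) -> CornerMask {
    // Assumption: the same tolerance applies on all four edges.
    let tolerance = effectiveBorderWidth / 2 + 0.5
    let firstCol = cellFrame.minX <= tolerance
    let firstRow = cellFrame.minY <= tolerance
    let lastCol = cellFrame.maxX >= gridWidth - tolerance
    let lastRow = cellFrame.maxY >= gridHeight - tolerance
    var mask: CornerMask = []
    if firstCol && firstRow { mask.insert(.topLeft) }
    if lastCol && firstRow { mask.insert(.topRight) }
    if firstCol && lastRow { mask.insert(.bottomLeft) }
    if lastCol && lastRow { mask.insert(.bottomRight) }
    return mask
}

/// Stripe radius: the card radius minus the border, so the border ring stays even.
/// An empty mask resets the radius to 0 so reused stripes do not keep stale rounding.
func stripeCornerRadius(mask: CornerMask, effectiveBorderWidth: Double) -> Double {
    return mask.isEmpty ? 0 : max(0, tableCornerRadius - effectiveBorderWidth)
}
```

Because the inputs are grid-local, the result does not change with the contentInset shift or the horizontal scroll offset.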

InstantPage V2 block media — flush (edge-to-edge), un-rounded

Every V2 block-media kind lays out flush with the bubble interior (0 inset, full bounding width) and un-rounded (cornerRadius 0). The bubble's existing rounded clipping container rounds any media that meets the bubble's top/bottom edge. V1 (InstantPageLayout.swift) is unchanged. (Audio is also full-width / x = 0 as of the V2 audio port, but it does not use this helper: it has its own layoutAudio arm, and its control, InstantPageV2AudioContentNode, lays out its own internal content. See the "InstantPage V2 audio/music" section below.)

Where things live

| File | Responsibility |
| --- | --- |
| submodules/InstantPageUI/Sources/InstantPageV2Layout.swift | instantPageV2MediaFrame(naturalSize:flush:cornerRadius:boundingWidth:horizontalInset:), the shared frame helper; instantPageV2MediaEdgeBleed constant; the flush: Bool parameter on layoutTypedMediaWithCaption (image/video/webEmbed-cover/map) and layoutMediaWithCaption (webEmbed-placeholder/postEmbed/channelBanner/relatedArticles). (Collage/slideshow and audio no longer route through these; see their dedicated sections.) |
| submodules/InstantPageUI/Sources/InstantPageV2MediaViews.swift, …/InstantPageRenderer.swift (InstantPageV2MediaPlaceholderView) | Renderer. No change needed: every media view + the placeholder view already does clipsToBounds = item.cornerRadius > 0.0, so cornerRadius 0 means the view doesn't self-clip; the bubble's containerNode clips. |
| …/Chat/ChatMessageRichDataBubbleContentNode/… | The clipping container: containerNode (clipsToBounds = true, cornerRadius = layoutConstants.image.defaultCornerRadius ≈ 15-16pt) is what rounds flush media at the bubble edge. |

Non-obvious invariants

  • flush is a parameter, not inferred from cornerRadius. Every remaining media call site now passes flush: true. Audio — the former lone flush: false caller — was moved to its own layoutAudio arm in the V2 audio port, so instantPageV2MediaFrame's flush == false branch is now dead code (a candidate for a follow-up cleanup: drop the flush parameter and the inset branch entirely). On the flush path the helper forces the returned corner radius to 0 regardless of the caller's cornerRadius argument (the legacy 8.0/0.0 args at the call sites are now inert — kept as-is, documented in the helper).
  • Small images are NOT upscaled. The scale = min(availableWidth / naturalSize.width, 1.0) cap is kept (now against availableWidth = boundingWidth). A small image stays at natural size, flush-left at x = 0 (not stretched to full width). Large images (the common server/AI case) fill the width.
  • Full-width media bleeds instantPageV2MediaEdgeBleed (4pt) past the trailing edge. The pageView sits at x: -1 inside containerNode (a border-hiding hairline), so a frame at x: 0, width: boundingWidth falls ~1px short of the container's right rounded-clip edge → a 1px corner notch. A small over-bleed on full-width items only (fillsWidth = scaledSize.width >= availableWidth - 1.0) closes it; a genuinely small image gets no bleed. The bleed never widens the bubble because layoutInstantPageV2 clamps contentSize.width = min(maxX, boundingWidth) (gated by context.fitToWidth, which both callers — the rich bubble and the send preview — pass true).
  • Captions stay inset. layoutCaptionAndCredit is still called with the page horizontalInset and offset by the un-bled scaledSize.height; the caption/credit text is inset under a full-bleed image. The isCover && captionHeight > 0 cover-padding block is unchanged.
  • Audio is no longer routed through this helper. As of the V2 audio port it has a dedicated layoutAudio arm emitting a typed .mediaAudio item at a full-width (x = 0), height-44 frame (the file node's music normHeight); the InstantPageV2AudioContentNode control lays out its own internal geometry, and audio does not participate in instantPageV2MediaEdgeBleed. See the dedicated "InstantPage V2 audio/music" section below.
  • .map blocks get a 600×300 (2:1) fallback when the sender omits dimensions. AI/server-sent .map blocks can arrive with dimensions == 0×0 (the wire w/h are required Int32, but the sender may put 0; our pageBlockMap parse and both serializers, Postbox sw/sh and FlatBuffers required dimensions, preserve whatever arrives, so the zero originates upstream). A zero naturalSize.height hits instantPageV2MediaFrame's else branch and returns a height-0 frame: the map collapses to no space, the caption slides up into it, and the V1 node's pin (positioned at size.height*0.5 - 10 - pinSize/2) floats over the caption. The .map arm in InstantPageV2Layout.swift substitutes PixelDimensions(600, 300) whenever width <= 0 || height <= 0, and feeds that effectiveDimensions to BOTH the layout naturalSize AND the InstantPageMapAttribute. The latter is essential because a MapSnapshotMediaResource(width:0,height:0) makes MKMapSnapshotter render nothing, so fixing only the frame would yield a correctly-sized blank box. Real web-article maps (the V1 renderer) always carry real dimensions, so V1 never trips this; the fallback is deliberately scoped to the V2 .map arm rather than V1 or the wire/parse layer.
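
The flush-frame arithmetic and the map fallback from the bullets above can be sketched as follows. This is not the body of instantPageV2MediaFrame: `MediaSize`, `MediaFrame`, `flushMediaFrame`, and `effectiveMapDimensions` are hypothetical names, and the exact frame returned for a zero-sized input is an assumption (the source only says it has height 0).

```swift
struct MediaSize {
    var width: Double
    var height: Double
}

struct MediaFrame {
    var x, y, width, height: Double
}

/// Over-bleed past the trailing edge for full-width media (4pt per the doc).
let mediaEdgeBleed = 4.0

/// Flush frame for block media: never upscaled, flush-left at x = 0,
/// with a small trailing bleed only when the media fills the width.
func flushMediaFrame(naturalSize: MediaSize, boundingWidth: Double) -> MediaFrame {
    let availableWidth = boundingWidth
    guard naturalSize.width > 0, naturalSize.height > 0 else {
        // Assumed shape of the degenerate branch: a height-0 frame.
        return MediaFrame(x: 0, y: 0, width: availableWidth, height: 0)
    }
    let scale = min(availableWidth / naturalSize.width, 1.0)
    let scaledWidth = naturalSize.width * scale
    let scaledHeight = naturalSize.height * scale
    let fillsWidth = scaledWidth >= availableWidth - 1.0
    let width = fillsWidth ? scaledWidth + mediaEdgeBleed : scaledWidth
    return MediaFrame(x: 0, y: 0, width: width, height: scaledHeight)
}

/// Map fallback: substitute 600×300 when either dimension is missing.
func effectiveMapDimensions(width: Int, height: Int) -> (width: Int, height: Int) {
    if width <= 0 || height <= 0 {
        return (600, 300)
    }
    return (width, height)
}
```

In this sketch the fallback dimensions would be applied before calling `flushMediaFrame`, and the same values would also be handed to the snapshot attribute so the rendered map matches the frame.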

InstantPage V2 audio/music

InstantPageBlock.audio renders in V2 as a control styled exactly like the standard music message bubble (ChatMessageInteractiveFileNode's music layout) — a dedicated InstantPageV2AudioContentNode, NOT the V1 InstantPageAudioNode (which V2 used in the first iteration and which still backs V1's full-page Instant View). It replaces the earlier inert grey .mediaPlaceholder(kind: .audio). Playback stays on InstantPageMediaPlaylist, with two deliberate behavior changes for the rich-message context: the shared playlist identity is message-scoped so concurrent rich-message audio bubbles don't collide, and rich-message audio files are fetched via a message reference (not the synthesized webpage) so a stale file reference can revalidate.

Specs: 2026-06-02-instantpage-v2-audio-design.md (initial port) + 2026-06-02-instantpage-v2-audio-file-style-design.md (file-bubble styling). Plans: 2026-06-02-instantpage-v2-audio.md + 2026-06-02-instantpage-v2-audio-file-style.md.

Where things live

| File | Responsibility |
| --- | --- |
| submodules/InstantPageUI/Sources/InstantPageMediaPlaylist.swift | InstantPageMediaPlaylistId is a public enum with cases .instantPage(webpageId:) (V1 full-page IV) / .richMessage(messageId:) (V2 rich bubble). InstantPageMediaPlaylist.init takes an injected playlistId: (no longer derived from the webpage) and a messageReference: MessageReference? threaded into each InstantPageMediaPlaylistItem. The item's fileReference(_:) helper builds a .message(message:media:) file reference when a (resolvable-id) message reference is present, else the legacy .webPage(...). |
| submodules/InstantPageUI/Sources/InstantPageV2AudioContentNode.swift | The V2 control. Replicates ChatMessageInteractiveFileNode's music layout: a Ø44 SemanticStatusNode (album art via playerAlbumArt + play/pause) + a small bottom-right streamingStatusNode download/progress overlay + title/performer TextNodes + a line MediaPlayerScrubbingNode. Big control play/pause from our filteredPlaylistState; small overlay download/progress from messageMediaFileStatus; tap via a UITapGestureRecognizer (controlTapped routes fetch / play / togglePlayPause); fetch via messageMediaFileInteractiveFetched(fetchManager:…). |
| submodules/InstantPageUI/Sources/InstantPageAudioNode.swift | V1 only (full-page Instant View). Unchanged except init takes an injected playlistId:. No longer used by V2. |
| submodules/InstantPageUI/Sources/InstantPageV2Layout.swift | InstantPageV2MediaAudioItem (frame/media/webPage, no cornerRadius/attributes); the .mediaAudio InstantPageV2LaidOutItem case + its frame/offsetBy/collectMedias arms; the .audio block's layoutAudio arm (full-width x = 0, height 44, which is the file node's music normHeight; the InstantPageMedia carries caption: nil/credit: nil, the visible caption is a separate item via layoutCaptionAndCredit). |
| submodules/InstantPageUI/Sources/InstantPageV2MediaViews.swift | InstantPageV2MediaAudioView (hosts InstantPageV2AudioContentNode via the shared WrapperRef weak-box pattern; wires its play/togglePlayPause/seek/fetch closures + the filteredPlaylistState playback signal) + handleOpenAudioTap (builds the playlist + setPlaylist, mirroring V1's InstantPageControllerNode.openMedia). |
| submodules/InstantPageUI/Sources/InstantPageRenderer.swift | InstantPageV2RenderContext.message: MessageReference? (carries both the playlist-key id via .id AND the file-fetch reference); the .mediaAudio arms in stableId/reuse/makeItemView. |
| submodules/InstantPageUI/Sources/InstantPageV2RevealCost.swift | .mediaAudio is a non-text reveal entry charging frame.width (like other media). |
| submodules/TelegramCore/Sources/Network/FetchedMediaResource.swift | The .message media-reference revalidation arm also searches RichTextMessageAttribute.instantPage.media (not just message.media), so a stale instant-page file reference inside a rich message can recover. |
| rich bubble + send preview | ChatMessageRichDataBubbleContentNode passes message: MessageReference(item.message); ChatSendMessageRichTextPreview passes message: nil. |

Non-obvious invariants

  • The playlist key is message-scoped, NOT webpage-scoped, for rich bubbles. Every rich message synthesizes its TelegramMediaWebpage with the SAME constant id (namespace: 0, id: 0) (ChatMessageRichDataBubbleContentNode), and mediaIndex restarts at 0 per page, so keying playback by (webpageId, mediaIndex) (V1's scheme) would make two audio bubbles on screen share/fight playback state (scrubber + play/pause icon). The discriminated InstantPageMediaPlaylistId.richMessage(messageId) isolates them. The audio view resolves renderContext.message?.id → .richMessage(messageId), else .instantPage(webpageId:); the send preview (no message) takes the webpage fallback, which is harmless since only one preview is ever on screen. The V1 full-page IV path is byte-identical (always .instantPage(...)).
  • InstantPageMediaPlaylistId had to become public. It is exposed through InstantPageMediaPlaylist's public init, which BrowserUI constructs cross-module; an internal type in a public initializer is a hard Swift compile error (independent of -warnings-as-errors). This surfaced only at full-build time — the per-module reasoning didn't catch it.
  • The big control's play/pause comes from OUR playlist, the small overlay's download/progress from the resource status: two separate signals. The file node (ChatMessageInteractiveFileNode) for music keys its play/pause off the peer-messages playback model (messageFileMediaPlaybackStatus → peerMessagesMediaPlaylistAndItemId), which our attribute-embedded audio is NOT part of, so InstantPageV2AudioContentNode drives the big statusNode .play / .pause from our filteredPlaylistState (keyed by the message-scoped playlistId + InstantPageMediaPlaylistItemId(index:)) and the small streamingStatusNode from messageMediaFileStatus. This split (rather than reusing the file node) is why the redesign is a replicated layout, not a hosted ChatMessageInteractiveFileNode.
  • Fetch MUST go through the fetch manager, not freeMediaFileInteractiveFetched. messageMediaFileStatus's progress (.Fetching) is derived from the fetch manager's hasEntry flag; freeMediaFileInteractiveFetched bypasses the manager (hasEntry stays false), so the overlay would stick on the static download icon and never show the animated ring. The control fetches via messageMediaFileInteractiveFetched(fetchManager:messageId:messageReference:file:…).
  • Tap is a UITapGestureRecognizer, never an ASControl (same invariant as the V1 InstantPageAudioNode play button): ASControl .touchUpInside is cancelled by the chat ListView's gesture system. The plain tapView covers the whole control → controlTapped (fetch-when-remote / togglePlayPause-when-playing / play-else).
  • InstantPageV2AudioContentNode.updatePresentationData must refresh EVERYTHING theme/incoming-dependent. TextNode (unlike ASTextNode) has no stored attributedText — the strings live in titleAttributedString/descriptionAttributedString and are fed to TextNode.asyncLayout. On an in-place theme/direction change updatePresentationData rebuilds those strings AND statusNode.backgroundNodeColor + foregroundNodeColor + overlayForegroundNodeColor + scrubbingNode.updateColors(…); missing any leaves a stale-colored control. Font size is presentationData.chatFontSize.baseDisplaySize (plain PresentationData has no .fontSize).
  • Audio is NOT a gallery item. InstantPageV2MediaAudioView does not register in the root media registry (no didMoveToWindow/registerInRootRegistry) and returns nil from instantPageTransitionNode / no-ops instantPageUpdateHiddenMedia — explicit per-class witnesses, not the protocol-extension default. Its media IS enrolled in collectMedias/allMedias() so handleOpenAudioTap can gather the page's sibling voice/music files for the playlist (matching V1's mediasFromItems). The WrapperRef weak box breaks the wrapper → node → closure → wrapper retain cycle (the play closure captures only the box + value locals, never self).
  • Full-width item frame, file-node internal layout. The .audio arm lays the item at x = 0, width = boundingWidth, height = 44 (the file node's music normHeight), NOT inset by horizontalInset. The control's internal geometry is copied from the file node's non-thumbnail music branch (Ø44 control at x = 3, controlAreaWidth = 55, title at x = 55). Music-only: any voice file renders music-style (no waveform/transcription). No edge-bleed.
  • Audio files fetch via a message reference (the former recipient-fetch risk is resolved). InstantPageMediaPlaylistItem.fileReference(_:) builds .message(message: messageReference, media: file) when the playlist carries a resolvable-id MessageReference (rich bubbles), else the legacy .webPage(...) (V1 full-page IV, whose webpage is real). The fetch-reference fallback uses the same message?.id != nil test as the playlist-key fallback, so a .none-content reference degrades to the webpage path consistently. Because the rich-message file lives in RichTextMessageAttribute.instantPage.media (not message.media), FetchedMediaResource.swift's .message revalidation arm was taught to search the attribute's instant page too — so a stale file reference can re-fetch the message and recover (a synthetic-(0,0)-webpage reference never could, because that webpage doesn't exist server-side). This also fixes a latent pre-existing bug: instant-page image references in rich messages couldn't revalidate either.
  • Fixed a dormant inverted InstantPagePlaylistLocation.isEqual (it returned false for equal locations and true for unequal — backwards). areSharedMediaPlaylistsEqual ANDs the playlist id and location; it gates only seek-forwarding inside setPlaylist, a path the instant-page audio scrubber doesn't take (it uses playlistControl(.seek)), so the bug was inert. The corrected equality is safe even though all rich-message locations share the synthetic (0,0) webpageId: the .richMessage(messageId) id (ANDed in) disambiguates different rich-message playlists.
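
The message-scoped playlist key and the matching file-reference fallback can be illustrated with simplified stand-in types. Nothing below is the real TelegramCore or InstantPageUI API: `SketchMessageId`, `SketchWebpageId`, `SketchPlaylistId`, and `SketchFileReference` are hypothetical reductions that keep only the shape of the decision described above.

```swift
/// Simplified stand-ins for the real identifier types (assumptions for illustration).
struct SketchMessageId: Hashable {
    var peerId: Int64
    var id: Int32
}

struct SketchWebpageId: Hashable {
    var namespace: Int32
    var id: Int64
}

/// Discriminated playlist identity, mirroring the two cases named in the doc.
enum SketchPlaylistId: Hashable {
    case instantPage(webpageId: SketchWebpageId)
    case richMessage(messageId: SketchMessageId)
}

/// Which kind of file reference a playlist item would build.
enum SketchFileReference: Equatable {
    case message(SketchMessageId)
    case webPage(SketchWebpageId)
}

/// Rich bubbles key by message id; everything else falls back to the webpage id.
func resolvePlaylistId(messageId: SketchMessageId?, webpageId: SketchWebpageId) -> SketchPlaylistId {
    if let messageId = messageId {
        return .richMessage(messageId: messageId)
    }
    return .instantPage(webpageId: webpageId)
}

/// Uses the same "is there a resolvable message id" test as the playlist key,
/// so both fallbacks degrade to the webpage path together.
func resolveFileReference(messageId: SketchMessageId?, webpageId: SketchWebpageId) -> SketchFileReference {
    if let messageId = messageId {
        return .message(messageId)
    }
    return .webPage(webpageId)
}
```

With the synthetic `(namespace: 0, id: 0)` webpage shared by every rich message, two bubbles still resolve to distinct playlist ids because the message id is part of the key.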

InstantPage V2 collage & slideshow blocks

InstantPageBlock.collage and .slideshow (grouped photos/videos with a caption — only ever produced by real web Instant View articles; nothing on the markdown/AI path emits them) render in V2 by porting V1. Collage flattens into the existing media-item machinery; slideshow is a dedicated interactive carousel.

Where things live

| File | Responsibility |
| --- | --- |
| submodules/InstantPageUI/Sources/InstantPageV2Layout.swift | layoutCollage(...): mosaic via chatMessageBubbleMosaicLayout (the MosaicLayout module, same engine grouped messages use), emitting one existing .mediaImage/.mediaVideo item per cell. layoutSlideshow(...) + the InstantPageV2SlideshowItem laid-out item (+ its frame/offsetBy/collectMedias arms). |
| submodules/InstantPageUI/Sources/InstantPageV2SlideshowView.swift | The carousel view: a paged UIScrollView of InstantPageImageNode pages + a PageControlNode, with all pages created eagerly. |
| …/InstantPageRenderer.swift | InstantPageItemView.instantPageTransitionNode(for:) / instantPageUpdateHiddenMedia(_:) (gallery hooks, nil/no-op defaults); transitionArgsFor/applyHiddenMedia dispatch through them. The .slideshow arms in InstantPageV2ItemKind/stableId/reuse/makeItemView. |
| …/InstantPageV2RevealCost.swift | .slideshow is a non-text reveal entry (collage cells already are, being top-level media items). |

Non-obvious invariants

  • Collage is a flatten, not a container. layoutCollage computes the mosaic, then emits each cell as an ordinary top-level .mediaImage/.mediaVideo item (cornerRadius 0) into the parent layout — exactly as V1 does (flattenedItemsWithOrigin). Consequence: gallery enumeration (allMedias), the media registry, hidden-media, the reveal-cost map, and view reuse all handle collage cells for free, with no collage-specific code in any of those subsystems. There is no .collage laid-out item or view.
  • Right-edge collage cells bleed 4pt (instantPageV2MediaEdgeBleed, applied only to MosaicItemPosition.right cells) for the same bubble-rounded-clip reason as full-width single media; interior gaps are the mosaic's 1pt spacing; outer corners are rounded by the bubble's containerNode.
  • Slideshow IS a container (it's swipeable), so it gets its own laid-out item + view, unlike collage. Adding the .slideshow case to InstantPageV2LaidOutItem forces a .slideshow arm in every no-default switch over it: frame, offsetBy, stableId, reuse, makeItemView, and the reveal-cost computeEntries (plus collectMedias, which has a default but needs the arm to enumerate slideshow medias for the gallery).
  • Slideshow pages are created eagerly, deviating from V1's lazy central±1 paging. In a chat bubble a slideshow is a handful of images, so eager creation avoids V1's index bookkeeping and makes the gallery transition source available for every page (even off-screen). Height = the tallest image fitted(boundingWidth × 1200); only .image inner blocks render (matches V1 — videos become empty pages).
  • The slideshow registers under EVERY contained media index, and re-registers on an in-window rebuild. Its stableId is positional (.positional(.slideshow, position), not .media(index) like the static media views), so it can be reused for a different slideshow at the same block position; rebuildPages() re-runs registerMedias() (guarded by window != nil) so the new indices land in the registry. The gallery hooks iterate the live page nodes and match by InstantPageMedia identity, so registering one view under N indices is idempotent.
  • The 4 static media views answer the gallery hooks with explicit per-class witnesses, NOT a shared protocol-extension override — an extension-only implementation is statically dispatched and would silently bind to the nil default when invoked through the InstantPageItemView-typed registry wrapper.
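The dispatch pitfall in the last bullet is a general Swift rule and can be reproduced in isolation. In this sketch `ItemViewSketch` and `ImageViewSketch` are hypothetical names, not the real `InstantPageItemView` hierarchy; the point is only the difference between a protocol requirement and an extension-only method.

```swift
protocol ItemViewSketch {
    // A protocol requirement: calls through the protocol type are
    // dispatched dynamically to the conforming type's witness.
    func requirementHook() -> String?
}

extension ItemViewSketch {
    // Default witness for the requirement.
    func requirementHook() -> String? { return nil }
    // NOT a requirement: calls through the protocol type bind statically
    // to this body, regardless of what the concrete type declares.
    func extensionOnlyHook() -> String? { return nil }
}

final class ImageViewSketch: ItemViewSketch {
    func requirementHook() -> String? { return "image" }
    func extensionOnlyHook() -> String? { return "image" }
}
```

Calling `requirementHook()` on a value typed as `ItemViewSketch` reaches the class implementation, while `extensionOnlyHook()` on the same value returns the `nil` default. That is the silent failure the per-class witnesses avoid.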

InstantPage V2 text item height (true font line box)

layoutTextItem (InstantPageV2Layout.swift) sizes a .text item to the true font line height, not the cap box. A single-line item measures exactly fontAscent + fontDescentBelowBaseline (A + D); the old behavior was the cap box fontLineHeight = floor(fontAscent + fontDescent) (A - D).

Non-obvious invariants

  • Two edits in layoutTextItem: the line stack starts at lineBoxTopInset = max(0, fontAscent - fontLineHeight) (was 0), and the returned height is lines.last.frame.maxY + extraDescent + fontDescentBelowBaseline (the + fontDescentBelowBaseline contains the last line's descender). Net: every text item grows ~(A - L) + D (~8pt @17pt) and its glyphs draw ~lineBoxTopInset (~4pt) lower within their box; the page grows.
  • Per-line frames stay the cap box (height = lineAscent = fontLineHeight). Only the stack's starting origin moves and the total is padded — so the baseline is still drawn at each line frame's maxY, inter-line advance (lineAscent + fontLineSpacing + extraDescent) is unchanged, and decorations / inline attachments / characterRect / the reveal mask (all line-frame-relative) translate consistently.
  • lineBoxTopInset is exact, NOT pixel-snapped — it is an intra-item line offset; crispness rides on the item's own pixel-snapped frame origin (intra-item line positions may already be fractional, e.g. after a non-integral extraDescent).
  • Formulas / tall inline content still inflate via lineAscent/extraDescent; the "\u{200b}"+anchors height = 0 case is preserved.
  • Inline custom emoji are sized to ≈ the line box so they fit the taller box rather than overflowing it (see "Inline custom emoji").
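The height arithmetic above can be checked with a small model. This is a sketch under stated assumptions: `FontMetricsSketch` and `textItemHeightSketch` are hypothetical names, the cap box is taken as floor(A - D) with D the positive descent below the baseline, extraDescent is treated as uniform across lines, and the metric values used to exercise it are illustrative rather than measured.

```swift
import Foundation

struct FontMetricsSketch {
    let ascent: Double                // A
    let descentBelowBaseline: Double  // D, positive
    let lineSpacing: Double
    // L: the cap-box line height used for each per-line frame.
    var capLineHeight: Double { return floor(ascent - descentBelowBaseline) }
}

func textItemHeightSketch(lineCount: Int, metrics: FontMetricsSketch, extraDescent: Double = 0) -> Double {
    guard lineCount > 0 else { return 0 }
    // The line stack starts lower by the part of the ascent the cap box omits.
    let topInset = max(0, metrics.ascent - metrics.capLineHeight)
    // Inter-line advance is unchanged: cap box + spacing + extra descent.
    let advance = metrics.capLineHeight + metrics.lineSpacing + extraDescent
    let lastLineMaxY = topInset + Double(lineCount - 1) * advance + metrics.capLineHeight
    // Pad the total so the last line's descender is contained.
    return lastLineMaxY + extraDescent + metrics.descentBelowBaseline
}
```

With A = 20 and D = 4 the cap box is 16, the top inset is 4, and a single line measures 24 = A + D, which is 8 more than the old cap-box height.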

Inline custom emoji (RichText.textCustomEmoji)

RichText.textCustomEmoji(fileId:alt:) renders an inline animated custom emoji inside rich-data bubbles. Covers API parsing, Postbox + FlatBuffers serialization, and display in the InstantPage V2 renderer; the emoji participates in the streaming reveal above. (The send / edit / copy / paste round-trip that produces .textCustomEmoji from typed markdown is a separate section below: "Custom emoji in markdown messages".)

Where things live

File Responsibility
submodules/TelegramCore/Sources/SyncCore/SyncCore_RichText.swift Enum case textCustomEmoji(fileId: Int64, alt: String) + Postbox coding (discriminator 17, keys ce.f/ce.a), ==, plainText (returns alt), and FlatBuffers codec.
submodules/TelegramCore/FlatSerialization/Models/RichText.fbs FlatBuffers schema — RichText_CustomEmoji union member + table. Source of truth; the Bazel flatc genrule regenerates *_generated.swift at build time (the checked-in Sources/*_generated.swift is stale).
submodules/TelegramCore/Sources/ApiUtils/RichText.swift Api.RichText.textCustomEmoji ⇄ Swift, lossless both ways.
submodules/InstantPageUI/Sources/InstantPageTextItem.swift (attributedStringForRichText) Emits a single placeholder char carrying ChatTextInputAttributes.customEmoji (a ChatTextInputTextCustomEmojiAttribute) + a CTRunDelegate sized to the font line height (font.ascender - font.descender + 4·pointSize/17 ≈ 24pt @17pt).
submodules/InstantPageUI/Sources/InstantPageV2Layout.swift (line-breaker) Collects per-line InstantPageTextLine.emojiItems; overwrites each placeholder char's characterRect with a full cell (width = itemSize) so it feeds the reveal cost map.
submodules/InstantPageUI/Sources/InstantPageRenderer.swift (InstantPageV2View) Owns the InlineStickerItemLayers: updateInlineEmoji (create/reuse/remove/position), updateEmojiReveal (reveal-driven pop-in), updateEmojiVisibility + propagateVisibilityRect. Layers attach to each text view's emojiContainerView.

Non-obvious invariants

  • flatc casing/required gotchas. Edit RichText.fbs, not the generated Swift. Scalars (long) cannot be (required) — only strings/tables can. A union member RichText_CustomEmoji generates the Swift enum case .richtextCustomemoji (everything after the suffix's first letter is lowercased); the table type stays TelegramCore_RichText_CustomEmoji and field accessors keep .fbs casing (value.fileId). See the flatbuffers-codegen memory.
  • ChatTextInputTextCustomEmojiAttribute is reused end-to-end (display layer ⇄ layout model). The attribute is written to the placeholder in attributedStringForRichText and read back by the V2 line-breaker under the SAME key (ChatTextInputAttributes.customEmoji); InlineStickerItemLayer.init consumes it directly and resolves the file lazily from fileId.
  • Emoji participates in the streaming reveal. Its placeholder char's characterRect is overwritten to a full cell (width = itemSize), so the width-based cost map charges it like other content. updateEmojiReveal pops the layer in (alpha 0→1 + scale) when charIndexInItem < currentRevealCharacterCount; unrevealed → opacity 0.
  • Inline emoji/images are CENTERED on the font line box, NOT baseline-aligned, and do NOT inflate the line. The line-breaker keeps lineAscent = fontLineHeight (only formulas grow it) and places each attachment at baselineY - fontLineHeight/2 - size/2, so it bleeds symmetrically about the line box instead of doubling the line height and shoving the text baseline down (the prior lineAscent = emoji.size behavior was a regression from V1 layoutTextItemWithString, which centers via (fontLineHeight - imageHeight)/2). Custom emoji are sized to ≈ the line box (size = font.ascender - font.descender + 4·pointSize/17) so they fit the true-font-height item box (see "InstantPage V2 text item height") with minimal bleed. Mirrors the chat InteractiveTextComponent. The cell's characterRect is centered the same way (y = fontLineHeight/2 - size/2) so the reveal mask (renderer: y = minY + lineAscent - rect.maxY) tracks it; a tall attachment grows extraDescent so the next line isn't overlapped. Three things must stay in lockstep: the display frame, the characterRect, and extraDescent.
  • Inline-attachment x must be the LEADING edge, computed RTL-safely via v2LeadingOffsetForRange (InstantPageV2Layout.swift). An attachment's left edge is min(CTLineGetOffsetForStringIndex(start), CTLineGetOffsetForStringIndex(end)) — NOT the bare start-index offset. CTLineGetOffsetForStringIndex at the start index returns the glyph's LEFT edge in LTR but its RIGHT edge in RTL (string index increases leftward), so the old single-offset form (…, range.location, nil) shoved emoji/images/formulas ~one advance (≈ the attachment width) too far right on RTL lines — e.g. an emoji in an Arabic thinking-block line, while the CoreText-drawn text stayed correct. The helper mirrors Display.TextNode's addEmbeddedItem (incl. directional-boundary secondary-offset handling) and the strikethrough/underline/marked/spoiler decorations in this same file, which already used the min/abs form. For pure-LTR lines it returns exactly the start-index offset, so LTR is byte-identical. Applies to all 5 attachment sites: the emoji/image/formula display frames AND the emoji/image characterRect (reveal mask). The widths stay the fixed size/rendered.size values (the run-delegate advance), only the x is corrected.
  • Layers sit ABOVE the reveal mask. They attach to InstantPageV2TextView.emojiContainerView (a sibling above renderContainer), NOT inside it — so the reveal mask wipes glyphs while emoji pop in independently. Adding a CTRunDelegate-glyph to the mask would clip-wipe them instead.
  • Layers are owned by InstantPageV2View, not the text view. Keyed by InlineStickerItemLayer.Key(id: fileId, index: occurrence). The pageView is now REUSED across stableVersion bumps (see streaming section), so the inline-emoji dict PERSISTS across chunks; updateInlineEmoji prunes stale keys (emoji whose blocks have been removed) and creates/repositions layers for new or unchanged emoji each update pass.
  • visibilityRect gates looping; nil means "not visible". The bubble's visibility override pushes a full-width sub-rect to the root pageView.visibilityRect, re-pushed in the apply closure after pageView.frame is set. propagateVisibilityRect converts the rect into each nested V2View's coordinate space (self.convert(_:to:)) for details bodies / table cells+title, fanning out via each child's didSet.
  • CTRunDelegate extent buffers must be freed. Every inline-attachment arm (.image/.formula/.textCustomEmoji) in attributedStringForRichText allocates an extentBuffer; the dealloc callback must deallocate() it (it re-runs per layout pass).
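The centering and leading-edge rules from the bullets above reduce to a few lines of arithmetic. The following is a hedged sketch: `AttachmentPlacementSketch` and `placeAttachmentSketch` are hypothetical, the two offsets stand in for the values CTLineGetOffsetForStringIndex would return at the start and end of the attachment's range, and the directional-boundary handling of the real helper is not modeled.

```swift
struct AttachmentPlacementSketch: Equatable {
    let x: Double               // leading (left) edge within the line
    let displayY: Double        // top of the display frame
    let characterRectY: Double  // top of the line-relative characterRect
}

func placeAttachmentSketch(
    baselineY: Double,
    fontLineHeight: Double,
    size: Double,
    startOffset: Double,
    endOffset: Double
) -> AttachmentPlacementSketch {
    return AttachmentPlacementSketch(
        // In RTL the start-index offset is the glyph's right edge, so take
        // the smaller of the two offsets to get the left edge in both cases.
        x: min(startOffset, endOffset),
        // Center on the line box (which spans fontLineHeight above the
        // baseline) without changing the line's own height.
        displayY: baselineY - fontLineHeight / 2 - size / 2,
        characterRectY: fontLineHeight / 2 - size / 2
    )
}
```

For a 24pt attachment on a 16pt line box the frame overhangs by 4pt above and below the box, and swapping the two offsets (the RTL case) leaves `x` unchanged.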

RichText.textMention, .textMentionName(text:peerId:), .textHashtag, .textCashtag, .textBotCommand, .textBankCard, .textAutoUrl, .textAutoEmail, .textAutoPhone render the message-entity flavors of rich text inside rich-data bubbles with full tap interaction mirroring ChatMessageTextBubbleContentNode. Covers API parsing, Postbox + FlatBuffers serialization, display, and tap routing. (textDate/textSpoiler remain unimplemented — .plain("").)

Where things live

File Responsibility
submodules/TelegramCore/Sources/SyncCore/SyncCore_RichText.swift The 9 enum cases (each wraps text: RichText; textMentionName adds raw peerId: Int64) + Postbox coding (discriminators 18–26, wrapped text under key "t", mention-name peerId under "mn.p"), ==, plainText, FlatBuffers codec.
submodules/TelegramCore/FlatSerialization/Models/RichText.fbs Union members + tables (RichText_MentionName adds peerId:long). Source of truth — same flatc gotchas as the custom-emoji section above.
submodules/TelegramCore/Sources/ApiUtils/RichText.swift Api.RichText ⇄ Swift, lossless. textMentionName carries userId ⇄ peerId.
submodules/InstantPageUI/Sources/InstantPageTextItem.swift (attributedStringForRichText) Display: auto url/email/phone reuse the InstantPageUrlItem (url:) path; the six entity cases push .link(false), recurse, then attach the matching TelegramTextAttributes.* key over the produced range.
submodules/TelegramUI/Components/Chat/ChatMessageRichDataBubbleContentNode/... Tap routing: entityForTapLocation reads the attribute dict at the tapped point; entityTapContent maps keys → ChatMessageBubbleContentTapAction.Content.

Non-obvious invariants

  • Display attaches the same TelegramTextAttributes.* keys the chat text bubble uses; the bubble reads them back. Contract: textMention → PeerTextMention (String); textMentionName → PeerMention (TelegramPeerMention, peerId built as EnginePeer.Id(namespace: Namespaces.Peer.CloudUser, …); InstantPageTextItem imports TelegramCore but NOT Postbox, so bare PeerId is out of scope); textHashtag AND textCashtag → Hashtag (TelegramHashtag; no dedicated cashtag key/tap-action, the leading $ distinguishes them); textBotCommand → BotCommand; textBankCard → BankCard. Auto url/email/phone go through the URL path (mailto:/tel:/raw), NOT an entity key.
  • linkSelectionRects and the bubble tap path check all six interactive keys (URL + the five entity keys), not just URL, so press-highlight and the link-loading shimmer cover entities too.
  • Rich-data text selection must reach a line's trailing edge. This is general to rich-data selection, not just entities: InstantPageTextItem.attributesAtPoint(_:orNearest:)'s orNearest: true (selection-drag) path returns line.range.upperBound (via CTLineGetStringRange) when the point is at/past lineFrame.maxX. TextSelectionNode uses that index as the exclusive upper bound, so clamping to the last character's index — as the orNearest: false hit-testing path correctly does — would leave the last character/item of every line unselectable. Mirrors Display.TextNode. Do not collapse the two orNearest paths back together.
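The difference between the two orNearest paths can be shown with a simplified line model. This sketch assumes a non-empty line with a fixed per-character advance; `characterIndexSketch` is a hypothetical stand-in and does not reproduce the real CoreText-based lookup.

```swift
// Returns a string index for a point at horizontal position x on a line.
// lineRange is the line's string range (assumed non-empty); advance is an
// assumed fixed glyph width.
func characterIndexSketch(x: Double, lineRange: Range<Int>, advance: Double, orNearest: Bool) -> Int {
    let lineMaxX = Double(lineRange.count) * advance
    if x >= lineMaxX {
        // Selection drag: return the exclusive upper bound so the last
        // character can be included. Hit-testing: clamp to the last character.
        return orNearest ? lineRange.upperBound : lineRange.upperBound - 1
    }
    return lineRange.lowerBound + Int(max(0, x) / advance)
}
```

Because the selection treats the returned index as an exclusive upper bound, a drag past the trailing edge only covers the final character when the result is `upperBound`, not `upperBound - 1`.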

Markdown send: entity vs. rich detection

On message send, the app auto-decides: if the typed markdown maps onto the regular message-entity set (bold/italic/code/strikethrough/spoiler/links/blockquote/fenced-code) it sends a normal message via the existing entity path; if it contains structure the entity set can't represent it sends a rich message (RichTextMessageAttribute carrying an InstantPage, rendered by ChatMessageRichDataBubbleContentNode). Always-on (no flag). Effective rich triggers are headings, lists, tables, and (strictly detected) formulas.

Where things live

File Responsibility
submodules/BrowserUI/Sources/BrowserMarkdown.swift The classifier richMarkdownAttributeIfNeeded(context:text:) (pre-filter markdownMightNeedRichLayout → parse via existing inputRichTextAttributeFromText → block inspection instantPageNeedsRichLayout/blockIsEntityExpressible/richTextIsEntityExpressible), plus the markdown→InstantPage conversion (markdownWebpage, markdownBlocks(from:), markdownBlocksWithGeneratedAnchors).
submodules/TelegramUI/Sources/ChatControllerNode.swift (sendCurrentMessage, ~line 4860) The gate: if !isSpecialChatContents, let attribute = richMarkdownAttributeIfNeeded(context:, text: effectiveInputText.string) routes to the rich branch; the unchanged else is the entity path.

Non-obvious invariants

  • Boundary rule: send rich iff the parse yields an InstantPageBlock with no entity equivalent. Entity-expressible whitelist (→ normal): .paragraph, .preformatted, .blockQuote (empty caption), .anchor, .unsupported, and .divider (--- is too common in casual text to trigger rich). .formula (block and inline) DOES trigger rich, gated by strict math detection (see "Formulas trigger rich messages" below) so casual $ usage ($5-$10, $FOO=$BAR) stays plain. So effective triggers = headings, lists, tables, formulas.
  • Approach A (parse-then-inspect): the classifier reuses the real parser, so "what triggers rich" can't drift from "what the rich renderer shows." markdownMightNeedRichLayout is a cheap necessary-condition over-approximation — it may over-trigger a parse but must never false-negative. It detects #, list markers, dash-lines (-{1,}, which also catches setext-H2 underlines → heading blocks), \n= (setext H1), |, ![, and math delimiters $/\(/\[ (formulas now trigger rich; the strict detection step decides whether a $ run is actually math).
  • Chat vs. document path = file == nil / context.documentURL == nil. inputRichTextAttributeFromText passes file: nil; the document-attachment path passes a real file. Two chat-only behaviors key off this: (a) generated heading anchors are skipped (markdownBlocksWithGeneratedAnchors runs only for documents; anchors exist for intra-document #slug links and otherwise prepend a spurious invisible .anchor block per heading); (b) a level-1 # heading maps to .heading(text:, level: 1), not .title (the document/article-title treatment). H2–H6 → .heading(level: 2…6) for both paths. This converter only ever emits .title (H1-doc) or .heading, never .header/.subheader.
  • The classifier is fed the RAW effectiveInputText.string, not the post-convertMarkdownToAttributes inputText, so inline **bold** survives into the rich render. The entity branch still uses the converted inputText.
  • Bypassed for .customChatContents (business links / quick replies) via isSpecialChatContents. The compose/send gate lives here; editing has its own symmetric re-classification — see "Editing rich messages" below.
  • Transmission: RichTextMessageAttribute → Api.InputRichMessage via messages.sendMessage(richMessage:) (flag bit 23, StandaloneSendMessage.swift); recipients reconstruct it from the incoming richMessage field (StoreMessage_Telegram.swift). The rich branch sends text: "" + the attribute, nils mediaReference (no separate webpage preview), and bypasses 4096-char chunking. iOS < 15 / oversize markdown → inputRichTextAttributeFromText returns nil → entity path (which chunks).
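The pre-filter described above is a necessary-condition check: it may say "maybe" for plain text, but it must never say "no" for text that would parse into a rich block. The following is a hypothetical sketch of that idea, not the real markdownMightNeedRichLayout; the token list follows the bullet above, while the line-start checks for list markers are an assumption.

```swift
import Foundation

// Cheap over-approximation: true means "worth running the real parser".
func mightNeedRichLayoutSketch(_ text: String) -> Bool {
    // Tokens that can appear anywhere: headings, tables, images, math
    // delimiters, and a setext H1 underline.
    for token in ["#", "|", "![", "$", "\\(", "\\[", "\n="] where text.contains(token) {
        return true
    }
    for rawLine in text.split(separator: "\n", omittingEmptySubsequences: true) {
        let line = rawLine.drop(while: { $0 == " " || $0 == "\t" })
        guard let first = line.first else { continue }
        // Bullet markers and dash lines (over-triggers on "*italic*" at a
        // line start, which is acceptable for a pre-filter).
        if first == "-" || first == "*" || first == "+" { return true }
        // Ordered list markers: digits followed by "." or ")".
        let digits = line.prefix(while: { $0.isNumber })
        if !digits.isEmpty, let next = line.dropFirst(digits.count).first, next == "." || next == ")" {
            return true
        }
    }
    return false
}
```

A result of `true` for something like "price $5" is expected here; in the design above, the strict detection step after the parse is what keeps such text on the plain path.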

Editing rich messages (InstantPage → markdown)

Rich messages (RichTextMessageAttribute, text == "") are made editable by reconstructing markdown source from the stored InstantPage, populating the editor with it, and re-classifying on save — the inverse of the send path above. Always-on (no flag). Images/videos are out of scope (skipped by the converter).

Where things live

File Responsibility
submodules/BrowserUI/Sources/InstantPageToMarkdown.swift markdownStringFromInstantPage(_:) — the inverse converter (block + inline + list + table + escaping). Pure, best-effort, never fails.
submodules/TelegramUI/Sources/Chat/ChatControllerLoadDisplayNode.swift setupEditMessage: rich message → reconstruct markdown into the edit field. editMessage (save): re-classify the raw input, route rich-or-plain.
submodules/TelegramStringFormatting/Sources/InstantPagePreviewText.swift previewText() extensions (RichText/InstantPage*) — one-line plaintext previews.
submodules/TelegramStringFormatting/Sources/MessageContentKind.swift messageContentKind returns .text(instantPage.previewText()) for rich, cascading to all preview surfaces.

Non-obvious invariants

  • The converter emits CommonMark inline, NOT the entity-regex dialect: **bold**, *italic*, `code`, ~~strike~~, [text](url). The reason is that re-send re-parses the text through the rich path (richMarkdownAttributeIfNeeded → NSAttributedString(markdown:), Apple CommonMark), not convertMarkdownToAttributes (whose dialect is __italic__/||spoiler||). The two parsers disagree on __/*; the rich round-trip is the contract.
  • Re-classify every edit (edit ≡ send). editMessage runs the same richMarkdownAttributeIfNeeded on the edit field's attributed text (so reattached custom emoji round-trip — see the custom-emoji section). Rich → pendingUpdateMessageManager.add(text: "", entities: nil, richText: attr, …); else the unchanged plain path. So normal→rich (add a table) and rich→plain (drop all triggers) both work. Bypassed for .customChatContents.
  • Change-detection compares the rich attribute. The save guard adds currentRichText != richTextAttribute (rich branch — skips no-op rich edits) and currentRichText != nil (plain branch — so rich→plain still saves even when text.string looks unchanged). RichTextMessageAttribute is Equatable on instantPage.
  • The text.length == 0 early-return guard is safe for rich. convertMarkdownToAttributes only rewrites inline tokens, never strips #/-/|, so a rich message's markdown source stays non-empty and passes; the rich branch then sends text: "".
  • Known limitation: a rich→plain edit that leaves only inline-formatted text loses *italic* (the entity path recognizes only __…__). Rare edge; the rich round-trip contract holds.
  • previewText() lives in TelegramStringFormatting, not TextFormat/TelegramCore. It will gain a strings: PresentationStrings param (to localize the "Photo"/"Video"/"Table" placeholders), so it must sit in a UI-string module — messageContentKind/descriptionStringForMessage (same module) already take strings:. Teaching messageContentKind about rich cascades the preview to the edit accessory panel, reply/pinned panels, and forward preview in one place (those surfaces need no individual change).

Copying rich messages as markdown (whole message + partial selection)

Rich messages (RichTextMessageAttribute, text == "") are copyable as markdown two ways: the context-menu Copy action copies the whole message; a text selection inside the rich-data bubble copies just the selected range. Both reconstruct markdown that mirrors the edit round-trip (markdownStringFromInstantPage). Always-on.

Where things live

File Responsibility
submodules/TelegramUI/Sources/ChatInterfaceStateContextMenus.swift Whole-message Copy. Computes richMessageMarkdown from the message's RichTextMessageAttribute.instantPage (after let message = messages[0]), opens the Copy gate with richMessageMarkdown != nil, and short-circuits copyTextWithEntities to storeMessageTextInPasteboard(markdown, entities: nil).
submodules/BrowserUI/Sources/InstantPageToMarkdown.swift markdownStringFromInstantPage — the block-tree → markdown converter (also used by the edit round-trip). Blocks joined by \n\n; nested blockquotes via recursive > wrapping.
submodules/InstantPageUI/Sources/InstantPageTextItem.swift InstantPageMarkdownBlockContext (kind + quoteDepth) and the markdownContext: InstantPageMarkdownBlockContext? field on InstantPageTextItem.
submodules/InstantPageUI/Sources/InstantPageV2Layout.swift stampMarkdownContext/bumpQuoteDepth; stamps markdownContext during layout (heading/title/code/list/blockQuote/layoutQuoteText/table-cell).
submodules/InstantPageUI/Sources/InstantPageMultiTextAdapter.swift markdownForRange(_ range: NSRange) + the private attributed-substring→inline-markdown converter inlineMarkdown(from:).
submodules/TelegramUI/Components/Chat/ChatMessageRichDataBubbleContentNode/.../ChatMessageRichDataBubbleContentNode.swift Intercepts .copy in the TextSelectionNode performAction closure: textSelectionNode.getSelection()adapter.markdownForRange(range) → stores as plain NSAttributedString(string:).

Non-obvious invariants

  • The V2 layout discards block role. A .text layout item from an H2 heading is byte-identical to a body paragraph — heading level and the title category are dropped with no back-reference to the source InstantPageBlock. Precise structural markdown for a selection therefore requires stamping markdownContext at layout time (lists/code/tables/details are structurally recoverable; heading level and .title are not, so they MUST be stamped). Plain paragraphs stay nil (≡ plain).
  • quoteDepth is orthogonal to kind so a heading/list/code line inside a blockquote round-trips (e.g. > ## Title). bumpQuoteDepth lifts a quote's children by 1; nested quotes accumulate. layoutQuoteText (single-paragraph blockquote fast path AND .pullQuote) bumps once — it is never reached by the multi-block recursion, so no double-count.
  • A blockquote is exploded into one text item per line. markdownForRange must re-coalesce a run of consecutive quoteDepth > 0 segments into ONE \n-joined block (each line prefixed at its own depth); otherwise every quote line becomes its own block separated by a blank line. Code/table/list runs are likewise coalesced (one fence; one pipe table; one tight list).
  • Both converters emit compact nested-quote markers (>>, not > >). Selection: String(repeating: ">", count: depth) + " ". Whole-message: when wrapping a line that already starts with >, prepend a bare >. Keep the two in sync.
  • Inline markdown is read from display attributes, not the RichText tree. inlineMarkdown inspects the slice's UIFont (bold/italic/mono — font-based, no symbolic-trait flag for named fonts), .strikethroughStyle, and TelegramTextAttributes.URL (→ InstantPageUrlItem.url, angle-bracketed if it contains (/)/space). Custom-emoji placeholders now emit the [<alt>](tg://emoji?id=…) marker from the display attribute's fileId (alt is best-effort — the display placeholder may be a bare space; see the custom-emoji round-trip section).
  • .copy stores plain text. Passing NSAttributedString(string: markdown) through the existing performTextSelectionAction(.copy) path (storeAttributedTextInPasteboard) generates no entities, so the literal **/#/>/| survive. The whole-message Copy uses storeMessageTextInPasteboard(_, entities: nil) directly.
  • Fidelity caveats (intentional): custom emoji are now preserved as [<alt>](tg://emoji?id=…) markers (selection copy uses a best-effort alt — see the custom-emoji round-trip section below); ordered list + checkbox loses the ordinal (- wins); a partial table selection emits touched cells as rows (no forced header --- separator); block prefixes apply to the whole touched line on a mid-line selection (correct markdown).
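The quote re-coalescing and compact-marker rules above can be sketched on a simplified segment model. `SegmentSketch` and `markdownSketch(for:)` are hypothetical; the real markdownForRange also handles code, table and list runs, which are left out here.

```swift
struct SegmentSketch {
    let text: String
    let quoteDepth: Int  // 0 = not inside a blockquote
}

func markdownSketch(for segments: [SegmentSketch]) -> String {
    var blocks: [String] = []
    var quoteRun: [String] = []

    // Emit the pending run of quote lines as ONE newline-joined block.
    func flushQuoteRun() {
        if !quoteRun.isEmpty {
            blocks.append(quoteRun.joined(separator: "\n"))
            quoteRun.removeAll()
        }
    }

    for segment in segments {
        if segment.quoteDepth > 0 {
            // Compact nested markers: ">>" rather than "> >".
            let prefix = String(repeating: ">", count: segment.quoteDepth) + " "
            quoteRun.append(prefix + segment.text)
        } else {
            flushQuoteRun()
            blocks.append(segment.text)
        }
    }
    flushQuoteRun()
    // Blocks are separated by a blank line.
    return blocks.joined(separator: "\n\n")
}
```

Without the run buffer, each quote line would become its own block and pick up a blank line between neighbours, which is the failure the invariant guards against.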

Custom emoji in markdown messages (send + edit/copy/paste round-trip)

Custom emoji typed into the compose field survive when a message is sent as a rich message (heading/list/table/formula), rendering as RichText.textCustomEmoji (the display side is the "Inline custom emoji" section above). The carrier across Apple's CommonMark parser is a shared markdown-link marker [<alt>](tg://emoji?id=<fileId>), used identically by the forward (send) and reverse (edit/copy/paste) paths so encode and decode cannot drift. Always-on. Scope: only rich messages — a custom emoji alone never forces a rich message (it stays on the entity path as a .CustomEmoji entity, the pre-existing behavior).

Where things live

File Responsibility
submodules/TextFormat/Sources/CustomEmojiMarkdownMarker.swift The marker format — single source of truth: customEmojiMarkdownURL(fileId:), parseCustomEmojiFileId(fromMarkdownURL:), escapeCustomEmojiMarkdownAlt(_:), and chatInputTextWithReattachedCustomEmoji(_:) (markers → live customEmoji attributes). In TextFormat so both BrowserUI and InstantPageUI can import it.
submodules/BrowserUI/Sources/BrowserMarkdown.swift Forward: markdownSourceInjectingCustomEmojiMarkers rewrites each customEmoji run into the marker; richMarkdownAttributeIfNeeded(context:attributedText:) (signature changed from text:); the marker-URL intercept in markdownInlineContent (→ .textCustomEmoji).
submodules/BrowserUI/Sources/InstantPageToMarkdown.swift Reverse (whole-message copy + edit reconstruction): .textCustomEmoji → emit the marker.
submodules/InstantPageUI/Sources/InstantPageMultiTextAdapter.swift Reverse (text-selection copy): emit the marker from the display attribute's fileId (alt best-effort).
submodules/TelegramUI/Sources/ChatControllerNode.swift, …/Chat/ChatMessageDisplaySendMessageOptions.swift Send + send-options-preview call sites pass the NSAttributedString (effectiveInputText / textInputView.attributedText); the rich send now passes inlineStickers.
submodules/TelegramUI/Sources/Chat/ChatControllerLoadDisplayNode.swift Edit-load (setupEditMessage) reattaches markers via chatInputTextWithReattachedCustomEmoji; edit-save (editMessage) re-classifies the attributed edit text.
submodules/TelegramUI/Components/Chat/ChatTextInputPanelNode/Sources/ChatTextInputPanelNode.swift Paste (chatInputTextNodeShouldPaste) reattaches plain-text markdown markers → live emoji.

Non-obvious invariants

  • One shared marker, one set of helpers. All emit sites (forward normalize, reverse copy/edit, selection copy) use customEmojiMarkdownURL + escapeCustomEmojiMarkdownAlt; the forward intercept and both reattach sites use parseCustomEmojiFileId. The marker is internal/transient — it exists only in the rich-conversion source string and on the clipboard, never persisted as a URL entity.
  • CommonMark preserves the tg://emoji?id=N link URL verbatim under the NSLink attribute (spike-verified). markdownLink's as? NSURL branch returns url.absoluteString, which parseCustomEmojiFileId matches by strict prefix. Negative (signed Int64) file ids survive too (the reattach regex is (-?\d+)).
  • Scope guard is structural. markdownSourceInjectingCustomEmojiMarkers works on a LOCAL copy — effectiveInputText is never mutated. A marker is an entity-expressible link, so an emoji-only message classifies not-rich (markdownMightNeedRichLayout finds no #/|/![/$/list tokens) and takes the entity path; the untouched customEmoji attribute becomes a .CustomEmoji entity.
  • richMarkdownAttributeIfNeeded now takes attributedText: NSAttributedString (was text: String); it normalizes to the marker'd source internally, then calls the unchanged inputRichTextAttributeFromText(text:). All three call sites (send, edit-save, send-options preview) pass the attributed string.
  • Edit-load AND paste reattach to live attributes; copy stays textual. setupEditMessage and chatInputTextNodeShouldPaste run chatInputTextWithReattachedCustomEmoji so the field shows the animated emoji, not raw token text. The paste branch is guarded by .contains("tg://emoji?id=") AND reattached.string != plainText, and runs only after the rich pasteboard types miss — private.telegramtext/RTF already decode the indexed tg://emoji?id=<id>&t=<n> RTF-link form via chatInputStateStringFromRTF. previewText() is unchanged (keeps the alt glyph).
  • Empty alt → a space. CommonMark drops [](url) (no run carries the link attribute), which would silently lose the emoji; every emit site and the reattach substitute a space when the alt is empty.
  • Rich send attaches inlineStickers (was [:]) + bubble-up packs, so the local store has the files. OPEN runtime risk: the wire send uses Api.InputRichMessage.documents: nil (apiInputRichMessage() in SyncCore_RichTextMessageAttribute.swift), so recipient rendering depends on the server back-filling documents from the embedded documentId — UNVERIFIED. If recipients see only the fallback glyph, populate documents: there.
  • Accepted limitations: edit-load reattaches with file: nil (renders via lazy fileId resolution, but the premium-emoji gate is bypassed on edit); an alt containing a literal ] won't reattach on edit-load (cosmetic — re-save still parses it); parseCustomEmojiFileId (strict prefix) vs Pasteboard.swift's URLComponents parse could drift if the marker format ever changes.
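The marker round-trip can be sketched as an encode/decode pair. These functions are hypothetical illustrations, not the real customEmojiMarkdownURL / parseCustomEmojiFileId; only the [alt](tg://emoji?id=N) shape, the strict-prefix match, signed ids, and the empty-alt-to-space rule are taken from the text above. Alt escaping is deliberately left out.

```swift
import Foundation

let emojiMarkerPrefixSketch = "tg://emoji?id="

// Encode: a markdown link whose URL carries the file id.
func customEmojiMarkerSketch(fileId: Int64, alt: String) -> String {
    // An empty alt would produce [](url), which CommonMark drops,
    // so substitute a single space.
    let safeAlt = alt.isEmpty ? " " : alt
    return "[\(safeAlt)](\(emojiMarkerPrefixSketch)\(fileId))"
}

// Decode: strict prefix match, then a signed integer parse of the rest.
func parseCustomEmojiFileIdSketch(fromMarkdownURL url: String) -> Int64? {
    guard url.hasPrefix(emojiMarkerPrefixSketch) else { return nil }
    return Int64(url.dropFirst(emojiMarkerPrefixSketch.count))
}
```

Keeping both directions next to one shared prefix constant is the point of the "one shared marker" invariant: a format change then touches a single place.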

Formulas trigger rich messages (strict math detection)

$…$/$$…$$ (and \(…\)/\[…\]) math triggers a rich message, gated by a strict boundary rule so casual $ stays plain. Inverse companion of the markdown-send gate above.

Non-obvious invariants

  • Inline $…$/$$…$$ detection requires a 4-way boundary (in markdownReplacingInlineFormulas, BrowserMarkdown.swift): outer side of each delimiter = line edge OR non-alphanumeric; inner side = non-whitespace; opener/closer $-counts must match (1 or 2). This is what rejects $5-$10/$FOO=$BAR/cost$5$total (alphanumeric outer) while keeping $x$, ($x$), the answer is $x$.. The outer check is the addition over a plain "no-space-inside" rule.
  • Block $$ detection (markdownBlockFormulaReplacement): single-line $$…$$ requires an exact $$ opener (not $$$) and trailing whitespace only; multi-line requires a bare $$ opener line. $$x$$ trailing text falls through to the inline rule. The \[…\] opener path is unchanged and exempt from these $$-only guards.
  • Detection is shared with the document path; the gate is chat-only. markdownPreparedSource (detection) runs for both chat and document attachments. The triggers (richTextIsEntityExpressible/blockIsEntityExpressible: .formula is non-expressible; $/\(/\[ in markdownMightNeedRichLayout) are read only by the chat classifier richMarkdownAttributeIfNeeded.
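The 4-way inline boundary rule can be made concrete with a small detector. This is a sketch of the rule as stated above, not the real markdownReplacingInlineFormulas: `containsInlineMathSketch` is a hypothetical name, it only answers whether a line contains a qualifying $…$ or $$…$$ pair, and it ignores escapes and the \(…\) forms.

```swift
func containsInlineMathSketch(_ line: String) -> Bool {
    let chars = Array(line)

    // Collect maximal runs of "$" as (start, length).
    var runs: [(start: Int, length: Int)] = []
    var i = 0
    while i < chars.count {
        if chars[i] == "$" {
            var j = i
            while j < chars.count && chars[j] == "$" { j += 1 }
            runs.append((start: i, length: j - i))
            i = j
        } else {
            i += 1
        }
    }

    func isAlphanumeric(_ c: Character) -> Bool { return c.isLetter || c.isNumber }

    for (openIndex, open) in runs.enumerated() where open.length <= 2 {
        // Opener, outer side: line edge or non-alphanumeric.
        if open.start > 0 && isAlphanumeric(chars[open.start - 1]) { continue }
        // Opener, inner side: must exist and be non-whitespace.
        let innerStart = open.start + open.length
        if innerStart >= chars.count || chars[innerStart].isWhitespace { continue }

        // The closer must have the same "$" count as the opener.
        for close in runs[(openIndex + 1)...] where close.length == open.length {
            // Closer, inner side: non-whitespace.
            if chars[close.start - 1].isWhitespace { continue }
            // Closer, outer side: line edge or non-alphanumeric.
            let afterIndex = close.start + close.length
            if afterIndex < chars.count && isAlphanumeric(chars[afterIndex]) { continue }
            return true
        }
    }
    return false
}
```

The outer-side checks are what reject the casual currency and shell-variable cases, since in each of them a delimiter touches a letter or digit on its outer side.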

InstantPageListItem task-list checkboxes (- [ ] / - [x])

InstantPageListItem carries a first-class checked: Bool? — the third associated value of .text(RichText, String?, Bool?) / .blocks([InstantPageBlock], String?, Bool?), orthogonal to the ordered-list num — representing a GitHub-style task-list checkbox. nil = not a checkbox item, false = unchecked, true = checked. Covers markdown parse, Postbox + FlatBuffers serialization, Telegram API transmission, display (V1 + V2), the edit round-trip, and previews.

Spec: docs/superpowers/specs/2026-05-27-instantpage-list-checkbox-design.md. Plan: docs/superpowers/plans/2026-05-27-instantpage-list-checkbox.md.

Where things live

| File | Responsibility |
| --- | --- |
| submodules/TelegramCore/Sources/SyncCore/SyncCore_InstantPage.swift | The checked: Bool? enum payload; Postbox coding (key "ck", tri-state Int32); ==; FlatBuffers codec. Internal tri-state helpers checkedFromTriState/triState(fromChecked:). |
| submodules/TelegramCore/FlatSerialization/Models/InstantPageBlock.fbs | checkState:int32 (id: 2) on InstantPageListItem_Text + _Blocks. Source of truth; the Bazel flatc genrule regenerates the Swift (checked-in *_generated.swift is stale). |
| submodules/TelegramCore/Sources/ApiUtils/InstantPage.swift | checked / num accessors; reads & writes the API checkbox=flags.0 / checked=flags.1 bits via checkedFromApiFlags / apiFlags(fromChecked:) across all four list-item types. |
| submodules/BrowserUI/Sources/BrowserMarkdown.swift | Forward parse: markdownTaskListMarker detects [ ]/[x]/[X]; the result routes into checked (NOT num). |
| submodules/BrowserUI/Sources/InstantPageToMarkdown.swift | Reverse: emits - [ ] / - [x] from item.checked for the edit round-trip. |
| submodules/InstantPageUI/Sources/InstantPageV2Layout.swift | V2 detection via item.checked; .checklist(checked:colors:) marker carrying InstantPageV2CheckboxColors. |
| submodules/InstantPageUI/Sources/InstantPageRenderer.swift | V2 marker view (InstantPageV2ListMarkerView) hosts a real CheckNode. |
| submodules/InstantPageUI/Sources/InstantPageLayout.swift | V1 detection via item.checked (renders the existing InstantPageChecklistMarkerItem). |
| submodules/TelegramStringFormatting/Sources/InstantPagePreviewText.swift | previewText() renders a /☑︎ glyph + body for checkbox items. |

Non-obvious invariants

  • checked is orthogonal to num. The API keeps checkbox/checked as flags separate from the list number, so an ordered item can be both numbered AND a checkbox. This is exactly why the first-class field replaced an earlier sentinel-string-in-num prototype (which could not represent both). No \u{001f}tg-md-task:* sentinel remains anywhere.
  • API bits are checkbox=flags.0, checked=flags.1 on ALL FOUR list-item constructors (pageListItemText/Blocks and pageListOrderedItemText/Blocks, in and out — pageListItemText#2f58683c, pageListOrderedItemText#cd3ea036, etc.). The iOS Api.* layer exposes only flags: Int32; mask the bits (apiFlags(fromChecked:) / checkedFromApiFlags). Because state rides the flags (not the text), it survives the server round-trip for sender + recipients — including the sender's own send-confirmation echo (applyUpdateMessage replaces local attributes with the server's reconstruction, ApplyUpdateMessage.swift).
  • Tri-state persistence 0=nil, 1=unchecked, 2=checked in BOTH Postbox (key "ck", decoded with decodeInt32ForKey(orElse: 0)) and FlatBuffers (checkState:int32, default 0). Absent/0 → nil, so pre-existing stored pages decode unchanged.
  • Detection reads item.checked != nil in both layout engines (was instantPageTaskListMarkerState(item.num)); the V2 marker kind is .checklist(checked: item.checked == true, colors:). The empty-blocks .blocks → .text(.plain(" "), num, checked) promotion must carry checked through, not drop it.
  • V2 CheckNode is hosted directly in a plain UIView, not an ASDisplayNode tree, so checkNode.displaysAsynchronously = false is set to avoid a first-draw blank flash. (The V2 pageView is now REUSED across streaming chunks via stable-id diffing — see the AI streaming section; CheckNode views survive across chunks as long as their list item is present.) InstantPageV2CheckboxColors (background←panelAccentColor, stroke←pageBackgroundColor, border←controlColor) is carried on the .checklist payload and mirrors the V1 instantPageChecklistMarkerTheme.
  • Forward parser keeps [ ] detection but routes to checked. markdownApplyTaskListMarker/markdownStrippingTaskListMarker/markdownTaskListMarker still strip the marker from the item text; the state flows into checked while ordered items keep their real "\(ordinal)" number. The reverse converter emits lowercase [x] / [ ], which the forward hasPrefix guards re-parse — that is the round-trip contract.
  • The enum-arity change is compile-enforced. Adding the third associated value broke every .text/.blocks construction/destructure; the full build is the completeness gate. Read-only consumers outside the core set exist (BrowserInstantPageContent.swift, CachedFaqInstantPage.swift) — grep \.(text|blocks)\( repo-wide when touching the enum again.
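
The tri-state persistence and the API flag masking described above can be sketched as follows. The four function names come from this document, but their signatures and bodies here are assumptions. In particular, mapping unknown stored values to nil and emitting flags with no other bits set are choices of this sketch only.

```swift
// Persistence tri-state: 0 = nil, 1 = unchecked, 2 = checked.
func triState(fromChecked checked: Bool?) -> Int32 {
    guard let checked = checked else {
        return 0
    }
    return checked ? 2 : 1
}

// Absent or 0 decodes to nil, so pre-existing stored pages are unchanged.
// Assumption: any other unknown value is also treated as nil.
func checkedFromTriState(_ value: Int32) -> Bool? {
    switch value {
    case 1:
        return false
    case 2:
        return true
    default:
        return nil
    }
}

// API bits: checkbox = flags.0, checked = flags.1.
let checkboxBit: Int32 = 1 << 0
let checkedBit: Int32 = 1 << 1

// Assumption: only the two checkbox bits are produced here; a real encoder
// would merge them with whatever other flags the constructor carries.
func apiFlags(fromChecked checked: Bool?) -> Int32 {
    guard let checked = checked else {
        return 0
    }
    return checked ? (checkboxBit | checkedBit) : checkboxBit
}

// Masks the two bits out of the raw flags value exposed by the Api layer.
func checkedFromApiFlags(_ flags: Int32) -> Bool? {
    guard (flags & checkboxBit) != 0 else {
        return nil
    }
    return (flags & checkedBit) != 0
}
```

Both encodings round-trip all three states, which is the property the Postbox, FlatBuffers and API paths rely on.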

InstantPageBlock.blockQuote nested blocks

InstantPageBlock.blockQuote carries (blocks: [InstantPageBlock], caption: RichText) — a sequence of nested page blocks (paragraphs, headings, lists, code, even nested quotes), not the legacy text-only payload. .pullQuote is unchanged (still (text: RichText, caption: RichText); the TL API has no pullQuoteBlocks constructor).

Spec: docs/superpowers/specs/2026-05-29-instantpage-blockquote-blocks-design.md.

Where things live

| File | Responsibility |
| --- | --- |
| submodules/TelegramCore/Sources/SyncCore/SyncCore_InstantPage.swift | Enum case shape; Postbox coding (legacy "t" lift → new "b" object array); equality (array-aware, mirrors .collage); FlatBuffers codec. |
| submodules/TelegramCore/FlatSerialization/Models/InstantPageBlock.fbs | InstantPageBlock_BlockQuote: text (now optional, legacy fallback) + caption (required) + new blocks:[InstantPageBlock] (id: 2). Source of truth; Bazel regenerates the *_generated.swift. |
| submodules/TelegramCore/Sources/ApiUtils/InstantPage.swift | Parse both pageBlockBlockquote (lift text→[.paragraph]) and pageBlockBlockquoteBlocks; encode legacy-when-possible. |
| submodules/InstantPageUI/Sources/InstantPageV2Layout.swift | layoutBlockQuote(blocks:…) recurses into children; legacy single-paragraph fast path delegates to layoutQuoteText (the renamed shared text core, also used by .pullQuote). |
| submodules/InstantPageUI/Sources/InstantPageLayout.swift | V1 .blockQuote arm recurses via layoutInstantPageBlock(...); same single-paragraph fast path. |
| submodules/BrowserUI/Sources/BrowserMarkdown.swift | Forward: one quote carrying all child blocks. Entity-expressibility gate (below). |
| submodules/BrowserUI/Sources/InstantPageToMarkdown.swift | Reverse: markdownBlockQuoteBlocks(_:) recurses per child and prefixes > per line. |
| submodules/TelegramStringFormatting/Sources/InstantPagePreviewText.swift | Concatenates child previewText()s + caption. |

Non-obvious invariants

  • Legacy shapes lift to [.paragraph(text)] at every decode boundary. API pageBlockBlockquote, the Postbox "t" key (old cached pages), and the FlatBuffers text field (now optional) each lift into a single-paragraph blocks array. New writes emit only blocks ("b" / the FB vector). So pre-existing stored pages and older senders decode unchanged.
  • Outbound stays on the legacy wire constructor when the shape allows. apiInputBlock() emits pageBlockBlockquote for empty or single-.paragraph quotes (so older recipients understand the common chat case) and pageBlockBlockquoteBlocks only for genuinely nested quotes.
  • Both renderers share one text core for the single-paragraph fast path. layoutQuoteText (V2; the function formerly named layoutBlockQuote, isPull: distinguishes pull vs block) and the V1 fast-path branch keep the legacy italicized-body styling; nested children render with their own normal category styling.
  • Nested children use a FIXED 10pt inter-child gap, not spacingBetweenBlocks. The full page-flow spacing (~27pt around quotes) is too airy when nested, and 0 is too tight. childSpacing = 10.0 lives in both layout files; the first child hugs the container's verticalInset (no leading gap). Combined with a nested quote's own 4pt top inset this gives ~14pt effective separation.
  • Entity-expressibility: a quote is entity-expressible (→ regular message path) only if its caption is empty AND every child is an entity-expressible .paragraph. A nested-structure or multi-paragraph quote is not, so it sends via the rich path. Behavior change: markdown > p1\n>\n> p2 is now ONE quote with two paragraphs (rich) rather than two consecutive entity quotes — correct semantics.
  • The enum-arity change is compile-enforced across all modules; the full Bazel build is the completeness gate (no per-module build). CachedFaqInstantPage.swift matches case .blockQuote: payload-less and needs no edit. BrowserReadability.swift constructs .blockQuote(blocks: [.paragraph(.italic(...))], …) and is easy to miss in the spec's file list — grep \.blockQuote( repo-wide when touching the case again.
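
The lift-on-decode and legacy-when-possible rules above can be sketched with stand-in types. Block, WireQuote, liftLegacyQuote and wireQuote are hypothetical names for illustration. The two WireQuote cases stand in for pageBlockBlockquote and pageBlockBlockquoteBlocks, and encoding an empty quote as empty legacy text is an assumption of this sketch.

```swift
// Simplified stand-in for InstantPageBlock; text is reduced to String.
enum Block: Equatable {
    case paragraph(String)
    case heading(String)
    case blockQuote(blocks: [Block], caption: String)
}

// Stand-ins for the two wire constructors.
enum WireQuote: Equatable {
    case legacy(text: String, caption: String)    // pageBlockBlockquote
    case nested(blocks: [Block], caption: String) // pageBlockBlockquoteBlocks
}

// Decode side: a legacy text-only quote lifts to a single paragraph.
func liftLegacyQuote(text: String) -> [Block] {
    return [.paragraph(text)]
}

// Encode side: stay on the legacy constructor for empty or
// single-paragraph quotes, and use the nested one otherwise.
func wireQuote(blocks: [Block], caption: String) -> WireQuote {
    if blocks.isEmpty {
        // Assumption: an empty quote is sent as empty legacy text.
        return .legacy(text: "", caption: caption)
    }
    if blocks.count == 1, case let .paragraph(text) = blocks[0] {
        return .legacy(text: text, caption: caption)
    }
    return .nested(blocks: blocks, caption: caption)
}
```

A quote that arrived in the legacy shape therefore goes back out in the legacy shape, while a two-paragraph quote or a quote wrapping a heading takes the nested constructor.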

InstantPage thinking blocks (InstantPageBlock.thinking)

InstantPageBlock.thinking(RichText) renders server-sent reasoning as dimmed, continuously-shimmering text inside rich-data bubbles. V2 renderer only; V1 ignores the block (returns []). The shimmer and fade-in mechanics are deliberately separate from the char-reveal cursor so thinking blocks do not affect the reveal pacing of the answer content that follows them.

Where things live

| File | Responsibility |
| --- | --- |
| submodules/InstantPageUI/Sources/InstantPageV2Layout.swift | InstantPageV2ThinkingItem layout item + layoutThinking(...) (paragraph color × 0.55 alpha for the dimmed style) + layoutBlock .thinking arm. |
| submodules/InstantPageUI/Sources/InstantPageRenderer.swift | InstantPageV2ThinkingView — a ShimmeringMaskView wrapping a private inner InstantPageV2TextView; InstantPageV2StableItemId.thinking(Int) stable-id namespace; makeItemView/reuse/stableId arms for the .thinking item kind; the two-counter (content + thinking) stable-id loop in InstantPageV2View.update. |
| submodules/InstantPageUI/Sources/InstantPageV2RevealCost.swift | .thinking(start:) cost entry: contributes zero cursor cost; triggers whole-block alpha fade-in when revealedCount >= start. |
| submodules/InstantPageUI/Sources/InstantPageLayout.swift | V1 has no explicit .thinking case — it falls through layoutInstantPageBlock's default: to an empty layout (no-op). |

Non-obvious invariants

  • Zero reveal cost is the linchpin. Thinking blocks do not advance the width-based cursor, so the answer's reveal position is identical whether or not thinking blocks are present — and is unaffected as they appear and disappear across streaming chunks. The answer text always reveals at the same rate regardless of how much thinking precedes it.
  • Whole-block fade, not char reveal. The inner text is drawn fully under the shimmer mask at all times; the reveal mechanism is a simple alpha visibility keyed to the block's start index. A top-of-page thinking block (start == 0) is visible from the very first frame.
  • Shimmer runs continuously while the view is displayed via ShimmeringMaskView's HierarchyTrackingLayer self-animation. It does not stop when streaming ends.
  • Top-level only; separate stable-id namespace. Thinking blocks appear only at the top level of the page. They use the InstantPageV2StableItemId.thinking(Int) namespace, numbered by a counter independent of content blocks. This means adding or removing a thinking block never renumbers the stable ids of content blocks — which, combined with pageView reuse, ensures content views and reveal state persist as thinking blocks come and go across chunks.
  • V1 is a no-op. InstantPageLayout.swift has no .thinking case; the block falls through layoutInstantPageBlock's default: to an empty layout, so V1 rendering silently skips it.
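
The separate stable-id namespace can be sketched as a two-counter loop. The .thinking(Int) case is named in this document. StableItemId, the .content case, ItemKind and assignStableIds are hypothetical stand-ins, and the real loop in InstantPageV2View.update may derive ids differently.

```swift
// Stand-in for InstantPageV2StableItemId with two independent namespaces.
enum StableItemId: Hashable {
    case content(Int)  // hypothetical name for the content namespace
    case thinking(Int)
}

// Hypothetical item classification used only by this sketch.
enum ItemKind {
    case content
    case thinking
}

// Numbers content and thinking items with independent counters, so the
// presence of a thinking item never shifts a content item's id.
func assignStableIds(_ kinds: [ItemKind]) -> [StableItemId] {
    var contentCounter = 0
    var thinkingCounter = 0
    var ids: [StableItemId] = []
    for kind in kinds {
        switch kind {
        case .content:
            ids.append(.content(contentCounter))
            contentCounter += 1
        case .thinking:
            ids.append(.thinking(thinkingCounter))
            thinkingCounter += 1
        }
    }
    return ids
}
```

With a single shared counter, removing a leading thinking block would renumber every content item and defeat view reuse across chunks. With two counters the content ids are the same before and after.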

Tapping a fragment-only link ([Jump](#section)) inside a rich-data bubble scrolls the chat so the matching in-message anchor lands ~8pt below the content-area top, expanding any enclosing collapsed <details> first. Anchors come from server/AI-sent InstantPages only: block-level InstantPageBlock.anchor(name) or inline RichText.anchor over a heading/paragraph; the markdown compose path deliberately skips generating heading-slug anchors for chat (markdownBlocksWithGeneratedAnchors runs only for documents), so user-typed messages have no anchors. The whole downstream scroll chain (ChatControllerInteraction.scrollToMessageIdWithAnchor → ChatMessageBubbleItemNode.getAnchorRect → historyNode.scrollToMessage(.bottom(anchorY))) pre-existed; this feature fills the two bubble-side seams that were stubbed.

Where things live

| File | Responsibility |
| --- | --- |
| submodules/InstantPageUI/Sources/InstantPageRenderer.swift | InstantPageV2View.anchorFrame(name:) (live-layout frame walk, mirrors findTextItem; handles .text/.codeBlock/.thinking/.details/.table) + firstCollapsedDetails(forOrdinalPath:) (maps an ordinal path to the first not-yet-expanded <details>'s live index). |
| submodules/InstantPageUI/Sources/InstantPageAnchorPath.swift | NEW. Pure instantPageAnchorPath(in:name:) model walk → the <details>-sibling-ordinal path to an anchor (nil = absent, [] = outside any details, [2,0] = inside the 3rd top-level details then its 1st nested details) + richTextContainsAnchor. |
| …/Chat/ChatMessageRichDataBubbleContentNode/… | getAnchorRect (delegates to anchorFrame, +8pt top margin); the tapActionAtPoint fragment route + streaming gate; the scrollToAnchor resolve→expand→scroll state machine (pendingScrollAnchor + progress guard); the post-relayout hook. |

Non-obvious invariants

  • The ordinal path is mapped to live indices, never reproduced. The layout's detailsIndexCounter (InstantPageV2Layout.swift) is expansion-dependent — a <details> nested inside a collapsed parent has no index until the parent expands and re-lays-out (a collapsed details has innerLayout == nil; its children aren't laid out). So instantPageAnchorPath returns ordinals, and firstCollapsedDetails reads the real index from the live laid-out .details item. Expansion is iterative: expand one collapsed level → requestMessageUpdate → the post-relayout hook re-runs scrollToAnchor → repeat until the anchor resolves via anchorFrame.
  • The model walk's recursion set MUST equal the containers the V2 layout recurses through layoutBlock (and thus counts <details> in via detailsIndexCounter): exactly .blockQuote, .cover, and .list's .blocks items — all of which the layout flattens into the parent items array (only layoutDetails nests a separate innerLayout, which is the level boundary). instantPageAnchorPath recurses those three sharing the inout detailsOrdinal, and treats .details as a new level. It deliberately does NOT recurse .postEmbed/.collage/.slideshow — the V2 layout lays out only their media/caption (never their child blocks), so it never counts a <details> inside them; recursing them would desync the model walk's ordinals from the layout. An anchor inside such a non-laid-out child is unresolvable by anchorFrame anyway, so skipping it is a no-op either way.
  • The model walk is consulted only when anchorFrame fails. scrollToAnchor first tries anchorFrame (covers everything currently laid out: top level, expanded details, tables, thinking blocks); only on a miss does it consult instantPageAnchorPath. So the only consequential model-walk output is a non-empty path (anchor buried in a collapsed details); nil/[] both no-op.
  • getAnchorRect stays a pure synchronous query. ChatController calls it inside forEachVisibleItemNode; all expansion is orchestrated by scrollToAnchor/pendingScrollAnchor before the scroll fires. The chat scroll consumes only the returned rect's minY.
  • Anchor taps are rejected while the message streams (TypingDraftMessageAttribute) → .none. So pendingScrollAnchor is only ever set post-stream, and the reveal cursor never interacts with anchor scrolling.
  • A fragment-only URL (#…, empty base) is always intercepted — never opened as an external URL. If it resolves → scroll; if not (missing or empty anchor) → no-op (press-highlight only). A real URL carrying a fragment (https://x.com/p#s, non-empty base) keeps the unchanged external-URL handling.
  • The expansion loop terminates via a progress guard (lastExpandedPendingDetailsIndex == collapsedIndex → give up): each relayout pass either resolves+scrolls (clearing pending) or advances to a strictly deeper collapsed <details>.
  • No activate: on the anchor tap action (unlike external-URL taps): anchor scrolling is local and instant, so the link-loading shimmer (makeActivate) would falsely imply network activity. The press-highlight rects are still passed.
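
The ordinal-path model walk described above can be sketched on a reduced block model. Block, anchorPath and walkAnchorPath are hypothetical stand-ins for InstantPageBlock and instantPageAnchorPath(in:name:). Only .blockQuote is modelled as a flattening container, so .cover, list .blocks items and inline RichText anchors are left out.

```swift
// Reduced stand-in for InstantPageBlock.
enum Block {
    case paragraph(String)
    case anchor(String)
    case blockQuote([Block]) // flattened into the parent level by the layout
    case details([Block])    // starts a new level with its own ordinals
}

// Returns the details-sibling-ordinal path to the anchor, or nil if absent.
// `detailsOrdinal` counts the details blocks seen so far on the current level.
func walkAnchorPath(_ blocks: [Block], name: String, detailsOrdinal: inout Int) -> [Int]? {
    for block in blocks {
        switch block {
        case .paragraph:
            continue
        case let .anchor(candidate):
            if candidate == name {
                return []
            }
        case let .blockQuote(children):
            // Same level: share the counter with the parent.
            if let path = walkAnchorPath(children, name: name, detailsOrdinal: &detailsOrdinal) {
                return path
            }
        case let .details(children):
            let ordinal = detailsOrdinal
            detailsOrdinal += 1
            // New level: nested details are numbered from zero again.
            var innerOrdinal = 0
            if let path = walkAnchorPath(children, name: name, detailsOrdinal: &innerOrdinal) {
                return [ordinal] + path
            }
        }
    }
    return nil
}

func anchorPath(in blocks: [Block], name: String) -> [Int]? {
    var detailsOrdinal = 0
    return walkAnchorPath(blocks, name: name, detailsOrdinal: &detailsOrdinal)
}
```

A details block inside a block quote consumes an ordinal on the enclosing level, which is why the walk has to recurse exactly the containers the layout flattens. Otherwise the ordinals would drift from the live indices that firstCollapsedDetails reads.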

"Show more" for partial rich messages (on-demand full page)

A server-sent rich message can arrive partial when the content is long: the RichMessage isPartial flag maps to instantPage.isComplete == false. The bubble then renders the partial page plus an inline "Show more" link; tapping it fetches the full page (once) and expands the bubble in place.

Data model

  • RichTextMessageAttribute (SyncCore_RichTextMessageAttribute.swift) carries the partial instantPage and an optional fullInstantPage: InstantPage? (nil until fetched). The partial page is never replaced — the full page is stored alongside it (encoded/decoded; both in ==).
  • engine.messages.requestFullRichText(id:) (TelegramEngineMessages.swift) requests messages.getRichMessage, then transaction.updateMessage(id,…) sets the existing attribute's fullInstantPage to the fetched complete page (keeping instantPage), and returns the updated attribute. It yields .single(nil) for non-Cloud ids and on network failure (no postbox change).
  • The seed-config merge (SyncCore_StandaloneAccountTransaction.swift) preserves a previously-fetched fullInstantPage if a later server update for the same message arrives without one (same partial instantPage).
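
The merge rule in the last bullet can be sketched as below. Page, RichAttr and mergeIncoming are hypothetical stand-ins, and treating "same partial instantPage" as an equality precondition is this sketch's reading of the text, not a confirmed detail of SyncCore_StandaloneAccountTransaction.swift.

```swift
// Stand-ins for InstantPage and RichTextMessageAttribute.
struct Page: Equatable {
    var id: Int
    var isComplete: Bool
}

struct RichAttr: Equatable {
    var instantPage: Page
    var fullInstantPage: Page?
}

// Keeps a previously fetched full page when a later update for the same
// message arrives without one and still carries the same partial page.
func mergeIncoming(_ incoming: RichAttr, over stored: RichAttr?) -> RichAttr {
    guard let stored = stored,
          incoming.fullInstantPage == nil,
          let cachedFull = stored.fullInstantPage,
          stored.instantPage == incoming.instantPage else {
        return incoming
    }
    var merged = incoming
    merged.fullInstantPage = cachedFull
    return merged
}
```

If the incoming partial page differs, the sketch takes the server update as-is and drops the cached full page.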

Where things live

| File | Responsibility |
| --- | --- |
| …/TelegramCore/Sources/SyncCore/SyncCore_RichTextMessageAttribute.swift | The fullInstantPage field (init / encode / decode / ==). |
| …/TelegramCore/Sources/TelegramEngine/Messages/TelegramEngineMessages.swift | requestFullRichText(id:) — fetch + updateMessage to fill fullInstantPage. |
| …/TelegramCore/Sources/SyncCore/SyncCore_StandaloneAccountTransaction.swift | Seed-config merge preserving a fetched fullInstantPage across later updates. |
| …/Chat/ChatMessageRichDataBubbleContentNode/… | The "Show more" link (layout, tap via tapActionAtPoint .custom + updateTouchesAtPoint highlight, TextLoadingEffectView shimmer), the node-local expand state, the effective-page selection, and the downward-expand. |
| Telegram/Telegram-iOS/en.lproj/Localizable.strings | Chat.RichText.ShowMore = "Show more" (→ strings.Chat_RichText_ShowMore). |

Non-obvious invariants

  • Expand state is node-local and per-message, NOT derived from the attribute. showMoreExpanded: (messageId, value)? is snapshotted at layout time and resolved against the current item.message.id, so every fresh display of a message starts collapsed (partial) even when its attribute already carries a cached fullInstantPage; only an in-place tap expands, and that expansion survives same-message relayouts. Resolving against the message id makes any other message collapse automatically (no stale-snapshot bug, no manual reset).
  • The bubble renders (showMoreExpanded ? attribute.fullInstantPage : nil) ?? attribute.instantPage — the full page only while expanded — in both the webpage build and layoutInstantPageV2. scrollToAnchor resolves anchors against the same effective page.
  • The link shows only when !showMoreExpanded AND !attribute.instantPage.isComplete (plus the original gates: not streaming via TypingDraftMessageAttribute, id.namespace == .Cloud since requestFullRichText is a no-op otherwise, and not a preview / .messageOptions context). The date/status trails the link's line by substituting the link frame for the last-text-line frame (see the status-node section).
  • showMoreExpanded is part of BOTH layout caches. It is in the currentPageLayout cache key and the pageView content key (pageViewMessageKey). This is required because the cached-expand path (full page already on the attribute) performs no postbox write, so stableVersion does not bump — without the key, the cached partial layout/content would shadow the expand.
  • Tap (activateShowMore): if fullInstantPage is already cached → set expanded + requestMessageUpdate immediately (no network, no shimmer); otherwise shimmer the link and fetch, expanding only once the full page lands. Guards against a second in-flight request and against re-expanding.
  • Expand grows the bubble downward in screen space (top fixed) via info?.setInvertOffsetDirection() on the ListViewItemApply in the apply closure, fired only on the appliedShowMoreExpanded → showMoreExpanded transition (never on first apply). Same mechanism as ChatMessageInteractiveFileNode's audio-transcription expand and the text/fact-check bubbles; the ListView clamps it to what fits.
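
The expand-state resolution, effective-page selection and link gate above can be sketched together. All type and function names here (Page, RichAttribute, ExpandState, isExpanded, effectivePage, showsShowMoreLink) are hypothetical, message ids are reduced to Int, and the streaming, Cloud-namespace and preview gates are passed in as plain booleans.

```swift
// Stand-ins for InstantPage and RichTextMessageAttribute.
struct Page: Equatable {
    let name: String
    let isComplete: Bool
}

struct RichAttribute {
    let instantPage: Page
    let fullInstantPage: Page?
}

// Node-local expand state, tagged with the message it belongs to.
struct ExpandState {
    var messageId: Int
    var value: Bool
}

// The state only counts for the message it was recorded for, so a node
// reused for another message starts collapsed without a manual reset.
func isExpanded(_ state: ExpandState?, currentMessageId: Int) -> Bool {
    guard let state = state, state.messageId == currentMessageId else {
        return false
    }
    return state.value
}

// The full page is used only while expanded and only if it is available.
func effectivePage(_ attribute: RichAttribute, expanded: Bool) -> Page {
    return (expanded ? attribute.fullInstantPage : nil) ?? attribute.instantPage
}

// The link shows for collapsed, partial, non-streaming Cloud messages
// outside preview contexts.
func showsShowMoreLink(attribute: RichAttribute, expanded: Bool, isStreaming: Bool, isCloud: Bool, isPreview: Bool) -> Bool {
    return !expanded && !attribute.instantPage.isComplete && !isStreaming && isCloud && !isPreview
}
```

Note that a cached fullInstantPage alone changes neither result: both functions depend on the expanded flag, which is why that flag has to be part of the layout and content cache keys.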

A rich message's media (images / videos / audio / documents) lives in attribute.instantPage.media, not in message.media (which is empty — rich messages are sent with text: "" and no media reference). To make that media participate in the same shared-media-index, gallery, file-list, playback, download, and save/copy pipelines that normal message.media flows through, there is one shared accessor and a set of opt-in call-site swaps.

The accessor

Message.effectiveMedia: [Media] (+ a delegating EngineMessage.effectiveMedia) in submodules/TelegramCore/Sources/Utils/MessageUtils.swift:

```swift
var effectiveMedia: [Media] {
    if !self.media.isEmpty { return self.media }     // normal message: identical to message.media
    if let richText = self.richText { return richText.instantPage.allMedia() }  // rich: the instant-page media
    return self.media
}
```

Message.richText (same file) is already a typed RichTextMessageAttribute?; InstantPage.allMedia() (SyncCore_InstantPage.swift) recursively gathers media from the page's blocks (audio/collage/cover/details/image/list/slideshow/video) via its [MediaId: Media] dict. For a normal message effectiveMedia == message.media, so swapping a message.media read for message.effectiveMedia is behavior-preserving for non-rich content and only adds the rich media where the site should consider it. Scope is first-media for now (call sites keep their .first / iterate-and-break logic; the helper returns all media but callers stop at the first match — the //TODO:rewrite to take all media markers remain).
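
A self-contained sketch of what a first-media call-site swap looks like, using stand-in types. Media, Image, File, RichTextAttribute, Message and firstImage are all hypothetical reductions of the real TelegramCore types, and the pageMedia array stands in for the result of instantPage.allMedia().

```swift
// Stand-ins for the Media protocol and two concrete media kinds.
protocol Media {}
struct Image: Media { let id: Int }
struct File: Media { let id: Int }

// Stand-in for RichTextMessageAttribute; pageMedia plays the role of
// instantPage.allMedia().
struct RichTextAttribute {
    let pageMedia: [Media]
}

struct Message {
    let media: [Media]
    let richText: RichTextAttribute?

    // Same shape as the accessor above: own media wins when present.
    var effectiveMedia: [Media] {
        if !self.media.isEmpty { return self.media }
        if let richText = self.richText { return richText.pageMedia }
        return self.media
    }
}

// A first-media call site: only the loop source changes from
// message.media to message.effectiveMedia; the stop-at-first-match
// logic stays as it was.
func firstImage(in message: Message) -> Image? {
    for media in message.effectiveMedia {
        if let image = media as? Image {
            return image
        }
    }
    return nil
}
```

For a message with non-empty media the loop sees exactly what it saw before the swap. For a rich message it now sees the page media and returns the first image it finds.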

Where things live

| Layer | What |
| --- | --- |
| Discovery / index | tagsForStoreMessage (StoreMessage_Telegram.swift) indexes rich media into MessageTags (photo/video/gif/voice/file). This is the linchpin: it makes rich messages appear in every tag-queried surface (shared-media tabs, search, downloads) — which is exactly why each rendering-side site below then needs effectiveMedia, or it renders the surfaced message blank. |
| Extraction helper | Message.effectiveMedia (above). |
| Shared-media grids / rows | PeerInfoVisualMediaPaneNode, PeerInfoGifPaneNode, ListMessageItem (row-type selection) + ListMessageFileItemNode (file/music/voice row), ChatListSearchMediaNode (search media grid). |
| Gallery open + items | GalleryController (tagsForMessage + mediaForMessage — the duplicated message.media/message.richText blocks were collapsed into one effectiveMedia loop), GalleryData.chatMessageGalleryControllerData, SecretMediaPreviewController (its own local mediaForMessage), and the gallery item nodes ChatDocumentGalleryItem / ChatExternalFileGalleryItem / ChatAnimationGalleryItem (these re-derive from message.media in node(), so a rich doc/animation rendered blank without the swap) + UniversalVideoGalleryItem secondary affordances + ChatItemGalleryFooterContentNode. |
| Playback | PeerMessagesMediaPlaylist.extractFileMedia (the peer music/voice playlist), OverlayAudioPlayerControllerNode (audio context menu). |
| Resolution / downloads / cleanup | FetchedMediaResource.findMediaResourceById(message:), SyncCore_RecentDownloadItem, StoreDownloadedMedia, DeleteMessages.addMessageMediaResourceIdsToRemove(message:) (rich media was leaking on delete), CollectCacheUsageStats, ChatHistoryListNode (download manager), ChatListSearch{ListPaneNode,ContainerNode}. |
| Actions | ChatInterfaceStateContextMenus (Save-to-Camera-Roll, copy-image, save-audio/music-to-files, debug/premium), ChatControllerNode (post-suggestion media ref), ChatControllerLoadDisplayNode (edit send-validation), ShareController.saveToCameraRoll. |

Non-obvious invariants

  • The tag-index change is what creates the work. tagsForStoreMessage surfacing rich messages into tag-queried lists, without the rendering-side effectiveMedia swaps, produces visible blank cells / blank rows / wrong row types. Index and render must move together.
  • The rich message's own in-chat bubble + in-bubble gallery do NOT read message.media — a rich message renders via ChatMessageRichDataBubbleContentNode (InstantPage V2), in-bubble image/video tap opens InstantPageGalleryController (reads the instant page directly), and in-bubble audio uses InstantPageV2AudioContentNode. So the text-bubble / interactive-file / interactive-media nodes' message.media reads are never reached by a rich message and are deliberately left alone.
  • Do NOT route the FORWARD path through effectiveMedia (ChatControllerNode forwardedMessages ~556/560/568). The RichTextMessageAttribute already travels with a forward, so the forwarded copy reconstructs from the attribute; injecting the instant-page media as top-level message.media there would double-render (rich bubble + a separate media attachment). That message.media processing is caption-hiding / poll-stripping only, both irrelevant to rich — left as message.media.
  • Rich messages are edited as reconstructed MARKDOWN, not via the media-caption edit path. So ChatControllerLoadDisplayNode's edit caption-max-length / original-media-reference reads (~1241/1775/4463) stay on message.media — they belong to the .media edit state a rich message never enters. (The send-validation .contains at ~2273 IS swapped, so an edit that leaves only media isn't wrongly rejected.)
  • RichTextMessageAttribute.associatedMediaIds stays [] — intentionally. MessageHistoryTable resolves associatedMediaIds via getMedia(id) in the postbox media table, but rich-message media is embedded inside the attribute blob, not the table — so returning the keys would be a no-op without also inserting the media into the table. The embedded-blob approach is self-contained.
  • fullInstantPage is not indexed (the server doesn't index it either, and it's fetched on demand after store-time). The first media lives in the partial instantPage anyway.
  • Only switch the loop SOURCE, never the per-type branches. Many swapped loops still contain TelegramMediaPoll/TelegramMediaPaidContent/TelegramMediaWebpage branches that rich messages never match — that's fine and intentional; only the for … in <msg>.media source changes.
  • Build-only completeness gate. Every swap is type-identical ([Media] → [Media]), so the only compile risk is a receiver that is neither Message nor EngineMessage; the full Bazel build is the gate (no per-module build / unit tests). Deferred, NOT done: chat-list/reply/pinned/notification/forward thumbnail previews and the "Photo"/"Video" media-kind labels (messageContentKind/ChatListItemStrings), which are preview surfaces rather than blank-cell breakage, and multi-media (first-media-only is the current scope).