Social Comments for Sentiment Analysis with an API (2026)

Pull paginated comments from TikTok, YouTube, Instagram, Reddit, and more as normalized JSON — ready for keyword rules, classifiers, or LLM batch jobs.

Social Fetch · Updated

A sponsored caption can read positive while the comment section argues about shipping delays. Sentiment on post text alone misses that split — and on most networks the argument lives behind infinite scroll, different sort orders, and nested reply trees.

This guide walks through comment collection for scoring pipelines: which endpoints to call on TikTok, YouTube, Instagram, and Reddit, how pagination actually works, when to sample instead of draining a thread, and how to hand normalized text to keyword rules or an LLM classifier without double-counting the same reply on every cron run.

The short version

Pass a public post URL to a platform's comments endpoint, paginate with cursor, and feed comment.text into your classifier. Same { data, meta } envelope on every platform.

You'll need an API key. Test comment lookups in the Playground.

Why comments beat post text alone

Post copy is one voice. Comments are hundreds of independent reactions — often angrier, more specific, and closer to purchase intent than anything in the creative brief.

| Signal | Where it shows up | Why it matters for sentiment |
| --- | --- | --- |
| Objections | "shipping took 3 weeks" | Product gaps surface in replies, not captions |
| Brand perception | "used to love them until…" | Churn language clusters in threads |
| Campaign QA | Debate under sponsored posts | Creative testing via audience pushback |
| Creator risk | Toxic reply patterns | Partnership vetting before you sign |

Manual copy-paste fails on pagination, breaks when URLs redirect, and does not scale to a launch monitor that runs hourly. A scheduled job with cursors and a warehouse table does.

Pair comment pulls with search-based listening when you do not already have the post URL — search finds candidates; comments endpoints score the ones that crossed your velocity threshold.

Which platforms expose comments

Social Fetch normalizes comment listing under predictable routes. Pattern: GET /v1/{platform}/posts/comments or .../videos/comments, required url, optional cursor.

| Platform | Route | Notable query params | Reply depth |
| --- | --- | --- | --- |
| TikTok | /v1/tiktok/videos/comments | trim for lighter payloads | Top-level per page; replyCount on each row |
| YouTube | /v1/youtube/videos/comments | order=top or newest | /v1/youtube/videos/comments/replies per thread |
| Instagram | /v1/instagram/posts/comments | works on posts and reels | Top-level; replyCount when reported |
| Reddit | /v1/reddit/posts/comments | trim | Nested replies on each comment |
| Facebook | /v1/facebook/posts/comments | | Top-level pages |
| Rumble | /v1/rumble/videos/comments | | Top-level pages |

Every response includes data.lookupStatus (found or not_found), data.comments, data.page.nextCursor, data.page.hasMore, and meta.requestId. Field names differ slightly per platform — TikTok uses likes on the comment object; YouTube uses likeCount; Reddit uses score — but text and id are always there for scoring.
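Because the engagement field has a different name on each platform, a thin adapter in front of the classifier keeps the scoring code platform-agnostic. The sketch below is illustrative only: the type names and `toScoringRow` are made up for this example, and it models just the three engagement fields named above (`likes`, `likeCount`, `score`).

```typescript
// Illustrative row shape: only fields named in this guide are modeled.
type RawComment = {
  id: string;
  text: string;
  likes?: number; // TikTok
  likeCount?: number; // YouTube
  score?: number; // Reddit
};

type ScoringRow = { id: string; text: string; engagement: number };

// Collapse the per-platform engagement field into one number.
function toScoringRow(c: RawComment): ScoringRow {
  const engagement = c.likes ?? c.likeCount ?? c.score ?? 0;
  return { id: c.id, text: c.text, engagement };
}
```

Reddit scores can be negative, so clamp or shift `engagement` before using it as a weight.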

Phase 0: Scope the scoring job

Before you paginate, write down what "done" means. Vague goals produce expensive full-thread pulls on videos that do not need them.

| Job type | Example question | Typical pull |
| --- | --- | --- |
| Launch monitor | Did sentiment flip negative after the drop? | Top 2–3 pages per flagged post, hourly |
| Creator vetting | Are reply threads hostile? | First 500 comments by likes on recent posts |
| Support triage | Which posts need a human? | Keyword filter on newest page only |
| Research archive | What objections appear in r/SaaS? | Full pagination on 10–20 high-score threads |

List inputs: post URLs from your CMS, URLs from a Reddit research sprint, or video IDs from TikTok search. Set a per-post page cap before you write the loop — cursors make it easy to overspend.

TikTok video comments

TikTok threads move fast and skew short. Comments often carry the meme reaction while the on-screen caption stays neutral.

Request
curl -sS \
  -H "x-api-key: $SOCIALFETCH_API_KEY" \
  -G "https://api.socialfetch.dev/v1/tiktok/videos/comments" \
  --data-urlencode "url=https://www.tiktok.com/@mrbeast/video/7596844935442189598"

Useful fields for scoring:

  • text — primary classifier input
  • likes — weight positive/negative aggregates (high-like sarcasm still counts)
  • language — route to locale-specific models without guessing
  • replyCount — prioritize threads with debate before you pay for more pages
  • author.handle — blocklist repeat spam accounts client-side

Optional trim=true drops ancillary fields when you only need text and ids for batch scoring.

Reference: TikTok video comments.

On viral videos, totalComments in the response helps you decide whether full pagination is worth it — a post with 40,000 comments rarely needs every page for a dashboard chart.
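One way to turn `totalComments` into a page budget is sketched below. The function name is hypothetical, and `pageSize` is an assumption: this guide does not state how many comments a page holds, so measure it from the length of your first page.

```typescript
// Decide how many pages to request for one post.
// pageSize is an assumption: take it from the length of the first page you receive.
function pagesToFetch(totalComments: number, pageSize: number, pageCap: number): number {
  if (totalComments <= 0 || pageSize <= 0) return 0;
  const pagesForFullThread = Math.ceil(totalComments / pageSize);
  // Never exceed the per-post cap, even on very large threads.
  return Math.min(pagesForFullThread, pageCap);
}
```

With a cap of 3, a 40,000-comment video and a 60-comment video cost the same three pages at most, which keeps hourly monitors predictable.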

YouTube video comments

YouTube exposes sort order at query time. That matters because "top" and "newest" can disagree on sentiment for the same video.

Request
const params = new URLSearchParams({"url":"https://www.youtube.com/watch?v=dQw4w9WgXcQ","order":"top"});

const response = await fetch(
  `https://api.socialfetch.dev/v1/youtube/videos/comments?${params.toString()}`,
  {
    headers: {
      "x-api-key": process.env.SOCIALFETCH_API_KEY,
    },
  }
);

const body = await response.json();

console.log(response.status, body);

| order value | When to use |
| --- | --- |
| top | Default for brand monitoring: surfaces the visible thread |
| newest | Launch week, controversy spikes, or support triage |

Fields worth storing:

  • likeCount and replyCount — same weighting story as TikTok
  • author.creator — creator replies often clarify context; tag separately in aggregates
  • repliesCursor — non-null means a sub-thread exists

Nested replies are a separate endpoint. After listing top-level comments, fetch replies for rows where replyCount exceeds your threshold:

curl -sS \
  -H "x-api-key: $SOCIALFETCH_API_KEY" \
  -G "https://api.socialfetch.dev/v1/youtube/videos/comments/replies" \
  --data-urlencode "cursor=REPLIES_CURSOR_FROM_COMMENT"

Each reply page bills separately. For sentiment, scoring top-level comments plus high-reply threads usually captures the argument without walking every leaf node.
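A small filter can pick out which sub-threads are worth a paid reply page. This is a sketch, assuming top-level rows expose `replyCount` and `repliesCursor` as described above; the type and function names are illustrative.

```typescript
type TopLevelComment = {
  id: string;
  replyCount: number;
  repliesCursor: string | null;
};

// Return the cursors worth paying for: threads above the reply threshold
// that actually expose a sub-thread cursor.
function replyCursorsToFetch(rows: TopLevelComment[], minReplies: number): string[] {
  const cursors: string[] = [];
  for (const row of rows) {
    if (row.replyCount > minReplies && row.repliesCursor !== null) {
      cursors.push(row.repliesCursor);
    }
  }
  return cursors;
}
```

Each returned cursor maps to at least one billed reply page, so the length of the result is a lower bound on the extra cost.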

Reference: YouTube video comments · Comment replies.

Instagram post comments

Instagram comments on reels and feed posts share one route. Pass the full post or reel URL from the share sheet — shortened instagr.am links usually work, but the canonical /p/ or /reel/ URL is safest.

curl -sS \
  -H "x-api-key: $SOCIALFETCH_API_KEY" \
  -G "https://api.socialfetch.dev/v1/instagram/posts/comments" \
  --data-urlencode "url=https://www.instagram.com/reel/example/"

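TypeScript with the SDK (client is a SocialFetchClient instance, constructed as in the pagination example later in this guide):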

const result = await client.instagram.getPostComments({
  url: "https://www.instagram.com/reel/example/",
});

if (result.ok && result.value.data.lookupStatus === "found") {
  for (const comment of result.value.data.comments) {
    console.log(comment.text, comment.likeCount);
  }
}

Instagram returns top-level comments per page. replyCount tells you where sub-threads exist; there is no separate replies route today — weight high-reply top-level comments heavily if the debate lives in one visible reply chain.

author.verified helps filter brand-account replies when you want audience-only sentiment.

Reference: Instagram post comments.

Reddit post comments

Reddit is where sentiment work pays off for B2B and product research — long objections, switching stories, and nested arguments. If you arrived from search, pull comments only on threads that passed your score filter (see the Reddit research guide).

curl -sS \
  -H "x-api-key: $SOCIALFETCH_API_KEY" \
  -G "https://api.socialfetch.dev/v1/reddit/posts/comments" \
  --data-urlencode "url=https://www.reddit.com/r/SaaS/comments/example/"

Reddit comments include depth, nested replies, and score. Flatten before feeding an LLM if your classifier expects a list of strings:

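// Minimal comment shape for flattening; real rows carry more fields.
// A named recursive type avoids referencing the parameter inside its own annotation.
type RedditComment = { text: string; replies?: { items: RedditComment[] } };

// Depth-first walk: each parent's text is followed by the text of its replies.
function flattenComments(comments: RedditComment[]): string[] {
  const out: string[] = [];
  for (const c of comments) {
    out.push(c.text);
    if (c.replies?.items?.length) {
      out.push(...flattenComments(c.replies.items));
    }
  }
  return out;
}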

Use trim=true when you only need text, score, and ids. Keep isSubmitter — OP clarifications change how you read negative replies.

Reference: Reddit post comments.

Facebook posts follow the same URL-in pattern at /v1/facebook/posts/comments when you need comment text on Page posts or public group discussions.

Paginate the full thread

Comment endpoints return one page plus data.page.nextCursor. Loop until hasMore is false or you hit your budget cap:

Example (TypeScript)
import { SocialFetchClient } from "@socialfetch/sdk";

const client = new SocialFetchClient({
  apiKey: process.env.SOCIALFETCH_API_KEY!,
});

const videoUrl = "https://www.tiktok.com/@mrbeast/video/7596844935442189598";
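const MAX_PAGES = 50; // per-post budget cap; tune to your job
const comments: { text: string; likes: number }[] = [];
let cursor: string | undefined;
let hasMore = true;

for (let page = 0; page < MAX_PAGES && hasMore; page++) {
  const result = await client.tiktok.getVideoComments({ url: videoUrl, cursor });
  if (!result.ok) break; // request failed: stop this run
  if (result.value.data.lookupStatus !== "found") break; // never follow cursors after not_found

  comments.push(
    ...result.value.data.comments.map((c) => ({
      text: c.text,
      likes: c.likes ?? 0, // TikTok reports likes directly on the comment object
    })),
  );
  hasMore = result.value.data.page.hasMore;
  cursor = result.value.data.page.nextCursor ?? undefined;
  if (hasMore && !cursor) break; // hasMore without a cursor is an upstream anomaly
}

// Feed each comment's text into your sentiment model or keyword rules.
console.log(comments.length, "comments ready for analysis");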

Checklist for every pagination loop:

  1. Stop when lookupStatus !== "found" — do not keep passing cursors after not_found.
  2. Prefer hasMore over guessing from array length — empty pages can still have a cursor on some platforms.
  3. Pass cursor verbatim — do not parse or construct tokens.
  4. Log meta.requestId per page for support traces.

Re-running the same cursor chain twice bills twice. Treat cursors as single-use pagination tokens within one ingestion run.
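The checklist can be folded into one reusable loop. The sketch below is not the SDK's API: `CommentPage`, `FetchPage`, and `collectComments` are hypothetical names, and the page shape is a flattened version of the `data` and `meta` fields listed earlier. The fetcher is injected so the same loop works for any platform route.

```typescript
// Assumed page shape, flattened from the data/meta fields this guide lists.
type CommentPage<T> = {
  lookupStatus: "found" | "not_found";
  comments: T[];
  nextCursor: string | null;
  hasMore: boolean;
  requestId: string;
};

type FetchPage<T> = (cursor?: string) => Promise<CommentPage<T>>;

async function collectComments<T>(
  fetchPage: FetchPage<T>,
  maxPages: number,
): Promise<{ comments: T[]; requestIds: string[]; stoppedBecause: string }> {
  const comments: T[] = [];
  const requestIds: string[] = [];
  let cursor: string | undefined;

  for (let pageNo = 0; pageNo < maxPages; pageNo++) {
    const page = await fetchPage(cursor);
    requestIds.push(page.requestId); // one id per page for support traces

    if (page.lookupStatus !== "found") {
      return { comments, requestIds, stoppedBecause: "not_found" };
    }
    comments.push(...page.comments);

    if (!page.hasMore) {
      return { comments, requestIds, stoppedBecause: "end_of_thread" };
    }
    if (page.nextCursor === null) {
      // hasMore without a cursor: stop rather than guess a token.
      return { comments, requestIds, stoppedBecause: "missing_cursor" };
    }
    cursor = page.nextCursor; // passed back verbatim
  }
  return { comments, requestIds, stoppedBecause: "page_cap" };
}
```

Returning `stoppedBecause` makes it cheap to log why each post ended where it did, which helps when a dashboard looks thinner than expected.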

Sampling strategies

Full-thread pulls are the exception, not the default. A 30-second TikTok with 8,000 comments does not need 8,000 rows for a launch dashboard: the first three pages under the default sort are already weighted toward what people see in-app.

| Strategy | How | Good for |
| --- | --- | --- |
| Page cap | Stop after N cursor iterations | Hourly monitors, credit budgets |
| Engagement floor | Skip comments where likes/score < threshold | Noise reduction on viral posts |
| Reply threshold | Only fetch YouTube reply pages when replyCount > 10 | Debate-heavy threads |
| Time window | Re-fetch newest page only on repeat runs | Trend detection vs archive |
| Stratified sample | Top page + newest page (YouTube order) | When sort order disagrees |
| Search handoff | Comments only on posts from search hits above velocity | Monitoring workflows |

For LLM batch jobs, chunk comments into batches of 50–100 texts with the post URL in the system prompt. Asking the model to score 5,000 comments in one prompt produces mushy averages and blows context limits.
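Batching is a plain array split. A minimal sketch follows; the helper name `chunk` is illustrative, and the batch size is yours to tune within the 50–100 range suggested above.

```typescript
// Split items into consecutive batches of at most `size` elements.
function chunk<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError("size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}
```

Each batch then becomes one classifier request, so a failed or malformed response only costs a retry of that batch rather than the whole thread.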

Weight aggregates by engagement: weighted_score = sentiment_score * log1p(likes) damps single-comment noise without ignoring a 10k-like pile-on.
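The weighting can be sketched as a weighted mean. Two things here are assumptions rather than part of the formula above: the per-comment score is taken to lie in [-1, 1], and the sum is divided by the total weight so posts with different comment counts stay comparable. The names are illustrative.

```typescript
// sentimentScore in [-1, 1] is an assumption about your classifier's output.
type Scored = { sentimentScore: number; likes: number };

// Weighted mean of per-comment scores, with log1p(likes) as the weight.
function weightedPostScore(rows: Scored[]): number {
  let weightedSum = 0;
  let weightTotal = 0;
  for (const row of rows) {
    // Clamp at 0 because log1p is undefined at or below -1 (Reddit scores can be negative).
    const weight = Math.log1p(Math.max(row.likes, 0));
    weightedSum += row.sentimentScore * weight;
    weightTotal += weight;
  }
  return weightTotal === 0 ? 0 : weightedSum / weightTotal;
}
```

Note that `log1p(0)` is 0, so comments with no likes drop out entirely under this weight. If they should still count, add a constant such as 1 to each weight.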

Feed LLM classifiers

Normalized JSON maps cleanly to three scoring patterns.

Keyword rules (cheap, deterministic)

Run first. Flag refund, scam, love this, competitor names before you spend tokens.

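const NEGATIVE = /\b(scam|refund|broken|worst)\b/i;

// Assumes each row was stored with the requestId of the page it came from.
// postUrl and routeToSlack are your own variable and alerting helper.
for (const row of comments) {
  if (NEGATIVE.test(row.text)) {
    await routeToSlack({ postUrl, text: row.text, requestId: row.requestId });
  }
}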

Classical ML (fast at volume)

Export text + likes/score to your sklearn or Hugging Face pipeline. Train on a few hundred hand-labeled comments per vertical; rules catch spikes, the model catches paraphrases.

LLM batch classification

Send structured input — one JSON object per batch:

{
  "postUrl": "https://www.tiktok.com/@brand/video/123",
  "comments": [
    { "id": "c1", "text": "shipping took forever", "likes": 240 },
    { "id": "c2", "text": "ordered two more", "likes": 89 }
  ]
}

Ask for per-id labels (positive, negative, neutral, mixed) and optional themes. Require JSON output; parse defensively. Store the model version on each row.
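Defensive parsing might look like the sketch below. It assumes you asked the model to return a JSON array of `{ "id": "...", "label": "..." }` objects; that output contract, and the names `parseLabels` and `LabeledComment`, are illustrative rather than prescribed by any API.

```typescript
const LABELS = ["positive", "negative", "neutral", "mixed"] as const;
type SentimentLabel = (typeof LABELS)[number];
type LabeledComment = { id: string; label: SentimentLabel };

// Keep only well-formed entries for ids that were actually in the batch.
function parseLabels(raw: string, expectedIds: Set<string>): LabeledComment[] {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return []; // not JSON at all: the caller can retry the batch
  }
  if (!Array.isArray(parsed)) return [];

  const out: LabeledComment[] = [];
  const seen = new Set<string>();
  for (const item of parsed) {
    if (typeof item !== "object" || item === null) continue;
    const { id, label } = item as { id?: unknown; label?: unknown };
    if (typeof id !== "string" || !expectedIds.has(id) || seen.has(id)) continue;
    if (typeof label !== "string") continue;
    const match = LABELS.find((l) => l === label);
    if (match === undefined) continue; // label outside the allowed set
    seen.add(id);
    out.push({ id, label: match });
  }
  return out;
}
```

Comparing the returned ids against `expectedIds` tells you which comments the model skipped or mislabeled, so only those need a second pass.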

Preprocessing that usually helps:

  1. Strip URLs and bare @mentions if your model treats them as noise.
  2. Keep emoji when monitoring Gen-Z brands — stripping flattens sarcasm.
  3. Attach language (TikTok) or detect locale before routing to the right prompt.
  4. Pair with transcripts when spoken claims and comment reactions diverge.
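The first two steps can be sketched as a single cleaning function. The regular expressions are a rough approximation chosen for this example, not a complete URL or handle grammar, and `cleanCommentText` is an illustrative name.

```typescript
// Strip URLs and bare @mentions, keep emoji, collapse whitespace.
function cleanCommentText(text: string): string {
  return text
    .replace(/https?:\/\/\S+/gi, " ") // drop http(s) links
    .replace(/(^|\s)@[\w.]+/g, "$1") // drop @handles that start a token
    .replace(/\s+/g, " ")
    .trim();
}
```

Only mentions that begin a token are removed, so an email-like string such as `a@b.com` is left alone. Store the raw text alongside the cleaned text so a later model can be scored against the original.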

Roll per-comment labels into post-level scores: weighted fraction positive, or "mixed" when top and bottom quartiles disagree.
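One possible reading of the quartile rule is sketched below. It assumes "top and bottom quartiles" means the most-liked and least-liked quarters of the comments, which this guide does not spell out, and it uses a simple net count (positive minus negative) as the tone of each group. All names are illustrative.

```typescript
type PostLabel = "positive" | "negative" | "neutral" | "mixed";
type Labeled = { label: PostLabel; likes: number };

// +1 per positive, -1 per negative; neutral and mixed count as 0.
function netTone(rows: Labeled[]): number {
  let net = 0;
  for (const r of rows) {
    if (r.label === "positive") net += 1;
    else if (r.label === "negative") net -= 1;
  }
  return net;
}

function postLevelLabel(rows: Labeled[]): PostLabel {
  if (rows.length === 0) return "neutral";
  const byLikes = [...rows].sort((a, b) => b.likes - a.likes);
  const quartileSize = Math.max(1, Math.floor(byLikes.length / 4));
  const topTone = netTone(byLikes.slice(0, quartileSize));
  const bottomTone = netTone(byLikes.slice(-quartileSize));
  // Quartiles pointing in opposite directions means the thread is split.
  if (topTone * bottomTone < 0) return "mixed";
  const overall = netTone(byLikes);
  return overall > 0 ? "positive" : overall < 0 ? "negative" : "neutral";
}
```

On very small threads the two quartiles can be the same comment, in which case the function falls through to the overall net tone.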

Deduplication and warehouse rows

Comment pipelines fail in the warehouse, not at the HTTP layer. Plan keys before your second cron run.

| Problem | Fix |
| --- | --- |
| Same comment on re-fetch | Upsert on platform + comment.id |
| Same text, different ids (rare) | Secondary hash on normalize(text) for analytics only |
| Same post scored twice in one run | Dedupe URLs before the outer loop |
| Stale sentiment in dashboards | Store capturedAt; re-pull on schedule, do not mutate old rows |
| Audit trail | Persist meta.requestId and meta.creditsCharged per page |

Suggested flat row for Postgres or BigQuery:

{
  "platform": "tiktok",
  "postUrl": "https://www.tiktok.com/@brand/video/123",
  "commentId": "7596851467001119502",
  "text": "shipping took forever",
  "likes": 240,
  "sentiment": "negative",
  "themes": ["shipping"],
  "capturedAt": "2026-06-30T14:00:00.000Z",
  "requestId": "req_01example",
  "creditsCharged": 1
}
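The two keys from the table can be sketched in a few lines. The helper names are illustrative, and FNV-1a is used here only because it needs no dependencies; it is an assumption for this example, and any stable hash your warehouse supports would do for the analytics-only fingerprint.

```typescript
type WarehouseRow = { platform: string; commentId: string; text: string };

// Primary key for upserts: platform + comment id.
function upsertKey(row: WarehouseRow): string {
  return `${row.platform}:${row.commentId}`;
}

// Secondary, analytics-only key: texts that match after normalization hash the same.
// FNV-1a (32-bit) over UTF-16 code units; not collision-proof.
function textFingerprint(text: string): string {
  const normalized = text.normalize("NFKC").toLowerCase().replace(/\s+/g, " ").trim();
  let h = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < normalized.length; i++) {
    h ^= normalized.charCodeAt(i);
    h = Math.imul(h, 0x01000193); // FNV prime
  }
  return (h >>> 0).toString(16).padStart(8, "0");
}

// Drop in-run duplicates before writing; the first occurrence wins.
function dedupeWithinRun(rows: WarehouseRow[]): WarehouseRow[] {
  const seen = new Map<string, WarehouseRow>();
  for (const row of rows) {
    const key = upsertKey(row);
    if (!seen.has(key)) seen.set(key, row);
  }
  return Array.from(seen.values());
}
```

Keep the fingerprint out of any unique constraint: two different people can legitimately post the same text.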

When you merge comment sentiment back to a listening snapshot, join on postUrl or platform-native video id — not on title text, which collides across reposts.

Troubleshooting

not_found but the post opens in a browser

  • If comments are merely disabled on that post, the lookup still resolves on many platforms: you get found with zero rows rather than not_found.
  • Private, age-gated, or region-blocked media returns not_found at request time.
  • Pass the canonical URL (full YouTube watch link, full Reddit /comments/ path). Tracker URLs and stripped share links fail more often.

Empty comments array with lookupStatus: found

  • Legitimate — new posts, brand accounts with comments off, or cleared threads.
  • Still billed — the upstream lookup completed. Budget for empty pages in cron jobs.

Pagination stops early or repeats

  • Only pass cursor from the immediately previous response.
  • If hasMore is true but nextCursor is null, stop and log requestId — that is an upstream anomaly worth a support ticket.
  • Cap pages per post so a bug does not loop until your balance drains.

YouTube top vs newest disagree

  • Run two passes with different order values if sort skew matters for your decision. Each pass bills its own pages, so the cost doubles.

Reddit nested replies missing from classifier input

  • Top-level pagination does not always inline every nested reply on one page. Walk replies.items recursively or flatten as shown above.

TikTok id looks like a placeholder

  • Some comments use deterministic ids derived from available fields. They are still stable within a pull, so use them as dedupe keys for that run.

lookup_failed or HTTP 503

  • Not charged. Retry with backoff; include meta.requestId from the failed attempt if you contact support.
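Retry with backoff can be sketched as a small wrapper. Everything here is an assumption for illustration: the `Attempt` shape, the default of four attempts, and the 500 ms base delay. Your own code decides what counts as retryable, for example a `lookup_failed` status or an HTTP 503.

```typescript
type Attempt<T> = { ok: true; value: T } | { ok: false; retryable: boolean };

// Retry a call with exponential backoff; sleep is injectable for tests.
async function withBackoff<T>(
  run: () => Promise<Attempt<T>>,
  maxAttempts = 4,
  baseDelayMs = 500,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<Attempt<T>> {
  let last: Attempt<T> = { ok: false, retryable: true };
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    last = await run();
    if (last.ok || !last.retryable) return last;
    if (attempt < maxAttempts - 1) {
      await sleep(baseDelayMs * 2 ** attempt); // 500, 1000, 2000 ms by default
    }
  }
  return last; // still failing after maxAttempts
}
```

Non-retryable failures return immediately, so a `not_found` result is never retried and never billed twice by this wrapper.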

Classifier drift

  • Re-score a frozen comment snapshot when you change models — do not compare June scores to March scores across model versions without a calibration pass.

Billing notes

Each page is its own metered lookup (typically one credit). meta.creditsCharged on every response tells you what that page cost.

| Action | Credits (typ.) |
| --- | --- |
| 1 TikTok comment page | 1 |
| 1 YouTube comment page | 1 |
| 1 YouTube reply page | 1 |
| 1 Instagram comment page | 1 |
| 1 Reddit comment page | 1 |
| 10 posts × 3 pages each | 30 |

Completed lookups that return not_found (deleted video) still ran upstream and are billed. Infrastructure failures (lookup_failed, 503) are not. See Credits.

Rough planning formula: (posts you care about) × (pages per post) + (YouTube reply pages you opt into). A launch monitor on 20 posts with a 3-page cap is about 60 credits per run.
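The planning formula translates directly into a helper. The function name is illustrative, and the default of one credit per page reflects the "typically one credit" figure above; check `meta.creditsCharged` for actual costs.

```typescript
// Estimate credits for one run: posts x pages per post, plus opted-in reply pages.
function estimateCredits(
  posts: number,
  pagesPerPost: number,
  youtubeReplyPages = 0,
  creditsPerPage = 1, // typical value; verify against meta.creditsCharged
): number {
  return (posts * pagesPerPost + youtubeReplyPages) * creditsPerPage;
}
```

For the launch monitor above, `estimateCredits(20, 3)` gives 60 credits per run; multiply by the number of runs per day to size a schedule.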

What you can build

  • Launch monitoring — score comment sentiment hourly after a product drop; alert when negative fraction crosses a threshold.
  • Creator vetting — flag toxic reply patterns and recurring spam before signing a partnership.
  • Support triage — route negative threads to Slack when keyword rules fire on the newest page.
  • Competitive dashboards — same classifier on your posts and competitor posts discovered via search.
  • Research exports — flat CSV of Reddit thread comments with themes for PM decks.

Next steps: Playground · API reference · Pricing