Social Comments for Sentiment Analysis with an API (2026)
Pull paginated comments from TikTok, YouTube, Instagram, Reddit, and more as normalized JSON — ready for keyword rules, classifiers, or LLM batch jobs.
A sponsored caption can read positive while the comment section argues about shipping delays. Sentiment on post text alone misses that split — and on most networks the argument lives behind infinite scroll, different sort orders, and nested reply trees.
This guide walks through comment collection for scoring pipelines: which endpoints to call on TikTok, YouTube, Instagram, and Reddit, how pagination actually works, when to sample instead of draining a thread, and how to hand normalized text to keyword rules or an LLM classifier without double-counting the same reply on every cron run.
The short version
Pass a public post URL to a platform's comments endpoint, paginate with cursor, and feed comment.text into your classifier. Same { data, meta } envelope on every platform.
You'll need an API key. Test comment lookups in the Playground.
Why comments beat post text alone
Post copy is one voice. Comments are hundreds of independent reactions — often angrier, more specific, and closer to purchase intent than anything in the creative brief.
| Signal | Where it shows up | Why it matters for sentiment |
|---|---|---|
| Objections | "shipping took 3 weeks" | Product gaps surface in replies, not captions |
| Brand perception | "used to love them until…" | Churn language clusters in threads |
| Campaign QA | Debate under sponsored posts | Creative testing via audience pushback |
| Creator risk | Toxic reply patterns | Partnership vetting before you sign |
Manual copy-paste fails on pagination, breaks when URLs redirect, and does not scale to a launch monitor that runs hourly. A scheduled job with cursors and a warehouse table does.
Pair comment pulls with search-based listening when you do not already have the post URL — search finds candidates; comments endpoints score the ones that crossed your velocity threshold.
Which platforms expose comments
Social Fetch normalizes comment listing under predictable routes. Pattern: GET /v1/{platform}/posts/comments or .../videos/comments, required url, optional cursor.
| Platform | Route | Notable query params | Reply depth |
|---|---|---|---|
| TikTok | /v1/tiktok/videos/comments | trim for lighter payloads | Top-level per page; replyCount on each row |
| YouTube | /v1/youtube/videos/comments | order=top or newest | /v1/youtube/videos/comments/replies per thread |
| Instagram | /v1/instagram/posts/comments | works on posts and reels | Top-level; replyCount when reported |
| Reddit | /v1/reddit/posts/comments | trim | Nested replies on each comment |
| Facebook | /v1/facebook/posts/comments | — | Top-level pages |
| Rumble | /v1/rumble/videos/comments | — | Top-level pages |
Every response includes data.lookupStatus (found or not_found), data.comments, data.page.nextCursor, data.page.hasMore, and meta.requestId. Field names differ slightly per platform — TikTok uses likes on the comment object; YouTube uses likeCount; Reddit uses score — but text and id are always there for scoring.
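Because the like-style field differs per platform, a small normalizer can keep downstream scoring code platform-agnostic. The following is a minimal sketch that assumes only the field names mentioned above (likes, likeCount, score); the RawComment and ScoringRow types and both helper functions are illustrative names, not part of the SDK.

```typescript
// Illustrative shapes only: real payloads carry more fields than this.
type Platform = "tiktok" | "youtube" | "reddit";

interface RawComment {
  id: string;
  text: string;
  likes?: number; // TikTok, per the text above
  likeCount?: number; // YouTube
  score?: number; // Reddit
}

interface ScoringRow {
  platform: Platform;
  commentId: string;
  text: string;
  engagement: number;
}

// Pick the platform's like-style field; fall back to 0 when it is absent.
function engagementOf(platform: Platform, c: RawComment): number {
  switch (platform) {
    case "tiktok":
      return c.likes ?? 0;
    case "youtube":
      return c.likeCount ?? 0;
    case "reddit":
      return c.score ?? 0;
  }
}

// Map one raw comment to a flat row with a single engagement column.
function toScoringRow(platform: Platform, c: RawComment): ScoringRow {
  return {
    platform,
    commentId: c.id,
    text: c.text,
    engagement: engagementOf(platform, c),
  };
}
```

With one engagement column, weighting and filtering logic can be written once instead of per platform.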
Phase 0: Scope the scoring job
Before you paginate, write down what "done" means. Vague goals produce expensive full-thread pulls on videos that do not need them.
| Job type | Example question | Typical pull |
|---|---|---|
| Launch monitor | Did sentiment flip negative after the drop? | Top 2–3 pages per flagged post, hourly |
| Creator vetting | Are reply threads hostile? | First 500 comments by likes on recent posts |
| Support triage | Which posts need a human? | Keyword filter on newest page only |
| Research archive | What objections appear in r/SaaS? | Full pagination on 10–20 high-score threads |
List inputs: post URLs from your CMS, URLs from a Reddit research sprint, or video IDs from TikTok search. Set a per-post page cap before you write the loop — cursors make it easy to overspend.
TikTok video comments
TikTok threads move fast and skew short. Comments often carry the meme reaction while the on-screen caption stays neutral.
curl -sS \
-H "x-api-key: $SOCIALFETCH_API_KEY" \
-G "https://api.socialfetch.dev/v1/tiktok/videos/comments" \
--data-urlencode "url=https://www.tiktok.com/@mrbeast/video/7596844935442189598"

Useful fields for scoring:
- text: primary classifier input
- likes: weight positive/negative aggregates (high-like sarcasm still counts)
- language: route to locale-specific models without guessing
- replyCount: prioritize threads with debate before you pay for more pages
- author.handle: blocklist repeat spam accounts client-side
Optional trim=true drops ancillary fields when you only need text and ids for batch scoring.
Reference: TikTok video comments.
On viral videos, totalComments in the response helps you decide whether full pagination is worth it — a post with 40,000 comments rarely needs every page for a dashboard chart.
YouTube video comments
YouTube exposes sort order at query time. That matters because "top" and "newest" can disagree on sentiment for the same video.
const params = new URLSearchParams({"url":"https://www.youtube.com/watch?v=dQw4w9WgXcQ","order":"top"});
const response = await fetch(
`https://api.socialfetch.dev/v1/youtube/videos/comments?${params.toString()}`,
{
headers: {
"x-api-key": process.env.SOCIALFETCH_API_KEY,
},
}
);
const body = await response.json();
console.log(response.status, body);

| order value | When to use |
|---|---|
| top | Default for brand monitoring; surfaces the visible thread |
| newest | Launch week, controversy spikes, or support triage |
Fields worth storing:
- likeCount and replyCount: same weighting story as TikTok
- author.creator: creator replies often clarify context; tag separately in aggregates
- repliesCursor: non-null means a sub-thread exists
Nested replies are a separate endpoint. After listing top-level comments, fetch replies for rows where replyCount exceeds your threshold:
curl -sS \
-H "x-api-key: $SOCIALFETCH_API_KEY" \
-G "https://api.socialfetch.dev/v1/youtube/videos/comments/replies" \
--data-urlencode "cursor=REPLIES_CURSOR_FROM_COMMENT"

Each reply page bills separately. For sentiment, scoring top-level comments plus high-reply threads usually captures the argument without walking every leaf node.
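Choosing which threads deserve a reply fetch can be a pure function over the top-level page. This is a sketch under assumptions: the YouTubeTopLevelComment type covers only the fields named above, and pickReplyCursors with its minReplies and maxThreads parameters is a hypothetical helper.

```typescript
// Assumed subset of a YouTube top-level comment, based on the fields named above.
interface YouTubeTopLevelComment {
  id: string;
  text: string;
  replyCount: number;
  repliesCursor: string | null;
}

// Return the reply cursors worth paying for: threads above the reply
// threshold, busiest first, capped at maxThreads.
function pickReplyCursors(
  comments: YouTubeTopLevelComment[],
  minReplies: number,
  maxThreads: number,
): string[] {
  return comments
    .filter((c) => c.repliesCursor !== null && c.replyCount > minReplies)
    .sort((a, b) => b.replyCount - a.replyCount)
    .slice(0, maxThreads)
    .map((c) => c.repliesCursor as string);
}
```

Each returned cursor corresponds to one billed reply page, so maxThreads doubles as a per-video credit cap.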
Reference: YouTube video comments · Comment replies.
Instagram post comments
Instagram comments on reels and feed posts share one route. Pass the full post or reel URL from the share sheet — shortened instagr.am links usually work, but the canonical /p/ or /reel/ URL is safest.
curl -sS \
-H "x-api-key: $SOCIALFETCH_API_KEY" \
-G "https://api.socialfetch.dev/v1/instagram/posts/comments" \
--data-urlencode "url=https://www.instagram.com/reel/example/"

TypeScript with the SDK:
// "client" is a SocialFetchClient instance; setup is shown under "Paginate the full thread".
const result = await client.instagram.getPostComments({
url: "https://www.instagram.com/reel/example/",
});
if (result.ok && result.value.data.lookupStatus === "found") {
for (const comment of result.value.data.comments) {
console.log(comment.text, comment.likeCount);
}
}

Instagram returns top-level comments per page. replyCount tells you where sub-threads exist; there is no separate replies route today, so weight high-reply top-level comments heavily if the debate lives in one visible reply chain.
author.verified helps filter brand-account replies when you want audience-only sentiment.
Reference: Instagram post comments.
Reddit post comments
Reddit is where sentiment work pays off for B2B and product research — long objections, switching stories, and nested arguments. If you arrived from search, pull comments only on threads that passed your score filter (see the Reddit research guide).
curl -sS \
-H "x-api-key: $SOCIALFETCH_API_KEY" \
-G "https://api.socialfetch.dev/v1/reddit/posts/comments" \
--data-urlencode "url=https://www.reddit.com/r/SaaS/comments/example/"

Reddit comments include depth, nested replies, and score. Flatten before feeding an LLM if your classifier expects a list of strings:
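// Minimal shape needed for flattening; real comments carry more fields.
// A named interface avoids the self-referencing "typeof comments" annotation,
// which TypeScript rejects.
interface RedditComment {
  text: string;
  replies?: { items: RedditComment[] };
}

// Depth-first walk: each comment's text, followed by the text of its replies.
function flattenComments(comments: RedditComment[]): string[] {
  const out: string[] = [];
  for (const c of comments) {
    out.push(c.text);
    if (c.replies?.items?.length) {
      out.push(...flattenComments(c.replies.items));
    }
  }
  return out;
}

Use trim=true when you only need text, score, and ids. Keep isSubmitter: OP clarifications change how you read negative replies.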
Reference: Reddit post comments.
Facebook posts follow the same URL-in pattern at /v1/facebook/posts/comments when you need comment text on Page posts or public group discussions.
Paginate the full thread
Comment endpoints return one page plus data.page.nextCursor. Loop until hasMore is false or you hit your budget cap:
import { SocialFetchClient } from "@socialfetch/sdk";
const client = new SocialFetchClient({
apiKey: process.env.SOCIALFETCH_API_KEY!,
});
const videoUrl = "https://www.tiktok.com/@mrbeast/video/7596844935442189598";
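const MAX_PAGES = 3; // per-post budget cap; tune to your credit budget
const comments: Array<{ text: string; likes: number }> = [];
let cursor: string | undefined;
let pagesFetched = 0;
do {
  const result = await client.tiktok.getVideoComments({ url: videoUrl, cursor });
  if (!result.ok) break;
  const { lookupStatus, comments: pageComments, page } = result.value.data;
  if (lookupStatus !== "found") break;
  pagesFetched += 1;
  comments.push(
    ...pageComments.map((c) => ({
      text: c.text,
      likes: c.metrics?.likes ?? 0,
    })),
  );
  // Trust hasMore rather than array length; stop when there is no next page.
  cursor = page.hasMore ? (page.nextCursor ?? undefined) : undefined;
} while (cursor && pagesFetched < MAX_PAGES);
// Feed each comment's text into your sentiment model or keyword rules.
console.log(comments.length, "comments ready for analysis");

Checklist for every pagination loop: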
- Stop when lookupStatus !== "found"; do not keep passing cursors after not_found.
- Prefer hasMore over guessing from array length; empty pages can still have a cursor on some platforms.
- Pass cursor verbatim; do not parse or construct tokens.
- Log meta.requestId per page for support traces.
Re-running the same cursor chain twice bills twice. Treat cursors as single-use pagination tokens within one ingestion run.
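One way to keep these rules testable is to separate the stop decision from the network call. The sketch below is not part of the SDK: PageInfo is an assumed subset of data.lookupStatus and data.page, and nextStep is a hypothetical helper that a loop would call after each billed page.

```typescript
// Assumed shape of the fields the loop needs from one response page.
interface PageInfo {
  lookupStatus: "found" | "not_found";
  hasMore: boolean;
  nextCursor: string | null;
}

type NextStep =
  | { action: "continue"; cursor: string }
  | { action: "stop"; reason: "not_found" | "done" | "page_cap" | "missing_cursor" };

// Decide what to do after a page has been fetched.
// pagesFetched counts pages already billed, including this one.
function nextStep(page: PageInfo, pagesFetched: number, maxPages: number): NextStep {
  if (page.lookupStatus !== "found") return { action: "stop", reason: "not_found" };
  if (!page.hasMore) return { action: "stop", reason: "done" };
  // hasMore without a cursor is an upstream anomaly: stop and log the requestId.
  if (page.nextCursor === null) return { action: "stop", reason: "missing_cursor" };
  if (pagesFetched >= maxPages) return { action: "stop", reason: "page_cap" };
  return { action: "continue", cursor: page.nextCursor };
}
```

Recording the stop reason next to meta.requestId makes it easy to tell a drained thread from one that was cut off by the budget cap.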
Sampling strategies
Full-thread pulls are the exception, not the default. A 30-second TikTok with 8,000 comments does not need 8,000 rows for a launch dashboard: the top three pages in the default sort already skew toward what people actually see in-app.
| Strategy | How | Good for |
|---|---|---|
| Page cap | Stop after N cursor iterations | Hourly monitors, credit budgets |
| Engagement floor | Skip comments where likes/score < threshold | Noise reduction on viral posts |
| Reply threshold | Only fetch YouTube reply pages when replyCount > 10 | Debate-heavy threads |
| Time window | Re-fetch newest page only on repeat runs | Trend detection vs archive |
| Stratified sample | Top page + newest page (YouTube order) | When sort order disagrees |
| Search handoff | Comments only on posts from search hits above velocity | Monitoring workflows |
For LLM batch jobs, chunk comments into batches of 50–100 texts with the post URL in the system prompt. Asking the model to score 5,000 comments in one prompt produces mushy averages and blows context limits.
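The batching advice above can be sketched as a small helper. The names chunk and toBatches are hypothetical, the payload is assumed to be a postUrl plus a comments array, and the default of 100 is just one point in the suggested 50–100 range.

```typescript
interface BatchComment {
  id: string;
  text: string;
  likes: number;
}

interface CommentBatch {
  postUrl: string;
  comments: BatchComment[];
}

// Split an array into consecutive groups of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const groups: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    groups.push(items.slice(i, i + size));
  }
  return groups;
}

// Build one classifier payload per group, each carrying the post URL.
function toBatches(postUrl: string, comments: BatchComment[], size = 100): CommentBatch[] {
  return chunk(comments, size).map((group) => ({ postUrl, comments: group }));
}
```

Every batch carries the post URL, so each model call has the same context regardless of which slice of the thread it sees.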
Weight aggregates by engagement: weighted_score = sentiment_score * log1p(likes) damps single-comment noise without ignoring a 10k-like pile-on.
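A post-level number can be derived from that per-comment weighting by dividing by the total weight. The sketch below is one possible reading, not a prescribed method: it assumes sentiment scores in the range -1 to 1, clamps negative engagement (Reddit scores can go below zero), and falls back to a plain mean when every weight is zero, since log1p(0) is 0. The ScoredComment type and weightedPostScore name are illustrative.

```typescript
interface ScoredComment {
  sentimentScore: number; // assumed range: -1 (negative) to 1 (positive)
  likes: number;
}

// Weighted mean of sentiment, with log1p(likes) as the weight.
function weightedPostScore(rows: ScoredComment[]): number {
  if (rows.length === 0) return 0;
  let weightedSum = 0;
  let weightTotal = 0;
  for (const r of rows) {
    const w = Math.log1p(Math.max(0, r.likes)); // clamp negative scores to 0
    weightedSum += r.sentimentScore * w;
    weightTotal += w;
  }
  if (weightTotal === 0) {
    // Every comment has zero likes: fall back to the unweighted mean.
    return rows.reduce((sum, r) => sum + r.sentimentScore, 0) / rows.length;
  }
  return weightedSum / weightTotal;
}
```

Note the side effect of the formula: a comment with zero likes carries zero weight whenever any other comment has likes, which may or may not suit a brand-new post.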
Feed LLM classifiers
Normalized JSON maps cleanly to three scoring patterns.
Keyword rules (cheap, deterministic)
Run first. Flag refund, scam, love this, competitor names before you spend tokens.
const NEGATIVE = /\b(scam|refund|broken|worst)\b/i;
for (const row of comments) {
if (NEGATIVE.test(row.text)) {
await routeToSlack({ postUrl, text: row.text, requestId: row.requestId });
}
}

Classical ML (fast at volume)
Export text + likes/score to your sklearn or Hugging Face pipeline. Train on a few hundred hand-labeled comments per vertical; rules catch spikes, the model catches paraphrases.
LLM batch classification
Send structured input — one JSON object per batch:
{
"postUrl": "https://www.tiktok.com/@brand/video/123",
"comments": [
{ "id": "c1", "text": "shipping took forever", "likes": 240 },
{ "id": "c2", "text": "ordered two more", "likes": 89 }
]
}

Ask for per-id labels (positive, negative, neutral, mixed) and optional themes. Require JSON output; parse defensively. Store the model version on each row.
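Defensive parsing can look like the sketch below. It assumes the model was asked to reply with a JSON array of { id, label } objects, which is one possible output contract rather than a fixed format; parseLabels is a hypothetical helper.

```typescript
type Sentiment = "positive" | "negative" | "neutral" | "mixed";
const SENTIMENTS = new Set<string>(["positive", "negative", "neutral", "mixed"]);

interface LabeledComment {
  id: string;
  label: Sentiment;
}

// Parse a model reply that should be a JSON array of { id, label } objects.
// Malformed JSON yields an empty list; rows with unknown ids, repeated ids,
// or labels outside the allowed set are dropped.
function parseLabels(raw: string, expectedIds: Set<string>): LabeledComment[] {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return [];
  }
  if (!Array.isArray(parsed)) return [];
  const out: LabeledComment[] = [];
  const seen = new Set<string>();
  for (const item of parsed) {
    if (typeof item !== "object" || item === null) continue;
    const { id, label } = item as { id?: unknown; label?: unknown };
    if (typeof id !== "string" || typeof label !== "string") continue;
    if (!expectedIds.has(id) || seen.has(id) || !SENTIMENTS.has(label)) continue;
    seen.add(id);
    out.push({ id, label: label as Sentiment });
  }
  return out;
}
```

Comparing the returned ids against the batch's ids shows which comments still need a retry, instead of failing the whole batch on one bad row.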
Preprocessing that usually helps:
- Strip URLs and bare @mentions if your model treats them as noise.
- Keep emoji when monitoring Gen-Z brands — stripping flattens sarcasm.
- Attach language (TikTok) or detect locale before routing to the right prompt.
- Pair with transcripts when spoken claims and comment reactions diverge.
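The first two preprocessing steps can be sketched with plain regular expressions. The patterns are deliberately simple assumptions (http/https URLs, mentions that start a token) and cleanForScoring is a hypothetical name; emoji pass through untouched.

```typescript
// Remove URLs and standalone @mentions, keep emoji, collapse whitespace.
function cleanForScoring(text: string): string {
  return text
    .replace(/https?:\/\/\S+/gi, " ") // drop http(s) links
    .replace(/(^|\s)@[\w.]+/g, "$1") // drop mentions that start a token
    .replace(/\s+/g, " ") // collapse runs of whitespace
    .trim();
}
```

Because the mention pattern requires leading whitespace or the start of the string, an address such as a@b.com is left alone.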
Roll per-comment labels into post-level scores: weighted fraction positive, or "mixed" when top and bottom quartiles disagree.
Deduplication and warehouse rows
Comment pipelines fail in the warehouse, not at the HTTP layer. Plan keys before your second cron run.
| Problem | Fix |
|---|---|
| Same comment on re-fetch | Upsert on platform + comment.id |
| Same text, different ids (rare) | Secondary hash on normalize(text) for analytics only |
| Same post scored twice in one run | Dedupe URLs before the outer loop |
| Stale sentiment in dashboards | Store capturedAt; re-pull on schedule, do not mutate old rows |
| Audit trail | Persist meta.requestId and meta.creditsCharged per page |
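The first two rows of the table can be sketched in memory before committing to a warehouse schema. The StoredComment type, the key format, and the helper names below are illustrative assumptions; a real pipeline would express the same key as a unique constraint or merge key.

```typescript
interface StoredComment {
  platform: string;
  commentId: string;
  text: string;
  capturedAt: string; // ISO 8601 timestamp
}

// Primary key: platform + comment id, as in the table above.
function commentKey(row: StoredComment): string {
  return `${row.platform}:${row.commentId}`;
}

// Keep only rows whose key has not been seen, recording new keys as we go.
function newRowsOnly(seenKeys: Set<string>, incoming: StoredComment[]): StoredComment[] {
  const fresh: StoredComment[] = [];
  for (const row of incoming) {
    const key = commentKey(row);
    if (seenKeys.has(key)) continue;
    seenKeys.add(key);
    fresh.push(row);
  }
  return fresh;
}

// Normalized text for the secondary, analytics-only duplicate check.
function normalizeText(text: string): string {
  return text.toLowerCase().replace(/\s+/g, " ").trim();
}
```

Filtering to unseen keys before classification also avoids paying for the same comment to be scored on every cron run.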
Suggested flat row for Postgres or BigQuery:
{
"platform": "tiktok",
"postUrl": "https://www.tiktok.com/@brand/video/123",
"commentId": "7596851467001119502",
"text": "shipping took forever",
"likes": 240,
"sentiment": "negative",
"themes": ["shipping"],
"capturedAt": "2026-06-30T14:00:00.000Z",
"requestId": "req_01example",
"creditsCharged": 1
}

When you merge comment sentiment back to a listening snapshot, join on postUrl or platform-native video id, not on title text, which collides across reposts.
Troubleshooting
not_found but the post opens in a browser
- Comments may be disabled on that post. The lookup still resolves on many platforms, and you get zero rows with found.
- Private, age-gated, or region-blocked media returns not_found at request time.
- Pass the canonical URL (full YouTube watch link, full Reddit /comments/ path). Tracker URLs and stripped share links fail more often.
Empty comments array with lookupStatus: found
- Legitimate — new posts, brand accounts with comments off, or cleared threads.
- Still billed — the upstream lookup completed. Budget for empty pages in cron jobs.
Pagination stops early or repeats
- Only pass cursor from the immediately previous response.
- If hasMore is true but nextCursor is null, stop and log requestId; that is an upstream anomaly worth a support ticket.
- Cap pages per post so a bug does not loop until your balance drains.
YouTube top vs newest disagree
- Run two passes with different order values if sort skew matters for your decision. That is two credit lines per page, not one.
Reddit nested replies missing from classifier input
- Top-level pagination does not always inline every nested reply on one page. Walk replies.items recursively or flatten as shown above.
TikTok id looks like a placeholder
- Some comments use deterministic ids derived from available fields. Still stable within a pull — use them for dedupe keys in that run.
lookup_failed or HTTP 503
- Not charged. Retry with backoff; include meta.requestId from the failed attempt if you contact support.
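A retry policy for these cases might be sketched as follows. The delays (500 ms base, 30 s cap) are arbitrary starting points, and where lookup_failed appears in the response is an assumption; here it is passed in as a plain errorCode string. Both function names are hypothetical.

```typescript
// Exponential backoff with a cap: base, 2x base, 4x base, ... up to maxMs.
// attempt is zero-based (0 = delay before the first retry).
function backoffDelayMs(attempt: number, baseMs = 500, maxMs = 30000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Retry only the outcomes described above as infrastructure failures.
// errorCode is an assumed field; adapt it to wherever the API reports lookup_failed.
function isRetryable(httpStatus: number, errorCode?: string): boolean {
  return httpStatus === 503 || errorCode === "lookup_failed";
}
```

A not_found result is a completed lookup, so it is deliberately excluded from the retryable set.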
Classifier drift
- Re-score a frozen comment snapshot when you change models — do not compare June scores to March scores across model versions without a calibration pass.
Billing notes
Each page is its own metered lookup (typically one credit). meta.creditsCharged on every response tells you what that page cost.
| Action | Credits (typ.) |
|---|---|
| 1 TikTok comment page | 1 |
| 1 YouTube comment page | 1 |
| 1 YouTube reply page | 1 |
| 1 Instagram comment page | 1 |
| 1 Reddit comment page | 1 |
| 10 posts × 3 pages each | 30 |
Completed lookups that return not_found (deleted video) still ran upstream and are billed. Infrastructure failures (lookup_failed, 503) are not. See Credits.
Rough planning formula: (posts you care about) × (pages per post) + (YouTube reply pages you opt into). A launch monitor on 20 posts with a 3-page cap is about 60 credits per run.
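The planning formula translates directly into a helper for budget checks before a run. This is a sketch: the RunPlan type and estimateCredits name are illustrative, and the one-credit-per-page default reflects the typical figure in the table above, not a guarantee.

```typescript
interface RunPlan {
  posts: number;
  pagesPerPost: number;
  youtubeReplyPages?: number; // extra reply pages you opt into
  creditsPerPage?: number; // typically 1, per the table above
}

// (posts x pages per post + YouTube reply pages) x credits per page
function estimateCredits(plan: RunPlan): number {
  const perPage = plan.creditsPerPage ?? 1;
  const pages = plan.posts * plan.pagesPerPost + (plan.youtubeReplyPages ?? 0);
  return pages * perPage;
}
```

For the launch monitor example, estimateCredits({ posts: 20, pagesPerPost: 3 }) gives 60 per run, so an hourly schedule would be 1,440 credits per day.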
What you can build
- Launch monitoring — score comment sentiment hourly after a product drop; alert when negative fraction crosses a threshold.
- Creator vetting — flag toxic reply patterns and recurring spam before signing a partnership.
- Support triage — route negative threads to Slack when keyword rules fire on the newest page.
- Competitive dashboards — same classifier on your posts and competitor posts discovered via search.
- Research exports — flat CSV of Reddit thread comments with themes for PM decks.
Next steps: Playground · API reference · Pricing