SeekOne Digital Droplet (SeekOne DD) scFAST-seq is a high-throughput single-cell Full-length RNA Sequence Transcriptome sequencing (scFAST-seq) kit distributed by Gentaur.

SeekOne® DD scFAST-seq is a commercial tool for high-throughput whole-transcriptome profiling. The scFAST-seq method combines semi-random primers, efficient reverse transcription, template switching, and effective rRNA removal to build full-length RNA libraries from up to 12,000 cells.

Compared to conventional 10x 3' scRNA-seq, scFAST-seq has distinct advantages: detection of non-polyadenylated transcripts, longer transcript coverage, and identification of more splice junctions.

With target region enrichment, scFAST-seq can simultaneously detect somatic mutations and cell states in individual tumour cells, providing valuable information for mutation studies in cancer research.


The key principle behind scFAST-seq, the semi-random primer, sounds simple, but it involves several difficulties that 10x Genomics has not been able to solve.

  1. Random primers capture all RNA indiscriminately, but rRNA (about 90% of the RNA) is unwanted. How can it be removed?

  2. The so-called semi-random primer is in fact 12N7K: 12 bases of unknown (random) sequence followed by 7 bases of known sequence. Determining the 7K sequence took a great many tests.
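
To make the 12N7K layout concrete, here is a minimal Python sketch that builds primers of that shape. The 7-base known sequence is not given in this post, so `HYPOTHETICAL_7K` is a placeholder invented for illustration, and the function name is an assumption as well.

```python
import random

BASES = "ACGT"
# Placeholder only: the real 7-base known sequence is not stated in the post.
HYPOTHETICAL_7K = "ACGTACG"

def make_12n7k_primer(rng, known=HYPOTHETICAL_7K, n_random=12):
    """Return one semi-random primer: n_random random bases, then the known bases."""
    random_part = "".join(rng.choice(BASES) for _ in range(n_random))
    return random_part + known

rng = random.Random(0)  # fixed seed so the sketch is repeatable
primers = [make_12n7k_primer(rng) for _ in range(3)]
for primer in primers:
    # Show the random 12N part and the fixed 7K part separately.
    print(primer[:12], primer[12:])
```

Every primer is 19 bases long and ends in the same 7 bases; only the first 12 positions vary, giving 4^12 possible random parts.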


SeekOne designed a special oligo called the Ribosomal and Mitochondrial RNA blocker, which blocks amplification of rRNA- and mtRNA-derived cDNA, so that rRNAs and mtRNAs make up less than about 5% of the final QC result.

As of 2025, 10x Genomics was still not able to do this.

Author Best Answer

 Ribosomal genes should be excluded prior to normalization in scRNA-seq as contaminants.

Do mitochondrial genes have to be excluded as well? I plotted the top 50 expressed genes for a specific dataset and they appear often (for example MT-ATP6). My assumption is that, because they serve mitochondrial function and may be highly expressed, they can dilute the signal of genes that are differentially expressed across cell types but expressed at lower levels.

It is important to distinguish between the ribosomal RNA gene (Rn45s) and the many genes that code for ribosomal proteins (whose names start with RPS, RPL, MRPS, or MRPL).

mRNAs for ribosomal protein subunits are an intermediate stage used for protein production. You might catch a few lucky ribosomal RNA (Rn45s) copies that were transcribed recently and still haven't been processed, but depending on your RNA-seq protocol, there's a chance they are acting as a functional part of the ribosome rather than an intermediate stage.

Mitochondrial reads are innately different from rRNA due to rRNA typically being excluded during normal library preparation (so any of it getting carried over is essentially a contaminant).

Some scRNA-seq programs (e.g., RaceID) can use sampling at various stages, which can break if a lot of your signal goes to mitochondrial transcripts. In that case, the reason to exclude them is not that they come from mitochondria, but that the results are heavily affected by the presence of VERY highly expressed genes, and your results are suspect if you don't remove these RNAs.

If your methods aren't affected by that (e.g., you use more robust scaling methods) or you don't have super high expression of mtRNA, then removing them is not strictly needed.

Further, sometimes you very much want to keep mtRNAs, since you might be interested in metabolic changes. Since I work in an institute partially dedicated to immunology, we often look at things like metabolic changes involved in immune activation, and removing mtRNAs would hinder that.

But if you're a priori not interested in metabolism, then excluding mtRNA or rRNA genes you don't care about isn't unreasonable.

Is this biologically sound?
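
Before deciding what to exclude, it can help to look at how much of each cell's signal falls into each class. The sketch below is only an illustration: it classifies gene symbols by the prefixes mentioned above (MT- for mitochondrial genes such as MT-ATP6, Rn45s for rRNA, RPS/RPL/MRPS/MRPL for ribosomal proteins) and reports the fraction of counts per class for one cell. The function names and the example counts are made up.

```python
def gene_class(name):
    """Classify a gene symbol using the naming prefixes discussed in this thread."""
    upper = name.upper()
    if upper.startswith("MT-"):
        return "mito"
    if upper == "RN45S":
        return "rRNA"
    if upper.startswith(("RPS", "RPL", "MRPS", "MRPL")):
        return "ribo_protein"
    return "other"

def class_fractions(counts):
    """counts: dict of gene symbol -> count for one cell. Returns fraction per class."""
    fractions = {"mito": 0.0, "rRNA": 0.0, "ribo_protein": 0.0, "other": 0.0}
    total = sum(counts.values())
    if total == 0:
        return fractions
    for gene, n in counts.items():
        fractions[gene_class(gene)] += n / total
    return fractions

# Made-up counts for a single cell, for illustration only.
cell = {"MT-ATP6": 40, "RPL13": 25, "Rn45s": 5, "CD3E": 30}
print(class_fractions(cell))
```

Prefix matching is crude: a symbol such as RPS6KA1 (a kinase, not a ribosomal protein) would be caught by the RPS prefix, so a curated gene list is safer for real filtering.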

Author

The principle of scFAST-seq
To develop a widely accepted single-cell RNA sequencing (scRNA-seq) method that captures full-length transcripts, including non-polyadenylated transcripts, we considered using random primers instead of oligo-dT to initiate reverse transcription. This allows us to capture transcripts independent of their polyadenylated tails. We also chose to use water-in-oil droplets as a partitioning reaction assay to label transcripts from individual cells with unique cell barcodes and prevent cell-to-cell contamination. This approach can be conveniently adopted on similar platforms such as inDrop and 10X Genomics. Additionally, we prioritized using short-read next-generation sequencing platforms due to their low cost. Lastly, our method should have detection sensitivity comparable to its 3’ scRNA-seq counterparts.
However, when we replaced cell-barcoded oligo-dT primers with randomer-dN6 in our previously developed 3’ scRNA-seq method, we found that the detection sensitivity was reduced almost threefold compared to 3’ scRNA-seq (737 vs 271 median genes per cell at a comparable sequencing depth; see Supplemental Figure 1). Additionally, we observed substantial reads mapping to ribosomal and mitochondrial RNA, which increased sequencing costs. We determined that the reduced sensitivity was mainly due to the low reverse transcription efficiency of randomer-dN6. To improve reverse transcription initiation, we evaluated completely random primers (dN9 and dN15) [9] and semi-random primers (5N3G/5N3T [10] or 12N7K, i.e. random 12N followed by 7 bp with known sequence) and found that 12N7K provided the best sensitivity for transcript detection (Figure 1a and Supplemental Figure 2).
bioRxiv preprint doi: https://doi.org/10.1101/2023.03.19.533382; this version posted March 23, 2023. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC 4.0 International license.
Several technologies exist for depleting ribosomal RNA (rRNA) [11], including separating rRNAs using hybrid-specific antibodies [12] or magnetic streptavidin-coated beads [13], selectively degrading rRNA using RNase H [14], and treating with duplex-specific nuclease (DSN) [10, 15]. However, none of these methods can be seamlessly applied to droplet-based scRNA-seq methods. Here we developed a convenient rRNA depletion method derived from PNA-mediated PCR clamping [16, 17]. As shown in Figure 1b, we designed probes with a 3’ non-extension blocker (3’ phosphorylation) that could perfectly hybridize with cDNA from rRNA/mitochondrial RNA (mtRNA) and mixed these probes into the cDNA amplification assay. During the annealing and elongation steps of PCR, the probes rapidly bind to cDNA derived from rRNA/mtRNA and inhibit strand elongation when using a polymerase without 5’→3’ exonuclease activity. Meanwhile, cDNA molecules not hybridized with probes can be exponentially amplified. This difference in amplification efficiency leads to a minimal rRNA percentage in the final library after several PCR cycles. As shown in Figure 1c, the proportion of ribosomal reads in total sequencing reads was effectively reduced from 30% to 10%, and the proportion of mtRNA reads was reduced from over 6% to less than 1%.
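The effect of a per-cycle difference in amplification efficiency can be sketched with a simple two-species model. This is an illustrative assumption, not the authors' analysis: blocked (rRNA-derived) cDNA is multiplied by (1 + eff_blocked) each cycle and all other cDNA by (1 + eff_free). The efficiency values below are hypothetical; only the 30% starting fraction comes from the text above.

```python
def blocked_fraction(f0, eff_blocked, eff_free, cycles):
    """Fraction of blocked-template product after a number of PCR cycles.

    f0          -- starting fraction of blocked templates (e.g. rRNA-derived cDNA)
    eff_blocked -- per-cycle efficiency of blocked templates (0 = none, 1 = doubling)
    eff_free    -- per-cycle efficiency of all other templates
    """
    blocked = f0 * (1.0 + eff_blocked) ** cycles
    free = (1.0 - f0) * (1.0 + eff_free) ** cycles
    return blocked / (blocked + free)

# Hypothetical efficiencies chosen only to show the trend.
for n in (0, 5, 10, 15):
    print(n, round(blocked_fraction(0.30, 0.8, 1.0, n), 3))
```

Under this model, even a modest efficiency gap compounds over cycles, which is why the blocked species shrinks as a share of the library without ever being physically removed.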
Given that semi-randomers can anneal and initiate reverse transcription at multiple sites on transcripts, multiple cDNA chains can be transcribed from a single RNA molecule. As a result, other methods for adding adaptors to the 3’ end of cDNA may produce higher sensitivity for RNA detection than template switching, which preferentially adds TSO sequences to the 3’ end of cDNA where the complementary RNA has a 5’ cap structure. The adaptor added to the 3’ end of cDNA serves as a PCR primer to exponentially amplify cDNA and generate enough product for library construction. We compared two methods, multiplexed primer extension [18, 19] and TdT-mediated tailing of cDNA ends [20], with template switching. Surprisingly, template switching had better performance in terms of cDNA yield and median genes per cell compared to the other two methods (Figure 1D).
In conclusion, we developed a Full-length RNA Sequence Transcriptome sequencing method (scFAST-seq) that can capture RNA independent of the polyadenylated tail of transcripts using a simple process (Figure 1E). First, fresh cells were suspended in reverse transcription master mix and then encapsulated into droplets with cell-barcoded gel beads. Each gel bead was coupled with cleavable oligos consisting of a read1 seq primer and a specific 17 bp cell barcode in front of the 12N7K sequence. RNA released from each cell was reverse transcribed to cDNA and barcoded by the corresponding semi-randomers and tailed by template switching oligos within droplets. After breaking the droplets, pooled cDNA from all droplets was purified and amplified by PCR while amplification of cDNA derived from rRNA and mtRNA was inhibited by blocking probes. The PCR product was then fragmented and ligated with a read2 seq primer before constructing the library by selectively amplifying DNA with cell barcodes using a unique double-indexed primer. Furthermore, we also developed targeted region sequencing methods based on scFAST-seq using nested PCR or biotin-probe bait enrichment to obtain high depth at low sequencing cost.

Author

Consistency and differences between scFAST-seq and 3’ scRNA-seq
To evaluate the technical performance of the scFAST-seq method, we performed scFAST-seq and 3’ scRNA-seq on mixtures of K562, A549, and HCC827 cell lines as well as breast cancer (BRCA), glioblastoma (GBM), mouse pancreatic cancer model (PAAD), and PBMC samples. As expected, scFAST-seq effectively suppressed the proportion of reads aligned to rRNA and mtRNA in total sequencing data, which was even lower than that of 3’ scRNA-seq (Figure 2A). The gene body coverage map further showed that compared to 3’ scRNA-seq with its obvious 3’ end bias, scFAST-seq had homogeneous coverage along the body of protein-coding genes (Figure 2B). Detection sensitivity was comparable between the two methods, and the number of genes detected by scFAST-seq technology was even higher in cell line mixtures and PBMC samples (Figure 2C). By calculating the correlation of average gene expression in each specific cell type, we evaluated gene expression correlation in cancer samples and found that scFAST-seq showed a significant positive correlation with 3’ scRNA-seq techniques (P value < 2.2e-16; cor > 0.8; see Figure 2D). We then applied canonical correlation analysis methods from the Seurat package to integrate and visualize both scFAST-seq and 3’ scRNA-seq data. We found that clustering and relative positions of cells in both types of data were highly overlapped in two-dimensional UMAPs (Figure 2E). Furthermore, analysis of fibroblast subtype proportions in pancreatic cancer and epithelial cell proportions in breast cancer showed that proportions of different subtypes were consistent between scFAST-seq and 3’ scRNA-seq for subtypes such as Luminal HS, Luminal AV, and Myoepithelial (Figure 2F).
Finally, we analyzed copy number variation in single cells from both scFAST-seq and 3’ scRNA-seq using the inferCNV pipeline. In human glioma, breast cancer, and mouse pancreatic cancer samples we found little difference between copy number changes detected by both methods.
Advantages of scFAST-seq in transcript analysis
The scFAST-seq technology is superior to traditional 3’ scRNA-seq in terms of uniform coverage of gene body regions without terminal preference. This provides us with the opportunity to further study RNA characteristics and functions. It is well known that a single gene can produce multiple transcripts through splicing and that different transcripts can perform different biological functions within cells. We compared traditional 3’ scRNA-seq with scFAST-seq in terms of transcription characteristics and splice junctions.
Firstly, we found that the ratio of transcription reads for lncRNA in GBM and BRCA samples using scFAST-seq was higher than with 3’ scRNA-seq. Additionally, the ratio of lncRNA transcription reads detected by scFAST-seq in PBMC was higher than with 3’ scRNA-seq (Figure 3A). To further demonstrate the higher capability to detect lncRNA, we compared published 3’ scRNA-seq data of lung cancer from 10X Genomics [21] and GEXSCOPE [22] platforms with scFAST-seq data of lung cancer (unpublished). We found that scFAST-seq could detect significantly more lncRNAs in lung cancer while maintaining equivalent expression levels for housekeeping genes (see Figure 3B). This result is not surprising because the 10X Genomics and GEXSCOPE platforms use oligo-dT primers coupled to either magnetic beads or hydrogel beads to capture poly(A)+ transcripts, while non-poly(A) transcripts including lncRNAs are omitted during reverse transcription.
We used the StringTie software to reassemble transcripts and obtain a new set of transcripts. Transcripts with an upper quartile length less than the length of all transcripts were selected for statistical analysis of their length distribution. We found that scFAST-seq could detect longer transcripts than 3’ scRNA-seq, which was consistent with the original assumptions and objectives of the scFAST-seq technique (Figure 3C). To find new junctions more sensitively, we used the STAR --twopassMode Basic method to enable more spliced reads to map to new junctions. Statistically, we found that scFAST-seq discovered a greater number of known and new splice junctions (Figure 3D) than traditional 3’ scRNA-seq. We also performed this analysis at the cellular level and found that scFAST-seq detected more splice junctions in each of six breast cancer cell types (see Figure 3F). Finally, by comparing reassembled transcripts with reference genomes, we annotated transcript types such as j, i, and u. Among six breast cancer cell types, scFAST-seq identified a significantly higher proportion of c and j transcripts than 3’ scRNA-seq, with j transcripts defined as potential protein isoforms (see Figure 3G). In conclusion, scFAST-seq provided us with more opportunities and support to study alternative splicing events and lncRNAs in disease at the single-cell level as well as more complex transcript types and functions.
Accurately inferring the direction of T cell evolution by scFAST-seq
It has been shown that when mature mRNAs are expressed, a portion of immature transcripts are spliced. When gene expression increases, an instantaneous increase in the proportion of immature unspliced transcripts is observed within the cell. Conversely, when gene expression decreases, a higher proportion of spliced transcripts is seen for a short period of time. Therefore, we calculated the ratio of spliced to unspliced transcripts in our samples and found that scFAST-seq detected more unspliced transcripts in all three samples, accounting for about twice as many as 3’ scRNA-seq (see Figure 4A).
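As a minimal sketch of the comparison described above, the snippet below computes the unspliced share of transcripts for two methods and the fold difference between them. The counts are invented for illustration and are not data from the preprint; the function and key names are assumptions.

```python
def unspliced_fraction(spliced, unspliced):
    """Fraction of transcripts (UMIs) that are unspliced; 0.0 if there are none."""
    total = spliced + unspliced
    return unspliced / total if total else 0.0

# Made-up counts for illustration only (not data from the preprint).
counts = {
    "full_length_method": {"spliced": 600, "unspliced": 400},
    "three_prime_method": {"spliced": 800, "unspliced": 200},
}

fractions = {
    name: unspliced_fraction(c["spliced"], c["unspliced"])
    for name, c in counts.items()
}
# How many times larger the unspliced share is for the full-length method.
fold = fractions["full_length_method"] / fractions["three_prime_method"]
print(fractions, fold)
```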
In recent years, the concept of RNA velocity has been proposed in the literature as an indicator to reveal dynamic changes in transcript abundance over time by evaluating the abundance of unspliced (nascent) and spliced (mature) mRNA. This is often used to study cell differentiation, lineage development and dynamic changes in tumor microenvironments. Based on our previous findings, we calculated the ratio of unspliced to spliced transcripts for each gene in single cells using both scFAST-seq and 3’ scRNA-seq data to reveal changes in gene expression and cell status in breast cancer samples. We used scVelo software to label cell state change directions on UMAPs according to RNA velocities calculated using ratios of unspliced-to-spliced counts for each gene. Our results showed that scFAST-seq could label directions for almost all cells while 3’ scRNA-seq could only label a small portion, with epithelial, endothelial and B cells remaining unlabeled (see Figure 4B).
In patients with chronic infections and cancer, T cells are continuously stimulated due to long-term exposure to persistent antigens and inflammation. This can lead to T cell exhaustion, where exhausted T cells gradually lose their effector function and memory T cell features. In breast cancer samples, we separated T cells into three subtypes: ‘naïve’, ‘effector’ and ‘exhaustion and Treg’ according to previous studies. As shown in Figure 4C, the full-length technique (scFAST-seq) can define the direction of differentiation for almost every T cell, with predicted results conforming to known changes in the three states of T cells from initial to final exhaustion. However, 3’ scRNA-seq could only predict directions for a portion of cells, with results not consistent with known directions of T cell differentiation (see Figure 4C). We also analyzed cell differentiation trajectories according to gene expression and found that scFAST-seq clearly described the branching trajectory of T cells from their origin to final exhaustion (see Figure 4D). These results indicate that scFAST-seq has obvious advantages in studying RNA velocity and inferring differentiation trajectories.
Combining scFAST-seq with target region enrichment techniques to accurately detect gene mutations and fusions
Genome instability and mutations in driver genes are hallmarks of cancer, and function-altering somatic mutations provide valuable information for basic cancer research and treatment [23]. Additionally, somatic mutations contribute to heterogeneity among tumor cells and alterations in tumorigenic signaling pathways. Therefore, there is an unmet need to detect somatic mutations in single cells. Given that sense somatic mutations can occur at any site within genes and must be translated into protein to gain function, scFAST-seq with its ability to detect full-length RNA is considered an ideal method for detecting mutations in single cells. To test this hypothesis, we equally mixed standard cell lines HCC827 (with EGFR 19del), A549 (with KRAS G12S) and K562 (with BCR-ABL fusion) and performed scFAST-seq to assess mutation detection sensitivity. As shown in Figure 5A, cell clusters were correctly identified as the three cell lines, indicating minimal cross-contamination with scFAST-seq. However, only 6.77% of A549 cells had the KRAS G12S mutation and 30.53% of HCC827 cells had the EGFR 19del mutation (Figure 5A, left panel). Further analysis indicated that lower sequencing depth and coverage contributed to this low sensitivity of mutation detection. However, increasing transcriptome library sequencing data was not an option due to high costs, so we developed two target region enrichment methods based on scFAST-seq to detect mutations with high sequencing depth at a lower cost.
Firstly, we assessed the biotin-probe bait enrichment method due to its high scalability from hundreds of genes to the entire human exome (Figure 1E, right panel) [24]. Compared with the low detection sensitivity of transcriptome-scale scFAST-seq, the G12S mutation could be detected in 28.50% of A549 cells using biotin-probe bait enrichment methods, while the percentage for EGFR 19del increased from 30.53% to 70.01% in HCC827 cells (see Figure 5A). Next, we used nested PCR, which has been applied in Cytoseq to detect low-abundance transcripts and rare cells (see Figure 1E, right panel) [25], to enrich fragments containing the KRAS G12 site and EGFR hotspots from amplified cDNA. As shown in Figure 5B, fractions of cells with reads covering specific hotspots significantly increased by 2-3 times after nested PCR enrichment. We also detected some gene fusion sites using STAR-Fusion software based on genome alignment results with panel enrichment techniques. For example, we detected the BCR-ABL1 fusion gene in the K562 cell line (Figure 5C), although its detection rate was only 2.86%, significantly lower than G12S and 19del mutations. The breakpoint of BCR occurred at the same site as visualized in the Integrative Genomics Viewer between data from scFAST-seq with enrichment and bulk sequencing technology (Figure 5D).
Taken together, these results illustrate that combining scFAST-seq with target enrichment can significantly increase sensitivity for detecting mutations in exons at the single-cell level.
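For a sense of scale, the detection rates reported above for biotin-probe bait enrichment can be turned into fold increases with a few lines of Python. The percentages are the ones quoted in the text; the variable and function names are only illustrative.

```python
# Percent of cells with the mutation detected: (before enrichment, after enrichment),
# as reported in the text for biotin-probe bait enrichment.
reported = {
    "KRAS G12S (A549)": (6.77, 28.50),
    "EGFR 19del (HCC827)": (30.53, 70.01),
}

def fold_increase(before, after):
    """Ratio of the detection rate after enrichment to the rate before."""
    return after / before

folds = {name: fold_increase(b, a) for name, (b, a) in reported.items()}
for name, fold in folds.items():
    print(f"{name}: {fold:.2f}-fold")
```

This works out to roughly a 4.2-fold gain for KRAS G12S and a 2.3-fold gain for EGFR 19del.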
Discussion
We developed scFAST-seq to provide a high-throughput and streamlined process for detecting tens of thousands of single-cell transcriptomes within 8 hours. By using barcoded semi-random primers to initiate reverse transcription, all exons independent of their location in RNA and non-polyadenylated transcripts have an equal opportunity to be analyzed. We noticed that 3′ scRNA-seq could also detect substantial amounts of lncRNA on droplet-based platforms. Further analysis revealed that oligo-dT primers could initiate reverse transcription at A-rich sequences located in the middle or at the 3′ tail of lncRNA. This enrichment of A-rich lncRNA can create detection bias in 3′ scRNA-seq, while the use of semi-random primers in scFAST-seq can alleviate this bias.
Despite its advantages, scFAST-seq still has some shortcomings that need to be optimized. For example, the detection rate of BCR-ABL fusion in K562 cells was lower than expected. Possible reasons for this include limited expression copies of fusion genes and lower efficiency of semi-random primers in capturing some regions of RNA without complementary sequences. We also noticed that the coverage rate of specific genes varies greatly depending on their expression levels. This is mainly because sequencing reads belonging to one UMI can assemble and cover a sequence of 500-1000 bp. Thus, for genes with a length of 2000 bp, at least 3 UMIs (but frequently 10 UMIs in practice) are needed to achieve 90% coverage of a specific gene in a single cell. This issue may be alleviated by increasing sensitivity through the recovery of more cDNA, as Seq-Well S^3 does [26].
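The gap between the minimum of 3 UMIs and the roughly 10 needed in practice can be illustrated with a small simulation. This is a sketch under stated assumptions, not the authors' calculation: each UMI is taken to cover one contiguous stretch of fixed length (750 bp here, inside the 500-1000 bp range quoted above) whose start position is uniformly random along a 2000 bp gene. The function names and parameter defaults are hypothetical.

```python
import math
import random

def min_umis(gene_len, frag_len):
    """Fewest fragments that could tile the gene if they were placed perfectly."""
    return math.ceil(gene_len / frag_len)

def mean_coverage(n_umis, gene_len=2000, frag_len=750, trials=500, seed=0):
    """Average fraction of the gene covered by n_umis randomly placed fragments."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        covered = [False] * gene_len
        for _ in range(n_umis):
            # Assumption: fragment starts are uniform over all valid positions.
            start = rng.randint(0, gene_len - frag_len)
            covered[start:start + frag_len] = [True] * frag_len
        total += sum(covered) / gene_len
    return total / trials

print("minimum UMIs with perfect tiling:", min_umis(2000, 750))
for n in (3, 5, 10):
    print(n, "UMIs -> mean coverage", round(mean_coverage(n), 3))
```

Because randomly placed fragments overlap each other and rarely reach the ends of the gene, coverage grows much more slowly than perfect tiling would suggest, which is consistent with the qualitative point made above.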
In summary, scFAST-seq has several advantages over 3′ scRNA-seq. These include better detection of non-polyadenylated transcripts, longer coverage of transcripts, identification of more splice junctions, and more accurate prediction of cell differentiation direction. When combined with targeted region enrichment, scFAST-seq has greater potential to detect randomly occurring mutations in exons at the single-cell level. Overall, scFAST-seq is a desirable alternative to 3′ scRNA-seq, especially considering its comparable sensitivity for mRNA detection, sequencing cost and experimental workflow.