Sparse is Enough in Scaling Transformers (aka Terraformer) | ML Research Paper Explained


Yannic Kilcher
Published at : 03 Dec 2021


#scalingtransformers #terraformer #sparsity

Transformers keep pushing the state of the art in language and other domains, mainly due to their ability to scale to ever more parameters. However, this scaling has made it prohibitively expensive, in both compute and memory, to serve a large number of inference requests with a Transformer. Scaling Transformers are a new kind of architecture that leverages sparsity in the Transformer blocks to massively speed up inference. By including additional ideas from other architectures, the authors create the Terraformer, which is fast, accurate, and consumes very little memory.

OUTLINE:
0:00 - Intro & Overview
4:10 - Recap: Transformer stack
6:55 - Sparse Feedforward layer
19:20 - Sparse QKV Layer
43:55 - Terraformer architecture
55:05 - Experimental Results & Conclusion

Paper: https://arxiv.org/abs/2111.12763
Code: https://github.com/google/trax/blob/master/trax/examples/Terraformer_from_scratch.ipynb

Abstract:
Large Transformer models yield impressive results on many tasks, but are expensive to train, or even fine-tune, and so slow at decoding that their use and study becomes out of reach. We address this problem by leveraging sparsity. We study sparse variants for all layers in the Transformer and propose Scaling Transformers, a family of next generation Transformer models that use sparse layers to scale efficiently and perform unbatched decoding much faster than the standard Transformer as we scale up the model size. Surprisingly, the sparse layers are enough to obtain the same perplexity as the standard Transformer with the same number of parameters. We also integrate with prior sparsity approaches to attention and enable fast inference on long sequences even with limited memory. This results in performance competitive to the state-of-the-art on long text summarization.

Authors: Sebastian Jaszczur, Aakanksha Chowdhery, Afroz Mohiuddin, Łukasz Kaiser, Wojciech Gajewski, Henryk Michalewski, Jonni Kanerva
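
To make the "sparse layers" idea from the abstract a little more concrete, here is a minimal sketch of a sparse feedforward layer in plain Python. It rests on assumptions that are not stated in this description: the hidden units are split into equal blocks, a controller scores every unit, and only the highest-scoring unit in each block is evaluated. The names sparse_ff, dense_masked_ff, controller and block_size are hypothetical and chosen for illustration; this is not the Trax implementation.

```python
import random


def matvec(x, w, cols):
    """Return [sum_i x[i] * w[i][c] for c in cols] for a row-major matrix w."""
    return [sum(xi * row[c] for xi, row in zip(x, w)) for c in cols]


def pick_winners(scores, block_size):
    """Index of the highest-scoring unit in each block of block_size units."""
    return [
        max(range(start, start + block_size), key=lambda j: scores[j])
        for start in range(0, len(scores), block_size)
    ]


def sparse_ff(x, w1, w2, controller, block_size):
    """Feedforward pass that only touches one hidden unit per block.

    Assumed shapes: x is d_model, w1 and controller are d_model x d_ff,
    w2 is d_ff x d_model. The controller is a dense matrix here for
    simplicity, so this sketch does not save compute on the scoring step.
    """
    d_ff, d_model = len(w1[0]), len(w2[0])
    scores = matvec(x, controller, range(d_ff))
    winners = pick_winners(scores, block_size)
    # Only the selected columns of w1 and rows of w2 are read.
    hidden = [max(h, 0.0) for h in matvec(x, w1, winners)]  # ReLU
    return [sum(h * w2[j][k] for h, j in zip(hidden, winners))
            for k in range(d_model)]


def dense_masked_ff(x, w1, w2, controller, block_size):
    """Reference: compute every hidden unit, then zero the non-winners."""
    d_ff, d_model = len(w1[0]), len(w2[0])
    scores = matvec(x, controller, range(d_ff))
    keep = set(pick_winners(scores, block_size))
    hidden = [max(h, 0.0) if j in keep else 0.0
              for j, h in enumerate(matvec(x, w1, range(d_ff)))]
    return [sum(hidden[j] * w2[j][k] for j in range(d_ff))
            for k in range(d_model)]


def random_matrix(rng, rows, cols):
    """Matrix of standard normal samples, as a list of rows."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
d_model, d_ff, block_size = 8, 16, 4
x = [rng.gauss(0.0, 1.0) for _ in range(d_model)]
w1 = random_matrix(rng, d_model, d_ff)
w2 = random_matrix(rng, d_ff, d_model)
controller = random_matrix(rng, d_model, d_ff)

sparse_out = sparse_ff(x, w1, w2, controller, block_size)
dense_out = dense_masked_ff(x, w1, w2, controller, block_size)
```

Both functions return the same vector, but sparse_ff evaluates only d_ff / block_size hidden units (4 of 16 in this example). That is where an inference speedup could come from, provided the controller itself is cheap to evaluate.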

Links:
TabNine Code Completion (Referral): http://bit.ly/tabnine-yannick
YouTube: https://www.youtube.com/c/yannickilcher
Twitter: https://twitter.com/ykilcher
Discord: https://discord.gg/4H8xxDF
BitChute: https://www.bitchute.com/channel/yannic-kilcher
LinkedIn: https://www.linkedin.com/in/ykilcher
BiliBili: https://space.bilibili.com/2017636191

If you want to support me, the best thing to do is to share out the content :)

If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
SubscribeStar: https://www.subscribestar.com/yannickilcher
Patreon: https://www.patreon.com/yannickilcher
Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n
