Wednesday, May 25, 2022

Intel Arc Gaming GPU Architecture, XeSS vs. DLSS & FSR, & Ray Tracing

Intel today is officially detailing some of the architecture information for its upcoming GPUs. A few days ago it announced its Arc GPU line (that's Arc with a "c", not Intel Ark with a "k", so it's a different one), but now they're getting into some of the basic architectural details of upcoming products like the ones called Alchemist, Battlemage (these are actual names), Druid, and Celestial, and we'll talk about that more in a moment. This is sort of the first look as we approach a real launch for Intel GPUs on discrete cards. There's the DG1, and we already looked at that, but this is going towards DG2, which will be the high-performance gaming cards that will ultimately compete with NVIDIA and AMD on the market.

Before that, this video is brought to you by Squarespace. We use Squarespace for our own GN store and juggle complex multi-piece orders all the time with it. Squarespace makes it fast for us to roll out new products with detailed pages full of galleries, videos, and descriptors. It's also useful for your own resume sites, for photographer or project portfolios, or for starting your new small business idea. There's never been a better time to try and start your new business than right now, and we can vouch that Squarespace makes it easy. Visit gamersnexus to get 10% off your first purchase with Squarespace.

In this news roundup, we're going to be going through some of the initial architectural details. We'll also be talking about Intel XeSS, which is maybe comparable to DLSS in some ways; that's an Intel technology that's coming out alongside its GPUs. It's supposed to be somewhat universal, similar to AMD FSR in that sense. We'll also be going through naming conventions, code names, things like that. This is stuff everyone's going to need to learn as Intel coverage ramps.

A couple of key points for today. Of Intel's GPUs, HPG is the one that you all will be most interested in if you're in the DIY enthusiast gaming audience. Those would be the GPUs previously code-named DG2, on actual discrete video cards, as opposed to Iris Xe-LP, for example, or Iris Xe Max or DG1, whatever existed previously. These DG2 GPUs will include things like ray tracing hardware, and they will come alongside XeSS, so you're getting some of the modern-day premier features that both AMD and NVIDIA now have. AMD doesn't do DLSS, though; they have their own thing called FSR, which runs on NVIDIA as well and presumably will run on these Intel GPUs, because it works on IGPs. So they're starting to get a bit more cross-platform in the non-NVIDIA camp, which is now two-thirds of the market as Intel rolls towards production, as opposed to one half. That'll be a big change.

Intel today outlined its goals, very modest ones, of being not only peak quality for images but also peak performance, so going for both of the extremes. The two things that you need to be good at, they want to be good at both and not compromise on one. We can respect the desire, but obviously we'll need to test the cards once they come out. They're not here yet.

Let's start with the naming conventions on the code names. First off, Intel, somewhat annoyingly, has named its GPU series Arc. We don't have a problem with the name Arc; it's actually kind of a cool name. But the name itself is one that Intel has already used, so they liked it so much that they're using it again. If you Google search "Intel Arc", Google will correct you and say "Did you mean Intel Ark?", and all the top results will be for Intel's extremely useful product database called Ark. Either way, it's a different thing. We'll get over this one. It's at least not 10980XE or i7-1165G7. Is that a real... I don't know if that's a real product name or not at this point. I think I made it up. Anyway, Arc is at least something we can say.

Within Arc, there are four known revisions coming out, and someone at Intel has been playing D&D, or maybe MTG Forgotten Realms, recently, because the code names for the four products are, in order of release, Alchemist, Battlemage, Celestial, and Druid. Fortunately, there is no Necromancer class named, even though they're reviving their attempts at making GPUs. Hopefully there doesn't have to be a Necromancer class named, although if that's all it takes to make a GPU come to life, then we're all for it, I guess, at this point.

There's one key oversight in this whole Intel presentation, and we're going to out-nerd Intel for a moment. Alchemist, Battlemage, and Druid are all nouns. Clearly these are medieval fantasy classes of some kind. Celestial is an adjective, not a noun, and is therefore not a class. You can be an alchemist or battle mage or druid. You can't be a celestial unless there's something else after it: celestial cleric or a tempest. But celestial is ultimately a modifier of some other thing, some other species, hippogriffs or elves or minotaurs or something of that nature, not a class. And that mostly recaps the chapter that was devoted to celestials in the D&D 3.5 Edition Monster Manual, revision 1.1. So as you can see, Intel's choice of Celestial is obviously out of line with the rest of these and is an outlier, and hopefully this is something that Intel will correct in its errata when it launches the updated version of the Intel Monster Manual that will be the guide for Intel's upcoming GPUs.

Let's move on to something decidedly less important and talk about architecture and product details. Intel has a few more names that we'll need to get familiar with as we get into GPU coverage. In addition to the product-level code names that we just went over, Intel also has Xe-HPG, HPC, HP, and LP, and all of these sets of two or three characters are also important to know.

Intel Xe-HPC is specifically focused on high-performance computing, so it is a compute-focused solution and therefore focuses on compute efficiency specifically. Intel Xe-HP is for scalability; this might be something more like data center, for example. Intel Xe-LP is low power, and this variation we've actually already tested in the form of the Intel DG1 card (that's its code name). We have a whole video on that, which talks a little bit about the architecture as well. We'll link that in the description below if you want to know more about what's behind the existing Intel Xe architecture that's already out on the market.

Then, beyond LP, there's also Intel Xe-HPG. That's the main one we're talking about: high-performance gaming. Intel's slide here shows arrows from Xe-HPC, HP, and LP all leading into HPG. The company's intent with this is to say that it has learned from the development of each of these other products to then build the gaming product, which is somewhat standard.

Now we can get into some of the new information. XeSS is actually pretty interesting. In some ways you can think about it as an analog to DLSS. It's somewhat related to FSR, but from what we know today it's closer to DLSS than to FSR. XeSS is just Intel Xe and then Super Sampling; that's all that means. XeSS is a software solution. It reconstructs subpixel data by interpolating information from the pixels surrounding whatever's getting filled in, whatever's getting the XeSS treatment. So we're working from neighboring pixels here to then interpolate data down to the subpixel level, similar again to DLSS. Canonically, in the pipeline, this would therefore happen before post-processing, and it would happen after rasterization and after lighting. It would also account for motion vectors.

It also uses some historical information. What we know with this is that Intel is working with an AI-researched or deep-learned set of historical data, or at least it can, in order to use XeSS. In the part shown in this slide, at least, frame N and velocity is where we get the motion vector information.

Ultimately, this is a bit different than AMD FidelityFX Super Resolution, because AMD FSR, unlike DLSS, doesn't account for motion and doesn't use deep learning. In some ways this is good: it makes AMD FSR a little more flexible and easier to deploy rapidly, and it doesn't require any sort of training. But it does limit the extent to which image quality can be improved with FSR. You get into maybe a debate over good enough versus better than good enough. How much does that matter? It depends on what you're doing.

However, Intel had, for whatever it's worth (which is not a lot here), some side-by-side demos in its presentation. These are obviously completely useless right now because it's basically an exercise in futility. This was a demonstration done over the internet. Even in person, in the same room as the presentation, we can never get any value out of these. Through a compressed feed, captured and re-rendered at least once or twice, not in real time, we really have no idea what either image is supposed to look like to the original human player.

The demo, however, is running on Intel Alchemist, which is the most immediate HPG launch. It uses XMX hardware acceleration; that is an Intel acceleration built on the Intel Xe Matrix Engines. You are going to be hearing that a bit more as these cards come out. Ultimately, quality is something we'll have to evaluate once it all launches.

Intel XeSS uses the AI assistance to help improve performance, just like DLSS does, and it does so by rendering at a lower native resolution and then bringing it up to a higher effective resolution. Intel claims "up to two times performance boost" with the XeSS solution, and it made a statement about lower-end cards using XeSS to render at an effective 4K. The company then stated that the GPU can "run smoothly at 4K." Just to be really clear here, it's not technically 4K. It's not actually 4K.
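To put rough numbers on "effective" versus native resolution, here is a small Python sketch. The 1920x1080 internal render resolution and the frame times are purely assumptions for illustration, since Intel did not state in this presentation what resolution XeSS renders at, and the function names are invented here.

```python
def pixel_count(width: int, height: int) -> int:
    """Number of pixels rendered per frame at a given resolution."""
    return width * height


def fps_from_frame_time(frame_time_ms: float) -> float:
    """Convert a frame time in milliseconds to frames per second."""
    return 1000.0 / frame_time_ms


if __name__ == "__main__":
    # 3840x2160 is the output ("effective 4K") resolution.
    # 1920x1080 is a hypothetical internal render resolution, not an Intel figure.
    native_4k = pixel_count(3840, 2160)
    internal = pixel_count(1920, 1080)
    share = internal / native_4k
    print(f"Internal render covers {share:.0%} of the pixels of native 4K")

    # Hypothetical frame times: an "up to 2x" performance claim is the same
    # thing as cutting frame time in half.
    before_ms, after_ms = 40.0, 20.0
    print(f"{fps_from_frame_time(before_ms):.0f} FPS -> "
          f"{fps_from_frame_time(after_ms):.0f} FPS")
```

Under these assumed numbers, the GPU renders only a fraction of the pixels of a true 4K frame, which is where the performance headroom comes from and why "effective" is the operative word.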

The GPU, whatever they're using, can "run smoothly," whatever that's supposed to mean, but we get the intent: it's with better frame time consistency. If it can run smoothly at the effective resolution, sure, that's fine, but presumably you couldn't run it smoothly at the actual native 4K resolution; otherwise you wouldn't be using the technology. So we just want to be really clear on that with everyone and make sure that we're all on the same page.

You're not really supposed to get better-than-native quality out of these solutions. Sometimes it can certainly be argued; Cyberpunk had a few scenarios where we saw it with DLSS, and that's mostly a result of how the image was built originally. But that's not the intent of them. They're just supposed to help you get better performance out of a lower native resolution in actuality. So the word "effective" is key, and Intel dropped it there.

Now, Intel explicitly mentioned cross-compatibility with the rest of the industry. This is really important, especially because Intel is not in first place. NVIDIA can get away with doing more proprietary stuff. What else are you going to do? You either need NVIDIA solutions or you don't, and if you need them, they're the only one who builds it, so it kind of makes sense. AMD leans a little more towards being somewhat open; FSR kind of is in this camp, and FreeSync, for example. And Intel, it looks like, is posturing to follow AMD's lead here and also be somewhat open. They both have to do this.

But Intel in particular has to do this, because they'll never get traction with anything if it's completely proprietary with a market share of zero percent. That's where they're starting. It's not AMD, where they've got some percent; it's actually zero. So this makes sense. It's good to see that Intel is aware of its position and that it is enabling cross-compatibility. How good that'll be, we're not sure, but XeSS was stated as having "super sampling support for a wide range of hardware across the industry," mentioning competitors, meaning NVIDIA and AMD, and that includes the neural network element of XeSS. What we're saying, we're assuming that's maybe a reference to FSR, but we're not positive. Intel XeSS will also work on Intel IGPs that exist today. We assume it would work on APUs as a result, and it's also DP4a enabled.

Intel's slides are mostly useless marketing slides because they don't have an axis, you know, the kind that normally has numbers and a scale. But they give us at least something to maybe interpret, which is that, from these slides, Intel seems to expect XeSS with XMX acceleration to reduce frame render time from native 4K to interpolated 4K by about half the original time. The other XeSS slide shows DP4a and XMX jointly, with DP4a taking an indeterminate amount more time, based on the numberless axis. Intel says to expect XeSS via SDK later this month and says that the DP4a support will be available by end of year.

So then we get back to some of the open source stuff. Intel made big claims here, saying of Intel, "we believe in open source standards," which is something that everybody says and few actually reinforce over time with actions. In fact, our raw notes, which we wrote during the presentation, just said "yeah, okay" after that quote. But the company's posturing, at least, to maybe go this direction with XeSS. The big thing to remember is that supporting open source standards doesn't mean making your code open source. Maybe they will, maybe they won't. Ultimately, the part that actually matters is how easy they make it for developers to actually implement and work with.

Open sourcing the code may not be necessary, and we don't really begrudge companies for not wanting to open source code, especially if they're worried about competitors. But depending on how easy it is to work with, that might be more necessary than it otherwise would be. Intel plans to get the SDKs and the tools out there for everybody, and it's going to need to do that.

But let's move into some of the architecture stuff. There's a little bit of architecture we can get into here. Our Intel DG1 piece talks about some of the basics, again with Xe-LP and Iris Xe Max products, things like that. We will link that below if you want to check it out. But some more information today.

First up, Intel intends to ditch the naming of EUs going forward, at least for this product line, and it stated: "We defined a new compute building block which serves as a foundation for Xe architecture. As part of this, we're updating naming. We won't be talking about execution units or EUs much anymore. EUs are getting too large to reason about, and generational changes make it tough to compare."

This all makes sense. It's pretty basic stuff in the sense that, generationally, the building blocks of a GPU are not the same as they used to be, even if the naming stays the same. You look at the structure of an EU, and it's kind of like what NVIDIA maybe more famously dealt with going from Maxwell to Pascal, where the efficiency, as in the performance per watt, improved so much with Pascal as they moved further and further away from the 400 series that one SM really didn't equal one SM anymore. That changes every generation, but it was particularly noticeable in this generation. AMD goes through this as well. Equal SM and CU counts across architectures don't necessarily... well, they absolutely don't give you the full picture. They don't necessarily give you a picture of performance, but if you have enough numbers, you can start filling it in and figuring it out, working your way through it. Just by SM or CU count, though, and with nothing else, cross-architecturally, it's not very useful, and the same is going to be true for EUs. That's all Intel's saying here, and that's why they're ditching the EU naming.

They've instead resorted to a new architectural building block name, which is simply Xe core. Same sort of basic idea, same container on a block diagram, new name. Xe cores will be used in Xe-HPG GPUs; that's going to include DG2 and everything coming to the gaming market that our audience will care about.

Each Xe core includes arithmetic units in the form of 16 vector engines. There are also 16 matrix engines; this is where XMX acceleration comes from, which would be the matrix side of things. Those are 256 bits wide for the vector engines and 1024 bits for the matrix engines. The Xe core also includes the usual load and store logic and cache, like L1, for example.

The new architecture is ultimately meant to build larger GPUs than what Xe-LP currently supports, clearly, which is a maximum of six Xe cores in Xe-LP. The rest of the terminology is familiar from our previous DG1 coverage, though. The render slice (that word comes back) is sort of the top-level unit, similar to maybe a GPC or something in concept, and it contains a maximum of four Xe cores, which contain the vector engines, the matrix engines, the load and store units, and the cache that we just talked about. The render slice also contains fixed-function hardware; that would include things like samplers, geometry processors, and the rasterizer, alongside the ray tracing hardware. The RT hardware supports the existing functions that we would expect. You've encountered these words if you've watched previous architecture pieces for AMD or NVIDIA: those would be bounding box intersection, triangle intersection, and ray traversal.

This all can then get scaled up similar to any other GPU, where the highest-ish level building block, a render slice for Intel, is multiplied X number of times up to a maximum count for larger silicon. These are then attached to the shared L2 cache, and that is done via what Intel calls a high-bandwidth memory fabric. Intel HPG can support up to eight slices; this would be a maximum of 32 Xe cores.
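The scaling described above is simple multiplication, and a short Python sketch makes the totals easy to check. The per-core and per-slice figures are the ones quoted in this piece; the constant names and the `hpg_totals` function are made up for illustration and are not Intel terminology or tooling.

```python
# Figures quoted in this piece for Xe-HPG; all names here are illustrative.
XE_CORES_PER_SLICE = 4        # maximum Xe cores per render slice
MAX_SLICES = 8                # maximum render slices on Xe-HPG
VECTOR_ENGINES_PER_CORE = 16  # 256-bit vector engines per Xe core
MATRIX_ENGINES_PER_CORE = 16  # 1024-bit matrix (XMX) engines per Xe core


def hpg_totals(slices: int = MAX_SLICES) -> dict:
    """Return unit counts for a hypothetical Xe-HPG part with `slices` render slices."""
    if not 1 <= slices <= MAX_SLICES:
        raise ValueError(f"slices must be between 1 and {MAX_SLICES}")
    xe_cores = slices * XE_CORES_PER_SLICE
    vector = xe_cores * VECTOR_ENGINES_PER_CORE
    matrix = xe_cores * MATRIX_ENGINES_PER_CORE
    return {
        "xe_cores": xe_cores,
        "vector_engines": vector,
        "matrix_engines": matrix,
        "arithmetic_units": vector + matrix,
    }


if __name__ == "__main__":
    for name, count in hpg_totals().items():
        print(f"{name}: {count}")
```

Passing a smaller slice count, for example `hpg_totals(4)`, gives the counts for a hypothetical cut-down part; Intel has not announced which configurations will ship.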

We're not sure which number Intel will decide to market eventually. It could say "32-core GPU," it could say 512 vectors or 1024 arithmetic units. Whatever it is, it's going to be the number that is the highest, of course; that's how marketing works. So we'll see what they end up using as their standard marketing number for cores or stream processors as an equivalent. But Xe cores would be maybe more similar to saying SMs or CUs or something, as opposed to CUDA cores.

The Xe core also changes between the GPUs, and this is also not anything new; NVIDIA does or has done the same thing. An Xe core for HPG is what we just described, but an Xe core for HPC, the high-performance computing option, would instead run eight vector and eight matrix units, and it would run 512-bit vector engines and 4096-bit matrix engines. L1 cache balloons to 512 KB on these, which was specifically done to help with those larger data sets you're going to encounter with HPC-type work.

On the Intel Xe-HPC GPUs, each vector engine will be able to run 256 FP32 or FP64 operations per clock, or they've got listed 512 FP16 operations per clock. To elaborate here, FP16, or 16-bit floating point, is more useful where you don't need as much precision. Something like that might be deep learning, where you're maybe processing, say, millions or a billion images; it doesn't matter quite as much if you get something wrong, because you have so much other data to back it up. FP64 would be double precision. That would be more useful for specific scientific simulations where you can't be wrong, because bad things would happen if you are. So if you're simulating, say, the design of an aircraft, you might want additional precision. It all depends on the software you're using, the data set you're using, and what goes into doing that processing. If it's a scientific data set that is instead processing millions or billions of samples, you do need a bit more speed instead of the precision, and it all depends on what you're doing.

Now, for the gaming audience, none of this really matters. Most of gaming is still heavy FP32, and every now and then you'll get some integer and some FP16, depending on if you're using any of the software that NVIDIA especially has put out in the past.

For the rest, the XMX engines can do 2048 TF32 or 4096 FP16 or BF16 operations per clock, and up to 8192 INT8 operations. HPC cards will still contain RT hardware: they will have 16 RT units per HPC slice, and each slice contains a maximum of 16 Xe cores, for a maximum of 8 MB of L1.

Intel spent some time as well talking about power efficiency and pushing the frequency higher. These are things where we don't really have hard numbers for what Intel's talking about right now. They just sort of said, "this is stuff that we focused on, and here's some sort of vague scaling numbers." Those vague scaling numbers, for what they're worth (you can kind of multiply from them): they claimed a 1.5x increase in performance per watt over Xe-LP discrete, which would be the DG1 we looked at, and the same for frequency, 1.5x. So yeah, I could calculate the frequency, especially from DG1, but we'd rather just wait until it comes out and see what it says.

As for the rest, Intel noted it specifically worked on process technology improvements with the researchers at TSMC. TSMC, being the fabricator here, is the one that is ultimately responsible for a lot of the details of the silicon, and Intel's using TSMC's N6 process, which TSMC says has 18% higher transistor density than its existing flagship 7-nanometer process. So N6 is an advancement of its existing 7-nanometer node, which would be N7, the main one. This is much different, actually, than the Intel CPU situation right now, where Intel, for its CPUs, is still making them itself, or at least most of the parts of the platform at this point; some of the chipsets have gone off elsewhere. But because Intel is not making the actual wafer and the silicon for this, TSMC is ultimately largely responsible for the advancements in power efficiency. Intel contributes to this; it's not like they're working in a vacuum.

But the point of bringing this up is that there's sort of a meme in the industry, especially in the DIY industry, that Intel is high power consumption, runs overly hot, things like that. It's kind of flipped from what it used to be. And while you can somewhat generously apply that meme to Intel CPU products right now and likely be accurate, for GPUs we really don't know, and it's not fair to apply the meme in the same place there until we see something come out, because Intel's not making it, and it's ultimately based on what TSMC does. Sometimes that stuff runs hot; it depends on what their partner packs into the silicon, the design, and obviously the cooling solution. But we'll see how it looks when it comes out. We'd have a bit more hope for it, though, than if it were made with Intel's existing technology.

So that's going to be it for the Intel GPU side of news right now. There was a little more that Intel spoke about; most of it was getting really far into the data center and enterprise side of things. It is genuinely interesting, and we would like to talk about it, but we'll probably split that out into a separate piece, maybe in a hardware news roundup, because that's getting pretty far away from anything consumer. Very interesting, though. Just to give a quick recap, they talked about bridging the GPUs and Xe Link. Basically, you're looking at using EMIB to have multiple stacks of Xe GPUs. Pretty cool stuff, but we'll save that for another time.

That's it, though, for Xe-HPG and some of the basics on HPC, and we'll be doing a lot more with this stuff. The biggest thing here that everybody needs to keep in mind, especially on the review side but viewers as well, is that the nomenclature and naming is completely different than anything we've really covered before. Intel DG1 was an entry to that, but all the terminology is different. No one's really used Intel Xe naming as it exists right now for this upcoming set of consumer gaming cards, so there's going to be a learning curve, and we'll be publishing some additional architecture pieces as we ramp into the Intel DG series coverage, because we're all going to need to be on the same page. It's not as easy as just saying CU or SM and assuming most people probably have an idea of what that is, so we'll be publishing more on this. You can subscribe for all of that as it comes out. Go to or directly and check back for the Intel Alder Lake architecture piece as well, which we are also working on. Thanks for watching. We'll see you all next time.

