What are the basics of SG Procedures for data visualization?

In this program, I have a PROC MEANS step highlighted. This step uses the data set SASHELP.BASEBALL. We are grouping the data by team and analyzing the number of runs and the number of home runs, summing up those two numeric columns. This PROC MEANS step produces this report. What I see is a table. The first column shows me the team name. In the far right column, I can see the two sums. The first number represents the number of runs.

The second number represents the number of home runs. I want to know which team did the best or, the opposite, which one didn’t do so well. As I scroll this table, it’s a little hard to see who did the best. This is a perfect example: instead of creating a summary table, let’s represent the summary in a graph. Here, I have a PROC SGPLOT step highlighted. Again, with the SAS data set SASHELP.BASEBALL, I have two VBAR statements. VBAR stands for vertical bar chart. I want to create a bar chart by team, so each team will be a different bar. The first VBAR statement is going to show me the sum of the number of runs for each team. In addition, I’m specifying a data label to be at the end of the bar.

In the second VBAR team statement, I’m specifying that I want to see the total number of home runs. In addition, I’m specifying a bar width of 0.4, which is going to give me a skinnier bar, and I’m also adding that data label option. We’ve also wrapped the PROC SGPLOT step in ODS Graphics statements. The first ODS Graphics statement specifies that we want the graph to have a width of 10 inches and a height of 4 inches. At the end of the step, a second ODS Graphics statement resets those options. Here is the result of the PROC SGPLOT step. I see my graph. I see blue bars, and I see red bars. The blue bars represent the total number of runs for each team. The red bars represent the total number of home runs for each team.
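The program described above is not reproduced in this transcript, so the following is only a sketch of what it might look like. It assumes the `Team`, `nRuns` (runs), and `nHome` (home runs) columns of SASHELP.BASEBALL, and the exact options used in the demo may differ.

```sas
/* Summary table: sum of runs and home runs for each team.         */
/* Assumes nRuns and nHome are the runs and home-run columns.      */
proc means data=sashelp.baseball sum;
    class Team;
    var nRuns nHome;
run;

/* Set the graph size before the plotting step. */
ods graphics / width=10in height=4in;

/* Same summary as a bar chart: two overlaid VBAR statements. */
proc sgplot data=sashelp.baseball;
    /* Total runs per team, labeled at the end of each bar */
    vbar Team / response=nRuns stat=sum datalabel;
    /* Total home runs per team, drawn as a narrower bar on top */
    vbar Team / response=nHome stat=sum barwidth=0.4 datalabel;
run;

/* Restore the default ODS Graphics options. */
ods graphics / reset;
```

Because the second VBAR statement uses a narrower bar, both totals stay visible when the bars are overlaid.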

It’s very easy now to see who did the best. If we look at the blue bars, New York has the most runs at 1,459. Second place would go to Chicago at 1,170. If we look at which team has the lowest number of runs, that’s going to go to Atlanta. As far as home runs go, no surprise, New York and Chicago are at the top of the list. But if we look at which team has the fewest home runs, that’s going to go to St. Louis. The purpose of working through this scenario was to motivate why we do data visualization. At times, it is just easier to understand your data if it is in a graph. We have now seen an example of representing your data in graphical form.

I like to show people how, with a little code, you can get a graph. And with a little persistence, you can get that graph looking the way you want. Today’s presentation is about SG Procedures, which are part of ODS Graphics. Before I say anything more about SG Procedures, I want to give a little background about creating graphs in SAS. You have options. For example, if you use Enterprise Guide or SAS Studio, you can use point-and-click tasks to create graphs. As another option, you can use SAS Visual Analytics, an interactive application for reporting, data exploration, and analytics. But maybe you don’t want to do point and click. Maybe you are a SAS programmer, and you want to write code.

If that is the case, you really have two options: SAS/GRAPH or ODS Graphics. SAS/GRAPH has been around a long time, as long as I have been using SAS. SAS/GRAPH is considered device-based, and you must have a license for that product. You are using SAS/GRAPH if you use a graphing procedure that starts with the letter G, such as PROC GCHART, GPLOT, GMAP, GREPLAY, and so on. ODS Graphics is newer but has been around since SAS 9.2. All you need is Base SAS to use ODS Graphics. The technique is template-based, meaning that behind the scenes, the Graph Template Language is being used. You are using ODS Graphics if you use a graphing procedure that starts with the letters SG, such as PROC SGPLOT, which we looked at in the last demo. If you are new to writing graph code, I strongly suggest you use ODS Graphics, due to the straightforward nature of the concepts and syntax. For the remainder of this presentation, I want to focus on the five SG Procedures used for creating specific graph types.

The procedures that I will talk about today are SGPLOT, SGPANEL, SGSCATTER, SGPIE, and SGMAP. To cover these procedures, I will go to my SAS Studio session, and I will submit the PROC steps. I will share with you the purpose of each procedure and the syntax that is needed. As I demonstrate the procedures, realize that I will be using SASHELP data sets. That means you can enter any of the syntax I show you into your own SAS session, whether that is SAS Studio, Enterprise Guide, or the SAS windowing environment. Feel free to pause the recording as we go along to give yourself plenty of time to practice.

I am back in my SAS Studio session, and I want to start by looking at the syntax of PROC SGPLOT in more detail. SGPLOT is the most commonly used of the procedures. SGPLOT is used to create single-cell graphs. Within one cell, you can create one plot, or you can overlay numerous plots. Let’s look at how simple it is to create a graph. I need a PROC statement, and I need a RUN statement. Between these two statements, at a minimum, I need one plot statement. I am going to add a VBAR statement for the team. Running that simple PROC step produced this bar chart. I see a bar for every team. Each bar represents the number of players on that team. I can always add syntax to a plot statement. In this case, I’ve added a forward slash to indicate that options follow.
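As a minimal sketch of the step just described, here is a PROC statement, one plot statement, and a RUN statement. It assumes `Team` is the team column in SASHELP.BASEBALL.

```sas
/* Minimal SGPLOT step: one plot statement between PROC and RUN. */
/* With no response variable, each bar shows the frequency,      */
/* that is, the number of rows (players) for that team.          */
proc sgplot data=sashelp.baseball;
    vbar Team;
run;
```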

Instead of a frequency count of the players, I want to sum up the number of runs. Notice the difference. When you look at the y-axis, you now see the runs for 1986 rather than a frequency. I’ve added more options. For the fill of my bars, I want the fill attributes to have a color of dodger blue. I do not want there to be an outline around each bar. I want there to be a data label at the end of each bar. That data label will represent the sum of the number of runs, and for the data label attributes, I want the weight to be bold. The change is easy to see. I got dodger blue for the color of the bars, no black outline around the bars, and at the end of each bar, I see a data label. That data label represents the total number of runs for that team. That number is in bold. I have added a second VBAR statement.
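The single VBAR statement with its options, as described above, might be written along these lines. This is a sketch, not the presenter’s exact code; it assumes `nRuns` is the runs column and that `dodgerblue` is accepted as a color name in your SAS session.

```sas
proc sgplot data=sashelp.baseball;
    vbar Team / response=nRuns stat=sum       /* sum runs instead of counting rows */
                fillattrs=(color=dodgerblue)  /* bar fill color                    */
                nooutline                     /* no outline around each bar        */
                datalabel                     /* show the sum at the end of a bar  */
                datalabelattrs=(weight=bold); /* bold label text                   */
run;
```

Everything after the forward slash is an option on the VBAR statement, so options can be added or removed one at a time to see their effect.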

The second VBAR statement overlays the initial VBAR statement. For this one, we are drawing bars for the number of home runs. The fill color of these bars is going to be sandy brown. If you wonder what colors you can specify, you can search for web colors; web color names are one possible method for specifying colors. The bar width equals 0.4. This is a number between 0 and 1: the closer to 0, the skinnier the bar. Again, I specify no outline, a data label, and data label attributes with a weight of bold, but the color of the text is going to be white.
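Putting the two statements together, the overlaid chart could be sketched as follows. As before, `nRuns` and `nHome` are assumed to be the runs and home-run columns, and the color names are assumed to be recognized by your SAS session.

```sas
proc sgplot data=sashelp.baseball;
    /* First layer: total runs per team */
    vbar Team / response=nRuns stat=sum
                fillattrs=(color=dodgerblue) nooutline
                datalabel datalabelattrs=(weight=bold);
    /* Second layer: total home runs, drawn on top as a narrower bar */
    vbar Team / response=nHome stat=sum
                fillattrs=(color=sandybrown) barwidth=0.4 nooutline
                datalabel datalabelattrs=(weight=bold color=white);
run;
```

The statements are drawn in the order they appear, so the narrower home-run bar is listed second to keep it on top of the wider runs bar.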
