What are the statistics you can use in PROC MEANS?

In the PROC MEANS, I’m asking for certain statistics, MIN MEAN MAX, on total retail, broken down by customer ID. Then we have a filter, where the year of the order date is 2010. Now, the year function will extract the 4-digit year. Now, keep in mind order_fact is in the Oracle database. Let’s see if the Oracle database can handle this query against that table. If we have the CASTRATE option on, we will get notes in the log indicating what happened. And as you can see highlighted in this sort of blue color, it was able to transition this syntax in such a way that the database can handle it. So the Oracle database has an equivalent function that it can use to extract a year from the order date and compare it to 2010. So it was successful. The year function has an equivalent Oracle function.

Let’s try to weekday function. The reason I have date part, because that extracts the SAS date from a date time column. The date part is the number of days since January 1, 1960. Most of us know that story. And then we’ll use weekday to return a number between 1 and 7. 1 is Sunday, the beginning of the week. 7 is Saturday. And if the weekday is 7, I want those records to be processed to analyze total retail by customer ID. So this time in the log, because we have CASTRATE on– I haven’t shown you the syntax yet. We will get to that, but there’s a warning. The WHERE clause contains SAS-specific syntax that cannot be passed to the database. That’s good. I would want to know that. Because I may be able to change the syntax such that the database can handle it, can make it more efficient. This is good to know. These are different functions, or capabilities I should say, that the database, the Oracle engine, supports. Functions like COUNT and MIN, the LOG function, FOR function, CELL function, average, standard deviation, UP CASE function. So the Oracle database can handle these functions, or it has equivalents on its side. So oftentimes, the same result can be produced in different ways. But our goal is trying to get the database to do the work for us instead of passing it to the SAS side. In most instances, it will be better for the database to handle it efficiency-wise.

All right, especially when you’re writing a WHERE, because a WHERE is filtering records coming in. So if you go against the database table, the WHERE is applied, it will control how much data is passed on to SAS. If it cannot handle the WHERE clause, then all of that data comes out in SAS, and SAS will filter it on its side. So let’s think about this. Selecting Supplier_Name, Count. Then the star is just going to count the records. AS Products FROM the product_dim Oracle table, and I’m filtering. From this table, product_dim, I don’t want all of the data. I want to filter only for the records where the supplier name contains the word “Sports.” Now, if you haven’t used scan, scans can scan the value of a column from left or right. If you use a non-negative, an absolute value number, it will search from left to right. If you put a negative number, it will search from right to left. Well, all I want to check is the last word in the string, is the last word “Sports.” So I do a scan of supplier_name, negative 1.

Well, how do you know it’s returning the last word? Because it’s looking for a delimiter. There are default delimiters if you do not explicitly specify one, like a space, a period a comma. Those are default delimiters. But it works. We’re checking the last word to see if it’s equal to “Sports,” and we do a GROUP BY, because we’re doing a count, and order the Products in descending order. Well, let’s verify if SCAN was translated to an equivalent function in the database. Well, we get the result. Check the notes. Unable, OK, unable to convert the query to a DBMS-specific SQL statement due to an error. OK, well, it’s basically saying it wasn’t able to do it. SAS will do the processing. So we get the supplier name from here, but the WHERE was not applied, which means all that data from the table has to be passed to SAS. All right, just an idea exchange. How could this WHERE be rewritten to improve the performance of the query? Remember, we have scan supplier_name getting the first word from right to left, looking for “Sports.” Hm, what about the LIKE operator? Some of you have used the LIKE operator. So we will go with LIKE because it can be translated to an equivalent operator for the database to handle, where the supplier_name is like– if you have not used LIKE before, you can type the text you want it to search for. It’s looking for a pattern, you can use a percent symbol or underscore to indicate characters. The percent symbol means any number of characters. Underscore means any single character.

The way this works, I’m asking for any characters, a space, and the word “Sports.” So “Sports” is at the end of the string. All these characters precede it. All right, we get the same result. Check the log. All right, compare, take a look. Here it is. That was submitted to the Oracle database to process that table, where SUPPLIER_NAME LIKE percent sports. So it was able to use that function to have the database performed that task. So here’s the setup poll. Where would this where be handled by the database to filter against the Order_Fact Oracle table? If you think about it, I would imagine that most databases would support an equal comparison. So the answer is true. Let’s move on to 3.2 and talk about data set options with implicit pass-through. We will use some of the data set options and use SAS/ACCESS data set options to a regular SAS data set and options with an Oracle table. Some of you have seen the KEEP equal or DROP equal. You don’t use both. You keep what you want or drop what you don’t want in terms of columns. It can improve performance at times.

If you have 397 columns in a table, and you only need five, why read all of those other columns? You can keep the five you want. That can improve efficiency. Let’s take a look at the DROP equal option. You either drop what you don’t want or keep what you want. The asterisk denotes in SQL that I want all of my columns from the product dimension Oracle table. But next to it, I’m using a SAS instant option, DROP equals supplier’s colon. The colon indicates every column that begins with the word supplier. Supplier ID, supplier country. Where the product_name LIKE. And here’s the LIKE operator. And we already confirmed that the LIKE operator can be converted to an equivalent on the database side, right– or used on the database. And what about DROP? Let’s see if any columns begin with the word supplier. Down below, the answer is no, so apparently, it worked. Instead of the star and then a drop, you can be explicit and tell it which columns you want. You don’t put a comma after the last column, but that list requires more typing. This compared to this, you get the same result, and there’s no guessing the database will certainly handle this.

I’m going to bring in sum, this program, 03a02, and submit it, check the log, and see what effect, if any, the DROP equal data set option produced in the DBMS SQL query. I’m going to transition to my virtual section, a02.

Leave a Comment