From 102341d7ae8793c29d44fa416d3b5b797d1eca3e Mon Sep 17 00:00:00 2001
From: Clay Smith
Date: Tue, 1 Aug 2023 01:09:09 -0500
Subject: First commit

---
 file_operations.c | 101 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 101 insertions(+)
 create mode 100644 file_operations.c

(limited to 'file_operations.c')

diff --git a/file_operations.c b/file_operations.c
new file mode 100644
index 0000000..b55dc32
--- /dev/null
+++ b/file_operations.c
@@ -0,0 +1,101 @@
+//FILE (talk about opaque and abstract data in a separate section).
+//fopen options: r w a, r+ w+ a+, rb wb ab, rb+ wb+ ab+:
+// Each of those options determines five things about the file: whether you can read it, whether you can write it, whether it is created if it does not exist, whether existing contents are truncated, and where the stream is positioned to begin with.
+// b stands for binary, which means the stream is opened as a binary stream: bytes are read and written exactly as they are stored, with no translation. A text stream may translate line endings (for example "\r\n" to "\n" on Windows); on POSIX systems such as Linux the b makes no difference. Binary files usually hold raw values rather than text, so the number 7 is stored as the byte value 7 and not as the character '7' (ASCII 55; the digits '0' - '9' are ASCII 48-57). The purpose of binary files is that they can be used to store large structures on disk rather than in memory, and IO with binary files is faster because no conversion between text and the in-memory representation is required. However binary can be more tricky because you need to know the layout of the data: which bytes belong together in a group and what they mean.
+// when using the b option with the + option the + and the b can be in either order such as r+b or rb+ etc.
+//
+    /* ``r''   Open text file for reading.  The stream is positioned at the */
+    /*         beginning of the file. */
+    /* ``r+''  Open for reading and writing.  The stream is positioned at the */
+    /*         beginning of the file. */
+    /* ``w''   Truncate file to zero length or create text file for writing. */
+    /*         The stream is positioned at the beginning of the file. */
+    /* ``w+''  Open for reading and writing.  The file is created if it does not */
+    /*         exist, otherwise it is truncated.  The stream is positioned at */
+    /*         the beginning of the file. */
+    /* ``a''   Open for writing.  The file is created if it does not exist.  The */
+    /*         stream is positioned at the end of the file.  Subsequent writes */
+    /*         to the file will always end up at the then current end of file, */
+    /*         irrespective of any intervening fseek(3) or similar. */
+    /* ``a+''  Open for reading and writing.  The file is created if it does not */
+    /*         exist.  The stream is positioned at the end of the file.  Subsequent */
+    /*         writes to the file will always end up at the then current */
+    /*         end of file, irrespective of any intervening fseek(3) or similar. */
+// fopen returns a pointer to a FILE. FILE is an abstract or opaque data structure: you are NOT SUPPOSED to know what is inside it, as they are trying to abstract that away from you. The reason they do this is that operating systems handle files in drastically different ways, so using this opaque/abstract data type allows you to port your code much more easily than if you relied on the structure and its specific makeup. Just note that on Unix it is usually implemented as a structure, and in glibc there are literal warnings in the source code that say please do not use the structure directly and instead just refer to it via pointers, which abstracts away from it even more.
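+// A minimal sketch of the "w", "a" and "r" modes described above. The helper name write_append_read and its parameters are made up for illustration; only the stdio.h calls are real.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the original file): writes `first`
 * with mode "w", appends `second` with mode "a", then reads the whole
 * file back with mode "r" into `out`.
 * Returns the number of bytes read, or -1 if any fopen fails.
 */
long write_append_read(const char *path, const char *first,
                       const char *second, char *out, size_t out_size)
{
    assert(path != NULL && out != NULL && out_size > 0);

    /* "w" creates the file, or truncates it if it already exists. */
    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return -1;
    fputs(first, fp);
    fclose(fp);

    /* "a" keeps the existing contents; every write lands at the end. */
    fp = fopen(path, "a");
    if (fp == NULL)
        return -1;
    fputs(second, fp);
    fclose(fp);

    /* "r" only reads; fopen returns NULL if the file does not exist. */
    fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    size_t n = fread(out, 1, out_size - 1, fp);
    out[n] = '\0';              /* fread does not add a terminator */
    fclose(fp);
    return (long)n;
}
```

+// Calling it twice on the same path shows the truncation: the second call's "w" throws away what the first call left behind. Note that the FILE is only ever touched through the pointer fopen gave us.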
+// The FILE pointer returned by the fopen() function in the C standard library is often referred to as a stream or file handle, while the small integer returned by the POSIX open() syscall is called a file descriptor (on Unix a FILE wraps a file descriptor). Depending on whom you are talking to, the names get used loosely for the same idea: the value that stands for our open file, which we use indirectly every time we pass our FILE pointer to any of the soon to be mentioned file handling/operating functions that are provided to us.
+// This is why we assign the result of fopen() to a FILE * variable.
+// Files themselves are stored in the storage/disk of the computer, but when we open one, the C library sets up a buffer in RAM/memory that holds a chunk of the file's data at a time. The buffer is a proxy for the file: changes written to the buffer are NOT applied to the file until the buffer is flushed, which happens when it fills up, when we call fflush(), or when we close the file. Most of the underlying details are handled by the C library and the host operating system and we do not have to worry much about them.
+// There are three special streams that every C program starts with: stdin, stdout, and stderr. On Unix platforms (Linux, BSD, MacOS) they sit on top of file descriptors 0, 1 and 2. stdin, or Standard Input as it is often referred to, takes input FROM a source, often the user's keyboard, but it could also be another process writing into it. The input waits in a buffer until your program reads it. When you used scanf() and typed into the terminal while first learning to program in C, your input was held back until you pressed the enter/return key; only then was the line handed over to stdin, where scanf read it and stored it in the variable that you specified.
+// stdout is the name of the second important stream, which stands for Standard Output. This is by default the stuff that you print to the terminal/console/screen, such as with the printf() function. Again there is a buffer in between: printf() stores the text in the stdout buffer first, and it reaches the screen when that buffer is flushed.
+// This is why your printf() function calls sometimes do not print immediately when you want them to: there is a buffer holding the text in the intermediary that has to be flushed before it prints. Sometimes you have to manually change the buffering, which we will talk about shortly.
+// stderr is the third and final of the special streams and it is very similar to stdout, in that it prints to the terminal/console/screen by default. stderr stands for Standard Error and is used by developers/coders/programmers specifically for error information. The reason it is separated from stdout, even though by default they both print to the terminal/console/screen, is so that you can redirect just one or the other and they do not both clutter your screen. On Unix you can pipe and redirect output, error, and input away from their defaults.
+// input is by default the user's keyboard, but it can be redirected so that a program takes the output of another program/process as its stdin. In most shells that is what the | is doing: it connects the stdout of one command to the stdin of the next. The > operator in bash and many shells redirects stdout to a file rather than to another process/program. > is the same as 1>
+// stdout, as already mentioned, goes to the terminal screen by default, but this can be changed using | to pipe it into another process, > to write it to a file, or >> to append it to a file
+// stderr is file descriptor 2, and you can similarly change where it outputs to via 2> or have it append somewhere using 2>>
+// stdin can be redirected into a process using the < operator in most shells (all of these will be covered in more detail in my bash guide whenever I make that)
+// As mentioned earlier, input and output are buffered before they are actually handed to the function that reads the data and uses it in your program. This may seem like a nuisance, like why can't it just be fed directly into the program without a middle man you might ask? Well, the reason for this is actually quite complicated, but it generally comes down to performance: the library can move data to and from the operating system in large chunks instead of making a system call for every character.
+// To change or remove buffering you can use either the setbuf() function or the setvbuf() function, both of which must be called after the stream is opened but before you do anything else with it. setbuf takes two arguments, the first being the stream whose buffering you want to modify and the second being a pointer to a char array of at least BUFSIZ bytes to use as the buffer (it is not a size). Passing NULL as the second argument turns buffering off, and honestly that is the only thing I have personally gotten to work with this function. The setvbuf() function on the other hand takes four arguments/parameters as compared to setbuf()'s two.
+// The first is the file stream whose buffer you want to modify. The second is a pointer to a char array to be used as the new buffer, or NULL to let the library allocate one. The third is the buffering mode, which takes one of three constants, _IOFBF, _IOLBF and _IONBF, which stand for full buffering, line buffering and no buffering respectively. The final argument is a size: if the second argument is an array this is the size of that array, and if the second argument is NULL it may be used as the size of the buffer that the library allocates for the stream.
+// Finally the fflush() function is used to flush a stream, that is, to write out whatever is waiting in its output buffer right now. This is often used after printf to force the text to print to the terminal even if the buffer has not filled up yet.
+// The next file operation I want to talk about is moving around the file. Just like when you are editing a file in any other program on your computer, there is a cursor-like concept in C, which is where you are currently at in the file you are reading and/or writing; reads and writes happen at the cursor. You can change where you are currently at in the file, as well as figure out where you are at, with a couple of different tools which we will discuss now and which are all provided to us in the stdio.h header.
+// The tools for moving around the file are: fseek(), ftell(), rewind(), fsetpos(), fgetpos()
+// fseek() is a function that you use to move the cursor around, it takes three arguments. The first argument is the stream whose cursor position you want to modify. The second argument is an offset: how far away from the place named by the third argument you want to move the cursor. The third argument is one of three constants: SEEK_SET, SEEK_CUR, and SEEK_END. These three constants specify the very beginning of the stream, the current cursor location in the stream, and the end of the stream.
+// The second argument is therefore always relative to one of these three constants. So if we wanted to move the cursor five bytes into a file we have to specify SEEK_SET in the third argument and 5 in the second argument, which combined means "five bytes past the beginning of the file" (that is the sixth byte, since offsets start at 0). The function returns 0 if it was successful and a non-zero value if an error occurred in moving the cursor location. This function is used in conjunction with another function, namely ftell(). These functions also have compatibility problems on text streams: on some OSes a newline is actually stored as two characters rather than just one, which messes with the byte count, so on a text stream the only portable offsets for fseek() are 0 or a value that ftell() returned earlier (used with SEEK_SET).
+// ftell() is a simple function that takes a single argument, the stream that you are working with, and it simply returns the current position of the cursor in the file. This is used in conjunction with fseek() to move around the file. It should also be noted that there is a major downside to using these functions: ftell returns a long, and on systems where long is 32 bits (32-bit platforms, and Windows even in 64-bit builds) it cannot represent positions past 2GB. On 64-bit Linux long is 64 bits and this is not a problem. Where long is too small, fseek may still be able to move, but you can no longer use ftell to find out where you are or to compute offsets past the 2GB mark, so you will have to have another way of keeping track of where the cursor is. If you are working with large files, figure out how to work around these issues with these functions or use the ones I am about to talk about. Another downside to the fseek and ftell functions is that they count bytes, not characters, so with multibyte or wide-character text an offset is not a character index, and a plain long cannot record the conversion state of a state-dependent encoding. This makes them a poor fit for programs that make heavy use of wide characters, but for most simple programs this is not an issue.
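+// A small sketch of fseek() and ftell() working together. The helpers stream_size and byte_at are hypothetical names for illustration. It assumes a binary stream, where offsets are plain byte counts; the C standard does not promise that SEEK_END is meaningful on every binary stream, although it works on common platforms.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: returns the size in bytes of an open binary
 * stream by seeking to the end and asking ftell(), then puts the
 * cursor back where it was. Returns -1 on error.
 */
long stream_size(FILE *fp)
{
    assert(fp != NULL);

    long saved = ftell(fp);                 /* remember where we are */
    if (saved < 0)
        return -1;
    if (fseek(fp, 0L, SEEK_END) != 0)       /* 0 bytes from the end */
        return -1;
    long size = ftell(fp);
    if (fseek(fp, saved, SEEK_SET) != 0)    /* restore the cursor */
        return -1;
    return size;
}

/*
 * Hypothetical helper: returns the byte stored `offset` bytes past the
 * beginning of the stream, or EOF on failure. The cursor is left just
 * after that byte.
 */
int byte_at(FILE *fp, long offset)
{
    assert(fp != NULL);

    if (fseek(fp, offset, SEEK_SET) != 0)
        return EOF;
    return fgetc(fp);
}
```

+// With a stream containing "0123456789", stream_size() gives 10 and byte_at(fp, 5) gives '5', the sixth byte, after which ftell() reports 6.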
+// fgetpos() and fsetpos() are functions that make use of the fpos_t type. The main benefit of these is that fpos_t can represent positions that do not fit in a long, so they can work with files greater than 2GB on such systems. However fpos_t is opaque: all you can do is save a position with fgetpos() and return to it later with fsetpos(), so moving around the file is not as easy.
+// Finally, now that we have covered creating and moving around files and a lot of other important details about file operations, it is time for the most important? fun? interesting? part of file operations, which is reading and writing your own data. This is pretty tricky in C and can be potentially dangerous. Please read other sources than mine, as this website is just a hobby and should not be your only source for C knowledge. You should practice all of the following, preferably in a controlled and safe environment, to see for yourself how all of these tools work and to prevent you from writing all over and destroying any of your data. Writing and reading files in C is no easy thing, and I, as someone who loves the C programming language, still feel the need to warn you that C does not make this already difficult subject and dangerous operation much easier for you. YOU HAVE BEEN WARNED!
+// Let's first talk about reading from a file. The functions that pertain to reading from files in C are: fgets(), fscanf(), fgetc(), fread(), plus fwide(), which does not read anything itself but sets or queries whether a stream is byte or wide oriented.
+// EOF is not a character stored in the file. It is a negative int value (a macro from stdio.h) that the reading functions return when they reach the END OF FILE or hit an error. Because it has to be distinguishable from every valid character, it is represented using an int, which is why functions such as fgetc() return an int rather than a char. Often when reading until the end of a file you check whether the returned value is EOF in a conditional statement. Sometimes you will see the feof() function used in a loop, but this function should be used with caution because it does not tell you whether the next read will hit the end of the file, only whether a previous read already did. Thus feof() returns true (non-zero) only after the end of the file has been reached by a read.
+// Use feof() with caution; it is usually simpler to check each attempt to grab a character for success or failure, and whether it returned EOF, instead.
+// functions for reading from a file or buffer are prefixed with an f, such as fscanf, fgets, fgetc, fread.
+// fgets(), fscanf(), fread() are the three most used to read strings, but they vary in multiple ways. Firstly, they can all be told to read up to a specific number of characters, but what they do with this information differs. fscanf is the only one where telling it how much to read is optional, and whatever width you give it (or none at all), it has no idea how many characters can actually be stored in the buffer/array passed to it. This means that it will keep reading and overwrite the contents of memory after the array is filled. This is a buffer overflow, and it can lead to several problems, including overwriting any variables that are stored after the array, corrupting the stack, and segmentation faults, meaning that you are writing to memory you do not have control over. All of these things can be minimized by putting a number between the % and the 's', such as %10s, which will store at most 10 characters in the array that it points to. However, here is where things must be taken into account: if you type in 11 characters, two major problems can still happen. Firstly, scanf always writes a null terminator after the characters it stores, so if your array is only 10 bytes long, %10s puts the terminator one byte past the end of the array, which is still a buffer overflow. The array needs to be 11 bytes long, or the width needs to be 9.
+// Secondly, all of the text that scanf did not consume is left on the stream, and thus further reads from this input stream will see those characters first, before any other input. Thus if you do scanf("%5s", foo); followed by another scanf("%6s", bar); and you type in "Clay is my first name" for the first scanf, the second scanf will not ask the user for more input: foo gets "Clay" (%s stops at the space) and bar is automatically populated with "is". Another thing to keep in mind is that scanf always appends a null terminator after the characters it stores, whether or not the array has room for it. scanf fills the array without any bounds checking and will not aid you if you fill up the space, so making room for the terminator is your job. scanf also differs in that %s skips leading whitespace and then reads until the next whitespace character, while a scanset such as %[^\n] reads until a character outside the set; in neither case is the terminating newline stored. Remember that everything that scanf does not read is left in the stream and thus future calls will read those things first. fgets() does read and store the newline character and, unlike scanf with %s, does not stop at spaces or tabs. fgets requires you to tell it how much to read, and it will read at most one less than that number and add a null terminator at the end.
+// fscanf allows an input limit but does not require one; both fgets and fread require a maximum. None of them can check that the limit actually fits the buffer they are storing to, so a wrong limit still allows buffer overflows which can write over other data. scanf and fgets both append a null terminator to the string, but they count it differently. fgets includes the terminator in the limit you pass, so the limit can be equal to the size of the buffer. The scanf width does not include the terminator, so it must be at least one character less than the size of the buffer.
+// fgets never overruns the limit it is given: it stops after limit - 1 characters, writes the null terminator, and leaves the rest of the input on the stream. fscanf also always writes a null terminator, but if the width in the format is greater than or equal to the size of the buffer, the terminator (and possibly more) ends up past the end of the buffer. fread does not append a null terminator, and thus one needs to be added afterwards if you want a string. fgets reads until the limit specified in the function call OR until it has read a newline character OR until the end of the file, whichever comes first. It does store any whitespace and newline characters it comes across as long as they are read before its limit. fread also stores any whitespace or newline character, but it will continue past them until it reaches the count specified in the function call or the end of the file. If scanf is using %s it will read until whitespace (including a newline) is found, or until the input length is equal to the optional width; %10s will read up to 10 characters from the stream. fgets and fread are also useful because they read any text up to a set number of bytes or the conditions mentioned before, whereas scanf must know ahead of time how the text it will be reading is formatted. scanf is for FORMATTED text, and thus if you want it to read letters and whitespace and everything, you have to be very clear about what you do and do not want it to read using format specifiers.
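+// A side-by-side sketch of the three ways of limiting a read discussed above, all filling the same 5-byte buffer. The names SMALL, read_with_fgets, read_with_fscanf and read_with_fread are made up for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SMALL 5     /* room for 4 characters plus the null terminator */

/* fgets: pass the full buffer size; it stores at most SMALL - 1
 * characters (newline included) and then the terminator. */
char *read_with_fgets(FILE *fp, char buf[SMALL])
{
    assert(fp != NULL && buf != NULL);
    return fgets(buf, SMALL, fp);
}

/* fscanf: the width does not count the terminator, so it has to be
 * SMALL - 1 = 4. Returns 1 when a word was stored. */
int read_with_fscanf(FILE *fp, char buf[SMALL])
{
    assert(fp != NULL && buf != NULL);
    return fscanf(fp, "%4s", buf);
}

/* fread: copies raw bytes and adds no terminator, so we add one after
 * the bytes that were actually read. Returns that byte count. */
size_t read_with_fread(FILE *fp, char buf[SMALL])
{
    assert(fp != NULL && buf != NULL);
    size_t n = fread(buf, 1, SMALL - 1, fp);
    buf[n] = '\0';
    return n;
}
```

+// On a stream holding "Clay is\nmy name\n", two calls in a row give "Clay" then " is\n" with fgets, "Clay" then "is" with fscanf, and "Clay" then " is\n" with fread. In every case whatever was not consumed stays on the stream for the next call.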
+//fputs, IOCTL, fprintf, fputc, tmpfile, tmpnam, fwide, fwrite, ferror
+
+
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+#define BUFSIZE 5
+
+int main(void)
+{
+    char array[BUFSIZE];
+    for (int i = 0; i < BUFSIZE; ++i){
+        array[i] = 'z';
+    }
+    while (1) {
+        printf("Enter: ");
+        fflush(stdout); /* show the prompt before blocking on input */
+        /* fread(array, sizeof(char), BUFSIZE - 1, stdin); */
+        /* fgets stores at most BUFSIZE - 1 characters plus the terminator; NULL means EOF or error */
+        if (fgets(array, sizeof array, stdin) == NULL) {
+            break;
+        }
+        /* fscanf(stdin, " %4s", array); */
+        printf("%s",array);
+        printf("\n");
+    }
+    return 0;
+}
+
+
+/* int main(void) */
+/* { */
+/*     char* buffer = malloc(298172 * sizeof(char)); */
+/*     char* buffer2 = malloc(298172 * sizeof(char)); */
+/*  */
+/*     FILE *my_fptr = fopen("file_operations_test.txt", "r+"); */
+/*  */
+/*     if (buffer == NULL || buffer2 == NULL || my_fptr == NULL) { */
+/*         fprintf(stderr, "ERROR: Failed to allocate memory or to find and open file.\n"); */
+/*     } else { */
+/*         size_t n = fread(buffer, sizeof(char), 298170, my_fptr); */
+/*         buffer[n] = '\0'; // fread does not null-terminate */
+/*         char* location = strstr(buffer, "18"); */
+/*         if (location != NULL) { */
+/*             memcpy(buffer2, buffer, location - buffer); */
+/*             buffer2[location - buffer] = '\0'; // neither does memcpy */
+/*             strcat(buffer2, "2"); */
+/*             strcat(buffer2, location + 2); */
+/*             fseek(my_fptr, 0, SEEK_SET); */
+/*             // the new text is one byte shorter, so the old last byte stays in the file */
+/*             fprintf(my_fptr,"%s", buffer2); */
+/*             // printf("buffer is: \n%s\n\nlocation of substring \"18\" is: %ld\n",buffer, (long) (location - buffer)); */
+/*         } */
+/*     } */
+/*  */
+/*     free(buffer); buffer = NULL; */
+/*     free(buffer2); buffer2 = NULL; */
+/*     if (my_fptr != NULL) fclose(my_fptr); */
+/*  */
+/*     return 0; */
+/* } */
-- 
cgit v1.2.1